Update: April 7, 2004: Added reference 19, "New Programming Algorithm."
Update: March 26, 2004: Added reference 18, NASA Advisory.
Date: March 17, 2004
This is the fifteenth in a series of OLD News articles.
Eleven FPGAs in the SX-A and RTSX-S series, built in the 0.25 µm MEC/Tonami process, have had confirmed programmed antifuse failures to date during user testing. No failures have been reported with 0.22 µm SX-A or eX series devices.
For the failures observed in 0.25 µm MEC/Tonami process devices, at least one of the following applies:
- The device was subjected to an out-of-specification electrical environment.
- The device was subjected to an electrical environment not known at this time.
- The equipment used for device programming subsequently failed calibration.
Actel has reproduced programmed antifuse failure by subjecting test devices to an out-of-specification electrical environment (e.g., VCCA-to-GND or I/O signal voltage transients exceeding the absolute maximum limits). Qualification testing of more than 3,000 devices resulted in no failures detected for the devices operated within specification.
A number of outstanding cases of suspected failures are in various stages of analysis. Additionally, Actel is currently testing approximately 800 devices in order to better understand this phenomenon. This OLD News will be updated when additional information is obtained.
Actel is currently working to produce devices with increased margins.
These findings and recommendations must be viewed in light of the fact that not all failures have been analyzed to the point of determining the root cause, and that some data are not yet available. As a result, the recommendations given in this OLD News stress the conservative application of these parts, eliminating or minimizing the conditions that are suspected of being capable of causing device damage.
All relevant personnel should ensure that all specifications, manufacturer's guidance, and good engineering practices are followed. Conservative design practices should be employed; failure to follow such an approach appears to correlate with device failure. The text below is a summary; additional information is available in the references and from the Technical Contact. Additional application notes are being generated.
Actel field programmable gate arrays (FPGAs) have been used in NASA spaceborne electronics systems for over a decade. These devices have an array of logic modules interconnected by user-configurable routing. The routing determines both the interconnections of the modules and each module's logical function. All programmable connections are made via antifuses, elements that are initially high-impedance but become low-resistance paths when programmed.
The first three generations of Actel FPGAs used ONO (oxide-nitride-oxide) antifuses with programmed resistances of several hundred ohms that were located in channels of the gate array. Starting with the SX series of devices, metal-to-metal antifuses were employed, lowering programmed resistances by approximately an order of magnitude and resulting in greater speed. Another fundamental change was placing the antifuses above the logic modules, between the upper layers of metallization in modern semiconductor processes, eliminating the routing channels and resulting in smaller devices with increased performance.
For the Actel metal-to-metal antifuse-based microcircuits, which are the topic of this OLD News, there have been several versions of production devices. Radiation-tolerant devices for space, the 0.6 µm, 3.3V SX-series, were processed by Matsushita Electric Co. (MEC) at the Uozu fabrication facility. MEC fabricated most previous antifuse-based FPGAs for NASA flight use. Commercial SX devices are based on a 0.35 µm, 3.3V process at Chartered Semiconductor. The next generation of devices for both commercial and military/aerospace applications, SX-A and RTSX-S, were processed by MEC at the Tonami fabrication facility in a 0.25 µm, 2.5V process. The RTSX-S devices shared a common process and radiation-hardened antifuse with the commercial MEC SX-A devices, with the detailed RTSX-S design modified for radiation hardness (TID, SEU) and I/O performance. Since 1999, over 1 million SX-A (MEC) production devices and approximately 10,000 RTSX-S devices have been delivered. Note that the SX-A devices for commercial and industrial uses have migrated to the 0.22 µm, 2.5V process at UMC, with zero failures reported.
The manufacturer's reliability numbers for these SX-A/RTSX-S (MEC) devices are considered high-rel (the failure rate is approximately 10 FITs). Based on the currently available data, it has been concluded that the devices are reliable when operated per specification, with no data presented to indicate otherwise. There are, however, variations in programmed antifuses, with some not as robust as others; as a result, 1 to 2% of the devices appear to be more susceptible to damage when operated outside of specified limits. The limited available data shows that a damaged antifuse may increase the propagation delay of the signal it carries by as little as tens of nanoseconds or by as much as microseconds. The long-term stability of damaged antifuses is unknown.
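As a rough illustration of what a rate of about 10 FITs means in practice, the sketch below converts a FIT rate (failures per 10^9 device-hours) into an expected failure count. The fleet size and operating time are hypothetical values chosen for the example, and a constant failure rate is assumed; the figure applies only to devices operated within specification.

```python
# Illustrative arithmetic only: fleet size and mission length are assumptions,
# not figures from this OLD News.
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def expected_failures(fit_rate, devices, hours_per_device):
    """Expected number of failures, assuming a constant failure rate."""
    return fit_rate * devices * hours_per_device / FIT_HOURS


def mtbf_hours(fit_rate):
    """Mean time between failures, in hours, for a single device."""
    return FIT_HOURS / fit_rate


if __name__ == "__main__":
    # Hypothetical fleet: 100 devices, each operated for 5 years (43,800 hours).
    n = expected_failures(fit_rate=10, devices=100, hours_per_device=5 * 8760)
    print(f"expected failures: {n:.4f}")
    print(f"single-device MTBF: {mtbf_hours(10):.1e} hours")
```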
2. Signal Integrity
It is critical to maintain good signal integrity on all I/O pins. Overvoltage on I/Os may result in transistor breakdown, snapback, or a VCCA-to-GND transient. The latter may result in an out-of-specification increase in bias across the programmed antifuse, and, hence, cause damage. There are a number of techniques that can be utilized to ensure good signal integrity, including simulation and design iterations with IBIS models, proper terminations, controlling loads, and specifying the slew rate of the outputs. The potential signal integrity problems are exacerbated by the fact that the drivers used in SX-A/RTSX-S are both fast and powerful (fast voltage transition times and high current capability).
3. Simultaneous Switching Outputs
Simultaneous switching outputs (SSOs) should be limited and properly distributed, rather than optimized solely for printed circuit board routability. Low slew output configurations should be used for all buses and other signals when possible. High slew output usage should be minimized and justified. Simultaneous switching signals should also be spread out in distance as well as in time. Loading on all outputs should be conservative (e.g., buffers should be used for driving memory arrays). Long lines, backplanes, harnesses, etc., should be driven with either buffers (preferred) or through isolation resistors, as appropriate. When controlling output timing, synchronous delay techniques are preferred. If asynchronous techniques are used, the delays should be carefully verified to ensure that the logically unneeded delay elements are neither optimized out of the design nor connected via high-speed routing (e.g., fast connect). In general, delay elements should be hand placed and "fixed" using the placement tool.
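To see why limiting SSOs and using low slew settings helps, a common first-order estimate of ground bounce is V = N * L * (dI/dt) for N outputs sharing one return path. The sketch below applies that textbook approximation; the inductance, current, and transition times are hypothetical values, not Actel data, and a real design should rely on IBIS-based simulation and measurement.

```python
def ground_bounce_volts(n_outputs, shared_inductance_h, delta_i_a, transition_s):
    """First-order estimate V = N * L * (dI/dt) for outputs sharing a return path."""
    return n_outputs * shared_inductance_h * delta_i_a / transition_s


if __name__ == "__main__":
    L_SHARED = 5e-9   # hypothetical 5 nH shared package/ground inductance
    DELTA_I = 0.020   # hypothetical 20 mA current change per output

    cases = [
        ("16 outputs, fast edge (1 ns)", 16, 1e-9),
        ("16 outputs, slow edge (4 ns)", 16, 4e-9),
        ("8 outputs at a time, fast edge (1 ns)", 8, 1e-9),
    ]
    for label, n, t_edge in cases:
        v = ground_bounce_volts(n, L_SHARED, DELTA_I, t_edge)
        print(f"{label}: about {v:.2f} V")
```

In this simplified model, slowing the edge or halving the number of outputs that switch together reduces the estimated bounce proportionally, which is the motivation for spreading switching signals in both distance and time.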
4. Power Supply
Power supply noise should be minimized, since transients lasting on the order of a nanosecond can damage a programmed antifuse. Robust bypassing should be employed, and the inductance of capacitor connections and of the power and ground planes should be minimized.
5. Programming Equipment
The following recommendations are issued:
- For each programmer used, all programming activity should be logged with programming yields computed.
- Actel normally achieves a 95% programming yield. The 5% dropout is typical since the programming utilizes previously untested paths and not all antifuses are expected to program satisfactorily.
- Actel customers on average achieve a significantly lower programming yield than the manufacturer, with an average dropout rate of approximately 10%, twice as high as Actel's. Neither the discrepancy itself nor its potential implications are currently understood. One possible explanation for the difference is a lack of proper care for the programmer, inadequate power conditioning, and poor device handling practices.
- Based on the currently unexplained lower programming yield, the following defensive and conservative practices are recommended:
  - All programmers should utilize properly conditioned AC power.
  - The programmer and adapter socket should be verified by the calibration routines prior to each use.
  - Each programmer should have complete programming records to detect any trends.
  - A single programmer should be used for flight devices at each facility.
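The logging recommendation above can be sketched as a small script that computes the yield for each programmer and flags any unit falling below the expected level. The record layout and programmer names are hypothetical, not an Actel or NASA format; the 95% figure is the manufacturer's normal yield quoted above.

```python
from collections import defaultdict


def yields_by_programmer(log):
    """Return {programmer_id: (passed, attempted, yield_fraction)}.

    `log` is an iterable of (programmer_id, passed) records. This record
    layout is hypothetical and chosen only for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # programmer -> [passed, attempted]
    for programmer_id, passed in log:
        counts[programmer_id][1] += 1
        if passed:
            counts[programmer_id][0] += 1
    return {pid: (ok, total, ok / total) for pid, (ok, total) in counts.items()}


def below_expected(yields, expected=0.95):
    """Programmers whose observed yield falls below the expected fraction."""
    return sorted(pid for pid, (_, _, y) in yields.items() if y < expected)


if __name__ == "__main__":
    # Hypothetical log: PGM-A programs 39 of 40 parts, PGM-B programs 9 of 10.
    log = [("PGM-A", True)] * 39 + [("PGM-A", False)]
    log += [("PGM-B", True)] * 9 + [("PGM-B", False)]
    results = yields_by_programmer(log)
    for pid, (ok, total, y) in sorted(results.items()):
        print(f"{pid}: {ok}/{total} = {y:.1%}")
    print("below expected yield:", below_expected(results))
```

With small sample counts a single dropout moves the yield considerably, so a flag from a script like this is a prompt to examine the programmer's calibration and handling records rather than a conclusion in itself.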
6. Automatic Test Equipment and Procedures
Improperly performed electrical measurements can degrade or damage the parts, whereas proper testing should not degrade their quality.
A survey of test equipment outside of the manufacturer's facility has failed to find a single facility that meets acceptable and safe standards. The observed examples include lack of control of key clock signals, absence of adequate bypassing and voltage control of the supplies, and failure to prevent bus contention. The personnel designing and operating test equipment are often not sufficiently familiar with the modern, complex devices under test, device design considerations, or device limitations, or with some critically important operational characteristics of their own equipment.
The design of test equipment, with respect to the electrical environment to which the device is subjected, must be performed to flight standards, including application analysis, margin analysis, and device protection analysis. For modern, high-performance devices with reduced margins for damage, these standards will be of increasing importance. The design of test equipment currently falls outside of the flight review process and constitutes a risk to the flight hardware. All test equipment must be thoroughly and properly reviewed for safety to the devices under test.
User post-programming electrical test (PPET) is not encouraged or recommended. It has been concluded, based upon available data, that in general the risk to the health of the part outweighs the possible benefits. This is the case for many of the test sets that have been examined.
- Stuck-at fault coverage testing is poor, ranging from approximately 15% to 60% and could provide false confidence. The NASA ASIC guide recommends a level of at least 99%.
- The type of testing conducted, including at-speed testing, is unlikely to detect cases of damaged antifuses unless the damage manifests itself as a gross timing violation; in such a case, the failure should be detectable in a proper board-level test. ATE and board-level tests cannot determine the true slack for the majority of timing paths.
- A survey of various ATE testers, including those that have tested RTSX-S parts for extended periods of time, has shown that every test system failed the review. This represented a credible risk to the part by not operating the part as designed, exceeding specification limits, and/or stressing the device.
- If special ATE testing is required, these tests should be performed by the device manufacturer prior to shipment, thus minimizing the number of different boards interacting with the flight device.
- If ATE is used for PPET, then the tester and all test programs must be qualified to flight (electrical) standards, including a full analysis combined with direct measurements of the electrical environment to which the part will be subjected. This is a higher level of review and certification than what has been practiced historically. It has been found that operators of this equipment are often not familiar with the electrical environment that they are creating for the flight devices.
- The use of the ActionProbe feature of SX-A and RTSX-S devices permits many delay paths to be measured non-invasively on the flight board and trends observed over time, voltage, and temperature, without limiting measurements to the path with the minimum slack time. This test is safe since it utilizes the standard Actel Silicon Explorer and an oscilloscope.
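Trending of measured path delays, as described in the last item above, could be automated along the following lines. This is a minimal sketch: the path names, delay values, and 1 ns growth threshold are hypothetical, and the two measurement sets are assumed to have been taken at the same voltage and temperature.

```python
def flag_delay_growth(baseline_ns, latest_ns, max_growth_ns=1.0):
    """Return (path, growth_ns) pairs whose delay grew by more than the threshold.

    Both arguments map a path name to a measured delay in nanoseconds.
    Paths missing from the latest measurement set are skipped.
    Results are sorted with the largest growth first.
    """
    flagged = []
    for path, base in baseline_ns.items():
        if path not in latest_ns:
            continue
        growth = latest_ns[path] - base
        if growth > max_growth_ns:
            flagged.append((path, growth))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical measurements of three internal paths.
    baseline = {"clk_to_q": 4.2, "bus_en": 6.0, "addr_dec": 5.1}
    latest = {"clk_to_q": 4.3, "bus_en": 31.5, "addr_dec": 5.0}
    for path, growth in flag_delay_growth(baseline, latest):
        print(f"{path}: delay grew by {growth:.1f} ns")
```

Because a damaged antifuse may add only tens of nanoseconds, a path with ample slack can degrade without causing a functional failure, which is why tracking delays directly can be more informative than a pass/fail functional test.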
7. Post Programming Burn-In
Post-programming burn-in is neither encouraged nor recommended.
Based on available data, it has been concluded that the risk to the health of the part outweighs the possible benefits. There is no standard or generally accepted procedure for burning-in programmed units, and node toggle rates are often either low or unknown. This test is not easy to perform well, and instances of out-of-specification slew rates, as just one example, have been observed, violating the "fly as you test, test as you fly" principle. Actel data shows that a properly constructed test set for post-programming burn-in will not damage the parts. This is based on testing approximately 3,000 devices for qualification with two failures, both of which were attributed to electrical overstress from testing accidents. Data examined has shown that a poorly constructed test set for post-programming burn-in may damage devices under test. This has been shown both by the analysis of field testing data and by special tests run at the manufacturer, where identifying the source of the problem and protecting the devices under test was non-trivial.
If post-programming burn-in testing is desired, then the rationale must first be clearly stated with a quantitative analysis providing acceleration factors and justification for the risk. The equipment and all procedures must be qualified to flight (electrical) standards. A full analysis combined with direct measurements of the electrical environment that the part will be subjected to must be performed. Operators of the equipment must be trained to understand the electrical environment that they are creating for the flight devices, requirements for that device, and how to properly monitor that environment. Properly conditioned power should be used with appropriate power monitors.
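As an example of the kind of quantitative analysis called for above, the sketch below computes a temperature acceleration factor with the Arrhenius model, a standard choice for thermally activated failure mechanisms. Whether that model applies to damaged antifuses is not established in this OLD News, and the activation energy and temperatures shown are placeholder assumptions that would have to be justified for the mechanism of interest.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperatures.

    AF = exp((Ea / k) * (1 / T_use - 1 / T_stress)), temperatures in kelvin.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


if __name__ == "__main__":
    # Placeholder assumptions: Ea = 0.7 eV, 55 C use, 125 C burn-in.
    af = arrhenius_acceleration_factor(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
    burn_in_hours = 168  # hypothetical burn-in duration
    print(f"acceleration factor: {af:.1f}")
    print(f"equivalent use hours: {af * burn_in_hours:.0f}")
```

The result is highly sensitive to the assumed activation energy, so the choice of Ea is itself part of the justification that must be documented.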
8. Flight Hardware Verification
The available data shows that the symptoms of a damaged antifuse are not always detectable by either traditional-style board-level or ATE testing. An increase in propagation delay is a typical indication of a damaged programmed antifuse. Such an increase can range from tens of nanoseconds to microseconds and therefore may not be detected via traditional board-level functional tests. The use of the ActionProbe feature may be able to non-invasively detect a subset of damaged antifuses by exploiting the device's existing internal test structures.
Flight boards should be carefully instrumented and qualification must not rely solely on functional testing. I/O signal quality should be carefully measured to ensure that the manufacturer's limits are not exceeded. When measuring VCCA-to-GND voltage spikes, high-bandwidth scope probes must be used and careful attention must be paid to grounding.
9. Handling of Failed or Suspect Devices
All failures should be properly analyzed, including non-flight devices as well as prototype units. Although not normally tracked by flight paperwork, failures during development must be understood to prevent a marginal electrical environment from remaining in the flight hardware.
All NASA projects, both in-house and contractor-based, should send all failure reports of this class to the Office of Logic Design (email@example.com) for analysis and trending. All failures will be cross-checked with the manufacturer to ensure that no failed devices fall through the cracks, that proper failure reports are generated, and that NASA is fully informed of the results and can distribute recommendations and advisories, as appropriate.
Since the root cause of the known field failures has not been rigorously established, it is recommended that a conservative approach be taken and that any suspect devices be replaced. It is also highly recommended that the number of failure-free operating hours be maximized.
- "The First Summary Report on the Independent Review of RTSX-S FPGA Reliability on NASA Space Flight Missions," February 11, 2004.
- "OLD News #14: Testing and Application of Modern Microelectronic Devices: Do's, Don'ts, and Failures," November 19, 2003.
- "Post Programming Burn In (PPBI) for RT54SXS Actel FPGAs," Dan Elftmann and Minal Sawant, Actel Corporation, 2002 MAPLD International Conference, Laurel, MD., September 2002.
- "2nd Advisory Letter," Actel Corporation, Esmat Z. Hamdy, March 3, 2004.
- "Regarding Actel RT54SX32S and RT54SX72S FPGAs," Esmat Z. Hamdy, Actel Corporation, December 16, 2003.
- "Actel RTSX-S EOS Information Pack," Actel Corporation, December 2003.
- "Handling of Parts - Subsequent Testing or Analysis," March 2004.
- "Actel Reliability Report, Q3 2003."
- "Reliability of Antifuse-Based Field Programmable Gate Arrays for Military and Aerospace Applications," (figures) McCollum, John, Roy Lambertson, Jeewicka Ranweera, Jennifer Moriarta, Jih-Jong Wang, and Frank Hawley, Actel Corporation, 2001 MAPLD International Conference, Laurel, MD., September 2001.
- "Actel 54SX32A Ground Bounce Testing Results," Johns Hopkins University/Applied Physics Laboratory, December 2002.
- "Failure Analysis Report for RT54SX72S-CQ256B Group C RTSX-S Qualification," Solomon Wolday, July 24, 2002.
- "Failure Analysis Report for A54SX72A-CQ208B Group C HIREL A54SX-A Qualification," Solomon Wolday, July 16, 2003.
- "Designing For Signal and Power Integrity in FPGA Systems," Mark Alexander, 2002 MAPLD International Conference, Laurel, MD, September 2002.
- "IBIS Models: Background and Usage," Actel Corporation, January, 2002.
- "FPGA High Speed and Signal Quality."
- "Drive Strength of Actel FPGAs," R. Katz, NASA Office of Logic Design, March 2004.
- "Analysis of Printed Circuit Board Artwork: Bypassing," Rod Barto, NASA Office of Logic Design, March 2004.
- "NASA Advisory: Actel RTSX-S and SX-A Programmed Antifuses," March 26, 2004.
- "New Programming Algorithm," Reference 19 for OLD News #15: "Actel SX-A and RTSX-S Programmed Antifuses," April 7, 2004.
In my new OLD (Office of Logic Design) position, I am now making some of my informal e-mail lists semi-formal. These mailings will have pointers to technical tips that can [hopefully] proactively prevent errors from getting into flight designs or make things go faster and smoother. I have included an array of people from a number of organizations; different NASA Centers, ESA, etc., as you all may distribute to people in your own organizations and other colleagues. Please let me know if you are on this list in error or if someone should be added to it. This list is targeted towards those that either will design or review space flight digital electronics. Feel free to suggest topics for discussion and research or to contribute news items. [Note for this web-based release: to become a recipient on this mailing list, please send e-mail to: firstname.lastname@example.org.]
All application notes are uploaded onto my www site. New additions are noted on the what's new page. I will give these mailings from time to time; too much and they will be filtered and ignored - too little and not enough information flows. So I'll try and hit a good balance.
Office of Logic Design
Last Revised: February 03, 2010
Digital Engineering Institute
Web Grunt: Richard Katz