NASA Office of Logic Design

A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.


OLD News #17

Actel SX-A, RTSX-S, and RTSX-SU FPGAs in Mission- and Safety-Critical Systems:
A Summary and Snapshot of a Dynamic Situation

Date: November 3, 2004

This is the seventeenth in a series of OLD News articles.

.pdf version (Courtesy of Marty Fraeman of JHU/APL.  Links not active in .pdf)

Problem Description

RTSX-S and SX-A FPGAs produced in the 0.25 µm MEC/Tonami process have experienced programmed antifuse parametric failures during controlled laboratory testing, with the number of failures significantly exceeding the expected fallout rate for a part of this class. These failures were detected in devices operated in an in-specification electrical environment and programmed with the "old" programming algorithm.  Failures were also detected in devices programmed with the "new" programming algorithm at the "4B2" stress level.  Data sets show a decreased failure rate for devices programmed with the new programming algorithm at varying levels.  Some of the most recent failures are still undergoing analysis and may be the result of lot-specific or wafer-specific processing problems or variations.  As a result, Actel has implemented a new wafer-level visual inspection: 4 die from each wafer will be examined for alignment and photoresist residue.

A significant number of failures of this class may not be detectable by testing, either at the part level by ATE or at the board or box level in the target system. The failure mechanism is a timing fault, so testing must be sensitive to timing faults in order to detect it.  An examination of the current test data shows a failure rate decreasing with time and accelerated by a combination of increased voltage and temperature.  Detailed information can be provided upon request.

No programmed antifuse failures have been observed to date in the 0.22 µm SX-A, 0.22 µm eX, or 0.25 µm RTSX-SU FPGAs produced at the UMC foundry. These UMC-produced devices have an antifuse structure physically different from those produced in the MEC foundry.  Additionally, there were other design changes at the circuit and structural levels.

Note: The "old programming algorithm" refers to software versions prior to DOS 3.81/Win 4.44.0.  The "new programming algorithm" is any later version up until at least the time of this writing.

Recommendations

Actel MEC SX-A FPGAs should not be used in safety-critical applications.  RTSX-S FPGAs should not be used in manned, safety-critical applications.  This applies to both the "old" and "new" programming algorithms.  Some faults present in the flight hardware may be undetectable. Other failures may manifest themselves after the conclusion of the test program.

Actel UMC A54SX-A and the new RTSX-SU are currently undergoing a multitude of tests; Actel internal testing has detected no programmed antifuse failures. Current and prospective users of these devices are urged to closely follow the progress of NASA Office of Logic Design testing published on klabs.org.

Programs may employ the following techniques to mitigate (not eliminate) the risks associated with Actel MEC SX-A and RTSX-S FPGAs:

These are also good practices to utilize for programs using all FPGA devices, including Actel UMC SX-A and RTSX-S FPGAs.

Discussion

Failures of RTSX-S Devices

The Actel Industry Tiger Team (AITT) is composed of members from The Aerospace Corporation, JPL, Northrop-Grumman, Lockheed-Martin, General Dynamics, Boeing, and NASA, represented by the Office of Logic Design.  Tests and analysis are being performed to understand the root cause of failure and to develop screening procedures. 

The key finding from the AITT testing was a failure rate of over 6% for the RT54SX32S, with the devices operated within the manufacturer's specification; results are expected to scale with size for devices such as the RT54SX72S.  The conditions for these tests were relatively benign.  The devices were biased with VCCA = 2.5V, which is at the midpoint of the recommended operating range, the temperature was nominal (approximately 40 °C from self-heating in the test chamber), and there was effectively no output switching.  Thus, the test conditions were less stressful than a typical flight design.
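As a rough illustration of what "scale with size" could mean, the sketch below assumes that antifuse failures are independent and equally likely, and that the number of programmed antifuses grows in proportion to device size.  Neither assumption comes from the AITT data, and the 72/32 size ratio is only a stand-in for a real antifuse count; the function name is hypothetical.

```python
def scaled_failure_probability(p_ref: float, size_ratio: float) -> float:
    """Failure probability of a device `size_ratio` times larger than a
    reference device whose failure probability is `p_ref`.

    Assumption (not from the test data): antifuse failures are independent
    and the programmed antifuse count is proportional to device size.
    """
    if not 0.0 <= p_ref < 1.0:
        raise ValueError("p_ref must be in [0, 1)")
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    # The device survives only if every "reference-sized" portion survives.
    return 1.0 - (1.0 - p_ref) ** size_ratio


# 6% is the figure quoted for the RT54SX32S; 72/32 is an illustrative
# guess at relative size, not a measured antifuse ratio.
p_72 = scaled_failure_probability(0.06, 72 / 32)
print(f"Estimated RT54SX72S failure probability: {p_72:.1%}")
```

Under these assumptions the larger device would show a failure probability of roughly 13%.  A design that uses only part of the array would scale with the number of antifuses actually programmed rather than with the device size.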

The Damaged Programmed Antifuse

A high current damaged programmed antifuse can exhibit changes in propagation delay from a few ns to over 1 µs.  Failures have been detected as early as "Time 0" -- the first observation of the device after programming.  Failures have also been detected as late as 2,100 hours.  Testing continues.

A low current damaged programmed antifuse can exhibit changes in propagation delay of less than 1 ns.  Since antifuses of this class are used to connect the routed array clock to the clock input of the R-Cell, even such small increases in propagation delay may result in a hold time violation when a parallel, single edge clocking architecture is employed.  Independent of any programmed antifuse damage, these structures are not recommended for use, as described in OLD News #13: Minimum Delays and Clock Skew in SX-A and SX-S FPGAs.
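The hold time argument can be sketched with the usual register-to-register hold check.  All delay values below are hypothetical and chosen only to show how a sub-nanosecond increase in the clock path to the receiving R-Cell can turn positive hold slack into a violation; they are not Actel timing numbers.

```python
def hold_slack_ns(t_clk_to_q_min: float,
                  t_data_min: float,
                  t_hold: float,
                  clock_skew: float) -> float:
    """Hold slack for a flip-flop pair clocked on the same edge.

    clock_skew = (clock arrival at receiver) - (clock arrival at sender).
    A negative result is a hold time violation.
    """
    return t_clk_to_q_min + t_data_min - t_hold - clock_skew


# Hypothetical values, in ns.
T_CLK_TO_Q_MIN = 0.6   # sender clock-to-output, minimum
T_DATA_MIN = 0.3       # shortest data path between the two flip-flops
T_HOLD = 0.0           # receiver hold time
NOMINAL_SKEW = 0.4     # receiver clock arrives this much later than sender

before = hold_slack_ns(T_CLK_TO_Q_MIN, T_DATA_MIN, T_HOLD, NOMINAL_SKEW)

# A damaged antifuse in the receiver's clock path adds delay, which
# appears directly as additional skew.
ADDED_CLOCK_DELAY = 0.8
after = hold_slack_ns(T_CLK_TO_Q_MIN, T_DATA_MIN, T_HOLD,
                      NOMINAL_SKEW + ADDED_CLOCK_DELAY)

print(f"hold slack before damage: {before:+.2f} ns")
print(f"hold slack after damage:  {after:+.2f} ns")
```

Note that hold slack does not depend on the clock period, so slowing the clock does not recover a design that fails this check.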

No completely open damaged programmed antifuses have been detected as of this writing.

Detection of the Damaged Programmed Antifuse

Damaged programmed antifuses are not always detectable by testing at the device, board, or box level.

The range of delay increases is such that timing slack in the user circuit may be sufficient to hide the increase, thus rendering functional testing ineffective in detecting such damage.  Increasing the test temperature and operating frequency will decrease available timing slack and result in a higher detection level; however, it is still not sufficient to catch all failures of high current antifuses.  Increased temperature may also be an accelerator of programmed antifuse failure.  Stability of the damaged programmed antifuse is addressed in the section below.
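The masking effect of timing slack can be illustrated with a simple setup check.  This is a minimal sketch with hypothetical path delays and clock periods: a delay increase is visible to a functional test only when it exceeds the slack available on the affected path.

```python
def setup_slack_ns(clock_period: float, path_delay: float, t_setup: float) -> float:
    """Timing slack on a register-to-register path, in ns."""
    return clock_period - path_delay - t_setup


def is_detected(delay_increase: float, clock_period: float,
                path_delay: float, t_setup: float) -> bool:
    """True if the added delay consumes all slack and causes a functional failure."""
    return delay_increase > setup_slack_ns(clock_period, path_delay, t_setup)


# Hypothetical path: 20 ns of logic and routing, 1 ns setup time,
# and a damaged antifuse that adds 5 ns.
PATH_DELAY = 20.0
T_SETUP = 1.0
DELAY_INCREASE = 5.0

hidden_at_20mhz = not is_detected(DELAY_INCREASE, 50.0, PATH_DELAY, T_SETUP)
caught_at_40mhz = is_detected(DELAY_INCREASE, 25.0, PATH_DELAY, T_SETUP)

print("hidden at 50 ns period:", hidden_at_20mhz)
print("caught at 25 ns period:", caught_at_40mhz)
```

With these numbers the 5 ns increase is hidden at a 50 ns clock period (29 ns of slack) and exposed at a 25 ns period (4 ns of slack).  A smaller increase on the same path would still escape both tests.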

Note that often there is some external parameter, such as an ICC signature, that can be used as an indicator of integrated circuit damage.  No such indicator is available for this failure mode.

The ActionProbe feature of Actel antifuse FPGAs can be used to measure propagation delays internal to the device.  This can be exploited to detect some programmed antifuse failures that are not observable through functional test.  Not all failures are detectable, since the ActionProbe access point is upstream of the output antifuse, which limits the use of this technique.  For example, a C-Cell driving an R-Cell will have programmed antifuse delays (assuming no direct connect) that cannot be measured using this timing technique.

Stability of the Damaged Programmed Antifuse

The NASA-DoD Independent Assessment Team addressed this issue in January 2004.  At the time no data was available on this topic; thus a conservative approach was used -- programmed damaged antifuses were assumed not to be stable.  Recent data [1] confirmed that to be a correct assumption.  The example below shows that even after 600 hours of operation a damaged programmed antifuse is not stable.  Therefore, one cannot qualify a device by test and assume sufficient long term reliability.  From the data available, relatively small changes in delay were observed either after a failure or after exposure to accelerated test conditions.


T=0 is after 600 hours of operation for two sample parts with known
damaged programmed antifuses.  Note the discontinuity in delay as a function
of time for Part 2 at approximately the 0.14 hour point of this bench-level test.
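A discontinuity of this kind can be flagged automatically in logged delay measurements.  The sketch below is one simple way to do so; the function name, the threshold, and the sample values are all hypothetical and are not the measured data from the figure.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in hours, propagation delay in ns)


def find_delay_steps(samples: List[Sample],
                     threshold_ns: float) -> List[Tuple[float, float, float]]:
    """Return (t_before, t_after, delta_ns) for each pair of consecutive
    samples whose delay differs by at least threshold_ns."""
    steps = []
    for (t0, d0), (t1, d1) in zip(samples, samples[1:]):
        delta = d1 - d0
        if abs(delta) >= threshold_ns:
            steps.append((t0, t1, delta))
    return steps


# Hypothetical log for one part: stable delay, then a step near 0.14 hours.
part_log = [
    (0.00, 10.0),
    (0.05, 10.1),
    (0.10, 10.1),
    (0.14, 12.6),
    (0.20, 12.7),
]

steps = find_delay_steps(part_log, threshold_ns=1.0)
for t_before, t_after, delta in steps:
    print(f"step of {delta:+.1f} ns between {t_before:.2f} h and {t_after:.2f} h")
```

The threshold should sit well above the measurement noise of the bench setup; otherwise ordinary jitter between samples is reported as instability.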

UMC Produced Devices

To date, no programmed antifuse failures have been observed in the 0.22 µm SX-A, 0.22 µm eX, or 0.25 µm RTSX-SU FPGAs produced at the UMC foundry. These UMC-produced devices have a physically different antifuse structure than the MEC 0.25 µm A54SX-A or RTSX-S FPGAs, along with other design changes at the circuit and structural levels.

These devices are currently undergoing an independent NASA test.  Results are being published on klabs.org and users are encouraged to follow the results.

Actel has completed their internal qualification testing on these devices, using both their Qualification Burn In (QBI) pattern and the AITT pattern.  Military specifications for these devices are available on the Defense Supply Center Columbus (DSCC) www site for the RTSX32SU (5962-01508) and the RTSX72SU (5962-01515).  Additional information is available in Note 29 below.

The results below are complete as of the time of this writing.  While the QBI pattern is representative of user type logic structures, the AITT pattern was designed specifically to be sensitive to timing variations that were caused by damaged programmed antifuses.


Test Summary for UMC SX-A and RTSX-SU FPGAs: Qualification Burn In (QBI) Pattern

High Temperature Operating Life (+125 °C)
Design Pattern: Qualification Burn-In Design

Product          Units  Failures  Hours  Unit Hours
RTSX72SU           132        1*  1,000     132,000
RTSX72SU             8        0     168       1,344
RTSX32SU           135        0     168      22,680
Subtotal           275                      156,024
SXA (VCCA=3.0V)    101        0   2,000     202,000
SXA                345        0   1,000     345,000
SXA                  1        0     500         500
SXA                572        0     168      96,096
SXA                 69        0     120       8,280
Total            1,363        1             807,900

* ESD damage identified.  No evidence of antifuse damage.
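The Unit Hours column is the product of units and test hours for each row.  As a cross-check, the short script below recomputes the subtotal and total of the High Temperature Operating Life table from its rows; the values are copied from the table and the variable names are arbitrary.

```python
# (product, units, hours) rows from the High Temperature Operating Life table.
htol_rows = [
    ("RTSX72SU", 132, 1000),
    ("RTSX72SU", 8, 168),
    ("RTSX32SU", 135, 168),
    ("SXA (VCCA=3.0V)", 101, 2000),
    ("SXA", 345, 1000),
    ("SXA", 1, 500),
    ("SXA", 572, 168),
    ("SXA", 69, 120),
]


def total_units(rows):
    """Sum of the Units column."""
    return sum(units for _, units, _ in rows)


def total_unit_hours(rows):
    """Sum of units x hours over all rows."""
    return sum(units * hours for _, units, hours in rows)


rtsx_su_rows = [row for row in htol_rows if row[0].startswith("RTSX")]

print("RTSX-SU subtotal:", total_units(rtsx_su_rows), "units,",
      total_unit_hours(rtsx_su_rows), "unit hours")
print("Total:", total_units(htol_rows), "units,",
      total_unit_hours(htol_rows), "unit hours")
```

The recomputed figures (275 units and 156,024 unit hours for the RTSX-SU subtotal; 1,363 units and 807,900 unit hours overall) agree with the table.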

 

Low Temperature Operating Life (-55 °C)
Design Pattern: Qualification Burn-In Design

Product          Units  Failures  Hours  Unit Hours
RTSX72SU           134        0     500      67,000
RTSX72SU             8        0     168       1,344
Subtotal           142                       68,344
SXA (VCCA=3.0V)    101        0   2,000     202,000
SXA                226        0   1,000     226,000
Total              640        0             525,072

 

Temperature Cycling (-65 °C to +150 °C)
Design Pattern: Qualification Burn-In Design

Product    Units  Failures  Cycles  Total Cycles
RTSX32SU*    135         0     100        13,500
SXA          252         0   1,000       252,000
SXA           38         0     500        19,000
Total        425         0               284,500

* Completed 168 hour HTOL test first, then temperature cycles.


Test Summary for UMC SX-A and RTSX-SU FPGAs: AITT Pattern

Sample Lot No.  Product/Wafer Lot  Units  Failures  Hours  Unit Hours  Testing Type
1               RTSX32SU/D110A1      198         0    168      33,264  P7
                                                 0    168      33,264  P4B2
2               RTSX32SU/D110A1      100         0    596      59,600  P7
                                                 0    168      16,800  P4B1
                                                 0    168      16,800  P4B1, Vcca=2.50V, T=85 °C
                                                 0    168      16,800  P4B1, Vcca=2.75V, T=85 °C
                                                 0    168      16,800  P4B1, Vcca=3.00V, T=85 °C
3               RTSX32SU/D110A1      168         0    168      28,224  P7
4               RTSX72SU/DOY311      200         0    168      33,600  P7
5
Total                                            0            255,152

Unless otherwise noted, T is approximately room temperature plus self-heating effects in the test chamber.

P7 Details:
  Few I/O toggling (3 monitor pins)
  Array toggle rate = 12.5%
  Undershoot < -0.4V

P4B1 Details:
  17 I/Os simultaneously toggling
  I/O toggle rate = 25%
  Array toggle rate = 12.5%
  Undershoot = -1V

P4B2 Details:
  70 I/Os simultaneously toggling
  I/O toggle rate = 50%
  Array toggle rate = 12.5%
  Undershoot = -2V
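A test with zero failures bounds the failing fraction of the population but does not show that it is zero.  The sketch below computes the standard zero-failure upper confidence bound, assuming the tested units are independent samples and that each would have revealed a failure had one been present; both are assumptions, and the function name is hypothetical.  The unit count is the sum of the Units column for sample lots 1 through 4 in the AITT pattern summary.

```python
def zero_failure_upper_bound(n_units: int, confidence: float) -> float:
    """Upper confidence bound on the failing fraction of a population
    when n_units independent samples show zero failures.

    Solves (1 - p) ** n_units = 1 - confidence for p.
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_units)


# Units column of the AITT pattern summary, sample lots 1 through 4.
AITT_UNITS = 198 + 100 + 168 + 200

bound_90 = zero_failure_upper_bound(AITT_UNITS, 0.90)
print(f"{AITT_UNITS} units, 0 failures: "
      f"failing fraction below {bound_90:.2%} at 90% confidence")
```

For 666 units this gives a bound of about 0.35% at 90% confidence, which can be set against the failure rate of over 6% reported for the MEC-produced RT54SX32S.  The bound says nothing about failures that appear after the test durations listed in the summary.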

Same data shown graphically (note sequential testing of sample lots):

 

Radiation testing has been completed on these devices.  Please see references 3 through 7 for the data sets.

References and Notes

  1. "Propagation Delay Stability in Logic Devices," R. Katz, NASA Office of Logic Design, 2004 MAPLD International Conference, September 8-10, 2004, Washington, D.C.
  2. Independent NASA Test of Actel SX-A, SX-S, and SX-SU Field Programmable Gate Arrays (FPGAs)
  3. RT54SX72SU Heavy Ion SEU Test, April 2004 at Brookhaven National Labs. 
  4. RT54SX72SU Heavy Ion Damage Test, April 2004 at Brookhaven National Labs. 
  5. Total Ionizing Dose Test Report No. 04T-RTSX72S(U)-D0Y311, August 4, 2004.
  6. Total Ionizing Dose Test Report No. 04T-RTSX72S(U)-D0YMJ1, September 1, 2004.
  7. Total Ionizing Dose Test Report No. 04T-RTSX32S(U)-D110A1, September 14, 2004.
  8. "NASA Advisory: Actel RTSX-S and SX-A Programmed Antifuses," March 26, 2004.
  9. "OLD News #15: Actel SX-A and RTSX-S Programmed Antifuses," March 17, 2004.
  10. "New Programming Algorithm" Reference 19 for OLD News #15: "Actel SX-A and RTSX-S Programmed Antifuses," April 7, 2004.
  11. "The First Summary Report on the Independent Review of RTSX-S FPGA Reliability on NASA Space Flight Missions," February 11, 2004.
  12. "OLD News #14: Testing and Application of Modern Microelectronic Devices: Do's, Don'ts, and Failures," November 19, 2003.
  13. "3rd Advisory Letter," Actel Corporation, Esmat Z. Hamdy, April 14, 2004.
  14. "2nd Advisory Letter," Actel Corporation, Esmat Z. Hamdy, March 3, 2004.
  15. "Regarding Actel RT54SX32S and RT54SX72S FPGAs," Esmat Z. Hamdy, Actel Corporation, December 16, 2003.
  16. "Actel RTSX-S EOS Information Pack," Actel Corporation, December 2003.
  17. "Handling of Parts - Subsequent Testing or Analysis," March 2004.
  18. "Post Programming Burn In (PPBI) for RT54SXS Actel FPGAs," Dan Elftmann and Minal Sawant, Actel Corporation, 2002 MAPLD International Conference, Laurel, MD., September 2002.
  19. "Reliability of Antifuse-Based Field Programmable Gate Arrays for Military and Aerospace Applications," (figures) McCollum, John, Roy Lambertson, Jeewicka Ranweera, Jennifer Moriarta, Jih-Jong Wang, and Frank Hawley, Actel Corporation, 2001 MAPLD International Conference, Laurel, MD., September 2001.
  20. "Actel 54SX32A Ground Bounce Testing Results," Johns Hopkins University/Applied Physics Laboratory, December 2002.
  21. "Failure Analysis Report for RT54SX72S-CQ256B Group C – RTSX-S Qualification," Solomon Wolday, July 24, 2002.
  22. "Failure Analysis Report for A54SX72A-CQ208B Group C – HIREL A54SX-A Qualification," Solomon Wolday, July 16, 2003.
  23. "Designing For Signal and Power Integrity in FPGA Systems," Mark Alexander, 2002 MAPLD International Conference, Laurel, MD, September 2002.
  24. "IBIS Models: Background and Usage," Actel Corporation, January, 2002. 
  25. "FPGA High Speed and Signal Quality."
  26. "Drive Strength of Actel FPGAs," R. Katz, NASA Office of Logic Design, March 2004.
  27. "Analysis of Printed Circuit Board Artwork: Bypassing," Rod Barto, NASA Office of Logic Design, March 2004.
  28. "Actel Reliability Report, Q3 2003."
  29. The following excerpt from SMD 5962-01508 (RTSX32SU) discusses the issue of Actel DSCC qualification.  This is similar to that found in SMD 5962-01515 (RTSX72SU).
    Approved sources of supply for SMD 5962-01508 are listed below for immediate acquisition only and shall be added to MIL-HDBK-103 and QML-38535 during the next revision. MIL-HDBK-103 and QML-38535 will be revised to include the addition or deletion of sources. The vendors listed below have agreed to this drawing and a certificate of compliance has been submitted to and accepted by DSCC-VA. This information bulletin is superseded by the next dated revision of MIL-HDBK-103 and QML-38535.

In my new OLD (Office of Logic Design) position, I am now making some of my informal e-mail lists semi-formal. These mailings will have pointers to technical tips that can [hopefully] proactively prevent errors from getting into flight designs or make things go faster and smoother. I have included an array of people from a number of organizations; different NASA Centers, ESA, etc., as you all may distribute to people in your own organizations and other colleagues. Please let me know if you are on this list in error or if someone should be added to it. This list is targeted towards those that either will design or review space flight digital electronics. Feel free to suggest topics for discussion and research or to contribute news items.  [Note for this web-based release: to become a recipient on this mailing list, please send e-mail to: richard.b.katz@nasa.gov.]

All application notes are uploaded onto my www site. New additions are noted on the what's new page. I will give these mailings from time to time; too much and they will be filtered and ignored - too little and not enough information flows. So I'll try and hit a good balance.

Best regards,

-- rk


Last Revised: February 03, 2010
Digital Engineering Institute
Web Grunt: Richard Katz