NASA Office of Logic Design

A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.

OLD News #12

Summary of Recent EEPROM Failures

Date: July 3, 2003

This is the twelfth in a series of OLD News articles.

Summary and Conclusion

EEPROM-based devices are attractive components because they are both rewritable and non-volatile. Many of the products used in civil space systems are based on the Hitachi 1 Mbit commercial die, which various vendors package into either single-chip packages or multi-chip modules.

Bit failures in two distinct EEPROM die were reported on a flight instrument during its first year and a half of flight, and the initial analysis concluded that they were due to random defects. That conclusion could not be supported, however, and various NASA Centers and contractors conducted a wider survey of EEPROM usage. The survey found a number of failures, ranging from a single bit to the loss of a page. The number of failures relative to the small sample size is cause for serious concern.

Designs should incorporate features such as permanent fixed memories and/or DMA that is independent of processor actions, so that the EEPROM or other memory devices can be reloaded and/or patched if data becomes corrupted. Failure of EEPROM devices is a credible scenario that cannot be dismissed.
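As an illustration of the reload-on-corruption idea, the following is a minimal sketch and not a description of any project's flight software. It assumes a trusted "golden" copy of the image is held in a permanent fixed memory and that the EEPROM can be checked and rewritten one page at a time. The names page_crcs and scrub, the use of CRC-32, and the 128-byte PAGE_SIZE are all assumptions made for the example.

```python
import zlib
from typing import List

PAGE_SIZE = 128  # assumed page size in bytes, for illustration only


def page_crcs(image: bytes, page_size: int = PAGE_SIZE) -> List[int]:
    """Return one CRC-32 per page of a memory image."""
    return [zlib.crc32(image[i:i + page_size])
            for i in range(0, len(image), page_size)]


def scrub(eeprom: bytearray, golden: bytes,
          page_size: int = PAGE_SIZE) -> List[int]:
    """Check every EEPROM page against the CRC of the golden copy and
    rewrite any page whose CRC differs.

    Returns the indices of the pages that were reloaded.
    """
    reloaded = []
    for page, crc in enumerate(page_crcs(golden, page_size)):
        start = page * page_size
        end = start + page_size
        if zlib.crc32(eeprom[start:end]) != crc:
            # Page is corrupted: restore it from the fixed-memory copy.
            eeprom[start:end] = golden[start:end]
            reloaded.append(page)
    return reloaded
```

In this sketch the golden copy stands in for the permanent fixed memory; a hardware path such as processor-independent DMA would serve the same purpose when the processor itself cannot be trusted to run the reload.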

Note that the projects have not yet determined the failure mechanisms or root causes for any of the in-flight or ground failures.

Device Characteristics

All of the devices in this report are based on the Hitachi HCN58C1001 1 Mbit EEPROM die. These devices are (commercial part numbers):

Basic characteristics of the die are:

Summary of the Failures

Genesis Failure #1

Approximately 6 months into the mission, a bit failed.  The contents should have been a '0' but readback indicated it was a '1'.

Genesis Failure #2

Approximately 13 months into the mission, another bit failed.  The contents should have been a '0' but again readback indicated it was a '1'.
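Both Genesis failures were a stored '0' that read back as a '1'. One way to localize errors of this kind is to compare a readback against the expected image bit by bit. The sketch below is illustrative only; the function name bit_errors and the tuple layout it returns are assumptions, not part of any project's tooling.

```python
def bit_errors(expected: bytes, readback: bytes):
    """List every bit that differs between the expected image and a readback.

    Each entry is (byte_offset, bit_index, expected_bit, read_bit),
    with bit_index 0 being the least significant bit of the byte.
    """
    if len(expected) != len(readback):
        raise ValueError("images must be the same length")
    errors = []
    for offset, (exp, got) in enumerate(zip(expected, readback)):
        diff = exp ^ got  # set bits mark positions that disagree
        for bit in range(8):
            if diff & (1 << bit):
                errors.append((offset, bit, (exp >> bit) & 1, (got >> bit) & 1))
    return errors
```

An entry whose last two fields are (0, 1) corresponds to the failure direction seen in both Genesis events.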

Mars Exploration Rover Failure #1

December 5, 2002: A page failure was detected during breadboard test. The failure was pattern sensitive and was difficult to reproduce after the device was removed from the system.

Mars Exploration Rover Failure #2

March 1, 2003: Numerous fluctuating errors, all confined to a single page of the device, were found during pre-launch testing. The values continued to fluctuate for over an hour after being written and were still fluctuating when testing ceased.
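Errors that fluctuate from one read to the next can be missed by a single readback. A simple screen is to read the same page repeatedly and record which bits ever change. The following is a sketch under stated assumptions: read_page is a hypothetical callable that returns the current page contents as bytes, and the default of 16 reads is arbitrary.

```python
def unstable_bits(read_page, reads: int = 16) -> bytes:
    """Read a page several times and return a mask of bits that changed.

    A set bit in the returned mask marks a bit position whose value
    differed from the first read in at least one later read.
    """
    first = read_page()
    mask = bytearray(len(first))
    for _ in range(reads - 1):
        sample = read_page()
        for i, (a, b) in enumerate(zip(first, sample)):
            mask[i] |= a ^ b  # accumulate every position that ever differed
    return bytes(mask)
```

An all-zero mask means only that no fluctuation was seen during the sampled reads, not that the page is healthy.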

Mars Exploration Rover Failure #3

December 5, 2002: Three bit errors were reported in one page; only the last one was verified.

Deep Impact HRI Failure #1

October 31, 2002: After programming and verification, the board was left unpowered for several days. The failure first appeared at a single location, but as time went on the failures seemed to cover an address region rather than a single location.


