NASA Office of Logic Design


A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.


EDAC and Dynamic Faults

Introduction

Error Detection and Correction (EDAC) circuits are commonly employed in spaceflight digital systems.  The common EDAC implementation can correct any single-bit error and detect any two-bit error, which is known as SEC/DED for "single error correction/double error detection."  This is very useful for memory systems where the probability of multiple-bit errors is sufficiently small (assuming correct device selection) or where the bits of a logical word are in separate physical chips (which is not a guarantee against multiple-bit errors, depending on the path of the particle).  More error detecting and correcting power is available in some EDAC circuits that implement more powerful codes.  The error correcting capability of a code is a function of its minimum Hamming distance.  An EDAC of this type is a combinational circuit; the outputs are strictly a function of the inputs.  Any latches or flip-flops are present only for system considerations.
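As an illustration of SEC/DED behavior, the following minimal Python sketch implements an extended Hamming (8,4) code, whose minimum Hamming distance is 4.  It is not the code used by any particular flight EDAC; the function names encode and decode, the 4-bit data width, and the bit layout are assumptions chosen to keep the example small.

```python
def encode(data):
    """Encode a 4-bit value into an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8                      # index 0 holds the overall parity bit
    # Data bits occupy positions 3, 5, 6, 7 (classic Hamming layout), LSB first.
    code[3] = (data >> 0) & 1
    code[5] = (data >> 1) & 1
    code[6] = (data >> 2) & 1
    code[7] = (data >> 3) & 1
    # Parity bits at positions 1, 2, 4 each cover the positions sharing that index bit.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1         # makes the parity of the whole word even
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'double'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) & 1             # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single error: the syndrome is its position (0 = the overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        status = "double"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status
```

With this sketch, flipping any one bit of encode(0xA) still decodes to 0xA with status 'corrected', while flipping any two bits yields status 'double', in which case the returned data should not be trusted.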

When a flip-flop in a RAM device gets "flipped," it normally settles at a stable logic level and can be corrected by rewriting the location.  But such is not always the case.  For example, a failure such as an open solder joint can result in a non-logic-level signal at the EDAC input, producing non-stable values such as oscillation.  Or a so-called "weak bit" in an EEPROM device can manifest itself in a similar fashion by not providing a full input to the device's sense amplifier.

Two oscillating outputs from an EEPROM.  From King et al., "Observations in Characterizing a Commercial MNOS EEPROM for Space" [3].

Many project personnel have assumed that the presence of an EDAC circuit with SEC/DED capability would protect their system from any single-bit memory error.  But considering the construction of EDAC combinational circuits, it was speculated that a single oscillating bit into an EDAC circuit may result in an output that goes through transient states before settling to a stable and correct state, even though just a single bit is in error.  In principle, a single-bit error from memory may induce transients on multiple bits of the "corrected" data bus.  To prevent any transients on the "corrected" output, the EDAC would have to be designed to be free of static hazards.
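The mechanism can be sketched with a toy timing model in Python.  This is not the EDAC or the timing from the experiment on this page: it assumes a small Hamming (7,4) corrector, and the per-path delays (SYN_DELAY_NS, DATA_DELAY_NS), the names membus and dbus as functions, and the choice of flipped bit are all hypothetical.  Each syndrome bit is assumed to arrive at the correction logic after a different delay, so the syndrome briefly takes intermediate values that point at the wrong bit.

```python
DATA_POS = (3, 5, 6, 7)      # codeword positions that carry data bits, LSB first
SYN_DELAY_NS = (3, 5, 8)     # hypothetical delay of syndrome bits 0, 1, 2
DATA_DELAY_NS = 1            # hypothetical delay of the direct data path


def encode(data):
    """Hamming (7,4) codeword as a list; index 0 is unused."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code


def membus(code, flip_pos, t_flip, t):
    """Memory output at time t: bit flip_pos flips at t_flip and stays flipped."""
    word = list(code)
    if t >= t_flip:
        word[flip_pos] ^= 1
    return word


def dbus(code, flip_pos, t_flip, t):
    """'Corrected' output at time t, with each path seeing a delayed input."""
    syndrome = 0
    for k, delay in enumerate(SYN_DELAY_NS):
        word = membus(code, flip_pos, t_flip, t - delay)
        bit = 0
        for pos in range(1, 8):
            if (pos >> k) & 1:          # syndrome bit k covers these positions
                bit ^= word[pos]
        syndrome |= bit << k
    word = membus(code, flip_pos, t_flip, t - DATA_DELAY_NS)
    out = 0
    for i, pos in enumerate(DATA_POS):
        # Each data bit is inverted when the syndrome currently points at it.
        out |= (word[pos] ^ int(syndrome == pos)) << i
    return out


def trace(data=0xA, flip_pos=7, t_flip=50, t_start=45, t_end=65):
    """List of (time in ns, dbus value) around the injected single-bit error."""
    code = encode(data)
    return [(t, dbus(code, flip_pos, t_flip, t)) for t in range(t_start, t_end)]


if __name__ == "__main__":
    for t, value in trace():
        print(f"t = {t} ns  dbus = {value:X}")
```

Under these assumed delays, the output starts at A, shows 2 while the erroneous bit passes through uncorrected, shows 3 while the partially formed syndrome points at a different data bit, and then settles back to A.  Two output bits glitch although only one input bit changed.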

Experiment

A VHDL description [1] was obtained of a 16-bit EDAC circuit that implemented a double error correcting code, more than sufficient to correct any single-bit error.  Logic simulations were run on a test case with data equal to AAAA₁₆.  The check bits generated by the EDAC were 3030₁₆.  The EDAC was then configured into its correct mode with the data and check bits used above; the circuit operated correctly.  Next, the data pattern from the simulated memory was changed to AAAB₁₆, injecting a single-bit error, and again the circuit was shown to operate correctly.

The VHDL description was synthesized using Synplify 8.1 with an RTAX250S as the target, and the device was placed and routed using Designer 7.0.  The simulation experiment was then rerun using back-annotated delays set to "typical."  In the figure below, membus and parbus are inputs to the EDAC, which is in read mode, and dbus is the "corrected" output.

At t = 50 ns, a single-bit error is injected on membus: the simulated memory output switches from AAAA₁₆ to AAAB₁₆, representing the leading edge of the first pulse of an oscillation.  A transient error (which starts at t ~ 58 ns for this particular case) is observed on the corrected output (dbus).


Single bit error is injected at t = 50 ns, generating transients.
Only the leading edge of a pulse from the simulated memory is shown.

 

This second simulation shows bit 0 of membus oscillating, which produces multiple transients on dbus in response.  Bit 0 of membus makes 4 transitions on 10 ns boundaries.


Simulation with bit 0 oscillating.  The EDAC output, dbus, is not stable.

Strategies

A simple strategy that can work in some situations is to add a register in the EDAC circuit, converting a dynamic fault into a static one.  The register provides stable values to the error detection and correction logic and eliminates glitches.  Each system must be carefully analyzed; the factors to be considered include system performance, the probability of an oscillating memory output or broken solder joint, the probability of an SEU in the additional register (many of which will be corrected by the EDAC), etc.
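The idea of turning a dynamic fault into a static one can be sketched in Python as follows.  This is an idealized model under stated assumptions, not a design recommendation: it uses a small Hamming (7,4) single error correcting code rather than the 16-bit EDAC of the experiment, the clock period and oscillation pattern are hypothetical, and it does not model metastability, setup and hold violations, or an SEU in the added register.

```python
DATA_POS = (3, 5, 6, 7)      # codeword positions that carry data bits, LSB first
CLOCK_PERIOD_NS = 25         # hypothetical sampling clock period


def encode(data):
    """Hamming (7,4) codeword as a list; index 0 is unused."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code


def correct(word):
    """Combinational single error correction on a stable input word."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos             # nonzero syndrome is the error position
    if syndrome:
        word[syndrome] ^= 1
    return sum(word[pos] << i for i, pos in enumerate(DATA_POS))


def membus(code, t):
    """Memory output at time t: position 7 toggles every 10 ns from t = 50 ns."""
    word = list(code)
    if t >= 50 and ((t - 50) // 10) % 2 == 0:
        word[7] ^= 1
    return word


def registered_dbus(data, t_end=200):
    """Sample membus on each clock edge, then correct the held value.

    Returns a list of (time in ns, error captured?, corrected data).
    """
    code = encode(data)
    samples = []
    for t in range(0, t_end, CLOCK_PERIOD_NS):
        held = membus(code, t)          # the register holds this for a full cycle
        samples.append((t, held[7] != code[7], correct(held)))
    return samples
```

In this model the register captures either the good word or the word with one bit in error, depending on where the clock edge falls within the oscillation.  Either way the correction logic sees a constant input for the whole cycle, so every corrected sample equals the original data.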

Conclusion

A combinational EDAC circuit can provide error detection and correction capabilities against static errors.  Proper analysis must be conducted for dynamic errors such as signals that oscillate or have non-logic levels.

References:

  1. The 16-bit EDAC VHDL description is courtesy of Hans Tiggeler.

  2. "Single Event Upsets for Space Shuttle Flights of New General Purpose Computer Memory Devices," P. M. O'Neill and G. D. Badhwar, NASA, IEEE Trans. on Nuclear Science, Vol. 41, No. 5, October 1994, pp. 1755-1764.

  3. "Observations in Characterizing a Commercial MNOS EEPROM for Space," E. E. King, R. C. Lacoe, G. Eng, and M. S. Leung, The Aerospace Corporation, 2004 MAPLD International Conference, Washington, D.C., September 8-10, 2004.

 

Last Revised: February 03, 2010
Web Grunt: Richard Katz