NASA Office of Logic Design

A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.


White Paper on Definitions for and Approach to Anomaly Handling

Professor Nancy Leveson
Aeronautics and Astronautics Department
MIT

Rich Katz
NASA Office of Logic Design

December 21, 2004

INTRODUCTION

The current definitions of in-family and out-of-family performance used by some organizations essentially allow any anomalous behavior to be labeled as “in family” once it has been accepted or waived, whether or not it satisfies the requirements, specified performance limits, or good engineering practice. When the root cause and mechanisms are not determined and worst-case performance is not bounded, the effects of repeated events are left unbounded. The Columbia accident and the treatment of foam shedding over the life of the Shuttle demonstrate the problems with these definitions and practices.

RECOMMENDATION:  It is recommended that the definition of “in-family” be:

Behavior is in-family when performance meets requirements, is within specification limits, and is within expected values (range) from previous samples, i.e., the behavior does not differ significantly from that previously observed for the same or similar equipment (even if the behavior meets specifications).  Additionally, the mechanism underlying two or more events must be the same for the set of events to be considered in-family.
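The definition above is a conjunction: every condition must hold at once. A minimal Python sketch follows, assuming that "expected values (range)" is taken as the mean plus or minus k sample standard deviations of previous samples (k = 3 is an arbitrary, hypothetical choice; the definition does not prescribe a statistic) and that each event in a set carries a mechanism label, with None meaning "not yet identified." The names `Limits`, `expected_range`, and `is_in_family` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional


@dataclass
class Limits:
    """Inclusive lower and upper bounds for a measured parameter."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def expected_range(previous: list[float], k: float = 3.0) -> Limits:
    """Hypothetical 'expected values (range)': mean +/- k sample standard deviations."""
    m, s = mean(previous), stdev(previous)
    return Limits(m - k * s, m + k * s)


def is_in_family(value: float, requirement: Limits, spec: Limits,
                 previous: list[float], mechanisms: list[Optional[str]]) -> bool:
    """True only if every condition of the recommended definition holds."""
    return (
        requirement.contains(value)                     # meets requirements
        and spec.contains(value)                        # within specification limits
        and expected_range(previous).contains(value)    # within range of previous samples
        # the mechanism must be identified and identical for every event in the set
        and None not in mechanisms
        and len(set(mechanisms)) == 1
    )
```

Note that a value can pass both the requirement and specification checks and still fail the third check, which is exactly the case the definition is written to catch. An unidentified mechanism (None) also keeps the behavior out of the in-family set until analysis names it.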

RECOMMENDATION:  It is recommended that the definition of “out-of-family” be:

Operation or performance is out-of-family if it does not conform to performance requirements, does not meet specification limits, or is outside expected values from previous samples and experience. Performance may be within both specification and tolerable limits yet still be out-of-family if the parameter values differ from previous samples or if the trend line indicates that performance is heading away from previous samples, indicating the possibility of unidentified underlying failure modes. If, after analysis and other engineering activities (including identification of the mechanism and root cause and a worst-case analysis), a determination is made that the requirements should be changed to include the behavior, then by definition the behavior becomes in-family from that time onward; otherwise it remains out-of-family.
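The trend-line clause means a part can be flagged before any single reading leaves the expected range. A minimal Python sketch, under these assumptions: the requirement and specification limits are collapsed into one band (`spec_low`, `spec_high`) for brevity; the expected range is the mean plus or minus k sample standard deviations of previous samples (k = 3 is a hypothetical choice); and "heading away" is approximated by fitting an ordinary least-squares line to the readings and extrapolating it `horizon` time units ahead. The function names and parameters are illustrative, not part of any NASA procedure.

```python
from statistics import mean, stdev


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def is_out_of_family(times, readings, spec_low, spec_high, previous,
                     k=3.0, horizon=0.0):
    """Flag out-of-family behavior for a series of readings of one parameter.

    Out-of-family if the latest reading is outside the limits, outside the
    expected range of previous samples, or if the fitted trend, projected
    `horizon` time units past the last reading, leaves that expected range.
    """
    m, s = mean(previous), stdev(previous)
    lo, hi = m - k * s, m + k * s
    latest = readings[-1]
    if not (spec_low <= latest <= spec_high):
        return True  # fails requirement/specification limits
    if not (lo <= latest <= hi):
        return True  # in specification, but outside previous samples
    slope = least_squares_slope(times, readings)
    intercept = mean(readings) - slope * mean(times)
    projected = intercept + slope * (times[-1] + horizon)
    return not (lo <= projected <= hi)  # trend heading away from previous samples
```

With `horizon=0.0` the trend check reduces to the fitted value at the last reading; a positive horizon lets a steady drift be flagged while every individual reading is still in family. How far ahead to project is an engineering judgment the sketch does not attempt to make.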

Definitions themselves, however, are not enough. Appropriate processes and procedures for handling out-of-family behavior must also be followed.

RECOMMENDATION: When out-of-family behavior is first observed, engineering analyses and tests must identify the mechanism and root cause. Following that, bounds for worst-case behavior must be determined. Finally, system performance and safety analyses must be conducted to ensure mission success, with acceptable system performance, as well as system safety.
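The recommendation is an ordered gate: no step may be skipped, and no disposition (requirement change or waiver) is considered until all three are done. A minimal Python sketch of that ordering, assuming a simple in-memory tracker; the `Step` and `Anomaly` names are hypothetical and stand in for whatever problem-reporting system an organization actually uses.

```python
from __future__ import annotations

from enum import IntEnum


class Step(IntEnum):
    """The three recommended steps, numbered in the order they must be done."""
    ROOT_CAUSE_AND_MECHANISM = 1
    WORST_CASE_BOUNDS = 2
    SYSTEM_PERFORMANCE_AND_SAFETY = 3


class Anomaly:
    """Tracks one out-of-family observation through the steps, strictly in order."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.completed: list[Step] = []

    def complete(self, step: Step) -> None:
        """Record a finished step; refuse any step taken out of order."""
        if len(self.completed) == len(Step):
            raise ValueError("all steps are already complete")
        expected = Step(len(self.completed) + 1)
        if step != expected:
            raise ValueError(f"{step.name} attempted before {expected.name}")
        self.completed.append(step)

    def may_disposition(self) -> bool:
        """A requirement change or waiver is considered only after every step."""
        return len(self.completed) == len(Step)
```

For example, calling `complete(Step.WORST_CASE_BOUNDS)` on a fresh `Anomaly` raises `ValueError`, because worst-case bounds cannot be set before the mechanism and root cause are known.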

Original requirements and specifications may need to be modified, or waivers written, after such an analysis if acceptable performance and safety can be proven. Acceptable rationales may include: the assumptions underlying the original analysis were incorrect; the original requirements and specifications were tighter than necessary; production variances resulting from changes in materials or processes; etc.


Discussion

This definition of "in family" is tighter than the one often used. For example, [1] uses the following criteria for generating PFRs (Problem/Failure Reports):

PFRs are generated for any departure from a design, performance, testing, or handling requirement that affects the function of flight equipment, ground support equipment that interfaces with flight equipment or that could compromise mission objectives.

However, there have been cases where performance did not depart from the governing requirement but did differ from expected or "in-family" values, indicating a problem. For example, Figure 1 shows leakage currents for a group of RTSX32SU Field Programmable Gate Arrays currently undergoing an accelerated life test.


Figure 1.  "Out of family" but in specification ICCA value.

Examining Figure 1, we see an ICCA value for S/N 50718 of less than 2.5 mA at +125 °C, far below the specification value of 25 mA total. The other contributor to leakage current, ICCI, was less than 4 mA.

Thus, S/N 50718 is well within specification. However, when its performance is judged against the other devices, it is "outside expected values from previous samples." In this case, all samples came from a single wafer, resulting in a tight distribution, as seen in the statistics listed in Figure 1. Failure analysis of S/N 50718, which is currently underway, has shown a device fault.
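The distinction drawn in this example can be expressed numerically. A minimal Python sketch, using made-up ICCA readings (the statistics in Figure 1 are not reproduced here) together with the 25 mA specification value quoted above. The 3-sigma threshold is a hypothetical choice, and each device is compared against the *other* devices (leave-one-out) so that an outlier does not widen its own acceptance band. With a tight single-wafer population, a reading far below the specification limit can still stand well outside the family.

```python
from __future__ import annotations

from statistics import mean, stdev

SPEC_LIMIT_MA = 25.0  # total ICCA specification value quoted in the text


def screen(readings_ma: dict[str, float], k: float = 3.0) -> dict[str, str]:
    """Classify each device against the spec limit and against its peers.

    Requires at least three devices so that each leave-one-out peer group
    has two or more readings for a sample standard deviation.
    """
    result = {}
    for sn, value in readings_ma.items():
        peers = [v for other, v in readings_ma.items() if other != sn]
        m, s = mean(peers), stdev(peers)
        if value > SPEC_LIMIT_MA:
            result[sn] = "out of specification"
        elif abs(value - m) > k * s:
            result[sn] = "in specification, out of family"
        else:
            result[sn] = "in family"
    return result


# Hypothetical readings in mA; serial numbers are placeholders, not real parts.
example = {"DUT-1": 1.00, "DUT-2": 1.02, "DUT-3": 0.98,
           "DUT-4": 1.01, "DUT-5": 0.99, "DUT-6": 2.4}
print(screen(example))
```

Here DUT-6 is roughly a tenth of the specification limit yet is classified "in specification, out of family," mirroring the situation of S/N 50718 without reusing its actual data.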

Examples such as this illustrate the importance of determining "in family" and "out of family" performance levels rather than relying solely on absolute performance limits.

Reference:

[1] "Pre-Flight Problem / Failure Reporting Procedures," one of the NASA Reliability Preferred Practices for Design & Test.


Last Revised: February 03, 2010
Digital Engineering Institute
Web Grunt: Richard Katz