A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.

2004 MAPLD International Conference

Ronald Reagan Building and International Trade Center
Washington, D.C.

September 8-10, 2004

Using Root-Cause Analysis to Understand Failures

MODULE LEADER: Faith Chandler, NASA Headquarters


Like people, software has a virtually unlimited number of failure modes, some of which have the potential to cause an accident. Fully understanding a failure that has occurred, and its causes, is essential to preventing the recurrence of similar failures and accidents. When a failure or accident occurs, analysts often identify and describe "what" happened rather than "why" it happened. Invariably, they trace the failure through the system down to the specific subcomponent, part, or code that failed and identify the direct cause of that failure. The resulting solutions focus on eliminating or mitigating the effects of the direct cause. If the underlying root causes (the events, conditions, or organizational factors that created the direct cause) are not addressed, they can give rise to future failures in the existing system and perhaps in related systems. To ensure that effective solutions are generated, we must evaluate the root causes systematically. At NASA, we perform Root-Cause Analysis (RCA), a structured evaluation method that identifies the root causes of an undesired outcome and the actions adequate to prevent recurrence. We have developed job aids to ensure that a comprehensive, systematic evaluation is completed for all elements of the system, including hardware, software, human, and facilities. Our techniques help the analyst consider the full range of possible root causes, such as those related to the design, planning, requirements, specifications, fabrication, testing, inspection, shipping, storage, use, maintenance, and repair of the system. Our policy dictates that, if an accident occurs, investigators must perform RCA as part of the mishap investigation process: they must identify the proximate (direct) cause, the root causes, and the contributing factors, and generate corrective actions that eliminate the systemic problems or mitigate their effects.

Presentation: root_cause

Last Revised: February 03, 2010
