A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.

Lessons Learned

Much of the most valuable information for engineers comes from experience.  The basics are easily found in data sheets, application notes, and textbooks.  The hard things are learned through failures.  While unfortunate, failures will happen.  What is critical is that they do not happen again.  Indeed, that is part of the motivation for the Office of Logic Design (OLD) and forms the basis for much of the work that is published and disseminated.

Below is the link to the NASA Lessons Learned Information System.  It is quite general.  This web site focuses on the design, analysis, verification, and test of digital systems, a relatively narrow focus.

You can help by submitting the lessons that you have learned on your job, helping to prevent an accident later.  When knowledge isn't shared across our organizations, accidents do happen.  In 1999, NASA lost a small satellite mission ($75,000,000) because the engineers were not aware of an already learned lesson.

If you can contribute a lesson, please take a few minutes to do so.   A formal write-up isn't necessary as we will work with whatever information you supply and produce an application note, white paper, or whatever is appropriate.   Of course, care will be taken to respect those who submit the lessons and they will not be identified, unless they wish to be, as the important thing is the what, how, and why, not the who.

Lessons on any digital engineering topic will be accepted.  These include hardware such as processors, memories, FPGAs, ASICs, PALs, or other microcircuits and their packaging and application.  Also, bugs, "work arounds," or other considerations for various computer aided engineering software tools are valuable.   Checklists, review criteria, and similar material to aid in analyses and reviews are also desired.  Lastly, DPA results, reliability studies, and related items are of use to this community.

To submit information, please contact:

Richard B. Katz
NASA Goddard Space Flight Center
Head, Office of Logic Design
Tel: (301) 286-9705
Fax: (301) 286-0220


Apollo 13 Guidance, Navigation, and Control Challenges

John L. Goodman, United Space Alliance
AIAA Space 2009 Conference & Exposition
September 14-17, 2009
Pasadena, California
AIAA 2009-6455

Abstract: Combustion and rupture of a liquid oxygen tank during the Apollo 13 mission provides lessons and insights for future spacecraft designers and operations personnel who may never, during their careers, have participated in saving a vehicle and crew during a spacecraft emergency. Guidance, Navigation, and Control (GNC) challenges were the reestablishment of attitude control after the oxygen tank incident, re-establishment of a free return trajectory, resolution of a ground tracking conflict between the LM and the Saturn V S-IVB stage, Inertial Measurement Unit (IMU) alignments, maneuvering to burn attitudes, attitude control during burns, and performing manual GNC tasks with most vehicle systems powered down. Debris illuminated by the Sun and gaseous venting from the Service Module (SM) complicated crew attempts to identify stars and prevented execution of nominal IMU alignment procedures. Sightings on the Sun, Moon, and Earth were used instead. Near continuous communications with Mission Control enabled the crew to quickly perform time critical procedures. Overcoming these challenges required the modification of existing contingency procedures.

Best Practices for Researching and Documenting Lessons Learned

John Goodman
United Space Alliance
Houston, Texas 77058

Introduction (excerpt)
     Identification, resolution, and avoidance of technical and programmatic issues are important for ensuring safe and successful space missions. Although the importance of applying lessons learned to reduce risk is frequently stressed, there is little material available to help technical and management personnel research and document lessons learned. Collecting, researching, identifying, and documenting lessons learned that will be useful to current and future management and engineering personnel is not always a straightforward task. This white paper presents lessons learned and best practices concerning the research and documentation of technical and organizational lessons learned. It is intended to enable organizations to initiate or improve lessons learned research and documentation efforts.
     The content of this white paper is based on four technical lessons learned projects:
  • GPS Lessons Learned From the ISS, Space Shuttle and X-38
  • Lessons Learned From Seven Space Shuttle Missions
  • Space Shuttle Rendezvous and Proximity Operations Experience Report
  • Navigation Technical History with Lessons Learned

Learning from Other People's Mistakes

Paul Cheng and Patrick Smith

Most satellite mishaps stem from engineering mistakes. To prevent the same errors from being repeated, Aerospace has compiled lessons that the space community should heed.

Lessons Learned From Seven Space Shuttle Missions

John Goodman
United Space Alliance
Houston, Texas 77058

Introduction (excerpt)

Incidents resulting in loss of life or loss of spacecraft drive thorough investigation by independent boards and publication of accident reports. Much can be learned from well-written descriptions of the technical and organizational factors that lead to an accident (Challenger, Columbia).  Subsequent analysis by third parties of investigation reports and associated evidence collected during the investigations can lead to additional insight [3-7]. Much can also be learned from documented close calls that do not result in loss of life or a spacecraft, such as the Mars Exploration Rover Spirit software anomaly, the SOHO mission interruption, and the NEAR burn anomaly.  Seven space shuttle incidents discussed in this paper fall into the latter category:

  • Rendezvous Target Failure On STS-41B

  • Rendezvous Radar Anomaly and Trajectory Dispersion On STS-32

  • Rendezvous Lambert Targeting Anomaly on STS-49

  • Rendezvous Lambert Targeting Anomaly Before STS-51

  • Zero Doppler Steering Maneuver Anomaly Before STS-59

  • Excessive Propellant Consumption During Rendezvous On STS-69

  • Global Positioning System Receiver and Associated Shuttle Flight Software Anomalies on STS-91

Three Years of Global Positioning System Experience on International Space Station

Susan Gomez
NASA Johnson Space Center
August 2006

The International Space Station Global Positioning System (GPS) receiver was activated in April 2002. Since that time, numerous software anomalies surfaced that had to be worked around. Some of the software problems required waivers, such as the time function, while others required extensive operator intervention, such as numerous power cycles. Eventually, enough anomalies surfaced that the three pieces of code included in the GPS unit have been rewritten and the GPS units were upgraded. The technical aspects of the problems are discussed, as well as the underlying causes that led to the delivery of a product that has had numerous problems. The technical aspects of the problems included physical phenomena that were not well understood, such as the effect that the ionosphere would have on the GPS measurements. The underlying causes were traced to inappropriate use of legacy software, changing requirements, inadequate software processes, unrealistic schedules, incorrect contract type, and unclear ownership responsibilities.

Maintainability of Unmanned Planetary Spacecraft: A JPL Perspective

P. Kobele, JPL
AIAA/NASA Symposium on the Maintainability of Aerospace Systems
July 26-27, 1989, Anaheim, CA


The requirements for mission success in unattended environments which do not allow direct repair of spacecraft faults have posed significant challenges in the areas of spacecraft design and mission operations. These challenges have resulted in innovative design requirements and implementation approaches intended to maximize the likelihood of being able to reconfigure the spacecraft to accommodate any of a myriad of spacecraft faults. Autonomous fault detection and correction algorithms and the mission operations elements of recent JPL interplanetary projects have been able to utilize these design features in their operational strategies to recover the spacecraft from what might have been mission terminating occurrences and to allow continuation of essentially undegraded missions.

Destructive Physical Analyses (DPAs) on Field Programmable Gate Arrays (FPGAs) and Non-Volatile Memory Devices, Failure Reports, and Lessons Learned

NASA Advisory NA-GSFC-2006-01
January 12, 2006

  In both FPGA and EEPROM device applications, the realization of past parts issues was delayed, since the failure rate was low. Failures in non-flight parts are not always treated with the same rigor as failures in flight qualified devices.  Additionally, proprietary and stove-piped information barriers, along with a cultural resistance to discussing failures, prevent the user community from pooling their data collectively, observing trends, and “connecting the dots.”  Together, this has led to delays in manufacturers improving their parts, processes, and software.
  NASA GSFC kindly requests other NASA and non-NASA programs and projects to share with the Advisory Technical Point of Contact (see block 13) all DPA and Failure Reports on FPGAs and non-volatile memory devices, from both flight and engineering model usage along with lessons learned that can benefit the community.  Note that prior to dissemination on the NASA Office of Logic Design web site, appropriate care (i.e. deleting items such as contractor names) will be taken.

From Data Collection to Lessons Learned Space Failure Information Exploitation at The Aerospace Corporation

Jonathan F. Binkley, Paul G. Cheng, Patrick L. Smith, and William F. Tosney
The Aerospace Corporation

First International Forum on Integrated System Health Engineering and Management in Aerospace
November 7-10, 2005
Napa, California, USA

The Aerospace Corporation extracts lessons learned from launch vehicle and satellite anomalies to help the space community avoid repetition of mishaps. Incorporated in reports to industry, program reviews, and journal publications, the lessons lend themselves to influence new acquisition guidelines and military specifications. Government and the commercial space communities, which share a common interest in quality improvement, should work together to establish more comprehensive and effective approaches to developing and disseminating lessons learned.

Knowledge Capture and Management for Space Flight Systems

John L. Goodman
United Space Alliance
October 2005


The incorporation of knowledge capture and knowledge management strategies early in the development phase of an exploration program is necessary for safe and successful missions of human and robotic exploration vehicles over the life of a program. Following the transition from the development to the flight phase, loss of underlying theory and rationale governing design and requirements occur through a number of mechanisms. This degrades the quality of engineering work resulting in increased life cycle costs and risk to mission success and safety of flight. Due to budget constraints, concerned personnel in legacy programs often have to improvise methods for knowledge capture and management using existing, but often sub-optimal, information technology and archival resources. Application of advanced information technology to perform knowledge capture and management would be most effective if program wide requirements are defined at the beginning of a program.

GPS Lessons Learned from the International Space Station, Space Shuttle and X-38

John L. Goodman
United Space Alliance
November 2005


Preface (excerpt)
This document is a collection of writings concerning the application of Global Positioning System (GPS) technology to the International Space Station (ISS), Space Shuttle, and X-38 vehicles. An overview of how GPS technology was applied is given for each vehicle, including rationale behind the integration architecture, and rationale governing the use (or non-use) of GPS data during flight. For the convenience of the reader, who may not be interested in specific details of the ISS, Shuttle and X-38 applications, the lessons learned chapter is at the beginning of the document. Most of this material can be understood without reading the sections specific to the ISS, Shuttle and X-38.

Apollo Spacecraft

George M. Low
NASA Manned Spacecraft Center
AIAA 6th Annual Meeting and Technical Display
Anaheim, California, October 20-24, 1969

The flawless performance of the five manned Apollo flights is attributed to reliable hardware; thoroughly planned and executed flight operations; and skilled, superbly trained crews.  Major factors contributing to spacecraft reliability are simplicity and redundancy in design; major emphasis on tests; a disciplined system of change control; and closeout of all discrepancies.  In the Apollo design, the elimination of complex interfaces between major hardware elements was also an important consideration.  The use of man, in flying and operating the spacecraft, evolved during the course of the program, with a tendency to place more reliance on automatic systems; however, the capability for monitoring and manual takeover was always maintained.  The spacecraft test effort was increased during the 18 months preceding the first manned flight with emphasis on environmental acceptance testing.  This test method screened out a large number of faulty components prior to installation.

Knowledge Capture and Management - Key To Ensuring Flight Safety and Mission Success

John L. Goodman
United Space Alliance
AIAA Space 2005 Conference
Long Beach, CA, August 30 - September 1, 2005.

Copyright 2005 by United Space Alliance, LLC. These materials are sponsored by the National Aeronautics and Space Administration under Contract NAS9-20000. The U.S. Government retains a paid-up, nonexclusive, irrevocable worldwide license in such materials to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the U.S. Government. All other rights are reserved by the copyright owner.

The incorporation of knowledge capture and knowledge management strategies early in the development phase of an exploration program is necessary for safe and successful missions of human and robotic exploration vehicles over the life of a program. Following the transition from the development to the flight phase, loss of underlying theory and rationale governing design and requirements occur through a number of mechanisms. This degrades the quality of engineering work resulting in increased life cycle costs and risk to mission success and safety of flight. Due to budget constraints, concerned personnel in legacy programs often have to improvise methods for knowledge capture and management using existing, but often sub-optimal, information technology and archival resources.  Application of advanced information technology to perform knowledge capture and management would be most effective if program wide requirements are defined at the beginning of a program.

Product Assurance Program Planning - Some Lessons Learned from Apollo

Gerald Sandler, Grumman Aerospace Corporation
AIAA Paper No. 72-247
AIAA Man's Role in Space Conference
Cocoa Beach, Florida, March 27-28, 1972


Over the past decade we have developed the technical and programmatic approaches needed to provide the levels of reliability required for manned space missions. The combination of design, test and control or assurance programs used on Apollo have proven very effective. In the design approach we have learned how to minimize the number of potential single-point failures that could result in mission failure. In test and product assurance areas, screens and controls were established that effectively prevented a latent defect from filtering through the system and occurring in flight. The cost of these combined efforts, however, have been a large percentage of total program costs. The challenge of this decade, I believe, is how to achieve the same or improved levels of reliability at lower program costs.

The area of primary concentration, at this time, should be failures that are "human oriented" rather than "design oriented". Our engineering techniques have gone a long way in reducing the latter problem area. On Apollo, half or more of the failures that occurred in the test programs were classified as workmanship, procedural or quality-oriented problems. We have learned how to screen them out by test; what we have to do now is to prevent them from occurring or catch them earlier. In addition, recognizing that failures will always occur in our test programs, the cost challenge is to design units and systems for maintainability, rework and proper isolation, so that we can minimize the extent of retesting for adequate confidence.


F.J. Bailey, Jr., NASA Manned Spacecraft Center
AIAA Space Flight Testing Conference
Cocoa Beach, Florida, March 18-20, 1963

   The papers presented so far in this session have described specific measures taken in preparing the launch vehicle and spacecraft for Mercury missions.  The purpose of the present paper is to review, in somewhat more general terms, some of the more significant lessons learned in the Mercury program, to see where changes or additional measures may be desirable in future programs.
   The lessons that have been learned fall broadly into two main areas, the first applying to program planning, the second to detailed design.

When Spacecraft Won't Point

Christopher D. Hall
2003 AAS/AIAA Astrodynamics Specialists Conference
Big Sky, Montana, August 2003
Paper # AAS 03-505


The Spacecraft Attitude Dynamics and Control course at Virginia Tech is primarily taken by juniors as an alternative to the aircraft stability and control course. Such a course can be taught in many different ways. On one extreme, one could invoke the powerful machinery of geometric mechanics, including the momentum map, so(3), SO(3), cotangent bundles and symplectic manifolds. At the other extreme, one might use a handbook with convenient sizing formulas for designing ADCS hardware. Somewhere in between these extreme approaches lie the approaches used in most courses. In any case, students can better appreciate the significance of the selected topics covered if they are provided with concrete examples. One particularly interesting type of example is the ADCS failure or anomaly, especially where a failure is caused by the same type of error that the students are being asked to understand and not make.

ACTS PYRO Separation Band Anomaly (Shuttle Orbiter)


Minor damage to the Shuttle was caused when the firing of the primary explosive cord to deploy the payload from the cargo bay also triggered the backup cord. End-to-end system tests had validated the erroneous design rather than the end function. Document electrical-mechanical interfaces, protect hazardous systems against any possible unintended operation, and consider use of a single cord configuration.

Collective Knowledge Gained from Gemini

Charles W. Mathews
NASA Manned Spacecraft Center
AIAA Paper No. 66-1027
AIAA Third Annual Meeting, Boston, MA, Nov. 29-Dec. 2, 1966.

The Gemini Program has comprised 12 space flights, 10 of which were manned operations.  The information gained is difficult to summarize within a brief paper, but more detailed information has and will continue to be made available to those who have an interest in it.  With minor exceptions, the objectives of the program were met, having been expanded well beyond original concepts and examined in considerably more depth than expected.  Gemini leaves a legacy of results that, hopefully, will further accelerate man's efforts to explore and utilize the frontier of space.

Summary of Gemini Rendezvous Experience

Glynn S. Lunney
NASA Manned Spacecraft Center
AIAA Paper No. 67-272
AIAA Flight Test, Simulation and Support Conference
Cocoa Beach, Florida, Feb. 6-8, 1967

A significant portion of the Gemini program was devoted to the rendezvous problem. One of the major objectives was to establish a base of operational experience and confidence in the required techniques. In this paper, the planning and flight test cycle is reviewed to provide an outline of the Gemini results. Many various considerations were studied and several of the more important factors are discussed as to their influence on the different choices and subsequent operations. The flight test results are summarized according to technique and performance such as propellant costs, satisfaction of conditions, et cetera. Overall, the conclusion is that the base of experience has been established, the rendezvous sequence is practical, the systems and the management of these systems have been satisfactory in accuracy and performance. Further study and a continued, detailed preparation will be the key to the future uses of rendezvous.

MSFC Skylab Lessons Learned

NASA TM X-64860
July 1974

Key lessons learned during the Skylab Program that could have impact on on-going and future programs are presented.  They present early and sometimes subjective opinions; however, they give insights into key areas of concern. These experiences from a complex space program management and space flight serve as an early assessment to provide the most advantage to programs underway. References to other more detailed reports are provided for the individual's specific area of interest.

Lessons learned on the Skylab program (JSC) - 1974



The lessons learned in the Skylab Program are described in five basic documents prepared by and representing the experience of NASA Headquarters, the Lyndon B. Johnson Space Center, the John F. Kennedy Space Center, and the Skylab and Saturn Program Offices at the George C. Marshall Space Flight Center. The documents are intended primarily for use by persons who are familiar with the disciplines covered and who are involved in other programs. Thus, the individual lessons are brief rather than detailed.

Authors of the lessons have been encouraged to be candid. The reader may detect apparent differences in approach in some areas, illustrating that equally effective management action in a particular area frequently can be accomplished by several approaches.

The recommendations and actions described are not necessarily the only or the best approaches, but they reflect Skylab experience that must be tailored to other situations and should be accepted by the reader as one input to the management decision making process. As such, these recommendations, which are based on approaches that were found to be effective in the Skylab Program, should be used to help identify potential problems of future space programs. Many of the lessons are subjective and represent individual opinions and should not be interpreted as official statements of NASA positions or policies.

In addition to the Skylab Lessons Learned documents, Skylab Mission Evaluation Reports are being issued by the previously mentioned NASA agencies to provide detailed evaluation results. The results of the scientific experiments will be disseminated by the Principal Investigators.

Gemini: Mercury Experience Applied

Jerome B. Hammack and Walter J. Kapryan
NASA - Manned Spacecraft Center
Houston, Texas


     It is the intent of this paper to show how the Gemini program has attempted to draw upon and profit from Mercury experience.
     The Gemini Project has evolved as a NASA space program with its prime mission of providing a flexible space system that will enable us to gain proficiency in manned space flight and to develop new techniques for advanced flights, including rendezvous.  To achieve these objectives, we must have a space vehicle with substantially greater capability than the Mercury spacecraft.  This increased capability will include provisions for two men, instead of one, as in the Mercury spacecraft and for space missions of up to two weeks' duration.  It is the intent of the Gemini Project to build upon the experience gained from Mercury so that most of the energies of the new program can be devoted to the solution of the problems associated with achieving its primary mission objectives and not have to fight its way through a swelter of old problems.

Lessons Learned but Forgotten from the Space Shuttle Challenger Accident

Allan J. McDonald, ATK Thiokol Propulsion (Retired)
Space 2004 Conference and Exhibit
September 28-30, 2004, San Diego, California
AIAA 2004-5830


At the time of the Challenger accident, I was the Director of the Space Shuttle Solid Rocket Motor Project for Morton Thiokol Inc. The cause of the failure and the controversy surrounding the decision to launch the Challenger in such cold weather is discussed in detail in the Presidential Commission's Report on the Challenger Accident. The Challenger was launched at 16:38:00:010 GMT on January 28th, 1986 from the Kennedy Space Center (KSC). I was in the Launch Control Center (LCC) at the time of the launch. The Mission Management Team’s (MMT) decision to launch the Challenger was flawed because of the lack of communication both horizontally and vertically within the NASA organizational structure. The Columbia accident suffered from a similar breakdown in communications along with failure to consider the seriousness of engineers' concerns much like the Challenger. This paper will discuss the details leading to the failure of the Challenger and the lessons learned from the accident. The paper will also show how the mistakes from the Challenger accident in 1986, the 25th flight of the Space Shuttle, were repeated in the loss of the Columbia in 2003, some 17 years and 88 flights later.

Commercial Off The Shelf (COTS) Digital Signal Processor Experienced Destructive Events as a Result of Ionizing Radiation Testing


The Fluids and Combustion Facility (FCF) project at the NASA Glenn Research Center subjected the Digital Signal Processor (DSP) based Data Acquisition board to ionizing radiation testing to simulate the International Space Station US Lab radiation environment. Components on the board were irradiated by a 200 MeV proton beam and were exposed to a ten year equivalent dose (600 Rads with a 1.5 Safety margin) of ionizing radiation. All exposures resulted in destructive events in the DSP chips on board.

The Digital Signal Processor (DSP) based Data Acquisition board is a commercial off-the-shelf product that was not designed for space applications. There are four identical DSP chips on the board. The DSP chips are commercial microcircuits, also not intended for space applications. The DSP chips are utilized for image acquisition from FCF Serial Data Link (SDL) supported cameras.

No other devices on the Data Acquisition boards were observed to fail; however, the boards were not tested beyond about 1-2% of the total intended proton fluence when the specific SHARC DSP chips were exposed directly.

MER Spirit Flash Memory Anomaly (2004)

NASA Public Lessons Learned System (PLLS) Database

Shortly after the commencement of science activities on Mars, an MER rover lost the ability to execute any task that requested memory from the flight computer. The cause was incorrect configuration parameters in two operating system software modules that control the storage of files in system memory and flash memory. Seven recommendations cover enforcing design guidelines for COTS software, verifying assumptions about software behavior, maintaining a list of lower priority action items, testing flight software internal functions, creating a comprehensive suite of tests and automated analysis tools, providing downlinked data on system resources, and avoiding the problematic file system and complex directory structure.

Apollo Experience Reports


Managing the Moon Program: Lessons Learned From Project Apollo

Monographs in Aerospace History, No. 14, 1999

Moderator John M. Logsdon.  Participants: Howard W. Tindall, George E. Mueller, Owen W. Morris, Maxime A. Faget, Robert A. Gilruth, and Christopher C. Kraft.

Lessons Learned From Flights of “Off the Shelf” Aviation Navigation Units on the Space Shuttle

John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC

The Space Shuttle program began flying atmospheric flight navigation units in 1993, in support of Shuttle avionics upgrades. In the early 1990s, it was anticipated that proven in-production navigation units would greatly reduce integration, certification and maintenance costs.  However, technical issues arising from ground and flight tests resulted in a slip in the Shuttle GPS certification date.  A number of lessons were learned concerning the adaptation of atmospheric flight navigation units for use in low-Earth orbit. They are applicable to any use of a navigation unit in an application significantly different from the one for which it was originally designed. Flight experience has shown that atmospheric flight navigation units are not adequate to support anticipated space applications of GPS, such as autonomous operation, rendezvous, formation flying and replacement of ground tracking systems.

The Space Shuttle and GPS – A Safety-Critical Navigation Upgrade

John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC

In 1993, the Space Shuttle Program selected an off-the-shelf Global Positioning System (GPS) receiver to eventually replace the three Tactical Air Navigation units on each space shuttle orbiter. A proven, large production base GPS receiver was believed to be the key to reducing integration, certification, and maintenance costs. More GPS software changes, shuttle flight software changes, and flight and ground testing were required than anticipated. This resulted in a 3-year slip in the shuttle GPS certification date. A close relationship with the GPS vendor, open communication among team members, Independent Verification and Validation of source code, and GPS receiver design insight were keys to successful certification of GPS for operational use by the space shuttle.

A Software Perspective on GNSS Receiver Integration and Operation

John L. Goodman
NASA Johnson Space Center, United Space Alliance, LLC

The GNSS industry is focusing on potential threats to satellite navigation integrity, such as intentional and unintentional interference, signal-in-space (satellite) and ground support infrastructure anomalies, shared spectrum issues, and multipath. The experience of the International Space Station (ISS) program, the Space Shuttle program, the Crew Return Vehicle (CRV) program and other users of GNSS indicate that navigation outages due to receiver software issues may pose as great a risk, if not more, to the user than threats currently under study.  The improvement in GNSS receiver tracking capability and navigation accuracy has been accompanied by an increase in software quantity and complexity. Current and future GNSS receivers will interface with multiple systems that will further increase software complexity. Rather than viewing GNSS receivers as “plug and play” devices, they should be regarded as complex computers that interface with other complex computers, sometimes in safety critical applications. The high cost of meeting strict software quality standards, and the proprietary nature of GNSS receiver software, makes it more difficult to ensure quality software for safety-critical applications. Lack of integrator and user insight into GNSS software complicates the integration and test process, leading to cost and schedule issues.

Beyond Normal Accidents and High Reliability Organizations: The Need for an Alternative Approach to Safety in Complex Systems

Karen Marais, Nicolas Dulac, and Nancy Leveson
Engineering Systems Symposium, March 24, 2004

Organizational factors play a role in almost all accidents and are a critical part of understanding and preventing them. Two prominent sociological schools of thought have addressed the organizational aspects of safety: Normal Accident Theory (NAT) and High Reliability Organizations (HRO). In this paper, we argue that the conclusions of HRO researchers (labeled HRO in the rest of this paper) are limited in their applicability and usefulness for complex, high-risk systems. HRO oversimplifies the problems faced by engineers and organizations building safety-critical systems, and following some of the recommendations could lead to accidents. NAT, on the other hand, does recognize the difficulties involved but is unnecessarily pessimistic about the possibility of effectively dealing with them. An alternative systems approach to safety is described, which avoids the limitations of NAT and HRO. While this paper uses the Space Shuttle, particularly the Columbia accident, as the primary example, the conclusions apply to most high-tech, complex systems.

Lessons from the Shuttle Independent Assessment

Dr. Tina L. Panontin
Chief Engineer, NASA Ames Research Center
RMC III, September 19, 2002


  • Origin of the Shuttle Independent Assessment
  • Shuttle Independent Assessment Team (SIAT)
  • Assessment Structure
  • Assessment Method
  • General Results
  • Example Findings
  • Case Study: SSME LOX Pin Ejection
  • Recommended Improvements to Current Methods
  • Recommended Future Improvements

Satellite GN&C Anomaly Trends

Brent Robertson, Eric Stoneking
NASA Goddard Space Flight Center


On-orbit anomaly records for satellites launched from 1990 through 2001 are reviewed to determine recent trends of unmanned space mission critical failures. Anomalies categorized by subsystems show that Guidance, Navigation and Control (GN&C) subsystems have a high number of anomalies that result in a mission critical failure when compared to other subsystems. A mission critical failure is defined as a premature loss of a satellite or loss of its ability to perform its primary mission during its design life. The majority of anomalies are shown to occur early in the mission, usually within one year from launch. GN&C anomalies are categorized by cause and equipment type involved. A statistical analysis of the data is presented for all anomalies compared with the GN&C anomalies for various mission types, orbits and time periods. Conclusions and recommendations are presented for improving mission success and reliability.

Conclusion (excerpt)
A study of past on-orbit anomalies was undertaken to assess how future satellite program resources might be best spent to ensure mission success. Spacecraft anomaly trends were surveyed over the last decade, with the hope of learning ways to improve the process of GN&C system development, to reduce the failure rate of future missions. One conclusion that was apparent during the data survey was that industry-wide data is not shared on a routine basis. It is difficult to learn from history if anomaly records are kept out of the public domain.

Propulsion Lessons Learned from the Loss of Mars Observer

Carl S. Guernsey
Jet Propulsion Laboratory, Pasadena, CA

AIAA 2001-3630
37th AIAA/ASME/SAE/ASEE Joint Propulsion Conference
8-11 July 2001
Salt Lake City, Utah

Contact with the Mars Observer (MO) spacecraft was lost in August 1993, three days before it was to have entered orbit around the planet Mars.  The spacecraft's transmitter had been turned off in preparation for pressurization of the propulsion system, and no signal was ever detected from the vehicle again.  Due to the lack of telemetry, it was never possible to determine with certainty what caused the loss of the spacecraft, and review boards from JPL, the Naval Research Laboratory (NRL), and the spacecraft contractor were only able to narrow the probable cause of the failure to a handful of credible failure modes.  This paper presents an overview of the potential failure modes identified by the JPL review board and presents evidence, discovered after the failure reviews were complete, that the loss was very likely due to the use of an incompatible braze material in the flow restriction orifice of the pressure regulator.  Lessons learned and design practices to avoid this and other propulsion failure modes considered candidates for the loss of MO are discussed.

The NEAR Discovery Mission: Lessons Learned

R. H. Maurer and A. G. Santo

The 10th Annual AIAA/ Utah State University Conference on Small Satellites


Under a contract from NASA, The Johns Hopkins University Applied Physics Laboratory built and launched a spacecraft that will rendezvous with and orbit the near-Earth asteroid 433 Eros. The Near Earth Asteroid Rendezvous (NEAR) spacecraft is the first under NASA’s Discovery Program, which is a series of low-cost solar system missions. While in orbit around Eros, the spacecraft will measure the bulk, surface, and internal properties of the asteroid for 10 months. This paper describes the lessons learned from design, test, and fabrication that are appropriate to other programs in quick development, or of an interplanetary nature.

Aerospace Corporation Lessons Learned
For access, contact


Paul G. Cheng, Douglas D. Chism, Wayne H. Goodman, Patrick L. Smith, and William F. Tosney The Aerospace Corporation

Colonel Michael S. Giblin
USAF Space and Missile Systems Center



The space community has long held it vital to learn from past experiences and avoid the repetition of mishaps. Procedures to collect and disseminate these "lessons learned" have been set up to serve this need. However, existing lessons-learned systems have several drawbacks: they are confined to particular technical areas, are difficult to access, or are not enforceable. The 1999 U.S. Air Force Broad Area Review (BAR) of launch vehicles recognized these deficiencies, and recommended the creation of an improved lessons-sharing mechanism. The U.S. Air Force Space and Missile Systems Center (SMC), with The Aerospace Corporation’s support, has implemented a "Space Systems Engineering Lessons Learned" system based on the BAR recommendation. This new procedure for information sharing has a broad scope as well as an active dissemination mechanism. It spans all facets of program development, including systems engineering, design, software, manufacturing, test, launch, and on-orbit operations. The lessons are electronically available to the U.S. space community, and adoption of best practices espoused in these lessons would improve SMC’s Operational Safety, Suitability, and Effectiveness (OSS&E) process.

Qualification by Test: An Example with Clock Skew



Design margins demonstrated by test on the ground cannot be used to predict reliability on orbit for this class of circuit.  For other classes of circuits, such as the change of propagation delay between two clock edges of a crystal clock oscillator, margin testing can have some value.

Design margin shown by logic simulation cannot be used to predict reliability on orbit for this class of circuit.  Most logic simulators switch models between runs -- min, typ, and max -- and are incapable of performing min-max analysis.  The simulation algorithms assume that the variable parameters track.  As seen in Figures 4 and 5, for example, which show the effects of life and antifuse resistance, this is not the case.  Real radiation environments are also a concern.  The "tracking" assumption is simply wrong and is no more than "engineering by arm waving."
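
To make the difference between corner-switching simulation and min-max analysis concrete, here is a minimal sketch of a hold-time check between two flip-flops. All delay values, names, and the hold requirement are hypothetical and chosen only for illustration; they are not taken from the application note, its figures, or any data sheet, and the typ corner is omitted for brevity.

```python
from itertools import product

# Hypothetical (min, max) delays in nanoseconds; illustrative values only,
# not taken from any data sheet or from the figures in the application note.
DELAYS = {
    "launch_clock": (2.0, 5.0),   # clock delay to the launching flip-flop
    "capture_clock": (2.5, 6.0),  # clock delay to the capturing flip-flop
    "clock_to_q": (0.5, 1.5),     # launching flip-flop clock-to-output delay
    "data_path": (0.3, 1.0),      # routing delay between the two flip-flops
}
HOLD_TIME = 0.2  # hypothetical hold requirement of the capturing flip-flop


def hold_margin(launch_clock, capture_clock, clock_to_q, data_path):
    """Hold margin in ns; a negative value means a hold violation."""
    data_arrival = launch_clock + clock_to_q + data_path
    return data_arrival - (capture_clock + HOLD_TIME)


def tracking_margins():
    """Margins when every delay sits at the same corner (all min or all max)."""
    return {
        corner: hold_margin(**{name: d[i] for name, d in DELAYS.items()})
        for i, corner in enumerate(("min", "max"))
    }


def min_max_margin():
    """Worst margin when each delay varies independently within its range.

    The margin is linear in every delay, so the worst case lies at one of
    the 16 min/max combinations, which are enumerated exhaustively here.
    """
    names = list(DELAYS)
    return min(
        hold_margin(**dict(zip(names, combo)))
        for combo in product(*(DELAYS[n] for n in names))
    )


if __name__ == "__main__":
    print("tracking corners:", tracking_margins())
    print("min-max worst case:", min_max_margin())
```

With these made-up numbers, both tracking corners report a positive hold margin, while the independent min-max sweep finds a negative one (fast data path against a slow capture clock). That is the kind of case a simulator that only switches whole-chip corners would not exercise.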

(June 8, 2002)

How Software Errors Contribute to Satellite Failures - Lessons Learned

scsra.pdf (open version)


Dr. Paul G. Cheng
The Aerospace Corporation
Risk Assessment & Management Subdivision, Systems Engineering Division
April 24, 2002

The slides (NASA version) are not yet released for open distribution.  E-mail me for a copy.

CEO draws quality lessons from design failures

By Peggy Aycinena, Integrated System Design
March 22, 2000

In a fast-moving keynote address at the first IEEE International Symposium on Quality Electronic Design (ISQED), John East, president and chief executive officer of Actel Corp., highlighted a number of widely known design failures from the last several decades and offered a primer on the complexities of design quality.  (April 16, 2002)

Lessons Learned from FPGA Developments

FPGA-001-01, Version 0.0
April 2002
Prepared by Sandi Habinc


This document is a compilation of problems encountered and lessons learned from the usage of Field Programmable Gate Array (FPGA) devices in European Space Agency (ESA) and National Aeronautics and Space Administration (NASA) satellite missions. The objective has been to list the most common problems which can be avoided by careful design and it is therefore not an exhaustive compilation of experienced problems.

This document can also be seen as a set of guidelines to FPGA design for space flight applications. It provides a development method which outlines a development flow that is commonly considered sufficient for FPGA design. The document also provides down-to-earth design methods and hints that should be considered by any FPGA designer. Emphasis has also been placed on development tool related problems, especially focusing on Single Event Upset (SEU) hardships in once-only-programmable FPGA devices. Re-programmable FPGA devices are covered only briefly, since they are outside the scope of this document and will become the focus of a separate future technical report. (April 16, 2002)

JPL Common Threads Workshop Summary Report

May 31, 1996
JPL D-13776
Arthur F. Brown and John E. Koch


A Common Threads (CT) workshop was held on 31 May 1996 in the Pasadena Technical Center.  The objective of the workshop was to attempt to convey some of the knowledge of seasoned Project Managers (PM) to the new generation of PMs.  The premise and the theme of the workshop was that "common threads" exist which appear in program after program, in the form of similar flight and test failures and failure mechanisms, recurring programmatic issues, and sometimes serious oversights.  These problems are understood and often solved in some innovative way on one program, but the knowledge is frequently not passed to another program with a similar problem, and the cycle repeats.

Retain Engineering Rights

Retain engineering rights to all designs, analyses, procedures, and test results.

NASA Lessons Learned

The Lessons Learned Information System (LLIS) is a NASA-wide lessons learned repository. The LLIS offers search capabilities to permit various searches (e.g., NASA Center, date, Project, search string, etc.). Additional categorization capability is under evaluation for future implementation by the LLIS Steering Committee.

The NASA Lessons Learned link will take you directly to the LLIS home page.

Lessons Learned Links (external link, Navy)

Chandra Lessons Learned

Marshall Space Flight Center (MSFC) was responsible for the development of the Chandra X-ray Observatory, successfully launched in July 1999. Chandra is the third of NASA's "Great Observatories." The lessons learned cover program management from Chandra's inception through its launch in 1999.

Survey of NASA's Lessons Learned Process

September 5, 2001




The National Aeronautics and Space Administration's (NASA) procedures and guidelines require that program and project managers review and apply lessons learned from the past throughout a program's or project's life cycle, and that they document and submit any significant lessons learned in a timely manner. Lessons learned systems are used by many military, commercial and government organizations to capture, store, disseminate, and share knowledge gained from past experiences. NASA's principal mechanism for collecting and sharing lessons learned from programs, projects, and missions agency wide is the Lessons Learned Information System (LLIS). The goal of LLIS is to ensure that NASA does not have to keep "relearning" the lessons of the past. NASA also shares lessons learned through revisions to its policies and guidance. Further, lessons learned from a mishap or operational event are captured in procedure and process documents. GAO surveyed all of NASA's program and project managers to obtain their perspectives on the mechanisms NASA has in place to ensure that past lessons learned from mission failures are being applied. GAO's survey highlighted fundamental weaknesses in the collection and sharing of lessons learned in NASA by program and project managers as well as in the agency's LLIS. While some lessons learning does take place, lessons are not routinely identified, collected, or shared by program and project managers. In addition, many respondents indicated that they are dissatisfied with NASA's lessons learned processes and systems. Respondents also identified challenges or barriers to the sharing of lessons learned as well as areas of improvement.

NASA: Better Mechanisms Needed for Sharing Lessons Learned

January 30, 2002

Executive Summary

In the early 1990s, the National Aeronautics and Space Administration (NASA) administrator challenged the agency to complete projects faster, better, and cheaper. The intent was to reduce costs, become more efficient, and increase scientific results by conducting more and smaller missions in less time. Although NASA maintained a high success rate under the faster, better, and cheaper strategy, a few significant mission failures also occurred—particularly the loss of the Mars Polar Lander and Climate Orbiter spacecraft. NASA investigations of these failures, as well as its review of other programs, raised concern that lessons from past experiences were not being applied to current programs and projects.

At the request of the Chairman and Ranking Minority Member, Subcommittee on Space and Aeronautics, House Committee on Science, GAO assessed whether NASA has adequate mechanisms in place to ensure that past lessons learned from mission failures are being applied.  Specifically, GAO (1) identified the policies, procedures, and systems NASA has in place for lessons learning, (2) assessed how effectively these policies, procedures, and systems facilitate lessons learning, and (3) determined whether further efforts are needed to improve lessons learning.

Lessons Learned at JPL

Lessons Learned At JPL From the HESSI mishap

Considerable newspaper and technical publication coverage was given to an overly severe vibration test performed on March 21, 2000 in Room 144 of Building 100 at the Jet Propulsion Laboratory, Pasadena, California. The over-test caused significant damage (over $1,000,000) to the High Energy Solar Spectroscopic Imager (HESSI) satellite built by the University of California at Berkeley (UCB). A Mishap Investigation Board (MIB) was convened.

Skylab_Lessons.htm These lessons learned are from Skylab Lessons Learned as Applicable to a Large Space Station, a dissertation submitted to the faculty of The School of Engineering and Architecture of the Catholic University of America for the Degree Doctor of Engineering by William C. Schneider, Washington, D.C., 1976.

Thanks to Lisa Coe of NASA/MSFC for suggesting this page.

Last Revised: February 03, 2010  --  Web Grunt: Richard Katz