Invited Mishap and Lessons Learned Talk:
"Computer Overload and The Apollo 11 Lunar Landing: An Insider's View"
Lockheed-Martin Information Technology (formerly NASA JSC/MSC)
Long, long ago in galaxies far, far away [at least it seems so] there were a bunch of very young NASA engineers with some extraordinary responsibility. Anyone over 30 was an “old man” and most of the “old men” came from the military. These young engineers worked at many NASA locations, but the “Manned Spacecraft Center” in Houston, Texas, was the home of Apollo program management, astronauts, mission control, and spacecraft design and development. It wasn’t the “rocket science” of the Marshall Space Flight Center that built the Saturn V launch vehicle. It wasn’t the vehicle test and launch operations of the Kennedy Space Center. But it was one heck of a magnet.
The Apollo spacecraft, both the Command and Service Module (CSM) and the Lunar Module (LM or “lem”), were to be guided by computers and were to be able to return home safely without support from “The Ground” (Mission Control). The Instrumentation Lab of MIT (now the Charles Stark Draper Laboratory) designed the computers and designed and built the software. Software was heavy (memory) and took power, so size was a real factor. The computer in each vehicle was single-string and ultra-reliable (more or less the converse of the approach in Shuttle in later years). The LM did have a secondary backup computer for abort guidance during landing or ascent.
All this meant that the software had to be ultra-reliable, capable of handling the unknown, recovering from any failure, working in a demand-response priority world of sensors, computations, commands, as well as crew and ground input. And so, almost literally, it was. And there are a number of folks who worked at MIT and elsewhere at the time who are the real experts in that part of the history.
The Mission Control Center or MCC in Houston was staffed with some of these young NASA engineers, as well as “old men”, in what was called the “flight control team” or “The Ground”. As a group, these people were flight controllers, support engineers, and contractors, backed by many, many support systems and laboratories around the country and around the world, all “hooked up” together during Apollo flights.
Murphy’s Law, paraphrased, states that if something can go wrong, it will. On the first landing on the moon, as well as at the first vertical launch attempt of the Space Shuttle, it did… in software. The first contributed to the second in a very subtle way.
But more importantly, the story behind the flight control team’s reaction to the program alarms that occurred during the Apollo 11 descent is almost classic in terms of showing why preparation, practice, test, and tenacity can, in the end, pay off. It is a story of taking a system that could recover “from any failure”, and then preparing for failures that just can’t happen. Computers can’t run out of time when they are programmed to run in real time, but the LGC did. And it caused a master caution and warning alarm at a critical time during the first lunar landing. And it caused endless debates in software testing architecture, and eventually drove the design of the Shuttle flight computer software that is still flying today.
Pencil-written checklist that was under the Plexiglas on Garman's console during the Apollo 11 landing. This was the official result of the quick study of all alarms we did following the aborted simulation just a few weeks prior to the flight.
"No word processors, just typewriters, ergo no easy way to make drawings, tables, etc., except by hand on grid-paper so we did!"
Notes for the 2005 MAPLD International Conference Invited Mishap and Lessons Learned Talk: "Computer Overload and The Apollo 11 Lunar Landing," part of Session G: "Digital Engineering and Computer Design: A Retrospective and Lessons Learned for Today's Engineers"