NASA Office of Logic Design

A scientific study of the problems of digital engineering for space flight systems,
with a view to their practical solution.

Shuttle ALT Free Flight 1: GPC 2 Failure

4.2.5   Avionics Data Processing Systems

The overall performance of the data processing system during the free-flight test phase was satisfactory except for the problems discussed.

Flight 1:

During preflight checkout activities, computer 3 failed to synchronize while in operational sequence 1. The computer was replaced prior to flight and the replacement computer performed normally.  Subsequently, a memory dump performed on the failed computer disclosed that a machine check error had occurred. The error was attributed to a central processing unit parity error. Extensive testing was performed on the failed computer but the problem could not be duplicated. A memory interface page was replaced since this was the most probable cause of the problem. The computer was retested and installed in the computer 3 location prior to free flight 2. The computer performed satisfactorily in that location for the remainder of the Approach and Landing Test Program.  Failure analysis of the removed memory interface page has not been completed.

A second problem occurred at the time of separation, when computer 2 stopped processing and the redundant computers voted computer 2 out of the redundant set. During subsequent testing at the vendor, the anomaly was reproduced while the computer was undergoing low-level vibration testing. The anomaly was traced to a faulty solder joint. Redesigned pages were installed; the computer was retested and replaced in the orbiter in the computer 1 location. The computer performed satisfactorily on all subsequent flights. Additional details of this anomaly are given in paragraph 7.2.1.

Flights 2-5:


4.3.1   Free Flight 1 Separation Through Touchdown

The separation event was marked by a sharp, but not loud, explosive sound and a brief, sharp, upward lurch. Neither the noise nor the jolt was particularly distracting, and neither affected the accomplishment of the planned procedures. A right roll after separation had been predicted from the load cell data but was not noticed.

Immediately after the separation event, a master alarm occurred and a computer caution and warning light, a computer annunciation matrix column on general purpose computer 2, and a big "X" on cathode ray tube 2 were noticed (ref. par. 7.2.1). At this time, the crew also sensed that the pitch rate had decreased almost to zero. The attitude, as indicated by the attitude indicator, was observed at 2° to 3°, and the pitch rate was 1° per second. Additional pitch-up command was made with the rotational hand controller to increase pitch rate to 2° per second and to attain the desired pitch attitude. After a 10° pitch attitude was established, a 20° right bank was established. Both chase aircraft calls came sooner than expected, with the Chase-2 "clear" coming just as 20° bank was achieved. The call to Mission Control on general purpose computer 2 "fail to sync" and pushover were accomplished together. It was obvious from the combined pitch/roll task after separation that the orbiter was handling well on three primary computers.

The general purpose computer 2 mode switch was placed to STANDBY for approximately 2 seconds and then to HALT. After receiving a "go" for terminal area energy management from Mission Control, major mode 203 was selected with the inputs made to CRT 3. The data processing system malfunction procedures were then completed, which involved turning off aerosurface servo amplifier 2, pulling the three accelerometer assembly circuit breakers, and pulling the air data transducer assembly 3 circuit breaker.


7.2.1 General Purpose Computer 2 Lost Synchronization at Separation

Computer 2 (system F8) lost synchronization at separation on free flight 1.  (Dump data showed that the first failure indication occurred within approximately 20 milliseconds after separation.) Fourteen of fifteen input-output errors logged by computer 2 after separation were on busses commanded by computer 2. The input-output processor/central processing unit interface was executed in an unusual manner with missing or unsolicited interrupts and receipt of an unknown level B input-output error. In addition, several unexplained or unexpected computer 2 memory locations were altered, including changes in input-output processor code, an abnormally large input-output processor program data variable and unexpected modification of input-output control blocks.

Computers 1, 3 and 4 logged eight input-output errors after separation. All but one were on busses commanded by computer 2. Computers 1, 3 and 4 saw separation A discrete only, while computer 2 saw separation B discrete. Computer 2 did open flight control limits and initiate separation guidance, navigation and control processing.
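The error counts quoted above can be tallied to show how strongly the logs point at the busses commanded by computer 2. The following is only a worked arithmetic sketch in Python: the counts come from the report text, while the names `share`, `C2_LOGGED` and so on are illustrative and are not taken from the flight software or the report.

```python
from fractions import Fraction

# Counts quoted in the anomaly report (paragraph 7.2.1).
C2_LOGGED = 15            # input-output errors logged by computer 2
C2_ON_OWN_BUSSES = 14     # of those, on busses commanded by computer 2
PEERS_LOGGED = 8          # logged by computers 1, 3 and 4 combined
PEERS_ON_C2_BUSSES = PEERS_LOGGED - 1   # "all but one"


def share(on_c2_busses: int, logged: int) -> Fraction:
    """Fraction of logged errors that fell on computer 2's busses."""
    return Fraction(on_c2_busses, logged)


combined = share(C2_ON_OWN_BUSSES + PEERS_ON_C2_BUSSES,
                 C2_LOGGED + PEERS_LOGGED)
print(f"computer 2 view: {share(C2_ON_OWN_BUSSES, C2_LOGGED)}")
print(f"peer view:       {share(PEERS_ON_C2_BUSSES, PEERS_LOGGED)}")
print(f"combined:        {combined} ({float(combined):.1%})")
```

Both the failed computer and its peers attribute nearly all of their logged errors to the same set of busses, which is consistent with a fault local to computer 2 rather than a vehicle-wide disturbance.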

Postflight testing on the vehicle (including pyrotechnic shock and electromagnetic interference tests) did not reproduce the problem. Also, the grounding paths in the vehicle were measured and verified to be proper. However, the problem was reproduced at the vendor's facility when the flight unit (input-output processor, serial number 7) was subjected to low-level vibration testing at 0.01 g²/Hz. Subsequent inspection revealed a solder crack at a prom lead on the queue page (fig. 7-3). The solder had failed to wick in a plated-through hole. The unit had been acceptance tested at 0.04 g²/Hz after 1848 hours of field run time. The failure occurred after only 150 additional hours. The failure was probably caused by fatigue due to vibration and thermal cycling. Acceptance testing is unable to screen out potential fatigue failures.
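The two vibration levels and the run-time figures above can be put side by side with a little arithmetic. The sketch below is in Python and makes one labeled assumption that the report does not state: both tests used a flat spectrum over the same frequency band. Under that assumption the rms acceleration scales with the square root of the spectral density. The function name `rms_ratio` is illustrative.

```python
import math

ACCEPTANCE_PSD = 0.04   # g^2/Hz, acceptance test level (from the report)
REPRO_PSD = 0.01        # g^2/Hz, level that reproduced the fault (from the report)


def rms_ratio(psd_a: float, psd_b: float) -> float:
    """Ratio of rms accelerations for two flat spectra over the same band.

    ASSUMPTION: identical bandwidth and flat spectral shape for both tests;
    the report gives only the spectral density levels.
    """
    return math.sqrt(psd_a / psd_b)


hours_at_acceptance = 1848      # field run time when acceptance tested
hours_after_acceptance = 150    # additional hours before the failure
total_hours = hours_at_acceptance + hours_after_acceptance

print(f"rms ratio (reproduction / acceptance): {rms_ratio(REPRO_PSD, ACCEPTANCE_PSD):.2f}")
print(f"total run time at failure: {total_hours} hours")
```

Under the stated assumption, the fault reappeared at roughly half the rms level the unit had already passed in acceptance testing, which fits the report's conclusion that acceptance testing cannot screen out a joint that fails later through fatigue.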

In-line changes had been implemented to circumvent this kind of problem, but not in time to be applied to system F8. Using the old verification procedure, the crowded page configuration made even oblique X-ray examination of some solder joints unsatisfactory for verification of the complete page. To correct this situation, the procedure was modified so that component X-ray inspection of solder wetting is accomplished before back-plate installation. Other changes consisted of doubling the copper thickness of the signal planes to increase physical strength during solder heating and providing thermal relief around the ground plane junction, reducing thermal conductivity away from the solder connection (fig. 7-3).  The thermal relief modification provides a smaller, controlled heat path between the solder connection and the rest of the ground plane, slowing the heat sink rate and allowing flow, filling, and bonding of solder to at least 33 percent of full depth.

The changed procedure is applicable to the local store page, the queue page and the two prom pages in the input-output processor, and the two prom pages in the central processing unit. All flight computers were retrofitted with the improved pages prior to free flight 2 and the computers performed satisfactorily.

This anomaly is closed.

Figure 7-3. - Change in computer page lead connections to improve solder wicking.

DG. Have there been any unusual fail to syncs?

Killingbeck. We did have a slow failure during an ALT flight, the first time we dropped the shuttle from a 747. You'd like to think that when a computer quits it just quits. In this particular case, though, a computer interspersed 12 I/O errors among some good I/O before it failed. The whole process took about four-tenths of a second--ten flight control cycles. The computer had a cracked solder joint that was opening and closing because of the high acceleration rate, and good data were intermittently interspersed with the noise.

AS. Because of that it was too slow and missed a sync point?

Killingbeck. Well, no. It was getting to the sync points but saying it had I/O errors. In fact, because it was commanding certain sensors and the commands weren't going out, all of the computers were having to deal with I/O errors of various types. We got a couple of cycle overruns, and finally, after about four-tenths of a second, the bad computer was isolated and removed from the set and everything recovered. The crew then powered it off and flew to a successful landing.

We now have a test case called "Free-Flight One," which we've used throughout the OFT (Orbital Flight Test) development. It uses massive I/O errors to determine whether the remaining computers can recover.
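The idea behind such a test case can be illustrated with a toy model. The Python sketch below is not the Shuttle's redundancy management logic, which the interview does not describe. It is a minimal illustration that assumes a hypothetical rule: a computer is removed once it has accumulated a fixed number of bad I/O cycles while at least two healthy peers remain. The function name, the strike threshold and the intermittent error pattern are all invented for illustration, and the pattern was chosen only so that removal lands on the tenth cycle, matching the "ten flight control cycles" mentioned above.

```python
from typing import Dict, List, Set, Tuple

# From the interview: about 0.4 s spanned ten flight control cycles.
CYCLE_SECONDS = 0.4 / 10


def run_redundant_set(
    members: List[int],
    io_ok: Dict[int, List[bool]],
    strikes_to_remove: int,
) -> Tuple[Set[int], Dict[int, int]]:
    """Toy model: remove a member after repeated bad I/O cycles.

    io_ok maps computer id -> per-cycle flags (True = clean I/O).
    Returns the surviving members and the cycle index at which each
    removed member left the set. HYPOTHETICAL rule, not flight logic.
    """
    active = set(members)
    strikes = {m: 0 for m in members}
    removed_at: Dict[int, int] = {}
    n_cycles = len(io_ok[members[0]])
    for cycle in range(n_cycles):
        # Count this cycle's I/O errors against each active member.
        for m in sorted(active):
            if not io_ok[m][cycle]:
                strikes[m] += 1
        # Remove a member only if at least two peers remain to outvote it.
        for m in sorted(active):
            if strikes[m] >= strikes_to_remove and len(active - {m}) >= 2:
                active.discard(m)
                removed_at[m] = cycle
    return active, removed_at


# Injected fault: computer 2 is intermittently bad (ILLUSTRATIVE pattern).
clean = [True] * 10
pattern = {
    1: clean,
    2: [False, True, False, False, True, True, False, True, True, False],
    3: clean,
    4: clean,
}
survivors, removed_at = run_redundant_set([1, 2, 3, 4], pattern, strikes_to_remove=5)
elapsed = (removed_at[2] + 1) * CYCLE_SECONDS
print(sorted(survivors), removed_at, f"{elapsed:.2f} s")
```

Accumulating strikes rather than requiring consecutive bad cycles reflects the point made in the interview: the failing computer interleaved good I/O with bad, so a rule that resets on every good cycle might never isolate it.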


Space Shuttle Orbiter Approach and Landing Test:
Final Evaluation Report

Approved by: Aaron Cohen and Deke Slayton
February 1, 1978
Report Number: NASA-TM-79404; JSC-13864

Abstract: (excerpt)
The Approach and Landing Test Program consisted of a series of steps leading to the demonstration of the capability of the Space Shuttle orbiter to safely approach and land under conditions similar to those planned for the final phases of an orbital flight. The tests were conducted with the orbiter mounted on top of a specially modified carrier aircraft. The first step provided airworthiness and performance verification of the carrier aircraft after modification. The second step consisted of three taxi tests and five flight tests with an inert unmanned orbiter. The third step consisted of three mated tests with an active manned orbiter. The fourth step consisted of five flights in which the orbiter was separated from the carrier aircraft.

The Space Shuttle Primary Computer System

Communications of the ACM
September 1984 Volume 27 Number 9
pp. 872-900

IBM's Federal Systems Division is responsible for supplying "error-free" software for NASA's Space Shuttle Program. Case Studies Editors David Gifford and Alfred Spector interview the people responsible for designing, building, and maintaining the Shuttle's Primary Avionics and Software System.

This copy is by permission of the Association for Computing Machinery.  The ACM permits copies of this article to be made without fee provided that they are not made or distributed for direct commercial advantage.

Last Revised: February 03, 2010
Web Grunt: Richard Katz