A $250 Million Epic Fail and the Wonders of Fly-By-Wire

September 20, 2017 - ProAV News, rAVe [PUBS],

Why are we looking at the aircraft industry and its now decades-long journey through automation, computerized systems and software? Because the problems, disasters and failures, and the issues they raised, mirror those within our own industry remarkably closely.

Certainly aircraft and audio are very different worlds, but the ways in which software, computerized systems and platforms have developed provide many hard-won lessons for our own technological journey. The lack of if-then-what scenarios within the computerized flight systems, and the kind of what-ifs commented on below by a VERY experienced pilot, are the same kind of problems we face in systems where the software/hardware package does not provide for likely real-world application issues or requirements. Thankfully, in our case it usually doesn’t result in the kind of catastrophe caused when a plane falls from the sky, but consider this:

What if the highly sophisticated, computer-controlled, combined paging/voice-evac/fire-alarm system installed in a large airport crashed and could not provide the needed warnings, could not be rebooted quickly, and had no manual override? It has happened on planes, and it could happen to us. The stories below should serve as both a lesson and a warning: designs should never assume the computers and software are infallible. There always needs to be a way for a human to take over and use common sense, experience and situational awareness to correct the fault. If that option does not exist, or if users fail to recognize the need to follow that path… well?

In an A300-series cockpit (shown in photo), as in many other modern fly-by-wire aircraft, everything is computerized, and when the system is working properly, under what the manufacturers refer to as “normal law,” it should be impossible for the crew to make a mistake. (In non-aircraft language, pre-programmed flight control “laws” are essentially the software code that governs the automation system’s various operational modes and sets conditions and restrictions on what a pilot can and cannot do.)

Audio professionals may instantly recognize the multiple touch-screen layout of the cockpit as remarkably similar to many current-generation digital audio consoles (like the CADAC CDC8-32 console shown below). It comes with many of the same non-intuitive “where is it located?” issues that pilots have faced and continue to struggle with, including which menu layer something is on and how to reach the right menu in the first place.

“Airbus thinks they’ve designed a computer that’s smarter than a pilot,” one training captain for a major international carrier (who was not named to protect his work status) stated unequivocally. If a pilot moves the controls so as to turn the airplane upside down, the computer will refuse. This is logical. But if a pilot decides at the last moment to abort a landing approach because he sees something on the runway, the automation may or may not allow the engines to be spooled up, because it thinks the plane is landing. This occurred in an air-show incident early on with the then brand-new A-320, because of conditions that hadn’t been programmed into the software. “The software only knows what it’s been programmed for, and it seems as if sometimes the programming then and even now does not know what a human pilot knows instinctively,” he added.

It is critical to recognize that in these systems, like any other “automation” system for any industry, product or function, the millions of lines of extremely complex software code and algorithms built into the systems are written by programmers, not pilots or audio professionals. It is therefore possible for unusual situations to occur that a pilot (or experienced audio pro) could handle, but to which the automated systems simply don’t know how to react. It is now known from the BEA (the French equivalent of the NTSB) report that the Air France 447 crash over the South Atlantic was due to a situation like this. That report stated, “In the case of Air France 447 for example, once the loss of speed and altitude data from the external sensors occurred, within seconds, the main flight computers ceased to function properly, while the flight-control computer system ‘no longer considered as valid’ the data being delivered to it. After that, its processors started to crash.”

When a network link fails in a large audio system and no training has been done, operators don’t know what to do to restore function. The same thing is a major issue in aircraft cockpits. Crews can rely too heavily on an “intelligent” aircraft’s flying ability, and should the “intelligent” systems fail, they may not have had enough (or in many cases any) simulator training to cope with a sudden “flip” to mechanical systems. The issue is exacerbated by the small and often difficult-to-interpret warning indications in the cockpit (located in the center of the A-320 photos above, near the top of the instrument cluster; see arrow).

It’s possible for the pilots to override the computer, effectively switching to manual control — what Airbus calls “direct law.” But even then, they remain dependent on electronics. In older airplanes the throttles in the cockpit are hooked to the fuel controllers on the engines, for example, by a steel throttle cable or electrical servo-actuators and wires. On any current generation fly-by-wire aircraft (civilian or military), nothing in the cockpit is real. Everything is electronic. The throttles, rudder and brake pedals and the side-stick controller (or control wheel/stick) are hooked to custom built rheostats that talk to a computer interface, which talks to an electric hydraulic servo valve, which in turn – hopefully – moves something.
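
To make the idea of control “laws” concrete, here is a minimal Python sketch of the difference between a protective mode and a pass-through mode. It is an illustration only, not Airbus code: the names ControlLaw, Envelope and apply_law and the limit values are all hypothetical, and the commands are simplified to attitude angles.

```python
from dataclasses import dataclass
from enum import Enum


class ControlLaw(Enum):
    NORMAL = "normal"  # automation enforces limits on what the pilot asks for
    DIRECT = "direct"  # pilot input is passed through unmodified


@dataclass(frozen=True)
class Envelope:
    # Hypothetical limits, chosen only for illustration.
    max_bank_deg: float = 60.0
    max_pitch_up_deg: float = 25.0
    max_pitch_down_deg: float = -10.0


def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def apply_law(law, bank_cmd_deg, pitch_cmd_deg, env=Envelope()):
    """Return the (bank, pitch) command that is actually acted on."""
    if law is ControlLaw.DIRECT:
        # No protection: whatever the pilot commands goes through.
        return bank_cmd_deg, pitch_cmd_deg
    # Normal law: silently limit the command to the programmed envelope.
    bank = clamp(bank_cmd_deg, -env.max_bank_deg, env.max_bank_deg)
    pitch = clamp(pitch_cmd_deg, env.max_pitch_down_deg, env.max_pitch_up_deg)
    return bank, pitch


if __name__ == "__main__":
    # A request to roll inverted is limited in normal law, not in direct law.
    print(apply_law(ControlLaw.NORMAL, 180.0, 0.0))
    print(apply_law(ControlLaw.DIRECT, 180.0, 0.0))
```

The same pattern applies to a digital console or paging system: a protective layer can block obviously bad commands, but it only knows the limits someone thought to program, so the operator needs a documented way to bypass it.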

To understand all this it is essential to recognize that all of this automated pilot software is entirely dependent on one of several (supposedly) redundant Electronic Centralized Aircraft Monitor, or ECAM, units. ECAMs are highly complex, multi-layer computer/software/hardware packages, tied into the display panel, that alert the crew to faults. The other main component in this fly-by-wire wonderland is the (again supposedly) multiply redundant ADIRUs, or Air Data Inertial Reference Units (the units that failed on Air France 447), which gather data from the plane’s sensors about its speed, direction, position, altitude and angle of attack. Just for clarity: losing the ADIRUs is the digital equivalent of flying blind. It is much like the loss of a powered fiber-optic stage box in live sound, when all of a sudden there is no signal at the console inputs.

In the BEA incident report on Flight 447 there was this chilling statement: “Forced to fly semi-manually (they never had enough time to get to or implement FULL manual control), the pilots’ ability to maintain flight was swiftly being compromised, as one by one their electronic instruments failed. Further warnings in quick succession said the plane was no longer computing their safe speed window or the maximum angle it was safe to move the rudder. After that, the pilots were successively deprived of auto-thrust, their main navigational and collision-warning systems, their flight-path vector – showing the direction they were flying – and their back-up speed instruments.”

The take-away here is simple: operators of any system (plane, PA, nuclear reactor, etc.) must always have the option to override or disconnect the “smart software” and use their experience, knowledge and skills to solve the problem. Failure to leave this option open will sooner or later lead to disaster.
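
One way to build that option into a design is to have the automation hand control back to the operator, and say so, the moment its inputs stop making sense, rather than carrying on or crashing. The Python sketch below shows the idea under loud assumptions: Supervisor, vote and the 10-unit disagreement threshold are hypothetical names and values invented for this example, and nothing here reflects how any real flight computer or audio system is implemented.

```python
from statistics import median


def vote(readings, max_spread):
    """Return a voted sensor value, or None if the sources cannot be trusted.

    readings is a list of values from redundant sensors; None marks a dead one.
    """
    valid = [r for r in readings if r is not None]
    if len(valid) < 2:
        return None  # not enough independent sources left
    if max(valid) - min(valid) > max_spread:
        return None  # sources disagree too much to pick a winner
    return median(valid)


class Supervisor:
    """Wraps an automation function and guarantees a manual fallback."""

    def __init__(self, auto_fn):
        self.auto_fn = auto_fn  # hypothetical automation: (value, cmd) -> cmd
        self.manual = False
        self.alerts = []

    def take_manual(self):
        """Operator override: always available, never refused."""
        self.manual = True

    def step(self, readings, operator_cmd, max_spread=10.0):
        if self.manual:
            return operator_cmd  # the operator's command passes straight through
        value = vote(readings, max_spread)
        if value is None:
            self.manual = True
            self.alerts.append("sensor data invalid: manual control")
            return operator_cmd
        try:
            return self.auto_fn(value, operator_cmd)
        except Exception as exc:
            # An automation fault must not take the whole system down.
            self.manual = True
            self.alerts.append(f"automation fault ({exc}): manual control")
            return operator_cmd
```

In this sketch the supervisor stays in manual once it has dropped there, so the operator is never surprised by the automation quietly re-engaging. Whether and how to re-arm it is a separate design decision.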

The other side of this coin is that no matter how much the smart systems can or could do, training and knowledge are essential. Not getting that training up front produced the following:

Epic Fail — Or Why the Instructions and Training Really Do Matter

Location: Airbus Industries Delivery Center, Toulouse Blagnac Airport, France, at 1610 hrs (UTC), 15 November 2007

Aircraft Type: Airbus A340-600

Event: Acceptance testing and final check, including engine ground test

What happened: On 15 November 2007, an Airbus A340-600 due to be delivered to Etihad Airways crashed during a ground engine test at Airbus' facilities in Toulouse Blagnac International Airport. The brand new US$250 million aircraft, damaged beyond repair, was written off.

Summary explanation: According to the French Bureau d’Enquêtes et d’Analyses (BEA), "a lack of detection and correction" of violations of test procedures caused the accident. The four Rolls-Royce Trent 500 engines, producing 56,000 pounds of thrust each (a total of 224,000 pounds), were being tested at high power with the wheels unchocked, and the aircraft hit the barriers at about 55 km/h. The report added that when the aircraft suddenly surged forward, the ground test technician focused on the braking system and attempted to steer away from the test-pen wall instead of reducing the engines' thrust. The impact of the 220-tonne aircraft nearly split it in two.

The result of failing to follow procedure and read the test manual, along with other similar lapses, is shown below.
 
It is crucial to note that the "trained" crew involved was chosen based on this statement from Airbus: "Adopting a common cockpit across the new Airbus series would allow operators to make significant cost savings; flight crews would be able to transition from one model to another after one week of training." They should have been able to handle the process with minimal additional training. Etihad Airways, the official airline of the United Arab Emirates (UAE), operates nearly 80 Airbus 300-series aircraft.

However, in a classic case of overconfidence, the test crew did not undertake the additional one week of training or review the test procedures manual prior to the incident, according to investigators. If they had, they would have noted this statement: "The Aircraft Maintenance Manual (AMM) and the CAM (Customer Acceptance Manual) state that engine tests must be carried out with the use of wheel chocks for the main landing gears."

The reason to include this short story is to point out that while it is crucial to have a way for humans to control automation and software-based systems when things go off the rails, it’s equally important that those same humans understand these systems, the rules they work by and the instructions that allow humans and machines to interface and function together. In the case above, the lack of adherence to the “read the manual” policy almost proved fatal for the humans and certainly was fatal for the aircraft.