System Design Lessons from the USS McCain

The Navy installed touch-screen steering systems to save money.

Ten sailors paid with their lives.

ProPublica
USS McCain in 2019 (U.S. Navy Photo)

Ten sailors died after the crew of the destroyer USS John S. McCain lost control of their vessel, causing a collision with the merchant tanker Alnic MC. There was nothing technically wrong with the vessel or its controls. Though much of the blame was put on the Sailors and Officers aboard, the real fault rests with the design of the Integrated Bridge & Navigation System (IBNS).

Accident Summary

If you haven’t read the ProPublica article linked above, it’s well worth your time. While I’ll summarize some of the key points from the story, the Navy memo, and the official NTSB report, the article provides much more detail[1].

Diagram of the McCain Ship Control Console (SCC) from the IBNS technical manual

Background Information

Fact 1: Backup Mode. The steering system had four computer-assisted modes plus a backup manual mode. The CO had ordered the steering to operate in backup manual mode, which had limited safeguards. One consequence is that any helm station could unilaterally take control of steering and thrust, whereas the other modes supported an orderly handover of control between helms. Operating in backup mode was common among similarly equipped Navy ships, on the belief that the automated modes were too complex and that backup manual mode was a “more direct form of communication to the steering”. In fact, it was meant for emergencies in which normal procedures were not practical.
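To make the difference between the two handover styles concrete, here is a minimal Python sketch. It is not the IBNS logic: the class, method names, and station names are hypothetical, and the only behavior taken from the reports is that backup manual mode let a station take control unilaterally while the other modes required the transfer to be confirmed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SteeringControl:
    """Tracks which station holds steering (hypothetical model, not the IBNS)."""
    holder: str
    backup_manual: bool = False
    pending: Optional[str] = None  # station that has asked for control

    def request(self, station: str) -> str:
        """A station asks for control; returns whoever holds it afterwards."""
        if self.backup_manual:
            # Backup manual mode: the requester takes control unilaterally.
            self.holder = station
        else:
            # Assisted modes: record the request and wait for confirmation.
            self.pending = station
        return self.holder

    def confirm(self, station: str) -> str:
        """Only the current holder can release control to the requester."""
        if station == self.holder and self.pending is not None:
            self.holder, self.pending = self.pending, None
        return self.holder


assisted = SteeringControl(holder="helm")
assisted.request("lee_helm")
print(assisted.holder)  # helm (still waiting for confirmation)
assisted.confirm("helm")
print(assisted.holder)  # lee_helm

backup = SteeringControl(holder="helm", backup_manual=True)
backup.request("lee_helm")
print(backup.holder)    # lee_helm (no confirmation was needed)
```

In the confirmed path the current holder has to act before control moves, so a transfer cannot happen without someone noticing. In the unilateral path nothing forces that moment of awareness.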

Fact 2: Alarms. Although the NTSB explicitly excluded it as a contributing factor, the IBNS was known to raise frequent alarms for false or minor issues. This fostered general distrust in the system, contributed to the culture of operating in backup manual mode, and, according to confidential reports cited by ProPublica, was the reason the CO of the McCain decided to operate in backup manual mode.

Fact 3: Split Helm and Unganged Thrust. To reduce crew workload, the CO ordered that one helm station would control steering and another thrust. This is a common procedure, but the backup mode limitations described above made the transfer tedious. Propulsion control would be transferred from the helm station to the lee helm station, and the transfer had to be completed one propeller shaft at a time. Normally, the two propeller shafts are ganged; that is, they’re both controlled together. Transferring them individually meant that they’d necessarily be unganged and would need to be re-ganged afterward.

Fact 4: Missing Instructions. While the leaders onboard McCain were criticized for incomplete training, there were glaring omissions in the manuals and training materials available to the crew. Relevant to this accident, the manuals “did not contain steps for transferring control of thrust between bridge stations, there were no procedures for ganging and unganging throttles, and there were no notes or warnings about actions that automatically unganged the throttles. […T]he technical manual, did not contain instructions for ganging throttles, and there was no description of the ‘ganged’ indicator on the […] display. Additionally, written instructions did not contain a procedure for shifting steering from one bridge station to another, other than […] during the initial steering system alignment.”

Fact 5: Big Red Button. As a final backup, some helm stations featured a big red emergency button, labeled “EMERGENCY OVERRIDE TO MANUAL”, that immediately took control of the ship at the station where it was pressed. However, the crew misunderstood the button as sending control to an emergency control station in the aft of the ship. In the critical seconds before the collision, control ping-ponged as multiple Sailors pressed the big red button, thinking it would send control away when it would actually take control.
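The gap between what the button did and what the crew believed it did can be shown in a few lines of Python. This is only an illustrative sketch: the function and station names are made up, and the press order (aft, then bridge, then aft again) follows the timeline in the next section.

```python
AFT = "aft"


def press_actual(pressing_station: str) -> str:
    """Behavior described in the reports: the pressing station takes control."""
    return pressing_station


def press_believed(pressing_station: str) -> str:
    """The crew's mental model: any press sends control to the aft station."""
    return AFT


def replay(presses: list[str], press) -> list[str]:
    """Return the station holding control after each press, in order."""
    return [press(station) for station in presses]


presses = ["aft", "bridge", "aft"]
print(replay(presses, press_actual))    # ['aft', 'bridge', 'aft']
print(replay(presses, press_believed))  # ['aft', 'aft', 'aft']
```

Under the believed model every press converges on the same station. Under the actual model control follows whoever pressed last, which is the ping-pong the crew experienced.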

Order of Events Leading Up to the Collision

  1. Control was changed to backup manual mode as the ship entered the busy Singapore Strait.
  2. The inexperienced helmsman was overtasked, so split helm was ordered to reduce workload.
  3. The crew was not practiced in the procedures for splitting the helm. While splitting the helm, the lee helm station accidentally and unilaterally took steering control. Nobody realized that this had happened, causing the crew to think that steering was lost (this wouldn’t have been possible in any of the other control modes, which would have required confirmation of the transfer).
  4. The perceived loss of steering issue distracted the crew from completing the multi-step thrust control transfer. The crew reported that they completed re-ganging the port and starboard thrust controls, but the system logs show that they actually hadn’t.
  5. Because the thrust controls were unganged, attempts to slow the McCain as it turned into the path of the Alnic actually only affected the port propeller, causing the ship to veer more quickly left and directly into the path of the Alnic. The bridge crew, focused on the loss of steering issue, never realized the propellers were unganged.
  6. In a last attempt to regain control, an officer ordered that the aft station (in the rear of the ship, not on the bridge) take control. A Sailor at the aft station correctly pressed their Big Red Button. Seconds later, a Sailor on the bridge also pressed their Big Red Button, trying to send control aft but actually taking control themselves.
  7. The aft station took control again, but too late to avoid a collision less than thirty seconds later.

Design Lessons

Lesson 1: Critical controls must be physical controls

If McCain had had physical throttle controls, there would have been no confusion about unganged throttles. Everyone in line of sight would have seen the mismatched throttles at a glance, and correcting them would have been trivial. These controls would also be a source of feedback, with inputs at any station reflected by the physical controls at every other station, improving situational awareness for all bridge personnel.

At the 2019 American Society of Naval Engineers Fleet Maintenance and Modernization Symposium, PEO Ships Rear Adm. Bill Galinis stated that digital-only controls were part of the problem. “We really made the helm control system […] just overly complex, with the touch screens under glass and all this kind of stuff.”

Airplanes have solved this problem. When the autopilot is controlling thrust, the thrust levers move to reflect autopilot input. Pilots have intuitive, visual understanding of autopilot status; this automatic situational awareness allows them to very quickly take over should the need arise.

Lesson 2: Match workflows and mental models

Would Sailors ever want to transfer thrust control one shaft at a time? I can’t imagine the use case. Yet, that was the only way to transfer thrust when in backup manual mode.

Instead, the system should have been designed to match user mental models and workflows by default. Transferring thrust should transfer the entire thrust control at once, not shaft-by-shaft. Unusual options, like transferring control of only one shaft, should require additional steps to access.
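As a rough illustration of that default, here is a small Python sketch. Everything in it is hypothetical (the class, the method names, the station names, and the starting throttle value of 100 percent); it only shows one way to make the whole-thrust transfer the easy path and the single-shaft transfer the deliberate one.

```python
from dataclasses import dataclass, field

SHAFTS = ("port", "starboard")


@dataclass
class ThrustControl:
    """Hypothetical model of thrust ownership and throttle settings."""
    owner: dict = field(default_factory=lambda: dict.fromkeys(SHAFTS, "helm"))
    throttle: dict = field(default_factory=lambda: dict.fromkeys(SHAFTS, 100))
    ganged: bool = True

    def transfer(self, to_station: str) -> None:
        """Default path: move both shafts together; gang state is untouched."""
        for shaft in SHAFTS:
            self.owner[shaft] = to_station

    def transfer_single_shaft(self, shaft: str, to_station: str,
                              acknowledge_ungang: bool = False) -> None:
        """Unusual path: refuses to run without an explicit acknowledgement."""
        if not acknowledge_ungang:
            raise PermissionError("single-shaft transfer ungangs the throttles")
        self.owner[shaft] = to_station
        self.ganged = False

    def set_throttle(self, shaft: str, percent: int) -> None:
        """A ganged command moves both shafts; an unganged one moves only one."""
        targets = SHAFTS if self.ganged else (shaft,)
        for target in targets:
            self.throttle[target] = percent


thrust = ThrustControl()
thrust.transfer("lee_helm")
thrust.set_throttle("port", 50)
print(thrust.throttle)  # {'port': 50, 'starboard': 50}
```

If the throttles were unganged instead, the same `set_throttle("port", 50)` call would slow only the port shaft, which is the asymmetry that turned McCain toward the Alnic.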

This is a frequent issue with software systems. With hardware, physical linkages drive both designs and mental models. Turning the steering wheel in a car directly moves linkages to the control arms; it’s easy to understand even without knowing the details of the gears and power steering components[2]. With software interfaces, the controls and displays are only what the designer decides to provide, sometimes with only a nebulous connection to what actually happens. Good interface design deliberately identifies users’ mental models and builds around them.

Lesson 3: Quality and trust matter

The computer-assisted manual mode would have better met the needs of the crew of McCain. They would have had the manual control they desired with safeguards against unintended commands. However, they didn’t trust the system after multiple issues. It didn’t help that “computer-assisted” suggests exactly the automation they despised, rather than basic safeguards.

This is a consequence of releasing a system without adequate development and testing. It’s acceptable in some systems to have bugs that get caught later. It’s unacceptable in a warship or other military system where mistakes are deadly.

Outcome

The Navy pinned the blame for this incident squarely on the crew of the ship, finding shortcomings in training, seamanship and navigation, and leadership and culture. The CO was charged with dereliction of duty, hazarding a vessel, and negligent homicide, eventually pleading guilty to a single charge of negligence.

It’s true that there were many non-design failures by the CO and the Navy, but that shouldn’t let the designers of the system off the hook.

The event caused Navy leadership to investigate bridge and ship control design issues, finding a number of shortcomings and general dislike of touchscreen controls. Rear Adm. Galinis summed up the challenges with modern interface design philosophies: “Just because you can doesn’t mean you should.”


Footnotes:

  1. not to mention eloquence
  2. That’s not to say that we shouldn’t be intentional about hardware interface design, just that the mental models are easier to form when users can directly experience the cause and effect