Commercial airlines are incredibly safe—far safer, mile for mile, than cars—despite traveling hundreds of miles an hour at 30,000 feet. That’s because there are numerous safeguards at multiple levels. There are strict rules on how you develop a new plane, how you certify it, and how you maintain it. There are quality-control procedures for air traffic control software. In addition to all those procedures, there are also institutions (like the US National Transportation Safety Board) in place to investigate any accidents and to share what is learned from those investigations, because we can never anticipate absolutely everything in advance, and so we must learn from our mistakes as well. We need all of these safeguards, both the forethought and the afterthought. That’s good oversight. And that’s why airplanes are so safe.
When you build a house, you propose a plan; the town certifies that plan; building inspectors come regularly during the process and, at the end, they check the work. Cars are less regulated than airplanes, but multiple layers are still involved. For example, Title 49 of the United States Code, Chapter 301, enacted by Congress, sets out requirements for motor vehicle safety, and the National Highway Traffic Safety Administration both administers those laws (e.g., it licenses manufacturers to make sure they meet certain safety requirements) and investigates accidents.
It’s madness to think that more and more powerful AI should be exempt from similarly thorough oversight.
Missy Cummings, a former F-18 pilot and current professor and director of the Mason Autonomy and Robotics Center (MARC) at George Mason University, has explained how negative consequences can happen through flaws at four different steps along the journey from proposed software to real-world implementation.1 Her taxonomy highlights four fundamental risks: inadequate oversight (e.g., because regulatory policies are too weak, or because of external pressure to use AI in risky domains where it is inappropriate), inadequate design (e.g., software for merging results from different sensors might not be adequate), inadequate maintenance (e.g., models built in 2023 might not work as well in 2024 if there are different laws, new kinds of cars on the road, or other changes), and inadequate testing (e.g., if developers rely too much on tests in simulation, rather than in the real world).
We need multiple layers of oversight because problems can develop anywhere along the way.
In AI, at the crudest level, we need at least two stages of oversight: licensing of models before they are widely deployed, as Canadian Member of Parliament Michelle Rempel Garner and I proposed, and auditing after they are released.2
One obvious pre-deployment model is the system that the FDA uses to regulate drugs and medical devices (though hopefully moving with considerably more speed). The more novel something is, and the more risk it might pose, the higher the bar for approval should be.
Post-deployment auditing procedures are also critical. A recent review paper by Merlin Stein and Connor Dunlop at the Ada Lovelace Institute summarizes the essentials of how this works at the FDA:
The FDA has extensive auditing powers, with the ability to inspect drug companies’ data, processes and systems at will. It also requires companies to report incidents, failures and adverse impacts to a central registry. There are substantial fines for failing to follow appropriate regulatory guidance, and the FDA has a history of enforcing these sanctions.3
Stein and Dunlop fundamentally call for five things (simplifying their words, with minor edits):
- Continuous, risk-based evaluations and audits
- Empowering regulatory agencies to evaluate critical safety evidence directly
- Independence of regulators and external evaluators
- Structured access to models for evaluators and civil society
- A pre-approval process that shifts the burden of proof to developers.4
This seems exactly right to me.
I do not happen to think—though many well-known figures in the world do—that the safety risks of AI are on par with nuclear war.5 But certainly, as we saw in the chapter on risks, there is plenty to be worried about. Having a serious, layered process of oversight, just as we employ for airplanes and pharmaceuticals, is essential.