Gods always behave like the people who make them.
—ZORA NEALE HURSTON, TELL MY HORSE
It is not nice to throw people.
—ANNA’S ADVICE TO THE SNOW GIANT MARSHMALLOW IN JENNIFER LEE’S 2013 DISNEY FILM FROZEN
As we’ve seen, machines with common sense that actually understand what’s going on are far more likely to be reliable, and produce sensible results, than those that rely on statistics alone. But there are a few other ingredients we will need to think through first.
Trustworthy AI has to start with good engineering practices, mandated by laws and industry standards, both of which are currently largely absent. Too much of AI thus far has consisted of short-term solutions, code that gets a system to work immediately, without a critical layer of engineering guarantees that are often taken for granted in other fields. The kinds of stress tests that are standard in the development of an automobile (such as crash tests and climate challenges), for example, are rarely seen in AI. AI could learn a lot from how other engineers do business.
For example, in safety-critical situations, good engineers always design structures and devices to be stronger than the minimum that their calculations suggest. If engineers expect an elevator to never carry more than half a ton, they make sure that it can actually carry five tons. A software engineer building a website that anticipates 10 million visitors per day tries to make sure its server can handle 50 million, just in case there is a sudden burst of publicity. Failing to build in adequate margins often risks disaster; famously, the O-rings in the Challenger space shuttle worked in warm weather but failed in a cold-weather launch, and the results were catastrophic. If we estimate that a driverless car’s pedestrian detector would be good enough if it were 99.9999 percent correct, we should be adding a decimal place and aiming for 99.99999 percent correct.
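To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch; the daily detection count is a number we have invented purely for illustration, not a real fleet statistic.

```python
# Back-of-the-envelope: what one extra "nine" of reliability buys.
# The detection volume below is hypothetical, chosen only to illustrate scale.

DETECTIONS_PER_DAY = 100_000_000  # imagined fleet-wide pedestrian detections

for accuracy in (0.999999, 0.9999999):  # 99.9999% vs. 99.99999%
    expected_misses = DETECTIONS_PER_DAY * (1 - accuracy)
    print(f"accuracy {accuracy}: roughly {expected_misses:.0f} missed detections per day")
```

At the first level of accuracy, the imagined fleet misses roughly a hundred pedestrians a day; the extra decimal place cuts that to roughly ten.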
For now, the field of AI has not been able to design machine-learning systems that can do that. They can’t even devise procedures for making guarantees that given systems work within a certain tolerance, the way an auto part or airplane manufacturer would be required to do. (Imagine a car engine manufacturer saying that their engine worked 95 percent of the time, without saying anything about the temperatures in which it could be safely operated.) The assumption in AI has generally been that if it works often enough to be useful, then that’s good enough, but that casual attitude is not appropriate when the stakes are high. It’s fine if autotagging people in photos turns out to be only 90 percent reliable—if it is just about personal photos that people are posting to Instagram—but it better be much more reliable when the police start using it to find suspects in surveillance photos. Google Search may not need stress testing, but driverless cars certainly do.
Good engineers also design for failure. They realize that they can’t anticipate in detail all the different ways that things can go wrong, so they include backup systems that can be called on when the unexpected happens. Bicycles have both front brakes and rear brakes partly to provide redundancy; if one brake fails, the second can still stop the bike. The space shuttle had five identical computers on board, to run diagnostics on one another and to be backups in case of failure; ordinarily, four were running and the fifth was on standby, but as long as any one of the five was still running, the shuttle could be operated. Similarly, driverless car systems shouldn’t just use cameras; they should also use LIDAR (a device that uses lasers to measure distance), for partial redundancy. Elon Musk claimed for years that his Autopilot system wouldn’t need LIDAR; from an engineering standpoint, this seems both risky and surprising, given the limitations of current machine-vision systems. (Most major competitors do use it.)
And good engineers always incorporate fail-safes—last-ditch ways of preventing complete disaster when things go seriously wrong—in anything that is mission-critical. San Francisco’s cable cars have three levels of brakes. There are the basic wheel brakes, which grab the wheels; when those don’t work, there are the track brakes, big wooden blocks that push the tracks together to stop the car; and when those don’t work there is the emergency brake, a massive steel rod that is dropped and jams against the rails. When the emergency brake is dropped, they have to use a torch to get the train car free again; but that’s better than not stopping the car.
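The pattern behind the cable-car story can be caricatured in a few lines of code. The sketch below is entirely hypothetical (the functions and the three "brakes" are invented stand-ins, not any real control system), but it captures the idea: a safety-critical action tries the primary mechanism first, then each backup in turn, and the last resort is drastic but final.

```python
# A toy sketch of layered fail-safes, loosely modeled on the cable-car example.
# All of the names below are hypothetical stand-ins, not a real control system.

def stop_car(wheel_brake, track_brake, emergency_brake):
    """Try each braking layer in order; each returns True if it stopped the car."""
    for name, brake in (("wheel", wheel_brake),
                        ("track", track_brake),
                        ("emergency", emergency_brake)):
        if brake():
            return f"stopped by {name} brake"
    raise RuntimeError("all braking layers failed")  # the design goal is to make this unreachable

# Example: the wheel brakes fail, but the track brakes work.
print(stop_car(lambda: False, lambda: True, lambda: True))
```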
Good engineers also know that there is a time and a place for everything; experimentation with radically innovative designs can be a game changer when mapping out a new product, but safety-critical applications should typically rely on older techniques that have been more thoroughly tested. An AI system governing the power grid would not be the place to try out some hotshot graduate student’s latest algorithm for the first time.
The long-term risk in neglecting safety precautions can be serious. In many critical aspects of the cyberworld, for example, infrastructure has been badly inadequate for decades, leaving it extremely vulnerable to both accidental failures and malicious cyberattacks.*1 The internet of things, ranging from appliances to cars that are connected to the web, is notoriously insecure; in one famous incident, “white hat hackers” were able to take control of a journalist’s Jeep as it drove down the highway. Another huge vulnerability is GPS. Computer-operated devices of all kinds rely on it, not only to power automated driving directions, but also to provide location and timing for everything from telecommunications and business to aircraft and drones. Yet it is fairly easy to block or spoof, and the consequences could potentially be catastrophic. We also know that the Russian government has hacked into the United States power grid; nuclear power plants; and water, aviation, and manufacturing systems. In November 2018, America’s water supply was described as “a perfect target for cybercriminals.” If a film director wants to make an apocalyptic sci-fi movie set in the near future, these kinds of scenarios would be much more plausible than Skynet, and almost as scary. And before long cybercriminals will try to undermine AI, too.
The challenges don’t end there. Once a new technology is deployed, it has to be maintained; and good engineers design their system in advance so that it can easily be maintained. Car engines have to be serviceable; an operating system has to come with some way of installing updates.
This is no less true for AI than for any other domain. An autonomous driving system that recognizes other vehicles needs to be seamlessly updated when new models of cars are introduced, and if the original programmer departs, it should be clear enough to a new hire how to fix what was originally set up. For now, though, AI is dominated by big data and deep learning, and with that come hard-to-interpret models that are difficult to debug and challenging to maintain.
If general principles of robust engineering apply as much to AI as to other domains, there are also a number of specialized engineering techniques that can and should be drawn from software engineering.
Experienced software engineers, for example, routinely use modular design. When software engineers develop systems to solve a large problem, they divide the problem into its component parts, and they build a separate subsystem for each of those parts. They know what each subsystem is supposed to do, so that each can be written and tested separately, and they know how the subsystems are supposed to interact, so that those connections can be checked to make sure they are working. For instance, a web search engine, at the top level, has a crawler, which collects documents from the web; an indexer, which indexes the documents by their keywords; a retriever, which uses the index to find the answer to a user query; a user interface, which handles the details of communication with the user; and so on. Each of these, in turn, is built up from smaller subsystems.
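Here is a toy sketch of that decomposition; the class names and the tiny in-memory "web" are our own inventions, and a real search engine is vastly more elaborate, but the point is simply that each piece has one job and a narrow interface, so each can be written and tested on its own.

```python
# A toy illustration of modular design, echoing the search-engine example.
# Each component does one thing and can be tested in isolation.

class Crawler:
    def fetch(self, pages):                 # pages: {url: text}, a stand-in for the web
        return dict(pages)

class Indexer:
    def build(self, documents):             # inverted index: word -> set of urls
        index = {}
        for url, text in documents.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(url)
        return index

class Retriever:
    def search(self, index, query):
        return sorted(index.get(query.lower(), set()))

# The top level just wires the modules together.
web = {"a.html": "cats purr", "b.html": "dogs bark", "c.html": "cats and dogs"}
docs = Crawler().fetch(web)
index = Indexer().build(docs)
print(Retriever().search(index, "cats"))    # ['a.html', 'c.html']
```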
The sort of end-to-end machine learning that Google Translate has made popular deliberately flouts this principle, in exchange for short-term gains. But this strategy comes at a cost. Important problems—like how to represent the meaning of a sentence in a computer—are deferred to another day, but not really resolved. This in turn raises the risk that it might be difficult or impossible to integrate current systems with whatever might be needed in the future. As Léon Bottou, research lead at Facebook AI Research, has put it, “The problem of [combining] traditional software [engineering] and machine learning remains wide open.”
Good engineering also requires good metrics—ways of evaluating systems so that engineers know whether their efforts are really making progress. The best-known metric of general intelligence by far is the Turing test, which asks whether a machine could fool a panel of judges into thinking it was human. Unfortunately, although well known, the test is not particularly useful. Although the Turing test ostensibly addresses the real world, in an open-ended way, with common sense as a potentially critical component, the reality is that it can easily be gamed. As has been clear for decades, since Eliza in 1965, ordinary people are easy to fool using a variety of cheap tricks that have nothing to do with intelligence, such as avoiding questions by appearing to be paranoid, young, or from a foreign country with limited facility in the local language. (One recent prizewinning competitor, a program named Eugene Goostman, combined all three, pretending to be a bratty thirteen-year-old from Odessa.) The goal of AI shouldn’t be to fool humans; it should be to understand and act in the world in ways that are useful, powerful, and robust. The Turing test just doesn’t get at that. We need something better.
For this reason, we and many of our colleagues at places such as the Allen Institute for Artificial Intelligence have been busy in recent years proposing alternatives to the Turing test: a wide array of challenges ranging from language comprehension and the inference of physical and mental states to the understanding of YouTube videos, elementary science, and robotic abilities. Systems that learn some video games and then transfer those skills to other games might be another step. Even more impressive would be a robot scientist that could read descriptions of simple experiments in “100 Science Experiments for Kids,” carry them out, understand what they prove, and understand what would happen instead if you did them a little differently. No matter what, the key goal should be to push toward machines that can reason flexibly, generalizing what they have learned to new situations, in robust ways. Without better metrics, it would be hard for the quest for genuine intelligence to succeed.
Finally, AI scientists must actively do their best to stay far away from building systems that have the potential to spiral out of control. For example, because consequences may be hard to anticipate, research in creating robots that could design and build other robots should be done only with extreme care, and under close supervision. As we’ve often seen with invasive natural creatures, if a creature can reproduce itself and there’s nothing to stop it, then its population will grow exponentially. Opening the door to robots that can alter and improve themselves in unknown ways opens us to unknown danger.
Likewise, at least at present, we have no good way of projecting what full self-awareness for robots might lead to.*2 AI, like any technology, is subject to the risk of unintended consequences, quite possibly more so, and the wider we open Pandora’s box, the more risk we assume. We see few risks in the current regime, but fewer reasons to tempt fate by blithely assuming that anything that we might invent can be dealt with.
We are cautiously optimistic about the potential contribution to AI safety of one software engineering technique in particular, known as program verification, a set of techniques for formally verifying the correctness of programs that, at least thus far, is more suited to classical AI than to machine learning. Such techniques use formal logic to validate that a computer system works correctly or, more modestly, that it is at least free of specific kinds of bugs. Our hope is that program verification can be used to increase the chance that a given AI component will do what it is intended to do.
Every device that plugs into a computer, such as a speaker, a microphone, or an external disk drive, requires a device driver, which is a program that runs the device and allows the computer to interact with it. Such programs are often extremely complicated pieces of code, sometimes hundreds of thousands of lines long. Because device drivers necessarily have to interact closely with central parts of the computer’s operating system, bugs in the driver code used to be a major problem. (The problem was made even more acute by the fact that the device drivers were typically written by hardware manufacturers rather than by the software company that built the operating systems.)
For a long time, this created total chaos and numerous system crashes, until eventually, in 2000, Microsoft imposed a set of strict rules that device drivers have to follow in their interactions with the Windows operating system. To ensure that these rules were being followed, Microsoft also provided a tool called the Static Driver Verifier, which uses program verification techniques to reason about the driver’s code, in order to ensure that the driver complies with the rules. Once that system was put in place, system crashes were significantly reduced.
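The real Static Driver Verifier analyzes C driver code statically, using formal techniques, and we are not reproducing Microsoft's actual rules here; the toy checker below is meant only to convey the flavor of the kind of rule involved: for instance, that a lock a driver acquires must eventually be released, and must never be acquired twice.

```python
# A loose, hypothetical illustration of the kind of rule a driver verifier
# enforces. This toy version inspects a simplified trace of calls; a real
# verifier reasons about the source code itself, without running it.

def check_lock_discipline(trace):
    held = False
    for call in trace:
        if call == "acquire_lock":
            if held:
                return "violation: lock acquired twice"
            held = True
        elif call == "release_lock":
            if not held:
                return "violation: lock released without being held"
            held = False
    return "ok" if not held else "violation: lock never released"

print(check_lock_discipline(["acquire_lock", "do_io", "release_lock"]))  # ok
print(check_lock_discipline(["acquire_lock", "do_io"]))                  # violation: lock never released
```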
Similar reasoning systems have been used to check for bugs of particular kinds in other large programs and hardware devices. The computerized control program for the Airbus airliners was verified—which is to say formally and mathematically guaranteed—to be free of bugs that could cause their enormously complex software to crash. More recently, a team of aerospace engineers and computer scientists from Carnegie Mellon and Johns Hopkins combined software verification with reasoning about physics to verify that the collision avoidance programs used in aircraft are reliable.
To be sure, program verification has limits. Verification can establish how the plane’s software will respond in different kinds of circumstances; it can’t guarantee that human pilots will fly the planes according to protocol, that sensors will work properly (which may have been a factor in two fatal accidents involving the Boeing 737 Max), that maintenance workers will never cut corners, or that parts suppliers will always meet their specifications.
But verifying that the software itself won’t crash is a very important start, and far better than the alternative. We don’t want our airplane’s software to reboot midflight, and we certainly don’t want our robot’s code to crash while it’s busy assembling a bookshelf, nor for it to suddenly mistake our daughter for an intruder.
AI researchers should be thinking hard about how to emulate the spirit of that work, and more than that, they should be thinking about how the tools of deep understanding might themselves open new approaches to having machines reason about the correctness, reliability, and robustness of software.
At the very least, as the technology advances, it might become possible to prove that the system avoids certain kinds of mistakes; for instance, that, under normal circumstances, a robot will not fall over or bump into things; or that the output of a machine translation is grammatically correct. More optimistically, the cognitive power of AI itself may be able to take us further, eventually emulating the ability of skilled software architects to envision how their software works in a wide range of environments, improving coding and debugging.
Every technique we have reviewed requires hard work, and more than that, patience. We belabor them (even if some may seem obvious) because the kind of patience we advocate here is too easily ignored in the heat of the moment; often, it’s not even valued. Silicon Valley entrepreneurs often aspire to “move fast and break things”; the mantra is “Get a working product on the market before someone beats you to it; and then worry about problems later.” The downside is that a product created this way often works at one scale, but needs to be wholly rewritten when the situation changes; or it works for the demo, but not the real world. This is known as “technical debt”: you get a first, sometimes bug-ridden, version of the product you want; but you often have to pay later, with interest, in making the system robust, rooting out stopgaps and rebuilding foundations. That might be OK for a social media company but could prove dangerous for a domestic robot company. Shortcuts in a social-networking product might lead to user outages, bad for the company but not for humanity; shortcuts in driverless cars or domestic robots could easily be deadly.
Ultimately there is no single cure-all for good AI design, any more than there is for engineering in general. Many convergent techniques must be used and coordinated; what we have discussed here is just a start.
Deep-learning- and big-data-driven approaches pose an additional set of challenges, in part because they work very differently from traditional software engineering.
Most of the world’s software, from web browsers to email clients to spreadsheets to video games, consists not of deep learning, but of classical computer programs: long, complex sets of instructions carefully crafted by humans for particular tasks. The mission of the computer programmer (or team of programmers) is to understand some task, and translate that task into instructions that a computer can understand.
Unless the program to be written is extremely simple, the programmer probably won’t get it right the first time. Instead, the program will almost certainly break; a big part of the mission of a programmer is to identify “bugs”—that is, errors in the software—and to fix those bugs.
Suppose our programmer is trying to build a clone of Angry Birds, in which flaming tigers must be hurled into oncoming pizza trucks in order to forestall an obesity epidemic. The programmer will need to devise (or adapt) a physics engine, determining the laws of the game universe, tracking what happens to the tigers as they are launched into flight, and whether the tigers collide with the trucks. The programmer will need to build a graphics engine to make tigers and pizza trucks look pretty, and a system to track the users’ commands for maneuvering the poor tigers. Each and every component will have a theory behind it (I want the tigers to do this and then that when this other thing happens), and a reality of what happens when the computer actually executes the program.
On a good day, everything aligns: the machine does what the programmer wants it to do. On a bad day, the programmer leaves out a punctuation mark, or forgets to correctly set the first value of some variable, or any of ten thousand other things. And maybe you wind up with tigers that go the wrong way, or pizza trucks that suddenly appear where they shouldn’t. The programmer herself may spot the bug, or the software might get released to an internal team that discovers the bug. If the bug is subtle enough, in the sense of happening only in unusual circumstances, maybe the bug won’t be discovered for years.
But all debugging is fundamentally the same: it’s about identifying and then localizing the gap between what a programmer wants their program to do, and what the program (executed by an infinitely literal-minded computer) is actually doing. The programmer wants the tiger to disappear the instant that it collides with the truck, but for some reason, 10 percent of the time the image of the tiger lingers after the collision, and it’s the programmer’s job to figure out why. There is no magic here; when programs work, programmers understand why they work, and what logic they are following. Generally, once the underlying cause of a bug is identified, it’s not hard to understand why the program misbehaves, and the fix is often easy.
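To give a flavor of what such a gap looks like in miniature, here is a hypothetical toy version of the lingering-tiger bug, invented purely for illustration: the programmer intends the tiger's image to vanish the moment it collides with the truck, but in one ordering of the code the frame is drawn from a stale flag, so the image survives one frame longer than it should.

```python
# A hypothetical miniature of the lingering-tiger bug (invented for illustration).
# Intention: the tiger's image should vanish the instant a collision happens.
# Reality: if the frame is drawn before the collision check runs, the drawing
# uses a stale "visible" flag, and the image lingers for an extra frame.

def run_frame(tiger_visible, collided, draw_before_collision_check):
    if draw_before_collision_check:          # the buggy ordering
        drawn = tiger_visible                # stale flag used for drawing
        if collided:
            tiger_visible = False
    else:                                    # the intended ordering
        if collided:
            tiger_visible = False
        drawn = tiger_visible
    return drawn                             # was the tiger drawn this frame?

print(run_frame(True, True, draw_before_collision_check=False))  # False: tiger vanishes
print(run_frame(True, True, draw_before_collision_check=True))   # True: tiger lingers
```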
By contrast, a field like pharmacology can be very different. Aspirin worked for years before anybody had a clear idea of how it worked, and biological systems are so complex that it is rare for the actions of a medicine to be completely and fully understood. Side effects are the rule rather than the exception because we can’t debug drugs the way we can debug computer programs. Our theories of how drugs work are mostly vague, at some level, and much of what we know comes simply from experimentation: we do a drug trial, find that more people are helped than harmed and that the harm is not too serious, and we decide that it is OK to use the drug.
One of the many worries about deep learning is that deep learning is in many ways more like pharmacology than like conventional computer programming. The AI scientists who work on deep learning understand, in broad terms, why a network trained over a corpus of examples can imitate those examples on new problems. However, the choice of network design for a particular problem is still far from an exact science; it is guided more by experimentation than theory. Once the network is trained to carry out its task, it is largely mysterious how it works. What one winds up with is a complex network of nodes whose behavior is determined by hundreds of millions of numerical parameters. Except in rare cases, the person who builds the network has little insight into what any of the individual nodes do, or why any of the parameters have their particular values. There is no clear explanation of why the system gets the right answer when it works correctly, or why it gets the wrong answer when it doesn’t. If the system doesn’t work, fixing things is largely a matter of trial and error, either through subtle alterations of the network architecture, or through building better databases of training data. (For this reason there is a recent thrust in both machine-learning research and public policy toward “explainable AI,” though so far without clear results.)
And vast storehouses of human knowledge that could be used to make systems better and more reliable are currently neglected, because it is far from clear how to integrate them into a deep learning workflow. In vision, we know a lot that is relevant about the shapes of objects and the way that images are formed. In language, we know a lot about the structure of language: phonology, syntax, semantics, and pragmatics. In robotics, we know a lot about the physics of robots and their interactions with external objects. But if we use end-to-end deep learning to build an AI program for these, all that knowledge goes out the window; there is simply no way to take advantage of it.
If Alexa had a well-engineered commonsense system in place, it would not start laughing out of the blue; it would recognize that people tend to laugh in response to particular circumstances, such as at jokes and in awkward moments. With common sense installed, Roomba would not smear dog poop around; it would recognize that a different solution was required, or at the very least ask for help. Tay would recognize that large constituencies would be offended by its descent into hate speech, and the hypothetical butler robot would be careful not to break glasses on its way to pouring wine. If Google Images had a clearer idea of what the world is actually like, it would realize that there are many, many mothers who are not white. And, as we will explain later, with common sense, we’d also be significantly less likely to all be transformed into paper clips.
Indeed, a large swath of what current AI does that seems obviously foolish or inappropriate could presumably be avoided in programs that had deep understanding, rather than just deep learning. An iPhone would not autocorrect to “Happy Birthday, dead Theodore” if it had any idea what “dead” means and when one wishes a person Happy Birthday. If Alexa had any idea about what kinds of things people are likely to want to communicate, and to whom, it would double-check before sending a family conversation to a random friend. The estrus-prediction program would realize that it is not doing its job if it never predicts when the cows are in estrus.
Part of the reason we trust other people as much as we do is because we by and large think they will reach the same conclusions as we will, given the same evidence. If we want to trust our machines, we need to expect the same from them. If we are on a camping trip and both simultaneously discover that the eight-foot-tall hairy ape known as Sasquatch (aka Bigfoot) is real and that he looks hungry, I expect you to conclude with me, from what you know about primates and appetite, that such a large ape is potentially dangerous, and that we should immediately begin planning a potential escape. I don’t want to have to argue with you about it, or to come up with ten thousand labeled examples of campers who did and didn’t survive similar encounters, before acting.
Building robust cognitive systems has to start with building systems with a deep understanding of the world, deeper than statistics alone can provide. Right now, that’s a tiny part of the overall effort in AI, when it should really be the central focus of the field.
Finally, in order to be trustworthy, machines need to be imbued by their creators with ethical values. Commonsense knowledge can tell you that dropping a person out of a building would kill them; you need values to decide that that’s a bad idea. The classic statement of fundamental values for robots is Isaac Asimov’s “Three Laws of Robotics,” introduced in 1942.
· A robot may not injure a human being or, through inaction, allow a human being to come to harm.
· A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
· A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
For the many straightforward ethical decisions that a robot must make in everyday life, Asimov’s laws are fine. When a companion robot helps someone do the shopping, it generally should not shoplift, even if its owners tell it to, because that would hurt the shopkeeper. When the robot walks someone home, it generally should not push other pedestrians out of the way, even though that might get the person it is accompanying home faster. A simple regimen of “don’t lie, cheat, steal, or injure,” as special cases of causing harm, covers a great many circumstances.
As University of Pittsburgh ethicist Derek Leben has pointed out, though, things start to get murkier in many other cases. What kind of harm or injury need a robot consider beyond physical injury: loss of property, of reputation, of employment, of friends? What kinds of indirect harm need the robot consider? If it spills some coffee on an icy sidewalk and someone later slips on it, has it violated the First Law? How far must the robot go in not allowing humans to be harmed by inaction? In the time that it takes you to read this sentence, four human beings will die; is it a robot’s responsibility to try to prevent those deaths? A driverless car (which is, again, a wheeled robot) that pondered all of the places where it could be at any moment might never make it out of the driveway.
Then there are moral dilemmas, including many situations in which, whatever the robot does, someone will be injured, like the one that Gary introduced in The New Yorker in 2012, in homage to Philippa Foot’s classic trolley problems: What should a driverless car do if a school bus full of children spins out of control and hurtles toward it on a bridge? Should the car sacrifice itself and its owner to save the schoolchildren, or protect itself and its owner at all costs? Asimov’s First Law doesn’t really help, since human lives must be sacrificed one way or the other.
Real-life moral dilemmas are often even less clear-cut. During World War II, a student of the existentialist philosopher Jean-Paul Sartre was torn between two courses of action. The student felt that he should go join the Free French and fight in the war, but his mother was entirely emotionally dependent on him (his father had abandoned her and his brother had been killed). As Sartre put it: “No general code of ethics can tell you what you ought to do.” Maybe someday in the distant future, we could build machines to worry about such things, but there are more pressing problems.
No current AI has any idea what a war is, much less what it means to fight in a war, or what a mother or country means to an individual. Still, the immediate challenge isn’t the subtle stuff; it’s to make sure that AIs don’t do things that are obviously unethical. If a digital assistant wants to help a person in need who has little cash, what’s to stop the AI from printing dollar bills on the color printer? If someone asks a robot to counterfeit, the robot might figure there is little harm: because the counterfeit is undetectable, nobody who receives or spends the bill will be hurt, and the world as a whole might even be better off, since spending the extra money stimulates the economy. A thousand things that seem utterly wrong to the average human may seem perfectly reasonable to a machine. Conversely, we wouldn’t want the robot to get hung up on moral dilemmas that are more imaginary than real, pondering for too long whether to rescue people from a burning building because of the potential harm the occupants’ great-grandchildren might someday inflict on others.
The vast majority of the time, the challenge for AIs will not be to succeed in extraordinary circumstances, solving Sophie’s choice or Sartre’s student’s dilemma, but to find the right thing to do in ordinary circumstances, like “Could striking this hammer against this nail on this board in this room at this moment plausibly bring harm to humans? Which humans? How risky?” or “How bad would it be if I stole this medicine for Melinda, who can’t afford to pay for it?”
We know how to build pattern classifiers that tell dogs from cats, and a golden retriever from a Labrador, but nobody has a clue how to build a pattern classifier to recognize “harm” or a “conflict” with a law.
Updated legal practices will of course be required, too. Any AI that interacts with humans in open-ended ways should be required, by law, to understand and respect a core set of human values. Existing prohibitions against committing theft and murder, for example, should apply to artificial intelligences—and those who design, develop, and deploy them—just as they do for people. Deeper AI will allow us to build values into machines, but those values must also be reflected in the people and companies that create and operate them, and the social structures and incentives that surround them.
Once all this is in place—values, deep understanding, good engineering practices, and a strong regulatory and enforcement framework—some of the field’s biggest worries, like Nick Bostrom’s widely discussed paper-clip example, start to dissolve.
The premise of Bostrom’s thought experiment, which at first blush seems to have the feel of inexorable logic, is that a superintelligent robot would do everything in its power to achieve whatever goal had been set for it—in this case, to make as many paper clips as possible. The paper-clip maximizer would start by requisitioning all readily available metal; when that ran out, it would start mining all the other metal available in the universe (mastering interstellar travel as a step along the way); and eventually, when other obvious sources of metal had been used up, it would start mining the trace atoms of metal in human bodies. As Eliezer Yudkowsky put it, “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” Elon Musk (who tweeted about Bostrom’s book) seemed to have been influenced by this scenario when he worried that AI might be “summoning the demon.”
But there’s something off about the premise: it assumes that we will eventually have a form of superintelligence smart enough both to master interstellar travel and to understand human beings (who would surely resist being mined for metals), and yet possessed of so little common sense that it never comes to realize that its quest is (a) pointless (after all, who would use all those paper clips?), and (b) in violation of even the most basic moral axioms (like Asimov’s).
Whether it is even possible to build such a system—superintelligent yet entirely lacking in both common sense and basic values—seems to us unclear. Can you construct an AI with enough of a theory of the world to turn all the matter of the universe into paper clips, and yet remain utterly clueless about human values? When one thinks about the amount of common sense that will be required in order to build a superintelligence in the first place, it becomes virtually impossible to conceive of an effective and superintelligent paper-clip maximizer that would be unaware of the consequences of its actions. If a system is smart enough to contemplate enormous matter-repurposing projects, it is necessarily smart enough to infer the consequences of its intended actions, and to recognize the conflict between those potential actions and a core set of values.
And that—common sense, plus Asimov’s First Law, and a fail-safe that would shut the AI down altogether in the event of a significant number of human deaths—ought to be enough to stop the paper-clip maximizer in its tracks.
Of course, people who enjoy the paper-clip tale can extend it endlessly. (What if the maximizer is spectacularly good at deceiving people? What if the machine refuses to allow people to turn it off?) Yudkowsky argues that people who expect AIs to be harmless are merely anthropomorphizing; they are unconsciously reasoning that, since humans are more or less well-intentioned, or at least mostly don’t want to wipe out the human race, AIs will be the same way. The best solution, in our view, is neither to leave the matter to chance nor to have the machine infer all of its values directly from the world, which would be risky in a Tay-like way. Rather, some well-structured set of core ethical values should be built in; there should be a legal obligation that systems with broad intelligence that are powerful enough to do significant harm understand the world deeply enough to grasp the consequences of their actions, and factor human well-being into the decisions they make. Once such precautions are in place, irrationally exuberant maximization with seriously deleterious consequences ought to be both illegal and difficult to implement.*3
So for now let’s have a moratorium on worrying about paper clips, and focus instead on imbuing our robots with enough common sense to recognize a dubious goal when they see it. (We should also be careful not to issue entirely open-ended instructions in the first place.) As we have stressed, there are other, much more pressing and immediate concerns than paper-clip maximizers that our best minds should be agonizing over, such as how to make domestic robots that can reliably infer which of their actions are and are not likely to cause harm.
On the plus side, AI is perhaps unique among technologies in having the logical potential to mitigate its own risks; knives can’t reason about the consequences of their actions, but artificial intelligences may someday be able to do just that.
We both first learned about AI through science fiction as kids, and we constantly marvel at what has and what has not been accomplished. The amount of memory and computing power and networking technology packed into a smart watch amazes us, and even a few years ago we didn’t expect speech recognition to become so ubiquitous so quickly. But true machine intelligence is much further from being achieved than either of us expected when we started thinking about AI.
Our biggest fear is not that machines will seek to obliterate us or turn us into paper clips; it’s that our aspirations for AI will exceed our grasp. Our current systems have nothing remotely like common sense, yet we increasingly rely on them. The real risk is not superintelligence; it is idiots savants with power, such as autonomous weapons that could target people, with no values to constrain them, or AI-driven newsfeeds that, lacking superintelligence, prioritize short-term sales without evaluating their impact on long-term values.
For now, we are in a kind of interregnum: narrow but networked intelligences that have autonomy but too little genuine intelligence to reason about the consequences of that power. In time, AI will grow more sophisticated; the sooner it can be made to reason about the consequences of its actions, the better.
All of which connects very directly to the larger theme of this book. We have argued that AI is, by and large, on the wrong path, with the majority of current efforts devoted to building comparatively unintelligent machines that perform narrow tasks and rely primarily on big data rather than on what we call deep understanding. We think that is a huge mistake, for it leads to a kind of AI adolescence: machines that don’t know their own strength, and don’t have the wherewithal to contemplate the consequences of their own actions.
The short-term fix is to muzzle the AI that we build, making sure that it can’t possibly do anything of serious consequence, and correcting each individual error that we discover. But that’s not really viable in the long term, and even in the short term, we often (as we have seen) wind up with Band-Aids rather than comprehensive solutions.
The only way out of this mess is to get cracking on building machines equipped with common sense, cognitive models, and powerful tools for reasoning. Together, these can lead to deep understanding, itself a prerequisite for building machines that can reliably anticipate and evaluate the consequences of their own actions. That project itself can only get off the ground once the field shifts its focus from statistics and a heavy but shallow reliance on big data. The cure for risky AI is better AI, and the royal road to better AI is through AI that genuinely understands the world.
* 1 Accidents and malicious attacks are different, but they have elements in common. A door that doesn’t shut properly can be blown open in a storm or pulled open by a burglar. Analogous things are true in cyberspace.
* 2 If we were to build deeply self-aware robots, capable of introspection, self-improvement, and goal setting, one safe bet is that some important and challenging ethical questions would arise. While we have no hesitation in deleting a social-networking app that has outlived its usefulness, it’s easy to see that tougher questions might arise with respect to robots that were genuinely sentient, possessing, say, as much self-awareness as a nonhuman primate. Would it be OK to damage them? To strip them for parts? To permanently turn them off? If they reached human-level intelligence, would it be necessary to grant them civil and property rights? Would they be subject to criminal law? (Of course, we are nowhere near this point yet. When the chatbot/talking head Sophia was granted Saudi Arabian citizenship in 2017, it was a publicity stunt, not a milestone in self-aware AI; the system relied on the usual tricks of precrafted dialogue, not genuine self-awareness.)
* 3 Of course, there are gray areas. Should an advertising maximizer endowed with common sense and values prevent enemy nations from tampering with people’s news feeds? Should a dating service app, constrained by a mandatory values system, be permitted to tamper with existing romantic relationships by offering its users endless temptation, in the form of a parade of ostensibly more attractive alternatives? Reasonable people might disagree about what ought to be permitted.