1 THE SECRET OF THE AUTOMATON
THE FLUTE PLAYER
In the year 1737, at the dawn of the Industrial Revolution, the French mechanical genius Jacques de Vaucanson completed a masterpiece: a statue that could play music on a flute like a real human. Holding a real flute up to its mouth, the life-sized statue would blow into the instrument with its mechanical lungs to produce a note. By moving its lips and adjusting how hard it blew, and by moving its fingers precisely over the holes, the statue could produce a sequence of notes to form a complete song “as perfectly as any human being.”1 Vaucanson, not content with a statue that could play just a single song on its flute, endowed the statue with the ability to play 12 different songs.2
The public had seen devices like the Flute Player before, although this one was special. They knew such machines as automata, and they simply couldn’t get enough of them. Commissioning such devices had become a hobby among the wealthy elite throughout Europe.3 For a little while Vaucanson charged the equivalent of a week’s salary for each member of a small audience to see his strange device. Its natural movement and the complexity of its behavior were simply unknown at the time. Eventually Vaucanson toured this and several of his other automata around other parts of Europe.
But how did it work? Was it dark magic? A church official had ordered a decade earlier that one of Vaucanson’s workshops be destroyed, because he considered it profane; so Vaucanson was sure to steer clear of doing anything that might look too much like magic. Was it a hoax? Just a few years before the Flute Player, an automaton that could apparently play the harpsichord had enchanted the French king Louis XV. The king, insisting on learning how the device worked, discovered that it was just a puppet, with a five-year-old girl inside.4 But Vaucanson, keenly aware of this hoax, eagerly showed his audiences the inner mechanics of his Flute Player. It moved so fluidly and naturally, yet, as he showed them, it was apparently just following a sequence of instructions encoded into its mechanical bowels.
To further legitimize his invention, Vaucanson presented the automaton to the French Academy of Sciences, offering a dissertation titled “Mechanism of the Automaton Flute Player.” In his dissertation, Vaucanson explained precisely how the fantastic machine worked. The statue was constructed of wood and cardboard, painted to look like marble, with leather on its fingertips to form a tight seal with the flute’s holes. The mechanical drivers of the automaton were two rotating axles. To produce the statue’s breath, one of these axles—the crankshaft—pumped three sets of bellows, which produced flows of air at three different pressures: low, medium, and high. These three streams combined into a single artificial trachea that fed into the statue’s mouth. The other axle of the device slowly rotated a drum covered with small studs. As the drum rotated, these studs pressed against fifteen spring-loaded levers. Via chains and cables, these levers actuated various parts of the automaton. Some of the levers controlled the movement of the fingers and lips.5 The remaining levers determined which of the three pressure ranges should be used to blow into the flute, as well as which position the device’s tongue should take to modify the airflow. By placing the studs onto the correct positions on the rotating drum, Vaucanson could program the statue to play virtually any song he wanted; it was little more than a gigantic—albeit sophisticated—music box. The academy accepted his dissertation with a glowing review.6
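To make the drum-as-program analogy concrete, here is a minimal sketch in Python of the same idea: each position on the drum records which levers the studs press at that moment, and “playing” the song is just reading those positions off in order. The lever names and settings below are invented purely for illustration; they are not taken from Vaucanson’s dissertation.

# A hypothetical "drum program": each step records which levers the studs
# press at that instant. The lever names and values are made up.
song = [
    {"finger_1": "down", "lips": "narrow", "pressure": "low"},
    {"finger_1": "up",   "lips": "wide",   "pressure": "medium"},
    {"finger_2": "down", "lips": "narrow", "pressure": "high"},
]

def play(drum):
    # Rotate the drum one step at a time and "actuate" each lever.
    for step, levers in enumerate(drum):
        for lever, setting in levers.items():
            print(f"step {step}: set {lever} to {setting}")

play(song)

Swapping in a different list of steps changes the song without changing the machine, which is exactly what made the studded drum a program rather than a fixed mechanism.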
Vaucanson’s masterpiece was just one of many automata developed by the inventors of that century. The automaton was popular precisely because it was fully autonomous and because it appeared to replicate human intelligence. The Flute Player and others like it were the artificially intelligent harbingers of the Industrial Revolution: as the materials and inventions that would enable it became available over the course of decades, the technologists and hobbyists of the time used them in their uniquely human quest to replicate our bodies and minds.
TODAY’S AUTOMATA
Fast-forward to the present day. Real-life self-driving cars now cruise around the cities of Silicon Valley day and night. We’ve trained computer programs to play Atari games far better than humans can by offering them treats, the same way you would train a dog to sit or to roll over. A computer program somehow managed to defeat two world champions at the game of Jeopardy! We’ve developed a computer program that can beat the best humans at the ancient game of Go. Meanwhile, the artificial intelligence behind these breakthroughs has been improving at a rate that’s astonishing even to experts in the field.
It’s hard to overstate this last point. Just before embarking on the system that became Watson, the team that created it said it wasn’t yet possible to build a program that could beat the world’s best Jeopardy players. Many experts thought it would take another decade to create a computer program that could play Go competitively, right up until AlphaGo, a program trained over the course of months, proved them wrong by beating a leading world champion. Within 20 months, AlphaGo’s creators had developed another version of the program that taught itself thousands of years’ worth of accumulated knowledge about the game in just three days; this version defeated the previous one in 100 out of 100 matches while using a tenth of the computing power. This was in part due to advances in artificial neural networks, the technology underlying AlphaGo and the focus of intensive research over the past decade. These networks don’t just play games: they can now recognize objects in photographs and words in recorded speech about as well as humans can.
As these breakthroughs have continued to make headlines, they naturally pique our curiosity: How do they work? Just as it did for 18th-century Europeans wondering about the Flute Player and other automata of their time, this question often lingers unanswered, always beneath the surface, when we talk about these new automata.
Fortunately, much as Vaucanson presented his dissertation to the French Academy of Sciences, the creators of many of these recent advances have documented in precise detail how to build these smart computer programs. That detail is spread across many different places, so in this book I have attempted to organize it and to explain, in simple terms, how these smart machines think.
Unlike the hoax automaton with the five-year-old girl hidden inside, the breakthroughs we’ll look at in this book are legitimate scientific advances. Although they might look like magic, academic communities have vetted them all carefully, just like the Academy of Sciences vetted the Flute Player. Also like the Flute Player, they’re examples of automata. An automaton is a self-moving machine. It appears to operate independently, often like a person or an animal, as if it could think for itself. But by definition, automata follow programs. These programs are predetermined sequences of instructions, like the programs Vaucanson developed for the Flute Player to play its songs.
As we’ll see, it turns out that technologists haven’t changed much over the past few centuries. They’re still building and programming automata to replicate the human mind and body, and they sometimes still create hoax automata. The only difference is that now they’ve upgraded their tool chest to the levers, gears, and engines of the 21st century: computers and the software that runs on them.
THE SWING OF A PENDULUM
The automata of the 18th century sometimes used the cutting edge of precision technology at the time—mechanical clockwork—to carry out their programs. They were powered with mechanical energy: a hefty weight lifted high or a wound-up coil turned by a key. Their creators were often watchmakers, and the automata’s technological ancestors were clocks that performed elaborate and entertaining mechanical sequences at the strike of an hour. These kept the time and performed their feats by drawing from potential energy stored within them before they were set in motion. Their clockwork enabled them to carry out their programs, step by step, by releasing this stored energy in small increments.
Mechanical clocks keep time with the swing of a pendulum. The pendulum swings with such regular frequency that it remained the best method for timekeeping until the 1930s.7 With each swing, a series of latches and gears registers the passage of another epoch, releasing a bit of stored energy so the clock can do something interesting and giving the pendulum a small push to keep it swinging. And then the process repeats itself. A mechanical watch works on a similar principle: a finely coiled spring spins a circular disk back and forth around its center. With each twist of the disk, a gear moves one or two teeth at a time, so that the rest of the clockwork can do something interesting.
To a first approximation, this is the same machinery that enables electronic computers to run their programs. Computers use the same principle of latches and gears; but instead of the quiet swing of a clock’s pendulum, they use the swing of electrons as they silently whoosh from one part of the circuit to another and back again. When the electrons are halfway to their destination at either extreme, they keep their momentum as they flow through another part of the circuit: a coiled piece of wire, for example (an electromagnet); or even a crystalline tuning fork (a lab-grown and precisely cut piece of sand) whose elastic vibrations, millions of times per second, offer the circuit an extraordinarily precise resonant frequency. These crystal oscillators replaced physical pendula because they were stable—resistant to external forces like earthquakes, temperature changes, and the acceleration of airplanes and submarines—and because they were fast (millions of swings-per-second fast).
Each time these electrons swing from one part of the circuit to the other, electronic latches—analogous to the physical latches of a mechanical clock or watch—register the passage of another epoch in which to carry out another instruction in the program. Then the instruction counter moves forward, the clockwork waits for the electrons to swing back (or for new electrons to take their place), and the process repeats itself.
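As a rough, deliberately simplified sketch (not a description of any real processor), the following Python snippet captures the idea: a simulated oscillator ticks at a fixed rate, and on each tick the machine executes one instruction and advances its instruction counter, much as the latches advance the clockwork on each swing.

import time

# A toy "program": each instruction is just a label printed when it runs.
program = ["fetch value", "add one", "store result", "halt"]

def run(program, ticks_per_second=2):
    counter = 0                           # the instruction counter
    while counter < len(program):
        time.sleep(1 / ticks_per_second)  # wait for the next "swing"
        print(f"tick {counter}: {program[counter]}")
        counter += 1                      # the latch advances the counter

run(program)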
AUTOMATA WE’LL DISCUSS IN THIS BOOK
The swing of these electrons, and the intelligent behavior they enable, will be the focus of this book. We won’t ever look at the low-level instructions of these programs—that is, the variable and function names the programmers wrote down or the machine code generated from those programs. But we will look at the intermediate building blocks that make up these automata—basically the “statistical gears and bellows” one level higher. My hope is that by understanding these building blocks, you’ll be better prepared to understand how other modern automata work. For example, now that you know how Vaucanson’s Flute Player worked, you could probably make some educated guesses about how parts of his famous Digesting Duck worked. This automaton could flap its wings, quack, eat, digest, and (apparently) defecate.8
Vaucanson’s automata couldn’t react to the world. The automata of his day followed simple, predefined sequences of steps. Our modern-day automata can react to a changing environment because they have an ability to perceive. They can react not only to the press of a button on a keyboard, but also to the sight of cars and pedestrians passing through a crowded intersection, or to the subtle clues laid out in a Jeopardy question. Today’s automata can do these things in ways that would have left Vaucanson and his contemporaries in awe.
I’ve written this book for anyone interested in how these devices work. You won’t need to have a college degree in computer science to understand this book, although I’ll assume that you’re familiar with some basic facts about computers, such as that they follow explicit instructions encoded by humans, that images are represented by computers based on the amount of red, green, and blue they have in each pixel, and so on. And if you’re already familiar with artificial intelligence or robotics, some parts of this book will probably still be new to you. Although you might have learned about the building blocks of these devices in your classes—the elements of machine learning and artificial intelligence—there’s still a good chance that you haven’t learned about how these building blocks have been put together to create these breakthroughs, because these topics aren’t all typically taught in a single place. And finally, I’ve written this book so that you can usually jump straight to the topic that most interests you if you don’t feel like reading all the way through. You shouldn’t need to backtrack more than a couple of chapters to catch up on the machine learning and artificial intelligence background you need to know.
What are machine learning and artificial intelligence, anyway? Artificial intelligence (AI) is a broad field of study devoted to giving computers the ability to do intelligent things. There’s no promise in AI that computers will do these things the way humans do them, and as we’ll see, they often do things very differently than humans would do them. AI simply addresses how they can do intelligent things, and usually it addresses this question for very narrow domains, like finding a path through a maze. Machine learning is a closely related field devoted to enabling machines to do smart things by learning from data. As we’ll see in this book, neither AI nor machine learning on their own can do everything. There will be cases where we’ll need algorithms that can dumbly brute-force their way to intelligent solutions without using any data whatsoever; and there will be cases where we need to design algorithms that can learn from billions of data points but are still useless until we combine them with the dumb, brute-forced solutions. We’ll need to combine algorithms of both types to do interesting things.
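To give a concrete, toy-sized sense of what a dumb, brute-force solution looks like, here is a short Python sketch that finds a path through a small maze by exhaustively exploring every direction (a breadth-first search). The maze layout is made up purely for illustration.

from collections import deque

# 0 = open cell, 1 = wall. A made-up 4-by-4 maze for illustration.
maze = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def find_path(maze, start=(0, 0), goal=(3, 3)):
    # Breadth-first search: try every direction until the goal is reached.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        row, col = path[-1]
        if (row, col) == goal:
            return path
        for dr, dc in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            r, c = row + dr, col + dc
            if (0 <= r < len(maze) and 0 <= c < len(maze[0])
                    and maze[r][c] == 0 and (r, c) not in seen):
                seen.add((r, c))
                queue.append(path + [(r, c)])
    return None  # no path exists

print(find_path(maze))

The search uses no data and no learning; it simply tries everything. Much of this book is about when such brute-force approaches are enough, and when they need to be combined with algorithms that learn from data.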
I’ve already mentioned some of the wonderful advances in machine learning and AI that we’ll explore in How Smart Machines Think. In the first half of the book, I’ll outline some of the key ideas that enable intelligent machines to perceive and interact with the world. We’ll see what enables self-driving cars to stay on the road and to navigate through crowded urban environments. We’ll see how neural networks can enable these cars—and other machines—to perceive the world around them, and we’ll see how they can recognize objects in pictures or words in a recording of human speech. I’ll also outline how one of the best movie-recommendation engines in the world worked, both because the story behind it is so fascinating and because many of the core ideas from that system permeate the other machines we’ll look at in this book. Then I’ll tell you how we can train computers to perform certain behaviors by feeding them treats and how they can perceive the world with artificial neural networks. Later in this book we’ll look more closely at how computers can play a variety of games. Specifically, we’ll take a look at AlphaGo and Deep Blue, which beat reigning world champions Lee Sedol and Garry Kasparov at the strategy games of (respectively) Go and chess; as well as IBM’s Watson, which beat Jeopardy champions Ken Jennings and Brad Rutter.
Throughout this book, we’ll follow the stories behind how these breakthroughs have occurred. We’ll meet many of the researchers involved, and we’ll see the factors beyond their technology and methodology that made these advances possible. One recurring theme, for example, is that a competitive research community can help to focus efforts and to catalyze progress. This is what thrust the field of self-driving cars into the public imagination and into its modern form: hundreds of research teams competed in a contest to build self-driving robot cars that could travel for miles in the desert, without human drivers. And that’s where our story begins—on a cool morning in the Mojave Desert, as some of these teams prepared their cars for the race.