THE JEOPARDY MACHINE’S birthplace—if a computer can stake such a claim—was the sprawling headquarters of the global research division named after its flesh-and-blood ancestor, IBM’s founder, Thomas J. Watson. In 1957, when IBM presided over the rest of the infant computer industry, the company cleared woods on a hill in Yorktown Heights, New York, about forty miles north of midtown Manhattan, and hired the Finnish-American architect Eero Saarinen to design a lab. If computing was the future, as seemed inevitable, it was on this hill that a good part of it would be dreamed up, modeled mathematically, and prototyped. Saarinen was a natural choice to express this sparkling future in glass and rock. A year earlier, he had designed the winged TWA Terminal for the new Idlewild Airport (later called JFK). Before that, he’d drawn up the majestic Gateway Arch that would loom over St. Louis. In Yorktown, it was as if he had laid the Gateway Arch on its side. The building, with three stories of glass walls, curved along the top of the hill. For visitors strolling the wide corridors decades later, the combination of the structure’s rough stone and the broad vistas of rolling hills still delivered just the right message of wealth, vision, and permanence.
The idea for a Jeopardy machine, at least according to one version of the story, dates back to an autumn day in 2004. For several years, top executives at the company had been pushing researchers to come up with the next Grand Challenge. In the ’90s, the challenge had been to build a computer that would beat the world chess champion. This produced Deep Blue. Its 1997 victory over Garry Kasparov turned into a global event and fortified IBM’s reputation as a giant in cutting-edge computing. (This grew more important as consumer and Web companies, from Microsoft to Yahoo!, threatened to steal the spotlight—and the young brainpower. Google was still just a couple of grad students at Stanford.) Later, in another Grand Challenge in the first years of the new century, IBM produced Blue Gene, the world’s fastest supercomputer.
What would the next challenge be? On that fall day, a senior manager at IBM Research named Charles Lickel drove north from his lab, up the Hudson, to the town of Poughkeepsie, and spent the day with a small team he managed. That evening, the group went to the Sapore Steakhouse in nearby Fishkill, where they could order venison, elk, or buffalo, or split a whopping fifty-two-ounce porterhouse steak for two. There, something strange happened. At seven o’clock, many of the diners stood up from their tables, their food untouched, and filed into the bar, which had a television set. “The dining room emptied,” Lickel said. People were packed in there, three rows deep, to see whether Ken Jennings, who had won more than fifty straight matches on Jeopardy, would win again. He did. A half hour later, the crowd returned to their food, raving about the question-answering phenom. As Lickel noted, their steaks had to have been stone cold.
Though Lickel hadn’t watched much Jeopardy since he was a kid, the scene in the bar gave him an idea for the next Grand Challenge. What if an IBM computer could beat Ken Jennings? (Other accounts have it that the vision for a Jeopardy computer was already circulating along the corridors of the Yorktown lab. The original idea, it turns out, is tough to trace.)
In any event, Lickel pushed the idea. In the first meeting, it provoked plenty of dissent. Chess was nearly as clean and timeless as mathematics itself, a cerebral treasure handed down through the ages. Jeopardy, by contrast, looked questionable from the get-go. Produced by a publicly traded company, Sony, and subject to ratings and advertisers, it was in the business of making money and pleasing investors. It was Hollywood, for crying out loud. “There was a lot of doubt in the room,” Lickel said. “People wanted something more obviously scientific.” A second argument was perhaps more compelling: people playing Jeopardy would in all likelihood annihilate an IBM machine. “They all grabbed me after the meeting,” Lickel recalled, “and said, ‘Charles, you’re going to regret this.’”
In the end, it was up to Paul Horn. A former professor of physics at the University of Chicago, Horn had headed IBM’s three-thousand-person research arm since 1996. “If you think about smart machines,” he later said, “Blue Gene by some measures had the raw computing power of the human brain, at least within an order of magnitude or two.” Horn discussed those early days in his sun-splashed office at New York University, where he took up residence after his 2008 retirement from IBM. He had a black beard, and a tiny ponytail poked out from the back of his head.
“So here we have a machine that’s as fast as your brain, or close,” he said. “But it doesn’t think the way we think. So what would be an appropriate grand challenge that would have high visibility and excite people?” He didn’t remember the idea coming from Lickel or hearing about the Fishkill dinner. In fact, Horn thought the idea might have come from him. In any case, he liked it—and promptly ran into resistance. “The general response was negative,” he recalled. “People said, ‘It can’t be done. It’s too much of a publicity stunt. The only reason that you’re interested in it is because it’s a show on TV.’” But Horn thought that building a savvy answering machine was the ideal challenge for IBM. While he maintained that he viewed the grand challenge as pure research, it also made plenty of sense.
IBM’s business had undergone a radical transformation over the course of Horn’s thirty-year career at the company. As late as the 1970s, IBM ruled the computer industry. It launched its first computers for business in 1952. But it was its breakthrough mainframe in 1964, the System/360, that established a single standard of computing in business, industry, and science. IBM pitched itself as a safe, if expensive, bet for companies looking to computerize. Its buttoned-down sales and consulting teams spread a compelling message around the world: “Nobody ever got fired for buying IBM.” Big Blue, a name derived from the massive blue mainframes it sold, grew so big that its rivals, including Sperry, Burroughs, Honeywell, and four other companies, came to be known as the Seven Dwarfs. During this time, IBM researchers at Saarinen’s edifice and at other labs around the world churned out an array of new technologies. They came up with magnetic stripes for credit cards and floppy disks for computer data storage. Yet it was computers that drove the business. When Horn arrived at IBM Research in 1979, the greatest threat to IBM appeared to be a decade-long antitrust complaint brought by the U.S. Justice Department. It alleged that IBM had violated the Sherman Act by attempting to monopolize the fast-growing industry for business computers. Whether or not Big Blue had broken the law, its dominance was beyond question.
By 1982, when the Justice Department dropped the suit for lack of evidence, the computer world was shifting under Big Blue’s feet. The previous year, IBM had unveiled its first personal computer, or PC. Priced at $1,500, it provided both legitimacy and a standard for the young industry. Early on, as corporate customers gobbled up PCs, it seemed as though IBM would go on to dominate this next stage of computing. But there was a crucial difference between these desktop machines and the mainframes. Nearly every component of the mainframes, including their processors and software, was made by IBM. In the lingo of the industry, the computers were vertically integrated. This was not the case with PCs. In order to get to market quickly at a low price, IBM built them from off-the-shelf technology—microprocessors from Intel and a rudimentary operating system, MS-DOS, from a Seattle startup called Microsoft. Since the PC had commodity innards, it took no time at all for newcomers, including Compaq and Taiwan’s Acer, to plug them into cheaper “IBM-compatible” computers, or clones. IBM found itself slugging it out with a slew of upstarts while Intel and Microsoft ran away with the profits and grew into titans. Big Blue was in decline, falling faster than most people imagined. And in 1992, the vast industrial behemoth stunned the business world by registering a $4.97 billion loss, the largest in U.S. history at the time. In the space of a decade, a company that had been synonymous with cutting-edge technology now looked tired and wasteful, a manufacturing titan ill-suited to the Information Age. It almost went under.
A new chief executive, Louis V. Gerstner, arrived in 1993 and transformed IBM. He sold off or shuttered old manufacturing divisions and steered the company toward businesses based on information. IBM did not have to sell machinery to be a leader in technology, he said. It could focus on the intelligence to run the technology—the software—along with the know-how to put the systems to good use. That was services, including consulting, and it led IBM back to growth.
Technology, in the early ’90s, was convulsing entire industries and the new World Wide Web promised even more dramatic change. IBM’s customers, which included virtually every blue-chip company on the planet, were confused about how these new networks and services fit into their businesses. Did it make sense to shift design work to China or India and have teams work virtually? Should they remake customer service around the Web? They had loads of questions, and IBM decided it could sell the answers. It could even take over tech operations for some of its customers and charge for the service.
This push toward services and software continued under Gerstner’s successor, Samuel J. Palmisano. Two months after Charles Lickel came back from Poughkeepsie with the idea for a computer that could play Jeopardy, IBM sold its PC division to Lenovo Group of China. That year IBM Global Services registered $40 billion in sales, more than the $31 billion in hardware sales and a much larger share of profits. (By 2009, services would grow to $55 billion, nearly 60 percent of the company’s revenue. And the consultants working in the division sold lots of IBM software, which registered $21 billion in sales.) Naturally, a Jeopardy computer would run on IBM hardware. But the heart of the system, like IBM itself, would be the software created to answer difficult questions.
A Jeopardy machine would also respond to another change in technology: the move toward human language. For most of the first half-century of the computer age, machines specialized in orderly rows of numbers and words. If the buyers in a database were listed in one column, the products in another, and the prices in a third, everything was clear: Computers could run the numbers in a flash. But if one of the customers showed up as “Don” in one transaction and “Donny” in another, the computer viewed them as two people: The two names represented different strings of ones and zeros, and therefore Don ≠ Donny. Computers had no sense of language, much less nicknames. In that way, they were clueless. The world, and all of its complexity, had to be simplified, structured and spoon-fed to these machines.
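To make the gap concrete, here is a minimal sketch in Python. The tiny nickname table and the function names are hypothetical, invented only to illustrate the difference between a literal string match and the kind of normalization a system would need before it could treat Don and Donny as one customer.

```python
# A minimal sketch of why "Don" and "Donny" look like two different customers
# to a literal string match. The nickname table and function names are
# hypothetical, invented only for this illustration.

NICKNAMES = {"don": "donald", "donny": "donald", "bill": "william"}

def naive_match(a: str, b: str) -> bool:
    # Byte-for-byte comparison: "Don" != "Donny", so the records never join.
    return a == b

def normalized_match(a: str, b: str) -> bool:
    # Map each name to a canonical form before comparing.
    def canon(name: str) -> str:
        return NICKNAMES.get(name.lower(), name.lower())
    return canon(a) == canon(b)

print(naive_match("Don", "Donny"))       # False: two different strings of bits
print(normalized_match("Don", "Donny"))  # True: both map to "donald"
```

Even that patch only smooths over one kind of messiness; it says nothing about misspellings, initials, or the countless other ways people refer to the same person.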
But consider what hundreds of millions of ordinary people were using computers for by 2004. They were e-mailing and chatting. Some were signing up for new social networks. (Facebook launched in February of that year.) Online humanity was creating mountains of a messy type of digital data: human language. Billions of words were rocketing through networks and piling up in data centers. Those words expressed what millions of people were thinking, desiring, fearing, and scheming. The potential customers of IBM’s clients were out there spilling their lives. Entire industries grew by understanding what people were saying and predicting what they might want to do, where they might want to go, and what they were eager to buy. Google was already mining and indexing words on the Web, using them to build a media and advertising empire. Only months earlier, Google had debuted as a publicly traded company, and the new stock was sky-rocketing.
IBM wasn’t about to mix it up with Google in the commercial Web. But Big Blue needed state-of-the-art tools to provide its corporate customers with the fastest and most insightful read of the words cascading through their networks. To keep a grip on its gold-plated consulting business, IBM required the very smartest, language-savvy technology—and it needed its customers to know and trust that it had it. It was central to IBM’s brand.
So in mid-2005 Horn took up the challenge with a number of his top researchers, including David Ferrucci. A twelve-year veteran at the company, Ferrucci managed a handful of research teams, including the five people who were teaching machines to answer simple questions in English. Their discipline was called question-answering. Ferrucci knew the challenges all too well. The machines stumbled in understanding English and appeared to plateau, in competitions sponsored by the U.S. government, at a success rate of about 35 percent.
Ferrucci wasn’t a big Jeopardy fan, but he was familiar enough with it to appreciate the obstacles involved. Jeopardy tested a combination of knowledge, speed, and accuracy, along with game strategy. The show featured three contestants, each with a buzzer. In the course of about twenty minutes, they raced to respond to sixty clues representing a combined value of $54,000. Each one—and this was a Jeopardy quirk—was in fact an answer, some far more complex than others. The contestant had to provide the missing question. For example, in an unusual Tournament of Champions game that aired in November 1994, contestants were presented with this $500 clue under the category Furniture: “French term for a what-not, a stand of tiered shelves with slender supports used to display curios.” The host, Alex Trebek, read the clue from the big game board. The moment he finished, a panel around the question lit up, setting off the race to buzz. On average, contestants had about four seconds to read and consider the clue before buzzing. The first to buzz was, in effect, placing a bet. The right response—“What is an étagère?”—was worth $500 and gave the contestant the chance to pick again. (“Let’s try European Capitals for $200.”) A botched response wiped the same amount from a contestant’s score and gave the other two a chance to try. (In this example, no one dared to buzz. Such a clue, uncommon in Jeopardy, is known as a “triple-stumper.”)
To compete in Jeopardy, a machine not only would need to come up with the answer, posed as a question, within four seconds, but it would also have to gauge its confidence in its response. It would have to know what it knew. “Humans know what they know like that,” Ferrucci said later, snapping his fingers. Replicating such confidence in a computer would be tricky. What’s more, the computer would have to calculate the risk according to where it stood in the game. If it was far ahead and had only middling confidence on “étagère,” it might make more sense not to buzz. In addition to piling up knowledge, a computer would have to learn to play the game.
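A crude sketch of that buzz-or-pass calculation might look like the following. The threshold and the numbers are assumptions made up for illustration, not anything IBM had built at this point.

```python
# A crude buzz-or-pass rule: buzz only when the expected value of answering
# clears a margin that grows once the lead is safe. All numbers and the
# threshold itself are assumptions for this sketch, not IBM's strategy.

def should_buzz(confidence: float, clue_value: int,
                my_score: int, best_rival_score: int) -> bool:
    # A right response adds the clue's value; a wrong one subtracts it.
    expected_gain = confidence * clue_value - (1 - confidence) * clue_value
    # When ahead, demand a cushion before risking points; when trailing,
    # any positive expectation is worth taking.
    margin = 0.2 * clue_value if my_score > best_rival_score else 0
    return expected_gain > margin

print(should_buzz(0.55, 500, my_score=8000, best_rival_score=3000))  # False: protect the lead
print(should_buzz(0.55, 500, my_score=1000, best_rival_score=3000))  # True: worth the gamble
```

The hard part, of course, was not this arithmetic but producing the confidence number in the first place.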
Complicating the game strategy were four wild cards. Three of the game’s sixty hidden clues were so-called Daily Doubles. In that 1994 game, a contestant named Rachael Schwartz, an attorney from Bedminster, New Jersey, asked for the $400 clue in the Furniture category. Up popped a Daily Double giving her the chance to bet some or all of her money on a furniture-related clue she had yet to see. She wagered $500, a third of her winnings, and was faced with this clue: “This store fixture began in 15th century Europe as a table whose top was marked for measuring.” She missed it, guessing, “What is a cutting table?,” and lost $500. (“What is a counter?” was the correct response.) It was early in the game and didn’t have much impact. The three players were all around the $1,000 mark. But later in a game, Ferrucci saw, Daily Doubles gave contestants the means to storm back from far behind. A computer playing the game would require a clever game program to calibrate its bets.
The biggest of the wild cards was Final Jeopardy, the last clue of the game. As in Daily Doubles, contestants could bet all or part of their winnings on a single category. But all three contestants participated—as long as they had positive earnings. Often the game boiled down to betting strategies in Final Jeopardy. Take that 1994 contest, in which the betting took a strange turn. Going into Final Jeopardy, Rachael Schwartz led Kurt Bray, a scientist from Oceanside, California, by a slim margin, $9,200 to $8,600. The category was Historic Names. To lock down a win, she had to assume he would bet everything, reaching $17,200. A bet of $8,001 would give her one dollar more, provided she got it right. But if they both bet big and missed, they might fall to the third-place contestant, Brian Moore, a Ph.D. candidate from Pearland, Texas. In the minute or so that they took to place their bets, the two leaders had to map out the probabilities of a handful of different scenarios. They wrote down their dollar numbers and waited for the clue: “Though he spent most of his life in Europe, he was governor of the Bahamas for most of World War II.”
The second-place player, Bray, was the only one to get it right: “Who was Edward VIII?” Yet he had bet only $500. It was a strange number. It placed him $100 behind the leader, not ahead of her. But the bet kept him beyond the reach of the third-place player. Most players bet at least something on a clue. If Schwartz had wagered and missed, he would win. Indeed, Schwartz missed the clue. She didn’t even bother guessing. But she had bet nothing, leaving herself $100 ahead and winning the game.
The betting in Final Jeopardy, Ferrucci saw, might actually play to the strength of a computer. A machine could analyze betting patterns over thousands of games. It could crunch the probabilities and devise optimized strategies in a fraction of a second. “Computers are good at that kind of math,” he said.
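The classic lock-out wager available to a leader like Schwartz, for instance, is simple arithmetic. Here is a minimal sketch, worked on the 1994 scores described above; the function and its interface are illustrative, not IBM's eventual wagering module.

```python
# A minimal sketch of the standard lock-out wager, worked on the 1994 scores
# described above. Illustrative textbook wagering arithmetic, not IBM's
# eventual strategy module.

def lockout_wager(leader: int, second: int) -> int:
    # If second place bets everything and answers correctly, they reach
    # 2 * second. Betting (2 * second - leader + 1) keeps the leader one
    # dollar ahead of that, provided the leader also answers correctly.
    return max(0, 2 * second - leader + 1)

leader, second = 9200, 8600   # Schwartz and Bray going into Final Jeopardy
bet = lockout_wager(leader, second)
print(bet)           # 8001
print(leader + bet)  # 17201, one dollar more than Bray's maximum of 17200
```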
It was the rest of Jeopardy that appeared daunting. The game featured complex questions and a wide use of puns posing trouble for literal-minded computers. Then there was Jeopardy’s nearly boundless domain. Smaller and more specific subject areas were easier for computers, because they offered a more manageable set of facts and relationships to master. They provided context. A word like “leak,” for example, had a specific meaning in deep-sea drilling, another in heart surgery, and a third in corporate press relations. A know-it-all computer would have to recognize different contexts to keep the meanings clear. And Jeopardy’s clues took the concept of a broad domain to a near-ludicrous extreme. The game had an entire category on Famous Understudies. Another was on the oft-forgotten president Rutherford B. Hayes. Worse, from a computer architect’s point of view, the game demanded answers within seconds—and penalized players for getting them wrong. A Jeopardy machine, just like the humans on the show, would have to store all of its knowledge in its internal memory. (The challenge, IBM figured, wouldn’t be nearly as impressive if a bionic player had access to unlimited information on the Web. What’s more, Jeopardy would be unlikely to accept a Web-surfing contestant, since others didn’t have the same privilege.) Beating humans in Jeopardy, it seemed, was more than a stretch goal. It appeared impossible and spelled potential disaster for researchers. To embarrass the company on national television—or, more likely, to flame out before even getting there—was no way to manage a career.
Ferrucci’s pessimism was also grounded in experience. In annual government competitions, known as TREC (the Text REtrieval Conference), his question-answering (Q-A) team developed a system called Piquant. Even with a much easier test, it struggled far below Jeopardy levels. In TREC, the competing teams were each given a relatively small “corpus” of about one million documents. They then had to train the machines to answer questions based on the material. (In one version from 2004, several of the questions had to do with Tom Cruise and his ex-wife.)
In answering these questions, the computer, for all its processing power and memory, resembled nothing so much as a student with serious brain damage. An apparently simple question could tie it in knots. In 2005, it was asked: “What is Francis Scott Key best known for?” The first job was to determine which of those words represented the subject of the question, the “entity,” and whether that might be a person, a state, or perhaps an animal or a machine. Each one had different characteristics. “Francis” and “Scott” looked like names. But “Key”? That could be a metal tool to open doors or a mental breakthrough to solve problems. In its hunt, the computer might even spend a millisecond or two puzzling over Key lime pies. Clearing up these doubts might require a visit to the system’s “disambiguation” unit, where the answering program consulted a dictionary or looked for contextual clues in the surrounding words. Could “Key” be something the ingenious Francis Scott invented, collected, planted, or stole? Could he have baked it? Probably not. The structure of the question, with no direct object, made it look like the third name of a person. The capital K on Key strengthened that case.
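A toy illustration of that kind of entity-typing heuristic appears below; the word lists are made up, and Piquant's real disambiguation machinery was far more elaborate.

```python
# A toy entity-typing heuristic: is "Key" in "Francis Scott Key" part of a
# person's name, or a common noun? The word list and the rule are invented
# for illustration; Piquant's real disambiguation was far more elaborate.

COMMON_FIRST_NAMES = {"francis", "scott", "john", "mary"}

def looks_like_person(tokens: list[str]) -> bool:
    # Every token capitalized, and the leading tokens are known given names:
    # read the final capitalized token as a surname rather than a noun.
    if not all(t[0].isupper() for t in tokens):
        return False
    return all(t.lower() in COMMON_FIRST_NAMES for t in tokens[:-1])

print(looks_like_person(["Francis", "Scott", "Key"]))  # True: "Key" read as a surname
print(looks_like_person(["Key", "lime", "pie"]))       # False: lowercase nouns follow
```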
A person confronting that question either knew or did not know that Francis Scott Key wrote the U.S. national anthem, “The Star-Spangled Banner.” But he or she wasted no time searching for the subject and object in the sentence or wondering if it was a last name, a metal tool, or a tangy South Florida dessert.
For the machine, things only got worse. The question lacked an action verb, which could disorient the computer. If the question were, “What did Francis Scott Key write?” the machine could likely find a passage of text with Key writing something, and that something would point to the answer. The only pointer here—“is known for”—was maddeningly vague. Assuming the computer had access to the Internet (a luxury it wouldn’t have on the show), it headed off with nothing but the name. In Wikipedia, it might learn that Key was “an American lawyer, author and amateur poet, from Georgetown, who wrote the words to the United States national anthem, ‘The Star-Spangled Banner.’” For humans, the answer was right there. But the computer, with no verb to guide it, might answer that Key was known as an amateur poet or a lawyer from Georgetown. In the TREC competitions, IBM’s Piquant botched two out of every three questions.
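To give a rough sense of that verb-anchored hunt, here is an illustrative toy in Python, not Piquant's actual pipeline: given a question built around "write," it scans a passage for the entity followed by the verb and returns whatever trails it as a candidate answer.

```python
import re

# An illustrative toy, not Piquant's pipeline: scan a passage for
# "<entity> ... <verb> <rest>" and return the trailing phrase as a candidate.

passage = ("Francis Scott Key was an American lawyer, author and amateur poet, "
           "from Georgetown, who wrote the words to the United States national "
           "anthem, 'The Star-Spangled Banner.'")

def candidate_answers(entity: str, verb_form: str, text: str):
    # Look for the entity, then the verb, then capture what follows it.
    pattern = re.compile(
        re.escape(entity) + r"\b.*?\b" + re.escape(verb_form) + r"\b\s+(.*)",
        re.IGNORECASE)
    match = pattern.search(text)
    return [match.group(1)] if match else []

print(candidate_answers("Key", "wrote", passage))
# ["the words to the United States national anthem, 'The Star-Spangled Banner.'"]
```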
All too often, the system failed to understand the question or to put it in the right context. For this, a growing school of Artificial Intelligence argued, systems needed to spend more time in the computer equivalent of infancy, mastering the concepts that humans take for granted: time, space, and the basic laws of cause and effect.
Toddlerhood is a tribulation for computers, because it represents knowledge that is tied to the human experience: the body and the senses. While crawling, we learn about space and physical objects, and we get a sense of time. The toddler reaches for the jar on the table. Moments later pieces of it lie scattered on the floor. What happened between those two states? It fell. Such lessons establish notions of before and after, cause and effect, and the nature of gravity. These experiences, most of them accompanied by a steady stream of human language, set the foundation for practically everything we learn. “You crawl around and bump into things,” said David Gunning, a senior manager at Vulcan Inc., an AI incubator in Seattle. “That’s basic research.” It isn’t just jars that fall, the toddler notices. Practically everything does. (Certain balloons are exceptions, which seem magical.) The child turns these observations into theory. Unlike computers, humans generalize.
Even the metaphors in our language lead back to the tumbles and accidents seared into our consciousness in our early years. We “fall” for a sales pitch or “fall” in love, and we cringe at hearing “sharp” words or “stinging” rebukes. We process such expressions on such a basic level that they seem closer to feeling than thought (though for humans, unlike computers, the two are intertwined). Over the course of centuries, these metaphors infused language and, consequently, were fundamental to understanding Jeopardy clues. Yet to a machine with no body or experience in the physical world, each one was a puzzle.
In some Artificial Intelligence labs, scientists were attempting to transmit these elementary experiences to computers. Sajit Rao, a professor at MIT, was introducing computers equipped with vision to rumpus-room learning, showing them objects moving, falling, obstructing paths, and piling on top of one another. The goal was to establish a conceptual understanding so that eventually computers could draw conclusions from visual observations. What would happen, for example, when vehicles blocked a road?
Several years later, the U.S. Defense Department’s Advanced Research Projects Agency (DARPA) would fund Rao’s research for a program called Mind’s Eye. The idea was to teach machines not only to recognize objects but to be able to reason about what they were doing, where they might have come from. This work, they hoped, would lead to smart surveillance cameras, which would mean that computers could replace humans in the tedious and exhausting task of monitoring a spot—what the Pentagon calls “persistent stare.” Instead of simply recording movements, these systems would interpret them. If a man in Afghanistan went into a building carrying a package and emerged without it, the system would conclude that he had left it there. If he walked toward another person with a suitcase in his hand, it would predict that he was going to give it to him. A seeing and thinking machine that could generate hypotheses based on observations might zero in on potential roadside bombs or rooftop snipers. This type of intelligence, according to DARPA, would extend computer surveillance from objects to actions—from nouns to verbs.
This skill required the computer to understand relationships—precisely the stumbling block of IBM’s Piquant as it struggled with questions in the TREC competition. But potential breakthroughs such as Mind’s Eye were still in the infant stage of research and wouldn’t be ready for years—certainly not in time to give a Jeopardy machine a dose of human smarts. What’s more, Ferrucci was busy managing another big software project. So after consulting his team and assembling the discouraging evidence, he broke the news to a disappointed Paul Horn. His team would not pursue the Jeopardy challenge. It was just too hard to guarantee results on a schedule.
Free of that distraction, the Q-A team returned to its work, preparing Piquant for the next TREC competition. As it turned out, though, Ferrucci had won them only a respite, and a short one at that. Months later, in the summer of 2006, Horn returned with exactly the same question: How about Jeopardy?
Reluctantly, Ferrucci and his small Q-A team gathered in a small room at the Hawthorne research center, a ten-minute drive south from Yorktown. (It was a far less elegant structure, a cuboid of black glass in an office park. But unlike Yorktown, where the public spaces were bathed in natural light and the offices windowless, Hawthorne’s offices did have views, mostly of parking lots.) The discussion followed the familiar, depressing lines: the team’s travails in the TREC competitions, the insanely broad domain of Jeopardy, and the difficulty of coming up with answers and a betting strategy in three to five seconds. TREC had no time limit at all, and the computer often churned away for minutes trying to answer a single question.
While the team talked, Ferrucci sat at the back of the room, uncharacteristically quiet. He had a laptop open and was typing away. He was looking up Jeopardy clues online and then searching for answers on Google. The answers certainly didn’t pop up. But in many cases, the search engine led to the right neighborhood. He started thinking about the technologies needed to refine Google’s vague pointer to a precise answer. It would require much of the tech muscle of IBM. He’d have to bring in top natural-language researchers and experts in machine learning. To speed up the answering process, he’d need to spread out the computing to hundreds or even thousands of machines. This would require a crack hardware unit. His team would also need to educate the machine in strategy. Ferrucci had a few colleagues who focused on game theory. Several of them were training computers to play the ancient board game Go (whose computational complexity made chess look like Tic-Tac-Toe). Putting together all the pieces of this electronic brain would require a large multidisciplinary team and a huge investment—and even then they might fail. But the prospect of success, however remote, was tantalizing. Ferrucci looked up from his computer and said, “Hey, I think we can do this.”
At the dawn of Artificial Intelligence (AI), a half century ago, scientists predicted that computers would soon be speaking and answering questions fluently. A pioneer in the field, Herbert Simon, predicted in 1965 that “machines w[ould] be capable, within twenty years, of doing any work a man can do.” These were the glory days of AI, a period of boundless vision and bounteous funding. Machines, it seemed, would soon master language, recognize faces, and maneuver, as robots, in factories, hospitals, and homes. In short, computer scientists would create a superendowed class of electronic servants. This led, of course, to failed promises, to such a point that Artificial Intelligence became a term of derision. Bold projects to build bionic experts and conversational computers lost their sponsors. A long AI winter ensued, lasting through much of the ’80s and ’90s.
What went wrong? In retrospect, it seems almost inconceivable that leading scientists, including Nobel laureates like Simon, believed it would be so easy. They certainly appreciated the complexity of the human brain. But they also realized that a lot of that complexity was tied up in dreams, memories, guilt, regrets, faith, desires, along with the controls to maintain the physical body. Machines wouldn’t have to bother with those details. All they needed was to understand the elements of the world and how they were related to one another. Machines trained in the particulars of sick people, ambulances, and hospitals, for example, could conceivably devote their analytical skills to optimizing emergency services. Yet teaching the machines proved extraordinarily difficult. One of the biggest challenges was to anticipate the responses of humans. The machines weren’t up to it. And they had serious trouble with even the most basic forms of perception, such as seeing. For example, researchers struggled to teach machines to perceive the edges of things in the physical world. As it turned out, it required experience and knowledge and advanced powers of pattern recognition just to look through a window and understand that the oak tree in the yard was a separate entity. It was not connected to the shed on the other side of it or a pattern on the glass or the wallpaper surrounding the window.
The biggest obstacle, though, was language. In the early days, it looked beguilingly easy. It was just a matter of programming the machine with vocabulary and linking it all together with a few thousand rules—the kind you’d find in a grammar book. If the machine still underperformed? Well, just give it more vocabulary, more rules.
Once the electronic brain mastered language, it was simply a question of teaching it about the world. Asia’s over there. This is the United States. We have a democracy. That’s the Pacific Ocean between the two. It’s big, and wet. If researchers kept adding facts, millions of them, and defining their relationships, by the end of the grant cycle they might have a talking, thinking machine that “knew” what humans did.
Language, of course, turns out to be far more complicated. Jaime Carbonell, a top researcher at Carnegie Mellon University, has been teaching language to machines for decades. The way he describes it, our minds are swimming with cultural and historical allusions, accumulated over millennia, along with a complex scheme of who’s who. Words, when spoken or read, vary wildly according to context. (Just imagine if the cops in New York raced off to Citi Field, sirens wailing, every time someone was heard saying, “The Mets are getting killed!”)
Carbonell, sitting in his Pittsburgh office, gave another example. He issued a statement: “I want a juicy hamburger.” What does it mean? Well, if a child says it to his mother, it’s a request or a plea. If a general says it to a corporal, it’s a tacit command. And if a prisoner says it to a cellmate, it might be nothing more than a wish. Scientists, of course, could attempt to teach a computer those variables as rules. But new layers of complexity pop up. Is the general a vegan or speaking sarcastically? Or maybe “hamburger” means something entirely different in prison lingo?
This flexibility isn’t a weakness of language but a strength. Humans need words to be inexact; if they were too precise, each person would have a unique vocabulary of several billion words, all of them unintelligible to everyone else. You might have a unique word for the sip of coffee you just took at 7:59 A.M., which was flavored with the anxiety about the traffic in the Lincoln Tunnel or along Paris’s Périphérique. (That single word would be as useless to you as to everyone else. A word has to be used at least twice to have any purpose.)
Each word is a lingua franca, a fragment of a clumsy common language. Imagine a man saying a simple sentence to a friend: “I’m weary.” He’s thinking about something, but what is it? Has he carried a load a long way in the sun? Does he have a sick child or financial troubles? His friend certainly has different ideas, based on his own experience, about what “weary” means. In addition to the various contexts, it might send other signals. Maybe where he comes from, the word has a slightly rarefied feel, and he’s wondering whether his friend is trumpeting his sophistication. Neither one knows exactly what the other is thinking. But that single word, “weary,” extends an itsy bridge between them.
Now, with that bridge in place, the word shared, they dig deeper to see if they can agree on its meaning. They study each other’s expression and tone of voice. As Carbonell noted, context is crucial. Someone who has won the Boston Marathon might be contentedly weary. Another, in a divorce hearing, is anything but. One person may slack his jaw in an exaggerated way, as if to say “Know what I mean?” In this tiny negotiation, far beyond the range and capabilities of machines, two people can bridge the gap between the formal definition of a word and what they really want to say.
It’s hard to nail down the exact end of AI winter. A certain thaw set in when IBM’s computer Deep Blue bested Garry Kasparov in their epic 1997 showdown. Until that match, human intelligence, with its blend of historical knowledge, pattern recognition, and the ability to understand and anticipate the behavior of the person across the board, ruled the game. Human grandmasters pondered a rich set of knowledge, jewels that had been handed down through the decades—from Bobby Fischer’s use of the Sozin Variation in his 1972 match with Boris Spassky to the history of the Queen’s Gambit Declined. Flipping through scenarios at about three per second—a glacial pace for a computing machine—these grandmasters looked for a flash of inspiration, an insight, the hallmark of human intelligence.
Equally important, chess players tried to read the minds of their foes. This is a human specialty, a mark of our intelligence. Cognitive scientists refer to it as “theory of mind”; children develop it at about age four. It’s what enables us to imagine what someone else is experiencing and to build large and convoluted structures based on such analysis. “I wonder what he was thinking I knew when I told him . . .” Most fiction, from Henry James to Elmore Leonard, revolves around this very human analysis, something other species—and computers—cannot even approach. (It’s also why humans make such expert liars.)
Unlike previous AI visions, in which a computer would “think” more or less the way we do, Deep Blue set off on a different course. It played on the strengths of a supercomputer: a fabulous memory and extraordinary calculating speed. Statistical approaches to machine intelligence had been around since the dawn of AI, but the numbers mavens had never witnessed anything approaching this level of computing power and speed. Deep Blue didn’t try to read Garry Kasparov’s mind, and it certainly didn’t count on flashes of inspiration. Instead, it raced through a century of grandmaster games, analyzing similar moves and situations. It then constructed the most probable scenarios for each possible move. It analyzed two hundred million moves per second (nearly seventy million for each one the humans considered). A similar approach for a computer writing poetry would be to scrutinize the patterns and vocabulary of every poem ever written before choosing each word.
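Stripped of Deep Blue's scale and its chess-specific machinery, the underlying idea is exhaustive game-tree search. The sketch below runs it over a toy, hand-built tree, purely to show the shape of the calculation; Deep Blue's real search, pruning, and evaluation were vastly more elaborate.

```python
# A generic sketch of exhaustive game-tree search, the kind of calculation Deep
# Blue scaled to hundreds of millions of positions per second. The tiny
# hand-built tree is purely illustrative.

def minimax(node, maximizing: bool) -> int:
    # A leaf is a numeric score for a final position; an inner node is a
    # list of positions reachable in one move.
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Each leaf scores a position after two plies of play.
toy_tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(toy_tree, maximizing=True))  # 3: the best move against a perfect opponent
```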
Forget inspiration, creativity, or blinding insight. Deep Blue crunched data and won its match by juggling statistics, testing thousands of scenarios, and calculating the odds. Its intelligence was alien to human beings—if it could be considered intelligence at all. IBM at the time described the machine as “less intelligent than the stupidest person.” In fact, the company stressed that Deep Blue did not represent AI, since it didn’t mimic human thinking. But the Deep Blue team made good on a decades-old promise. They taught a machine to win a game that was considered uniquely human. In this, they passed a chess version of the so-called Turing test, an intelligence exam for machines devised by Alan Turing, a pioneer in the field. If a human judge, Turing wrote, were to communicate with both a smart machine and another human, and that judge could not tell one from the other, the machine passed the test. In the limited realm of chess, Deep Blue aced the Turing test—even without engaging in what most of us would recognize as thought.
But knowledge? That was another challenge altogether. Chess was esoteric. Only a handful of specially endowed people had mastered the game. Yet all of us played the knowledge game. By advancing from chess to Jeopardy, IBM was shifting the focus from a remote island off the coast straight to our cognitive mainland. Here, the computer would grapple with far more than game theory and math. It would be competing in a field utterly defined by human intelligence. The competitors in Jeopardy, as well as the other humans writing the clues, would feast on knowledge tied to experiences and sensations, sights and tastes. The machine, by contrast, would be blind and deaf, with no body, no experience, no life. Its only memories—if you could call them that—would be millions of lists and documents encoded in ones and zeros. And the entire game would be played in endlessly complex and nuanced language—a cinch for humans, a tribulation for machines.
Picture one of those cartoons in which a land animal, perhaps a coyote, runs off a cliff and continues to run so fast in midair that it manages to fly (at least for a while). Now imagine that animal not only surviving but flying upward and competing with birds. That would be the challenge facing an IBM machine. It would have to use its native strengths in speed and computation to thrive in an utterly foreign setting. Strictly speaking, the machine would be engaged in a knowledge game without “knowing” a thing.
Still, Ferrucci believed his team had a fighting chance, though he wasn’t quite ready to commit. He code-named the project Blue J—Blue for Big Blue, J for Jeopardy—and right before the holidays, in late 2006, he asked Horn to give him six months to see if it was possible.