DAVID FERRUCCI HAD driven the same stretch hundreds of times from his suburban home to IBM’s Yorktown labs, or a bit farther to Hawthorne. For fifteen or twenty minutes along the Taconic Parkway he went over his endless to-do list. How could his team boost Watson’s fact-checking in Final Jeopardy? Could any fix ensure that the machine’s bizarre speech defect would never return? Was the pun-detection algorithm performing up to par? There were always more details, plenty to fuel both perfectionism and paranoia—and Ferrucci had a healthy measure of both.
But this January morning was different. As he drove past frozen fields and forests, the pine trees heavy with fresh snow, all of his lists were history. After four years, his team’s work was over. Within hours, Watson alone would be facing Ken Jennings and Brad Rutter, with Ferrucci and the machine’s other human trainers reduced to spectators. Ferrucci felt his eyes well up. “My whole team would be judged by this one game,” he said later. “That’s what killed me.”
The day before, at a jam-packed press conference, IBM had unveiled Watson to the world. The event took place on a glittering new Jeopardy set mounted over the previous two weeks by an army of nearly a hundred workers. It resembled the set in Culver City: the same jumbo game board to the left, the contestants’ lecterns to the right, with Alex Trebek’s podium in the middle. In front was a long table for the Jeopardy officials, where Harry Friedman would sit, Rocky Schmidt at his side, followed by a line of writers and judges, all of them with monitors, phones, and a pile of old-fashioned reference books. Every piece was in place. But this East Coast version was plastered with IBM branding. The shimmering blue wall bore the company’s historic slogan, Think, in a number of languages. Stretched across the shiny black floor was a logo that looked at first like Batman’s emblem. But closer study revealed the planet Earth, with each of the continents bulging, as if painted by Fernando Botero. This was Chubby Planet, the symbol of IBM’s Smarter Planet campaign and the model for Watson’s avatar. In the negotiations with Jeopardy over the past two years, IBM had lost out time and again on promotional guarantees. It had seemed that Harry Friedman and his team held all the cards. But now that the match was assured and on Big Blue’s home turf, not a single branding opportunity would be squandered.
The highlight of the press event came when Jennings and Rutter strode across the stage for a five-minute, fifteen-clue demonstration. In this test run, Watson held its own. In fact, it ended the session ahead of Jennings, $4,400 to $3,400. Rutter trailed with $1,200. Within hours, Internet headlines proclaimed that Watson had vanquished the humans. It was as if the game had already been won.
If only it were true. The demo match featured just a handful of clues and included no Final Jeopardy—Watson’s Achilles’ heel. What’s more, after the press departed that afternoon, Watson and the human champs went on to finish that game and play another round—“loosening their thumbs,” in the language of Jeopardy. In these games Ferrucci saw a potential problem: Ken Jennings. It was clear, he said, that Jennings had prepped heavily for the match. He had a sense of Watson’s vulnerabilities and an aggressive betting strategy specially honed for the machine. Brad Rutter was another matter altogether. Starting out, Ferrucci’s team had been more concerned about Rutter than Jennings. His speed on the buzzer was the stuff of legend. Yet he appeared relaxed, almost too relaxed, as if he could barely be bothered to buzz. Was he saving his best for the match?
In the first of the two practice games, Jennings landed on all three Daily Doubles. Each time he bet nearly everything he had. This was the same strategy Greg Lindsay had followed to great effect in three sparring games a year earlier. The rationale was simple. Even with its mechanical finger slowing it down by a few milliseconds, Watson was lightning fast on the buzzer. The machine was likely to win more than its share of the regular Jeopardy clues. So the best chance for the humans was to pump up their winnings on the four clues that hinged on betting, not buzzing: the Daily Doubles and Final Jeopardy. Thanks to his aggressive betting, Jennings ended the first full practice game with some $50,000, a length ahead of Watson, which scored $39,000. Jennings was fired up. When he clinched the match, he pointed to the computer and exclaimed, “Game over!” Rutter finished a distant third, with about $10,000. In the second game, Jennings and Watson were neck and neck to the end, when Watson edged ahead in Final Jeopardy. Again, Rutter coasted to third place. Ferrucci said that he and his team left the practice rounds thinking, “Ken’s really good—but what’s going on with Brad?”
When Ferrucci pulled in to the Yorktown labs the morning of the match, the site had been transformed. The visitors’ parking lot was cordoned off for VIPs. Security guards noted every person entering the building, checking their names against a list. And in the vast lobby, usually manned by one lonely guard, IBM’s luminaries and privileged guests circled around tables piled with brunch fare. Ferrucci made his way to Watson’s old practice studio, now refashioned as an exhibition room. There he gave a half-hour talk about the computer to a gathering of IBM clients, including J. P. Morgan, American Express, and the pharmaceutical giant Merck & Co. Ferrucci recalled the distant days when a far stupider Watson responded to a clue about a famous French bacteriologist by saying: “What is ‘How Tasty Was My Little Frenchman’?” (That was the title of a 1971 Brazilian comedy about cannibals in the Amazon.)
His next stop, the makeup room, revealed his true state of mind. The makeup artist was a woman originally from Italy, like much of Ferrucci’s family. As she began to work on his face she showered him with warmth and concern—acting “motherly.” This rekindled his powerful feelings about his team and the end of their journey, and before he knew it, tears were streaming down his face. The more the woman comforted him, the worse it got. Ferrucci finally stanched the flow and got the pancake on his face, but he knew he was a mess. He hunted down Scott Brooks, the lighthearted press officer. Maybe some jokes, he thought, “would take the lump out of my throat.” Brooks laughed and warned him that people might compare him to the new U.S. Speaker of the House, John Boehner, whose frequent tears had recently earned him the sobriquet “Weeper of the House.”
This irritated the testy Ferrucci and, to his relief, knocked him out of his fragile mood. He joined his team for a last lunch, all of them seated at a long table in the cafeteria. As they were finishing, just a few minutes before 1 P.M., a roaring engine interrupted their conversations. It was IBM’s chairman, Sam Palmisano, landing in his helicopter. The hour had come. Ferrucci walked down the sunlit corridor to the auditorium.
Ken Jennings woke up that Friday morning in the Crowne Plaza Hotel in White Plains. He’d slept well, much better than he usually did before big Jeopardy matches. He had good reason to feel confident. He had destroyed Watson in one of the practice rounds. Afterward, he said, Watson’s developers told him that the game had featured a couple of “train wrecks”—categories in which Watson appeared disoriented. Children’s Literature was one. For Jennings, train wrecks signaled the machine’s vulnerability. With a few of them in the big match, he could stand up tall for humans and perhaps extend his legend from Jeopardy to the broader realm of knowledge. “Given the right board,” he said, “Watson is beatable.” The stakes were considerable. While IBM would give all of Watson’s winnings to charity, a human winner would earn a half-million-dollar prize, with another half-million to give to the charity of his choice. Finishing in second and third place was worth $150,000 and $100,000, with equal amounts for the players’ charities.
A little after eleven, a car service stopped by the hotel, picked up Jennings and his wife, Mindy, and drove them to IBM’s Yorktown laboratory. Jennings carried three changes of clothes so that he could dress differently for each session, simulating three different days. As soon as he stepped out of the car, Jeopardy officials whisked him past the crush of people in the lobby toward the staircase. Jeopardy had cleared out a few offices in IBM’s Human Resources Department, and Jennings was given one as a dressing room.
On short visits to the East Coast, Brad Rutter liked to sleep late in order to stay in sync with West Coast time. But the morning of the match, he found himself awake at seven, which meant he faced four and a half hours before the car came for him. Rutter was at the Ritz-Carlton in White Plains, about a half mile from Jennings. He ate breakfast, showered, and then killed time until 11:30. Unlike Jennings, Rutter had grounds for serious concern. In the practice rounds, he had been uncharacteristically slow. The computer had exquisite timing, and Jennings seemed to hold his own. Rutter, who had never lost a game of Jeopardy, was facing a flameout unless he could get to the buzzer faster.
Shortly after Rutter arrived at IBM, he and Jennings played one last practice round with Watson. To Rutter’s delight, his buzzer thumb started to regain its old magic, and he beat both Jennings and the machine. Now, in the three practice matches, each of the players had registered a win. But Jennings and Rutter noticed something strange about Watson. Its game strategy, Jennings said, “seemed naive.” Just like beginning Jeopardy players, Watson started with the easy, low-dollar clues and moved straight down the board. Why wasn’t it hunting for Daily Doubles? In the Blu-ray Discs given to them in November, Jennings and Rutter had seen that Watson skipped around the high-dollar clues, hunting for the single Daily Double on the first Jeopardy board and the two in Double Jeopardy. Landing on Daily Doubles was vital. It gave a player the means to build a big lead. And once the Daily Doubles were off the board, the leader was hard to catch. But in the practice rounds, Watson didn’t appear to be following this strategy.
The two players were led to a tiny entry hall behind the auditorium. As the event began, shortly after 1 P.M., they waited. They listened as IBM introduced Watson to its customers. “You know how they call time-outs before a guy kicks a field goal?” Jennings said. “We were joking that they were doing the same thing to us. Icing us.” Through the door they heard speeches by John Kelly, the chief of IBM Research, and Sam Palmisano. Harry Friedman, who decades earlier had earned $5 a joke as a writer for Hollywood Squares, delivered one of his own. “I’ve lived in Hollywood for a long time,” he told the crowd, “so I know something about Artificial Intelligence.” When Ferrucci was called to the stage, the crowd rose for a standing ovation. “I already cried in makeup,” he said. “Let’s not repeat that.”
Finally, it was time for Jeopardy, and Jennings and Rutter were summoned to the stage. They walked down the narrow aisle of the auditorium, Jennings leading in a business suit and yellow tie, the taller, loose-gaited Rutter following him, his collar unbuttoned. They settled at their lecterns, Jennings on the far side, Rutter closer to the crowd. Between them, its circular black screen dancing with colorful jagged lines, sat Watson.
The show began with the familiar music. A fill-in for the legendary announcer, Johnny Gilbert (who hadn’t made the trip from Culver City), introduced the contestants and Alex Trebek. Even then, Jennings and Rutter had to wait while an IBM video told the story of the Watson project. In a second video, Trebek asked Ferrucci about the machinery behind the bionic player—now up to 2,880 processing cores. Then Trebek gave viewers a tutorial on Watson’s answer panel. It would reveal the statistical confidence that the computer had in each of its top responses. It was a window into Watson’s thinking.
Trebek, in fact, had been a late convert to the answer panel. Like the rest of the Jeopardy team, he was loath to stray from the show’s time-honored formulas. People knew what to expect from the game: the precise movements of the cameras, the familiar music, voices, and categories. Wouldn’t the intrusion of an electronic answer panel distract them and ultimately make the game less enjoyable to watch? Trebek raised that concern on a visit to IBM in November. But the prospect of televising the game without Watson’s answer panel horrified Ferrucci. Millions of viewers, he believed, would simply conclude that the machine had been fed all the answers. They wouldn’t appreciate what Watson went through to arrive at the correct response. So while Trebek was eating lunch that day, Ferrucci had his technicians take down the answer panel. When the afternoon sessions began, it took only one game for Trebek to ask for it to be restored. Later, he said, watching Watson without the panel’s analysis was “boring as hell.”
Finally, it was time to play. A hush settled over the auditorium. Ferrucci, sitting between David Gondek and Eric Brown, laced his hands tightly and made a steeple with his index fingers. He watched as Trebek, with a wave of his arm, revealed the six categories for the first round of Jeopardy. One was Literary Character APB. Trebek explained that APB stood for “all points bulletin.” This clarification was lost on the deaf Watson, which irked Ferrucci and the IBM team. Other categories were Beatles People, Olympic Oddities, Name that Decade, Final Frontiers, and Alternate Meanings. None of them looked especially vexing for the computer.
Rutter had won the draw, so he started and chose Alternate Meanings for $200. “A four-letter word for vantage point,” Trebek read, “or belief.” Rutter, famous for his prowess with the buzzer, won this first clue and responded correctly: “What is view?”
He asked for the $400 clue in the same category. Trebek read: “Four-letter word for the iron fitting on the hoof of a horse, or a card-dealing box in a casino.”
Watson won the buzz and uttered its first syllables for an audience of millions, answering correctly: “What is a shoe?” It pronounced the final word meekly, as if unsure of itself or perhaps embarrassed. Still, with that response, Watson had $400—positive winnings against the greatest of human players. That alone was a threshold that four years earlier had appeared daunting to many—including some in the audience.
With control of the board, Watson pursued the merciless strategy mapped out by David Gondek and his team. Departing from its passive approach in the practice rounds, it moved straight to the high-dollar boxes, hunting for the Daily Double. “Let’s try Literary Character APB for eight hundred,” Watson said. The zinging sound of space guns echoed through the auditorium, announcing that the machine, on its first try, had landed on the Daily Double. The two APPLAUSE signs flashed over the stage, but they were hardly needed. This was Watson’s crowd.
In truth, the Daily Double in the first round of Jeopardy is not terribly important, especially this early in the game. The players at this stage have very little money to bet. It’s in Double Jeopardy, when the end is in sight and the contestants have piled up much higher winnings, that a laggard can vault toward victory, winning $10,000 or even more with a single bet. Watson’s Daily Double strategy was less about padding its own lead than keeping these dangerous wild cards from its rivals.
Though Watson had won only $400, Jeopardy rules allow players to bet up to the highest dollar value on the board, in this case $1,000. This would risk dropping Watson’s score into negative territory. But before the machine could place its bet, Alex Trebek stopped the game. His monitor had blacked out. Technicians scurried across the black stage as Jennings slumped at his lectern. Rutter, his feet crossed at the ankles, drummed his fingers. Between them, the computer’s avatar traced its endless lines of blue and red, behaving much like its close cousin, the screen-saver. When it came to patience, Watson was in a league of its own.
“Ready to go,” said a voice from the control booth. “Five, four, three, two, one.” The crowd again heard the recording of Watson calling for the $800 clue, the sound of zinging space guns, and the applause. Then Trebek cut in live—an art he had perfected in his twenty-seven years on the show—asking Watson how much it wanted to bet. “One thousand, please,” the computer said. Then it faced this clue: “Wanted for killing Sir Danvers Carew, appearance pale and dwarfish, seems to have a split personality.”
Watson didn’t hesitate. “Who is Hyde?”
“Hyde, yes,” Trebek said. “Dr. Jekyll and Mr. Hyde.” The crowd applauded. Jennings and Rutter politely joined in. This was the custom in Jeopardy, though such sportsmanship seemed a bit odd when standing next to a machine.
Watson didn’t stop there. Beating Jennings and Rutter to the buzz, it answered clues about the Beatles’ Jude, the swimmer Michael Phelps, the monster Grendel in Beowulf, the 1908 London Olympics, the boundaries of black holes (event horizons), Lady Madonna, and Maxwell’s Silver Hammer. By the time Rutter jumped in on a clue about the Harry Potter books (“What is Voldemort?”), Watson had $5,200, far ahead of Rutter’s $1,000. Jennings trailed with only $200.
It was time for a commercial break. Off camera, Trebek shook his head as he walked across the set toward Jennings and Rutter. “I can’t help but wonder if Watson was sandbagging yesterday,” he said. Was the computer, like a poker player holding a royal flush, masking its strength? Rutter didn’t know, but he noticed that Watson’s strategy had changed. “He wasn’t jumping around the way he is today,” he said.
“He’s a hustler,” Jennings said.
In fact, before the match technicians had switched Watson to its “championship” mode. This involved two changes. First, this exhibition match was a double game. The player with the highest cumulative score in the two games would win. This changed the players’ strategy. Instead of following the safest path to win each game, if only by a single dollar, players had to pile up winnings. In addition to adjusting Watson’s betting algorithms for double games, the IBM team directed the machine to hunt for Daily Doubles. The practice rounds, they said, were to test the machinery and the buzzer. The goal in the match was to win. These tweaks hadn’t much affected Watson’s scoring in this early round. The computer had simply chanced on comfortable clues, from the Beatles to black holes. That could change.
And it did. As the opening game progressed, Watson faltered. In the Final Frontiers category, it buzzed confidently on a Latin term for end, “a place where trains can also originate.” But the machine picked the wrong Latin word: “What is finis?” Jennings got “terminus” on the rebound, and inched closer.
Then Watson fell into a couple of cognitive traps. The $1,000 clue under Olympic Oddities asked about “the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904.” Jennings won the buzz and after a pause ventured: “What is . . . he was missing a hand?” That was incorrect. Watson buzzed on the rebound.
“What is leg?” it said.
“Yes,” Trebek said. But before they moved to the next clue, a judge called a halt to the game. Eyser’s “leg” wasn’t the anatomical oddity. Instead, it was the fact that he was missing a leg. After five minutes of consultation onstage with the judges, Trebek, and IBM’s David Shepler—Watson’s advocate—the computer’s response was ruled incorrect. “It was my boo-boo,” Trebek told the audience. Then he redubbed his response to Watson: “No, I’m sorry I can’t accept that. I needed you to say, ‘What is “He’s missing a leg”?’”
Watson’s mistake, though subtle, reflected its misreading of the lexical answer type (LAT) in the clue. Despite years of training from James Fan and others, in this example it failed to understand precisely what it was seeking. For a national audience initially wowed by the Jeopardy computer, it would serve as a reminder that the machine, for all its prodigious powers, could succumb to confusion. For Jennings and Rutter, the upshot was simpler. It chopped $2,000 from Watson’s lead.
This was a misstep for Watson but hardly an embarrassment. That would come later, on a $1,000 clue asking about the decade that gave birth to Oreo cookies and the first modern crossword puzzle. Jennings won the buzz and answered, “What are the twenties?” This was wrong. The deaf Watson won the rebound and promptly repeated the same wrong answer. The machine, for all its brilliance, was in many aspects oblivious. This was no secret in IBM’s War Room, but now the whole world could see it.
As this first Jeopardy round came to a close, Rutter climbed and Watson tumbled. They ended in a tie, the co-leaders at $5,000, with Jennings at $2,000. That would end the first day of the three-day television event in February, meaning that viewers would tune in for Day Two fully expecting to see a Double Jeopardy round featuring men and machine in a tense, closely fought tussle.
Watson, it turned out, had other ideas. After an intermission, in which the host and the human contestants changed clothes, Trebek unveiled the categories for Double Jeopardy. This round, which offered more background information on Watson, would occupy the second of the half-hour television shows. The names on the board gave Jennings and Rutter room for hope. A couple of them, Hedgehog Podge and Etude Brute, sounded confusing—potential Watson train wrecks. The others—Don’t Worry About It, The Art of the Steal, Cambridge, and Church & State—looked more straightforward. But they wouldn’t know for sure until they started to play.
It didn’t take long to see that Watson was in a groove. The machine monopolized the buzzer, hunted down the Daily Doubles, and appeared to understand every clue. Jennings, whose lectern was right next to Watson’s bionic hand, later said that its staccato rhythm as it pressed the buzzer three times reminded him of “the soundtrack from The Terminator.” Rutter said that playing against Watson filled him with a new type of empathy. “I thought, ‘This must be what it feels like to play against Ken or me,’” he said.
Watson’s buzzer speed also affected the humans’ game. They felt compelled to jump faster than usual for the buzzer. This often led to quarter-second penalties for early buzzing—a trap Watson never fell into. And in their eagerness to win control of the board, they found themselves hurrying to respond to clues, sometimes before reading them, resulting in mistakes. “Against human players, you have a window,” Jennings said. “Against Watson, that window essentially does not exist.”
In the first minutes of the game, Watson ransacked the board for Daily Doubles. This led it through the high-dollar clues on everything from Sergei Rachmaninoff and Franz Liszt to leprosy and albinism. The frustrated humans kept trying to buzz, to no effect. The computer nearly tripled Rutter’s score, to $14,600, and then, under Cambridge, landed on the board’s first Daily Double. “I’ll wager six thousand four hundred thirty-five dollars,” Watson said. This figure, so unusually precise, drew laughter from the crowd. Like everything else on the board, the clue turned out to be friendly to Watson. “The chapels at Pembroke and Emmanuel Colleges were designed by this architect.” Watson could have handled this one in its infancy. The clue featured simple syntax and a crystal-clear LAT—an architect—connected to easily searchable proper nouns. By answering “Who is Sir Christopher Wren?” Watson raised its winnings to $21,035.
Two questions later, a clue appeared in the wrong box. These glitches, which would continue through the afternoon, made life even harder for Jennings and Rutter. They had to stand at the podiums with their backs turned to the Jeopardy board so that they wouldn’t see a clue if one happened to pop up. These delays often lasted for five or ten minutes at a time. While the contestants stood there, attendants mopping their foreheads or offering them water, Trebek worked to keep the audience engaged. He told jokes and answered questions about Jeopardy. He mentioned, for example, that Merv Griffin, the game’s founder, raked in an astounding $83 million during his lifetime for rights to his Jeopardy jingle. One time, as technicians labored behind him, Trebek intoned: “We realize that if we keep you waiting here three hours on the tarmac, we have to provide you with a meal, and perhaps accommodation.”
The malfunction during Watson’s runaway game arrived at a strange moment. Watson had chosen the $1,600 clue under Hedgehog Podge. The clue seemed almost designed for the computer: “Garry Kasparov wrote the foreword for The Complete Hedgehog, about a defense in this game.” Watson, as usual, won the buzz. Its answer panel showed 96 percent confidence in its first response: “What is chess?” It was Watson’s digital role model, Deep Blue, that had beaten Kasparov in the famous man-machine match in 1997. Yet as Trebek waited for a response, saying, “Watson?” the computer said nothing. After its time ran out, Jennings scored on the rebound. “Chess is right,” Trebek said. “And I think Deep Blue will never forgive Watson for missing that one.”
It turned out, though, that when the clue had popped up in the wrong box on the board, it disoriented the machine, leading Watson to keep mum. Eventually, Jeopardy had to replace that clue with another one—much to the IBM crowd’s regret. It would have been nice, after all, to have a reference to Deep Blue in the match. But in an afternoon full of technical mishaps, the chess clue fell out. “There’s a line Watson’s familiar with,” Trebek told the audience off camera. He made a sweeping gesture with his arm and said, “_____ happens.”
As this second half of the first game neared its end, Watson continued its rampage, ending with $36,681. Rutter and Jennings had barely inched ahead, to $5,400 and $2,200, respectively. Their best hope was that the machine, known to be weaker in Final Jeopardy, would bet heavily—looking for a knockout punch—and miss. The category was U.S. Cities. The clue: “Its largest airport is named for a World War II hero, its second largest for a World War II battle.”
To many, this sounded like an easy one for Watson. It was a city big enough to have two airports, each of them connected thematically to the Second World War. But Watson, assuming it understood the clue, had to carry out separate searches for many of the airports in the country, looking for connections to long lists of heroes and battles. Numerous names overlapped. New York’s biggest airport, for example, was named for John F. Kennedy, who happened to be a hero of World War II. Its second airport was La Guardia. Was there a battle in the Italian campaign by that name? No doubt Watson burrowed through thousands of documents, finding along the way “battles” involving New York City’s feisty mayor, Fiorello La Guardia. In the end, the computer was bewildered.
Jennings and Rutter both responded correctly: “What is Chicago?” (The bigger airport took its name from Butch O’Hare, a fighter pilot; the smaller one from the Battle of Midway.) Jennings doubled his meager winnings, to $4,400. Rutter added $5,000 to his, reaching $10,400. When Watson missed the clue, the gap promised to narrow. Its response, which drew laughter from the crowd, was: “What is Toronto??????” (The IBM team had programmed the machine to add those question marks on wild guesses so that the spectators would see that the computer had low confidence. Its awareness of what it didn’t know was an important aspect of its intelligence.) Fortunately for Watson, it had wagered a mere $947 on its answer. It had established a big lead and was programmed to hold on to it. Even after the airport flub, it headed into the second and deciding game with a $25,000 advantage over Rutter and a bit more than $30,000 ahead of Jennings.
In the break between the two games, the crowd emptied into the lobby for refreshments. IBM’s Sam Palmisano greeted Charles Lickel, the recently retired manager whose visit to a Fishkill restaurant at the height of Ken Jennings’s winning streak led to the idea for the Jeopardy challenge. Palmisano was thrilled with Watson’s performance. But was it too much of a good thing? Would Watson come off as a bully or make the show boring? “Maybe we should have dialed it down a little,” he said to Lickel.
Nearby, Ferrucci was huddled with John Kelly, the director of IBM Research. He was explaining to Kelly how the machine could possibly have picked Toronto as a U.S. city with World War II–themed airports. He noted that Watson had very low confidence in Toronto and that its second choice, just a hair behind, was Chicago. Watson, he said, was programmed not to discount answers based on one apparent contradiction. After all, there could be towns named Toronto in the United States. And from Watson’s perspective, Toronto, Ontario, had numerous U.S. connections. For instance, its baseball team, the Blue Jays, was in the American League.
As the second and final game began, Trebek, who was born in Canada, had a little fun at Watson’s expense. The three things he had learned in the previous match, he said, were that Watson was fast and capable of some weird wagers—and that “Toronto is now a U.S. city!”
The challenge for Jennings and Rutter was clear. To catch up with Watson, one of them had to rack up earnings quickly and then land on two or three Daily Doubles, betting the farm each time. That was the only way to reach sky-high scores in the remaining game. To catch Watson, one of them would probably need to reach $50,000, or even higher.
Watson promptly took off on a Daily Double hunt. It answered clues about Istanbul and the European Parliament, and identified Arabic as the mother tongue of Maltese. But it lost $1,000 by naming Serbia, instead of Slovenia, as the one former Yugoslav republic in the European Union.
It was then that Rutter and Jennings happened on a weak category for Watson: Actors Who Direct. The clues were simply the names of movies, such as A Bronx Tale or Into the Wild. The contestants had to come up with the directors’ names—Robert De Niro and Sean Penn, in those examples. Watson was slow to the buzzer in this category because the clues were so short. It took Trebek only a second or so to read them, and Watson required at least two seconds to find the answer. Jennings worked his way up the category. But when he reached the lower-dollar clues, he switched columns. The reasoning was simple. While he was safe from Watson in the category, he might lose the buzz to Rutter, who would then be in a position to win one of the Daily Doubles.
As he hunted for Daily Doubles, Jennings lost control of the board several times, but he was making money. He had $3,600—$800 less than Watson—when he called for the $600 clue in Breaking News. The space-gun sound rang through the auditorium. A human finally had a Daily Double. Jennings bet everything he had and then saw the clue: “Senator Obama attended the 2006 groundbreaking for this man’s memorial, a half mile from Lincoln’s.” Jennings paused. “I was about to say FDR,” he later admitted. But then he wondered why the Jeopardy writer would mention Obama before he became president, and “figured it had to be about civil rights.” And so he answered: “Who was Martin Luther King?” That was correct, and it raised Jennings’s total to $7,200. By the end of the Jeopardy round, Jennings had $8,600. Watson trailed at $4,800, with Rutter third, at $2,400.
There was one more Double Jeopardy board in the match: thirty clues, two of them Daily Doubles, plus Final Jeopardy. Jennings’s best chance on this home stretch was to double his money on the first Daily Double, double it again on the second, and again—if necessary—in Final Jeopardy. If he got to $10,000 before beginning this magical run, he could conceivably end up with $80,000. No one had ever pulled that off on Jeopardy—much less against the likes of Watson and Brad Rutter. The all-time single-game record in the show was Roger Craig’s $77,000, and his competition had been far humbler. “I knew the odds were stacked against me,” Jennings said. “It was my only shot.”
He and Rutter both started off hunting in the high-dollar clues, but it was Watson that landed on the first Daily Double. It was the $1,200 clue in the Nonfiction category. The computer bet a conservative $2,127—and promptly botched a devilishly complex clue: “The New Yorker’s 1959 review of this said that in its brevity and clarity it is unlike most such manuals. A book as well as a tool.” Watson, clearly mystified, said: “Let’s try ‘Who is Dorothy Parker?’” (The correct response: “What is The Elements of Style?”)
Even without landing on a Daily Double, Jennings added to his lead. Nearing the end of the game, his winnings reached $20,000, $2,000 ahead of Watson. The second Daily Double was still on the board. Reaching $80,000 was still a possibility.
But then Jennings made a blunder that would no doubt haunt him for years to come. He had control of the board, and the only remaining category with likely Daily Double spots was Legal “E”s. Jennings was certain that the Daily Double was hidden under either the $1,200 or $1,600 slot, but which one? His theory, widely accepted among Jeopardy aficionados, was that the game would not feature two Daily Doubles on the same board under the same dollar amount. But what was the dollar amount of that first Daily Double? Jennings seemed to recall that it was $1,600, so he asked Trebek for the $1,200 clue in Legal “E”s. It turned out he had it backward. This was a mistake that Watson, with its precise memory, would never have made. The $1,200 clue described the person who carries out the “directions and requests” in a person’s will. Watson won the buzz and answered, “What is executor?” It then proceeded to the clue Jennings should have picked. The space guns went off. Watson had the last Daily Double. The researchers in the room, who understood exactly what this meant, erupted in cheers.
“At that point it was over,” Ferrucci said. “We all knew it.” The machine had triumphed. In the few clues that were left, Rutter and Jennings carried out a battle for second place. In the end, as the computer and the two humans revealed their Final Jeopardy responses to a clue about the author of Dracula, Bram Stoker, Jennings added a postscript on his card: “I, for one, welcome our new computer overlord.”
Watson, despite a few embarrassing gaffes, appeared to be just that, at least in the domain of Jeopardy. It dominated both halves of the double match, reaching a total of $77,147. Jennings finished a distant second, with $24,000, just ahead of Rutter, with $21,600.
The audience filed out of the auditorium. Nighttime had fallen. The lobby, its massive Saarinen windows looking out on snow-blanketed fields, was now decked out for a feast. Waiters circulated with beer and wine, shrimp cocktails, miniature enchiladas, and tiny stalks of asparagus wrapped in steak. The home team had won and the celebration was on, with one caveat: Everyone in the festive crowd was sworn to secrecy until the televised match a month later.
Two days later, Alex Trebek was back home in Los Angeles’ San Fernando Valley. He was unhappy about the exhibition match. His chief complaint was that IBM had unveiled one version of Watson for the practice rounds and then tweaked its strategy for the match. “I think that was wrong of IBM,” he said. “It really pissed me off.” For Trebek, the change was tantamount to upping a car’s octane before a NASCAR race. “IBM didn’t need to do that,” he said. “They probably would have won anyway. But they were scared.” He added that after the match was over, “I felt bad for the guys, because I felt they had been jobbed just a little.” Jennings, while disappointed, said he also had masked certain aspects of his strategy during the practice games and didn’t see why Watson couldn’t do the same. Rutter said that “some gamesmanship was going on. But there’s nothing illegal about that.”
Ferrucci, for his part, said that during practice sessions his team was focused on the technical details of Watson’s operations, making sure, for example, that it was getting the electronic feed after each clue of the correct response. Jennings and Rutter, he said, had already seen Watson hunting for Daily Doubles in the videos of the sparring rounds that they’d received months earlier. “Every respectable Jeopardy player knows how to hunt for them,” he added. Was Watson supposed to play dumb?
Fourteen years earlier, Garry Kasparov had registered a complaint similar to Trebek’s after succumbing to Deep Blue in his epic chess match. He objected to the adjustments that the IBM engineers had made to the program in response to what they were learning about his style of play. These disagreements were rooted in questions about the role of human beings in man-machine matches. It was clear that Watson and Deep Blue were on their own as they played. But did they also need to map out their own game strategies? Was that part of the Grand Challenge? IBM in both cases would say no. Jennings and Rutter, on that Friday afternoon in Yorktown Heights, were in fact playing against an entire team of IBM researchers, and the collective intelligence of those twenty-five Ph.D.s was represented on the stage by a machine.
In that sense, it almost seemed unfair. It certainly did to Trebek, who also complained about Watson’s blazing speed and precision on the buzzer. But consider the history. Only three years earlier, Blue J—as Watson was then known—fared worse on Jeopardy clues than an average twelve-year-old. And no one back then would have thought to complain about its buzzer reflexes, not when the machine struggled for nearly two hours to respond to a single clue. Since then, the engineers had led their computer up a spectacular learning curve—to the point where the former dullard now appeared to have an unfair advantage.
And yet Watson, for all its virtues, was still flawed. Its victory was no sure bet. Through the fall, it lost nearly one of three matches to players a notch or two below Jennings and Rutter. A couple of train wreck categories in the final game could have spelled defeat. Even late in the second game, Jennings could have stormed back. If he had won that last Daily Double, Trebek said, “he could have put significant pressure on Watson.” After the match, Jennings and Rutter stressed that the computer still had cognitive catching up to do. They agreed that if Jeopardy had been a written test—a measure of knowledge, not speed—they both would have outperformed Watson. “It was its buzzer that killed us,” Rutter said.
Looking back, it was fortunate for IBM that Jeopardy had insisted on building a finger for Watson so that it could press the physical buzzer. This demand ten months earlier had initially irked Ferrucci, who worried that Jeopardy’s executives would continue to call for changes in their push for a balanced match. But if Watson had beaten Jennings and Rutter to the buzz with its original (and faster) electronic signal, the match certainly would have been widely viewed as unfair—just as Harry Friedman and his team had warned all along.
Still, despite Watson’s virtuosity with the buzzer and its remarkable performance on Jeopardy clues, the machine’s education is far from complete. As this question-answering technology expands from its quiz show roots into the rest of our lives, engineers at IBM and elsewhere must sharpen its understanding of contextual language. And they will. Smarter machines will not call Toronto a U.S. city, and they will recognize the word “missing” as the salient fact in any discussion of George Eyser’s leg. Watson represents merely a step in the development of smart machines. Its answering prowess, so formidable on a winter afternoon in 2011, will no doubt seem quaint in a surprisingly short time.
Two months before the match, Ken Jennings sat in the empty Wheel of Fortune studio on the Sony lot, thinking about a world teeming with ever-smarter computers. “It does make me a little sad that a know-it-all like me is not the public utility that he used to be,” he said. “There used to be a guy in every office, and everyone would know which cubicle you would go to find out things. ‘What’s the name of the bassist in that band again?’ Or ‘What’s the movie where . . . ?’ Or ‘Who’s that guy on the TV show . . . he’s got the mustache?’ You always know who the guy to ask is, right?”
I knew how he felt. And it hit me harder after the match, as I made my way from the giddy reception through a long, narrow corridor toward the non-VIP parking lot. Halfway down, in an office strewn with wires and cameras, stood a discouraged Jennings and Rutter. They were waiting to be filmed for their postgame reflections. It had been a long and draining experience for them. What’s more, the entire proceeding had been a tribute to the machine. Even the crowd was pulling for it. “We were the away team,” Jennings said. And in the end, the machine delivered a drubbing.
Yet I couldn’t regret the outcome. I’d come to know and appreciate the other people in this drama, the ones who had devoted four years to building this computer. For them, a loss would have been even more devastating than it was for Jennings and Rutter. And unlike the two Jeopardy stars, the researchers had to worry about what would come next. Following a loss, there would be extraordinary pressure to fine-tune the machine for a rematch. Watson, like Deep Blue, wasn’t likely to retire from the game without winning. The machine could always get smarter. This meant that instead of a deliverance from Jeopardy, the team might be plunged back into it. This time, though, instead of a fun and unprecedented event, it would have the grim feel of a do-or-die revenge match. For everyone concerned, it was time to move on. Ferrucci, his team, and their machine all had other horizons to explore. I did too.
But the time I spent with Watson’s programmers led me to think more than ever about the programming of our own minds. Of course, we’ve had to adapt our knowledge and skills for millennia. Many of us have decided, somewhere along the way, that we don’t need to know how to trap a bear, till a field, carry out long division, or read a map. But now, as Jennings points out, the value of knowledge itself is in flux. In a sense, each of us faces the question that IBM’s Jeopardy team grappled with as they outfitted Watson with gigabytes of data and operating instructions. What makes sense to store up there? And what cognitive work should be farmed out to computers?
The solution, from a purely practical view, is to fine-tune the mind for the jobs and skills in which the Watsons of the world still struggle: the generation of ideas, concepts, art, and humor. Yet even in these areas, the boundaries between humans and their machines are subject to change. When Watson and its kin scour databases to come up with hypotheses, they’re taking a step toward the realm of ideas. And when Watson’s avatar builder, Joshua Davis, creates his works of generative art, who’s to say that the computer doesn’t have a hand in the design? In the end, each of us will calibrate our own blends of intelligence and creativity, looking for more help, as the years pass, from ever-smarter computers.
But just because we’re living with these machines doesn’t mean that we have to program ourselves by their remorseless logic. Our minds, after all, are far more than tools. In the end, some of us may choose to continue hoarding facts. We are curious animals, after all. Beyond that, one purpose of smart machines is to free us up to do the thousand and one things that only humans enjoy, from singing and swimming to falling in love. These are the opportunities that come from belonging to a species—our species—as it gets smarter. It has its upside.