IN THOSE EARLY DAYS of 2007, when Blue J was no more than a conditional promise given to Paul Horn, David Ferrucci harbored two conflicting fears. By nature he was given to worrying, and the first of his nightmare scenarios was perfectly natural: A Jeopardy computer would fail, embarrassing the company and his team.
But his second concern, failure’s diabolical twin, was perhaps even more terrifying. What if IBM spent tens of millions of dollars and devoted centuries of researcher years to this project, played it up in the press, and then, perhaps on the eve of the nationally televised Jeopardy showdown, someone beat them to it? Ferrucci pictured a solitary hacker in a garage, cobbling together free software from the Web and maybe hitching it to Wikipedia and other online sites. What if the Jeopardy challenge turned out to be not too hard but too easy?
That would be worse, far worse, than failure. IBM would become the laughingstock of the tech world, an old-line company completely out of touch with the technology revolution—precisely what its corporate customers paid it billions of dollars to track. Ferrucci’s first order of business was to make sure that this could never happen. “It was due diligence,” he later said.
He had a new researcher on his team, James Fan, a young Chinese American with a fresh doctorate from the University of Texas. As a newcomer, Fan was free of institutional preconceptions about how Q-A systems should work. He had no history with the annual TREC competitions or IBM’s Piquant system. Trim and soft-spoken, his new IBM badge hanging around his neck, Fan was an outsider. Unlike most of the team, based in New York or its suburbs, Fan lived with his parents in Parsippany, New Jersey, some seventy miles away. He was the closest thing Ferrucci had to a solitary hacker in a garage.
Fan, who emigrated as an eighteen-year-old from Shanghai to study at the University of Iowa and later Texas, had focused his graduate work on teaching machines to come to grips with our imprecise language. His system would help them understand, for example, that in certain contexts the symbol H2O might represent a single molecule of water while in others it could refer to the sloshing contents of Lake Michigan. This expertise might eventually help teach a machine to understand Jeopardy clues and to hunt down answers. But it hardly prepared him for the job he now faced: building a Jeopardy computer all by himself. His system would be known as Basement Baseline.
As Fan undertook his assignment, Ferrucci ordered his small Q-A team to adapt their own system to the challenge, and he would pit the two systems against each other. Ferrucci called this “a bake-off.” The inside team would use the Piquant technology developed at IBM while the outside team, consisting solely of James Fan, would scour the entire world for the data and software to jury-rig a bionic Jeopardy player. They each had four weeks and a set of five hundred Jeopardy clues to train on. Would either system be able to identify the parent bird of the roasted young squab (What is a pigeon?) or the sausage celebrated every year since 1953 in Sheboygan, Wisconsin (What is bratwurst?)? If so, would either have enough confidence in its answers to bet on them?
Ferrucci suspected at the time that his solitary hacker would come up with ideas that might prove useful. The bake-off, he said, would also send a message to the rest of the team that a Jeopardy challenge would require reaching outside the company for new ideas and approaches. He wanted to subject everyone to Darwinian pressures. The point was “to have technologies competing,” he said. “If somebody’s not getting it done, if he’s stuck, we’re going to take code away from him and give it to someone else.” This, he added, was “horrific for researchers.” Those lines of software may have taken months or even years to develop. They contained the researcher’s ideas and insights reduced to mathematical elegance. They were destined for greatness, perhaps coder immortality. And one day they could be ripped away and given to a colleague—a competitor, essentially—who might make better use of them. Not everyone appreciated this. “One guy went to his manager,” Ferrucci said, “and said that the bake-off was ‘bad for morale.’ I said, ‘Welcome to the WORLD!’”
So on a February day in 2007, James Fan set out to program a Q-A machine all by himself. He was relatively isolated in a second-floor office while the rest of Ferrucci’s team mingled on the first floor. He would continue to run into them in the cafeteria, and they would attend meetings together. After all, they were colleagues, each one of them engaged in a venture that many in the company viewed as hopeless. “I was the most optimistic member of the team,” Fan later said, “and I was thinking, ‘We can make a decent showing.’” As he saw it, “decent” meant losing to human champions but nailing a few questions and ending up with a positive score.
Fan started by drawing up an inventory of the software tools and reference documents he thought he’d need for his machine. First would be a so-called type system. This would help the computer figure out if it was looking for a person, place, animal, or thing. After all, if it didn’t know what it was looking for, finding an answer was little more than a crapshoot; generating enough “confidence” to bet on that answer would be impossible. The computer would be lost.
For humans, distinguishing President George Washington from the bridge named after him wasn’t much of a challenge. Context made it clear. Bridges didn’t deliver inaugural addresses; presidents were rarely jammed at rush hour, with half-hour delays from New Jersey. What’s more, when placed in sentences, people usually behaved differently than roads or bridges.
But what was simple for us involved hard work for a Q-A computer. It had to comb through the structure of the question, picking out the subjects, objects, and prepositions. Then it had to consult exhaustive reference lists that had been built up in the industry over decades, laying out hundreds of thousands of places, things, and actions and the web of relationships among them. These were known as “ontologies.” Think of them as cheat sheets for computers. If a finger was a subject, for example, it fell into human anatomy and was related to the hand and the thumb and to verbs such as “to point” and “to pluck.” (Conversely, when “the finger” turned up as the object of the verb “to give,” a sophisticated ontology might steer the computer toward the neighborhood of insults, gestures, and obscenities.)
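To make the bridge-versus-president distinction concrete, here is a minimal sketch of how an ontology lookup might steer a type system. The entries and relations are invented for illustration; the real reference lists ran to hundreds of thousands of entries.

```python
# A toy ontology: a few hand-built entries of the kind a type system
# consults. All names and relations here are illustrative inventions,
# not IBM's actual data.

ONTOLOGY = {
    "finger": {
        "type": "body part",
        "part_of": ["hand"],
        "verbs": ["point", "pluck"],
    },
    "george washington": {
        "type": "person",
        "verbs": ["inaugurate", "govern"],
    },
    "george washington bridge": {
        "type": "structure",
        "verbs": ["cross", "jam"],
    },
}

def expected_type(subject: str, verb: str) -> str:
    """Pick the sense of the subject whose typical verbs fit the context."""
    for name, entry in ONTOLOGY.items():
        if subject in name and verb in entry["verbs"]:
            return entry["type"]
    return "unknown"

print(expected_type("george washington", "inaugurate"))  # person
print(expected_type("george washington", "jam"))         # structure
```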
In any case, Fan needed both a type system and a knowledge base to understand questions and hunt for answers. He didn’t have either, so he took a hacker’s shortcut and used Google and Wikipedia. (While the true Jeopardy computer would have to store its knowledge in its “head,” prototypes like Fan’s were free to search the Web.) From time to time, Fan found, if he typed a clue into Google, it led him to a Wikipedia page—and the subject of the page turned out to be the answer. The following clue, for example, would confound even the most linguistically adept computer. In the category The Author Twitters, it read: “Czech out my short story ‘A Hunger Artist’! Tweet done. Max Brod, pls burn my laptop.” A good human Jeopardy player would see past the crazy syntax, quickly recognizing the short story as one written by Franz Kafka, along with a reference to Kafka’s Czech nationality and his longtime associate Max Brod.
In the same way, a search engine would zero in on those helpful keywords and pay scant attention to the sentence surrounding them. When Fan typed the clue into Google, the first Wikipedia page that popped up was “Franz Kafka,” the correct answer. This was a primitive method, and Fan knew that a computer relying on it would botch the great majority of Jeopardy clues. It would crash and burn in games against even ignorant humans, let alone Ken Jennings. But one or two times out of ten, it worked. For Fan, it was a start.
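Fan’s actual code is not public, but the shortcut is simple enough to reconstruct. The sketch below uses Wikipedia’s public search API as a stand-in for the Google-to-Wikipedia route he described: feed in the raw clue, take the title of the top hit as the guess.

```python
# A minimal reconstruction of the Basement Baseline shortcut: search on
# the raw clue and return the first Wikipedia page title as the answer.
# Wikipedia's search API stands in here for the Google queries Fan used.

import requests

def basement_baseline(clue: str) -> str | None:
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": clue,
            "format": "json",
        },
        timeout=10,
    )
    hits = resp.json()["query"]["search"]
    # The title itself is the guess -- no parsing, no understanding.
    return hits[0]["title"] if hits else None

clue = ("Czech out my short story 'A Hunger Artist'! "
        "Tweet done. Max Brod, pls burn my laptop.")
print(basement_baseline(clue))  # with luck: Franz Kafka
```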
The month passed. Fan added more features to Basement Baseline. But at the end, the system was still missing vital components. Most important, it had no mechanism for gauging its level of confidence in its answers. “I didn’t have time to build one,” Fan said. This meant the computer didn’t know what it knew. In a game, it wouldn’t have any idea when to buzz. Fan could conceivably have programmed it with simple rules. It could be instructed to buzz all the time—a serious money loser, considering it flubbed two clues for every one it got right. Or he could have programmed it to buzz in every category in which it got the first clue right. That would signal that it was oriented to the category. But his machine didn’t have any way to learn that its response was right or wrong. It lacked a feedback loop. In the end, Fan blew off game strategy entirely and focused simply on building a machine that could answer Jeopardy clues.
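The arithmetic behind that “serious money loser” is worth a moment. Assume, simplifying, that every clue pays out its value v when answered correctly and deducts v when missed:

```python
# Expected winnings per buzz, assuming a flat clue value v that is won
# on a correct answer and lost on a miss: E = v*p - v*(1-p) = v*(2p - 1).
# A back-of-the-envelope model, not IBM's actual figures.

def expected_value_per_buzz(p: float, v: float = 1000.0) -> float:
    return v * (2 * p - 1)

print(expected_value_per_buzz(1 / 3))  # one right in three: about -333
print(expected_value_per_buzz(0.5))    # break-even
print(expected_value_per_buzz(0.92))   # Jennings-level precision: +840
```

At one right answer in three, every buzz costs money on average, which is why a confidence gauge mattered more than raw answering ability.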
It soon became clear that the bake-off, beyond a test of technologies, also amounted to a theater production staged by David Ferrucci. It was tied to inside politics. Ferrucci didn’t believe that the Piquant platform could ever be adapted to Jeopardy. It wasn’t big or robust enough. Yet there were expectations within the company that Piquant, which represented more than twenty researcher years, would play an important role. To build the far bigger machine he envisioned, Ferrucci needed to free himself, and the project, from the old guard’s legacy. For this, Piquant had to fail. He didn’t spell this out. But he certainly didn’t give the team the guidance, or the time, to overhaul the system. So besides training the machine on five hundred Jeopardy clues and teaching it to answer them in the form of questions, the Piquant team left the system largely unchanged. “You could have guessed from the outset that the success rate was not going to be very high,” said Jennifer Chu-Carroll, a member of the team. Piquant was being led to a public execution.
The bake-off took place on a March morning at the Hawthorne lab. The results, from Ferrucci’s perspective, were ideal. The Piquant system succeeded on only 30 percent of the clues, far below the level needed for Jeopardy. It had high confidence on only 5 percent of them, and of those it got only 47 percent right. Fan’s Basement Baseline fared almost as well by a number of measures but was still woefully short of what was needed. Fan proved that a hacker’s concoction was far from Jeopardy standards—which was a relief. But by nearly matching the company’s state-of-the-art in Q-A technology, he highlighted its inadequacies.
The Jeopardy challenge, it was clear, would require another program, another technology platform, and a far bolder approach. Ferrucci wouldn’t hesitate to lift algorithms and ideas from both Piquant and Basement Baseline, but the project demanded far more than a recasting of IBM technologies. It was too big for a single company, even one as burly as IBM. The Blue J machine, Ferrucci said, would need “the most sophisticated intelligence architecture the world has ever seen.” For this, the Jeopardy team would have to reach out to the universities doing the most exciting work in AI, including MIT, Carnegie Mellon, and the University of Texas. “We needed all the brains we could get behind this project,” he said.
Back in the late ’70s, when he was commuting from the Bronx to his high school in suburban New Rochelle, Ferrucci and his best friend at the time, Tony Marciano, had an idea for a new type of machine. They called it “a reverse dictionary.” The idea, Ferrucci said, was to build a machine that could find elusive words. “You know how it is when you want to express something, but you can’t think of the right word for it? A dictionary doesn’t help at all, because you don’t know what to look up. We wanted to build the machine that would give you the word.” This was before they’d ever seen a computer. “We were thinking of a mechanical thing.”
It sounded like a thesaurus. But Ferrucci bridled a bit at the suggestion that his dream machine had existed for centuries as a book. “No, you don’t give it the synonyms, just the definition,” he said. “Basically we were scratching this idea that the computer could understand your meaning, your words, your definitions, and could come up with the word.”
Ferrucci was a hotshot in science at Iona Grammar School, a Catholic boys’ school. He and Marciano—who, according to Ferrucci, “did calculus on his cuff links”—regarded even their own brains as machines. Marciano, for example, had the idea that devoting brain space to memory storage was wasteful. It distracted neurons from the more important work of processing ideas. So when people asked him questions requiring recall, he would respond, “Ask Dave. He’s willing to use memory.”
Ferrucci’s father, Antonio, had come to the United States from Italy after the Second World War. He had studied some law and engineering in the dying days of Mussolini’s regime, but he arrived in New York without a profession and ended up driving trucks and working in construction. He and Ferrucci’s mother, Connie, wanted their son to be a doctor. One summer during high school, Ferrucci had planned just to hang around with his friends and “play.” His father wouldn’t stand for it. “He’d gotten something in the mail about a math and computer course at Iona College. He says, ‘You’ve got the grades, why don’t you sign up for that?’”
At Iona, Ferrucci came face-to-face with his first computer. It featured a hulking cathode ray tube with a black screen and processed data encoded on teletype. He fell for it immediately. “Here was a machine,” he said. “You told it to do stuff, and it did what you told it. I thought, ‘This is big.’ I called up Tony Marciano, and I said, ‘You get your butt over here, into this room at Iona College. You’ve got to see this machine.’”
Marciano, who later studied computer science and went on to become a finance professor at New York University’s Stern School of Business, met Ferrucci later that afternoon. The two of them stayed long into the evening, paging through a single manual, trying out programs on the computer and getting the machine to spit out differential equations. At that point, Ferrucci knew that he wanted to work with computers. However, he didn’t consider it a stand-alone career. A computer was a tool, as he saw it, not a destination. Anyway, he was going to be a doctor.
He went on to Manhattan College, a small Catholic school that was actually in the Bronx, a few miles north of Manhattan. There he followed the pre-med track as a biology major and took computer science on the side. “I did a bunch of programming for the physiology lab,” he said. “Everything I did in biology I kept relating to computers.” The way technology was advancing, it seemed, there had to be a place for computers in medicine.
One night, Ferrucci was taking a practice exam in a prep course for the MCAT, the Medical College Admission Test. “I was with all my pre-med friends,” he said. “This is midway through the course. The proctor says, ‘Open to page 10 and start taking the sample chemistry test.’ I opened it up and I started doing the questions, and all of a sudden I said, ‘You know what? I’m not going to be a doctor!’ And I closed the test and I went up to the proctor and I said, ‘I’m quitting. I don’t want to be a doctor.’ He said, ‘You’re not going to get your $500 back.’ I said, ‘Whatever.’”
Ferrucci left the building and made two phone calls. He dialed the easier one first, telling his girlfriend that he’d just walked out of the MCAT class and was giving up on medicine. Then he called his father. “That was a hard call to make,” Ferrucci said. “He was very upset in the beginning.”
His MCAT insight, while steering him away from medicine, didn’t put him on another clear path. He still didn’t know what to do. “I started looking for graduate programs in physiology that had a strong computing component,” he said. “After about a week or two of that, I suddenly said, ‘Wait a minute.’” He called this his “second-level epiphany.” He asked himself why he was avoiding the obvious. “I was really interested in the computer stuff,” he said, “not the physiology. So I’d have to make a complete break.” He applied to graduate school in computer science and went upstate, to Rensselaer Polytechnic Institute (RPI), in Troy, New York.
In his first stint at IBM Research, between getting his master’s and his doctorate at RPI, Ferrucci delved into AI. By that time, in the late ’80s, the industry had split into two factions. While some scientists still pursued the initial goal of thinking machines, or general intelligence, others looked for more focused applications that could handle real jobs (and justify the research). The king of “narrow AI,” and Ferrucci’s focus, was the expert system. The idea was to develop smart software for a specific industry. A program designed, say, for the travel industry could answer questions about Disneyland or Paris, find cheap flights, and book hotels. These specialists wouldn’t have to puzzle out the context of people’s conversations. The focus of their domains would make it clear. For that electronic expert in travel, for example, “room” would mean only one thing. The computer wouldn’t have to concern itself with “room” in the backseat of a Cadillac or “room” to explore in the undergraduate curriculum at Bryn Mawr. If it were asked about such things, it would draw a blank. Computers that lacked range and flexibility were known as brittle, and the one-trick expert systems practically defined the term. Many in the industry didn’t consider them AI at all. They certainly didn’t think or act like people.
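A toy version of such a specialist, with rules invented here for illustration, shows how little it takes to make one work inside its domain, and how completely it fails outside it:

```python
# A toy travel "expert system": a short list of if/then rules. The rules
# are invented for illustration. Inside the domain, "room" means exactly
# one thing; outside it, the program is brittle and draws a blank.

RULES = [
    (lambda q: "room" in q and "hotel" in q,
     "Double rooms in Paris start around 150 euros."),
    (lambda q: "cheap" in q and "flight" in q,
     "Midweek departures are usually cheapest."),
]

def travel_expert(question: str) -> str:
    q = question.lower()
    for condition, answer in RULES:
        if condition(q):
            return answer
    return "I don't understand the question."  # brittle by design

print(travel_expert("Is there room at a hotel near Disneyland?"))
print(travel_expert("Is there room in Bryn Mawr's curriculum?"))  # blank
```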
To build a more ambitious thinking machine, some looked to the architecture of the human brain. Indeed, while Ferrucci was grappling with expert systems, other researchers were piecing together an altogether different species of program, called “neural networks.” The idea had been bouncing around at least since 1948, when Alan Turing outlined it in a paper called “Intelligent Machinery.” Like much of his thinking, Turing’s paper was largely theoretical. Computers in his day, with vacuum tubes switching the current on and off, were too primitive to handle such work. (He died in 1954, the year that Texas Instruments produced the first silicon transistor.) However, by the ’80s, computers were up to the job. Based on rudimentary models of neurons, these networks analyzed the behavior of complex systems, such as financial markets and global weather, and used statistical analysis to predict how they would behave over time.
A neural network functioned a bit like a chorus. Picture a sing-along concert of Handel’s Messiah in Carnegie Hall. Some five thousand people show up, each one wearing a microphone. You play the music over loudspeakers and distribute musical scores. That’s the data input. Most of the people start singing while others merely hum or chat with their neighbors. In a neural net, the learning algorithm picks out the neurons that appear to be replicating the pattern, and it gives them more sway. This would be like turning up the microphones of the people who are singing well, turning down the mikes of those who sing a tad off key—and shutting out the chatterers altogether. The net focuses not only on the individuals but on the connections among them. In this analogy, perhaps the singers start to pay attention to one another and organize, the tenors in one section, sopranos in another. By the end of a long training process, the Carnegie Hall network both interprets the data and develops an expertise in Handel’s motifs and musical structure. The next week, when the music switches to Gershwin, new patterns emerge. Some of the chatterers, whose mikes were turned off, become stars. With time, this assemblage can identify new pieces of music, recognizing similar themes and variations. And the group might even set off an alarm if the director gets confused and starts playing Vivaldi instead of Handel.
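In code, the mike-adjusting step is a weight update. The sketch below is a single-layer caricature, assuming a simple squared-error rule; real neural networks stack many layers of such units, but the turn-up-the-good-singers logic is the same:

```python
# A numerical cartoon of the Carnegie Hall chorus: each "singer" gets a
# microphone weight, and a squared-error update turns helpful singers up
# and unhelpful ones down until the mix matches the target melody.
# A one-layer sketch of the idea, not a full neural network.

import random

random.seed(0)
target = [0.2, 0.9, 0.5, 0.7]                  # the melody to reproduce
singers = [[random.random() for _ in target] for _ in range(5)]
weights = [1.0 / len(singers)] * len(singers)  # every mike starts equal

rate = 0.05
for step in range(2000):
    mix = [sum(w * s[i] for w, s in zip(weights, singers))
           for i in range(len(target))]
    for j, s in enumerate(singers):
        # Nudge each mike by how much it helps or hurts the match.
        error = sum((mix[i] - target[i]) * s[i] for i in range(len(target)))
        weights[j] -= rate * error

print([round(w, 2) for w in weights])  # loud mikes for the helpful singers
```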
Neural networks learned, and even evolved. In that sense, they crudely mimicked the human brain. People driving cars, for example, grow to respond to different patterns—the movement of traffic, the interplay between the wheel and the accelerator—often without thinking. These flows are reflected by neural connections in the brain, lots of them working in parallel. They’re reinforced every time an experience proves their usefulness. But a change, perhaps a glimpse of a cyclist riding against traffic, snaps them from their reverie. In much the same way, neural networks became very good at spotting anomalies. Credit card companies began to use them to note unexpected behavior—an apparent teetotaler buying $500 of Finnish vodka or a frugal Nebraskan renting luxury suites in Singapore. Various industries, meanwhile, used neural networks to look ahead. As long as the future stayed true to the past—not always a safe assumption, as any mortgage banker can attest—they could make solid predictions.
Unlike the brittle expert systems, neural networks were supple. They specialized in pattern detection, not a series of if/then commands. They never choked on changes in the data but simply adjusted. While expert systems processed data sequentially, as if following a recipe, the electronic neurons crunched in unison—in parallel. Their weakness? Since these collections of artificial neurons learned by themselves, it was nearly impossible to figure out how they reached their conclusions or to understand what they were picking up about the world. A neural net was a black box.
By the time Ferrucci returned to IBM Research, in 1995, he was looking beyond expert systems and neural nets. In his spare time, he and a colleague from RPI, Selmer Bringsjord, were building a machine called Brutus, which wrote fiction. And they were writing a book about their machine, Artificial Intelligence and Literary Creativity. Brutus, they wrote, is “utterly devoid of emotion, but he nonetheless seems to have within his reach things that touch not only our minds, but our heart.”
The idea for the program, Ferrucci later said, came when Bringsjord asked him if a machine could create its own story line. Ferrucci took up the challenge. Instead of teaching the machine to dream up plots, he programmed it with about a dozen themes, from betrayal to revenge. For each theme, the machine was first given a series of literary examples and then a program to develop stories along those lines. One of its models for betrayal was Shakespeare’s Julius Caesar (the program was named for Caesar’s confidant-turned-conspirator, Brutus). The program produced serviceable plots, but they were less than riveting. “The one thing it couldn’t do is figure out if something was interesting,” Ferrucci said. “Machines don’t understand that.”
In his day job, Ferrucci was teaching computers more practical lessons. As head of Semantic Analysis and Integration at IBM, he was trying to instruct them to make sense of human communication. On the Internet, records of our words and activities were proliferating as never before. Companies—IBM and its customers alike—needed tools to interpret these new streams of information and put them to work. Ideally, an IBM program would tell a manager what customers or employees were saying or thinking as well as what trends and insights to draw from them and perhaps what decisions to make.
Within IBM itself, some two hundred researchers were developing a host of technologies to mine what humans were writing and saying. But each one operated within its own specialty. Some parsed sentences, analyzing the grammar and vocabulary. Others hunted Google-style for keywords and Web links. Some constructed massive databases and ontologies to organize this knowledge. A number of them continued to hone expert systems and neural networks. Meanwhile, the members of the Q-A team coached their computer for the annual TREC competitions. “We had lots of different pockets of researchers working on these different analytical algorithms,” Ferrucci said. “But any time you wanted to combine them, you had a problem.” There was simply no good way to do it.
In the early 2000s, Ferrucci and his team put together a system to unify these diverse technologies. It was called UIMA, Unstructured Information Management Architecture. It was tempting to think of UIMA as a single brain and all of the different specialties, from semantic analysis to fact-checking, as cognitive regions. But Ferrucci maintained that UIMA had no intelligence of its own. “It was just plumbing,” he said. Idle plumbing, in fact, because for years it went largely unused.
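UIMA itself is a Java framework, later open-sourced through Apache; the Python toy below only gestures at the plumbing idea: independent analysis engines that read and write annotations on a shared record, so they can be chained without knowing about one another.

```python
# The plumbing idea behind UIMA, in miniature: analysis engines share a
# common record of annotations. A conceptual toy in Python, not UIMA's
# actual (Java) API.

def parser(record):
    record["tokens"] = record["text"].split()

def name_spotter(record):
    record["names"] = [t for t in record.get("tokens", []) if t.istitle()]

record = {"text": "Max Brod preserved the papers of Franz Kafka"}
for engine in (parser, name_spotter):  # the "plumbing": a common bus
    engine(record)
print(record["names"])                 # ['Max', 'Brod', 'Franz', 'Kafka']
```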
But a Jeopardy project, he realized, could provide a starring role for UIMA. Blue J would be more than a single machine. His team would pull together an entire conglomeration of Q-A approaches. The machine would house dozens, even hundreds of algorithms, each with its own specialty, all of them chasing down answers at the same time. A couple of the jury-rigged algorithms that James Fan had ginned up could do their thing. They would compete with others. Those that delivered good answers for different types of questions would rise in the results—a bit like the best singers in the Handel sing-along. As each one amassed its record, it would gain stature in its specialty and be deemed clueless in others. Loser algorithms—those that failed to produce good results in even a single niche—would be ignored and eventually removed. (Each one would have to prove its worth in at least one area to justify its inclusion.) As the system learned which algorithms to pay attention to, it would grow smarter. Blue J would evolve into an ecosystem in which the key to survival, for each of the algorithms, would be to contribute to correct responses to Jeopardy clues.
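A sketch of that ecosystem, with the generators and the weighting scheme invented here for illustration: each algorithm carries a per-category track record, answers are chosen by weighted vote, and feedback raises or lowers an algorithm’s sway.

```python
# A miniature of the Blue J ecosystem idea: many answer generators vote,
# each weighted by its track record in the clue's category. Names and
# update factors are invented; the real system was far more elaborate.

from collections import defaultdict

class Ensemble:
    def __init__(self, algorithms):
        self.algorithms = algorithms
        # weight[algo][category] grows as the algorithm proves itself there
        self.weight = defaultdict(lambda: defaultdict(lambda: 1.0))

    def answer(self, clue, category):
        votes = defaultdict(float)
        for name, algo in self.algorithms.items():
            votes[algo(clue)] += self.weight[name][category]
        return max(votes, key=votes.get)

    def feedback(self, name, category, correct):
        self.weight[name][category] *= 1.5 if correct else 0.7

ens = Ensemble({
    "wiki_title": lambda clue: "Franz Kafka",  # stand-in generators
    "date_lookup": lambda clue: "1924",
})
print(ens.answer("Czech author of 'A Hunger Artist'", "LITERATURE"))
ens.feedback("wiki_title", "LITERATURE", correct=True)
```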
While part of his team grappled with Blue J’s architecture, Ferrucci had several researchers trolling the Internet for Jeopardy data. If this system was going to compete with humans in the game, it would require two types of information. First, it needed Jeopardy clues, thousands of them. This would be the machine’s study guide—what those in the field of machine learning called a training set. A human player might watch a few Jeopardy shows to get a feel for the types of clues and then take some time to study country capitals or brush up on Shakespeare. The computer would do the same work statistically. Each Jeopardy clue, of course, was unique and would never be repeated, so it wasn’t a question of learning the answers. But a training set would orient the researchers. Given thousands of clues, IBM programmers could see what percentage of them dealt with geography, U.S. presidents, words in a foreign language, soap operas, and hundreds of other categories—and how much detail the computer would need for each. The clue asking which presidential candidate carried New York State in 1948, for example (“Who is Thomas Dewey?”), indicated that the computer would have to keep track of White House losers as well as winners. What were the odds of a presidential loser popping up in a clue?
Digging through the training set, researchers could also rank various categories of puzzles and word games. They could calculate the odds that a Jeopardy match would include a puzzling Before & After, asking, for example, about the “Kill Bill star who played 11 seasons behind the plate for the New York Yankees” (“Who is Uma Thurman Munson?”). A rich training set would give them a chance to scrutinize the language in Jeopardy clues, including abbreviations, slang, and foreign words. If the machine didn’t recognize AKA as “also known as” or “oops!” as a misunderstanding, if it didn’t recognize “sayonara,” “au revoir,” “auf Wiedersehen,” and hundreds of other expressions, it could kiss entire Jeopardy categories goodbye. Without a good training set, researchers might be filling the brain of their bionic student with the wrong information.
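The study-guide analysis amounts to counting. A sketch, assuming a hypothetical file of clues scraped into JSON; the field names and the foreign-phrase list are placeholders:

```python
# Profiling a training set: how often does each category appear, and how
# many clues lean on foreign phrases? The file and field names here are
# hypothetical stand-ins for whatever the scraped archive data looked like.

import json
from collections import Counter

with open("training_clues.json") as f:  # [{"category": ..., "clue": ...}, ...]
    clues = json.load(f)

by_category = Counter(c["category"] for c in clues)
foreign = sum(
    1 for c in clues
    if any(w in c["clue"].lower()
           for w in ("sayonara", "au revoir", "auf wiedersehen"))
)

print(by_category.most_common(10))
print(f"{100 * foreign / len(clues):.1f}% of clues carry foreign phrases")
```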
Second, and nearly as important, they needed data on the performance of past Jeopardy champs. How often did they get the questions right? How long did they take to buzz in? What were their betting strategies in Double Jeopardy and Final Jeopardy? These humans were the competition, and their performance became the benchmark for Blue J.
In the end, it didn’t take a team of sleuths to track down much of this data. With a simple Internet search, they found a Web site called J! Archive, a trove of historical Jeopardy data. A labor of love by Jeopardy fans, the site detailed every game in the show’s history, with the clues, the contestants, their answers—and even the comments by Alex Trebek. Here were more than 180,000 clues, hundreds of categories, and the performance of thousands of players, from first-time losers to champions like Brad Rutter and Ken Jennings.
In these early days, the researchers focused only on Jennings. He was the gold standard. And with records of his seventy-four games—more than four times as many as any other champion—they could study his patterns, his strengths and vulnerabilities. They designed a chart, the Jennings Arc, to map his performance: the percentage of questions on which he won the buzz and his precision on those questions. Each of his games was represented by a dot, and the best ones, with high buzz and high accuracy, floated high on the chart to the extreme right. His precision averaged 92 percent and occasionally reached 100 percent. He routinely dominated the buzz, in one game answering an astounding 75 percent of the clues. For each of these games, the IBM team calculated how well a competitor would have to perform to beat him. The numbers varied, but it was clear that their machine would need to win the buzz at least half the time, get about nine of ten right—and also win its share of Daily Doubles.
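That calculation can be roughed out with a toy model: a game of roughly sixty clues at a flat value, the machine winning the buzz at rate b with precision p, and a Jennings-like opponent taking the rest at 92 percent. Daily Doubles and varying clue values are ignored, which is exactly why the real analysis carried those caveats.

```python
# A crude version of the beat-Jennings arithmetic: flat clue values, no
# Daily Doubles, no Final Jeopardy. The machine buzzes first at rate b
# with precision p; a Jennings-like player takes the rest at 92 percent.

def expected_scores(b, p, n=60, v=800, opp_p=0.92):
    machine = n * b * v * (2 * p - 1)
    champion = n * (1 - b) * v * (2 * opp_p - 1)
    return machine, champion

for b in (0.4, 0.5, 0.6):
    m, c = expected_scores(b, p=0.9)
    print(f"buzz {b:.0%}: machine {m:,.0f} vs. champion {c:,.0f}")
```

Even in this cartoon, winning half the buzzes at 90 percent precision leaves the machine slightly behind; hence the need to win its share of Daily Doubles.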
In the early summer of 2007, after the bake-off, the Jeopardy team marked the performance of the Piquant system on the Jennings Arc. (Basement Baseline, which lacked a confidence gauge, did not produce enough data to be charted there.) Piquant’s performance was so far down and to the left of Ken Jennings’s dots, it appeared to be . . . well, exactly what it was: an alien species—and not destined for Jeopardy greatness.
When word of this performance spread around the Yorktown labs, it only fueled the concerns that Ferrucci’s team was heading for an embarrassing fall—if it ever got that far. Mark Wegman, then the head of computer science at IBM Research, described himself as someone who’s “usually wildly optimistic about technology.” But when he saw the initial numbers, he said, “I thought there was a 10 percent chance that in five years we could pull it off.”
For Ferrucci, Piquant’s failure was anything but discouraging. It gave him the impetus to march ahead on a different path, toward Blue J. “This was a chance to do something really, really big,” he said. However, he wasn’t sure his team would see it this way. So he gathered the group of twelve in a small meeting room at the Hawthorne labs. He started by describing the challenges ahead. It would be a three- to five-year project, similar in length to a military deployment. It would be intense, and it could be disastrous. But at the same time they had an opportunity to do something memorable. “We could sit here writing papers for the next five years,” he said, “or we build an entirely new type of computer.” He introduced, briefly, a nugget of realpolitik. There would be no other opportunities for them in Q-A technologies within IBM. He had effectively engineered a land grab, putting every related resource into his Jeopardy ecosystem. If they wanted to do this kind of science, he said, “this was the only place to be.”
Then he went around the room with a simple question: “Are you in or are you out?”
One by one, the researchers said yes. But their response was not encouraging. The consensus was that they could build a machine that could compete with—but probably not beat—a human champion. “We thought it could earn positive money before getting to Final Jeopardy,” said Chu-Carroll, one of the only holdovers on Ferrucci’s team from the old TREC unit. “At least we wouldn’t be kicked off the stage.”
With this less than ringing endorsement, Ferrucci sent word to Paul Horn that the Jeopardy challenge was on. He promised to have a machine, within twenty-four months, that could compete against average human players. Within thirty-six to forty-eight months, his machine, he said, would beat champions one-quarter of the time. And within five to seven years, the Jeopardy machine would be “virtually unbeatable.” He added that this final goal might not be worth pursuing. “It is more useful,” he said, “to create a system that is less than perfect but easily adapted to new areas.” A week later, Ferrucci and a small team from IBM Research flew to Culver City, to the Robert Young Building on the Sony lot. There they’d see whether Harry Friedman would agree to let the yet-to-be-built Blue J play Jeopardy on national television.