Luis von Ahn, a PhD candidate in computer science at Carnegie Mellon, was on an airplane in late 2001 when he noticed each passenger in his row, all six of them, pen in hand, scribbling answers to a crossword puzzle. Von Ahn, an avid gamer and TV addict (he loves reality shows), mulled all the effort and intellectual intensity his fellow travelers were pouring into a simple game. Computers can’t do that, he thought. (Later he would find out they could.) Here he was twenty thousand feet in the air and six passengers, of their own volition, were tackling word problems. Why couldn’t he get people to solve every problem, willingly, and for free? What if, he wondered, he could harness all this energy for practical purposes?
His ruminations led to his PhD dissertation, in which he coined the term “human computation,” aka “a way to combine human brainpower with computers to solve large-scale problems that neither can solve alone.” While computers are good at many things, they are also bad at many things. Want a machine to crunch an algorithm? A PC does a fine job. Command it to identify Brad Pitt and Angelina Jolie in a photo as they pose on the red carpet at the Oscars, and it’ll choke on its processors. A human, however, can do this fairly easily.
What’s more, under the right circumstances, humans could teach computers to handle the types of tasks they’re normally unable to decode, such as classifying objects, comprehending spoken words, deciphering handwriting, recognizing faces—tasks a child can master. As any parent knows, you need almost Gandhi-like patience to teach young children even though we’re biologically hardwired to do it. Few have the patience to answer almost illimitable niggling questions with the aim of helping some bloodless machine get smarter.
Von Ahn knew to accomplish all this would require vast numbers of people. The grandest project of antiquity, the pyramids of Egypt, took tens of thousands of men and more than twenty years to build. In the last century, laborers digging the Panama Canal clocked twenty million hours, while those raising the Empire State Building tallied seven million hours. The tallest building in the world, Burj Khalifa in Dubai, which was completed in 2010, racked up twenty-two million worker-hours over six years of construction. Nevertheless, with the advent of the Internet, von Ahn recognized the potential to harness the collective skills of half a billion people or more. After all, online collective behavior on such a mass scale already exists, made possible by the Internet. Computer solitaire, for instance, eats up billions of worker hours a year. But other than kill time and entertain its players, what tangible good does this activity do? Tens of millions play the massively popular multiplayer online role-playing game World of Warcraft, while millions more have joined online social networks. Every month Americans collectively spend the equivalent of one hundred thousand years on Facebook. Von Ahn figured if he could conjure a system to harness this engagement in a way that could make boring chores fun, he would be on his way to his ultimate goal: to make all of humanity more efficient by exploiting the time that gets wasted.
I traveled to Pittsburgh to meet von Ahn in 2011 in his office on the seventh floor of Carnegie Mellon’s Gates Center, a visually alluring, zinc-coated building that evokes dominos and Lego pieces (it received an award from the American Institute of Architects). Von Ahn, a MacArthur “genius” grant winner born in Guatemala, talks in quick mumbles, as if his words can’t keep pace with his thoughts. His parents are doctors and his extended family owns a candy factory. When he was eight he asked for a Nintendo for Christmas, but his parents got him a Commodore 64, which he used to play games. He first taught himself programming by cracking the copy protection of games he wanted to pirate.
Von Ahn has a novel approach to work. He’ll sit down at his computer for a few minutes, then jump up, pace, fidget, settle down again to type a word or phrase, get up, stroll around, then repeat the process. Somehow he gets a lot done. It occurs to me that if floor panels could capture the kinetic energy von Ahn creates on his meanderings and use it to power the lights and electronics in his office, that would be the ultimate efficiency.
The idea of transforming what has been typically viewed as unproductive time and creating something productive out of it is an extension of a trend that has long been unfolding. We constantly wring productivity from traditionally unproductive times. We text while traipsing down the street, update our blogs while careening in a taxi through urban jungles, post a photo to Facebook at a diner, check in on Foursquare at the pharmacy, read e-mail at the mall, tweet before the start of a movie. This helps us squeeze a minute or two of work and social interaction into an already crammed day. Naturally this ADD culture that technology enables is not always beneficial to the human condition. There was, for instance, the walking and texting Staten Island teen who fell into an open manhole and despite being covered in raw sewage and losing a shoe, never let go of her phone. The desire for stimulation is an integral part of our technoevolution.
But von Ahn knew that there are very few activities that capture our attention. So to convince people to accomplish a task you give them, you have to make it fun. With this in mind, von Ahn designed the ESP Game. Its purpose: to label images so they could be searchable online. For this to happen, you need metadata, words tagged to photos that describe what’s in them. Think of metadata as a Dewey decimal system for categorizing images instead of books. Google’s search engine can’t sift through photos and tell you what’s in them; it gloms on to whatever tags of text are associated with them, and returns results based on what it finds. His inspiration was HotorNot, a site that allowed users to rate the relative hotness of people who submitted photos on a scale of one to ten. Almost immediately after it went live in October 2000 the site went viral. Within a week it had logged two million page views and in the span of a few months was one of the most popular destinations on the Internet.
Here’s how von Ahn designed his game: two randomly selected players logged on to a Web site and were flashed the same photograph. They had two and a half minutes to label fifteen images by typing words each believed best described what was in the picture. Players tallied points only when their words matched their partner’s, then they were rewarded with another picture. Because the only way to tally points was for both parties to agree on a word, they had to enter reasonable and accurate labels to have any chance of scoring. The same images were used repeatedly to ensure the accuracy of the labels different sets of players suggested. Over time the game became more challenging as images garnered “taboo” words—descriptions that couldn’t be used again once they were associated with a specific picture. These were typically the most obvious descriptors, the low-hanging fruit, so participants were forced to be more creative and specific.
For example, the first time an image popped up in the game it would have a clean slate. Player one might type “Brad Pitt, tuxedo, red carpet, Oscars,” while player two could opt for “Angelina Jolie, Versace dress, Academy Awards, Ferragamo shoes, Brad Pitt.” As soon as the players settled on “Brad Pitt” they received points and moved on to the next picture. When that photo of Brad and Angie came up again for a new pair, those players could not use “Brad Pitt.” They might earn points for agreeing on “Angelina Jolie,” and then her name would also become taboo. The third set of players might settle on “Oscars,” and it, too, would turn taboo; the fourth, “red carpet,” and so forth. Von Ahn designed the system to fact-check itself, so an image that had generated a list of descriptors would be stripped of all its taboo words and start a fresh round with new sets of players. And each set of players was shown a given photograph only once.
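The matching rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description here, not von Ahn's actual code: the names `play_round` and `taboo` are hypothetical, and the sketch compares two finished lists of guesses instead of handling guesses typed in real time.

```python
def normalize(label):
    """Lowercase and trim a label so 'Brad Pitt ' and 'brad pitt' match."""
    return label.strip().lower()


def play_round(guesses_one, guesses_two, taboo):
    """Return the first label from player one that player two also typed.

    Labels already in `taboo` are ignored. On a match the label is added
    to `taboo`, so later pairs must find a different word. Returns None
    if the two players never agree.
    """
    allowed_two = {normalize(g) for g in guesses_two} - taboo
    for guess in guesses_one:
        label = normalize(guess)
        if label not in taboo and label in allowed_two:
            taboo.add(label)
            return label
    return None


# Hypothetical example following the red-carpet photo in the text.
taboo_words = set()
first = play_round(
    ["Brad Pitt", "tuxedo", "red carpet", "Oscars"],
    ["Angelina Jolie", "Versace dress", "Academy Awards", "Brad Pitt"],
    taboo_words,
)
second = play_round(
    ["Brad Pitt", "Angelina Jolie", "Oscars"],
    ["Angelina Jolie", "Brad Pitt", "gown"],
    taboo_words,
)
print(first, "|", second)  # brad pitt | angelina jolie
```

The first pair agrees on “Brad Pitt,” which then becomes taboo, so the second pair has to score on “Angelina Jolie” instead.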
As games go, the ESP Game wasn’t as much fun as, say, Angry Birds, Myst, or Doom. Nevertheless, it was designed to be engaging, even provocative. In his initial prototype, von Ahn didn’t even include a points system. Players simply tried to agree on a word that best described an image. It was so bare-bones he couldn’t imagine anyone finding it enjoyable, yet each time he played it he’d race through thirty or forty images. Still, to make it “super-duper fun,” he decided it needed to be more gamelike, so he added points. That made it more enjoyable, but he knew that wasn’t enough, so he added a timer and gave players two and a half minutes to agree on as many photos as possible. That revved up the action. But he saw that the game became noticeably less engaging when player after player chose humdrum words such as “man” or “woman” to describe a person in a photo. So his final innovation involved adding obstacles in the form of those taboo words. When von Ahn decided to disallow such basic characterizations, not only did the game get more fun, but users’ descriptions also became more sophisticated.
Starting with fifty thousand photos he downloaded off the Internet, von Ahn posted the game to his Web site. His only marketing consisted of telling friends, who told their friends, and so on, until half a million people had played it within the first six weeks and the ESP Game was featured on CNN. It proved surprisingly addictive. Some players stuck to it for hours, labeling image after image and e-mailing von Ahn to complain if a glitch interrupted their marathon sessions. Based on user habits, he continually tweaked it. A leaderboard motivated the top players, von Ahn found, but those with little chance of cracking the top twenty—usually the most recent devotees—were discouraged, so he added a second leaderboard to track the day’s high scores, which helped.
After word of the ESP Game spread, von Ahn was invited to give a talk at Google. Afterward company founders Sergey Brin and Larry Page approached him about licensing the game, which they did, renaming it Google Image Labeler. Until von Ahn’s game came along, Google had been forced to depend on the person uploading the photo to its image database to provide metatags. With von Ahn’s game engine, however, Google could confirm the validity of these metatags and add other terms, which made photos more searchable and created more pathways to each photo. That in turn generated more page views and revenue. From 2006 to 2011 the game helped expand and improve Google’s image search database to the tune of millions upon millions of photos. (A Google press representative said the company couldn’t share specific numbers for Image Labeler.)
Now a tenured Carnegie Mellon computer science professor who drives a Porsche, von Ahn is but one of legions of computer scientists, educators, entrepreneurs, game designers, marketers, media organizations, start-ups, corporations, and city governments that have been pushing game mechanics beyond simple entertainment and layering them into all aspects of our lives. And von Ahn is on to perhaps his most grandiose project, which started with a simple question he posed to one of his graduate students: “How can we get 100 million people to translate the Internet into every major language for free?”
He traces the germ of this idea to another invention he came up with as a graduate student called CAPTCHA, which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. (Alan Turing was a computer scientist who in 1950 invented a test to analyze whether a machine could pass for a human.) Yahoo! had come to Carnegie Mellon and asked von Ahn’s adviser if there was any way to stamp out online fraud. Fraudsters were deploying armies of spambots to automatically register e-mail accounts on a massive scale, and the company needed to do something about it. Von Ahn’s solution was ingeniously simple. He came up with a system to create numbers and letters that would be fuzzy enough so that a machine couldn’t read them but a human could. Ever since, people have cursed him for it. Nevertheless CAPTCHA works and is used on millions of sites.
One day, von Ahn learned that roughly two hundred million CAPTCHAs were being typed every day. If it took the average person ten seconds to complete one, he calculated, then humanity as a whole was wasting half a million hours every day typing these annoying numbers and letters. This prompted him to come up with reCAPTCHA, which works on the same premise as CAPTCHA, except that the words to be typed come from old books. It was a way to take an unproductive act and derive something useful from it on a mass scale.
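The back-of-the-envelope figure is easy to check. The short Python calculation below uses only the two numbers quoted above; the variable names are made up for illustration.

```python
CAPTCHAS_PER_DAY = 200_000_000  # "roughly two hundred million" per day
SECONDS_PER_CAPTCHA = 10        # assumed average time to type one

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
hours_per_day = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{hours_per_day:,.0f} hours per day")  # 555,556 hours per day
```

Two billion seconds a day comes to a little over 555,000 hours, which rounds to the half a million hours cited.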
To scan an old book and digitize the contents is a laborious process, akin to snapping a photograph of every page. Then it’s up to a computer, using optical character recognition (OCR), to decipher each word. The process often results in plenty of mangled text. For older books, those published more than fifty years ago, pages have often yellowed and the ink has faded, which leads to an error rate as high as 30 percent. Von Ahn is taking words the computer can’t recognize and getting people typing reCAPTCHAs to recognize them for him. He offers two words because one comes from a book, which the computer doesn’t recognize, and the other is a word the computer already knows. The system doesn’t tell the user which is which. If she types the correct word that the computer knows the answer to, it will assume she is human and have some confidence she typed the other word properly, too. If ten people agree, then the system has successfully edited another word.
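The two-word check lends itself to a small sketch. The Python below is a simplified illustration of the logic as described here, not the production reCAPTCHA system: the class and function names are hypothetical, matching is case-insensitive for simplicity, and the threshold of ten simply follows the figure in the text.

```python
from collections import Counter

AGREEMENT_THRESHOLD = 10  # "If ten people agree" in the text


class UnknownWord:
    """Tracks human readings of one word the OCR software could not read."""

    def __init__(self):
        self.votes = Counter()
        self.accepted = None  # set once enough people agree on a reading

    def record(self, reading):
        reading = reading.strip().lower()
        self.votes[reading] += 1
        if self.votes[reading] >= AGREEMENT_THRESHOLD:
            self.accepted = reading


def check_challenge(control_answer, typed_control, typed_unknown, unknown):
    """Return True if the user passes as human.

    Only the control word is graded. If it is right, the user's reading
    of the unknown word is counted as one vote.
    """
    if typed_control.strip().lower() != control_answer.lower():
        return False
    if unknown.accepted is None:
        unknown.record(typed_unknown)
    return True


# Hypothetical example: ten users pass the control word and agree on "upon".
word = UnknownWord()
for _ in range(10):
    check_challenge("morning", "morning", "upon", word)
print(word.accepted)  # upon
```

A user who mistypes the known word fails the challenge, and that user's guess at the unknown word is discarded, so only readings from people who passed the control word count toward agreement.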
For his new project, von Ahn decided he would borrow concepts from reCAPTCHA and harness an army of helpers to help translate the entire Internet. They would be paid—if that’s what you call it—with free language lessons.
On its face, it might sound preposterous. The indexed Web has 15 billion pages. But von Ahn is serious. ReCAPTCHA shows that it’s possible to organize truly massive numbers of users. Facebook, Ticketmaster, and 350,000 other sites deploy reCAPTCHA, which works out to about 100 million words a day from 2.5 million books that are, one word at a time, being cleaned up. About 750 million people have typed at least one reCAPTCHA. That means that about 10 percent of the world’s population has helped to digitize the world’s knowledge.
By capturing a million users for his new project, which is a fraction of the number of people who solve reCAPTCHAs, von Ahn calculated it would take just eighty hours to translate the entirety of Wikipedia from English to Spanish, which, if he had to pay for translators, would run a cool $50 million. Assisting is a National Science Foundation grant for $460,000 and $3.3 million in venture backing from celebrity Ashton Kutcher and Union Square Ventures for a private company called Duolingo, named after the language game cum Web translation tool developed with one of his PhD students, Severin Hacker (yes, that’s his real name; he’s Swiss).
In 2009, Hacker and von Ahn were discussing ways to repurpose user activities for translating material. You couldn’t just gather every person who spoke a second language and try to get them to work for you or identify every translator. It wasn’t economically viable, first of all. You certainly couldn’t pay them. And you couldn’t ask them to work for free, plus how would you organize them? The obvious solution was to mimic reCAPTCHA’s model of repurposing work in such a way that while a person did something, she was actually doing something else—and not only that, also enjoying herself.
But how? Translation is not exciting for most people, and only a tiny minority would claim to be passionate about it. But if their goal was to translate the entire Web, they would need to enlist millions of people. It was logical to look at education, and they knew they had something when they found out there was tremendous demand for language learning. In fact, they learned that 1.2 billion people worldwide are learning a foreign language. And one proven method of learning a language is through translation.
What they came up with was a language-learning game available free over the Web that doubles as a crowdsourced text translation service called Duolingo. From the perspective of the user, it’s simple. Sign up and choose a language. For English speakers it’s available in French, German, Italian, Portuguese, and Spanish; for Italian, Portuguese, and Spanish speakers, there’s only English (so far). It doesn’t matter if you have never spoken a particular language before. You click on a lesson and receive a quick tutorial, a series of tips, and frequently asked questions. For each lesson you receive three hearts. Make a mistake and you lose one. Four mistakes and you must repeat the lesson.
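The hearts rule amounts to a small piece of bookkeeping, sketched below in Python. It is an illustration based only on the description above, not Duolingo's code, and the function name `run_lesson` is hypothetical.

```python
STARTING_HEARTS = 3  # "For each lesson you receive three hearts"


def run_lesson(answers_correct):
    """Walk through a lesson given a list of booleans (True = correct answer).

    Each mistake costs one heart. A mistake made with no hearts left,
    the fourth in total, fails the lesson. Returns (passed, hearts_left).
    """
    hearts = STARTING_HEARTS
    for correct in answers_correct:
        if correct:
            continue
        if hearts == 0:
            return False, 0
        hearts -= 1
    return True, hearts


print(run_lesson([True, False, True]))           # (True, 2)
print(run_lesson([False, False, False, False]))  # (False, 0)
```

Three mistakes use up the hearts but still let the player finish; the fourth sends the player back to the start of the lesson.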
Each “skill” contains several exercises. On the left side of the screen you see a phrase in Spanish that you can read (or listen to). Your task is to translate it and type the correct words into a box on the right. For example, you might see “No quiero una bicicleta porque quiero un carro.” The word “porque” might be translated for you. (It means “because.”) Then you would type, “I don’t want a bicycle because I want a car.” If you’re right, you move on to the next question. If you’re wrong, you lose a heart. When you complete one skill it unlocks other skills, which gives you the keys to progress from basic vocabulary to verb tenses and all the way through the language. Duolingo is intuitive: it’s easy to figure out how to play. It’s also motivational, since you get points and can level up. In addition, there’s a social component, so you can compete against friends.
Later, each phrase that users attempt to translate is voted on by other users, which helps improve the content of the game. If they need a clue, users can scroll over a word for the translated equivalent, and the program is smart enough to detect obvious mistakes. As they progress through the material, players amass “skill points.” A skill is considered “learned” when a player completes all of the lessons it contains, and “mastered” after the player completes the assigned number of translations. A player earns up to thirteen points per lesson and loses a point for every mistake. There is also a timed practice feature. Players are given thirty seconds to answer twenty questions. For each correct answer they receive a skill point and an additional seven seconds. Thanks to an artificial intelligence engine, the game learns as it goes, tracking problematic words or concepts and presenting them in future lessons. Each user, just by playing, is helping the system get better.
In hindsight it sounds easy, but it wasn’t. One problem they encountered early on had to do with special characters like the German umlaut, French accents, or the Spanish inverted question mark. Americans with standard keyboards could not easily type these characters, so the developers added their own virtual keyboard and streamlined the interface. Another problem was keeping users motivated. As many of us can attest, starting a second language is easy, but sticking with it isn’t.
Early on the game designers had users learn linearly, that is, complete one sentence at a time, but they ran into the problem of short-term memory overload. By the time a player got to the third or fourth sentence, his attention rate dropped. “They were like completely depleted,” says Jose Fuentes, one of von Ahn’s PhD students who has been working in the company to develop its business applications. “They were just like, ‘Uh, I don’t want to continue doing this.’ And then their attention rate was really plummeting. Because in a sense, if you think about it, this is kind of like a game of attention, right?”
The team looked at various studies that found that the magic number for maximum attention is roughly seven minutes. In Duolingo, players’ performance deteriorated at or around seven minutes, when they would finish then click away to somewhere else. The problem was they would then stop using the site. “The lesson there was that, yes, this was a very good way to create translations, but it wasn’t a good way to maintain users wanting to continue learning,” Fuentes says. The solution was to move to a nonlinear approach. A player would be shown a translation, then shown a different activity, like a discussion thread (called “the stream”). This helps players reset their short-term memory, which has resulted in greater engagement and better results.
Von Ahn told me he and his team had to work hard to gamify Duolingo, because at its core it’s not a game. They looked to Zynga, particularly Farmville, which von Ahn loathes. “Farmville shouldn’t even be fun,” he says. “It’s just so dumb. But [Zynga] has done an amazing job gamifying really crappy tasks.”
As of early 2013, Duolingo had amassed about 1 million users, with 100,000 spending time on the site daily, and an additional 15,000 to 20,000 new users, most hailing from outside the United States, joining per day. At this rate the number of users could quintuple by the end of the year. And people are learning. Von Ahn commissioned a study of its language-training effectiveness, which found that by using Duolingo, a person with no knowledge of Spanish could cover the equivalent of an entire college semester language course in just thirty-four hours.
Ultimately, though, the Guatemalan-born von Ahn wants humans to perform these mind-numbingly repetitive tasks (in this case, translating the Web) so that one day we will teach computers to do them, and eventually they’ll do it all for us.
“Don’t worry,” he tells me. “We’re a good fifty years away from the machines taking over.”