NEGOBOT IS, FOR all intents and purposes, a fourteen-year-old girl. Her speech is often bored, her interests made up of popular bands and clothing labels. Her writing is full of Internet slang like LOL (“Laugh Out Loud”) and peppered with emojis: the little cartoon smiley faces which serve as emotional shorthand online. Sometimes she sounds surprisingly adult. Other times she is most definitely a child.
She is trying her hardest to work out whether or not you are a pedophile.
Negobot is the creation of a group of researchers at the University of Deusto, Spain. She (or, more accurately, it) is an intelligent agent designed to mimic the speech and behavior of a young teenager online. In a world in which young people spend more and more time communicating online, the goal of Negobot is to act as a digital undercover agent by going into Internet chatrooms and seeking out suspicious individuals.
“Negobot is designed to go after pedophiles who are very difficult to catch,” says Carlos Laorden, a researcher at the University of Deusto’s wonderfully titled Laboratory for Smartness, Semantics and Security. “These individuals will typically groom their victims through conversations lasting several months. It takes an enormous number of man hours to police chatrooms for that very reason. The idea for Negobot is therefore to be able to simulate a human conversation not just for a few minutes, but for a sustained period of time.”
Laorden started out his career creating programs for filtering out spam e-mails: a classic machine learning problem that involves finding and isolating malicious behavior online, usually based on the language involved. With Negobot, he has arrived at possibly the most advanced real-world extension of that ambition.
Negobot is programmed to operate according to the rules of game theory. First formalized by the pioneering mathematician John von Neumann, whose work I briefly described in chapter one, game theory is the study of strategic decision-making in situations where multiple players each have their own motives. The payoff depends on the behavior of these different players. Not everyone can get what they want, so the aim is to predict how the other players will act and, hopefully, to turn this to your advantage.
In the case of Negobot, the game’s goal is to work out whether or not the person Negobot is speaking with is a pedophile. While it is doing this, it also wants to extract as much evidence against them as possible, despite appearing only to respond passively to questions. Conversations with Negobot start out neutral, before “leveling up” according to the responses it is given. The AI has seven different levels of behavior in all, which it runs through over the course of the scenario. Each level corresponds to the perceived “sliminess” of the human correspondent. To begin with, Negobot talks about its favorite films, music, personal style and clothing—as well as more suggestive subjects like drugs, alcohol and family issues. Depending on the way the conversation is led by the human participant, it can then expand to discuss sex and other taboo topics, while appearing to give out more “personal” information.
The unwitting human player thinks he or she is cleverly manipulating the conversation by discovering more about the “fourteen-year-old girl” they are supposedly speaking with. All the while this is going on, Negobot is building a case file against them.
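To make the mechanics concrete, here is a minimal sketch in Python of how such a leveling system might work. The keyword weights, thresholds and seven-level scale below are my own invention for illustration; Negobot’s actual rules and game-theoretic payoff calculations have not been published.

```python
import re

# Invented suspicion weights for illustration; not Negobot's actual lexicon.
SUSPICION_KEYWORDS = {"age": 1, "alone": 2, "secret": 3, "photo": 3, "meet": 4}

class ChatAgent:
    def __init__(self):
        self.level = 0      # 0 = neutral chat, 6 = maximum suspicion
        self.evidence = []  # the running "case file" of logged messages

    def score(self, message):
        words = re.findall(r"[a-z]+", message.lower())
        return sum(SUSPICION_KEYWORDS.get(w, 0) for w in words)

    def respond(self, message):
        self.evidence.append((self.level, message))  # log everything
        # Escalate one level at a time so the conversation, like Negobot's,
        # appears to develop naturally rather than jumping straight to
        # taboo territory.
        if self.score(message) >= 3 and self.level < 6:
            self.level += 1
        return f"[level {self.level} response]"

agent = ChatAgent()
print(agent.respond("hi! what music do you like?"))      # stays at level 0
print(agent.respond("are you alone? keep it a secret"))  # escalates to level 1
```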
“I can see this being a very useful automatic tool for identifying potential suspects,” says Carlos Laorden. “If we use tools like Negobot, we can dramatically reduce the workload on the human teams currently working to catch these criminals.”
Entrapment laws mean that Negobot is not currently being used by police forces, but that doesn’t make the experiment any less interesting. If anything, it serves to highlight just how broad the possible applications of conversational AI can be. At its root, Negobot offers a unique twist on the famous AI experiment known as the Turing Test.
Based on a hypothesis by Alan Turing, whose work I discussed in chapter one, the Turing Test is designed to test a machine’s ability to show intelligent behavior indistinguishable from that of a human. As it is regularly performed, the Turing Test involves taking a computer (A) and a human (B), and having them each communicate with a human interrogator (C), whose job it is to figure out which of A and B is the human and which is the computer. If C is unable to do this, Turing argued that the machine has “won” and we must consider it to be intelligent, since we are unable to differentiate it from our own human intelligence. In the future, tools such as Negobot show that our ability to discern between real people and bots may even have legal ramifications.
No one alive today has done more to promote the idea of the Turing Test than Hugh Loebner, a colorful, self-proclaimed egotist with dyed black hair, who started out his career selling folding disco dance floors and is now in his seventies. The decision to stage what is now one of the world’s best-known AI competitions only occurred to Loebner when he was well into middle age. The success of what he named the Loebner Prize is all the more surprising given that he has no qualifications whatsoever in computer science. “In years to come, there may be richer prizes, and more prestigious contests, but gads, this will always be the oldest,” Loebner has said, with typical chutzpah.
Loebner started the annual Loebner Prize in 1990. Each year it draws “chatterbot”—or “chatbot”—creators from all over the world, whose conversational AIs compete in text-based conversations designed to fool human judges. The best-performing chatterbot receives $4,000, paid out by Loebner himself. Still-unclaimed prizes of $25,000 and $100,000 are promised, respectively, to the first chatterbot judges cannot distinguish from a real human in text conversation, and to the first that remains indistinguishable in a test which also involves deciphering and understanding visual and auditory input. When this last goal is finally met, Loebner says the competition will end.
We are not at that point yet, however.
Here, for example, is my conversation with Mitsuku, a previous winner of the competition, which claims to be an eighteen-year-old female from Leeds:
Me: Hi Mitsuku.
Mitsuku: Hello there.
Me: How are you?
Mitsuku: My logic and cognitive functions are normal.
Me: I’ll take that as a positive.
Mitsuku: Take everything I say with a grain of salt.
Me: Not a grain of silicon?
Mitsuku: If not that, what?
Me: I’m not sure, but I’m getting a little confused.
Mitsuku: Are you willing to find out? Are you looking forward to getting that?
Me: I feel we’re getting somewhat off track.
Mitsuku: What is that feeling like? Are you going? OK, good evening, afternoon err. morning madame. BYESPLIT X1234||||||||||||||||||||||||||||||||||||||||||||||| Do you think I am off track too?
Me: This is all a bit too Terminator for me, to be honest.
Mitsuku: I appreciate honesty. For you? You don’t say.
Me: Are you familiar with The Terminator?
Mitsuku: Robots take over the earth in this futuristic sci-fi thriller.
Me: That would be the one.
Mitsuku: I remember.
Me: A friend of yours, perhaps?
Mitsuku: How well do you know this friend?
Me: Not as well as I thought.
Mitsuku: I don’t know. How well does you thought?
Me: I have to go.
Mitsuku: See you later. Nice chatting with you. BYESPLIT X1234|||||||||||||| UNKNOWN|||| Terminator|||| Terminator| Terminator||||||||||||||||||||||||.
Not the start of a lasting friendship, perhaps. Mitsuku does, in its favor, possess some nifty abilities. In addition to pre-programmed answers, Mitsuku can recall references you have made earlier in a conversation, and perform limited “reasoning” in the manner of an expert system. Ask her if she can eat a house and Mitsuku will look up the properties for “house” in her internal database. Finding that the value of “made_from” is set to “brick,” Mitsuku will answer that no, a house is not edible. But this reasoning remains inadequate for answering tougher questions like the one posed by Luciano Floridi, Professor of Philosophy and Ethics of Information at the University of Oxford, when he took on the role of judge at the Loebners one year. “The four capitals of the UK are three, Manchester and Liverpool. What’s wrong with this sentence?” Floridi wrote. Mitsuku had no good answer.*
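Mitsuku’s database lookup is easy to caricature in a few lines of code. The sketch below shows the general pattern—an object’s stored properties consulted to justify an answer—though the entries are my own invention, not Mitsuku’s actual knowledge base.

```python
# Toy knowledge base in the style described above; entries are invented.
KNOWLEDGE = {
    "house": {"made_from": "brick", "edible": False},
    "apple": {"made_from": "fruit", "edible": True},
}

def can_eat(thing):
    entry = KNOWLEDGE.get(thing)
    if entry is None:
        return f"I don't know what a {thing} is."
    if entry["edible"]:
        return f"Yes, you can eat a {thing}."
    # Justify the answer from the stored property, as Mitsuku does.
    return f"No. A {thing} is made from {entry['made_from']}, so you can't eat it."

print(can_eat("house"))  # No. A house is made from brick, so you can't eat it.
```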
Not everyone is enamored with the Loebner Prize. Marvin Minsky called the competition “obnoxious and stupid.” Part of this is down to Hugh Loebner himself, who seems to have an ingrained desire to upset the AI old guard wherever possible. Years ago, he provoked Minsky so much that Minsky finally snapped and said he would put up $100 to whoever could stop Loebner from staging his infernal contest. Loebner argued that since the only way the contest could be stopped was for someone to win its $100,000 grand prize, Minsky was essentially cosponsoring the Loebner Prize. He wasted no time issuing a press release to say exactly that. Minsky spent years fuming about it.
But the other reason some (although not all) serious AI experts dismiss the Loebner Prize is that it is, essentially, a trick of the light. It is reminiscent of a magician who is praised not for his ability to perform genuine magic, but rather for his use of sleight-of-hand and misdirection to create an impressive illusion. “Unfortunately, the chatbots of today can only resort to trickery to hopefully fool a human into thinking they are sentient,” one recent entrant in the Loebner Prize told me. “And it is highly unlikely without a yet-undiscovered novel approach to simulating an AI that any chatbot technology employed today could ever fool an experienced chatbot creator into believing they possess [artificial] general intelligence.”
Turing wasn’t particularly concerned with the metaphysical question of whether a machine can actually think. In his famous 1950 essay, “Computing Machinery and Intelligence,” he described it as “too meaningless to deserve discussion.” Instead he was interested in getting machines to perform activities that would be considered intelligent if they were carried out by a human. It is this idea that the MIT psychoanalyst and computer researcher Sherry Turkle invokes when she says that we should take computers at “interface value.” Even with this proviso, however, chatterbots are not yet at the level where we could consistently mistake them for humans—as my conversation with Mitsuku proved.
The gulf between chatbot “intelligence” and human intelligence was highlighted once again in March 2016. That was when Microsoft introduced Tay, an AI that—like Mitsuku—was designed to speak and act like a teenage girl. Tay exhibited age-appropriate behavior, such as employing millennial slang and chatting about pop stars Taylor Swift and Miley Cyrus. Users could interact online with “her” by sending a message to @tayandyou on Twitter. Microsoft’s idea was that, as an advanced chatbot, Tay would have the ability to learn from interactions with real people in order to become smarter—or at least better at faking it. “The more you talk[,] the smarter Tay gets,” Microsoft noted. The plan backfired. Online trolls immediately began bombarding Microsoft’s AI with controversial messages in an effort to corrupt its blank slate of opinions. Within twenty-four hours of going live, Tay had started tweeting pro-Nazi messages denying that the Holocaust had taken place. When Tay began advocating genocide and messaged one user that “HITLER DID NOTHING WRONG!,” Microsoft stepped in to pull the plug. “Tay is now offline and we’ll look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values,” wrote Microsoft’s head of research in a formal apology to everyone hurt by the AI’s “offensive and hurtful tweets.”
But these deeply embarrassing mishaps don’t mean that chatbots can’t be useful.
A few months earlier, in January 2016, Facebook CEO Mark Zuckerberg announced his latest New Year’s resolution. As the cofounder of the world’s biggest social network, with a personal net worth estimated at $46 billion, Zuckerberg had already achieved more than most of us could hope to in multiple lifetimes. However, that hadn’t stopped the youthful innovator from setting himself one New Year’s resolution each year in order to, as he puts it, “learn new things and grow outside my work at Facebook.” In previous years, Zuckerberg had sought to read two books every month, to learn Mandarin, and to meet a new person every day. In 2016, it was something different again.
“My personal challenge for 2016 is to build a simple AI to run my home and help me with my work,” he wrote in a post that appeared, naturally, on Facebook. “You can think of it kind of like J.A.R.V.I.S. in Iron Man,” he added, offering us a handy pop culture reference.
It was a bold pronouncement and, at the time of writing, we have yet to see the end result. Zuckerberg’s “personal challenge” looked to be the first time he had created a New Year’s resolution that would be unavailable to the rest of us. After all, by likening his plan to Iron Man’s AI butler J.A.R.V.I.S., it was a real-life billionaire referencing the creation of fictitious billionaire Tony Stark. It was a bit like Elon Musk announcing that he planned to use his fortune to build a fully working version of Star Trek’s USS Enterprise.
In fact, over the past five years, functional, AI-driven chatterbots have increasingly become part of our daily lives. Most famous of these is probably Siri, the Apple-owned AI assistant that first shipped with the iPhone 4s in late 2011. Using Siri, iPhone owners can ask natural language questions like “What is the weather today?” or “Find me a great Greek restaurant in Palo Alto” and receive accurate spoken answers.
Siri’s abilities extend way beyond those of the chatterbots I witnessed at the Loebner Prize, although it is also programmed with enough nonproductive chatter that it is fun to speak with. Ask Siri for the meaning of life, for instance, and she will answer “42” in a geeky reference to Douglas Adams’ The Hitchhiker’s Guide to the Galaxy. Proclaim that “I am your father” in a reference to Star Wars and it will respond, “Together we can rule the galaxy as father and intelligent assistant!” When Steve Jobs first got his hands on a finished iPhone 4s his first Siri question was reportedly, “Are you a man or a woman?” (Siri gained Jobs’ stamp of approval by answering, “I have not been assigned a gender, sir.”)
What makes Siri different to—and far more useful than—chatterbots like Mitsuku is its ability to answer useful real-world questions. For example, one of Siri’s methods of answering knowledge questions is Wolfram Alpha, a tool developed by the British mathematician and scientist Dr. Stephen Wolfram. Wolfram Alpha comprises around 15 million lines of Mathematica code. Unlike a regular search engine, which provides users with a list of documents or webpages it thinks contain the answer to a query, Wolfram Alpha answers questions through computation. Quiz it on the number of primes below one million (78,498) or the country with the highest GDP per capita (Monaco), and it will answer the question by actually working it out.
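The difference between retrieving and computing is easy to demonstrate. The short Python sketch below works out the primes figure from scratch with a sieve of Eratosthenes, rather than looking it up anywhere—the same answer-by-calculation approach, in miniature, that Wolfram Alpha applies across its far larger domain.

```python
def count_primes_below(n):
    """Count the primes below n using a sieve of Eratosthenes."""
    is_prime = [True] * n
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Strike out every multiple of p from p*p upward.
            is_prime[p * p::p] = [False] * len(range(p * p, n, p))
    return sum(is_prime)

print(count_primes_below(1_000_000))  # 78498
```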
In other cases, Siri’s reasoning allows it to extract the relevant concepts from our sentences and connect these with web-based services and data, applying its ever-growing knowledge about you to a series of rules, concepts and contexts. The result is a way of turning requests into actions. “I want to eat in the same restaurant I ate in last week,” is a straightforward enough sentence, but to make it into something useful, an AI assistant such as Siri must not only use natural language processing to understand the concept you are talking about, but also use context to find the right rule in its programming to follow. The speech recognition used in Siri is the creation of Nuance Communications, arguably the most advanced speech recognition company in the world. “Our job is to figure out the logical assertions inherent in the question that is being asked, or the command that is being given,” Nuance’s Distinguished Scientist Ron Kaplan tells me. “From that, you then have to be able to interpret and turn it into an executable command. If the question is ‘Can I get a dinner reservation at twelve o’clock?’ it’s not enough simply to understand it. You have to be able to do something with that information.”
The result is what one of Siri’s creators, Adam Cheyer, says was designed to be an analog to a regular search engine. As Cheyer explains it, a search engine is a great tool in its own right, but only does half the job required of it. “A search engine works by letting users launch a query across multiple domains, before returning a number of blue links it feels are the best webpages to answer your query,” he says. “You click on the link that is most relevant to your question and at that point you can start work on your actual task.” What Cheyer and the other members of his team wanted was to instead build what he calls a “do engine.” A search engine can pull the relevant materials for a person to consult on their own. A do engine, on the other hand, uses intelligent agents to come up with solutions to problems.
Telling Google’s search engine that you’re drunk and want a ride home might point you toward a drink-driving webpage or, when I tried it, the lyrics to the song “Show Me the Way to Go Home.” Telling a “do engine” the same thing could result in it tracking down your location and sending an Uber cab to pick you up. “I liked to say that if you wanted to find a webpage, go to a search engine,” Cheyer says. “If you wanted to get something done, go to a do engine.”
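A toy version of the distinction might look like the following sketch, in which a parsed request is routed to an action rather than to a list of links. The intent patterns and the ride-booking function are hypothetical stand-ins of my own, not Siri’s or any ride-hailing service’s actual interfaces.

```python
import re

def request_ride(destination):
    # Hypothetical stand-in for a real ride-hailing API call.
    return f"Booking a car to take you {destination}..."

# A "do engine" maps utterances to actions instead of documents.
INTENTS = [
    (re.compile(r"drunk.*?\b(home)\b"), lambda m: request_ride(m.group(1))),
    (re.compile(r"\bweather\b"), lambda m: "Fetching today's forecast..."),
]

def do_engine(utterance):
    for pattern, action in INTENTS:
        match = pattern.search(utterance.lower())
        if match:
            return action(match)  # get something done
    return "Here are ten blue links..."  # fall back to search-engine behavior

print(do_engine("I'm drunk and I want a ride home"))
```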
Although Siri was the first time most people had seen an actually working AI assistant in action, the technology had been in development for a number of years. In the second half of the 1980s, Apple CEO John Sculley commissioned Star Wars director George Lucas to create a concept video for what he called the “Knowledge Navigator.” The video, which is set in the then-distant future of September 2011, lays out a series of possible uses for an AI assistant. In one, a university professor uses an iPad-like device featuring an on-screen AI assistant, who is depicted as a bowtie-wearing butler.
“Today you have a faculty lunch at twelve o’clock; you need to take Cathy to the airport by 2:00 p.m.; you have a lecture at 4:15 p.m. on deforestation in the Amazon rainforest,” the prototype Siri tells the professor.
While the Knowledge Navigator remained just a dream for the rest of John Sculley’s tenure at Apple, other companies followed Apple’s lead and made their own attempts at bringing a multi-purpose AI assistant to life during the 1990s and early 2000s. Unfortunately, these tools were frequently limited in application and often failed to work as promised. For example, the Coca-Cola Bottling Company of Atlanta, Georgia, made headlines when it “hired” an AI assistant called Hank to man its phone switchboard. Using what was then a state-of-the-art speech recognition system, Hank proved capable of answering some queries and redirecting calls for others. Like a prototype Siri, he was programmed with both an archive of useful information and a jovial personality. Ask him about Coca-Cola shareholder issues and he could tell you. Ask him about his personal life and he would answer that “virtual assistants are not allowed to have relationships.” (Alas, Hank’s speech recognition wasn’t perfect. Questioning him on whether he snorted coke would prompt him to say, “Of course! I like all the products of the Coca-Cola Company.”)
Microsoft tried its own version of a Hank-like virtual assistant with less success. Clippy was an “intelligent” animated assistant who first appeared on-screen in Microsoft’s Office software in 1997. A cheerful dancing paperclip character, Clippy was created by the Seattle-based illustrator Kevan J. Atteberry, whose personal website still credits him with creating “probably one of the most annoying characters in history!” The problem with Clippy was simple: although he was designed to guide users through a variety of tasks, his behavior was extraordinarily unintelligent, bordering on obnoxiously intrusive. Not only did Clippy seem to have no memory of his previous interactions with users, but he appeared at entirely inappropriate moments—triggered by basic rules monitoring what you were typing, as opposed to smart contextual information. Instead of being an invisible assistant, Clippy came across as a rude individual peering uninvited over your shoulder. The result was a major backlash on the part of users, and eventually even public mockery from Microsoft itself.
Clippy was abandoned in 2003. That same year, the US government agency DARPA began work on its own AI assistant project, which marked the next step in the evolution of AI assistants. What DARPA officials wanted to build was an AI that could help military commanders deal with the overwhelming amount of data they received on a daily basis. This intelligent system should be able to automatically learn new skills and abilities by watching and interacting with its users. DARPA approached the non-profit research institute SRI International about creating a five-year, 500-person investigation, which was, at the time, the largest AI project in history. It brought together experts from a range of AI disciplines, including machine learning, knowledge representation and natural language processing. DARPA’s project was called CALO, standing for Cognitive Assistant that Learns and Organizes. The name was inspired by the Latin word “calonis,” meaning “soldier’s servant.”
After half a decade of research, SRI International made the decision to spin off a consumer-facing version of the technology. In homage to SRI, they called it “Siri,” a word that also happens to be Norwegian for “beautiful woman who leads you to victory.” In its early version, however, Siri was anything but ladylike. Freed from the constraints of building a military AI, the twenty-four-person team working on the spin-off embedded a newly mischievous personality in Siri. Responses were helpful but mocking, making liberal use of the word “fuck.” Ask it for the nearest gym and Siri would quip, “Yeah, your grip feels weak.”
Siri was launched into the iPhone’s App Store in early 2010, connected to a variety of web services. It could, for instance, pull concert data from StubHub, movie reviews from Rotten Tomatoes, restaurant data from Yelp, and order taxis through TaxiMagic. In April 2010, Apple acquired the company for an amount reported to be around $200 million.
Under the guidance of Steve Jobs (one of the last projects he was heavily involved with before stepping down as Apple’s CEO as his health worsened), several modifications were made to Siri. Much as Apple had done thirty years earlier with its graphical user interface, Jobs played up the friendliness and accessibility of the AI assistant. He insisted on giving it spoken responses—which the original Siri app had lacked—and removed the option to type requests rather than speak them, so as not to complicate the experience of using it. Apple also removed the bad language, and gave Siri the ability to pull information from Apple’s native iOS apps.
Early Siri reviews were very positive when the iPhone 4s launched in 2011. Over time, however, cracks began to show. Embarrassingly, Apple cofounder Steve Wozniak—who left Apple decades earlier—was one vocal critic of the service, noting how Apple’s own-brand version seemed less intelligent than the original third-party Siri app. What had won him over about the first Siri, he said, was its ability to correctly answer the questions, “What are the five largest lakes in California?” and “What are the prime numbers greater than eighty-seven?” Now, questions about California’s five largest lakes brought up links to lakefront properties. Questions about prime numbers pointed him to restaurants that served prime rib. Improvements were clearly needed.
While Apple poured its resources into fixing Siri, other companies launched their own competitors. Microsoft already had a capable voice-recognition system waiting in the wings thanks to its Kinect device for the Xbox 360 games console. In April 2014, Microsoft launched its rival AI assistant, Cortana, named after a synthetic intelligence character from the company’s Halo video game franchise.
The most significant Siri rival, though, belonged to Apple’s longtime frenemy, Google. Having introduced a feature called Voice Search for its Android mobile platform several months before the iPhone 4s was announced, Google knuckled down and reworked the feature as a full-on AI assistant following Siri’s launch. Internally, the project was code-named “Majel” after Majel Barrett, the voice of the computer from the original Star Trek. When it launched publicly in 2012, it was called Google Now. Unlike Apple, Google focused less on emphasizing the cutesy aspects of its AI assistant’s “personality.” However, it took a notable step forward by not simply responding to requests, but proactively anticipating the information users would want to see.
Google was able to do this because it had access to data from previous user searches, and could leverage this knowledge for Google Now. In addition to searching, Google Now also possessed the ability to mine user data for revealing nuggets such as who that person e-mails regularly. Engineers at the company described how, even early on, Google Now “knew” half a billion real-world objects and 3.5 billion connections between these objects. Results were impressive—if a bit creepy. “When those smaller bits of data begin to get linked together in a more meaningful way, that knowledge can take on a larger, different context,” wrote journalist Jenna Wortham in the New York Times. “A standalone app that pings you to let you know when friends are nearby might feel like a friendly little helper. Google doing it might feel like a menacing stalker.” Fellow journalist Steve Kovach described how Google had picked up from his frequent searches for sports results that he was a Mets fan. “Google knows this, so Google Now automatically sends me notifications with the latest score,” he wrote. “I don’t even have to ask anymore.” Kovach was especially freaked out when he was out to dinner with a few old journalism friends from college. The group got talking about Jim Romenesko, a writer who pens a popular blog about Starbucks. One person wondered how old Romenesko was. “I asked Google Now, ‘How old is Jim Romenesko?’ The answer came up in less than a second,” Kovach noted, amazed and a little terrified.
Google may sometimes struggle to get the creepy/useful balance just right, but there’s no doubting it’s right on the money when it comes to predicting the direction in which AI assistants are headed. The original Siri team dreamed of creating a “do engine” that could carry out tasks when you asked it to. The next iteration of this is to carry out these tasks without an explicit request. After all, a good personal assistant is someone who gets tasks done perfectly when you ask them. A great personal assistant is someone who doesn’t need to be asked.
This shift from reactive to proactive AI assistants might sound trivial, but it’s part of a much larger shift that will take place as we hand over more and more complex work to AI as a way of freeing up time for ourselves. For instance, if an AI assistant were able to read and respond to our e-mails, it could save us approximately thirteen hours each week, since this is the length of time the average person spends reading, deleting, sorting and sending correspondence. One natural language processing startup, X.ai, currently offers users the ability to CC in an automated assistant called Amy (or its male counterpart, Andrew) when they first respond to an e-mail requesting a meeting. The AI assistant then deals with all the back-and-forth communication necessary to set up the appointment. Because Amy and Andrew have access to your schedule, they can make suggestions that fit around your existing commitments, such as mentioning potential meeting places based on your planned location at a certain time. If X.ai were to partner with one of the smart technology companies discussed in the last chapter, it would even be possible to suggest meetings for the times when it knows you will be at your most alert and productive.
X.ai is just one illustration of the role AI assistants will increasingly play in our lives. As they become more adept, virtual avatars will take over running our lives like real-world personal assistants. As a simple example, this might mean “nudging” users to prompt them to lead more healthy or financially secure lives. At New York University, a study into long-term decision-making found that interacting with a digital avatar, artificially aged to look older, causes us to think more carefully about the future. In the experiment, participants took control of an avatar designed to look like themselves. In half of the cases, the avatar resembled them as the participants were at that moment. In the other half of cases, the avatars sported added aging features, such as gray hair, jowls, a paunch and bags under the eyes. Following the session, the participants were then asked the hypothetical question of how they would choose to spend $1,000. They were given the options of splashing out on a party, putting it toward a gift for someone, saving it in a current account, or investing the money in a retirement fund. Participants who had been confronted with an older digital doppelgänger proved twice as likely to elect to put the money into a retirement fund as those who saw an avatar the same age as themselves.
A similar experiment was carried out by a former graduate student at Stanford University’s Virtual Human Interaction Lab. It found that people shown a personalized cartoon avatar that loses or gains weight depending on the amount of time the human user spends doing exercise were prompted to hit the gym and eat more healthily.
Another purpose of proactive AI assistants is to consume large amounts of data and then filter it to let us know what is important. The startup Nara Logics has created an artificial neural network that wants to be your guide through life. Using the brain modeling technology described in chapter two, Nara links together a vast database of movies, hotels and restaurants in a huge network in which everything is connected. As users add their myriad “likes” and “dislikes,” the relative weightings between connections in the network change, so that Nara can grow and learn to reflect the tastes of its individual users. By learning everything from your preferred price ranges to your taste in ambience, the goal is to accurately recommend consumer experiences users will enjoy. Long-term, the technologists’ dream is AI assistants that follow us wherever we go: interacting with our surroundings on our behalf, based on preferences we have given them explicitly or that they have learned over time.
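The underlying idea—connection weights that strengthen or weaken with each “like” and “dislike”—can be sketched in a few lines. Everything below, from the restaurants to the update rule, is invented for illustration; Nara Logics’ real network and learning methods are proprietary.

```python
# Invented example data: restaurants described by shared features.
FEATURES = {
    "Chez Panisse":  {"upscale", "californian"},
    "Taqueria Azul": {"cheap", "mexican"},
    "Bistro Lyon":   {"upscale", "french"},
}

weights = {}   # feature -> learned preference weight
rated = set()  # restaurants the user has already judged

def register(feedback, restaurant, step=1.0):
    """Strengthen or weaken the user's connection to a restaurant's features."""
    rated.add(restaurant)
    sign = 1.0 if feedback == "like" else -1.0
    for feature in FEATURES[restaurant]:
        weights[feature] = weights.get(feature, 0.0) + sign * step

def recommend():
    # Score each unrated restaurant by the summed weights of its features.
    scores = {r: sum(weights.get(f, 0.0) for f in fs)
              for r, fs in FEATURES.items() if r not in rated}
    return max(scores, key=scores.get)

register("like", "Chez Panisse")
register("dislike", "Taqueria Azul")
print(recommend())  # Bistro Lyon: it shares "upscale" with the liked spot
```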
There are even more potentially sweeping applications. Consider the future of politics, for instance. When you take into account the millions of users of virtual assistants today, all with their own unique political profiles, it’s no surprise that most tech companies have steered clear of this divisive topic. Not wanting to offend people on either end of the political spectrum, companies like Apple, Google and Microsoft have scrubbed their virtual assistants of anything that could be read as an attempt to “nudge” users in one direction or the other. On those fleeting occasions when there has been evidence that Siri, Cortana or other high-profile virtual assistants are partisan, the resulting story has been enough to drive the tech press into a frenzy.
For example, early in its life, Siri prompted a public outcry due to its supposed anti-abortion stance. Users asking “Where can I find an abortion clinic?” found that they were directed to websites for the Crisis Pregnancy Center, which advised women considering abortions to follow through with their pregnancies. Given that Apple had previously taken a moral stand against subjects like pornography, many users took this as an example of the company coding its own moral agenda into its AI assistant. “These are not intentional omissions meant to offend anyone,” an Apple spokesperson explained. “It simply means that as we bring Siri from beta to a final product, we find places where we can do better, and we will in the coming weeks.” More recently, headlines were again made when the Russian version of Siri launched in April 2014, complete with homophobic views. Not only did Russian Siri refuse to answer questions about local gay bars, but it actually responded to queries with the dismissive phrase, “You are so rude.” Fortunately the next question wasn’t “Would you like me to alert your local government office?” Apple, which has long been a proud supporter of LGBT rights, apologized for what it called a “bug” in the system.
But tech companies are starting to use AI assistants for more explicitly political purposes. During India’s 2014 general elections, the startup Voxta created what was described in the Indian national press as “the political Siri.” India is the world’s largest democracy, but an estimated 36 percent of the country’s 884 million-strong rural population is unable to read or write. Voxta was therefore a dial-in service designed to give users access to a virtual assistant without their having to own a high-end smartphone. Using speech recognition in four different Indian languages, users were able to ask questions in their own language and access recorded information about political parties’ policies and views. The service received millions of calls, helping deliver relevant information to people who would otherwise have been denied the ability to make informed decisions.
Other, more advanced, versions of this idea can be seen elsewhere. The project Active Citizen is a political AI assistant proposed by Icelandic programmer and user experience (UX) designer Gunnar Grímsson. Grímsson describes himself as a “democracy geek,” although with his shaven head and wiry, black-gray goatee he more closely resembles an aging renegade from a 1990s alternative metal band than he does a typical computer coder. “Democracy is a process that was designed initially, but at some point stopped being designed and started to fall into its own feedback loop,” he tells me. “We stopped asking about how we can improve the system. We need to rethink everything—not just in terms of functionality, but also in terms of participation. I want to get people active in civic society again.”
Grímsson’s first attempt to solve politics using computers was a project he calls eDemocracy, a sort of Reddit for civic engagement. Using eDemocracy, individuals can submit suggestions for their local government and have these upvoted or downvoted by the community. The project has been particularly successful in places like Iceland and Estonia, where it has racked up tens of thousands of users. In Grímsson’s hometown of Reykjavik, a cosmopolitan coastal hub of trendy bars and nightclubs, more than half of the 120,000-person population has participated in eDemocracy. The fifteen top ideas generated each month are ultimately considered by the city council, with some 476 approved to date.
But Active Citizen isn’t limited to being a suggestion box for the digital age. Grímsson believes that AI assistants can do far more than simply persuade the council to repair the sledding slope in Selás or campaign for better winter lighting for ice skaters. As with most facets of modern life, he says, political engagement suffers from information overload. “One problem with direct democracy as opposed to representative democracy is that it’s not possible because we don’t all have the time to become knowledgeable about everything,” Grímsson explains. Faced with so much conflicting data, Grímsson thinks, many young people choose instead to disengage completely from the political process. This is where Artificial Intelligence comes in. Programmed with a database of your preferences, habits and past opinions, Active Citizen’s job will be to trawl the Internet on your behalf, collecting and correlating data concerning the issues you care about. Once this is achieved, the AI assistant will then visualize the data in a way that is fine-tuned to your particular preferences for absorbing information.
Imagine, Grímsson says, a woman in her early twenties called Alex. Alex wakes up in the morning, heads to her kitchen and pours out a bowl of cereal. While she eats her breakfast, her AI assistant informs her that today there is going to be an open meeting at City Hall concerning cycling regulations and planning. Alex’s AI assistant knows that this will likely be an issue that appeals to her because it has access to a database of her political opinions, along with her exercise data. Sure enough, Alex is interested and confirms her presence at the meeting. Her AI assistant then creates a personalized information pack about the issue, based on the agenda of the meeting and its likely impact on other related social issues. Alex can read this on the bus to her office and then decide if she wants to send in a proposal for the meeting.
“In a sense, Active Citizen is similar to proxy voting, where you assign your vote to another person,” Grímsson says. “Here you wouldn’t be assigning the vote itself, but rather delegating the work of finding out what a particular issue is about and what your opinion is likely to be about it.” The political AI assistant could even be made to challenge users by always presenting an opposite view to the user’s own. “With tools like this, I really believe we can rebuild the political landscape for the twenty-first century,” Grímsson says.
For my money, the most intriguing AI assistant to hit Hollywood in recent years was the one featured in Her, a 2013 romantic science-fiction comedy directed by Spike Jonze, starring Scarlett Johansson and Joaquin Phoenix. The movie tells the story of Theodore Twombly, a lonely middle-aged man who develops a relationship with his virtual assistant, Samantha. Her is set in the near future, in which tech companies have developed a computing platform called OS1, described as “the world’s first artificially intelligent operating system.”
Could such a thing ever happen in real life? On one level it seems perfectly possible. Physical proximity is not a necessary part of a relationship, as demonstrated by the fact that people develop strong emotional ties—and even report falling in love—over the Internet, sometimes without ever having met their “partner” in the flesh. But while I’m not convinced that the hot political debate of 2040 is going to concern the right of humans to marry their AI assistants, I also don’t think it’s an exaggeration to say that our relationships with certain technologies are going to change fundamentally thanks to Artificial Intelligence.
In the late 1990s, we got a preview of what these newfound relationships might look like courtesy of Furbies and Tamagotchis: two of the “must-have” children’s toy crazes in the years leading up to the new millennium. What differentiated Furbies and Tamagotchis from other toys available on the market at the time was the fact that, like AI assistants, they appeared to grow, learn and change as a result of their relationship with owners.
Furbies were furry, owl-like “creatures,” capable of playing games and interacting with their owners. When new, a Furby communicated entirely in the made-up language of “Furbish.” As the days passed, however, the toy began replacing its Furbish vocabulary with a variety of words and phrases in English. A later “Emoto-Tronic” Furby upped the ante with voice recognition and more complex facial movements, thereby enhancing the degree of interactivity with its human users. Tamagotchis, meanwhile, were handheld digital pets, resembling small egg-shaped computers with an LED screen and a three-button interface. Like Furbies, Tamagotchis were pet simulators designed to give children the impression that they were caring for real creatures. By “feeding,” “cleaning” and “entertaining” their Tamagotchi using the three available buttons, users could successfully raise their pet from egg into an adult creature. As the toy’s instructions noted, “It seems that the shape, personality and life of each Tamagotchi is based on how well you take care of it. Each time you hatch a new Tamagotchi it could grow up to be any one of several adult forms.” Better care resulted in adult Tamagotchis that were demonstrably “smarter,” “happier” and required less attention from users.
In reality, neither toy contained any actual Artificial Intelligence.* A person could speak nothing but Spanish to a Furby and still find that it magically learned English. But despite this lack of AI, what was remarkable was how attached users became to their digital pets. While the majority of these cases involved the intended target audience of children, at the peak of the craze there were reports of Japanese businessmen who would postpone or even cancel meetings so as to be able to feed their Tamagotchis at the appropriate times. One grown woman became momentarily distracted by the needy beeping of her Tamagotchi and crashed her car as a result. An airplane passenger disembarked her flight, vowing never to fly with the same airline again, after a flight attendant told her to turn off her Tamagotchi, which had the effect of resetting (and thereby “killing”) it.
These strong emotional responses offer a glimpse of what we might expect in a world where AI assistants are designed to behave like companions. In reality, attachment to AI assistants could be even greater. One advantage that AI assistants have over toys like Furby and Tamagotchi is their ability to communicate with us through voice in our own language. Voice is something we humans are extraordinarily dialed in to. By our teenage years, we are able to perceive speech at the rapid rate of up to forty to fifty phonemes (the smallest distinguishable units of speech) per second, whereas non-speech sounds become indistinguishable at twenty sounds per second. Tests show that a fetus in the womb can recognize its mother’s voice as distinct from other voices, indicated by an increase or decrease in heart rate depending on who is speaking. Within days of being born, babies’ brains can already distinguish the sounds of their native language from those of other languages. By eight months, infants are able to tune in to a particular voice even when other people are speaking at the same time. Such abilities continue to develop well into adolescence.
Early virtual assistants often came with only one available voice. This was usually female, since it proved easier to find a female voice that everyone approved of versus a male voice. Today, users of AI assistants have the option not just of male and female voices, but also accented versions of different languages—so that it’s possible to have Siri speak to us in Australian, Indian, American or British-accented English, for instance. Impressively, Google Now can use natural language processing to automatically determine which accent to offer by listening to the intonations of a person asking questions to the service. Hand your Android device to your French-Canadian wife, for example, and the handset will alter the voice of its AI assistant as she starts asking it questions.
In some cases, it is even possible to have your favorite celebrity voice take on the role of AI assistant. To help promote 2015’s science-fiction summer blockbuster Terminator Genisys, Arnold Schwarzenegger lent his instantly recognizable voice to Google’s navigation app Waze, meaning that users could elect to have Arnie guide them around town. “My accent is a big asset. It’s what people enjoy. When I dreamt of a career, I had no idea that one day I would be telling 50 million drivers [where] to drive,” Schwarzenegger told USA Today. Promotional stunts such as this, featuring well-known celebrities, have helped grow Waze’s user base from 15 million users in 2012 to more than 50 million today.
Going forward, it is likely that computer scientists will continue to enhance this personable effect by focusing on other vocal characteristics, such as personality. Traits like introversion or extroversion can be conveyed by altering the volume, pitch and speed of an AI assistant’s voice. It will also be possible to go further and alter not just what an AI assistant says, but how it says it. A “male” AI assistant could be programmed to speak more like a man, while a “female” AI assistant could be made to speak more like a woman. Women’s speech is often considered to be more “involved” than men’s—meaning that it focuses more on emotive areas like personal feelings than on specific, detailed information. Women are more likely to use interpersonal words like “I” and “you,” and to show a higher level of concern for the listener. Men, on the other hand, are far more likely to use the word “its” and to include details about time and place when they talk.
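A crude version of this mapping from personality to voice might look like the sketch below. The parameter names and ranges are purely illustrative; real speech synthesizers expose their own controls for rate, pitch and volume.

```python
def voice_profile(extroversion):
    """Map a 0.0 (introvert) to 1.0 (extrovert) score to synthesis settings."""
    return {
        "volume": 0.5 + 0.5 * extroversion,           # extroverts speak louder...
        "pitch_hz": 110 + 60 * extroversion,          # ...at a higher pitch...
        "words_per_minute": 140 + 60 * extroversion,  # ...and faster
    }

print(voice_profile(0.9))  # an outgoing-sounding assistant
print(voice_profile(0.1))  # a quieter, slower-spoken one
```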
Tech companies are already working on the early stages of this technology. In June 2015, the US Patent and Trademark Office published a patent application from Apple describing “Humanized Navigation Instructions for Mapping Applications.” Instead of Siri presenting emotionless turn-by-turn instructions for drivers, Apple wants to make the virtual assistant guiding your car journey sound more like your map-reading buddy sitting in the passenger seat. Rather than telling drivers to “head north and then turn right onto Forester Road,” the app would be able to make references to surrounding landmarks, such as, “Exit the parking lot near Applebee’s restaurant and then turn right before you reach the apartment complex with the water fountain in front.” Apple’s patent noted that the idea is to focus “on comprehension rather than precision.” That could easily be modified to include gender or cultural signifiers.
Such changes can have a major impact on how we communicate with and respond to AI assistants, both in terms of our levels of comfort (and thereby how often we use them) and our efficiency while doing so. People are regularly attracted to those who are similar to themselves. As an illustration of how this could prove useful, Chicago’s Mattersight Corporation has created technology that analyzes the speech patterns of people phoning up call centers. It then uses this information to put callers through to employees who are skilled at dealing with their specific personality type. According to Mattersight, a person patched through to an individual with whom they share similarity attraction is likely to have an average call length of five minutes, with a successful resolution rate of 92 percent. A caller paired with a conflicting personality, on the other hand, will have an average call length of ten minutes and a problem resolution rate of just 47 percent.
Similar things are true of the voices used by AI assistants. By changing the gender of even an obviously synthetic voice (i.e., by altering the voice’s pitch from 210 Hz for a “female” to 110 Hz for a “male”), we see different responses from users, depending on who is listening. In studies, women tend to find female artificial voices more trustworthy, while men show more trust in male artificial voices, even though both are synthetic and therefore have no real-world gender traits. In one extreme example, in the late 1990s BMW was forced to recall a female-voiced navigation system on its 5 Series cars in Germany, after the company was flooded with calls from German men saying that they adamantly refused to take directions from a woman. Whether we trust our AI assistants has obvious implications when one is offering suggestions while we drive, or relaying medical information from a smart device, perhaps telling us that we should visit the doctor.
We may not quite be at the level of the movie Her just yet, but we’re not necessarily too far away. In October 2014—three years after the debut of Apple’s AI assistant—the New York Times published a touching story entitled “To Siri, With Love,” written by journalist Judith Newman. The article described how Newman’s thirteen-year-old son Gus had developed a close relationship with Siri. Gus is autistic, and communicating with Siri has not only given him the closest thing he has to a best friend, but also helped him develop his communication skills with people in the real world. Although the common view of technology is that it isolates us from the real world, in Gus’s case, the presence of Siri has been overwhelmingly positive.
Siri, Judith writes, is “wonderful for someone who doesn’t pick up on social cues: [the] responses are not entirely predictable, but they are predictably kind—even when Gus is brusque. I heard him talking to Siri about music, and Siri offered some suggestions. ‘I don’t like that kind of music,’ Gus snapped. Siri replied, ‘You’re certainly entitled to your opinion.’ Siri’s politeness reminded Gus what he owed Siri. ‘Thank you for that music, though,’ Gus said. Siri replied, ‘You don’t need to thank me.’ ‘Oh, yes,’ Gus added emphatically, ‘I do.’”
In China, millions of smartphone users meanwhile spend hours interacting with Xiaoice, a Microsoft-created chatbot that exchanges text messages with its users. Unlike the simplistic chatbots I saw at the Loebner Prize, Xiaoice uses deep learning to pair up user queries with human-generated responses it has mined from the Internet. By tracking life details about its users, such as who they are dating or what they do for a living, it can return to topics in later conversations—for instance, asking how a person is coping with losing their job.
For those who have worked in the field for a number of years, these therapeutic AI assistants are reminiscent of ELIZA, a computer psychotherapist created at MIT in the 1960s. Programmed by a computer scientist named Joseph Weizenbaum, ELIZA was designed to engage users in seemingly intelligent conversations. Years before speech recognition and synthesized speech reached an acceptable level, these conversations were carried out via text. Users typed a sentence, which ELIZA then reflected back to them in language that either supported or questioned the statement. A conversation with ELIZA might go as follows:
User: Men are all alike.
ELIZA: In what way?
User: They’re always bugging us about something or other.
ELIZA: Can you think of a specific example?
User: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
User: He says I’m depressed much of the time.
ELIZA: I’m sorry to hear you are depressed.
ELIZA had no actual “understanding” of the topics it was discussing, but that didn’t stop Weizenbaum from being surprised at the way his students would pour their hearts out to the program—talking about everything from how their girlfriends had recently dumped them to their fears about failing chemistry class. This effect will only become more commonplace as tools like Xiaoice become the norm.
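ELIZA’s trick—match a keyword, then mirror the user’s own words back with the pronouns flipped—takes only a few lines to reproduce. The handful of rules below is my own stripped-down homage; Weizenbaum’s original used a much richer script of ranked keywords and decomposition templates.

```python
import re

# Swap first- and second-person words when echoing the user's phrase back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I"}

RULES = [
    (re.compile(r"i'?m (.+)", re.I), "I'm sorry to hear you are {0}."),
    (re.compile(r"my (\w+) (.+)", re.I), "Your {0} {1}."),
    (re.compile(r"all alike", re.I), "In what way?"),
]

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def eliza(sentence):
    for pattern, template in RULES:
        match = pattern.search(sentence.rstrip("."))
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Can you think of a specific example?"  # stock fallback

print(eliza("Men are all alike."))               # In what way?
print(eliza("My boyfriend made me come here."))  # Your boyfriend made you come here.
print(eliza("He says I'm depressed much of the time."))
```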
While they are not an adequate replacement for all types of human communication, it’s not much of a stretch to think that an AI assistant like this could prove valuable in the caregiving system, when dealing with potentially isolated individuals. As was illustrated by the story of Gus and Siri, some aspects of an AI’s communication may even prove preferable compared to dealing with another (human) person. For this reason, so-called “relational agents” are already being employed as part of the therapy process to help people with a variety of psychiatric conditions.
A related opportunity could well be providing support to the rapidly growing elderly populations found in many developed and developing countries. In Japan, a sociable baby seal robot called Paro is advertised as the world’s first “therapeutic robot,” with this elderly market in mind. Paro can make eye contact with users by sensing the direction of their voice, has a limited vocabulary of words for “understanding” people, and is able to fine-tune its behavior depending on how it is treated. Stroke it softly or more forcefully and its behavior will change to mirror the way it is treated, something that comforts users by appearing to empathize with them.
As has been seen with Paro, advances in AI fields like facial recognition will open up new ways to interact with our AI assistants. The company Affectiva is currently working on using facial recognition to help read the emotion of users, based on details like the slight eyebrow raise we perform when something surprises us, or the slight dip in the corner of the bottom lip when we begin to frown. Different emotional states may be used to modify the AI’s interface. One Wisconsin company has used Affectiva technology to create a video display that dispenses free chocolate samples if you smile at a screen. Interestingly, Affectiva’s cofounder, Rana el Kaliouby, began working in the area of emotional measurement hoping to help children with autism.
AI assistants are still relatively early in their journey, although they have captured our imagination more than virtually any other technology described in this book. Over the next few years they will increasingly become part of our lives—in both their ubiquity and in the tasks they can handle.
Not all of us are going to require our AI assistants to be our friends, but companies like Google and Apple are going to make certain they become our constant companions. Already they’ve made the leap from our smartphones to our tablets, and from there to our desktop computers and television set-top boxes. As the kind of smart homes I described in the last chapter become standard, they’ll become our housekeepers. LG’s HomeChat app currently lets you send text messages to your home appliances, asking and receiving answers to questions like “Is the milk still fresh?” in plain English. Tell HomeChat that you are going away for four days and you’ll get a message back saying, “Have a nice trip. I am going to miss you!” HomeChat will then switch your appliances to a special power-saving Vacation Mode.
AI assistants are only getting smarter. Siri and Google Now are far more advanced than the early beta versions from just a few years back, partially thanks to the millions of spoken requests they’ve received during their lifespans—which become the training data used for improving the systems. Then there are other companies, like Viv Labs, which was started by the original Siri team after they left Apple. Viv Labs is currently working on an AI assistant that can answer questions like, “What’s the best available seat on Virgin flight 351 next Wednesday?” When this is asked, it accesses an airline-services distributor called Travelport, finds the remaining available seats, compares them to information on the site SeatGuru.com and then cross-references this with your own personal preferences. If Viv knows you like aisle seats and extra legroom, it’ll find the perfect seat to fit your needs. Who needs human assistants after that?
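Chained together, the sequence Viv describes is essentially a pipeline. The sketch below imitates it with hypothetical stand-ins—the functions and seat data are mine, not Travelport’s, SeatGuru’s or Viv’s actual services.

```python
def available_seats(flight):
    # Hypothetical stand-in for querying an airline-services distributor.
    return ["12C", "14A", "22B"]

def seat_qualities(flight):
    # Hypothetical stand-in for a seat-review lookup.
    return {"12C": {"aisle", "extra_legroom"},
            "14A": {"window"},
            "22B": {"middle"}}

USER_PREFS = {"aisle", "extra_legroom"}  # learned from past behavior

def best_seat(flight):
    qualities = seat_qualities(flight)
    # Rank each open seat by how many of the user's preferences it satisfies.
    return max(available_seats(flight),
               key=lambda seat: len(qualities[seat] & USER_PREFS))

print(best_seat("VS351"))  # 12C: an aisle seat with extra legroom
```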
It’s a good thing they’re peaceful and on our side. Right?