Building Human-Machine Partnerships
MY WIFE AND I RECENTLY WALKED OUR SON AROUND THE neighborhood and, to our surprise, found ourselves locked out when we returned. We called a locksmith company and they sent over a sixteen-year-old kid to help us. As the locksmith fiddled with our back door, I admired his surgical precision and pondered the future of his profession.
According to Frey and Osborne’s 2013 analysis, locksmiths have a 77 percent likelihood of losing their jobs to automation. Already, a company called KeyMe has automated a primary function of locksmiths. Convenience stores like 7-Eleven house KeyMe kiosks where customers can pay $3.50 to scan the key that they want to copy. KeyMe then stores the digital file of each key on its online database, so that if customers find themselves locked out, they can log in to a kiosk using a fingerprint scan and pay another $20 to get a new key cut immediately, obviating the need to destroy their locks. Several roboticists have also developed robots capable of picking or breaking locks in a matter of seconds. These advancements suggest that human locksmiths’ days are numbered.
However, in talking with the locksmith, one aspect of his job stood out to me as a particularly difficult task to automate. The first thing he asked me before tampering with the lock was whether we had a window open (it was a chilly day, and we did not). He explained to me that every time he arrives at a job, he first asks residents if they have an open window, and often the answer is yes. Residents far prefer scaling their houses or climbing a tree and squeezing through a window to breaking their locks to get inside. This insight, I believe, gives the human locksmith an edge over a robot, as I know of no robot primarily tasked with disregarding the job it was programmed to do in order to develop a solution outside of its expertise.
“Aha!” an automation evangelist might respond. “Given machines’ vast capability to learn from human performance, the automated locksmith would certainly come to learn about the benefits of open windows, and would add this to its repertoire of questions to ask before beginning the lock-breaking process!” This response is plausible, but it also reveals precisely why humans will likely stave off automation for longer than predictive models suggest. Machines will almost always be catching up with and learning from humans rather than the other way around. That is, machines need humans.
Although countless articles have expressed both anxiety and excitement over how humans and machines might collaborate in the automated future, there is little clarity over how these bonds will actually form. Here we explore this issue in greater depth.
As work becomes more mechanized, several possible solutions to this societal shift have emerged. All proposals are on the table, from a universal basic income that would support workers who lose their jobs to automation, to Bill Gates’s proposed “automation tax” on anyone who develops a robot that replaces human workers. The most optimistic vision of the automated future involves a division of labor whereby humans complete tasks for which they are uniquely suited while robots perform everything else. But what do these partnerships actually look like? Here I describe three prescriptive templates: (1) let robots and humans play to their respective moral strengths; (2) let robots handle the dull, rote, and mechanical work to give humans more interesting work; and (3) let robots reduce the emotional burdens that humans face in their jobs. We examine each one in turn.
In the context of decision making, people expect robots to be utilitarian entities that make choices based on cold, cost-benefit calculations. People expect humans, on the other hand, to follow “deontological” moral rules such as “do not actively discriminate against another person.”1 These expectations indeed reflect machines’ and humans’ unique moral strengths. Robots’ moral advantage is their “blindness” to context, which allows them to apply the same calculus to every case impartially. However, research shows humans prefer deontological decision makers, who consider subjective factors related to harm and injustice, to perfectly utilitarian decision makers.2
In work explicitly examining people’s aversion to machines making moral decisions, psychologists Yochanan Bigman and Kurt Gray found that people indeed preferred humans instead of machines to make moral decisions in medical, military, and legal contexts, yet people were receptive to machines in an advisory role.3 In one study asking participants whether a doctor, a computer system named Healthcomp (capable of rational, statistics-based thinking), or a doctor advised by Healthcomp should decide whether to pursue a risky surgery for a child, a majority of people favored the doctor advised by the machine. Thus, one method of optimizing human-machine partnerships is letting robots guide decision processes through intensive utilitarian calculations and letting humans ultimately correct for any moral violations those calculations might produce.
My colleague, sociologist Brian Uzzi, has written about this first prescription while discussing housing rental company Airbnb’s response to racial discrimination complaints.4 Airbnb’s issues have included hosts sending racist messages to potential renters and denying or canceling renters’ plans based on their race, as well as a report showing hosts were less likely to rent to individuals with stereotypically African American–sounding names. Uzzi describes how machine learning algorithms could scour user-generated text on the Airbnb website or on Twitter to identify “red flag” phrases that tend to predict discriminatory patterns of behavior from property hosts. Humans may not be able to identify the precise words that predict such behavior, and they definitely do not have the time to comb through countless terabytes of text-based data looking for red flag terms. This algorithmic search process, however, as Uzzi describes, is insufficient on its own to prevent large-scale discrimination. Such an effort would also require comparing the algorithm’s red flags against information gathered by vetting property hosts directly, as well as ensuring the algorithm is not developing its own biases (more on this below). The vetting process would involve asking hosts whether they have a history of prejudiced behavior and whether they favor certain races over others. Here, human discrimination and diversity experts would need to design surveys and other methodological tools to obtain information directly and accurately from these hosts. And this is where humans would have to decide on the moral rule by which we consider another person to be racially prejudiced or discriminatory. Machine learning can pinpoint hot-button words to identify people at risk of discriminating, but confirming these suspicions requires a human being asking people directly about their behavior and attitudes toward discrimination.
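For readers curious what the algorithmic half of this division of labor might look like, here is a minimal, hypothetical sketch: a simple text classifier that learns which phrases in host messages tend to precede confirmed discrimination. Everything in it—the sample messages, the labels, the model choice—is an illustrative assumption rather than Airbnb’s or Uzzi’s actual system, and the labels themselves would have to come from the human vetting process described above.

```python
# Hypothetical sketch: surface "red flag" phrases in host messages.
# The messages and labels below are invented placeholders; in practice the
# labels would come from human experts vetting hosts, as described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

messages = [
    "I prefer guests who are like the rest of the neighborhood",
    "Welcome! Check-in is any time after 3 pm",
]
confirmed_discrimination = [1, 0]  # verdicts from the human vetting process

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # words and two-word phrases
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(messages, confirmed_discrimination)

# Rank phrases by how strongly the model associates them with discrimination,
# so that human experts can review the flags -- the model only raises them.
phrases = model.named_steps["tfidf"].get_feature_names_out()
weights = model.named_steps["clf"].coef_[0]
flagged = sorted(zip(phrases, weights), key=lambda pair: -pair[1])
print(flagged[:10])
```

The design choice worth noticing is that the model’s output is a ranked list of flags for human review, not a verdict; the moral judgment stays with people.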
There exist several cases where machines left unchecked have produced more rather than less race-based and gender-based discrimination. For example, a Bloomberg News report found that Amazon’s data-driven approach to determining where to offer Amazon Prime (a service that offers same-day delivery on products) neglected majority-black neighborhoods in major American cities.5 Amazon responded by describing how it uses various data points (including distance to order fulfillment centers and number of existing Prime members in a region) to determine algorithmically which neighborhoods are Prime eligible (and this, Amazon claims, was why black neighborhoods were excluded).6 Yet consumers were not satisfied with this response, prompting human beings at Amazon to reverse course and admit that the algorithmic process was unfair. Once these racial disparities became apparent, Amazon agreed to provide Prime services to the appropriate neighborhoods in Boston, New York, and Chicago.
Communications scholar Safiya Umoja Noble directly examines how algorithms fortify inequality in her book Algorithms of Oppression: How Search Engines Reinforce Racism.7 Noble came to her topic after a Google search for the term “black girls” immediately returned results full of pornography featuring black women. Noble describes this discovery, writing, “Black girls were still the fodder of porn sites, dehumanizing them as commodities, as products and as objects of sexual gratification.” When Noble searched Google asking “why are black women so . . . ,” the autocompleted results were disparaging (e.g., “loud,” “lazy,” “rude”), whereas equivalent searches about white women were far more complimentary.
Noble describes several other examples of machine-learning-driven discrimination including Google’s photo-organizing service tagging images of black people as gorillas. Again, once Google identified this glitch, human engineers fixed it. Similarly, when computer scientists Joy Buolamwini and Timnit Gebru presented research showing that algorithmic facial recognition software from Microsoft and IBM struggled to recognize black women’s faces in particular, representatives from both companies responded that they would fix this issue.8
Machine discrimination occurs because computer programs can learn and reproduce gender- and race-based biases just as humans can. Research by computer scientist Aylin Caliskan illustrated this by using a machine learning program to scan massive amounts of text across the internet and learn how closely different words are associated (e.g., “dog” frequently occurs near “cat” but rarely alongside “macaroni”).9 Her research found that machines trained on human-generated data (text) develop similar racial and gender-based associations as do humans, with pernicious consequences.
Caliskan’s program associated African American names more with negative terms and European American names more with positive terms, a racial bias that commonly emerges in humans. The program also associated female-related words with family-related words and male-related words with career-related words. These biases are problematic because algorithms such as these form the basis of many online tools, including Google Search’s autocomplete function. As Noble found, searching phrases like “women are . . .” or “Chinese people are . . .” prompts Google to offer off-putting terms (e.g., “women are like children”) because Google’s algorithms learn through association. Caliskan also demonstrated this phenomenon by using Google Translate to translate the genderless Turkish phrase “bir doktor” into English. The software translated the phrase as “he is a doctor,” associating this professional term with males rather than females. I recently tried translating the same phrase, and it now translates appropriately to “a doctor,” suggesting, again, that human beings at Google identified and corrected the issue.
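For readers who want to see how such associations can be measured, here is a rough sketch in the spirit of these word-embedding association tests. It loads a publicly available GloVe model through the gensim library and compares average similarities; the model choice, word lists, and simple averaging are my illustrative simplifications, not Caliskan’s exact statistical test.

```python
# A simplified look at associations learned from large text corpora,
# in the spirit of word-embedding association tests. Model name and
# word lists are illustrative choices, not Caliskan's exact materials.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # embeddings trained on web text

career = ["career", "salary", "office", "professional"]
family = ["family", "home", "children", "parents"]

def mean_similarity(word, attributes):
    # Average cosine similarity between one word and a set of attribute words
    return float(sum(vectors.similarity(word, a) for a in attributes) / len(attributes))

for word in ["he", "she"]:
    print(word,
          "career:", round(mean_similarity(word, career), 3),
          "family:", round(mean_similarity(word, family), 3))
```

Because the embeddings are learned purely from which words co-occur in human-written text, any systematic gap between the two rows reflects associations the text itself contains.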
Microsoft also inadvertently demonstrated machines’ capacity for bias when it unleashed an artificial intelligence bot on Twitter to chat like a typical teenager in real time. The bot, named Tay or Taybot, mined publicly available text online to “learn” how to converse on Twitter, but within twenty-four hours of its release, it began spouting bigoted tweets such as “GAS THE KIKES RACE WAR NOW.” Again, it took humans to flag this inappropriate behavior and shut the entire program down. The partnerships I am describing here require humans to do the one thing they do far better than machines: decide the moral code we want to live by and teach machines accordingly. Because most of us have decided we want our moral code to exclude inequities based on race or gender, it is up to humans to identify when machine output appears to perpetuate these inequities and to correct course.
These examples of language-based biases are trivial when compared to the ways that machine learning has begun to influence more consequential decisions. Public interest website ProPublica published an exposé on how courtrooms have begun using algorithms to predict a criminal defendant’s risk of future crime.10 These programs calculate risk scores that humans then use to determine everything from assigning bail bond amounts to deciding whether or not to grant parole.
In contrast to the ProPublica article, computer scientist Jon Kleinberg and colleagues built a machine learning algorithm that can make better bail-granting decisions than human judges.11 Kleinberg’s research shows that an algorithm that learns from human judges’ prior decisions and their outcomes (in this case, five years of New York City arrest data, including whether defendants granted bail went on to reoffend) and accounts for various defendant characteristics (e.g., demographics, type of criminal charge, and prior criminal record) could reduce crime without increasing jailing rates, or reduce jailing rates without increasing crime. Implementing this particular algorithm would considerably reduce jailing rates of African American and Hispanic individuals.
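To see the tradeoff Kleinberg’s team evaluated, consider this toy sketch on invented data: a model learns a risk score from past cases, and moving the release threshold then trades a lower jail rate against a higher crime rate among those released. This is not Kleinberg’s model or data, only an illustration of the policy logic.

```python
# Toy sketch of risk-based release decisions on synthetic data: train a model
# on past cases, then vary the release threshold to see the jail-rate versus
# crime-rate tradeoff. All features, labels, and numbers here are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))                      # stand-ins for charge, record, etc.
y = (X @ [0.8, 0.5, 0.2, 0.1] + rng.normal(size=n)) > 1.0  # "reoffended" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
risk = model.predict_proba(X_te)[:, 1]           # predicted probability of reoffending

for threshold in (0.3, 0.5, 0.7):                # release anyone below this risk score
    released = risk < threshold
    print(f"threshold {threshold}: jail rate {1 - released.mean():.2f}, "
          f"crime rate among released {y_te[released].mean():.2f}")
```

Raising the threshold releases more people and jails fewer, at the cost of more crime among those released; the algorithm can map out that curve, but choosing where to sit on it is a human, moral decision.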
The ProPublica exposé on such an algorithm (not Kleinberg’s, but a tool called COMPAS—Correctional Offender Management Profiling for Alternative Sanctions) revealed something more pernicious: increased racial bias in criminal sentencing. The report showed, for example, that the sentencing formula that Broward County, Florida, used from 2013 to 2014 misidentified black defendants as future criminals almost twice as often as white defendants. Even after accounting for defendants’ criminal history and criminal charge, the formula was 77 percent more likely to flag black defendants as at high risk of committing a future violent crime.
COMPAS used 137 questions ranging from queries like “Are there gangs in your neighborhood?” to “About how many times have you been fired from a job?” to “Is it difficult for you to keep your mind on one thing for a long time?” The relationship between these items and the risk of reoffending is murky, but one thing is clear: it will certainly require human beings to help determine the types of situations and personality traits that best predict criminal behavior to ensure that these decisions are morally just.
Beyond humans and machines dividing up moral decision-making, a second way to structure human-machine partnerships is to let robots do dull, routine work with humans taking on tasks that involve variety and spontaneity. Henry Wang, a former venture capitalist who has worked on investments in several early-stage AI companies, characterized these differences to me by noting, “A robot’s mind cannot wander,” whereas “the data you capture as a human being is broader and more random.”12 Given this distinction, humans and robots could complement each other using their opposing expertise, with robots mastering scripts, rules, and the routine while humans focus on variety, anomaly, and improvisation.
One question is whether such partnerships can emerge if robots have begun fully displacing workers in the “routine” jobs examined in chapter 6. Historically, we can observe the example of the automated teller machine (ATM) and the bank teller. The machine performs the exceedingly rote job of counting and dispensing money whereas human bank tellers can handle more complex transactions while managing customers’ concerns about their personal finances or the security of the bank. Eric Schmidt, executive chairman of Alphabet, attempted to assuage worries over automation and unemployment by noting that introducing ATMs in fact increased employment of bank tellers. However, Schmidt failed to explain that this primarily resulted from ATMs enabling banks to lower costs and open more branches, thereby employing more tellers.13 Now that bank branches are declining (in part because ATMs are becoming more sophisticated and because many people manage their money through mobile banking), this trend is unlikely to continue.
A more promising domain where humans and machines could split more routine and less routine work is cybersecurity. Several organizations have begun using machine learning to identify security risks by sifting through endless data and using pattern recognition to flag potential threats. This task of identifying and storing patterns across millions of lines of data certainly qualifies as robotic work that few humans would enjoy. The problem is that these computerized systems produce too many false positives, tagging risks or attacks that turn out to be mundane behaviors. This is where humans come in—in particular, analysts who are capable of finding nuance and outliers in the data.
A study from MIT’s Computer Science and Artificial Intelligence Laboratory demonstrates how humans and machines can work together to optimize this task.14 The MIT team developed a platform that could detect 85 percent of cyberattacks and reduce false positives to 5 percent through the following procedure: the platform sifts through gobs of data to identify any suspicious activity. It then reports a sample of its findings to human analysts. The analysts review the sample to see whether anything the platform identified as a cyberattack is actually a false positive (something the platform flags as an attack may simply be unusual but harmless activity). Then the human analysts feed that information back into the platform, which uses it to perform its next search for cyberattacks. Through this iterative process, the platform learns and improves its performance. The researchers suggest that an otherwise unsupervised machine would produce closer to a 20 to 25 percent false positive rate—not very promising.15 Meanwhile, humans working alone would produce too many false negatives, missing critical security threats.
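Here is a minimal sketch of that loop under my own simplifying assumptions: an unsupervised detector surfaces its most suspicious events, analysts label a small sample, and a supervised model then learns from those labels for the next round. The function names and data are invented and only mirror the MIT platform in spirit.

```python
# Illustrative human-in-the-loop detection round; all names and data invented.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def ask_analyst(events_to_review):
    # Placeholder: in practice, human analysts label each flagged event
    # (1 = real attack, 0 = benign anomaly).
    return np.zeros(len(events_to_review), dtype=int)

def detection_round(events, labeled_events, labels, budget=20):
    # 1. Unsupervised pass: score every event for how anomalous it looks.
    detector = IsolationForest(random_state=0).fit(events)
    suspicion = -detector.score_samples(events)   # higher = more suspicious

    # 2. Send only the top-scoring events to the analysts (their "sample").
    to_review = np.argsort(suspicion)[-budget:]
    verdicts = ask_analyst(events[to_review])

    # 3. Fold the analysts' verdicts into the labeled training set...
    labeled_events = np.vstack([labeled_events, events[to_review]])
    labels = np.concatenate([labels, verdicts])

    # 4. ...and train a supervised filter for the next round of alerts.
    alert_filter = RandomForestClassifier(random_state=0).fit(labeled_events, labels)
    return alert_filter, labeled_events, labels

# Toy usage on random "event features" standing in for network logs.
rng = np.random.default_rng(0)
events = rng.normal(size=(1000, 5))
seed_events = rng.normal(size=(10, 5))
seed_labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
alert_filter, seed_events, seed_labels = detection_round(events, seed_events, seed_labels)
```

Each pass through the loop gives the machine more human-labeled examples, which is what drives the false positive rate down over time.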
Cybersecurity provides a beacon of hope for human-machine collaboration. This is because (a) cyberattacks have become more sophisticated and thus require more human input, and (b) the cybersecurity field currently faces a human talent shortage. A 2015 report by Cybersecurity Ventures estimated that 3.1 million cybersecurity job openings would be available by 2021.16 These trends suggest some optimism for humans and machines working together. As machines become capable of handling bigger and bigger data sets, we can let them do the dull, heavy lifting of scouring the web and matching patterns, while humans dig into what the patterns actually mean.
Another promising domain for this type of human-machine partnership is law. Law firms have begun employing technology that scans documents and aggregates data to determine which information is relevant for a particular case. This rote task of sifting through countless documents is the type of work that typically causes lawyers’ hours to balloon; these new technologies seem poised to save lawyers significant valuable time. For example, in a New York Times article, bankruptcy lawyer Luis Salazar describes using an artificial intelligence program to find cases that best match the one he is currently pursuing.17 When the program instantly accomplished the same task that had taken Salazar ten hours, he remarked, “It’s kind of scary. If it gets better, a lot of people could lose their jobs.” However, machines seem unlikely to master the art of persuasive argumentation and negotiation that is the core of human legal work. At least not yet.
Beyond partitioning principles of moral decision making or splitting up routine and nonroutine work, a third possible work arrangement between humans and machines is to divide emotional labor. Given that tasks involving empathy and emotion regulation can be depleting, offloading some of this work onto robots could restore our wherewithal to perform the emotional labor that remains. In an essay titled “Let Robots Handle Your Emotional Burnout at Work,” science writer Meeri Kim suggests robots could mitigate the negative consequences of emotional labor, including burnout and compassion fatigue.18 This recommendation is counterintuitive considering research I have already described showing people dislike the idea of robots performing emotional tasks. However, designing robots that convey the proper emotional cues can mitigate this aversion (a topic to which I return below).
Kim suggests that jobs requiring social interaction are the most emotionally burdensome, noting, “Professions that require emotional labor, which involves inducing or suppressing emotion for the sake of a job, continue to see unprecedented levels of attrition, especially among customer service representatives, flight attendants, doctors, nurses, school teachers, and hotel employees.” She also describes Pepper, a baby-faced humanoid customer service robot that works in some of these emotion-laden domains, namely retail stores and hospitals. Currently, Pepper provides only limited responses and primarily communicates through text message. This rudimentary functionality enables Pepper to answer small queries, which is helpful for customers who are double-parked and just need to know which store aisle contains shampoo. Once Pepper has fielded these basic queries, human employees can offer more customized and direct responses as to which shampoo is best for dandruff or whether a particular brand is on sale.
Some successful examples of how to divide this emotional labor come from phone-based customer service software. Customers despise the gauntlet of questions they must answer before actually talking to a human being. A Harvard Business Review article by Accenture consultants discusses how technology can alleviate this experience.19 For example, they describe a Canadian financial services firm that uses a biometric program to identify the customer by voice, eliminating authentication questions altogether and improving customer service routing by 50 percent. They also describe a European bank that uses biometrics to identify high-profile clients as their conversation progresses on the phone. This system has reduced call handling time, on average, by fifteen seconds, with 93 percent of clients rating the system positively. Such systems can reduce the frustrations that customers typically offload onto innocent customer service agents, which contribute to those agents’ burnout. As described in chapter 6, empathy will be essential for humans to remain employable in customer service, and if robots can manage small frustrations like user authentication, they free up humans to handle more substantive inquiries and to empathize with more complex customer issues.
Beyond phone-based customer service, domains that explicitly require social care and understanding could benefit from dividing emotional labor between humans and machines. For example, several companies are developing robots for socioemotional tasks ranging from therapy to preschool instruction to eldercare. People’s general resistance to robots in emotional domains makes them a tough sell, but with the right features in place, they can succeed. Take, for example, the therapeutic robotic seal called Paro. In addition to being cuddly, socially responsive, capable of generating emotions, and able to learn people’s names, Paro is most importantly nonhuman, which makes it seem nonthreatening and nonjudgmental. Research has shown that elderly individuals who interact with Paro show improved mental health and physical activity levels, although I know of no evidence suggesting such care robots can fully substitute for human companionship.20
What role can humans play, then, if robots become capable of assisting with care? This is the question I asked Ai-jen Poo, executive director of the National Domestic Workers Alliance.21 Poo is a labor organizer who has devoted her career to fighting for domestic workers, including eldercare workers. As the automation age coincides with a boom in the elderly population, Poo has begun thinking as much as anyone about how robots might collaborate with human caretakers. Poo told me, “There’s a lot of things that technology will make it easier such as helping sort medications.” She also mentioned how robotic technology could improve on the Hoyer lift (she calls it an “outdated piece of machinery”) to physically transfer limited-mobility individuals into a car or a bathtub. However, Poo also believes there is “an artificial distinction between manual labor and emotional labor.” As she puts it, “The importance of touch and the intimacy that comes with that human contact is an important part of care.” Poo’s point resonates with other commentators’ warnings that the decline of literal human touch in our daily lives is also contributing to declines in mental health.22 As we saw in chapter 2, human touch can spur compassion, generate cooperation, and even reduce pain, and machines cannot provide a substitute.
Given the importance of human touch, it appears that caretaking robots are best used to give human caretakers a breather. Employing robots for key tasks can help people in emotionally demanding empathic work avoid burnout and compassion fatigue. In addition, despite the promise of therapeutic robots like Paro, emotion-laden human-machine partnerships only work if humans will engage with machines and, in some cases, treat them as they do other humans. Thus, enabling these partnerships requires designing technology in humanizing ways that attract users rather than repel them. And this goes not only for work robots but also for robots performing mundane tasks from shopping to commuting to helping us stick to our exercise routines. In the following sections, we explore the research on robot design to understand the essential design principles for optimizing human engagement.
A few years ago, an engineer from General Motors emailed me and asked if I was interested in studying the self-driving car. General Motors, like all the other forward-thinking automobile and technology companies, was developing its own version of a self-driving car and wanted a psychologist’s point of view. In particular, the engineer wanted my colleagues and me to answer a simple question: Would people actually like such a vehicle?
General Motors then bought us a very fancy and realistic driving simulator so that we could conduct a simple experiment.23 We programmed the simulator to approximate driving in three separate ways for three separate experimental conditions. In one condition, participants drove as they normally would on any road, controlling steering, braking, speed, and lane changes. In an autonomous condition, the vehicle controlled these features. In an anthropomorphic condition, the vehicle drove autonomously (as in the autonomous condition), but we also added three simple features. We gave the car a name (“Iris”), a gender (female), and a voice—Iris narrated the drive similar to a GPS system.
All participants drove the car through two courses: one basic driving course around a simulated city and one course that forced all participants to experience an unavoidable accident where a car backed out of a driveway and hit them. In addition, throughout the study, we measured trust in the vehicle in three ways. (1) We asked people to report how much trust and confidence they felt toward the car during the first course and how much they liked the car. (2) We measured their heart rate when they got into the accident in the second course to assess physiological arousal. (3) We videotaped their reactions to the accident and research assistants rated how startled the participants looked. These measures captured both self-reported trust in the car’s competence and trust in terms of participants’ comfort with the car.
Such measures enabled us to ask whether people would respond more favorably to driving normally or driving autonomously. And would the addition of human features make any difference?
We found that people reported trusting the car more and appeared less startled by the accident (both in their videotaped responses and in their heart rate change) in the anthropomorphic condition as compared to the autonomous and normal conditions. We also asked participants to indicate how much their car should be punished and blamed for the accident, and people held the car in the autonomous and anthropomorphic conditions more responsible. This response is unsurprising given that, in the normal condition, the car itself had very little to do with the accident. Yet interestingly, blame and punishment dropped in the anthropomorphic condition compared to the autonomous condition, suggesting participants gave the car the benefit of the doubt when it appeared more humanlike. Despite looming fears about the safety of self-driving vehicles, and despite people’s attachment to driving, people responded much more positively overall to the conditions where the car drove autonomously. And people responded even more positively when the car had a voice, name, and gender.
This study echoes the research presented in chapter 3 on how humanness engenders moral care. It demonstrates that mere humanness can encourage trust and affiliation with technology, allowing people to preserve their own humanity in the presence of robotic agents.
Considering the methodological and time constraints for our self-driving car study, we could only add the most minimal of humanlike cues—gender, name, and voice. When I have presented this work publicly, people have asked whether adding excessively humanlike features might pose a threat to users or provoke discomfort, to which I have responded, “Sure.”
Many are familiar with the concept of the “uncanny valley,” roboticist Masahiro Mori’s 1970 observation that as robots become more humanlike in appearance, people feel more positively toward them up to the point at which the robots become uncannily humanlike.24 Beyond that point, people become repulsed. Entities that fall within this valley include not only robots but also animated characters such as Tom Hanks’s conductor in The Polar Express—Hanks’s character appears just a bit too much like the real Hanks to fit into the human or nonhuman category, creating a discomfort that repels viewers.25 In Mori’s original essay, he describes a prosthetic hand that looks humanlike but feels dead and cold. This is where the discomfort emerges.
A 2016 viral video featuring legendary Japanese filmmaker and animator Hayao Miyazaki illustrated this phenomenon. The video depicts Miyazaki listening to a group of student programmers showing him an animation created by an artificial intelligence program. In the animation, a zombie with discernibly human features writhes across the floor in a manner unlike any human. An exasperated Miyazaki responds, “Whoever creates this stuff has no idea what pain is whatsoever. I am utterly disgusted,” adding, “I strongly feel that this is an insult to life itself.”26
In my many years of trying to study the uncanny valley, I have never truly figured out its practical importance. I have also observed that empirical tests of the phenomenon sometimes fail to produce its predicted pattern.27 As it turns out, the original title of Mori’s essay, “Bukimi No Tani,” does not translate to “uncanny valley” (a mistranslation popularized by American authors) but rather is better encapsulated by the phrase “valley of eeriness.” This more general phenomenon implies a simpler point: more humanness is not always better. Despite the overarching argument in chapters 2 and 3 that humanness enhances perceptions of meaningfulness, value, and morality, it appears that excessive human resemblance can also be off-putting. Indeed, technology theorist Nicholas Carr explains that one reason consumers have readily adopted robotic smart speakers (e.g., Amazon Echo, Google Home, Apple HomePod) in their homes is that these speakers do not resemble humans. “Lacking human characteristics,” Carr explains, “smart speakers avoid the uncanny valley altogether.”28
In the next several sections, we will see how designing technology for effective human-machine interaction involves knowing which human cues to select. Considerable research has examined how to humanize machines optimally, presenting them as humanlike enough to engage users but not so uncanny as to repel them. Let us explore that research now.
As demonstrated by the success of vocally communicative smart speakers and our insights from the earlier self-driving car study, voice can humanize technology to positive ends. It may in fact be the most important feature for conveying humanness without spurring revulsion.
Renowned human-computer interaction expert Clifford Nass and his student Scott Brave wrote in their 2005 book Wired for Speech about the voice’s critical function for anthropomorphic technology. Nass and Brave describe how voice engages people with technology, stating that “over the course of 200,000 years of evolution, humans have become voice-activated with brains that are wired to equate voices with people and to act on that identification . . . Because humans will respond socially to voice interfaces, designers can tap into the automatic and powerful responses elicited by all voices . . . to increase trust, efficiency, learning, and even buying.”29 They suggest that voice is a fundamental cue to an entity’s human likeness, and research since has shown this cue engages us in several positive ways.
Research outside of technology use led by psychologists Juliana Schroeder and Nick Epley has shown how voice conveys credibility. In their studies, they asked professional recruiters and participants role-playing hypothetical employers to evaluate job candidates making a pitch.30 These observers watched, listened to, or read pitches that candidates delivered through video, audio-only, or text-only means. Participants rated candidates as smarter and more competent when they heard the candidates’ voices (either through audio or video means) compared to merely reading their pitches, even though the information conveyed in the pitches was exactly the same. What is more, Schroeder and Epley found video pitches were no better than audio-only pitches. Thus, their studies show that voice, rather than visual appearance, is critical to conveying credibility. In other work, Schroeder and Epley showed that adding voice (but not video) to a script of computer-generated text made people more likely to judge that a human being generated the text.31 This work suggests that voice critically engages people and that observable appearance provides little added benefit beyond voice in communicating the presence of an intelligent mind.
Voice is a critical humanizing cue for robot design, but how should designers optimize robot behavior? I believe a certain degree of spontaneity might be the key. As I discussed earlier in this chapter as well as in chapter 6, we tend to prize technology for its rote, predictable nature, whereas we value other humans’ ability to deviate from scripted behavior. However, programming a certain degree of unpredictability in machines can increase our engagement with them.
This instruction might sound counterintuitive, but for proof, take a study of toddlers’ responses to a humanoid robot in a childhood education center.32 The study involved observing children interacting with the robot over a five-month period in three distinct phases. During phase 1, the researchers introduced the robot, which displayed its full behavioral repertoire, including walking, dancing, sitting, standing, lying down, giggling, and making hand gestures. Over this time period, the quality of children’s interactions with the robot steadily improved as rated by the researchers. However, in phase 2, the researchers programmed the robot to behave predictably, mostly just dancing alone; the quality of interaction with the toddlers deteriorated substantially. When, in phase 3, the researchers reintroduced the robot’s full behavioral variability, interaction improved again with the toddlers now treating the robot as a peer rather than an inanimate object. They were fully engaged.
Some of my own research, noted in this book’s introduction, also supports this point. It demonstrates that robots that behave in minimally unpredictable ways can encourage people to want to make sense of the robot’s behavior.33 This sense-making motivation then prompts people to humanize the robot. For example, we programmed a computerized robot to respond to yes or no questions with consistent patterns (mostly yes or mostly no) or with unpredictable patterns (50 percent yes, 50 percent no). When the robot behaved unpredictably, people were more likely to treat it like a human being. In subsequent work, we showed that encouraging people to think about nonhuman entities as humanlike also made people feel that they better understood the entity. Our work suggests that a little bit of unpredictability spurring humanization can enhance interactions with technology.
The idea here is not to program robots to behave randomly but rather to introduce subtle variability outside of the robot’s specified task to keep people engaged. Minimal unpredictability motivates people to make sense of that unpredictability through humanizing it. In this way, making robots a tad less robotic can improve our interactions with them.
One example of what such a robot would look like is Mimus. Mimus is an industrial robot that computational design scholar Madeline Gannon developed and programmed to behave in an unconstrained fashion. A description on Mimus’s website reads, “Mimus has no pre-planned movements: she is programmed with the freedom to explore and roam about her enclosure.”34 On display at the London Design Museum, Mimus can mimic the movements of observers, respond to gestures by moving closer or farther away, explore its environment, or simply get bored and shut down. The description continues: “Ordinarily, robots like Mimus are completely segregated from humans as they do highly repetitive tasks on a production line. With Mimus, we illustrate how wrapping clever software around industry-standard hardware can completely reconfigure our relationship to these complex, and often dangerous, machines. Rather than view robots as a human adversary, we demonstrate a future where autonomous machines, like Mimus, might be companions that co-exist with us.”
How Mimus would work in an actual industrial setting is still unclear. Would workers respond more amicably to an industrial robot that took occasional breaks to interact with them? Would human employees more quickly solve a problem that a machine cannot address if they are more attuned to that machine’s diverse behaviors? I would certainly welcome the experiment to test whether an ounce of machine spontaneity could facilitate better human-machine interactions on the factory floor.
The goal of designing a robot for which human employees feel responsible is to empower employees. This goal also addresses a consistent concern about the rise of intelligent technology—namely, that it will threaten humans’ autonomy. We humans, as it turns out, generally like to call our own shots, control our situations, and experience choice and independence in our daily actions. When technology threatens our autonomy, we might discard it even when it offers us convenience. One potential solution to this threat is making robots more childlike. Given that threats to autonomy are already a broad concern for aging adults, introducing robots that aid them in small tasks might exacerbate these concerns. If the robots doing the caregiving could also receive care, such a relationship could give elderly individuals a sense of responsibility and empowerment.
I asked Tandy Trower about this possibility.35 Trower is, as of writing, CEO of Hoaloha Robotics, after having spent twenty-eight years at Microsoft where he successfully launched the Windows operating system and other iconic products. In 2005, he joined Microsoft founder Bill Gates’s strategic staff and then led Microsoft’s initiative on robotics. He left Microsoft in 2009 to start Hoaloha, which specializes in developing social companion robots for the elderly that can perform various tasks from retrieving objects to scheduling reminders for users to call their grandchildren. Trower wrote to me, “It is very important that our user regard our robot as being in a subordinate, but not ‘servant’ role. This is important not only to ensure that the user feels in control (a basic tenet of good design), but also helps sets expectations appropriately. . . . Our approach starts with the paradigm center[ing] on a good companion, and our model is the role of conventional ‘sidekick.’ In almost every case, Barney to Fred, Boo Boo to Yogi (yes showing my age here), Robin to Batman, etc. there is a dominant partner and a subordinate one. In our scenarios, the user is the dominant partner. We reinforce this in several ways. First, our design presents our robot more as a youthful/child-like personality. When it introduces itself, it notes that it doesn’t know very much and requires interaction with the user to get better.” Trower clearly understands the importance of granting the user autonomy and giving the user an opportunity to care for others.
Designing eldercare robots to be subordinate or childlike follows the same logic of giving elderly individuals other opportunities for mastery. A famous 1976 study of nursing home residents by psychologists Ellen Langer and Judith Rodin showed that giving elderly individuals the responsibility to care for a houseplant increased their activity, alertness, and general happiness, as compared to a group given a plant under the care of the nursing home staff.36 Granting responsibility through plant care gave these individuals a sense of agency in their lives that critically enhanced well-being. By the same token, giving elderly individuals responsibility to “care” for a childlike robot could create a sense of mastery even as the robot is in place to assist them.
One radical example of a successful childlike robot is the Blabdroid, which is barely a robot at all. Developed by artist and roboticist Alexander Reben, the Blabdroid is a cardboard robot on wheels whose face resembles that of a wide-eyed infant. The Blabdroid also speaks in a child’s voice. In a panel I sat on with Reben to discuss the future of artificial intelligence, he noted that he specifically used cardboard instead of plastic and metal because it makes the robot “feel more familiar and more vulnerable.”37 Reben also presented a film for which he sent various Blabdroids out into the world across several continents, equipped with video cameras, to document humans’ responses to their inquiries. He programmed the robots to get “stuck” in public places and to request help, and thereby demonstrated that passersby consistently showed a willingness to provide assistance. More impressively, Reben programmed the Blabdroid to ask strangers sensitive questions (e.g., “Who do you love most in the world?” or “What is something you have never told a stranger before?”) and found people willingly divulged their deepest secrets to it. The Blabdroid’s childlike appearance seemed to put people at ease and make them willing to collaborate with it. Reben’s experiment also demonstrated that the most effective design for a robot interacting with humans need not be technologically sophisticated.
Beyond designing robots that preserve humans’ autonomy, it is also crucial to customize them to the particular task they are performing and to the user for whom they are performing it. Research consistently demonstrates that doing this optimizes the user’s experience. Demonstrating the importance of task matching, Michael Norton and I found that people preferred robots designed to appear capable of cognition to perform cognitive tasks (e.g., data analysis), whereas they preferred robots designed to appear more capable of emotion for more social tasks (e.g., social work).38 We drew on work showing that people perceive robots with “baby-faced” features such as a small chin and wide eyes to have greater emotional capability compared to robots with more masculine and mature facial features. In our studies, people preferred the more baby-faced robot for more emotional tasks. This finding also provides an answer to how to mitigate people’s discomfort with robots taking on emotional labor—that is, make the robot appear capable of emotion.
Research led by psychologist Jennifer Goetz also shows the importance of matching a robot’s appearance to its specified task.39 In one study, Goetz and colleagues designed two-dimensional robots to be more humanlike or more machinelike, and asked college students which robots they preferred for jobs that were more or less social in nature. Participants preferred more humanlike robots for more social jobs, such as aerobics instructor and museum tour guide, and more machinelike robots for less social jobs, such as customs inspector or lab assistant. A subsequent study by these researchers found that people complied more with a robot that instructed them to stick to an exercise routine when the robot behaved seriously rather than playfully. Because the task itself was not playful but rather one that required discipline, the match between the robot’s personality and the task made users more engaged.
Along these lines, an experiment by psychologist Gale Lucas and colleagues showed what robot psychotherapy might look like. Lucas and colleagues showed that designing a virtual human to appear capable of emotional understanding increased participants’ comfort with the robot conducting a mental health screening interview, a situation rife with emotion.40 In this paradigm, the virtual human, which could detect specific emotions and respond in real time, appeared on a computer screen to conduct a semistructured interview. In Lucas’s experiment, the mental health screening interview varied across two conditions. In one condition, participants believed the virtual human was fully automated, and in another condition, they believed human researchers in the next room were controlling it. Participants answered interview questions, such as “Tell me about an event or something that you wish you could erase from your memory,” and evaluated the interview process overall. When participants believed that other humans were absent from the process and that they were interacting with artificial intelligence alone, they disclosed more personal information and reported less fear of being evaluated negatively. In other words, when the robot appeared independently capable of emotional intelligence, people engaged with it more naturally and effectively.
Other work by Clifford Nass and colleagues also shows the importance of not only matching the robot to the task but also matching the robot to the user. In particular, Nass’s work has shown the benefits of matching technological agents’ voices to users’ emotional states and personalities. In one study, Nass’s research team measured participants’ personalities to classify them as introverts or extroverts, and then asked them to peruse an Amazon.com-like book-buying website.41 The website included information and reviews about books delivered by a vocal agent. In varying conditions, Nass manipulated the agent’s voice to sound more extroverted or introverted, producing faster, louder speech with a higher mean frequency and wider frequency range to mimic extroverted speech, and lowering these qualities to produce more introverted speech.
After exploring the website, participants rated each book reviewer agent, and both introverts and extroverts rated the agent whose voice matched their personality higher on credibility, liking, and review quality. Both groups also reported more interest in purchasing books from the website when there was a personality match. A follow-up study found similar results using an online auction website modeled on eBay.com. Nass and colleagues programmed an extroverted or introverted voice—manipulating the same dimensions of speed, loudness, frequency, and frequency range as in the book-buying study—to describe collectible auctions and found participants preferred the voice that matched their personality.
To take this idea further, Nass and colleagues conducted another study that used a driving simulator to examine matching machines and users on emotional state.42 In this study, the researchers first manipulated participants’ moods by playing film and TV clips with either happy or sad themes to put the participants in a good or bad mood. Then all participants drove a simulated course narrated by a “virtual passenger” GPS system. For half of the participants, the virtual passenger spoke in a happy, energetic tone; for the other half, it spoke in a subdued, sadder tone. Participants could speak back to the virtual passenger, and the amount of that speaking served as a measure of engagement. Afterward, participants completed a questionnaire evaluating the overall experience. The researchers also counted how many accidents participants got into during the simulation and how attentive they were while driving, measured by their reaction time to driving-specific tasks.
As predicted, matching users and agents on emotional state greatly improved users’ driving experience. When a happy-sounding virtual agent accompanied happy drivers and when a sad-sounding agent accompanied sad drivers, participants got in fewer accidents, reported more alertness, and responded more quickly to events on the road even while they communicated more with the virtual passenger. Emotion-matching enhanced user engagement and safety.
Together, these findings suggest the importance of robot customization for work, leisure, shopping, commuting, and more. Knowing the task that the robot will perform and knowing who the user is can enable robot designers to tailor the robot to the specific user experience, optimizing the human-machine relationship.
SIX MONTHS before beginning this book, I attended an academic conference in Tel Aviv. The experience of traveling to Tel Aviv was largely bereft of human beings. From my home in Chicago, I used the Uber app on my phone to request a ride to the airport, bypassing the old system of having to call and talk to a cab dispatcher. I put my headphones on in the cab and listened to a Kamaiyah album instead of talking to the driver. During the ride, I pulled up my boarding pass on my phone so that I could avoid any gate agent or boarding pass–dispensing machine at the airport and headed straight to the security line.
I took an initial flight from Chicago O’Hare to Newark where I had a four-hour layover. The Newark airport offers a fantastic and novel (to me) dining experience. Simply sit down at any table, view the menu presented on an iPad affixed to that table, select on the iPad what you would like to eat, and a server delivers it to your table with minimal interaction. You pay using the iPad when you are finished.
After eating my spaghetti Bolognese and drinking a Diet Coke, I realized I had forgotten my phone charger, and with little time before my next flight, I found a vending machine that dispensed electronic devices. Using the machine spared me from waiting in a store line or navigating overeager salespeople. When it was finally time to board my Tel Aviv flight, as luck would have it, the seat next to mine was empty. This was as human-free an experience as one could possibly have on an international journey.
One need not be an introvert to find the experience I described above enjoyable. Even the most gregarious among us appreciate both freedom and efficiency during travel. Yet as I began to dive deeper into the research required for this book, I realized that a human-less experience is greatly deficient compared to a more human one, even if the human one comes at the expense of convenience. As technology becomes more advanced, travel and other basic activities will become even more human-free (and the experience described here will likely sound quaint to future readers). Human beings will become less relevant, as navigating the complexity of others’ feelings, desires, wants, needs, and opinions will become less of a necessary activity. Therefore, it is incumbent upon us to push back against automation or at least to make our interactions with technology more human.
We might also ask, given humans’ desire to connect socially with others, why would so many of us prefer a human-less experience in the first place? As we’ve already seen, interacting with humans is mentally demanding and empathy is one of the hardest tasks in our cognitive repertoire. Because of our limited capacity for deep social engagement, we often discard or overlook other humans who do not reside within our immediate social circle (those who differ from us in psychologically significant ways). In the next two chapters, we learn how to improve relationships with these socially distant others, as well as with those closest to us.