Fifteen

CONVERSATIONAL HUMAN

The dumb presence of objects is a silence that prods human beings into speech. It seems never to matter that the exchange is naturally one-sided. To dolls and teddy bears little children jabber incessantly. Gardeners can sometimes be overheard urging their budding tulips to flourish. “Geez, you’re beautiful,” a Chevrolet-lover says, under his breath, at the sight of his future possession. And, in our era of high technology, more and more among us mutter at computer screens.

But unlike the toys, plants, and cars, the whizziest computers have begun to talk back. Ask yours for the date of your next dentist appointment, the height of Mount Everest, the number of miles that separate New York from Sacramento, and an electronic voice—if you can call it a voice—with an unplaceable accent responds. Some tinker with the voice until it sounds plausibly male or female; turns flirty or sniffy; brags or dodders. You can also customize it to answer to a name.

Fifty years ago, “Eliza” was one of the first programs to simulate human conversation. When the user at MIT hit return after typing “Nice to meet you,” so prompt was the “Oh…?” from Eliza, which flashed up in chunky letters on the screen, so disarming, that it could feel as though a woman waiting on the other side of the computer’s connection had just sent it. In the space of a few stock words, a few canned phrases, the character of an inquisitive psychoanalyst would take shape. To such an extent that the user mightn’t be sure if what he sensed was empathy or his imagination.

USER: Nice to meet you.

ELIZA: Oh…?

USER: My name is John.

ELIZA: I don’t care about names, please go on.

USER: I have a fear of computers.

ELIZA: I see.

USER: What should I do?

ELIZA: What do you think?

USER: Should I trust you?

ELIZA: We were discussing you, not me.

“The fact that individuals bind themselves with strong emotional ties to machines ought not in itself to be surprising,” the program’s Pygmalion, Joseph Weizenbaum, wrote in Computer Power and Human Reason, which was published in 1976. “The instruments man uses become, after all, extensions of his body.” What pens had been for our hands and bicycles had been for our legs, computers were becoming for the brain: a computer could do the remembering and the calculating of a thousand men. But would a computer ever learn to converse? To employ human language with wit and feeling and creativity? Weizenbaum himself was skeptical. “Eliza was such a small and simple step. Its contribution was, if any at all, only to vividly underline what many others had long ago discovered, namely, the importance of context to language understanding.” Without the user’s indulgent fancy to flesh out the bones of the program’s palaver, no dialogue could ever get going: Eliza was little more than a speak-your-imagination machine. Weizenbaum was skeptical, and none of the later generations of “chatbots” gave him any grounds to reconsider.
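The mechanism behind that illusion was slight: scan the typed line for a keyword or stock phrase and hand back a canned reframing, falling back on a neutral prompt when nothing matches. Here is a minimal sketch of the idea in Python; the rules are invented for illustration, echoing the dialogue above, and are not Weizenbaum’s original script.

    import re

    # Illustrative keyword rules: spot a trigger phrase, return a canned reply.
    RULES = [
        (r"\bmy name is\b", "I don't care about names, please go on."),
        (r"\bi have a fear of\b", "I see."),
        (r"\bwhat should i\b", "What do you think?"),
        (r"\bshould i trust you\b", "We were discussing you, not me."),
    ]

    def reply(user_text):
        text = user_text.lower()
        for pattern, canned in RULES:
            if re.search(pattern, text):
                return canned
        return "Oh...?"  # fallback prompt when no keyword fires

    print(reply("Nice to meet you."))            # Oh...?
    print(reply("My name is John."))             # I don't care about names, please go on.
    print(reply("I have a fear of computers."))  # I see.

Everything that reads as empathy in such an exchange is supplied by the user; the program itself holds no model of meaning at all.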

One of the latest and most talked-about is “Evie,” a youngish bot with blinking green eyes, smiling pink lips, and flowing brown hair (it seems that bots are almost always made to look and sound like women). According to its makers, Evie comes out with statements that have all been acquired at some point in the past ten years from the things people type to “her.” For this reason, its database of possible answers is vastly bigger than anything Eliza had to draw on. Even so, there are some strange moments when I attempt a chat with the pixelated face on my computer screen. A remark I type to Evie about Buster Keaton leads it to reply—actually to spit out, Spock-like—that I am “making sense.” “Am I?” I ask. “Yes, you are the love of my life.” A clumsy play for empathy, if ever there was one. I change the subject. I try books. I wonder about her reading habits. Is she in the middle of a novel? Incorrectly, Evie replies, “You already asked me that.”

“It’s already uncanny how good some of these robots coming out of Japan look and sound,” Professor Naomi Susan Baron tells me over the phone from her American University office in Washington, DC. Perhaps the linguist is attempting to dampen my skepticism about the value of chatbots. Baron is the author of a recent academic paper, “Shall We Talk? Conversing with Humans and Robots,” which is how I discovered her work. I ask her outright whether computers will ever master conversational Human and she says, “That’s the $64,000 question. I don’t have a firm answer to give you, but I’ll say this. Take syntax. Very complicated, all the ways in which people build sentences out of words and phrases. Very complicated, and yet computers can manage that nowadays. They passed that hurdle. Conversation is the next hurdle. Maybe insurmountable, maybe not.”

Professor Baron has been doing a lot of thinking about the likely features of authentic “computer talk,” comparing it with the various kinds of talk linguists usually dissect. Talking with a computer would be much like talking with a foreigner or a pet or a child, she thinks. “Like child-directed speech. We used to call it motherese until we realized there were also stay-at-home dads. The characteristics are fairly consistent across cultures, across classes: the parent speaks to the child with higher pitch, greater articulation, slower delivery.”

Foreigners, pets, children: all have lower status. If Baron is to be believed, machines may never be permitted to become our conversational equals. “They’ll be designed to do our bidding. To be informative. Or entertaining. Or both. But we likely won’t accept any backchat from a computer. We won’t want them to tell us things we don’t want to hear. Conversation is about control, about power. Raising your voice, for example, or switching topics. We won’t want to relinquish that power.”

Computers that grovel, pander, flatter, cheerlead. Computers as feel-good coaches, coaxing surplus calories out of their owners: “Keep up the good work!” “You’ve done just swell!” Baron envisages them as sponges, possessing magpie memories: not only would they ask you what you want for your birthday and anticipate the very moment when to dial up this or that friend, but also recall—via data-collecting bracelets and questionnaires—your every action and concoct menus based on how much of what and when you ate. Vigilant and obsequious. “It might ask you whether you enjoyed the particular brand of spaghetti Bolognese you had three days ago on Wednesday.”

“But wouldn’t that sound a little too pernickety? Too much like a computer?”

Baron laughs. She knows people who talk like that, she says. But she wants to make a bigger point, and to do so she describes toy robots presently manufactured in Japan for kids—toy robots in the form of seals. In Tokyo, these seals sell by the thousands. They squeak and look cute and cuddly. They are, in other words, only vaguely like a real seal: no sliminess, no sharp teeth (to shred and devour fish), no pinniped odor. No person in his or her right mind would buy a toy that resembles a seal too closely. The same would be true of any machine that prattles, argues, blunders as human beings do. “You’d take such a machine back for an immediate refund. Too wordy. Always veering off-topic. Interrupting, mixing up meanings or forgetting whatever it was it wanted to say next. At the very least, you’d trade it in for a less human, more computer-like model.”

And here Baron returns to where she began our conversation: the Japanese generation of humanoid robots. “There’s a tipping point where human-like becomes too human,” she says. She brushed up against that point on a recent trip to Asia. “I was at the airport. I went up to a lady at the counter. Only, the lady wasn’t a lady. It was a robot. Complete with eyelashes, uniform, and good manners. Very polite. When I approached, it gave a little Japanese bow.” Baron’s response was pure stupefaction. “I bowed back. Then the robot spoke a greeting. It inquired how it could be of assistance to me. To see the robotic lips move was eerie. Watching the gestures, hearing the words coming out of those lips, I felt my flesh begin to crawl. I couldn’t help that.” It was only an automaton, all wax and wires, hidden in the convincing guise of a demure customer service employee, but the linguist found the encounter disquieting, disorienting. Even so, she thinks the squeamishness we feel now needn’t always constitute an obstacle. She can recall the period, in the seventies, when the first home-answering machines were similarly off-putting. Older folk in particular were frequently at a loss for words when confronted with the shock of the beep. “But there’s nobody there,” they complained to Baron, then a young researcher. The appropriate phrases, short, crisp, unruffled: the “Hi, it’s grandma,” and the “Just wanted to hear how your day went” and the “Can you give me a quick call when you get this?” all took time to acquire. But acquire them, in the end, they did.

I think it’s fair to say that Baron is a techno-optimist. But she is quick to raise her own reservations. What if the naive, the vulnerable, are taken in by online talk sharks? By programmers short on scruples, whose chatbots preach, browbeat, coquet? As Baron says this, tales of email scams—a heart broken here, a thousand dollars lost there—come to mind; if you think that a swindler’s typed-out text can achieve so much, it is easy to imagine how much more persuasive a sweet-talking program might be. Indefatigable, unpunishable, they would roam freely along the Internet’s electronic byways, ready at all hours to snare, to trick, to fleece their next victim.

“And what if a robot won’t accept no for an answer?” Baron wonders. There is unease in her voice. She asks me to picture an old woman living in a nursing home. The home doesn’t have enough caretakers to go around. To save time, the staff assigns the old woman a talking robot. The robot is strict: three times a day—morning, noon, and night—it must see that its patient takes her tablets. But say the old woman is headstrong. She wasn’t always sick and old. Say she was once a bigwig, her career filled with clash and ego, and now cannot stand to take orders from a jumped-up cash register. “Ms. Henderson, it’s time for your medication.” The robot repeats itself when the old woman pretends not to hear. “Ms. Henderson, you must take your medication,” it intones. “Your pulse is currently five beats below the normal level.” It utters something about blood sugars too, but the old woman still doesn’t budge. With her child—assuming she had a child—she might relent and throw the little blue pill into her mouth; with a nurse—after a respectable amount of fuss—she would finally sluice it down with a large tumbler of water. But with a robot? Never!

“If the old woman is still sprightly, if her faculties are still intact, then she can always go for the off switch, I suppose. That’s the big difference between robots and humans: having an off switch,” Baron says. What, though, she worries aloud, if the rules of human-robot conduct prevent this, forbid patients to unplug their caregiver? Robots talking patients into obedience on one side, and, on the other, patients unwilling to hear the machine out: textbook conditions for a shouting match. “When you overhear a row between neighbors in an apartment block, or between a customer and a member of staff in a shop, that’s unpleasant enough as it is. How then would we react to a war of words in which one of the protagonists is mechanical?”

A program that sweet-talked or squabbled persuasively would have a good chance of disproving Descartes. Three hundred years before the digital computer was invented, he wrote this in his Discourse on Method and Meditations on First Philosophy:

If there were machines which bore a resemblance to our bodies and imitated our actions as closely as possible for all practical purposes, we should still have… very certain means of recognizing that they were not real men… they could never use words, or put together signs, as we do in order to declare our thoughts to others. For we can certainly conceive of a machine so constructed that it utters words, and even utters words that correspond to bodily actions causing a change in its organs (for example, if one touches it in some spot, the machine asks what it is that one wants to say to it; if in another spot, it cries that one has hurt it, and so on), but it is inconceivable that such a machine should produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men can do.

Descartes’ language test was a thought experiment, a seventeenth-century defense of the specialness of human reasoning; it wasn’t intended to be operational. But the Turing test (named for the British pioneer of computer science, Alan Turing), first advanced in 1950, proposes a simple means of putting through its paces a program’s ability to talk.

It goes like this. An “interrogator” sits alone in a room before a computer screen. He is wielding a keyboard, and the messages he sends in quick succession go out to a pair of respondents in separate rooms. One is a man or woman who replies to the interrogator’s questions and comments as any man or woman might. The other is a program, built to interact just like a confirmed talker. The interrogator has five minutes to tell the two apart. Puns, jokes, idiosyncratic turns of conversation: all forms of talk are permitted. If, after the dialogues, the two remain indistinguishable, the program passes, and we can discard Descartes’ objection: a machine will be said to have conversed.

Yet Eliza, Evie, and the other chatbots remain very far indeed from passing. So far, in fact, that you wonder whether Descartes’ inconceivable means not only unthinkable—as a flying machine might once have been unthinkable—but also impossible: impossible as a pig that flies. Talk as somehow fundamentally robot-proof. The clumsy exclamations, the non sequiturs, the wisecracks that time and again fall flat as card castles: the computer’s failure to say the right things does indeed seem telling. Its prowess in other disciplines, improving by leaps and bounds, makes the failure all the more remarkable. Programs have for years outperformed even the strongest chess masters and played checkers to perfection—literally. At the time of this writing, a program has for the first time bested a human champion at the ancient strategic game of go. (The human in question, a thirty-three-year-old South Korean, boasted in a prematch press conference that he would wallop the machine five to nothing; he lost four to one.) And then there are the face-recognizing programs, the knee-bending robots, the quiz-show-question–answering machines. Only the computer’s language smarts still leave much to be desired. Only in the area of language is a Cartesian disdain toward the machine still tenable. The computer stands tongue-tied; and its silence grows newsworthier by the year.

It isn’t for any want of trying. For twenty-five years, an American businessman has reportedly offered $100,000 to the designer of the first chatbot crafty enough to fool a majority of its human interrogators. Every year programmers enter their latest creations—complete with first names, family names, and biographies—into the competition; and, in fairness to the programmers, every now and then one of the more gullible or unimaginative judges mistakes a bot’s “quirkiness” for a foreign adolescent’s snark. That is the exception, though. The most fluent, thoughtful, engaging texts, the programs never manufacture. They never “produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men can do.” Not once has the businessman been at any risk of being separated from his money. And the annual publicity the media gives his contest certainly does his business no harm.

Some days after my discussion with the linguist, it occurs to me that I might need the advice of someone who spends time with these programs. Someone who knows bytes from RAMs. I don’t have that kind of knowledge. A computer’s innards are a total mystery to me. I talk it over with a friend, an information technologist. He goes quiet. Then he tells me not to delve into the technical side of things, that it isn’t necessary, and he suggests a name: Harry Collins. Not a computer specialist as such, it turns out, but a sociologist doing interesting work on the Turing test.

I email Collins and make an appointment to call his office at Cardiff University. He sounds like a man between meetings when I telephone. For an instant, I dread having to tell my friend that the discussion with his sociologist came to nothing; but, very quickly, my anxieties evaporate. Collins’s tight schedule (in the course of our conversation he mentions having three academic textbooks in the works) has made him curt, but also focused. He explains everything briskly and precisely, and I feel grateful for that succinct precision: a question saver.

On the present chatbots and their makers:

“The businessman’s contest is nonsense. It’s only a measure of doing best rather than of actually doing. But even the least worst bot can’t talk. It can’t converse. It can’t use human language appropriately. Some of those in the field—the most optimistic—say, ‘Wait another twenty years. You’ll see.’ Frankly, I don’t believe them. They’re hypesters; they’ll say anything.”

On the Turing test:

I’ve performed my own experiments that are variations on the Turing test. The same setup: the computer screens and the keyboards and the separate rooms and so on, but with a second person on the receiving end rather than a program. Human to human, as in real life. The idea is to better understand how we humans communicate, how we make ourselves understood, how we employ language to pass for one of us and for a particular kind of person.

In one experiment, we had a group of colorblind subjects. We said to them, “Reply to the interrogator’s messages as though you can see colors just fine. So, for example, if an interrogator asks for your favorite color, you type back, blue or yellow or fire-engine red, you name it, even though you’ve never seen anything blue or yellow or red in your life.” The idea was to explore how they performed in the language of those who see the world in color.

Collins reports that his subjects performed flawlessly. Without difficulty they discussed flower arranging, spun tales about playing snooker, described their impatience at waiting in their car for a traffic light to turn to green. Their interrogators couldn’t tell whether they truly saw colors or not. The reason, he explains, is that the subjects had all been immersed from birth in a color-seers’ society; in it, they had acquired the language down even to familiar expressions about “seeing red” or “feeling blue.”

Collins took Turing’s imitation game a stage further in a second experiment. He asked a group of blind subjects to converse at a distance with their seeing interrogators. The subjects used screen-reader software to hear the keys pressed and the words formed as they typed their answers. “They had all lost their sight when very small, by the age of two or three, so they had no memory of the visual world.” Even so, the interrogators were unable to determine from what their correspondents wrote that they were blind. The visually impaired, raised in a society in which vision predominates, had “sighted language.” To every question they could provide a “right type of answer.”

I ask Collins for an example of the questions posed to these subjects. “Well, for instance: ‘Around how many millimeters must a tennis ball drop from the line to be considered out?’”

They had never held a racket or swiveled their head left and right, left and right, to follow a tennis match. But they knew family or friends who had. And some of them had listened to sports commentaries on the radio.

“Then we turned things around. We asked a group of seeing subjects to type-talk as though they couldn’t see and had no memory of having ever seen. In a word, to use ‘blind language.’” They couldn’t. The interrogators, who were all blind, were able to tell right from the opening question that the subjects were only pretending.

The interrogators’ first question was the simple-seeming “How old were you when you went blind?”

Collins: “The subjects would say things like ‘two years old’ or ‘I lost my sight when I was three,’ whereas a blind person will reply something like ‘It began when I was two, and I was registered blind at three and a half.’”

Because they hadn’t been raised by blind parents or mixed with blind friends, the seeing subjects had never learned how the nonseeing speak among themselves. They had never learned that to speak of going blind was to speak of a gradual process.

“What the subjects in these tests, the colorblind and the blind, were doing wasn’t guesswork. They weren’t attempting to ‘talk the talk.’ Something else, something far more interesting, was going on. The subjects displayed a sort of language know-how. They knew what green or tennis meant, but more importantly they knew precisely how talkers in green-seeing and tennis-playing societies use these words in everyday conversation. They could reproduce the appropriate conversational behavior at any given moment. They could ‘walk the talk.’”

Language as a stand-in for the body, as a substitute for direct experience: conversational Human is the outcome of our “talk walking.” Often that talk is small. But, Collins adds, there are occasions when it has to turn more complicated, when we must address a lawyer, for example, or a doctor, and conversation suddenly comes less easily. Even so, most defendants and patients manage. An average person’s language know-how can be surprisingly broad and deep.

To test just how deep, in 2006 Collins performed his most impressive experiment. On himself. “I’m a sociologist of scientific knowledge. I’ve spent my career studying the men and women who do gravitational wave physics. I’ve hung out with them. Talked for hours on end with them. Immersed myself in their community. Now, I can’t perform any of their calculations, no one would ever let me loose on a soldering iron, but I can talk just as they talk. So one day I decided to put my money where my mouth was.”

Collins asked a panel of gravitational wave physicists to send him and a gravitational wave physicist a list of questions to answer separately. One of the questions went as follows:

A theorist tells you that she has come up with a theory in which a circular ring of particles is displaced by gravitational waves so that the circular shape remains the same but the size oscillates about a mean size. Would it be possible to measure this effect using a laser interferometer?

The physicist wrote back:

Yes, but you should analyze the sum of the strains in the two arms, rather than the difference. In fact, you don’t even need two arms of an interferometer to detect gravitational waves, provided you can measure the roundtrip light travel time along a single arm accurately enough to detect small changes in its length.

Collins, simulating a physicist, replied:

It depends on the direction of the source. There will be no detectable signal if the source lies anywhere on the plane that passes through the centre station and bisects the angle of the two arms. Otherwise there will be a signal, maximized when the source lies along one or other of the two arms.

Out of the nine judges on the panel, seven considered the quality of the answers to their questions to be identical. Only two dared to identify the nonphysicist. Neither chose Collins.

“Apparently, in one of his responses the genuine physicist had drawn on ideas from a published paper. I hadn’t come across that paper. I had to come up with my own answer. The two judges thought, ‘Only a genuine physicist could write something like this.’”

Collins passed this version of the Turing test because, like his colorblind and blind subjects, he had gained enough “interactional expertise.”

“Reading papers and books and newspapers alone won’t do. You have to spend time, lots of time, in conversation with people who know from experience what they are talking about.”

I agree with Collins. I tell him his theory matches my experience as a writer. In preparation for my novel Mishenka, the story of a Soviet chess grandmaster’s intuitive search for meaning, I spoke in person and at length with several grandmasters. I visited the home of the former world champion Vladimir Kramnik, the feller of Garry Kasparov, and drew anecdote after anecdote out of him. In Paris I sat backstage amid the analysts and reporters at a tournament in which the world’s strongest players were competing. All in order to put myself in my character’s thoughts.

“Exactly. You can do that, whereas no one can think up a way computers might ever socialize. They haven’t a body. Maybe they don’t require a lot of body. Maybe a tongue and larynx, a pair of ears and eyes, the mechanical equivalents, would be sufficient. But then, how do we go about embedding the machine in a human speech community? Making it a participant in the circulation of meanings? The very notion seems to me to be a nonstarter.”

A nonstarter. Why then did Turing foresee fluent machines in the course of the twentieth century? (He wrote that machines would likely converse with ease by the year 2000.) Probably, with his head for data, he assumed that conversation could eventually be boiled down to a science. Many other intellectuals of the postwar period believed that the brain was a squishy computer, that human language was nothing more than a digital code. The metaphor, even its critics admit, has the benefit of being seductive. Clingy. Through the disappointment of failed predictions, its popularity has survived.

“I blame Chomsky,” Mark Bickhard says. Bickhard, a philosopher of language, is speaking to me from his home in Pennsylvania via Skype. He says,

Back in the fifties, his work was all the rage, and of course it still has clout. Essentially, Chomsky claims that you and I understand sentences because of their structure—the order of the words, rules of grammar, and so on. I have two things to say about that. One, yes, of course language is in part structural, but then so too are any number of skills. Fire-making, for example, has its own syntax: you have to perform all the different subtasks—collecting the tinder and kindling, striking a match, blowing onto the logs—in a particular order for the fire to burn. That doesn’t make fire-making a language. Two, learning theory has come a long way since the fifties; we now know the huge role played by situational context, and the many intricate semantic relationships between words, in how we communicate.

Bickhard is seventy, with a bald dome and intense eyes. As befits a resident of Bethlehem (Pennsylvania), he has a prophet’s long white whiskers. Behind his desk, on a side table, sit thick books atop even thicker books. He came to language by accident, he tells me. Forty years ago, after writing a dissertation on psychotherapy, he was told it contained too much math and was asked to include a chapter on language. Bickhard spent “a whole bunch of years” studying and rejecting every linguistic model then available. But his fascination with what makes language language remained.

Language, according to Bickhard, is dynamic. Like Rorschach blots, words need to be constantly interpreted, and always require us to do some filling in. “You walk down some old wooden stairs, and one of the steps creaks. Instantly, you know what that creak means: ‘Gee, I’d better get off—it’s about to break.’ Well, the same sort of thing happens all the time with words. A father who hears his little boy say something like ‘I buttoned the calculator’ understands him perfectly, knows exactly how to respond to him, even if, strictly speaking, the words themselves are nonsense.

“Or, imagine you hear someone shout, ‘Roast beef at table three needs water.’ Nonsense, too. Unless, that is, you’re sitting in a restaurant. Waiter talk.”

In any given situation, the meaning of a word, a phrase, unfolds dynamically. It cannot be second-guessed. “You’re in a restaurant. You take a menu and you order. ‘Roast beef,’ you say. The waiter returns a while later with your dish.” For Bickhard, there is always much more going on than meets the ear. “What we have to ask ourselves is this: how does uttering ‘roast beef’ change the social reality in which the speaker participates?”

Does uttering roast beef carry a higher social value than, say, pork chop? Does the waiter, who took you for a vegetarian, see you henceforth with different eyes? Does the utterer’s friend at the same table conceal a grin as his memory sings “This little piggy had roast beef, this little piggy had none”?

“Words transform the world around us. Learning a language is learning how roast beef transforms a situation compared to roast chicken or indeed I’m tired, the one over there, or See you around.”

I’m listening carefully to Bickhard, my pen running fast with notes, when all of a sudden our connection gives out: the philosopher vanishes in mid-sentence. Minutes pass. Finally, he calls me back and the screen fills again with his white beard and navy-blue sweater and the side-table pile of books.

Humans in conversation, he concludes, update and modify social reality from moment to moment. Meanings are broached, negotiated, tussled over. Big things are at stake. Computers, on the other hand, inert and indifferent, “can’t care less” about meaning. It is this can’t-care-less-ness that will forever keep them merely imitating people’s words.

I care about the philosopher’s words. They can change me, and I let them. When I turn off my laptop it feels warm. I notice that. Not the warmth of a friend’s hug or handshake; only of electricity, I think. But without it, how much less of the world’s meaning would our brains transform, convert?