27 The Final Frontier: Computers with a Sense of Humor

A robot goes into a bar.

“What can I get you?” asks the bartender.

“I need something to loosen me up,” says the robot.

So the bartender serves him a screwdriver.

—@jokingcomputer8

How does AI do when it comes to that most human of storytelling activities—telling jokes?

Graeme Ritchie’s Joking Computer tweets a joke a day. Let’s start with an example.

Question: What do you get when you cross a fall with a dictionary?

Answer: A spill checker.

It’s early days for the computer, but it’s trying.

The problem is, not only is there no accepted computational theory of humor, there is no single accepted theory of humor at all, despite efforts over the centuries from everyone from Plato and Aristotle to Freud. At present there are three competing theories. The most prevalent is the “incongruity theory,” put forward by Schopenhauer, Kant, and Bergson and developed more fully by Victor Raskin at Purdue University. This is the way most jokes are constructed: a setup followed by an unexpected or bizarre—incongruous—punchline. The other two theories are the “release theory” (that humor releases nervous energy and tension) and the venerable “superiority theory,” beloved of Plato and Aristotle, that we find humor in others’ shortcomings and misfortunes—schadenfreude.

This being the case, how on earth are we to begin instructing computers in the art of humor?9

Computers can beat world champions at complex games like chess and Go, identify patterns in huge sets of data, do massive calculations, even recognize faces in a crowd. But such feats take place inside a machine with limited access to the outside world, particularly in regard to knowledge and feelings. It’s a closed system. Eventually they will be able to accrue knowledge by scanning the web and will actually take physical form; they’ll be “embodied” as robots, enabling them to interact with the world around them and have and store experiences of their own. They may even begin to create of their own accord.

But will they then start telling jokes? Humor is the last frontier. Getting a joke, cracking a joke—perhaps an off-color one—employing sarcasm, timing, irony, all require social awareness and a rather wide knowledge base. Jokes do not travel well across cultures. What I find funny may not be your cup of tea. Plus, all the above are time dependent. What was funny yesterday might be shocking or disgusting or just plain boring today. Today’s joke could be tomorrow’s gaffe.

Humor is a very creative activity. It might involve taking a startling new perspective on received wisdom, turning a situation upside down, undermining clichés and commonplaces. Every dimension of intelligence touches on it.

Here are a couple of jokes for a start:

The person who invented the door knocker got a No Bell Prize.

Veni, Vidi, Visa: I came, I saw, I did a little shopping.

Straightforward? Getting the first involves knowing what a Nobel Prize is and what a door knocker is, for a start. The second requires a knowledge of rudimentary Latin, of Caesar’s immortal words, and of what a Visa card is. This knowledge would either have to be programmed into a symbolic machine—akin to programming the entire world and all its knowledge into the machine—or a neural network would have to have scanned the web well enough to grasp human language in all its intricacy, including metaphor, irony, and sarcasm; neural networks are not quite up to that task as yet.

Cracking the problem of humor is akin to solving AI itself: a matter of evolving computers as intelligent as humans.

In the early years, humor, as well as any other dimension of intelligence to do with emotions and feelings, was rejected by the founding fathers of AI as unworthy of study. In their proposal for a summer research project to be held at Dartmouth College in New Hampshire in 1956—the famous Dartmouth Conference—they wrote: “The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”10 They envisaged a field that studied forming concepts, learning, reasoning, problem solving in general. Emotions, however, had no place in it, and neither did humor.

Emotions did not begin to enter AI research until the 1990s with the work of Rosalind Picard at MIT, which I discuss in detail in chapter 42.11 In 1997, she published a book called Affective Computing, which focused on the importance of emotional intelligence, the vital role that the communication of emotion has in relationships, and the possible effects and uses of emotion recognition by computers. This seminal work effectively kickstarted the field.

Even by the mid-2000s, humor studies were still not well developed. I asked Julia Taylor Rayz, a humor researcher at Purdue Polytechnic Institute in West Lafayette, Indiana, why she went into the field at that time. She replied, “It was challenging and sounded like fun—and at that point nobody had done it.”12

One of the first projects in this area was HAHAcronym, Humorous Agents for Humorous Acronyms, funded by the EU and devoted to computational humor, carried out by Oliviero Stock and Carlo Strapparava in Trento, Italy.13 The idea was to create an “acronym ironic reanalyser and generator.” HAHAcronym subverts existing acronyms and constructs new ones by taking “words that are in ironic relation with input concepts.”14 The system’s database is WordNet, a massive corpus of words grouped into sets of synonyms, each expressing a specific concept. Stock and Strapparava started by creating a separate database of contrasting pairs—religion and technology, sex and religion—to produce an “incongruity generator.”15 Then they fed their system existing acronyms to see what would come out, the idea being to keep the flavor of the acronyms while using irony to subvert them.16
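The incongruity mechanism can be sketched in miniature. The toy Python sketch below is not the actual HAHAcronym code, which draws on WordNet; the contrasting pairs and the mini-lexicon are invented for illustration.

```python
# Toy sketch of the incongruity idea behind HAHAcronym: not the actual
# system, which draws on WordNet. The contrasting pairs and the
# mini-lexicon below are invented for illustration.
CONTRASTS = {            # hypothetical contrasting semantic fields
    "technology": "religion",
    "religion": "technology",
}

LEXICON = {              # hypothetical words grouped by field and initial
    "religion": {"f": "faithful", "b": "beatific", "i": "indulgence"},
    "technology": {"f": "firmware", "b": "bandwidth", "i": "interface"},
}

def reinterpret(acronym, concept):
    """Reinterpret each letter of an acronym with a word drawn from the
    semantic field that contrasts with the input concept."""
    field = CONTRASTS[concept]
    words = LEXICON[field]
    return " ".join(words.get(ch, ch.upper()) for ch in acronym.lower())

print(reinterpret("FBI", "technology"))  # faithful beatific indulgence
```

The point of the sketch is the lookup through CONTRASTS: the humor, such as it is, comes from deliberately answering in the wrong semantic register.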

The computer reimagined FBI—Federal Bureau of Investigation—as Fantastic Bureau of Intimidation, a not-inappropriate assemblage of concepts, and PDA—personal digital assistant—as Penitential Demoniacal Assistant. But what happened when they asked the computer to generate new and humorous meanings for acronyms from scratch? When they fed in the acronym FAINT, plus a main concept (tutoring) and an attribute (intelligent), the computer came up with Folksy Acritical Instruction for Nescience Teaching. NAÏVE, with the same concept and attribute, generated Negligent At-large Instruction for Vulnerable Extracurricular-activity. Which might raise a smile, at least.

One of the problems with computational humor is finding funding. In 2010, Kristian Hammond, a professor at Northwestern University in Evanston, Illinois, received a $700,000 stimulus grant from the National Science Foundation. No less a figure than Senator John McCain dismissed Hammond’s project as a “joke machine” and declared the grant a waste of taxpayers’ money.17 It certainly, he added, could have no impact on creating jobs.

Hammond’s software looks at news stories and social media and assembles words to form original lines of thought—or even jokes. In answer to McCain, he commented that a way to enhance human-machine communication might be with humor. This line of research, he said, might even “be used to write scientific papers.”18 Hammond went on to found Narrative Science, a company that churns out computer-generated news stories, and in so doing certainly created a few jobs.

As we’ve seen, not much has been done toward developing a general theory of humor, though there is plenty of humor research. Researchers prefer to limit themselves to chipping away at well-defined joke scenarios. As Julia Taylor Rayz explains, there are two aspects of humor that researchers focus on: computers generating jokes and computers recognizing jokes. The first step is to feed the computer a diet of jokes so that it learns to create jokes of its own. The next step is far more challenging. Can the machine ever learn to understand that it cracked a joke or know the proper moment to break into a conversation to make a witty remark? This is akin to a computer understanding that a work of art or piece of music it has produced is good.

Puns are one of the simplest forms of humor. In 1994 Graeme Ritchie and Kim Binsted created the first pun generator, JAPE (Joke Analysis and Production Engine), to generate question-and-answer puns. Pun generators work with vast databases of words, like WordNet, plus enormous amounts of puns, and are given pattern-matching rules to compose riddles. Here’s one computer-generated pun:

What do you get when you cross an optic with a mental object? An eye-dea.

Nerdy in the extreme. It might lead to a few guffaws at a comedy club before pointing the would-be comedian to the exit. Of course the funny thing about this pun, as journalist Becky Ferreira observes, is that you “can’t imagine a human ever opening with such a weird setup question.”19
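The pattern-matching idea behind such riddles can be sketched in a few lines. This is not JAPE itself, which works over large lexicons; the pun table and the word glosses below are invented stand-ins.

```python
# Toy sketch of a JAPE-style question-and-answer pun generator. A
# drastic simplification of Binsted and Ritchie's system, which works
# over large lexicons; the pun table and glosses here are invented.
PUN_SPELLINGS = {("eye", "idea"): "eye-dea"}   # hypothetical entries
GLOSSES = {"eye": "an optic", "idea": "a mental object"}

def riddle(word, target):
    """Fill the cross-riddle template: the setup glosses the two source
    words, the punchline fuses them into a pun spelling."""
    setup = (f"What do you get when you cross {GLOSSES[word]} "
             f"with {GLOSSES[target]}?")
    return f"{setup} An {PUN_SPELLINGS[(word, target)]}."

print(riddle("eye", "idea"))
# What do you get when you cross an optic with a mental object? An eye-dea.
```

The template itself never changes; the generator's entire repertoire lies in how many word pairs its database can supply.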

Recognizing even the simplest structured jokes, like knock-knock jokes, beloved from childhood, requires some heavy lifting on both the linguistic and the computer fronts. The basic format is overlapping wordplay between two people, resolved with a punchline.

Says Rayz, there are two ways of teaching computers to make jokes. One is to feed in the rules that jokes are based on and give the computer “knowledge of the world”—material to work with. The other is to tell the computer a lot of jokes, letting it learn by example.

To teach her computer knock-knock jokes, Rayz assembled a large number of jokes, including some from the “111 Knock Knock Jokes” website, then grouped them into templates to demonstrate how the wordplay works through sound association. This was one of the computer’s efforts:

Knock Knock.

Who’s there?

Water.

Water who?

Water you doing tonight?

Not belly-achingly funny, but not bad.
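The template idea can be sketched as a toy program. This is not Rayz’s system; the sound-association table and the punchline completion below are hypothetical entries standing in for the pairings her templates capture from real jokes.

```python
# Toy sketch of template-filling for knock-knock jokes, in the spirit of
# Rayz's approach. The sound-association table and punchline completion
# are hypothetical; her system derives such pairings from a large
# collection of real jokes.
SOUND_ALIKES = {"water": "what are"}                 # hypothetical
COMPLETIONS = {"what are": "you doing tonight?"}     # hypothetical

def knock_knock(word):
    """Fill the fixed five-line knock-knock template, replacing the
    sound-alike phrase with the pun word in the punchline."""
    phrase = SOUND_ALIKES[word]
    name = word.capitalize()
    return [
        "Knock Knock.",
        "Who's there?",
        f"{name}.",
        f"{name} who?",
        f"{name} {COMPLETIONS[phrase]}",
    ]

print("\n".join(knock_knock("water")))
```

Everything rests on the final line: if the sound association is wrong, the template still fills, but the result is the nonsense Rayz describes.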

But does the machine know it’s cracked a knock-knock joke? Rayz takes a good look at wordplay, especially the final sentence, to make sure it’s not nonsense. But in the end, most of the computer’s jokes are not really jokes at all, due to “failures in sentence understanding.”20 The problem is that the computer has not learned enough words to enable it to engage in wordplay. It can create the knock-knock format, but without meaningful wordplay the results make no sense.

“Even a one-liner is a bit of a challenge,” says Graeme Ritchie, with Scottish understatement.21

Rayz is “interested in looking for patterns in humor.”22 She separates out different types of jokes and tests them to see how people react. “You need not have a huge amount of data” for this, she says. Much has already been published, which means she can now compare computer-generated jokes with real-life jokes, scrutinize them “under a microscope,” alter them and clean them up to make them funnier. A computer joke is not going to be perfect, she says. But you “don’t want to throw away that error when you test it on humans.” We “can learn from computers not understanding humor.”23

Seeing patterns in data is a key to understanding them—in this case, understanding what makes a joke a joke. Recent studies take all available data on jokes and try to make sense of them using neural nets. “Some people are familiar with humor theories. Some are just interested in taking the best machine-learning algorithm and applying it and seeing what happens,” Rayz says.24 She prefers to input rules for jokes to investigate how computers might be able to generate jokes and recognize them. “If you can locate certain features and throw them into a [joke] generation model and something reasonable emerges, we may be close to something like creativity,” she says, adding that in her opinion, there is no viable definition of creativity. “Where creativity fits in, I don’t know,” she says emphatically.

Graeme Ritchie considers creativity, either human or machine, not worth discussing. It is “hard to model creativity without a definition,” he says, and he staunchly believes that none are on offer.25 Regarding machine learning, in a widely cited paper, he writes, “If the reason that the program manages to produce humour is because it has a box inside it that (mysteriously) ‘creates humour,’ that would not illuminate the problem of humour creation. Instead this would introduce a form of regress, in which there is a new problem: how does that black box achieve its results?”26

Janelle Shane, a researcher in optics, meanwhile takes the approach of throwing data on humor into a neural network to see what happens, just for the joy of it. She has used neural networks to come up with absurd recipes, ridiculous paint names, and hilarious pickup lines, all of which have garnered a large web following. The algorithm she uses is Char-RNN, a character-level recurrent neural network, which she has also let loose on knock-knock jokes. Unlike Rayz, who inputs rules into the computer to generate jokes, Shane leaves the neural network to figure them out from the thousands of jokes she’s put in. At first the results made no sense. Then the machine began to produce more cogent, but still not very funny, results.

Finally it produced a completely original joke:

Knock Knock.

Who’s There?

Alec.

Alec Who?

Alec Knock Knock Jokes.27

Again not belly-achingly funny, but there’s certainly something quirky, if not surreal, there. You can almost sense the machine’s presence, complaining that it’s had enough.

Shane points out that the creativity is hers, not the machine’s. She chose the database and she applied an algorithm of her choice.

In 2015, two computer scientists, Dafna Shahaf and Eric Horvitz of Microsoft, teamed up with the legendary cartoon editor of the New Yorker, Robert Mankoff. Mankoff’s most familiar cartoon depicts a businessman on the phone replying to a persistent caller: “No, Thursday’s out. How about never—is never good for you?” One of Mankoff’s tasks was to run the New Yorker’s weekly cartoon contest, in which the magazine invites captions for a wordless cartoon. Five thousand submissions a week were pouring in. Sorting through that many entries was wearing out Mankoff’s assistants.

The previous year, Dafna Shahaf had attended a lecture Mankoff gave describing the huge cartoon archive at the New Yorker and the thousands of captions submitted for the weekly contest. Could the archive be used to teach a computer to distinguish a funny caption from an unfunny one, she wondered, thereby massively reducing the workload for Mankoff’s assistants?

Computers do all right with recognizing images and writing captions for well-defined ones. But cartoon images regularly have an edge of sarcasm, irony, or anger and may not even need a caption to be funny. How can a computer deal with such subtleties—such affective dimensions?

The team began by submitting captions to crowd workers at Mechanical Turk for them to assess the humor. Then they used neural networks to study the not-always-obvious relationship between the words used in the captions and words describing the image in the cartoon. This gave them another way to assess the value of a caption: whether the words were too close to the image or at odds with it. Neural networks can manipulate words in a high-dimensional space of meanings, assigning each word a vector of numbers and thus relating it to billions of other words. The closer two words are in meaning, the closer their vectors lie; like vectors in physics, they have magnitude and direction. This is the idea behind Google’s Word2vec, which has led to dazzling advances in natural-language processing.
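A minimal sketch of the word-vector idea: the toy three-dimensional vectors below are invented for illustration, whereas real embeddings are learned from text and have hundreds of dimensions, but the distance calculation is the same.

```python
# Minimal sketch of the word-vector idea behind Word2vec. The toy
# three-dimensional vectors are invented for illustration; real
# embeddings are learned from text and have hundreds of dimensions.
import math

VECTORS = {
    "caption": [0.9, 0.1, 0.3],
    "joke":    [0.8, 0.2, 0.4],
    "banana":  [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for vectors pointing the same way,
    near 0.0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "caption" should sit much closer to "joke" than to "banana".
print(cosine(VECTORS["caption"], VECTORS["joke"]))
print(cosine(VECTORS["caption"], VECTORS["banana"]))
```

Comparing a caption’s words with an image’s descriptive words then reduces to measuring angles in this space: too small an angle and the caption merely restates the picture, too large and it misses it entirely.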

Sadly, in the case of the New Yorker cartoons, the results were only middling. The computer successfully chose the funnier caption only 64 percent of the time.

Recently, computers have even tried their hands at stand-up comedy. Piotr Mirowski, whose day job is senior research scientist at DeepMind in London, and a small bug-eyed robot called A.L.Ex (for artificial language experiment) have performed improv together on the stand-up circuit in London, Paris, and other places.

In one routine, they are out on a drive together. Mirowski mimes holding a steering wheel and looks at A.L.Ex expectantly.

“I am not trying to be angry,” the robot says abruptly, destroying the mood.28

Mirowski is ready with a retort. “I don’t want you to be angry—this is our quality time.”

To which A.L.Ex replies, “I’m sure that you will find love”—a decisive brushoff if ever there was one.

It’s a success. The audience laughs. “I’m so tired,” the robot adds. Mirowski attempts to save the relationship, but A.L.Ex concludes rather insightfully, “You are not me. You’re my friend.”

To train the AI that runs A.L.Ex, Mirowski fed it subtitles from one hundred thousand films, from action movies like Deep Impact to the pornographic film Deep Throat. When it hears someone speaking to it, it seeks out similar exchanges in its database and forms a reply. Mirowski developed this advanced version of his original system with Kory Mathewson, a Canadian AI researcher and fellow improv devotee.
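The retrieval idea can be sketched crudely. The mini-corpus and word-overlap scoring below are invented stand-ins; the real system works with neural language models over subtitles from some hundred thousand films.

```python
# Toy sketch of the retrieval idea described for A.L.Ex: given an input
# line, find the most similar prompt in a corpus of film dialogue and
# reply with the line that followed it. The corpus and the word-overlap
# scoring are invented stand-ins for the real neural system.
CORPUS = [  # hypothetical (prompt, reply) exchanges
    ("let's go for a drive", "I am not trying to be angry"),
    ("are you tired", "I'm so tired"),
]

def overlap(a, b):
    """Crude similarity score: number of words the two lines share."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def respond(line):
    """Return the reply attached to the best-matching prompt."""
    best = max(CORPUS, key=lambda exchange: overlap(line, exchange[0]))
    return best[1]

print(respond("shall we go for a drive?"))  # I am not trying to be angry
```

The sketch also shows why the humor is accidental: the reply is whatever happened to follow a similar line in some film, with no guarantee it fits the scene, which is exactly the drunken-comedian quality Mirowski describes.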

The key issue is how to get A.L.Ex to stay on topic so that its responses are not merely random. Says Mirowski, the humor tends to be accidental. A.L.Ex’s deadpan remarks can be totally inappropriate, overly emotional, or just plain odd. It’s like working with a “completely drunk comedian,” he says. The real challenge is for the human improviser, who has to be ready with a response no matter how bizarre the robot’s comments.

At Google, Douglas Eck and the Magenta team are also hard at work on humor. The project, says Eck, is “very preliminary, very exploratory” and focuses on “punch-line related jokes and puns.”29

Today the field of computational humor is blossoming with (hopefully amusing) conferences dedicated to humor and sessions at AI meetings. Personal assistants such as Siri and Alexa are famously lacking in humor. If we are going to communicate with machines in a pleasant manner, they will one day have to develop a sense of humor akin to the one we humans possess.

Notes