113. Lynn Hershman Leeson's DiNA, Artificial Intelligent Agent Installation (2002–2004) is an animated, artificially intelligent female character with speech recognition and expressive facial gestures. DiNA converses with gallery guests, generating answers to questions and becoming “increasingly intelligent through interaction.”
Create an interactive chatbot, eccentric virtual character, or spoken word game centered on computing with speech input and/or speech output. For example, you might make a rhyming game or memory challenge; a voice-controlled book; a voice-controlled painting tool; a text adventure; or an oracular interlocutor (think: Monty Python's Keeper of the Bridge of Death). Consider the creative affordances of using a tightly restricted vocabulary as well as the dramatic potential of rhythm, intonation, and volume of speech. Keep in mind that speech recognition is error-prone, so find ways to embrace the lag and the glitches, at least for another couple of years. Graphics are optional.
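One way to work with a tightly restricted vocabulary, given error-prone recognition, is to snap whatever the recognizer heard onto the nearest word you actually care about. The following is a minimal sketch in Python, not a prescribed method: it assumes the transcript arrives as a plain string from some speech recognizer of your choosing, and the vocabulary, the function name `match_command`, and the 0.6 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib

# Hypothetical command vocabulary for a voice-controlled painting tool.
VOCABULARY = ["red", "blue", "green", "bigger", "smaller", "undo", "clear"]


def match_command(transcript, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the vocabulary word most similar to any word in the transcript.

    Returns None when nothing in the transcript scores at least `cutoff`.
    The cutoff is an assumed value; tune it by ear.
    """
    best_word, best_score = None, cutoff
    for heard in transcript.lower().split():
        for word in vocabulary:
            # ratio() is 1.0 for identical strings and 0.0 for strings
            # with no characters in common.
            score = difflib.SequenceMatcher(None, heard, word).ratio()
            if score >= best_score:
                best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    # A misheard "bigger" still resolves to a known command.
    print(match_command("make it biger please"))
    # Unrelated sounds resolve to nothing, which a piece might treat
    # as a prompt for the character to ask again, or to misbehave.
    print(match_command("qqq zzz"))
```

Lowering the cutoff makes the system more forgiving and more prone to comic misunderstandings; raising it makes it stricter and slower to respond. Either can be a deliberate part of the character.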
The capacity to speak has long been perceived as a sign of intelligence. For this reason, machines that speak can seem uncanny or even supernatural, as they decouple ancient bindings between voice, living matter, and intelligence. In the field of interaction design, voice interfaces are thought to make technologies more intuitive and accessible than their visual or typographic counterparts—but such anthropomorphized machines do this at the expense of our ability to accurately estimate how much they actually “understand.”
In a conversation, information is transmitted and received on multiple registers—in not only what is said, but also how it is delivered, and by whom. We have exquisitely tuned capacities for inferring contextual information like emotion, gender, age, health, and socioeconomic status from a speaking voice. Intonation, rhythm, pace, and rhyme are also used to create drama, suspense, sarcasm, and humor. When creating new experiences through speech, simple operations may be the most generative. For example, a vast range of meanings can arise just from altering the emphasis of words in a sentence.
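The point about emphasis can be tried directly. The sketch below is an illustration under stated assumptions rather than a finished tool: the function name `emphasis_variants` and the use of asterisks to mark the stressed word are hypothetical conventions chosen here. It produces one version of a sentence per word, each stressing a different word.

```python
def emphasis_variants(sentence):
    """Return one copy of the sentence per word, with that word marked.

    The stressed word is wrapped in asterisks, an assumed convention
    that a performer or a speech synthesizer could interpret as stress.
    """
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        stressed = words[:i] + ["*" + words[i] + "*"] + words[i + 1:]
        variants.append(" ".join(stressed))
    return variants


if __name__ == "__main__":
    for line in emphasis_variants("I never said she stole my money"):
        print(line)
```

Read aloud, each of the seven lines implies something different. Some speech synthesizers accept markup for stress, such as the `emphasis` element in the W3C's SSML; whether a given engine honors it is worth testing before building a piece around it.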
Speech has a key paralinguistic social role. Through chit-chat and banter we establish trust, build relations, and create intimacy. Wordplay, punnery, and other playful verbal exchanges create a protected space for this social activity by exploiting ambiguities in the rules of language itself. Culture is embedded and propagated in the protocols of knock-knock jokes, call-and-response songs, and once-upon-a-time fairy tales. These rule-based media lend themselves well to creative manipulation with code. Some potentially helpful tools for generating speech algorithmically include context-free grammars, Markov chains, and recurrent neural networks (RNNs), including long short-term memory (LSTM) networks. Note that some commercial speech analysis tools transmit users' voice data to the cloud, raising issues of data ownership and privacy.
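As a starting point for the first of these tools, here is a minimal sketch of a context-free grammar that generates once-upon-a-time openings. Everything in it is an assumption made for illustration: the grammar itself, the uppercase convention for nonterminal symbols, and the function names `expand` and `tell_story`. The output is text, which you would then pass to whatever speech synthesizer you are using.

```python
import random

# Hypothetical toy grammar. Each uppercase symbol maps to a list of
# alternative expansions; anything not listed as a key is spoken as-is.
GRAMMAR = {
    "STORY": [["once upon a time", "CHARACTER", "EVENT"]],
    "CHARACTER": [["a", "ADJECTIVE", "CREATURE"]],
    "ADJECTIVE": [["lonely"], ["talkative"], ["forgetful"]],
    "CREATURE": [["blender"], ["robot"], ["oracle"]],
    "EVENT": [
        ["learned to", "VERB"],
        ["refused to", "VERB", "until", "CHARACTER", "arrived"],
    ],
    "VERB": [["sing"], ["whisper"], ["rhyme"]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only plain words remain."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: a literal word or phrase
    production = rng.choice(GRAMMAR[symbol])
    return " ".join(expand(part, rng) for part in production)


def tell_story(seed=None):
    """Generate one sentence; pass a seed to make the result repeatable."""
    return expand("STORY", random.Random(seed))


if __name__ == "__main__":
    for _ in range(3):
        print(tell_story())
```

Because the rules are just data, the same `expand` function can drive a knock-knock joke or a call-and-response song by swapping in a different grammar. A Markov chain differs in that its rules are learned from example text rather than written by hand.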
114. In Conversations with Bina48 (2014), Stephanie Dinkins performs improvised conversations about algorithmic bias with BINA48, a chatbot-enabled face robot. BINA48 was commissioned by entrepreneur Martine Rothblatt, and constructed by roboticist David Hanson, to resemble Rothblatt's wife Bina.
115. Hey Robot (2019) by Everybody House Games is a game in which teams compete to make a smart home assistant (such as Amazon Alexa or Google Home) say specific words.
116. In David Rokeby's The Giver of Names (1991–1997), a camera detects objects placed on a pedestal by members of the audience. A computerized voice then describes what it sees—with strange, uncanny, and often poetic results.
117. When Things Talk Back (2018), by Roi Lev and Anastasis Germanidis, is a mobile AR app that gives voice to everyday objects. The software automatically identifies objects observed by the system's camera, and anthropomorphizes them with AR overlays of simple faces. It then uses ConceptNet, a freely available semantic network, to retrieve information about the objects and their possible interrelationships. The app uses this information to generate humorous and sometimes poignant conversations between the objects in the scene.
118. David Lublin's Game of Phones (2012) is “the children's game of telephone, played by telephone.” Players receive a phone call and hear a prerecorded message left by the previous player; they are then prompted to re-record what they recall hearing for the next person in the queue. After a week, the entire chain of messages is published online.
119. Neil Thapen's playful Pink Trombone (2017) is an interactive articulatory speech synthesizer for “bare-handed speech synthesis.” Using a richly instrumented simulation of the human vocal tract, the project enables a wide range of vocal noisemaking in the browser.
120. Kelly Dobson's Blendie (2003–2004) is a 1950s Osterizer blender, adapted to respond empathetically to a user's voice. A person induces the blender to spin by vocalizing. Blendie then mechanically mimics the person's pitch and power level, from a low growl to a screaming howl.
121. In Nicole He's speech-driven forensics game, ENHANCE.COMPUTER (2018), players yell out commands like “Enhance!”—living out a science-fiction fantasy of infinitely zoomable images.