Create an interactive chatbot, eccentric virtual character, or spoken-word game centered on computing with speech input and/or speech output. For example, you might make a rhyming game or memory challenge; a voice-controlled book; a voice-controlled painting tool; a text adventure; or an oracular interlocutor (think: Monty Python's Keeper of the Bridge of Death). Consider the creative affordances of using a tightly restricted vocabulary as well as the dramatic potential of rhythm, intonation, and volume of speech. Keep in mind that speech recognition is error-prone, so find ways to embrace the lag and the glitches, at least for another couple of years. Graphics are optional.
The capacity to speak has long been perceived as a sign of intelligence. For this reason, machines that speak can seem uncanny or even supernatural, as they decouple ancient bindings between voice, living matter, and intelligence. In the field of interaction design, voice interfaces are thought to make technologies more intuitive and accessible than their visual or typographic counterparts—but such anthropomorphized machines do this at the expense of our ability to accurately estimate how much they actually “understand.”
In a conversation, information is transmitted and received on multiple registers—in not only what is said, but also how it is delivered, and by whom. We have exquisitely tuned capacities for inferring contextual information like emotion, gender, age, health, and socioeconomic status from a speaking voice. Intonation, rhythm, pace, and rhyme are also used to create drama, suspense, sarcasm, and humor. When creating new experiences through speech, simple operations may be the most generative. For example, a vast range of meanings can arise just from altering the emphasis of words in a sentence.
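As a small illustration of the emphasis idea, the following Python sketch lists every version of a sentence in which exactly one word is stressed. The function name `emphasis_variants`, the asterisk marker, and the example sentence are assumptions made for this illustration; they are not part of the assignment.

```python
def emphasis_variants(sentence, mark="*"):
    """Return one variant of the sentence per word, with that word marked as stressed."""
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        marked = words.copy()
        # Wrap only the i-th word in the emphasis marker.
        marked[i] = f"{mark}{words[i]}{mark}"
        variants.append(" ".join(marked))
    return variants


if __name__ == "__main__":
    # Each printed line shifts the stress to a different word.
    for line in emphasis_variants("I never said she stole my money"):
        print(line)
```

Read aloud, each of the seven variants implies something different. How the marker is turned into audible stress depends on the speech synthesizer you use, so treat that step as something to work out for your own toolchain.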
Speech has a key paralinguistic social role. Through chit-chat and banter we establish trust, build relations, and create intimacy. Wordplay, punnery, and other playful verbal exchanges create a protected space for this social activity by exploiting ambiguities in the rules of language itself. Culture is embedded and propagated in the protocols of knock-knock jokes, call-and-response songs, and once-upon-a-time fairy tales. These rule-based media lend themselves well to creative manipulation with code. Some potentially helpful techniques for algorithmically generating the text to be spoken include context-free grammars, Markov chains, and recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks. Note that some commercial speech analysis tools transmit users’ voice data to the cloud, raising issues of data ownership and privacy.
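To make two of the generative techniques named above concrete, here is a minimal Python sketch of a context-free grammar and a word-level Markov chain. The toy grammar (loosely in the spirit of an oracular questioner), the function names, and the sample corpus are all assumptions made for this illustration. The output is plain text that you would then hand to whatever speech synthesizer you choose.

```python
import random

# Hypothetical toy grammar: each key is a nonterminal symbol, and each value
# is a list of possible expansions. Anything that is not a key is a literal word.
GRAMMAR = {
    "QUESTION": [
        ["What", "is", "your", "THING", "?"],
        ["What", "is", "the", "ADJECTIVE", "THING", "of", "a", "CREATURE", "?"],
    ],
    "THING": [["name"], ["quest"], ["favorite", "color"]],
    "ADJECTIVE": [["secret"], ["true"], ["forgotten"]],
    "CREATURE": [["swallow"], ["knight"], ["wizard"]],
}


def expand(symbol, grammar, rng):
    """Recursively expand a symbol into a flat list of literal words."""
    if symbol not in grammar:
        return [symbol]  # terminal: emit the word itself
    production = rng.choice(grammar[symbol])
    words = []
    for part in production:
        words.extend(expand(part, grammar, rng))
    return words


def generate(grammar, start, rng):
    """Generate one sentence from the grammar, starting at the given symbol."""
    return " ".join(expand(start, grammar, rng)).replace(" ?", "?")


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain


def walk(chain, start, length, rng):
    """Random walk through the chain, stopping early at a word with no followers."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    rng = random.Random()
    print(generate(GRAMMAR, "QUESTION", rng))
    corpus = "once upon a time there was a king and once there was a queen"
    print(walk(build_chain(corpus), "once", 8, rng))
```

The grammar gives tight control over sentence structure, which suits formulaic exchanges such as riddles or call-and-response. The Markov chain borrows its phrasing from a source text, so its character depends entirely on the corpus you feed it.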