Halcyon M. Lawrence
Voice technologies are routinely described as revolutionary. Aside from the technology’s ability to recognize and replicate human speech and to provide a hands-free environment for users, these revolutionary claims, by tech writers especially, emerge from a number of trends: the growing numbers of people who use these technologies,1 the increasing sales volume of personal assistants like Amazon’s Alexa or Google Home,2 and the expanding number of domestic applications that use voice.3 If you’re a regular user (or designer) of voice technology, then the aforementioned claim may resonate with you, since it is quite possible that your life has been made easier because of it. However, for speakers with a nonstandard accent (for example, African-American vernacular or Cockney), virtual assistants like Siri and Alexa are unresponsive and frustrating—there are numerous YouTube videos that demonstrate and even parody these cases. For me, a speaker of Caribbean English, there is “silence” when I speak to Siri; this means that there are many services, products, and even information that I am not able to access using voice commands. And while I have other ways of accessing these services, products, and information, what is the experience of accented speakers for whom speech is the primary or singular mode of communication? This so-called “revolution” has left them behind. In fact, Mar Hicks pushes us to consider that any technology that reinforces or reinscribes bias is not, in fact, revolutionary but oppressive. The fact that voice technologies do nothing to change existing “social biases and hierarchies,” but instead reinforce them, means that these technologies, while useful to some, are in no way revolutionary.4
One might argue that these technologies are nascent and that more accents will be supported over time. While this might be true, the current trends aren’t compelling. Here are some questions to consider: First, why has accent support primarily been developed for the standard Englishes of Western cultures (American, Canadian, and British English, for example)? Second, for the non-Western cultures for which nonstandard accent support has been developed (such as Singaporean English and Hinglish), what is driving these initiatives? Third, why hasn’t there been any nonstandard accent support for minority speakers of English? Finally, what adjustments, and at what cost, must standard and foreign-accented speakers of English make to engage with existing voice technologies?
In this essay, I argue that the design of speech technology is innately biased. It is important to understand the socioeconomic context in which speech technologies are developed and the long history of assimilation—rooted in imperialist, dominant-class ideologies—that nonnative and nonstandard speakers of English have had to practice in order to participate in global economic and social systems. Voice technologies do nothing to upset this applecart. I hypothesize that any significant change in these software development practices will come from developers in the periphery5 invested in disrupting the developing world’s practice of unequal consumption of technology vis-à-vis creation and innovation. This deimperializing wave of change will come from independent developers because they embrace the notion that “support for other cultures is not a feature of software design, but a core principle.”6 In proposing a way forward, I examine the results from my study that investigated the acceptability of foreign-accented speech in speech technology as well as contexts in which accented speech can be used in speech technology interactions. I conclude with guidelines for a linguistically inclusive approach to speech interaction design.
In his slave narrative, Olaudah Equiano wrote, “I have often taken up a book, and have talked to it, and then put my ears to it, when alone, in hopes it would answer me; and I have been very much concerned when I found it remained silent.”7 Equiano’s experience with the traditional interface of a book mirrors the silence that nonstandard and foreign speakers of English often encounter when they try to interact with speech technologies like Apple’s Siri, Amazon’s Alexa, or Google Home. Premised on the promise of natural language use, these technologies encourage their users not to alter their language patterns in any way for successful interactions. Yet if you possess a foreign accent or speak in a dialect, speech technologies practice a form of “othering” that is biased and disciplinary, demanding a form of postcolonial assimilation to standard accents that “silences” the speaker’s sociohistorical reality.
Because these technologies have not been fundamentally designed to process nonstandard and foreign-accented speech, speakers often have to make adjustments to their speech—that is, change their accents—to reduce recognition errors. The result is the sustained marginalization and delegitimization of nonstandard and foreign-accented speakers of the English language. This forced assimilation is particularly egregious given that the number of second-language speakers of English has already exceeded the number of native English-language speakers worldwide.8 The number of English as a Second Language (ESL) speakers will continue to increase as English is used globally as a lingua franca to facilitate commercial, academic, recreational, and technological activities. One implication of this trend is that, over time, native English speakers may exert less influence over the lexical, syntactic, and semantic structures that govern the English language. We are beginning to witness the emergence of hybridized languages like Spanglish, Konglish, and Hinglish, to name a few. Yet despite this trend and the obvious implications, foreign-accented and nonstandard-accented speech is marginally recognized by speech-mediated devices.
Gluszek and Dovidio define an accent as a “manner of pronunciation with other linguistic levels of analysis (grammatical, syntactical, morphological, and lexical), more or less comparable with the standard language.”9 Accents are particular to an individual, location, or nation, identifying where we live (through geographical or regional accents, like Southern American, Black American, or British Cockney, for example), our socioeconomic status, our ethnicity, our caste, our social class, or our first language. The preference for one’s own accent is well documented: individuals view people with accents similar to their own more favorably than people with dissimilar accents. Research has demonstrated that even babies and children show a preference for their native accent.10 This is consistent with the theory that similarity in attitudes and features affects both the communication processes and the perceptions that people form about each other.11
However, with accents, similarity attraction is not always the case. Researchers have been challenging the similarity-attraction principle, suggesting that it is rather context-specific and that cultural and psychological biases can often lead to positive perceptions of nonsimilar accents.12 Dissimilar accents sometimes carry positive stereotypes, which lead to positive perceptions of the speech or speaker. Studies also show that even as listeners are exposed to dissimilar accents, they show a preference for standard accents, like standard British English as opposed to nonstandard varieties like Cockney or Scottish accents (see table 8.1 for a summary of findings of accent studies).13
Table 8.1 Summary of Findings of Accented-Speech Perception Studies
| Area of research | Finding | Authors |
|---|---|---|
| Negative perception of nonnative accented speech | People who speak nonnatively accented speech are perceived more negatively than speakers with native accents | Bradac 1990; Fuertes et al. 2009; Lindemann 2003, 2005 |
| Strength of accent | The stronger the accent, the more negatively the accented individual is evaluated | Nesdale and Rooney 1996; Ryan et al. 1977 |
| Similarity attraction | Babies and children show a preference for their native accent | Kinzler et al. 2007, 2009 |
| Nonnative speech evoking negative stereotypes | Nonnative speakers seen as less intelligent | Bradac 1990; Lindemann 2003; Rubin et al. 1997 |
| | Nonnative speakers seen as less loyal | Edwards 1982 |
| | Nonnative speakers seen as less competent | Boyd 2003; Bresnahan et al. 2002 |
| Language competency | Nonnative speakers perceived as speaking the language poorly | Hosoda et al. 2007; Lindemann 2003 |
| Positive stereotypes | Certain accents (standard UK and Australian accents, for example) are seen as prestigious | Lippi-Green 1994; Giles 1994 |
| Nonnative speech evoking discriminatory practices | Discrimination in housing | Zhao et al. 2006 |
| | Discrimination in employment | Kalin 1978; Matsuda 1991; Nguyen 1993 |
| | Discrimination in the courts | Frumkin 2007; Lippi-Green 1994 |
| | Lower-status job positions | Bradac et al. 1984; de la Zerda 1979; Kalin 1978 |
On the other hand, nonsimilar accents are not always perceived positively, and foreign-accented speakers face many challenges. For example, Flege14 notes that speaking with a foreign accent entails a variety of possible consequences for second-language (L2) learners, including accent detection, diminished acceptability, diminished intelligibility, and negative evaluation. Perhaps one of the biggest consequences of having a foreign accent is that L2 users oftentimes have difficulty making themselves understood because of pronunciation errors. Even accented native speakers (speakers of variants of British English, like myself, for example) experience similar difficulty because of differences in pronunciation.
Lambert et al. produced one of the earliest studies on language attitudes to demonstrate language bias.15 Since then, research has consistently demonstrated negative perceptions of speech produced by nonnative speakers. As speech moves closer to unaccented, listener perceptions become more favorable, and as speech becomes less similar, listener perceptions become less favorable; put another way, the stronger the foreign accent, the less favorably the speech is perceived.16
Nonnative speech evokes negative stereotypes such that speakers are perceived as less intelligent, less loyal, less competent, poor speakers of the language, and as having weak political skill.17 But the bias doesn’t stop at perception, as discriminatory practices associated with accents have been documented in housing, employment, court rulings, lower-status job positions, and, for students, the denial of equal opportunities in education.18
Despite the documented ways in which persons who speak with an accent routinely experience discriminatory treatment, there is still very little mainstream conversation about accent bias and discrimination. In fall 2017, I received the following student evaluation from one of my students, who was a nonnative speaker of English and a future computer programmer:
I’m gonna be very harsh here but please don’t be offended—your accent is horrible. As a non-native speaker of English I had a very hard time understanding what you are saying. An example that sticks the most is you say goal but I hear ghoul. While it was funny at first it got annoying as the semester progressed. I was left with the impression that you are very proud of your accent, but I think that just like movie starts [sic] acting in movies and changing their accent, when you profess you should try you speak clearly in US accent so that non-native students can understand you better.
While I was taken aback, I shouldn’t have been. David Crystal, a respected and renowned British linguist who is a regular guest on a British radio program, said that people would write in to the show to complain about pronunciations they didn’t like. He states, “It was the extreme nature of the language that always struck me. Listeners didn’t just say they ‘disliked’ something. They used the most emotive words they could think of. They were ‘horrified,’ ‘appalled,’ ‘dumbfounded,’ ‘aghast,’ ‘outraged,’ when they heard something they didn’t like.”19 Crystal goes on to suggest that reactions are so strong because one’s pronunciation (or accent) is fundamentally about identity. It is about race. It is about class. It is about one’s ethnicity, education, and occupation. When a listener attends to another’s pronunciation, they are ultimately attending to the speaker’s identity.
As I reflected on my student’s “evaluation” of my accent, it struck me that this comment would have incited outrage had it been made about the immutable characteristics of one’s race, ethnicity, or gender; yet when it comes to accents, the practice of accent bias is treated as acceptable, in part because accents are seen as a mutable characteristic of a speaker, changeable at will. As my student noted, after all, movie stars in Hollywood do it all the time, so why couldn’t I? Although individuals have demonstrated the ability to adopt and switch between accents (called code switching), to do so should be a matter of personal choice, as accent is inextricable from one’s identity. To put upon another an expectation of accent change is oppressive; to create conditions where accent choice is not negotiable by the speaker is hostile; to impose an accent upon another is violent.
One domain where accent bias is prevalent is in seemingly benign devices such as public address systems and banking and airline menu systems, to name a few; but the lack of diversity in accents is particularly striking in personal assistants like Apple’s Siri, Amazon’s Alexa, and Google Home. While devices like PA systems require listeners only to comprehend standard accents, personal assistants require not only the comprehension but also the performance of standard accents by users. These devices therefore demand that the user assimilate to standard Englishes, a practice that, in turn, alienates nonnative and nonstandard English speakers.
According to the Dictionary of the English/Creole of Trinidad and Tobago, the term “freshwater Yankee” is used to describe someone who “speaks with an American accent or adopts other American characteristics, esp. without ever having been to the US.”20 During the Second World War, the presence of American servicemen at air bases in Trinidad was contentious, and “for those who lived in wartime Trinidad, the close encounter with the US empire meant more than anything else endless internecine friction and confrontation.”21 Hence when “freshwater Yankee” began to be used as a Trinidadian Creole term, it was not complimentary. As Browne states, “Caribbean people who immigrated to America prior to 1965 have traditionally been viewed as assimilationists and frowned upon by those in their native islands. For example, the image of the freshwater Yankee has made its way into the cultural consciousness as a powerful critique of assimilators and an equally powerful deterrent to what was considered ‘pretending’ to be American.”22
In December 2016, as I neared the completion of my annual pilgrimage to Trinidad, I spent time with a friend who is an iPhone user. I watched in fascination as she interacted with Siri, first in her native Trinidadian accent and then, when Siri did not understand her, repeating her question in an American accent. Siri sprang to life. It occurred to me that I had just witnessed yet another context in which “freshwater Yankee” could be used. This personal anecdote demonstrates the very practical ways in which nonstandard accented speakers of English must assimilate to “participate” in emerging technologies.
The assimilation required of accented speakers using speech technology can also be alienating. In human–human interaction, both the speaker and the listener have an opportunity to negotiate the communicative process. For example, over the last ten years of living in the US and speaking a British West Indian variant of English, there have been daily instances where I have not been understood; but I have almost always had the freedom to negotiate the communicative process. Since changing my accent is nonnegotiable, I have deployed clear speech strategies23 to become more intelligible (slowing my rate of speech, hyperarticulating,24 and making lexical changes, to name a few). None of these choices has in any way made me feel inauthentic or alienated, as these strategies do not involve me taking on another’s accent to be understood. Yet in speech technologies, because there is no opportunity to negotiate with the device, a change in accent is almost always required for accented speakers who are not understood but wish to engage with the technology. What evolves over time is a sense of alienation, so poignantly described by Nieto:
Having learnt that the existence and dialect of the dark-skinned is the incarnation of the bad, and that one can only hate it, the colonized then has to face the fact that “I am dark-skinned, I have an accent,” at this crossroads there seems to be only one possible solution, namely, becoming part of the superior, being one of them, speak their language. Nevertheless that solution is hopeless, therefore the oppressed faces alienation for the first time, the sense that one has lost one’s place in the world, our words are meaningless, our spirits are powerless.25
The assimilation and subsequent alienation described above in the digital sphere are by no means new phenomena. Assimilation and alienation have always been effects of imperialist ideology, and language has always been a tool of imperialist practices.
The practice of accent bias is far more complex than just an individual’s prejudice for or against another’s accent. In much the same way that institutionalized practices of racism and sexism have a long and complex history, accent bias is better understood against the broader ideology of imperialism, facilitated by acts of conquest and colonialism. For millennia, civilizations have effectively leveraged language to subjugate, even erase, the culture of other civilizations.
One problem with the discussion about the history of English as a global language is that the discourse is often couched in organic language, using linguistic terms like “spread,” “expansion,” “movement,” and “death,” signaling the inevitability of language change over time through agentless, natural forces.26 But language change isn’t always an organic process; sometimes it comes through the violent subjugation of a people and their culture. This is not to be mistaken for some haphazard, unintentional by-product of colonialism; as Phillipson argues, the rise of English “was not accidental, but [was] carefully engineered.”27
The physical, then cultural, subjugation of peoples also characterized Europe’s colonization of the New World. In the case of the Americas and the Caribbean, when Native Americans were decimated by disease, Europeans—in search of a new source of labor on their plantations—captured, enslaved, and transported millions of West Africans to the New World. However, physical subjugation was only part of the expansion of European control. Many scholars28 have documented the ways in which language was used as a tool of imperialist expansion in the New World, as the colonizer’s language was exclusively used in her newly captured territories. Migge and Léglise note that “language policies became an essential way in which the colonization of peoples was made complete . . . the colonizers’ language . . . became a necessity for all those who wished to advance socially and to participate in the colony’s public sphere.”29
Therefore, like Latin before it, English owes its emergence and status as lingua franca fundamentally to the exercise of British and American power, facilitated through military force, imperialist ideology, and cultural imposition over the last three hundred years. As Crystal notes:
Why a language becomes a global language has little to do with the number of people who speak it. It has much more to do with who those speakers are. Latin became an international language throughout the Roman Empire, but this was not because the Romans were more numerous than the peoples they subjugated. They were simply more powerful.30
It is important to note that not all scholars agree that the dominance of English as lingua franca is problematic; for example, Quirk and Widdowson argue that English has become globalized for practical and historical reasons and that it can help the development of poor countries without necessarily endangering their cultures.31 This ideology is still very much present in modern-day imperialist ideologies, as internet connectivity and access to digital technologies are not only seen as a good thing for developing nations but purported as a human right.32 However, as astutely noted by Bill Wasik, “in old-fashioned 19th-century imperialism, the Christian evangelists made a pretense of traveling separately from the conquering colonial forces. But in digital imperialism, everything travels as one, in the form of the splendid technology itself: salvation and empire, missionary and magistrate, Bible and gun.”33
Consequently, digital media—and by extension, the language of digital media—is arguably one of the most powerful tools of neo-imperialism. For evidence of this claim one need look no further than the dominance of English as the language of the internet.
Phillipson, perhaps one of the most recognized and contentious voices on the topic of linguistic imperialism, defines the term as a “theoretical construct, devised to account for linguistic hierarchization, to address issues of why some languages come to be used more and others less, [and] what structures and ideologies facilitate such processes.”34 He argues that linguistic imperialism is a subtype of linguicism, akin to other hierarchization based on race, ethnicity, gender, and so on. While much of Phillipson’s work focuses on education’s role as a “vital site” for processes of linguistic hierarchization, I propose that the digital economy provides a less formal, but perhaps just as powerful, ubiquitous site for linguistic hierarchization to emerge. As stated by Watters, “empire is not simply an endeavor of the nation-state—we have empire through technology (that’s not new) and now, the technology industry as empire.”35
After almost thirty years of the World Wide Web being used commercially, English is still its dominant language. Consider the following: as of November 2017, 51 percent of web pages are written in English, with Russian having the next-largest share, at 6.7 percent; over twenty-five languages have 0.1 percent representation each, and over one hundred languages have less than 0.1 percent representation online.36 Software is still written in Latin alphabets because computational technology was never conceived with linguistic inclusivity as an ideal. For example, the ASCII character code supports only 128 different characters; as a result, complex-script languages like Chinese, which need many more characters, are pushed to the margins.
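The scale of ASCII’s constraint is easy to demonstrate. The sketch below (a minimal Python illustration of the point above, not drawn from any particular system) shows that English text round-trips through ASCII while Chinese text cannot be encoded at all:

```python
# ASCII allots 7 bits per character: code points 0 through 127, enough
# for the Latin alphabet, digits, and punctuation, and nothing else.
english = "hello"
chinese = "你好"  # "hello" in Chinese

print(english.encode("ascii"))  # b'hello': fits comfortably in ASCII

try:
    chinese.encode("ascii")
except UnicodeEncodeError as err:
    # Chinese characters lie far outside ASCII's 128 code points.
    print("cannot encode:", err)

# Unicode encodings such as UTF-8 are the remedy: each of these two
# characters occupies three bytes rather than one.
print(chinese.encode("utf-8"))
```

Modern systems default to UTF-8, but the point stands: the 128-character assumption was baked into decades of software before Unicode support became routine.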
A similar language divide exists in speech technologies, as software is primarily developed for the English-speaking market (standard American, British, and Australian English). Error rates differ significantly across languages: for Google’s voice recognition system, for example, the word error rate for US English is 8 percent, while for tier-two languages (i.e., languages spoken in emerging markets, like India) the error rate is around or above 20 percent, which renders the technology functionally unusable.37
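Word error rate, the metric behind these figures, counts the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer’s transcript into a reference transcript, divided by the length of the reference. Here is a minimal sketch of the standard metric (the two transcripts are invented for illustration; this is not any vendor’s code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed as a Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One misheard word in a twelve-word command yields a WER of about 8 percent,
# the figure reported for US English above.
reference = "set a timer for ten minutes and remind me to call home"
hypothesis = "set a timer for ten minutes and remind me to fall home"
print(round(word_error_rate(reference, hypothesis), 2))  # 0.08
```

An 8 percent WER means roughly one word in twelve goes wrong; at 20 percent, every fifth word is misrecognized, which is why such systems are described above as functionally unusable.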
Within the English-speaking market, the development of technologies for nonstandard (creole) and nonnative (indigenized) accents is even more telling of the language divide and of bias. To describe the patterns of language development of speech technologies, we can look to the theory of concentric circles of English, developed by Kachru, who suggests that the world’s Englishes can be classified into three circles.38 First, the Inner Circle represents “traditional bases of English,” which include the United Kingdom, the United States, Australia, New Zealand, Ireland, Anglophone Canada, and some of the Caribbean territories. The Outer Circle is characterized by countries where English is not the native language but is designated as an official language, most often because of the legacy of colonialism. This includes countries such as India, Nigeria, the Philippines, Bangladesh, Pakistan, Malaysia, Tanzania, Kenya, and non-Anglophone South Africa and Canada. Finally, the Expanding Circle of English includes countries where the language has no historical or governmental role but is nevertheless used widely as a foreign language to conduct business. This circle represents the rest of the world’s population: China, Russia, Japan, most of Europe, Korea, Egypt, Indonesia, etc.
Mufwene presents a more scathing classification of the world’s Englishes and instead designates two categories: legitimate and illegitimate offspring.39 The legitimate offspring of English is typically spoken by descendants of Europeans, while the illegitimate offspring of English are pidgins and creoles (aligned with Kachru’s Outer Circle of English) and nonnative or indigenized varieties (aligned with Kachru’s Expanding Circle of English). Table 8.2 illustrates how these two classification systems are aligned. It is clear as we look at tables 8.2 and 8.340 (with the exception of Caribbean territories, whose markets are far too small to be considered lucrative) that the locus of development in speech technology for personal assistants (Siri, Alexa, and Google Home) is happening in Kachru’s Inner Circle/Mufwene’s legitimate offspring of English. Any development of other Englishes represents technology companies’ pursuit of emerging and lucrative markets like Singapore and India. Consider the example of the development of Hinglish for speech technology.
Table 8.2 Patterns of Language Support of Major Speech Technologies
| Mufwene’s legitimate and illegitimate offspring of English | Kachru’s Concentric Circle Model of English | English variety | Englishes supported in major speech technologies |
|---|---|---|---|
| Legitimate offspring: English typically spoken by Europeans | Inner Circle: traditional bases of English | United Kingdom, the United States, Australia, New Zealand, Ireland, Anglophone Canada | United Kingdom, the United States, Australia, New Zealand, Ireland, Anglophone Canada |
| Illegitimate offspring: English pidgins and creoles | Outer Circle: English is not the native language but is the designated official language | India, Nigeria, the Philippines, Bangladesh, Pakistan, Malaysia, Tanzania, Kenya, non-Anglophone South Africa and Canada | Hinglish, South African English only |
| Illegitimate offspring: nonnative or indigenized varieties of English | Expanding Circle: English has no historical or governmental role but is used widely as lingua franca | China, Russia, Japan, most of Europe, Korea, Egypt, Indonesia, Caribbean territories, South America | Singaporean English |
In 2017, Amazon’s Alexa rolled out support for Hinglish. There were also reports that Apple was planning similar support for the hybridized language.41 But who are the speakers of Hinglish? Where do they come from? And how is the hybridized language perceived and used in India, a nation of 1.3 billion people? Hinglish is defined as a “colloquial umbrella-term spanning isolated borrowings of indigenized Indian English forms within otherwise Monolingual Hindi or English, to rich code-switching practices unintelligible to Monolingual Hindi or English speakers.”42 It is estimated that approximately 350 million people speak Hinglish (more than even the number of native speakers of English worldwide). Additionally, researchers indicate that the emergent language is being used by the elite and is perceived as being more prestigious and modern than even English itself.43 Given these facts, nothing about Apple’s or Amazon’s initiative is surprising. The decision to break into the Hinglish market is not driven by inclusionary ideals; it is about accessing an emerging profit center. Given that market forces will continue to drive the commercial decisions about which accents are represented in speech technologies, it is more than likely that some accents will never be represented, because no lucrative market for them exists.
Table 8.3 Accents/Dialects Supported by Amazon’s, Google’s, and Apple’s Personal Assistants
| English accents/dialects | Amazon’s Alexa | Google Home | Apple’s Siri |
|---|---|---|---|
| American | ✔ | ✔ | ✔ |
| British | ✔ | ✔ | ✔ |
| Australian | ✔ | ✔ | ✔ |
| Canadian | ✔ | ✔ | ✔ |
| Indian | ✔ | ✔ | |
| Singaporean | ✔ | | |
| South African | ✔ | | |
| Irish | ✔ | | |
Despite over fifty years of development, the inability of speech technology to adequately recognize a diverse range of accented speakers reflects the lack of diversity among the employees of technology firms. This lack of diversity means that primarily dominant-class cultural norms are represented in the design choices of technology devices. As Alper observes, “the values, desires, and ideals of dominant cultural groups are systematically privileged over others in societies, and these biases are built into the structures of organizations, institutions, and networks.”44 As a result, our current speech technologies represent the values, ideals, and desires of the Global North and, as tools of digital imperialism, they bear “cultural freight as they cross borders.”45
Speech technology is trained using speech corpora: collections of thousands of voices speaking on a wide range of topics. Many of the early speech technologies trained on the now thirty-year-old Switchboard corpus from the University of Pennsylvania’s Linguistic Data Consortium; the accents represented there are largely American Midwestern. The result is that speech recognition error rates for marginalized voices are higher than those for other speakers; as Paul notes, “a typical database of American voices, for example, would lack poor, uneducated, rural, non-white, non-native English voices. The more of those categories you fall into, the worse speech recognition is for you.”46 Work by Caliskan-Islam, Bryson, and Narayanan has empirically demonstrated what we have already suspected: that beyond just the marginalization of people, “human-like semantic biases result from the application of standard machine learning to ordinary language.”47 Simply put, speech technologies replicate existing language biases.
Collecting data for a corpus is an expensive endeavor, and quite often existing corpora like Switchboard are all that smaller firms have as a resource. Larger firms, like Apple and Amazon, are building their own corpora, which they collect in different ways, including from the training data we freely provide when we set up our voice assistants or when we use dictation features in our software. Naturally, not only is this data not publicly shared, but very little is known about the user demographics: which accents are captured, who participates in the transcription and verification processes, and so on. The entire process is shrouded in mystery; market competitiveness demands that it be so.
Given the exclusive practices of big tech companies, we must look to independent developers, and to the free and open-source software (FOSS) initiatives that support them, to make greater strides in disrupting the system. Ramsey Nasser,48 for example, comes to mind as an independent developer working on an Arabic coding language. Nasser understands and embraces the importance of having one’s culture and language represented in the technologies we use.49 Another exciting project, Common Voice, was launched by Mozilla in 2017. To date, the project has collected over 300 hours of speech contributed by nearly 20,000 people globally. According to Mozilla, “individual developers interested in innovating speech technologies for marginalized accents have a corpora that in time will provide them with diverse representations.”50
Part of the challenge of democratizing the process and the technology is the prevailing misconception that accents are not only undesirable but unintelligible for use in speech technologies. Deeply unsettled by this notion, I conducted a study in 2013 to answer the following:
This research was motivated by the fact that many studies about accented speech examine how participants perceive accented speech and accented speakers.51 Bolstered by the finding that sometimes nonsimilar accents carry positive stereotypes, I was interested in participants’ performance in response to accented speech, not just their perception of that speech.52 For example, even if a listener held a particular perception about an accent, could that listener successfully complete a task that was presented by an accented speaker? And if so, what types of tasks? Were some tasks more challenging if the listener were presented with accented speech? I theorized that if listeners could indeed successfully complete certain tasks, then there may be room for the deployment of accented speech in contexts where standard accents are currently used. There were three relevant findings from the study:
Taken together, this is good news. The findings suggest that nonnative speech can be deployed in technology, but that it needs additional scaffolding, such as visual support, longer processing times, repetition, reinforcement, and training. It should not be used, however, where recall of data is necessary or where processing time is limited. So while emergency systems, for example, should continue to be designed using standard and native accents, perhaps there is room for us to hear accented speakers in other contexts, like information systems that provide banking, flight, and weather updates.
The benefit here is that listeners begin to hear accented speech in everyday contexts. Rubin indicates that listeners benefit from the practice of listening to foreign-accented speech, both in terms of their improved comprehension and in changed attitudes toward accented speakers.53 The entertainment industry is one place where we are beginning to see slow but high-profile change in this regard—Ronny Chieng of The Daily Show and Diego Luna of Rogue One: A Star Wars Story are two actors who perform their roles in their nonnative English accents, for example.
My goal in writing this chapter was to dispel the notion that existing speech technologies provide a linguistically level playing field for all users. Speech technologies are not revolutionary, and it is erroneous to claim that they are. They are biased, and they discipline. As more devices embed speech as a mode of communication using standard accents, speakers who cannot or will not assimilate will continue to be marginalized. Yet, given the growing range of services across devices and contexts, speech technologies are an ideal platform for the performance and representation of linguistic diversity. Given this potential, Meryl Alper challenges us to ask the critical question with regard to the design and development of speech technologies: “who is allowed to speak and who is deemed to have a voice to speak in the first place?”54 The question is an important one, as it goes to the heart of representation and inclusion. Connectivity and access to technology cannot ethically be held up as a human right if all people don’t have the opportunity to hear themselves represented in the technology they use.
Looking back at the last fifty years of speech technology development, it is an amazing accomplishment in and of itself that we have figured out how to teach and train our technology to speak and to recognize human speech. However, it is not enough that our technologies just speak, in the same way that it is not enough that our children just have a school to attend or that adults just have access to health care. In the same way we expect our physical spaces—where we work, play, and learn—to support and reflect our identity, culture, and values, so too should we expect and demand that our technology—the tools that support our work, play, and learning—reflect our identity, culture, and values. I, for one, look forward to the day when Siri doesn’t discipline my speech but instead recognizes and responds to my Trinidadian accent.
I would like to thank Dr. Karen Eccles, Maud Marie Sisnette, and Keeno Gonzales, who directed me to the Lise Winer Collection housed at the Alma Jordan Library, the University of the West Indies, St. Augustine, Trinidad and Tobago.
1. Jayson DeMers, “Why You Need to Prepare for a Voice Search Revolution,” Forbes (January 9, 2018), https://www.forbes.com/sites/jaysondemers/2018/01/09/why-you-need-to-prepare-for-a-voice-search-revolution/#66d65a2434af.
2. Herbert Sim, “Voice Assistants: This Is What the Future of Technology Looks Like,” Forbes (November 2, 2017), https://www.forbes.com/sites/herbertrsim/2017/11/01/voice-assistants-this-is-what-the-future-of-technology-looks-like/#2a09e5ce523a.
3. Nick Ismail, “2018: The Revolutionary Year of Technology,” Information Age (May 15, 2018), https://www.information-age.com/2018-revolutionary-year-technology-123470064/.
4. Mar Hicks, “Computer Love: Replicating Social Order through Early Computer Dating Systems,” Ada New Media (November 13, 2016), https://adanewmedia.org/2016/10/issue10-hicks/.
5. In the narrow sense, I refer to independent computer scientists not affiliated with large software firms, but also in the broader sense, I refer to those developers who come from or are affiliated with developing economies.
6. Ramsey Nasser, “Command Lines: Performing Identity and Embedding Bias,” panel at Computer History Museum (April 26, 2017), accessed January 03, 2018, http://bit.ly/2r1QsuD.
7. Olaudah Equiano, The Life of Olaudah Equiano, or Gustavus Vassa, the African (North Chelmsford, MA: Courier Corporation, 1814), 107.
8. David Crystal, English as a Global Language (Cambridge: Cambridge University Press, 2003).
9. Agata Gluszek and John F. Dovidio, “The Way They Speak: A Social Psychological Perspective on the Stigma of Nonnative Accents in Communication,” Personality and Social Psychology Review 14, no. 2 (2010): 215.
10. John R. Edwards, “Language Attitudes and Their Implications Among English Speakers,” in Attitudes Towards Language Variation: Social and Applied Contexts (1982): 20–33; H. Thomas Hurt and Carl H. Weaver, “Negro Dialect, Ethnocentricism, and the Distortion of Information in the Communicative Process,” Communication Studies 23, no. 2 (1972): 118–125; Anthony Mulac, “Evaluation of the Speech Dialect Attitudinal Scale,” Communications Monographs 42, no. 3 (1975): 184–189; Ellen Bouchard-Ryan and Richard J. Sebastian, “The Effects of Speech Style and Social Class Background on Social Judgements of Speakers,” British Journal of Clinical Psychology 19, no. 3 (1980): 229–233; Ellen Bouchard-Ryan, Miguel A. Carranza, and Robert W. Moffie, “Reactions Toward Varying Degrees of Accentedness in the Speech of Spanish-English Bilinguals,” Language and Speech 20, no. 3 (1977): 267–273; Katherine D. Kinzler, Emmanuel Dupoux, and Elizabeth S. Spelke, “The Native Language of Social Cognition,” Proceedings of the National Academy of Sciences 104, no. 30 (2007): 12577–12580; and Katherine D. Kinzler, Kristin Shutts, Jasmine Dejesus, and Elizabeth S. Spelke, “Accent Trumps Race in Guiding Children’s Social Preferences,” Social Cognition 27, no. 4 (2009): 623–634.
11. Donald L. Rubin, “Nonlanguage Factors Affecting Undergraduates’ Judgments of Nonnative English-Speaking Teaching Assistants,” Research in Higher Education 33, no. 4 (1992): 511–531; and Donn Byrne, William Griffitt, and Daniel Stefaniak, “Attraction and Similarity of Personality Characteristics,” Journal of Personality and Social Psychology 5, no. 1 (1967): 82.
12. Andreea Niculescu, George M. White, See Swee Lan, Ratna Utari Waloejo, and Yoko Kawaguchi, “Impact of English Regional Accents on User Acceptance of Voice User Interfaces,” in Proceedings of the 5th Nordic Conference on Human-Computer Interaction: Building Bridges (ACM, 2008), 523–526.
13. Howard Giles, “Evaluative Reactions to Accents,” Educational Review 22, no. 3 (1970): 211–227. All bibliographic information has been provided in the body of this essay. Table 8.1 just provides a summary of what has been already discussed.
14. James Emil Flege, “Factors Affecting Degree of Perceived Foreign Accent in English Sentences,” Journal of the Acoustical Society of America 84, no. 1 (1988): 70–79.
15. Wallace E. Lambert, Richard C. Hodgson, Robert C. Gardner, and Samuel Fillenbaum, “Evaluational Reactions to Spoken Languages,” Journal of Abnormal and Social Psychology 60, no. 1 (1960): 44.
16. James J. Bradac and Randall Wisegarver, “Ascribed Status, Lexical Diversity, and Accent: Determinants of Perceived Status, Solidarity, and Control of Speech Style,” Journal of Language and Social Psychology 3, no. 4 (1984): 239–255; Jairo N. Fuertes, William H. Gottdiener, Helena Martin, Tracey C. Gilbert, and Howard Giles, “A Meta-Analysis of the Effects of Speakers’ Accents on Interpersonal Evaluations,” European Journal of Social Psychology 42, no. 1 (2012): 120–133; Stephanie Lindemann, “Koreans, Chinese or Indians? Attitudes and Ideologies About Non-Native English Speakers in the United States,” Journal of Sociolinguistics 7, no. 3 (2003): 348–364; Stephanie Lindemann, “Who Speaks ‘Broken English’? US Undergraduates’ Perceptions of Non-Native English,” International Journal of Applied Linguistics 15, no. 2 (2005): 187–212; Drew Nesdale and Rosanna Rooney, “Evaluations and Stereotyping of Accented Speakers by Pre-Adolescent Children,” Journal of Language and Social Psychology 15, no. 2 (1996): 133–154; and Bouchard-Ryan, Carranza, and Moffie, “Reactions Toward Varying Degrees of Accentedness,” 267–273.
17. Bradac and Wisegarver, “Ascribed Status, Lexical Diversity, and Accent,” 239–255; Donald L. Rubin, Pamela Healy, T. Clifford Gardiner, Richard C. Zath, and Cynthia Partain Moore, “Nonnative Physicians as Message Sources: Effects of Accent and Ethnicity on Patients’ Responses to AIDS Prevention Counseling,” Health Communication 9, no. 4 (1997): 351–368; Edwards, “Language Attitudes and Their Implications among English Speakers,” 20–23; Sally Boyd, “Foreign-Born Teachers in the Multilingual Classroom in Sweden: The Role of Attitudes to Foreign Accent,” International Journal of Bilingual Education and Bilingualism 6, no. 3–4 (2003): 283–295; Mary Jiang Bresnahan, Rie Ohashi, Reiko Nebashi, Wen Ying Liu, and Sachiyo Morinaga Shearman, “Attitudinal and Affective Response toward Accented English,” Language & Communication 22, no. 2 (2002): 171–185; Megumi Hosoda, Eugene F. Stone-Romero, and Jennifer N. Walter, “Listeners’ Cognitive and Affective Reactions to English Speakers with Standard American English and Asian Accents,” Perceptual and Motor Skills 104, no. 1 (2007): 307–326; Stephanie Lindemann, “Who Speaks ‘Broken English’?,” 187–212; and Laura Huang, Marcia Frideger, and Jone L. Pearce, “Political Skill: Explaining the Effects of Nonnative Accent on Managerial Hiring and Entrepreneurial Investment Decisions,” Journal of Applied Psychology 98, no. 6 (2013): 1005.
18. Bo Zhao, Jan Ondrich, and John Yinger, “Why Do Real Estate Brokers Continue to Discriminate? Evidence from the 2000 Housing Discrimination Study,” Journal of Urban Economics 59, no. 3 (2006): 394–419; Rudolf Kalin and Donald S. Rayko, “Discrimination in Evaluative Judgments against Foreign-Accented Job Candidates,” Psychological Reports 43, no. 3, suppl. (1978): 1203–1209; Mari J. Matsuda, “Voices of America: Accent, Antidiscrimination Law, and a Jurisprudence for the Last Reconstruction,” Yale Law Journal (1991): 1329–1407; Beatrice Bich-Dao Nguyen, “Accent Discrimination and the Test of Spoken English: A Call for an Objective Assessment of the Comprehensibility of Nonnative Speakers,” California Law Review 81 (1993): 1325; Lara Frumkin, “Influences of Accent and Ethnic Background on Perceptions of Eyewitness Testimony,” Psychology, Crime & Law 13, no. 3 (2007): 317–331; Rosina Lippi-Green, “Accent, Standard Language Ideology, and Discriminatory Pretext in the Courts,” Language in Society 23, no. 2 (1994): 163–198; James J. Bradac, “Language Attitudes and Impression Formation,” in Handbook of Language and Social Psychology, ed. H. Giles and W. P. Robinson (New York: John Wiley & Sons, 1990), 387–412; Nancy De La Zerda and Robert Hopper, “Employment Interviewers’ Reactions to Mexican American Speech,” Communications Monographs 46, no. 2 (1979): 126–134; Kalin and Rayko, “Discrimination in Evaluative Judgments,” 1203–1209; and William Y. Chin, “Linguistic Profiling in Education: How Accent Bias Denies Equal Educational Opportunities to Students of Color,” Scholar 12 (2009): 355.
19. David Crystal, “Sound and Fury: How Pronunciation Provokes Passionate Reactions,” Guardian (January 13, 2018), https://www.theguardian.com/books/2018/jan/13/pronunciation-complaints-phonetics-sounds-appealing-david-crystal.
20. Lise Winer, Dictionary of the English/Creole of Trinidad and Tobago: On Historical Principles (Montreal: McGill-Queen’s Press-MQUP, 2009), 365.
21. Harvey R. Neptune, Caliban and the Yankees: Trinidad and the United States Occupation (Chapel Hill: University of North Carolina Press, 2009), 2.
22. Kevin Adonis Browne, Mas Movement: Toward a Theory of Caribbean Rhetoric (Philadelphia: Pennsylvania State University, 2009), 4.
23. Michael A. Picheny, Nathaniel I. Durlach, and Louis D. Braida, “Speaking Clearly for the Hard of Hearing II: Acoustic Characteristics of Clear and Conversational Speech,” Journal of Speech, Language, and Hearing Research 29, no. 4 (1986): 434–446.
24. Interestingly, many of the clear speech strategies that work in human–human interactions can be misunderstood as a display of angry speech in human–computer interactions. Some systems are now designed to recognize angry speech (marked by a slower rate of speech and hyperarticulation of vowel sounds), and transfer the speaker to a live agent.
25. David Gonzalez Nieto, “The Emperor’s New Words: Language and Colonization,” Human Architecture 5 (2007): 232.
26. Robert Phillipson, “Realities and Myths of Linguistic Imperialism,” Journal of Multilingual and Multicultural Development 18, no. 3 (1997): 238–248.
27. Bettina Migge and Isabelle Léglise, “Language and Colonialism,” Handbook of Language and Communication: Diversity and Change 9 (2007): 299.
28. Aimé Césaire, Discours sur le colonialisme (Paris: Edition Réclame, 1950); Frantz Fanon, Peau noire masques blancs (Paris: Le seuil, 1952); Ayo Bamgbose, “Introduction: The Changing Role of Mother Tongues in Education,” in Mother Tongue Education: The West African Experience (Hodder and Stoughton, 1976); Ayo Bamgbose, Language and the Nation (Edinburgh: Edinburgh University Press, 1991); Ayo Bamgbose, Language and Exclusion: The Consequences of Language Policies in Africa (Münster: Lit Verlag, 2000).
29. Migge and Léglise, “Language and Colonialism,” 6.
30. Crystal, English as a Global Language.
31. Randolph Quirk and Henry G. Widdowson, “English in the World,” Teaching and Learning the Language and Literatures (1985): 1–34.
32. Maeve Shearlaw, “Mark Zuckerberg Says Connectivity Is a Basic Human Right—Do You Agree?,” Guardian (January 3, 2014), https://www.theguardian.com/global-development/poverty-matters/2014/jan/03/mark-zuckerberg-connectivity-basic-human-right.
33. Bill Wasik, “Welcome to the Age of Digital Imperialism,” New York Times (June 4, 2015), https://www.nytimes.com/2015/06/07/magazine/welcome-to-the-age-of-digital-imperialism.html.
34. Phillipson, “Realities and Myths of Linguistic Imperialism,” 238.
35. Audrey Watters, “Technology Imperialism, the Californian Ideology, and the Future of Higher Education,” Hack Education (October 15, 2015), accessed May 9, 2018, http://hackeducation.com/2015/10/15/technoimperialism.
36. “Usage of Content Languages for Websites,” W3Techs, accessed March 31, 2018, W3techs.com/technologies/overview/content_language/all.
37. Daniela Hernandez, “How Voice Recognition Systems Discriminate against People with Accents: When Will There be Speech Recognition for the Rest of Us?” Fusion (August 21, 2015), http://fusion.net/story/181498/speech-recognition-ai-equality/.
38. Braj B. Kachru, “The English Language in the Outer Circle,” World Englishes 3 (2006): 241–255.
39. Salikoko Mufwene, “The Legitimate and Illegitimate Offspring of English,” in World Englishes 2000, ed. Larry E. Smith and Michael L. Forman (Honolulu: College of Languages, Linguistics, and Literature, University of Hawai’i and the East-West Center, 1997), 182–203.
40. Source of table 8.3: “Language Support in Voice Assistants Compared,” Globalme Language and Technology (April 13, 2018), https://www.globalme.net/blog/language-support-voice-assistants-compared.
41. Saritha Rai, “Amazon Teaches Alexa to Speak Hinglish. Apple’s Siri Is Next,” Bloomberg.com (October 30, 2017), https://www.bloomberg.com/news/articles/2017-10-30/amazon-teaches-alexa-to-speak-hinglish-apple-s-siri-is-next.
42. Rana D. Parshad et al., “What Is India Speaking? Exploring the ‘Hinglish’ Invasion,” Physica A: Statistical Mechanics and Its Applications 449 (2016): 377.
43. Parshad et al., “What Is India Speaking?,” 375–389.
44. Meryl Alper, Giving Voice: Mobile Communication, Disability, and Inequality (Cambridge, MA: MIT Press, 2017), 2.
45. Wasik, “Welcome to the Age of Digital Imperialism.”
46. Sonia Paul, “Voice Is the Next Big Platform, Unless You Have an Accent,” Wired (October 17, 2017), https://www.wired.com/2017/03/voice-is-the-next-big-platform-unless-you-have-an-accent/.
47. Aylin Caliskan-Islam, Joanna J. Bryson, and Arvind Narayanan, “Semantics Derived Automatically from Language Corpora Necessarily Contain Human Biases,” arXiv preprint arXiv:1608.07187 (2016): 1–14.
48. Nasser’s work can be followed at http://nas.sr/.
49. Ramsey Nasser, “Command Lines: Performing Identity and Embedding Bias,” panel at Computer History Museum (April 26, 2017), accessed June 1, 2017, http://bit.ly/2r1QsuD.
51. Halcyon M. Lawrence, “Speech Intelligibility and Accents in Speech-Mediated Interfaces: Results and Recommendations” (PhD diss., Illinois Institute of Technology, 2013).
52. Niculescu et al., “Impact of English Regional Accents,” 523–526.
53. Rubin et al., “Nonnative Physicians as Message Sources,” 351–368.
54. Alper, Giving Voice, 2.