Broadly speaking, applied phonetics can be assumed to cover a range of practical implementations of phonetic expertise, including forensic speech science, speech-language pathology, dialect coaching for actors, and pronunciation teaching. The last of these has had an especially long and successful engagement with the field of phonetics, as evidenced by an extensive research literature, as well as centuries of close attention from prominent speech researchers and theorists. The following discussion surveys the historical use of phonetics in second language (L2) pronunciation, summarizes research bearing on current teaching practices, and previews probable future trends in the field. Because so much of the recent research literature has been motivated by issues in English language teaching, it is inevitable that any such review would draw heavily on examples from English. Many of the observations made here are applicable to diverse language teaching contexts; however, a fuller understanding of pronunciation teaching and learning will require more studies involving other languages. This chapter’s focus on adult phonetic learning is well-justified on linguistic grounds, since children typically acquire L2 pronunciation mainly through naturalistic, as opposed to instructed, learning.
The speech sciences have enjoyed a long and fruitful relationship with the field of language pedagogy. Although the first scholarly discussions of the uses of phonetics in teaching cannot be pinpointed, English pronunciation specialists had taken a deep interest in practical questions well before Henry Sweet and the era of modern speech research (Subbiondo, 1978). Some of the earliest phonetic descriptions of English, such as Hart’s (1569) An orthographie, were motivated by the push for English spelling reform, while others, like Robinson’s (1617) Art of pronunciation, focused on the development of a systematic phonetic alphabet. One of the first volumes to feature an explicitly pedagogical orientation was The vocal organ, by the Welsh schoolmaster, Owen Price (1665). Intended for an audience comprising both “outlanders” and native English speakers wishing to enhance their social status through speech “improvement,” it featured some of the earliest-published diagrams of the articulators, along with articulatory descriptions of English speech sounds. Still another work from the same era was William Holder’s (1669) Elements of Speech, which carried the subtitle, “A study of applied English phonetics and speech therapy.” Holder’s development of a detailed and influential theory of segmental phonetics was a byproduct of his desire to promote the teaching of speech to the deaf.
The issues motivating these works – systematic phonetic transcription, pronunciation instruction (PI) for language learners, and the teaching of elocution – were revisited many times afterward by scholars committed to teaching. The elocutionist Alexander Melville Bell, for instance, developed Visible Speech (1867), an iconic transcription system, as a tool for teaching the deaf to articulate accurately, while his son, Alexander Graham Bell, invented the telephone while searching for ways to automatically represent spoken utterances on the printed page. The younger Bell also devoted much of his career to teaching the deaf. Remarkably, one of the most important achievements of the field of phonetics, the IPA, owes its existence to the work of language teachers, particularly Paul Passy, a specialist in French pronunciation and founder of the International Phonetic Association, which had been an organization of teachers in an earlier incarnation.
One of the giants of phonetics, Henry Sweet, was deeply interested in language teaching, as is reflected in his handbook (1900), The practical study of languages; a guide for teachers and learners. Sweet saw pronunciation as the central aspect of second language (L2) learning, and offered extensive advice to learners on how to acquire the sounds of a new language. Key to his perspective was the view that phonetic expertise was essential for teachers:
Phonetics makes us independent of native teachers. It is certain that a phonetically trained Englishman who has a clear knowledge of the relations between French and English sounds can teach French sounds to English people better than an unphonetic Frenchman who is unable to communicate his pronunciation to his pupils, and perhaps speaks a dialectal or vulgar form of French (p. 48).
Although we view his “certainty” on this point with considerable skepticism, Sweet’s appraisal of the goals of language teaching seems to have been rooted in a realistic understanding of the capabilities of L2 learners. He did not, for instance, advocate attempting to acquire the L2 with perfectly native-like accuracy, but instead identified a suitable goal as “moderate fluency and sufficient accuracy of pronunciation to insure intelligibility” (Sweet, 1900, p. 152). His work thus presaged subsequent discussions of the merits of the “nativeness” and “intelligibility” principles (Levis, 2005) covered later in this chapter.
By the mid-20th century, a burgeoning interest in foreign language instruction in North America led to the development of a behaviorist approach to teaching known as the Audiolingual (AL) method. Based on the notion of “language-as-habit,” AL instruction had its roots in techniques developed for language training in the US military. It focused heavily on aural and oral skills, with near-native pronunciation as one of its aims. Languages were taught directly (i.e., without translation into the native language), and learners mimicked utterances after recorded native-speaker models, with the expectation that their productions would become increasingly accurate as a function of practice. The post-WWII availability of classroom technologies such as the tape recorder and the language laboratory was even thought to obviate the need for teachers to have proficiency in the languages they taught. Although AL methodology was often derided because of its reliance on tedious repetition and rote memorization, its impact on teaching was both powerful and long-lasting (Richards & Rodgers, 2014).
The advent of AL instruction in grade-school curricula coincided with a flurry of scholarly work on pronunciation pedagogy that began to appear in teaching-oriented journals in both the USA and the UK. Among these was Abercrombie’s (1949) paper in the British journal English Language Teaching. A professor and luminary at the University of Edinburgh, Abercrombie had impeccable credentials in phonetics, having studied under Daniel Jones. Abercrombie’s Elements of General Phonetics, originally published in 1967, is still in print and is recognized as one of the most important works in the field.
Like Sweet (1900), Abercrombie (1949) promoted intelligibility over nativeness, arguing that while aspiring secret agents might need to develop perfect L2 accents, such a goal was not only unnecessary for most language learners, but very unlikely to be achieved. He also argued a number of other points that remain contentious. For instance, he took the view that a detailed technical knowledge of phonetics should not be expected of pronunciation teachers and that phonetic transcription, though potentially helpful, was not a requirement in language classrooms. His work also reflected an awareness of large individual differences in pronunciation learning success. Furthermore, he observed that an array of social factors may lead some learners to choose to pronounce their L2 less accurately than they are actually able. The latter point has resurfaced frequently in 21st-century discussions of the relationship between pronunciation and learner identity (Moyer, 2009), though contemporary writers are often unaware that this issue was first identified many decades ago.
While many, if not most, of Abercrombie’s (1949) recommendations on pronunciation teaching continue to be pertinent, they represent an orientation towards pedagogy that might be characterized as “phonetics applied” rather than “applied phonetics.” The former refers to the general use by teaching specialists of knowledge derived from phonetics, even though such knowledge was not obtained with the goal of addressing teaching practices. In contrast, “applied phonetics” is a sub-focus within the field of phonetics that specifically aims to answer questions of pedagogical importance, such as what to teach, how to teach, and when to do so. Abercrombie’s proposals were rooted in a detailed understanding of general phonetics and were further informed by his own observations of language learning in his students and colleagues. The same was true of the work of several other prominent phoneticians of the era, such as Pierre Delattre, who, in addition to his role as a leading researcher of speech synthesis at Haskins Laboratories, also had a reputation in the USA as a master teacher of French (Eddy, 1974).
Neither Abercrombie nor Delattre used empirical research to test their ideas about teaching; nor did they have the benefit of an extensive pedagogically focused research literature to draw upon. However, with the rise of the AL method, an interest in empirical investigations of teaching per se began to emerge. Mid-20th-century research in such journals as Language Learning, the Revue de phonétique appliquée, and even Language often took the form of pedagogical intervention studies in which investigators explored such issues as the general effects of instruction on learning, the benefits of particular pedagogical techniques, and the nature of perception and production difficulties experienced by learners. Observing that “controlled experimentation is perhaps the largest remaining frontier in the field of Teaching English as a Foreign Language” (p. 217), Strain (1963) conducted experimental work intended to pinpoint the pronunciation difficulties of Japanese learners of English both with and without instruction. Despite concluding that his study was largely unsuccessful in addressing his research question, he argued that his findings were useful in highlighting some of the difficulties of carrying out L2 pronunciation research. Strain’s work, in fact, exemplifies an orientation of the field that contrasts with the “phonetics applied” perspective mentioned earlier. Instead, it belongs to a research stream designed specifically to focus on pedagogical questions. This type of “applied phonetics” has its own raison d’être, which is distinct from that of general experimental phonetics; it is motivated by its own research questions, executed with its own methods, and communicated to audiences through its own terminological conventions. To help clarify the distinction, one might compare a pedagogically relevant inference drawn from general knowledge of phonetics with the results of an instructional study. 
From basic articulatory phonetics, for instance, we can describe the typical configurations of the tongue during the articulation of English plosives using sagittal diagrams, along with appropriate terminology. One might consider providing such articulatory descriptions to a learner of English as a means of teaching accurate articulation. Doing so illustrates “phonetics applied,” in the sense that particular pieces of knowledge from the field of phonetics are exploited in teaching. However, the fact that one can explain to a learner how plosives are produced says nothing about whether and to what degree the learner can actually exploit such knowledge when it is gained through such instruction. In fact, a central issue in PI has long been the observation that explicit instruction is sometimes ineffective. Brière (1966), for instance, successfully trained native English speakers to articulate certain exotic sounds such as /x/ and a dental unaspirated initial /t̪/, but had little success with /ɯ/ and /ɦ/. Similarly, explicit suprasegmental instruction succeeds in some cases and proves ineffectual in others (Pennington & Ellis, 2000). But even when instruction succeeds, there is no guarantee that the learners’ new skill will offer any true communicative benefit. Many English pronunciation specialists, for instance, argue that some segmentals, such as interdental fricatives, do not merit detailed attention because the time required does not justify the minimal intelligibility gains (if any) the learner is likely to experience (see Derwing & Munro, 2015). We expand on this issue in the discussion of functional load below.
It has often been claimed that L2 teachers can benefit from knowing in advance the phonological difficulties that particular groups of learners will experience. However, ample evidence indicates that linguistically based error prediction is too fraught with problems to be more than minimally applicable in teaching (Munro, 2018b). One of the theoretical offshoots of behaviorist language teaching was the proposal that linguistic comparisons of L1 and L2 can yield predictions of learners’ areas of difficulty. The Contrastive Analysis Hypothesis (CAH) motivated extensive discussions of error hierarchies in the learning of sounds and grammar (Lado, 1957; Stockwell & Bowen, 1983). A key assumption of the phonological component of CAH was that structural differences between L1 and L2, described in terms of taxonomic phonemics, served as the basis for pronunciation errors. The purpose of error hierarchies was to predict relative degrees of difficulty of particular L2 sounds or sound pairs. For instance, a phonemic split, in which a single L1 phoneme corresponds to two separate phonemes in L2, was expected to be extremely challenging. The case of the Japanese flap category apparently subsuming the English /ɹ/ – /l/ distinction illustrates such a split, which would be predicted to make acquisition of that opposition difficult for Japanese learners of English. In contrast, a mere distributional difference between languages, such as the occurrence of word-final voiced consonants in English, but not in German, should offer only minimal difficulty to German learners of English.
Clearly, a contrastive approach can offer some insights into pronunciation difficulties. Japanese learners’ common difficulties with the perception and production of /ɹ/ – /l/, for instance, have been well documented. Nonetheless, empirical work in the 1960s demonstrated that the original conception of CAH was untenable, and the proposal received harsh criticism (Wardhaugh, 1970). A rigorously executed intervention study by Brière (1966) yielded important counter-evidence to the assumed phonological error hierarchies. After teaching English speakers a contrived phonological inventory from a fictional language, he found that the predicted degrees of difficulty did not match the learners’ performance. The reasons for the failure of CAH in this case and in others are manifold. As Brière himself observed, most taxonomic phonemic comparisons are too crude to capture the perceptual and articulatory nuances that come into play in phonetic learning. However, a more general problem with CAH is the “language-as-object” fallacy that underlies it. Wardhaugh (1970) recognized this problem when he pointed out that an ideal CAH analysis could supposedly be carried out entirely on the basis of previously published linguistic descriptions, with no need for any further observations of language learners. This expectation is at odds with the well-established finding that inter-speaker variability in L2 phonetic learning is large (Smith & Hayes-Harb, 2011), even when a wide range of outside influences are held constant. In a multi-year longitudinal study of English L2 vowel acquisition, for example, Munro and Derwing (2008) found that learners with identical L1 backgrounds, equivalent levels of formal education, and very similar starting proficiency levels, exhibited diverse learning trajectories in production performance.
While a great deal of individual variation may be accounted for by the age at which learning begins and by differences in the amount and quality of L2 experience, other influences tied to motivation and aptitude almost certainly played a role (Derwing, 2018b; Flege, Frieda, & Nozawa, 1997; Trofimovich, Kennedy, & Foote, 2015).
On the one hand, one might argue that an exhaustive knowledge both of the phonetic details of the L1 and L2, and of individual learning circumstances, could account for the performance of a particular learner in acquiring the L2 sound system. However, even if it is true, such a proposal has few useful implications for classroom practice. Effective pedagogy entails determining students’ actual needs and addressing them as efficiently as possible. Given that learners differ widely in their pronunciation difficulties, even when they share the same L1 background, the most promising strategy for teaching is to carry out individual learner assessments prior to the beginning of instruction to pinpoint the specific problems that each learner experiences (Derwing, 2018a; Munro, Derwing, & Thomson, 2015).
Disenchantment with Audiolingual instruction eventually led to its abandonment in favor of so-called “designer methods” in the mid-1970s, which emphasized pronunciation to varying degrees (e.g., The Silent Way; Suggestopedia; Community-Counselling Learning) (Brown, 2014). Communicative Language Teaching (CLT) eventually came to dominate North American L2 classrooms in the 1980s, and is still the most popular approach with the addition of a task-based focus. CLT was informed by Canale and Swain’s (1980) seminal model of communicative competence and by Krashen’s (1981) Input Hypothesis, which claimed that with sufficient input, L2 pronunciation could be acquired without explicit instruction. At about the same time, Purcell and Suter (1980) published a correlational study in which they failed to find an effect of PI on learners’ L2 pronunciation, understood by them to refer to native-like production. That finding, interpreted in the context of CLT and the popularity of Krashen’s research, contributed to a radical decline in PI. Although a few expert practitioners insisted on PI’s importance for communicative purposes (e.g., Judy Gilbert, Wayne Dickerson, Joan Morley), it nearly disappeared from language classrooms. During this period, researchers were actively investigating aspects of pronunciation, but their work was not pedagogically driven (Leather, 1983).
By the 1970s, an important new line of work on non-native speech perception had emerged. For example, an influential study by Miyawaki et al. (1975) revealed differences between Japanese and American listeners’ discrimination of a synthetic /ɹɑ/–/lɑ/ continuum. The poorer performance of the Japanese group was accounted for in terms of language experience, and in the decades since then, countless studies have probed both perception and production of this English liquid distinction. The focus has also expanded to include research on other consonant and vowel distinctions, and to evaluate theoretical proposals, such as Best’s Perceptual Assimilation Model (PAM) (Best, McRoberts, & Sithole, 1988; Best & Tyler, 2007) and Flege’s Speech Learning Model (SLM) (1995).
These models are also described in Chapter 16 of this volume. The now vast body of research carried out by Winifred Strange, Catherine Best, and James Flege, along with their many students and colleagues, is not pedagogically oriented either in terms of its objectives or its dissemination. However, it has relevance to teaching in terms of what it has revealed about phonetic learning, particularly in adults. One especially important aspect of their work is its emphasis on the role of perceptual processes in L2 acquisition. PAM and SLM both assume that the perception of non-native categories entails activation and engagement of previously well-established L1 representations. To acquire the L2 phonological inventory, learners must somehow escape the confining effects of this previously acquired knowledge. Studies of English liquid perception have shown that such learning is at least partially possible in adults, with numerous feedback studies confirming that perceptual accuracy increases with modest amounts of systematic training (Lively, Pisoni, Yamada, Tohkura, & Yamada, 1994; Logan, Lively, & Pisoni, 1991). Perhaps more intriguing is the fact that production accuracy can improve as a result of the same perceptual feedback, even when no explicit instruction on production is given (Bradlow, Pisoni, Akahane-Yamada, & Tohkura, 1997). Although the original feedback research was motivated primarily by interest in psycholinguistic issues, the implications for teaching are obvious. However, the transfer of findings from the laboratory to the classroom is not automatic. Speech scientists do not normally read or publish their work in education journals, and pedagogical specialists frequently find the research that appears in speech journals to be inaccessible due to its technical detail.
Over the past few decades, pronunciation experts have made explicit the distinction between the Nativeness Principle, which emphasizes accuracy in L2 speakers’ pronunciation, and the Intelligibility Principle, which underscores the importance of producing speech that listeners can readily understand, even if it diverges from a native norm. As observed by Levis (2005), much of the twentieth-century research on pronunciation focused on nativeness, despite calls from Abercrombie (1949), Gimson (1962), and others for an intelligibility-oriented approach. Not until the late 1990s, however, did empirical studies involving the assessment of intelligibility, comprehensibility, and accent establish the quasi-independence of these dimensions of L2 speech (Derwing & Munro, 1997; Derwing, Munro, & Wiebe, 1998; Munro & Derwing, 1995a, 1995b). Since then, it has become apparent that instruction may influence one of these dimensions without necessarily changing the others. In other words, it is possible to improve intelligibility without a noticeable change in accentedness, and an improvement in accentedness does not necessarily entail any intelligibility benefit, and may even be detrimental (Winters & O’Brien, 2013).
Phoneticians have long been interested in the construct of intelligibility, whether in connection with speech that is normal, disordered, synthetic, or foreign-accented. Schiavetti (1992) defined it as “the match between the intention of the speaker and the response of the listener to speech passed through the transmission system” (p. 13). Intelligibility is the most vital of the pronunciation dimensions, because it underlies the success of virtually all spoken communication. Operationalization of this construct includes a variety of approaches, the most common of which employs orthographic transcription by listeners followed by a word match analysis between the transcription and the intended message (Schiavetti, 1992). Other approaches have included responses to True/False utterances (Munro & Derwing, 1995b); comprehension questions (Hahn, 2004); summaries (Perlmutter, 1989); and segment identification (Munro et al., 2015).
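The transcription-based approach described above can be illustrated with a minimal sketch. It assumes a deliberately simple scoring rule (exact, order-insensitive word matching after lowercasing and stripping punctuation); published studies differ in how they treat misspellings, homophones, and function words, so the function name and the rule itself are illustrative assumptions rather than a standard procedure.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and return its word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_match_score(intended, transcription):
    """Proportion of intended words recovered in a listener's transcription.

    Assumption: words are matched exactly and without regard to order;
    each transcribed word can be matched at most once.
    """
    target = normalize(intended)
    if not target:
        raise ValueError("intended message contains no words")
    heard = Counter(normalize(transcription))
    matched = 0
    for word in target:
        if heard[word] > 0:
            heard[word] -= 1  # consume the match so repeats are not double-counted
            matched += 1
    return matched / len(target)


# Hypothetical example: one of five intended words is transcribed differently.
score = word_match_score("The ship sailed at noon.", "the sheep sailed at noon")
print(score)
```

In this hypothetical case four of the five intended words are matched. Averaging such scores over listeners and utterances would give a per-speaker intelligibility estimate, although the appropriate aggregation depends on the study design.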
Importantly, intelligibility is distinct from foreign accentedness, another dimension that has been probed extensively in the speech literature, and which is usually understood as the perceived degree of difference between an individual speaker’s production patterns and some particular variety. Listeners’ extraordinary sensitivity to accented features was documented in Flege (1984), who observed successful accent detection in excerpts of speech as short as 30 ms. Munro, Derwing, and Burgess (2010), in extending this line of research, found reliable accent detection for utterances played backwards and severely degraded through random splicing and pitch monotonization. Brennan, Ryan, and Dawson (1975) confirmed the reliability and validity of listener judgments of accentedness by utilizing magnitude estimation and sensory modality matching (squeezing a handheld device to indicate strength of accent). More recently, degree of accent has been typically measured using quasi-continuous or Likert-type scales. In the former, listeners indicate the accentedness of utterances with respect to two anchor points (e.g., “no foreign accent,” “very strong foreign accent”) by positioning a lever (Flege, Munro & MacKay, 1995) or clicking on a particular point on a screen (Saito, Trofimovich, & Isaacs, 2017). Likert assessments use a numbered scale; although there is debate regarding the appropriate scale resolution, a scale of at least nine points is known to yield highly reliable results (Isaacs & Thomson, 2013; Munro, 2018a). Degree of accentedness correlates with both segmental and suprasegmental accuracy, as described by Brennan et al. (1975) and Anderson-Hsieh, Johnson, and Koehler (1992). In short, more errors lead to stronger accentedness ratings. However, accentedness is partially independent of intelligibility, in that some (often very salient) accent features do not interfere with understanding or do so only minimally, while others are highly detrimental. 
As Munro and Derwing (1995a) demonstrated, heavily accented speech is often perfectly understood.
Comprehensibility, a third dimension of L2 speech, has been defined in several different ways (Gass & Varonis, 1984; Munro & Derwing, 1995a; Smith & Nelson, 1985); however, the prevailing sense in the pedagogical literature is Varonis and Gass’ (1982) definition: “how easy it is to interpret the message” (p. 125). Although this speech dimension has received less attention than intelligibility, Munro and Derwing (1995a) recognized the need to distinguish the two, because utterances that are equally well understood may nonetheless require significantly different degrees of effort on the part of the listener. This conceptualization is very similar to the notion in cognitive psychology of “processing fluency” (Alter & Oppenheimer, 2009), which refers to the amount of effort required for a particular cognitive task. Measurement of comprehensibility has relied upon the same types of scaling as accentedness. The reliability of nine-point scalar ratings is well-attested, and the validity of the construct is supported by evidence that listeners’ judgments correlate with response time measures (Munro & Derwing, 1995b; Bürki-Cohen, Miller, & Eimas, 2001; Clarke & Garrett, 2004; Floccia, Butler, Goslin, & Ellis, 2009). Even when speech is fully intelligible, limited comprehensibility has potentially negative consequences for interactions. If listeners have to expend a great deal of effort to understand an interlocutor, they may be disinclined to carry out further interactions (Dragojevic & Giles, 2016). For this reason, comprehensibility is more important to successful communication than is accentedness.
Although fluency is sometimes used to refer to proficiency, we restrict our discussion to Segalowitz’s (2010) tripartite conceptualization of cognitive, utterance, and perceived fluency. The first of these refers to “the speaker’s ability to efficiently mobilize and integrate the underlying cognitive processes responsible for producing utterances with the characteristics that they have” (p. 48). Utterance fluency is generally addressed by measuring speech and articulation rates, pausing, repairs and other temporal phenomena. The third type of fluency is typically measured through listeners’ scalar ratings. Perceived and utterance fluency correlate well (Derwing, Rossiter, Munro, & Thomson, 2004) and are assumed to be a reflection of cognitive fluency (Derwing, 2017). Derwing et al. (2004) observed moderate to high correlations between perceived fluency and comprehensibility, as did Thomson (2015), whereas the relationship between fluency and accentedness was weaker. Speech rate, a component of utterance fluency, is typically slower in L2 speakers than in native speakers (MacKay & Flege, 2004; Munro & Derwing, 2001). Rate manipulations have shown that computer-accelerated L2 speech tends to be judged as less accented and more comprehensible than unmodified speech in all but very fast L2 talkers (Munro & Derwing, 2001). Although L2 speakers may often be advised to slow their speech down to make it more comprehensible, the available evidence does not support such a recommendation. As Derwing and Munro (1997) concluded, a seemingly fast speaking rate may be a scapegoat for other aspects of an L2 accent that necessitate additional cognitive processing on the part of the listener. For instance, when listeners find L2 speech difficult to understand because of unexpected differences in segments and prosody, they may incorrectly attribute their processing difficulty to a fast rate.
There is no reason to assume that instructing such L2 speakers to slow their speech will address the root cause of their lack of comprehensibility. In fact, deliberately slowed L2 speech has been shown to have deleterious effects on communication, at least in some circumstances (Munro & Derwing, 1998).
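The temporal measures of utterance fluency mentioned above can be sketched as follows. The sketch assumes commonly used definitions that are not spelled out in this chapter: speech rate as syllables per second of total speaking time (pauses included), and articulation rate as syllables per second of phonation time (pauses excluded). The 0.25-second minimum pause threshold and the function name are likewise illustrative assumptions; studies vary in the threshold they adopt.

```python
def fluency_measures(n_syllables, total_duration_s, pause_durations_s, min_pause_s=0.25):
    """Compute simple temporal measures of utterance fluency.

    Assumptions: silences shorter than min_pause_s are not counted as pauses;
    speech rate includes pause time, articulation rate excludes it.
    """
    pauses = [p for p in pause_durations_s if p >= min_pause_s]
    pause_time = sum(pauses)
    phonation_time = total_duration_s - pause_time
    if total_duration_s <= 0 or phonation_time <= 0:
        raise ValueError("durations must leave a positive phonation time")
    return {
        "speech_rate": n_syllables / total_duration_s,
        "articulation_rate": n_syllables / phonation_time,
        "pause_count": len(pauses),
        "mean_pause_s": pause_time / len(pauses) if pauses else 0.0,
    }


# Hypothetical sample: 30 syllables in 12 s, with four measured silences.
measures = fluency_measures(30, 12.0, [0.5, 1.0, 0.1, 0.5])
print(measures)
```

Keeping the two rates separate makes it possible to see whether a talker who seems slow is articulating slowly or simply pausing often, a distinction that a single rate figure would obscure.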
As noted previously, throughout most of the 20th century, pronunciation instruction relied heavily on intuition, anecdotal observations, and untested theoretical proposals. It is increasingly recognized, however, that evidence-based pronunciation instruction is both possible and necessary (Munro & Derwing, 2015). In intuition-guided practice, instructors can be misled into placing emphasis on highly salient accent features that are only minimally relevant to intelligibility. For instance, mispronunciations of the English interdental fricatives are both common and noticeable, but there is no evidence that they detract significantly from comprehensibility and intelligibility (Munro & Derwing, 2006). In light of previous practice in which accent features alone guided instructional choices, leading to PI that is not motivated by the Intelligibility Principle, we argue that empirical data should be used to determine pedagogical priorities.
We propose a model of the relationship between research and pronunciation teaching as depicted in Figure 17.1. Although research and teaching need not be theory-driven, theory, both formal and informal, may influence each aspect of the model.
The first balloon in Figure 17.1 represents the knowledge and attitudes regarding pronunciation that underlie effective pedagogy. Questions that can be addressed through research include the following:
To illustrate potentially useful research findings within this domain, we might consider the numerous studies of the relationship between age of second language learning and accentedness. Research evidence shows that the two are closely related and that the likelihood of acquiring unaccented L2 speech becomes vanishingly small as adulthood approaches (Abrahamsson & Hyltenstam, 2009; Munro & Mann, 2005). While the underlying reasons for the relationship remain controversial, an obvious implication is that teachers cannot reasonably expect adult language learners to acquire fully native-sounding speech, suggesting that the Nativeness Principle (Levis, 2005) should not guide their practice.
The second balloon in Figure 17.1, Pedagogical Goals, points to several questions for researchers to address:
An example of how research can address such questions comes from a pedagogical intervention study by Derwing et al. (1998), who compared the production performance of a control group with learners in two instructional conditions. In a blind listening task, judges rated comprehensibility and accentedness pre- and post-instruction of an extemporaneous speaking passage. Learners who received prosodically focused instruction showed improved comprehensibility, whereas neither the control group nor the group who received exclusively segmental instruction improved on this dimension. This finding provides evidence that improvements in comprehensibility can be achieved through instruction over a relatively short period of time, though this particular study did not address long-term retention, a matter that has so far received only limited consideration. At least two research studies have confirmed lasting effects of PI for three months (Couper, 2006) and seven months (Derwing, Foote & Munro, forthcoming), but far more work in this area with a larger sample of learners and different pronunciation features is necessary.
Although a meta-analysis of 86 instructional studies (Lee, Jang, & Plonsky, 2015) indicated that most showed a significant improvement in accentedness, further analysis by Thomson and Derwing (2015) demonstrated that only 9% of 75 studies (all of which also appeared in Lee et al., 2015) measured either comprehensibility or intelligibility. As awareness of the Intelligibility Principle grows, researchers will presumably focus less on the accentedness dimension because of its limited importance for communication.
The next balloon depicts Focus of Attention, which relates to the following issues:
An example of a study that partially addressed the first question with regard to segments is Munro and Derwing’s (2006) exploration of the functional load of minimal pairs. Defined as the “amount of work” done by a particular phonemic distinction within the language (Catford, 1987), functional load takes into account the number of minimal pairs exhibiting the distinction, word frequencies, and contextual distributions. It has the potential to predict problematic substitutions that may result in comprehensibility breakdowns. For example, the members of a high functional load pair of segments, /n/ and /l/, are often interchanged in Cantonese-accented English. Learners from the same background also commonly fail to distinguish the low functional load pair, /d/ and /ð/. When Cantonese-accented utterances were presented to native English listeners for comprehensibility judgments, high functional load errors resulted in a greater comprehensibility decrement than did the low functional load errors. Moreover, the effect of the high functional load errors was additive, whereas that of the low functional load errors was not. Although these results were preliminary, the authors proposed that instructors of English should consider functional load when choosing segments for a teaching focus. Although Catford (1987) proposed a hierarchy of minimal pairs based on functional load, its accuracy remained untested for two decades. Munro and Derwing’s (2006) empirical study exemplifies how a theoretically based pedagogical recommendation can be assessed. For a more recent critical overview of this concept, see Sewell (2017).
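To make the notion of functional load more concrete, the following Python sketch counts the minimal pairs that depend on a given contrast and weights each pair by word frequency. It is only a crude proxy offered for illustration: the toy lexicon, its frequency values, and the function names `minimal_pairs` and `load_proxy` are all invented here, and the calculation is not Catford's (1987) hierarchy or the procedure used by Munro and Derwing (2006). Contextual distribution, the third ingredient mentioned above, is ignored.

```python
from itertools import combinations

# Hypothetical toy lexicon: phonemic form -> invented frequency (not real corpus data).
LEXICON = {
    ("n", "aɪ", "t"): 300,  # night
    ("l", "aɪ", "t"): 400,  # light
    ("n", "oʊ"): 900,       # no
    ("l", "oʊ"): 150,       # low
    ("n", "ɛ", "t"): 60,    # net
    ("l", "ɛ", "t"): 500,   # let
    ("d", "ɛ", "n"): 20,    # den
    ("ð", "ɛ", "n"): 800,   # then
    ("s", "ɪ", "t"): 200,   # sit
}

def minimal_pairs(lexicon, a, b):
    """Return word pairs that differ in exactly one segment, that segment being a vs. b."""
    pairs = []
    for w1, w2 in combinations(lexicon, 2):
        if len(w1) != len(w2):
            continue
        diffs = [(x, y) for x, y in zip(w1, w2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            pairs.append((w1, w2))
    return pairs

def load_proxy(lexicon, a, b):
    """Return (number of minimal pairs, frequency-weighted score) for the contrast a/b.

    Each pair is weighted by the frequency of its rarer member, an assumption
    made here so that a pair only counts heavily when both words are common.
    """
    pairs = minimal_pairs(lexicon, a, b)
    weight = sum(min(lexicon[w1], lexicon[w2]) for w1, w2 in pairs)
    return len(pairs), weight

print(load_proxy(LEXICON, "n", "l"))  # contrast with several pairs in the toy data
print(load_proxy(LEXICON, "d", "ð"))  # contrast with a single pair in the toy data
```

With a real pronouncing dictionary and corpus frequencies substituted for the toy lexicon, a ranking of contrasts produced this way could be compared against a published hierarchy, although the two would not be expected to agree exactly.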
The last balloon in Figure 17.1, Learning Activities and Techniques, raises several questions as well:
Among the still sparse research evaluating specific ways of teaching, a handful of studies have demonstrated that perceptual learning alone by L2 speakers can lead to improved productions (Bradlow et al., 1997; Lambacher, Martens, Kakehi, Marasinghe, & Molholt, 2005; Thomson, 2011; Thomson & Derwing, 2016). In an implementation designed for Computer-Assisted Pronunciation Teaching (CAPT), Thomson (2011) evaluated the effects of High Variability Phonetic Training (HVPT) on L2 vowel intelligibility. The listeners identified random presentations of ten target vowels in monosyllables produced by 20 native speakers. They received both auditory and visual feedback over the course of the eight-session intervention. Not only did their perception of vowels improve, but their own vowel productions were also more intelligible after the intervention, as assessed in a blind listener identification task. Although HVPT has been studied extensively in phonetics, it is practically unheard of in pedagogical circles, despite its effectiveness (Thomson, 2018b). This is just one learning activity that warrants more empirical study, but this entire category of research questions, like the others, merits further interdisciplinary research.
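The core idea of high variability training, namely that each target category is heard from many different talkers in unpredictable order, can be sketched as a randomized trial list. The Python fragment below is a minimal illustration under stated assumptions: the vowel and talker labels are placeholders, the function name `build_trials` is hypothetical, and the full crossing of ten vowels with 20 talkers is assumed for simplicity rather than taken from the design of Thomson (2011).

```python
import random
from collections import Counter

def build_trials(vowels, talkers, seed=None):
    """Cross every vowel with every talker and shuffle the resulting trials."""
    rng = random.Random(seed)  # seeded generator so a session can be reproduced
    trials = [(v, t) for v in vowels for t in talkers]
    rng.shuffle(trials)
    return trials

# Placeholder labels (assumptions): ten target vowels and 20 talkers.
VOWELS = [f"V{i}" for i in range(1, 11)]
TALKERS = [f"T{i:02d}" for i in range(1, 21)]

trials = build_trials(VOWELS, TALKERS, seed=1)
counts = Counter(vowel for vowel, _ in trials)

print(len(trials))   # total number of trials in one pass
print(trials[:3])    # first few (vowel, talker) presentations
```

In an actual training application, each trial would be followed by the learner's identification response and by feedback; those steps are omitted here.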
In Figure 17.1, research is depicted as the foundation for addressing the areas identified in the balloons. While some of this research takes the form of intervention studies (as cited previously), other, non-pedagogical investigations can inform these areas, as well. For instance, Tajima, Port, and Dalby’s (1997) study of the effects of resynthesizing L2 speech to correct temporal errors demonstrated a significant improvement in intelligibility. Their findings are therefore indicative of the potential benefit of including a focus on the same temporal features in instructional contexts. Unfortunately, many phonetics studies that could have beneficial applications in pedagogical contexts, such as the preceding HVPT example, are almost entirely unknown to teachers or teacher trainers. An attempt to bridge this gap is now emerging with the advent of the Journal of Second Language Pronunciation, and conferences and workshops such as Pronunciation in Second Language Learning and Teaching (PSLLT) (North America), Accents (Poland), New Sounds (various countries), English Pronunciation: Issues and Practices (EPIP) (various countries), and Speech and Language Technology in Education (SLaTE) (various countries).
In spite of the clear importance of intelligibility and comprehensibility in pronunciation teaching, the broader social ramifications of an L2 accent are also worthy of study because of their impact on communication. A line of research dating back many decades has probed the effects of accent on listeners’ judgments of a variety of speaker traits, such as employability, competence, and intelligence (Anisfeld, Bogo, & Lambert, 1962; Kalin & Rayko, 1978; Reisler, 1976). Together, these and other studies demonstrate that listeners make social judgments about speakers on the basis of accent, even though these may be entirely unwarranted. Lambert, Hodgson, Gardner, and Fillenbaum (1960) pioneered the matched guise test, an approach for detecting biases based on accent. In matched guise, a single individual speaks in two conditions differing in accent alone, e.g., Canadian English and Canadian French. By controlling for speaker, content, and physical appearance (if an image is used), the technique uncovers the listener’s accent-related biases. Typically, the speech samples are rated by listeners on a scale for characteristics such as friendliness, intelligence, competence, honesty, and laziness, among others. In general, this type of task has demonstrated that listeners sometimes judge speakers negatively simply because of the way they speak. Other evidence indicates that some members of the public act on their prejudices (e.g., Dávila, Bohara, & Saenz, 1993; Munro, 2003); however, researchers’ understanding of the way in which accent relates to social evaluation has become more nuanced in recent years. McGowan (2015), for instance, found that listeners’ expectations based on physical appearance influence access to existing knowledge in such a way as to facilitate or compromise comprehension. 
When presented with congruent accents and faces, e.g., Chinese accent/Chinese face, American accent/Caucasian face, listeners transcribed speech samples more accurately than in incongruent conditions. Thus, what might appear to be social stereotyping when an American accent is paired with a Chinese face is actually a more complex cognitive processing issue.
Another influence on the willingness of individuals to interact with L2 speakers is familiarity with their accent. While it might be expected that familiarity should improve comprehension of a particular accent, the existing research has yielded inconsistent results. Gass and Varonis (1984) manipulated familiarity with content, foreign accent in general, a specific foreign accent, and a specific speaker and found a familiarity benefit for the listeners in all cases. Bent and Bradlow (2003) identified an intelligibility benefit for listeners of shared L1s when speech samples were presented (in noise) in the L2. Munro, Derwing, and Morton (2006), however, found inconsistent benefits of a shared L1 background in the intelligibility of extemporaneously produced L2 speech. They concluded, as did Major, Fitzmaurice, Bunta, and Balasubramanian (2002), that L2 speech in one’s own L1 accent is sometimes understood better, but not always.
Both speakers and listeners have responsibility for the success of an interaction. While pronunciation instruction may improve low comprehensibility speech, sometimes a focus on the listener may also be useful. From a practical perspective, increasing people’s familiarity with L2 accents may enhance their inclination to interact with L2 speakers, as was found by Derwing, Rossiter, and Munro (2002), who provided explicit training to listeners. Students in a social work program were given eight weekly intercultural awareness sessions with Vietnamese-accented examples relevant to their social work content. Furthermore, they were taught aspects of a Vietnamese accent, such as final consonant deletions, reduction of consonant clusters, use of implosive stops, substitution of initial /f/ for /p/, substitution of final /s/ for any other final consonant, and inaccurate stress placement. The students reported much higher confidence in their own abilities to communicate with an L2 speaker at the end of the program. In a different awareness-raising activity, Weyant (2007) played a passage narrated by a female L2 speaker. One group of listeners who wrote a paragraph depicting a typical day from her point of view (using the first person pronoun) later registered more positive impressions of the speaker than did another group who wrote a similar paragraph but from a third person stance. This short intervention illustrates how attitudes may be altered through a relatively simple task. Lindemann, Campbell, Litzenberg, and Subtirelu (2016) compared implicit and explicit accent training with native speakers of English. One group transcribed sentences after having practiced transcribing Korean-accented speech, while the other group did so after instruction comparing Korean- and American-accented English. Both groups improved. The same participants answered comprehension questions on a Korean-accented, TOEFL-based lecture but neither group showed significant improvement.
In both the Lindemann et al. and the Derwing et al. (2002) studies, the interventions were very short. More research is necessary to determine whether this type of instruction can positively influence comprehension as well as attitudes.
A question often asked by teachers is “which matters more – segments or prosody?” to which the only sensible answer is “it depends.” It depends on the phonetic inventories of the L1 and the L2 and on the individual learner’s errors: students who share the same L1 exhibit tremendous variability (Munro et al., 2015). It also depends on the location of a segmental error in a word – an initial consonant error can have more serious implications for intelligibility than an error elsewhere (Bent, Bradlow, & Smith, 2007; Zielinski, 2008). In addition, it depends on the nature of both the segmental and prosodic errors. It is clear that both segments and prosody can affect intelligibility and comprehensibility (Derwing et al., 1998). More research on concepts such as functional load and studies similar to that of Hahn (2004) may elucidate which segments and aspects of prosody matter most, but more research is also needed to examine the combined effects of errors (Derwing, 2008).
Teachers often ask whether phonetic transcription should be used in pronunciation classes. Abercrombie (1949) argued that pronunciation instructors can be effective without using transcription, and certainly, if they are not fully conversant and comfortable with transcription, they should not use it. Moreover, if learners have limited literacy skills, introducing another writing system is likely to confuse them. However, if the instructor has a good grasp of transcription, and the students are literate, then some use of IPA symbols may be advantageous, at least in languages such as English, which has a nontransparent orthography. Transcription is generally unnecessary for languages such as Finnish or Spanish, because their orthographic systems correspond very closely to their sound systems, at least at the phonemic level.
In countries where English is not the language of the majority, teachers commonly raise the question of which variety to teach. Usually the debate is whether to employ British or American English in the classroom, as if these two descriptors were monolithic. In fact, much of the instruction is likely to be in the accent of the teacher, regardless of program directives (Derwing & Munro, 2015). However, technology allows choices. Recordings of multiple dialects of English (and other languages) are now readily available, some of which are designed expressly for pronunciation teaching, and others, such as YouTube videos and TED Talks, have been produced for other reasons but can easily be repurposed. Introducing students to multiple varieties of the L2 may have benefits for their own comprehension.
With the increasing attention on English as a Lingua Franca (ELF) over the past 15 years, it has been argued that learners should not be expected to emulate a native model of English, but should instead strive for mutual intelligibility using a core set of pronunciation features (Jenkins, 2002). Adherents of ELF make the case that most nonnative speakers will interact primarily with other nonnative speakers; therefore, their pronunciation model should not be a native speaker variety (Walker, 2011). This notion has been criticized on several grounds. Isaacs (2014), for instance, indicated that
the inclusion criteria for speech samples in the English as a lingua franca corpus that Jenkins and her colleagues frequently cite have not been clarified. Therefore, substantially more empirical evidence is needed before the lingua franca core can be generalized across instructional contexts or adopted as a standard for assessment.
(p. 8)
Szpyra-Kozłowska (2015) argues that implementation of an ELF curriculum is extremely difficult, indicating also that since “the majority of English learners in the Expanding Circle already use English pronunciation based on some native model, EFL users might have problems not only communicating with native speakers, but also with non-native speakers” (p. 21). Given the diversity of English varieties, it seems prudent to expose learners to several, taking into consideration where they live and with whom they are likely to interact. In immigrant-receiving countries such as Australia, Canada, New Zealand, the UK, and the USA, the local varieties should receive the main focus of attention, but regardless of location, all learners would likely benefit from having access to several dialects.
Another frequently raised question is whether teachers should instruct their learners on speech reduction features. Given that L2 perception and production are closely linked, one of the goals of pronunciation instruction should be improved perception on the part of the learners. It is likely that students can implement knowledge about reduction processes to enhance their understanding of their interlocutors. For instance, the reduced form of “going to” (i.e., /ˈɡʌnə/) differs dramatically from its citation pronunciation, and may be unrecognizable to a learner who has not been explicitly taught what to listen for. Because full forms are probably at least as comprehensible to listeners of L2 speech as reduced forms, we surmise that students should not be required to produce reductions. If so, class time would be better spent on enhancing aspects of speech that typically interfere with comprehensibility.
A long-standing question is “when is the best time to learn a second language for optimal pronunciation?” Studies indicate that near-native accents in the L2 are tied to early childhood learning (Abrahamsson & Hyltenstam, 2009; Flege et al., 1995). Furthermore, infants begin losing their ability to distinguish sound contrasts that are not in their own L1 within the first year of life (Werker & Tees, 2002), presumably making L1 acquisition more efficient. Flege et al.’s (1995) study examined accentedness in Italian immigrants who came to Canada at different ages; arrivals as young as four years of age showed evidence of an L2 accent in adulthood. However, many people, especially immigrants, cannot choose when they have to learn an L2. The age question is thus of limited practical relevance. An L2 accent is a completely normal phenomenon and should not be considered “a problem” unless a speaker has issues of intelligibility and/or comprehensibility (Derwing & Munro, 2015).
Various forms of technology have become prominent in pronunciation instruction, and are likely to take on a larger role in the years to come. Audio recording devices have been widely available since the mid-20th century, but with the advent of smartphones, students can easily send sound files back and forth to their instructors. CAPT is often employed to enhance learners’ perceptual skills. In addition, evidence-based pedagogical tools, such as the English Accent Coach (Thomson, 2018a), provide training that can be individualized to the needs of the students. This software, which employs HVPT for ten vowels and 24 consonants, has been shown to significantly improve L2 students’ perception and production (Thomson, 2018b).
In Lee et al.’s (2015) meta-analysis of pronunciation instruction research, CAPT interventions were found to be less effective than teacher-led activities, but since many language teachers are hesitant to offer any pronunciation instruction, CAPT on its own can still play a role. The clear advantages of CAPT include the amount of practice time it affords and the option for individualization catering to the needs of each user. In addition, as Ferrier, Reid, and Chenausky (1999) discovered, students whose progress was closely monitored by a teacher outperformed those left to their own devices. More gamified software targeting L2 pronunciation is being developed regularly; however, with a few notable exceptions, the publishers have tended to focus on aspects of the technology, without enough attention to important linguistic issues and best pedagogical practices (Foote & Smith, 2013). Visual feedback on speech for CAPT has been studied for some time and may offer new directions for PI (see Chapter 19 in this volume).
Technological advances over recent decades have influenced PI, and future innovations are certain to continue the trend. The wide availability of digital audio, for example, has made spoken utterance models, including dialectal variants, easily accessible to learners, and audio dictionaries have, to some degree, obviated the need for phonetic transcription as a guide to pronunciation at the lexical level. Text-to-speech applications also offer promise as a source of aural input during instruction (Bione, Grimshaw, & Cardoso, 2016). Of particular interest are the significant advances in the implementation of automatic speech recognition (ASR) for teaching purposes. As noted earlier, a recurrent finding in pronunciation research is inter-speaker variability, which points to the need for individual assessment and instruction tailored to each learner’s needs. It is widely recognized that a general benefit of computer-based instruction is individualization; high-quality ASR, in particular, offers the possibility of immediate pronunciation feedback to learners that can assist them in modifying their productions (Cucchiarini & Strik, 2018). Research on ASR-based feedback using a CAPT system designed for learning Dutch (Cucchiarini, Neri, & Strik, 2009) indicated relatively high error-detection rates along with positive feedback from users. The overall benefits of ASR feedback for learners, however, ultimately depend on how well the CAPT system addresses individual needs so as to highlight pronunciation difficulties that compromise intelligibility and comprehensibility (O’Brien et al., 2018). In addition, adaptive systems offer much promise, yet remain largely under-developed.
The rapidly growing field of accent reduction raises ethical issues regarding the qualifications of those offering PI and the claims they make regarding the value and extent of their services. Entrepreneurs with little or no training have opened businesses as accent coaches. Some of these enterprises are small, with a single individual offering services over Skype, while others are large companies, catering to other organizations (Derwing & Munro, 2015; Thomson, 2014). In many cases, advertising presents highly inaccurate statements about the English phonological system, betraying a lack of knowledge on the part of the providers. The obvious factual errors on many websites suggest that learners might receive little or no value for their money, and worse, they could be seriously misled to the point that they become less intelligible. Among the documented false and unsubstantiated claims are the following: English has no syllables; a syllabic /n/ is the same as a “long” /n/; practicing speech with a marshmallow between the lips is effective for accent reduction (for a detailed review, see Derwing & Munro, 2015).
In addition to falsehoods promoted by many accent reductionists, it is commonly claimed that accents can be “eliminated” in short periods of time. In fact, we know of no empirical evidence indicating that typical adult L2 learners can achieve such a goal (Derwing & Munro, 2015). It is understandable that some L2 speakers are concerned about the social consequences of their accents and may wish to sound native-like, even when they are already highly intelligible and easy to understand. Their fears are a response to societal bias including discriminatory practices against immigrants. Moreover, the advertising on accent reduction websites seems to exploit those fears and conveys the message that accent reduction can somehow reduce discrimination. We know of no research supporting such a conclusion. On the one hand, there is no reason to deny L2 learners whatever instruction they wish, including so-called accent reduction; on the other, there is no reason to give credence to the message that accents can be eliminated or that reducing an accent is effective in improving social status or opening employment options.
Currently there are no regulatory bodies for practitioners of accent reduction. As Lippi-Green (2012) observed, anyone can claim to be an accent coach, just as anyone can claim to “have developed a miracle diet and charge money for it” (p. 229). A consequence of this absence of regulation is a proliferation of predatory websites, which at best simply waste people’s money, but may actually cause them to be less intelligible (Derwing & Munro, 2015). This growing problem is exacerbated by the fact that so many ethical instructors of English feel inadequately trained to teach pronunciation (Foote, Holtby, & Derwing, 2011).
Other practitioners who sometimes offer “L2 accent modification” are speech-language pathologists (SLPs). As Thomson (2014) has pointed out, SLP credentials alone do not guarantee adequate background for teaching pronunciation to L2 learners, and indeed, most SLPs lack training in this area (Müller, Ball, & Guendouzi, 2000; Schmidt & Sullivan, 2003; Thomson, 2013). Unless they engage with the applied phonetics literature and second language acquisition research, they have little foundation for providing learners with a focus on comprehensibility and may even entertain ill-informed perspectives on accent. On its website, the American Speech-Language-Hearing Association (2018) emphasizes “that accents are NOT a communication disorder” and that “SLPs who serve this population should be appropriately trained to do so.” However, Thomson (2013) indicates that some SLPs do treat accent as a clinical disorder, a perception documented by Schmidt and Sullivan (2003) in a national American survey of SLP attitudes. Thus, some practitioners do not necessarily adhere to ASHA’s recommendations on this issue.
An alternative to “accent reduction” and “accent modification” is “pronunciation teaching,” the term typically employed by applied linguists. Language teachers, however, as indicated previously, often have limited access to training, because university teacher education programs rarely include coursework on how to teach pronunciation. In a survey of Canadian language programs, Foote et al. (2011) found that teachers generally recognize the importance of pronunciation, but often feel they have inadequate skills. In light of the recent explosion of applied phonetics research, more teacher training opportunities are likely to emerge in the future.
The current momentum in pronunciation research will lead to advancements in a variety of subdomains. Areas of particular need can be identified through an examination of Figure 17.1. Knowledge and Attitudes represents a subdomain in which a great deal of valuable work has already been carried out. Remaining avenues to explore include the benefits of intensive PI during the first massive exposure to the L2, referred to as the “Window of Maximal Opportunity” (WMO) (Derwing & Munro, 2015). Nevertheless, this area of work, along with Pedagogical Goals, represents the least urgent needs of the field. Such is not the case for Focus of Attention, which requires considerably more investigation. Teachers need guidance on where to focus their priorities in the pursuit of intelligibility and comprehensibility goals. Suitable issues for empirical inquiry include functional load, error gravity, and combinatory effects of pronunciation errors.
Of the four areas identified in Figure 17.1, though, Learning Activities and Techniques is the one most in need of attention. Whether learners receive the PI they need depends on how new knowledge of evidence-based practices finds its way to the classroom. Not only must research findings be incorporated into materials and software, but they must be communicated to teachers through appropriate training. Teachers must be able to identify students’ individual comprehensibility needs, and to implement the learning activities necessary to bring about positive changes. Given that teachers often have insights into the complexities of their students’ communication needs and the constraints inherent in the language classroom, the best results for students will be realized if researchers and teachers collaborate to a greater degree.
With respect to the design of learning tasks, the emphasis in research must be placed on identifying activities that are both engaging and demonstrably effective. Exploratory studies such as Lee and Lyster’s (2016) and Saito and Lyster’s (2012) comparisons of the effects of types of classroom feedback on speech perception offer a promising direction for replication and extension. On a micro level, Couper (2011) and Fraser (2006) have discussed the importance of using socially constructed feedback: “metalanguage developed by students working together with the teacher using already understood first language (L1) concepts to help in the formation of target language phonological concepts” (Couper, 2011, p. 159). Another potentially useful avenue of investigation is that of Galante and Thomson (2017), who compared two oral classroom activities to determine which had the best outcomes on dimensions such as fluency and comprehensibility. Expansion of this research to encompass a range of activities would be valuable.
As noted earlier, technology has always held promise for advances in PI, but its impact has been attenuated by an insufficient balance of linguistically accurate, pedagogically sound, and technologically sophisticated considerations. Specialist teams, including phoneticians, L2 pedagogical experts, engineers, and computing experts, can collaborate to produce work that takes into account the multifaceted needs of individual learners. While such collaboration is taking place in The Netherlands (e.g., Cucchiarini & Strik, 2018), it has yet to expand internationally.
Finally, as the importance of “big data” emerges, PI research would benefit from new or revised methodology. Corpus-based approaches (see, e.g., Yoon et al., 2009), larger scale studies, delayed post-tests (carried out months or even years after interventions), longitudinal studies with different populations, and investigations of the acquisition of pronunciation in a wider range of L2s could all lead to better insights, provided that the ultimate goal of applied phonetics research is understood to be enhanced communication skills for L2 speakers.
Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2 teaching and research. Amsterdam, NL: John Benjamins.
Levis, J. M., & Munro, M. J. (2017). Pronunciation, Vols. 1–4. London & New York, NY: Routledge.
Applied linguistics, L2 pronunciation, speech perception, speech production, second language teaching
Abercrombie, D. (1949). Teaching pronunciation. ELT Journal, 3, 113–122.
Abrahamsson, N., & Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language: Listener perception versus linguistic scrutiny. Language Learning, 59, 249–306.
Alter, A. L., & Oppenheimer, D. M. (2009). Uniting the tribes of fluency to form a metacognitive nation. Personality and Social Psychology Review, 13, 219–235.
American Speech-Language-Hearing Association. (2018). Document. Retrieved March 9, 2018 from www.asha.org/Practice-Portal/Professional-Issues/Accent-Modification/
Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42, 529–555.
Anisfeld, M., Bogo, N., & Lambert, W. (1962). Evaluational reactions to accented English speech. Journal of Abnormal and Social Psychology, 65, 223–231.
Bell, A. M. (1867). Visible speech. London: Simpkin Marshall & Co.
Bent, T., & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. The Journal of the Acoustical Society of America, 114, 1600–1610.
Bent, T., Bradlow, A. R., & Smith, B. L. (2007). Segmental errors in different word positions and their effects on intelligibility of non-native speech. In O. S. Bohn & M. J. Munro (Eds.), Language experience in second language speech learning: In honor of James Emil Flege (pp. 331–347). Amsterdam, NL: John Benjamins.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345–360.
Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech perception: Commonalities and complementarities. In O. S. Bohn & M. J. Munro (Eds.), Language experience in second language speech learning (pp. 13–34). Amsterdam, NL: John Benjamins.
Bione, T., Grimshaw, J., & Cardoso, W. (2016). An evaluation of text-to-speech synthesizers in the foreign language classroom: Learners’ perceptions. In S. Papadima-Sophocleous, L. Bradley, & S. Thouësny (Eds.), CALL communities and culture – Short papers from Eurocall 2016 (pp. 50–54). Dublin, Ireland: Research-Publishing.net.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. I. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. The Journal of the Acoustical Society of America, 101, 2299–2310.
Brennan, E. M., Ryan, E. B., & Dawson, W. E. (1975). Scaling of apparent accentedness by magnitude estimation and sensory modality matching. Journal of Psycholinguistic Research, 4, 27–36.
Brière, E. (1966). An investigation of phonological interference. Language, 42, 769–796.
Brown, H. D. (2014). Principles of language learning and teaching (6th ed.). New York, NY: Pearson Education.
Bürki-Cohen, J., Miller, J. L., & Eimas, P. D. (2001). Perceiving non-native speech. Language and Speech, 44, 149–169.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1–47.
Catford, J. C. (1987). Phonetics and the teaching of pronunciation: A systemic description of the teaching of English phonology. In J. Morley (Ed.), Current perspectives on pronunciation: Practices anchored in theory (pp. 83–100). Washington, DC: TESOL.
Clarke, C. M., & Garrett, M. F. (2004). Rapid adaptation to foreign-accented English. The Journal of the Acoustical Society of America, 116, 3647–3658.
Couper, G. (2006). The short and long-term effects of pronunciation instruction. Prospect, 21(1), 46–66.
Couper, G. (2011). What makes pronunciation teaching work? Testing for the effect of two variables: Socially constructed metalanguage and critical listening. Language Awareness, 20, 159–182.
Cucchiarini, C., Neri, A., & Strik, H. (2009). Oral proficiency training in Dutch L2: The contribution of ASR-based corrective feedback. Speech Communication, 51, 853–863.
Cucchiarini, C., & Strik, H. (2018). Automatic speech recognition for second language pronunciation assessment and training. In O. Kang, R. I. Thomson, & J. M. Murphy (Eds.), The Routledge handbook of contemporary English pronunciation (pp. 556–569). New York, NY: Routledge.
Dávila, A., Bohara, A. K., & Saenz, R. (1993). Accent penalties and the earnings of Mexican Americans. Social Science Quarterly, 74, 902–916.
Derwing, T. M. (2008). Curriculum issues in teaching pronunciation to second language learners. In J. Hansen Edwards & M. Zampini (Eds.), Phonology and second language acquisition (pp. 347–369). Amsterdam, NL: John Benjamins.
Derwing, T. M. (2017). L2 fluency. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 246–259). New York, NY: Taylor & Francis.
Derwing, T. M. (2018a). The efficacy of pronunciation instruction. In O. Kang, R. I. Thomson, & J. M. Murphy (Eds.), The Routledge handbook of contemporary English pronunciation (pp. 320–334). New York, NY: Routledge.
Derwing, T. M. (2018b). The role of phonological awareness in language learning. In P. Garrett & J. M. Cots (Eds.), Routledge handbook of language awareness (pp. 339–353). New York, NY: Routledge.
Derwing, T. M., Foote, J. A., & Munro, M. J. (forthcoming). Long-term benefits of pronunciation instruction in the workplace.
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility. Studies in Second Language Acquisition, 19, 1–16.
Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2 teaching and research. Amsterdam, NL: John Benjamins.
Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48, 393–410.
Derwing, T. M., Rossiter, M. J., & Munro, M. J. (2002). Teaching native speakers to listen to foreign-accented speech. Journal of Multilingual and Multicultural Development, 23, 245–259.
Derwing, T. M., Rossiter, M. J., Munro, M. J., & Thomson, R. I. (2004). Second language fluency: Judgments on different tasks. Language Learning, 54, 655–679.
Dragojevic, M., & Giles, H. (2016). I don’t like you because you’re hard to understand: The role of processing fluency in the language attitudes process. Human Communication Research, 42, 396–420.
Eddy, F. D. (1974). Pierre Delattre, teacher of French. The French Review, 47, 513–517.
Ferrier, L. J., Reid, L. N., & Chenausky, K. (1999). Computer-assisted accent modification: A report on practice effects. Topics in Language Disorders, 19(4), 35–48.
Flege, J. E. (1984). The detection of French accent by American listeners. The Journal of the Acoustical Society of America, 76, 692–707.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Theoretical and methodological issues (pp. 233–277). Timonium, MD: York Press.
Flege, J. E., Frieda, E. M., & Nozawa, T. (1997). Amount of native-language (L1) use affects the pronunciation of an L2. Journal of Phonetics, 25, 169–186.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995). Factors affecting strength of perceived foreign accent in a second language. The Journal of the Acoustical Society of America, 97, 3125–3134.
Floccia, C., Butler, J., Goslin, J., & Ellis, L. (2009). Regional and foreign accent processing in English: Can listeners adapt? Journal of Psycholinguistic Research, 38, 379–412.
Foote, J. A., Holtby, A. K., & Derwing, T. M. (2011). Survey of the teaching of pronunciation in adult ESL programs in Canada, 2010. TESL Canada Journal, 29(1), 1–22.
Foote, J., & Smith, G. (2013, September). Is there an app for that? Paper presented at the 5th Pronunciation in Second Language Learning and Teaching Conference, Ames, Iowa.
Fraser, H. (2006). Helping teachers help students with pronunciation: A cognitive approach. Prospect, 21(1), 80–96.
Galante, A., & Thomson, R. I. (2017). The effectiveness of drama as an instructional approach for the development of second language oral fluency, comprehensibility, and accentedness. TESOL Quarterly, 51, 115–142.
Gass, S., & Varonis, E. M. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning, 34, 65–89.
Gimson, A. C. (1962). An introduction to the pronunciation of English. London: Edward Arnold.
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38(2), 201–223.
Hart, J. (1569). An orthographie. London: William Seres.
Holder, W. (1669). Elements of speech. London: T. N. for J. Martyn. Reprinted (1967), R. C. Alston (Ed.). Menston: Scolar Press.
Isaacs, T. (2014). Assessing pronunciation. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 140–155). Hoboken, NJ: Wiley-Blackwell.
Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10, 135–159.
Jenkins, J. (2002). A sociolinguistically based, empirically researched pronunciation syllabus for English as an international language. Applied Linguistics, 23(1), 83–103.
Kalin, R., & Rayko, D. S. (1978). Discrimination in evaluative judgments against foreign-accented job candidates. Psychological Reports, 43, 1203–1209.
Krashen, S. D. (1981). Second language acquisition and second language learning. Oxford: Pergamon.
Lado, R. (1957). Linguistics across cultures. Ann Arbor, MI: University of Michigan Press.
Lambacher, S., Martens, W., Kakehi, K., Marasinghe, C., & Molholt, G. (2005). The effects of identification training on the identification and production of American English vowels by native speakers of Japanese. Applied Psycholinguistics, 26, 227–247.
Lambert, W. E., Hodgson, R. C., Gardner, R. C., & Fillenbaum, S. (1960). Evaluational reactions to spoken language. Journal of Abnormal and Social Psychology, 60, 44–51.
Leather, J. (1983). Second-language pronunciation learning and teaching. Language Teaching, 16(3), 198–219.
Lee, A. H., & Lyster, R. (2016). Effects of different types of corrective feedback on receptive skills in a second language: A speech perception training study. Language Learning, 66, 809–833.
Lee, J., Jang, J., & Plonsky, L. (2015). The effectiveness of second language pronunciation instruction: A meta-analysis. Applied Linguistics, 36, 345–366.
Levis, J. M. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39, 369–377.
Lindemann, S., Campbell, M. A., Litzenberg, J., & Subtirelu, N. C. (2016). Explicit and implicit training methods for improving native English speakers’ comprehension of nonnative speech. Journal of Second Language Pronunciation, 2, 93–108.
Lippi-Green, R. (2012). English with an accent: Language, ideology, and discrimination in the United States (2nd ed.). New York, NY: Routledge.
Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y. I., & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories. The Journal of the Acoustical Society of America, 96, 2076–2087.
Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. The Journal of the Acoustical Society of America, 89, 874–886.
MacKay, I. R. A., & Flege, J. E. (2004). Effects of the age of second-language learning on the duration of first-language and second-language sentences: The role of suppression. Applied Psycholinguistics, 25, 373–396.
Major, R. C., Fitzmaurice, S. F., Bunta, F., & Balasubramanian, C. (2002). The effects of nonnative accents on listening comprehension: Implications for ESL assessment. TESOL Quarterly, 36, 173–190.
McGowan, K. B. (2015). Social expectation improves speech perception in noise. Language and Speech, 58, 502–521.
Miyawaki, K., Jenkins, J. J., Strange, W., Liberman, A. M., Verbrugge, R., & Fujimura, O. (1975). An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception & Psychophysics, 18(5), 331–340.
Moyer, A. (2009). Input as a critical means to an end: Quantity and quality of experience in L2 phonological attainment. In T. Piske & M. Young-Scholten (Eds.), Input matters in SLA (pp. 159–174). Bristol: Multilingual Matters.
Müller, N., Ball, M. J., & Guendouzi, J. (2000). Accent reduction programs: Not a role for speech-language pathologists? Advances in Speech-Language Pathology, 2, 119–129.
Munro, M. J. (2003). A primer on accent discrimination in the Canadian context. TESL Canada Journal, 20(2), 38–51.
Munro, M. J. (2018a). Dimensions of pronunciation. In O. Kang, R. I. Thomson, & J. M. Murphy (Eds.), The Routledge handbook of contemporary English pronunciation (pp. 413–431). New York, NY: Routledge.
Munro, M. J. (2018b). How well can we predict L2 learners’ pronunciation difficulties? The CATESOL Journal, 30, 267–281.
Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45, 73–97.
Munro, M. J., & Derwing, T. M. (1995b). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38, 289–306.
Munro, M. J., & Derwing, T. M. (1998). The effects of speech rate on the comprehensibility of native and foreign accented speech. Language Learning, 48, 159–182.
Munro, M. J., & Derwing, T. M. (2001). Modelling perceptions of the comprehensibility and accentedness of L2 speech: The role of speaking rate. Studies in Second Language Acquisition, 23, 451–468.
Munro, M. J., & Derwing, T. M. (2006). The functional load principle in ESL pronunciation instruction: An exploratory study. System, 34, 520–531.
Munro, M. J., & Derwing, T. M. (2008). Segmental acquisition in adult ESL learners: A longitudinal study of vowel production. Language Learning, 58, 479–502.
Munro, M. J., & Derwing, T. M. (2015). A prospectus for pronunciation research methods in the 21st century: A point of view. Journal of Second Language Pronunciation, 1, 11–42.
Munro, M. J., Derwing, T. M., & Burgess, C. S. (2010). Detection of nonnative speaker status from content-masked speech. Speech Communication, 52, 626–637.
Munro, M. J., Derwing, T. M., & Morton, S. L. (2006). The mutual intelligibility of L2 speech. Studies in Second Language Acquisition, 28, 111–131.
Munro, M. J., Derwing, T. M., & Thomson, R. I. (2015). Setting segmental priorities for English learners: Evidence from a longitudinal study. International Review of Applied Linguistics in Language Teaching, 53(1), 39–60.
Munro, M., & Mann, V. (2005). Age of immersion as a predictor of foreign accent. Applied Psycholinguistics, 26, 311–341.
O’Brien, M. G., Derwing, T. M., Cucchiarini, C., Hardison, D. M., Mixdorff, H., Thomson, R. I., Strik, H., Levis, J. M., Munro, M. J., Foote, J. A., & Levis, G. M. (2018). Directions for the future of technology in pronunciation research and teaching. Journal of Second Language Pronunciation, 4, 182–206.
Pennington, M. C., & Ellis, N. C. (2000). Cantonese speakers’ memory for English sentences with prosodic cues. The Modern Language Journal, 84, 372–389.
Perlmutter, M. (1989). Intelligibility rating of L2 speech pre- and post-intervention. Perceptual and Motor Skills, 68, 515–521.
Price, O. (1665). The vocal organ. Menston, Yorkshire: Scolar Press, A Scolar Press Facsimile.
Purcell, E. T., & Suter, R. W. (1980). Predictors of pronunciation accuracy: A re-examination. Language Learning, 30, 271–287.
Reisler, M. (1976). Always the laborer, never the citizen: Anglo perceptions of the Mexican immigrant during the 1920s. Pacific Historical Review, 45, 231–254.
Richards, J. C., & Rodgers, T. S. (2014). Approaches and methods in language teaching. Cambridge: Cambridge University Press.
Robinson, R. (1617). The art of pronunciation. London: Nicholas Oakes. Reprinted (1969). Menston: Scolar Press.
Saito, K., & Lyster, R. (2012). Effects of form-focused instruction and corrective feedback on L2 pronunciation development of /ɹ/ by Japanese learners of English. Language Learning, 62, 595–633.
Saito, K., Trofimovich, P., & Isaacs, T. (2017). Using listener judgements to investigate linguistic influences on L2 comprehensibility and accentedness: A validation and generalization study. Applied Linguistics, 38, 439–462.
Schiavetti, N. (1992). Scaling procedures for the measurement of speech intelligibility. In R. D. Kent (Ed.), Intelligibility in speech disorders (pp. 11–34). Philadelphia, PA: John Benjamins.
Schmidt, A. M., & Sullivan, S. (2003). Clinical training in foreign accent modification: A national survey. Contemporary Issues in Communication Science and Disorders, 30, 127–135.
Segalowitz, N. (2010). Cognitive bases of second language fluency. New York, NY: Routledge.
Sewell, A. (2017). Functional load revisited. Journal of Second Language Pronunciation, 3(1), 57–79.
Smith, B. L., & Hayes-Harb, R. (2011). Individual differences in the perception of final consonant voicing among native and non-native speakers of English. Journal of Phonetics, 39, 115–120.
Smith, L. E., & Nelson, C. L. (1985). International intelligibility of English: Directions and resources. World Englishes, 4, 333–342.
Stockwell, R., & Bowen, J. (1983). Sound systems in conflict: A hierarchy of difficulty. In B. J. Robinett & J. Schachter (Eds.), Second language learning: Contrastive analysis, error analysis, and related aspects (pp. 20–31). Ann Arbor, MI: University of Michigan Press.
Strain, J. E. (1963). Difficulties in measuring pronunciation improvement. Language Learning, 13(3–4), 217–224.
Subbiondo, J. L. (1978). William Holder’s ‘elements of speech (1669)’: A study of applied English phonetics and speech therapy. Lingua, 46(2–3), 169–184.
Sweet, H. (1900). The practical study of languages: A guide for teachers and learners. New York, NY: Henry Holt & Co.
Szpyra-Kozłowska, J. (2015). Pronunciation in EFL instruction: A research-based approach. Bristol: Multilingual Matters.
Tajima, K., Port, R., & Dalby, J. (1997). Effects of temporal correction on intelligibility of foreign-accented English. Journal of Phonetics, 25, 1–24.
Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. CALICO Journal, 28, 744–765.
Thomson, R. I. (2013). Accent reduction. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 8–11). Hoboken, NJ: Wiley-Blackwell.
Thomson, R. I. (2014). Myth: Accent reduction and pronunciation instruction are the same thing. In L. Grant (Ed.), Pronunciation myths: Applying second language research to classroom teaching (pp. 160–187). Ann Arbor, MI: University of Michigan Press.
Thomson, R. I. (2015). Fluency. In M. Reed & J. M. Levis (Eds.), The handbook of English pronunciation (pp. 209–226). Hoboken, NJ: Wiley.
Thomson, R. I. (2018a). English Accent Coach [Online resource]. Retrieved from www.englishaccentcoach.com
Thomson, R. I. (2018b). High variability [pronunciation] training (HVPT): A proven technique about which every language teacher and learner ought to know. Journal of Second Language Pronunciation, 4, 207–230.
Thomson, R. I., & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A narrative review. Applied Linguistics, 36, 326–344.
Thomson, R. I., & Derwing, T. M. (2016). Is phonemic training using nonsense or real words more effective? In J. Levis, H. Le, I. Lucic, E. Simpson, & S. Vo (Eds.), Proceedings of the 7th Pronunciation in Second Language Learning and Teaching Conference, October 2015 (pp. 88–97). Ames, IA: Iowa State University.
Trofimovich, P., Kennedy, S., & Foote, J. A. (2015). Variables affecting L2 pronunciation development. In M. Reed & J. M. Levis (Eds.), The handbook of English pronunciation (pp. 353–373). Chichester, West Sussex: Wiley-Blackwell.
Varonis, E. M., & Gass, S. (1982). The comprehensibility of non-native speech. Studies in Second Language Acquisition, 4, 114–136.
Walker, R. (2011). Teaching the pronunciation of English as a Lingua Franca. Oxford: Oxford University Press.
Wardhaugh, R. (1970). The contrastive analysis hypothesis. TESOL Quarterly, 4, 123–130.
Werker, J. F., & Tees, R. C. (2002). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 25(1), 121–133.
Weyant, J. M. (2007). Perspective taking as a means of reducing negative stereotyping of individuals who speak English as a second language. Journal of Applied Social Psychology, 37, 703–716.
Winters, S., & O’Brien, M. G. (2013). Perceived accentedness and intelligibility: The relative contributions of f0 and duration. Speech Communication, 55, 486–507.
Yoon, S. Y., Pierce, L., Huensch, A., Juul, E., Perkins, S., Sproat, R., & Hasegawa-Johnson, M. (2009). Construction of a rated speech corpus of L2 learners’ spontaneous speech. CALICO Journal, 26, 662–673.
Zielinski, B. W. (2008). The listener: No longer the silent partner in reduced intelligibility. System, 36, 69–84.