Robert Fuchs and Eva-Maria Wunder

9 A sonority-based account of speech rhythm in Chinese learners of English

1 Introduction

Speech rhythm appears to be a rather difficult feature to master when learning a second or foreign language (e.g. Adams 1979; Bond and Fokes 1985; Faber 1986; Wennerstrom 2001). As various studies have confirmed (e.g. Kaltenbacher 1998; Moyer 1999; Van Els and de Bot 1987; Anderson-Hsieh, Johnson, and Koehler 1992), an inappropriate rendering of the speech rhythm of a language is one of the main reasons learner speech is perceived as accented. To date, the most frequently adduced reason for rhythmic differences in L1 speakers compared to learners is influence resulting or deriving from structural differences between the L1 and the non-native target language (e.g. Adams 1979; Wenk 1985; Kaltenbacher 1998; Gut 2003a, 2003b; Lee, Guion, and Harada 2006). Further potential sources of influence on non-native speech rhythm have not been thoroughly explored yet, such as the phonological structures of a previously acquired non-native language. Another aspect that needs more attention is that second and foreign language users may differ with respect to the target norms they pursue (Gut 2007; Schneider 2003, 2007), so that differences between native and non-native speech rhythm might be due to differences in norm orientation and/or cross-linguistic influence.

Research trying to resolve these questions also needs to take into account that the definition and analysis of speech rhythm have evolved over the last decades. All languages were traditionally thought to belong to discrete rhythm classes: for example, British English (BrE) and German were thought to be stress-timed, while Spanish and French were thought to be syllable-timed. Following a more recent approach to speech rhythm, languages could also be differentiated according to phonetic aspects, i.e. length, pitch and quality, and phonological aspects, i.e. syllable structure and the function of accent (Dauer 1987). However, many researchers have come to the conclusion that speech rhythm should rather be considered as a gradable property, where degrees of stress- and syllable-timing can be distinguished.

Robert Fuchs and Eva-Maria Wunder, Westfälische Wilhelms-Universität Münster

The traditional account classifies the world’s languages as either stress-timed or syllable-timed (e.g. Pike 1945; Abercrombie 1967). This distinction of two main classes of speech rhythm was based on the assumption that in prototypically stress-timed languages, such as BrE, Arabic or German, prominent syllables seem to occur at quasi-isochronous (i.e. regular) intervals: each foot – comprising the length of a stress beat plus all subsequent unstressed syllables up to the next stress beat – is allegedly always of the same length. This means that a varying number of syllables can be contained in one such isochronous foot. Consequently, stress-timed languages were thought to show great durational variability of syllables.

In prototypical syllable-timed languages, such as Spanish, Romanian or Mandarin Chinese, on the other hand, it is the duration of the syllable, not of the foot, that is supposedly equal; this means that the syllables are isochronous and stress beats occur only irregularly (e.g. Auer 2001; Rossi 1998). Thus, syllable-timed languages exhibit rather little or no variability of syllable durations.

To account for the differences believed to exist between stress- and syllable-timed languages, several rhythm metrics have been proposed, all of which aim to capture acoustic correlates of speech rhythm. Many of these metrics rely on syllabic, consonantal or vocalic measures: Ramus, Nespor, and Mehler’s (1999) %V, ΔV and ΔC, based on vocalic and consonantal intervals; Dellwo’s (2006) modification of these, VarcoC and VarcoV; Low and Grabe’s (1995) PVI (or raw Pairwise Variability Index, rPVI); Gibbon and Gut’s (2001) Rhythm Ratio; or Deterding’s (2001) syllable-based Variability Index. Many of these rhythm metrics have been used to account for the rhythm of postcolonial varieties of English and learner varieties. However, most of them have attracted criticism, for example because they rely on time-consuming and error-prone manual annotation of syllables and vocalic or consonantal intervals. This makes it hard to compare results across studies.
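The duration-based metrics named above can be illustrated with a minimal sketch. It assumes that the vocalic and consonantal interval durations have already been annotated and are available as plain lists of numbers. The function names are hypothetical, and the use of the population standard deviation is an assumption made for illustration, not a detail taken from the cited papers.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of the total utterance duration that is vocalic, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta(intervals):
    """Delta-V or Delta-C: standard deviation of the interval durations.

    Assumption: population standard deviation (pstdev).
    """
    return pstdev(intervals)


def varco(intervals):
    """VarcoV or VarcoC: standard deviation divided by the mean, times 100."""
    return 100.0 * pstdev(intervals) / mean(intervals)


def raw_pvi(intervals):
    """rPVI: mean absolute difference between successive interval durations."""
    return mean(abs(a - b) for a, b in zip(intervals, intervals[1:]))


# Hypothetical durations in seconds, for illustration only.
vocalic = [0.10, 0.20, 0.10, 0.20]
consonantal = [0.10, 0.10, 0.10, 0.10]
print(percent_v(vocalic, consonantal), delta(vocalic), varco(vocalic), raw_pvi(vocalic))
```

In this toy input the vocalic intervals alternate between short and long, so all three variability measures are above zero, whereas the constant consonantal intervals would yield zero variability.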

For the present paper, a different method, based on sonority measurements, will be used to determine rhythmic properties (see section 3 below for details). Rather than drawing on the above-mentioned differences in durational variability of syllables, this method (Galves et al. 2002) is based on acoustic measures of sonority and its variability. It can be calculated in an almost entirely automatic process, thus avoiding both the time-consuming manual annotation that many widely used rhythm metrics rely on and the comparability issues that result from it.

The present paper is the first to apply Galves et al.’s (2002) method of automated sonority measurements – and consequently the determination of speech rhythm – to learner language. One of the most interesting aspects in connection with speech rhythm is that it has been found to be transferable in language learning (e.g. Adams 1979; Flege and Bohn 1989; Gut 2009; Lee, Guion, and Harada 2006). Thus, native speakers of a syllable-timed language learning a stress-timed language, for instance, could exhibit L1 influence on their non-native productions, so that their L2 rhythm leans towards syllable-timing. The speech rhythm of learner varieties often becomes a mixture of the L1 and the target language, with durational variability somewhere in between the two extremes, especially when the two differ to a greater degree with regard to rhythm (cf. White and Mattys 2007). In the present investigation, we want to find further evidence for these intermediate values for learner speech by testing native speakers of syllable-timed first language (L1) Mandarin Chinese learning English as a foreign language (with a stress-timed target rhythm). Furthermore, Galves et al.’s (2002) method could also be useful regarding the ongoing discussion about a phonological basis for a distinction between learner English, i.e. English as a foreign language (EFL), and English as a second language (ESL) as it is spoken in postcolonial contexts. We would like to explore whether differences in norm orientation between these groups might influence the rhythm of their speech. While EFL speakers strive to emulate the norms of native and established varieties of English (usually stress-timed British or American English), ESL speakers in postcolonial contexts usually adhere to local, emerging norms (Schneider 2007).

We expect that Mandarin-accented EFL has a speech rhythm between the stress-timed rhythm of English as a native language (ENL) and L1 Mandarin syllable-timed rhythm. This is in contrast to Mandarin-based postcolonial varieties, which have been shown to have a more syllable-timed rhythm (Deterding 1994, 2001; Low, Grabe, and Nolan 2000). We will therefore apply Galves et al.’s (2002) sonority-based rhythm metrics to data from L1 Mandarin, ENL and Mandarin EFL advanced learners’ speech. If advanced Mandarin-accented EFL is relatively similar to ENL in rhythm as measured by sonority-based metrics, then this would support the hypothesis that the learners have successfully acquired the stress-timed ENL rhythm. If, on the other hand, there are substantial differences between these groups, this would suggest that some acoustic correlates of rhythm might be easier to acquire for some learners than others.

The remainder of this paper is structured as follows: Section 2 discusses previous results on speech rhythm in learner language. Section 3 presents the sonority-based rhythm metrics used in this study, and section 4 explains how we applied them to our data. Section 5 presents the results, and section 6 discusses the implications of our findings.

2 Speech rhythm in learner language

A number of studies have previously explored non-native speech rhythm, mostly with a focus on English as the target language (e.g. Adams 1979; Flege and Bohn 1989; Lee, Guion, and Harada 2006). Adams (1979), for instance, identified variables that contribute to a non-native-sounding rhythm in the target-language English of learners from various L1 backgrounds, such as a lack of durational differences between stressed and unstressed syllables (cf. Gut 2009: 172; Kaltenbacher 1998: 28). Most studies investigating non-native speech rhythm in English relied on vowel reduction as an acoustic correlate of speech rhythm (e.g. Bond and Fokes 1985; Mairs 1989; Wenk 1985; Zborowska 2000); the common result is that learners of English do not produce enough vowel reduction compared to native speakers. Other studies used the vocalic rhythm metrics VarcoV and nPVI-V to investigate the variability of vocalic durations. In cases where the L1 and target language differed substantially in rhythm, these studies often found a mixed rhythmic pattern in the learner variety (White and Mattys 2007; Grenon and White 2008; Jang 2008; Dellwo, Gutierrez Diez, and Gavalda 2009; Sarmah, Gogoi, and Wiltshire 2009; Ordin, Polyanskaya, and Ulbrich 2011; Tsiartsioni 2011).

The speech rhythm of the two languages investigated in the present paper, Mandarin Chinese and English, is sufficiently different to allow us to pin down any potential L1 influence on the L2. In Mandarin Chinese, a language commonly classified as syllable-timed, most syllables have roughly similar durations, compared to stress-timed languages (e.g. Rossi 1998; Chiao and Kelz 1985: 30; Hunold 2009: 85; Mok 2009). BrE, by comparison, has a tendency for roughly equally long intervals between two stressed beats, regardless of how many unstressed syllables intervene (e.g. Gut 2003b: 140). Thus, sometimes more, sometimes fewer syllables are produced during one stress interval, obviously resulting in durational variability of these syllables.

Studies on the rhythm of Mandarin-accented EFL speech have so far failed to provide unequivocal evidence of such a mixed rhythmic pattern. Lin and Wang (2008) found Mandarin-accented EFL speakers to use as much vocalic variability (nPVI-V) as ENL speakers, which was shown to be higher than in L1 Mandarin. If the Mandarin-accented EFL speakers had used a truly mixed rhythm, their vocalic variability would have been halfway between that of L1 Mandarin and ENL. Regarding the proportion of vocalic durations over the whole utterance duration (%V), there was more evidence of such a mixed pattern. %V was found to be highest in L1 Mandarin, followed by Mandarin-accented EFL, and finally ENL with the lowest value. This is evidence for the L1 Mandarin speakers using a more syllable-timed rhythm, the ENL speakers a more stress-timed rhythm, and the Mandarin-accented EFL speakers using a mixed rhythm, but tests of the statistical significance of the differences were not reported.

A second study by He (2010) also found L1 Mandarin speakers to exhibit lower variability of vocalic durations (as measured by VarcoV and nPVI-V) than ENL speakers. Advanced Mandarin-accented EFL learners in this study spoke with intermediate variability of vocalic durations. This is commensurate with a description of ENL as stress-timed, L1 Mandarin as syllable-timed, and Mandarin-accented EFL as having a mixed rhythm. But the Mandarin-accented EFL speakers were in fact relatively close to the ENL speakers, so that the difference between ENL and EFL was not (VarcoV) or only marginally significant (nPVI-V). The same constellation was found for the proportion of vocalic durations over the whole utterance duration: %V was significantly higher for L1 Mandarin than for Mandarin-accented EFL, and ENL had lower %V than Mandarin-accented EFL, although this difference was not significant.

Studies of Mandarin-accented EFL speech have thus failed to provide unequivocal evidence for its alleged mixed rhythm. By contrast, studies of an ESL variety with Mandarin substrate, Singapore English (SinE), have provided such evidence. Low, Grabe and Nolan (2000) used measurements of the variability of vocalic durations (nPVI-V) to show that SinE speakers (significantly less variability) use a more syllable-timed rhythm than BrE speakers (significantly more variability). Similar conclusions were reached by Deterding (1994, 2001), who used a measure of the variability of syllable durations, the Variability Index. The SinE speakers in these studies varied the durations of syllables significantly less than the BrE speakers, which Deterding interpreted as pointing towards a more syllable-timed rhythm in SinE.

The results of these studies suggest that Mandarin-accented EFL, a learner variety, might be relatively close in rhythm to ENL, which is perhaps testament to the fact that relatively advanced learner speech was examined. The learners seemed to be quite successful in imitating or acquiring the rhythm of the target language. SinE (an ESL variety), by contrast, was found to differ in rhythm from ENL. The speakers in the SinE studies also had Mandarin as their L1, but, unlike the Mandarin-accented EFL speakers, they probably did not strive to imitate ENL rhythm. Schneider (2003, 2007) argued that SinE has entered an “endonormative” phase in its development, where Singaporeans have stopped looking to other countries, such as Great Britain, to provide language standards. Instead, they rely on their own norms, and a more syllable-timed rhythm might be a part of these norms. It appears then that one of the differences between EFL and ESL is that proficient EFL learners of English with a syllable-timed L1 strive to acquire a native-like stress-timed rhythm, and are often successful at that. Proficient speakers of ESL varieties with a syllable-timed L1, by contrast, do not aim for a stress-timed rhythm, and a more syllable-timed rhythm may be part of the emerging standard of many of these postcolonial varieties. The fact that English speakers in a postcolonial setting tend to orient themselves towards the stress characteristics of their L1 rather than towards those of one of the Inner Circle varieties reflects what Gut (2007) claimed for the phonologies of postcolonial varieties of English in her Norm Orientation Hypothesis.

The studies referred to above all accounted for rhythm by measuring vocalic durations. While this is one of the most commonly used acoustic correlates of rhythm, recently a more holistic view of rhythm has been advocated, taking into account factors other than duration, such as intensity, loudness and pitch (Cumming 2010, 2011; Fuchs 2013, 2014a, 2014b; He 2012; Low 1998; Stojanovic 2009). One of these factors, Galves et al. (2002) argued, should be sonority.

3 Sonority-based rhythm metrics

Contrary to the most commonly used rhythm metrics, such as nPVI-V and %V, which necessarily rely on extensive manual annotation, the sonority-based method suggested by Galves et al. (2002) calculates a measure of sonority based on the rate of change in the spectrum of the speech signal almost fully automatically. Sonority was defined by the authors as relative change in the acoustic signal. Each 2 ms stretch of a recording was mapped onto a scale ranging from 0 to 1, with no change in the acoustic signal corresponding to 1, or high sonority, and rapid change in the acoustic signal corresponding to 0, or little sonority. Following this definition, vowels, with relatively stable periodic patterns, are very sonorous. Obstruents, by contrast, are characterised by aperiodic noise and rapid change in the acoustic signal, corresponding to regions of low sonority. On the basis of this scale, two metrics were defined: S is a measure of mean sonority in an utterance and higher for syllable-timed languages, and δS is a measure of mean change in sonority and higher for stress-timed languages.
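The two metrics can be sketched as follows, assuming that a sonority track (one value between 0 and 1 per analysis frame) has already been derived from the spectrum. How Galves et al. (2002) compute that track is not reproduced here. Reading "mean change in sonority" as the mean absolute difference between successive frames is an assumption made for this illustration, and the function names and example values are hypothetical.

```python
from statistics import mean


def mean_sonority(track):
    """S: mean of the per-frame sonority values (each between 0 and 1)."""
    return mean(track)


def sonority_variation(track):
    """Delta-S, read here as the mean absolute change between successive frames.

    Assumption: this reading follows the verbal description in the text,
    not the exact formula of the original method.
    """
    return mean(abs(b - a) for a, b in zip(track, track[1:]))


# Hypothetical tracks: one dominated by stable, vowel-like frames,
# one alternating between high- and low-sonority frames.
steady = [0.9, 0.9, 0.8, 0.9]
alternating = [0.9, 0.1, 0.9, 0.1]

print(mean_sonority(steady), sonority_variation(steady))
print(mean_sonority(alternating), sonority_variation(alternating))
```

Under these assumptions the steady track has the higher S and the lower δS, which is the pattern the method associates with syllable-timing.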

Using Ramus, Nespor and Mehler’s (1999) original data in order to replicate and triangulate their results, Galves et al. (2002) were able to discriminate between languages in terms of their rhythmic class, as Ramus, Nespor and Mehler did with their ΔV, ΔC and %V calculations. That is, although the two approaches apply different metrics, they place the languages on a similar rhythm scale. Galves et al.’s methods were also used by Fuchs (2013), who showed that Educated Indian English has a more syllable-timed rhythm than BrE in terms of sonority. Educated Indian English has higher mean sonority (in read and spontaneous speech) and less variation in sonority (in read speech) than BrE.

4 Method

4.1 Data

Recordings were made of 10 L1 Mandarin Chinese advanced learners of English, namely five female and five male speakers aged between 21 and 28 at the time of recording. They started learning English, as their first non-native language, in China via formal instruction when they were between 10 and 14 years old. Classroom instruction was oriented towards the norms of BrE. The learners were first recorded performing a read-out-loud text task in their L1 (syllable-timed) Mandarin Chinese, namely the Mandarin version of Aesop’s fable The North Wind and the Sun plus two phonetically rich sentences (157 words, see Appendix 1). The latter two sentences were selected based on the number of differing phonemes occurring in them in order to aim for a maximally broad sample of Mandarin speech. Secondly, the learners were recorded reading out a short text taken from the online edition of National Geographic in their foreign language English (282 words, see Appendix 2). All recordings were conducted in a quiet room. In order to avoid any bias of results, the participants were simply asked to read out loud first the English and then the Mandarin texts, and were only told after the recordings which aspects of their productions would be investigated.

To provide a point of comparison in the form of ENL, four native speakers of English (two Southern BrE, one American English, one Scottish English, all female, aged 22 to 48, recorded in Münster, Germany) were recorded reading the same text as the Mandarin L2 English speakers. For the recordings, a handheld Edirol R-09 wav/mp3 recording device with an inbuilt stereo condenser microphone was used with a sampling rate of 44 kHz and a bit depth of 16 bit. For the ENL recordings, a high-quality head-mounted microphone was used with the Edirol recording device.

4.2 Data analysis

The uncompressed stereo wav-files were transformed into mono-channel files with the open-source audio editor Audacity. In order to prepare the application of Galves et al.’s (2002) automated sonority measurements, the speech signal had to be segmented into breath units. These were defined as continuous speech of at least five syllables. A pause was defined as a period of silence of at least 150 ms. Utterances of less than five continuous syllables were excluded, as were hesitations, stuttering and non-speech noises, etc. In order to avoid any bias of syllable-final lengthening, the last syllable of each breath unit was also excluded from calculations. After annotating breath units and pauses in Praat TextGrid files and scaling all recordings to the same average intensity level, all breath units of at least five syllables minus the final syllable were extracted automatically with a Praat script.
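The segmentation criteria described above can be sketched as follows. The study itself used Praat TextGrid annotations and a Praat script; this Python version is only an illustration under stated assumptions. It assumes syllable boundaries are available as (start, end) pairs in milliseconds, that any gap between two syllables counts as silence, and that a unit must contain at least five syllables before its final syllable is dropped. All names are hypothetical.

```python
from typing import List, Tuple

Syllable = Tuple[int, int]  # (start, end) in milliseconds

MIN_PAUSE_MS = 150   # silence of at least 150 ms counts as a pause
MIN_SYLLABLES = 5    # shorter stretches of speech are excluded


def breath_units(syllables: List[Syllable]) -> List[List[Syllable]]:
    """Split a syllable sequence at pauses and keep only usable breath units."""
    units: List[List[Syllable]] = []
    current: List[Syllable] = []
    for syllable in syllables:
        # A gap of at least MIN_PAUSE_MS since the previous syllable ends the unit.
        if current and syllable[0] - current[-1][1] >= MIN_PAUSE_MS:
            units.append(current)
            current = []
        current.append(syllable)
    if current:
        units.append(current)
    # Keep units with at least five syllables, then drop the final syllable
    # of each unit to limit the effect of final lengthening.
    return [unit[:-1] for unit in units if len(unit) >= MIN_SYLLABLES]


# Hypothetical input: six syllables, a 300 ms pause, then three syllables.
example = [(0, 200), (200, 400), (400, 600), (600, 800), (800, 1000),
           (1000, 1200), (1500, 1700), (1700, 1900), (1900, 2100)]
print(breath_units(example))
```

Here the first stretch is kept without its last syllable, and the second stretch is discarded because it has fewer than five syllables.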

The breath units were then analysed with the Perl script provided by Galves et al. (2002), which uses a console version of Praat (Praatcon) for all acoustic measurements. After calculating mean sonority and variation in sonority for each of the breath units, the data was imported into the statistical package R. Finally, median values of mean sonority and variation in sonority were calculated for each speaker individually. These speaker median values were then used to calculate mean values for each of the three languages/varieties. The significance of the differences between them was established with t-tests, which were applied separately for mean sonority and variation in sonority.
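The aggregation steps can be sketched in Python, although the study itself used R. The text does not say which variant of the t-test was applied. The sketch assumes Welch's unequal-variance test, which is the default of R's `t.test` and is consistent with the fractional degrees of freedom reported in the results. Function names and data are hypothetical, and only the t statistic and the degrees of freedom are computed, not the p-value.

```python
from collections import defaultdict
from statistics import mean, median, variance


def speaker_medians(rows):
    """Median of the breath-unit values for each speaker.

    rows: iterable of (speaker, value) pairs, one pair per breath unit.
    """
    by_speaker = defaultdict(list)
    for speaker, value in rows:
        by_speaker[speaker].append(value)
    return {speaker: median(values) for speaker, values in by_speaker.items()}


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: an unequal-variance test; the text only says "t-tests".
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical breath-unit values for two speakers.
rows = [("s1", 0.4), ("s1", 0.5), ("s1", 0.6), ("s2", 0.3), ("s2", 0.7)]
medians = speaker_medians(rows)
group_mean = mean(medians.values())
print(medians, group_mean)
print(welch_t([1, 2, 3, 4], [2, 4, 6, 8]))
```

Taking speaker medians before averaging keeps a speaker with many breath units from dominating the group value.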

It is expected that the L1 Mandarin Chinese passage turns out to be most syllable-timed and the ENL passage to be most stress-timed, whereas the Mandarin-accented EFL recordings will be positioned in between due to influence from the L1. Consequently, mean sonority (S) is expected to be highest for L1 Mandarin and lowest for ENL; variation in sonority (δS), on the other hand, is hypothesised to be highest for ENL and lowest for L1 Mandarin. For Mandarin-accented EFL, S and δS are expected to show hybrid values: When influenced more by the learners’ L1, they should be situated more towards the syllable-timed end of the speech rhythm continuum, and when influenced by the target language English they should be situated more towards the stress-timed extreme. Whether these expectations indeed hold true will be discussed in the subsequent results section.

5 Results

Figure 1 below shows average mean sonority and variation in sonority for the three groups (data for individual speakers is shown in Appendix 3). Variation in sonority was lowest for Mandarin-accented EFL (EFL, 0.142), followed by L1 Mandarin (Man, 0.146) and ENL (0.156). ENL differed significantly from Mandarin-accented EFL (p < 0.0001, t = 6.0, df = 70.5) and L1 Mandarin (p < 0.01, t = 3.1, df = 43.9), but L1 Mandarin and Mandarin-accented EFL were not significantly different in variation in sonority (p = 0.17, t = –1.4, df = 35.1).

Variation in sonority was hypothesised to be higher for ENL than for both L1 Mandarin and Mandarin-accented EFL, which was supported by the data. However, Mandarin-accented EFL was expected to have an intermediate value between L1 Mandarin and ENL, but in fact did not differ significantly from L1 Mandarin.


Figure 1: Mean and standard deviation for variation in sonority (δS) and mean sonority (S) in L1 Mandarin (Man), ENL and Mandarin-accented EFL (EFL).

Mean sonority was lowest for Mandarin-accented EFL (0.441), followed by L1 Mandarin (0.445), and ENL (0.506). ENL differed significantly from Mandarin-accented EFL (p < 0.05, t = 2.5, df = 11.6) and L1 Mandarin (p < 0.05, t = 2.2, df = 11.8), but L1 Mandarin and Mandarin-accented EFL did not differ significantly (p = 0.9, t = –0.13, df = 17.9).

Mean sonority was hypothesised to be higher for L1 Mandarin than for ENL, which the data did not support. In fact, ENL had a higher mean sonority than both L1 Mandarin and Mandarin-accented EFL.

6 Discussion

This study was driven by two pairs of hypotheses: The first two hypotheses stated that ENL speakers have (a) higher variation in sonority and (b) lower mean sonority than L1 Mandarin speakers. Recordings of four ENL speakers and ten L1 Mandarin speakers provided support for part (a), i.e. higher variation in sonority in ENL than in L1 Mandarin. However, part (b), lower mean sonority in ENL than in L1 Mandarin, was not confirmed. This unexpected result might conceivably be due to the text that was chosen as the basis for the present measurements, and the fact that only four ENL speakers were recorded.

The second pair of hypotheses concerned a possible difference between Mandarin-accented EFL and ESL varieties used by L1 Mandarin speakers (such as SinE). Previous research, based on measurements of speech rhythm as variability of vocalic durations, suggested that advanced EFL learners approximate ENL rhythm fairly closely, while speakers of ESL varieties might maintain their more syllable-timed rhythm and continue to adhere to the local emerging standard.

It was thus hypothesised that Mandarin-accented EFL would have intermediate values between L1 Mandarin and ENL, or values closer to ENL, in (a) mean sonority and (b) variation in sonority. The results did not support this. Mandarin-accented EFL had in fact values of mean sonority and variation in sonority that did not differ significantly from L1 Mandarin. Although the Mandarin-accented EFL speakers were advanced learners, they did not approximate ENL stress-timed rhythm as measured by the sonority-based metrics.

It thus appears that duration-based metrics suggest a difference between postcolonial varieties of English with syllable-timed substrate languages (more syllable-timed than ENL), and EFL learners with syllable-timed L1 (close approximation to ENL stress-timed rhythm). Sonority-based metrics, by contrast, suggest that even advanced EFL learners with a syllable-timed L1 maintain this syllable-timed rhythm. These results are seemingly contradictory as long as speech rhythm is regarded as a unitary phenomenon with different acoustic correlates, such as variability in vocalic durations, variation in sonority and mean sonority.

However, speech rhythm might be better considered as a multidimensional phenomenon (Fuchs 2013, 2014a, b; Gut, Trouvain, and Barry 2007; Stojanovic 2009; Loukina et al. 2011; Nolan and Asu 2009). A language might tend towards syllable-timing as measured by one acoustic correlate, and towards stress-timing as measured by another acoustic correlate. Taking this perspective, it appears that EFL speakers with a syllable-timed L1 can come close to ENL stress-timed rhythm as measured by one acoustic correlate (variability in vocalic durations), and maintain a syllable-timed rhythm as measured by other correlates (variability in sonority and mean sonority).

Other acoustic correlates of rhythm that might shed light on this are variation in fundamental frequency, intensity and loudness. In fact, He (2012) studied variability in intensity in Mandarin L2 English, L1 English and L1 Mandarin, and found that Mandarin L2 English remains relatively close to L1 Mandarin syllable-timed rhythm, and does not approximate L1 English stress-timed rhythm. Like mean sonority and variation in sonority, variability in intensity thus appears to be another correlate of rhythm that is more strongly influenced by the L1, even in advanced learners.

Table 1 summarises these results, and attempts a classification of different dimensions of speech rhythm, their learnability, and the consequences for EFL and ESL varieties with a syllable-timed substrate language. Crucially, EFL speakers tend to aim for ENL stress-timed rhythm, while speakers of ESL varieties strive to maintain endonormative standards, which might include a more syllable-timed rhythm. The variability of vocalic durations appears to be a rhythm dimension that is comparatively easy to acquire by learners, and advanced EFL speakers with L1 Mandarin Chinese have been shown to approximate ENL stress-timed rhythm fairly closely on this dimension (Lin and Wang 2008; He 2010). Speakers of ESL varieties with a Mandarin Chinese substrate, by contrast, have been found to maintain a more syllable-timed rhythm, driven by their endonormative standards (Low, Grabe, and Nolan 2000). The conclusion that variability of vocalic durations is a correlate of speech rhythm that is comparatively easy to acquire (at least for advanced learners) is also supported by Gu and Hirose (2014), who investigated the speech rhythm of advanced L1 English learners of Mandarin.

Variation in sonority and mean sonority might be dimensions of speech rhythm that are comparatively hard to change in second language learning. This is what the results of the present study suggest, where speakers of Mandarin-accented EFL were relatively close in rhythm to L1 Mandarin speakers. Speakers of ESL varieties with a syllable-timed substrate language also tend to maintain a more syllable-timed rhythm as measured by this dimension. This is because they desire to maintain local standards with a more syllable-timed rhythm, and, even if they wanted to acquire a more stress-timed rhythm, would face difficulties in doing so because this rhythm dimension might be harder to acquire. Evidence for this rhythm dimension in SinE is lacking, but, as mentioned above, Fuchs (2013) showed that speakers of Educated Indian English have a more syllable-timed rhythm than BrE speakers as far as sonority-based measurements are concerned.

Finally, variability in intensity is another dimension of speech rhythm. It is also comparatively difficult to acquire, and He (2012) showed that even advanced learners of English with L1 Mandarin fail to approximate a stress-timed rhythm on this dimension. Speakers of ESL varieties with a syllable-timed L1 are also likely to maintain a more syllable-timed rhythm because (1) endonormative standards possibly mandate a more syllable-timed rhythm, and (2) variability in intensity is a rhythm dimension that is more difficult to acquire. This is commensurate with Gut’s (2007) Norm Orientation Hypothesis, and also supported by Low (1998), who determined that SinE has less variability in intensity than BrE.

While a range of studies provide support for the generalisations summarised in Table 1, future research might offer more evidence on these questions, with a wider range of L1s. In addition, other dimensions of speech rhythm, such as variability in loudness and fundamental frequency, should also be considered. Finally, the question of the learnability of different dimensions of speech rhythm also needs to be looked at more closely. Although previous research supports the view that variability in vocalic durations is a feature that is easier to acquire than other dimensions of rhythm, it remains unclear whether it is inherently less difficult, or whether a focus on this rhythm dimension in English language instruction is responsible. If there is evidence for the latter, then a focus on other dimensions of speech rhythm might also help EFL learners in acquiring a stress-timed rhythm on these dimensions.

7 References

Abercrombie, David. 1967. Elements of general phonetics. Edinburgh: Edinburgh University Press.

Adams, Corinne. 1979. English speech rhythm and the foreign learner. The Hague: Mouton de Gruyter.

Anderson-Hsieh, Janet, Ruth Johnson and Kenneth Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42 (4). 529–555.

Auer, Peter. 2001. Silben- und akzentzählende Sprachen. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher and Wolfgang Raible (eds.), Language typology and language universals: An international handbook, 1391–1399. Berlin: Mouton de Gruyter.

Bond, Zinny S. and Joann Fokes. 1985. Non-native patterns of English syllable timing. Journal of Phonetics 13. 407–420.

Chen, Lei and Klaus Zechner. 2011. Applying rhythm features to automatically assess non-native speech. Proceedings of Interspeech 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27–31.

Chiao, Wei J. and Heinrich P. Kelz. 1985. Chinesische Aussprache, 2nd edn. Bonn: Dümmler.

Cumming, Ruth E. 2010. Speech rhythm: The language-specific integration of pitch and duration. Cambridge: University of Cambridge unpublished PhD dissertation.

Cumming, Ruth E. 2011. Perceptually informed quantification of speech rhythm in Pairwise Variability Indices. Phonetica 68 (4). 256–277.

Dauer, Rebecca M. 1987. Phonetic and phonological components of language rhythm. In Tamaz V. Gamkrelidze (ed.), Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn, Estonia, August 1–7, 447–450. Tallinn: Academy of Sciences of the Estonian SSR.

Dellwo, Volker. 2006. Rhythm and speech rate: A variation coefficient for delta C. In Pawel Karnowski and Imre Szigeti (eds.), Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba, 2003, 231–241. Frankfurt: Peter Lang.

Dellwo, Volker, Francisco Gutierrez Diez, and Nuria Gavalda. 2009. The development of measurable speech rhythm in Spanish speakers of English. Actas de XI Simposio Internacional de Comunicacion Social, Santiago de Cuba, 594–597.

Deterding, David. 1994. The rhythm of Singapore English. In Roberto Togneri (ed.), Proceedings of the Fifth Australian International Conference on Speech Science and Technology, Perth, Australia, 316–321. Canberra: Australian Speech Science and Technology Association.

Deterding, David. 2001. The measurement of rhythm: A comparison of Singapore and British English. Journal of Phonetics 29 (2). 217–230.

Faber, David. 1986. Teaching the rhythms of English: A new theoretical base. International Review of Applied Linguistics in Language Teaching 24. 205–216.

Flege, James Emil and Ocke-Schwen Bohn. 1989. An instrumental study of vowel reduction and stress placement in Spanish-accented English. Studies in Second Language Acquisition 11. 35–62.

Fuchs, Robert. 2013. Speech rhythm in educated Indian English and British English. Münster: University of Münster unpublished PhD dissertation.

Fuchs, Robert. 2014a. Integrating variability in loudness and duration in a multidimensional model of speech rhythm: Evidence from Indian English and British English. In Nick Campbell, Dafydd Gibbon and Daniel Hirst (eds.), Social and Linguistic Speech Prosody: Proceedings of the 7th International Conference on Speech Prosody 2014, Dublin, Ireland, 290–294.

Fuchs, Robert. 2014b. Towards a perceptual model of speech rhythm: Integrating the influence of f0 on perceived duration. In Haizhou Li, Helen Meng, Bin Ma, Eng Siong Cheng and Lei Xie (eds.), Proceedings of Interspeech 2014, Singapore, 1949–1953.

Galves, Antonio, Jesus Garcia, Denise Duarte and Charlotte Galves. 2002. Sonority as a basis for rhythmic class discrimination. Paper presented at Speech Prosody 2002, Aix-en-Provence, France, 11–13 April.

Gibbon, Dafydd and Ulrike Gut. 2001. Measuring speech rhythm. In Proceedings of Eurospeech 2001, Aalborg, Denmark, 91–94.

Grenon, Isabelle and Laurence White. 2008. Acquiring rhythm: A comparison of L1 and L2 speakers of Canadian English and Japanese. In Harvey Chan, Heather Jacob and Enkeleida Kapia (eds.), Proceedings of the 32nd Annual Boston University Conference on Language Development, 155–166. Somerville: Cascadilla.

Gu, Wentao and Keikichi Hirose. 2014. Rhythmic patterns in native and nonnative Mandarin speech. In Nick Campbell, Dafydd Gibbon and Daniel Hirst (eds.), Social and Linguistic Speech Prosody: Proceedings of the 7th International Conference on Speech Prosody 2014, Dublin, Ireland, 592–596.

Gut, Ulrike. 2003a. Non-native speech rhythm in German. In Maria-Josep Solé, Daniel Recasens and Joan Romero (eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, 3–9 August, 2437–2440. Barcelona: Universitat Autònoma de Barcelona.

Gut, Ulrike. 2003b. Prosody in second language speech production: The role of the native language. Fremdsprachen Lehren und Lernen 32. 133–152.

Gut, Ulrike. 2007. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26 (3). 346–359.

Gut, Ulrike. 2009. Non-native speech: A corpus-based analysis of phonological and phonetic properties of L2 English and German. Frankfurt: Peter Lang.

Gut, Ulrike, Jürgen Trouvain and William J. Barry. 2007. Bridging research on phonetic descriptions with knowledge from teaching practice: The case of prosody in non-native speech. In Jürgen Trouvain and Ulrike Gut (eds.), Non-native prosody: Phonetic description and teaching practice, 3–21. Berlin and New York: Mouton de Gruyter.

He, Lei. 2010. Interlanguage rhythm. Edinburgh: University of Edinburgh MA thesis. http://hdl.handle.net/1842/6011.

He, Lei. 2012. Syllabic intensity variations as quantification of speech rhythm: Evidence from both L1 and L2. In Qiuwu Ma, Hongwei Ding and Daniel Hirst (eds.), Proceedings of the 6th International Conference on Speech Prosody, Shanghai, May 22–26, 2012. Shanghai: Tongji University Press.

Hunold, Cordula. 2009. Untersuchungen zu segmentalen und suprasegmentalen Ausspracheabweichungen chinesischer Deutschlernender. Frankfurt: Peter Lang.

Jang, Tae-Yeoub. 2008. Speech rhythm metrics for automatic scoring of English speech by Korean EFL learners. Malsori Speech Sounds 66. 41–59.

Kaltenbacher, Erika. 1998. Zum Sprachrhythmus des Deutschen und seinem Erwerb. In Heide Wegener (ed.), Eine zweite Sprache lernen, 21–38. Tübingen: Narr.

Lee, Borim, Susan G. Guion and Tetsuo Harada. 2006. Acoustic analysis of the production of unstressed English vowels by early and late Korean and Japanese bilinguals. Studies in Second Language Acquisition 28 (3). 487–513.

Lin, Hua and Qian Wang. 2008. Interlanguage rhythm in the English production of Mandarin speakers. In Proceedings of the 8th Phonetic Conference of China and the International Symposium on Phonetic Frontiers, Beijing, April 18–20.

Loukina, Anastassia, Greg Kochanski, Burton Rosner, Elinor Keane and Chilin Shih. 2011. Rhythm measures and dimensions of durational variation in speech. Journal of the Acoustical Society of America 129. 3258–3270.

Low, Ee Ling. 1998. Prosodic prominence in Singapore English. Cambridge: University of Cambridge PhD dissertation.

Low, Ee Ling and Esther Grabe. 1995. Prosodic patterns in Singapore English. In Kjell Elenius and Peter Branderud (eds.), Proceedings of the XIIIth International Congress of Phonetic Sciences, Stockholm, Sweden, 636–639. Stockholm: Kungliga Tekniska Högskolan (Royal Institute of Technology).

Low, Ee Ling, Esther Grabe and Francis Nolan. 2000. Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43. 377–401.

Mairs, Jane Lowenstein. 1989. Stress assignment in interlanguage phonology: An analysis of the stress system of Spanish speakers learning English. In Susan M. Gass and Jacquelyn Schachter (eds.), Linguistic perspectives on second language acquisition, 260–283. Cambridge: Cambridge University Press.

Mok, Peggy Pik Ki. 2009. On the syllable-timing of Cantonese and Beijing Mandarin. Chinese Journal of Phonetics 2. 148–154.

Moyer, Alene. 1999. Ultimate attainment in L2 phonology. Studies in Second Language Acquisition 21 (1). 81–108.

Nolan, Francis and Eva Liina Asu. 2009. The Pairwise Variability Index and coexisting rhythms in language. Phonetica 66 (1–2). 64–77.

Nolan, Francis, Kirsty McDougall, Gea de Jong and Toby Hudson. 2006. A forensic phonetic study of ‘dynamic’ sources of variability in speech: The DyViS project. In Paul Warren and Catherine I. Watson (eds.), Proceedings of the 11th Australian International Conference on Speech Science and Technology, University of Auckland, New Zealand, December 6–8, 13–18. Canberra: Australian Speech Science and Technology Association.

Ordin, Mikhail, Leona Polyanskaya and Christiane Ulbrich. 2011. Acquisition of timing patterns in second language. In Piero Cosi, Renato de Mori, Giuseppe di Fabbrizio and Roberto Pieraccini (eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association: Interspeech 2011, Florence, Italy, August 27–31. http://www.isca-speech.org/archive/interspeech_2011/i11_1129.html (accessed 14 November 2013).

Pike, Kenneth L. 1945. The intonation of American English. Ann Arbor, MI: University of Michigan Press.

Ramus, Franck, Marina Nespor and Jacques Mehler. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73 (3). 265–292.

Rossi, Mario. 1998. Intonation in Italian. In Daniel Hirst and Albert di Cristo (eds.), Intonation systems, 219–238. Cambridge: Cambridge University Press.

Sarmah, Priyankoo, Divya Verma Gogoi and Caroline Wiltshire. 2009. Thai English: Rhythm and vowels. English World-Wide 30 (2). 196–217.

Schneider, Edgar W. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79 (2). 233–281.

Schneider, Edgar W. 2007. Postcolonial English: Varieties around the world. Cambridge: Cambridge University Press.

Stojanovic, Diana. 2009. Issues in the quantitative approach to speech rhythm comparisons. Working Papers in Linguistics (University of Hawai’i at Manoa) 40 (9). 1–20.

Tsiartsioni, Eleni. 2011. Can pronunciation be taught? Teaching English speech rhythm to Greek students. In Eliza Kitis, Nikolas Lavidas, Nina Topintzi and Tasos Tsangalidis (eds.), Selected papers from the 19th International Symposium on Theoretical and Applied Linguistics (ISTAL 19), 447–458. Thessaloniki: Aristotle University of Thessaloniki. http://www.enl.auth.gr/symposium19/.

Van Els, Theo and Kees de Bot. 1987. The role of intonation in foreign accent. The Modern Language Journal 71 (2). 147–155.

Wenk, Brian J. 1985. Speech rhythms in second language acquisition. Language and Speech 28 (2). 157–175.

Wennerstrom, Ann. 2001. The music of everyday speech: Prosody and discourse analysis. Oxford: Oxford University Press.

White, Laurence and Sven L. Mattys. 2007. Calibrating rhythm: First language and second language studies. Journal of Phonetics 35 (4). 501–522.

Zborowska, Justyna. 2000. The acquisition of English speech rhythm by Polish learners. In Proceedings of New Sounds 2000, Amsterdam, The Netherlands, 368–374.

8 Appendices

Appendix 1

The North Wind and the Sun in Mandarin Chinese:

[Chinese characters; image not reproduced]

Pinyin:

Yǒu yì huí, běi fēng gēn tài yang zhèng zài nàr zhēng lùn shuí de běn lǐng dà. shuō zhe shuō zhe, lái le yí ge guò lù de, shēn shàng chuān le yí jiàn hòu páo zi. tā men liǎ jiù shāng liang hǎo le, shuō, shuí néng xiān jiào zhè ge guò lù de bǎ tā de páo zi tuō xià lái, jiù suàn shì tā de běn lǐng dà. běi fēng jiù mǎo zú le jìnr, pīn mìng de chuī. kě shì, tā chuī de yuè lì hài, nà ge rén jiù bǎ tā de páo zi guǒ de yuè jǐn. dào mò liǎor, běi fēng méi zhé le, zhǐ hǎo jiù suàn le. yì huǐr, tài yang chū lái yí shài, nà ge rén mǎ shàng jiù bǎ páo zi tuō le xià lái. suǒ yǐ, běi fēng bù dé bù chéng rèn, hái shì tài yang bǐ tā de běn lǐng dà.

 

(Translation: The North Wind and the Sun were disputing which of them was stronger, when a traveller came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveller take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew, the more closely did the traveller fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shone out warmly, and immediately the traveller took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two.)

 

Two phonetically rich sentences in Mandarin Chinese:

1. [Chinese characters; image not reproduced]

Pinyin: Xiǎo péng yǒu men ài wán zhǐ fēi jī hé qì qiú.

(Translation: The children like playing with paper planes and balloons.)

 

2. [Chinese characters; image not reproduced]

Pinyin: Wǒ men xǐ huan qù gōng yuán fàng fēng zheng, dàng qiū qiān hé dǎ yǔ máo qiú.

(Translation: We love flying kites, playing on the swings and playing badminton in the park.)

Appendix 2

English text

 

Koalas

Millions of koalas once lived in Australia. About 100,000 survive today. What’s happening to these popular critters?

Wildfires raged in Australia during January 2002, destroying 600,000 acres of forest. The flames’ victims included countless koalas. These tree-climbing mammals live only in eastern Australia. But the fire alarms caught the attention of koala lovers around the world.

The wildfires, however, were just part of a much larger problem: Forests are vanishing throughout eastern Australia. Cute and popular as koalas are, they’re having trouble hanging on.

Koalas’ problems stem from being picky eaters. These marsupials like just one thing: They’re hooked on eucalyptus, an Australian tree. Koalas use their big noses to sniff out tasty leaves. “If you offered them something else,” says zookeeper Jennifer Toll, “they wouldn’t know what to do with it. They’d starve before they’d eat a carrot.”

Koalas weigh only twenty pounds. But they gobble almost three pounds of food a day. That’s like a sixty-pound kid eating nine pounds a day! Eucalyptus leaves, you see, aren’t very nutritious. So koalas need supersize servings to get enough energy.

Even eating as much as they do, koalas don’t have much energy. So they rest about 20 hours a day. That doesn’t make it any easier to search for mates, especially when their territories are so scattered.

As a result, the koala population plunged. No one knows exactly how many koalas survive today. What does the future hold for koalas? Can humans find ways to help them hold on? Australians hope so. “The koala,” an Australian once said, “is essential to how we see ourselves.” Saving koalas is possible. But it will take time, work, hard choices – and plenty of eucalyptus leaves. (modified from http://magma.nationalgeographic.com/ngexplorer/0303/articles/mainarticle.html; accessed 16 February 2010)