Chapter 10

Grasping the Melody of Language

In This Chapter

Using juncture for different speaking styles and rates

Exploring the syllable and stress assignment

Patching with sonority and prominence measures

Transcribing is more than just getting the vowels and consonants down on paper. You need that extra zest! For instance, you should be able to describe how phonemes and syllables join together, a property called juncture. A phonetician must be able to hear and describe the melody of language, focusing on patterns meaningful for language. This important sound aspect, called prosody, gives speech its zing and is described with a number of specialized terms. This chapter gives you the tools to handle bigger chunks of language, so that you can master description of the melody of language.

Joining Words with Juncture

Unless you’re a lifeless android (or have simply had a very bad night), you probably don’t say things such as “Hel-lo-how-are-you-to-day?” That is, people don’t often speak one word (or syllable) at a time. Instead, speech sounds naturally flow together. Juncture is the degree to which words and syllables are connected in a language. These sections explain some characteristics of juncture and help you transcribe it.

Knowing what affects juncture

A number of factors can affect juncture, including the following:

Some factors are language-specific. Some languages (such as Hawaiian) break things up and have relatively little carryover between syllables, while other languages (such as French) allow sounds to be run together. In French, the process of sounds blending into each other is called liaison, in which sounds change across word boundaries. Check out these two examples:

In these examples, the syllables of Hawaiian have little effect on each other, whereas the French has resyllabification (the shift of a syllable boundary) and a voicing of an underlying /s/ sound — a clear example of adjacent sounds affecting each other.

Other factors are more personal. They include speaking formality and rate. Think about how your speech changes when you formally address a group versus talking casually with your friends. In a formal setting, you usually use more polite forms of address (sir and madam), fancier terms for things (restroom or public convenience instead of john or loo), and frillier sentence constructions (Would you kindly pass the hors d’oeuvre please? instead of Yo. The cheese, please?).

In informal speech, talkers usually have less precise boundaries than in formal speech. This register change often interacts with rate, because rapid speech often causes people to undershoot articulatory positions (not reach full articulatory positions). The result can be vowel centralization (sounds taking on more of an [ə]-like quality), de-diphthongization (diphthongs becoming monophthongs), changes in consonant quality (such as the tongue moving less completely to make speech sounds), and changes in juncture boundaries (including one boundary shifting into another).

Check out these examples from American and British English:

Changes in register and style clearly affect juncture (how speech sounds are connected in terms of pauses or gaps). Some phoneticians refer to juncture as oral punctuation because it acts somewhat like the commas and periods in written language.

Transcribing juncture

You can transcribe juncture in a couple of different ways. They are as follows:

Close juncture: This default way of transcribing shows that sounds are close together by placing IPA symbols close together in transcription from phoneme to phoneme. An example is “Have a nice day!” /hævə naɪs ˈdeɪ/.

Open juncture: You use open juncture (also referred to as plus juncture) symbols when you need to emphasize gaps separating sounds. Consider these two expressions:

“Have a nice day!” /ˈhævə + naɪs ˈdeɪ/

“Have an ice day!” /ˈhævən + aɪs deɪ/

Many speakers would probably produce this second example (“Have an ice day”) with a glottal stop before the vowel of “ice,” as a way of marking the gap between the words “an” and “ice.” To distinguish these two expressions, the exact placement of the gap (before the /n/ in “a nice,” after it in “an ice”) is critical. Therefore, open juncture symbols are helpful.

Phoneticians use different conventions for juncture between words. Depending on the speaking style, some phoneticians place a content word (such as the verb “have” in the preceding examples) next to an adjacent function word (such as the determiner “a”), resulting in /ˈhævə/. Doing so tells the reader there is no pause between these sounds. Other transcribers indicate such juncture with a tie-bar at the bottom of the two words: (/ˈhæv‿ə/).

The flow of spoken language doesn’t necessarily follow the grammatical patterns you learned in English class. Talkers can run on or hesitate during speech for many reasons. Consider the sentence, “I went to the store.” This sentence can be produced with many different juncture patterns, such as

I . . . went to the store.

I went . . . to the store.

I went to . . . the store.

I went to the . . . store.

And so on. You get the idea. Transcribing all the potential variations in the exact same way wouldn’t make sense. What’s important is showing where all the gaps take place. Many phoneticians use the IPA pipe symbol ([ǀ]), which technically indicates a minor (foot) group, a prosodic unit that acts like a comma (I describe it in greater detail in Chapter 11). However, many transcribers also use this symbol to represent a short pause, whereas they use a double bar ([‖]) to represent a long pause, such as at the end of a sentence. Here are some examples:

/aɪ ǀ wɛn tə ðə stɔɹ‖/

/aɪ wɛnt ǀ tə ðə stɔɹ‖/

If you use these symbols in this manner, be sure to indicate it in notes to your transcription. A good general principle to follow is to employ juncture and timing information only when needed. For instance, the hash mark (#) is a linguistic symbol that means a boundary, such as the end of a word. I have seen older phonetic transcriptions with a hash mark placed between every word. These ended up looking as if a psychotic chicken used the transcription to practice the Rhumba. Keep your transcriptions tailored to your needs, with just the amount of detail your applications require.
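
As a rough illustration of the pause convention described above, the following Python sketch joins a list of transcribed words, inserts the short-pause symbol after chosen words, and closes the utterance with the long-pause symbol. The function name `mark_pauses` and its parameters are hypothetical, made up for this example; they aren't part of any transcription standard.

```python
SHORT_PAUSE = "ǀ"  # short pause, as used in this chapter
LONG_PAUSE = "‖"   # long pause, such as at the end of a sentence


def mark_pauses(words, pause_after=()):
    """Join transcribed words, adding a short pause after the given word indexes.

    words       -- list of IPA strings, one per word
    pause_after -- zero-based indexes of words followed by a short pause
    """
    pieces = []
    for index, word in enumerate(words):
        pieces.append(word)
        # The final word is followed by the long pause, never a short one.
        if index in pause_after and index < len(words) - 1:
            pieces.append(SHORT_PAUSE)
    return "/" + " ".join(pieces) + LONG_PAUSE + "/"


went_to_store = ["aɪ", "wɛnt", "tə", "ðə", "stɔɹ"]
print(mark_pauses(went_to_store, pause_after={1}))  # pause after "went"
```

Because the pause positions are passed in separately, the same word list can be reused for every juncture pattern of “I went to the store.”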

Emphasizing Your Syllables

A syllable is something everyone knows intuitively, but it can drive phoneticians nuts when they try to pin it down precisely. By definition, a syllable is a unit of spoken language consisting of a single uninterrupted sound formed by a vowel, diphthong, or syllabic consonant, with other sounds preceding or following it. Phoneticians don’t see the definition as so cut and dried.

Phoneticians consider a syllable an essential unit of speech production. It’s a unit with a center having a louder portion (made with more air flow) and optional ends having quieter portions (made with less air flow). Phoneticians agree on descriptive components of an English syllable, as shown in Figure 10-1.

Illustration by Wiley, Composition Services Graphics

Figure 10-1: Parts of an English syllable.

From Figure 10-1, you can see that an English syllable (often represented by the symbol sigma [σ]) consists of an optional onset (beginning) and a rhyme (main part). The rhyme consists of the vowel and any consonants that come after it. Words rhyme when these parts sound alike. At a finer level of description, the rhyme is divided into the nucleus (the vowel part) and the coda (tail or end) where the final consonants are. From this figure, you can take a word like “cat” and identify the different parts of the syllable. For “cat” (/kæt/), the /k/ is the onset, /æ/ is the nucleus, and the /t/ is the coda.

Matching rhymes are why this type of poem rhymes (“blue” and “you” end in the same vowel and have no coda):

Roses are red, violets are blue. . . .

blah blah blah blah, blah blah blah blah . . . you.

Languages vary considerably in which kinds of onsets and codas they allow. Table 10-1 shows some samples of syllable types permissible for English.

Table 10-1 Sample Syllable Types in English

Example	IPA	Syllable Type
eye	/aɪ/	V
hi	/haɪ/	CV
height	/haɪt/	CVC
slight	/slaɪt/	CCVC
sliced	/slaɪst/	CCVCC
sprints	/spɹɪnts/	CCCVCCC

The last column lists a common abbreviation for each syllable type, where “C” represents a consonant and “V” represents a vowel or diphthong. For instance, “eye” is a single diphthong and thus has the syllable structure “V.” At the bottom of the table, “sprints” consists of a vowel preceded and followed by three consonants, having the structure “CCCVCCC.”
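
The syllable parts and the C/V abbreviations can be worked out mechanically once a word is split into phonemes. The Python sketch below is a minimal illustration under simplifying assumptions: it handles one syllable at a time, and the `VOWELS` set contains only the vowel symbols needed for these examples. The function names `parse_syllable` and `cv_skeleton` are invented for this sketch.

```python
# Vowel symbols needed for the examples here (an illustrative set,
# not a full English inventory). A diphthong counts as one "V".
VOWELS = {"aɪ", "ɪ", "æ"}


def parse_syllable(segments):
    """Split one syllable (a list of phoneme strings) into onset, nucleus, coda."""
    vowel_positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel or diphthong")
    peak = vowel_positions[0]
    return {
        "onset": segments[:peak],      # consonants before the vowel
        "nucleus": segments[peak],     # the vowel or diphthong
        "coda": segments[peak + 1:],   # consonants after the vowel
    }


def cv_skeleton(segments):
    """Return the syllable type, such as 'CVC', for a list of phoneme strings."""
    return "".join("V" if seg in VOWELS else "C" for seg in segments)


cat = ["k", "æ", "t"]
sprints = ["s", "p", "ɹ", "ɪ", "n", "t", "s"]
print(parse_syllable(cat))
print(cv_skeleton(sprints))
```

The nucleus and coda together make up the rhyme, so joining those two parts gives the portion of the syllable that has to match for two words to rhyme.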

Strings of consonants next to each other are called consonant clusters (or blends). Each language has its own rules for consonant cluster formation. The permissible types of consonant clusters in English are, well, rather odd. Figure 10-2 shows some of the English initial consonant clusters in a chart created by the famous Danish linguist, Eli Fischer-Jørgensen.

Illustration by Wiley, Composition Services Graphics

Figure 10-2: Some English syllable-initial consonant clusters.

Notice the phonotactic (permissible sound combination) constraints at work in Figure 10-2. It’s possible to have sm- and sn- word beginnings, but not sd-, sb-, or sg-. There can be an spl- cluster, but not a ps- or psl- cluster.
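
A phonotactic constraint of this kind can be pictured as a simple membership test. The Python sketch below assumes a deliberately tiny list of permitted onsets: “sm,” “sn,” and “spl” come from the discussion above, and “sl” and “spɹ” from the words in Table 10-1. It is not the full English inventory, so a cluster missing from the list is reported as not allowed even if English permits it. The names `PERMITTED_ONSETS` and `is_permitted_onset` are hypothetical.

```python
# A partial, illustrative list of permitted English onset clusters.
PERMITTED_ONSETS = {"sm", "sn", "sl", "spl", "spɹ"}


def is_permitted_onset(cluster):
    """Return True if the cluster is in the (partial) permitted-onset list."""
    return cluster in PERMITTED_ONSETS


for cluster in ["sm", "sn", "spl", "sd", "sb", "sg", "ps", "psl"]:
    verdict = "allowed" if is_permitted_onset(cluster) else "not allowed"
    print(f"{cluster}-: {verdict}")
```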

Stressing Stress

Nothing makes a person stand out as a foreign speaker more than placing stress on the wrong syllable. In order to effectively teach English as a second language, transcribe patient notes for speech language pathology purposes, or work with foreign accent reduction, you need to know how and where English stress is assigned. This, in turn, requires an understanding of phonetic stress at the physiologic and acoustic levels.

Stress is a property of English that’s signaled by a syllable being louder, longer, and higher in pitch than its neighbors. It’s a suprasegmental property (which means that it extends beyond the individual consonant or vowel). Louder, longer, and higher are perceptual properties, that is, in the ear of the beholder. For a syllable to be perceived as stressed, the talker must change physical attributes of the speech signal. Table 10-2 describes what a talker does to produce each of these speech properties (articulatory), what the acoustic property is called (acoustic change), and how it’s heard (perceptual impression). Check out Chapter 12 for more in-depth information.

To understand Table 10-2 and get a sense of how louder, longer, and higher works, say a polysyllabic word correctly and then say it incorrectly. Say “syllable” correctly, with stress on the initial syllable (/ˈsɪləbəl/). Next, incorrectly place the stress on the second-to-last syllable (also called the penultimate, or penult), as in /sɪˈlʌbəl/. Finally, place stress on the final syllable, or ultima, as in /sɪləˈbʌl/.

Table 10-2 Physical, Acoustic, and Perceptual Markers of Stress in English

Articulatory	Acoustic Change	Perceptual Impression
Increased airflow, greater intensity of vocal fold vibration	The amplitude increases	Louder
Increased duration of vocal and consonantal gestures	The duration increases	Longer sound (“length”)
Higher rate of vocal fold vibration	The fundamental frequency increases	Higher pitch

In each case (whether you’re correctly or incorrectly pronouncing it), the stressed syllable should sound as if someone cranked up the volume. The following sections tell you more about how stress operates at the word, phrase, and sentence level in English.
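
To see how the three cues in Table 10-2 might be combined, consider the Python sketch below. It standardizes amplitude, duration, and fundamental frequency across the syllables of a word and picks the syllable with the highest total. The measurements are made up for illustration (they aren't real data), and weighting the three cues equally is an assumption of this sketch, not a claim about how listeners actually weigh them. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values so cues with different units can be compared."""
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - average) / spread for value in values]


def most_stressed(syllables):
    """Return the label of the syllable with the highest combined cue score.

    Each syllable is a tuple: (label, amplitude_db, duration_ms, f0_hz).
    """
    labels = [syllable[0] for syllable in syllables]
    # One standardized column per cue: amplitude, duration, f0.
    cue_columns = [z_scores([s[i] for s in syllables]) for i in (1, 2, 3)]
    totals = [sum(column[row] for column in cue_columns)
              for row in range(len(syllables))]
    return labels[totals.index(max(totals))]


# Made-up measurements for "syllable" said with initial stress.
measurements = [
    ("syl", 72.0, 210.0, 190.0),
    ("la", 65.0, 120.0, 160.0),
    ("ble", 63.0, 150.0, 150.0),
]
print(most_stressed(measurements))
```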

Eyeing the predictable cases

Stress serves four important roles in English. They are as follows:

Lexical (word level): When you learn an English word, you learn its stress. This is because stress plays a lexical (word specific) role in English: it’s assigned as part of the English vocabulary. For example, syllable is pronounced /ˈsɪləbəl/, not /sɪˈlʌbəl/ or /sɪləˈbʌl/.

Noun/verb pairs: In English, stress also distinguishes different functions of words. Try saying these noun-verb pairs, and listen to how the stress alternation makes a difference (the stressed syllable follows the [ˈ] mark in the IPA column):

Spelling	Part of Speech	IPA
(to) record	Verb	[ɹəˈkʰɔɹd]
(a) record	Noun	[ˈɹɛkɚd]
(to) rebel	Verb	[ɹəˈbɛɫ]
(a) rebel	Noun	[ˈɹɛbɫ̩]

These stress contrasts are common in stress-timed languages, such as English and Dutch (whereas tone languages, such as Vietnamese, may distinguish word meaning by contrasts in pitch level or pitch contour on a given syllable).

Compounding: With compounding, two or more words come together to form a new meaning, and more stress is given to the first than the second. For example, the words “black” and “board” create “blackboard” /ˈblækbɔɹd/.

Also, the juncture is closer than in a corresponding adjective + noun construction. For example, if you pronounce the following pairs, you’ll notice a longer pause between the words in the first example (the adjective + noun row) than in the second example (the compound noun row).

Grammatical Role	English	IPA
Adjective + noun	a black board	/ə blæk ˈbɔɹd/
Compound noun	a blackboard	/ə ˈblækbɔɹd/

Emphasis in phrases and sentences: Also known as focus, this is a pointer-like function that draws attention to a part of a phrase or sentence. By making a certain syllable louder, longer, and higher, the talker subtly changes the meaning. It’s as if the utterance answers a different question. For example (the emphasized word is in capitals):

DYLAN sings better than Caruso. (Who sings better than Caruso?)

Dylan SINGS better than Caruso. (What does Dylan do better than Caruso?)

Dylan sings better than CARUSO. (Who does Dylan sing better than?)

People handle this kind of subtlety every day without much problem. However, just think how difficult it is to get computers to understand this type of complexity.

Identifying the shifty cases

For the most part, English stress remains fairly consistent. However, some cases realign and readjust. You may think of it as a musical score having to be switched around here and there to keep with the rhythm. These adjustments, called stress-shift, are a quirky part of English phonology.

Stress realigns itself in a manner to preserve the up-and-down (rhythmic) patterns of English. If syllables happen to combine such that two stressed syllables butt up against each other, one flips away so that there is some breathing room. Think of it like two magnets: push two like poles together and one magnet flips around so that the poles alternate again.

Some English words take primary stress on different syllables, based on the context. For example, you can pronounce the word “clarinet” with initial stress, such as /ˈklɛɹɪnɛt/ or with final stress, as in /klɛɹɪˈnɛt/, depending on the stress of the word that comes next. Try this test:

1. Say “clarinet music” three times, with final stress on “clarinet” (/klɛɹɪˈnɛt/).

Doing so sounds a bit awkward, right? It should have been more difficult because two stressed syllables had to butt up against each other.

2. Say “clarinet music” three times, with initial stress on “clarinet” (/ˈklɛɹɪnɛt/).

You should notice that this second pattern flows more naturally because it permits the usual English stress patterns (strong/weak/strong/weak) to persist.
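
The clarinet test can be sketched as a small procedure: list the stress variants a word allows, then pick the one that doesn’t put two stressed syllables side by side. The Python code below is a simplified illustration, assuming each word is reduced to a list of 1s (primary stress) and 0s (everything else), which ignores secondary stress. The function names `has_stress_clash` and `choose_variant` are invented for this sketch.

```python
def has_stress_clash(first_word, second_word):
    """True if the first word ends and the second begins with a stressed syllable.

    Each word is a list of 1 (stressed) and 0 (unstressed), one entry per syllable.
    """
    return first_word[-1] == 1 and second_word[0] == 1


def choose_variant(variants, next_word):
    """Pick the first stress variant that avoids a clash with the next word.

    Falls back to the first variant if every option clashes.
    """
    for variant in variants:
        if not has_stress_clash(variant, next_word):
            return variant
    return variants[0]


# Simplified patterns: 1 marks the primary-stressed syllable only.
clarinet_variants = [
    [0, 0, 1],  # final stress
    [1, 0, 0],  # initial stress
]
music = [1, 0]  # initial stress

print(choose_variant(clarinet_variants, music))
```

With “music” as the following word, the final-stress variant clashes, so the sketch settles on the initial-stress variant, matching the pattern that flows more naturally in the test above.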

Sticking to the Rhythm

Another way an English speaker can show adeptness with the language is having the ability to use English sentence rhythm patterns, where greater stresses occur at rhythmic intervals, depending on talking speed. To get a sense of these layered rhythms, consider these initially stressed polysyllabic words: “really,” “loony,” “poodle,” “swallowed,” “fifty,” “plastic,” and “noodles.”

When you put them together in a sentence, they form:

The really loony poodle swallowed fifty plastic noodles.

Although speaking this sentence is possible in many fashions, a typical way people produce it is something like this:

The REALly loony POOdle swallowed FIFty plastic NOOdles.

That is, regularly spaced, strongly stressed syllables (capitalized here) are interspersed with words that still retain their primary stress (such as “loony”), yet they’re relatively deemphasized in sentential context. This kind of timing is rhythmic and can reach high levels in art forms like vocal jazz (or perhaps, rap). Chapter 11 discusses ways you can transcribe this kind of information.

Tuning Up with Intonation

In phonetics, sentence-level intonation refers to the melodic patterns over a phrase or sentence that can change meaning. For instance, a rising or falling melodic pattern can change a statement to a question, or vice versa. Intonation is quite different from tone, which is the phoneme-level pitch differences that affect word meaning in languages such as Mandarin, Hausa, or Vietnamese (see Chapter 18). English really has no tone. The following sections take a closer look at the three patterns of sentence-level intonation that you find in English.

Making simple declaratives

A basic pattern of English intonation is the simple declarative sentence, which is a statement used to convey information. A couple of examples are “The sky is blue” and “I have a red pencil box.”

Think of this pattern as the plain gray sweater of the phonetic wardrobe. A bit dull, perhaps, but it’s necessary. When you’re simply stating something, the chances are your intonation is falling. That is, you start high and end low.

Falling intonation seems to be a universal pattern, perhaps due to the fact that it takes energy to sustain the thoracic pressure needed to keep the voice box (larynx) buzzing. As a person talks, the air pressure drops and the amount of buzzing tends to drop, causing the perceived pitch to fall, as well.

Answering yes-no questions

The second pattern of sentences is called the “yes/no question.” When you’re asking a question that has a yes or no answer, you probably have rising intonation. This means you start low and end high.

Try producing the same sentences that I introduce in the previous section, but instead of falling in pitch as you speak, have your voice rise from low to high.

You probably noticed these English statements (“The sky is blue?”) have now turned into questions. Specifically, they’re questions that can be answered with yes or no answers. This rising pitch pattern for questions is fairly common among the world’s languages. For instance, French forms most questions in this manner. Note: Some languages don’t use intonation at all to form a question. For instance, Japanese forms questions by simply sticking the particle /ka/ at the end of a sentence.

Focusing on “Wh” questions

The third pattern covers English questions formed with Wh words, including “who,” “what,” “when,” “where,” “why,” and “how.” These questions are produced with falling pitch, rather than rising. Try a few, while determining whether your voice goes up or down:

Who told you that?

What did he say?

When did he tell you?

Where will they take you?

Why are you going?

How much will it cost?

Your intonation likely goes down over the course of each of these utterances, even though they are questions.
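
The three intonation patterns come down to whether pitch ends higher or lower than it starts. The Python sketch below labels a contour by comparing its first and last fundamental-frequency values. The contours are made-up numbers rather than measurements from real speech, and the 10 Hz threshold is an arbitrary assumption chosen for this illustration. The function name `classify_contour` is hypothetical.

```python
def classify_contour(f0_values, threshold_hz=10.0):
    """Label a pitch contour by comparing its first and last f0 values.

    f0_values    -- fundamental-frequency samples in Hz, in time order
    threshold_hz -- smallest change counted as a rise or fall (an assumption)
    """
    if len(f0_values) < 2:
        raise ValueError("need at least two f0 values")
    change = f0_values[-1] - f0_values[0]
    if change > threshold_hz:
        return "rising"
    if change < -threshold_hz:
        return "falling"
    return "level"


# Made-up contours (Hz), not measurements from real speech.
statement = [210.0, 200.0, 185.0, 160.0]        # "The sky is blue."
yes_no_question = [170.0, 175.0, 190.0, 230.0]  # "The sky is blue?"
wh_question = [220.0, 205.0, 180.0, 155.0]      # "Who told you that?"

for name, contour in [("statement", statement),
                      ("yes/no question", yes_no_question),
                      ("Wh question", wh_question)]:
    print(name, "->", classify_contour(contour))
```

Comparing only the endpoints is the simplest possible approach; it ignores any rises and falls in the middle of the utterance.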

Showing Your Emotion in Speech

When someone talks, part of the melody serves a language purpose, and part serves an emotional purpose. When you’re transcribing speech, you need to understand emotional prosody because it can interact in complex ways with the linguistic functions of prosody. In fact, people can show many emotions in speech, including joy, disgust, anger, fear, sadness, boredom, and anxiety.

Studies have shown that people express happiness (joy) and fear at higher frequency ranges (heard as pitch) than emotions such as sadness. Anger seems to be an emotion that can go in two directions, phonetically:

Hot anger: When people go up high with the voice and show much variability.

Cold anger: When people are brooding with low pitch range, high intensity, and fast attack times (sudden rise in amplitude) at voice onset.

Emotional patterns in speech (also known as affective prosody) don’t directly affect sentence meaning. However, these patterns can interact with linguistic prosody to affect listeners’ understanding. For instance, adults with cerebral right hemisphere damage (RHD) can have difficulty understanding, producing, and mimicking the emotional components of speech. The speech of such individuals can often be monotonic (flat). It can sometimes be challenging for clinicians to sort out which aspects of these speech presentations are due to emotional deficits and which are due to linguistic deficits.

Fine-Tuning Speech Melodies

Phoneticians can be sticklers for detail. They just don’t like messy bits left over. In addition to the different types of stress, intonation, focus, and emotional prosody, certain aspects of speech melody still require measures to account for them. These sections examine two such measures.

Sonority: A general measure of sound

Sono- means sounds, and sonority is therefore a measure of the relative amount of sound something has. Technically, sonority refers to a sound’s loudness relative to those of other sounds having the same length, stress, and pitch. This measure of sound is particularly handy for working with tone languages, such as Vietnamese, where decisions about tone structure are important.

To get a clearer sense of this jargon, try saying the sound “a” (/ɑ/) followed by the sound “t.” Assuming you spoke them at the same rate and loudness, the vowel /ɑ/ should be much more sonorous (have more sound) than the voiceless stop, “t.”

The concept of sonority is relative, which means phoneticians often refer to sonority hierarchies or scales. In a sonority hierarchy, classes of sounds are grouped by their degree of relative loudness. Check out www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsTheSonorityScale.htm for an example of one.

A sonority scale expresses more fine-grained details. For instance, according to phonologist Elizabeth Selkirk, English sounds show the following ranking:

([ɑ] > [e=o] > [i=u] > [r] > [l] > [m=n] > [z=v=ð] > [s=f=θ] > [b=d=ɡ] > [p=t=k])

If you try out some points on this scale, you’ll hear, for example, that [ɑ] is more sonorous than [i] and [u].

Sonority is an important principle regulating many phonological processes in language, including phonotactics (permissible combinations of phonemes), syllable structure, and stress assignment.
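
A sonority scale lends itself to a simple lookup table. The Python sketch below encodes the ranking quoted above as numbers, from 10 for [ɑ] down to 1 for [p], [t], and [k], and then finds the most sonorous sound in a sequence. The numeric values are an assumption of this sketch (the scale itself gives only an ordering), and the function names are hypothetical.

```python
# Sonority classes from most to least sonorous, following the ranking quoted above.
SONORITY_CLASSES = [
    ["ɑ"], ["e", "o"], ["i", "u"], ["r"], ["l"], ["m", "n"],
    ["z", "v", "ð"], ["s", "f", "θ"], ["b", "d", "ɡ"], ["p", "t", "k"],
]

# Map each sound to a number: 10 for the top class down to 1 for the bottom.
SONORITY = {
    sound: len(SONORITY_CLASSES) - rank
    for rank, sounds in enumerate(SONORITY_CLASSES)
    for sound in sounds
}


def sonority_profile(segments):
    """Return the sonority value of each segment, in order."""
    return [SONORITY[segment] for segment in segments]


def sonority_peak(segments):
    """Return the most sonorous segment (the first one, if there is a tie)."""
    return max(segments, key=lambda segment: SONORITY[segment])


sequence = ["p", "l", "ɑ", "n"]
print(sonority_profile(sequence))
print(sonority_peak(sequence))
```

In this sequence the values rise toward the vowel and fall after it, which is the kind of profile that lets sonority pick out a syllable’s center.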

Prominence: Sticking out in unexpected ways

When all is said and done, some problem cases of prosody can still challenge phoneticians. One such problem is exactly how stress is assigned to syllables in words. For instance, some English words can be produced with different numbers of syllables. Consider the words “frightening” and “maddening.”

Do you say them with two syllables, such as /ˈfɹaɪtnɪŋ/ and /ˈmædnɪŋ/? Or do you use three syllables, such as /ˈfɹaɪtənɪŋ/ and /ˈmædənɪŋ/? Or sometimes with two and sometimes with three?

Other English words may change meaning based on whether they are pronounced with two or three syllables. For instance:

“lightning” (such as in a storm) /ˈlaɪtnɪŋ/

“lightening” (such as, getting brighter) /ˈlaɪtənɪŋ/

A proposed solution for the more difficult cases of stress patterns is to rely on a feature called prominence, consisting of a combination of sonority, length, stress, and pitch. According to this view, prominence peaks are heard in words to define syllables, not solely sonority values.

Prominence remains a rather complex and controversial notion. It’s an important concept in metrical phonology (a theory concerned with organizing segments into groups of relative prominence), where it’s often supported with data from speech experiments. However, other phoneticians have suggested different approaches may be more beneficial in addressing the problems of syllabicity in English (such as the application of speech technology algorithms, rather than linguistic descriptions).