3
How We Colour Our Voices with Pitch, Volume, and Tempo

The writer Dorothy Parker was alone and bored at a party one evening. Whenever anyone she vaguely knew came up to her and asked, ‘How are you? What have you been doing?’ she replied, ‘I’ve just killed my husband with an axe and I feel fine.’ Because she said it with the intonation usually used for party small talk, every one of them simply smiled, nodded and, unastonished, drifted on.1

Parker demonstrated how pitch, loudness, and tempo – some of the features that make up paralanguage – can be more important than language itself. For paralanguage doesn’t just support words but gives them life. Without it, they’d be inert, flat, lacking in emotion. Paralanguage is what separates a real human voice from a synthesised one.

So important are intonation and stress – sometimes known as prosody – in shaping our reactions that they can make us back off from someone or warm to them without knowing anything about them. Prosody is the audio version of our personality, our sonic self.

Yet although ‘It’s not what you say, it’s the way that you say it’ now belongs in the realm of cliché, paralanguage is relatively young as a recognised field of study. Until the 1940s linguists dwelt almost exclusively on the verbal aspects of speech. Then a pair of major studies of American and British intonation,2 followed by a pioneering 1958 paper on paralanguage itself, showed that prosody could be studied and analysed just as rigorously as language.3 Almost fifty years on, the field has grown vastly, although it’s still mined with disputes. Researchers can’t even agree on terminology – 107 different names have been used to identify register alone.4

LET ME STRESS

What counts as language and what as paralanguage isn’t fixed – it’s subject to argument and change, for intonation has a linguistic function as well as a non-linguistic one.5 It allows us, for example, to distinguish between ‘conduct’ as a noun and as a verb, or between a question and a statement (even though teen ‘uptalk’ has been busily destroying this distinction). Merely by the way it’s said, the meaning of a word can be either reversed or reinforced. And, depending on where you place the emphasis, you can say the sentence, ‘I didn’t want to go home,’ with five different meanings.

Consider the case of the 19-year-old illiterate Londoner Derek Bentley, hanged in 1953 for the murder of a policeman. Bentley was convicted primarily on the basis of the testimony of another policeman, who claimed to have heard him say to his fellow defendant, ‘Let him have it, Chris.’ The prosecution maintained that those words proved that Bentley had encouraged his fellow criminal to shoot the policeman, but his defence lawyer argued (unsuccessfully) at his trial that the same words might have been spoken (with the emphasis presumably on the ‘let’ rather than the ‘have’) to mean, ‘Give him the gun, Chris.’6

So a person’s fate can depend on the stress of a word,7 and in tone languages like Chinese, where a change of pitch turns one word into an entirely different one, the potential for mistakes is prodigious. In Beijing Mandarin, for example, ‘ma’ spoken with a level tone means ‘mother’, but with a falling-rising tone it means ‘horse’, so a learner who gets the tone wrong may say one while meaning the other.8

Paralanguage as a concept includes not only vocal features but also non-vocal ones, like facial expression and body movement. In fact it’s now something of a dustbin category, into which eyebrow movements, pitch, gesture, and any other attribute of communication that can’t be put anywhere else are tossed.9 Yet despite its imprecision, paralanguage is still a useful concept, if only because it helps us appreciate the sheer opulence of the human voice, and the number and variety of its components.10

HOW YOU PITCH IT

Pitch is the falling or rising tone heard in the voice: it creates our voice’s melody. It can alter by the syllable, by the word, or over a stretch of speech (where it’s known as the pitch-range). Patterns of pitch are called intonation.11 Pitch is an auditory sensation: it’s what we hear, or think we hear, and is affected by judgement, volume, and other factors. It’s distinct from frequency, an acoustic feature that can be measured by a spectrograph: you can change pitch without changing frequency, and vice versa.12 In other words, pitch is subjective while frequency is objective. Pitch is also a musical term,13 but music is of limited use in helping us to understand the pitch of speech – after all, we can’t speak out of tune,14 and our voices have far more gradations than the notes of the musical scale.15

Intonation draws attention to new information, for example, ‘I just heard a great singer.’ Falling pitch also signals that one speaker has finished speaking and another can begin – it oils the turn-taking through which conversation proceeds. Without the information supplied by intonation, conversation would be an infinitely greater cacophony of overlapping, competing voices, and yet we usually change pitch without thinking. Our brain circuits are charged with the extraordinary task of continuously adjusting the mass, length, and tension of the vocal folds, so as to produce the variations in intonation patterns we want to convey.16

Average pitch levels vary according to age and gender. Although the least tiring level is three or four tones above the lowest we can clearly produce17 – i.e., near, but not at, the lower end of our physiological vocal range18 – our speaking pitch ranges over two or three octaves, and none of us has a habitual or stable speaking pitch.19 Over the course of a year, our pitch can cover as wide a range as that between different speakers,20 and it can vary by as much as 18 per cent from one day to another.21 Indeed male (but not, interestingly, female) voices fluctuate significantly within the course of a single day, between morning, early afternoon and late afternoon.22

Pitch plays a huge role in shaping how we appear to others, as well as to ourselves. Some have even gone so far as to claim that future academic success can be predicted by voice pitch and range: in two studies teachers judged slow speech and low pitch as signs of academic failure, and found higher pitch and lower volume among the academically more successful.23 Whether the pitch was a cause or an effect of low achievement (or simply coloured teachers’ judgements without their realising it), what’s disturbing about these findings is the implication that only one kind of voice allows you to flourish academically. If it’s true, inequality of opportunity is audible.

Certain pitches become identified with particular qualities. The British broadcaster Jon Snow is sure that ‘it’s bass registers that give authority to a voice’.24 Satirist Rory Bremner thinks that politicians like Michael Portillo and TV performers like Ian Hislop talk in a deeper-than-natural register in order to sound authoritative.25 To play Othello at the Old Vic in 1964, Laurence Olivier added a whole octave to the bass end of his voice.26

When American university students were asked to write down words that best described the pitch of well-known people, they judged the low inflections of conservative commentator William Buckley as ‘authoritative, aggressive, in command, decisive, controlling’. Writer Truman Capote’s rising inflections, on the other hand, were considered ‘effeminate, ambivalent, indecisive, questioning, unsure’, while former President Lyndon Johnson’s all too stable pitch suggested ‘loss of affect, boredom, no emotionality, fatigued’.27

Intonation plays an important symbolic role. It can convey volatility, or act as a safety-net. The level, almost monotonous pitch of veteran American news anchor Dan Rather conveys gravitas and decency: its restrained, almost presidential cadences evoke old values of trust and authority, harking back to the acoustic of a world before cable and satellite TV, when upbeat intimacy had not yet become the norm. In Jon Snow’s words, his voice ‘brought anchorage and security to viewers in a very insecure age’.28

Compared with Rather, most American news-broadcasting voices today arch and buck, changing pitch for seemingly capricious reasons. Pitch now seems dissociated from meaning, and has become part of corporate style instead. Says Rather, ‘I don’t believe in having extremes of pitch and register. When I hear this in other people, it’s off-putting to me. There’s a belief today that to keep people from being bored you need to move your voice up and down the scale pretty regularly. Everything in me shouts against that.’29

The voices of American broadcasters are now hardly distinguishable from those of waitresses, air stewards, or receptionists.30 I’ve sat in a seafood restaurant in Connecticut and heard a waitress deliver her spiel about the specials with such rhythmic rapture, her voice all a-swooping-and-a-leaping, that to derail her by interrupting in a downbeat British way would have been an act of heartlessness.

American intonation seems relentlessly ‘up’ compared with British. When I go to the United States I’m always struck by the mismatch of melodies: I hear my own intonation anew when I hear it through American ears. After a few days I invariably shift my intonation ‘up’ a few notches, Americanising not my accent but my inflections. Modifying vocal melody like this makes communication easier; not to would sound like a pompous display of Britishness, or as if I were auditioning for a part in Masterpiece Theatre. George Bernard Shaw is alleged (though it has never been proved) to have said that Britain and America were two countries divided by the same language. As much as anything else, it’s our different intonation that does the dividing.

Not that British broadcasting is without its mannerisms. Listen to a Radio 4 announcer or a BBC Radio 3 morning presenter and you hear a voice that also gambols in an unnatural rhythm and with abnormal cheerfulness – partly to disguise the fact that a script is being read. ‘And now’ (normal, low pitch) ‘we’re’ (voice has soared like a frenzied finch) ‘going to hear’ (it plunges rapidly) ‘a Mozart’ (stays low) ‘sonata’ (rapid ascent again, followed by a moderate fall) ‘no. 15 in F’ (low) ‘written’ (starts travelling up again) ‘in’ (in the clouds) ‘1788’ (and back down again). The bizarre pattern of pauses sounds like the unfortunate consequence of an incurable disease.

Most of us grow into expert decoders of each other’s cadences. We also quickly learn how to read those of politicians and other public figures. Newsweek reported that Robert J. McCloskey, a spokesman for the State Department in the Nixon administration, had three distinct ways of saying, ‘I would not speculate.’ ‘Spoken without accent, it means the department doesn’t know for sure; emphasis on the “I” means “I wouldn’t, but you may – and with some assurance”; accent on “speculate” indicates that the questioner’s premise is probably wrong.’31

Pitch varies between countries. As the linguist Edward Sapir put it:

Society tells us to limit ourselves to a certain range of intonation and to certain characteristic cadences … If we were to compare the speech of an English country gentleman with that of a Kentucky farmer, we would find the intonational habits of the two to be notably different, though there are certain resemblances due to the fact that the language they speak is essentially the same.32

We can’t read the intonation of someone from another culture properly, argues Sapir, unless we’re familiar with their country of origin – otherwise we’re likely to hear an Italian as temperamental, rather than as simply sharing the normal melodic curves of all Italian speakers. For languages create their own rhythms. ‘If a Frenchman accented his words in our English fashion, we might be justified in making certain inferences as to his nervous condition.’33

Does the persistence of British anti-German and anti-Japanese prejudice so long after the end of the Second World War have anything to do with how harsh German sounds to British ears, or with the Japanese use of pitch contours and glottal attack that in English would be a sign of aggression?

In a globalised world where labour is increasingly mobile, sensitivity to intonation is becoming more and more important. Consider the experience of a group of Indian women working in the staff cafeteria at a major British airport, who were seen as surly and uncooperative by both their supervisor and the cargo handlers they served. Even though they said little, it soon became clear that their intonation was creating friction. When British canteen staff asked a cargo handler who’d chosen meat if he wanted gravy, they’d say, ‘Gravy?’ using rising intonation. The Indian assistants, on the other hand, said the word with falling intonation, which, as the supervisor eventually pointed out, made it sound as if they were saying, ‘This is gravy.’ Since that was self-evident, it was heard as rude and insulting.34

Certain forms of expression are almost entirely a property of the voice. Sarcasm, for example, allows you to undermine and ridicule through tone rather than explicitly through language – it can express insubordination without leaving any incriminating words in its wake. Sarcasm forces the person at whom it is aimed either to ignore it or to draw attention to it (‘Don’t you use that tone of voice with me, young man’). Either way the target is left as victim rather than victor of the interchange. Nonverbal channels of communication, by being inherently ambiguous, ‘allow us to express our feelings … without taking full responsibility for them. It is undeniably easier to retract what has been expressed nonverbally than what has been expressed verbally.’35

So powerful a weapon is sarcasm (an essential component of the teenage arsenal) that in American ‘brat camps’ there’s a No Sarcasm rule. A recent Anti-Social Behaviour Order served on a British man also stipulated that he wasn’t allowed to use sarcasm. In ‘The Wall’ Pink Floyd sang of ‘dark sarcasm in the classroom’. Generations of teachers used it as a teaching aid.

LOUDSPEAKER

Volume, like pitch, is both physiological and psychological. The level of intensity or loudness that we hear doesn’t necessarily match the amplitude with which the vocal folds vibrate.36 (And we can be peculiarly insensitive to our own volume – it’s always other people who are loud, never ourselves.) Volume also plays an important role in managing the interplay of conversation: we often speak more softly as we approach the end of a sentence, for instance.37 And it has a strong cultural dimension. As a general rule, Asians are softly spoken (in Asian countries a soft voice signals deference) and find Americans loud. (So do Britons.) Arabs, on the other hand, sound loud to Americans, while to Arabs the American voice is too quiet and sounds insincere.38 But volume is changing. Americans used to speak louder in public than the British, who tended to treat public space as though it were a library where you had to whisper. The mobile phone, though, has produced a whole generation of Britons uninhibited about blaring private matters in public.

Of course we’re not born loud- or soft-speakers – we learn to use the volume level that prevails in our culture, and then turn it up or lower it depending on our subculture and peer group. Indeed regulating the volume of our voices according to the situation is an important social skill. Nor are we mono-volume creatures – we can be soft at home and loud outside it, or vice versa. Emerging from the enforced quietude of the classroom, children erupt into noisy exuberance: in the clamour of the playground, lungs and larynx, as well as limbs, are exercised. One of a swimming-pool’s most important functions is surely to allow children to scream without fear of reprimand.

We use our voices differently in a group. A 14-year-old girl says of outings with her schoolfriends:

I always try to make them be quiet – lots of them go really loud and really obnoxious to everyone else – it’s a bit harsh and we have quite posh accents, and people give us looks. People who go on the bus together make a lot of noise in a small space and it’s embarrassing. [Being loud with a group of people in public can be liberating] but it can also be quite overwhelming if everyone’s speaking to get their point across and you’re not.39

Loudness can be an important part of the collective public identity of a group. By marking out a shared acoustic space, group members assert their right to be heard. Among those whose voices have been silenced historically – groups of young girls or teenage Asian boys, for example – volume can be a sign of defiance. Though the buildings might not belong to them, they’re able to impose their voices on the street and lay claim to the air.

But volume can also be inversely related to status. To compel total attention some powerful people speak so softly that their listeners are obliged to lean forward to hear. (An American politician told me that he speaks softly because a politician talking loudly jars with an audience, and makes them tune out.40) Speaking quietly can be a display of arrogance, but then again it can also mean that someone believes that they’ve nothing interesting to say. Reversing the norm can be powerful, too. Since loudness is traditionally associated with rage, and softness with intimacy and confidentiality, quietly expressed anger can be devastating.

How loud is a loud voice? The average conversational voice, with speaker and listener some three feet apart, is 60 dB. Quiet speech hovers around 35–40 dB, while shouting rises to 75 dB. Rustling leaves measure 10 dB, and loud radio music 80 dB. Around 120 dB creates a sensation akin to touch, and above this we hit the pain threshold.41

In normal human conversation the voice can’t be raised to a point where it might endanger the ear (over 90 decibels), yet somehow a loud voice seems to provoke more anger than other equally loud sounds. In 1914 the city of Bern in Switzerland enacted a by-law against ‘carpet-beating and noisy children’. In 1971 members of a Hare Krishna sect were arrested in downtown Vancouver and convicted under the Noise Abatement Act for their public chanting. They were standing next to a construction site where demolition equipment produced noise levels of 90 decibels. Needless to say the construction firm wasn’t fined.42

Politicians sometimes try to make a more powerful impression by increasing the volume and attack of their voice, which almost invariably results in a croak.43 If they understood that consonants give meaning to the words and vowels supply the emotion, they could achieve the desired effect without strain. The feelings behind a Shakespeare speech (though not its meaning) can be communicated by pronouncing only the vowels (try it with ‘To be or not to be’). When luvvies go, ‘Daaarling,’ they’re extending the vowels to extract the maximum emotion from them. At the same time, if the final consonants in a speech aren’t sounded clearly, the meaning isn’t conveyed. By taking care with final consonants and extending vowels, public speakers can sound forceful without having to rely on volume or shouting.44

THE TEMPO OF TALK

A person’s pace seems highly characteristic of the way they speak, yet well-known politicians and entertainers can still be recognised even when their rate is altered.45 The average tempo of an American or British adult is 120 to 150 words per minute, or about six syllables per second.46 A rate of 90–100 wpm suggests incapacity, dignity, or vanity. Roosevelt spoke far more slowly in his Fireside Chats than any other political orator on the air in the 1930s, and after the bombing of Pearl Harbour in 1941 his rate dipped to 88 words per minute – at a time when the normal radio tempo was 175–200 wpm.47

Speed can be used to prevent interruption, or to put distance between oneself and a difficult subject. We also accelerate as we become more excited, like horse-race commentators.48 The belief that some languages are spoken more rapidly than others is an illusion caused by different styles of speech and types of language.49

Our speech rate reveals profound differences in outlook, confidence, self-image and style, but our reaction to other people’s tempo is stuffed with prejudice. Slow speakers’ abilities are consistently denigrated. They’re judged less truthful, less persuasive, but also colder and weaker.50 Fast speakers, on the other hand, are considered cleverer.51 You might think that, because speaking quickly makes speech less clear, it also makes it less persuasive. In fact the opposite is true: fast speakers are seen as more knowledgeable and trustworthy.52 But our judgements are also influenced by how we sound ourselves. We think people who speak at a similar tempo to our own are more competent and attractive than those who go slower or faster.53 People whose speech rates are mismatched can have difficulties communicating or even dislike each other.54

Tempo, like volume, works in subtle and sometimes contradictory ways. The American politician I interviewed told me, ‘I often speak more slowly than I’m comfortable with. I think faster than I speak.’ To which his wife retorted, ‘That’s such a power ploy, so that you’ve got everyone waiting for your next word.’55

It’s only in drama that speech is always fluent – the rest of us fluff, repeat, prolong, or stress weirdly some of the time. The difference between normal hesitancy and stuttering can be a matter of just a few per cent, yet we’re still unforgiving as a culture in our attitudes to stutterers and those who aren’t fluent.

But there are cultural differences in tempo too. In Zulu society, for example, slow speech is a sign of respect and sincerity.56 Americans, by contrast, tend to speak quickly, and American radio is strikingly faster than British: American broadcasters treat breath like a foreign body. Pace and tempo, argues Dan Rather, are important factors in establishing clarity and intelligibility. ‘I am conscious of speaking slower than other broadcasters but it would be a mistake to try and change it. I speak at that pace where I can best be understood and hold interest. I have done sports broadcasting and play-by-play [ball-by-ball commentary] – you have to speak faster in order to cover the action, especially with basketball. I do remember feeling that the pace and tempo of play-by-play was not commensurate with clarity and having people understand the news. I’ve never had any pressure to speak faster, though I do know that the trend is to speak faster.’57

American television now routinely speeds up sitcoms and compresses speech in order to fit in more ads. If ‘Frasier’ seems to talk faster on one affiliate station than on another, that’s probably because he is talking faster – the station has accelerated the episode.58 Rather thinks that there’s been a spill-over from taped to live broadcasting, putting pressure on broadcasters to speak more rapidly. ‘I was listening to a colleague on a cable TV station speaking so fast and I thought, why does he do that? It was all pudding to me. There is a theory that audiences, particularly young audiences, the most coveted these days, expect people to speak faster.’59

Extreme speed or languor can be comic: John Cleese, in his I’m Sorry I’ll Read That Again radio monologues, raced up to 200 wpm.60 Guinness World Records reported a man speaking at 637.4 words per minute on a British television programme in 1990.61 But starting slow and speeding up can also be a powerful oratorical device: Martin Luther King’s ‘I have a dream’ speech began at 92 words per minute and ended at 145. In a thinly disguised anti-Semitic speech on American radio in 1938, Father Coughlin, broadcaster of weekly right-wing sermons, veered from 100 to 275.62 His accelerando alone must have been enough to raise listeners’ blood pressure and instil fear.

OVERTALK

Voice specialists call uncontrollable or excessive talking ‘logorrhoea’, a word that carries unwelcome echoes of diarrhoea. It’s also known as manic speech – ‘loud, rapid, and difficult to interrupt. Individuals may talk non-stop … without regard to others’ wish to communicate’.63

Unusual quantities of speech have been linked with anxiety. On the other hand, when professors and student counsellors were asked to choose the most mentally healthy college students they chose those who talked more.64 Those who speak more are often perceived as possessing higher leadership ability too,65 yet most of us apply different measures in public and private life to judge whether a person is masterful or just over-voluble.

Many of the people I interviewed brought up the question of balance in conversation, in work and in relationships. ‘My partner is generally much more voluble than me,’ said a 43-year-old man, ‘but when we come to be actually talking to each other, it’s very even – that’s one of the things I like about it.’66 A 51-year-old academic observed, ‘I try and force myself to listen to other people’s voices – to give a space for other people to talk. I think one of the dangers, because I have a strong voice, is to be dominant.’67 And a 49-year-old woman admitted, ‘I have a real horror of situations where one person talks too much and seems oblivious of the fact. I can’t stand it in discussions when someone hogs the talk time and you notice that others haven’t spoken once, perhaps because I could easily talk non-stop but I monitor myself.’68

A 53-year-old man recalled with horror a long car drive with his father-in-law who, in response to a question, talked for half an hour without pausing to allow anyone else to contribute. The son-in-law was shocked to discover that not everyone possessed an internal gauge that told them how much they were speaking.69 A 22-year-old teacher says that she can tell if she’s talking too much by the amount of fidgeting done by her 7-year-old pupils.70

Verbosity, of course, can serve all sorts of defensive purposes. The characters in the stories of the great Yiddish writer Sholem Aleichem pour out a torrent of words to defend themselves against chaos. They seem to be saying, ‘I talk, therefore I exist.’71 When one of his story-tellers breaks off his unfinished tale because it’s time for him to change trains, his fellow passengers can’t believe that a Jew like themselves would rather stop talking in the middle of a sentence than miss his connection.72

If over-talk can disturb, so also can over-fluency. A voice produced too easily casts doubt on its own authenticity. We may all be actors, but the over-fluent voice draws uncomfortable attention to the degree of performance. Without doubts, nuance or slips, its arguments sound pre-assembled: hearing it is like being trapped in honey.

I’D KNOW THAT VOICE ANYWHERE

The Roman rhetorician Quintilian believed, ‘Every human being possesses a distinctive voice of his own, which is as easily distinguished by the ear as are facial characteristics by the eye.’73 Though chapter 16 challenges him and the idea of a ‘voiceprint’, timbre nevertheless enables us to identify someone we haven’t seen for years after only a few words.74 Also known as voice quality or voice set, timbre is idiosyncratic, biologically controlled,75 and present no matter whether a speaker is yelling or whispering. It’s why actors almost always remain recognisable: whatever role they assume,76 strains of Judi Dench or Tom Hanks invariably peep through. How dexterous the voice has to be – to make the same words spoken by different people sound distinctive enough for each speaker to be swiftly identified, yet similar enough for us to recognise and understand them even when they’re spoken at a different pitch and volume.77

What makes this all the more phenomenal is that we can recognise another person’s voice in milliseconds, and we hear a person’s timbre as a consistent sound even when they’re walking through different-sized rooms whose different reflecting conditions make the real sound of their voice fluctuate. In processing other people’s voices we constantly, selectively add and filter out certain characteristics in order to compensate for local and temporal changes: it’s as if our brains are committed to transforming uneven bits into a smooth whole.78

Why do no two people sound alike? The voice is shaped not only by the length and thickness of the vocal folds, and the muscles of our lips, mouth, and tongue (all of which differ between individuals), but also by our different bone structures that produce differently shaped resonating chambers in the throat and nasal cavities. The potential combinations of these factors create the infinite profusion of human voices.