The Human Voice

1
What the Voice Can Tell Us

One balmy Friday evening in Silicon Valley, Amy and Bruno Smart fantasised about inventing a new kind of machine. The Smartacom would reveal everything you’d want to know about another person’s education, status, self-confidence, and even state of sexual arousal – in under a minute, even from six feet away. The excited Smarts set about creating and testing a prototype – and it worked! There was just one problem. The Smartacom violated every single privacy law in existence, and had the Smarts been audacious enough to manufacture it, the Federal Trade Commission would have made sure that their business dissolved faster than the Wicked Witch of the West.

The Smarts were actually very dumb. They hadn’t realised that an instrument like the one they’d invented already existed. What’s more, it breaks no laws and is free. It goes by the name of the human voice.

In an era so preoccupied with privacy and its infringement, with hackers and cookies and data protection, we’re remarkably breezy about the personal information that seeps from our voices. Employers award fat contracts to psychometric companies promising to uncover their staffs’ hidden flaws and skills, while ignoring what those same staff are freely divulging through their voices. We take this fabulously rich resource for granted, yet if the human voice were a new technology we would be hymning it loudly, and extolling its special properties.

For the moment we open our mouths and start to speak, even if it’s only to read out regulations for the disposal of sewage, our voice is doing something terrifyingly intimate – leaking information about our biological, psychological, and social status. Through it, our size, height, weight, physique, sex,¹ age and occupation,² often even sexual orientation,³ can be detected.

The voice is a stethoscope, and transmits information not only about anatomical abnormalities but even illnesses.⁴ Our risk of coronary heart disease can be predicted, it’s been claimed, on the basis of voice characteristics like volume and speed alone.⁵ Doctors have even maintained that picking up on changes to the sound of the voice can be life-saving in cases of throat cancer ⁶ and people contemplating suicide.⁷

The voice can function as a breathalyser, a rough guide to intoxication.⁸ Joseph Hazelwood, captain of the Exxon Valdez oil tanker, was acquitted by an Alaskan jury of being drunk when the tanker ran aground in 1989, causing one of the worst accidental oil spills in history. Yet an analysis of his voice from tape recordings of conversations between him and the coast guard before, during, and after the accident found that he didn’t only misarticulate certain sounds, in the way that stage drunks do – at one point he called his ship the ‘Ekshon Valdez’ – but also took 50 per cent longer to say its name at the time of the accident than the day before (alcohol slows down speech).⁹

Tiredness, too, is audible (and this matters because in certain occupations tiredness can cause death). The voice is such a reliable measure of fatigue ¹⁰ that a couple of Japanese researchers have devised a drowsiness-predictor for pilots and air-traffic controllers based entirely on readings from their voice: they hope to use it to improve flight safety by informing exhausted pilots when they should be relieved by other crew members.¹¹ (Airline pilots also now receive training on how to pitch frightening announcements, perhaps one reason why they all sound the same.¹²) And the voice is a census: through it we can detect social class, race, and education, sometimes even the number of years that a speaker has spent in school.¹³

Alter a person’s voice and you can totally change the way that others react to them. When they wanted to reduce the impact of Sinn Fein leader Gerry Adams, the Conservative government under Margaret Thatcher banned neither his words nor his face but only his actual voice. When the airline captain intones, ‘Good evening, ladies and gentlemen, this is your captain speaking,’ in a slow baritone, he immediately instils confidence. Imagine the same words shrill and fast.¹⁴ Or a lisping Macbeth, or a camp newsreader. Part of scientist Stephen Hawking’s fame comes surely from the contrast between his disabled body and synthesised, American-accented voice. It is as though his mind and voice (unlike his body) were somehow beyond human frailty.

Although cinema is usually characterised as a visual medium, when face quarrels with voice the results can be disastrous. Films dubbed into another language, no matter how expertly, seem to lose some essential synergy between body and sound – although this depends on what you’re used to. Humphrey Bogart has to sound like Humphrey Bogart – he is his rasp, except to those non-English speakers who always hear him dubbed. They associate his face with a different voice, so that a new, seemingly-natural synergy is created. But when you know the original voice, a dubbed one sounds disembodied, as if the real film is unfolding tantalisingly somewhere beyond the dubbed version, just out of reach. Even where you don’t understand the words’ meaning, you take in something important, if non-verbal, about a film through its original voices.

When we travel abroad, we become like dubbed actors. We’re infantilised, because even if we have the language we almost invariably lack the music and rhythm particular to each language, however good our accent. (The poet Tom Paulin maintains that W.H. Auden, when he emigrated to the United States, lost ‘that sense of the skip and kick of the language and he writes a very perfect but rather chewing-gum standard English, without the deeper rhythms of the language’.¹⁵)

Abroad, we also can’t read the locals’ voices the way we do at home, and so become deaf to nuances of class, background, and status. (This opens the door to gaffes, but can also free us from the prison of prejudice based on accent and pronunciation.) Yet until recently features like pitch and volume have been almost entirely neglected in foreign-language teaching. A survey of twenty conversation textbooks found not a single reference to voice. The author suggested that language teachers, as well as covering grammar and pronunciation, should also teach sarcasm and reluctance.¹⁶

A change of voice can destabilise a whole sentence. The playwright Bernard Kops turned up at a rehearsal of one of his plays only to find that its whole centre of gravity had changed. It took him a while to work out what was wrong. The actress playing the main role, instead of saying, ‘I can’t sleep with anyone,’ with the accent on ‘any’, i.e., any Tom, Dick or Harry, had put the stress on the whole word ‘anyone’ and was playing the character as frigid.¹⁷ There’s the joke about a board outside a church bearing the message, ‘The Same Yesterday. The Same Today. The Same Tomorrow. Jesus Christ.’ A Jew comes along and reads it in an exasperated tone and Jewish accent: Jesus now sounds like an expletive, and the rest like a complaint.

NO AUDIO

And yet, despite all this, we’ve developed very little consciousness of the channel on which our ability to communicate so crucially depends. In the process of writing this book I’ve had a recurring experience. Again and again I’ve met people who, hearing what I’m writing about, have suddenly gone silent (and not just through self-consciousness). ‘I’ve never given the subject the slightest thought,’ they say, once the power of speech returns. And then, almost invariably, they proceed to regale me for ten minutes with anecdotes about voices that they love or hate and why. It’s as if we both know and don’t know about the voice.

In the fifty interviews with people about their own voices and those of their friends and relatives that I’ve carried out for this book, I’ve come upon this chasm between feeling and articulating many times. A lot of people seem to be enormously irritated by certain types of voices – say, nasal, high-pitched, or loud. They can give examples of those they know, or hear on radio and television, whose voices needle them like a dentist’s drill. And yet they’re almost entirely unable to identify why or how those voices manage to penetrate them with such malign effect. It’s as though they don’t know much about voices but they know what they dislike.

We’re saturated with other people’s cadences, most of the time without any inkling of how they work on us or shape our comprehension. As a culture, until quite recently, we’ve been barely voice-aware – a society that has blocked its ears. In Western cultures’ hierarchy of the senses, sound is often placed below sight. We suffer from what Coleridge called ‘the despotism of the eye’.¹⁸ Indeed you could say that we often despise sound since we live in a culture where to love the sound of one’s own voice is a term of abuse.

The last couple of decades have seen an extraordinary eruption of interest in language – how we acquire it, the skills and rules governing how we use it. Conversation, dialect, the very language instinct have been analysed and the psychology of talk probed. And yet, remarkably, for much of the time this fascination with what we say and how we say it has continued to marginalise the voice. Most linguistic studies on conversation neglect the medium through which it’s conducted. Voice and speech are treated as almost identical, and speech as little more than spoken language. Language is thought to be the primary carrier of meaning, as if the voice were only the vehicle for words, the real force governing the direction and speed of a sentence. We raid speech for its semantic meaning, and then discard the voice like leftovers, detritus.

So an entire book on Ronald Reagan’s oratory included just one single reference to his voice. The study was called ‘Reagan Speaks’ but, given its focus on the verbal at the expense of the vocal, it should have been called ‘Reagan Reads’.¹⁹ Its omission is all the more extraordinary when you consider how important Reagan’s voice was in creating his folksy image. The voice is also usually missing from discussions about film. Where sound is referred to at all, it is in terms of ‘the soundtrack’.²⁰ And radio still gets far less money, status, and critical attention compared to television.

It sometimes seems as if we only pay attention to the voice when it goes wrong. There’s a whole literature on teachers’ voice disorders going back over thirty years,²¹ yet the role of the teacher’s voice in enthusing a class has been barely researched.²² A 7-year-old girl I interviewed, however, had no hesitation in identifying the role it played in her education. ‘You can hear if a teacher is interested in what they’re saying – their voice goes high. If it’s got expression then we learn better, because it’s not boring and down – it’s up and we’ll take notice and start listening.’²³ This neglect of the role of the voice in the classroom is all the more disturbing since the pitch, volume, and tempo of children’s voices have even been found unconsciously to affect teachers’ opinions of their intelligence.²⁴

And yet we have no shared public language through which to speak about the voice or sound, in contrast to the wide vocabulary that we’ve developed for visual images. Sounds are still part of the great unnamed.²⁵ Back in 1833 the American physician, James Rush, tried to identify different kinds of voices – whispering, natural, falsetto, orotund, harsh, rough, smooth, full, thin, slender.²⁶ By the 1970s phoneticians hadn’t moved much beyond Rush in naming different types of voice. The terms they had come up with – like whispery voice, harsh voice, creaky voice, tense or lax voice ²⁷ – were never taken up by the public. Neither was more specialist terminology, like vocal fry, jitter, or shimmer, words which anyway have no agreed definition. We’re in a state of terminological disarray, and few of us are able to describe the voice in words that aren’t either impressionistic or ambiguous.²⁸

Perhaps this is because voices are such an inescapable aspect of daily life, girdling us at work and home, suffusing us even when we deliberately tune them out. There’s no way to stop sound ²⁹ and we have no ‘earlids’.³⁰ Voices are the audio equivalent of air. For most of us, our voice is something that’s simply there when we open our mouths, and other people’s voices are givens, as unchangeable and taken-for-granted as their faces.³¹ Voices just are.

What’s more, the moment you try to describe them, voices seem to evanesce. Made out of breath, they’re equally vital and insubstantial. Unlike visual images, the voice exists only in time and can’t be frozen. Our voice only comes into being as it passes out of existence.³² It’s gone as soon as it’s been produced;³³ in its beginning is its end.³⁴

Or maybe our neglect of the voice is connected with age. Because children are dependent on adults and their language skills are limited, they’re particularly sensitive to register and cadence: they learn early what inflections are expected in what roles. As we grow and acquire language, though, we begin to neglect the non-linguistic aspects of speech, transferring our attention almost entirely from voices to words. Yet when an anthropologist, to test the importance of the acoustic in people’s lives, gave British students a questionnaire asking them to identify two or three important sounds from their childhood, he uncovered entire individual symphonies with deep emotional and personal associations. The scraping of a father’s razor, his clearing his throat when concentrating, a mother singing in the kitchen while cooking, all produced a remembered sense of security, as powerful as any gnawed old piece of blanket or other transitional object connected with touch, smell, or sight.³⁵ Proust’s madeleine could have been something aural.

TALK ABOUT TALK

To see just how thin our thinking about the voice really is, meet the speakers of the Tzeltal language in Tenejapa, Mexico. Not only do they talk a great deal, but they also spend a large part of their time judging, commenting on, or mocking the way the other speaks. The word ‘k’op’ is a central feature of their metalinguistic lexicon: combined with other words, it’s used to describe more than 400 separate speech situations and characteristics. Tzeltal has words that refer to the personality of the speaker, their physical, mental, emotional, and postural condition, their location, social identity, and volubility. It has others for ‘talking with a nice, mellow, singing voice’, ‘talking very slowly, as if sad’, and ‘talking with a high voice, not quite falsetto, but almost singing’. Tzeltal can identify ‘high, scratchy, cracking voice – characteristic of adolescents’, ‘speech that is poor and indistinct in which the speaker’s head is turned away from the listener’, ‘pouted whining talk from someone with a wounded ego’, and ‘speech that is excessively self-assertive, that is loud and forceful coming out with great confidence (negatively valued)’.

Tzeltal-speakers are plain speakers. They have different words to describe ‘speech cut off midstream during a conversation so that the speaker can go outside to urinate or defecate’, ‘speech that trails off into nothing as the speaker falls asleep (especially apt for describing drunk persons)’, and ‘a kidding-around voice, when someone says, “I’m going now,” and doesn’t mean it’.³⁶ The idea that the Inuit have dozens of words for snow, it’s now clear, is apocryphal – nothing more than an urban legend.³⁷ Perhaps it should be replaced in the popular imagination by the number of Tzeltal words for talk.

You could argue that the English dictionary would also, if thoroughly thumbed, come up with 400-plus words to do with speech. But this isn’t analogous, for the ‘k’op’ words are the equivalent of English words like ‘sweet talk’, ‘baby talk’, ‘shop talk’, etc., and even a demon Scrabble player would be hard pushed to come up with more than a dozen. There’s no denying it: we don’t do much detailed talking about talk.

SIMULTANEOUS INTERPRETERS

Yet despite all this, my interviews reveal the enormous amount of skilful voice-reading that we engage in every day, often unconsciously. Interpreting other people’s inflections and modifying our own is one of our most important interactive tasks. We voice-read to confirm what our other senses have told us, and sometimes use the voice to express feelings and moods that, if put into words, might leave a trail of embarrassment or shame.

For example: A mother down the years hired nine nannies to look after her children purely by interview over the phone because she felt that she could get a sense of them more accurately from their voice alone. She was disappointed by only one of them (the kleptomaniac daughter of a policeman).³⁸

A 56-year-old South African man says of his long-term partner, ‘I can tell in her voice when we’re going to have sex, long before we go to bed. I know she’s up for bed, even if she doesn’t mention it – there comes an extra softness in her voice.’³⁹

A 14-year-old girl, when her parents complain angrily about her untidy room, failure to do her homework, and other standard parent-teenage grievances, makes a point of responding to them in a calm voice. ‘If you don’t get loud I think it makes them feel a bit weird for shouting, so they stop. It’s very easy to do.’ Does she do it often? ‘Yes. I’m really aware of doing this, like a plan.’⁴⁰

A 12-year-old boy, in regular conflict with his parents, enjoys the moment when they come and kiss him goodnight ‘and talk quietly – there are no other voices to compete with’.⁴¹ He recognises the special property of the intimate voice – that it seems to speak just to us.

A 46-year-old woman says of the loving voice her husband sometimes uses, ‘I suppose that’s what makes the marriage. It’s the private tone between us – I haven’t heard him use that voice to anyone else, except maybe slightly to the children.’⁴²

RAISING THE VOICE

So, if most of us are reasonably fluent voice-users and proficient voice-readers, why do we need to understand its physiology, psychology, and acoustic properties? Do we have to learn the Latin name of a plant to enjoy its scent, or the ingredients of a dish to savour its taste? Does knowing about joints and muscles help us walk better?

There are three reasons for exploring the voice. Firstly, the differences and similarities in the way people speak are fascinating. What’s more, voice is a distinctive human feature. Other creatures are also vocally skilled. Birds, for instance, can distinguish the voices of their near relatives from thousands of others,⁴³ and the chaffinch can even recognise a male rival by the individuality of his voice.⁴⁴ Monkeys express anger, fear, submissiveness, and dominance through similar vocal cues to humans.⁴⁵ Yet while apes and monkeys have a repertoire of about thirteen calls, the human vocal system has a far larger number of sounds.⁴⁶ No vocal learning by imitation takes place in mammals below humans; apart from some birds, only humans have voluntary control over the acoustic nature of their vocal utterances, can learn vocal patterns by imitation, and even invent new ones.⁴⁷ This means that there’s something quintessentially human about the voice, and understanding it enables us to peer more deeply into the unique, complex properties of our own species. So in some important sense, an investigation into the voice becomes an exploration of our humanness.

Finally, the voice is central to our communicative abilities. Women’s vocal folds (sometimes called vocal cords) perform more than one million oscillatory cycles a day. Men’s accomplish around half a million in the course of a day.⁴⁸ Astonishingly, our vocal folds often travel more than two kilometres a day.⁴⁹ Our relative ignorance about a channel on which we’re so dependent, and that’s so critical to debate and negotiation, rumour and argument, is staggering.

From teachers to receptionists to lawyers, around a quarter of the total labour force is in a vocally demanding profession, or uses their voice as their primary tool of trade.⁵⁰ Yet even professional voice-users can be remarkably casual about the health of their instrument.⁵¹ And you only have to look at the occupational-safety limits placed on hand movement to find a chastening comparison: if the distance travelled by hands, exposed to vibrations, exceeds 520 metres in one day, it’s considered an industrial hazard, but the vocal folds can travel this distance in less than forty-five minutes of continuous speech. We’re all familiar with Repetitive Strain Injury, but Repetitive Vocal Injury as a concept doesn’t exist, and there’s no health-and-safety legislation applying to voice-users.⁵²

Although interpreting the human voice is one of our most important daily social activities, the way we speak and hear remains almost entirely hidden, and unexplored. Those who develop a richer understanding of the voice are less likely to mishear their friends, lovers, and rivals, and more able to intuit when people are being authentic or dishonest. And with vocal awareness comes a greater chance of achieving real communication, of expressing oneself clearly, of really hearing another person, of two voices connecting. In an atomised society where isolation and loneliness are rife, this dimension of the voice has never been more essential.

NO IMPROVEMENT

This isn’t a self-help book. There are no tips to be found here on how to develop a rich brown voice, of the kind that sells limousines or instant coffee, or can snare a partner. Plenty of such books already exist, few of them effective. As one speech and language therapist put it, ‘If you just give people a voice, it’s a bit like a script, they become a cardboard cut-out. And in a moment of stress and anxiety they revert to the voice they had before.’⁵³ Or they risk losing the natural dynamism of their voice, with all that this implies. Opening one’s mouth only to hear another person’s voice coming out isn’t the same as developing a greater sensitivity to other people’s voices and one’s own – it’s ventriloquism.

This book isn’t about the singing voice either, even if there are more similarities between the singing and speaking voice than differences. The same part of the brain may produce both singing and speaking – we have one voice, and not two – but the speaking voice is enough for one volume.⁵⁴

Again, when people speak of the voice, it’s often the metaphorical voice they have in mind, and not the embodied one. Today ‘having a voice’ has lost a lot of its original meaning. Increasingly it’s used not literally to mean the sound we make with our mouths but more abstractly, in the sense of having the right to be heard socially and politically. ‘Voice’ has also become a common term for narrative authority and literary self-expression.⁵⁵

Is this an example of how we’ve taken the body out of the voice, and the voice out of the body? Or does using ‘having a voice’ as a metaphor only prove once again the importance of the voice in human experience? The fact that ‘finding one’s voice’, a phrase that on the literal level means being able to speak, has lent itself almost exclusively today to the figurative sense of speaking out testifies to the voice’s fabulous ability to express a person’s deepest sense of self.

Finally – especially in Britain – if you talk about the voice it’s assumed that you’re talking about accent. George Bernard Shaw summed up the relationship between accent and class with his famous quip: ‘It is impossible for an Englishman to open his mouth without making some other Englishman despise him.’⁵⁶ It’s rather dispiriting that, all these years later, debate about accent still dominates our thinking about the voice, as if dialect and pronunciation made up the whole of the subject, and not just a small corner of it, albeit a fascinating one. On the other hand, if we’ve become so sensitive to accent that a native speaker can distinguish instantly between an Atlantic City accent and an Atlanta one, between a Bristolian and a Brummie, or between Hochdeutsch (‘High German’) and Plattdeutsch (‘Low German’), then we have the potential to develop our sensitivity to the many other astonishing dimensions of this vital medium.
