12
Cultural Differences in the Voice

Even before I open my mouth to speak, the culture into which I’ve been born has entered and suffused it. My place of birth and the country where I’ve been raised, along with my mother tongue, all help regulate the setting of my jaw, the laxity of my lips, my most comfortable pitch. They play a part in dictating how fast I speak, how soft or loud, how much silence I can tolerate. My vocal habits mark me out as urban or rural, upper or working-class, while the rules about turn-taking that I follow or flout also place me. I speak with my voice, but my culture speaks through me.

For a long time anthropologists tended to treat speech purely as a route through which cultural values were communicated. But then, in the 1960s and early 1970s, came the explosion of interest in the ‘ethnography of speaking’. Now the rules of speech in different countries, regions, ethnic groups, and social classes themselves became the object of study.1 Why were some clans so much more voluble than others,2 and what positive or negative social values were associated with being taciturn?3 Researchers chronicled the use of the voice for sacred purposes – chants and invocations – and rhetorical ones (political oration, funeral speeches, etc). They understood that in every culture, no matter how sophisticated, a kind of tribalism is expressed through the voice – we use it to sniff out similarity and difference, us and not us – and to signal where in the social hierarchy we position ourselves (or are positioned by others). Through a society’s precepts about the voice you can trace its ideas about ancestry and lineage, rank and kin – even its metaphysics.

MAGIC SPEAKING

In traditional societies the voice is one of the single most important components of ritual. A pioneering piece of twentieth-century anthropological field-work, Bronislaw Malinowski’s 1935 account of magic among the Trobriand Islanders near New Guinea, showed how it could also be a vector of magical beliefs. The magician, Malinowski observed:

prepares a sort of large receptacle for his voice – a voice-trap, we might almost call it. He lays the mixture on a mat and covers this with another mat so that his voice may be caught and imprisoned between them. During the recitation he holds his head close to the aperture and carefully sees to it that no portion of the herbs shall remain unaffected by the breath of his voice … When you watch the magician at work … then you realise how serious is the belief that the magic is in the breath and that the breath is the magic.4

In ritual, voice and breath often connect a speaker with spirits handed down by their forebears. Young chanters of the Ata Tana Ai of eastern Indonesia, when they ‘receive the tongue and take the voice’ of their ancestors’ knowledge, acquire, supposedly in a flash, historical and linguistic understanding.5 The Kaluli people of Papua New Guinea make no distinction between spirit voices, the weeping and singing voice, and the sounds of rainforest birds.6 The Dogon people of Upper Niger in North-West Africa don’t distinguish between literal and spiritual voice either. To them the voice carries the life-force, so it’s hardly surprising that individuals are vulnerable to each other’s bad voices. A nasal voice, producing speech caught between the nose and the throat, they call ‘decayed’: like a stagnant miasma, it gives off a bad odour that penetrates into the listener’s very body.7 Until recently Westerners tended to regard such views not as evidence of the underdevelopment of their own vocal attitudes but as primitive.

THIS IS MY PITCH

Tone of voice is like a language, and different cultures use it differently. The Kaingang of Brazil communicate degree and intensity by changing pitch, as well as by facial expression and posture, whereas we’re more likely to do it through words.8 The Christian habit of addressing God in hushed tones would strike the Sioux as bizarre, since they use high pitch and loud voice to communicate the solemnity of religious ceremonies, addressing the Great Spirit in shrieks and loud cries.9

But there are also cross-cultural similarities in the way we use our voice. An Italian with no knowledge of English can distinguish between ‘When can you do it?’ asked courteously and when asked curtly – in both languages ending the question on the very bottom pitch rather than a rising one introduces an element of insistence.10

Of course no culture is static, and speaking styles constantly evolve. The more complex and rule-laden a society becomes, it’s been suggested, the more sharp sounds take over from lax ones: there’s a higher level of muscular tension in countries (and individuals) that prize conformity, discipline, and self-control.11 In India and Pakistan, on the other hand, the jaws are held rather inertly and loosely apart.12

TALK ABOUT TALK

In other cultures differences in prosody are not only noticed but also openly discussed. For instance Japanese aizuchi – backchannel cues in the form of agreeing sounds or affirmative noises made by the listener while someone else is speaking – can be words (‘hai’ [yes], ‘so’ [that’s true]), echoes of one of the speaker’s key words, or even grunts. They maintain harmony and tell the speaker that the listener is paying attention.13

Aizuchi require a keen sensitivity to the voice, for they’re anything but random. Listeners interject them at particular stages in a speech – at a low-pitch point, for example, or when the speaker slows down, or raises their volume or pitch. It’s as if, alongside his or her actual words, a speaker is transmitting non-verbal vocal instructions to the listener about how to react. ‘Prosody alone is sometimes enough to tell you what to say and when to say it.’14

The Japanese refer to aizuchi in everyday conversation, discussing how a particular aizuchi was read, or how someone else’s aizuchi made another person feel.15 Chapter 1 showed how the Tzeltal speakers of Tenejapa in Mexico had many words to describe the voice. The Japanese too, it seems, are not only acutely responsive to vocal cues, but have also developed a language in which to discuss them.

EMPHATICALLY VERBAL

Even though Bronislaw Malinowski helped awaken generations of anthropologists to the connection between the voice and magic, one of his most important concepts is curiously deaf to the vocal dimension. It was Malinowski who coined the word ‘phatic’ to describe the kind of conversation that has no intrinsic meaning but is simply a kind of small talk – ‘a type of speech in which ties of union are created by a mere exchange of words’,16 where the very fact of saying something – anything – is significant, irrespective of the verbal content. ‘The breaking of silence, the communion of words, is the first act to establish links of fellowship … The modern English expression, “Nice day today” or the Melanesian phrase, “Whence comest thou?” are needed to get over the strange and unpleasant tension which men feel when facing each other in silence.’17

Curiously, nowhere does Malinowski specifically acknowledge or even mention the voice. Yet phatic communion’s power lies precisely in what’s communicated nonverbally, especially vocally – that’s why the words aren’t important. In phatic speech words are simply carriers of the voice – excuses for it, opportunities for it to be heard. When another person speaks they’re implicitly volunteering information about themselves: through vocal self-exposure, purely by using their bodies to produce a sound, they’ve made themselves less threatening. Phatic speech is essentially phatic voice.

A DIFFERENT CLASS OF VOICE

We express through our voice not only friendliness but also social rank; we use different registers at different times, for ‘no human being talks the same way all the time’.18 In Burundi a peasant farmer, no matter how naturally eloquent, is obliged to stammer, shout or generally make a rhetorical fool of himself when his adversary is a herder or other superior. But put him in a council in the role of judge and he’ll speak with grace and dignity.19

In Senegal low-ranked Wolof speak loudly, quickly, in a high pitch, and with a wide pitch-range. In political meetings their voices rise to a falsetto, reaching more than 300 syllables a minute. High-ranking Wolof nobles, on the other hand, are expected to sound breathy and talk in a soft, low voice that can drop to a basso profundo, sometimes mumbling at 60 syllables a minute or less:

Power is … displayed in this form of talk – the power to command the audience’s attention. The people who speak in the most extreme versions of this style are those whose high status and authority are unambiguous. Although they speak little, their right to the floor is unquestionable … When the high noble does begin to say something, an immediate hush falls … Everyone leans forward, straining to make out what is being said.20

The Wolof, of course, aren’t born with these contrasting vocal styles – they develop them as they take their social place.

Nor are such contrasts peculiar to the developing world – they’re also present in industrialised countries. The setting of the mouth, according to the French sociologist Pierre Bourdieu, is shaped by class, with the working-class way of eating and speaking dominated by the refusal of ‘airs and graces’.21 A working-class English-speaking Glaswegian will have a different degree of openness about the jaw and place their tongue differently from a middle-class English-speaking Glaswegian, and those settings will be modified again by age and gender.22 In fact, quite fine distinctions of status are discernible through the voice, even to listeners who don’t understand the language of the speakers. When non-French-speaking Canadian students listened to a tape of French Canadians reading a short passage, they recognised their social status accurately purely on the basis of their vocal properties.23

Though this may be depressing, it isn’t surprising. Bourdieu has coined the words ‘habitus’ and ‘bodily hexis’ to describe the ways in which class is inscribed in the body. ‘Habitus’ refers to the way that childhood learning is mapped on to the body so that it becomes ‘a living memory pad’, while ‘bodily hexis’ (from the Greek word for habit) describes how social status comes to be embodied in our way of standing and speaking,24 in our very mouths. In other words, the social body and the physical body enjoy an intimate relationship.

One of the earliest, most influential (and enjoyable) studies on status and speech was conducted by William Labov in three large New York City department stores in 1962. Labov had noticed that New Yorkers of different social classes differed in how they pronounced ‘r’ when it was not followed by a vowel (as in ‘card’ and ‘four’), and decided to test this out in Saks Fifth Avenue, Macy’s, and S. Klein (Manhattan stores at the top, middle, and bottom of the price and fashion scale). Assuming that sales staff, even if they didn’t themselves share the social status of the stores’ clientele, borrowed the customers’ mannerisms, Labov adopted the splendid research method of approaching sales people in the store and asking for directions to a particular department located on the fourth floor. The answer usually would be, ‘Fourth floor,’ whereupon he’d lean forward and say, ‘Excuse me?’, so eliciting another, more forceful, ‘Fourth floor.’ Labov conducted as many such interviews in as many aisles as possible until people began to notice. Then he proceeded to the fourth floor to ask, ‘Excuse me, what floor is this?’

His interviews revealed that the majority of Saks’s employees, half of Macy’s, but only one-fifth of Klein’s sounded the ‘r’ clearly, bearing out his theory that a more emphasised ‘r’ was now the more prestigious pronunciation, and that social class is inscribed in the pronunciation of even a single sound.25

CROSSTALK

Because our voices are so imbued with our culture, the potential for cross-cultural misunderstanding is enormous. Pitch or volume can mean one thing in one country but something quite different in another. The Chinese, for instance, lower their pitch or volume to draw attention to the seriousness of the subject or the intensity of their feeling about it. Since Americans use high pitch and volume to communicate this, they often find the Chinese over-deferential and insufficiently forceful.26

On the other hand, West Indians, to emphasise a point, might suddenly and dramatically change their pitch in the middle of a conversation, and start speaking louder. To other ethnic groups this seems like an expression of anger or aggression and so they react as if the speaker has been rude, leaving the West Indian feeling that they’ve come up against an example of racial discrimination.27 As we have seen, volume can also cause problems in American-Arab conversation: Arabs sound loud to Americans, and Americans too quiet and insincere to Arabs.28

Silence is perhaps the most culturally bound aspect of communication. A Japanese speaker, in the absence of aizuchi from a non-Japanese listener, can get the impression that they haven’t been understood29 – the Japanese give backchannel feedback twice as often as the English. But they also regard halting speakers much more positively than fluent ones, as their proverbs (‘The mouth is the source of calamity’, ‘Talkativeness is a mouth’s fart’, ‘Honey in the mouth, a dagger in the belly’) testify. The Americans and the British, by contrast, are discomfited by silence. So while the average length of pause between two Americans is 0.74 seconds, in an all-Japanese meeting it’s 5.15 seconds and can even extend to 8.5 seconds.30 An American trying to be polite to a Japanese colleague might end up reducing his or her credibility. Similarly the Japanese tend to find American taxi-drivers talkative, while Americans can be thrown by what they see as the tremendous reticence of Japanese taxi-drivers.31

Even the placing of pauses can cause problems. White American children are taught to pause at the beginning of clauses or before conjunctions, whereas black children pause wherever a significant change in pitch occurs, even if it’s within a clause. White children, as a result, sometimes see black children’s speech as less grammatical than their own, which forces black children, when talking to white children, to code-switch into the white way of speaking to avoid communicative problems.32

The consequences of these differences in vocal style can be serious. Alaska Native Americans in the 1970s received on average 20 per cent longer jail sentences than non-Native Americans because, to show deference to authority when talking to the police, they slowed down the rhythm of their speech and paused for longer than usual before speaking. Their hesitation was interpreted by non-Native police as a sign of antagonism rather than respect. Again, when the Native Americans didn’t respond to questions in court with the expected speed, officials thought they were expressing hostility and gave them stiffer sentences.33

Countries’ different vocal melodies can create powerful, and sometimes misleading, impressions. A senior lecturer in German studies at a British university observed, ‘When my students visit Germany they often think German people have been rude, but it’s just the different intonation and phrasing.’34 These conflicting tunes result partly from linguistic differences, but they can also occur between countries that speak the same language. Since Americans, when they ask a question, are far more prone than the British to use the high-rising tone (which sounds lighter than the falling tone), Americans sound casual to the British, while the British (with their preponderance of falling tones) sound formal to the Americans,35 even though American society in many ways is the more formal.

Cultural differences in the use of pitch can, as we have seen, make foreigners seem impolite or tentative. Thanking someone for an invitation to dinner, English men will raise their pitch, but Japanese men won’t: to the English this sounds unsociable, cool, or downright rude.36 Indian accents have a musical quality, but their inflections can sometimes be wrongly interpreted as questioning and uncertain, and the voice treated as lacking authority.37

And finally tempo, too, has a cultural dimension. New Yorkers regard quick talk as evidence of quick thinking, so that a Midwestern professor’s habit of pausing before speaking and then speaking slowly is seen not as a sign of being thoughtful but of being dim, which is how societies or regions of faster talkers have always regarded slower ones.38 Even the Finns, long characterised as silent, distinguish between the faster talkers of Carelia in the south-east of the country, and the slow-speaking residents of south-west Finland. Every country, it seems, needs to point the finger at someone slower than themselves, although in reality tempi may have less to do with national characteristics than with how urban a country is. Rural speakers are usually slower and quieter than urban ones – who, after all, have to make themselves heard over the din of the city. (A farmer complained about the city people who walk through his woods. ‘They talk so loudly. No wonder they never see any wild life.’)39 Finland, until the 1960s, had a famously dispersed rural population. If the Finns were silent, it’s probably because they didn’t have anyone around to talk to.40

What accounts for the enormous growth in interest in cross-cultural misunderstandings, to the extent that one British bank has even made it the subject of its television and poster advertising campaign? Partly it reflects the general preoccupation with communication: making oneself understood is no longer taken for granted, especially today when we’re increasingly required to commune beyond national boundaries. A lot of this material is also fun – anthropology for beginners, saloon-bar sociolinguistics. And yet some examples of cross-cultural miscommunication are just a whisper away from talk of ‘national character’ and can easily lapse into stereotypes, even if they’re legitimated by linguistics.

Many of the cross-cultural comparisons involve the Americans and Japanese. Is this because of those nations’ financial and corporate dominance? Or because the Japanese, to the British, Americans and other Westerners, constitute a kind of instant ‘other’, a new exotic, a negative American – high-tech enough to be part of the same modernist universe, and yet socially distinctive? In a sense they’ve come to represent cultural difference.

Much of the time, human beings manage to converse through the voice quite successfully, yet this perspective seems to suggest that human communication is riddled with misunderstanding, that we’re divided by voices rather than connected through them. Speaking about the Japanese like this also draws covertly on old, war-engendered prejudice: stereotyping them as cold automatons turns the Japanese into almost high-tech machines themselves – even though, as we’ve seen, their society (like most others) is in flux.

THE GLOBALISED VOICE

Globalisation plays a part in our obsession with inter-country misunderstanding: our interest is driven partly by anxiety about its impact on trade. Oddly, we’ve become interested in vocal difference at the very moment that it’s beginning to disappear. It’s not only the distinctive voices of Japanese women that are beginning to resemble their Western counterparts. Other vocal fashions are sweeping across continents, indifferent to local prosody. The high-rising terminal (HRT), for instance – in which the intonation of questions is applied to statements – seems to have begun in New Zealand,41 moved over to Australia, migrated to American teenagers (especially female),42 and eventually colonised Europe.

HRTs have been cited as another example of ‘cultural cringe’, Australians’ low self-esteem, and traced back to Australian soap operas like Neighbours and Home and Away. TV, others say, rarely disseminates phonological change – only word-of-mouth can do that.43 The increase in international travel certainly exports not only germs from one part of the world to another but also vocal patterns: intonation can be contagious.

In the beginning, HRTs were also seen as a marker of casualness, and a characteristically female expression of doubt.44 So are we all tentative, or female, now? Or perhaps they’re the intonation of rapport. Differing views about their meaning began to surface after the publication in the New York Times in 1993 of a humorous piece by a professor of journalism, who’d noticed them spreading among his students and coined the phrase ‘uptalk’. ‘Uptalk’ is a sign not of deference or self-doubt, according to another commentator, but of ‘identity and group affiliation’.45 HRTs had mutated from tentative to cool – so much so that when, in 1993, the US’s National Public Radio examined the phenomenon by phoning professional men and women in their 20s and 30s and listening to their answering-machines, it found messages like this: ‘This is Bob? I’m not in right now? Leave a message at the beep?’46

HRTs remain the lingua franca of Australian soaps and American teen movies, and can still be detected in young voices. But now that they’ve spread to all age groups their life-span is surely limited.

But the most compelling example of how globalisation changes the human voice is the call centre. Increasingly today the letter and the body have been replaced by the voice: all call-centre transactions are conducted through the medium of the voice, now the sole link between customer and company or service. The way call-centre operatives have had to transform their voices seems at first just another striking example of how we alter our voices to fit in with other people’s.47 A Scottish study found that staff answering telephone enquiries will say ‘Aye’ instead of ‘Yes’ if they are speaking to a person with a distinctly Scottish accent, or drop their ‘h’s if the caller has a cockney accent.48

But the location of call centres tells another story. More than half the world’s top 500 companies now outsource either IT or business processes to India, where over 170,000 people work in call centres. They must first complete a spell of ‘accent training’ or so-called ‘accent neutralisation’ (although there’s nothing neutral, of course, about the accents that the workers are trained to use). Students repeat ‘Peter Piper picked a peck of pickled pepper’ while holding a sheet of paper ten inches from their face: they have to pronounce the ‘p’s so that the paper blows away from it.49 They also learn to say ‘villain’ rather than ‘willain’, utter ‘extraordinary’ as one word,50 and repeat ‘can’t’ with a long ‘a’ after watching Rex Harrison in My Fair Lady.51

Excellence in Indian call centres is now almost synonymous with anglicising the voice. A British chief executive, visiting India to check out potential call centres, said admiringly, ‘In two operations the agents had virtually no Indian accent.’52 In a nice irony, the Indian call-centre workers are expected to use an accent, with its perfectly pronounced ‘t’s and ‘p’s, that many of their non-Received Pronunciation (RP) British counterparts can’t achieve. Decades after ‘How now brown cow’ deformed the elocution of generations of Brits, Indians are still required to practise it.53

There are other vocal demands on call-centre workers. The need to ‘build rapport’, to sound helpful and confident talking to people with strange accents and bizarre figures of speech, all while under the pressure of the clock, puts immense strain on the voice. The tensions in international capitalism are played out in the voices of call-centre workers – no wonder so many of them lose theirs.54 As one 27-year-old engineering graduate put it, ‘You don’t realise how much working in a foreign language takes out of you. You try not to miss a single word of what people are saying. They expect you to be familiar with their culture, but they don’t care a damn about yours.’55

Now staff are quitting after racist abuse from British callers, who’ve detected from their accents that they’re not English. They tell of callers barking, ‘Get on with it. I don’t understand a word you’re saying.’56 According to the training vice-president of one Mumbai-based company, ‘Usually they won’t use abusive language but you can tell from the tone of their voice that they’re angry.’ Another former call-centre worker admitted, ‘I found it difficult to work for British clients. They wouldn’t call you names but you could hear the hostility in their voices.’57 Sometimes Indian employees hit the mute button and scream in Hindi, before returning to a polite, ingratiating persona.58

A vocal power struggle is taking place. For most workers, says a head of training, ‘It is very challenging to unlearn their natural manner of speech.’59 Of course the concept of ‘neutral accent’ or ‘English correction’ is a shifty one. Those who use it know that the idea of coercing workers to adopt new patterns of pronunciation is controversial, so they rephrase it in more acceptable terms. ‘We’re not trying to create accents, we’re just trying to influence the way he or she speaks,’60 says one.

According to another, ‘I don’t know that we’re doing elocution as such … not necessarily altering vowel sounds, but perhaps working on pitch and delivery.’61 Says a third, ‘A lot of times our trainees say, “Why do I need to sound British? … I’m an Indian and this is my accent and I want to sound like this … I don’t want to lose my identity.” What we tell them is … we want to eradicate any chance of the customer not being comfortable, so we need to enhance your accent just to be a great customer-service agent.’62 In reality, most Indian call-centre training colleges remove anyone not using an acceptable accent after training.63

The experience of call-centre workers shows how misleading is the term ‘globalisation’. Although it evokes some McLuhanesque global village, an egalitarian nirvana where all have equal access to the world’s bounty, in reality there’s a strongly colonial aspect to call centres, with the same old countries in the roles of colonisers and colonised. On the other hand it’s ironic that the British, having destroyed India’s own industries and obliged Indians to speak their language, are now losing some of their own call-centre jobs to India.64

With the spread of call centres, the growing interest in cross-cultural vocal misunderstanding, and the way that class and dominance are expressed and detected paralinguistically, the idea that the voice has diminished in importance in modern industrial societies has never seemed more unfounded.