In 1907 a Florida woman identified the man who had raped her, even though she hadn’t seen him during the attack or met him before. He had, though, spoken two short sentences, on the basis of which she recognised his voice. The trial judge accepted her testimony, arguing that such a terrifying assault had made her extremely alert. ‘Who can deny that under these circumstances that voice so indelibly and vividly photographed itself upon the sensitive plate of her memory as that she could forever afterwards promptly and unerringly recognize it on hearing its tones again.’1
The case posed a question that has been asked many times since: how long and how accurately can we retain the impression of a voice in our minds? For many American courts the judge’s ruling in this case constituted the legal precedent for the admissibility of earwitness identification.2
A century later the belief that the individual’s voice is as distinctive as their fingerprint has become so unshakable that voice verification has been welcomed by both commerce and government, offering the promise of security in transactions and surveillance. And with the spread of the mobile phone and the development of an extraordinary array of new techniques for storing and remodelling the voice, the process of separating voice and speaker that had begun with Alexander Graham Bell seemed to have come full circle.
It was Alexander Melville Bell, father of Alexander Graham Bell and an Edinburgh professor of elocution and speech, who began the process that led to voiceprinting, the identification of people by their voice, by trying to represent every single sound made by the human voice in symbols.3 But voice identification only really ambushed the public imagination during the infamous Lindbergh trial. In 1932 the 20-month-old son of Charles Lindbergh, the first person to fly solo across the Atlantic, was kidnapped and later found dead. A German immigrant, Bruno Hauptmann, was arrested: his 1935 court case was the O.J. Simpson trial of the day. Lindbergh testified that he recognised Hauptmann’s voice as that of the kidnapper, even though he’d only heard Hauptmann say four words, ‘Hey, doc – over here,’ in a cemetery, addressed to the person collecting the ransom while he himself was sitting in a car with the windows closed over half a block away. Lindbergh’s testimony was also delivered more than two years and eight months after the evening on which the words were spoken, although in the intervening period he’d insisted to police that he wouldn’t be able to remember the voice. Can hearing be this keen, and memory this reliable? Lindbergh’s testimony caused a sensation and Hauptmann was executed in 1936.
When a psychologist, Frances McGehee, tried to replicate Lindbergh’s identification in a pair of experiments based on similar circumstances she found that, although listeners were able to distinguish a particular voice with some accuracy soon after hearing it, this fell off gradually but steadily over time,4 a conclusion backed up by more recent experiments.5
Examples of misidentification were recorded even in biblical times (Isaac misidentified Jacob’s voice). The whole process is bedevilled by the fact that some people are more aurally sensitive (and have plain better hearing) than others, while some speakers have more distinctive voices than others. It’s also far easier to identify the voice of someone you already know than that of a stranger.6 Earwitness parades, in which a suspect’s voice is embedded in a recording of up to ten other people, are used today in criminal investigations in the same way as eyewitness ones, even though they aren’t comparable, partly because auditory memory is processed differently from visual memory, and voices, unlike faces, have to be paraded sequentially.7 Also, fear and anger (not to mention dentures and asthma) have a much more distorting effect on the voice than on the face. The accuracy rate of voice parades can be as low as 30 per cent.8 Indeed the more confident you are about your ability to identify a voice, the less accurately you’re able to do so.9 Hearing the voice of a perpetrator can also have a profoundly traumatising effect on the person making the identification, especially if they’re the victim of a violent crime or rape.10
Perhaps technology might do better. The invention of the sound spectrograph in the early 1940s produced a visual record of voice patterns based on sound waves. During the Second World War acoustic scientists thought spectrographs might be able to identify enemy radio voices and, after an assassination attempt on Hitler, American phoneticians were asked to compare two broadcasts of Hitler to verify that the speaker in the second wasn’t a double. After analysing the recordings and matching the voices they concluded that Hitler was still alive.11
But it wasn’t until the 1960s that the technology was refined and the word ‘voiceprint’ coined. As social unrest grew and crime increased, voice identification seemed an ideal method of nailing terrorists, anonymous tipsters, and assorted ransom-seekers.12 It also raised the issue around which the Soviet writer Alexander Solzhenitsyn built his novel The First Circle, set in a Stalinist penal institution where political prisoners are engaged in top-secret research ‘to find a way of identifying voices on the telephone and to discover what it is that makes every human voice unique’.13
Could such a perpetually elusive aim ever be accomplished? Voice identification is a romantic idea – that, in an age of homogenisation, we’re each the possessor of an inimitable object, our voice – and a dramatic one: it seems to accord with some theatrical fantasy of the unmasking of the criminal – betrayed by their voice. And voiceprinting has had its successes. In 1974, for instance, it was used to decide if someone purporting to be the eccentric reclusive tycoon Howard Hughes was Hughes and not an imposter.14 Yet the very word ‘voiceprint’, suggesting an analogy with fingerprints or footprints, is misleading. While fingerprints are an infallible method of identification, voiceprints in the end always come down to opinion, even if expert. However consistent the machines may be, there’s inevitably an interpretive element to voice identification, because it involves a visual task (comparing two sets of spectrograms), and an aural one (comparing breathing patterns, inflections, accents, as well as idiosyncratic speech habits in two different recordings) which, to some extent at least, must be subjective.15
What’s more, most people’s pitch alone varies hugely even in the space of a single day. Might Fred, sober and scared, sound more like Jim than Fred, after three triple whiskies and with an attractive person pouring him a fourth? If we add in factors like attempted voice disguise and distorting background noise, can we be sure that every one of the six billion people in the world will invariably produce speech more like their own than anyone else’s,16 especially since no one person says the same word twice in the same way, even in the same sentence? Voiceprints are actually more comparable in their accuracy to lie detection than fingerprinting.17 One of the scientific critics of the procedure has been less equivocal. ‘I believe voiceprinting to be a fraud being perpetuated upon the American public and the Courts of the United States.’18
In fact, despite a series of precedent-making cases, American courts have not wanted to move at a faster pace than the scientific community in admitting forensic speaker-identification as evidence, so in some American states it’s still not admissible, and in others it works on a case-by-case basis. The UK law courts seem a little less tormented over the subject: voice evidence is already being used in English criminal trials, and a series of precedents since Crown against Robb in 1993 has made voice verification admissible as evidence if backed by an expert.19 Ultimately it’s down to British juries as to whether they accept evidence based on speaker recognition or not.20 Voice verification returned to the headlines after 9/11, with the CIA and the US National Security Agency poring over each new audio tape purporting to come from Osama bin Laden, in order to compare it with confirmed recordings of him.
Hardly has the dark-chocolate voice finished uttering the words, ‘the name’s Bond. James Bond,’ than the high-tech doors open – they’ve recognised his voice. But what was once the apotheosis of sci-fi is today, increasingly, reality. Speaker recognition is an example of biometrics – establishing identity on the basis of unique physiological and behavioural characteristics.21
‘Your voiceprint is your key.’22 ‘Unlike tokens, cards, PINS and passwords, your voice cannot be shared, stolen, or forgotten.’23 ‘The future is hear.’24 Slogans of the brave new vocal world. All sorts of uses are envisaged for this emerging technology. We’ll be able to withdraw money from cash tills simply by speaking into them.25 Even safes and bank boxes will be regulated by the voice. Travel, health services, immigration and border control – all might employ it. At their most euphoric, the biometrics companies envisage a world without the need to carry money, keys, credit cards or even identity cards. So ubiquitous will the technology become that the voice will allow us secure access to buildings, and joggers will be able to run unencumbered, paying for a bottle of water en route purely with their voice.26
The growth in telephone banking and call centres has created the need for some sort of remote verification and using the voice certainly reduces the demands on customers’ memory. To its advocates voice authentication is superior to the other kinds of biometrics, because all you need, no matter where you are in the world, is access to a phone27 – and, since most Westerners are familiar with the telephone, are now comfortable using it for confidential communication, and own one, the infrastructure is already in place.28
Voice verification, according to the prevailing propaganda, is the biometric of choice for most people because they regard it as the least intrusive. Chase Manhattan Bank, the first American domestic bank to introduce it, found that 95 per cent of its customers would accept it, compared with an 80 per cent acceptance of fingerprinting.29
Voice verification, say its advocates, also prevents fraud, something about which consumers express considerable anxiety. In reality much of the impetus for voice verification comes not from its capacity to safeguard the funds of individuals but from its potential to save those of commercial institutions. Statistics suggest that help desks spend between 40 and 60 per cent of their time on password problems – resetting forgotten, lost, or shared ones.30 What costs on average $1.75 a minute when carried out by a live operator can be done by voice recognition for 10–20 cents per minute.31 And the system is low maintenance: it doesn’t require the installation of thousands of scanners, yet it can continuously update and refine its databank each time an individual calls in.
It’s also supposedly a way of protecting against identity theft, something the voice-verification lobby is obsessed with, although precise figures of its incidence are hard to come by. Those very same bodies that fan fears of identity theft are, by coincidence, on hand to provide a technological solution to it.
Similarly, much of the research about the acceptability of voice verification has been conducted either by the firms developing the technology or by consulting firms employed by them, so it’s hard to get a clear sense of just how readily the public will embrace it. Two-thirds of those questioned in one survey hadn’t heard of voice authentication before and yet, in answer to the very next question, over 80 per cent felt that it could help prevent fraud or identity theft. One can’t help but wonder what sort of dialogue occurred between the first and second questions.32
Above all, the procedure depends on participant compliance: in other words, speakers must be willing to cooperate with the system, and must want to be recognised. Perhaps this is why so much effort is being expended on selling it to consumers – since they have the capacity to sabotage or reject the system, it needs to be depicted as something in their interests, whose accuracy is beyond question, even though this clearly isn’t the case.
For the idea of the voice as a kind of audio DNA is misleading. Short utterances, strong background noise, poor quality recording equipment, a speaker who’s modified or disguised their voice – all of these, as we’ve seen, are obstacles to an accurate system. With identical twins there’s a false-error rate as high as 50 per cent.33 The technology simply isn’t secure, and academic caution is at odds with commercial euphoria.34 At best speaker verification can supplement other systems, helping to increase their accuracy, but it isn’t a replacement for them yet.35
Public resistance to procedures like voice verification has diminished because of the remarkably rapid spread of another voice technology – the mobile phone. This has brought about major changes in voice use, significantly changed the relationship between parents and their children, and removed the intimate connection between voice and place that the telephone established.
Bell’s telephone may have distanced the body from the voice, but it still allowed callers to locate other people’s voices with precision: just as in face-to-face contact, you knew exactly where to find and place the voice of the person you called. The phone today follows the voice; the voice doesn’t have to go to the phone. It’s now almost impossible to pin down an adolescent geographically, because their phone has become as mobile as they are. Children’s whispered, midnight, under-the-duvet calls also can elude parental scrutiny. Parents don’t hear the voices of their teenagers’ friends so much either, because the youngsters now communicate mobile-to-mobile, without being mediated by adults answering the phone.36 Mobiles create the possibility, even the need, for teenagers always to be available to friends – although not to parents. With an increasing number of 8-10-year-olds and now a quarter of British under-8s owning one,37 it seems inevitable that communal voice time in the family will diminish.
From young people’s point of view, mobile phones have become a way to demarcate the boundary between themselves and their parents,38 and overcome the spatial boundaries of the home. The home landline is shared but the mobile is private, an expression of individual personality, customised in appearance and ring tone. (McLuhan called the new electronic media ‘extensions of man’.39 He couldn’t have anticipated how true this would be of the mobile, and how much closer it would bring the phone to the body.) The mobile liberates the teenager and pre-teen: texting transcends limits on vocal communication in public places like school, and allows adolescents to target precisely the person to whom they want to speak – no longer do they have to make polite small talk to friends’ parents.
A new etiquette of the phone has developed. It’s perfectly acceptable for people to answer their mobile even if it means putting their face-to-face conversation on hold.40 Mobiles have brought about a conflict of registers, so that people now speak to their caller in a style inappropriate for their overhearer. By having access to only one side of a phone conversation and not the other, bystanders find themselves in limbo, ‘neither fully admitted nor completely excluded’.41 Thanks to the mobile, there’s also a new exhibitionism in British life: callers speak as if the mobile were conferring privacy, even when they’re surrounded by strangers. (Will future generations of mobile users have got so used to its mobility that they’ll no longer feel obliged to announce where they’re calling from?) The mobile has penetrated remote beaches,42 and turned even the Finns – famous both for their dislike of small talk and virtuosity in mobile-phone production – into chatterboxes, of sorts. The very inescapability of mobiles is leading to curbs and controls on their use. In Japan most trains, buses, and many restaurants display ‘No mobile phone’ signs43 and other countries are following suit.
Accounts of mobile phones saving lives – of people rescued at sea, or on a mountain, after calls to their family – have become common, while the plangent sound of an unanswered ringing phone after a train crash has also become part of our contemporary soundtrack, an emblem of finality. One of the most distinctive and poignant features of 9/11 was the reporting, in some cases relaying, of last, loving calls made to relatives by people on the doomed planes or in the twin towers. ‘This is death being faced in real time,’ one writer observed, ‘this space between life and death … has been wired up and switched on, electronically illuminated.’44 Another saw such experiences as ‘harbinger of a time to come when no one will die alone, and will make their dying peace with partner, child, parent, friend, even answering-machine or operator,’45 curiously one of the functions that Edison imagined for the phonograph. When the voice of Herbert Morrison, eyewitness to the crash of the Hindenburg, cracked with emotion, it was unprecedented; now the voices of victims themselves can be heard, and by the time of the London bombings in July 2005 camera phones had become so ubiquitous that pictures too were added to the grim first-person electronic testimony.
The telephone voice isn’t the same as the face-to-face voice because the telephone doesn’t cover the full range of frequencies of the human voice, favouring certain voices over others (German men, for example, transmit better than Japanese women).46 On the other hand, mobiles are also reaching remote parts of the developing world that landlines haven’t begun to touch. Only 3 per cent of Africans have mobiles, but they represent 53 per cent of all phone subscribers on the continent.47 The voice will travel everywhere.
The mobile itself, they tell us, might soon be always-connected, a teat for our times. Already the baby monitor allows us to tune in to our infants continuously. Today aural surveillance seems uncontroversial, and the idea that it’s legitimate for governments to use the voice to locate and identify their citizens has attracted almost no public criticism. British and American non-custodial government surveillance programmes for serious young offenders now include voice tagging, which requires the young person to call in from a landline at specified times every day. Within a few seconds, the computer can check the voiceprint, phone location, and time.48
Voice tagging is a fabulous money-saver. It costs $9,000 to keep a young person in detention for three months, but to track them for the same period, American researchers have found, costs only $300.49 Judges like the system because not only does it mean that a young offender doesn’t have to be housebound – they can go to school or wherever else the terms of their probation allow – but also that a large element of unpredictability can be built into the system, making it almost impossible to cheat it.50 At the very least such schemes must operate as a kind of aversion therapy, turning the phone into a potential instrument of persecution. Put like this, parents of teenagers might want to sign theirs up for it immediately.
Yet vocal tagging also has a curious historical resonance. In the nineteenth century the philosopher Jeremy Bentham imagined prisoners being confined in brightly lit cells in a Panopticon, a circular cage from where they would be permanently visible to a supervisor in a high central tower. Thus would the inmate become the instrument of their own surveillance. The Panopticon dispensed with the need for physical confrontation or the corporal exercise of power ‘by its preventative character, its continuous functioning and its automatic mechanisms’.51 In an almost exact parallel, the 24/7, 365-day dimension of vocal tagging is what makes it attractive to the police and probation service, who praise the way that it automates the supervision of offenders. Has voice verification given birth to a modern Panauditorium?52
A British home secretary proposed another phone solution to youth antisocial behaviour – confiscating mobiles for a few months.53 So the voice has become a means of social control not only through its presence but also its absence.
The Walkman was the first portable device to allow a constant aural presence, one now vastly extended by the iPod and mobile. A remarkable number of other machines, procedures, and software can or soon will be either operated by voice, responsive to it, or possess it. These include a programme that allows us to dub our own voices on to DVDs of animated movies like Shrek;54 Kismet, a robot being developed by MIT’s artificial intelligence lab, that not only shows through its voice if it’s sad, angry, happy, or calm,55 but can also tell from its instructor’s voice whether it’s being praised, scolded, or comforted;56 and a voice-box transplant, transferring the larynx of a dead person to a living one.57 This in addition to all the telephone helplines, answering-machines, blogs, karaoke machines, text-to-speech systems58 already in existence. If the producers of the new technologies are correct, we’re going to be using our voices more and more – accessing voice-enabled services on voice portals through our mobiles, for instance. As the chief executive of a company developing speech-synthesis solutions put it chillingly, ‘Voice as a brand is becoming more and more important.’59
The ability to chop up and reassemble the digital voice is an irrevocable and intriguing one. On the one hand it opens up all sorts of new vocal possibilities, demonstrated in the sonic art now emerging.60 The liveliest of today’s audio artists, far from complaining about the disembodiment of the voice, revel in it and play with it instead, using it to make something new. Similarly the spread of rap, sampling, and ambient sound has produced an exceptionally audio-aware generation, one open to vocal experiment, even if the voices they hear aren’t necessarily voice as we’ve known it. On the other hand, some of the new voice technologies are touching off serious disquiet.
Over the past ten years warnings about the diminishing role of the voice caused by technology have come from many different quarters. In 2001, for example, it was revealed that 45 per cent of telephone calls have been replaced by email and we now email more than we talk. To encourage staff to talk to each other some large British firms began introducing ‘email-free Fridays’.61 Email, shouted another anxious headline, ‘could replace talks with teacher’62 (as if the culprit were technology rather than teachers’ workloads). This was followed by a poll claiming that text messaging was harming students’ speaking skills, with three out of 100 company directors believing that email is detrimental to spoken communication.63 The United States has even passed a Human Voice Contact Act, stipulating that state agencies that use automated telephone-answering equipment must also provide callers with the option of a live operator.
There have also been recurring alarms about young children’s poor communicative skills: the British Chief Inspector of Schools warned that many children were unable to speak properly when they started school because parents had left them in front of the television instead of engaging with them through talking and play,64 while at the other end, large numbers were leaving school unable to express themselves adequately or follow what others say.65 According to the Director of the Basic Skills Agency, routine communication in families with exhausted parents and too few family meals now amounts to little more than a ‘daily grunt’ from monosyllabic schoolchildren. (While most of the newspaper columnists pitched in with laments for the lost art of conversation, one demurred, arguing that grunts had a grammar of their own, their meaning varying with length and tone. ‘Gaad’ on its own means ‘There is a depressing but predictable item in the newspaper that I’m reading’, but by extending it and lowering the note, the speaker is saying ‘You’re not really going to watch this television programme, are you?’66)
So concerned was the British government about children’s communicative skills that in 2003 it instigated a national drive to improve ‘oracy’, setting term-by-term objectives for speaking and listening.67 It also piloted children taking exams by mobile phone ‘because this way they’ll use their voices’.68
While this renewed focus on speaking is welcome, it often rides on a crude technophobia. The De-Voicing of Society: Why We Don’t Talk to Each Other Anymore, a recent book by an American neurolinguist, argued that interpersonal conversation was dramatically declining in Western societies, causing a waning of intimacy.69 To the human, the book claimed, talk was the equivalent of grooming in apes and monkeys, and a critical factor in the creation of intimacy. The author indicted the usual suspects – email, the Internet, home shopping, television, telecommuting – claiming that anomie was their inevitable consequence. Yet mobile phones and answering-machines also stood accused – and neither of these, you might think, were impediments to talk so much as facilitators. On closer observation, the book appeared to be lamenting not so much the loss of talk but of face-to-face talk, and in so doing it appealed to a disappearing (sociobiological) golden age where everyone was busily involved in sustaining conversations. Others claim that technology has amplified the voices of the powerful.70
This vanished nirvana isn’t convincing. Pre-technological societies were also pretty stratified, with some voices counting for more than others (there were kings and subjects, masters and slaves, even before the birth of the loudspeaker or email). What’s more, our knowledge of how voices worked in the pre-technological era only comes from written accounts or transcribed oratory. It seems unlikely that medieval families, say, conducted their everyday business entirely through the rhetorical curlicues preserved in public documents: private life then probably had daily grunts of its own. And far from talking less than we did, we may well be using our voices more. With the arrival and growth of call centres and other service industries where the voice is pivotal, a growing proportion of people now speak for a living. What’s more, public speaking, once derided as an outmoded and declining activity in an increasingly informal world, has become a necessary skill in a growing number of spheres. Over thirty years ago a researcher found that almost 50 per cent of the blue-collar people she interviewed had given at least one speech to at least ten people in the past two years, and conjectured that middle-class people had probably given more.71 Those figures must surely have grown exponentially in the intervening years while, as we’ve seen, voice-management has become an essential component in politicians’ arsenal of spin. Does this sound like a devoiced society?
In addition, new technologies allow us to hear things that in previous generations we would only have read, such as recordings of calls to emergency services. Whether this enlarges human experience, or is just another stage in the banalisation and desensitising of the culture is open to debate, but it certainly doesn’t strengthen the devoicing argument.
Of course none of this is speaking in the face-to-face sense that devoicers have in mind, but my interviews suggest that even people who aren’t in obviously voice-related jobs, or who lack rhetorical prowess, rely on vocal skills to a remarkable extent, using them to make vital social and personal connections. In a therapeutic culture that lionises talk and pays lip service, at least, to qualities like emotional intelligence72 that are expressed to a great extent vocally, communicating through the voice is growing rather than diminishing in status.
Indeed, as I’ve suggested again and again, technological change has made the voice more rather than less important. The irresistible spread of answering-machines, mobile phones, help lines, phone-ins, CD-ROMs, talk shows, and voice-recognition techniques, along with the proliferation of radio stations, is making ours much more of a voice-centred society. Mobile phones, reality TV shows, video diaries, weblogs and webcams have also eroded the boundary between public and private, meaning that we now hear people speak ‘privately’ in public. (Perhaps the devoicers are looking for vocal intimacy in the wrong place: it’s now moved into the public domain.) The critics lament the eruptions of capitals, exclamation marks, and emoticons in email and text messages as a poor expressive substitute for tone of voice. In fact the growth of email and text messaging (with all their stylistic conventions), far from eclipsing the voice, has drawn attention to its irreplaceability. New technologies invariably provoke anxiety: sometimes this turns out to have been warranted, but it usually takes more than a couple of decades to judge what has really been lost, and what has simply morphed into a new shape or been replaced by something similar.
One can’t help thinking, too, that some of these pessimists have recruited the voice for their own purposes, as a form of technology-bashing. They seem less interested in the voice per se, and more in using it as an antidote to modernity, a means of somehow preventing the future from getting out of hand. Imagining that the voice was free but is now being tamed or enslaved is romantic fantasy – especially since for at least 150 years, as we’ve seen, the voice has been in a process of almost perpetual transformation.
On the other hand not all the anxieties about the effects of technology on the voice can be dismissed. For instance, although speaker authentication is being sold as a guarantor of personal privacy, it might actually invade it. How do we feel about our voices being appropriated by commercial companies and state institutions? The voice-authentication lobby emphasises its potential contribution to workplace safety, especially in the post-9/11 world, yet there are also social, personal, and ethical implications. Will it, for instance, change our relationship with our voice?
Voices are increasingly being used to sell; in the commercial applications being developed they’re also requisitioned to buy. They’ve come to be seen as a potential money- and labour-saver: just as supermarkets have displaced some of the labour of selling from the retailer on to the customer so, by replacing call agents with voice-recognition systems, administrative labour is increasingly being transferred from producers’ to consumers’ voices. Promoters of the new technology say that the voice is so valuable because it can’t be stolen, but aren’t they, in some small way, doing precisely that? Or is rueing the fact that our chief tool of communication is now being purloined by the corporate world just anti-modern nostalgia?
Certainly, before the nineteenth century, the voice wasn’t seen as a marker of individuality or of an internal disposition, but rather the outward sign of social traits, resulting from the speaker’s cultural position and role.73 Speaker recognition, by contrast, focuses attention on what’s distinctive and not shared in the voice, treating the voice as a sign of the self that can be exploited for bureaucratic or corporate needs. Although it may become a boon to the visually impaired and people with other kinds of disabilities, a huge international database of voices is horribly open to abuse. Big Brother no longer needs to watch when he can listen so effectively – truly His Master’s Voice.
At the moment the users of voice-recognition software aren’t eavesdropping on our conversations, only getting us to say our names along with other, more banal information. The human voice, in any case, has always served multiple functions – the means by which we buy potatoes as well as declare our love. And yet we’re undoubtedly living through a major paradigm shift, one in which the voice has travelled even further from the body.
So should we be concerned about the new voice machines? And who, now, does the voice belong to? Because it’s not a possession or accessory but an attribute, the voice can’t belong to anyone, and yet the question isn’t as far-fetched as it might sound.
Although the idea of body and voice existing in perfect harmony in some mythical past is untenable, it’s true that for centuries the human voice was (at least to some extent) a guarantor of the body – you couldn’t have one without the other. Today voice and body have been prised apart, and the voice is in the throes of becoming just another component – a digitalised bit to be reconstituted and remixed, in the same way as music and song are now broken down into separate segments, sampled and reassembled into a new whole.
Who does the voice belong to? Ask Nancy Cartwright, the voice of Bart Simpson in the TV cartoon series, who is regularly requested by interviewers to ‘do Bart’. She always refuses because ‘the prohibition has to do with legalities, copyright infringement. It is a funny thing but although I own my own voice, I don’t own Bart’s voice. It would stand to reason then that I couldn’t just go out and in Bart’s voice say, “Even though you think I am Bart Simpson, I am actually Nancy Cartwright.” These legalities are common in television/film production. The actors don’t own the characters they are doing. They are owned by the creator and/or the studio/production company.’74
Dan Castellaneta, the voice of Homer Simpson, faced the same problem when he performed a segment on an American alternative comedian’s album in Homer’s voice, only for the album’s release to be blocked by Fox Television’s lawyers, who insisted that Homer’s voice was part of their intellectual property.75
Who does the voice belong to? Ask Nick Campbell, researcher into speech processing. ‘At the moment, the law seems to be undecided about the use of very small samples of a person’s voice – it seems that they probably don’t have ownership of the “sounds” of their speech … currently the sound of a voice is legally similar to the colour of a painting or the words of a book … We can’t copyright colour or words, only the shapes or ideas that are made up by sequences of them.’76 Campbell is developing a speech-synthesising system in Japan that could, potentially, enable machines synthetically to produce convincing versions of the voices of real-life people like the American president saying something incriminating. It could bring back old, beloved, long-dead movie stars and get them speaking new scripts (putting live actors out of work in the process).77
The new technology introduces the possibility of a new kind of crime – voice theft. If, for example, a synthesised version of Nick Campbell ordered hundreds of pizzas to be delivered to all his friends from the neighbourhood pizza parlour that had recognised and trusted his voice, who would be legally bound to pay for them?
The synthesised-speech programmes being developed today will be used by call centres, text-to-speech systems, and help desks. Eventually they may be indistinguishable from the voice of a living, breathing person, so various security measures to prevent abuse are being entertained. There’s even talk of embedding in the digitalised version some almost imperceptible feature (like a watermark on paper) that could be picked up by a special detector but not the human ear, in order to allow an artificially generated voice to be differentiated from a genuine one.78 Clearly, hearing is no longer believing.
Why are these technological developments so disturbing? One answer might lie in Freud’s concept of ‘the uncanny’. Certain situations, Freud suggested, cause dread and fear because they make us doubt whether an apparently animate being is really alive or, conversely, whether a lifeless object might not, in fact, be animate.79 Ghosts and spirits fall into this category (so do waxworks or ventriloquists’ dummies80).
The dismembered, reconstituted voice is just such an example. Is the voice of some long-deceased star, newly digitalised to sell the latest model limo, a dead or a living thing? Is it still the speaker’s own, or has it been so digitally remade that it’s now a voice-in-the-machine? Anxieties about synthesised voices express fears about the impact of technology on our very idea of the human, and the way we define ‘alive’.
But Freud also believed that uncanny fears express primitive beliefs, often originating in fantasies in the womb, which we think we’ve surmounted but which reappear with force. The processed, synthesised voice, by this reckoning, may re-stimulate infantile anxieties about the loss, through the act of being born, of the mother’s embodied voice. As we’ve seen, the foetus hears but, even more importantly, also feels the maternal voice: in entering the world, the baby loses the feel of it and now only hears it. Technology continues this process, putting the voice at a further remove from the body – no wonder it excites so many primitive fears.
Freud’s concept of ‘the uncanny’ also helps us understand why the anxieties about the voice kindled by these latest synthesised techniques are almost identical to those evoked by Bell’s telephone and Edison’s phonograph a century before. New technologies, but old fears. In fact the human voice has proved remarkably resilient and, as we’ll see, astonishingly versatile.