5 Patterned Sound: Inscriptions and the Trained Ear in Birdsong Analysis

A Novel Technique

“The invention of a novel analytical technique often helps to launch a new science. What microscopes were for the emergence of cell biology as a discipline, or the cathode-ray oscilloscope for neurophysiology, it was the sound spectrograph that, immediately after the Second World War, enabled the birth of the science of birdsong. … Until about 1950, everyone interested in birdsong had no choice but to work by ear. Only when the sound spectrograph became available was it possible, for the first time, to grapple objectively with the daunting variability of birdsong” (Marler 2004, 2). Thus biologist Peter Marler sketched in a textbook on birdsong biology what he regarded as a formative shift in the discipline.1 Originating in American telecommunications engineering, the sound spectrograph represented a sound spectrum visually, by plotting frequency as a function of time. It was this application that, according to Marler, “elevated studies of song dialects … from the birdwatcher level to the status of scientific research” (Marler 2004, 11). Marler was well placed to comment: together with his superior, William H. Thorpe, at the Cambridge University Department of Zoology, he became a leading figure in the application of a sound visualization technique that would prompt an unprecedented change in the substance and organization of the study of birdsong (figure 5.1).

Figure 5.1 William H. Thorpe recording a dove at the Madingley Ornithological Field Station near Cambridge.

Source: J. Hall-Craggs, “Obituary: William Homan Thorpe,” Ibis 129 (s2) (1987): 568. Reproduced with kind permission of Les Barden.

Among students of bird vocalizations, the sound spectrograph was heralded as the solution to a problem that had preoccupied them since the turn of the century: how to read and represent birdsong objectively and accurately, but also intelligibly. As a device that permitted an analysis of sound in a way that was roughly analogous to human hearing, it became deeply associated with the concept of “visible speech” and the idea that the eye could be trained to read meaningful patterns of sound vocalization in much the same way as the ear could perceive them. For a brief moment, its developers at the Bell Laboratories considered the instrument particularly suited to support oral education and visual telephony for the deaf (Mills 2010). But the prospect of a universal language for sounds, based directly on its physical properties, also found a much broader appeal among students of birdsong. Ever since Albert Brand (1937b) had critiqued the human ear as a subjective and unreliable tool of investigation, students of birdsong had sought to compensate for their own defective hearing with various techniques of sound visualization—albeit with varying success. Yet here was an instrument that promised to evacuate detailed acoustic analysis of birdsong from the domain of the audible altogether at no loss, an instrument that did not merely visualize sound but, so its developers initially projected, would actually provide an intuitively legible orthography of sound. The sound spectrograph’s promise was to fulfill, as media arts theorist Douglas Kahn (2002, 180) puts it, a long-standing desire to make sound “tangible and textual by making the invisible visible and holding the time of sound still.” However appealing, the concept of visible speech proved difficult to achieve in practice, and communication engineers soon moved on to other applications for the sound spectrograph. 
But by that time, a new generation of animal behaviorists had adopted the device, along with its developers’ initial hopes and assumptions about the transcription and transmission of sound.

It is tempting to read Marler’s analogy between the sound spectrograph and visual technologies such as the microscope or oscilloscope as signifying a landslide turn from analysis by ear to an exclusive reliance on visual inscriptions. Indeed, at first glance, these inscriptions may seem to discard the human “expert ear” in favor of a mechanical ear in the recording instrument. In practice, however, the concept of an objective yet intelligible visual language for sound could not be fully sustained. Understanding the ubiquity of sound spectrography in birdsong biology thus requires careful consideration of its embedding in an analytic and representational practice that, although ostensibly visual, was never entirely mute itself. Nor did scientists themselves always want it to be or pretend that it was. In this chapter, I attend to the minutiae of ways of analyzing and representing, listening to, and looking at sound that biologists at the Cambridge University zoology department practiced in their spectrographic work. Tracing the sometimes unsuccessful and controversial ways Cambridge University students of birdsong positioned spectrographic visualization in relation to an embodied experience of sound, this chapter revisits the notion of “inscription” to ask how and why certain visualizations acquired authority in the science of birdsong (Latour and Woolgar 1986).

Visible Speech

The Bell Telephone Laboratories began developing the sound spectrograph in 1941. They initially conceived of it as one of their telephone products, but the project was soon classified because of its potential military relevance. The instrument automatically broke down complex sounds into individual components and represented them in a visual order. This principle found useful applications in wartime cryptanalysis and naval intelligence. It served to expose scrambling in telephone communications and to decode and reconstruct speech fragments, and may also have been used to distinguish friendly and enemy naval craft by their engines’ signature sounds (Fehr 2000, 41; Radick 2007, 455). Even before the end of World War II, however, Bell Labs engineer Ralph K. Potter and his colleagues resumed their search for civilian applications, setting up a training program for the deaf to learn to read spectrographically rendered speech (Potter, Kopp, and Green 1947).

The idea of converting speech sounds into a readable script was not new. The phonautograph that Léon Scott famously developed to elevate stenography to an automatic, universal written language was variously appropriated in the study of the human voice.2 Scott constructed an analog model of the anatomy of the ear and attached it to a stylus to render the waveforms produced by speech visible, but the resulting curves were irregular and difficult to decipher (Brain 2015, 74). Nevertheless, modified versions of the device found welcome reception, among others in the mid-1880s by physiologist Victor Hensen, who redevised it into a Sprachzeichner or speech depictor, and in 1899 by Yale experimental psychologist Edward Wheeler Scripture, who had set up a program to record speech waves for research on phonetics. Although the curves were about as hard to decipher as “Chinese ideographs,” Scripture’s hope was that they could ultimately be read in such minute detail as to allow conclusions based purely on what could be observed by the eye. Likewise, French linguists joined forces with experimental physiologist Étienne-Jules Marey to study phonetics by recording air pressure and movements in the nasal passages, larynx, and lips (Brain 2015, 72). In the 1920s, Milton Metfessel—a student of psychologist Carl Seashore at the University of Iowa—developed a technique he called “phonophotography,” seeking to record and transcribe singing voices with a detail and accuracy that would allow him to study emotional expression. Each of these developers was motivated by a more or less explicit desire to mute their studies of sound as much as possible. When Potter and his colleagues introduced the sound spectrograph in a 1945 article in Science and in the monograph Visible Speech in 1947, they placed themselves within this lineage—the term visible speech referred to Alexander Melville Bell’s system for phonetic transcription.

But Potter’s team also explicitly distinguished the innovation from existing methods. The team now argued that existing inscriptions were simply “unreadable to the eye” (Potter, Kopp, and Green 1947, 4)—the problem being not so much the resolution of detail, but the perceptual decoding of the inscriptions. At the time Potter was writing, the preferred instrument for displaying sound was the cathode-ray oscilloscope, which presented variations in acoustic energy over time as a waveform. Such waveforms provided crucial information on the physical nature of sound, but they gave very few clues as to the actual auditory experience. Tellingly, for Potter, if the spectrogram was an outstretched rug with its pattern clearly visible, the waveform was comparable to the rug’s threads unraveled and bundled together: the waveform provided abundant information, indeed, but most of it gave very few clues as to the actual perception of sound (Potter, Kopp, and Green 1947, 351).

The sound spectrograph’s pattern was produced by etching recorded sounds onto paper in a series of shaded bands, plotted against a horizontal time axis (see figure 5.2). Vertical spacing indicated frequency, while the darkness of the shading suggested a relative amount of sound energy. According to the sound spectrograph’s developers, it produced a “translation [of sound] similar to that made by the ear. It should spread out the dimensions of speech so that they were visible to the eye as they are audible to the ear” (Potter, Kopp, and Green 1947, 4). Rather than showing the intricacies of an acoustic waveform, the representation depicted its perception (Mills 2010, 38). The spectrograph enabled people to see what they heard, Potter explained, because it was modeled on a schematic cochlea of the inner ear (Potter, Kopp, and Green 1947, 10–11). The inner ear was believed to be made up of sensitive elements that were each tuned to a particular frequency, the sum of reactions by these elements being what produced a physical sensation of tone. Analogously, the sound spectrograph automatically applied a Fourier analysis by unraveling a complex sound into the simpler sound waves that constituted it. The sound spectrograph thus did for sound what an optical spectrograph could do for white light, by distinguishing its constituent frequencies of color. Put very schematically, the device recorded a sound signal from a magnetic tape recorder, and then looped this signal through a filter that was tuned to successive frequency ranges. A stylus then traced the sound energy present in each of these successive frequency bands on a revolving roll of electrically sensitive paper (Koenig, Dunn, and Lacy 1946).
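For readers who find it helpful to see the principle in computational terms, the analysis can be loosely sketched as follows. This is a modern digital illustration and not a model of the analog instrument: the function names, the naive discrete Fourier transform, the 8,000 Hz sampling rate, and the two-tone test signal are all assumptions made for the example. The sketch cuts a signal into short successive frames and measures the energy in each frequency band of each frame, which is roughly the information the stylus rendered as darker or lighter shading.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Return the energy in each of the first n_bands frequency bands of one frame."""
    n = len(frame)
    energies = []
    for k in range(n_bands):
        # Correlate the frame with a complex sinusoid at band k (a naive DFT).
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(total) ** 2)
    return energies


def spectrogram(signal, frame_len, hop):
    """Analyse successive frames; rows follow time, columns follow frequency."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [band_energies(signal[s:s + frame_len], frame_len // 2) for s in starts]


def tone(hz, n_samples, rate):
    """Generate a pure sine tone (hypothetical test input)."""
    return [math.sin(2 * math.pi * hz * t / rate) for t in range(n_samples)]


RATE = 8000  # samples per second, assumed for this example
signal = tone(1000, 256, RATE) + tone(2000, 256, RATE)

rows = spectrogram(signal, frame_len=64, hop=64)
bin_hz = RATE / 64  # each band is 125 Hz wide

# For every time frame, report the centre frequency of the strongest band.
loudest = [max(range(len(row)), key=row.__getitem__) * bin_hz for row in rows]
print(loudest)
```

In this toy case the strongest band steps from 1,000 Hz to 2,000 Hz halfway through the signal, which on paper would appear as a dark horizontal trace that jumps upward.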

Figure 5.2 Original spectrogram of the phrase “This is visible speech.” The horizontal axis expresses time, the vertical axis frequency. Amplitude is suggested by the darkness of the trace.

Source: R. K. Potter, “Visible Patterns of Sound,” Science 102 (2654) (1945): 464. Reproduced with kind permission of AAAS.

At the beginning of its career, the sound spectrograph—conceived as a mechanical analog to the human ear—had seemed like an ideal substitute for the hard of hearing (Mills 2010). But the genealogy of “visible speech” also reflects a more general conviction that it would be both desirable and possible to replace hearing completely by seeing (Mills 2010; Tillmann 1995). As the history of inscription technologies shows, such instruments were thought of not only as sensory prostheses for the deaf, as a physical necessity, but also as aids to help able-hearing linguists and acousticians transform sound into a scientific object, hence as an epistemic aspiration (Brain 1998; Sterne 2003). These scientists aspired to convert sound into a set of readable signs without the loss of crucial information. Once sound could be “read” from the image, listening, as a private and thus subjective experience, would become public and accessible for all to scrutinize.

The developers of the sound spectrograph assumed that this was possible because both aural and visible languages were made up of patterns, and similar expressions would, they believed, have broadly similar patterns. Their preliminary evidence seemed to suggest that the “indexicality” of sound and image would enable a new kind of visible ABC to be assembled, one that, with sufficient training, anyone could learn to read. With phonetician George Kopp and his psychologist assistant Harriet Green, Potter set up a training program to test this assumption, by training five normally hearing women to learn to understand the visual language (Potter, Kopp, and Green 1947). However, the project failed to yield conclusive proof that visual hearing could be efficiently achieved. Meanwhile, researchers in the Haskins Laboratories tackled the problem from the reverse direction. Physicist Franklin Cooper and psychologist Alvin Liberman set out to design a print-to-sound machine to read out for the blind, but hit problems when their respondents could not learn to “read” the sounds (Fehr 2000; Liberman and Cooper 1972). To discover more about the minimum parameters of intelligible speech perception, they experimented with playing back spectrograms on which they themselves had drawn simple patterns. Although these experiments demonstrated that simple acoustic patterns could actually translate into simple visual patterns (and the reverse), just as the spectrograph developers had anticipated, the displays of actual human speech proved too complex and ambiguous for a human listener to read (Cooper, Liberman, and Borst 1951). It soon became clear that the development of a truly visible language for speech was far from completion, but other applications seemed more feasible. If human speech was too complex for spectrographic reading, simpler patterns often displayed perfectly well. 
By 1951, Kay Electric Company, a company founded by a former Bell Labs engineer working on US Navy radar control projects during World War II, was licensed to develop the sound spectrograph commercially, naming it the Sona-Graph.3

Spectrographic Studies of Bird Communication

The Sona-Graph sound spectrograph was quickly adopted by universities worldwide, and became a preeminent instrument in fields such as linguistic analysis, phonetics, speech analysis, and signals intelligence. It was through exposure to these fields, and often its wartime applications, that students of animal vocalizations first learned about the potential of this new instrument. When Potter (1945, 470) introduced the sound spectrograph in Science, he already intimated that biologists might find the instrument useful “to analyze, compare, and classify the songs of birds, and, of even more importance, it will be possible to write about such studies with meaningful sound pictures.” He even put his hypothesis to the test. In 1948, he sent a draft paper to Arthur A. Allen at the Cornell Laboratory of Ornithology, documenting some of the findings he had collected by experimenting in his spare time with the spectrographic analysis of frog sounds—which he had taken from the laboratory’s gramophone publication Voices of the Night.4 Despite this forewarning, the Cornell ornithologists likely got to know the spectrograph firsthand through Charles Hockett, a linguistic anthropologist in Cornell’s Division of Modern Languages who would later pursue studies of animal communication in a bid to define the properties of human language. Likewise, when Cornell zoologist Nicholas Collias published a study on vocalizations of domestic fowl in 1953, he had been provided access to a rewired commercial Kay Sona-Graph by his coauthor, the University of Wisconsin linguist Martin Joos. Joos had worked intensively with the instrument during the war, while employed by the US Signal Corps to conduct cryptanalysis, and used the spectrograph as the basis for what he termed acoustic phonetics (Fehr 2000, 41; Joos 1948). Military service in naval intelligence also likely provided Donald J. Borror, a pioneering bioacoustician, entomologist, and amateur ornithologist, with firsthand experience with the device (Marler and Slabbekoorn 2004, 3). With Carl Reese, a colleague at Ohio State University’s Department of Zoology and Entomology, Borror secured access to the instrument in the astronomy department, where it was used to study the scintillations of stars (“Visible Bird Song,” 1953).

William H. Thorpe, a Quaker and conscientious objector without military experience during the war, may have first learned about the sound spectrograph through an article by a British engineer and amateur ornithologist in Ibis, the journal of the British Ornithologists’ Union (Bailey 1950). The author had learned that one of the only spectrographs in Britain was used at the General Post Office Research Station in London and he obtained their consent to use it for serious research requests. Thorpe (1979, 68) had begun his career in insect physiology, but under the influence of continental ethologists, particularly the work of Konrad Lorenz, shifted his research toward mechanisms of learning and instinct in animals. Together with Niko Tinbergen, Thorpe was responsible for establishing ethology in England after the war, as editor of the newly established ethological journal Behaviour and president of the Association for the Study of Animal Behaviour (established in 1936 by Julian Huxley) (Durant 1986). An avid birdwatcher, he decided that birds would be ideal subjects for studying behavior and campaigned for an ornithological field station, which began operations in 1950 on a plot of land in Madingley, near Cambridge (Burkhardt 2005, 341–342). During its first year, Thorpe and the station’s newly hired curator Robert Hinde initiated several research projects, the largest of which examined the nature of song learning. Thorpe had already turned to the BBC engineering department and the General Post Office for technical advice on sound equipment, and in 1951 he was able to take a collection of birdsong recordings to the Bell Telephone Laboratories for conversion. There he learned that at least one instrument existed at the Admiralty Research Laboratory, for analyzing submarine noises. Thorpe managed to convince the Admiralty to let him use its spectrograph regularly, until in 1953 he used a Rockefeller Foundation grant to purchase his own for the Madingley field station (Burkhardt 2005, 343). In 1951, Thorpe was joined by several new staff, including botany graduate Peter Marler, who was starting a doctorate on chaffinch behavior. Together with Thorpe, Marler learned to use the sound spectrograph on their own recordings and the copies of BBC recordings that Thorpe had been given by its Bird Song Panel.

In the sound spectrograph, Thorpe (1954, 465) identified an alternative to existing methods of birdsong analysis, which, he found, all suffered from “the primary difficulty of perceiving accurately by the naked ear elaborate sound patterns of high frequency, high speed and rapid modulation,” so that “vocalizations were formerly the most difficult of all [behavioral] releasers to investigate precisely.” Thorpe was glad to announce that “they have now become far more readily amenable to analysis than many patterns of visual and olfactory stimulation.” Sonagrams—an abbreviation for the Sona-Graph’s sound spectrograms—supplied both a new form of notation and a method of precise measurement that allowed analysts to avoid the “dangers of subjective interpretation” entailed by earlier notation technologies (Thorpe 1954, 465). Other pioneering spectrographic investigations of bird vocalizations presented similar arguments. Donald Borror and Carl Reese explained that “most accounts were merely subjective descriptions and not accurate analysis” (Borror and Reese 1953, 271). While acknowledging that naturalists like Aretas Saunders possessed an “exceptional ability to analyze bird songs by ear,” they asserted that relevant “characteristics cannot be accurately determined by ear alone.” Even Saunders himself applauded the potential of the sound spectrogram for the study of birdsong. “Writing from the standpoint of what the ear hears,” he notes, one’s observations “may be more or less different from what the Vibralyzer [the Kay Sona-Graph’s forerunner] records” (Saunders 1961, 598). Indeed, like Brand’s sound film two decades earlier, sound spectrograms seemed to expose the perceptive limits of human hearing, particularly when it came to the time resolution of rapid birdsongs. 
As Borror and Reese demonstrated in one of their first spectrographic studies, what had sounded like faint lisps or a single buzzy note to the ear appeared clearly as a series of separate notes in the spectrographic image. The spectrograph, they concluded in a common turn of phrase, will provide “objective data” that are more detailed and accurate than those obtained by most of the methods heretofore used (Borror and Reese 1953, 276).

It is easy to see the attraction of this new instrument. The spectrograph provided precise acoustic information that could be understood without dedicated training. But what exactly did it allow its ornithologist users to do better than they could by ear? An example taken from the first applications of the sound spectrograph by Thorpe and Marler at the Cambridge University Department of Zoology may demonstrate why it was the spectrograph that elevated their studies, in Marler’s words, “to the level of scientific research.” When Marler arrived as a research fellow at Cambridge University, he had already conducted his own large-scale study of chaffinch song, resulting in a classification based on transcriptions made in France, the Azores, the Scottish Highlands, and the English countryside (Marler 1952). Like many field ornithologists, Marler had relied on a self-devised system of transcription to record the chaffinch songs by ear.5 His study set out to debunk what he thought was a common misconception among field ornithologists, namely that birds display broad regional variety, and thus a song that is regionally characteristic. Instead, he argued, chaffinches maintained a wide range of different song types, song “dialects,” that were differently distributed across different regions. He speculated that such variations could be explained by the way individual chaffinches learned their songs (Radick 2007, 247–253).

In a separate study on chaffinch song, the Danish ethologist Holger Poulsen (1951) had recently suggested that it was learned at least in part from adult birds in early adolescence, which would explain convergence between the songs of birds in the same localities. This force of adaptation, and its role in variation, was the subject of Thorpe’s spectrographic experiments on the learning abilities of birds (Thorpe 1951). In the seclusion of his laboratory, Poulsen had reared two chaffinches that produced abnormal songs, allowing him to specify some of the song elements that birds developed innately and the stage in their development at which these were modified. In similar fashion, Thorpe (1954) later isolated groups of juvenile birds in aviaries and exposed them in different degrees to the songs of mature singing birds. This enabled him to control different stages of development and establish which parts of their songs the birds already possessed without learning. Crucially, whereas Poulsen had identified all observable changes in song by ear, Thorpe reproduced the songs as spectrograms and compared them visually. As a result, he could follow in detail how the songs developed over time. He found, for instance, that young chaffinches did seem to possess a minimal “blueprint” for their song, which was one of the determinants of its length and form. Other details were clearly learned. Some elements must have been learned in the chaffinch’s first weeks, even before it sang itself, supporting Lorenz’s concept of “imprint,” to which Thorpe was sympathetic. The experiments also confirmed Marler’s observations regarding dialects. Since chaffinches refined their song through imitation in their first spring, as they competed for the territory they would occupy for the rest of their life, their songs naturally converged in local, but nonetheless individually distinctive, patterns. 
However, the sonagrams also demonstrated that even as chaffinch songs matured, they never became completely fixed. Some elements of the bird’s song displayed subtle differences between the first and second year. It was these variations, “so minute as to be practically imperceptible to the naked ear,” that came to light in spectrographic renderings of the songs’ acoustic structures (Thorpe 1954, 468).

Comparisons of spectrographic representations made it clear that individual birds’ song was much less fixed than previously assumed. It had long been thought that the distinctiveness of a species’ song was a reproductive isolating mechanism. However, spectrographic studies of birdsong increasingly suggested that variation was a much more prominent organizing principle than had yet been acknowledged. Taking stock in 1958, at a symposium on animal sounds and communication at the American Institute of Biological Sciences, Marler (1960) distinguished several biological levels on which variation took place simultaneously. There was, of course, geographic variation, illustrated by the fact that chaffinches of the same species in the Azores sang less elaborately than in Britain. Within a given geography, adjacent populations of birds also seemed to employ various song dialects. Moreover, even within a single population of birds, individual birds consistently varied their song, to such a degree that experienced field observers could distinguish between individuals. Finally, within a single bird’s repertoire, there could be hundreds of song themes. Zoologists were now beginning to glimpse how the balance between a birdsong’s individuality and its conformity to local, geographic, and species-specific patterns was fine-tuned through adaptation and selection. In the mid-1950s, the Cambridge University ethologists had identified some of the mechanisms by which chaffinch songs varied, but as yet their precise behavioral purposes were unclear. A first step in answering these questions was to compile detailed descriptions of the occurrences of variations, on all of these levels and for more species than the chaffinch alone—and the best instrument for that task seemed to be the sound spectrograph.

At the end of the 1950s and in the early 1960s, comparative spectrographic analyses were widespread. In Cambridge, Thorpe (1961b) and some of his graduate students such as Richard J. Andrew (1957) had begun to inventory and study the vocalizations of buntings. The Madingley field station had by then expanded, and in 1960 it was officially recognized by the university as the “Sub-Department of Animal Behaviour.”6 Marler, who moved to the United States to become a professor at the University of California at Berkeley in 1957, continued with new model animals for studying vocal learning, such as the white-crowned sparrow and the zebra finch. Marler’s graduate students, such as Masakazu Konishi and Fernando Nottebohm, went on to examine aural feedback on song learning and to pioneer the neuroethology of birdsong, while Marler himself embarked on field studies of primate communication. Elsewhere, the comparative approach was taken up by Donald Borror (1959, 1961) in Ohio, Peter Kellogg and Robert Stein (1953; see also Stein 1956) at Cornell University, and Wesley Lanyon of the American Museum of Natural History, together with William R. Fish (1958).7 Traditional ornithological studies had restricted their focus to the song of a small number of individuals or a small population at most. These papers, in contrast, collectively shifted their attention to the inventory and comparison of song repertoires of several populations at once. The spectrograph also obliged them to organize their investigations differently. Whereas the naturalists discussed in chapter 2 had tended to prefer elaborate and varied songs, these recordists necessarily turned to birds whose repetitive, abundant, and short vocalizations could be represented most effectively in a sonagram. In the 1950s and 1960s, the spectrograph could analyze and represent fragments of only 2 to 4 seconds, and shorter, repetitive songs were better suited for representation. 
Thorpe’s favored experimental subject, the chaffinch, had been selected not only because it bred well in captivity, but also because it produced “a complex but not too elaborate phrasical song of medium frequency range and convenient length.” It also displayed local peculiarities, and as such allowed representative sampling of song variation (Thorpe 1954, 466).

Although variation was not discovered using spectrographic analysis, the technique greatly facilitated and accelerated the making of representations that benefited studies of variation on a large scale. It allowed the analyst to focus not solely on specific features or parameters, such as pitch or duration, as musical recordists had done in the past. Instead, researchers adopted the spectrograph developers’ view that “what the sonagrams show best is pattern.”8 Spectrographers would examine and compare the visual print or “structure,” as they called it, of a song fragment, with regard to its shape and spacing. This is illustrated, for instance, in a glossary of song components compiled by Marler and his Berkeley student Miwako Tamura, which distinguishes between “notes,” “phrases,” and “syllables” on the basis of their visual shape alone (Marler and Tamura 1962). Distinguishing a song thus visually, rather than audibly, in its smallest units of analysis allowed spectrograph users to make detailed comparisons of a song structure (figure 5.3). But it also introduced new problems. Terminology, for instance, became almost as controversial an issue as it had been in the early decades of the twentieth century. According to eyewitnesses, a session on bioacoustics terminology at the International Ornithological Congress in 1962 prompted “almost a riot”; after it became clear that “divergent and strong opinions came from every quarter … it was obvious that we are still in a very elementary stage on that score.”9 In the absence of agreement on a standard terminology, individual researchers and groups devised their own terminologies, taking their cue from music or linguistics, which resulted in confusion that would remain unresolved for at least a decade (Baker 2001, 8; Shiovitz 1975).

Figure 5.3 Visual model for dividing song structures into analyzable components.

Source: P. Marler and M. Tamura, “Song ‘Dialects’ in Three Populations of White-Crowned Sparrows,” Condor 64 (5) (1962): 369. Reproduced with kind permission of the American Ornithological Society.

Producing Sound Spectrograms

Although the sound spectrograph had simplified methods of reading variations in song structures, producing useful spectrograms was not straightforward. It required selections to be made, settings to be decided on, and images to be reproduced and eventually printed. A commercial sound spectrograph such as the Kay Sona-Graph posed a number of trade-offs: accuracy in frequency measurements had to be sacrificed, for instance, for precision in measurements of time. This elicited from Peter Kellogg at Cornell the observation that “the important thing about the [sound spectrograph] is that no one trace shows everything. The technique used in an analysis is very dependent upon the characters you wish to show or emphasize.”10 As a result, conventions for the appropriate production and reproduction of information in a spectrogram differed significantly and controversially.
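The trade-off between frequency and time can be made concrete with a rough calculation. As an order-of-magnitude rule in signal analysis, a filter that passes a band of width Δf cannot resolve events much shorter than about 1/Δf. The sketch below applies that rule to two bandwidths, 45 Hz and 300 Hz, which are chosen here purely for illustration and are not figures taken from the sources discussed in this chapter; the function name is likewise hypothetical.

```python
def time_resolution_ms(bandwidth_hz):
    """Approximate the shortest resolvable interval in milliseconds, using dt ~ 1 / df."""
    return 1000.0 / bandwidth_hz


# Illustrative narrow and wide filter settings (assumed values, not documented ones).
for label, bandwidth in (("narrow", 45.0), ("wide", 300.0)):
    resolution = time_resolution_ms(bandwidth)
    print(f"{label:>6} band: {bandwidth:5.0f} Hz -> about {resolution:.1f} ms")
```

On these assumptions the narrow setting separates neighboring frequencies well but smears anything faster than roughly a fiftieth of a second, while the wide setting follows much faster changes at the cost of blurring pitch. A rapid trill and a slow whistle would therefore call for different settings, which is the practical sense of Kellogg's remark.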

Some users, for example, found that the amount of information contained in a spectrogram hindered a good understanding of its spectral properties. In a note on the illustrations to his 1961 monograph on birdsong, Thorpe (1961a, xii) explained that “for many purposes of the student of bird behaviour, and for the general ornithologist, sound spectrograms contain a great deal more information than is relevant to the particular point at issue.” He had often found it “advantageous to reproduce [the spectrograms] in a somewhat diagrammatic and stereotyped form which draws attention to the main items of information without confusing the picture with a great deal of irrelevant detail.” To highlight relevant structures and mark significant patterns in a song phrase, biologists at the Cambridge University Department of Zoology customarily resorted to tracing the original spectrograms with pen and ink. Alternatively, they would sometimes reproduce spectrograms as high-contrast photographic plates. This not only facilitated the reproduction of the spectrograms in print but, as Marler and Tamura (1962, 369) noted, also allowed the researchers to retouch the copies with white paint to mask traces of “background noises” that had been picked up by the microphone (figure 5.4). This treatment of spectrograms was common in bioacoustics publications: from the late 1950s onward, soft-toned photographs were gradually replaced by high-contrast reproductions and ink tracings. In fact, as late as 1979, editorial guidelines of the ornithological journal The Condor required contributors to prepare spectrograms on high-contrast film to produce a strictly black-and-white copy, which significantly cut the cost of reproduction and allowed “extraneous sounds to be erased with paint or white correction fluid” (Thompson 1979, 220).

10307_005_fig_004.jpg

Figure 5.4 Ink tracing by hand of a spectrogram.

Source: P. Marler, M. Kreit, and M. Tamura, “Song Development in Hand-Raised Oregon Juncos,” Auk 79 (1) (1962): 15. Reproduced with kind permission of the American Ornithological Society.

In “sampling” down bird voices to lower definitions and filtering out extraneous noises, bioacousticians echoed the concerns of communication engineers such as Franklin Cooper and Alvin Liberman at Haskins Laboratories, who a decade earlier had relied on a similar technique for painting over simplified spectrograms to determine the minimum parameters by which signals could be stored, retrieved, and transmitted without significant losses in information.11 But where these engineers’ drawings had entailed an experimental attempt to explore visible sound patterns, bioacousticians typically had already acquired a strong sense of which traces could be eliminated and which should not, in order to preserve a song’s essential features. In this delicate process, contrast and sharp or faint features were adjusted in part by relying on the author’s auditory experience. Rachel Mundy (2009, 210) has likened the technique to calligraphy. This is apt not only with regard to the aesthetics of the trace, as sound spectrograms (particularly those that had conventionally been produced with the wideband setting) looked as if they had been drawn with a wide-bladed pen, but also because of the care and skillful penmanship required of the tracer in this process.

The analogy with calligraphy is fitting in another, unsuspected way, as well. In 1961, Thorpe and his assistant Barbara Lade proposed a method for turning sound spectrographic traces into an objective notation. The sound spectrograph was incomparably more precise and objective than earlier attempts at notation, they pointed out, but it also contained more information than necessary for the student of birdsong. Evidently aware of Kopp and Green’s attempt to teach visible speech based on spectrograms, the authors drew on the extensive libraries of the BBC and Cornell University to develop a series of symbols for different song types, based on a diagrammatic rendition of their spectrographic patterns. By halving the timescale and doubling the frequency axis, the patterns resembled long curved brushstrokes of different thickness (see figure 5.5). Such a notation, they argued, would be accurate, yet also capable of being read and used by the field student, thus bridging the gap between laboratory analysis and field experience. By eliminating distracting information to show just “the essence of the pattern,” the symbols would allow users to recognize a sound by its spectrographic shape, just as birdwatchers could recognize a distant bird by its shape in flight.

10307_005_fig_005.jpg

Figure 5.5 Thorpe and Lade’s diagrammatic notation based on sonagrams.

Source: W. H. Thorpe and B. I. Lade, “The Songs of Some Families of the Passeriformes. I. Introduction: The Analysis of Bird Songs and Their Expression in Graphic Notation,” Ibis 103a (2) (1961): 238. Reproduced with kind permission of the British Ornithologists’ Union.

Yet in spite of the appealing simplicity of Thorpe and Lade’s idea, an actual spectrographic notation never caught on—at least not with users in the field. Even though the spectrographic traces had been radically simplified in comparison with Kopp and Green’s failed experiments in visible speech a decade earlier, the patterns continued to place an insurmountable strain on the reader. This was due, at least in part, to the linear scale with which spectrograms represented measures of frequency, which did not map directly onto the (logarithmic) scale by which a subjective experience of pitch is typically represented. This distortion made it difficult for field observers to translate spectrographic symbols back into their own auditory experience. The idea of reducing a spectrogram’s visual complexity to a schematic diagram did prove its worth, however, in laboratory analysis, as a way of categorizing hundreds of song fragments and figuring out how these patterned in sequences.12
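
The mismatch between a linear frequency axis and the roughly logarithmic way pitch is represented can be illustrated numerically. The following sketch is an illustration only, not a reconstruction of any method used by Thorpe and Lade. It assumes that pitch distance can be approximated by equal-tempered semitones, that is, twelve times the base-2 logarithm of the frequency ratio, and the frequencies are arbitrary example values.

```python
import math

def interval_semitones(f_low_hz: float, f_high_hz: float) -> float:
    """Pitch distance between two frequencies in equal-tempered semitones.

    Assumption: pitch is modeled as log2 of frequency, a simplification
    of how listeners actually perceive pitch.
    """
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)

if __name__ == "__main__":
    # Two steps of equal size (1000 Hz) on a linear frequency axis.
    for low, high in [(1000.0, 2000.0), (7000.0, 8000.0)]:
        print(f"{low:.0f}-{high:.0f} Hz: {interval_semitones(low, high):.2f} semitones")
```

On a linear spectrogram both steps occupy the same vertical distance, yet under this model the first spans a full octave while the second spans little more than two semitones. A reader trying to "hear" a traced symbol would have to undo that distortion mentally.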

Even if such diagrammatic renditions of spectrographic sound did not permit an intuitive legibility per se, they had a clear analytic advantage. By manipulating the spectrogram, analysts achieved far greater control over the audible phenomena themselves, allowing them to distinguish between irrelevant “noise” and the “sounds” they deemed scientifically interesting. The spectrogram enabled such distinctions to be made more precisely than the parabolic microphone, more selectively than a crude band-pass filter (that cut off any frequencies outside a certain range), and more economically than the soundproof rooms that Thorpe and field station curator Robert Hinde had introduced in previous years (Thorpe and Hinde 1956). These rooms had been installed primarily to isolate the birds themselves from the acoustic ambience of the Madingley Ornithological Field Station. To investigate what part of their song was instinctive and what part was learned, it was necessary to control the kinds of sounds to which they were exposed and exclude any vocalizations of outside or neighboring birds. However, by artificially filtering out such environmental noise, the soundproof rooms also contributed to an ideal of the spectrogram as free of exposure to sounds that analysts would discard as extraneous. In spectrography, “noise” came to refer interchangeably to acoustic interference and scrambled information, both of which rendered the communication of essential spectrographic features unintelligible. Just as the parabolic close-up recording aimed to reduce unnecessary background sounds, these ink tracings and high-contrast photographs placed a spectrographic sound against a white and therefore equally “mute” background (Bruyninckx 2012). As such, spectrographic tracings served as a powerful filter to distinguish patterns from noise.

Yet the attempts by Thorpe and his former students to enhance the original spectrogram’s legibility for analytic purposes were not embraced by all. Some reviewers of Thorpe’s 1961 monograph simply requested more technical detail on his methods of producing sound spectrograms (Stonehouse 1963); others, particularly in the United States, received the book and especially his casual note on the reproduction of his spectrograms with surprise. Ohio bioacoustician Donald Borror and Cornell ornithologist Robert Stein prepared critical reviews.13 Peter Kellogg, likewise, “didn’t think much of [Thorpe’s] reasons for tracing spectrograms rather than reproducing them as they are. It seems that such a method as he uses must always include any opinions the tracer must have about the important or unimpotance [sic] of some character.”14 He explained:

[Thorpe’s] idea that, for economy and perhaps for neatness, it is a good idea to trace spectrograms rather than to reproduce them directly, is a technique which I most seriously question. In tracing a spectrogram, so as to clean it up, and also so that it may be reproduced as a line drawing, it is almost impossible to keep from changing the picture a little so as to make it more precisely fit your ideas. This results in emphases which were not present in the original and in the omission of everything which you consider to be, but which may not be, an artifact. In one instance, this technique had led to the inclusion of material not in the original.15

Kellogg instead advocated using as little intervention as possible: “I am in favor of publishing spectrograms as they come from the machine rather than to trace them so as to emphasize the pattern and eliminate any details which the author considers at the moment to be of no importance.”16 He also admonished his colleagues and research associates to take reproduction seriously: “I would like to see the spectrograms … authentically reproduced so that they could be used for study with the same confidence as one could use the originals. It might be necessary to reduce some of them slightly, but this should not hurt them much.”17 The extent of his concerns is illustrated by deliberations with colleagues, and the printers of publications to which they contributed, on how best to achieve such authentic reproductions. Results even differed significantly with different printing techniques, he found, and printers were often “amazed to find that I preferred [the spectrogram] to be grey.”18 In this context, well-reproduced spectrograms were regarded as a marker of professionalism. One of the Cornell research associates, L. Irby Davis, himself a retired civil engineer who had become skilled at using the sound spectrograph, complained, for instance, about a series of spectrograms accompanying an article in Scientific American because they would reflect poorly on the lab’s reputation: “Everything else, the drawings, paintings, and the photography are of excellent professional quality. But the spectrograms are terrible … . Since the Lab is known all over the world as the leading place for sound work on birds it will be a bit hard to explain how such bad work was permitted to come out from the institution.”19

At the root of this uneasiness about authentic reproduction was a concern that biologists lacked sufficient expertise in acoustics and electrical engineering. This concern was particularly strong among Cornell bioacousticians: like Kellogg, they often complemented an interest in zoology with a background in physics or electrical engineering. They worried that commercial sound spectrographs had been elevated to a kind of out-of-the-box solution for the study of animal vocalizations, and that their serious mechanical and electronic difficulties might go unnoticed by the inattentive, untrained user. Summarizing this concern, Crawford H. Greenewalt (1968, 8), another Cornell associate, pointed out that although the Sona-Graph had been a perfect tool for the analysis of human speech, birdsong spectra were beset with ambiguities that could be more or less serious depending on the purposes of the experiment and the acoustic characteristics of a song. Those ambiguities were spelled out most concretely by Davis, who insisted that without an adequate knowledge of acoustics, biologists would be prone to mistaking artifacts for actual phenomena (or vice versa). In a series of technical papers, he proceeded to show how frequency-modulation and amplitude changes introduced spurious harmonics, time-frequency smear, or displaced scales into the analysis (Davis 1964). His uncompromising scrutiny of published spectrographic analyses led to several critical exchanges not just within the Cornell department (among others with converted behaviorist Robert Stein) but also with Cambridge University ethologists like Thorpe and Marler, whose work had meanwhile set the standard in the field.20 Rather than let such unease simmer, however, Kellogg sought to resolve the matter in characteristic manner: he enlisted Marler, along with collaborators such as William Gunn and William R. Fish, on the editorial board of a new quarterly that he launched at Cornell, Bioacoustics Bulletin.
Distributed among about three hundred recipients worldwide, the bulletin focused on technical discussions of methodology and instrumentation (including a series of papers on the possible pitfalls of spectrographic analysis), thus bringing a young community together under the banner of bioacoustics and deliberating on new standards for its analytic practice.

Subdued though they may have been, such skirmishes illustrate the variety of practices that had emerged around the sound spectrograph by 1960, and the different gestures of objectivity that their users had developed in the preceding decade.21 For critics of Thorpe’s or Marler’s adjustments, the retouched spectrograms seemed to permit a dangerous interpreter’s bias in the analysis. Marler and Thorpe, for their part, trusted the instrument’s ability to produce an objective image of sound, but found that the cluttered image and incidental detail compromised effective analysis. They thus relied as much on an informed and experienced analyst as they did on technological means, to distill from the noise a more sophisticated and distinct image. This discourse recalls what Daston and Galison term trained judgment, which emerged as a mid-twentieth-century supplement to the doctrine of self-elimination in mechanical objectivity and left ample room for skilled interpretation and expertise. The ideal of trained judgment did not reject objective instruments outright but, unlike mechanical objectivity, did envisage a role for informed evaluation and cultivated perception, alongside the protocol-based image, to distinguish salient and significant structures, categories, or patterns. The Cambridge University zoologists had developed a spectrographic practice that permitted, or even required, a trained observer to order complex acoustic information for the reader. By relying only on the indiscriminate procedures of the sound spectrograph, they reasoned, the biologist would risk obscuring the very structures and patterns of variation that had become of central analytic interest to bioacousticians and ethologists. As the next section will show, this regime extended not only to looking for spectrographic traces, but also to listening for aural patterns.

Aural Patterns

The sound spectrograph changed the ways sound recordings were analyzed. Yet in the field, listening continued to play an important role. In 1966, a new field guide, Birds of North America: A Guide to Field Identification, included sound spectrograms for depicting songs and calls, but like Thorpe and Lade’s proposal for a concise spectrographic notation, the idea failed to catch on initially. One otherwise highly positive review judged the practical use of sonagrams as an aid to field identification very limited. Not only was considerable practice needed to be able to read the sound spectrograms, but even once such proficiency was acquired, the reviewer noted, “I doubt if anyone can really imagine or ‘hear’ a new song simply by reading its sonagram” better than through a time-honored method of verbal description (Keith 1967, 253).

That doubt was shared widely. Even scientific papers in Animal Behaviour or The Auk complemented extensive spectrographic analyses with verbal descriptions or syllabic notations of a bird’s vocal behavior to give an impression of how the sound might seem “to the human ear.” This phrase in particular flagged a deliberately subjective and descriptive account, evoking information that could not be conveyed otherwise. The persistent presence of the observer’s ear is illustrated by British ornithologist Derek Goodwin’s comparative analyses of bird vocalization behavior. In a series of descriptions of blue waxbill calls, he noted that the species’ contact call, “a loud, clear, high-pitched ‘tseep-tseep’ or ‘sweet-sweet’ with a somewhat interrogative tone,” did not, “to my ears, usually sound squeaky.” And although this call might easily be confused with that of a related species, “the experienced ear can usually identify the caller.” For other calls, “I cannot distinguish by ear which of the three forms is calling” (Goodwin 1965, 287–290). Although recordings had often been dutifully analyzed using a sound spectrograph, authors typically included their aural experiences in the field to provide a perceptual marker for other ornithologists.

Skilled listening also mattered when conducting playback experiments or collecting observations of behavior in the field. Observers listened not only for whether a bird responded or not, but also for what exactly the response was. In an investigation of cardinals’ adjustment and coordination of their song patterns to the song of neighboring birds, carried out by playing back prerecorded songs, McGill University biologist Robert Lemon noted that “the data was recorded by hand after identification of the songs by ear. This method is feasible with cardinals because of the relative simplicity and stereotypy of their patterns of song” (1968, 158; emphasis added). In a later study, exploring the statistics of variations in sound patterns, Lemon and his coauthor noted that although all the songs had been dutifully analyzed with a sound spectrograph, much information, “especially relating to the sequences of different song types, was gathered by listening to the birds sing and then recording the data in a notebook” (Lemon and Chatfield 1971, 1).

But aural experiences were not limited to field observations alone; they also played a role in the organization and interpretation of sound spectrographic data itself. Analysts occasionally reported using a kind of discriminative listening when considering song types in the laboratory. Whereas most papers did not give details on how their data had been classified, some explicitly invoked aural experience alongside spectrograms as an aid to interpreting and comparing sound fragments or classifying material. When classifying Carolina wren song phrases for the variations in number, length, and notes that they displayed, Borror had evidently relied on the sound spectrograph. But when drawing up his classification of song phrases, he encountered difficulties in defining the beginning and end of a song phrase on an image. This was important, because “a different delimitation of the phrases would for most songs result in a different classification” (Borror 1956, 223). Borror addressed the problem by combining spectrographic imagery with listening to the recordings played at reduced tape speed. In a comparable vein, at Cambridge University, Thorpe noted that recorded sounds could be studied by a variety of means: even if the sound spectrograph seemed “by far the most valuable method,” “play-back at the lower speeds is an enormous aid to the ear, particularly with sound patterns … having extremely rapid repetitions and relatively high frequencies. … It sometimes happens that comparison of songs of related species at decreased speeds brings to light resemblances which would otherwise have escaped notice” (Thorpe 1958, 542). In such cases, the sound spectrograph could again help to verify observations made by ear.

In other laboratories, too, discriminative listening continued to play an important (albeit a rather inexplicit and context-dependent) role, even when sound spectrography had become firmly established as the standard instrument in birdsong biology. British biologists Peter Slater and S. A. Ince (1979), for instance, reported that in creating a classification of the songs they had collected in the field, they had relied on their own experienced listening. Preparing spectrograms of each song type, they had found that “with practice many of the more distinctive song types could be identified by ear” (Slater and Ince 1979, 148). Identifications were made even easier by listening to slowed-down recordings with the typical sonagrams at hand. Checking their identifications by spectrographic analysis, the authors reported finding a host of reliable features that could help distinguish between some similar-sounding song types, and failing to find them for a few others. The balance between listening and looking, therefore, depended on the task at hand and on the types of sound being listened to. In some cases, Slater and Ince continued, the differences between song types appeared slight on the sonagram, but the types were nevertheless classified as separate on the basis that certain differences in quality had been “immediately recognizable in the field on the first occasion that song type Y was heard” (Slater and Ince 1979, 157). For another group of song types, “there is no doubt that they should be regarded as distinct because the differences in form between them are consistent,” even though the authors had been “unable to separate them reliably by ear” (Slater and Ince 1979, 157). Trained judgment thus operated at two levels. 
First, it helped the recordist to recognize similarity relations, family resemblances, and distinctive types in the diversity of records; in a second and related way, it involved a keen awareness of the exact moments when particular judgments could dependably be relied on. A trained observer was able to tell, for example, when aural impressions could authoritatively trump the visual evidence suggested by a spectrographic print.

Slater and Ince also found that some of their observations on chaffinch song variation did not match Marler’s (1952) earliest study on the subject, made as a student, which he had completed entirely using naturalist standards—by ear and pencil. The discrepancy in some observations, the authors noted, was probably due to the fact that Marler’s original collecting and analysis had been carried out by ear alone: whereas most end phrases of a chaffinch song might reliably be captured by ear, its extremely rapid trill usually displayed differences that were particularly hard for the human ear to notice without a sound spectrogram. Thorpe and his colleagues (1972, 134), in turn, acknowledged that “any study of sound presupposes the use of the ear, … if the task is not to become cumbersome and time-consuming out of all proportion to the results achieved.” The task of spectrographing and comparing hundreds of records could sometimes be performed more efficiently by relying simply on the ear. But this dependence on aural experience must always be qualified by a visual/mechanical record, since “human aural perception … tends to reduce disorder to a preconceived order and may categorise within the familiar apperception masses those aspects of a study which, when considered with complete objectivity, may be most likely to lead to new notions and syntheses” (Thorpe et al. 1972, 134–135). For that reason, they suggested, a study of sound patterns should generally “begin with aural classification and continue with suitable methods of mechanical analysis” (Thorpe et al. 1972, 135).

In this sense, the trained listener presupposed by these spectrographic studies was different from Witmer Stone’s trained musical listener (chapter 2) or Albert Brand’s all-too-human subjective listener (chapter 3). Listening did not feature here as a distinctive skill, nor did it stand in direct opposition to the spectrographic image, but rather it served to remedy the inefficiencies and deficits inherent in the spectrographic process. This approach was usually not made explicit, but rested instead on a perception cultivated through experience with the sounds under study. While the experienced listeners implied in these papers embraced the mechanical objectivity of a spectrographic image, they also developed an intuitive understanding of how to make objective records work most efficiently. Despite the assumption of congruity between aural and visual patterns that underlay the sound spectrograph’s original design, some aspects of birdsong could not be captured by the spectrographic image alone, requiring the analyst to rely on the ear to adjust, categorize, and classify sound—always in conjunction with the mechanical record.

Musical Spectrograms

However persistent listening’s role in the routine of spectrographic analysis, the authority of the trained ear itself was carefully delineated. The restrictions on the domain in which trained listening could be deployed were most clearly articulated in the 1960s and 1970s, when a small number of researchers expressed discontent with the standard spectrogram and began to codify bird sound in new, unconventional ways that, surprisingly, used music as a reference.

Among these were several biologists and musicians who orbited around Thorpe at the Cambridge University Sub-Department of Animal Behaviour. Thorpe himself had begun to develop an interest in musical notation as an analytic tool. By 1962, his attention had shifted from chaffinch song variations to the ritual “duet” vocalizations of male and female shrikes, and he found that to appreciate exactly how these birds developed their song patterns in interaction with each other, it was most effective to note down their variations musically. When sounds were used for communication, and particularly when they were of a musical nature, he argued, one must consider the ear rather than a mechanical instrument as the analyzer. It was, after all, fair to assume that “since its essential structure is similar, the avian ear is subject to the same distortion” (Thorpe et al. 1972, 135).

To that end, Thorpe surrounded himself with musically expert interlocutors and collaborators. At Cambridge University, musicologist Thurston Dart taught him techniques for the musicological transcription of folk music. Music graduate and professional musician Joan Hall-Craggs joined the sub-department with a project on blackbird song. Thorpe encouraged composer Trevor Hold to develop a musical notation that was better suited to the representation of birdsong. He also maintained regular correspondence with Myles North, the Cornell lab research associate stationed in Kenya. In 1950, North had published a short technical paper advocating for the notation’s continued importance and proposing that ornithologists use a stripped-down musical notation for identification and field observation (an example is shown in figure 5.6). Acknowledging the recent advances in “accurate” birdsong recording and their “physical analysis,” he argued that musical transcription as a parallel and allied line of study would be mutually profitable (1950, 101). North helped Thorpe refine his musical transcriptions—the results of which were published in Nature in 1965; in a second paper in 1966, Thorpe used spectrograms instead.

10307_005_fig_006.jpg

Figure 5.6 Musical notation of duetting shrikes by Thorpe.

Source: W. H. Thorpe, “Ritualization in Ontogeny. II. Ritualization in the Individual Development of Bird Song,” Philosophical Transactions of the Royal Society of London B: Biological Sciences 251 (772) (1966): 355. Reproduced with kind permission of the Royal Society of London.

These notations would not find wide acceptance in the field of birdsong biology. Yet they and their reception do offer important insights into the definition of trained listening. Thorpe and his Cambridge colleagues adopted musical transcription techniques because they were interested in a very specific research problem. They did not want to show that birds developed their song, but to analyze exactly how and according to what principles they did so. To demonstrate such developments qualitatively, the researchers found, musical notation did the job just as well as the shapes of acoustic structures produced by the sound spectrograph. Thorpe wrote that to study how birds elaborated their song patterns through ritualized interactions, it was “essential to put [the songs] first into staff notation so that I can get certain things clear for myself and have the songs in a form which I can show to musical people and get their help and advice.”22 But by transcribing their analyses in musical notation, these researchers did invite in again a particularly thorny issue that had been left far from resolved in the late 1920s. With the general adoption of the sound spectrograph, biologists of birdsong had moved away from the elaborate and—to the human ear—often aesthetically pleasing song patterns that had led early-century naturalists to consider their musical nature and intention. In search of variability, they had focused on particularly brief calls and songs. Within the frame of ethology, moreover, attention had shifted to consider their behavioral functions and the kinds of information they communicated. This was particularly evident in the surge of field studies that tested a species’ response to prerecorded vocalizations that were played back at them or in a concern with the syntax of their signals (Falls 1992). 
In some cases, this shared interest in communication, together with a shared investment in spectrographic analyses, had even brought ethologists and linguists to consider to what extent such vocalizations could be considered a language.23 If biological acousticians in the 1960s looked to relate bird vocalizations to a reference point in human behavior at all, therefore, it was to language much more than music.24

Hence, when Thorpe and his collaborators introduced the songs they had recorded in musical notation, they were careful to point out that these did not necessarily imply that birdsong itself was inherently musical, although they did bring to light several aesthetically pleasing patterns, suggesting “musical inventions” that seemed to transcend biological necessity. Conceding that “whether the musical tonal system employed and the manner of using it provides any justification for assuming the beginnings of a true artistic ability is still an open question,” Thorpe (1966, 357) argued that it would be dishonest to suggest that the biological theories at present available offer a complete explanation for all bird vocalizations. Similarly, when Hall-Craggs recorded and transcribed blackbird song in musical score, she was seeking to conceptualize how the bird reorganized its song patterns in ways apparently beyond a purely biological function. As a musician “with the sole qualification in the study of birdsong of being trained to listen to detail,” she added, “it would be presumptuous to try to draw conclusions from this analysis.” She nevertheless went on to point out that the biological functions of blackbird song could conceivably be mixed with an aesthetic sense, even if this was not an easy hypothesis to accept as yet (Hall-Craggs 1962, 294). In an attempt to forestall the seemingly inevitable accusation that they might be anthropomorphizing birdsong, Thorpe and Hall-Craggs stressed that their methods had not surrendered accuracy or objectivity. Even for musical transcription, they insisted, their trained ears had depended on mechanically objective records.
Thorpe and Hall-Craggs had combined their interpretation of musical transcriptions with spectrograms of the records, and aural transcriptions had only been made on the basis of tape recordings, allowing them to slow down the original sounds to half or even one-sixteenth of the original speed and transcribe the rapidly uttered birdsongs by ear in minute detail, with a corrective factor for transposing them back to their original speed.
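
The arithmetic behind slowed-down transcription and its "corrective factor" can be sketched briefly. The example below is an illustration under a simple assumption, not a description of Thorpe and Hall-Craggs's actual procedure: it supposes that playing a tape at a fraction of its recording speed scales every frequency by that fraction and stretches every duration by its inverse. The note values are hypothetical.

```python
import math

def slow_down(freq_hz: float, duration_s: float, speed: float):
    """Frequency and duration as heard when the tape runs at `speed` (0 < speed <= 1)."""
    if not 0 < speed <= 1:
        raise ValueError("speed must be a fraction between 0 and 1")
    return freq_hz * speed, duration_s / speed

def octaves_lowered(speed: float) -> float:
    """How many octaves the pitch drops at the given playback speed."""
    return -math.log2(speed)

def restore(freq_hz: float, duration_s: float, speed: float):
    """Corrective step: transpose a slowed-down transcription back to original speed."""
    return freq_hz / speed, duration_s * speed

if __name__ == "__main__":
    note = (4000.0, 0.5)   # hypothetical note: 4 kHz lasting half a second
    speed = 1 / 16         # one-sixteenth of the original speed
    heard = slow_down(*note, speed)
    print("heard:", heard, "octaves lower:", octaves_lowered(speed))
    print("restored:", restore(*heard, speed))
```

At one-sixteenth speed the hypothetical note would be heard four octaves lower and sixteen times longer, which suggests why rapid, high-pitched figures became transcribable by ear and why the transcription then had to be transposed back.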

The technique was not unique to Thorpe’s laboratory. By the 1960s, there were several semiscientific popular gramophone publications available on the market that employed exactly this technique to uncover the “hidden” nature of birdsong, with titles such as Music and Bird Songs: Sounds from Nature, With Commentary and Analysis (put together by Cornell University’s Peter Kellogg and CBS Radio broadcaster James Fassett [1953]), The Birds’ World of Song (published by the ornithologist couple Hudson and Sandra Ansley [1961] in the Folkways Science series), or the later The Unknown Music of Birds (by the obscure Hungarian musicologist Peter Szöke [1987]). Their authors used the medium of the gramophone to slow a selection of bird vocalizations down to as little as 1/64th of their original speed. In doing so, they aimed to transpose them to a “human scale,” or rather, present the vocalizations as they were presumably known to the birds themselves. These were records intended for a general audience (Fassett’s record had originated as an experimental artistic composition broadcast by CBS during the intermission of the New York Philharmonic), but they all couched their liner notes in some scientific terminology. While the cover of Music and Bird Songs featured a wave analysis produced by bioacoustician William Fish, the Ansleys and Szöke both presented their approach as the aural equivalent of the microscope, a way to zoom in on sound fragments and overcome the “physical limitations” of the unaided ear, thus mobilizing the same trope that Marler later proposed for the sound spectrograph. At the same time, these authors also resumed an analytic practice that many proponents of the sound spectrograph had expressly sought to avoid.
In fact, the Ansleys’ analytical frame of reference was closer to the artisanal transcription techniques of Ferdinand Mathews or Aretas Saunders than to spectrographic visualizations, and they used it to warn against strict distinctions between human and animal music. Likewise, Szöke, who is now considered a precursor to investigators in zoomusicology, fought a rearguard action against the spectrograph, which he proclaimed inadequate to study matters of intonation and expression in birdsong. The point is that while the authors of these records embraced listening techniques that were ostensibly very similar to those employed in the laboratory of the Cambridge University zoology department, they also differed in an important way. In the mid-1960s and 1970s, Thorpe and his collaborators listened to their recordings to fine-tune, not to replace, spectrographic analysis. Above all, they hoped to reconcile both of these positions.

The Cambridge University Department of Zoology was not alone in seeking such reconciliation by pairing spectrographic analysis with musical conventions. One ornithologist, Joe Marshall Jr., converted the linear frequency scale conventionally applied in the spectrogram into a logarithmic frequency scale of musical octaves (figure 5.7). Spectrographers usually preferred the linear frequency scale because it makes use of an easily measurable unit and stretches the frequency spectrum equally, thus showing as much detail in the higher intervals as it does in the lower. A logarithmic frequency scale, in contrast, squashed all that detail but, so its proponents claimed, allowed readers to judge better how it sounded. This was an important difference; after all, Marshall (1964, 347) explained, “it is pitch and not frequency to which our hearing and that of the birds respond in nature.”25 In his view, bioacousticians’ reliance on frequency as a unit of analysis had reified a focus on spectrographic variations that were often barely perceptible to the ear of the analysts or of the birds, “even granting that birds’ ears are much more sensitive than man’s” (Marshall 1964, 354). The conventional scale, in other words, “resembles nothing in the real world,” whereas the logarithmic scale “like a musical score, constitutes a universal ‘language’ or symbolism by which sounds can be recognized visually by their shapes on a graph” (Marshall 1977, 150). At Cambridge University, Joan Hall-Craggs likewise lamented birdsong biologists’ strong attachment to a spectrographic standard that threatened to dissociate sound patterns from the natural world. 
Frustrated to observe that auditory stimuli were often discussed exclusively in the visual terms of the sound spectrogram, she suggested developing sound spectrograms that, “while maintaining the objectivity of the analytical process, are (1) presented in a form more accessible to the auditory imagery of readers and (2) comprehensible in musical terms” (Hall-Craggs 1979, 186). Like Marshall, Hall-Craggs tried to enhance the mechanically objective rendering of a sound by superimposing musical conventions onto the sonagram (figures 5.8 and 5.9). This orthographic innovation was especially valuable for fieldworkers, she found, as it allowed the reader not only to examine a sound pattern in detail but also to understand it acoustically and thus to rehearse and memorize it for later recall in the field.
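The difference between the two scales can be made concrete with a small calculation. On a linear axis, the distance between two tones is their difference in hertz; on a logarithmic axis, it is the logarithm of their ratio, counted here in octaves. The Python sketch below uses hypothetical function names and example frequencies chosen for convenience rather than taken from Marshall’s or Hall-Craggs’s figures.

```python
import math


def linear_distance(low_hz: float, high_hz: float) -> float:
    """Separation of two tones on a linear frequency axis, in hertz."""
    return high_hz - low_hz


def octave_distance(low_hz: float, high_hz: float) -> float:
    """Separation of the same two tones on a logarithmic axis, in octaves."""
    return math.log2(high_hz / low_hz)


if __name__ == "__main__":
    # Two hypothetical pairs of tones, each exactly one octave apart.
    for low, high in [(1000.0, 2000.0), (4000.0, 8000.0)]:
        print(low, high, linear_distance(low, high), octave_distance(low, high))
```

Both pairs span one octave and so occupy the same height on a logarithmic display, whereas the linear display gives the upper pair four times as much room as the lower one.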

10307_005_fig_007.jpg

Figure 5.7 Adapted spectrogram fitted with logarithmic scale and musical octaves.

Source: J. T. Marshall Jr., “Voice in Communication and Relationships among Brown Towhees,” Condor 66 (5) (1964): 346. Reproduced with kind permission of the American Ornithological Society.

10307_005_fig_008.jpg

Figure 5.8 Sound spectrographic analysis of a wren song, with logarithmic scale and musical score superimposed.

Source: J. Hall-Craggs, “Sound Spectrographic Analysis: Suggestions for Facilitating Auditory Imagery,” Condor 81 (2) (1979): 190. Reproduced with kind permission of the American Ornithological Society.

10307_005_fig_009.jpg

Figure 5.9 Musical staff made by a sound spectrogram, superimposed with microtone intervals.

Source: J. Hall-Craggs, “Sound Spectrographic Analysis: Suggestions for Facilitating Auditory Imagery,” Condor 81 (2) (1979): 189. Reproduced with kind permission of the American Ornithological Society.

In ornithological circles, Hall-Craggs’s proposal for a spectrogram that would also be accessible to fieldworkers found a welcome reception. In the preceding years, she had been asked to assist E. Max Nicholson in editing the voice sections of the renewed handbook The Birds of the Western Palearctic, a mammoth nine-volume work whose first volume appeared in 1977 (Cramp and Simmons 1977). Taking lessons from earlier field guides, they introduced a new type of sound display, the melogram, to complement the familiar sonagram in representing species vocalizations. The melograph, the device that produced such melograms, had been developed in the early 1960s by Swedish researchers as a way to measure cardiac and respiratory activity, and refined by musicologists in search of a way of objectively analyzing instrument performances (Hjorth 1970, 5). It had been designed to provide precise information on only the fundamental frequency of a sound, not to display its entire acoustic structure (such as the overtones, as rendered by the spectrogram), and to present that frequency on a logarithmic scale, as in musical notation (Hjorth 1970, 26).

But whereas reviewers welcomed the displays as a solution to the thorny problem of song description in field guides, biologists of birdsong were more critical of her proposal.26 In a brief commentary in The Condor, Edward Miller (1980, 234) objected that although Hall-Craggs’s suggestions might have “heuristic value,” they were “clearly biased.” “A musical (or other) notation of birdsong implies a particular kind of structure or order,” he warned, adding that “we must be careful not to assume that such order exists, just because of the system of notation used.” If Hall-Craggs had identified musical qualities in the sounds she studied, he concluded, she had focused on an insignificantly small fraction of all possible animal sounds. The commentary echoed a critical analysis of Thorpe’s recent “musicological” work by two other biologists, Charles Dobson and Robert Lemon, who compared their own data on frequency intervals in white-crowned sparrow songs with Thorpe’s published data. They found no significant indication that the intervals Thorpe identified correlated with fixed musical intervals. Thorpe’s data appeared to fit, they argued, only because he had used two different scales simultaneously: “by doing so, closer conformity to musical scales could not help but occur” (Dobson and Lemon 1977, 889). Using methods like adapted musical notation, “musicians such as Messiaen have been able to simulate natural bird song with some success. But the use of the standard musical notation commonly employed may lead one to overestimate the musical nature of bird song” (Dobson and Lemon 1977, 890). The problem with musical notation, in other words, was that it implied a preconceived structure that led to subjective projections.
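Dobson and Lemon’s point about the use of two scales can be illustrated numerically. If an interval is scored by its distance to the nearest step of a reference scale, then pooling the steps of two scales can only leave that distance unchanged or reduce it. The Python sketch below is an illustration under stated assumptions: the passage does not say which two scales were at issue, so twelve-tone equal temperament and a common set of just-intonation ratios stand in for them, and all names are hypothetical.

```python
import math


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1,200 cents to the octave)."""
    return 1200.0 * math.log2(ratio)


# Assumption: stand-in scales, not the scales used by Thorpe.
EQUAL_TEMPERED = [100.0 * step for step in range(13)]
JUST_RATIOS = [1 / 1, 16 / 15, 9 / 8, 6 / 5, 5 / 4, 4 / 3, 45 / 32,
               3 / 2, 8 / 5, 5 / 3, 9 / 5, 15 / 8, 2 / 1]
JUST_INTONATION = [cents(ratio) for ratio in JUST_RATIOS]


def deviation(interval_cents: float, scale: list[float]) -> float:
    """Distance in cents from an interval to the nearest step of a scale."""
    return min(abs(interval_cents - step) for step in scale)


def compare(ratio: float) -> tuple[float, float, float]:
    """Deviation from each scale alone, then from both scales pooled."""
    size = cents(ratio)
    pooled = EQUAL_TEMPERED + JUST_INTONATION
    return (deviation(size, EQUAL_TEMPERED),
            deviation(size, JUST_INTONATION),
            deviation(size, pooled))


if __name__ == "__main__":
    # Hypothetical frequency ratios within one octave.
    for ratio in (1.13, 1.25, 1.37, 1.62):
        print(ratio, compare(ratio))
```

For a major third with a ratio of 5:4, for example, the deviation is about 14 cents from equal temperament but zero from the just scale, so the pooled reference reports a perfect fit. The pooled deviation is never larger than either single-scale deviation, whatever the interval measured.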

Although this exchange of commentaries did not make significant waves in the field, it throws into relief birdsong biologists’ epistemic investments in both the trained ear and the sound spectrograph. By combining spectrographic visualization with musical listening, a select group of Cambridge researchers not only welded together two distinct technologies that mediated and structured the acoustic world in very different ways (linearly and logarithmically) and that had constituted strongly opposed traditions since the 1920s and 1930s (see chapters 2 and 3). They also sought to recover what had initially been the defining feature of the sound spectrograph: a naturally existing language of sound that enabled sound to be represented objectively and transparently, immediately, and with intuitive intelligibility—a visual language, in other words, that brought together the objective physical properties of sound in the laboratory and their phenomenological expression in the field. The concept of visible speech itself proved difficult to achieve in practice: visual spectrographic patterns were not necessarily congruent with aural patterns, nor were they always immediately intelligible to the reading eye. However, the notational innovations that circulated at the Cambridge Sub-Department of Animal Behaviour—Thorpe and Lade’s diagrammatic notation or Hall-Craggs’s musical spectrogram—extend this aspiration. They also connect the Cambridge University biologist in the 1970s to the naturalist in the 1910s. In both periods, students of birdsong sought to combine accurate, detailed analysis with an intelligibility that allowed sounds to be sensed aurally. In both periods, too, musical notation divided proponents and critics over its perceived abilities to communicate sound effectively but within a preconceived and possibly suggestive format.

Mastery over musical notation had once granted the naturalist the authority of an expert listener. For its proponents, music still provided a frame for analyzing, discussing, and experiencing sound. For its critics, however, musical notation introduced a dangerous bias, one that could lead readers to misinterpret the score as suggesting that birds’ singing behavior was of a musical nature, or even served a musical purpose. With that concern, birdsong biologists employed a frame that had existed at least since the 1920s, when musical notation became tainted by associations with anthropomorphism, amateur naturalism, and the arts, more than with the practice of professional biologists. As Eileen Crist (1999) pointedly demonstrates, in the course of the twentieth century, professional biologists had been cultivating a highly mechanical idiom that allowed them to discuss in a technical and detached way those phenomena they had pronounced unverifiable by the human observer, such as animal mind, emotion, or aesthetic intent. To establish the study of animal behavior as a rigorous science and ward off any suspicions of anthropomorphism, ethologists thus sought to purge their observations of any such characterizations of animal subjectivity. Indeed, in bird biology too, by the 1960s, the “validity of drawing upon human experience in the interpretation of bird song” and the conviction that “bird song has a much deeper significance” had been relegated largely to the naturalist domain (Murie 1962, 181–182). Where birdsong biologists did consider aesthetic sensibility among birds, they restricted it to the specific behavior of a limited subset of species only. Birdsong biologists feared that, by repurposing musical notation, if only to bridge physical analysis and phenomenal experience, its proponents invited the human measure back in as a frame for ethological explanation.
If some birdsong biologists had dismissed notation as subjective, it was not only because it invited interpretation by a skilled listener, but especially because its means of transcription opened the door to an animal subjectivity. In contrast, the spectrogram allowed sound to be considered in terms of physical properties only, dissociating sound not only from its phenomenal (and possibly flawed) human interpretation but also from unwanted analogies to such human experience.

Visual Inscriptions

Although the sound spectrograph failed to live up to its promise to connect the “objective” and physical properties of sound with their phenomenology, biologists ultimately embraced it for wholly different reasons. Chiefly, the sound spectrograph was favored for its ability to package sound into traveling inscriptions, which Bruno Latour (1986) famously termed “immutable mobiles.” In the agonistic network perspective advocated by Latour and Woolgar (1986), such paperwork is crucial for mustering proof and sending it off to convince as many allies and critics as possible in the absence of the original phenomenon or thing. In their most basic form, these inscriptions enabled unique sound events in the field to be cast in many identical copies and circulated widely, but as Latour points out, inscriptions are not only a matter of duplication. They have a host of advantages that help us understand the importance ascribed to the sound spectrograph.

In the first place, inscriptions are highly mobile; because they can be duplicated and printed, they can be circulated more easily around a network of peers. This enabled interpretations of birdsong (whether accurate or not, conflicting or not) from very different places to be collected in a single place—the laboratory or a printed article—without having to return to the specific sound event or its recording. Inscriptions are also immutable. Recorded sounds were vulnerable to destruction (erasure) or alterations. Even for mechanical recordings, their reproduction at different locations remained variable, depending on the type, settings, and calibration of the equipment. Paper inscriptions crystallized such variables into a permanent representation, although preserving that information could be cumbersome too. Some types of oscillograms were light sensitive and in order to provide them with the permanence of a black-and-white photographic print, they had to be developed under orange safelight, fixed, washed, and dried (Greenewalt 1968, 13). Spectrograms were less labor- and time-intensive to produce, even though they still required additional work to retouch and reproduce.

Inscriptions are flat and hence easier to dominate. Sonic inscriptions, such as the musical staff, the graphic drawing, or the sonagram, have always been rendered as two-dimensional forms. Their use is analogous to that of perspective as a way of transposing an immense three-dimensional building onto paper. These forms projected the dimensions of sound (time/rhythm, frequency/pitch, amplitude/loudness) onto the flat surface of paper, where they were more easily overseen, cut up, scaled, recombined, or superimposed: in short, controlled. This is particularly evident in the routine practice of tracing images with pen and ink. In the flat dimensions of the image, sounds could be added or removed with an efficiency and economy that, despite innovations such as the parabolic microphone, simply did not exist in the field. Manipulations on paper enabled the researcher to intervene virtually (but nonetheless effectively) in the soundscape of the field. Reducing acoustic interference in the field itself demanded far more cost and effort than organizing and manipulating the soundscapes of that same field spectrographically.

By transposing sounds onto the paper in a standard layout, moreover, very different inscriptions could be made optically consistent. This enabled users to represent sounds synoptically, by juxtaposing them in the same plane. In this sense, flat and printed inscriptions work differently from sonic experience, as is illustrated by a sound spectrographic comparative analysis of western meadowlark songs by Lanyon and Fish (1958). The article presents a plate of four spectrograms of a call note (figure 5.10). Although the calls were recorded in different field locations, from Chihuahua to Wisconsin, they could be presented together in one plane and in the same scale. This commensurability of sounds sampled in very different environments enabled these authors to argue that all the notes were identical, regardless of their geographic location. The quadruple comparison of evidence was itself Lanyon and Fish’s central argument, made possible only by combining flat inscriptions and adjusting their scale of measurement. This optical consistency allowed researchers to accumulate elements from the soundscapes of dispersed geographies. To understand the additional value of such a representation in the context of bioacoustic research into song variation, one need only imagine the same fourfold presentation of the original recordings, but played out loud uninterruptedly, simultaneously, or even in rapid succession. Although trained listeners could, and indeed often did, try to compare a single pair of samples, none of the published research papers reported analytic benefits from simultaneous listening to four samples, let alone a hundred. Unlike auditory recordings, the printed inscriptions, within their flat dimensions, could be made to “speak” more easily simply by being looked at, one at a time or all at once.

10307_005_fig_010.jpg

Figure 5.10 A synoptic display of the western meadowlark by Wesley Lanyon and William Fish.

Source: W. Lanyon and W. Fish, “Geographical Variation in the Vocalization of the Western Meadowlark,” Condor 60 (5) (1958): 340. Reproduced with kind permission of the American Ornithological Society.

But the meadowlark inscriptions not only allowed researchers to present dispersed sound segments synoptically. They also enabled the authors to deal with the sequentiality of sound. After all, these unique sound events were not only tied to a specific time and place (being ephemeral) but also happened “in time” (being sequential); a record could preserve the sound itself, but when it was played back every note was still evidently replaced by the next, then the next. Whereas listening necessarily takes place in time and thus takes time, inscriptions stabilize time, allowing the reader to move back and forth or cut across sections more or less at will. It is this stabilization and reversibility of time that matters, more so than the visual organization of sound per se (as becomes clear if one imagines an analyst trying to compare four moving images simultaneously while viewing each through a narrow window).

It is, therefore, not the ability to look at sound that made the sound spectrograph and other inscription devices preferable for the study of sound, but what they allowed the researcher to do: namely, to take control over the time and space in which a sound event took place. Slowed-down gramophone recordings and combined sound spectrograms both enabled this kind of control, but evidently to different extents. The inscriptive visualization of phenomena is not an end in itself; it is not imaging itself that guarantees the authority of inscriptions among peers. Bioacousticians realized that a multitude of inscriptions could swamp an observer almost as much as the original sounds would have. For that reason, some researchers first classified the sounds according to their aural impressions, and only then spectrographed typical instances. But the advantage of imaging is that visual representations can be made to cascade with increasing efficiency into ever simpler, more abstracted, and more crystallized inscriptions. As such, a sound spectrogram of birdsong constituted a further step in the processes of sterilization, compartmentalization, and abstraction of sound that began with the focused, directional recordings produced in the field. Spectrograms could now be combined with benchmarks, geometry, and scales and thus more easily be conceived of as a set of mathematical relations. Indeed, confronted with a multitude of recorded sound data, researchers were not content merely to compare spectrographic contours. Lemon and Herzog’s (1969) inquiry into the organization of cardinal song, for instance, yielded not just a collection of sonagrams (figure 5.11), but also, and especially, numerous statistical interventions that compressed dozens of individual recordings into a single table (figure 5.12).

10307_005_fig_011.jpg

Figure 5.11 Sonagrams enabling the synoptic comparison of songs of two individual birds in one location.

Source: R. E. Lemon and A. Herzog, “The Vocal Behavior of Cardinals and Pyrrhuloxias in Texas,” Condor 71 (1) (1969): 6. Reproduced with kind permission of the American Ornithological Society.

10307_005_fig_012.jpg

Figure 5.12 One further cascade: graph representing the statistical relation between the duration of syllables and the mean number of repetitions of the syllables for each song, for six separate individuals in two different localities.

Source: R. E. Lemon and A. Herzog, “The Vocal Behavior of Cardinals and Pyrrhuloxias in Texas,” Condor 71 (1) (1969): 8. Reproduced with kind permission of the American Ornithological Society.

The sound spectrograph served this cascade particularly well because its design required the analysis of relatively short sound clips, of only a few seconds’ duration. This contrasts with musical notation, which is virtually unlimited in its ability to render long aural sequences. Students using musical notation preferred birds with varied and often extensive repertoires, such as the song sparrow. Spectrograms, instead, accommodated only vocalizations that were short and preferably repetitive, such as calls or songs of species like the chaffinch, whose variations were concise, stereotypical, and thus easy to understand “at a glance.” The differences between these short song samples were quantified by attending primarily to the distribution of sound over the vertical frequency range, rather than to its horizontal unfolding in time.

Once printed and categorized, hundreds of spectrograms were measured and the extracted information punctualized into a matrix, table, or graph. More than images, numbers can be powerfully mobilized as proof in Latour’s model, because they make it increasingly costly to dissent. As the rift between researchers at Cornell and Cambridge regarding spectrographic retouching demonstrates, single images themselves could be disqualified or disbelieved based on the way they had been produced or interpreted. They could even be countered with another image. To argue against the graph as an abstraction of many individual sound events, however, requires one to muster at least an equal mass of sound events, abstracted from the field site and cascading with equal speed and efficiency from recording, to spectrogram, to numbers, to graph.

Pragmatic Conversions

The sound spectrograph effectively exposed the perceptual limits of human hearing. Birdsongs looked more intricate on paper than they had seemed to the ear. The possibilities that looking at birdsong offered were gratefully incorporated in a new analytic tradition, which found in it a key to uncover mechanisms of song variation and by extension of learning and speciation. The spectrograph did not, however, dispense entirely with the expert ear or specific sonic skills in the field of bioacoustics: the listener got displaced, but not replaced. The point of this analysis is not to extend or restore a putative hierarchy of the senses into the bioacoustics laboratory, but to use it as an entry point to further our understanding of the complex dynamic between an instrument, representation, and the researcher’s body. What role did birdsong biologists negotiate then for embodied listening alongside the conventional spectrographic visualization?

With the sound spectrogram, birdsong biologists subscribed to a desire for an almost synesthetic interconversion of audible vibrations into visible tracings, which has been a persistent concern for instrument developers and acousticians since the early nineteenth century. In contrast to musical scores or other graphic schemes, which seemed to represent acoustic phenomena in an arbitrary fashion, automatic sound images were regarded as a natural analog to sound itself. Underlying these images was a holistic assumption that sound would be able to write itself, and that, eventually, traces could stand in for actual sounds without a loss of information. This motif of synesthetic inscriptions was embodied by the phonautograph and the phonograph, but it also surfaced in the Bell Labs experimenters’ projection of a visible language for the deaf, as well as in the graphic modifications of sonagrams by ornithologists in Cambridge and elsewhere. Like Thorpe’s diagrammatic notations or Marshall’s logarithmically scaled spectrograms, they aspired to a mechanical and universally legible codification of natural sounds. However, such correspondence was never easy to achieve in practice: the conversion of audible sound into such visual formats introduced a new and different resolution, in which sounds could be considered in terms of the distribution of acoustic energy over frequency and time. But it also confronted researchers with a loss of information about how that acoustic energy was experienced as sound, by both the human and the animal listener.

It is here that sonic skills evidently played a significant role. Listening was mobilized at moments when the limits of the spectrographic image itself became obvious and visual information failed to stand in fully for auditory perception. This was most clearly the case when the interpretation of visual traces was adjusted, classified, or corrected based on auditory impressions. Here, the perspective shifted back and forth between a sound’s physical measurement and its actual perceptual qualities.27 The difficulty in establishing a consistent correspondence between sound and trace, and in distinguishing pattern from noise, forced researchers to continue to rely on their own embodied perception, if only to assess the degree of correspondence between these formats. In such cases, hearing was mobilized as a complementary “sensuous technology” that helped to make sense of acoustic phenomena (Roberts 1995, 507).

Indeed, although spectrograph users had publicly pitted the “objectivity” of the spectrograph against the “subjectivity” of earlier methods, this does not mean that these were irreconcilable epistemic positions per se, as researchers like Thorpe routinely integrated both aspects in their practices. At the same time, the precise conditions under which trained listening was allowed to supplement the spectrograph were carefully delineated. If trained judgment was invoked, it was as a pragmatic solution, more than as an epistemic commitment to the ear. Listening did not disqualify the spectrographic image, but rather supported and sustained its operation—it provided a way to repair temporary losses of information in what otherwise counted as a successful conversion of sound into image. For instance, while visual patterning was useful in drawing up morphological classifications, categorizing sounds by their graphic shape, it was not always the most efficient approach, since spectrographing hundreds of recordings was a very time-consuming task. It was at this point that auditory impressions were mobilized, with classifications initiated on the basis of aural impressions before spectrographing began. Such exploratory or supportive work could not easily be codified or carried out mechanically, because it was often grounded in the somatic experience of field recording. In turn, although sound spectrograms were deployed to aid and guide observations in the field, they did not do away with actual, sustained experience in the field. These instances of listening were, however, not facilitated by the spectrogram. Sound spectrograms were, after all, particularly useful to bioacousticians in producing authoritative inscriptions, precisely because they congealed auditory information in a way that made it difficult to be reopened to the embodied (and likely individual) experiences of the reader. 
The pragmatic ways in which trained listening and sonic skills were integrated into bioacoustics demonstrate interactions between researchers and instruments that are often different from, sometimes richer than, and sometimes simply more efficient than what could be obtained by the mechanical image alone.

Notes