CHAPTER 6
Musical Synesthesia
In the Dionysian dithyramb man is incited to the greatest exaltation of all his symbolic faculties; something never before experienced struggles for utterance. . . . The essence of nature is now to be expressed symbolically; we need a new world of symbols; and the entire symbolism of the body is called into play.
FRIEDRICH NIETZSCHE, The Birth of Tragedy
Marcel Proust describes the experience of tasting a madeleine, a kind of French cookie, as conjuring memories of the entire world he once inhabited.
And suddenly the memory returns. The taste was that of the little crumb of madeleine which on Sunday mornings at Combray (because on those mornings I did not go out before church-time), when I went to say good day to her in her bed room, my aunt Léonie used to give me, dipping it first in her own cup of real or lime-flower tea. . . . And once I had recognized the taste . . . immediately the old grey house upon the street, where her room was, rose up like the scenery of a theatre to attach itself to the pavilion.1
Like the taste of the madeleine in Proust’s novel, musical experience incites us to respond as if to a whole perceptual world. In transporting us beyond the streams of auditory and tactile input to an impression of the larger sensorium, music effects a type of synesthesia.
This tendency of music to elicit the listener’s imaginative projection of a full sensory “world” is one of the common bases for the human experience of music. Musical synesthesia is thus one of the aspects of musical experience that contribute to its being universally so powerful. At the same time, however, cultural differences in the interpretation of the extra-auditory world with which music is connected render the interpretations connected to musical synesthesia culturally divergent. This synesthetic character of musical experience is evident in the universal tendency to model nonmusical phenomena through music. However, although synesthetic associations and the use of these in forming musical symbolisms and models appear to operate across the human species, the cultural variety of specific symbolisms and models limits the extent to which musical synesthesia serves as a basis for cross-cultural communication.
SYNESTHESIA IN EVERYDAY LIFE
What is synesthesia? In the loosest sense, the term synesthesia connotes any cross-modal connection. More strictly, synesthesia refers to unusual phenomenological associations of perceptual material from one sensory mode with perceptual material in another. I will include among sensory modes certain other kinds of perceptual awareness besides that associated with the five senses taken individually. Among these will be the kinesthetic sense and a general sense of spatiality. I will refer to synesthesia strictly defined as “idiopathic synesthesia,” for it is highly individual in its manifestations.2 When I use synesthesia without a qualifying adjective, I will use the term in a broad sense to include any cross-sensory association.
Idiopathic synesthesia involves individually specific experiences in which sensation in one mode automatically brings with it sensory experience in another.3 The most common type of idiopathic synesthesia is often called “colored hearing,” or “psychochromasthesia,” usually involving spontaneous perception of patches of color, termed “photisms,” in conjunction with hearing particular sounds.4 For some individuals the relevant sounds are musical; for others it may be certain vowels, consonants, or phonemes.5 The Walt Disney movie Fantasia offers a hint of what this type of synesthesia is like for those who have not experienced it. Early in the film, an orchestra begins to play. Gradually the screen focuses on the movements of the violin bows, which turn into a completely abstract pattern of lines, and eventually into flecks that pulsate, even appear and disappear, along with the rhythm of the music.
Not everyone is an idiopathic synesthete, but synesthesia in a loose sense is a common experience. Virtually everyone understands “brightness” and “darkness” as applied to sound as well as to visual appearance. Lawrence Marks, Robin Hammeal, and Marc Bornstein suggest that there is a basis for this in the neurophysiological similarity of the visual and auditory systems.6 Davies observes that certain timbres have a synesthetic character: “The trumpet’s upper notes are bright and the clarinet’s low register is dark; the tone of the celesta is ethereal, while high string harmonics are brittle.”7
Aristotle had already considered such qualities, which he termed “common sensibles.” The common sensibles (movement, magnitude, and number) are perceived multimodally.8 Aristotle describes them as follows.
The senses perceive each other’s special objects incidentally; not because the percipient sense is this or that special sense, but because all form a unity: this incidental perception takes place whenever sense is directed at one and the same moment to two disparate qualities in one and the same object, e.g. to the bitterness and the yellowness of bile; the assertion of the identity of both cannot be the act of either of the senses; hence the illusion of sense, e.g. the belief that if a thing is yellow it is bile.9
More recent philosophers, Maurice Merleau-Ponty and Charles Hartshorne, also contend that cross-modal experience is common. Merleau-Ponty claims that synesthesia is actually the normal condition of human perception, and he suggests a number of instances of cross-modal experience in everyday life:
The senses intercommunicate by opening on to the structure of the thing. One sees the hardness and brittleness of glass, and when, with a tinkling sound, it breaks, this sound is conveyed by the visible glass. One sees the springiness of steel, the ductility of red-hot steel, the hardness of a plane blade, the softness of shavings. . . . The form of a fold in linen or cotton shows us the resilience or dryness of the fibre, the coldness or warmth of the material. . . . In the jerk of the twig from which a bird has just flown, we read its flexibility or elasticity, and it is thus that a branch of an apple-tree or a birch are immediately distinguishable.10
Hartshorne offers another example of everyday synesthesia, suggesting that the burden of proof is on the person who considers the sensory characteristics of ice cream separable. “What blind dogmatism to deny that in the eating of ice-cream the senses of taste, of cold, of smoothness, of smell, are all so interblended, and far indeed from absolutely heterogeneous, that it is not decided kinship but significant difference of quality that is hard to detect.”11
Presumably such everyday synesthesia was what Franz Liszt, while Weimar Kapellmeister, intended to evoke when, according to anecdotal report, he urged his musicians: “‘more pink here, if you please’; or . . . ‘that is too black’; or ‘here I want it all azure.’”12
Psychologist Lawrence Marks defends the idea that synesthesia, understood broadly, is an occurrence with which everyone is familiar, arguing that synesthesia is essential to cognitive development. Synesthesia is akin to all cross-modal metaphor and to abstract verbal meaning, each of which depends on analogy. Synesthesia becomes a relevant stage in the developmental shift from recognizing similarity in the same domain to recognizing it in another, a shift that is presupposed by abstract reasoning. “Similarities among the qualities of a single sense progress to similarities among qualities of different senses, which in turn progress to similarities and resemblances that transcend simple sensory properties and partake of the myriad relationships that the mind can construct.”13
Along with his colleagues Hammeal and Bornstein, Marks suggests that early synesthesia continues even after children learn to distinguish sensation among the sensory modes. Young children find cross-sensory metaphors among the easiest metaphors to understand, and children continue to prefer perceptually based metaphors long past the stage at which they require them.14 Marks, Hammeal, and Bornstein speculate that
“stumbling onto” cross-modal similarities can precipitate a subsequent search for other similarities within diverse domains—in our view, the very crux of metaphor. . . . If so, then cross-sensory metaphors . . . may provide one key to understanding more generally the establishment in childhood of metaphoric competences.15
Marks postulates a synesthetic stage, which is not lost in childhood, but continues into adulthood, though perhaps diminished in its importance and salience.16 Richard Cytowic similarly argues that a synesthetic stage (in a loose sense) is involved throughout life in the normal perception of objects. When we perceive an object, we form an image of it in our minds. We do not immediately connect a detailed representation to the object; instead, we go through a series of refinements to our mental image as we attempt to map it onto the object that is observed (a series that unfolds at lightning speed). Eventually the image is sufficiently elaborated and coincident with the data we observe that it is finally exteriorized (i.e., projected as an entity in the external world), at which point we take it to be an accurate reflection of the object. Cytowic suggests that synesthesia involves the same processes we all use in object representation, but halted at an earlier stage.
Just as in the microgeny of word finding there is a point at which “table” and “chair” have a covalence, and either one could come out in a task of naming a four-legged piece of furniture, so too in the normal unfolding of object perception the generic form may be inadequately constrained such that it carries additional information that detaches into reality and becomes externalized as a perception in another mode, hence synesthesia.17
We experience everyday cross-modal connections very early in life. Considerable evidence suggests that three-month-old infants already connect brightness in sound with visual brightness.18 Marks, Hammeal, and Bornstein note that one-year-olds are sensitive to “several auditory-visual correspondences, notably a correspondence between rising versus falling pitch and upward versus downward-pointing arrows.”19 Here we already encounter a music-related synesthetic association (one that may be culturally relative, given that some cultures reverse what others, including ours, term high and low notes). Another, however, is more noteworthy in connection with the topic of this book. Psychologist Daniel Stern suggests that the emotional and social development of infants depends on their perception of the behavior of their caregivers. Significantly for my purposes, this perception is initially not limited to one or two sensory streams.
According to Stern, infants experience their caregivers in terms of “vitality affects”—dynamic, amodal, kinetic characteristics20—before they experience them as identifiable individuals. The “underlying feature” of the vitality affects is what Stern calls “activation contour.” Various experiences can be characterized by a similar dynamic description if they share “similar envelopes of neural firings,” even if these occur “in different parts of the nervous system.”
Because activation contours (such as “rushes” of thought, feeling, or action) can apply to any kind of behavior or sentience, an activation contour can be abstracted from one kind of behavior and can exist in some amodal form so that it can apply to another kind of overt behavior or mental process. These abstract representations may then permit intermodal correspondences to be made between similar activation contours expressed in diverse behavioral manifestations. Extremely diverse events may thus be yoked, so long as they share the quality of feeling that is being called a vitality affect.21
Infants recognize the vitality affects in the context of interacting with their own bodies and with the behaviors of others in their environment.
Stern explicitly proposes that vitality affects also demonstrate how disparate kinds of experiences—such as music and extramusical content—can be yoked. According to Stern, infants develop a sense of being with other people by attuning themselves to these vitality affects. Attunement to the vitality affects depends importantly on infants’ “musical” sensitivity to rhythm, which itself engages multiple sensory modes.22 Our original sense of being related to others, then, presupposes a basic musical capacity, one that fuses the streams of our sensory perception into impressions of another being as a whole, a point that we will return to in chapter 8.
EVERYDAY SYNESTHESIA AND MUSIC
We have prima facie grounds for thinking that synesthesia is relevant to musical experience. We seem to perceive music multimodally. The fact that deaf people are able to respond to music through the sense of touch is one indication of multimodal perception. Music is particularly suited to invite synesthetic response. It prominently features “amodal sensory qualities,” E. M. von Hornbostel’s term for such qualities as brightness, darkness, and roughness, which are recognizable by more than one sense.23 The vocabularies of many cultures in connection with music indicate cross-modal associations. As we noted previously, many cultures describe pitch in terms of spatial images: high and low. The cross-cultural employment of such terms suggests that multisensory engagement with music is both common and not restricted to a single culture. Some Western societies correlate musical keys with sensations geared to other senses, too. The German use of Dur and Moll (hard and soft) to refer to major and minor keys is a case in point. The Kota tribe of south India use olfactory metaphors for music.24 The Aboriginal peoples of Australia refer to recognizing a song through its taste or smell.25 The Kaluli of Papua New Guinea employ the cross-modal metaphors of “lift-up-over sounding” and “hardness” to describe their musical ideal.26
We also respond to music multimodally. We feel like dancing, or at least moving, along with most music. Nietzsche describes our muscular response to music in his account of the Dionysian dithyramb, although he notes that our movements are typically inhibited.
Music, as we understand it today, is . . . a total excitement and a total discharge of the affects, but even so only the remnant of a much fuller world of expression of the affects, a mere residue of the Dionysian histrionicism. To make music possible as a separate art, a number of senses, especially the muscle sense, have been immobilized (at least relatively, for to a certain degree all rhythm still appeals to our muscles).27
The impact of music on our musculature—the tendency to tap a foot or rock one’s torso with music—is familiar to all of us.28 I have seen infants too young to talk bouncing their whole bodies in accord with musical beat. Elvis Presley’s gyrations are only a more extreme case of the rhythmic movements to music we are all disposed to make. We learn to suppress these inclinations in the Western concert hall, but many of us catch ourselves swaying a bit or pulsing a toe along with the music in spite of ourselves. In such situations, we associate the art of sound with our kinesthetic awareness.
We might recall here Charles Nussbaum’s account of how music’s kinesthetic involvement is basic to the way our brains represent music, specifically, as virtual layouts and environments that we mentally navigate in the same way that we navigate external space.29 We noted earlier Nussbaum’s suggestion that our overall bodily involvement with music is the basis for the ubiquitous phenomenon of modeling nonmusical domains in music. One instance of such modeling is the impression of music mimicking our activity, discussed in chapter 4.
Our kinesthetic response to music, like kinesthetic response in general, presupposes the communication of the senses. Gestalt psychologist W. S. Boernstein draws on Hornbostel’s conception of “amodal sensory qualities” in his account of kinesthesia. Boernstein emphasizes the role played by physical tonus, the state of mild tension in which the organs of a living body are maintained, and also a means by which the modalities communicate. Because of this systemic tension, stimulation of one organ has an impact on others. Boernstein suggests that the integration of the effects of the multiple sensory modes and amodal sensory stimulation on physical tonus is crucial for spatial orientation, a precondition of internalized movement.30 Physical tonus is thus involved in all sensory experience, and Boernstein considers the connection between the senses and motor functions to be essential to the perception of the amodal sensory qualities.31 The stimulation of amodal sensory qualities, kinesthetic sensation (which integrates physical tonus), and the naturalness of conceiving of music in terms of space (both metaphorically and through physical movement) shed light on why music makes us feel like dancing or otherwise moving along with it.
Besides impact on physical tonus, music activates the limbic system, which is located in the lower brainstem and the areas just above it. On the basis of his experiments with idiopathic synesthetes, Cytowic concludes that the hippocampus, central to the limbic system, is particularly involved in such synesthesia. The hippocampus allows the sensory modalities to communicate with one another, because it is the point at which both external and internal sensory inputs converge before being transferred to the cortex. Cytowic notes that after analysis by the cortex, the inputs return to the limbic system, which determines what is important enough to attend to.32 The limbic system is the emotional core of the nervous system, and it is crucial in forming novel reactions, making value judgments, and experiencing emotion.
Music’s physical appeal is obvious, but one might still doubt that cross-modal stimulation is involved. Music, narrowly conceived, is addressed to audition, and secondarily to touch. However, I propose that music’s appeal to the auditory and tactile senses is precisely what inspires cross-modal imagery and response across the sensorium as a whole.
Music is organized so as to be particularly salient to our senses of hearing and touch. Although music seems to address itself only to this limited range of senses, the clarity and immediacy of its presence in these domains suggest to our awareness that we are encountering a reality that transcends our own bodies. Because our senses normally operate together, the sensory intensity of the experience that we have in connection with music, I suggest, stimulates the rest of the sensorium as well.33 Although most of us are not idiopathic synesthetes who see photisms in connection with music, our minds are still disposed to form or seek content for the full range of our senses.34 Thus we use cross-modal imagery in our speech about music, and we are so inclined to move with music that we often do so unawares.
What is going on when we are inclined to conjure responses in other sensory modes in connection with our experience of music? I submit that music encourages such associations, perhaps paradoxically, because it is addressed to a limited range of senses. Because music is designed to make pattern salient, we perceive it as having particular immediacy. Normally, our perception of sounds is but one of several ways that we experience the larger world. In the usual case, our senses operate in tandem. Occasionally we focus on one sense in particular, for example when we are having our eyes tested, but generally speaking, hearing and touch, like our other senses, are among multiple means we use together to orient ourselves to the world. Because music is so salient to us, it makes us strongly aware of connection with a larger reality. This sense of connection enlivens our other senses at the same time, motivating us to respond with our entire bodies to music that engages us. We experience our perception of sound in the case of music as connecting us to the very world that we experience through the range of our senses.35
THE UNIVERSAL DIMENSIONS OF SYNESTHESIA
Synesthetic responses to music are ubiquitous, and we might wonder how much they might serve as a basis for cross-cultural communication. The role that vitality affects appear to have in infant development, with the rhythmic sensitivity that this implies, suggests some common ground on which musical interaction might be built.36 Music researcher Manfred Clynes connects synesthetic potential for expression with what he takes to be universal emotional states, suggesting a hard-wired basis for emotional communication across cultures. He suggests that much of our expressive behavior in connection with music is linked to what he calls “sentic states,” specific emotional states that he contends are universal. These sentic states can be expressed through various “output modalities,” including “a variety of motor modes: gestures, tone of voice, facial expression, a dance step, musical phrase, etc.”37 Clynes claims that human beings are hard-wired to use “precise elements of communication faithful to specific qualities and also to recognize these elements when communicated by others” across this range of sensory modes. In several series of experiments, Clynes found consistent patterns in the vertical and horizontal components of the finger pressure subjects would use to tactilely express particular emotional states. He monitored head position, respiration, and heart rate, which were also common across subjects attempting to express the same emotional state. Clynes found the same patterns across studies in the United States, Japan, rural Mexico, and Bali. He concludes that the correlation between particular affect and motor expression is inherent in the human nervous system. Although one might wish for further empirical research to demonstrate the same commonalities in different modalities and across additional cultures, Clynes’s research is suggestive for an understanding of how synesthetic response might further cross-cultural communication.38
Although our disposition to synesthetic response appears to be a typical human tendency, the particulars that are conjoined with music can be culturally, even individually, specific.39 (This is evident whenever we conjoin music and language.) How do these particulars become conjoined? One obvious way is through convention. The cross-modal images used to describe music (the height of pitches and the brightness of tone, for example) are acquired in the context of learning other culturally standard ways to refer to things. But many of the symbolisms or modelings associated with music seem entrenched. One hears them in the music. How do cultural conventions become so tenacious that they come to seem “natural”?
A musical case of such a “natural” association is the Western tendency to hear music in major keys as “happy” and music in minor keys as “sad.” I am frequently asked, when I mention my interest in the universal characteristics of music to Westerners, if music in major keys really does turn out to sound happy all around the world. I reply that not only do many cultures have no major keys, the correlation of happy/sad with major/minor is not even longstanding in the West. Nor is it consistent even in relatively recent music. Consider “Lili Marlene,” a song about a soldier separated from his lover by the war, popular among both Allied and German soldiers during World War II. Although it does end with the image of Lili’s face remaining in the soldier’s dreams and his hopes to return someday, it is a melancholy song—and yet it is in a major key.40 My conversational partners are often astonished by my answer. That major is happy and minor is sad just sounds natural to them. Where does this sense of “natural” come from?
Judith Becker suggests an answer to this question in her work on the nature of trance experiences.41 She describes trance behavior in a number of cultures, behavior that varies remarkably. For example, a fifteenth-century peasant woman in the boot heel of Italy (Apulia) is bitten by a tarantula spider (or believes she has been bitten) during the hot months of the summer.42 She goes into a stuporous depressive trance, for which the cure is dancing the tarantella. The woman’s family hires musicians, who try various pieces of music until they hit one that prompts the woman to dance. She dances, wildly, often obscenely, perhaps going into the marketplace with the musicians, where others (including relapsed former patients) might join her. The dance continues until she drops from exhaustion. Another of Becker’s examples of culturally distinctive trances comes from Bali. When the community is out of equilibrium, the Barong/Rangda ritual is performed to restore balance. This ceremony produces trance in the young men who are present, all of whom dance a kris (knife) dance, in which they stab themselves with knives. Considering that trancing individuals are not using everyday consciousness, but indeed are in an altered state of which they will likely have no memory, how do the trancers go into the culturally appropriate type of trance? Why did the Italian woman who believed herself to be bitten by a tarantula dance the tarantella instead of stabbing herself with a knife?
Becker calls the sequence of culturally expected behaviors during trance the cultural “script.” She refers to Gerald Edelman’s theory of neuronal group selection to explain how these scripts become so internalized that they direct the behavior of individuals in trance states. This theory was proposed in part to explain why areas all over the brain are activated by sensory stimuli, not just those that are specialized with respect to the sense in question. Edelman contends that bundles of neurons in the brain are activated together in an operational unit, and that groups of such bundles, called maps, can become connected with each other. Bundles that interact with each other frequently enough develop into “classification couples,” groups of neurons that will be simultaneously activated when the stimulus that initially connected them occurs. The appropriate kind of stimulus, exciting a certain group or groups of neurons, will also excite other groups that have habitually been activated at the same time. The result is that “the initial perceptual stimulus comes, through structural coupling in a ritual context, to excite large areas of the brain with no necessary connection with the original perception.”43 This, according to Becker, is why trancers have the specific experiences that their cultures prescribe. Referring directly to music as a stimulus, Becker asserts,
the auditory system . . . consists of neuronal groupings, each responding to a different aspect of the incoming signal, that is, timbre, pitch, loudness, melody, rhythm, harmony, stress, and so on. Through reentrant or looping processes, that is, synaptic connections going to many other parts of the brain, we are simultaneously, or so it seems, aware of the last time we heard this piece, or one like it, as well as concomitant feelings of joy, sadness, or even fear. . . . In this way, a particular sensory stimulus acts as physiological metonym—one part (music) invokes the whole mythology and its accompanying behavior and emotional feel.44
Becker’s account, suggesting how associations formed in connection with music come to be engaged automatically, is applicable to the culturally specific synesthetic experiences we have in connection with music. Our Western association of major and minor with happy and sad has become part of our cognitive mapping. These associations seem natural because the appropriate neuronal groups linked to the happy/sad opposition, being habitually triggered at the same time, have come to fire automatically when we hear music in major or minor keys. Had we been musically raised in a society without major or minor keys, or without the same emotional pairings, these associations, even if we could recognize them, would not seem “natural” at all.45
Becker’s account also provides an explanation of why cross-modal connections with music are typical of human beings. Our initial social bonding and our cognitive development depend on our being able to link perceptions from our various sensory modalities. Insofar as music is perceptually salient, but salient primarily for hearing and touch, I have argued, it enables and encourages associations with objects and processes that may be primarily salient for other modalities. Particular associations, sufficiently repeated (by virtue of being frequently encountered in the environment where music-making occurs), become deeply ingrained in the sense of being neurologically coupled. Forming deeply ingrained associations with music is a universal propensity. The multimodal images used to talk about music in various societies offer evidence of this tendency.
If “stumbling onto” cross-modal metaphors is the norm, however, we have no reason to think that the cross-modal associations formed in connection with music will be universal. The variation among cross-modal images used by different societies indicates that although the tendency to employ such images may be universal, the images themselves are not.46 Indeed, we have no reason to expect all such associations to be shared by entire cultures. Those who experienced Bugs Bunny’s version of Wagnerian opera may commonly connect “The Ride of the Valkyries” with Bugs Bunny, but this would hardly be the standard across American society. Perhaps others from my grade school class that was exposed to classical music while we were supposed to consume our lunch without talking share my association of “The Hebrides” with these strange circumstances, but I recognize that this reaction is unusual to the point of idiosyncrasy.
Music serves to connect us with the external world and motivates us to forge associations along multiple sensory lines. In this respect, it bears a noteworthy resemblance to language. A significant difference, however, is that language requires conventional (i.e., established and standardized) associations between words and those things to which they refer. The associations between aspects of music and aspects of extramusical experience seem in some cases to be as “natural” as the meanings of words seem to someone fluent in a particular language. But as we have seen, divergent backgrounds can result in divergent associations, even with the same particular music. Extramusical associations with features of music are not as established and standardized as the denotations of words.
Feld’s analysis of the “interpretive moves” that listeners employ in response to music provides an explanation of why societies will inevitably differ with respect to what images they conjoin with music.47 He suggests, quite plausibly, that all musical listening involves the listener’s active efforts to relate the music to his or her broader experience. Feld maintains that we foreground and background various aspects of music, making judgments that relate them to a whole schema of relations to other, often extramusical concerns. Feld emphasizes the extent to which many interpretive judgments are specific to social groups, which makes sense, given that one’s sense of group membership often depends on shared location, classification schemes, backgrounds, experiences, metaphors, and values. Thus, for example, the Aboriginal Australians connect songs with particular contours of the landscape, as we considered in chapter 2, while many Westerners associate organ music with ritual experiences in Christian churches.
Feld’s account would also seem to imply, however, that differences in individual backgrounds, whether or not they conform to specific group patterns, will have significant impact on the way particular individuals will interpret the music they hear. The unique set of an individual’s personal experiences may influence which locations, categories, associations, reflections, and evaluations that individual will draw upon in interpreting music as personally relevant.48 Thus music comes to have both personal meaning and shared meanings, or what Constantijn Koopman and Stephen Davies call “meaning-for-us.”49
CONCLUSION
Music draws attention to our participation in a larger world, in part through its powerful multimodal appeal. However, its synesthetic potential enables considerable cross-cultural diversity in the more detailed associations between music and the larger world. Once again, as with musical preference universals, variety is the consequence of an underlying commonality.
The closest we have come so far to a basis in music for cross-cultural comprehension has been music’s communication of affect. The grounds for the variant of the adage that termed music the universal language of emotion seem evident. Thus far, we have emphasized prosody as a universal basis for emotional communication through music as well as speech. I will suggest in the following two chapters that other features of music yield cross-cultural emotional communication as well.