Cross-modal Correspondences
Zohar Eitan
Like the mercilessly overcited Monsieur Jourdain, music practitioners and scholars have been engaged with cross-modal correspondences for centuries without knowing it.1 Musicians use cross-modal correspondences (CMC)—“systematic associations found across seemingly unrelated features from different sensory modalities” (Parise, 2016)—when employing Western music notation, where “higher” pitch is located higher on the page, and changes in loudness are depicted by changes in spatial width (i.e., crescendo and diminuendo wedges); when, as composers or improvisers, they apply slow, pianissimo, muted or low-register sound to depict a dark night; or when (as conductors) they use rising or expanding hand gestures to enhance an orchestral or choral crescendo. Such commonplace musical activities—as well as the most basic terms of music-related vocabulary, like “high” or “low” tones (associating musical pitch and spatial location), “bright” sound (associating musical timbre with visual luminosity), or “soft” sound “volume” (associating loudness with both touch and size)—all indeed employ systematic associations between musical features and “seemingly unrelated” features of non-auditory modalities.
Numerous empirical studies in human perception and cognition, using converging experimental methodologies, have investigated CMC. Several excellent surveys of that research have been published (Marks, 2004, 2014; Parise & Spence, 2013; Spence, 2011; Walker, 2016), and this chapter will not attempt to replicate them. Rather, the chapter primarily aims to suggest how current CMC research—though performed mainly in non-musical contexts, using simple auditory stimuli—may inform musical thought and practice, and how studies of CMC utilizing music-specific features and contexts may in turn enhance CMC research, associating it with complex, culturally significant contexts.
CMC do not seem to present a single, unified phenomenon. Rather, the term has been applied to diverse mappings, which utilize a variety of psychological mechanisms, have different origins, and fulfill a gamut of psychological roles, ranging from basic perceptual and motor functions to the shaping of language metaphors and cultural practices. Below, I briefly survey some of the mappings, psychological functions and sources constituting CMC. Discussing the ramifications of that diversity for music, I highlight gaps remaining to be filled in music-related CMC research.
What musical features partake in CMC, and with what features of non-auditory modalities are they associated? CMC research involving auditory dimensions has mainly focused on pitch height and loudness, possibly since both are bipolar dimensions, structured along a more-or-less axis (higher/lower, louder/softer), and are thus readily comparable with bipolar dimensions such as visual brightness (bright/dark), physical size (large/small; e.g., Gallace & Spence, 2006), or spatial elevation (high/low). Fewer studies have examined CMC involving other auditory domains, such as tempo (Küssner, Tidhar, Prior, & Leech-Wilkinson, 2014) or dimensions of acoustic timbre (e.g., Pitteri, Marchetti, Priftis, & Grassi, 2015). Importantly, hardly any empirical CMC research has examined whether and how musically specific features and structures (e.g., tonal stability or metric hierarchy) systematically associate with non-auditory features. Rather, even when applying musical stimuli, CMC research typically focuses on their basic auditory features, ignoring possible cross-modal associations of music-specific, higher-level configurations.
For basic auditory dimensions, however (particularly pitch and loudness), research has established a variety of CMC, involving multiple visual, kinesthetic, tactile, and even gustatory and olfactory features. For instance, high-pitched sounds, in addition to their established association with higher spatial position, are (compared to lower pitch) bright, visually light, fast, small, thin, lightweight, sharp, hard, dry, and cold. They also associate perceptually with faster vibrotactile frequencies, are either sweet or sour (while low pitches are bitter or salty), and consistently match specific odors (for research surveys, see Marks, 2004; Spence, 2011; Walker, 2016).
Similarities and differences between cross-modal mappings of auditory dimensions may elucidate their connotative relationships in surprising ways. Consider, for instance, the CMC of pitch and loudness. Loud sound and high pitch match in many of their mappings: both are associated with spatially higher position (Evans & Treisman, 2010; Eitan, Schupak, & Marks, 2008), brighter light (e.g., Marks, 1987), and also sharper shapes and harder surfaces (Eitan & Rothschild, 2010). Furthermore, loudness and pitch have themselves been shown to be congruent dimensions, such that higher pitch is perceptually associated with louder sound (Melara & Marks, 1990), and rising pitch with crescendi (Neuhoff & McBeath, 1996). Yet, despite their congruence, loudness and pitch contrast with regard to mappings related to size and mass: high pitch is small, thin, and lightweight, while loud sound “volume” is associated with large (voluminous), thick, and heavy physical objects.
The proliferation of CMC research on basic auditory dimensions notwithstanding, substantial lacunae concerning music-specific cross-modal correspondences await investigation. As noted above, even when actual musical stimuli are used (rather than rarefied stimuli like single sine tones), CMC research mainly examines music as sound, focusing on basic features shared by most auditory domains, like pitch and loudness. Little CMC research examines intrinsically musical features, such as harmonic and melodic intervals, chord structure, or modality, and CMC involving higher-level musical structures, such as tonality, metric hierarchy, or rhythmic and melodic configuration, are almost completely ignored.
One reason such investigation may be worthwhile is the abundance and importance of cross-modal metaphor in historical and contemporary discourse involving musical structures such as Western tonality (Rothfarb, 2001). Lakoff and Johnson (1980) showed how abstract concepts are constructed in terms of physical metaphors, a perspective adapted to investigate metaphors for musical structure (Zbikowski, 2008, this volume). Tonal relationships have been mapped, for instance, onto motion and its underlying forces (e.g., gravity; Rameau, 1722), spatial image schemas like center/periphery, top/bottom, or front/back (see Spitzer, 2004), or visual brightness (“dark” chromaticism vs. “bright” diatonicism; e.g., Boulez, 1986).
Studying musically specific CMC may be particularly intriguing since CMC may implicitly and subconsciously affect perception and behaviour. Are musically specific CMC also associated implicitly with musical meanings and behaviours? Do non-musicians, for instance, perceive chromatic tones as “darker” than stable diatonic tones (see Maimon, 2016, for an exploratory study)? For a fuller understanding of CMC in musical contexts, one would need to elucidate such issues empirically.
A second musically relevant lacuna in CMC research concerns the interaction of dynamic (time-varying) auditory parameters. Music usually involves simultaneous changes in multiple parameters. Few studies, however, have systematically examined how such interactions affect the listener’s cross-modal mappings, and these studies (Eitan & Granot, 2011; Küssner et al., 2014) present surprising interactions. For instance, when combined with a diminuendo, an accelerando may lose its association with accelerated physical motion, and pitch direction may become dissociated from vertical spatial motion (e.g., a rising pitch in diminuendo would no longer be associated with spatial rise). Such findings indicate that CMC derived from experiments involving a single pair of dimensions (e.g., pitch direction and vertical motion) do not necessarily predict mappings resulting from the complex interaction of multiple dimensions, characteristic of musical contexts. Further examination of multi-parametric interactions in musical contexts is thus called for.
One reason CMC may be particularly intriguing to music research is their role in both high-level and low-level mental processes. While often involved with activities and cultural products requiring conscious awareness, contemplation, and conceptualization (such as the application of synesthetic metaphors to music, or the use of “tone painting”; Zbikowski, 2008, this volume), CMC have also been shown to affect basic perceptual and cognitive domains automatically and subconsciously.
CMC effects cover a range of basic perceptual and information-processing domains. CMC may affect the perception of elementary perceptual attributes, such as spatial location or movement direction: we actually perceive “higher” pitches as stemming from spatially higher locations (Pratt, 1930), and pitch direction may affect the perceived spatial direction of concurrent visual motion (Maeda, Kanai, & Shimojo, 2004). CMC also enhance cross-modal binding (the vital capacity to associate stimuli from different sensory modalities with the same source object), affecting our ability to associate visual and auditory stimuli to each other in time and space (Parise & Spence, 2009). Likewise, CMC affect perceptual learning of cross-modal pairings, facilitating implicit learning of congruent cross-modal pairings such as low pitch and dark visual stimuli, as compared to incongruent pairings (Brunel, Carvalho, & Goldstone, 2015). CMC also affect selective attention (the ability to focus on a specific stimulus or feature in our environment while ignoring others), as demonstrated in numerous speeded classification and speeded detection tasks (Marks, 2004); and CMC may influence motor responses as well: for instance, responses to a high pitch via a spatially high key are faster and more accurate than responses via a low key (Rusconi et al., 2006).
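To make the paradigm concrete, the following minimal sketch, using entirely hypothetical reaction-time data, shows how a congruency effect is typically computed in such a speeded-classification task: trials are split into congruent pairings (high pitch classified via the upper key, low pitch via the lower key) and incongruent ones, and mean reaction times are compared.

```python
# Minimal sketch of quantifying a pitch-elevation congruency effect.
# All trial data are invented for illustration; a real study would use
# many participants, counterbalancing, and inferential statistics.
from statistics import mean

trials = [
    # (pitch, response_key, reaction_time_ms) -- hypothetical values
    ("high", "upper", 412), ("high", "lower", 471),
    ("low",  "lower", 405), ("low",  "upper", 468),
    ("high", "upper", 398), ("low",  "lower", 417),
    ("high", "lower", 455), ("low",  "upper", 480),
]

def is_congruent(pitch, key):
    """Congruent = high pitch with the spatially upper key, low pitch with the lower key."""
    return (pitch == "high") == (key == "upper")

congruent = [rt for p, k, rt in trials if is_congruent(p, k)]
incongruent = [rt for p, k, rt in trials if not is_congruent(p, k)]

# The congruency effect: faster responses when pitch and key location match.
print(f"congruent mean RT:   {mean(congruent):.0f} ms")
print(f"incongruent mean RT: {mean(incongruent):.0f} ms")
print(f"congruency effect:   {mean(incongruent) - mean(congruent):.0f} ms")
```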
Importantly, such CMC effects do not require conscious awareness or conceptualization. Rather, they mostly influence mental processes and their outcomes implicitly, automatically, and subconsciously. This suggests that CMC may also operate under the surface of music processing, implicitly shaping musical structures, emotions, and meanings.
However, the implicit and automatic qualities of CMC highlight a lacuna in our understanding of their musical roles. Cross-modal mappings and metaphors significantly partake in the conceptual processing of music, from its basic vocabulary to seemingly abstract systems of music theory and analysis (Zbikowski, 2008, this volume). The gap between such rarefied cultural products and the elementary perceptual functions of CMC, as studied by experimental psychologists, may seem insurmountable. Bridging that challenging gap, however, may be highly rewarding for music researchers of diverse orientations: cognitive, historical, and theoretical.
As their involvement in basic perceptual functions may suggest, some CMC featured prominently in musical contexts—for instance, the associations of higher pitch with higher elevation or small physical size, or of increased loudness with increased visual brightness—apply cross-culturally and pre-linguistically. Indeed, increasing evidence suggests that these cross-modal mappings are not solely based on culture-specific conventions or on language idioms, though both sources may partake in shaping their applications in specific cultural contexts. Rather, they reflect cross-cultural, possibly universal dispositions to associate dimensions in different sense modalities.
Spence (2011) proposes two possible sources for such cross-cultural dispositions. “Statistical” CMC stem from natural correlations of stimulus dimensions, experienced since infancy and internalized through processes of statistical learning. Such, for instance, are the associations of pitch and physical size, loudness and distance, and even pitch and elevation (Parise, Knorre, & Ernst, 2014). “Structural” CMC stem from inborn neural connections, or analogies in neural processing. For instance, increases in stimulus intensity (e.g., louder sound, brighter light) are often associated with increased neural firing rate (Stevens, 1957).
Several lines of research suggest that some CMC may indeed be based on such universal (though not necessarily innate) origins: studies of pre-linguistic infants; comparative cross-cultural and cross-linguistic research; and ethological studies, examining CMC as reflected in the behavior of non-human species (e.g., Morton, 1994; Ludwig, Adachi, & Matsuzawa, 2011). Yet, CMC may also stem from culture- or language-specific origins; they may arise, for instance, when identical or similar terms are used to describe dimensions in different modalities, as when “high” and “low” are used to denote both pitch and spatial elevation (“semantically” mediated CMC; Spence, 2011). To illustrate the interactions between apparent universal tendencies and the effects of specific language and culture, I briefly describe findings of infant and cross-cultural studies, and then survey recent research demonstrating the complexity of such interactions.
As repeatedly demonstrated by studies utilizing the preferential looking paradigm, infants (<6 months old) associate rising and falling pitch, respectively, with rising and falling visual stimuli (Dolscheid, Hunnius, Casasanto, & Majid, 2014; Wagner, Winner, Cicchetti, & Gardner, 1981; Walker et al., 2010), as well as higher pitch with sharper (Walker et al., 2010), thinner (Dolscheid et al., 2014), and smaller (Fernández-Prieto, Navarra, & Pons, 2015) visual stimuli. Infants as young as 10 months associate higher pitch with brighter colour (Haryu & Kajikawa, 2012). A physiologically-based CMC involving loudness and visual brightness was found in 3-week-olds, who transferred a cardiac attenuation response generated by exposure to light onto sound of comparable intensity (Lewkowicz & Turkewitz, 1980). These studies suggest that several types of CMC, all commonly used in music and musical discourse, are either inborn or learned through very early experience, and in any case are not dependent on language or acculturation.
Several lines of cross-cultural research suggest that CMC involving auditory features are not solely bound by language or cultural convention. One line of this research involves sound symbolism—the systematic association of sound and meaning in speech. Sound symbolism studies indicate cross-cultural consistency in associating specific features of speech or vocal sound with non-auditory features, such as shape, height, or size (see Hinton, Nichols, & Ohala, 2006, for a research survey). For instance, Köhler’s (1929) well-known demonstration of sound-shape association, in which the nonsense words “maluma” and “takete” were almost unanimously associated by Westerners with rounded and spiky visual shapes, respectively, was exhibited just as strongly by the Himba of Northern Namibia, a remote population with hardly any contact with Western culture (Bremner et al., 2013).
Analogies between sound and movement features may provide another source of cross-cultural correspondences. Sievers, Polansky, Casey, and Wheatley (2013) investigated such analogies by examining how Western and non-Western participants depict basic emotions through music and through movement. Sievers and associates applied a model that represents corresponding features of music and movement analogously (tempo/movement rate, jitter, interval/step size, pitch direction/vertical movement direction, consonance/surface smoothness). A computer program utilizing the model generated both movement patterns (animations of a moving ball) and musical sequences (monophonic melodies). Participants were asked to create movement animations or musical sequences subjectively representing five basic emotions (angry, happy, peaceful, sad, and scared). Results revealed that each emotion was represented by a unique combination of features, and that this combination was represented similarly in movement and in music. Importantly for the present concern, these cross-modal representations were shared by participants of both cultures.
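The logic of such a shared-parameter design can be illustrated with a small sketch. The Python fragment below is a schematic illustration only, not Sievers and associates’ actual program: the function name, parameter names, and the “happy” and “sad” settings are invented. It simply shows how one set of values for rate, direction, step size, and jitter can drive both a monophonic pitch contour and a ball’s vertical trajectory, so that the two media share a dynamic structure.

```python
# Schematic sketch: one parameter set generates both a melody and a motion path.
import random

def generate(params, length=16, seed=1):
    """Return (onset_times_s, midi_pitches, ball_heights) from one parameter set."""
    rng = random.Random(seed)
    onsets, pitches, heights = [0.0], [60.0], [0.5]   # start: t = 0, middle C, mid-screen
    for _ in range(length - 1):
        onsets.append(onsets[-1] + 1.0 / params["rate"])       # rate = events per second
        # One shared step: direction and step size are common to both media;
        # jitter adds random irregularity to the contour.
        step = params["direction"] * params["step_size"] + rng.gauss(0, params["jitter"])
        pitches.append(pitches[-1] + step)                      # step in semitones
        heights.append(min(1.0, max(0.0, heights[-1] + step / 24.0)))  # scaled to screen
    return onsets, pitches, heights

# Hypothetical settings: "happy" = fast, rising, jittery; "sad" = slow, falling, smooth.
settings = {
    "happy": {"rate": 4.0, "direction": +1, "step_size": 2.0, "jitter": 1.0},
    "sad":   {"rate": 1.5, "direction": -1, "step_size": 1.0, "jitter": 0.2},
}

for label, params in settings.items():
    _, melody, motion = generate(params)
    print(label, [round(p) for p in melody[:6]], [round(h, 2) for h in motion[:6]])
```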
Cross-cultural studies have also examined specific audio-visual CMC directly. Parkinson, Kohler, Sievers, and Wheatley (2012), using a speeded classification paradigm (a paradigm often applied to investigate audio-visual CMC; Marks, 2004), established that members of an isolated community in Northeastern Cambodia, whose language does not use spatial terminology for pitch, implicitly associate pitch direction with vertical visuo-spatial direction. Correspondingly, Westerners applied non-Western metaphors for high and low pitch, of whose original use they had no prior knowledge, in accordance with that use, suggesting that these metaphors may be grounded in basic, cross-cultural CMC (Eitan & Timmers, 2010).
While tendencies for certain cross-modal associations may be universal, these tendencies are not necessarily realized universally in the same ways. For instance, though the pitch/elevation mapping is prevalent across cultures and languages, languages use many other mappings to denote auditory pitch (Eitan & Timmers, 2010), and in at least one language (the Austronesian language ‘Are’are), our “high” pitch is denoted as “low” (Zemp & Malkus, 1979). A key question is, then: how do “natural” cross-modal predispositions (innate or acquired through statistical learning prior to the acquisition of language or enculturation) interact with cross-modal mappings suggested by language or cultural convention?
A series of studies by Dolscheid and associates (Dolscheid, Shayan, Majid, & Casasanto, 2013; Dolscheid et al., 2014) demonstrates how complex such interactions may be. In Dolscheid et al. (2013), Dutch speakers (whose language maps pitch onto spatial elevation) and speakers of Farsi (which maps pitch onto thickness—“lower” pitch is thicker—rather than elevation) were asked to sing back a tone while viewing lines that varied in elevation or thickness. Elevation, but not thickness, affected Dutch speakers’ pitch reproduction; Farsi speakers’ performance, in contrast, was affected by thickness, but not by elevation.
These results could suggest that even when non-linguistic tasks are involved, language and acculturation (rather than presumed “natural” tendencies) shape adults’ CMC. Implicit associations of pitch with both elevation and thickness, however, were found in 4-month-old infants (Dolscheid et al., 2014; Walker et al., 2010). Are such early CMC simply extinguished later in development by language and cultural practice? Several studies suggest a more complex interaction. In Dolscheid et al. (2013), pitch reproduction of Dutch speakers trained to use the thickness metaphor was affected, as in Farsi speakers, by the thickness of irrelevant visual stimuli. However, training for a “reversed-thickness” mapping, in which higher pitch is associated with thicker lines, did not produce any effects. This suggests that linguistic metaphors and other cultural practices do not create CMC, but rather modify the expression of preexisting tendencies, such as those revealed in infant studies.
Indeed, the transition from early non-linguistic correspondences to mappings codified in language may create complex interactions even when the two kinds of mappings are expected to support each other. For instance, young Hebrew-speaking children were unable to consistently apply pitch-elevation mappings in music-induced movement or in motion imagery tasks, though Hebrew uses elevation terms for pitch (Eitan & Tubul, 2010; Kohn & Eitan, 2016). Rather, they often applied other CMC, such as loudness-elevation, more consistently. Thus, while language metaphors (and other cultural symbols and practices) may ultimately strengthen corresponding pre-linguistic mappings, such metaphors may initially hinder their early equivalents. This apparently counterintuitive conclusion suggests that there are still considerable gaps to fill in our understanding of how the interaction of “nature” and “nurture” shapes CMC—gaps which are of particular importance for the historically and culturally laden realm of music.
At this point, a reader may wonder whether and how CMC are related to synesthesia—that intriguing condition in which an experience in one perceptual domain vividly arouses an experience of a different, unrelated domain. How, for instance, is pitch-color synesthesia—in which specific auditory pitches vividly evoke specific color hues—distinguished from (and related to) the widespread association of pitch and visual lightness (Marks, 1987)? And what roles may each phenomenon have in music processing?
Synesthesia and CMC may use similar mechanisms. Thus, the association of lighter color with higher pitch guides both “genuine” pitch-color synesthetes and the rest of us, who merely experience pitch-lightness CMC. Indeed, researchers debate whether synesthesia and CMC are distinct phenomena, or different points along the same continuum (Marks & Mulvenna, 2013; Parise & Spence, 2013). This debate notwithstanding, one may point out some distinctions between the two phenomena.
First, CMC lack the vivid, conscious perceptual experience of the induced dimension characteristic of synesthesia. Pitch-lightness correspondence, for instance, does not involve actually seeing lighter colors when hearing higher pitches. Nevertheless, as discussed above (under the heading “Functions”), CMC do affect perception in important ways, both consciously and subconsciously.
Second, synesthesia is involuntary: the induced synesthetic sensation is activated automatically, does not require any conscious effort, and cannot be inhibited at will. While CMC also involve automatic perceptual processes, their manifestations are amenable to voluntary control, and may be affected by training (e.g., Dolscheid et al., 2013).
Third, synesthetic mappings tend to be absolute (context-independent) for each synesthete, and are consistent over time. Thus, sound-color synesthetes associate the same tones and color hues in repeated tests, conducted months apart (Ward, Huckstep, & Tsakanikos, 2006). In contrast, CMC are mostly contextual and variable over time: thus, we systematically associate higher pitch with lighter color, but do not consistently match a particular pitch with a specific degree of lightness when presented in isolation or in different contexts (e.g., when the range of compared pitch or lightness values changes).
However, though synesthetic mappings may be remarkably consistent for individual synesthetes, they vary considerably among synesthetes. For instance, the pitch-color mappings of composers reputed to be synesthetes (e.g., Scriabin, Rimsky-Korsakov, Messiaen) vary widely (Shaw-Miller, 2013).
Finally, while CMC are widespread, and some may be universal (see “origins” above), vivid synesthesias are rare; surprisingly, sound-color synesthesias, their cultural prominence notwithstanding, are among the rarest (Simner et al., 2006).
Synesthesia is fascinating, and may serve as an important tool in studying perceptual and cognitive processing, both behaviorally and neurophysiologically. Yet music-color synesthesias, being rare and idiosyncratic conditions, cannot easily serve to explain how the rest of us assign connotative meanings to music. Cross-modal correspondences, though less exotic than hearing colors or tasting shapes, affect a range of perceptual and cognitive functions deeply, widely and consistently, and are thus highly relevant to understanding how listeners, performers and composers apply and process musical meanings.
The auditory features partaking in CMC often associate with emotional features as well. For instance, low pitch, associated with features such as dark colour or low elevation, also suggests negative emotional valence, particularly sadness (Collier & Hubbard, 2001). Correspondingly, non-auditory sensory dimensions that often map onto features of sound, such as brightness or spatial height, are strongly associated with emotion. Such relationships are evident not only through language metaphors and idioms (e.g., “dark” and “bright” moods; “high” and “low” spirits), but also in non-verbal measures of emotion, often expressed implicitly. For instance, positively valenced words are processed faster when printed in lighter shades of grey, and negative words faster when printed in darker shades (Meier, Robinson, & Clore, 2004); complementarily, the valence of evaluation words affects brightness perception—“good” words are perceived as brighter (Meier & Robinson, 2005). Similarly, a word’s valence affects spatial-visual attention: positive words shift attention upward, and negative words downward (Meier & Robinson, 2004); comparably, moving objects up or down enhances recall of positive and negative episodic memories, respectively (Casasanto & Dijkstra, 2010).
In interpreting CMC in music, then, one may consider an interconnected triad: mappings of sound and non-auditory perceptual dimensions, mappings of sound and emotion, and mappings of non-auditory dimensions and emotion (e.g., low pitch—dark colour, low pitch—sadness, dark colour—sadness). An intriguing hypothesis concerning this triadic complex of mappings suggests that some cross-modal mappings are mediated, at least in musical contexts, by emotion. That is, musical features may correspond with non-auditory features since both associate with emotion in a similar way. For instance, the association between low pitch and dark colour may be activated by the shared association of these dimensions with negative valence. Thus, the apparent synesthetic correspondence would actually be a second-order reflection of the emotional associations of sound and vision.
The emotional mediation hypothesis has been recently corroborated for several dimensions. Palmer, Schloss, Xu, and Prado-León (2013) show, in a cross-cultural study, that listeners’ colour and emotional associations of musical pieces are strongly correlated (see also Lindborg & Friberg, 2015), while Levitan, Charney, Schloss, and Palmer (2015) found that emotion mediates correspondences between music and smell. Bhattacharya and Lindsen (2016) show that perceived emotional valence of music biases judgements of visual brightness: stimuli were rated as brighter after listening to positively-valenced music, compared to negatively-valenced music.
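The correlational logic behind this kind of evidence can be sketched in a few lines. The fragment below is a toy illustration with invented ratings, not data from the cited studies: it simply checks whether the emotional valence attributed to musical excerpts predicts the emotional valence of the colours listeners choose for them, which is the pattern the mediation account expects.

```python
# Toy illustration of the correlational logic of emotional mediation.
# All ratings are hypothetical; the cited studies use real participant data,
# multiple emotion dimensions, and cross-cultural samples.
import numpy as np

# Hypothetical mean valence ratings for eight musical excerpts (-1 negative, +1 positive)
# and for the colours participants chose for those excerpts.
music_valence = np.array([0.8, 0.5, -0.6, -0.9, 0.2, -0.3, 0.7, -0.7])
colour_valence = np.array([0.7, 0.4, -0.5, -0.8, 0.1, -0.4, 0.6, -0.6])

# A strong positive correlation is what the mediation account predicts.
r = np.corrcoef(music_valence, colour_valence)[0, 1]
print(f"music valence vs chosen-colour valence: r = {r:.2f}")
```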
Support for the role of emotion in mediating CMC in music is also provided by studies indicating that the emotions associated with specific visual stimuli influence the perception of music presented concurrently with these stimuli (Boltz, 2013; Timmers & Crook, 2014). Likewise, the positive or negative valence of words enhances the perception of simultaneously presented high and low pitches, respectively (Weger, Meier, Robinson, & Inhoff, 2007). Evidence supporting the role of emotion in CMC—particularly the correspondences of musical and movement features—is provided by Sievers et al. (2013), discussed above, revealing that basic emotions are characterized across cultures by a unique combination of analogous musical and movement features.
This intriguing research notwithstanding, the interrelations of emotional and cross-modal mappings in complex musical contexts, and particularly the ways listeners perceive and respond to such interrelationships, have hardly been explored empirically. As Eitan, Timmers, and Adler (in press) demonstrate, conflicts between the emotional and cross-modal connotations of musical features (such as rising pitch) may often underlie complex connotative contexts, particularly when text or visual imagery are also involved. The investigation of such interactions is, then, a challenge for both music analysis and music psychology.
Music is uniquely positioned to serve as a real-world laboratory for the examination of cross-modal correspondences in complex settings. As noted at the beginning of this chapter, cross-modal mappings pervade music and its related activities and artifacts. Across historical and cultural domains, mappings of musical features onto visual, spatial or kinesthetic features (directly or through verbal mediation) are presented through text-settings, “programme” and descriptive music, music aimed to induce movement (e.g., dances, marches, work-songs), and diverse audio-visual musical multimedia, including opera, film music, and music for computer and video games (Tan, Cohen, Lipscomb, & Kendall, 2013). CMC are also ubiquitous in music-induced movement, both spontaneous and pre-organized (Godøy & Leman, 2010; Kohn & Eitan, 2016), in aspects of musical notation, and in music-related vocabulary and metaphor (Zbikowski, 2008), including the conceptual metaphors underlying music theories (Zbikowski, 2005).
While research on cross-domain mappings in these diverse musical contexts is not scant (see also chapters by Clarke and Zbikowski, this volume), several general directions for further research may be pointed out. First, as noted above, empirical CMC research needs to explore facets of the inherent complexity of music: how music-specific features and structures affect CMC; how dynamic interactions of multiple musical parameters are reflected in music’s cross-modal mappings; and how emotional and cross-modal mappings of auditory features interact in musical contexts. Second, perception and performance studies of CMC may be complemented by computer-assisted musical corpus studies. For instance, quantitative studies of musical text settings may systematically explore the association of musical features (extracted from musical scores and recordings) with specific visual, spatial, kinesthetic or tactile references or connotations in the text. Such studies may allow for systematic comparisons of musical CMC across musical styles and genres, highlighting composers’ idiosyncrasies, style-specific idioms, and perhaps universals. Correspondences of musical or auditory features with visuo-kinetic dimensions in film and other audiovisual media could also be investigated quantitatively, applying some of the rich apparatus developed for multimedia information retrieval (see Rüger, 2009, for a survey). Combined with current CMC research methods, such largely unexplored avenues for investigation may produce valuable, perhaps unexpected returns for both music research and cognitive science.
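As a pointer toward what such a corpus study might look like, here is a minimal sketch using the open-source music21 toolkit. The file path and the word lists are hypothetical simplifications; a real study would use a large, systematically encoded corpus, lemmatization, and proper statistical testing. The sketch asks whether syllables carrying “height” words are set, on average, to higher pitches than syllables carrying “depth” words.

```python
# Minimal text-setting corpus sketch under stated assumptions (hypothetical file,
# simplified word lists): compare the pitch height of notes whose lyrics contain
# "height" words with that of notes whose lyrics contain "depth" words.
from statistics import mean
from music21 import converter

HIGH_WORDS = {"high", "heaven", "rise", "up", "above"}
LOW_WORDS = {"low", "deep", "fall", "down", "below"}

score = converter.parse("songs/example_song.xml")   # hypothetical encoded song

high_pitches, low_pitches = [], []
for n in score.flatten().notes:
    if not hasattr(n, "pitch") or not n.lyric:       # skip chords and untexted notes
        continue
    word = n.lyric.lower().strip(".,;!?")
    if word in HIGH_WORDS:
        high_pitches.append(n.pitch.midi)
    elif word in LOW_WORDS:
        low_pitches.append(n.pitch.midi)

if high_pitches and low_pitches:
    print(f"mean MIDI pitch on 'height' words: {mean(high_pitches):.1f}")
    print(f"mean MIDI pitch on 'depth' words:  {mean(low_pitches):.1f}")
```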
1. Monsieur Jourdain, Molière’s Bourgeois gentilhomme, was surprised to learn that he had been speaking prose all his life.
Marks, L. E. (2014). The unity of the senses: Interrelations among the modalities. Cambridge, MA: Academic Press.
Spence, C. (2011). Cross-modal correspondences: A tutorial review. Attention, Perception & Psychophysics, 73, 971–995.
Tan, S. L., Cohen, A., Lipscomb, S., & Kendall, R. (Eds.) (2013). The psychology of music in multimedia. Oxford: Oxford University Press.
Walker, P. (2016). Cross-sensory correspondences: A theoretical framework and their relevance to music. Psychomusicology: Music, Mind, & Brain, 26, 103–116.
Bhattacharya, J., & Lindsen, J. P. (2016). Music for a brighter world: Brightness judgment bias by musical emotion. PLoS ONE, 11, e0148959. doi:10.1371/journal.pone.0148959
Boltz, M. (2013). Music videos and visual influences on music perception and appreciation: Should you want your MTV? In S. L. Tan, A. Cohen, S. Lipscomb, & R. Kendall (Eds.), The psychology of music in multimedia (pp. 217–235). Oxford: Oxford University Press.
Boulez, P. (1986). Orientations. J-J Nattiez (Ed.), M. Cooper (Trans.). London: Faber.
Bremner, A. J., Caparos, S., Davidoff, J., de Fockert, J., Linnell, K. J., & Spence, C. (2013). “Bouba” and “Kiki” in Namibia? A remote culture makes similar shape–sound matches, but different shape–taste matches to Westerners. Cognition, 126, 165–172.
Brunel, L., Carvalho, P. F., & Goldstone, R. L. (2015). It does belong together: Cross-modal correspondences influence cross-modal integration during perceptual learning. Frontiers in Psychology, 6, 358. doi:10.3389/fpsyg.2015.00358
Casasanto, D., & Dijkstra, K. (2010). Motor action and emotional memory. Cognition, 115, 179–185.
Collier, W. G., & Hubbard, T. L. (2001). Judgments of happiness, brightness, speed and tempo change of auditory stimuli varying in pitch and tempo. Psychomusicology, 17, 36–55.
Dolscheid, S., Hunnius, S., Casasanto, D., & Majid, A. (2014). Prelinguistic infants are sensitive to space-pitch associations found across cultures. Psychological Science, 25, 1256–1261.
Dolscheid, S., Shayan, S., Majid, A., & Casasanto, D. (2013). The thickness of musical pitch: Psychophysical evidence for linguistic relativity. Psychological Science, 24, 613–621.
Eitan, Z., & Granot, R. Y. (2011). Listeners’ images of motion and the interaction of musical parameters. Paper presented at the 10th Conference of the Society for Music Perception and Cognition (SMPC). Rochester, NY.
Eitan, Z., & Rothschild, I. (2010). How music touches: Musical parameters and listeners’ audiotactile metaphorical mappings. Psychology of Music, 39, 449–467.
Eitan, Z., Schupak, A., & Marks, L. E. (2008). Louder is higher: Cross-modal interaction of loudness change and vertical motion in speeded classification. In K. Miyazaki, Y. Hiraga, M. Adachi, Y. Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and Cognition (ICMPC10). Adelaide, Australia: Causal Productions.
Eitan, Z., & Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, 114, 405–422.
Eitan, Z., Timmers, R., & Adler, M. (in press). Cross-modal correspondences in a Schubert song. In D. Leech-Wilkinson & H. Prior (Eds.), Music and shape. Oxford and New York: Oxford University Press.
Eitan, Z., & Tubul, N. (2010). Musical parameters and children’s images of motion. Musicae Scientiae, 14, 89–111.
Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10, 1–12.
Fernández-Prieto, I., Navarra, J., & Pons, F. (2015). How big is this sound? Cross-modal association between pitch and size in infants. Infant Behavior and Development, 38, 77–81.
Gallace, A., & Spence, C. (2006). Multisensory synesthetic interactions in the speeded classification of visual size. Perception & Psychophysics, 68, 1191–1203.
Godøy, R. I., & Leman, M. (Eds.). (2010). Musical gestures: Sound, movement, and meaning. New York, NY: Routledge.
Haryu, E., & Kajikawa, S. (2012). Are higher-frequency sounds brighter in colour and smaller in size? Auditory-visual correspondences in 10-month-old infants. Infant Behavior and Development, 35, 727–732.
Hinton, L., Nichols, J., & Ohala, J. J. (2006). Sound symbolism. Cambridge: Cambridge University Press.
Köhler, W. (1929). Gestalt psychology. New York, NY: Liveright.
Kohn, D., & Eitan, Z. (2016). Moving music: Correspondences of musical parameters and movement dimensions in children’s motion and verbal responses. Music Perception, 34, 40–55.
Küssner, M. B., Tidhar, D., Prior, H. M., & Leech-Wilkinson, D. (2014). Musicians are more consistent: Gestural cross-modal mappings of pitch, loudness and tempo in real-time. Frontiers in Psychology, 5, 789. doi:10.3389/fpsyg.2014.00789
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press.
Levitan, C. A., Charney, S. A., Schloss, K. B., & Palmer, S. E. (2015). The smell of jazz: Cross-modal correspondences between music, odor, and emotion. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, & P. P. Maglio (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (pp. 1326–1331). Austin, TX: Cognitive Science Society.
Lewkowicz, D. J., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory–visual intensity matching. Developmental Psychology, 16, 597–607.
Lindborg, P., & Friberg, A. K. (2015). Colour association with music is mediated by emotion: Evidence from an experiment using a CIE Lab interface and interviews. PLoS ONE, 10(12), e0144013.
Ludwig, V. U., Adachi, I., & Matsuzawa, T. (2011). Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proceedings of the National Academy of Sciences, 108, 20661–20665.
Maeda, F., Kanai, R., & Shimojo, S. (2004). Changing pitch induced visual motion illusion. Current Biology, 14, R990–R991.
Maimon, N. (2016). Bright tonic, grey subdominant? Cross-modal correspondence between tonal stability and visual brightness. M.A. Thesis (Cognitive Psychology), Tel Aviv University.
Marks, L. E. (1987). On cross-modal similarity: Auditory-visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception and Performance, 13, 384–394.
Marks, L. E. (2004). Cross-modal interactions in speeded classification. In G. Calvert, C. Spence, and B. E. Stein (Eds.), Handbook of multisensory processes (pp. 85–106). Cambridge, MA: MIT Press.
Marks, L. E., & Mulvenna, C. M. (2013). Synesthesia, at and near its borders. Frontiers in Psychology, 4, 651. doi:10.3389/fpsyg.2013.00651
Meier, B. P., Robinson, M. D., & Clore, G. L. (2004). Why good guys wear white: Automatic inferences about stimulus valence based on brightness. Psychological Science, 15, 82–87.
Meier, B. P., & Robinson, M. D. (2004). Why the sunny side is up: Associations between affect and vertical position. Psychological Science, 15, 243–247.
Meier, B. P., & Robinson, M. D. (2005). The metaphorical representation of affect. Metaphor and Symbol, 20, 239–257.
Melara, R. D., & Marks, L. E. (1990). Interaction among auditory dimensions: Timbre, pitch, and loudness. Perception & Psychophysics, 48, 169–178.
Morton, E. (1994). Sound symbolism and its role in non-human vertebrate communication. In L. Hinton, J. Nichols, and J. Ohala (Eds.), Sound symbolism (pp. 348–365). Cambridge: Cambridge University Press.
Neuhoff, J. G., & McBeath, M. K. (1996). The Doppler illusion: The influence of dynamic intensity change on perceived pitch. Journal of Experimental Psychology: Human Perception and Performance, 22, 970.
Palmer, S. E., Schloss, K. B., Xu, Z., & Prado-León, L. R. (2013). Music–colour associations are mediated by emotion. Proceedings of the National Academy of Sciences, 110, 8836–8841.
Parise, C. V. (2016). Cross-modal correspondences: Standing issues and experimental guidelines. Multisensory Research, 29, 7–28.
Parise, C. V., Knorre, K., & Ernst, M. O. (2014). Natural auditory scene statistics shapes human spatial hearing. Proceedings of the National Academy of Sciences, 111, 6104–6108.
Parise, C. V., & Spence, C. (2009). ‘When birds of a feather flock together’: Synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE, 4, e5664. doi:10.1371/journal.pone.0005664
Parise, C., & Spence, C. (2013). Audiovisual cross-modal correspondences in the general population. In J. Simner, & E. M. Hubbard (Eds.), The Oxford handbook of synaesthesia (pp. 790–815). Oxford: Oxford University Press.
Parkinson, C., Kohler, P. J., Sievers, B., & Wheatley, T. (2012). Associations between auditory pitch and visual elevation do not depend on language: Evidence from a remote population. Perception, 41, 854–861.
Pitteri, M., Marchetti, M., Priftis, K., & Grassi, M. (2015). Naturally together: Pitch-height and brightness as coupled factors for eliciting the SMARC effect in non-musicians. Psychological Research, 15, 1–12.
Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278–285.
Rameau, J.-P. (1722). Traité de l’harmonie réduite à ses principes naturels. Paris: Ballard.
Rothfarb, L. (2001). Energetics. In T. Christensen (Ed.), The Cambridge history of Western music theory (pp. 927–955). Cambridge: Cambridge University Press.
Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 20, 1–17.
Shaw-Miller, S. (2013). Synesthesia. In T. Shephard, & A. Leonard (Eds.), The Routledge companion to music and visual culture. New York, NY: Routledge.
Sievers, B., Polansky, L., Casey, M., & Wheatley, T. (2013). Music and movement share a dynamic structure that supports universal expressions of emotion. Proceedings of the National Academy of Sciences, 110, 70–75.
Simner, J., Mulvenna, C., Sagiv, N., Tsakanikos, E., Witherby, S. A., Fraser, C., & Ward, J. (2006). Synesthesia: The prevalence of atypical cross-modal experiences. Perception, 35, 1024–1033.
Spitzer, M. (2004). Metaphor and musical thought. Chicago, IL: University of Chicago Press.
Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.
Timmers, R., & Crook, H. (2014). Affective priming in music listening: Emotions as a source of musical expectation. Music Perception, 31, 470–484.
Wagner, Y. S., Winner, E., Cicchetti, D., & Gardner, H. (1981). “Metaphorical” mapping in human infants. Child Development, 52, 728–731.
Walker, P., Bremner, J. G., Mason, U., Spring, J., Mattock, K., Slater, A., & Johnson, S. P. (2010). Preverbal infants’ sensitivity to synaesthetic cross-modality correspondences. Psychological Science, 21, 21–25.
Ward, J., Huckstep, B., & Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use cross-modal mechanisms common to us all? Cortex, 42, 264–280.
Weger, U. W., Meier, B. P., Robinson, M. D., & Inhoff, A. W. (2007). Things are sounding up: Affective influences on auditory tone perception. Psychonomic Bulletin & Review, 14, 517–521.
Zbikowski, L. M. (2005). Conceptualizing music: Cognitive structure, theory, and analysis. New York, NY: Oxford University Press.
Zbikowski, L. (2008). Metaphor and music. In R. Gibbs, Jr. (Ed.), The Cambridge handbook of metaphor and thought (pp. 502–524). Cambridge: Cambridge University Press.
Zemp, H., & Malkus, V. (1979). Aspects of ‘Are’are musical theory. Ethnomusicology, 23, 5–48.