The Role of Music in Shaping Interpretations of Film
Siu-Lan Tan
Each musical soundtrack creates its own particular type of film and plot.
(Bullerjahn & Güldenring, 1994, p. 112)
Billy Wilder’s film The Lost Weekend (1945) opens with a panoramic shot of the New York City skyline in the 1940s, and continues panning across the back of a brick building, passing by two windows and stopping at the last window. Inside, a man is packing a suitcase and pausing as if to reflect. What could this man be thinking? What are his life circumstances? What could happen next? These were the questions Vitouch (2001) asked participants to answer as they wrote continuations of the plot.
Viewers who watched the excerpt with the original soundtrack to the film, which includes a score by Miklós Rózsa (consisting of pleasant and lush orchestral music, with a few strains of a theremin at the open window), tended to provide positive or ambivalent story continuations. For instance: “The man is in a good mood, has a secure and well-paid job, and is just preparing a meal for his new love (candle-light dinner for two). Then they go for a walk and explore the beauties of this city” (Vitouch, 2001, p. 76). Those who viewed an altered version of the film produced by pairing the same film clip with an excerpt of Barber’s Adagio for Strings that had been shown to evoke “sadness” and “melancholy” in previous studies (e.g., Krumhansl, 1997) produced more negative story continuations. For example: “The man has been left by his wife. He’s visiting the places where he thinks she could be. When he finally finds her together with another man, he shoots him” (Vitouch, 2001, p. 76).
The content analysis of the participants’ written responses revealed that the music even seemed to alter the perception of the weather and impression of the cityscape, with some participants describing “lovely weather” and “beautiful surroundings” (with Rózsa’s score) versus “all gray and desperate” and “the city looks depressing, without a perspective for the future” (with Barber’s music) (p. 79). Overall, Vitouch found a close match between the participants’ description of the music and the character or mood of their story continuations, even though each participant saw only one version of the film and was not told that the focus of the study was on the music.
In his book The Art of Film Music, the late musicologist and film composer George Burt wrote of the vital role of the score in storytelling:
Music has the power to open the frame of reference to a story and to reveal its inner life in a way that could not have been fully articulated in any other way. In an instant, music can deepen the effect of a scene or bring an aspect of the story into sharper focus. It can have a telling effect on how the characters in the story come across—on how we perceive what they are feeling or thinking—and it can reveal or expand upon subjective aspects and values associated with places and ideas intrinsic to the drama . . . accenting this or that instant or event to help bring out the connections and divergent points of view.
(1994, pp. 4–5)
These words were penned by Burt in 1994, the same year that the journal Psychomusicology published a special issue described by guest editor Annabel Cohen as “the first collection of articles devoted entirely to the experimental psychology of film music” (Cohen, 1994, p. 2), written by contributors who were “pioneers in a new field” (p. 7). Prior to this, the role of music in film had been the focus of only a few empirical investigations, in spite of music’s constant presence in film, from the live keyboard and orchestral accompaniment of so-called “silent” films in the early 1900s to the scores of sound films since the 1920s. Almost two decades after the Psychomusicology monograph, the first book consolidating the scientific research on music in film and other media was published: The Psychology of Music in Multimedia, a volume edited by Tan, Cohen, Lipscomb, and Kendall (2013).
As the present chapter will show, every line of George Burt’s statement has been supported by one or more studies in the still modest but steadily growing area of empirical work focusing on film music. Other reviews explore the various functions music serves in film (Cohen, 1999) and emotional dimensions of film music (Cohen, 2010) or provide general surveys of film music research methods and findings (e.g., Cohen, 2014; Tan, 2016; Tan, in press). The present chapter does not duplicate these efforts but sets out to examine the role of music in shaping viewers’ interpretation of film scenes and the evolving storyline. The focus of the chapter is on the epic or narrative function of music in film as described by Bullerjahn, which refers to the way film music supports the narrative course (as cited in Kuchinke, Kappelhoff, & Koelsch, 2013, p. 127).
Film music presents a compelling case for psychological study, as music plays an integral role in film and has been shown to have profound effects on many facets of the film viewing experience, without being in the “spotlight” of our attention for very long. This paradox inspired film music theorist Claudia Gorbman’s title for her influential 1987 book, Unheard Melodies: Narrative Film Music. In most narrative films, the storytelling is foregrounded, whereas stylistic elements—such as camera movement, editing, and the musical score—are often intended to serve the narrative without drawing much attention to themselves.1 For instance, there are 1,000 to 2,000 film edits in a typical 90-minute Hollywood movie, but most viewers do not notice the majority of the cuts. Psychologists attribute this “edit blindness” to effective film editing practices designed to bind separate shots together,2 to coincidences with eye blinks and saccades, and especially to inattentional blindness as attention is focused elsewhere on characters, action, and story (Smith & Henderson, 2008).
Similarly, most viewers have only a momentary or fleeting awareness of the score while watching a film. And when the music credits roll, most viewers do not recall having heard most of the pieces—even if the music is familiar, such as in a compilation score. Conversely, in a pilot study, my colleagues and I found that two-thirds of our 31 participants reported having heard music in one or more of six film clips when in fact there was no music at all (Tan, Spackman, & Bezdek, 2007). How can music significantly influence and even alter viewers’ perception of film when we are only momentarily aware of its presence (or absence)? This paradox underlies many film music studies and has yet to be fully resolved (see Cohen, 2014).
Our apparent lack of awareness of the film score, relative to the visual component of film, is not merely anecdotal. The phenomenon of visual dominance—which refers to how visual input often overrides information from other senses—has been widely demonstrated in studies on perception and memory (e.g., Posner, Nissen, & Klein, 1976; Spence, 2009). More specifically, in audiovisual contexts, attention is often allocated preferentially to the visual component when both auditory and visual stimuli are present. To cite a musical example, Schutz and Lipscomb (2007) showed that the perceived duration of a tone produced by a marimba can be altered by showing a video of the marimba player striking the key with a long or short arm gesture. Auditory dominance has also been demonstrated, though much less frequently. For instance, in the sound-induced flash illusion or SIFI, a single flash of light accompanied by two beeps in rapid succession is often perceived as two flashes (e.g., Shams, Kamitani, & Shimojo, 2000). Some studies show that auditory information may be dominant in audiovisual presentations of stimuli with emotional or affective components (see Kuchinke et al., 2013, for a review).
Most investigations of visual dominance employ speeded discrimination tasks with simple stimuli such as tones and flashing lights. Film music provides a context for studying multimodal processing with more meaningful materials of longer duration, often with a strong emotional component. The participants’ attention is often directed to the film, and the focus on the music is typically concealed from the participants, so as not to bring more attention to the music track than when viewing a film in real-world contexts (see Tan, in press). Yet clear patterns often emerge between the responses of participants watching the same film clip with different music tracks. Music even influences viewers’ perception of the intensity of emotional interactions when participants are instructed to focus only on the video (e.g., Bolivar, Cohen, & Fentress, 1994), and incidental learning of (mood-congruent) musical soundtracks has been observed in a memory task, even when participants were instructed to memorize only the visual component of the film (Boltz, 2004). When it comes to film music, the motivating force for researchers is not so much to discover whether visual or auditory dominance prevails, but to better understand how information from multiple senses is integrated to construct meanings that amount to more than each sense can contribute singly or jointly.
For many, the earliest experiences of the storytelling power of a pictorial image can be traced back to a parent pointing to objects or figures in a picture book and commenting on them, shaping selective attention and attaching fragments of narrative to the shared focus of interest. Likewise, directing the viewer’s attention to particular elements of a film scene and ascribing meaning to those elements is fundamental to the comprehension of a scene and to the shaping of an unfolding story in the conscious mind. Can music play such a role in our experience of a film—and if so, how?
In a landmark study in 1988, Marshall and Cohen added two different music tracks to a silent black-and-white animation of geometric shapes, and found that the music changed viewers’ interpretations of some elements of the film. For instance, a small triangle was rated as significantly more “active” when accompanied by what they called “strong” music (in a minor key, played in octaves and chords in the lower register, with a slow but accelerating tempo) than when accompanied by “weak” music (in a major key, with a single line melody played in an upper register, and an unchanging moderate tempo). It is notable that the music tracks in Marshall and Cohen’s (1988) study did not have sweeping effects on the participants’ impression of all elements in the animation in a wholesale fashion, but altered the perception of only specific elements of the scene. The researchers speculated that some features of the “strong” music must have directed attention to the small triangle. They proposed that structural accents in this particular piece of music happened to coincide with many movements of the small triangle, directing attention to it because of its temporal correspondence to the music. Further, the music was strong and lively in character, and thus the characteristic of being “active” became attributed to the focus of attention (i.e., the small triangle).
In sum, two mechanisms were proposed: congruence and association. (1) Congruence: When the structure of the music coincides with events on screen, the eye is drawn to points of temporal (i.e., time-based) matching. Marshall and Cohen referred to this as “temporal congruence” (1988, p. 18). Even infants look longer at video screens that match the timing of a soundtrack than at screens that are mismatched (e.g., Spelke, 1976), suggesting that humans have a tendency to seek meaningful congruencies among incoming data from different senses. (2) Association: The meaning of the music then becomes ascribed to the focus of attention; that is, the “connotations” of the music become attached to these elements, thus shaping our interpretation of the scene. These two ideas laid the foundations for the Congruence-Associationist Model or CAM (e.g., Cohen, 2013a), a cognitive theory of music in multimedia that is discussed later in this chapter.
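The notion of temporal congruence can be made concrete with a toy computation. The function below is purely illustrative and not drawn from Marshall and Cohen’s study: it scores how many structural accents of the music fall close in time to movements of an on-screen element, using invented onset times.

```python
# Hypothetical sketch: quantifying temporal congruence as the fraction of
# musical accents that coincide (within a tolerance window) with onsets of
# movement of an on-screen element. All times below are invented.

def congruence_score(accent_times, movement_times, window=0.2):
    """Fraction of musical accents (in seconds) that fall within
    `window` seconds of some on-screen movement onset."""
    if not accent_times:
        return 0.0
    matched = sum(
        1 for a in accent_times
        if any(abs(a - m) <= window for m in movement_times)
    )
    return matched / len(accent_times)

# Structural accents of the "strong" music vs. movements of a small triangle
accents = [0.5, 1.0, 1.5, 2.0, 2.5]
triangle_moves = [0.55, 1.45, 2.05]
print(congruence_score(accents, triangle_moves))  # 3 of 5 accents matched -> 0.6
```

On this account, the on-screen element with the highest such score would be the one most likely to attract the viewer’s attention, after which the music’s connotations could be ascribed to it.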
Eye-tracking represents a potentially fruitful way to test the proposition of temporal congruence in film viewing, as this technique involves recording the movement of the eyes, and monitoring fixations (points in space where the eyes stabilize momentarily), saccades (movements involved as the eyes shift from one fixation to another), and scanpaths (the pattern or sequence of fixations and saccades over time). This technology has been used to study gaze behavior while viewing dynamic scenes, although only a handful of studies have examined how music may influence visual attention while viewing films (e.g., Auer et al., 2012; Mera & Stumpf, 2014; Smith, 2014; Wallengren & Strukelj, 2015).
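The gaze measures named above can be illustrated with a minimal dispersion-threshold (I-DT-style) fixation detector, a standard way of segmenting raw gaze samples into fixations and saccades. The thresholds and synthetic gaze coordinates below are invented for illustration, not taken from any of the studies cited here.

```python
# A minimal dispersion-threshold (I-DT-style) fixation detector: consecutive
# gaze samples that stay within a small spatial window form a fixation; the
# jumps between such clusters correspond to saccades.

def dispersion(window):
    """Horizontal plus vertical spread of a set of (x, y) gaze points."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=25, min_samples=4):
    """samples: (x, y) gaze points at a fixed sampling rate.
    Returns (start, end) index pairs of detected fixations."""
    fixations = []
    i = 0
    while i + min_samples <= len(samples):
        if dispersion(samples[i:i + min_samples]) <= max_dispersion:
            j = i + min_samples
            # grow the window while the points stay tightly clustered
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            fixations.append((i, j - 1))
            i = j
        else:
            i += 1  # this sample belongs to a saccade; slide the window on
    return fixations

# A stable cluster, a saccade-like jump, then a second stable cluster
gaze = [(100, 100), (102, 101), (101, 99), (103, 100),
        (400, 300), (402, 299), (401, 301), (403, 300)]
print(detect_fixations(gaze))  # [(0, 3), (4, 7)]
```

A scanpath, in these terms, is simply the ordered sequence of such fixations together with the saccades between them.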
In one study, Mera and Stumpf (2014) selected a scene3 from Michel Hazanavicius’ silent film The Artist (2011) and added music tracks intended either to direct attention toward the main characters or to diffuse the focus. The selected clip was two and a half minutes long, and showed two characters (George and Peppy) crossing paths on a multilevel staircase with many other people passing by, thus providing multiple competing visual points of interest (as shown in Figure 30.1). The “focusing” music track was selected to direct attention to “the interplay between the central characters with melodic, textural, and orchestrational materials that change[d] . . . fluidly to match the narrative dynamics of the scene” (2014, p. 8). In contrast, the “distracting” music track was designed to disperse attention to the many possible focal points of the scene, and consisted of fast, lively music in 2/2 time with relentless high energy, avoiding audiovisual connections between the music and key elements in the scene. Consistent with the proposition of temporal congruence, Mera and Stumpf (2014) found that participants who watched the version with the “focusing” music track made the fewest gaze shifts and had the longest gaze durations, compared to those who watched the version with the “distracting” music track or the (original) “silent” version.
Figure 30.1 Scene from The Artist with many visual points of interest (Studio 37 Orange, 2011). See endnote 3 for details on the selected scene and film.
Of further interest was a shot at the end of the clip, in which the main focus of interest (George) appears very small in an extreme wide-shot, standing still while several passersby move at other points on the staircases. The “focusing” music was meant to draw attention to George’s motionless presence with a sudden shift from full orchestration to a single sustained note played by the violins (a common “focusing” device in film scoring); the tempo and meter also changed suddenly at this point to mark a shift in focal point from the previous shot. Indeed, those in the “focusing” music condition took the shortest time to fixate on George (2.66 seconds from the beginning of that particular shot, compared to 3.76 and 5.02 seconds for the “distracting” and silent versions respectively, though this difference did not reach statistical significance).
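The “time to first fixation” measure compared across the three conditions can be sketched as follows. The area-of-interest rectangle and the timestamped fixations below are hypothetical values chosen for illustration, not Mera and Stumpf’s data.

```python
# Hypothetical sketch of the time-to-first-fixation measure: given a list of
# timestamped fixations and an area of interest (AOI) around a character,
# return the latency of the first fixation landing inside the AOI.

def time_to_first_fixation(fixations, aoi):
    """fixations: list of (t, x, y) tuples in viewing order;
    aoi: (x0, y0, x1, y1) rectangle in screen coordinates.
    Returns the timestamp of the first fixation inside the AOI, or None."""
    x0, y0, x1, y1 = aoi
    for t, x, y in fixations:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return t
    return None

aoi_george = (500, 200, 560, 300)   # small region of an extreme wide-shot
fixes = [(0.4, 120, 400), (1.3, 700, 150), (2.66, 530, 250)]
print(time_to_first_fixation(fixes, aoi_george))  # 2.66
```

Averaging this latency over the participants in each music condition yields exactly the kind of between-condition comparison reported above.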
Marshall and Cohen’s initial (1988) study focused on temporal congruence, or attention to matching parts of a scene due to perceived time-based (synchronized) correspondences between some aspect of the moving image and the structure of the music. Another type of audiovisual matching is semantic congruence—which is based on the matching of meanings—such as when the meaning of the music draws viewers’ attention to an object or a part of the scene. Boltz et al. (1991) and Bolivar et al. (1994, p. 32) first shed light on this type of congruence, with a focus on affective meaning.
For example, the idea of semantic congruence may explain the divergent perceptions of the city and weather in the opening scene of The Lost Weekend, as described in Vitouch’s (2001) study in the beginning of this chapter. The peaceful, cheerful mood of Rózsa’s score may have drawn viewers’ attention to buildings in the foreground that seemed to be swathed in faint sunshine, whereas the somber and melancholic excerpt of Barber’s music may have drawn more attention to the gray skies and drab buildings in the distance. Thus, viewers may perceive the same cityscape to be either lightly sunlit or set in a gloomy fog, due to the tendency for viewers to focus mostly on parts of the film images that seem to match the affective meaning of the music—as conveyed by features such as mode, tempo, and harmony (see Gabrielsson & Lindström, 2010). Had one of the music tracks consisted of Grieg’s “Morning Mood” from the Peer Gynt suite, another kind of semantic congruence might have come into play. If the viewer recognizes this piece of music, attention might then be drawn to aspects of the scene linked to the idea of morning, such as the faint sunlight on the buildings and windows thrown open to let in the fresh morning air. In each case, the music conveys something that orients a viewer’s attention, and the meaning of the music may then become ascribed to the focus of attention, framing the scene and its story in a particular way.
An early eye-tracking pilot study on film music used a 10-second film clip from John Curran’s The Painted Veil (2006), which showed a young couple riding in a small boat on a river. Auer et al. (2012) showed this scene to participants paired either with calm orchestral music, with suspenseful music from a horror film, or with no music at all. Participants who saw the film with the “calm” track tended to gaze mostly at the rowboat in the center of the screen.4 However, those who saw the film paired with the “horror” music spent significantly more time looking at a dark patch of the water in the left corner of the screen.
Although the researchers did not set out to test semantic congruence, this finding could be interpreted in this way: The genre (horror) or emotion (scary, chilling) of the music may have directed participants’ attention to the dark patch, where something may be lurking. The composition of the shot is quite symmetrical and there is little camera movement to guide the eye, and no dialogue or narration in this scene. Therefore, the shift in gaze to the left corner seems to have been directed by the music. The two music tracks elicited two different focal points of attention in this scene, and essentially generated two different sets of expectations and stories for the scene.
The physical elements of film are light waves and sound waves impinging on our sense organs in a darkened room. How is this sensory information eventually translated into meaningful scenes and storylines that engage us in a film?
This is what Cohen aims to explain in the Congruence-Associationist Model (CAM), a multi-level framework for representing the cognitive processing of music and film information. A discussion of the full model lies beyond the scope of this short chapter; readers are referred to chapters and articles by Cohen providing thorough descriptions of CAM (e.g., Cohen, 2013a, which traces the development of CAM and includes a link to an animated version of the model narrated by Annabel Cohen herself). Given its relevance to the interpretation of scenes, this section provides only an outline of the “working narrative” level in the center of the model.
Briefly, Cohen (2014) proposes that we make use of two main sources of information in multimedia contexts such as film viewing: lower-order sensory input and higher-order knowledge.
(a) Lower-order input from the senses: In CAM, this sensory input (light waves, sound waves, etc.) comes through two visual channels (visual, text) and three auditory channels (music, speech, sound effects). Recently, a sixth channel has been added to encompass sensory input of a kinesthetic nature, such as MX4D and D-Box theater seats with motion effects (and vibrotactile and other sensations) synchronized with the movie action. In Cohen’s (2013a, 2014) view, the immediate sensory analysis of information includes the discovery of cross-channel congruencies. These structural correspondences or redundancies lead to prioritization of the cross-modally matched information for processing over other information. (In this chapter, we focus on the connections between the music and visual channels, but congruencies can occur between information from all channels—with the visual channel often, but not always, being prioritized).
(b) Higher-order knowledge stored in long-term memory: This includes autobiographical memory (i.e., recollection of episodes from one’s experience similar to what is being viewed on screen), knowledge of the grammar of story construction, knowledge about social rules and conventions, and all other information pertinent to building expectations and predictions about a particular unfolding scene and story. This source of higher-order knowledge is activated in top-down fashion when some of this pre-attentively assessed information from the lower-order input “leaks through” via fast bottom-up processes sufficiently to serve as clues for the generation of expectations or hypotheses about the narrative.
In Cohen’s view (2014), the best match between these two main sources of information at each point in the evolving film is what gives rise to the “working narrative.” Thus, the working narrative can be described as the viewer’s conscious, moment-to-moment, multimodal experience of the film as it is unfolding. In earlier formulations of the model it was called the “visual narrative”; the term was replaced with “working narrative” “because the conscious representation of a film narrative is transient and is always a work in progress, and because of possible connections to Baddeley’s (1986) model of working memory” (Cohen, 2013a, p. 31). Thus, CAM serves as a broad cognitive framework for understanding how we make sense of information in multimedia contexts, one that continues to expand in response to advancements in empirical findings and in cinema and theater technology.
The discussion thus far has focused on how music plays a role in directing attention to particular elements within a film scene, thus influencing the course of the working narrative. However, music can also suggest ideas that cannot always be depicted on screen such as the internal states of characters (i.e., their motives, thoughts, emotions, and desires) and the nature of the relationship between the characters and intentions toward one another (e.g., Boltz, 2001; Bullerjahn & Güldenring, 1994; Hoeckner, Wyatt, Decety, & Nusbaum, 2011; Tan, Spackman, & Wakefield, in press; Vitouch, 2001). Further, the music track may also guide inferences that extend to future events in the form of expectations and predictions, as described in Vitouch’s The Lost Weekend study reviewed in the beginning of this chapter. (See also Boltz, 2001; Boltz et al., 1991; Bullerjahn & Güldenring, 1994; Tan et al., 2007.)
In one study, Boltz (2001) showed three ambiguous film excerpts accompanied by music with a “positive” mood (pieces in major mode, with a clear melodic line and strong metrical structure), music with a “negative” mood (pieces in minor mode, with much dissonance, fragments rather than a clear melody, and a less predictable metrical structure), or no music. One of the three excerpts was a scene from Paul Schrader’s film Cat People (1982), in which a brother and sister reunite after a long separation and peruse a closet of old circus toys that they had played with as children. Boltz found that the majority of participants who viewed the scene accompanied by music conveying a “positive” mood tended to interpret the reunion as the start of a happy life together. In contrast, the majority of participants who viewed the same scene with music conveying a “negative” mood were more likely to believe that the brother would harm or kill the sister. Positive music also led to more positive descriptions of the brother’s traits (e.g., kind, loving, protective), whereas more negative personality descriptions were ascribed to him (e.g., deranged, evil, manipulative) when the scene was accompanied by negative music.
Further, the music track also affected how accurately participants could recall whether particular items had appeared in a film scene, when given a surprise memory test one week later. For example, those who had watched the scene described earlier with “positive” music correctly recognized more “positive”-themed objects that had been featured in the scene, such as old family photographs, juggling balls, and a feathered boa, than those who had viewed the scene without music (control condition). Similarly, those who watched with “negative” music correctly recognized more “negative”-themed objects, such as storm clouds, a full moon, a human skull, and statues of demons, than those who had viewed the scene without music. Thus it seemed that music directed more attention to items that were congruent with the character or mood of the music (semantic congruence), leading to better recall for these items.
Most surprisingly, participants who watched the film excerpt with “positive” music also falsely recognized more “positive” objects that had not been shown in the film clip—such as a music box, locket, and wrapped gifts. Similarly, those who watched with “negative” music falsely recognized more negative-themed objects, such as a large hunting knife, a book of witchcraft, and a bottle of poison, none of which had appeared in the film scene. Thus, the music may also have evoked associations with items that were congruent with the music, suggesting embellishments or elaborations to the scene that were consistent with these schemas. Boltz concluded that music “exert[s] a direct influence on the cognitive processing of a film by guiding selective attending toward mood-consistent information and away from other information that is inconsistent with its affective valence” (2001, p. 446). Although Boltz interpreted the results in the context of schematic influence, the findings are also consistent with CAM as the affect of the music seemed to direct attention to congruent visual properties of the film, thus influencing the working narrative.
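The two memory findings above correspond to the standard distinction between hits (old items correctly recognized) and false alarms (new, never-shown items wrongly endorsed). The scoring logic can be sketched as follows; the item lists are invented stand-ins for Boltz’s materials.

```python
# Sketch of recognition-memory scoring: "hits" are items actually shown in
# the scene that the viewer endorsed; "false alarms" are items the viewer
# endorsed that never appeared (e.g., mood-congruent intrusions).

def score_recognition(responses, old_items):
    """responses: set of items the viewer claimed to have seen;
    old_items: set of items actually shown in the scene.
    Returns (hits, false_alarms)."""
    hits = len(responses & old_items)
    false_alarms = len(responses - old_items)
    return hits, false_alarms

shown = {"family photos", "juggling balls", "feathered boa"}
said_yes = {"family photos", "juggling balls", "music box", "locket"}
print(score_recognition(said_yes, shown))  # (2, 2)
```

In Boltz’s data, both counts were elevated for mood-congruent items: the music improved genuine recognition and simultaneously induced schema-consistent false alarms.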
A number of studies have examined the effects of mood-congruent versus mood-incongruent music on viewers’ perceptions of a film scene. Compared to music that is incongruent with the emotional tone and action of a scene, congruent music has been shown to have a stronger intensifying effect on perceived emotions of interactions (e.g., Bolivar et al., 1994, expt. 3), to be more absorbing and to draw more attention to central elements of a scene (Cohen & Siau, 2008), and to strengthen memory for the details of a mood-congruent scene, even when attention is directed away from the audio (Boltz, 2004).
Thus far, the studies reviewed in this chapter have all employed non-diegetic music (or dramatic score), as the standard method in film music research is based on selecting film scenes that have no inherent music or dialogue so that they can be paired with different background music tracks (Tan, 2016; Tan, in press). Non-diegetic music accompanies a scene but is external to the fictional world of the film, such as music that punctuates the action and mirrors the mood and tension of a high-speed car chase. Empirical studies have very rarely focused on diegetic music, which refers to music presented as if originating from a source inside the fictional world of the film—such as music that the characters produce (e.g., by singing or playing), or control (e.g., by switching on a radio), or can hear because it is supposedly playing within their environment. Pertinent to this discussion, diegetic music is often used to interject music that is incongruent with the mood or events of a scene in a way that may give the impression of being incidental.
Figure 30.2 Three frames from an action sequence in Minority Report (Twentieth Century Fox and Dreamworks, 2002). See endnote 5 for details on the selected scene and film. See insert for color figure.
In an exploratory study, my colleagues and I examined the effects of presenting a piece of incongruent music diegetically or non-diegetically on viewers’ interpretation of the characters and scenario (Tan, Spackman, & Wakefield, in press). We selected a scene5 from Steven Spielberg’s Minority Report (2002) in which a man and woman make their way hurriedly through a shopping mall, pursued by a troop of armed police (as shown in Figure 30.2). This tense scene is accompanied by a slow, gentle ballad (Henry Mancini’s “Moon River”) that sounds as if it is playing distantly over the loudspeakers inside the mall. Thus, the music in the original scene is diegetic and mood-incongruent. We created an alternate version by mixing a recording of Mancini’s “Moon River” at a louder level relative to the dialogue and sound effects, in order to suggest a non-diegetic dramatic score accompanying the scene.
The relationship and intentions of the two characters, and many aspects of the scenario, are somewhat open to interpretation for those who have not seen the film. My colleagues and I found that viewers who watched the altered version (with mood-incongruent music that we mixed to suggest non-diegetic music accompanying the scene) perceived the scene to be less tense and less suspenseful, assumed a less hostile and antagonistic relationship between the two characters, and believed them to be less fearful and suspicious of each other and less intent to harm each other, compared to those who watched the original Spielberg version (with diegetic mall music). Participants also perceived the male character to be experiencing less fear and more romantic interest in the other character when watching the scene with non-diegetic music, than those who watched the original Spielberg scene with music sounding like it was playing inside the shopping mall. It is possible that the gentle ballad music sustains the tension in a suspenseful action sequence if it sounds like incidental music that happened to be playing inside the mall (diegetic). However, the same mood-incongruent ballad may be assumed to be a commentary on the scene when presented as the non-diegetic dramatic score, thus having a softening or mollifying effect with romantic undertones.
We may posit that along with knowledge of story grammar in the “higher-order knowledge” level of the CAM model, one might include knowledge of musical conventions and film grammar, which studies have shown are assimilated at a young age through exposure to film, television, video games, and other narrative multimedia (e.g., see Wingstedt, Brandstrom, & Berg, 2008). Interestingly, systematic differences between the interpretations of the Minority Report scene for the diegetic and non-diegetic music versions were found even among participants who were unable to correctly recall whether the music had been presented as if playing inside the shopping mall or as part of the dramatic score. Future research may bear out whether the diegetic or non-diegetic nature of the music is registered at the pre-attentive level of CAM or later in the process.
Thus far, most of the research studies reviewed in this chapter focused on the effects of simultaneous presentation of music and moving images. However, the music track may also influence the perception of images not shown concurrently with the music.
For instance, my colleagues and I found that music does not have to accompany a character in order to influence our interpretation of the characters’ emotions (Tan, Spackman, & Bezdek, 2007). We selected four film clips to which we added 15 seconds of music, either ending just as a character entered a scene or beginning just after a character had left the scene. Even though the music was not played during the close-ups of the faces (selected through pilot testing to display neutral affect) and only overlapped a few seconds with the entrance of the character, the viewers’ interpretation of the film characters’ emotions tended to “migrate” toward the emotion expressed by the music. A surprising finding was that even music played after the character left the scene colored viewers’ perceptions of what they had already seen. We interpreted this as a case of backward priming, in which “the evaluative prime succeeds the target stimulus and possibly influences ongoing target processing” (Fockenberg, Koole, & Semin, 2006, p. 800, emphases provided).
Our findings highlight the continuously evolving, non-linear nature of the working narrative in CAM: evaluation does not end with the presentation of a stimulus, and viewers continuously update the working narrative with new information they encounter, including cues from the musical score. Music may prime the affective tone of images that follow it, or reframe a prior action or event, just as narration may frame the meaning of a scene that follows it or modify the meaning of an action or event earlier in the story.
The placement of the music may also affect the strength of our memory for a scene. In a study focusing on the role of music in foreshadowing film events, memory for 3- to 4-minute suspenseful film sequences was found to differ depending on the placement of the music in relation to the resolution of the scene (Boltz, Schulkind, & Kantra, 1991). Specifically, when music foreshadowed the resolution of a scene, participants’ memory of the sequence was better if the affect of the music (sad or happy, i.e., negative or positive) was incongruent with the resolution of the scene. On the other hand, if the music accompanied the resolution, memory for the sequence was enhanced when the affect of the music matched the positive or negative resolution of the scene. Boltz et al. concluded that music that accompanies or foreshadows the outcome of a scene engages different attentional mechanisms that enhance the memorability of the film clip. Music accompanying a scene enhances memory by directing attention to corresponding mood-congruent aspects of the scene (i.e., semantic congruence, based on the affective meaning of the music). Musical foreshadowing with mood-incongruent music, in contrast, may enhance memory by drawing attention to the discrepancy between one’s expectations and the outcome, in line with research showing that expectancy violations are recalled more accurately than information conforming to expectations (e.g., Maki, 1990).
Finally, music may also influence the degree to which a scene may feel completed or closed. In a series of studies, Thompson, Russo, and Sinclair (1994) found that viewers generally perceive film scenes to have ended with greater closure if paired with a musical soundtrack that resolves to the tonic (e.g., ending on a C chord if in the key of C) than when accompanied by exactly the same musical score but not resolving to the tonic (e.g., ending on the dominant or G chord in the key of C). Further, the perception of completion was stronger for scenes accompanied by tonally closed music with a clear metrical structure and a final melodic note that ended on a strong beat. Most of the participants in Thompson et al.’s experiments had little to no musical training, so these effects do not seem to rely on formal musical knowledge.
The findings were clear for a simple, brief film clip showing a face emerging from a painting and then receding back into it. However, the effects are not always so straightforward: for more elaborate film clips created by the third author, Sinclair, or excerpted from the Hollywood film Clue (1985), tonally closed music increased viewers’ perception of closure in some scenes, whereas for other clips, musical closure had no effect or even decreased the impression of scene closure. On this question and many others reviewed in this chapter, there is still much to explore and understand with respect to how music interacts with numerous audio and visual elements to shape viewers’ experience of rich and complex film scenes.
Film music provides a compelling case for psychological investigation in many domains, especially music cognition. In spite of our often fleeting awareness of music accompanying a film, the systematic variations in participants’ responses to different musical tracks paired with the same film scene suggest that viewers must be processing some salient characteristics of the music—such as tempo, mode, dynamics, register, consonance/dissonance, timbre and instrumentation, tonal closure, and other parameters of music that have been shown to either express or induce emotions (Gabrielsson & Lindström, 2010). According to the Congruence-Association Model, meanings conveyed by the music then become attached to focal points of the visual scene that are structurally congruent in temporal and/or semantic ways.
In sum, music accomplishes much more than simply mirroring the action and emotion of a film scene, or intensifying the effects of the moving images. By directing attention, guiding inferences and suggesting elaborations, foreshadowing future events and reframing scenes we have already seen, and influencing the degree of perceived openness or closure of scenes, music plays an essential role in shaping the audience’s interpretation of the unfolding storyline of a film.
The author is grateful to Elizabeth A. Penix (Kalamazoo College and Walter Reed Army Institute of Research, Center for Military Psychiatry and Neuroscience) and Miguel Mera (Department of Music at the City University of London) for helpful comments on an earlier draft and kind assistance with a figure.
1. See chapter 2 in Audissino, E. (2014). John Williams’ film music. Madison, WI: University of Wisconsin Press.
2. For instance, see Bordwell and Thompson on continuity editing on pages 232–255 in Bordwell, D., & Thompson, K. (2012). Film art. New York, NY: McGraw-Hill.
3. The selected scene begins at 35 minutes and 26 seconds into the film and is 158 seconds in duration: Langmann, T. (Producer), & Hazanavicius, M. (Director). (2011). The artist [DVD]. France: Studio 37 Orange.
4. This is common, as a center-of-screen bias has been shown in many eye-tracking studies of dynamic scenes. See Mital, P. K., Smith, T. J., Hill, R. M., & Henderson, J. M. (2010). Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3, 5–24.
5. The selected scene can be found at 1:35:33 to 1:36:57 in the film and is 84 seconds in duration: Molen, G. R., Curtis, B., Parkes, W. F., & de Bont, J. (Producers), & Spielberg, S. (Director). (2002). Minority report [DVD]. United States: 20th Century Fox and DreamWorks.
Boltz, M. G. (2001). Musical soundtracks as a schematic influence on the cognitive processing of filmed events. Music Perception, 18, 427–454.
Cohen, A. J. (2013b). Film music from the perspective of cognitive science. In D. Neumeyer (Ed.), The Oxford handbook of film music studies (pp. 96–130). Oxford: Oxford University Press.
Schrader, M. (Director), & Kraft, R., Willbanks, J., Thompson, T., Holmes, K., Gold, N., & Chavarria, C. (Producers). (2016). Score: A film music documentary [Motion picture]. United States: Epicleff. (Features interviews with over 50 Hollywood composers, tracing the history and process of film scoring. Duration: 94 minutes.)
Tan, S.-L., Cohen, A. J., Lipscomb, S. D., & Kendall, R. A. (2013). The psychology of music in multimedia. Oxford: Oxford University Press.
Audissino, E. (2014). John Williams’ film music. Madison, WI: University of Wisconsin Press.
Auer, K., Vitouch, O., Koreimann, S., Pesjak, G., Leitner, G., & Hitz, M. (2012, July). When music drives vision: Influences of film music on viewers’ eye movements. Paper presented at the 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music, Thessaloniki, Greece.
Baddeley, A. (1986). Working memory. New York, NY: Oxford University Press.
Bolivar, V. J., Cohen, A. J., & Fentress, J. C. (1994). Semantic and formal congruency in music and motion pictures: Effects on the interpretation of visual action. Psychomusicology, 13, 28–59.
Boltz, M. G. (2004). The cognitive processing of film and musical soundtracks. Memory & Cognition, 32, 1194–1205.
Boltz, M., Schulkind, M., & Kantra, S. (1991). Effects of background music on the remembering of filmed events. Memory & Cognition, 19, 593–606.
Bordwell, D., & Thompson, K. (2012). Film art. New York, NY: McGraw-Hill.
Bullerjahn, C., & Güldenring, M. (1994). An empirical investigation of effects of film music using qualitative content analysis. Psychomusicology, 13, 99–118.
Burt, G. (1994). The art of film music. Boston, MA: Northeastern University Press.
Cohen, A. J. (1994). Introduction to the special volume on psychology of film music. Psychomusicology, 13, 2–8.
Cohen, A. J. (1999). The functions of music in multimedia: A cognitive approach. In S. W. Yi (Ed.), Music, mind, and science (pp. 53–69). Seoul, Korea: Seoul National University Press.
Cohen, A. J. (2010). Music as a source of emotion in film. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 879–908). New York, NY: Oxford University Press.
Cohen, A. J. (2013a). Congruence-Association Model of music and multimedia: Origin and evolution. In S.-L. Tan, A. J. Cohen, S. D. Lipscomb, & R. A. Kendall (Eds.). The psychology of music in multimedia (pp. 17–47). Oxford: Oxford University Press.
Cohen, A. J. (2014). Resolving the paradox of film music. In J. C. Kaufman & D. K. Simonton (Eds.), The social science of cinema (pp. 47–83). New York, NY: Oxford University Press.
Cohen, A. J., & Siau, Y.-M. (2008). The narrative role of music in multimedia presentations: The Congruence-Association Model (CAM) of music and multimedia. In K. Miyazaki, Y. Hiraga, M. Adachi, Y. Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and Cognition (ICMPC10) Sapporo, Japan (pp. 77–82). Adelaide, Australia: Causal Productions.
Fockenberg, D., Koole, S., & Semin, G. (2006). Backward affective priming: Even when the prime is late, people still evaluate. Journal of Experimental Social Psychology, 42, 799–806.
Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In P. N. Juslin, & J. A. Sloboda (Eds.), Handbook of music and emotion (pp. 367–400). New York, NY: Oxford University Press.
Gorbman, C. (1987). Unheard melodies: Narrative film music. Bloomington, IN: Indiana University Press.
Hoeckner, B., Wyatt, E. M., Decety, J., & Nusbaum, H. (2011). Film music influences how viewers relate to movie characters. Psychology of Aesthetics, Creativity, and the Arts, 5, 146–153.
Krumhansl, C. L. (1997). An exploratory study of musical emotions and psychophysiology. Canadian Journal of Experimental Psychology, 51, 336–353.
Kuchinke, L., Kappelhoff, H., & Koelsch, S. (2013). Emotion and music in narrative films: A neuroscientific perspective. In S.-L. Tan, A. J. Cohen, S. D. Lipscomb, & R. A. Kendall (Eds.), The psychology of music in multimedia (pp. 118–138). Oxford: Oxford University Press.
Langmann, T. (Producer), & Hazanavicius, M. (Director). (2011). The artist [DVD]. France: Studio 37 Orange.
Maki, R. H. (1990). Memory for script actions: Effects of relevance and detail expectancy. Memory & Cognition, 18, 5–14.
Marshall, S. K., & Cohen, A. J. (1988). Effects of musical soundtracks on attitudes toward animated geometric figures. Music Perception, 6, 95–112.
Mera, M., & Stumpf, S. (2014). Eye-tracking film music. Music and the Moving Image, 7, 3–23.
Mital, P. K., Smith, T. J., Hill, R. M., & Henderson, J. M. (2010). Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3, 5–24.
Molen, G. R., Curtis, B., Parkes, W. F., & de Bont, J. (Producers), & Spielberg, S. (Director). (2002). Minority report [DVD]. United States: 20th Century Fox and DreamWorks.
Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157–171.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone duration. Perception, 36, 888–897.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions: What you see is what you hear. Nature, 408, 788.
Smith, T. J. (2014). Audiovisual correspondences in Sergei Eisenstein’s Alexander Nevsky: A case study in viewer attention. In T. Nannicelli, & P. Taberham (Eds.), Cognitive Media Theory (pp. 85–105). New York, NY: Routledge.
Smith, T. J., & Henderson, J. M. (2008). Edit blindness: The relationship between attention and global change blindness in dynamic scenes. Journal of Eye Movement Research, 2, 1–17.
Spelke, E. S. (1976). Infants’ intermodal perception of events. Cognitive Psychology, 8, 553–560.
Spence, C. (2009). Explaining the Colavita visual dominance effect. Progress in Brain Research, 176, 245–258.
Tan, S.-L. (2016). Music and the moving image keynote address 2015: The psychology of film music: Framing intuition. Music and the Moving Image, 9, 23–38.
Tan, S.-L. (in press). From intuition to evidence: The experimental psychology of film music. The Routledge companion to screen music and sound. Abingdon, UK: Routledge.
Tan, S.-L., Spackman, M. P., & Bezdek, M. A. (2007). Viewers’ interpretations of film characters’ emotions: Effects of presenting film music before or after a character is shown. Music Perception, 25, 135–152.
Tan, S.-L., Spackman, M. P., & Wakefield, E. M. (in press). The effects of diegetic and nondiegetic music on viewers’ interpretations of a film scene. Music Perception.
Thompson, W. F., Russo, F. A., & Sinclair, D. (1994). Effects of underscoring on the perception of closure in filmed events. Psychomusicology, 13, 9–27.
Vitouch, O. (2001). When your ear sets the stage: Musical context effects in film perception. Psychology of Music, 29, 70–83.
Wallengren, A.-K., & Strukelj, A. (2015). Film music and visual attention: A pilot experiment using eye-tracking. Music and the Moving Image, 8, 69–80.
Wingstedt, J., Brändström, S., & Berg, J. (2008). Narrative music, visuals and meaning in film. Visual Communication, 9, 193–210.