SOUND
Introduction
The very earliest video games were precisely that: games utilizing one sensory modality only, that of sight. Unlike cinema, which took several decades from its commercial beginnings to develop a viable and reliable sound system, mass-produced commercial video games featured sound from the start: first in the arcade machine Computer Space (Nutting Associates, 1971), closely followed by PONG (Atari, 1972), whose monotonous, monophonic beeps rapidly became established as a synecdoche for video games. (The first home console, the Magnavox Odyssey of 1972, had no sound.) The circuit boards on which the arcade machines were built had an innate capacity to produce tones, and this aided the faster implementation of game sound when compared with the implementation of film sound. Since then, rapid developments in digital technologies have created new ways to design and utilize game sound and this, in turn, has led to developments in the player's experience of, and relationship to, game sound.
The relationship between player and sound was initiated by Computer Space’s and PONG’s simple use of sound cues to indicate to the player the occurrence of important game events. Collins (2008) points to the presence of the repetitive, musical chugging of Space Invaders (Taito, 1978) as an early instance of a more sophisticated relationship; the longer the player survives in the game, the faster the music becomes (along with the aliens’ movements).
Video games operate through various sensory and perceptual modalities of which, currently, the most important are vision and hearing. Sound, though, is capable of depicting events and spaces beyond the confines of the screen to a greater extent than image. Combined with the localization function of sound, the importance of sound to the positioning of the player within the game world cannot be overstated. This is particularly true in first-person perspective video games where, for example, enemies or rival cars can be heard coming from behind before they are seen.
Mention must be made of a special class of game that simply would not exist without sound: audio-only games. These, of course, are not video games but audio games that might be better classified as computer games. There is a wide variety of genres of audio-only games, some of which replicate with sound certain video game genres; for instance, the audio-only version of the first-person shooter video game DOOM (id Software, 1993), and other games found at www.audiogames.net.
Throughout this study, diegetic is used to describe those sounds arising out of the internal logic of the game world whereas non-diegetic refers to all other sounds. This essay begins with a brief survey of the functions of video game sound and then moves to a short summary of the development of game sound technology. Theoretical and empirical issues are assessed in the subsequent section before the essay concludes with a look to the future.
Functions of Video Game Sound
Different game genres afford different experiences, and thus different relationships to sound; this section surveys those functions of sound that help to elucidate the subsequent sections on game sound technology and on theoretical and empirical approaches to game sound. Although this section categorizes game sound as film sound is often categorized (namely dialogue, music, sound effects, and ambient sounds) and although it often draws comparisons between the two, it should be stressed that game sound is not film sound. The former is typically and fundamentally interactive, real-time, and produced according to the actions of players whereas the latter is usually non-interactive, fixed, and unchangeable.
Dialogue
Given technological limitations, such as storage constraints and difficulties in dealing with non-linear aspects of games, the game dialogue or speech that is used in some video games typically has a different role than that of film dialogue. Nevertheless, game dialogue has some functions in common with film dialogue. For example, the accents and dialects of game characters contribute to the mise-en-scène and, in certain genres, aid in identifying friend or foe (Collins, 2008), while the presence of dialogue can indicate the level of attention of game characters toward players (Jørgensen, 2009). Additionally, the emotive quality of such utterances contributes to raising or lowering tension in the game.
As with voice-over narration in film, game voice-overs are an aid to understanding game characters and plot as well as a means to move the action along. In video games, though, such devices can also provide tasks and objectives for the player. Another important function in some multi-player games is communication between team members, making use of voice-over-IP technology, which has become increasingly feasible as Internet bandwidth improves.
Music
Following film sound theory (Chion, 1982, 1994), music in video games is typically described as non-diegetic, and comprises an underscore that often runs throughout gameplay. As an underscore, music is intended primarily to serve emotion and any game narrative. Today’s video games can come with fully scored, orchestral compositions to rival any mainstream film. The increasing rapprochement between film composing and game composing is evidenced by the number of film composers who also write for games (e.g., Michael Giacchino whose credits include the Medal of Honor series (Electronic Arts, 1999–2006) and Mission: Impossible—Ghost Protocol (Brad Bird, 2011)).
Such music serves other purposes. Not least is the use of popular music where, although some music is commissioned specifically, the game provides a platform to re-present existing music tracks by established artists (e.g. Wipeout (Psygnosis, 1995–1996)). Music can also provide a means to attract customers’ attention. This is particularly the case with video games placed in noisy arcades where they must compete to earn the punters’ cash (Collins et al., 2011). In some video games, music can be a vital diegetic component of gameplay and the music game genre is the prime example of this. Here, the attraction of playing is derived from the pleasure and satisfaction of music-making. Thus, the player “composes” throughout the gameplay of games such as Rez (United Game Artists, 2001) and Aurifi (Four Door Lemon, 2010); musical compositions are built up from pre-supplied musical snippets or loops through the skillful navigation of game objectives by the player. In other music games, such as the Guitar Hero series (RedOctane and Harmonix Music Systems, 2005–2010) and Rock Band (Harmonix Music Systems, 2007), the player performs music, often on an external, customized musical instrument, or sings through a microphone in order to score points according to musical ability.
It can be difficult to ascertain where non-diegetic music stops and diegetic sound effects take over. Whalen (2004) draws upon the kinaesthetic practice of “Mickey Mousing” (or isomorphic music (Curtiss, 1992)) in many early animated films to point to some of the functions of music in cartoon-like games. The musical score rhythmically and/or melodically mimics the on-screen action. This practice typically occurs in games with a similar aesthetic to those of Super Mario Bros. (Nintendo, 1985). For instance, when the character Mario jumps in the air, the player hears one of a variety of ascending glissandi. Whalen suggests that such isomorphic music imparts life and anthropomorphic qualities to the virtual characters.
Sound Effects
Non-diegetic sound effects usually involve menu interface actions outside of gameplay. The timbre and form of these sound effects often conform to the sound sets used during gameplay and thus help set the scene for the game, but their main function is merely to confirm the user’s menu actions.
Sound effects in video games are typically diegetic, though, and are triggered by events occurring during gameplay. These events can be actions of the game’s characters or important game events requiring the player’s attention. Their sounds, depending upon genre, can include footsteps, radio messages, gun-shots, car engines and tires screeching on various surfaces, balls being kicked or hit, flesh being punched, and referees’ whistles.
What typically characterizes these sound effects is that they conform to a realism of action: do an action, hear a sound (a play on the film sound design mantra of see a dog, hear a dog). Many such sounds will be authentic (actual recordings of the sounds produced by those events) or, at the least, will be verisimilitudinous. This latter state derives from the cinematic practice of dubbing sound effects and, in particular, the use of Foley sound effects whereby a sound effect is used that approximates the sound that would be produced by the event depicted on the screen. Through synchronization and realism of action, the sound becomes the sound of the depicted event. Sound effects can, however, also be fantastical; for example, platform game sounds or role-playing fantasy game sounds for events not occurring outside the game world and, over time, these become no less believable as the sounds of those events.
Ambient Sounds
Many video games, particularly those with elaborate and wide-ranging game worlds such as action and adventure games, make use of ambient sounds that occur in different parts of the game world. They are not triggered by game or player events (other than that the player enters that particular space in the game world) and often derive from sources that are not depicted on screen. Such sounds might include the surrounding sounds of battle, wind through the trees, wolves howling, or birds singing.
A large proportion of ambient sounds work with image, plot, and narrative in a variety of functions devoted to the mise-en-scène of the game world. For instance, a large physical space depicted on the flat, two-dimensionality of the screen might be enhanced by the use of reverberation or sounds from off-screen. Ambient sounds can also depict diurnal rhythms such as those sounds of fauna that become heard as day changes into night in Red Dead Redemption (Rockstar, 2010).
Technologies of Video Game Sound
Today’s video games utilize multiple diegetic sounds that are recordings of sounds in the real world or are specially designed, fantastical sounds crafted to match the effects or ambience of a game world’s mise-en-scène. Modern video games also have non-diegetic musical accompaniments that may be either pre-recorded tracks or stored musical scores that are produced anew at each gameplay. Developments in game technology led to new relational possibilities between player and sound and it is these developments that will be summarized next.
Since the circuit board used for PONG had no dedicated sound generators, a video sync generator was used to produce the game’s synthesized tones. Through the 1970s, arcade machines following PONG had to compete with each other in a noisy environment (Collins et al., 2011) and, soon, dedicated synthesis chips were added to these games (as well as to home consoles). These allowed for a wider range of timbres and volumes, greater polyphony, and the use of computerized musical scores to supplement the sound effects with strong, thematic tunes that broadcast their siren call to arcade customers with an ever greater stridency.
The introduction of MIDI soundcards into home computers in the late 1980s, and the growth of gaming on those machines, gave rise to more ambitious music and an increase in the use of audio samples (digital recordings of sound). Such soundcards dramatically increased the palette of timbres available and permitted more voices to be sounded simultaneously (e.g. the Sound Blaster AWE32 of 1994 had 128 pre-set instruments with 32-voice polyphony), allowing musical scores, programmed into the game software, to approach the complexity and density of symphonic works.
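A MIDI score stored in game software is, at bottom, a timed stream of compact event messages rather than recorded audio, which is why such scores were so cheap to store compared with samples. A minimal sketch of one such message (the function name is my own; the byte layout follows the MIDI 1.0 specification):

```python
def midi_note_on(channel, note, velocity):
    """Build a raw 3-byte MIDI note-on message: status byte 0x90 plus the
    channel number (0-15), then the note and velocity data bytes (0-127)."""
    assert 0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128
    return bytes([0x90 | channel, note, velocity])

# Middle C (note 60) at moderate velocity on channel 0: three bytes in total,
# versus tens of thousands of bytes for even one second of sampled audio.
msg = midi_note_on(0, 60, 100)
```

A game soundtrack in this form is simply a long sequence of such events plus timing information, rendered to sound by the soundcard's preset instruments.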
The use of audio samples was taken further in the early 1990s with the use of large-capacity, digital storage media such as CDs. Today, thousands of high-quality audio samples can be stored allowing developers to introduce greater variety to sounds once limited by small storage capacity. Multiple recordings of footsteps on a variety of surfaces, for example, or a large number of car engine sounds now provide the player with a vastly increased range of sounds when compared to video games of the 1970s and 1980s.
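The variety afforded by large sample banks can be sketched in a few lines: one sample per surface is replaced by several, chosen at random on each footstep so that repetition does not become audible. The bank contents and file names here are hypothetical:

```python
import random

# Hypothetical sample bank: several recordings per surface type.
FOOTSTEP_BANK = {
    "gravel": ["gravel_01.wav", "gravel_02.wav", "gravel_03.wav"],
    "wood":   ["wood_01.wav", "wood_02.wav"],
}

def pick_footstep(surface, last_played=None):
    """Pick a footstep sample for the given surface, avoiding an immediate
    repeat of the previously played sample where the bank allows it."""
    candidates = [s for s in FOOTSTEP_BANK[surface] if s != last_played]
    return random.choice(candidates)
```

With the tiny storage budgets of the 1970s and 1980s, every dictionary above would have held exactly one entry, and every footstep would have sounded identical.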
Game audio engines of the mid-2000s (e.g. Valve's Source and Crytek's CryEngine) introduced the real-time processing of audio samples. As the player moves through the physical spaces of such games, sound effects are processed with reverberation to approximate the reverberation characteristics (sound perspective) of such spaces and to improve the realism of the soundscape; but even with the increased storage capacity of optical media, game designers still cannot offer the same variety and range of sounds that are heard in the real world.
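This kind of real-time positional processing can be caricatured as two functions of listener position: a distance-based gain and a wet/dry reverb mix that grows with the size of the space. The formulas below are illustrative stand-ins, not those of any particular engine:

```python
import math

def attenuation(distance, ref_distance=1.0, rolloff=1.0):
    """Inverse-distance gain: unity at the reference distance, falling off
    as the source recedes (the shape used by many game audio engines)."""
    return ref_distance / (ref_distance + rolloff * max(distance - ref_distance, 0.0))

def reverb_wet_level(room_volume, distance):
    """Crude wet/dry reverb mix: larger spaces and greater source distances
    push more of the signal into the reverberant (wet) path."""
    return min(1.0, 0.1 * math.log10(1 + room_volume) + 0.02 * distance)
```

Each frame, the engine would evaluate such functions for every audible source and apply the results to the dry samples, giving a sound perspective that tracks the player's movement.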
Theoretical Considerations and Empirical Research
Video game sound performs many functions but all relate in some fashion to the player of the game. Sound, together with image, plot, narrative, and other social activity surrounding the game, helps the player to engage with the affordances offered by the game’s software and hardware and, thus, to take part in the game world itself. Most theoretical consideration and empirical research on video game sound is concerned with the relationship between player and sound but approaches it from different angles.
A number of authors have considered the place of sound in the game’s diegesis. Much of this thinking is a development of film sound theory (e.g. Chion, 1982, 1994), but taking into account the highly interactive nature of video games when compared to cinema, while other thinking develops the environmental soundscape theories of Schafer (1994). Grimshaw (2008a) proposes a number of instances of diegetic sound (for example, kinediegetic and telediegetic) to describe the role of sound in first-person shooters, both single- and multi-player, while Jørgensen (2009) gives us the term transdiegetic as a means to understand the functions of some game music which, initially, might appear to be non-diegetic.
Another area of theory concerns itself with engagement, particularly immersion, in the game world and how sound facilitates this. Such a topic is of interest generally in virtual environments not least because of disagreement as to what immersion is (despite the claims of game publishers that their game will immerse you like no other). Several authors discuss immersion (and the related concept of presence) with regard to virtual environments in the general sense (e.g. Brenton et al., 2005; Waterworth & Waterworth, 2014) and immersion in video games (e.g. Calleja, 2011, 2014; Jennett et al., 2008) but only a small number directly deal with immersion as it relates to video game sound. Of these, Ward (2010) makes use of Barthes' (1977) concept of the grain of the voice to analyze player immersion through the embodying of voices heard during the playing of BioShock (2K, 2007).
Emotion has been argued to be a key component of player immersion and Ekman and Lankoski (2009) investigate the uses of sound in survival horror games to engender fear and thus engage the player. Murphy and Pitt (2001) discuss the use of spatial sound to enhance immersion in interactive, virtual environments whilst Jørgensen (2006) argues that realistic audio samples make the game more immersive. In a number of articles, Grimshaw (e.g. 2008b, 2012) analyzes the sound of first-person shooters, particularly where the ability of the player to contribute sound to the acoustic ecology of the game (the triggering of audio samples through player actions and presence in the game world) is a key factor in player immersion in that ecology and, thus, the game world.
A number of empirical studies have addressed the effect of video game sound on player perception and psychophysiology. Some of this relates to the reception of game sound by consumers; for example, Wood et al. (2004) found that sound was amongst the most highly-rated features of video games. Other studies investigate the effect of sound on player performance showing a deterioration in the absence of non-diegetic music and/or sound effects (e.g. Nacke, Grimshaw, & Lindley, 2010; Tafalla, 2007) although yet other studies on non-diegetic music contradict this (e.g. Cassidy & MacDonald, 2009, 2010; Tan, Baxa, & Spackman, 2010).
There are several studies that assess the effect of sound on the player’s psychophysiology. Results are mixed, particularly for quantitative, physiological studies. Some studies have shown no significant psychophysiological effects in the presence of sound (Grimshaw, Lindley, & Nacke, 2008; Nacke, Grimshaw, & Lindley, 2010; Wolfson & Case, 2000), while others have found significant effects (e.g. Hébert et al., 2005; Tafalla, 2007). For an overview of psychophysiological methods and empirical studies in the context of video game sound, see either Nacke and Grimshaw (2011) or Grimshaw, Tan, and Lipscomb (2013).
The Future of Video Game Sound
In the few decades since sound was first introduced to video games, it has developed from simple, monophonic synthesized tones to complex musical arrangements and the use of multiple, high-fidelity audio samples with some game audio engines able to process sound effects according to the player’s position in the game world. Although predictions are risky, a number of approaches to the design of game sound may be put forward that point to possible developments.
These approaches use new technologies and computational methods to affect the player’s relationship to sound. Video game sound first involved real-time synthesis, before moving to MIDI and the use of audio samples, and it may be that increasing computational power will allow a return to real-time synthesis using the developing field of procedural audio. Such an approach creates greater variety of sound at a fraction of the storage cost required by audio samples; coupled as the procedures are to precise assessments of the game world’s materials, spaces, and characters, this is likely to further enhance player immersion because such subtle variety is closer to our experience of sound outside the game world (see Farnell, 2011).
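The procedural approach can be illustrated with a toy synthesizer: rather than storing a recording of wind, the sound is computed on demand from a handful of parameters, so every playback can differ. This is a minimal sketch under my own assumptions, far simpler than the physically informed models Farnell describes:

```python
import math
import random

def procedural_wind(duration_s, sample_rate=22050, gustiness=0.5, seed=0):
    """Generate wind-like audio samples by low-pass filtering white noise,
    with a slowly varying amplitude envelope standing in for gusts.
    Changing the parameters (or the seed) yields a different, but related,
    sound at no extra storage cost."""
    rng = random.Random(seed)
    samples = []
    lp = 0.0        # one-pole low-pass filter state
    alpha = 0.02    # filter coefficient: lower values give a darker rumble
    n = int(duration_s * sample_rate)
    for i in range(n):
        lp += alpha * (rng.uniform(-1.0, 1.0) - lp)   # filtered noise
        gust = 0.5 + 0.5 * math.sin(2 * math.pi * gustiness * i / sample_rate)
        samples.append(lp * gust)
    return samples
```

A game engine could drive `gustiness` (or the filter coefficient) from the game world's weather state in real time, which is precisely the coupling of procedure to world model that sampled audio cannot offer.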
Other technological developments open the door to real-time synthesis or processing of video game sound according to the player's psychophysiological state. Commercially-available headset devices that monitor that state through electroencephalography and electromyography are likely to become increasingly utilized especially where they allow game audio engines to monitor, and immediately respond to, the player's emotional and affective state (e.g. Garner & Grimshaw, 2011; Grimshaw & Garner, 2013). However video game sound develops, it will almost certainly be in a manner that more closely, and in real-time, integrates video game technology and the player.
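The final step of such a biofeedback loop, mapping an estimated affective state to sound-engine parameters, might look as follows. The inputs, outputs, and mapping are all illustrative assumptions of mine, not a published design:

```python
def map_affect_to_audio(arousal, valence):
    """Map a hypothetical affective-state estimate (arousal and valence,
    each in [0, 1], e.g. derived from EEG/EMG features) to sound-engine
    parameters. The particular mapping here is an illustrative assumption."""
    arousal = min(max(arousal, 0.0), 1.0)
    valence = min(max(valence, 0.0), 1.0)
    return {
        "music_tempo_scale": 0.8 + 0.4 * arousal,   # faster music under tension
        "reverb_wet": 0.2 + 0.3 * (1.0 - valence),  # murkier mix for negative affect
        "ambient_density": 0.5 + 0.5 * arousal,     # busier soundscape when aroused
    }
```

In a running game, the estimate would be refreshed many times per second so that the soundscape tracks, and potentially steers, the player's emotional state.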
References
Barthes, R. (1977). The grain of the voice. In Image Music Text (pp. 179–188). London: Fontana Press.
Brenton, H., Gillies, M., Ballin, D., & Chatting, D. (2005). The uncanny valley: Does it exist and is it related to presence? Paper presented at the Human-Animated Characters Interaction workshop.
Calleja, G. (2011). In-Game: From Immersion to Incorporation. Cambridge, MA: MIT Press.
Calleja, G. (forthcoming 2014). Immersion in virtual worlds. In M. Grimshaw (Ed.), The Oxford Handbook of Virtuality. New York: Oxford University Press.
Cassidy, G.G. & MacDonald, R.A.R. (2009). The effects of music choice on task performance: A study of the impact of self-selected and experimenter-selected music on driving game performance and experience. Musicae Scientiae, 13, 357–386.
Cassidy, G.G. & MacDonald, R.A.R. (2010). The effect of music on time perception and performance of a driving game. Scandinavian Journal of Psychology, 51, 455–464.
Chion, M. (1982). La voix au cinéma. Paris: L’Etoile/Cahiers du Cinéma.
Chion, M. (1994). Audio-vision: Sound on Screen (C. Gorbman, Trans.). New York: Columbia University Press.
Collins, K. (2008). Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design. Cambridge, MA: MIT Press.
Collins, K., Tessler, H., Harrigan, K., Dixon, M.J., & Fugelsang, J. (2011). Sound in electronic gambling machines: A review of the literature and its relevance to sound. In M. Grimshaw (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments (pp. 1–21). Hershey, PA: IGI Global.
Curtiss, S. (1992). The sound of early Warner Bros. cartoons. In R. Altman (Ed.), Sound Theory Sound Practice (pp. 191–203). New York: Routledge.
Ekman, I. & Lankoski, P. (2009). Hair-raising entertainment: Emotions, sound, and structure in Silent Hill 2 and Fatal Frame. In B. Perron (Ed.), Horror Video Games: Essays on the Fusion of Fear and Play (pp. 181–199). Jefferson, NC: McFarland.
Farnell, A. (2011). Behaviour, structure and causality in procedural audio. In M. Grimshaw (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments (pp. 313–339). Hershey, PA: IGI Global.
Garner, T. & Grimshaw, M. (2011, September 7–9). A climate of fear: Considerations for designing an acoustic ecology for fear. In Proceedings of Audio Mostly 2011, Coimbra, Portugal.
Grimshaw, M. (2008a). The Acoustic Ecology of the First-person Shooter: The Player Experience of Sound in the First-person Shooter Computer Game. Saarbrücken: VDM Verlag.
Grimshaw, M. (2008b). Sound and immersion in the first-person shooter. International Journal of Intelligent Games & Simulation, 5(1), 119–124.
Grimshaw, M. (2012). Sound and player immersion in digital games. In T. Pinch & K. Bijsterveld (Eds.), The Oxford Handbook of Sound Studies (pp. 347–366). New York: Oxford University Press.
Grimshaw, M. & Garner, T.A. (forthcoming 2013). Embodied virtual acoustic ecologies of computer games. In K. Collins, B. Kapralos, & H. Tessler (Eds.), The Oxford Handbook of Interactive Media. New York: Oxford University Press.
Grimshaw, M., Lindley, C.A., & Nacke, L. (2008, October 22–23). Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. In Proceedings of Audio Mostly 2008, Piteå, Sweden.
Grimshaw, M. N., Tan, S., & Lipscomb, S.D. (2013). Playing with sound: The role of sound effects and music in gaming. In S. Tan, A. Cohen, S. D. Lipscomb, & R. A. Kendall (Eds.), Psychology of Music in Multimedia. New York: Oxford University Press.
Hébert, S., Béland, R., Dionne-Fournelle, O., Crête, M., & Lupien, S.J. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences, 76, 2371–2380.
Jennett, C., Cox, A.L., Cairns, P., Dhopare, S., Epps, A., Tijs, T., & Walton, A. (2008). Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66, 641–661.
Jørgensen, K. (2006, October 11–12). On the functional aspects of computer game audio. Paper presented at Audio Mostly 2006, Piteå, Sweden.
Jørgensen, K. (2009). A Comprehensive Study of Sound in Computer Games: How Audio Affects Player Action. Lewiston, NY: The Edwin Mellen Press.
Murphy, D. & Pitt, I. (2001). Spatial sound enhancing virtual story telling. Lecture Notes in Computer Science, 2197, 20–29.
Nacke, L. & Grimshaw, M. (2011). Player–game interaction through affective sound. In M. Grimshaw (Ed.), Game Sound Technology and Player Interaction: Concepts and Developments (pp. 264–285). Hershey, PA: IGI Global.
Nacke, L., Grimshaw, M., & Lindley, C.A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter game. Interacting with Computers, 22(5), 336–343.
Schafer, R. M. (1994). The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books.
Tafalla, R.J. (2007). Gender differences in cardiovascular reactivity and game performance related to sensory modality in violent video game play. Journal of Applied Social Psychology, 37, 2008–2023.
Tan, S.L., Baxa, J., & Spackman, M.P. (2010). Effects of built-in audio versus unrelated background music on performance in an adventure role-playing game. International Journal of Gaming and Computer-Mediated Simulations, 2(3), 1–23.
Ward, M. (2010). Voice, videogames, and the technologies of immersion. In N. Neumark, R. Gibson, & T. van Leeuwen (Eds.), VOICE Vocal Aesthetics in Digital Arts and Media (pp. 265–277). Cambridge, MA: MIT Press.
Waterworth, J.A. & Waterworth, E.L. (forthcoming 2014). Distributed embodiment: Real presence in virtual bodies. In M. Grimshaw (Ed.), The Oxford Handbook of Virtuality. New York: Oxford University Press.
Whalen, Z. (2004). Play along—an approach to videogame music. Game Studies, 4(1). Retrieved June 9, 2012, from www.gamestudies.org/0401/whalen/.
Wolfson, S. & Case, G. (2000). The effects of sound and colour on responses to a computer game. Interacting with Computers, 13, 183–192.
Wood, R.T.A., Griffiths, M., Chappell, D., & Davies, M.N.O. (2004). The structural characteristics of video games: A psycho-structural analysis. CyberPsychology & Behavior, 7, 1–10.