6 Mixing

We call the balancing of the combination of sounds mixing, and the mix involves the overall composition, emphasis, dynamic range, and spatial position of sounds. Mixing determines the relationship between sounds in a composition—whether it’s a song, a soundscape, or the sound for other media. To use a visual analogy, sounds are composed in the mix much as we learn to compose images if we study art: we learn how to direct the attention and keep the eye moving around a composition, and we can do this with the ear and sound. We try to add clarity and direct the listener’s attention to the most important elements.

Everything we’ve learned so far comes together in the mix. Mixing involves many aspects we’ve already talked about in regard to the use of effects and creating a sense of space. We’ve already covered many effects used in mixing like equalization, and we’ll cover spatial positioning a bit more in this and the next chapter. Here we’re going to focus on a few other concepts related to putting sounds together. Mixing is usually done by a mixing engineer: someone who specializes in the task. It’s an enormous area, and takes a lot of practice to get right. There is a lot more to know about mixing than what we can cover here, but we’ll tackle the basics and get you started.

6.1 Mixing Theory: Three-Dimensional Sound

We looked at how sound responds differently in space in chapter 4, and we’ll continue this work in chapter 7. But another aspect of sound in space relates to mixing, namely, the perception of sound as part of a composition (whether music, film, soundscape, etc.) that is part of a three-dimensional space, even when we mix in stereo. With two ears, we hear sound as existing in space around us. This perception is enhanced by a good mix, which typically situates us in the center of this sonic space. I find that it’s useful when mixing to think of sounds as occurring in space in terms of four dimensions: three physical dimensions—in visual coordinates we speak of these in terms of the Cartesian x/y/z co-ordinates, which consist of left-to-right (usually x), top-to-bottom (usually y), and near-to-far (usually z) (figure 6.1). The fourth dimension is time: how things change over time is an important aspect of the mix, but let’s focus on the first three dimensions for now.

The left-to-right or x axis is controlled by our panning settings. We will discuss panning in further detail below and in the next chapter; the first step in understanding our “sound box” (see Moore 1992) is the left-to-right position. This position is set virtually with panning, but there are also perceptual aspects to panning that can alter where we think we are putting a sound in the field, and this perception shifts depending on where the listener is located with regard to speakers, and on whether they are using headphones. Panning algorithms use a complex mix of effects to create the sense of distance, including:

1. Volume (sounds will appear louder as they move toward the center)
2. Spectral adjustments (filters and equalization)
3. Phase changes
4. Timing and delays
5. Reverberation

The height or y axis is created through the perception of frequency: high frequencies are perceived as being physically higher in space, and low frequencies are perceived as physically lower in space. In part, this can be attributed to the bioacoustics of the human body: higher-frequency sounds tend to resonate higher in our body (e.g., nasal cavities) and lower-frequency sounds tend to resonate lower in our bodies.

The z axis or the near-far axis is created perceptually through a number of phenomena:

1. Louder sounds appear closer to us.
2. Sounds without reverb (when used in conjunction with sounds with reverb) will sound closer to us (since the reverberation implies a space between us and the sounds).
3. Sounds that have been closely miked will appear closer to us.
4. Sounds with unusual effects tend to stick out and appear closer to us.
5. Sounds with hard compression (see below) will appear closer to us.
6. Sounds with more high-frequency components will appear closer to us.
7. Discrete sounds against an ambient bed will tend to appear closer to us (e.g., the back-up beepers on a truck against the white noise of a construction site).

We can use this knowledge to place sounds into a perceptual three-dimensional location in space around the listener, and it is this space that can form the basis of our understanding of the mix. Mixing, in other words, goes beyond just achieving clarity between sounds: it is a delicate balance of space over time.

6.2 A Note on Mixing in Audacity

Audacity isn’t really designed for mixing, but we can create basic mixes in the program. As you progress, you’ll want to switch to a program designed for more advanced multitrack mixing, like Nuendo, Reaper, Logic, or ProTools. So, while you can mix in Audacity, I recommend you invest in a professional audio software program once you’ve learned the basics. Reaper is becoming increasingly popular since it offers a very affordable price for those just starting out, and offers a sixty-day free trial. We’ll stick with Audacity for now, because we can do the basics and the concepts will carry over to other software you want to try.

You may have noticed that when we select Mix and Render in Audacity the amplitude increases. Mixing in Audacity adds the two waveforms together, so we will in most cases have a higher amplitude in the resulting track. If we are mixing a lot of tracks together, the result will be much louder than we probably intended. It’s important, therefore, to reduce the overall gain in advance of using Mix and Render. If we generate two sine waves with the amplitude set to 0.5 for each, and open the mixer board (View > Mixer Board), we can adjust the gain on these two files separately before we mix them (figure 6.2). If we click Mix and Render without adjusting the gain, we can see the output gain is a much louder file—it’s twice 0.5 (or 1) which means we’re just about clipping. When we use Mix and Render, we’ll now only have one track on the mixer board, but we can adjust the gain back down, which will alter the playback amplitude, or we can use Effects > Amplify and reduce the amplitude of the track itself (do this before it clips!). Note that we can change the amplitude settings in Audacity from a –1 to 1 scale to decibels, so we can be more accurate and detailed in the numbers involved. To double the amplitude of a track, increase it by 6 dB. To cut the amplitude in half, decrease it by 6 dB. We may want to use Mix and Render to New Track, which will preserve our original tracks and allow us to make adjustments on individual tracks if we don’t like the render—just delete the render and try again. It’s always best to keep a backup, though, in case we end up not liking the mix later. As we deal with multiple tracks in Audacity, as seen earlier, we can also mute tracks using the mute button, or solo a track if we want to focus on just one track in a multitrack session.

Mixing can involve the addition of all the types of effects discussed in previous chapters—especially EQing and filters. In addition, there are other effects that we haven’t yet discussed that often fall under mixing’s domain. First, however, we need to learn a bit about dynamic range.

6.3 Dynamic Range

Dynamic range is the distance between the peaks (loudness) and valleys (quiet) of a sound file over time. We can think of dynamic range like a black–and-white photograph: if the file were all black, or all white, the image would be boring. Variations—all the shades of gray—are what make it interesting. Imagine if you held a party at your house every day. Eventually, the party would be boring, because it’s just a party all the time. You need down time in between the parties, the boredom of everyday life, in order to be excited about the party. By using contrast between the quiet and loud, we create interest and focus and can draw attention to particular sounds. It’s important to think not just of the loud sounds, but also the quiet, and how we can use quiet or even silence to emphasize important points when we bring the loud back in. This is the role of dynamic range, the light and dark of a soundscape.

Nathan McCree, who created the sound and music for the original Tomb Raider game, explained:

And in the same way that a piece of music tells a story, a soundscape can tell a story. If we take it—like a horror genre, for instance: You know you will lead the player into a false sense of security, and then just before you scare them, you will just drop all the sounds out, until there’s this eerie moment of nothing. And then, “Dah!” Suddenly out jumps the monster and you jump out of your seat. So this sort of storytelling approach is the same for music as it is for sound. (quoted in Collins 2016, 20)

If we look at the four tracks in Audacity in figure 6.3, which track has the highest dynamic range? Which track has the lowest dynamic range? If you guessed track 3 for the highest, and track 1 for the lowest, you are correct. Track 1 has no dynamic range—the entire track is at the same volume. Track 3 has the greatest peaks and valleys. Dynamic range is not the amplitude, but the changes in amplitude over time.

We use dynamic range not just for technically representing the loud sounds, but also for emphasis and to draw the listener’s attention. In a story, for instance, the loudest sounds are probably going to be in the most dramatic scenes, at the point where the audience is supposed to feel the most tension; they might not be what should be the loudest sounds in real life. But dynamic range can also be tricky, because if there is too much distance between the loud sounds and the quiet sounds, we may find our listener has to “ride the remote” and keep turning up the volume when it’s just dialogue and turn down the volume when there are a lot of action sounds. Films are usually mixed for theater listening, where they’re not worried about waking the baby or annoying your neighbors, so there tends to be a lot of dynamic range. Music used to display greater dynamic range, too, but over time the range has gradually gotten compressed (see below).

Exercise 6.1 Dynamic Range

Think about the dynamic range of the listening exercises we’ve been doing so far: in which location have you found the most dynamic range? For your next listening exercise, focus on the dynamic range of the soundscape you’re listening to. What is the loudest sound, and what is the quietest sound you can hear? What is the most important sound, and what is the least important sound, and how would you adjust the dynamic range to reflect that?

6.4 Compression, Limiting, and Normalization

The function of a compressor is to decrease the dynamic range of an input sound source, that is, to reduce (compress!) the difference between the quietest and the loudest sounds. The end result is to effectively make the loud sounds quieter and the quiet sounds louder. For this reason, it doesn’t make much sense to compress a single one-shot sound. Rather, we would compress an overall music track, field recording, or composition. Compressing a file can be useful when parts are too soft or too loud, like a dialogue recording where someone is speaking at different volumes. The downside is, if the quiet sounds contain unwanted noise (the hum of an air conditioner, for instance), that noise may get boosted in the output, so it’s best to filter unwanted sounds out first.

A compressor works by weakening the input signal only when it is above a certain set threshold (volume) value. Above that threshold, a change in the input level produces a smaller change in the output level. So as the overall signal of the file is boosted, if the signal above the threshold is too loud, the compressor brings the amplitude down above that threshold. The compression ratio is the change in output levels from the given input, so a 2:1 ratio means that a 2 dB change in input results in only a 1 dB change in output.

Compressors cap the sound at an absolute, which is useful when we know the desired peak amplitude of what we’re mixing something (e.g., film is often compressed at –6 dBFS). We can also use compressors to extend the sustain of sounds by boosting the reverb, but this can sound flat, loud, and distorted if used too much. Compressors can be used in this way to increase the perception of loudness—a fact that some audiophiles have complained about when it comes to recent music, which in some instances has been so heavily compressed that the track seems louder than other songs or media. It’s quite commonly used in television or radio commercials, which usually have a set loudness limit, but the use of compressors will increase the perception of overall amplitude.

Multiband compressors work in separate bands of the frequency input, so a full-band compressor sets the overall compression based on the low-frequency sounds. Essentially, the compressor first filters the sound into bands, and each band is then fed into its own compressor and combined back into the mix.

Normalization, another common effect, sets the loudest peak at a particular point, raising or lowering the entire file to that level and adjusting the gain by a constant value across the entire file. So with compression, we are adjusting the portion of the audio that is below a certain threshold. With normalization, we are setting the overall volume to a standard level. Peak normalization finds the peak volume in the file, and then uses that peak to boost the file to the set point. RMS (root means square) normalization finds the average amplitude of the file and uses that to adjust the entire file, which can lead to clipping. Normalization may be applied to batches of files, so if we are building some online podcasts, for instance, we can normalize all our files so the listener doesn’t have to jump back and forth to adjust volume when continuing to the next file.

If the compression ratio is very high—a ratio of 10:1 to infinity:1, this is known as limiting, which is like a shelf above which there will be no increase in volume at all. Limiters are typically set only to round out high peaks, in cases where sounds must not go above a particular threshold (as in film, for instance). In digital recording, we know the maximum set loudness is 0 dBFS, so all sounds must drop below that threshold. Limiters are similar to low-pass filters, but affect volume instead of frequency, so the threshold will be the cut-off volume and everything above that threshold will be pushed down into the mix, while anything below that volume will “pass through.”

6.4.1 Digital Compression

We’ll need a fairly long (ideally, several minutes) file with a lot of dynamic range to work with digital compression. Import the file into Audacity and then use a compressor. Try different settings and listen and look at how it changes the file. The main settings are as follows:

Noise floor adjusts the gain on audio below this level so that it’s not amplified. For example, too much background ambient room noise would be boosted by compressors, but if we adjust the noise floor above the room ambience, it won’t be boosted.
Attack time determines how quickly the amplitude is reduced once the input exceeds the threshold—you can think of it as how quickly the compressor will respond. If the attack time is set too slow, then a short burst of louder sound can get through, which can lead to distortion. You may want to leave a slow attack sound in some cases to ensure that the full attack of a sound comes through.
Release time is the time it takes for the compressor to return to its full gain after reaching the threshold. If it is too fast, you get a pumping, breathy quality where you can hear the volume go up and down.
Make-up gain (output volume) lets you bring the compressed audio back up to an acceptable level, given that the compressor will tend to reduce the overall volume of the file. The make-up gain will take it back to a certain level, saving you a step in processing.

Exercise 6.2 Compression

Try compression, normalization, and limiting as audio effects. These effects can be used, for instance, to enhance sustain/reverb on files, and set the limit where it starts to distort. Take a lot of time on this one—it’s a more delicate effect than some we’ve heard so far, and it takes practice to do it right.

6.4.2 LUFS: Finalizing your Mix

If you are producing content for broadcast or distribution you may come across LUFS, or Loudness Units relative to Full Scale. LUFS is a standard for measuring loudness designed to enable normalization across multiple different content sources—think of a radio that has a radio drama, with advertisements in it, followed by news broadcasts, and so on. You don’t want your listener to have to keep getting up to turn the volume up or down. By standardizing the loudness levels, it makes it much easier for your end user. A LUFS meter will tell you the averaged loudness of your audio file—an overall level of your file. If you had a file with a lot of loud sounds and little dynamic range, it will get played back at a lower volume, since it has a higher overall loudness than a file that has more dynamic range. LUFS loudness meters are included in Adobe Audition, but you’ll need to download a separate plugin for Audacity.

Figure 6.4

A file before any effects are used.

Figure 6.5

Compression with a 2:1 ratio.

Figure 6.6

Compression with a 7:1 ratio.

Figure 6.7

Hard limiting to –10 dB.

LUFS standards vary, depending on where you are distributing your content and if your content is distributed in mono or stereo. The European Union has a standard called EBU R128, which recommends –23 LUFS for television. Audio for podcasts on iTunes is supposed to be set to –16 LUFS; YouTube and Spotify aim for –14 LUFS.

6.5 Expansion and Gating

An expander (sometimes called a “stereo image expander” or “stereo widener”) is the opposite of a compressor: it expands the dynamic range by reducing the level of signals below a threshold, making quiet sounds quieter. As with compressors, the ratio can be adjusted. An extreme ratio results in a gating effect. There is no expander plugin for Audacity, but we can download a plugin called Noise Gate for gating.

Gating is a kind of inverted limiter, or a particularly harsh expander. With gates, all signals below a threshold are reduced by a set amount (the range). In other words, the input signal is scanned for sounds below a certain threshold, and if they are below that threshold, the gate closes and the sound is muted, so if we have a hum in the background, we can get rid of it by reducing the amplitude of sounds that are in that range.

Gates are used to “add punch” to percussion in music by shortening the decay time, or to cut the level of noise between sections of a recording.

Exercise 6.3 Gating

Make a recording with some background ambience behind some discrete sounds. Try different gating settings and see if you can get rid of the ambience.

6.6 Ducking

Ducking reduces all signals above a threshold by a set amount (the range). Ducking is used in dialogue in film or radio when the music must get quieter so that we can hear the speaker’s voice. The “background” sound is “ducked” under the sound we want to hear. For instance, if you have an overdub of a foreign language, you may hear the original language ducked underneath.

Audacity comes with an Auto-Duck plugin. We need a control, which is the signal we want to keep (e.g., the voice). The control should be on the bottom in track order. Then we have the signal that is the “background”; in this case I generated white noise. Select the white noise and apply the Auto-Duck. The background signal will now duck under the voice. You can see (figure 6.8) where I’ve paused in the voice track, the background ramps back up. Note that when we’re dealing with multiple tracks, we can change the track order by clicking on the drop-down menu where we changed between Waveform and Spectrogram view (or the track name): here we have the option to move the track up or down. Incidentally, we can also change the track color here, which can be useful when working with multiple tracks.

Exercise 6.4 Ducking

Try some different ducking settings. How might you use this as an effect on something other than voice?

One term you’ll come across with effects is sidechaining. A sidechain feeds a different signal to an effect, so ducking is an example of side-chaining: it’s got a control file that tells the original file where to drop down. So if we wanted to apply an effect to a file, we feed it data from a second file instead of the original file. This is probably most commonly used in compression effects, where, for instance, the kick drum operates in a similar frequency to a bass synth in a dance track. A compressor is used on the bass every time the kick drum occurs, to give the kick some “oomph” and avoid the “muddy” sound created by having two tones in the same frequency band. But sidechaining doesn’t have to be used for practical purposes—we can use sidechaining on sounds with all kinds of effects for creative purposes. You probably won’t be able to find these tools for Audacity, but they are common in other DAWs.

6.7 Noise Reduction

If we have a noisy track, most audio editors now have built-in noise removal plugins. To remove the noise, we have to teach the program what to remove by selecting a sample (control) of the noise as a noise profile. Using the mixed noise we created in the last example, select Noise Reduction from effects, highlight a few seconds of noise and get the profile. The profile is now stored and we have to select Noise Reduction again. Now we can reduce the noise by a set dB, set the sensitivity of the reduction, and apply it to the file. Listen to the difference between this noise reduction and ducking. The trouble with noise reduction is that it’s going to remove some of the frequencies we want to keep—in this case, some frequencies from my voice, so the voice sounds a bit unnatural. There are ways we can tweak a bad recording, but as the saying goes, garbage in, garbage out! It’s always better to take the time to get a clean recording to begin with.

Exercise 6.5 Noise Reduction

Find or record a noisy file, and try to reduce the noise. Try files with varying levels of noise (you can find some with low ratings on Freesound for a starting point). See if you can find the balance between reducing noise and not interfering with the main signal.

6.8 Figure and Ground: Signal to Noise

We talked earlier about the signal-to-noise ratio (section 2.8). Understanding the right signal-to-noise ratio for our audience is an important part of mixing. Some people like to think in terms of figure and ground, which is used in the visual arts to describe the subject and the background. We can also talk in terms of figure, field, and ground. Some sound designers refer to these as the foreground, midground, and background, terms also commonly used by cinematographers when they talk about image in film. We can also think of these as the focus, support, and background.

The foreground is what we want listeners to focus on—it should be the most active part of their listening. Speech is usually the most foregrounded element in film, for instance, or the voice or lead in music. The midground contains sounds that are there to support the foreground image. They may be Foley sounds, or sounds that are part of the scene to be noticed but not focused on. They are often isolated, one-shot sounds or repeating sounds that are part of the scene. The background is the ambient bed that is part of the setting of the stage.

If we think of a theater set, the foreground would be the actors, the mid-ground would be the objects they pick up, use, or move around, and the background would be the scenery. In sound, then, the foreground would be the actors’ voices and anything they do with objects that is really important (e.g., slapping someone with a pair of gloves); the midground would be sounds made by objects that are part of the scene but not necessarily important, like a chair moving or footsteps; and the background would be the ambient effects that tell us where they are, like wind or rain. Another example is an emergency services scene. An ambulance siren might be a foreground sound, immediately telling us there is an emergency. Midground sounds might be people yelling, a fire burning, and so on. And ambience might be nighttime sounds around the scene, road traffic near the scene, and the like. What sounds constitute foreground, mid-ground, and background depends on the context and what we want the audience to focus on.

In section 6.1 we talked about the perception of near and far in the mix. These are the parameters we can play with to get the right perceptual distance between sonic objects in a mix. By foregrounding sounds in a mix, we focus the audience’s attention on those sounds and draw their attention to them. Thinking about what is most important to the audience is the first step in thinking about our mix: What is the most important sound they hear? What is the least important sound?

Exercise 6.6 Grouping: Figure and Ground

For your listening exercise today, group the sounds you’ve described into foreground, midground, and background sounds. Then record a single sound as a foreground, midground, and background by increasing your distance (and potentially mic axis, microphone polar pattern, etc.). Play the recordings back: how does the shift in perspective change the way you think about the sound?

Exercise 6.7 Too Much Noise

How many sounds can you layer into a mix before it gets to be too much and just becomes “muddy”? How can you adjust that same number of sounds to separate them out further in the mix?

6.9 Panning

Panning involves the perceptual position in the stereo field (as defined above, the x-axis). As we know, stereo files have two tracks, and there’s usually something different in each one. Each track is assigned to one channel and is designed to be played through one of two loudspeakers or headphone sides. Typically, the loudspeakers in stereo are set at 60° from the listener’s head. We call the head’s position in this equilateral triangle the sweet spot, since it gets an even mix of both speakers.

Professional audio software often has advanced stereo-imaging plugins, which can give you a sense of the exact placement of sounds in the stereo field. Izotope makes a free stereo imager called Ozone that you can download from their website. The Waves S1 Stereo Imager, which is not a free plugin, allows you to change visually not just the spread but the rotation and symmetry of the stereo field. Stereo imagers allow us to get a better sense of where we’re placing the sound in the stereo field.

Panning algorithms don’t all work the same way; some vary in how they adjust the amplitude of the sound signal as we move further from center, so it’s important to understand how we’re adjusting our sound. It is not just assigning a sound to a channel (left or right), but perceptually placing that sound nearer to or farther from the center through volume, timing adjustment reverberation, and filters or equalization. The perceptual distance from our ears, then, which we’ll talk about further in chapter 7, is a complex process, and advanced virtual panners can really do a lot to change the perceptual location of sound.

In Audacity, we’re limited to very simple panning settings unless we download extra plugins. The panning in Audacity can be done in a few ways: the entire track can be panned using Tracks > Pan (although here we are limited to left/right/center). We can also use the pan slider in the track control panel. If we hold down the shift key, we are given smaller increments to adjust the slider on a finer level; otherwise, we are given 10 percent increments.

While we don’t have variable panning within a single file in Audacity, we can appear to adjust pans by layering the same sound and using cross-fades while switching between left-panned and right-panned files. We can also put different pans on different files within the mix before we do the final Mix and Render. By varying the apparent distance from the center, we can separate sounds from each other and make them more interesting.

Exercise 6.8 Panning Perception

Set two speakers at a 60° angle to your head. Pan a sound all the way left and another all the way right. Where does the sound appear to come from? Most people find that sounds appear farther away than the speaker. Where does the sound appear to come from in terms of the front–back location, and how does that change as you change the volume? What about height and frequency?

6.9.1 A Phantom Image

When we have two or more speakers, we can create the perception that the sound is coming from somewhere else with panning. With stereo speakers, a phantom center is also common if the speakers are playing the same sound at the same volume, and the listener is in the sweet spot (figure 6.12). If the listener moves out of the sweet spot, the phantom image can collapse. The phantom image also has a different coloration (different spectral quality) than if a real speaker were located in that spot. Panning sounds alters the phantom image.

Exercise 6.9 Phantom Imaging

Try to position a sound in the phantom center position using stereo. Now play with the pan on those speakers and listen for the perceptual changes. What happens when we start panning?

6.10 Mixing across Media Devices

One of the challenges of mixing is the fact that our listeners may hear the mix in circumstances very different from a nice mixing room with speakers positioned correctly and the room acoustics optimized for hearing sound. We might spend many days getting a mix to sound perfect in the studio, but when we take it to our car or play it on headphones, it sounds terrible. It’s important to try the mix out on different media and get a sense of how different formats influence the overall mix. Scott Gershin, a sound designer and mixer for Hollywood and for games, explained to me some of the challenges of mixing:

It’s fun when you get to do multichannel with subs and you get to rock the room, and you get to be the rock star with sound, whether it’s movies or games. . . . But what’s happened is we’re expanding the market. It’s not just that. You know, we’re in a time, an era, of disposable entertainment and entertainment on demand. So at that point, people will go for a little lesser quality—think about the music we’re listening to, and your MP3 is not a great-sounding format, but it’s convenient. Neither was the cassette, but it was convenient. So people like convenience, and they’re willing to trade off.

So the question, then, for us becomes, how do we bring the best experiences? When I started in this business we were 8-bit. It was really horrid, actually. We started with very low sample rates; pick your sixty best sounds, at best. So when I see mobile, I think, “Hey, we’re kind of back here again.” A little bit better audio quality, but we’re kind of back here. You know, at that point, you have to be able to make those decisions—how to best support the story or the game or whatever you’re working on within the environment you’re at.

There are a lot of different tricks—compression tricks, the way you approach a mix is very different, different dynamics if they’re wearing headphones. Are they going to wear headphones in a noisy environment? Is there dialog involved? How’s the music going to be involved? So at that point it creates a different kind of technical challenge. So you then have to figure out what medium you’re being played back on. And then you create your technology, your techniques, your trade, your artistry, to fit that environment. . . . But for me that’s kind of what makes it fun, in that it’s not easy, you’ve got to figure out maybe a different way of working, and one thing I love about the gaming world that’s a little different from film. (quoted in Collins 2016, 184)

6.11 Technical versus Creative Mixing

In addition to thinking about different listening formats, it’s also important to think about mixing as an art form. It is perhaps tempting to get as close to the “real world” as possible in a sound design mix. But this is not always the best choice for our creative projects. There are plenty of examples of mixing where there is an intentional intervention into the “real world” approach, instead applying more creative uses of the mix. The films of Jacques Tati are an excellent example: not only do the sound editors on the film make some unusual choices in terms of the sounds chosen for events, but the sounds are often mixed at unusually high volumes. Rather than being purely technical, many creative choices need to be made in the mix.

Exercise 6.10 Technical and Creative Mixes

Take a group of samples to create a soundscape. Try to make it as realistic as possible. Then, play with the mix to illustrate the following concepts: The person in the scene has just been hit hard on the head. The person in the scene is raging. It’s a warm, fuzzy flashback. It’s all in someone’s head.

6.12 Point of Audition: Objects in Ears May Be Closer Than They Appear

When we mix, as described above, it’s important to think about our listeners and what they need to hear. We can create a subjective position for our listener—we can position them as “in” a space or character, or external to that space. We call this the point of audition, roughly analogous to point of view. Decades of film research have shown how camera position can alter subjective experiences of scenes, whether we’re put in the place of a character, or an inanimate object in the scene. Think, for instance, of how the positioning of a serial killer watching his prey is often shot with us in the killer’s point of view (POV); alternatively, we can be positioned in a POV shot as the victim, looking into out into the dark, afraid. By changing POV, we can be positioned as inside and involved in a scene, or as third-party observer. In film and TV, the audio mix often mimics the angle and distance of the camera. Although in many cases the auditory and visual perspectives don’t match, usually the camera and sound reinforce each other, so if the camera is close, the sound is usually close too.

There are several techniques to accomplish this point of audition:

(1) Emphasis in the mix through panning, volume, and equalization: The most obvious way to position the audience is through the position in the sound box described above. As shown, it’s not as simple as loud = close, quiet = far. There are details in the sounds that we hear if items are closer, and these need to be emphasized in the mix as well. The use of equalization can also help to emphasize some aspects of sounds over others: bright sounds that stand out and are in our most important frequency range (i.e., the speech range) can be emphasized even without altering their volume.
(2) Filters: In addition to the volume and EQ, we also know that sound frequencies drop off as they travel over distance or meet surfaces that absorb some frequencies. The use of filters to filter out some frequencies can help to create a sense of distance or place. As we saw earlier, a low-pass filter can create the illusion of sound obstructed by a window or wall.
(3) Effects: Reverberation will create a sense of space and of distance between the listener and the sound. When we looked at reflections, we determined how far we need to be from a sound to get more reverb than direct signal. If there is a lot of reverb, we know we are some distance from the sound source. Likewise, more early reflections can lead to the perception of a smaller, more claustrophobic space. But we can also use reverb, phasing, or other effects to position sounds as coming from inside someone’s head by contrasting those sounds with the natural world around them. We can, in other words, create a psychological mix that places us in the minds of characters—as we saw when discussing the rail cart scene in Tarkovsky’s Stalker. The use of unusual effects can also make a sound perceptually closer to us, since we’re likely to pay attention to something more unfamiliar.
(4) Microphone selection: We know that some microphones capture more detail and boost certain frequencies. The selection of microphone, then, can influence the perception of distance. Some microphones are better at capturing more delicate sounds as well as different frequencies, and some have a faster transient response time (the ability to respond to changing sounds), which influences the timbre of the sound. For instance, a condenser microphone such as the Neumann U87 has a nearly flat frequency response except for a slight lift in frequencies from about 5k Hz to 15 kHz. This higher frequency range will catch more sibilance on a voice, for example, and give the impression of closeness.
(5) Microphone location: Where we place our microphone is going to make a big difference in the perception of position. We looked at close-miking and proxemics earlier, and we know that placing the microphone further away from a source changes both the amount of environmental noise and reverberation that it picks up, as well as changes how much detail we get from the higher frequency components of the sound. A closer microphone will capture small sounds that even our ears may miss. On vocals, this closeness means capturing mouth smacks, tiny clicks from inside the mouth from tongue and teeth. The angle can affect the “warmth” of a sound, as well as the nasality of the noise when capturing vocals. Also, when the microphone is close to the object, it causes what is known as the “proximity effect,” which means a boost or lift in the lower frequencies of the sound, making for a richer, “fatter” effect. The mix of direct (no reverberation from the room) and indirect (reflections) sound can influence the perceived distance (close-miking will have stronger direct than indirect sound).
(6) Channel/speaker location: We may have two or five or twenty speakers to position sounds in. How far the physical speakers are from us will influence our perceptual distance. We will hear more detail on headphones or close speakers than we will from speakers that are farther away from us. We will also pay more attention to sounds located in the front speakers (and in particular the center, or phantom center) than we will in the surrounds.
(7) Spatial location of the sounds: We’ll take a look in the next chapter at spatial sound, but in brief, headphones can lead to in-head localization, and we can even use that as a device to place sounds into a listener’s head. Where we position sounds virtually in the spatial field, then, can emphasize or deemphasize their position and help to create a perspective. Sounds positioned outside the 60° angle location between our head and speakers can sound farther away.

Exercise 6.11 Point of Audition Analysis

Listen to a few different sound mixes—from music or film or games. Think about where it positions you as the listener. What impact does that positioning have on your subjective experience of the mix? How loud are sounds in relation to each other? What frequency bands do sounds appear in? Does the mix stay the same as the track progresses? How long is the reverb on each sound—is it one reverb on the whole file, or are different sounds treated differently? How have sounds been placed in the stereo field?

Exercise 6.12 Play with a Mix

Make a soundscape and play with different aspects of the mix to change the subjective position: record at different distances, and with different mics, and explore how much you can then play with those recordings in the mix to alter the subjective position. Attempt to create perceptions of near, far, and inside a character’s head with the same sounds.

Exercise 6.13 Play with a Mix 2

Record/find all the sounds you’d need to create a car crash scene. Mix it as if you are outside the car, then mix it again as if you are inside the car. What changes did you need to make and why?

6.13 Summary and Further Mixing Exercises

When we’ve put so much time into our design, it can be tempting to rush a mix to get it done and move on to the next project. As shown, though, mixing can take a lot of time to get just right, and can have a big impact on how your audience hears your work. It’s important to take your time on mixing. This means leaving enough time at the end of your project to do a mix, then leave it and come back to it after you’ve had a good break, like a night’s sleep, to listen with fresh ears. When your ears are fresh, you’ll hear all kinds of things you missed on the first few passes of your mix.

Exercise 6.14 Reference Mixes

When you listen to movies, music, podcasts, radio, or other media, focus on their mixing. If you like the mix, be sure to record it so you can use it as a reference for your own work later. What were the differences in the mixes you liked, and how does that change the feel of the sound? Find a mix that you don’t like. What do you think is wrong with it, and how would you change it?

Exercise 6.15 Stacking Sounds

Mix two sounds together by stacking the layers to create a new sound. Leave the sounds dry (don’t use any effects). Then use three sounds. Then four sounds. At what point do you just get a muddy mess?

Exercise 6.16 Stacking Sounds 2

Repeat the previous exercise, but with effects this time—what effects can you put on multiple sounds to help to reduce the mud while retaining the sounds’ uniqueness?

Exercise 6.17 Take Your Mix Out into the World

Now that you’ve got some mixes, try them out in different settings: your iPod, your car, your friend’s stereo, and so on. How does the sound change from place to place? How would you balance the trade-offs necessary if you knew most people were going to listen to your mix on a $10 pair of smartphone headphones over Bluetooth?

Reading and Listening Guide

Walter Murch, “Dense Clarity-Clear Density” (2005)

A two-part article written by Murch about the womb tone and listening as a sense in the first part, and then in the second part, an important and influential treatise on mixing. Murch takes us through his theories on mixing and then presents a breakdown of his mixing approach on Apocalypse Now.

David Gibson, The Art of Mixing: A Visual Guide to Recording (2018)

Although Gibson’s book is about mixing music (as are most other books on mixing out there!), the techniques and examples are useful to practice listening. Gibson breaks down popular songs in terms of graphic visual shapes and colors that illustrate their position in the mix—the basic concept of the 3D sound box I drew on here. Reading the separate mix breakdowns and listening to the songs is a great way to help tune your ears to the mixer’s job.

A Quiet Place (dir. John Krasinski, 2018)

This film is a sound designer’s dream job in many ways: sound plays a pivotal role, since the aliens detect people through sound. But it’s also a great example of changing the subjective position. The character Regan is hearing impaired, and the audio cuts back and forth to help us experience many scenes from her perspective. Watch the film, then watch it again with headphones, and really focus on the effects used to change the subjective perspective.