Chapter 10

Surround sound techniques

5.1 surround sound uses five discrete channels of audio plus a subwoofer channel which contains mainly low-frequency effects. In this chapter, the channels will be referred to as LF, C, RF (left front, centre, right front) for the front channels and LS, RS (left surround, right surround) for the rear channels. LFE refers to the subwoofer channel, which in cinema is used for ‘low frequency effects’; it is a mono channel used for enhancement of very low frequencies. In orchestral recording, its use will usually be limited to some of the bass drum and double bass spot microphone signal.

10.1 Purpose of surround sound in classical music recording

The move from mono to stereo in the 1950s and 1960s meant that instruments could be spatially separated in a recording, making perception of detail in individual parts much easier. The move from stereo to surround provides a change in listening experience of a similar magnitude, although not simply because spatial separation of sources can be extended to include positions behind the listener. Although the rear channels are commonly used for sound effects located off-screen in film sound, classical recording in surround places the performers predominantly in front of the listener to reflect the real-life listening experience. It is certainly possible to make an orchestral recording that places the listener somewhere in the middle of the ensemble, and this has been tried as part of the exploration of the medium. However, given that most repertoire is designed to be heard from outside the ensemble to achieve the appropriate musical balance, surrounding the listener with players on all sides elevates the novel element of the technology over its use to transform a musical listening experience. In the spirit of always trying to make the recording enhance the music and the composer’s intentions, experimental placement of performers that would not work for the repertoire in real life is outside the remit of this book. The use of surround sound has become most prevalent in opera recording (see Chapter 16), where the capture of live performance for cinema reproduction has naturally moved the audio into surround. The ability to place theatrical sound effects and offstage choruses into the rear channels has hugely enhanced the immersive nature of the listening experience.

The change from stereo to surround for classical recording involves something more than placing sound sources in the rear channels (although this is useful where the repertoire includes deliberate antiphonal or offstage effects). When we sit and listen to real musicians in a physical space, perception of detail in the playing is made easier by our having a secure sense of the location of each player that is unaffected by listener movement. There are many more location cues than can be reproduced in stereo, and hence the stereo listening experience is quite a fragile illusion that collapses if we move away from the central area between the loudspeakers. The surround listening experience can give the listener additional information about source location and the nature of the space, which makes aural perception closer to real life, requiring less concentration on the part of the listener to draw out detail from individual parts. The real value of surround sound for classical music lies in the subtle use of the technology to produce a realistic sensation of being in the room with the players. If stereo places the listener outside and looking into the room through the window, surround sound brings the listener right into the space by skilful use of reverb in the rear channels in particular. An additional benefit is that louder orchestral and choral tuttis (which can feel somewhat saturated in stereo) can feel more open in surround sound, as if there is always room for the sound to get louder.

If a recording is mixed in surround, it will not necessarily fold down well to stereo; it is likely to lose perceptible detail if this is done by simply summing the existing surround channels into left and right only. The two reproduction methods are different, and a separate stereo mix will usually be needed if one is required as part of the project deliverables.
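As a rough illustration of the kind of simple fold-down described above, the following Python sketch sums five channels into two. The gain of about −3 dB (0.7071) applied to the centre and surround channels is an assumption based on commonly used downmix practice, the LFE is simply left out, and the function name is hypothetical.

```python
import math

# Assumed fold-down gain (about -3 dB) for the centre and surround channels.
MINUS_3_DB = 1 / math.sqrt(2)

def fold_down_to_stereo(lf, c, rf, ls, rs):
    """Naively sum one sample from each of five channels into left/right."""
    left = lf + MINUS_3_DB * c + MINUS_3_DB * ls
    right = rf + MINUS_3_DB * c + MINUS_3_DB * rs
    return left, right

# A centre-only signal ends up equally in both outputs (a phantom centre).
left, right = fold_down_to_stereo(0.0, 1.0, 0.0, 0.0, 0.0)
print(f"left={left:.3f} right={right:.3f}")
```

In a fold-down of this sort, everything that was mixed for the rear loudspeakers, including the rear reverb, lands directly on top of the front image. This is one plausible reason why detail suffers and why a dedicated stereo mix is usually the safer deliverable.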

10.2 Panning a Decca Tree in 5.1 surround

The core microphone techniques described for stereo orchestral recording can be rigged in the same way when working in surround, so we can look at the case of how to pan a Decca Tree and outriggers between five channels instead of two.

Figure 10.1 shows the layout of a set of five surround loudspeakers suggested by the ITU (International Telecommunication Union, Recommendation ITU-R BS.775-3). The subwoofer location is not included.

In order to pan between five loudspeakers (LF, C, RF, LS, RS), the desk or DAW used needs to have surround panning capability. Apart from a simple joystick arrangement for sending varying amounts of a signal to all five loudspeakers, there is a divergence control that is not found in a stereo system. This allows the engineer to avoid a source appearing to be too firmly tethered to a single loudspeaker. For example, if a signal is sent to the LS loudspeaker and divergence is increased, the same signal will also be sent in increasing amounts to the RS and LF speakers.

Using divergence is a particularly useful technique to avoid the centre channel becoming over-localised when panning a Decca Tree. When working in stereo, a centrally panned signal (such as the centre microphone of the Decca Tree) will be sent to LF and RF equally and will appear as a phantom centre source. Where there is a discrete centre channel, panning the microphone centrally will send it to the centre loudspeaker only, and this can make the centre of the orchestral image collapse into the loudspeaker location. Many classical engineers will avoid using the centre channel for this reason, while others will use it with a degree of divergence applied. Divergence will send increasing amounts of the centre channel to the LF and RF speakers instead of the centre speaker alone, reducing the level to centre as the levels to LF and RF are increased.
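The behaviour of divergence on a centrally panned source can be sketched in a few lines of Python. This is only an illustration: the sine/cosine constant-power law, the 0 to 1 control range, and the function name are all assumptions, and the law used by any particular desk or DAW may well differ.

```python
import math

def centre_divergence_gains(divergence):
    """Return (lf, c, rf) gains for a centrally panned source.

    divergence = 0.0 sends the signal to the centre loudspeaker only;
    divergence = 1.0 sends it equally to LF and RF (a phantom centre).
    The sine/cosine law keeps total power constant; it is an assumed law.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    angle = divergence * math.pi / 2
    centre = math.cos(angle)
    side = math.sin(angle) / math.sqrt(2)
    return side, centre, side

for d in (0.0, 0.5, 1.0):
    lf, c, rf = centre_divergence_gains(d)
    print(f"divergence={d:.1f}  LF={lf:.3f}  C={c:.3f}  RF={rf:.3f}")
```

As the control is raised, the level to the centre loudspeaker falls while the levels to LF and RF rise, so the centre microphone of the tree moves from a hard centre towards the phantom centre familiar from stereo.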

When it comes to presenting an orchestral recording in surround, it can be very effective and enveloping to bring the sides of the orchestra partly around the sides of the listener almost in a horseshoe shape, placing the listener into something like the conductor’s position. The instruments that are located towards the back of the orchestra and produce the sense of depth in a stereo orchestral perspective (woodwinds, brass, percussion, and so on) are left in the front speakers, with only the string section being ‘wrapped around’. Reverberation is used in both the front and the rear loudspeakers, taken partly from microphones rigged for the purpose (see section 10.3) and artificial reverb (see section 10.4).

Figure 10.1 5.1 surround monitoring system layout

In order to move the apparent location of sources beyond the LF and RF loudspeakers and around to the side of the listener, some of the signal needs to be sent to the LS and RS loudspeakers. Care needs to be taken here; because the listener is orientated with their ears across the left to right axis of the loudspeaker set-up, creation of phantom images works best either in front of or behind them. Sources panned down the sides (between LF and LS, or between RF and RS) are only located very imprecisely, and if too much signal is sent to the surrounds or the listener moves a little out of position, the image will tend to collapse into the rear loudspeakers, thus moving the orchestra behind the listener and breaking the illusion.
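To give a feel for how little signal is involved, the sketch below pans a source between one front and one rear loudspeaker using a constant-power sine/cosine law and reports the two levels in decibels. The pan law, the 0 to 1 position scale, and the function names are assumptions made for illustration; a real surround panner may behave differently.

```python
import math

def side_pan_gains(position):
    """Constant-power pan between a front and a rear loudspeaker.

    position = 0.0 is fully front (e.g. LF); 1.0 is fully rear (e.g. LS).
    Returns (front_gain, rear_gain).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)

def gain_to_db(gain):
    """Convert a linear gain to decibels (minus infinity for silence)."""
    return 20 * math.log10(gain) if gain > 0 else float("-inf")

# A cautious 'wrap-around': only a small proportion goes to the surround.
front, rear = side_pan_gains(0.2)
print(f"front {gain_to_db(front):.1f} dB, rear {gain_to_db(rear):.1f} dB")
```

With a setting like this the front loudspeaker remains dominant, which is the safe direction in which to err given how readily the image collapses into the surrounds.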

Bearing all this in mind, the five-microphone Decca Tree panned in surround would be approximately as follows:

*Use of centre or phantom centre: for music-only mixing, it is usual to send something to the centre channel with some divergence. For orchestral film score recording, many engineers avoid using the centre channel altogether, as it is used for the film’s dialogue. A centre orchestral channel can get in the way during the film dubbing (‘re-recording’) mix and end up being removed or reduced in level, thus affecting the orchestral balance.

Ancillary microphones such as woodwind section microphones can also be treated slightly differently in surround by sending a very little of their signal to the rear loudspeakers with the majority sent to the front. This will bring the players closer to the listener, but it should be minimal as not much is required to achieve the desired effect. Again, if it is overdone, the front image will fall into the surrounds. As noted earlier, the LFE channel can be used to enhance the bass drum and double basses by sending some of their microphone signal to this channel. This does not affect the stereo imaging, but just adds some very low frequency bass energy to the sound in the room.

10.3 Natural reverberation: additional microphones for 5.1 surround

Additional microphones are needed to collect reverb cues that can be used to create a realistic sense of space, and there is a body of academic research into the best ways of collecting signals for this purpose. There is also a variety of industry practices driven primarily by practical experimentation and trial and error.

The signals used for reverb in the surrounds (LS, RS) need to be different from those sent to the front loudspeakers. If the same signal is sent to the front and rear speakers (i.e. one signal sent to LF and LS and the other to RF and RS), the reverb localisation will tend to be at the side of the listener. Therefore, all methods used for collecting reverb for surround sound use four individual microphones that will collect four different but sufficiently correlated signals.

There are two ways to produce four useful reverb signals from a group of microphones: one is to use omnidirectional microphones that are set at different positions in the room; the other is to use directional microphones and point them in different directions, with or without a degree of additional physical separation between them. It is important to avoid picking up anything other than very low levels of direct sound on the rear ambience microphones; these signals will be sent to the rear loudspeakers, and if they contain a significant amount of direct sound, the main image can be pulled behind the listener. This is a particular risk if the listener moves or is not ideally placed in the centre of the loudspeakers. Therefore, if omnidirectional microphones are being used to collect room reverb, they will need to be placed well into the reverberant field at a good distance from the players.

When using spaced microphones to collect reverb, the principles of working with spaced microphones apply (see Chapter 3). If they are placed a long way apart, the resulting signals are too de-correlated to produce an enveloping stereo image and will produce reverb signals that tend to remain located in each loudspeaker. For better correlation that will produce a more realistic sense of space when panned between LF and RF, or between LS and RS, they need to be spaced at no more than about 1 m (3′4″). It should be noted, however, that a very widely spaced pair of omnis (at about the separation of orchestral outriggers but further back in the room) can be useful for filling in additional reverb at the sides of the surround loudspeaker array. One microphone will be panned between LF and LS and the other between RF and RS.

The following are two suggested methods using directional microphones. The first is the Hamasaki square,¹ which is an array of sideways-facing fig of 8 microphones, as shown in Figure 10.2. These are placed at some distance back in the hall from the orchestra, in the reverberant sound field, and they produce a signal rather like that from a surround reverb unit. The front microphones are used for front reverb (sent to LF and RF channels), and the rear ones are used for rear reverb (sent to LS and RS). This arrangement is used at the Royal Opera House, Covent Garden, for cinema and DVD recordings in surround sound (see Chapter 16). Taken as a whole, the rig is quite coherent in itself, and so it produces a good spread of reverberant sound in the image across the front and the rear of the listener.

Figure 10.2 Hamasaki square of fig of 8 microphones

Because the microphones are fig of 8s, they will discriminate against the direct sound from the orchestra, and any low level of direct sound that is present will arrive tens of milliseconds after the direct sound arrives at the main orchestral pickup. This means that the direct sound in the front loudspeakers dominates any direct sound in the rears both in level and in earlier timing (the precedence effect). The listener interprets the direct sound as coming solely from the front speakers, thus placing the orchestra in front of the listener.

Some engineers will choose to remove the timing differences between orchestral and distant ambience microphones by using delays. Adding delays to the orchestral microphones in this way can make the sound clearer by avoiding any slight smearing of transients, but it also has a tendency to make the recording feel ‘flatter’ in perspective. Leaving the difference in timing between distant ambience and orchestral microphones untouched tends to maintain a greater sense of spaciousness. As seen in Chapter 16, the use of delays is essential when applied to very close radio microphones in live opera recording, but it is otherwise optional.
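The size of the timing difference follows directly from the extra distance the sound has to travel. The sketch below works this out in milliseconds and in samples; the speed of sound (343 m/s, roughly correct for air at 20 °C), the 12 m distance, and the 48 kHz sample rate are illustrative assumptions rather than recommended values.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C

def arrival_delay_ms(extra_distance_m):
    """Time taken for sound to travel the extra distance, in milliseconds."""
    return 1000.0 * extra_distance_m / SPEED_OF_SOUND_M_PER_S

def delay_in_samples(extra_distance_m, sample_rate_hz=48000):
    """Delay to apply to the closer microphones, rounded to whole samples."""
    return round(sample_rate_hz * extra_distance_m / SPEED_OF_SOUND_M_PER_S)

# Hypothetical case: ambience array 12 m further from the orchestra than the tree.
delay_ms = arrival_delay_ms(12.0)
delay_samples = delay_in_samples(12.0)
print(f"{delay_ms:.1f} ms, {delay_samples} samples at 48 kHz")
```

A figure of this order is consistent with the ‘tens of milliseconds’ mentioned above, and it is the amount by which the orchestral microphones would be delayed if the engineer chose to time-align them with the distant array.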

Figure 10.3 Decca Tree with additional KM84 pairs for LF-RF and LS-RS ambience

A second method of collecting ambient signals is illustrated in Figures 10.3 and 10.4, and it is a development of the upwards-facing ambient pairs used in stereo recording that we have seen elsewhere in the book (see Chapter 12 for an example). An upwards-facing pair of KM84s spaced at about 20 cm (8″) and angled at about 90°–100° is mounted at the front of the tree, and a similar pair of KM84s is mounted immediately behind the tree, facing away from the orchestra. The upwards-facing pair is sent to LF and RF, and the rear-facing pair is sent to LS and RS.

These microphones are close to the main pickup, so they provide time-coherent signals and produce a good ambient pickup from the strings in particular. Although there is a theoretical concern that picking up any significant amount of direct sound on microphones aimed at producing reverb for the rear speakers will cause imaging problems, in practice this technique does not usually cause any difficulties.

Figure 10.4 Photograph of the array from Figure 10.3, although only the front ambience pair is clearly visible

Photo: Carlos Lellis, Programme Director, Abbey Road Institute.

Adding a little brightness to any ambience microphones, whether front or rear, by means of an HF shelf boost of about 2 dB at 10 kHz can make the recording feel more exciting and will compensate a little for the acoustic loss of HF when sound travels larger distances.
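For readers working in a DAW or in code, a gentle HF shelf of this kind can be sketched as a single biquad filter. The example below uses the widely circulated ‘Audio EQ Cookbook’ high-shelf formulas (Robert Bristow-Johnson); treating 10 kHz as the midpoint of the shelf, the slope value of 1, and the 48 kHz sample rate are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def high_shelf_coefficients(gain_db, corner_hz, sample_rate_hz, slope=1.0):
    """Biquad high-shelf coefficients (b0, b1, b2, a0, a1, a2)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * corner_hz / sample_rate_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((a + 1 / a) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(a) * alpha
    b0 = a * ((a + 1) + (a - 1) * cos_w0 + k)
    b1 = -2 * a * ((a - 1) + (a + 1) * cos_w0)
    b2 = a * ((a + 1) + (a - 1) * cos_w0 - k)
    a0 = (a + 1) - (a - 1) * cos_w0 + k
    a1 = 2 * ((a - 1) - (a + 1) * cos_w0)
    a2 = (a + 1) - (a - 1) * cos_w0 - k
    return b0, b1, b2, a0, a1, a2

def response_db(coefficients, freq_hz, sample_rate_hz):
    """Magnitude response of the biquad at one frequency, in dB."""
    b0, b1, b2, a0, a1, a2 = coefficients
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate_hz)  # z^-1
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))

shelf = high_shelf_coefficients(gain_db=2.0, corner_hz=10000, sample_rate_hz=48000)
for f in (100, 1000, 10000, 20000):
    print(f"{f:>6} Hz: {response_db(shelf, f, 48000):+.2f} dB")
```

With these formulas the boost reaches half its value in decibels at the nominal frequency and approaches the full 2 dB only towards the top of the audio band, so the low and middle frequencies of the ambience signal are left essentially untouched.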

10.4 Artificial reverberation in 5.1 surround

The room reverb collected by whatever surround microphone technique is used will almost certainly need to be subtly augmented by artificial reverb. This is similar to blending the natural room sound with some additional artificial reverb in stereo, an essential skill that is discussed in Chapter 17. The signals from the surround microphones are very useful for reproducing the characteristic of the early reflections in the actual hall and can be blended in with a smaller amount of artificial reverb for the remaining tail of the reverb. A professional reverb unit such as a Bricasti M7 has a control for the balance between early reflections and reverb tail, so it can be skewed in favour of the latter if it helps with the sense of realism. It is usually preferable to use as much real reverb from the room as possible if the hall is good, as the effect will be more convincing.

There are very few reverb units that have surround capability, that is those that will generate five channels of reverberation appropriate to the panned position of a source within a surround phantom image. Common practice is to use a front stereo reverb unit and a rear stereo reverb unit, possibly of a different type. If there is a difference in quality between the units available, then the better one will be used at the front. It can actually be beneficial to have slightly different reverb in the front and rear speakers as it reduces the obvious signature of the artificial reverb, which can be noticeable even with the very best reverb programmes. Both reverb programmes need to be reasonably well matched in terms of RT60, decay shape, and colouration characteristics for this to work well. One way of checking that the front and rear reverbs are sufficiently well matched is to sit sideways to the monitoring, thus facing either the LF and LS speakers or the RF and RS speakers. From this position, we can utilise our excellent L-R lateral positioning discrimination to see if reverb decay retains a stable stereo image or whether it pulls to one side or another at any point during the decay. If it moves, it indicates that the front and rear reverbs are not sufficiently well matched and will need to be adjusted.

In deciding which microphones need to be sent to the front and rear reverb units, there are no hard and fast rules, but there are some general guidelines. The microphones designed for collecting hall reverb will not be sent to either reverb unit. For the initial set-up, as for stereo (see Chapter 17), all the orchestral microphones should be sent to the front reverb unit at the same post-fade aux send level. This can then be adjusted to suit if necessary; the microphones that might be best left dry are those that are being used to capture transients from an instrument for clarity and are only present at low level in the mix. These would include percussion microphones and maybe the double bass microphone.

When it comes to deciding what to send to the rear reverb units, this can be quite dependent on repertoire and situation. Bear in mind that the effect of adding more reverb to a signal (even if it is reproduced behind the listener) is to either make it feel bigger if the instrumental spot microphones are high enough in level (good for a concerto soloist) or to give it a sense of distance (good for woodwinds in some repertoire).

Taking a typical example from the Royal Opera House, the main orchestral microphones would be sent to the rear reverb, but the string section spot microphones would not. The woodwind microphones might be sent to the rear reverb for repertoire where they are playing a traditional orchestral role (suited to the 18th and 19th centuries), but where they have more individualistic solo lines and need to be more present (for 20th-century repertoire such as Prokofiev and Shostakovich), they will not be sent to the rear reverb. If the celeste and harp are used for ethereal effects, they will be sent to the rear reverb, and horns will depend on the effect required, so they might not be sent if it makes them too distant. Concerto soloists such as piano will be sent to front and rear. This will make the piano feel larger rather than more distant when taken in conjunction with an appropriate level of piano microphones in the mix.

Setting the level of the reverb returns is important, and it can be quite a delicate balancing act between front and rear reverb return levels. They will usually be in very roughly equal amounts (i.e. they are likely to be within 5 dB of one another), but care needs to be taken not to bring back too much rear reverb as it can quickly become very swimmy.

10.5 Offstage effects in surround: location of sources behind the listener

Sources that are panned between LS-RS behind the listener are perceived as a phantom image in a similar way to sources panned between LF-RF in front. The rear loudspeakers are spaced at a wider angle of 120°–160° rather than the 60° angle between the front speakers, and the practical experience of many engineers suggests that the phantom source location is not as focussed as it is when sources are placed in front. It does work sufficiently well, however, to indicate that a group of musicians is definitely located behind the listener.

In the context of an opera recording, offstage sources might include chorus, sound effects, or solo brass (such as fanfare trumpets or hunting horns). There are choices to be made, partly dependent on how clearly the source needs to be located. Sounds placed exclusively in the rear loudspeakers will be very clearly located behind the listener, but they can be brought around one side or the other by panning between the front and rear sets of speakers. The effect of this is a source that is less clearly localised, but this might be desirable for something like hunting horns. For the surround mix of the 2004/2005 EMI recording of Tristan und Isolde, the offstage horns were panned down the side of the listener. To bring a source that is located only in the rear speakers a short way around to the sides, a small amount of the signal has to be added into the front speakers. However, the HF from the rear speakers is partly blocked by the ears’ pinnae, and the image can be quite quickly pulled around to the front with only a modest signal level. To prevent this from happening, an HF shelf boost at around 10 kHz can be added to the rear signal to compensate for the HF loss at the ears, and this will help to keep the image anchored mainly behind the listener.

10.6 Object-based audio: Dolby Atmos

Dolby Atmos² is the next generation of surround sound reproduction. It incorporates height reproduction by means of having loudspeakers in different horizontal as well as vertical planes, so forming a truly enveloping soundscape. It is designed so that encoded audio (in the form of a Dolby Atmos Bitstream) can be reproduced in a range of environments, with different numbers and locations of loudspeakers.

In traditional 5.1 cinema sound reproduction, there are multiple surround loudspeakers to cover the large auditorium, but they are all reproducing either the LS or RS signal. To avoid any member of the audience hearing sound from the nearest surround loudspeaker ahead of sound from the front loudspeakers, all the surround speakers have a delay inserted into their signal path according to the size of the individual cinema. The delay cannot be perfect for every seat in the house, so its calculation is a compromise between compensating for the worst seat (i.e. that which experiences the largest natural delay between the front and the nearest surround speaker) whilst not degrading the sound for the best seats.
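The nature of this compromise can be seen from a simple geometric sketch. For each seat, the code below estimates how much earlier the nearest surround loudspeaker would be heard than the front loudspeaker if no delay were applied. The room dimensions, seat and loudspeaker positions, single front loudspeaker, and speed of sound are all hypothetical simplifications, and this is not a description of how any cinema processor actually sets its delay.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C

def early_arrival_ms(seat, front_speaker, surround_speakers):
    """How much earlier the nearest surround arrives than the front, in ms.

    Positions are (x, y) coordinates in metres. A negative result means the
    front loudspeaker already arrives first at this seat.
    """
    front_distance = math.dist(seat, front_speaker)
    nearest_surround = min(math.dist(seat, s) for s in surround_speakers)
    return 1000.0 * (front_distance - nearest_surround) / SPEED_OF_SOUND_M_PER_S

# Hypothetical 20 m deep auditorium: screen at y = 0, surrounds on side walls.
front = (0.0, 0.0)
surrounds = [(-7.0, 8.0), (7.0, 8.0), (-7.0, 16.0), (7.0, 16.0)]
seats = {"centre": (0.0, 10.0), "rear corner": (6.0, 18.0)}

needed = {name: early_arrival_ms(pos, front, surrounds) for name, pos in seats.items()}
worst_case_delay_ms = max(needed.values())
for name, ms in needed.items():
    print(f"{name}: surround would arrive {ms:.1f} ms early")
```

In this invented room the rear corner seat needs a far longer delay than the centre seat, so applying the full worst-case figure would give the centre of the auditorium considerably more surround delay than it requires.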

To increase the realism and effectiveness of sound source location from around an auditorium, additional discrete channels of audio could be added, played from loudspeakers placed around the sides and above the listeners, thus effectively encasing the audience in a hemispherical array of loudspeakers. Provided the mixing engineer was able to mix the audio in exactly the same listening environment, with the same number of loudspeakers placed at the same distances, the results would be very effective. However, given that no two auditoria are alike in terms of dimensions and speaker location, taking surround sound to a more immersive and detailed level needed a different approach.

‘Object-based audio’ is such an approach, and involves treating each individual sound ‘object’ separately rather than mixing the audio together to form new audio signals. For classical recording, an ‘object’ could be a microphone source or single channel of reverb return. Once the mixing engineer has created a mix in a standardised Dolby Atmos mixing room, the panning location and level of each ‘object’ is encoded as data for the duration of the mix. When the time comes for this to be reproduced in a different listening environment, the desired location of each object is decoded and panned to its original position using the loudspeakers that are available. The reproduction environment (which might be a cinema, or a Dolby Atmos system in a domestic living room) is likely to have fewer loudspeakers which are placed in less ideal locations than those in the standardised mixing room, but the original panning locations are recreated using the available loudspeaker set-up, using phantom images where necessary.

The final mix, therefore, is not stored as discrete channels of audio as it would be for 5.1 surround, but as a collection of individual mix sources (or ‘objects’) stored as audio files, each with an associated data stream of encoded panning positions. Any level changes during mixing are captured as part of the audio file, and the audio can only be played back with the aid of a Dolby Atmos renderer.
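The idea of an ‘object’ as audio plus panning data can be illustrated with a toy data structure and a toy renderer. Everything in the sketch below is hypothetical: the class, the coordinate system, the file name, and the distance-based gain rule are invented for illustration and do not represent the Dolby Atmos file format or the algorithm used by a Dolby Atmos renderer.

```python
import math
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """One mix source: an audio file plus time-stamped panning positions."""
    name: str
    audio_file: str  # hypothetical file name
    # Each keyframe is (time_s, (x, y, z)) in arbitrary room coordinates.
    keyframes: list = field(default_factory=list)

    def position_at(self, time_s):
        """Linearly interpolate the panning position at a given time."""
        if not self.keyframes:
            raise ValueError("object has no panning data")
        frames = sorted(self.keyframes)
        if time_s <= frames[0][0]:
            return frames[0][1]
        for (t0, p0), (t1, p1) in zip(frames, frames[1:]):
            if t0 <= time_s <= t1:
                w = (time_s - t0) / (t1 - t0)
                return tuple(a + w * (b - a) for a, b in zip(p0, p1))
        return frames[-1][1]

def render_gains(position, loudspeakers):
    """Toy renderer: nearer loudspeakers get more gain, total power is 1."""
    weights = {name: 1.0 / (math.dist(position, xyz) + 0.1)
               for name, xyz in loudspeakers.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}

# An offstage horn moving from rear left to rear right over ten seconds.
horn = AudioObject("offstage horn", "horn.wav",
                   [(0.0, (-1.0, -1.0, 0.0)), (10.0, (1.0, -1.0, 0.0))])
room = {"LF": (-1.0, 1.0, 0.0), "RF": (1.0, 1.0, 0.0),
        "LS": (-1.0, -1.0, 0.0), "RS": (1.0, -1.0, 0.0)}
gains = render_gains(horn.position_at(5.0), room)
```

The point of the sketch is the separation of concerns: the object carries only its audio and its intended position, and the loudspeaker gains are worked out at playback time from whatever set of loudspeakers the reproduction room happens to have.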

When working in Dolby Atmos, the engineer can pan things to different vertical locations as well, and this can be particularly useful for subtly enhancing the sense of a real enveloping space in classical recording. In 5.1 surround sound and stereo, reflections from the ceiling level of the hall are reproduced in the same horizontal plane as the direct sound and other reverb, but Dolby Atmos enables the engineer to locate these reflections in a higher plane that is above the orchestral image and also sits above and behind the listener. The effect of this is that space is freed up for crescendos and large tuttis; they can keep getting louder without feeling overwhelming or congested, as they can in simple stereo reproduction. When all the reverb and acoustic energy is no longer concentrated in the horizontal plane but is spread all around the listener, the acoustic levels involved in the hall can be more realistically reproduced.

As with the move from stereo to surround, the temptation to locate sources all around for the sake of demonstrating the technology is a potential red herring for the classical engineer. Aside from the wonderful possibilities for offstage effects, and for recording compositions that have a spatial component to the performance, the medium can be used much more subtly but to great effect for the enhancement of the sense of space essential to classical music performance.

At the time of writing, there is a lot of interest in remastering old recordings for release in Dolby Atmos, particularly opera material. This can be done with skilful use of reverb even from an original two-track recording, but where there are four-track and eight-track edited masters, the engineer has more to play with. Four-track masters were very common for 1960s and 1970s opera, with two tracks for the orchestra and another two tracks for the voices. This allows the voices to be pulled forwards during mixing for surround if necessary.