CHAPTER 12 Don’t Believe Everything You Hear

You’ll probably be astonished to learn that most of the music we listen to is out of tune, even when the musicians involved are experts. There are three main factors involved in this pitch sloppiness:

1. Our main scale system deliberately makes use of notes that are all slightly out of tune.

2. Musical instruments drift out of tune quite easily.

3. Musicians make small errors all the time.

Thankfully our senses are designed to deal with inaccuracy, and in general, we blithely overlook any errors and get on with enjoying the music. In this chapter we’ll be taking a look at how our brain deals with all the out-of-tune-ness. But before we start, we need to understand what it means to be “in tune.”

What does “in tune” mean?

“In tune” generally means that the notes involved sound good and smooth when they are played together. For a perfect example of in-tune-ness, let’s take two notes an octave apart. As we saw in chapter nine, if you play a note simultaneously with the note an octave above it, the two notes blend together beautifully.

This happens because the cycle time* of the higher-pitched note fits precisely twice into the cycle time of the lower note.

But an octave is a very large interval. Most of us have a singing range of only about two octaves. So to get a workable music system with several notes in it, we need to divide the octave into smaller steps. We need to create a scale that starts on one note and rises up to the note an octave above it. And as far as possible, we want all the notes in our scale to be in tune with one another.

We’ll begin by calling the lower note the keynote.

And now we have a bit of a conundrum on our hands. How are we going to choose the notes in our scale?

Initially we have only two notes in our scale: the keynote and the note an octave above it.

And as I just mentioned, these two notes are in tune with each other because two halves are equal to one. So perhaps we should look at other simple fractions. Could a note with two thirds of the keynote cycle time also be in tune with the keynote?

And yes, this note and the keynote do sound good together. Two cycles of the keynote match three cycles of the new note to give a nice repeating pattern.*

Because the repeat is spread over two cycles of the keynote, it’s not quite as strong a link as the one between two notes an octave apart (you can still hear the two separate notes), but we nevertheless hear it as a very pleasant combination. You can test this for yourself if you’ve got a friend nearby.

In “Baa, Baa, Black Sheep,” “baa” is the keynote and “black” is the “⅔ keynote cycle time” note. So, sing “Baa, baa” and make the second “baa” last several seconds. In the middle of your long “baa,” get your friend to sing “black” at the correct pitch and you’ll hear how well the two notes go together. (If your dog starts yowling you’re probably doing it wrong.)

Carrying on with our simple fraction logic, if we look at the note that has three quarters of the keynote cycle time, we can see that four of its cycles fit exactly into three of the keynote cycles. And sure enough, this does make another pleasant-sounding combination. If you and your friend are feeling in particularly fine voice today, you can try this one using the first two notes of “Here Comes the Bride.” (You sing a long “here,” your friend joins in with “comes.”)

We can create a major scale of eight notes that are all as in tune with the keynote as they can be by using simple fractions like this. Here are the fractions of the keynote cycle time for them all, starting with the keynote:

1, 8/9, ⅘, ¾, ⅔, ⅗, 8/15, ½

This system of choosing notes from simple fractions creates what is called a “Just” scale. If you play any of these notes at the same time as the keynote, they set up a repeating pattern similar to the one I drew for “Baa”/“Black,” and that’s how two different notes can be in tune with each other. The smoothest-sounding combinations come from the fractions with the lowest numbers above the line. So the keynote and the ½ note (the octave) sound smoothest together. The keynote with the ⅔ note is the next-smoothest combination, then the keynote with the ¾ note, and so on. The last two fractions in this ranking (8/9 and 8/15) sound rough when we play them with the keynote because the pattern takes too many keynote cycles (eight) to repeat, but we need notes in these positions; otherwise we’d have big gaps at either end of our scale.
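The smoothness ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual Just major-scale cycle times (1, 8/9, ⅘, ¾, ⅔, ⅗, 8/15, ½) and using the chapter's own rule of thumb: the fewer keynote cycles that pass before the combined pattern repeats, the smoother the combination. The function and variable names are our own.

```python
from fractions import Fraction

# Cycle times of the Just major scale, as fractions of the keynote cycle time.
JUST_CYCLE_TIMES = [Fraction(1), Fraction(8, 9), Fraction(4, 5), Fraction(3, 4),
                    Fraction(2, 3), Fraction(3, 5), Fraction(8, 15), Fraction(1, 2)]


def keynote_cycles_per_repeat(cycle_time: Fraction) -> int:
    """Keynote cycles that pass before the combined pattern repeats.

    A note whose cycle time is p/q (in lowest terms) fits exactly q of its
    own cycles into p keynote cycles, so the pattern repeats every p cycles.
    """
    return cycle_time.numerator


# Smoothest first: fewest keynote cycles per repeat (ties keep scale order).
ranking = sorted(JUST_CYCLE_TIMES[1:], key=keynote_cycles_per_repeat)

for note in ranking:
    print(f"{note}: pattern repeats every {keynote_cycles_per_repeat(note)} keynote cycle(s)")
```

The ranking starts with ½ and ⅔ and ends with the two fractions whose pattern needs eight keynote cycles.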

So, our definition of “in tune” goes like this:

If a few cycle times of one note fit exactly into a few cycle times of another note, the two notes are in tune with each other.
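This definition can be turned into a small test. A minimal sketch, assuming cycle times are given as exact fractions; `max_cycles` is a hypothetical cut-off for what counts as “a few” cycles, chosen only so that every Just-scale note passes when paired with the keynote.

```python
from fractions import Fraction


def cycles_to_align(cycle_a: Fraction, cycle_b: Fraction) -> tuple[int, int]:
    """Smallest (m, n) such that m cycles of note A last exactly as long as n cycles of note B."""
    ratio = cycle_b / cycle_a  # m * cycle_a == n * cycle_b  means  m / n == cycle_b / cycle_a
    return ratio.numerator, ratio.denominator


def in_tune(cycle_a: Fraction, cycle_b: Fraction, max_cycles: int = 16) -> bool:
    """'A few cycles fit exactly into a few cycles': both counts must stay small.

    max_cycles is an arbitrary, illustrative threshold, not a measured constant.
    """
    m, n = cycles_to_align(cycle_a, cycle_b)
    return max(m, n) <= max_cycles


keynote = Fraction(1)
print(cycles_to_align(keynote, Fraction(2, 3)))  # (2, 3): two keynote cycles span three cycles of the new note
print(in_tune(keynote, Fraction(3, 4)))          # True
```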

And it would be lovely if the Just scale obeyed this rule all the time. But unfortunately, even though the notes of the Just scale are all in tune with the keynote, they are not all in tune with one another. Some pairs of notes in the scale sound unpleasantly out of tune when they are played together, because their two fractions are incompatible with each other. If a couple of singers or violin players had to play such a pair together, one of them would have to change the pitch of their note a little until the combination was in tune. But it’s impossible to do this sort of thing on an instrument like a piano, where the pitches of the notes can’t be changed while you are playing.
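As one illustration (the pair is our choice), take the 8/9 note and the ⅗ note of the Just scale. For the higher note to be in tune with the lower one in the ⅔ relationship, its cycle time should be ⅔ of 8/9. A minimal sketch that measures the shortfall in cents, a standard unit equal to one hundredth of an equal-temperament semitone (an addition of ours, not a term used in this chapter):

```python
import math
from fractions import Fraction


def cents(ratio: float) -> float:
    """Interval size in cents (1200 cents per octave, 100 per ET semitone)."""
    return 1200 * math.log2(ratio)


second_note = Fraction(8, 9)  # cycle time of the 2nd note of the Just major scale
sixth_note = Fraction(3, 5)   # cycle time of the 6th note

# To be in tune with the 2nd note, the 6th should have 2/3 of its cycle time.
ideal_sixth = Fraction(2, 3) * second_note
print(ideal_sixth)  # 16/27, not 3/5

# A longer cycle time means a lower pitch, so this measures how flat the 6th note is.
error = cents(float(sixth_note / ideal_sixth))
print(f"The 6th note is {error:.1f} cents flat of a perfect 2/3 relationship")
```

The two cycle times differ by a factor of 81/80, about 21.5 cents, which is enough to make a sustained chord sound sour.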

Basically, you can’t use the Just scale system on any fixed-note instrument* because some combinations of notes would be out of tune, and some entire keys would be more out of tune than others.*

To overcome this problem, a different scale system called equal temperament (ET) was developed. If you’d like more detail about this and the Just system, please have a look at “Fiddly Details,” section E, “Scales and Keys.” But for now all we need to know is that the equal temperament system corrects the occasional serious tuning problems of the Just system by making all the fractions slightly wrong. So now all the notes (except the octaves) are slightly out of tune with one another. This means that all the piano music you’ve ever heard has been slightly out of tune.
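To see how small “slightly wrong” is, here is a minimal sketch comparing Just cycle times with equal-temperament ones. It assumes the standard ET rule (each of the twelve semitones in an octave divides the cycle time by the twelfth root of 2) and the usual Just major-scale fractions, and reports the difference in cents (hundredths of an ET semitone).

```python
import math
from fractions import Fraction

# (Just cycle-time fraction, number of ET semitones above the keynote)
SCALE = {
    "2nd": (Fraction(8, 9), 2),
    "3rd": (Fraction(4, 5), 4),
    "4th": (Fraction(3, 4), 5),
    "5th": (Fraction(2, 3), 7),
    "6th": (Fraction(3, 5), 9),
    "7th": (Fraction(8, 15), 11),
    "octave": (Fraction(1, 2), 12),
}


def et_cycle_time(semitones: int) -> float:
    """ET cycle time as a fraction of the keynote's: each semitone divides it by 2**(1/12)."""
    return 2 ** (-semitones / 12)


# Positive means the ET note is higher in pitch (shorter cycle) than the Just note.
et_minus_just_cents = {}
for name, (just, semitones) in SCALE.items():
    et = et_cycle_time(semitones)
    et_minus_just_cents[name] = 1200 * math.log2(float(just) / et)
    print(f"{name:>6}: Just {float(just):.4f}  ET {et:.4f}  ({et_minus_just_cents[name]:+.1f} cents)")
```

Every note except the octave comes out a few cents away from its Just value, from about 2 cents for the ⅔ and ¾ notes to roughly 14 to 16 cents for the ⅘ and ⅗ notes.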

In fact, most of the pieces of music you’ve ever heard on any instrument will have been slightly out of tune for one reason or another. But even if we had a tuning system that worked perfectly, you couldn’t trust musical instruments to stay in tune.

Inaccurate musical instruments

Let’s imagine we are off to a concert of flute music in the local concert hall. The flute in question is an expensive piece of engineering that has been tuned to the equal temperament scale as accurately as possible. So what could possibly be wrong with the notes it produces?

Well, for a start, let’s hope that the flute player is not a beginner, because nearly all the notes on a flute are out of tune. When I said “tuned as accurately as possible” in the previous paragraph, I omitted to tell you that, even if we ignore the flaws in the ET system, the physics of the instrument prevents us from putting the finger holes in the correct places if we want both the high notes and the low notes to be perfectly tuned. So the hole positions are a compromise. To compensate for this, expert flute players spend a lot of time learning to change the position of their upper lip during any performance in order to alter the pitch of various notes.

Other wind instruments have similar problems. On clarinets and saxophones the compensations need to be made by increasing or decreasing the pressure of the lips on the reed. And here is what John Backus, a well-known author on the subject of musical acoustics, has to say about the bassoon:

The bassoon demonstrates woodwind deficiencies especially well. Practically every note of this instrument is out of tune and needs to be pulled into tune with the lips.… Except for having keys added, it has changed little in several hundred years, has aptly been called a “fossil” and badly needs an acoustical working-over. (The author is a bassoonist and hopes to do this working over if he lives long enough.)1

Unfortunately, Dr. Backus died in 1988 without completing the project, and the bassoon is still an acoustic box of frogs.

But at least players of wind instruments can console themselves with the fact that their prehensile lip manipulation skills make them the best kissers in the orchestra.

To return to our flute concert, the hall is warming up as we come to the final few pieces. When we came into the hall, the temperature was about 60°F (15°C). The body heat of the audience has heated up the room until it’s now 80°F (25°C). In this warmer air the sound waves bouncing up and down inside the flute travel faster, and all the notes are now about one third of a semitone higher than they were at the beginning of the concert.
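A back-of-the-envelope check of that figure, as a minimal sketch: it assumes the flute’s pitch is proportional to the speed of sound in the air inside it, and that this speed scales with the square root of the absolute temperature (a standard approximation for air). It ignores the warmth of the player’s breath and any expansion of the flute itself, and gives the result in cents (hundredths of an ET semitone).

```python
import math


def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15


def pitch_shift_cents(start_c: float, end_c: float) -> float:
    """Approximate pitch rise of a wind instrument as the air inside it warms.

    Assumes pitch is proportional to the speed of sound, and that the speed of
    sound in air scales with the square root of absolute temperature.
    Ignores the player's breath and thermal expansion of the instrument.
    """
    frequency_ratio = math.sqrt(celsius_to_kelvin(end_c) / celsius_to_kelvin(start_c))
    return 1200 * math.log2(frequency_ratio)


shift = pitch_shift_cents(15, 25)
print(f"{shift:.1f} cents, or about {shift / 100:.2f} of a semitone")
```

This simplified model gives roughly 30 cents, in line with the one third of a semitone quoted above.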

And tuning inaccuracies aren’t, of course, restricted to wind instruments. How often have you seen a guitarist retune a string or two between songs? And how often did you notice that the guitar was out of tune during that last song?

The note given off by a guitar string is related to the tension on the string and its length. Guitar strings are made of steel or nylon, and the body of the guitar is usually made of wood. Steel, nylon, and wood all expand by different amounts if you warm them up by taking them from a cold backstage area onto a hot concert hall stage, so the tension on the strings will change and the guitar will go out of tune. This is why, when you go to a concert, all the guitars are already onstage, and they are often being tuned by the roadies until just before the band comes on. Later, as the gig gets really hot and sweaty, the guitars will need retuning. Also, the constant twanging of the guitarists tends to stretch the strings—so they get slacker and drop in pitch. It’s a busy life being a guitar string.
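For an idealized string, this relationship is captured by Mersenne’s law: the frequency equals the square root of (tension ÷ mass per unit length), divided by twice the vibrating length. A minimal sketch, with string values that are hypothetical and chosen only for illustration, shows how a small loss of tension from stretching pulls the pitch down (cents are hundredths of an ET semitone).

```python
import math


def string_frequency(length_m: float, tension_n: float, mass_per_metre_kg: float) -> float:
    """Fundamental frequency of an ideal string (Mersenne's law): f = sqrt(T / mu) / (2 L)."""
    return math.sqrt(tension_n / mass_per_metre_kg) / (2 * length_m)


# Hypothetical values for a thin steel string, for illustration only.
LENGTH_M = 0.648          # vibrating length in metres
TENSION_N = 70.0          # tension in newtons
MASS_PER_METRE = 0.0004   # kilograms per metre of string

f_before = string_frequency(LENGTH_M, TENSION_N, MASS_PER_METRE)
# Suppose stretching the string lets its tension fall by 2 percent.
f_after = string_frequency(LENGTH_M, TENSION_N * 0.98, MASS_PER_METRE)
drop_cents = 1200 * math.log2(f_before / f_after)
print(f"{f_before:.1f} Hz -> {f_after:.1f} Hz, a drop of {drop_cents:.1f} cents")
```

Because frequency follows the square root of tension, a 2 percent slackening costs only about 1 percent in frequency, yet that is still around 17 cents, easily enough to sound out of tune.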

Musicians are only human

As we’ve seen, it’s difficult to keep most instruments in tune. In fact the only instrument that stays in tune all the time is the electronic keyboard. Keyboards also don’t allow musicians to make mistakes in the pitch of the notes they play. On a full-sized keyboard you simply get eighty-eight different notes all tuned to equal temperament (so they are all slightly out of tune with one another) and that’s it. By contrast, musicians who sing, or who play completely variable-pitch instruments (violins, cellos, slide trombones etc.), have lots of freedom to choose the pitches of their notes, but this means that it’s very easy to make mistakes and produce notes that are higher or lower in pitch than the ones you intended.
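Because every key is fixed, the whole instrument reduces to one formula. A minimal sketch, assuming keys numbered 1 to 88 from the bottom and the widely used reference pitch of 440 Hz for the A above middle C (key 49); some keyboards can be set to other reference pitches.

```python
def key_frequency(key_number: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency of key 1..88 on a standard 88-key keyboard.

    Key 49 is the A above middle C; each step up one key multiplies the
    frequency by the twelfth root of 2.
    """
    if not 1 <= key_number <= 88:
        raise ValueError("key_number must be between 1 and 88")
    return a4_hz * 2 ** ((key_number - 49) / 12)


keyboard = [key_frequency(n) for n in range(1, 89)]
print(f"lowest {keyboard[0]:.2f} Hz, highest {keyboard[-1]:.2f} Hz")
```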

The cello demonstrates this problem most clearly because the notes can be a long distance from one another. Sometimes a cellist will have to move her hand a foot or so at karate-like speed and try to stop as accurately as possible. In most types of music the notes need to be played smoothly, one after the other, with the minimum amount of silence between them. Composers would really like a zero gap between two notes, but that would give you zero time in which to move your hand. In reality a professional player will often have to complete fast, large, accurate movements in a tenth of a second or less!

Cellists do this sort of thing all the time, and it involves an awful lot of skill because there’s nothing on the neck of the cello to tell you where the correct positions are—you just have to learn. Which is why it takes a long time to train to be a good cellist. As a beginner you wouldn’t be doing big jumps, but if you tried it, you’d often overshoot or undershoot by half an inch or more—which could make you about a semitone out—and you’d end up producing the wrong note altogether.

Obviously, as you train you get more accurate, but even after twenty years of practice, the chances of a human executing a lightning-fast hand movement of a foot or so with an accuracy of, say, a twentieth of an inch are vanishingly small. So just how accurate is a professional cellist?

Neuroscientist Jessie Chen and her team developed a specially built cello that measured exactly where a cellist’s fingers landed on the neck of the instrument while it was being played. They tested eight highly trained cellists and found that when they were moving rapidly between two notes nearly a foot apart, even professionals were often more than a quarter of an inch off.2

As part of the investigation the cellists were also asked to play a tune that, at some point, involved an ascending phrase of three notes, each one a semitone higher than the previous one. The middle note of the three had a very short duration: it was supposed to last only one eighth of a second (which is about as long as it takes to pronounce the “a” in the phrase “of a second”). To both the scientists and the cellists the tunes sounded as if they were played accurately, but analysis of the cellists’ finger movements told quite a different story.

Actually, what typically happened was that, during the eighth of a second, the cellist’s finger started at about the right place and then slid about a third of an inch during the production of the short note. The resulting blur of sound didn’t actually have a fixed frequency, but it sounded fine. Fortunately, in any fast, simple passage of music like this, the listener—and the instrumentalist—pay most attention to the notes at the beginning and end of the phrase. So we don’t generally notice whether or not the intermediate notes are blurred or inaccurate.

But don’t sack all the cellists. Given the times and distances involved, they should be praised for getting the notes even approximately right.

So, the music we hear is often out of tune. The scales we use have intrinsic tuning problems, most of the instruments aren’t perfectly accurate, and musicians make mistakes.

With all these problems, how does music still sound OK?

How our brain makes sense of all this pitch sloppiness

We manage to enjoy music even though a lot of it is slightly out of tune because our senses are champions of approximation and the editing of information. Your senses don’t need to be completely accurate; it’s more important that they are very fast, and accurate enough. You are continually flooded with all sorts of information, and your brain has to process the important bits rapidly to come up with a meaningful conclusion, such as “Oh, look—a carrot.”

And if you think that’s the last we’ve heard of carrots in this tightly woven narrative, you are woefully mistaken.

Because your senses are all you have, and they do such a brilliant job, it is tempting to think that they are flawless. Let’s take our eyesight as an example. We all know how easy it is to confuse the eye with dodgy information, such as the famous “impossible triangle,” a drawing of a solid-looking triangle that couldn’t exist in real life.

Even if your eyes aren’t being fed deliberately misleading information, they can make mistakes, “seeing” things that aren’t there or completely missing things that are right in front of you. On the other hand, your vision system can also be amazingly good at filling in gaps in the information it receives. For example, each of your eyes has a blind spot that is surprisingly close to the center of your field of vision. Light enters your eye through the pupil, and the image of what’s going on is projected onto the back of your eye like a movie being projected onto a screen. But near the middle of our “screen” there is a hole where the optic nerve connects with the retina to take all the information off to the brain. You never notice this blind spot unless you look for it. But try this:

Close your left eye and extend your right arm as far as you can, with the thumb pointing upwards. Your thumb should now be at arm’s length, level with and in front of your nose. Swing your arm slowly to your right while staring directly forwards. Don’t follow your thumb with your eye. Keep your right eye stationary by focusing on something directly in front of you, and keep your left eye closed. When your thumb is just a little bit to the right of your right shoulder it will disappear!

Yet on a day-to-day basis we never realize that there is a blind spot.

Your vision system, like your hearing system, works in three stages. First, information is collected, then it’s analyzed, and finally a summary is presented to your conscious mind. When we look at the impossible triangle we saw earlier, the data is collected and analyzed correctly, but the third step in the process, the summary, breaks down. In the case of the blind spot, we have an error at the first stage—the data collection. Fortunately the second and third stages are so sophisticated that they usually compensate for any gaps in the information and construct a useful summary anyway: “Yes—that’s a bunch of carrots all right.”

Just like your eyesight, your hearing system can be fooled, but usually it does a great job of producing useful summaries from inaccurate or incomplete information. The rather sloppy give-and-take of our hearing system allows us to understand and/or enjoy what’s going on far more rapidly than we could if we always required perfect, accurate information.

Categorization of notes

Probably the most important function of our hearing system nowadays is that it allows us to understand what other people are saying to us. There are hundreds of accents, affectations, and afflictions that affect how we pronounce words—yet we manage to understand one another surprisingly well. If you encounter someone with a strong unfamiliar accent you might have a bit of difficulty, but it’s rarely enough to prevent you from communicating.

Computer-aided sound analysis has shown that spoken sounds can be broken down into smaller components, each of which plays a role in determining what we hear. Detailed investigation in the case of d and t has revealed that the main components of the two sounds are very similar, but we use them slightly differently depending on which letter we are pronouncing. For a d, the two main components of the sound begin at almost the same moment, but for a t, one component starts about six hundredths of a second later than the other.3

OK, so we can get our computer to make d noises (as in “drip”) by producing the two component sounds together. Then we can start to delay one of the components by one hundredth of a second, then two hundredths, then three, and so on, up to six. You might think that we would start off hearing a clear d that would gradually become less distinct until it eventually turned into a t (as in “trip”) once the delay was close to six hundredths of a second, so we would hear d, d?, ?, ?, t?, t.

But that isn’t what happens. Tests have shown that we don’t hear a sliding scale between the two sounds.4 We hear a d until a certain delay is exceeded and then we flip over to hearing a t. We hear d, d, d, t, t, t.

This is because we don’t find a vague noise somewhere between a d and a t useful, so we use our categorization skills to divide all the d/t sounds into two categories—either t’s or d’s. This is inaccurate but efficient. Communication would be almost impossible if our brain accepted only an exact sound as a t or a d (or any other verbalization). We need to quickly put sounds that are approximately OK into the correct category so we can work out what is being said. Our senses do this useful approximation thing a lot. We do it, for example, when we’re identifying whether something is a carrot or not. We accept that the particular orange pointy thing we are looking at is a carrot even if it’s bigger or smaller than average—because it still fits into our mental carroty category.
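The difference between the expected sliding scale and the all-or-nothing result can be sketched as a simple threshold. This is a toy model, not the one used in the cited tests: the boundary of 3.5 hundredths of a second is a hypothetical value picked only to reproduce the d, d, d, t, t, t pattern described above.

```python
def heard_sound(delay_hundredths: float, boundary: float = 3.5) -> str:
    """Categorical perception: below the (hypothetical) boundary we hear 'd', at or above it 't'.

    There is no in-between answer, however close the delay is to the boundary.
    """
    return "d" if delay_hundredths < boundary else "t"


heard = [heard_sound(delay) for delay in range(1, 7)]  # delays of 1 to 6 hundredths of a second
print(", ".join(heard))  # d, d, d, t, t, t
```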

“And what,” you might reasonably ask, “has all this to do with music?”

Well, we use categorization techniques all the time when we listen to music. If a note is slightly out of tune, we don’t usually notice it. We don’t require the note to be exactly the correct frequency; we just need it to be close enough to be put into a category.
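One way to picture “close enough to be put into a category” is to snap a frequency to the nearest equal-temperament note and record how far off it was. A minimal sketch, assuming a reference pitch of A = 440 Hz and measuring the error in cents (hundredths of a semitone); it is an analogy for what our hearing does, not a model of the brain.

```python
import math

NOTE_NAMES = ["C", "C sharp", "D", "D sharp", "E", "F",
              "F sharp", "G", "G sharp", "A", "A sharp", "B"]


def categorize_pitch(freq_hz: float, a4_hz: float = 440.0) -> tuple[str, float]:
    """Return the nearest ET note name and how far off the frequency is, in cents.

    Octave numbers are ignored; only the note category matters here.
    """
    semitones_from_a = 12 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones_from_a)
    deviation_cents = 100 * (semitones_from_a - nearest)
    name = NOTE_NAMES[(nearest + 9) % 12]  # A sits at index 9 of NOTE_NAMES
    return name, deviation_cents


name, off = categorize_pitch(445.0)
print(f"445 Hz falls into the {name} category ({off:+.1f} cents from the in-tune note)")
```

A note about a fifth of a semitone sharp still lands firmly in the A category, just as a slightly mistuned note does for a listener.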

Back in 1973 Simeon Locke and Lucia Kellar decided to find out how out of tune a note could be and still be allowed into its category. They played three-note chords to musicians. The outer two notes of the chords were in tune, but the middle note varied from in tune to out of tune. The experimenters used the range from C to C sharp (the next note up) for the middle note. Sometimes it was an in-tune C, sometimes it was an in-tune C sharp, but most of the time it was somewhere between these two notes—and therefore out of tune.

On a piano, C and C sharp are neighboring keys, with nothing in between. If you (like Locke and Kellar) want to produce a note between these two, you need to get a piano tuner to come in and slacken the C sharp strings a bit.

The musicians had a properly tuned chord played to them first and were then asked if the next chord they heard (which was usually out of tune) was the same. If their hearing systems were entirely accurate, they should have spotted that the middle note of the chord was now out of tune. But generally they categorized whatever they heard as being either a C or a C sharp.5 So in music as in speech, our hearing system is rather forgiving and inaccurate.

We are, however, more sensitive to notes being out of tune with one another in some circumstances than in others. It all depends on which notes in the chord are involved.

I mentioned earlier that, when played at the same time as the keynote, the ½ cycle time note sounds smoother than ⅔, which sounds smoother than ¾, and so on.

One odd feature of this hierarchy of smoothness is that the smoothest combinations are the ones that are most negatively affected if one of the notes is out of tune with the other. When played at the same time as a keynote, an out-of-tune ½ cycle time note (octave) sounds the worst, the next worst is an out-of-tune ⅔, then an out-of-tune ¾, etc. By the time we get to the 8/9 or 8/15 notes, we barely notice if they are out of tune when they are played at the same time as the keynote because the combination is so rough-sounding even when they are in tune.

Think of it like this: we don’t notice an extra smudge on an already dirty mirror, but a fingerprint on a clean, polished mirror stands out a mile. Simeon Locke and Lucia Kellar knew that if they de-tuned the ⅔ note in their chord, everyone would notice it pretty quickly. So they de-tuned the other note in the chord, the middle one. They were then starting with a “dirty mirror” and making it dirtier by de-tuning it. During the experiment they varied this note from a cycle-time ratio of ⅚ to ⅘. When the note was ⅘, the musicians heard a major chord. When it was ⅚, they heard a minor chord. Most of the time the note was somewhere between these two fractions and therefore out of tune. But the categorization systems of the listeners ignored the out-of-tune-ness and only heard either a ⅘ or a ⅚ note.
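A toy version of the listeners’ behavior: the middle note’s cycle time (relative to the lowest note of the chord) is snapped to whichever chord type it is nearer to. A minimal sketch, assuming the Just values ⅘ (major) and ⅚ (minor) as the two categories and measuring closeness in cents (hundredths of a semitone); it is a hypothetical listener model, not the analysis Locke and Kellar used.

```python
import math

MAJOR_MIDDLE = 4 / 5  # cycle time of the middle note in a Just major chord
MINOR_MIDDLE = 5 / 6  # cycle time of the middle note in a Just minor chord


def cents_above_keynote(cycle_time: float) -> float:
    """Shorter cycle time means higher pitch, so invert before taking the log."""
    return 1200 * math.log2(1 / cycle_time)


def chord_heard(middle_cycle_time: float) -> str:
    """Hypothetical listener: the chord is heard as whichever type its middle note is nearer to."""
    pitch = cents_above_keynote(middle_cycle_time)
    to_major = abs(pitch - cents_above_keynote(MAJOR_MIDDLE))
    to_minor = abs(pitch - cents_above_keynote(MINOR_MIDDLE))
    return "major" if to_major < to_minor else "minor"


# Sweep the middle note upward from the minor to the major position in 10-cent steps.
low, high = cents_above_keynote(MINOR_MIDDLE), cents_above_keynote(MAJOR_MIDDLE)
steps = [low + 10 * i for i in range(int((high - low) // 10) + 1)]
print([chord_heard(2 ** (-c / 1200)) for c in steps])
```

However the middle note is tuned, this listener only ever answers “major” or “minor,” with a sudden switch near the midpoint.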

Another experiment to investigate our sensitivity to out-of-tune-ness was carried out in 1977 by psychologists Jane and William Siegel, who were trying to find out how accurate we are in identifying intervals. (An interval is the size of the jump in pitch between two notes.)

The Siegels pre-screened twenty-four highly trained music students, all of whom played several instruments and had begun their musical training at an early age, to determine how accurate they were at identifying intervals. Only the best six were chosen to go on to do the test.

These six students were then asked to identify a series of intervals and, at the same time, judge how in or out of tune they were.

Only three of the thirteen intervals used in the tests were actually in tune, but during the test all the students insisted that eight or nine of them were correct. The students thought that the intervals were correct even if they were out by one fifth of a semitone. A computer could have shown them how far off the notes were—but everyone involved subconsciously fitted the intervals into categories using our relaxed mental approximation techniques.

The results of the tests were a triumph for the psychologists, who had been expecting categorization to win the day. But it was all a bit embarrassing for the music students, who assumed that they would be much more accurate than they actually were.

As the Siegels put it in their report on the experiment:

Most were unaware of how poorly they were doing during the course of the experiment, and considerable tact and compassion were required when we debriefed the subjects.

And the final verdict:

Musicians had a strong tendency to rate out-of-tune stimuli as in tune. Their attempts to make fine judgments were highly inaccurate and unreliable.6

Given all this mental sloppiness, it’s rather surprising that our ears can actually distinguish between notes that are very close together in pitch. Well-trained listeners can tell that two pure tones* are different even if their frequencies are as close together as 1000 Hz and 1002 Hz, a difference of only about one thirtieth of a semitone.7 When the researchers tried untrained listeners with this pitch discrimination test, they were far less sensitive at first, but they caught up with the experts after only a few hours’ practice.8
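The arithmetic behind “about one thirtieth of a semitone” can be checked directly. A minimal sketch, assuming ET semitones (twelve to the octave); for comparison it also converts the “one fifth of a semitone” errors from the Siegels’ experiment into hertz, using 1000 Hz as a starting point of our own choosing.

```python
import math


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval size in equal-temperament semitones (twelve per octave)."""
    return 12 * math.log2(f_high_hz / f_low_hz)


# The discrimination result quoted above: 1000 Hz versus 1002 Hz.
smallest_step = semitones_between(1000, 1002)
print(f"{smallest_step:.4f} semitone, roughly 1/{1 / smallest_step:.0f} of a semitone")

# For comparison: an error of one fifth of a semitone, starting from 1000 Hz.
fifth_of_semitone_hz = 1000 * 2 ** ((1 / 5) / 12) - 1000
print(f"one fifth of a semitone above 1000 Hz is about {fifth_of_semitone_hz:.1f} Hz higher")
```

The first figure comes out at roughly 1/29 of a semitone, while the errors the music students accepted were nearly six times larger in hertz.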

But for both our vision and hearing systems, there is a big difference between being able to distinguish between two things and being able to remember the difference between them.

No doubt you’ve all seen those paint color charts we pore over before we decorate the bedroom—although, frankly, I don’t know why the paint manufacturers bother with all the bright colors nowadays, since nearly everyone I know chooses variations on a hue that should be called “damp string.” There is usually a selection of about a dozen of these beiges, and countless newlyweds have wasted their entire honeymoon quibbling about the merits of decorating their new home with “Bamboo Dream” as compared to the even subtler tonal underplay of “Cardboard Dawn.”

Well, imagine if there were a choice of fifty beiges rather than a dozen or so. You would have no trouble distinguishing between them on the color chart—but could you remember your favorite shade without looking at the chart? Let’s imagine a bizarrely motivated burglar broke into your house and cut up your beige color chart into fifty little nameless squares and then stole five of them. You could stare all day at the forty-five remaining squares, but you’d never be sure that your favorite wasn’t one of the missing ones.

Or, to consider another implausible scenario: I could show you three pieces of string—a long one, a short one, and a medium-length one. If I presented you with one of them in isolation later in the evening, you could easily remember whether it was the long, medium, or short one. But let’s say I gave you twenty pieces of string. You would still have no problem in ranking them in order of length. In this case, however, if I presented you with one of them in isolation a couple of hours later, you wouldn’t be able to tell me exactly which one it was. You might be able to remember that the one I was showing you was somewhere in the middle of the range, but you wouldn’t be able to identify whether it was the eighth-, ninth-, or tenth-longest. We are all far more skilled at distinguishing between things than we are at remembering the details of the items involved. This rule also works for music. Our ability to remember tunes depends upon the fact that there are only a few notes involved, and they have fairly large jumps in pitch between them.

With only seven fairly widely spaced notes to choose from in any key, we can use our categorization skills to make sense of and remember tunes even if the musician or instrument is a little inaccurate in producing the correct note pitches.

Singers, and players of completely variable-pitch instruments (violins, cellos, slide trombones etc.), can decide on the scale system they will use depending on the context. If they are playing along with a piano, they might choose to match its ET tuning. If they are playing solo or with a bunch of other variable-pitch instruments (in a choir or string quartet), they might choose to play according to Just tuning.

Whichever they choose, they are often somewhere between the two (or off target altogether) because of the natural difficulties in hitting exactly the correct pitch, particularly when playing quickly. For example, on a violin neck the difference between the ET note and the Just one is often only a tenth of an inch or less. Professional players are not that accurate when they play even moderately quickly. This works out quite well because our hearing isn’t accurate for short, rapid notes either. So neither the instrumentalist nor the listener is aware of the errors in pitch.
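A rough check of the “tenth of an inch” figure, as a minimal sketch: it assumes an ideal string whose pitch is inversely proportional to its vibrating length, and a hypothetical vibrating length of 12.9 inches (about 328 mm, a typical full-size violin figure). It compares where a finger must stop the string for an ET major third and for a Just major third above the open string.

```python
# Hypothetical vibrating string length: about 12.9 inches (roughly 328 mm),
# a typical figure for a full-size violin.
STRING_LENGTH_IN = 12.9


def finger_distance_from_nut(frequency_ratio: float, length_in: float = STRING_LENGTH_IN) -> float:
    """Distance from the nut at which to stop an ideal string so it sounds
    `frequency_ratio` times the open-string frequency.

    Frequency is inversely proportional to vibrating length, so the vibrating
    part must shrink to length / ratio.
    """
    return length_in - length_in / frequency_ratio


just_third = finger_distance_from_nut(5 / 4)         # Just major third
et_third = finger_distance_from_nut(2 ** (4 / 12))   # ET major third (4 semitones)
gap = et_third - just_third
print(f"Just {just_third:.3f} in, ET {et_third:.3f} in, gap {gap:.3f} in")
```

Under these assumptions the two finger positions are only about 0.08 inch apart.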

The only time when small errors in pitch are really troublesome is in harmonies at moderate to slow speeds. When we hear long notes played together, we can spot if they are out of tune, particularly if the out-of-tune notes are the prominent ones in the chord. In this type of music, whether it’s a Beethoven string quartet or a harmonized gospel song, errors in tuning stand out very clearly. Fortunately these are also the occasions when errors can be rapidly repaired. Skilled performers on variable-pitch instruments (including the voice) will quickly and instinctively adjust the note they’re producing to bring it into tune with the others that are playing at that point. In this way all the peculiarities of the Just system can be ironed out as you go along.

Some players of variable-pitch instruments, playing the melody against a background of harmony, or playing completely solo, use expressive intonation. This involves making some of the jumps in pitch in the tune slightly bigger and some of them slightly smaller in order to emphasize certain notes in a tune. For example, some melodies end with the keynote preceded by the note just below it. Playing this penultimate note at a slightly higher pitch than normal exaggerates the feeling that we are on our way home to the keynote, and we’re nearly there. Blues guitarists and singers use similar techniques, sliding up toward notes and then away from them (in some cases never reaching the “proper” pitch of the note). Supporters of expressive intonation and blues techniques say that they add more oomph to the emotional content of the music. Detractors say they merely sound out of tune.

Given all the approximations and inaccuracies involved in every type of music, we should be thankful that our ability to categorize helps us make sense of it all. It’s amazing how well everything works out. But no matter what tuning system we use, or how accurately we use it, there will always be notes that sound rough when they are played together, an effect known as dissonance.