4. Anticipation

What We Expect from Liszt (and Ludacris)

When I’m at a wedding, it is not the sight of the hope and love of the bride and groom standing in front of their friends and family, their whole life before them, that makes my eyes tear up. It is when the music begins that I start to cry. In a movie, when two people are at long last reunited after some great ordeal, the music again pushes me and my emotions over the sentimental edge.

I said earlier that music is organized sound, but the organization has to involve some element of the unexpected or it is emotionally flat and robotic. The appreciation we have for music is intimately related to our ability to learn the underlying structure of the music we like—the equivalent to grammar in spoken or signed languages—and to be able to make predictions about what will come next. Composers imbue music with emotion by knowing what our expectations are and then very deliberately controlling when those expectations will be met, and when they won’t. The thrills, chills, and tears we experience from music are the result of having our expectations artfully manipulated by a skilled composer and the musicians who interpret that music.

Perhaps the most documented illusion—or parlor trick—in Western classical music is the deceptive cadence. A cadence is a chord sequence that sets up a clear expectation and then closes, typically with a satisfying resolution. In the deceptive cadence, the composer repeats the chord sequence again and again until he has finally convinced us that we’re going to get what we expect, but then at the last minute, he gives us an unexpected chord—not outside the key, but a chord that tells us that it’s not over, a chord that doesn’t completely resolve. Haydn’s use of the deceptive cadence is so frequent that it borders on obsession. Perry Cook has likened this to a magic trick: Magicians set up expectations and then defy them, all without you knowing exactly how or when they’re going to do it. Composers do the same thing. The Beatles’ “For No One” ends on the V chord (the fifth degree of the scale we’re in) and we wait for a resolution that never comes—at least not in that song. But the very next song on the album Revolver starts a whole step down from the very chord we were waiting to hear, a semi-resolution (to the flat seven) that straddles surprise and release.
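To make the idea concrete, here is a minimal sketch in Python, purely illustrative (the chord spellings and the toy “expectation check” are my own), contrasting an authentic cadence, which resolves V to I, with a deceptive cadence, which substitutes vi at the moment we expect the tonic.

```python
# A toy illustration (not a cognitive model): diatonic triads in C major,
# and two cadences that set up the same expectation but resolve differently.

TRIADS_IN_C = {
    "I":  ("C", "E", "G"),    # tonic: the expected point of rest
    "ii": ("D", "F", "A"),
    "IV": ("F", "A", "C"),
    "V":  ("G", "B", "D"),    # dominant: strongly implies a move to I
    "vi": ("A", "C", "E"),    # the usual "deceptive" substitute for I
}

authentic_cadence = ["ii", "V", "I"]    # expectation set up, then satisfied
deceptive_cadence = ["ii", "V", "vi"]   # expectation set up, then denied

def describe(progression):
    """Print each chord and note whether the final V resolves as expected."""
    for numeral in progression:
        print(f"{numeral:>3}: {'-'.join(TRIADS_IN_C[numeral])}")
    final = progression[-1]
    if progression[-2] == "V":
        verdict = "resolves home" if final == "I" else "deceptive: lands on " + final
        print(f"  -> {verdict}\n")

describe(authentic_cadence)
describe(deceptive_cadence)
```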

The setting up and then manipulating of expectations is the heart of music, and it is accomplished in countless ways. Steely Dan do it by playing songs that are essentially the blues (with blues structure and chord progressions) but by adding unusual harmonies to the chords that make them sound very unblues—for example on their song “Chain Lightning.” Miles Davis and John Coltrane made careers out of reharmonizing blues progressions to give them new sounds that were anchored partly in the familiar and partly in the exotic. On his solo album Kamakiriad, Donald Fagen (of Steely Dan) has one song with blues/funk rhythms that leads us to expect the standard blues chord progression, but the first minute and a half of the song is played on only one chord, never moving from that harmonic position. (Aretha Franklin’s “Chain of Fools” is all one chord.)

In “Yesterday,” the main melodic phrase is seven measures long; the Beatles surprise us by violating one of the most basic assumptions of popular music, the four- or eight-measure phrase unit (nearly all rock/pop songs have musical ideas that are organized into phrases of those lengths). In “I Want You (She’s So Heavy),” the Beatles violate expectations by first setting up a hypnotic, repetitive ending that sounds like it will go on forever; based on our experience with rock music and rock music endings, we expect that the song will slowly die down in volume, the classic fade-out. Instead, they end the song abruptly, and not even at the end of a phrase—they end right in the middle of a note!

The Carpenters use timbre to violate genre expectations; they were probably the last group people expected to use a distorted electric guitar, but they did on “Please Mr. Postman” and some other songs. The Rolling Stones—one of the hardest rock bands in the world at the time—had done the opposite of this just a few years before by using violins (as for example, on “As Tears Go By”). When Van Halen were the newest, hippest group around they surprised fans by launching into a heavy metal version of an old not-quite-hip song by the Kinks, “You Really Got Me.”

Rhythm expectations are violated often as well. A standard trick in electric blues is for the band to build up momentum and then stop playing altogether while the singer or lead guitarist continues on, as in Stevie Ray Vaughan’s “Pride and Joy,” Elvis Presley’s “Hound Dog,” or the Allman Brothers’ “One Way Out.” The classic ending to an electric blues song is another example. The song charges along with a steady beat for two or three minutes and—wham! Just as the chords suggest an ending is imminent, rather than charging through at full speed, the band suddenly starts playing at half the tempo they were before.

In a double whammy, Creedence Clearwater Revival pulls out this slowed-down ending in “Lookin’ Out My Back Door”—by then such an ending was already a well-known cliché—and then violates that expectation in turn by coming back in at full tempo for the song’s real ending.

The Police made a career out of violating rhythmic expectations. The standard rhythmic convention in rock is to have strong beats on 1 and 3 (marked by the kick drum) with a snare drum backbeat on 2 and 4. Reggae music (most clearly exemplified by Bob Marley) can be felt as happening half as fast as rock music because its kick and snare occur half as often for a given musical phrase. Its basic beat is characterized by a guitar on the upbeats (or offbeats); that is, the guitar plays in the space that falls halfway between the main beats that you count: 1 AND 2 AND 3 AND 4 AND. Because of its “half-time” feel, it has a lazy quality, but the upbeats give it a sense of movement, propelling it ever forward. The Police combined reggae with rock to create a new sound that fulfilled some rhythmic expectations and violated others simultaneously. Sting often played bass guitar parts that were entirely novel, avoiding the rock clichés of playing on the downbeat or of playing synchronously with the bass drum. As Randy Jackson of American Idol fame, and one of the top session bass players, told me (back when we shared an office in a recording studio in the 1980s), Sting’s basslines are unlike anyone else’s, and they wouldn’t even fit in anyone else’s songs. “Spirits in the Material World” from their album Ghost in the Machine takes this rhythmic play to such an extreme that it can be hard to tell where the downbeat even is.
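To put the rhythmic contrast on paper, here is a rough step-sequencer-style sketch: one bar on an eighth-note grid, using simplified, invented patterns rather than transcriptions of any particular song. It shows the rock kick-and-backbeat layout next to a reggae-style “one drop” with the guitar skank on the offbeats.

```python
# Illustrative one-bar patterns on an eighth-note grid: "1 & 2 & 3 & 4 &".
# These are simplified caricatures of the two feels, not transcriptions.

GRID = ["1", "&", "2", "&", "3", "&", "4", "&"]

ROCK = {
    "kick":   [1, 0, 0, 0, 1, 0, 0, 0],   # strong beats 1 and 3
    "snare":  [0, 0, 1, 0, 0, 0, 1, 0],   # backbeat on 2 and 4
    "guitar": [1, 1, 1, 1, 1, 1, 1, 1],   # steady strumming
}

REGGAE_FEEL = {
    "kick":   [0, 0, 0, 0, 1, 0, 0, 0],   # "one drop": the kick lands only on beat 3
    "snare":  [0, 0, 0, 0, 1, 0, 0, 0],   # sparse cross-stick, often together with the kick
    "guitar": [0, 1, 0, 1, 0, 1, 0, 1],   # the skank: offbeats only
}

def print_pattern(name, pattern):
    print(name)
    print("        " + "  ".join(GRID))
    for instrument, hits in pattern.items():
        row = "  ".join("x" if hit else "." for hit in hits)
        print(f"{instrument:>7} {row}")
    print()

print_pattern("Rock feel", ROCK)
print_pattern("Reggae feel", REGGAE_FEEL)
```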

Modern composers such as Schönberg threw out the whole idea of expectation. The scales they used deprive us of the notion of a resolution, a root to the scale, or a musical “home,” thus creating the illusion of no home, a music adrift, perhaps as a metaphor for a twentieth-century existentialist existence (or just because they were trying to be contrary). We still hear these scales used in movies to accompany dream sequences to convey a lack of grounding, or in underwater or outer space scenes to convey weightlessness.

These aspects of music are not represented directly in the brain, at least not during the initial stages of processing. The brain constructs its own version of reality, based only in part on what is there, and in part on how it interprets the tones we hear as a function of the role they play in a learned musical system. We interpret spoken language analogously. There is nothing intrinsically catlike about the word cat or even any of the letters in the word. We have learned that this collection of sounds represents the feline house pet. Similarly, we have learned that certain sequences of tones go together, and we expect them to continue to do so. We expect certain pitches, rhythms, timbres, and so on to co-occur based on a statistical analysis our brain has performed of how often they have gone together in the past. We have to reject the intuitively appealing idea that the brain is storing an accurate and strictly isomorphic representation of the world. To some degree, it stores perceptual distortions and illusions, and it extracts relationships among elements. It is computing a reality for us, one that is rich in complexity and beauty. A basic piece of evidence for such a view is the simple fact that light waves in the world vary along one dimension—wavelength—and yet our perceptual system treats color as two dimensional (the color circle described on page 29). Similarly with pitch: From a one-dimensional continuum of air molecules vibrating at different speeds, our brains construct a rich, multidimensional pitch space with three, four, or even five dimensions (according to some models). If our brain is adding this many dimensions to what is out there in the world, this can help explain the deep reactions we have to sounds that are properly constructed and skillfully combined.
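One simplified way to see how a single physical dimension can become a richer perceptual space is to split a frequency into a pitch class (its position within the octave) and a height (which octave it occupies). The sketch below assumes the standard 440 Hz tuning reference and equal-tempered arithmetic; this two-dimensional split is only the simplest of the multidimensional pitch spaces mentioned above.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered MIDI note number (A4 = 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def pitch_dimensions(freq_hz):
    """Split one physical dimension (frequency) into two perceptual-style ones."""
    midi = frequency_to_midi(freq_hz)
    chroma = midi % 12          # pitch class: C, C#, D, ... (a circular dimension)
    height = midi // 12 - 1     # octave number (a linear dimension)
    return NOTE_NAMES[chroma], height

for hz in [110.0, 220.0, 440.0, 880.0, 261.63]:
    name, octave = pitch_dimensions(hz)
    print(f"{hz:7.2f} Hz -> {name}{octave}")
# The A's all share a chroma but differ in height: one number in, two dimensions out.
```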

When cognitive scientists talk about violating expectations, we mean an event whose occurrence is at odds with what might reasonably have been predicted. It is clear that we know a great deal about a number of different standard situations. Life presents us with similar situations that differ only in details, and often those details are insignificant. Learning to read is an example. The feature extractors in our brain have learned to detect the essential and unvarying aspects of letters of the alphabet, and unless we explicitly pay attention, we don’t notice details such as the font that a word is typed in. Even though surface details are different, all these words are equally recognizable, as are their individual letters. (It may be jarring to read sentences in which every word is in a different font, and of course such rapid shifting causes us to notice, but the point remains that our feature detectors are busy extracting things like “the letter a” rather than processing the font it is typed in.)

An important way that our brain deals with standard situations is that it extracts those elements that are common to multiple situations and creates a framework within which to place them; this framework is called a schema. The schema for the letter a would be a description of its shape, and perhaps a set of memory traces that includes all the a’s we’ve ever seen, showing the variability that accompanies the schema. Schemas inform a host of day-to-day interactions we have with the world. For example, we’ve been to birthday parties and we have a general notion—a schema—of what is common to birthday parties. The birthday party schema will be different for different cultures (as is music), and for people of different ages. The schema leads to clear expectations, as well as a sense of which of those expectations are flexible and which are not. We can make a list of things we would expect to find at a typical birthday party. We wouldn’t be surprised if these weren’t all present, but the more of them that are absent, the less typical the party would be:

~ A person who is celebrating the anniversary of their birth

~ Other people helping that person to celebrate

~ A cake with candles

~ Presents

~ Festive food

~ Party hats, noisemakers, and other decorations

If the party was for an eight-year-old we might have the additional expectation that there would be a rousing game of pin-the-tail-on-the-donkey, but not single-malt scotch. This more or less constitutes our birthday party schema.
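A schema can be caricatured as a data structure: a set of expected features, some more central than others, against which a new situation is scored for typicality. The features and weights below are invented for the illustration; this is a cartoon of the idea, not a claim about how the brain stores schemas.

```python
# Toy "birthday party" schema: expected features with invented centrality weights.
BIRTHDAY_SCHEMA = {
    "person celebrating a birthday": 3.0,
    "other people celebrating with them": 2.0,
    "cake with candles": 2.0,
    "presents": 1.5,
    "festive food": 1.0,
    "party hats and decorations": 1.0,
}

def typicality(observed_features, schema=BIRTHDAY_SCHEMA):
    """Score how typical a situation is: the weighted fraction of expected features present."""
    present = sum(weight for feature, weight in schema.items() if feature in observed_features)
    return present / sum(schema.values())

office_party = {"person celebrating a birthday", "cake with candles", "festive food"}
full_party = set(BIRTHDAY_SCHEMA)

print(f"office party typicality: {typicality(office_party):.2f}")   # partial match
print(f"full party typicality:   {typicality(full_party):.2f}")     # 1.00
```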

We have musical schemas, too, and these begin forming in the womb and are elaborated, amended, and otherwise informed every time we listen to music. Our musical schema for Western music includes implicit knowledge of the scales that are normally used. This is why Indian or Pakistani music, for example, sounds “strange” to us the first time we hear it. It doesn’t sound strange to Indians and Pakistanis, and it doesn’t sound strange to infants (or at least not any stranger than any other music). This may be an obvious point, but it sounds strange by virtue of being inconsistent with what we have learned to call music. By the age of five, children have learned to recognize chord progressions in the music of their culture—they are forming schemas.

We develop schemas for particular musical genres and styles; style is just another word for “repetition.” Our schema for a Lawrence Welk concert includes accordions, but not distorted electric guitars, and our schema for a Metallica concert is the opposite. A schema for Dixieland includes foot-tapping, up-tempo music, and unless the band was trying to be ironic, we would not expect there to be overlap between their repertoire and that of a funeral procession. Schemas are an extension of memory. As listeners, we recognize when we are hearing something we’ve heard before, and we can distinguish whether we heard it earlier in the same piece, or in a different piece. Music listening requires, according to the theorist Eugene Narmour, that we be able to hold in memory a knowledge of those notes that have just gone by, alongside a knowledge of all other musics we are familiar with that approximate the style of what we’re listening to now. This latter memory may not have the same level of resolution or the same amount of vividness as notes we’ve just heard, but it is necessary in order to establish a context for the notes we’re hearing.

The principal schemas we develop include a vocabulary of genres and styles, as well as of eras (1970s music sounds different from 1930s music), rhythms, chord progressions, phrase structure (how many measures to a phrase), how long a song is, and what notes typically follow what. When I said earlier that the standard popular song has phrases that are four or eight measures long, this is a part of the schema we’ve developed for late twentieth-century popular songs. We’ve heard thousands of songs thousands of times and even without being able to explicitly describe it, we have incorporated this phrase tendency as a “rule” about music we know. When “Yesterday” plays with its seven-measure phrase, it is a surprise. Even though we’ve heard “Yesterday” a thousand or even ten thousand times, it still interests us because it violates schematic expectations that are even more firmly entrenched than our memory for this particular song. Songs that we keep coming back to for years play around with expectations just enough that they are always at least a little bit surprising. Steely Dan, the Beatles, Rachmaninoff, and Miles Davis are just a few of the artists that some people say they never tire of, and this is a big part of the reason.

Melody is one of the primary ways that our expectations are controlled by composers. Music theorists have identified a principle called gap fill; in a sequence of tones, if a melody makes a large leap, either up or down, the next note should change direction. A typical melody includes a lot of stepwise motion, that is, adjacent tones in the scale. If the melody makes a big leap, theorists describe a tendency for the melody to “want” to return to the jumping-off point; this is another way to say that our brains expect that the leap was only temporary, and tones that follow need to bring us closer and closer to our starting point, or harmonic “home.”

In “Over the Rainbow,” the melody begins with one of the largest leaps we’ve ever experienced in a lifetime of music listening: an octave. This is a strong schematic violation, and so the composer rewards and soothes us by bringing the melody back toward home again, but not by too much—he does come down, but only by one scale degree—because he wants to continue to build tension. The third note of this melody fills the gap. Sting does the same thing in “Roxanne”: He leaps up an interval of roughly a half octave (a perfect fourth) to hit the first syllable of the word Roxanne, and then comes down again to fill the gap.
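As a rough illustration of the gap-fill tendency, the sketch below walks through a melody (written as MIDI note numbers), flags any leap larger than a few semitones, and checks whether the next move reverses direction. The leap threshold and the toy melodies are arbitrary choices for the example; real models of melodic expectation, such as Narmour’s, are far more nuanced.

```python
LEAP_THRESHOLD = 5  # semitones; anything larger counts as a "leap" for this sketch

def gap_fill_report(melody):
    """For each large leap, report whether the following note changes direction."""
    for i in range(len(melody) - 2):
        step = melody[i + 1] - melody[i]
        if abs(step) > LEAP_THRESHOLD:
            next_step = melody[i + 2] - melody[i + 1]
            filled = (step > 0 and next_step < 0) or (step < 0 and next_step > 0)
            print(f"leap of {step:+d} semitones at note {i + 1}: "
                  f"{'gap filled (direction reverses)' if filled else 'gap left open'}")

# An "Over the Rainbow"-like opening: octave leap up, then a step back down.
gap_fill_report([63, 75, 74, 70, 72])
# A contrived melody whose leap keeps going the same way (it violates the tendency).
gap_fill_report([60, 67, 72, 74])
```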

We also hear gap fill in the adagio cantabile from Beethoven’s “Pathétique” Sonata. As the main theme climbs upward, it moves from a C (in the key of A-flat, this is the third degree of the scale) to the A-flat that is an octave above what we consider the “home” note, and it keeps on climbing to a B-flat. Now that we’re an octave and a whole step higher than home, there is only one way to go, back toward home. Beethoven actually jumps toward home, down an interval of a fifth, landing on the note (E-flat) that is a fifth above the tonic. To delay the resolution—Beethoven was a master of suspense—instead of continuing the descent down to the tonic, Beethoven moves away from it. In writing the jump down from the high B-flat to the E-flat, Beethoven was pitting two schemas against each other: the schema for resolving to the tonic, and the schema for gap fill. By moving away from the tonic at this point, he is also filling the gap he made by jumping so far down to get to this midpoint. When Beethoven finally brings us home two measures later, it is as sweet a resolution as we’ve ever heard.

Consider now what Beethoven does to expectations with the melody to the main theme from the last movement of his Ninth Symphony (“Ode to Joy”). These are the notes of the melody, as solfège, the do-re-mi system:

mi - mi - fa - sol - sol - fa - mi - re - do - do - re - mi - mi - re - re

(If you’re having trouble following along, it might help if you sing in your mind the English words to this part of the song: “Come and sing a song of joy for peace a glory gloria …”)

The main melodic theme is simply the notes of the scale! The best-known, most overheard, and most overused sequence of notes in Western music. But Beethoven makes it interesting by violating our expectations. He starts on a strange note and ends on a strange note. He starts on the third degree of the scale (as he did in the “Pathétique” Sonata), rather than the root, and then goes up in stepwise fashion, then turns around and comes down again. When he gets to the root—the most stable tone—rather than staying there he comes up again, up to the note we started on, then back down so that we expect him to hit the root again, but he doesn’t; he stays right there on re, the second scale degree. The piece needs to resolve to the root, but Beethoven keeps us hanging there, where we least expect to be. He then runs the entire motif again, and only on the second time through does he meet our expectations. But now, that expectation is even more interesting because of the ambiguity: We wonder if, like Lucy with Charlie Brown, he will pull the football of resolution away from us at the last minute.
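To make the scale-degree play explicit, here is a small sketch that converts the solfège above into scale degrees and reports where each phrase comes to rest. The division into two phrases and the second phrase’s ending (which, in the familiar tune, finally lands on do) follow the standard melody; the annotation itself is just an illustration.

```python
SOLFEGE_TO_DEGREE = {"do": 1, "re": 2, "mi": 3, "fa": 4, "sol": 5, "la": 6, "ti": 7}

first_phrase  = "mi mi fa sol sol fa mi re do do re mi mi re re".split()
second_phrase = "mi mi fa sol sol fa mi re do do re mi re do do".split()  # second pass resolves

def describe(label, phrase):
    degrees = [SOLFEGE_TO_DEGREE[s] for s in phrase]
    last = degrees[-1]
    ending = "resolves to the tonic" if last == 1 else f"hangs on degree {last}"
    print(f"{label}: {degrees} -> {ending}")

describe("first pass ", first_phrase)
describe("second pass", second_phrase)
```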

What do we know about the neural basis for musical expectations and musical emotion? If we acknowledge that the brain is constructing a version of reality, we must reject the idea that the brain holds an accurate and strictly isomorphic representation of the world. So what is the brain holding in its neurons that represents the world around us? The brain represents all music and all other aspects of the world in terms of mental or neural codes. Neuroscientists try to decipher this code, to understand its structure and how it translates into experience. Cognitive psychologists try to understand these codes at a somewhat higher level—not at the level of neural firings, but at the level of general principles.

The way in which a picture is stored on your computer is similar, in principle, to how the neural code works. When you store a picture on your computer, the picture is not stored on your hard drive the way that a photograph is stored in your grandmother’s photo album. When you open your grandmother’s album, you can pick up a photo, turn it upside down, give it to a friend; it is a physical object. It is the photograph, not a representation of a photograph. On the other hand, a photo in your computer is stored in a file made up of 0s and 1s—the binary code that computers use to represent everything.

If you’ve ever opened a corrupt file, or if your e-mail program didn’t properly download an attachment, you’ve probably seen a bunch of gibberish in place of what you thought was a computer file: a string of funny symbols, squiggles, and alphanumeric characters that looks like the equivalent of a comic-strip swear word. (These represent a sort of intermediate hexadecimal code that itself is resolved into 0s and 1s, but this intermediate stage is not crucial for understanding the analogy.) In the simplest case of a black-and-white photograph, a 1 might represent that there is a black dot at a particular place in the picture, and a 0 might indicate the absence of a black dot, or a white dot. You can imagine that one could easily represent a simple geometric shape using these 0s and 1s, but the 0s and 1s would not themselves be in the shape of a triangle, they would simply be part of a long line of 0s and 1s, and the computer would have a set of instructions telling it how to interpret them (and to what spatial location each number refers). If you got really good at reading such a file, you might be able to decode it, and guess what sort of image it represents. The situation is vastly more complicated with a color image, but the principle is the same. People who work with image files all the time are able to look at the stream of 0s and 1s and tell something about the nature of the photograph—not at the level of whether it is a human or a horse, perhaps, but things like how much red or gray is in the picture, how sharp the edges are, and so forth. They have learned to read the code that represents the picture.
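Here is a minimal sketch of the black-and-white case just described: a tiny triangle stored as nothing more than a flat string of 0s and 1s, plus one decoding rule (the image width) telling the computer how to lay them out. The 8-by-5 size and the triangle are arbitrary choices for the example.

```python
# A tiny black-and-white "image": 1 = black dot, 0 = white dot, stored as one flat string.
WIDTH = 8
BITS = (
    "00010000"
    "00111000"
    "01111100"
    "11111110"
    "00000000"
)

def render(bits, width):
    """Interpret the flat bit string as rows of pixels and draw it as text."""
    for row_start in range(0, len(bits), width):
        row = bits[row_start:row_start + width]
        print("".join("#" if bit == "1" else "." for bit in row))

render(BITS, WIDTH)
# The 0s and 1s are not themselves triangle-shaped; only the decoding rule
# (chop the stream into rows of WIDTH) recovers the picture.
```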

Similarly, audio files are stored in binary format, as sequences of 0s and 1s. Roughly speaking, the 0s and 1s represent whether or not there is any sound at particular parts of the frequency spectrum. Depending on its position in the file, a certain sequence of 0s and 1s will indicate whether a bass drum or a piccolo is playing.
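In the same spirit, a short sketch of sound as numbers: it builds a signal from two sine waves, then uses a discrete Fourier transform to recover how much energy sits at each frequency. The 330 Hz and 660 Hz components, the one-second duration, and the use of NumPy are assumptions made only for the example; real audio formats add layers of packaging and compression on top of this basic idea.

```python
import numpy as np

SAMPLE_RATE = 8000                          # samples per second (a modest choice for the sketch)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE    # one second of time points

# A toy "recording": a 330 Hz tone plus a quieter 660 Hz tone (its octave).
signal = np.sin(2 * np.pi * 330 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

# Inside the computer this is just a long list of numbers (ultimately bits).
print("first five samples:", np.round(signal[:5], 3))

# A Fourier transform re-describes those numbers as energy per frequency.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)

# Report the two frequencies carrying the most energy.
top = np.argsort(spectrum)[-2:]
for idx in sorted(top):
    print(f"strong component near {freqs[idx]:.0f} Hz")
```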

In the cases I’ve just described, the computer is using a code to represent common visual and auditory objects. The objects themselves are decomposed into small components—pixels in the case of a picture, sine waves of a particular frequency and amplitude in the case of sound—and these components are translated into the code. Of course, the computer (brain) is running a lot of fancy software (mind) that translates the code effortlessly. Most of us don’t have to concern ourselves with the code itself at all. We scan a photo or rip a song to our hard drive, and when we want to see it or hear it, we double-click on it and there it appears, in all its original glory. This is an illusion made possible by the many layers of translation and amalgamation going on, all of it invisible to us. This is what the neural code is like: millions of neurons firing at different rates and different intensities, all of it invisible to us. We can’t feel our neurons firing; we don’t know how to speed them up, slow them down, turn them on when we’re having trouble getting started on a bleary-eyed morning, or shut them off so we can sleep at night.

Years ago, my friend Perry Cook and I were astonished when we read an article about a man who could look at phonograph records and identify the piece of music that was on them, by looking at the grooves, with the label obscured. Did he memorize the patterns of thousands of record albums? Perry and I took out some old record albums and we noticed some regularities. The grooves of a vinyl record contain a code that is “read” by the needle. Low notes create wide grooves, high notes create narrow grooves, and a needle dropped inside the grooves is moving thousands of times per second to capture the landscape of the inner wall. If a person knew many pieces of music well, it would be possible to characterize them in terms of how many low notes there were (rap music has a lot, baroque concertos don’t), how steady versus percussive the low notes are (think of a jazz-swing tune with walking bass as opposed to a funk tune with slapping bass), and to learn how these shapes are encoded in vinyl. This fellow’s skills are extraordinary, but they’re not inexplicable.

We encounter gifted auditory-code readers every day: the mechanic who can listen to the sound of your engine and determine whether your problems are due to clogged fuel injectors or a slipped timing chain; the doctor who can tell by listening to your heart whether you have an arrhythmia; the police detective who can tell when a suspect is lying by the stress in his voice; the musician who can tell a viola from a violin or a B-flat clarinet from an E-flat clarinet just by the sound. In all these cases, timbre is playing an important role in helping us to unlock the code.

How can we study neural codes and learn to interpret them? Some neuroscientists start by studying neurons and their characteristics—what causes them to fire, how rapidly they fire, what their refractory period is (how long they need to recover between firings); we study how neurons communicate with each other and the role of neurotransmitters in conveying information in the brain. Much of the work at this level of analysis concerns general principles; we don’t yet know much about the neurochemistry of music, for example, although I’ll reveal some exciting new results along this line from my laboratory in Chapter 5.

But I’ll back up for a minute. Neurons are the primary cells of the brain; they are also found in the spinal cord and the peripheral nervous system. Activity from outside the brain can cause a neuron to fire—such as when a tone of a particular frequency excites the basilar membrane, which in turn passes a signal up to frequency-selective neurons in the auditory cortex. Contrary to what we thought a hundred years ago, the neurons in the brain aren’t actually touching; there’s a space between them called the synapse. When we say a neuron is firing, it is sending an electrical signal that causes the release of a neurotransmitter. Neurotransmitters are chemicals released into the synapse that bind to receptors attached to other neurons. Receptors and neurotransmitters can be thought of as locks and keys, respectively. After a neuron fires, a neurotransmitter swims across the synapse to a nearby neuron, and when it finds the lock and binds with it, that new neuron starts to fire. Not all keys fit all locks; certain locks (receptors) are designed to accept only certain neurotransmitters.

Generally, a neurotransmitter either causes the receiving neuron to fire (excitation) or prevents it from firing (inhibition). The neurotransmitters are then reabsorbed by the neuron that released them, through a process called reuptake; without reuptake, they would continue to stimulate or inhibit the firing of the receiving neuron.
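A cartoon of that fire-or-don’t-fire arithmetic, stripped of all biological detail: a toy neuron sums weighted inputs, with positive weights standing in for excitatory synapses and negative weights for inhibitory ones, and fires only when the total crosses a threshold. The weights and the threshold are invented for the illustration; real neurons integrate inputs over time and leak charge, among many other complications.

```python
# A toy threshold neuron: positive weights act like excitatory synapses,
# negative weights like inhibitory ones. Purely illustrative.

THRESHOLD = 1.0

def receiving_neuron(inputs, weights, threshold=THRESHOLD):
    """Return True ("fires") if the weighted sum of inputs crosses the threshold."""
    drive = sum(signal * weight for signal, weight in zip(inputs, weights))
    return drive >= threshold

excitatory_only = receiving_neuron(inputs=[1, 1, 1], weights=[0.5, 0.4, 0.3])
with_inhibition = receiving_neuron(inputs=[1, 1, 1], weights=[0.5, 0.4, -0.6])

print("excitation alone:", "fires" if excitatory_only else "stays quiet")   # fires
print("with inhibition: ", "fires" if with_inhibition else "stays quiet")   # stays quiet
```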

Some neurotransmitters are used throughout the nervous system, and some only in certain brain regions and by certain kinds of neurons. Serotonin is produced in the brain stem and is associated with the regulation of mood and sleep. The new class of antidepressants, including Prozac and Zoloft, are known as selective serotonin reuptake inhibitors (SSRIs) because they inhibit the reuptake of serotonin in the brain, allowing whatever serotonin is already there to act for a longer period of time. The precise mechanism by which this alleviates depression, obsessive-compulsive disorder, and mood and sleep disorders is not known. Dopamine is released into the nucleus accumbens and is involved in mood regulation and the coordination of movement. It is most famous for being part of the brain’s pleasure and reward system. When drug addicts get their drug of choice, or when compulsive gamblers win a bet—even when chocoholics get cocoa—this is the neurotransmitter that is released. Its role—and the important role played by the nucleus accumbens—in music was unknown until 2005.

Cognitive neuroscience has been making great leaps in understanding over the last decade. We now know so much more about how neurons work, how they communicate, how they form networks, and how neurons develop from their genetic recipes. One macro-level finding about brain function that has entered popular culture is hemispheric specialization—the idea that the left half of the brain and the right half of the brain perform different cognitive functions. This is certainly true, but as with much of the science that has permeated popular culture, the real story is somewhat more nuanced.

To begin with, the research on which this is based was performed on right-handed people. For reasons that aren’t entirely clear, people who are left-handed (approximately 5 to 10 percent of the population) or ambidextrous sometimes have the same brain organization as right-handers, but more often have a different brain organization. When the brain organization is different, it can take the form of a simple mirror image, such that functions are simply flipped to the opposite side. In many cases, however, left-handers have a neural organization that is different in ways that are not yet well documented. Thus, any generalizations we make about hemispheric asymmetries are applicable only to the right-handed majority of the population.

Writers, businessmen, and engineers refer to themselves as left-brain dominant, and artists, dancers, and musicians as right-brain dominant. The popular conception that the left brain is analytical and the right brain is artistic has some merit, but is overly simplistic. Both sides of the brain engage in analysis and both sides in abstract thinking. All of these activities require coordination of the two hemispheres, although some of the particular functions involved are clearly lateralized.

Speech processing is primarily left-hemisphere localized, although certain global aspects of spoken language, such as intonation, emphasis, and the pitch pattern, are more often disrupted following right-hemisphere damage. The ability to distinguish a question from a statement, or sarcasm from sincerity, often rests on these right-hemisphere lateralized, nonlinguistic cues, known collectively as prosody. It is natural to wonder whether music shows the opposite asymmetry, with processing located primarily on the right. There are many cases of individuals with brain damage to the left hemisphere who lost the power of speech, but retained their musical function, and vice versa. Cases like these suggest that music and speech, although they may share some neural circuits, cannot use completely overlapping neural structures.

Local features of spoken language, such as distinguishing one speech sound from another, appear to be left-hemisphere lateralized. We’ve found lateralization in the brain basis of music as well. The overall contour of a melody—simply its melodic shape, while ignoring intervals—is processed in the right hemisphere, as is making fine discriminations of tones that are close together in pitch. Consistent with its language functions, the left hemisphere is involved in the naming aspects of music—such as naming a song, a performer, an instrument, or a musical interval. Musicians using their right hands or reading music from their right visual field also use the left brain because the left half of the brain controls the right half of the body. There is also new evidence that tracking the ongoing development of a musical theme—thinking about key and scales and whether a piece of music makes sense or not—is lateralized to the left frontal lobes.

Musical training appears to have the effect of shifting some music processing from the right (imagistic) hemisphere to the left (logical) hemisphere, as musicians learn to talk about—and perhaps think about—music using linguistic terms. And the normal course of development seems to cause greater hemispheric specialization: Children show less lateralization of musical operations than do adults, regardless of whether they are musicians or not.

The best place to begin to look at expectation in the musical brain is in how we track chord sequences in music over time. The most important way that music differs from visual art is that it is manifested over time. As tones unfold sequentially, they lead us—our brains and our minds—to make predictions about what will come next. These predictions are the essential part of musical expectations. But how to study the brain basis of these?

Neural firings produce a small electrical current, which can be measured with suitable equipment, allowing us to know when and how often neurons are firing; this recording is called the electroencephalogram, or EEG. Electrodes are placed (painlessly) on the surface of the scalp, much as a heart monitor might be taped to your finger, wrist, or chest. The EEG is exquisitely sensitive to the timing of neural firings, and can detect activity with a resolution of one thousandth of a second (one millisecond). But it has some limitations. EEG is not able to distinguish whether the neural activity is releasing excitatory, inhibitory, or modulatory neurotransmitters, the chemicals such as serotonin and dopamine that influence the behavior of other neurons. And because the electrical signature generated by a single neuron firing is relatively weak, the EEG picks up only the synchronous firing of large groups of neurons, rather than individual neurons.
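Why synchrony matters can be shown with a few lines of arithmetic: add up many small oscillations that are in phase and the total grows with the size of the group; add up the same oscillations with random phases and they largely cancel. The thousand “neurons,” the 10 Hz rhythm, and the sinusoidal stand-ins below are assumptions made purely for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CELLS = 1000
t = np.linspace(0, 1, 1000)           # one second of "recording"
freq = 10.0                           # a 10 Hz rhythm

# Each cell contributes a tiny oscillation; only the phases differ between conditions.
synchronized = sum(np.sin(2 * np.pi * freq * t) for _ in range(N_CELLS))
random_phase = sum(np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi))
                   for _ in range(N_CELLS))

print(f"peak of summed signal, synchronized: {np.max(np.abs(synchronized)):8.1f}")
print(f"peak of summed signal, random phase: {np.max(np.abs(random_phase)):8.1f}")
# The synchronized population sums to a large, scalp-visible wave;
# the desynchronized one mostly cancels itself out.
```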

EEG also has limited spatial resolution—that is, a limited ability to tell us the location of the neural firings, due to what is called the inverse Poisson problem. Imagine that you’re standing inside a football stadium that has a large semitransparent dome covering it. You have a flashlight, and you point it up to the inside surface of the dome. Meanwhile, I’m standing on the outside, looking down at the dome from high above, and I have to predict where you’re standing. You could be standing anywhere on the entire football field and shining your light at the same particular spot in the center of the dome, and from where I’m standing, it will all look the same to me. There might be slight differences in the angle or the brightness of the light, but any prediction I make about where you’re standing is going to be a guess. And if you were to bounce your flashlight beam off of mirrors and other reflective surfaces before it reached the dome, I’d be even more lost. This is the case with electrical signals in the brain that can be generated from multiple sources in the brain, from the surface of the brain or deep down inside the grooves (sulci), and that can bounce off of the sulci before reaching the electrode on the outer scalp surface. Still, EEG has been helpful in understanding musical behavior because music is time based, and EEG has the best temporal resolution of the tools we commonly employ for studying the human brain.

Several experiments conducted by Stefan Koelsch, Angela Friederici, and their colleagues have taught us about the neural circuits involved in musical structure. The experimenters play chord sequences that either resolve in the standard, schematic way, or that end on unexpected chords. After the onset of the chord, electrical activity in the brain associated with musical structure is observed within 150–400 milliseconds (ms), and activity associated with musical meaning about 100–150 ms later. The structural processing—musical syntax—has been localized to the frontal lobes of both hemispheres in areas adjacent to and overlapping with those regions that process speech syntax, such as Broca’s area, and shows up regardless of whether listeners have musical training. The regions involved in musical semantics—associating a tonal sequence with meaning—appear to be in the back portions of the temporal lobe on both sides, near Wernicke’s area.

The brain’s music system appears to operate with functional independence from the language system—the evidence comes from many case studies of patients who, postinjury, lose one or the other faculty but not both. The most famous case is perhaps that of Clive Wearing, a musician and conductor, whose brain was damaged as a result of herpes encephalitis. As reported by Oliver Sacks, Clive lost all memory except for musical memories, and the memory of his wife. Other cases have been reported for which the patient lost music but retained language and other memories. When portions of his left cortex deteriorated, the composer Ravel selectively lost his sense of pitch while retaining his sense of timbre, a deficit that inspired his writing of Bolero, a piece that emphasizes variations in timbre. The most parsimonious explanation is that music and language do, in fact, share some common neural resources, and yet have independent pathways as well. The close proximity of music and speech processing in the frontal and temporal lobes, and their partial overlap, suggests that those neural circuits that become recruited for music and language may start out life undifferentiated. Experience and normal development then differentiate the functions of what began as very similar neuronal populations. Consider that at a very early age, babies are thought to be synesthetic, to be unable to differentiate the input from the different senses, and to experience life and the world as a sort of psychedelic union of everything sensory. Babies may see the number five as red, taste cheddar cheeses in D-flat, and smell roses in triangles.

The process of maturation creates distinctions in the neural pathways as connections are cut or pruned. What may have started out as a neuron cluster that responded equally to sights, sound, taste, touch, and smell becomes a specialized network. So, too, may music and speech have started in us all with the same neurobiological origins, in the same regions, and using the same specific neural networks. With increasing experience and exposure, the developing infant eventually creates dedicated music pathways and dedicated language pathways. The pathways may share some common resources, as has been proposed most prominently by Ani Patel in his SSIRH—shared syntactic integration resource hypothesis.

My collaborator and friend Vinod Menon, a systems neuroscientist at Stanford Medical School, shared with me an interest in being able to pin down the findings from the Koelsch and Friederici labs, and in being able to provide solid evidence for Patel’s SSIRH. For that, we had to use a different method of studying the brain, since the spatial resolution of EEG wasn’t fine enough to really pinpoint the neural locus of musical syntax.

Because the hemoglobin of the blood is slightly magnetic, changes in the flow of blood can be traced with a machine that can track changes in magnetic properties. This is what a magnetic resonance imaging machine (MRI) is: a giant electromagnet that produces a report showing differences in magnetic properties, which in turn can tell us where, at any given point in time, the blood is flowing in the body. (The research on the development of the first CT scanners, an earlier brain-imaging technology, was performed by the British company EMI, financed in large part from their profits on Beatles records. “I Want to Hold Your Hand” might well have been titled “I Want to Scan Your Brain.”) Because neurons need oxygen to survive, and the blood carries oxygenated hemoglobin, we can trace the flow of blood in the brain too. We make the assumption that neurons that are actively firing will need more oxygen than neurons that are at rest, and so those regions of the brain that are involved in a particular cognitive task will be just those regions with the most blood flow at a given point in time. When we use the MRI machine to study the function of brain regions in this way, the technology is called functional MRI, or fMRI.

fMRI images let us see a living, functioning human brain while it is thinking. If you mentally practice your tennis serve, we can see the flow of blood move up to your motor cortex, and the spatial resolution of fMRI is good enough that we can see that it is the part of your motor cortex that controls your arm that is active. If you then start to solve a math problem, the blood moves forward, to your frontal lobes, and in particular to regions that have been identified as being associated with arithmetic problem solving, and we see this movement and ultimately the collection of blood in the frontal lobes on the fMRI scan.

Will this Frankenstein science I’ve just described, the science of brain imaging, ever allow us to read people’s minds? I’m happy to report that the answer is probably not, and absolutely not for the foreseeable future. The reason is that thoughts are simply too complicated and involve too many different regions. With fMRI I can tell that you are listening to music as opposed to watching a silent film, but we can’t yet tell if you’re listening to hip-hop versus Gregorian chants, let alone what specific song you’re listening to or thought you’re thinking.

With the high spatial resolution of fMRI, one can tell within just a couple of millimeters where something is occurring in the brain. The problem, however, is that the temporal resolution of fMRI isn’t particularly good because of the amount of time it takes for blood to become redistributed in the brain—known as hemodynamic lag. But others had already studied the when of musical syntax/musical structure processing; we wanted to know the where and in particular if the where involved areas already known to be dedicated to speech. We found exactly what we predicted. Listening to music and attending to its syntactic features—its structure—activated a particular region of the frontal cortex on the left side called pars orbitalis—a subsection of the region known as Brodmann Area 47. The region we found in our study had some overlap with previous studies of structure in language, but it also had some unique activations. In addition to this left hemisphere activation, we also found activation in an analogous area of the right hemisphere. This told us that attending to structure in music requires both halves of the brain, while attending to structure in language only requires the left half.

Most astonishing was that the left-hemisphere regions we found to be active in tracking musical structure were the very same ones that are active when deaf people communicate in sign language. This suggested that what we had identified in the brain wasn’t a region that simply processed whether a chord sequence was sensible, or whether a spoken sentence was sensible. We were now looking at a region that also responded to sight—to the visual organization of words conveyed through American Sign Language. We found evidence for the existence of a brain region that processes structure in general, when that structure is conveyed over time. Although the inputs to this region must have come from different neural populations, and the outputs of it had to go through distinctive networks, there it was—a region that kept popping up in any task that involved organizing information over time.

The picture about neural organization for music was becoming clearer. All sound begins at the eardrum. Right away, sounds get segregated by pitch. Not much later, speech and music probably diverge into separate processing circuits. The speech circuits decompose the signal in order to identify individual phonemes—the consonants and vowels that make up our alphabet and our phonetic system. The music circuits start to decompose the signal and separately analyze pitch, timbre, contour, and rhythm. The output of the neurons performing these tasks connects to regions in the frontal lobe that put all of it together and try to figure out if there is any structure or order to the temporal patterning of it all. The frontal lobes access our hippocampus and regions in the interior of the temporal lobe and ask if there is anything in our memory banks that can help to understand this signal. Have I heard this particular pattern before? If so, when? What does it mean? Is it part of a larger sequence whose meaning is unfolding right now in front of me?

Having nailed down some of the neurobiology of musical structure and expectation, we were now ready to ask about the brain mechanisms underlying emotion and memory.