To a psychologist, learning refers specifically to the way in which we acquire new behaviors. To understand learning, we must start with the concept of a stimulus. A stimulus can be defined as anything to which an organism can respond, including all of the sensory inputs we discussed in Chapter 2 of MCAT Behavioral Sciences Review. The combination of stimuli and responses serves as the basis for all behavioral learning.
Responses to stimuli can change over time depending on the frequency and intensity of the stimulus. For instance, repeated exposure to the same stimulus can cause a decrease in response called habituation. This is seen in many first-year medical students: students often have an intense physical reaction the first time they see a cadaver or treat a severe laceration, but as they get used to these stimuli, the reaction lessens until they are unbothered by these sights.
The opposite process can also occur. Dishabituation is defined as the recovery of a response to a stimulus after habituation has occurred. Dishabituation is often noted when, late in the process of habituation to a stimulus, a second stimulus is presented. The second stimulus interrupts the habituation process and thereby causes an increase in response to the original stimulus. Imagine, for example, that you’re taking a long car trip and driving for many miles on a highway. After a while, your brain will get used to the sights, sounds, and sensations of highway driving: the dashed lines dividing the lanes, the sound of the engine and the tires on the road, and so on. Habituation has occurred. At some point you use an exit ramp, and these sensations change. As you merge onto the new highway, you pay more attention to the sensory stimuli coming in. Even if the stimuli are more or less the same as on the previous highway, the presentation of a different stimulus (using the exit ramp) causes dishabituation and a new awareness of—and response to—these stimuli. Dishabituation is temporary and always refers to changes in response to the original stimulus, not the new one.
Dishabituation is the recovery of a response to a stimulus, usually after a different stimulus has been presented. Note that the term refers to changes in response to the original stimulus, not the new one.
Learning, then, is a change in behavior that occurs in response to a stimulus. While there are many types of learning, the MCAT focuses on two types: associative learning and observational learning.
Associative learning is the creation of a pairing, or association, either between two stimuli or between a behavior and a consequence. On the MCAT, you’ll be tested on two kinds of associative learning: classical and operant conditioning.
Classical conditioning is a type of associative learning that takes advantage of biological, instinctual responses to create associations between two unrelated stimuli. For many people, the first name that comes to mind for research in classical conditioning is Ivan Pavlov. His experiments on dogs were not only revolutionary, but also provide a template for the way the MCAT will test classical conditioning.
Classical conditioning works, first and foremost, because some stimuli cause an innate or reflexive physiological response. For example, we reflexively salivate when we smell bread baking in an oven, or we may jump or recoil when we hear a loud noise. Any stimulus that brings about such a reflexive response is called an unconditioned stimulus, and the innate or reflexive response is called an unconditioned response. Many stimuli do not produce a reflexive response and are known as neutral stimuli. Neutral stimuli can be referred to as signaling stimuli if they have the potential to be turned into conditioned stimuli.
In Pavlov’s experiment, the unconditioned stimulus was meat, which would cause the dogs to salivate reflexively, and the neutral stimulus was a ringing bell. Through the course of the experiment, Pavlov repeatedly rang the bell before placing meat in the dogs’ mouths. Initially, the dogs did not react much when they heard the bell ring without receiving meat. However, after this procedure was repeated several times, the dogs began to salivate when they heard the bell ring. In fact, the dogs would salivate even if Pavlov only rang the bell and did not deliver any meat. Pavlov thereby turned a neutral stimulus into a conditioned stimulus: a normally neutral stimulus that, through association, now causes a reflexive response called a conditioned response. Classical conditioning, then, is the process of taking advantage of the reflexive response to an unconditioned stimulus to turn a neutral stimulus into a conditioned stimulus, as shown in Figure 3.1. This process is also referred to as acquisition.
Notice that the stimuli change in this experiment, but the response is the same throughout. Because salivation in response to food is natural and requires no conditioning, it is an unconditioned response in this context. On the other hand, when it occurs in response to the conditioned stimulus (the bell), salivation is considered a conditioned response.
On the MCAT, the key to telling conditioned and unconditioned responses apart will be to look at which stimulus is causing them: unconditioned stimuli cause an unconditioned response, while conditioned stimuli cause a conditioned response.
Just because a conditioned response has been acquired does not mean that it is permanent. If the conditioned stimulus is presented without the unconditioned stimulus enough times, the organism can become habituated to the conditioned stimulus and extinction occurs. If the bell rings often enough without the dog getting meat, the dog may stop salivating when the bell sounds. Interestingly, even this extinction of a response is not always permanent; after some time, if the extinguished conditioned stimulus is presented again, a weak conditioned response can sometimes be exhibited, a phenomenon called spontaneous recovery.
There are a few processes that can modify the response to a conditioned stimulus after acquisition has occurred. Generalization is a broadening effect by which a stimulus similar enough to the conditioned stimulus can also produce the conditioned response. In one famous experiment, researchers conditioned a child called Little Albert to be afraid of a white rat by pairing the presentation of the rat with a loud noise. Subsequent tests showed that Little Albert’s conditioning had generalized such that he also exhibited a fear response to a white stuffed rabbit, a white sealskin coat, and even a man with a white beard.
Finally, in discrimination, an organism learns to distinguish between two similar stimuli. This is the opposite of generalization. Pavlov’s dogs could have been conditioned to discriminate between bells of different tones by having one tone paired with meat, and another presented without meat. In this case, association could have occurred with one tone but not the other.
Classical conditioning is a favorite topic on the MCAT. Expect at least one question to describe a Pavlovian experiment and ask you to identify the role of one of the stimuli or responses described.
Whereas classical conditioning is concerned with instincts and biological responses, operant conditioning links voluntary behaviors with consequences in an effort to alter the frequency of those behaviors. Just as the MCAT will test you on the difference between conditioned and unconditioned responses and stimuli, it will ask you to distinguish between reinforcement and punishment too. Operant conditioning is associated with B. F. Skinner, who is considered the father of behaviorism, the theory that all behaviors are conditioned. The four possible relationships between stimulus and behavior are summarized in Figure 3.2.
Reinforcement is the process of increasing the likelihood that an individual will perform a behavior. Reinforcers are divided into two categories. Positive reinforcers increase a behavior by adding a positive consequence or incentive following the desired behavior. Money is an example of a common and strong positive reinforcer: employees will continue to work if they are paid. Negative reinforcers act similarly in that they increase the frequency of a behavior, but they do so by removing something unpleasant. For example, taking an aspirin reduces a headache, so the next time you have a headache, you are more likely to take one. Negative reinforcement is often confused with punishment, which will be discussed in the next section, but remember that the frequency of the behavior is the distinguishing factor: any reinforcement—positive or negative—increases the likelihood that a behavior will be performed.
This concept of learning by consequence forms the foundation for behavioral therapies for many disorders including phobias, anxiety disorders, and obsessive–compulsive disorder.
Negative reinforcers can be subdivided into escape learning and avoidance learning, which differ in the timing of the unpleasant stimulus. Taking aspirin is an example of escape learning: the role of the behavior is to reduce the unpleasantness of something that already exists, like a headache. Avoidance learning, on the other hand, is meant to prevent the unpleasantness of something that has yet to happen. In fact, you are practicing avoidance right now: you are studying to avoid the unpleasant consequence of a poor score on the MCAT. When you do well on Test Day, that success will positively reinforce the behavior of studying for the next major exam of your medical career: the United States Medical Licensing Examination® (USMLE®)!
Classical and operant conditioning can be used hand-in-hand. For example, dolphin trainers take advantage of reinforcers when training dolphins to perform tricks. Sometimes, the trainers will feed the dolphin a fish after it performs a trick. The fish can be said to be a primary reinforcer because the fish is a treat that the dolphin responds to naturally. Dolphin trainers also use tiny handheld devices that emit a clicking sound. This clicker would not normally be a reinforcer on its own, but the trainers use classical conditioning to pair the clicker with fish to elicit the same response. The clicker is thus a conditioned reinforcer, which is sometimes called a secondary reinforcer. Eventually, the dolphin may even associate the presence of the trainer with the possibility of reward, making the presence of the trainer a discriminative stimulus. A discriminative stimulus indicates that reward is potentially available in an operant conditioning paradigm.
In contrast to reinforcement, punishment uses conditioning to reduce the occurrence of a behavior. Positive punishment adds an unpleasant consequence in response to a behavior to reduce that behavior; for example, in some countries a thief may be flogged for stealing, which is intended to stop him from stealing again. Negative punishment is the reduction of a behavior when a stimulus is removed. For example, a parent may forbid her child from watching television as a consequence for bad behavior, with the goal of preventing the behavior from happening again.
Negative reinforcement is often confused with positive punishment. Negative reinforcement is the removal of a bothersome stimulus to encourage a behavior; positive punishment is the addition of a bothersome stimulus to reduce a behavior.
The presence or absence of reinforcing or punishing stimuli is just a part of the story. The rate at which desired behaviors are acquired is also affected by the schedule used to reinforce those behaviors. Reinforcement schedules vary along two factors: whether the schedule is fixed or variable, and whether it is based on a ratio or an interval. Fixed-ratio (FR) schedules reinforce a behavior after a specific number of performances of that behavior, while variable-ratio (VR) schedules reinforce a behavior after a varying number of performances. Fixed-interval (FI) schedules reinforce the first instance of a behavior after a specified amount of time has elapsed, while variable-interval (VI) schedules reinforce the first instance of a behavior after a varying amount of time has elapsed.
Of these schedules, variable-ratio works the fastest for learning a new behavior, and is also the most resistant to extinction. The effectiveness of the various reinforcement schedules is demonstrated in Figure 3.3.
There are a few things to note in this graph. First, variable-ratio schedules have the fastest response rate: the rat will continue pressing the bar quickly in the hope that the next press will be the “right one.” Also note that fixed schedules (fixed-ratio and fixed-interval) often show a brief pause in responding after the behavior is reinforced: once the rat has figured out what behavior is necessary to receive a pellet, it will stop pressing the bar until it wants another one.
VR stands for Variable-Ratio, but it can also stand for Very Rapid and Very Resistant to extinction.
Gambling (and gambling addiction) is so difficult to extinguish because most gambling games are based on variable-ratio schedules. While the probability of winning the jackpot on any individual pull of a slot machine is the same, we get caught in the idea that the next pull will be the “right one.”
One final idea associated with operant conditioning is the concept of shaping. Shaping is the process of rewarding increasingly specific behaviors. For example, if you wanted to train a bird to spin around in place and then peck a key on a keyboard, you might first give the bird a treat for turning slightly to the left, then only for turning a full 90 degrees, then 180, and so on. Then you might only reward this behavior if done near the keyboard until eventually the bird is only rewarded once the full set of behaviors is performed. While it may take some time, the use of shaping in operant conditioning can allow for the training of extremely complicated behaviors.
It would be incorrect to say that classical and operant conditioning are the only factors that affect behavior, or that we are all mindless and robotic, unable to resist the rewards and punishments that occur in our lives. Since Skinner’s initial work, researchers have found that many cognitive and biological factors can change the effects of associative learning or allow us to resist those effects altogether.
Many organisms undergo latent learning, which is learning that occurs without a reward but that is spontaneously demonstrated once a reward is introduced. The classic experiment associated with latent learning involves rats running a maze. Rats that were simply carried through the maze and then incentivized with a food reward for completing the maze on their own performed just as well as, and in some cases better than, rats that had been trained to run the maze using more standard operant conditioning techniques in which they were rewarded along the way.
Problem solving is another method of learning that steps outside the standard behaviorist approach. Think of the way young children put together a jigsaw puzzle: often, they will take pieces one-by-one and try to make them fit together until they find the correct match. Many animals will also use this kind of trial-and-error approach, testing behaviors until they yield a reward. As we get older, we gain the ability to analyze the situation and respond correctly the first time, as when we seek out the correct puzzle piece and orientation based on the picture we are forming. Humans and chimpanzees alike will often avoid trial-and-error learning and instead take a step back, observe the situation, and take decisive action to solve the challenges they face.
Not all behaviors can be taught using operant conditioning techniques. Many animals are predisposed to learn (or not learn) behaviors based on their own natural abilities and instincts. Animals are most able to learn behaviors that coincide with their natural behaviors: birds naturally peck when searching for food, so rewarding them with food in response to a pecking-based behavior works well. This predisposition is known as preparedness. Similarly, it can be very difficult to teach animals behaviors that work against their natural instincts. For example, researchers used behavioral techniques to try to train raccoons to place coins in a piggy bank. Their efforts were unsuccessful, as the raccoons would pick up the coins, rub them together, and dip them into the bank before pulling them back out. The researchers concluded that the task they were trying to train the raccoons to perform was conflicting with their natural food-gathering instinct, which was to rub seeds together and wash them in a stream to clean them before eating. This difficulty in overcoming instinctual behaviors is called instinctive drift. The researchers had far better luck training the raccoons to place a ball in a basketball net, as the ball was too large to trigger the food-washing instinct.
Observational learning is the process of learning a new behavior or gaining information by watching others. The most famous and perhaps most controversial study of observational learning is Albert Bandura’s Bobo doll experiment, in which children watched an adult in a room full of toys punch and kick an inflatable clown toy. When the children were later allowed to play in the room, many of them ignored the other toys and inflicted the same kind of violence on the Bobo doll that they had seen the adult inflict. It’s important to note that observational learning is not simply imitation because observational learning can also be used to teach individuals to avoid behaviors. In later iterations of the Bobo doll experiment, children who watched the adult get scolded after attacking the Bobo doll were less likely to be aggressive toward the doll themselves.
The connection between violent video games and aggressive behavior is still under active debate. While there are many interest groups on both sides of the controversy, the American Academy of Pediatrics (a major medical society) published one report in which it attributed a 13 to 22% increase in aggressive behavior to observational learning from video games.
As with associative learning, a few neurological factors affect observational learning. The most important of these are mirror neurons. These neurons are located in the frontal and parietal lobes of the cerebral cortex and fire both when an individual performs an action and when that individual observes someone else performing that action. Mirror neurons are largely involved in motor processes, but are also thought to be related to empathy and vicarious emotions; some mirror neurons fire both when we experience an emotion and when we observe another person experiencing the same emotion. Mirror neurons also play a role in imitative learning by a number of primates, as shown in Figure 3.4.
Research suggests that observational learning through modeling is an important factor in determining an individual’s behavior throughout his or her lifetime. People learn what behaviors are acceptable by watching others perform them. Much attention is focused on violent media or domestic abuse as models for antisocial behavior, but prosocial modeling can be just as powerful. Of course, observational learning is strongest when a model’s words are consistent with his or her actions. Many parents adopt a “Do as I say, not as I do” approach when teaching their children, but research suggests that children will disproportionately imitate what the model did, rather than what the model said.