Chapter 3
Extinction and Intermittent Reinforcement

"So long as life endures, a creature's behavior is a clay to be molded by circumstances, whimsical or planned. Acts added to it, and other acts which fall out, are the means by which it is shaped. Like the two hands of an artisan, busily dabbing and gouging, are the processes, reinforcement and extinction."

F. S. Keller and W. N. Schoenfeld, Principles of Psychology, 1950.

Operant conditioning results in changes in the behavioral repertoire. It provides a method by which the organism adapts to its own circumstances by selectively increasing the frequency of responses that are followed by reinforcing stimuli. It is not surprising to find that if a previously reinforced operant response is no longer followed by its usual reinforcing consequence, the frequency of the operant declines. This process is called extinction. We shall see in a later section of this chapter that operant extinction has the same overall effect as the related process of extinction following classical conditioning, first studied by I.P. Pavlov (Pavlov, 1927). This overall effect common to the two conditioning processes is the reduction in frequency of a specific response. In both cases, the process broadly reverses the effects of conditioning once the circumstances, or contingencies, that resulted in conditioning have been removed or substantially changed. However, there are also a number of associated phenomena specific to operant extinction. Not surprisingly, both operant extinction and classical extinction exert important influences on human behavior.

3.1 Changes in Response Rate During Operant Extinction

Extinction occurs when the reinforcement contingency that produced operant conditioning is removed, but this can be done in at least two ways. Following operant conditioning, the normal operation is to cease presenting the reinforcing stimulus at all, and most of the findings reported here are based on this procedure, but there is an alternative. If it is the particular relationship, or contingency, between operant response and reinforcer that defines operant conditioning and brings about changes in behavior, then any operation that removes the contingency will constitute extinction. Thus, a procedure in which the reinforcing stimulus continues to occur, but is no longer dependent on occurrences of the operant response, could result in extinction.

Note that the word "extinction" is used in two different ways. Extinction refers to the experimental procedure, or operation, of breaking the contingency between response and reinforcer. However, it is also a name for the observed resulting decline in the frequency of the response when that operation is carried out. We call this change in behavior the extinction process. A response is said to have been extinguished if the frequency has fallen close to its operant level as a result of there no longer being a contingency between that behavior and the reinforcer. There are, of course, other methods of reducing the frequency of a response (some of which will be discussed in later chapters), but these are not called extinction.

Figure 3.1 Cumulative record of responding in extinction of lever press response previously reinforced with food (from Skinner, 1938, data of F.S. Keller and A. Kent).

The decline in the rate of the once-reinforced response is the best documented effect of extinction. The changes in rate are clearly seen on a cumulative record, where they appear as wavelike fluctuations superimposed on a general negative acceleration of the rate of responding. In Figure 3.1 such an extinction curve appears for the lever-press response of a rat, previously accustomed to receiving a food pellet for each lever press. The response rate is highest at the start (just after reinforcement is withdrawn), and gradually diminishes over the period of an hour and a half. By the end of 90 minutes, the rat is responding at a rate only slightly higher than its operant-level rate. As Figure 3.1 shows, the extinction curve is very irregular and contains many periods of high activity interspersed with periods of low activity (the flat portions of the curve). The latter become more prominent towards the end of extinction. Some researchers have found that the extinction process is due principally to a gradual increase in the number of these inactive periods over time, and that when the organism responds it does so at its usual high rate.

Accompanying the overall decline in response rate in extinction, there is often seen a transitory increase in rate at the very beginning of extinction. This can be seen at the beginning of the cumulative record in Figure 3.1. The reason for this transient effect is suggested by the results of another procedure in which the contingency between the response (lever pressing) and the reinforcer (food pellets) is broken. Figure 3.2 shows cumulative records for two rats given food reinforcement for 100 lever presses, and then shifted to a procedure where food pellets were delivered automatically at approximately the same intervals at which they had been obtained during reinforcement.

This "free food" procedure was effective in reducing response rates, which rapidly declined to near zero, but the transient increase in rate was slight for one experimental participant and non-existent for the other. We may conclude that the transient rate increase is related to the shift to a procedure in which the reinforcer is no longer presented. Time allocation again proves a useful concept in describing this finding. During reinforcement of lever pressing, rats typically spend a great deal of time retrieving and consuming the reinforcers. This leaves comparatively little time available for making the operant response. If they are subsequently transferred to a procedure in which no reinforcing stimuli occur, the time spent on retrieval and consumption is "released" for other activities such as the operant response. The transient increase in rate described above may thus reflect the allocation of more time to lever pressing at the beginning of extinction than was available during reinforcement.

3.2 Topographical and Structural Changes of Responding in Extinction

The effects of extinction are by no means confined to frequency changes in the selected response. In particular, marked changes occur in the form of the

Figure 3.2 Cumulative records of lever pressing for two rats reinforced with food 100 times and then transferred to a response independent food presentation schedule. The vertical line indicates the transition from operant conditioning to "free food" at an equivalent rate. Food presentations are indicated by vertical marks on the horizontal line below the cumulative record (data collected by M. Keenan).

behavior during extinction. In a study by Antonitis (1951), in which the operant under study involved a rat poking its nose through a slot in the wall (see Figure 3.3), the effects of several sessions of extinction interspersed with reinforcement were measured. One wall of the chamber used by Antonitis contained a horizontal slot, 50 centimeters long. Whenever the rat poked its nose into the slot, a light beam was broken, causing a photograph of the rat to be made at the exact instant of the response. By reinforcing nose poking with food, the frequency of this behavior was first increased above operant level. Subsequently, nose poking was extinguished, reconditioned, reextinguished, and reconditioned again. Antonitis found that response position and angle tended to become stereotyped during reinforcement: the animal confined its responses to a rather restricted region of the slot. Extinction, however, produced variability in nose poking at least as great as that observed during operant level; the animal varied its responses over the entire length of the slot. Finally, reconditioning resulted in even more stereotyped behavior (more restricted responses) than the original conditioning had produced.

Figure 3.3 Apparatus used by Antonitis (1951) to reinforce nose poking.

The loop or chain of behavior established by reinforcement degenerates when reinforcement no longer follows the operant response. Frick and Miller (1951) gave rats 300 reinforcements over a period of five sessions in a modified Skinner box. This box had a lever on one wall and a food tray on the opposite wall. This increased the distance between lever and food tray, and made the two responses topographically and spatially distinct, and thus easy to record separately. These were denoted as RL: lever press and RT: tray visit. During extinction, Frick and Miller observed the degeneration of the previously-established RL-RT-RL-RT... loop. As extinction progressed, lever presses began to follow lever presses (RL-RL-, and so on), and tray visits began to follow tray visits (RT-RT-, and so on). There was very little tendency for the pattern to become random during extinction. Rather, the strengthened pattern of RL-RT-RL-RT-... gradually gave way to the operant-level pattern of repeated occurrences of the same response. Notice that this result was by no means logically inevitable, for the loop of behavior could simply have declined in frequency during extinction, yet remained intact.

To summarize, the extinction procedure instigates a behavioral process whose effects include decline in frequency of the operant response, an increase in its variability, and a breakdown in the sequential structure of the behavior. These important properties of extinction indicate that while operant reinforcement acts to selectively increase certain response sequences and thus restrict the behavioral repertoire, in extinction these effects are reversed, and the variability of behavior is to some extent reinstated.

3.3 Extinction-Induced Aggression

We have described some changes in the formerly reinforced response resulting from extinction. What happens to the other behaviors that do not have a history of reinforcement in the experiment? Not surprisingly, some responses that are reduced in frequency when an operant is reinforced (in the case of a rat reinforced with food these may include grooming and investigatory behavior) increase again during extinction. More surprisingly, some "new" behavior may be seen. That is, responses occur that were not seen during reinforcement and had effectively zero operant level prior to reinforcement. The most remarkable of these is aggression. Azrin, Hutchinson and Hake (1966) trained a hungry bird to peck a disk for food. When the experimental bird had acquired the disk-pecking behavior, a second "target" bird, immobilized in a specially designed box, was introduced into the experimental compartment (see Figure 3.4). The box holding the target bird was mounted on an assembly that caused a switch underneath to close whenever the box was jiggled vigorously. The assembly was carefully balanced so that normal spontaneous movements of the target bird were insufficient to close the switch, whereas any forceful attacks that the experimental bird might direct against the exposed body of the target bird would be recorded. Attacks occurred predictably: whenever its reinforcement contingencies were abruptly changed from reinforcement of pecking to extinction, the experimental bird invariably attacked the target bird. The attacks were vicious and aggressive, lasting up to 10 minutes.

Figure 3.4 Apparatus used for measuring aggression induced by extinction in a Skinner box (Azrin, Hutchinson, & Hake, 1966).

Other experiments have established a considerable degree of generality for this result by demonstrating that extinction-induced aggression can be obtained with various species and reinforcers. The important point to remember is that these attacks do not occur simply because no reinforcement is available, but because reinforcement was previously available and has now been discontinued.

3.4 Resistance to Extinction

Were the extinction process allowed to go to completion, the operant-level state might eventually be reached; that is, the frequency of the operant response might return to the before-conditioning level. The time taken for this to occur could then be used as an index of the individual's persistence in the face of extinction, and thus of the strength of responding before extinction was begun. In actual experiments, a return to operant level is rarely, if ever, reached. Hence, more convenient and practical measures of persistence are based on how fast the response rate declines during extinction. For instance, the number of responses emitted, or the amount of time, up until the point at which some low rate criterion (such as a period of 5 minutes with no responses) is met, are called resistance-to-extinction measures.
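The low-rate criterion described above can be sketched as a small procedure. The following Python function is an illustrative assumption of ours, not part of any of the original studies: it represents a session as a list of response timestamps and counts responses emitted before the first criterion pause (here, 5 minutes with no responses).

```python
from typing import List

def resistance_to_extinction(response_times: List[float],
                             criterion_gap: float = 300.0) -> int:
    """Count responses emitted before the first pause of at least
    `criterion_gap` seconds (default 5 minutes = 300 s).

    `response_times` are seconds from the start of extinction,
    in ascending order.
    """
    count = 0
    previous = 0.0  # the extinction session starts at t = 0
    for t in response_times:
        if t - previous >= criterion_gap:
            break  # criterion met: a 5-minute period of no responding
        count += 1
        previous = t
    return count
```

With a criterion of this kind, an animal that responds at 10, 50, and 100 seconds and then falls silent until the 500th second would be credited with a resistance to extinction of 3 responses; the time-based measure mentioned in the text could be computed analogously from the same timestamps.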

Resistance to extinction provides a quantitative behavioral index which is related in an interesting way to a number of experimental operations. In everyday life, we are often interested in how persistent a person will be in the face of no reward. A person whose resistance to extinction is low is said to "give up too easily" or to lack "perseverance" at a difficult task. On the other hand, too much resistance to extinction is sometimes counterproductive. The man or woman who spends too much time fruitlessly trying to patch up a broken love affair may miss a good chance for a new and better relationship.

One of the variables that has been shown to affect resistance to extinction is the number of previous reinforcements. It seems plausible that if a large number of responses have been reinforced, resistance to extinction will be greater than if only a few have been. This general hypothesis has been confirmed by several experiments (for example, Williams, 1938; Perin, 1942; Hearst, 1961) which indicate that the resistance to extinction of an operant is low when only a few reinforcements have been given in conditioning, and then gradually increases with increasing number of reinforcements until a maximum is reached. Even bigger effects on resistance to extinction result from exposure to intermittent reinforcement, which is discussed later in this chapter.

Another variable that would seem likely to affect the persistence of a response in extinction is the effortfulness of the response. Mowrer and Jones (1943) hypothesized that responses that required great effort to make would extinguish more quickly than would responses requiring less effort. This prediction has been confirmed in a study by Capehart, Viney, and Hulicka (1958), who trained rats to press a lever for food. They varied the force necessary to depress the lever during conditioning, so that on some sessions, a heavy lever was present, and on others, it was light or intermediate. The animals were then divided into three groups, one of which was extinguished using the heavy lever, another using the light lever, and the last on the intermediate lever. Using a criterion of no responses in 5 minutes as the index of resistance to extinction, they obtained the function shown in Figure 3.5.

Figure 3.5 Resistance to extinction of lever pressing as a function of the weight of the lever (or bar) (after Capehart, Viney, & Hulicka, 1958).

3.5 Spontaneous Recovery

Extinction may be extended until the rate of a previously reinforced operant response has reached a low level. If the experimental participant (for example, a rat in a Skinner box) is then removed from the situation and returned a bit later, another (smaller) extinction curve will be obtained (see Figure 3.6). Even though no reconditioning (a process which we discuss in the next section) has taken place between the two extinction sessions, a certain amount of spontaneous increase in responding has occurred.

Figure 3.6 Spontaneous recovery from extinction of a rat's lever-press response. The portions of the curve to the left and right of the vertical line in the middle of the graph were separated by 47 hours away from the experiment (Skinner, 1938).

The amount of this spontaneous recovery (as measured by the resistance to extinction in the second extinction session) depends on the time lapse between the end of the first extinction and the beginning of the second one. With food reinforced operants, spontaneous recovery has been observed after as little as 15 minutes, and often has been found to reach a maximum after about two hours. The existence of spontaneous recovery supports a conclusion from the extinction-induced aggression findings: once a response has been extinguished, the organism is not returned to the state it was in before conditioning was started. Further support for this hypothesis will be found in the next section.

3.6 Successive Conditioning and Extinction

The first extinction after original reinforcement is a unique phenomenon. Later extinctions (after reconditioning by the reintroduction of reinforcement) differ by being more rapid and containing fewer total responses. This effect was documented by Bullock and Smith (1953). They exposed rats to 10 daily sessions of a procedure that reinforced the first 40 lever responses, followed directly by 1 hour of extinction. When the extinction curves were examined, it was found that they became progressively smaller over Sessions 1 to 10. The effect is shown in Figure 3.7. Whereas in Session 1 the average resistance to extinction in 1 hour was 50 responses, by Session 10 this had dropped to only 10 responses.

Figure 3.7 Averaged cumulative response curves for the first (1), fifth (5), and tenth (10) sessions of extinction (after Bullock & Smith, 1953).

These results can be extrapolated beyond ten sessions. It would seem that only a few more sessions would be needed before the animals would reach what is called one-trial extinction. In one-trial extinction, only a single response is emitted following the withdrawal of reinforcement. The change in behavior has become abrupt, and it seems reasonable to conclude that the organism has come to discriminate the extinction procedure as such. The concept of discrimination is of great general importance and will be a central topic of Chapter 4. Few responses in extinction are the rule at the human level, as many of our own responses show a rapid decrement when reinforcement ceases: we do not continue to insert coins into a faulty soft-drink or candy dispenser when we fail to receive the payoff. When we open the mailbox and discover it is empty, we do not keep opening it. Like Bullock and Smith's rats, we have learned not to respond needlessly. However, unlike those rats, we are likely to formulate verbal rules as to what is happening in the situation. As we will see later (Chapter 6), this is likely to produce particularly rapid extinction.

3.7 The Operant Extinction Paradigm

The extinction procedure gives rise to the extinction process. As we have seen, the extinction process consists, in part, of a decline in response rate. However, a number of other behavioral processes (such as fatigue, habituation, satiation, and punishment) entail a similar decline, and we must be careful to distinguish them. If a decline in rate of response is all we observe, we are likely to find it difficult to say which response-reduction process is operating. As we will wish to use the extinction process to explain more complex processes, it is important that we understand both its specific procedure and the various characteristics of its resulting process. We will then be able to distinguish those instances of decline in response rate that are the result of extinction from those that reflect other processes.

Formally, the operant extinction paradigm is defined as follows. The contingency between a previously-reinforced operant response and its reinforcer is removed by either (a) ceasing to present the reinforcing stimulus, or (b) presenting that stimulus independent of the occurrence of the response. This has the following effects:

  1. A gradual, somewhat irregular decline in response rate marked by progressive increases in frequency of relatively long periods of non-responding. This may be preceded by a transient increase in response rate.
  2. An increase in the variability of the form and magnitude of the response.
  3. A disruption of the loop or sequence of behavior that characterized the reinforced operant.

The decline in rate continues until the operant level is approached as a limiting value.

3.8 Extinction Outside the Laboratory

We are all familiar with the power of extinction; many instances can be identified in everyday life in which the frequency or probability of certain behavior declines because it is no longer reinforced. In ordinary language, this decline in response probability may be attributed to other causes, but to the experimental psychologist the role of extinction is clear:

An aspiring writer who has sent manuscript after manuscript to the publishers only to have them all rejected may report that "he can't write another word". He may be partially paralyzed with what is called "writer's cramp". He may still insist that he "wants to write", and we may agree with him in these terms: his extremely low probability of response is mainly due to extinction. Other variables are still operative which, if extinction had not taken place, would yield a high probability of the behavior in question (Skinner, 1953, pp. 71-72).

The task of the psychologist is easy when he or she merely provides post hoc analyses of everyday terms and situations, but we have already seen how, in this case, laboratory studies have already provided us with an account of the extinction process that goes well beyond that which can be extracted from casual observation.

3.9 Extinction of Classically Conditioned Responses

As noted in Chapter 1, not only did Pavlov (1927) carry out the initial studies of classical conditioning, but he sustained a systematic research program over many years. The investigation of extinction following conditioning was a major part of this. In parallel with operant extinction, classical extinction occurs if the CS-US pairing is broken. This is usually achieved by repeatedly presenting the CS without the US. The result is a fairly steady diminution of the response to the CS. An example for the rabbit's nictitating membrane (blinking) response is shown in Figure 3.8. Although the data in this figure are the average performances of groups of rabbits, very similar results would be obtained from individual rabbits. In the control group, the CS and US were presented but not paired together in the "acquisition phase" and thus conditioning did not occur in that group. The experimental group showed conditioning occurring rapidly over 5 days, then reaching a steady asymptotic level. In extinction, the number of conditioned responses fell fairly steadily from day to day.

Figure 3.8 Average data for classical conditioning followed by extinction of rabbits' nictitating membrane response (Gormezano, Schneiderman, Deaux & Fuentes, 1962). Details are given in the text.

Various inhibitory phenomena were discovered in the experiments of Pavlov and his associates. This is interesting because, although Pavlov, being trained as a physiologist, presumed that inhibitory (response-suppressing) processes as well as excitatory (response-eliciting) processes occur in conditioning, it is not inevitable that inhibitory concepts will be needed to explain behavior. After all, a response either occurs or fails to occur; we need not necessarily infer from its absence that it is inhibited. It might simply be that the stimulus no longer elicits a response. The need for inhibition as an explanatory concept will be made clearer by considering the related phenomenon of spontaneous recovery within classical conditioning, also demonstrated by Pavlov.

If a period of time elapses after the extinction of a classically conditioned response and then the experimental participant is returned to the experimental situation and the CS is again presented, a certain amount of spontaneous recovery occurs. This means that a higher level of responding is observed than at the end of the previous session. Pavlov argued that this demonstrates that inhibition had developed during the first extinction session and had dissipated, to some extent, before the next test, thus allowing the response to recover. Whatever the details of the theoretical explanation, it is interesting that the two pioneers in their respective fields, Pavlov and Skinner, both demonstrated spontaneous recovery in their conditioning paradigms. We should clearly expect to see evidence of this in real world applications. That is, when human behavior has been conditioned and then extinguished, we can anticipate some subsequent brief recovery of the behavior.

3.10 Intermittent Reinforcement

So far, we have restricted our discussions of operant behavior to examples of simple operant conditioning, where operant responses are continuously reinforced and every occurrence of the response is followed by delivery of the reinforcing stimulus, and examples of extinction, where that contingency is removed. If we change the conditions so that the reinforcing stimulus occurs only after some of the designated responses, we have defined the general procedure of intermittent reinforcement. Intermittent reinforcement procedures can be arranged in a number of ways, with varying rules, or schedules, determining which individual responses are followed by the reinforcing stimulus.

It has been found that intermittent reinforcement procedures have great utility for generating stable, long-term baselines of learned behavior, against which effects of drugs, physiological manipulations, emotional stimuli, and motivational factors can be studied. These applications of the principles of behavioral analysis are very important for the development of psychology and the neurosciences in general. These procedures have as yet been used less often outside the laboratory in applications to significant human problems, where the procedures of simple operant conditioning and extinction have traditionally been preferred, but they are currently gaining greater use there as well. Examples of these applications will be discussed in later chapters.

Early experimental studies of learned behavior were conducted by investigators who were mainly concerned with the acquisition of behavior, and these investigators took little interest in intermittent reinforcement. Although it is true that the acquisition of "new" behavior usually proceeds most smoothly when each and every response is reinforced, it turns out that intermittent reinforcement procedures produce reliable and distinctive patterns of behavior, which are extremely resistant to extinction. In intermittent reinforcement procedures, the response is still a necessary condition, but no longer a sufficient condition, for the delivery of the reinforcer. We refer to those experimental procedures that specify which instances of an operant response shall be reinforced as schedules of reinforcement. Thus, continuous reinforcement is a schedule in which every response is reinforced whenever it occurs (for reasons that will become apparent shortly, this very simple schedule is also called FR 1). A host of schedules have been devised and studied in which reinforcement is noncontinuous, or intermittent. Each of these schedules specifies the particular condition or set of conditions that must be met before the next response is reinforced.

If we consider the relatively simple situation where only one response is to be examined and the stimulus conditions are constant, there are at least two conditions for reinforcement of an individual response that we may specify: the number of responses that must occur, and the time that must elapse. Schedules involving a required number of responses are called ratio schedules (referring to the ratio of responses to reinforcers); and schedules specifying a period of time are called interval schedules (referring to the imposed intervals of time between reinforcement). Schedules can also be either fixed (where every reinforcer is delivered after the same ratio or interval requirement has been fulfilled), or variable (where the ratios and intervals can vary within the schedule).

Here are verbal descriptions corresponding to an example of each of these four simple types of reinforcement schedules:

Fixed ratio 10 (FR 10). The tenth operant response that occurs will be reinforced, then the tenth of the subsequent operant responses will be reinforced, and so on. The response requirement is fixed at 10 responses.

Fixed interval 20 seconds (FI 20 seconds). The first operant response that occurs once 20 seconds have elapsed will be reinforced, then the first response that occurs once a further 20 seconds have elapsed will be reinforced, and so on.

Variable ratio 15 (VR 15). The operant response requirement varies with an average value of 15. Thus it might be that the twenty-fifth response that occurs is reinforced, and then the tenth response, and then the thirtieth response, and so on. Over a long run, the average of these requirements will be 15.

Variable interval 40 seconds (VI 40 seconds). The inter-reinforcement interval varies, with an average value of 40 seconds. It might be that the first operant response after 20 seconds is reinforced, then the first response after a subsequent interval of 55 seconds, then the first response after a subsequent interval of 35 seconds, and so on. Over a long run, the average of these times will be 40 seconds.
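The four schedule rules above are, in effect, small decision procedures: each response either meets the current requirement or it does not. The following Python sketch is our own illustration of that logic; the class names and the particular distributions used for VR requirements and VI intervals are assumptions for demonstration, not a description of actual laboratory apparatus.

```python
import random

class RatioSchedule:
    """Reinforce after a required count of responses:
    FR if the requirement is fixed, VR if it varies around a mean."""
    def __init__(self, requirement: int, variable: bool = False):
        self.mean = requirement
        self.variable = variable
        self.remaining = self._next_requirement()

    def _next_requirement(self) -> int:
        # VR: sample a new requirement averaging `mean`; FR: always `mean`
        if self.variable:
            return random.randint(1, 2 * self.mean - 1)
        return self.mean

    def record_response(self) -> bool:
        """Call once per operant response; True means the reinforcer is delivered."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self._next_requirement()
            return True
        return False

class IntervalSchedule:
    """Reinforce the first response after an interval has elapsed:
    FI if the interval is fixed, VI if it varies around a mean."""
    def __init__(self, interval: float, variable: bool = False):
        self.mean = interval
        self.variable = variable
        self.available_at = self._next_interval(0.0)

    def _next_interval(self, now: float) -> float:
        wait = random.uniform(0, 2 * self.mean) if self.variable else self.mean
        return now + wait

    def record_response(self, now: float) -> bool:
        """Call with the time (in seconds) of each response;
        True means the reinforcer is delivered."""
        if now >= self.available_at:
            self.available_at = self._next_interval(now)
            return True
        return False
```

For example, `RatioSchedule(10)` models FR 10 and delivers exactly one reinforcer per ten responses, while `IntervalSchedule(20)` models FI 20 seconds: responses before the 20-second point go unreinforced, and the first response afterwards is reinforced and restarts the timer. Note the structural point in the text: on interval schedules, responding faster than necessary earns nothing extra.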

Although the procedures require a complicated verbal description, each of the four schedules we have defined generates a characteristic performance, or behavioral steady state. These states can be easily identified by looking at cumulative records (see Figure 3.9). Recall from Chapter 2 that the cumulative recorder steps vertically, a small and fixed amount, each time a response occurs, while continuously moving horizontally at a fixed speed (this record can either be made during the experiment, or it can be simulated by a computer from recorded details of the session). So the slope of the record at any point reflects the rate of response; cessation of responding produces a flat record, while a very high response rate produces a steep one. Vertical marks on the record indicate the delivery of reinforcers.
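The stepping logic of the cumulative recorder can be sketched as a small function that turns a list of response timestamps into the points of a cumulative record. The function name and data representation are our own assumptions; the point is simply that the count steps up at each response and stays flat in between, so the local slope of the trace is the response rate.

```python
from typing import List, Tuple

def cumulative_record(response_times: List[float],
                      session_end: float) -> List[Tuple[float, int]]:
    """Return (time, cumulative responses) points tracing a cumulative
    record: a flat segment between responses, a unit step at each one."""
    points = [(0.0, 0)]
    for count, t in enumerate(response_times, start=1):
        points.append((t, count - 1))  # flat segment up to the response
        points.append((t, count))      # vertical step at the response
    points.append((session_end, len(response_times)))  # flat to session end
    return points
```

Plotting these points with straight line segments reproduces the familiar staircase trace: a burst of closely spaced responses yields a steep region, and a pause yields a flat one.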

Although the records in Figure 3.9 are hypothetical, they are in no sense idealized (Leslie, 1996, for example, presents many actual records that are almost indistinguishable from these). Reinforcement schedules exert such powerful control over behavior that even a previously "untrained" rat, placed in a Skinner box by an "untrained" student, could generate one of these records after a few hours. Significantly, these performance patterns have been produced in many species, and with a variety of operant responses.

Figure 3.9 Typical cumulative records of performances maintained by four schedules of intermittent reinforcement.

Here are some typical characteristics of performance generated by each schedule:

Fixed ratio (FR). A high rate of response is sustained until reinforcement occurs. This is followed by a relatively lengthy post-reinforcement pause before the high rate of responding is resumed. The post-reinforcement pause increases with the ratio of responses required for reinforcement, and can occupy the greater part of the experimental participant's time.

Variable ratio (VR). In common with the fixed ratio schedule, this procedure generates a high rate of response, but regular pausing for any length of time is very uncommon.

Fixed interval (FI). Like the fixed ratio schedule, this procedure also produces a post-reinforcement pause, but responding at other times occurs at a lower rate than on ratio schedules, except towards the end of the interval, where it accelerates to meet the reinforcer. The characteristic positively-accelerated curve seen on the cumulative record is called a "scallop". In general, longer fixed intervals produce lower rates of responding.

Variable interval (VI). As with the variable ratio schedule, consistent pauses of any length are rare. However, response rates are moderate to low, depending on the mean interval, with longer mean intervals producing lower response rates.

Intermittent schedule performances cannot, in general, be explained as the experimental participant adopting an optimal strategy: experimental participants do not invariably learn to do what would benefit them most. For instance, treating operant responding as analogous to working (perhaps as a laborer), we would expect response rates on fixed interval schedules to fall until only one response per reinforcer occurred, emitted at precisely the end of the interval. Indeed, several early theorists suggested that the number of responses made on what we now call a schedule of reinforcement would be the least required to obtain all the reinforcers: this notion has not stood the test of time, as many schedules generate numbers of responses that greatly exceed the minimum required. Furthermore, on fixed ratio schedules we might expect a maximum rate of operant responding to be continuously sustained for as long as possible, because that would maximize the rate of reinforcement, but long pauses occur. Leslie (1996, Chapter 4) provides a review of theoretical analyses of the behavioral processes that come together to generate the characteristic schedule performances. For our present purposes, the most important fact about schedules is that they do generate a great deal of responding, and it occurs in reliable and persistent patterns.

3.11 Differential Reinforcement Schedules

We have described four simple, or basic, schedules, FI, VI, FR, and VR. Another important class of schedules of reinforcement are those which specify a required rate of the operant response or of other behavior. An example of this is the differential reinforcement of low rates (DRL) schedule. In animal experimental studies, this schedule is usually programmed by reinforcement of only those responses which follow a previous response with a delay greater than a specified minimum value. On DRL 10 seconds, for example, a response is reinforced if, and only if, it occurs more than ten seconds after the previous one.
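
The DRL contingency amounts to a single timing test applied to each response. A minimal sketch (the class name is hypothetical, and timing the first response from the start of the session is an assumption about how the apparatus is programmed):

```python
class DRL:
    """DRL t-seconds: reinforce a response only if more than `criterion`
    seconds have elapsed since the previous response."""
    def __init__(self, criterion):
        self.criterion = criterion
        self.last = 0.0   # assumed: first response timed from session start
    def respond(self, t):
        ok = (t - self.last) > self.criterion
        self.last = t     # every response, reinforced or not, resets the clock
        return ok
```

The key feature is in the last line of `respond`: a premature response is not merely unreinforced, it restarts the timing requirement, which is what makes the schedule suppress high response rates.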

The DRL schedule can be conceptualized as a way of reducing the frequency of an operant response, unlike the other schedules that we have discussed so far. Consequently, it has been employed to deal with human behavioral problems where the behavior of concern occurs excessively frequently. Psychologists working with human behavioral problems are often seeking ways to reduce behavior, and a variety of other reinforcement schedules have been devised that indirectly reinforce reductions in a target behavior. These include differential reinforcement of other behavior (DRO), differential reinforcement of incompatible behavior (DRI), and differential reinforcement of alternative behavior (DRA). As implied by their titles, these schedules reinforce various categories of behavior other than the target behavior, but their effectiveness is measured primarily by the reduction in the target behavior that occurs. The "schedule of choice" will depend on details of the situation, but many effective interventions using these schedules have been reported. Some of these are described in Chapter 9.

Human behavioral interventions that reduce target behaviors are clearly of as much general importance as those that increase target behaviors. Another general way of eliminating unwanted behavior is through the use of aversive contingencies. However, as we shall see in Chapter 5, there are widespread contemporary objections to the use of aversive contingencies in modifying human behavior, and differential reinforcement schedules described here are consequently of great practical importance.

3.12 Extinction Following Intermittent Reinforcement

Earlier we noted that the amount of responding in extinction was affected by the number of prior reinforcers, and the effortfulness of the response during the previous period of reinforcement. An even more powerful influence on resistance to extinction is the schedule on which reinforcers were previously delivered. Indeed, the fact that any type of intermittent reinforcement increases resistance to extinction has generated a large research area of its own. This phenomenon, termed the partial reinforcement extinction effect, has been used as a baseline to study the effects of drugs and physiological manipulations believed to affect emotional processes taking place during extinction (see Gray, 1975, for a review).

From a purely behavioral standpoint, the introduction of extinction, once a schedule-controlled performance has been established, provides further evidence of the powerful control of behavior by schedules, because the pattern of behavior in extinction depends on the nature of the preceding schedule. For example, extinction following training on an FR schedule consists mostly of periods of responding at the high "running rate" characteristic of the schedule performance, interspersed with increasingly long pauses, while extinction following training on a VI schedule consists of long periods of responding at a fairly low rate — initially similar to that maintained by the schedule — which gradually declines (Ferster and Skinner, 1957). The latter performance might be described as an "extinction curve" (similar to the one seen in Figure 3.1), but a large number of responses may be emitted. In one study (Skinner, 1950), a pigeon emitted over 3,000 responses in more than 8 hours.

In general, it is the similarity between the extinction situation and the previous conditions under which reinforcement was available that maintains behavior, and the transition from VR to extinction produces little apparent change, because the previous occasions on which responses were reinforced were unpredictable. In experiments, this can lead to high rates of behavior being maintained in extinction, at least initially. Ferster and Skinner (1957) reported that after a number of sessions on VR (with the later sessions being on VR 173, where an average of 173 responses was required for each reinforcement), a pigeon made around 5,000 responses in extinction without a break at the high rate that had been characteristic of performance on the schedule. It is not surprising, in the light of findings such as this, that many games of chance and gaming machines provide payoffs on VR schedules. It is also clear that all these schedules give the experimental participant experience of "intermittent extinction," and thus can lead to remarkable perseveration of behavior.

It was believed for a long time that the effects of intermittent reinforcement constituted a major difference between operant and classical conditioning. This view developed because of the powerful "response-strengthening" effects of intermittent reinforcement within operant conditioning, and the apparent "response-weakening" effect seen in some classical conditioning studies. That is, use of intermittent presentation of the US following the CS can lead to a weaker conditioned response. However, it is now clear that a partial reinforcement extinction effect can be obtained with a number of classical conditioning procedures; even though the conditioned response may be weaker in the intermittent reinforcement condition than with continuous, or 100%, reinforcement, when extinction is introduced there is more persistent responding from experimental participants that have previously received intermittent reinforcement (Pearce, Redhead, and Aydin, 1997).

One of the most persistent problems faced by those engaged in modification of human behavioral problems is the observation that treatment gains are not maintained once the behavioral intervention has been withdrawn. The partial reinforcement extinction effect has been recognized by a number of researchers as a potential solution to this problem (Kazdin, 1994; Nation and Woods, 1980; Tierney and Smith, 1988). The way in which this effect may be used is to initially train the desired behavior on a continuous reinforcement schedule until it occurs at a high rate. At this stage, an intermittent reinforcement schedule is introduced. For example, an FR 2 schedule might be used initially and the schedule value incremented gradually until the client is responding on a very "thin" reinforcement schedule (that is, one where a large number of responses are required for each reinforcer). If the program is withdrawn at this point the behavior will be highly resistant to extinction and stands a greater chance of being brought under the control of naturally occurring reinforcers in the environment. Kazdin and Polster (1973) demonstrated this effect with a group of adults with learning difficulties who were reinforced for increased levels of social interaction.

In the case of highly persistent problem behaviors, the opposite strategy may be adopted. Behaviors such as attention-seeking seem to be maintained on "natural" intermittent reinforcement schedules. We try to ignore them but give in occasionally, effectively reinforcing them on an intermittent basis. This makes them extremely resistant to extinction, rendering the use of extinction as a therapeutic strategy difficult. Paradoxically, deliberately reinforcing the behavior on a continuous basis can have the effect of reducing the time taken for extinction of the response to occur once the extinction phase is introduced.
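
The thinning procedure just described, starting at continuous reinforcement (FR 1) and gradually raising the ratio requirement, can be sketched as a simple plan generator. The step rule and the values here are illustrative assumptions only; in practice each increment would wait until responding is stable at the current ratio.

```python
def thinning_plan(start=1, factor=1.5, target=30):
    """Hypothetical thinning sequence: begin with continuous reinforcement
    (FR 1) and raise the ratio requirement gradually toward a 'thin'
    target schedule, increasing by about 50% (at least 1) per step."""
    ratios, r = [], start
    while r < target:
        ratios.append(r)
        r = max(r + 1, int(r * factor))
    ratios.append(target)
    return ratios
```

With the defaults this yields a gently accelerating series of ratio requirements ending at FR 30; the gradual steps matter, because too abrupt a jump to a thin schedule risks extinguishing the response instead of making it persistent.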

3.13 Human Behavior Under Schedules of Reinforcement

It was mentioned earlier that while many species generate similar and characteristic patterns of behavior on simple reinforcement schedules, adult humans do not generally exhibit the same patterns of behavior. This is obviously an important discrepancy, because our general interest in the behavior of non-human animal species is sustained by the expectation, often borne out by empirical evidence, that their behavior resembles that of humans in similar situations.

Given very simple tasks, such as pressing a button or a key for reinforcement with small amounts of money or tokens, adult humans often produce behavior that is consistent with the "common-sense" view that they are acting in accordance with what they believe to be the rule determining when reinforcers are delivered. On an FI schedule, for example, the experimental participant may have come to the view that "If I wait 20 seconds, then the next button press will produce a token."

We can conceptualize this as the experimental situation leading the participant to engage in a certain sort of verbal behavior which generates a particular pattern of "button-pressing behavior". This may result in him or her counting to pass the appropriate length of time and then making one reinforced response. Unfortunately, the experimental participant's "common sense" may lead him or her to formulate a variety of different verbal rules describing the reinforcement contingencies that may be operating, and thus it is often impossible to predict how a number of experimental participants will behave under the same set of contingencies. One principle that this illustrates is that we need to know about relevant aspects of the history of the organism to predict how it will behave in a given situation. In everyday parlance, different people will bring different expectations (because of significant differences in their past lives) to the situation. Once again, this underlines the value of doing experiments on non-human animals where there is more likelihood that we can specify relevant previous experiences in such a way that they will not confound the results of an experiment. However, this does not resolve the serious problem we have encountered; a science of behavior requires us to establish how we can obtain the same behavior from each experimental participant under the same circumstances. We also wish to know why there are apparently differences between human behavior and that of many other species.

Fortunately, a number of different approaches have begun to resolve these problems. First of all, young children will produce the patterns of behavior typical of other species, provided they are trained before the age at which language acquisition is becoming rapid (Bentall, Lowe, and Beasty, 1985). Secondly, shaping the verbal behavior by successive approximation (by awarding tokens to "guesses" that approximate increasingly closely to the contingency in operation) changes both the verbal rule being formulated and the operant button-pressing behavior of adults (Catania, Matthews and Shimoff, 1990). Thirdly, reorganizing the situation slightly so that an alternative attractive behavior is available as well as the button-pressing activity can lead to adult humans producing the patterns of behavior characteristic of FI reinforcement schedules in other species (Barnes and Keenan, 1993).

These findings underline the importance of language, or verbal behavior, in the control of human operant non-verbal behavior, and as we shall see in Chapter 6, our developing understanding of the links between non-verbal and verbal behavior is beginning to give us an account of how humans differ from, and have greater skills than, other species. In the present context, they suggest that while, other things being equal, the behavior of adult humans is likely to correspond to verbal rules that they formulate, or are instructed to follow, in a situation involving a schedule of reinforcement, this rule-governed behavior is itself influenced by operant reinforcement contingencies.

3.14 Summary

Conditioning changes behavior because certain relationships in the environment have been established. Once these relationships no longer exist, extinction occurs. This term refers both to the procedure of removing the conditioning relationship and to the outcome, which is that the frequency of the previously conditioned response declines.

Following operant conditioning, extinction is generally arranged by ceasing to present the reinforcing stimulus. Often there is a brief increase in response rate, followed by an erratic decline to a very low, or zero, level of responding. During extinction, response variability (which declines in operant conditioning, see Chapter 2) increases. If the operant reinforcement contingency is re-introduced, response frequency increases and response variability again declines.

The transition from operant conditioning to extinction often induces aggression in laboratory animals, if there is another animal available to be attacked. Such aggression does not occur solely because the operant is unreinforced; it only occurs when reinforcement has previously been available.

The amount of operant behavior in extinction (the resistance to extinction) is affected by a number of features of the conditioning situation. For example, both the number of reinforcements that have been received and the effort required for a response affect resistance to extinction. Following extinction, a period of time away from the conditioning situation may result in spontaneous recovery. That is, on return to the conditioning situation, the previously reinforced experimental participant emits a number of responses, even though extinction remains in effect.

Repeated cycles of operant conditioning and extinction have predictable effects. The transitions become very quick, in that as soon as operant conditioning is reinstated responding returns to its characteristic rate, and as soon as extinction is reinstated responding stops. This pattern takes a number of cycles to develop.

Classically conditioned responses are generally extinguished by repeatedly presenting the CS without presenting the US. A steady reduction in conditioned response magnitude is usually seen, with no measurable response occurring if enough CS-only trials are presented. As with operant conditioning, spontaneous recovery will occur (if the extinction session is terminated and then resumed sometime later).

In intermittent operant reinforcement, some but not all responses are followed by the reinforcing stimulus. Schedules of intermittent reinforcement, defined by a required number of responses or by the requirement for a period of time to elapse before a response is reinforced, generate large amounts of operant behavior organized into distinctive patterns. These can be used as a behavioral baseline against which the effect of motivational and other variables can be assessed. Differential operant reinforcement schedules are a further class. These are designed either to sustain a response at a low rate or to eliminate it through the intermittent reinforcement of other behavior. These are very important strategies for modifying problematic human behavior.

Perhaps the most important general consequence of intermittent reinforcement is that behavior becomes highly persistent. Following training on a schedule where many responses are required for reinforcement, for example, the experimental participant may produce hundreds of responses in extinction before response rate falls to a low level. This partial (or intermittent) reinforcement extinction effect also occurs in classical conditioning, at least in so far as conditioned responses are more persistent following intermittent reinforcement.

Although most intermittent reinforcement schedules have powerful effects on behavior, these are not always observed when adults are trained with these procedures in experiments. This is because of the role of verbal behavior and verbal rule-following in humans, and experiments with modified reinforcement schedules are providing a means of studying this complex area of human behavior.