CHAPTER V


OPERANT BEHAVIOR

THE CONSEQUENCES OF BEHAVIOR

Reflexes, conditioned or otherwise, are mainly concerned with the internal physiology of the organism. We are most often interested, however, in behavior which has some effect upon the surrounding world. Such behavior raises most of the practical problems in human affairs and is also of particular theoretical interest because of its special characteristics. The consequences of behavior may “feed back” into the organism. When they do so, they may change the probability that the behavior which produced them will occur again. The English language contains many words, such as “reward” and “punishment,” which refer to this effect, but we can get a clear picture of it only through experimental analysis.

LEARNING CURVES

One of the first serious attempts to study the changes brought about by the consequences of behavior was made by E. L. Thorndike in 1898. His experiments arose from a controversy which was then of considerable interest. Darwin, in insisting upon the continuity of species, had questioned the belief that man was unique among the animals in his ability to think. Anecdotes in which lower animals seemed to show the “power of reasoning” were published in great numbers. But when terms which had formerly been applied only to human behavior were thus extended, certain questions arose concerning their meaning. Did the observed facts point to mental processes, or could these apparent evidences of thinking be explained in other ways? Eventually it became clear that the assumption of inner thought-processes was not required. Many years were to pass before the same question was seriously raised concerning human behavior, but Thorndike’s experiments and his alternative explanation of reasoning in animals were important steps in that direction.

If a cat is placed in a box from which it can escape only by unlatching a door, it will exhibit many different kinds of behavior, some of which may be effective in opening the door. Thorndike found that when a cat was put into such a box again and again, the behavior which led to escape tended to occur sooner and sooner until eventually escape was as simple and quick as possible. The cat had solved its problem as well as if it were a “reasoning” human being, though perhaps not so speedily. Yet Thorndike observed no “thought-process” and argued that none was needed by way of explanation. He could describe his results simply by saying that a part of the cat’s behavior was “stamped in” because it was followed by the opening of the door.

The fact that behavior is stamped in when followed by certain consequences, Thorndike called “The Law of Effect.” What he had observed was that certain behavior occurred more and more readily in comparison with other behavior characteristic of the same situation. By noting the successive delays in getting out of the box and plotting them on a graph, he constructed a “learning curve.” This early attempt to show a quantitative process in behavior, similar to the processes of physics and biology, was heralded as an important advance. It revealed a process which took place over a considerable period of time and which was not obvious to casual inspection. Thorndike, in short, had made a discovery. Many similar curves have since been recorded and have become the substance of chapters on learning in psychology texts.

Learning curves do not, however, describe the basic process of stamping in. Thorndike’s measure—the time taken to escape—involved the elimination of other behavior, and his curve depended upon the number of different things a cat might do in a particular box. It also depended upon the behavior which the experimenter or the apparatus happened to select as “successful” and upon whether this was common or rare in comparison with other behavior evoked in the box. A learning curve obtained in this way might be said to reflect the properties of the latch box rather than of the behavior of the cat. The same is true of many other devices developed for the study of learning. The various mazes through which white rats and other animals learn to run, the “choice boxes” in which animals learn to discriminate between properties or patterns of stimuli, the apparatuses which present sequences of material to be learned in the study of human memory—each of these yields its own type of learning curve.

By averaging many individual cases, we may make these curves as smooth as we like. Moreover, curves obtained under many different circumstances may agree in showing certain general properties. For example, when measured in this way, learning is generally “negatively accelerated”—improvement in performance occurs more and more slowly as the condition is approached in which further improvement is impossible. But it does not follow that negative acceleration is characteristic of the basic process. Suppose, by analogy, we fill a glass jar with gravel which has been so well mixed that pieces of any given size are evenly distributed. We then agitate the jar gently and watch the pieces rearrange themselves. The larger move toward the top, the smaller toward the bottom. This process, too, is negatively accelerated. At first the mixture separates rapidly, but as separation proceeds, the condition in which there will be no further change is approached more and more slowly. Such a curve may be quite smooth and reproducible, but this fact alone is not of any great significance. The curve is the result of certain fundamental processes involving the contact of spheres of different sizes, the resolution of the forces resulting from agitation, and so on, but it is by no means the most direct record of these processes.
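
The shape of such a curve is easy to picture. The following is a minimal sketch, not Thorndike's data: it assumes a simple exponential approach to an asymptote, with the starting time, the floor, and the rate constant chosen purely for illustration.

```python
# A toy negatively accelerated "learning curve": time to escape approaches
# a floor (the best possible performance), and each successive improvement
# is smaller than the last. All numbers are illustrative assumptions.

def escape_time(trial, floor=5.0, start=160.0, rate=0.35):
    """Hypothetical time (seconds) to escape on a given trial."""
    return floor + (start - floor) * (1 - rate) ** trial

for trial in range(10):
    print(f"trial {trial:2d}: {escape_time(trial):6.1f} s")
```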

Learning curves show how the various kinds of behavior evoked in complex situations are sorted out, emphasized, and reordered. The basic process of the stamping in of a single act brings this change about, but it is not reported directly by the change itself.

OPERANT CONDITIONING

To get at the core of Thorndike’s Law of Effect, we need to clarify the notion of “probability of response.” This is an extremely important concept; unfortunately, it is also a difficult one. In discussing human behavior, we often refer to “tendencies” or “predispositions” to behave in particular ways. Almost every theory of behavior uses some such term as “excitatory potential,” “habit strength,” or “determining tendency.” But how do we observe a tendency? And how can we measure one?

If a given sample of behavior existed in only two states, in one of which it always occurred and in the other never, we should be almost helpless in following a program of functional analysis. An all-or-none subject matter lends itself only to primitive forms of description. It is a great advantage to suppose instead that the probability that a response will occur ranges continuously between these all-or-none extremes. We can then deal with variables which, unlike the eliciting stimulus, do not “cause a given bit of behavior to occur” but simply make the occurrence more probable. We may then proceed to deal, for example, with the combined effect of more than one such variable.

The everyday expressions which carry the notion of probability, tendency, or predisposition describe the frequencies with which bits of behavior occur. We never observe a probability as such. We say that someone is “enthusiastic” about bridge when we observe that he plays bridge often and talks about it often. To be “greatly interested” in music is to play, listen to, and talk about music a good deal. The “inveterate” gambler is one who gambles frequently. The camera “fan” is to be found taking pictures, developing them, and looking at pictures made by himself and others. The “highly sexed” person frequently engages in sexual behavior. The “dipsomaniac” drinks frequently.

In characterizing a man’s behavior in terms of frequency, we assume certain standard conditions: he must be able to execute and repeat a given act, and other behavior must not interfere appreciably. We cannot be sure of the extent of a man’s interest in music, for example, if he is necessarily busy with other things. When we come to refine the notion of probability of response for scientific use, we find that here, too, our data are frequencies and that the conditions under which they are observed must be specified. The main technical problem in designing a controlled experiment is to provide for the observation and interpretation of frequencies. We eliminate, or at least hold constant, any condition which encourages behavior which competes with the behavior we are to study. An organism is placed in a quiet box where its behavior may be observed through a one-way screen or recorded mechanically. This is by no means an environmental vacuum, for the organism will react to the features of the box in many ways; but its behavior will eventually reach a fairly stable level, against which the frequency of a selected response may be investigated.

To study the process which Thorndike called stamping in, we must have a “consequence.” Giving food to a hungry organism will do. We can feed our subject conveniently with a small food tray which is operated electrically. When the tray is first opened, the organism will probably react to it in ways which interfere with the process we plan to observe. Eventually, after being fed from the tray repeatedly, it eats readily, and we are then ready to make this consequence contingent upon behavior and to observe the result.

We select a relatively simple bit of behavior which may be freely and rapidly repeated, and which is easily observed and recorded. If our experimental subject is a pigeon, for example, the behavior of raising the head above a given height is convenient. This may be observed by sighting across the pigeon’s head at a scale pinned on the far wall of the box. We first study the height at which the head is normally held and select some line on the scale which is reached only infrequently. Keeping our eye on the scale we then begin to open the food tray very quickly whenever the head rises above the line. If the experiment is conducted according to specifications, the result is invariable: we observe an immediate change in the frequency with which the head crosses the line. We also observe, and this is of some importance theoretically, that higher lines are now being crossed. We may advance almost immediately to a higher line in determining when food is to be presented. In a minute or two, the bird’s posture has changed so that the top of the head seldom falls below the line which we first chose.
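
The logic of this contingency can be sketched in a few lines of code. The sketch below is a toy numerical model, not a description of real pigeon behavior: it assumes that head height varies randomly around a mean, that each reinforcement nudges the mean upward, and that the experimenter advances the criterion line as crossings become common. The step sizes and time scale are illustrative assumptions.

```python
import random

# Toy model of the head-raising contingency: reinforce whenever the head
# rises above a criterion line, and advance the line as responding improves.

random.seed(1)
mean_height = 10.0      # arbitrary units; where the head is normally held
criterion = 11.0        # line that must be crossed to produce food
step = 0.5              # assumed effect of one reinforcement on the mean

for second in range(300):
    height = random.gauss(mean_height, 1.0)   # moment-to-moment variation
    if height > criterion:                    # response meets the contingency
        mean_height += step                   # crossing becomes more likely
        criterion += 0.1                      # experimenter raises the line
    if second % 50 == 0:
        print(f"t={second:3d}s  mean height={mean_height:5.1f}  criterion={criterion:5.1f}")
```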

When we demonstrate the process of stamping in in this relatively simple way, we see that certain common interpretations of Thorndike’s experiment are superfluous. The expression “trial-and-error learning,” which is frequently associated with the Law of Effect, is clearly out of place here. We are reading something into our observations when we call any upward movement of the head a “trial,” and there is no reason to call any movement which does not achieve a specified consequence an “error.” Even the term “learning” is misleading. The statement that the bird “learns that it will get food by stretching its neck” is an inaccurate report of what has happened. To say that it has acquired the “habit” of stretching its neck is merely to resort to an explanatory fiction, since our only evidence of the habit is the acquired tendency to perform the act. The barest possible statement of the process is this: we make a given consequence contingent upon certain physical properties of behavior (the upward movement of the head), and the behavior is then observed to increase in frequency.

It is customary to refer to any movement of the organism as a “response.” The word is borrowed from the field of reflex action and implies an act which, so to speak, answers a prior event—the stimulus. But we may make an event contingent upon behavior without identifying, or being able to identify, a prior stimulus. We did not alter the environment of the pigeon to elicit the upward movement of the head. It is probably impossible to show that any single stimulus invariably precedes this movement. Behavior of this sort may come under the control of stimuli, but the relation is not that of elicitation. The term “response” is therefore not wholly appropriate but is so well established that we shall use it in the following discussion.

A response which has already occurred cannot, of course, be predicted or controlled. We can only predict that similar responses will occur in the future. The unit of a predictive science is, therefore, not a response but a class of responses. The word “operant” will be used to describe this class. The term emphasizes the fact that the behavior operates upon the environment to generate consequences. The consequences define the properties with respect to which responses are called similar. The term will be used both as an adjective (operant behavior) and as a noun to designate the behavior defined by a given consequence.

A single instance in which a pigeon raises its head is a response. It is a bit of history which may be reported in any frame of reference we wish to use. The behavior called “raising the head,” regardless of when specific instances occur, is an operant. It can be described, not as an accomplished act, but rather as a set of acts defined by the property of the height to which the head is raised. In this sense an operant is defined by an effect which may be specified in physical terms; the “cutoff” at a certain height is a property of behavior.

The term “learning” may profitably be saved in its traditional sense to describe the reassortment of responses in a complex situation. Terms for the process of stamping in may be borrowed from Pavlov’s analysis of the conditioned reflex. Pavlov himself called all events which strengthened behavior “reinforcement” and all the resulting changes “conditioning.” In the Pavlovian experiment, however, a reinforcer is paired with a stimulus; whereas in operant behavior it is contingent upon a response. Operant reinforcement is therefore a separate process and requires a separate analysis. In both cases, the strengthening of behavior which results from reinforcement is appropriately called “conditioning.” In operant conditioning we “strengthen” an operant in the sense of making a response more probable or, in actual fact, more frequent. In Pavlovian or “respondent” conditioning we simply increase the magnitude of the response elicited by the conditioned stimulus and shorten the time which elapses between stimulus and response. (We note, incidentally, that these two cases exhaust the possibilities: an organism is conditioned when a reinforcer [1] accompanies another stimulus or [2] follows upon the organism’s own behavior. Any event which does neither has no effect in changing a probability of response.) In the pigeon experiment, then, food is the reinforcer and presenting food when a response is emitted is the reinforcement. The operant is defined by the property upon which reinforcement is contingent—the height to which the head must be raised. The change in frequency with which the head is lifted to this height is the process of operant conditioning.

While we are awake, we act upon the environment constantly, and many of the consequences of our actions are reinforcing. Through operant conditioning the environment builds the basic repertoire with which we keep our balance, walk, play games, handle instruments and tools, talk, write, sail a boat, drive a car, or fly a plane. A change in the environment—a new car, a new friend, a new field of interest, a new job, a new location—may find us unprepared, but our behavior usually adjusts quickly as we acquire new responses and discard old. We shall see in the following chapter that operant reinforcement does more than build a behavioral repertoire. It improves the efficiency of behavior and maintains behavior in strength long after acquisition or efficiency has ceased to be of interest.

QUANTITATIVE PROPERTIES

It is not easy to obtain a curve for operant conditioning. We cannot isolate an operant completely, nor can we eliminate all arbitrary details. In our example we might plot a curve showing how the frequency with which the pigeon’s head is lifted to a given height changes with time or the number of reinforcements, but the total effect is clearly broader than this. There is a shift in a larger pattern of behavior, and to describe it fully we should have to follow all movements of the head. Even so, our account would not be complete. The height to which the head was to be lifted was chosen arbitrarily, and the effect of reinforcement depends upon this selection. If we reinforce a height which is seldom reached, the change in pattern will be far greater than if we had chosen a commoner height. For an adequate account we need a set of curves covering all the possibilities. Still another arbitrary element appears if we force the head to a higher and higher position, since we may follow different schedules in advancing the line selected for reinforcement. Each schedule will yield its own curve, and the picture would be complete only if it covered all possible schedules.

We cannot avoid these problems by selecting a response which is more sharply defined by features of the environment—for example, the behavior of operating a door latch. Some mechanical indicator of behavior is, of course, an advantage—for example, in helping us to reinforce consistently. We could record the height of a pigeon’s head with a photocell arrangement, but it is simpler to select a response which makes a more easily recorded change in the environment. If the bird is conditioned to peck a small disk on the wall of the experimental box, we may use the movement of the disk to close an electric circuit—both to operate the food tray and to count or record responses. Such a response seems to be different from stretching the neck in that it has an all-or-none character. But we shall see in a moment that the mechanical features of striking a key do not define a “response” which is any less arbitrary than neck-stretching.

An experimental arrangement need not be perfect in order to provide important quantitative data in operant conditioning. We are already in a position to evaluate many factors. The importance of feed-back is clear. The organism must be stimulated by the consequences of its behavior if conditioning is to take place. In learning to wiggle one’s ears, for example, it is necessary to know when the ears move if responses which produce movement are to be strengthened in comparison with responses which do not. In re-educating the patient in the use of a partially paralyzed limb, it may be of help to amplify the feed-back from slight movements, either with instruments or through the report of an instructor. The deaf-mute learns to talk only when he receives a feed-back from his own behavior which can be compared with the stimulation he receives from other speakers. One function of the educator is to supply arbitrary (sometimes spurious) consequences for the sake of feed-back. Conditioning depends also upon the kind, amount, and immediacy of reinforcement, as well as many other factors.

A single reinforcement may have a considerable effect. Under good conditions the frequency of a response shifts from a prevailing low value to a stable high value in a single abrupt step. More commonly we observe a substantial increase as the result of a single reinforcement, and additional increases from later reinforcements. The observation is not incompatible with the assumption of an instantaneous change to a maximal probability, since we have by no means isolated a single operant. The increased frequency must be interpreted with respect to other behavior characteristic of the situation. The fact that conditioning can be so rapid in an organism as “low” as the rat or pigeon has interesting implications. Differences in what is commonly called intelligence are attributed in part to differences in speed of learning. But there can be no faster learning than an instantaneous increase in probability of response. The superiority of human behavior is, therefore, of some other sort.

THE CONTROL OF OPERANT BEHAVIOR

The experimental procedure in operant conditioning is straightforward. We arrange a contingency of reinforcement and expose an organism to it for a given period. We then explain the frequent emission of the response by pointing to this history. But what improvement has been made in the prediction and control of the behavior in the future? What variables enable us to predict whether or not the organism will respond? What variables must we now control in order to induce it to respond?

We have been experimenting with a hungry pigeon. As we shall see in Chapter IX, this means a pigeon which has been deprived of food for a certain length of time or until its usual body-weight has been slightly reduced. Contrary to what one might expect, experimental studies have shown that the magnitude of the reinforcing effect of food may not depend upon the degree of such deprivation. But the frequency of response which results from reinforcement depends upon the degree of deprivation at the time the response is observed. Even though we have conditioned a pigeon to stretch its neck, it does not do this if it is not hungry. We have, therefore, a new sort of control over its behavior: in order to get the pigeon to stretch its neck, we simply make it hungry. A selected operant has been added to all those things which a hungry pigeon will do. Our control over the response has been pooled with our control over food deprivation. We shall see in Chapter VII that an operant may also come under the control of an external stimulus, which is another variable to be used in predicting and controlling the behavior. We should note, however, that both these variables are to be distinguished from operant reinforcement itself.

OPERANT EXTINCTION

When reinforcement is no longer forthcoming, a response becomes less and less frequent in what is called “operant extinction.” If food is withheld, the pigeon will eventually stop lifting its head. In general when we engage in behavior which no longer “pays off,” we find ourselves less inclined to behave in that way again. If we lose a fountain pen, we reach less and less often into the pocket which formerly held it. If we get no answer to telephone calls, we eventually stop telephoning. If our piano goes out of tune, we gradually play it less and less. If our radio becomes noisy or if programs become worse, we stop listening.

Since operant extinction takes place much more slowly than operant conditioning, the process may be followed more easily. Under suitable conditions smooth curves are obtained in which the rate of response is seen to decline slowly, perhaps over a period of many hours. The curves reveal properties which could not possibly be observed through casual inspection. We may “get the impression” that an organism is responding less and less often, but the orderliness of the change can be seen only when the behavior is recorded. The curves suggest that there is a fairly uniform process which determines the output of behavior during extinction.
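
A record of this sort can be imitated with a toy rule. The sketch below assumes, purely for illustration, that each unreinforced response slightly lowers the probability of responding in the next interval; the decay rate and time scale are assumptions, not measurements.

```python
import random

# Toy extinction record: responding grows less probable as responses go
# unreinforced, so the cumulative count rises more and more slowly.

random.seed(2)
p_response = 0.8          # probability of a response in a given minute
cumulative = 0

for minute in range(1, 121):
    if random.random() < p_response:
        cumulative += 1
        p_response *= 0.98   # no food follows: responding weakens slightly
    if minute % 20 == 0:
        print(f"minute {minute:3d}: {cumulative} responses so far, p={p_response:.2f}")
```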

Under some circumstances the curve is disturbed by an emotional effect. The failure of a response to be reinforced leads not only to operant extinction but also to a reaction commonly spoken of as frustration or rage. A pigeon which has failed to receive reinforcement turns away from the key, cooing, flapping its wings, and engaging in other emotional behavior (Chapter X). The human organism shows a similar double effect. The child whose tricycle no longer responds to pedaling not only stops pedaling but engages in a possibly violent emotional display. The adult who finds a desk drawer stuck may soon stop pulling, but he may also pound the desk, exclaim “Damn it!,” or exhibit other signs of rage. Just as the child eventually goes back to the tricycle, and the adult to the drawer, so the pigeon will turn again to the key when the emotional response has subsided. As other responses go unreinforced, another emotional episode may ensue. Extinction curves under such circumstances show a cyclic oscillation as the emotional response builds up, disappears, and builds up again. If we eliminate the emotion by repeated exposure to extinction, or in other ways, the curve emerges in a simpler form.

Behavior during extinction is the result of the conditioning which has preceded it, and in this sense the extinction curve gives an additional measure of the effect of reinforcement. If only a few responses have been reinforced, extinction occurs quickly. A long history of reinforcement is followed by protracted responding. The resistance to extinction cannot be predicted from the probability of response observed at any given moment. We must know the history of reinforcement. For example, though we have been reinforced with an excellent meal in a new restaurant, a bad meal may reduce our patronage to zero; but if we have found excellent food in a restaurant for many years, several poor meals must be eaten there, other things being equal, before we lose the inclination to patronize it again.

There is no simple relation between the number of responses reinforced and the number which appear in extinction. As we shall see in Chapter VI, the resistance to extinction generated by intermittent reinforcement may be much greater than if the same number of reinforcements are given for consecutive responses. Thus if we only occasionally reinforce a child for good behavior, the behavior survives after we discontinue reinforcement much longer than if we had reinforced every instance up to the same total number of reinforcements. This is of practical importance where the available reinforcers are limited. Problems of this sort arise in education, industry, economics, and many other fields. Under some schedules of intermittent reinforcement as many as 10,000 responses may appear in the behavior of a pigeon before extinction is substantially complete.

Extinction is an effective way of removing an operant from the repertoire of an organism. It should not be confused with other procedures designed to have the same effect. The currently preferred technique is punishment, which, as we shall see in Chapter XII, involves different processes and is of questionable effectiveness. Forgetting is frequently confused with extinction. In forgetting, the effect of conditioning is lost simply as time passes, whereas extinction requires that the response be emitted without reinforcement. Usually forgetting does not take place quickly; sizeable extinction curves have been obtained from pigeons as long as six years after the response had last been reinforced. Six years is about half the normal life span of the pigeon. During the interval the pigeons lived under circumstances in which the response could not possibly have been reinforced. In human behavior skilled responses generated by relatively precise contingencies frequently survive unused for as much as half a lifetime. The assertion that early experiences determine the personality of the mature organism assumes that the effect of operant reinforcement is long-lasting. Thus if, because of early childhood experiences, a man marries a woman who resembles his mother, the effect of certain reinforcements must have survived for a long time. Most cases of forgetting involve operant behavior under the control of specific stimuli and cannot be discussed adequately until that control has been covered in Chapter VII.

The effects of extinction. The condition in which extinction is more or less complete is familiar, yet often misunderstood. Extreme extinction is sometimes called “abulia.” To define this as a “lack of will” is of little help, since the presence or absence of will is inferred from the presence or absence of the behavior. The term seems to be useful, however, in that it implies that the behavior is lacking for a special reason, and we may make the same distinction in another way. Behavior is strong or weak because of many different variables, which it is the task of a science of behavior to identify and classify. We define any given case in terms of the variable. The condition which results from prolonged extinction superficially resembles inactivity resulting from other causes. The difference is in the history of the organism. An aspiring writer who has sent manuscript after manuscript to the publishers only to have them all rejected may report that “he can’t write another word.” He may be partially paralyzed with what is called “writer’s cramp.” He may still insist that he “wants to write,” and we may agree with him in paraphrase: his extremely low probability of response is mainly due to extinction. Other variables are still operative which, if extinction had not taken place, would yield a high probability.

The condition of low operant strength resulting from extinction often requires treatment. Some forms of psychotherapy are systems of reinforcement designed to reinstate behavior which has been lost through extinction. The therapist may himself supply the reinforcement, or he may arrange living conditions in which behavior is likely to be reinforced. In occupational therapy, for example, the patient is encouraged to engage in simple forms of behavior which receive immediate and fairly consistent reinforcement. It is of no advantage to say that such therapy helps the patient by giving him a “sense of achievement” or improves his “morale,” builds up his “interest,” or removes or prevents “discouragement.” Such terms as these merely add to the growing population of explanatory fictions. One who readily engages in a given activity is not showing an interest, he is showing the effect of reinforcement. We do not give a man a sense of achievement, we reinforce a particular action. To become discouraged is simply to fail to respond because reinforcement has not been forthcoming. Our problem is simply to account for probability of response in terms of a history of reinforcement and extinction.

WHAT EVENTS ARE REINFORCING?

In dealing with our fellow men in everyday life and in the clinic and laboratory, we may need to know just how reinforcing a specific event is. We often begin by noting the extent to which our own behavior is reinforced by the same event. This practice frequently miscarries; yet it is still commonly believed that reinforcers can be identified apart from their effects upon a particular organism. As the term is used here, however, the only defining characteristic of a reinforcing stimulus is that it reinforces.

The only way to tell whether or not a given event is reinforcing to a given organism under given conditions is to make a direct test. We observe the frequency of a selected response, then make an event contingent upon it and observe any change in frequency. If there is a change, we classify the event as reinforcing to the organism under the existing conditions. There is nothing circular about classifying events in terms of their effects; the criterion is both empirical and objective. It would be circular, however, if we then went on to assert that a given event strengthens an operant because it is reinforcing. We achieve a certain success in guessing at reinforcing powers only because we have in a sense made a crude survey; we have gauged the reinforcing effect of a stimulus upon ourselves and assume the same effect upon others. We are successful only when we resemble the organism under study and when we have correctly surveyed our own behavior.
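
The bookkeeping behind such a test is simple. The sketch below is a hypothetical helper, not a standard procedure: the counts would come from observation, and the margin used to call a change "real" is an illustrative assumption where a real study would apply a proper statistical criterion.

```python
# Direct test of a putative reinforcer: compare the response rate observed
# at baseline with the rate observed once the event is made contingent on
# the response. Classification depends only on the observed effect.

def is_reinforcing(baseline_count, test_count, minutes, margin=1.5):
    """Return True if the rate under the contingency exceeds the baseline
    rate by an (assumed) margin."""
    baseline_rate = baseline_count / minutes
    test_rate = test_count / minutes
    return test_rate > margin * baseline_rate

# Example with illustrative numbers: 12 responses in 30 minutes of baseline,
# 55 responses in 30 minutes with the event contingent on the response.
print(is_reinforcing(12, 55, 30))   # True for these figures
```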

Events which are found to be reinforcing are of two sorts. Some reinforcements consist of presenting stimuli, of adding something—for example, food, water, or sexual contact—to the situation. These we call positive reinforcers. Others consist of removing something—for example, a loud noise, a very bright light, extreme cold or heat, or electric shock—from the situation. These we call negative reinforcers. In both cases the effect of reinforcement is the same—the probability of response is increased. We cannot avoid this distinction by arguing that what is reinforcing in the negative case is the absence of the bright light, loud noise, and so on; for it is absence after presence which is effective, and this is only another way of saying that the stimulus is removed. The difference between the two cases will be clearer when we consider the presentation of a negative reinforcer or the removal of a positive. These are the consequences which we call punishment (Chapter XII).

A survey of the events which reinforce a given individual is often required in the practical application of operant conditioning. In every field in which human behavior figures prominently—education, government, the family, the clinic, industry, art, literature, and so on—we are constantly changing probabilities of response by arranging reinforcing consequences. The industrialist who wants employees to work consistently and without absenteeism must make certain that their behavior is suitably reinforced—not only with wages but with suitable working conditions. The girl who wants another date must be sure that her friend’s behavior in inviting her and in keeping the appointment is suitably reinforced. To teach a child to read or sing or play a game effectively, we must work out a program of educational reinforcement in which appropriate responses “pay off” frequently. If the patient is to return for further counsel, the psychotherapist must make sure that the behavior of coming to him is in some measure reinforced.

We evaluate the strength of reinforcing events when we attempt to discover what someone is “getting out of life.” What consequences are responsible for his present repertoire and for the relative frequencies of the responses in it? His responses to various topics of conversation tell us something, but his everyday behavior is a better guide. We infer important reinforcers from nothing more unusual than his “interest” in a writer who deals with certain subjects, in stores or museums which exhibit certain objects, in friends who participate in certain kinds of behavior, in restaurants which serve certain kinds of food, and so on. The “interest” refers to the probability which results, at least in part, from the consequences of the behavior of “taking an interest.” We may be more nearly sure of the importance of a reinforcer if we watch the behavior come and go as the reinforcer is alternately supplied and withheld, for the change in probability is then less likely to be due to an incidental change of some other sort. The behavior of associating with a particular friend varies as the friend varies in supplying reinforcement. If we observe this covariation, we may then be fairly sure of “what this friendship means” or “what our subject sees in his friend.”

This technique of evaluation may be improved for use in clinical and laboratory investigation. A direct inventory may be made by allowing a subject to look at an assortment of pictures and recording the time he spends on each. The behavior of looking at a picture is reinforced by what is seen in it. Looking at one picture may be more strongly reinforced than looking at another, and the times will vary accordingly. The information may be valuable if it is necessary for any reason to reinforce or extinguish our subject’s behavior.

Literature, art, and entertainment are contrived reinforcers. Whether the public buys books, tickets to performances, and works of art depends upon whether those books, plays, concerts, or pictures are reinforcing. Frequently the artist confines himself to an exploration of what is reinforcing to himself. When he does so, his work “reflects his own individuality,” and it is then an accident (or a measure of his universality) if his book or play or piece of music or picture is reinforcing to others. Insofar as commercial success is important, he may make a direct study of the behavior of others. (The interpretation of the activity of the writer and artist as an exploration of the reinforcing powers of certain media will be discussed in Chapter XVI.)

We cannot dispense with this survey simply by asking a man what reinforces him. His reply may be of some value, but it is by no means necessarily reliable. A reinforcing connection need not be obvious to the individual reinforced. It is often only in retrospect that one’s tendencies to behave in particular ways are seen to be the result of certain consequences, and, as we shall see in Chapter XVIII, the relation may never be seen at all even though it is obvious to others.

There are, of course, extensive differences between individuals in the events which prove to be reinforcing. The differences between species are so great as scarcely to arouse interest; obviously what is reinforcing to a horse need not be reinforcing to a dog or man. Among the members of a species, the extensive differences are less likely to be due to hereditary endowment, and to that extent may be traced to circumstances in the history of the individual. The fact that organisms evidently inherit the capacity to be reinforced by certain kinds of events does not help us in predicting the reinforcing effect of an untried stimulus. Nor does the relation between the reinforcing event and deprivation or any other condition of the organism endow the reinforcing event with any particular physical property. It is especially unlikely that events which have acquired their power to reinforce will be marked in any special way. Yet such events are an important species of reinforcer.

CONDITIONED REINFORCERS

The stimulus which is presented in operant reinforcement may be paired with another in respondent conditioning. In Chapter IV, we considered the acquisition of the power to elicit a response; now we are concerned with the power to reinforce. Although reinforcement is a different stimulus function, the process resulting from the pairing of stimuli appears to be the same. If we have frequently presented a dish of food to a hungry organism, the empty dish will elicit salivation. To some extent the empty dish will also reinforce an operant.

We can demonstrate conditioned reinforcement more readily with stimuli which can be better controlled. If each time we turn on a light we give food to a hungry pigeon, the light eventually becomes a conditioned reinforcer. It may be used to condition an operant just as food is used. We know something about how the light acquires this property: the more often the light is paired with the food, the more reinforcing it becomes; the food must not follow the light by too great an interval of time; and the reinforcing power is rapidly lost when all food is withheld. We should expect all of this from our knowledge of stimulus conditioning.
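
The relations just listed can be summarized in a crude toy formula. The sketch below is an illustrative assumption, not a quantitative law: it simply lets each light-food pairing add to the light's reinforcing power and each presentation of the light without food subtract from it.

```python
# Toy summary of conditioned reinforcement by pairing. The gain and loss
# constants are illustrative assumptions.

def reinforcing_power(pairings, unpaired_presentations,
                      gain_per_pairing=1.0, loss_per_extinction=0.5):
    power = pairings * gain_per_pairing - unpaired_presentations * loss_per_extinction
    return max(power, 0.0)

print(reinforcing_power(pairings=40, unpaired_presentations=0))    # 40.0
print(reinforcing_power(pairings=40, unpaired_presentations=100))  # 0.0: power lost when food is withheld
```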

Conditioned reinforcers are often the product of natural contingencies. Usually, food and water are received only after the organism has engaged in “precurrent” behavior—after it has operated upon the environment to create the opportunity for eating or drinking. The stimuli generated by this precurrent behavior, therefore, become reinforcing. Thus before we can transfer food from a plate to our mouth successfully, we must get near the plate, and any behavior which brings us near the plate is automatically reinforced. The precurrent behavior is, therefore, sustained in strength. This is important since only a small part of behavior is immediately reinforced with food, water, sexual contact, or other events of obvious biological importance. Although it is characteristic of human behavior that primary reinforcers may be effective after long delay, this is presumably only because intervening events become conditioned reinforcers. When a man puts storm windows on his house in October because similar behavior last October was followed by a warm house in January, we need to bridge the gap between the behavior in October and the effect in January. Among the conditioned reinforcers responsible for the strength of this behavior are certain verbal consequences supplied by the man himself or by his neighbors. It is often important to fill in a series of events between an act and an ultimate primary reinforcement in order to control behavior for practical purposes. In education, industry, psychotherapy, and many other fields, we encounter techniques which are designed to create appropriate conditioned reinforcers. The effect of providing immediately effective consequences where ultimate consequences are delayed is to “improve morale,” to “heighten interest,” to “prevent discouragement” or to correct the condition of low operant strength which we called abulia, and so on. More concretely, it is to induce students to study, employees to come to work, patients to engage in acceptable social behavior, and so on.

Generalized reinforcers. A conditioned reinforcer is generalized when it is paired with more than one primary reinforcer. The generalized reinforcer is useful because the momentary condition of the organism is not likely to be important. The operant strength generated by a single reinforcement is observed only under an appropriate condition of deprivation—when we reinforce with food, we gain control over the hungry man. But if a conditioned reinforcer has been paired with reinforcers appropriate to many conditions, at least one appropriate state of deprivation is more likely to prevail upon a later occasion. A response is therefore more likely to occur. When we reinforce with money, for example, our subsequent control is relatively independent of momentary deprivations. One kind of generalized reinforcer is created because many primary reinforcers are received only after the physical environment has been efficiently manipulated. One form of precurrent behavior may precede different kinds of reinforcers upon different occasions. The immediate stimulation from such behavior will thus become a generalized reinforcer. We are automatically reinforced, apart from any particular deprivation, when we successfully control the physical world. This may explain our tendency to engage in skilled crafts, in artistic creation, and in such sports as bowling, billiards, and tennis.

It is possible, however, that some of the reinforcing effect of “sensory feed-back” is unconditioned. A baby appears to be reinforced by stimulation from the environment which has not been followed by primary reinforcement. The baby’s rattle is an example. The capacity to be reinforced in this way could have arisen in the evolutionary process, and it may have a parallel in the reinforcement we receive from simply “making the world behave.” Any organism which is reinforced by its success in manipulating nature, regardless of the momentary consequences, will be in a favored position when important consequences follow.

Several important generalized reinforcers arise when behavior is reinforced by other people. A simple case is attention. The child who misbehaves “just to get attention” is familiar. The attention of people is reinforcing because it is a necessary condition for other reinforcements from them. In general, only people who are attending to us reinforce our behavior. The attention of someone who is particularly likely to supply reinforcement—a parent, a teacher, or a loved one—is an especially good generalized reinforcer and sets up especially strong attention-getting behavior. Many verbal responses specifically demand attention—for example, “Look,” “See,” or the vocative use of a name. Other characteristic forms of behavior which are commonly strong because they receive attention are feigning illness, being annoying, and being conspicuous (exhibitionism).

Attention is often not enough. Another person is likely to reinforce only that part of one’s behavior of which he approves, and any sign of his approval therefore becomes reinforcing in its own right. Behavior which evokes a smile or the verbal response “That’s right” or “Good” or any other commendation is strengthened. We use this generalized reinforcer to establish and shape the behavior of others, particularly in education. For example, we teach both children and adults to speak correctly by saying “That’s right” when appropriate behavior is emitted.

A still stronger generalized reinforcer is affection. It may be especially connected with sexual contact as a primary reinforcer, but when anyone who shows affection supplies other kinds of reinforcement as well, the effect is generalized.

It is difficult to define, observe, and measure attention, approval, and affection. They are not things but aspects of the behavior of others. Their subtle physical dimensions present difficulties not only for the scientist who must study them but also for the individual who is reinforced by them. If we do not easily see that someone is paying attention or that he approves or is affectionate, our behavior will not be consistently reinforced. It may therefore be weak, may tend to occur at the wrong time, and so on. We do not “know what to do to get attention or affection or when to do it.” The child struggling for attention, the lover for a sign of affection, and the artist for professional approval show the persevering behavior which, as we shall see in Chapter VI, results from only intermittent reinforcement.

Another generalized reinforcer is the submissiveness of others. When someone has been coerced into supplying various reinforcements, any indication of his acquiescence becomes a generalized reinforcer. The bully is reinforced by signs of cowardice, and members of the ruling class by signs of deference. Prestige and esteem are generalized reinforcers only insofar as they guarantee that other people will act in certain ways. That “having one’s own way” is reinforcing is shown by the behavior of those who control for the sake of control. The physical dimensions of submissiveness are usually not so subtle as those of attention, approval, or affection. The bully may insist upon a clear-cut sign of his dominance, and ritualistic practices emphasize deference and respect.

A generalized reinforcer distinguished by its physical specifications is the token. The commonest example is money. It is the generalized reinforcer par excellence because, although “money won’t buy everything,” it can be exchanged for primary reinforcers of great variety. Behavior reinforced with money is relatively independent of the momentary deprivation of the organism, and the general usefulness of money as a reinforcer depends in part upon this fact. Its effectiveness is also due to its physical dimensions. These permit a sharper contingency between behavior and consequence: when we are paid in money, we know what our behavior has accomplished and what behavior has accomplished it. The reinforcing effect can also be more successfully conditioned: the exchange value of money is more obvious than that of attention, approval, affection, or even submissiveness.

Money is not the only token. In education, for example, the individual behaves in part because of the marks, grades, and diplomas which he has received. These are not so readily exchanged for primary reinforcement as money, but the possibility of exchange is there. Educational tokens form a series in which one may be exchanged for the next, and the commercial or prestige value of the final token, the diploma, is usually clear. As a rule, prizes, medals, and scholarships for high marks or specialized skills or achievements are not explicitly paired with primary reinforcers, but the clear-cut physical dimensions of such awards are an advantage in arranging contingencies. Usually the ultimate reinforcement is similar to that of prestige or esteem.

It is easy to forget the origins of the generalized reinforcers and to regard them as reinforcing in their own right. We speak of the “need for attention, approval, or affection,” “the need to dominate,” and “the love of money” as if they were primary conditions of deprivation. But a capacity to be reinforced in this way could scarcely have evolved in the short time during which the required conditions have prevailed. Attention, affection, approval, and submission have presumably existed in human society for only a very brief period, as the process of evolution goes. Moreover, they do not represent fixed forms of stimulation, since they depend upon the idiosyncrasies of particular groups. Insofar as affection is mainly sexual, it may be related to a condition of primary deprivation which is to some extent independent of the personal history of the individual, but the “signs of affection” which become reinforcing because of their association with sexual contact or with other reinforcers can scarcely be reinforcing for genetic reasons. Tokens are of even more recent advent, and it is not often seriously suggested that the need for them is inherited. We can usually watch the process through which a child comes to be reinforced by money. Yet the “love of money” often seems to be as autonomous as the “need for approval,” and if we confined ourselves to the observed effectiveness of these generalized reinforcers, we should have as much reason for assuming an inherited need for money as for attention, approval, affection, or domination.

Eventually generalized reinforcers are effective even though the primary reinforcers upon which they are based no longer accompany them. We play games of skill for their own sake. We get attention or approval for its own sake. Affection is not always followed by a more explicit sexual reinforcement. The submissiveness of others is reinforcing even though we make no use of it. A miser may be so reinforced by money that he will starve rather than give it up. These observable facts must have their place in any theoretical or practical consideration. They do not mean that generalized reinforcers are anything more than the physical properties of the stimuli observed in each case or that there are any nonphysical entities which must be taken into account.

WHY IS A REINFORCER REINFORCING?

The Law of Effect is not a theory. It is simply a rule for strengthening behavior. When we reinforce a response and observe a change in its frequency, we can easily report what has happened in objective terms. But in explaining why it has happened we are likely to resort to theory. Why does reinforcement reinforce? One theory is that an organism repeats a response because it finds the consequences “pleasant” or “satisfying.” But in what sense is this an explanation within the framework of a natural science? “Pleasant” or “satisfying” apparently do not refer to physical properties of reinforcing events, since the physical sciences use neither these terms nor any equivalents. The terms must refer to some effect upon the organism, but can we define this in such a way that it will be useful in accounting for reinforcement?

It is sometimes argued that a thing is pleasant if an organism approaches or maintains contact with it and unpleasant if the organism avoids it or cuts it short. There are many variations on this attempt to find an objective definition, but they are all subject to the same criticism: the behavior specified may be merely another product of the reinforcing effect. To say that a stimulus is pleasant in the sense that an organism tends to approach or prolong it may be only another way of saying that the stimulus has reinforced the behavior of approaching or prolonging. Instead of defining a reinforcing effect in terms of its effect upon behavior in general, we have simply specified familiar behavior which is almost inevitably reinforced and hence generally available as an indicator of reinforcing power. If we then go on to say that a stimulus is reinforcing because it is pleasant, what purports to be an explanation in terms of two effects is in reality a redundant description of one.

An alternative approach is to define “pleasant” and “unpleasant” (or “satisfying” and “annoying”) by asking the subject how he “feels” about certain events. This assumes that reinforcement has two effects—it strengthens behavior and generates “feelings”—and that one is a function of the other. But the functional relation may be in the other direction. When a man reports that an event is pleasant, he may be merely reporting that it is the sort of event which reinforces him or toward which he finds himself tending to move because it has reinforced such movement. We shall see in Chapter XVII that one could probably not acquire verbal responses with respect to pleasantness as a purely private fact unless something like this were so. In any case, the subject himself is not at an especially good point of vantage for making such observations. “Subjective judgments” of the pleasantness or satisfaction provided by stimuli are usually unreliable and inconsistent. As the doctrine of the unconscious has emphasized, we may not be able to report at all upon events which can be shown to be reinforcing to us or we may make a report which is in direct conflict with objective observations; we may report as unpleasant a type of event which can be shown to be reinforcing. Examples of this anomaly range from masochism to martyrdom.

It is sometimes argued that reinforcement is effective because it reduces a state of deprivation. Here at least is a collateral effect which need not be confused with reinforcement itself. It is obvious that deprivation is important in operant conditioning. We used a hungry pigeon in our experiment, and we could not have demonstrated operant conditioning otherwise. The hungrier the bird, the oftener it responds as the result of reinforcement. But in spite of this connection it is not true that reinforcement always reduces deprivation. Conditioning may occur before any substantial change can take place in the deprivation measured in other ways. All we can say is that the type of event which reduces deprivation is also reinforcing.

The connection between reinforcement and satiation must be sought in the process of evolution. We can scarcely overlook the great biological significance of the primary reinforcers. Food, water, and sexual contact, as well as escape from injurious conditions (Chapter XI), are obviously connected with the well-being of the organism. An individual who is readily reinforced by such events will acquire highly efficient behavior. It is also biologically advantageous if the behavior due to a given reinforcement is especially likely to occur in an appropriate state of deprivation. Thus it is important, not only that any behavior which leads to the receipt of food should become an important part of a repertoire, but that this behavior should be particularly strong when the organism is hungry. These two advantages are presumably responsible for the fact that an organism can be reinforced in specific ways and that the result will be observed in relevant conditions of deprivation.

Some forms of stimulation are positively reinforcing although they do not appear to elicit behavior having biological significance. A baby is reinforced, not only by food, but by the tinkle of a bell or the sparkle of a bright object. Behavior which is consistently followed by such stimuli shows an increased probability. It is difficult, if not impossible, to trace these reinforcing effects to a history of conditioning. Later we may find the same individual being reinforced by an orchestra or a colorful spectacle. Here it is more difficult to make sure that the reinforcing effect is not conditioned. However, we may plausibly argue that a capacity to be reinforced by any feedback from the environment would be biologically advantageous, since it would prepare the organism to manipulate the environment successfully before a given state of deprivation developed. When the organism generates a tactual feed-back, as in feeling the texture of a piece of cloth or the surface of a piece of sculpture, the conditioning is commonly regarded as resulting from sexual reinforcement, even when the area stimulated is not primarily sexual in function. It is tempting to suppose that other forms of stimulation produced by behavior are similarly related to biologically important events.

When the environment changes, a capacity to be reinforced by a given event may have a biological disadvantage. Sugar is highly reinforcing to most members of the human species, as the ubiquitous candy counter shows. Its effect in this respect far exceeds current biological requirements. This was not true before sugar had been grown and refined on an extensive scale. Until a few hundred years ago, the strong reinforcing effect of sugar must have been a biological advantage. The environment has changed, but the genetic endowment of the organism has not followed suit. Sex provides another example. There is no longer a biological advantage in the great reinforcing effect of sexual contact, but we need not go back many hundreds of years to find conditions of famine and pestilence under which the power of sexual reinforcement offered a decisive advantage.

A biological explanation of reinforcing power is perhaps as far as we can go in saying why an event is reinforcing. Such an explanation is probably of little help in a functional analysis, for it does not provide us with any way of identifying a reinforcing stimulus as such before we have tested its reinforcing power upon a given organism. We must therefore be content with a survey in terms of the effects of stimuli upon behavior.

ACCIDENTAL CONTINGENCIES AND “SUPERSTITIOUS” BEHAVIOR

It has been argued that Thorndike’s experiment is not typical of the learning process because the cat cannot “see the connection” between moving a latch and escaping from a box. But seeing a connection is not essential in operant conditioning. Both during and after the process of conditioning, the human subject often talks about his behavior in relation to his environment (Chapter XVII). His reports may be useful in a scientific account, and his reaction to his own behavior may even be an important link in certain complex processes. But such reports or reactions are not required in the simple process of operant conditioning. This is evident in the fact that one may not be able to describe a contingency which has clearly had an effect.

Nor need there be any permanent connection between a response and its reinforcement. We made the receipt of food contingent upon the response of our pigeon by arranging a mechanical and electrical connection. Outside the laboratory various physical systems are responsible for contingencies between behavior and its consequences. But these need not, and usually do not, affect the organism in any other way. So far as the organism is concerned, the only important property of the contingency is temporal. The reinforcer simply follows the response. How this is brought about does not matter.

We must assume that the presentation of a reinforcer always reinforces something, since it necessarily coincides with some behavior. We have also seen that a single reinforcement may have a substantial effect. If there is only an accidental connection between the response and the appearance of a reinforcer, the behavior is called “superstitious.” We may demonstrate this in the pigeon by accumulating the effect of several accidental contingencies. Suppose we give a pigeon a small amount of food every fifteen seconds regardless of what it is doing. When food is first given, the pigeon will be behaving in some way—if only standing still—and conditioning will take place. It is then more probable that the same behavior will be in progress when food is given again. If this proves to be the case, the “operant” will be further strengthened. If not, some other behavior will be strengthened. Eventually a given bit of behavior reaches a frequency at which it is often reinforced. It then becomes a permanent part of the repertoire of the bird, even though the food has been given by a clock which is unrelated to the bird’s behavior. Conspicuous responses which have been established in this way include turning sharply to one side, hopping from one foot to the other and back, bowing and scraping, turning around, strutting, and raising the head. The topography of the behavior may continue to drift with further reinforcements, since slight modifications in the form of response may coincide with the receipt of food.

In producing superstitious behavior, the intervals at which food is given are important. At sixty seconds the effect of one reinforcement is largely lost before another can occur, and other behavior is more likely to appear. Superstitious behavior is therefore less likely to emerge, though it may do so if the experiment is carried on for a long time. At fifteen seconds the effect is usually almost immediate. When a superstitious response has once been established, it will survive even when reinforced only infrequently.
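
The accidental contingency described in the last two paragraphs can be given a rough quantitative form. The sketch below, written in Python, is offered only as an illustration: the list of responses, the equal starting strengths, and the rule by which strength fades between feedings are assumptions of the sketch, not values reported for the experiment. The loop simply reproduces the logic of the procedure: food arrives on a clock, whatever response happens to be in progress at that moment is strengthened, and the strengthening fades as the interval before the next feeding passes.

    # A minimal simulation of the accidental contingency described above.
    # The response names, the starting strengths, and the decay rule are
    # illustrative assumptions, not values taken from the experiment.

    import random

    RESPONSES = ["turn sharply", "hop", "bow and scrape",
                 "strut", "raise head", "stand still"]


    def run(feeding_interval, feedings=200, seed=0):
        rng = random.Random(seed)
        strength = {r: 1.0 for r in RESPONSES}   # equal operant strengths to begin with
        # Assumed rule: about ten per cent of any surplus strength is lost
        # for every fifteen seconds that pass between feedings.
        persistence = 0.9 ** (feeding_interval / 15.0)
        for _ in range(feedings):
            # Food arrives on the clock; some response is in progress,
            # chosen here in proportion to current strengths.
            in_progress = rng.choices(RESPONSES, weights=list(strength.values()))[0]
            # The reinforcer follows whatever was in progress.
            strength[in_progress] += 1.0
            # The added strength fades during the interval before the next feeding.
            for r in RESPONSES:
                strength[r] = 1.0 + (strength[r] - 1.0) * persistence
        return strength


    if __name__ == "__main__":
        for interval in (15, 60):
            final = run(interval)
            ranked = sorted(final.items(), key=lambda item: item[1], reverse=True)
            print(f"{interval}-second interval:",
                  ", ".join(f"{name} {value:.1f}" for name, value in ranked))

Run with the two intervals, the sketch tends to show the difference just noted: at fifteen seconds one response commonly ends far stronger than the rest, while at sixty seconds the strengths tend to remain close to their starting values.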

The pigeon is not exceptionally gullible. Human behavior is also heavily superstitious. Only a small part of the behavior strengthened by accidental contingencies develops into the ritualistic practices which we call “superstitions,” but the same principle is at work. Suppose we find a ten-dollar bill while walking through the park (and suppose this is an event which has a considerable reinforcing effect). Whatever we were doing, or had just been doing, at the moment we found the bill must be assumed to be reinforced. It would be difficult to prove this in a rigorous way, of course, but it is probable that we shall be more likely to go walking again, particularly in the same or a similar park, that we shall be slightly more likely to keep our eyes cast downward precisely as we did when we saw the money, and so on. This behavior will vary with any state of deprivation to which money is relevant. We should not call it superstitious, but it is generated by a contingency which is only rarely “functional.”

Some contingencies which produce superstitious behavior are not entirely accidental. A response is sometimes likely to be followed by a consequence which it nevertheless does not “produce.” The best examples involve a type of stimulus which is reinforcing when removed (Chapter XI). The termination of a brief stimulus of this sort may occur at just the right time to reinforce the behavior generated by its onset. The aversive stimulus appears and the organism becomes active; the stimulus terminates, and this reinforces some part of the behavior. Certain illnesses, lamenesses, and allergic reactions are of such duration that any measure taken to “cure” them is likely to be reinforced when the condition clears up. The measure need not actually be responsible for the cure. The elaborate rituals of nonscientific medicine appear to be explained by this characteristic of many forms of illness.

In superstitious operant behavior, as in the superstitious conditioned reflexes discussed in Chapter IV, the process of conditioning has miscarried. Conditioning offers tremendous advantages in equipping the organism with behavior which is effective in a novel environment, but there appears to be no way of preventing the acquisition of non-advantageous behavior through accident. Curiously, this difficulty must have increased as the process of conditioning was accelerated in the course of evolution. If, for example, three reinforcements were always required in order to change the probability of a response, superstitious behavior would be unlikely. It is only because organisms have reached the point at which a single contingency makes a substantial change that they are vulnerable to coincidences.

Superstitious rituals in human society usually involve verbal formulae and are transmitted as part of the culture. To this extent they differ from the simple effect of accidental operant reinforcement. But they must have had their origin in the same process, and they are probably sustained by occasional contingencies which follow the same pattern.

GOALS, PURPOSES, AND OTHER FINAL CAUSES

It is not correct to say that operant reinforcement “strengthens the response which precedes it.” The response has already occurred and cannot be changed. What is changed is the future probability of responses in the same class. It is the operant as a class of behavior, rather than the response as a particular instance, which is conditioned. There is, therefore, no violation of the fundamental principle of science which rules out “final causes.” But this principle is violated when it is asserted that behavior is under the control of an “incentive” or “goal” which the organism has not yet achieved or a “purpose” which it has not yet fulfilled. Statements which use such words as “incentive” or “purpose” are usually reducible to statements about operant conditioning, and only a slight change is required to bring them within the framework of a natural science. Instead of saying that a man behaves because of the consequences which are to follow his behavior, we simply say that he behaves because of the consequences which have followed similar behavior in the past. This is, of course, the Law of Effect or operant conditioning.

It is sometimes argued that a response is not fully described until its purpose is referred to as a current property. But what is meant by “describe”? If we observe someone walking down the street, we may report this event in the language of physical science. If we then add that “his purpose is to mail a letter,” have we said anything which was not included in our first report? Evidently so, since a man may walk down the street “for many purposes” and in the same physical way in each case. But the distinction which needs to be made is not between instances of behavior; it is between the variables of which behavior is a function. Purpose is not a property of the behavior itself; it is a way of referring to controlling variables. If we make our report after we have seen our subject mail his letter and turn back, we attribute “purpose” to him from the event which brought the behavior of walking down the street to an end. This event “gives meaning” to his performance, not by amplifying a description of the behavior as such, but by indicating an independent variable of which it may have been a function. We cannot see his “purpose” before seeing that he mails a letter, unless we have observed similar behavior and similar consequences before. Where we have done this, we use the term simply to predict that he will mail a letter upon this occasion.

Nor can our subject see his own purpose without reference to similar events. If we ask him why he is going down the street or what his purpose is and he says, “I am going to mail a letter,” we have not learned anything new about his behavior but only about some of its possible causes. The subject himself, of course, may be in an advantageous position in describing these variables because he has had an extended contact with his own behavior for many years. But his statement is not therefore in a different class from similar statements made by others who have observed his behavior upon fewer occasions. As we shall see in Chapter XVII, he is simply making a plausible prediction in terms of his experiences with himself. Moreover, he may be wrong. He may report that he is “going to mail a letter,” and he may indeed carry an unmailed letter in his hand and may mail it at the end of the street, but we may still be able to show that his behavior is primarily determined by the fact that upon past occasions he has encountered someone who is important to him upon just such a walk. He may not be “aware of this purpose” in the sense of being able to say that his behavior is strong for this reason.

The fact that operant behavior seems to be “directed toward the future” is misleading. Consider, for example, the case of “looking for something.” In what sense is the “something” which has not yet been found relevant to the behavior? Suppose we condition a pigeon to peck a spot on the wall of a box and then, when the operant is well established, remove the spot. The bird now goes to the usual place along the wall. It raises its head, cocks its eye in the usual direction, and may even emit a weak peck in the usual place. Before extinction is very far advanced, it returns to the same place again and again in similar behavior. Must we say that the pigeon is “looking for the spot”? Must we take the “looked for” spot into account in explaining the behavior?

It is not difficult to interpret this example in terms of operant reinforcement. Since visual stimulation from the spot has usually preceded the receipt of food, the spot has become a conditioned reinforcer. It strengthens the behavior of looking in given directions from different positions. Although we have undertaken to condition only the pecking response, we have in fact strengthened many different kinds of precurrent behavior which bring the bird into positions from which it sees the spot and pecks it. These responses continue to appear, even though we have removed the spot, until extinction occurs. The spot which is “being looked for” is the spot which has occurred in the past as the immediate reinforcement of the behavior of looking. In general, looking for something consists of emitting responses which in the past have produced “something” as a consequence.

The same interpretation applies to human behavior. When we see a man moving about a room opening drawers, looking under magazines, and so on, we may describe his behavior in fully objective terms: “Now he is in a certain part of the room; he has grasped a book between the thumb and forefinger of his right hand; he is lifting the book and bending his head so that any object under the book can be seen.” We may also “interpret” his behavior or “read a meaning into it” by saying that “he is looking for something” or, more specifically, that “he is looking for his glasses.” What we have added is not a further description of his behavior but an inference about some of the variables responsible for it. There is no current goal, incentive, purpose, or meaning to be taken into account. This is so even if we ask him what he is doing and he says, “I am looking for my glasses.” This is not a further description of his behavior but of the variables of which his behavior is a function; it is equivalent to “I have lost my glasses,” “I shall stop what I am doing when I find my glasses,” or “When I have done this in the past, I have found my glasses.” These translations may seem unnecessarily roundabout, but only because expressions involving goals and purposes are abbreviations.

Very often we attribute purpose to behavior as another way of describing its biological adaptability. This issue has already been discussed, but one point may be added. In both operant conditioning and the evolutionary selection of behavioral characteristics, consequences alter future probability. Reflexes and other innate patterns of behavior evolve because they increase the chances of survival of the species. Operants grow strong because they are followed by important consequences in the life of the individual. Both processes raise the question of purpose for the same reason, and in both the appeal to a final cause may be rejected in the same way. A spider does not possess the elaborate behavioral repertoire with which it constructs a web because that web will enable it to capture the food it needs to survive. It possesses this behavior because similar behavior on the part of spiders in the past has enabled them to capture the food they needed to survive. A series of events has been relevant to the behavior of web-making in its earlier evolutionary history. We are wrong in saying that we observe the “purpose” of the web when we observe similar events in the life of the individual.