Roots of Artificial Intelligence
THE most important years of my life as a scientist were 1955 and 1956, when the maze branched in a most unexpected way. During the preceding twenty years, my principal research had dealt with organizations and how the people who manage them make decisions. My empirical work had carried me into real-world organizations to observe them and occasionally to carry out experiments on them. My theorizing used ordinary language or the sorts of mathematics then commonly employed in economics. Although I was somewhat interdisciplinary in outlook, I still fit rather comfortably the label of political scientist or economist and was generally regarded as one or both of these.
All of this changed radically in the last months of 1955. While I did not immediately drop all of my concerns with administration and economics, the focus of my attention and efforts turned sharply to the psychology of human problem solving, specifically, to discovering the symbolic processes that people use in thinking. Henceforth, I studied these processes in the psychological laboratory and wrote my theories in the peculiar formal languages that are used to program computers. Soon I was transformed professionally into a cognitive psychologist and computer scientist, almost abandoning my earlier professional identity.
This sudden and permanent change came about because Al Newell, Cliff Shaw, and I caught a glimpse of a revolutionary use for the electronic computers that were just then making their first public appearance. We seized the opportunity we saw to use the computer as a general processor for symbols (hence for thoughts) rather than just a speedy engine for arithmetic. By the end of 1955 we had invented list-processing languages for programming computers and had used them to create the Logic Theorist, the first computer program that solved non-numerical problems by selective search. It is for these two achievements that we are commonly adjudged to be the parents of artificial intelligence.
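The Logic Theorist itself is not reproduced here. The sketch below is only a toy illustration, in modern Python, of what "selective search" over symbol structures means; the rewrite rules, the goal expression, and the similarity heuristic are all invented for the example and are not Newell, Shaw, and Simon's, but the strategy of expanding only the most promising candidate at each step is the point.

```python
# A toy illustration (not the Logic Theorist) of selective search over
# symbolic expressions: apply rewrite rules, but expand only the candidate
# that currently looks most similar to the goal.

import heapq

# Hypothetical rewrite rules, invented for this example.
RULES = [("A", "AB"), ("B", "BC"), ("AB", "C")]

def successors(expr):
    """All expressions obtainable by rewriting one occurrence of one rule."""
    for lhs, rhs in RULES:
        start = 0
        while True:
            i = expr.find(lhs, start)
            if i == -1:
                break
            yield expr[:i] + rhs + expr[i + len(lhs):]
            start = i + 1

def distance(expr, goal):
    """Crude similarity heuristic: length difference plus mismatched symbols."""
    return abs(len(expr) - len(goal)) + sum(a != b for a, b in zip(expr, goal))

def selective_search(start, goal, effort=1000):
    """Best-first search: pursue the most promising expression first."""
    frontier = [(distance(start, goal), start, [start])]
    seen = {start}
    while frontier and effort > 0:
        effort -= 1
        _, expr, path = heapq.heappop(frontier)
        if expr == goal:
            return path                       # sequence of rewrites found
        for nxt in successors(expr):
            if nxt not in seen and len(nxt) <= 2 * len(goal):
                seen.add(nxt)
                heapq.heappush(frontier, (distance(nxt, goal), nxt, path + [nxt]))
    return None

print(selective_search("A", "CBC"))           # prints one sequence of rewrites from 'A' to 'CBC'
```

The search is "selective" in that it never enumerates the whole space of expressions; it commits its limited effort to whichever partial result the heuristic judges closest to the goal.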
Put less technically, if more boastfully, we invented a computer program capable of thinking non-numerically, and thereby solved the venerable mind/body problem, explaining how a system composed of matter can have the properties of mind. With that, we opened the way to automating a wide range of tasks that had previously required human intelligence, and we provided a new method, computer simulation, for studying thought. We also acquired considerable notoriety, and attracted critics who knew in their hearts that machines could not think and wanted to warn the world against our pretensions.
In this and the following chapters I will give a blow-by-blow account of our research during 1955 and 1956, and place it in the intellectual atmosphere, the zeitgeist, within which it took place. I will present, of course, an egocentric view of that atmosphere, emphasizing how it influenced the ideas of our research group.
To understand how the Logic Theorist came about, we have to roam over several disciplines, including psychology, logic, and economics, and examine their views of the world just before our effort got under way. Those world views both shaped our own outlook and defined the constraints and assumptions we had to modify in order to go forward.
COGNITIVE PSYCHOLOGY BEFORE 1945
On the American side of the Atlantic Ocean, there was a great gap in research on human thinking from the time of William James almost down to World War II. American psychology was dominated by behaviorism, the stimulus-response connection (S → R), the nonsense syllable, and the rat. Cognitive processes—what went on between the ears after the stimulus was received and before the response was given—were hardly mentioned, and the word mind was reserved for philosophers, not to be uttered by respectable psychologists.
My footnotes in Administrative Behavior show that William James and Edward C. Tolman were my principal sources among American psychologists. Tolman stood farthest from the dominant behaviorists (excepting the immigrant Gestalt psychologists from Europe). In his principal book, Purposive Behavior in Animals and Men (1932), he treated humans (and rats) as goal-seeking, hence decision-making, organisms, whose behavior was molded by the environment. But although well respected, Tolman remained at the edge of mainstream American psychology.
In Europe, psychologists were less preoccupied with rigor and were willing to use data from verbal protocols, paying more attention to complex behavior. In Remembering (1932), the English psychologist Frederic C. Bartlett examined how information is represented “inside the head” and modified by the processes that store and retrieve it. The Würzburg school in Germany, and Otto Selz and his followers, had similar concerns for the processes of complex thought and the ways in which the information used in thought is organized and stored in the head.
This point of view, carried forward by the Gestaltists Max Wertheimer and Karl Duncker, was hardly known in the United States until after World War II. Similarly, before the war, Jean Piaget’s work on the development of thought in children was familiar to some American educational psychologists (and to me through Harold Guetzkow) but to hardly any American experimental psychologists.
Apart from Tolman, one other viewpoint in prewar American psychology departed from behaviorism: the “standard” viewpoint of physiological psychology, well expressed by Edwin G. Boring in the preface to The Physical Dimensions of Consciousness:
[T]he simple basic fact in psychology is a correlation of a dependent variable upon an independent one. Ernst Mach made this point and B. F. Skinner took it up about the time this book was being written. He created the concept of “empty organism” (my phrase, not his), a system of correlation between stimulus and response with nothing (no C.N.S., the “Conceptual Nervous System”—his phrase, not mine) in between. This book does not go along with Skinner . . . , but rather argues that these correlations are early steps in scientific discovery and need to be filled—for the inquiring mind dislikes action at a distance, discontinuities that remain “unexplained.” Thus my text undertook to assess the amount of neurological filling available in 1932—how much fact there was ready to relieve the psychophysiological vacuum. [Boring 1933, pp. vi–vii]
This last sentence of Boring separates his psychophysiological viewpoint (typified also by Karl Lashley) from both behaviorism and our own approach. He assumes (1) that the “empty organism” is to be filled with explanatory mechanisms, an assumption accepted by all psychologists except radical behaviorists like Skinner; and (2) that the explanatory mechanisms are to be neurological.
Al, Cliff, and I did not share this second assumption—not because of in-principle opposition to reductionism but because we believed that complex behavior can be reduced to neural processes only in successive steps, not in a single leap. Physics, chemistry, biochemistry, and molecular biology accept in principle that the most complex events can be reduced to the laws of quantum physics, but they carry out the reduction in stages, inserting four or five layers of theory between gross biological phenomena and the sub-microscopic events of elementary particles. Analogously for psychology, a theory at the level of symbols, located midway between complex thought processes and neurons, is essential.
In agreement with Boring and in contrast to our view, almost all American psychologists who were not behaviorists identified explanation in psychology with neurophysiology. This confounding continued into the postwar period with Donald Hebb’s influential The Organization of Behavior, and the confusion has today been inherited by those cognitive scientists who espouse parallel connectionist networks (“neural nets”) to model the human mind.
Since information processing theories of cognition represent a specific layer of explanation lying between behavior (above) and neurology (below), they resonate most strongly with theories that admit constructs of this kind. Would the forerunners of our own work, principally Selz, the Gestaltists and their allies, be pleased to be labeled “information-processing psychologists,” and would they accept our operationalizing their vague (in our eyes) concepts? With or without their consent, we acknowledge our debt to them.
THE INFLUENCE OF FORMAL LOGIC
To build a successful scientific theory, we must have a language that can express what we know. For a long time, cognitive psychology lacked a clear and operational language. Advances in formal logic brought about by Giuseppe Peano, Gottlob Frege, and Alfred North Whitehead and Bertrand Russell around the turn of the century provided it.
The relation of formal logic to psychology is often misunderstood. Both logicians and psychologists agree nowadays that logic is not to be confused with human thinking.* For the logician, inference has objective, formal standards of validity that can exist only in Plato’s heaven of ideas and not in human heads. For the psychologist, human thinking frequently is not rigorous or correct, does not follow the path of step-by-step deduction—in short, is not usually “logical.”
How, then, could formal logic help start psychology off in a new direction? By example, it demonstrated that manipulating symbols is as concrete as sawing pine boards in a carpentry shop; symbols can be copied, compared, rearranged, and chunked just as definitely as boards can be sawed, planed, measured, and glued. Symbols are the stuff of thought, but symbols are patterns of matter. The mind/body problem arises because of the apparent radical incongruity of “ideas”—the material of thought—with the tangible biological substances of the brain. Formal logic, treating symbols as material patterns (for example, patterns of ink on paper), showed that ideas, at least some ideas, can be represented by symbols, and that these symbols can be altered in meaningful ways by precisely defined processes.
Even a metaphorical use of the similarities between symbol manipulation and thinking liberated my concept of thinking. Influenced by Rudolf Carnap’s lectures at the University of Chicago and his books, and by my study of Whitehead and Russell’s Principia Mathematica, I very early used this metaphor explicitly as the framework for my thinking about administrative decision making: “Any rational decision may be viewed as a conclusion reached from certain premises. . . . The behavior of a rational person can be controlled, therefore, if the value and factual premises upon which he bases his decisions are specified for him” (Simon 1944, p. 19).
Exploiting this new idea in psychology required enlarging symbol manipulation to embrace much more than deductive logic. Symbols can be used for everyday thinking, for metaphorical thinking, even for “illogical” thinking. This crucial generalization began to emerge at about the time of World War II, though it took the appearance of the modern computer to perfect it.
Parallel to the growth of logic, economics, in close alliance with statistical decision theory, constructed new formal theories of “economic man’s” decision making. Although economic man was patently too rational to fit the human form, the concept nudged economics toward explicit concern with reasoning about action. But the economist’s concern only for reasoning that was logical, deductive, and correct somewhat delayed recognition of the common interests of economics and psychology.
In striving to handle symbols rigorously and objectively—as objects—logicians gradually became more explicit about their manipulation. When, in 1936, Alan Turing, an English logician, defined the processor now known as a Turing machine, he completed this drive toward formalization by showing how to manipulate symbols by machine. I did not become aware of Turing’s work until later, but I did glean some of the same ideas from Claude Shannon’s Master’s thesis (1938), which showed how to implement the logic of Boolean algebra with electrical switching circuits.
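Shannon's thesis is not excerpted here, but its central correspondence can be suggested in a few lines of modern code (a loose illustration in Python, not Shannon's own relay notation): switches wired in series behave like logical AND, switches in parallel like OR, and a normally closed contact like NOT.

```python
# An illustrative sketch (not Shannon's notation) of the idea that switching
# circuits realize Boolean algebra: series = AND, parallel = OR,
# normally closed contact = NOT.

from itertools import product

def series(a, b):        # current flows only if both switches are closed -> AND
    return a and b

def parallel(a, b):      # current flows if either switch is closed -> OR
    return a or b

def normally_closed(a):  # contact opens when the relay is energized -> NOT
    return not a

# Check that the circuit interpretation matches the Boolean truth tables.
for a, b in product([False, True], repeat=2):
    assert series(a, b) == (a and b)
    assert parallel(a, b) == (a or b)

# Example: an exclusive-or built entirely from these circuit elements.
def xor(a, b):
    return parallel(series(a, normally_closed(b)), series(normally_closed(a), b))

print([(a, b, xor(a, b)) for a, b in product([False, True], repeat=2)])
```

Because AND, OR, and NOT suffice to express any Boolean function, any such function, like the exclusive-or at the end of the sketch, can in principle be realized by composing these circuit elements.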
Finally, at this time the belief was also growing that mathematics could be used in biology, psychology, and sociology as it had been used in the physical sciences. Alfred Lotka’s Elements of Physical Biology (1924), embodying this view, foreshadowed some of the central concepts of cybernetics. My former teacher Nicholas Rashevsky was another pioneer in this sphere. Although there had been little use of mathematics in psychology before the war, a few psychologists (including Clark L. Hull) had begun to show strong interest in its potential.
THE POSTWAR SETTING FOR MACHINE INTELLIGENCE
The developments I have been tracing came to public notice at the end of World War II under the general rubric of cybernetics, a term that Norbert Wiener devised to embrace such elements as information theory, feedback systems (servomechanism theory, control theory), and the electronic computer (Wiener 1948). In other countries, particularly behind the Iron Curtain, the term cybernetics was used even more broadly to encompass, in addition, game theory, mathematical economics and statistical decision theory, and management science and operations research. Wiener presented, in the first chapter of Cybernetics, his version of the history of these developments. The computer as symbolic machine played a minor role in the early development of cybernetics; its main underpinnings were feedback and information theory, while the computer was simply “the biggest mechanism.”
World War II did not produce the cybernetic developments; more likely, in fact, it delayed them slightly. Their ties with formal logic had been evident earlier, as I mentioned, in Shannon’s Master’s thesis, as well as in a closely parallel paper by Walter Pitts and Warren McCulloch (1943) that provided a Boolean analysis of nerve networks, and in a paper by Arturo Rosenblueth, Norbert Wiener, and Julian Bigelow (1943) that provided a cybernetic account of behavior, purpose, and teleology. Most of the prominent figures in these developments had early in their careers been deeply immersed in modern symbolic logic. Wiener had been a student of Russell’s in Cambridge; much of von Neumann’s work in the 1920s and 1930s had been in logic; Shannon’s and Pitts and McCulloch’s use of logic has just been mentioned.
Work that too far anticipates its appropriate zeitgeist tends to be ignored, while work that fits the contemporary zeitgeist is recognized promptly. Von Neumann’s contributions to game theory in the 1920s were known to few persons before 1945; Lotka was read by a few biologists; a few logicians were aware of the rapid strides of logic and the esoteric discoveries of Kurt Gödel, Alan Turing, Alonzo Church, and Emil Post. All this changed in the early postwar years.
My own experiences in hearing lectures by Carnap, Rashevsky, and Schultz document this dramatic shift in the climate of ideas. Through these teachers, I learned of Lotka, of recent developments in statistical decision theory, of Gödel—but not immediately of Church or Turing. I found a few other teachers and fellow students who shared this vague sense of the zeitgeist. My dissertation reflected the intellectual climate of about 1940 to 1942.
The “invisible college” operated with some efficiency, then as now; news of the new contributions, published in widely scattered journals and books, spread rapidly. My attention was called to most of them either before publication or shortly thereafter. Similarly, the decision-making approach of my dissertation rapidly became known in economics and operations research.
Biology and the behavioral sciences did not long stay aloof from cybernetics. Its feedback notions soon were being used, particularly by the physiologically inclined, and most enthusiastically in Great Britain (Ashby 1952; Walter 1953). The computer-as-brain metaphor suggested itself almost immediately—followed almost as immediately by warnings against taking too literally the analogy between the neurological organization of the brain and the wiring of the computer (von Neumann 1958). Turing was one of the first to see the more fruitful analogy at a different level, the abstract level of symbol processing.
Feedback concepts had considerable, but relatively unspecific, impact on psychology, but the influence of the Shannon-Weaver information theory was clear and precise (Miller and Frick 1949). W. E. Hick proposed and tested a relation between response time and amount of information contained in the response, while others sought to measure the information capacity of the human sensory and motor channels. Limits on the applicability of information theory to psychology gradually became clear, and by the early 1960s the theory had become only a specialized tool for the analysis of variability.
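Hick's relation is not written out above; in the form in which it is commonly cited today (the Hick–Hyman law, stated here in later notation rather than quoted from Hick), it is approximately

RT ≈ a + b log₂(n + 1),

where n is the number of equally likely response alternatives and a and b are empirically fitted constants, so that mean response time grows linearly with the information, in bits, conveyed by the choice.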
In the early postwar years, the European work on thought processes was just beginning to reach the United States through translation and migration. The translations of Duncker’s On Problem Solving and Wertheimer’s Productive Thinking appeared in 1945; Humphrey, in Thinking (1951), provided the first extensive English-language discussion of Selz’s research and theories; Katona’s Organizing and Memorizing and a number of Maier’s papers on problem solving had appeared by 1940 but were little noticed until the end of the war.
Information theory, statistical decision theory, and game theory had roused new interest in concept formation, and had suggested new research methods and theoretical ideas. Carl Hovland’s “A ‘Communication Analysis’ of Concept Learning” was published in 1952, while in the critical year 1956 there appeared both George Miller’s “Magical Number Seven” paper, setting forth a theory of the capacity limits of short-term memory, and Bruner, Goodnow, and Austin’s A Study of Thinking, a book that brought to the study of concept formation notions of strategy borrowed from game theory.
The war had produced a vast increase of research in human skills and performance (“human factors” research). Because much of this work was concerned with the human members of complex man-machine systems—pilots, gunners, radar personnel—the researchers could observe the analogies between human information processing and the behaviors of servomechanisms and computers. The title of Donald Broadbent’s “A Mechanical Model for Human Attention and Immediate Memory” (1954) illustrates the main emphasis in this line of inquiry. The research in human factors and in concept formation provided a bridge, also, between psychology and the newly emerging field of computer science, and a major route through which ideas drawn from the latter field began to be legitimized in the former.
A long-standing concern with formalization in linguistics received a new impetus with Zellig Harris’s Methods in Structural Linguistics (1951). The work of Noam Chomsky, which was to redirect much of linguistics from a preoccupation with structure to a concern with processing (that is, with generative grammars), was just beginning (see Chomsky 1955). I do not know to what extent these developments in linguistics derived from the zeitgeist I have described. Some connections with logic are clear (for example, Chomsky 1956). But early efforts in the mechanical translation of languages lay outside the mainstream of linguistics (see Locke and Booth 1955 for the history).
DIGITAL COMPUTERS ENTER THE SCENE
Digital computers developed rapidly in the early postwar era. While logicians understood that they were universal machines (Turing machines), others viewed them primarily as arithmetic engines, working with numbers rather than with general symbols.
The use of mechanisms (robots, not computers) to illustrate psychological theories and enforce operationality had a long history, predating the computer. Boring (1946) surveys that history in his article “Mind and Mechanism.” And the developments in cybernetics produced a new pulse of mechanical robot building (Walter 1953; Ashby 1952), with which I had some contact at RAND beginning in 1952.
But all of these efforts were rather separate from simulation on the computer, which tended not toward activating mechanical beasts but toward programming game playing and other symbolic activities. The first checker program was coded in 1952 (Strachey), and other game-playing schemes are described in Bowden (1953). And Turing (1950), in a justly famous discussion, “Computing Machinery and Intelligence,” had put the problem of simulation in a highly sophisticated form, proposing the Turing Test to determine the ability of a computer to give answers to questions indistinguishable from human answers. The stage was set for the appearance of artificial intelligence.
Note
*An influential coterie of contemporary artificial intelligence researchers, including Nils Nilsson, John McCarthy, and others, believe that formal logic provides the appropriate language for A.I. programs, and that problem solving is a process of proving theorems. They are horribly wrong on both counts, but this is not the place to pursue my quarrel with them, beyond the comments in the next paragraphs.