5   The Birth of Artificial Intelligence

Humans, when engaged in problem solving in the kinds of tasks we have considered, are representable as information processing systems.

—Allen Newell and Herbert A. Simon80

Newell and Simon’s view—that the brain is an information-processing system and can therefore be studied scientifically and even reproduced in machine form—seemed outrageously radical in 1972. They were among the pioneers who laid the groundwork for the first developments in artificial intelligence (AI).

Your computer is a box connected to your printer, screen, keyboard, and mouse. If you opened it, you would find devices (hardware) for storing information—data—and retrieving it to use in problem-solving programs (algorithms), along with CPUs for calculations. This is the computer’s functional architecture for processing information.

In the early 1970s, psychologists like Newell and Simon, who were interested in AI, expanded the concept of a computer’s functional architecture to reflect the way the brain is put together—that is, the brain’s cognitive functional architecture. Like a computer, the brain has storage for information or facts—memory—and ways of retrieving information and working on it, using rules for solving problems—algorithms. And also like a computer, the brain is an information-processing system and can be studied using computer science. In other words, the brain is like a computer and a computer is like the brain.

It all began more than a decade earlier, back in the 1950s, when psychologists working in the emerging field of cognitive science began to apply rigorous scientific methods to the study of thinking. Newell and Simon were among the chief contributors. Their method was to ask people to solve problems and explain their procedures step-by-step, the aim being to formulate a general theory of problem solving.

In 1956, a group of scientists and mathematicians interested in whether it might be possible to simulate human intelligence in machines gathered for an informal conference at Dartmouth College in New Hampshire. One of the organizers, John McCarthy, coined the term artificial intelligence, or AI. Newell and Simon discussed their work and presented their program, the Logic Theorist. It was the first program deliberately created to mimic the problem-solving skills of a human being and the first true AI program.

In 1976, mathematicians Kenneth Appel and Wolfgang Haken were among the first to apply computers to mathematics. They used a computer to prove the four-color theorem, the long-standing conjecture that no more than four colors are needed to color the regions of a map in such a way that no two adjacent regions have the same color. Many mathematicians objected to their use of computers and derided their proof, saying it lacked the generality of mathematical proofs using equations full of x’s and y’s standing for every number and every conceivable map. They insisted that computers could never replace human beings in the “queen of the sciences,” mathematics. Nevertheless, over the years, Appel and Haken’s proof was checked and rechecked and its validity firmly established.81

Meanwhile, Simon was developing algorithms that he claimed could make scientific discoveries in the same way that human scientists do. He claimed that “creativity involves nothing more than normal problem-solving processes.”82 His assumption was that there were no differences in anyone’s thought processes. It was just that certain people, like Bach, Einstein, Poincaré, and Picasso, had better heuristics—better problem-solving methods. This assumption was essential for him to write discovery software—computer programs that can make discoveries—because these programs were based on the problem-solving strategies (heuristics) of people of ordinary intelligence.

In 1987, Simon and his coworkers published a book called Scientific Discovery: Computational Explorations of the Creative Process, in which they gave detailed descriptions of their software. They used an information-processing language that was symbolic rather than numerical, based on people’s descriptions of how they solved a problem. They then took a new problem and compared how the computer and the person solved it, going into greater and greater detail. The question was whether a computer program with certain selective problem-solving capabilities could come up with a solution to a specific problem. “If an affirmative answer can be given,” they wrote, “then we can claim to have driven the mystery out of these kinds of scientific creativity.”83 A big claim indeed! But did they succeed?

Using the same data that seventeenth-century German astronomer Johannes Kepler had worked with, Simon’s purpose-built program, BACON, was able to generate one of Kepler’s laws. It was purpose built in that it was programmed expressly to study quantities that could be expressed as ratios of one another. The BACON program runs through millions of possibilities until it finds a ratio of terms that produces a number that is the same for the entire set of data. In this way, the program arrived at Kepler’s third law, which states (roughly) that the square of the time a planet takes to orbit the sun, divided by the cube of its average distance from the sun, is the same for every planet—that is, that there is a fixed and unchanging relationship between the distance of a planet from the sun and the length of its year.
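To make that kind of search concrete, here is a minimal sketch in Python of a BACON-style ratio hunt. It is not Simon’s actual code; the planetary figures are modern approximations rather than Kepler’s own tables, and the function name and tolerance are illustrative choices.

```python
# Orbital period (Earth years) and mean distance from the sun (astronomical units)
# for the six planets Kepler knew. Values are modern approximations.
PLANETS = {
    "Mercury": (0.241, 0.387),
    "Venus":   (0.615, 0.723),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.881, 1.524),
    "Jupiter": (11.86, 5.203),
    "Saturn":  (29.46, 9.539),
}

def find_constant_ratio(data, max_power=3, tolerance=0.01):
    """Try small integer powers until T**a / r**b is (nearly) the same for every planet."""
    for a in range(1, max_power + 1):
        for b in range(1, max_power + 1):
            ratios = [T**a / r**b for T, r in data.values()]
            mean = sum(ratios) / len(ratios)
            if all(abs(x - mean) / mean < tolerance for x in ratios):
                return a, b, mean
    return None

a, b, mean = find_constant_ratio(PLANETS)
print(f"T^{a} / r^{b} is constant across the data (about {mean:.3f})")  # finds a=2, b=3
```

Run on these numbers, the search settles on the square of the period over the cube of the distance, which is Kepler’s third law in ratio form.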

This would be interesting as a simulation. But Simon claimed it was the real thing, that the computer was thinking precisely as Kepler had thought. The real issue, though, was not how Kepler had thought but why he had chosen to work on this particular problem. The answer is that Kepler had discovered the problem himself. It was a brand-new problem, an example of problem discovery. Moreover, Simon had not taken into account Kepler’s belief in and use of astrology and mysticism; added to which, the scientific discovery that Simon was looking for was already built into the software.

In his defense, Simon claimed his program was capable of discovering laws other than Kepler’s84—but they were all laws based on the ratios of certain terms, like Kepler’s.

Simon never agreed that his discovery software was merely a number cruncher. Nevertheless, taken as a simulation of creative thinking, and set alongside Appel and Haken’s success in proving the four-color theorem, it seemed to indicate that the computer functioned in the same way as the brain, and therefore that the human brain was an information-processing system, just like a computer, one that could be explored using the tools of AI.

The First Inklings of Computer Creativity

The first inkling anyone had that computers might be more than giant calculators or glorified typewriters came in the 1960s. It occurred in 1965, to be precise, pretty much simultaneously on two continents. In Germany, artists Frieder Nake and Georg Nees, inspired by the philosopher Max Bense’s suggestion of using computers to pin down a scientific notion of aesthetics, produced geometric patterns of lines and curves, drawn by a plotter, a pen-wielding device that traced out the computer’s output. It was the first computer art—intriguing if modest.

At the same time, at Bell Labs in the United States, the massive IBM 7094, which occupied an entire room there, was being used to produce numerical solutions to complex equations. The output numbers were transmitted to a plotter, which laid them out on a graph. One day the plotter malfunctioned and drew a collection of random lines. The user ran down the halls shouting that the computer had “produced art.” A. Michael Noll, a scientist there, dubbed it “computer art.” It had been produced by accident, but Noll set about creating it deliberately. The Library of Congress balked at copyrighting one of his creations because it had been made by a computer, which the library saw as a mere number-cruncher. Noll replied that the programs were written by a human being, and the bureaucrats finally relented. It was akin to the criticisms later leveled against Appel and Haken for using a computer to prove the four-color theorem. How could a mere machine ever be as creative as a human being?85

But all this work was programmed. What computer scientists really wanted was a way to turn the computer loose and let it create. Artist Harold Cohen’s computer program AARON was an early effort. It began in 1968 and ran for many years. AARON randomly assembled arms, legs, shapes, and colors and produced attractive images—but these were essentially pastiches. It produced only what Cohen programmed it to produce.

Computer music began to take off in the 1970s. One of the pioneers was George Lewis, an American trombonist and jazz improviser par excellence. As part of the sonic art scene in New York City, he and his friends wondered what could be done to link computers and music. They used very basic computers, microcomputers—in particular, the KIM-1 (Keyboard Input Monitor), with a tiny memory of one thousand bytes. These would serve as models for the Commodore 64—the bestselling home computer of all time. Lewis and his colleagues hooked several KIMs together, each with its own program to produce its own music. “It sounded to me like our own group improvising. People were creating amazing programs” for these simple devices, he tells me.86 Lewis released his first recording, The KIM and I, in 1979; in it, he improvised along with the KIM’s music. This primitive setup evolved into what became, in 1987, his famous “Voyager” system, in which human instrumentalists interact with an improvising orchestra generated by Lewis’s software and a trio of Yamaha DX-7 synthesizers.87 The digital orchestra created the sounds of instruments from all over the world, combining symphonic strings with instruments from Africa, the Americas, Asia, and the Middle East. The software chose groupings of instruments to create a rich tapestry of sound.

For Lewis, “computer programming was a theoretical tool for learning and for exploring improvisation as a practice and also as a way of life.”88 He notes, “My favourite non-human improviser is the Mars Rover. It plops down on the surface and goes to work with what it has. That’s all improv is.”89

Exciting work was also getting underway in linking the computer with performers. At the MIT Media Lab, composer Tod Machover was developing a computer-enhanced cello, which had its debut in 1991; meanwhile, over at the NYU Media Lab, Robert Rowe was composing instrumental music with computer and human musicians on stage together.90 The Greek-French composer Iannis Xenakis pioneered computer-assisted composition, using a computer to transform complex probabilistic mathematics into musical notation.

As in computer art, producing truly creative computer music required breakthroughs in algorithms—algorithms that, once launched, can take off in unexpected directions. Creative, or generative, algorithms began to emerge in the 1990s, for example in the work of computer scientist and artist Scott Draves, whose famous Electric Sheep is an open-source screen saver.

There were even attempts at software for psychoanalysis. The first, Joseph Weizenbaum’s ELIZA, appeared in the 1960s. It was called ELIZA because, like Eliza Doolittle of Pygmalion fame, it could be taught to speak increasingly well. It matched key words in what the patient typed and offered canned replies, in the manner of a nondirective therapist. If a patient said, “My mother hates me,” the computer might reply, “Who else hates you?”91
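A toy exchange in Python gives a flavor of how simple the mechanism was. The patterns and replies below are invented for illustration; they are not Weizenbaum’s original script.

```python
import re

# Keyword patterns and canned replies, checked in order. Purely illustrative.
RULES = [
    (r"hates? me", "Who else hates you?"),
    (r"\bmother\b", "Tell me more about your family."),
    (r"\bI am (.+)", "How long have you been {0}?"),
]

def reply(sentence: str) -> str:
    """Return the canned response for the first matching keyword pattern."""
    for pattern, response in RULES:
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:
            return response.format(*match.groups())
    return "Please go on."          # neutral prompt when nothing matches

print(reply("My mother hates me"))  # -> "Who else hates you?"
print(reply("I am unhappy"))        # -> "How long have you been unhappy?"
```

There is no understanding anywhere in the loop; the program simply reflects fragments of the patient’s words back at them.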

When ELIZA appeared in the 1960s, it caused a sensation because it seemed able to do something as complex as therapy. This shook Weizenbaum himself, who insisted that computers could not really delve into the unconscious. He pointed to ELIZA’s limited vocabulary, stressed that its replies were merely a parody of a therapist’s, and firmly rejected the idea that the unconscious could be treated as an information-processing system. He concluded that people who raved about ELIZA misunderstood it. His opposition to AI put him at odds with MIT, where he taught, a bastion of AI. As a humanist, he had effectively ended up in the wrong institution.

Computers That Mimic the Brain

Most personal computers have a handful of CPU cores that oversee the step-by-step solution to a problem. But how does the brain work in real life? Suppose you see someone in the distance whom you haven’t run across for years. You recognize not only their build and their face, perhaps even their smell, but also recall that they collect stamps from Ecuador. All this information seems to bubble up from different parts of your brain, much as the solution to a problem you’ve struggled with for days, and then seemingly forgotten about, suddenly emerges into consciousness. The brain can process many different sorts of stimuli simultaneously because it is a massively parallel system, like a computer with a huge number of CPUs working at once.

It seems that thinking occurs through the interplay of many paths, as in the model I proposed to understand unconscious thought.

By the 1940s, researchers had begun to muse over how to develop computer architectures that could mimic the brain’s neural network. Such architectures act like a collection of neurons, neurons being nerve cells, the building blocks or atoms of the brain. The human brain is made up of about one hundred billion neurons. Each grows fibers, or dendrites, which connect with those of other neurons at junction points called synapses. Depending on the level of chemical or electrical stimulation sparked by incoming information, such as a reaction to an event you perceive, the synapses are in an on or off state.

This process can be mimicked in a computer equipped with an artificial neural network made up of layers of artificial neurons, signal-detection devices that process information, functioning in a way loosely analogous to the nerve cells in the human brain. When information is put in, it stimulates an artificial neuron and causes it to start to work—to compute—as with nerve cells in the brain. Like our brains, an artificial neural network can be primed to seek patterns in data.

An artificial neural network is made up of three parts. The first takes in the information, the data to be processed. The second, made up of layers of simulated neurons, is where the data is processed. The third is where the data is recognized as some sort of pattern, such as a face or a steering decision for a driverless car. Early artificial neural networks had at most three layers, whereas deep neural networks, which were developed only recently, in the second decade of this century, have many more.
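As a rough illustration of this three-part structure, here is a minimal, untrained network in Python. The layer sizes, weights, and input values are arbitrary and purely for demonstration; a real network would adjust the weights by training on data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    """Squash a neuron's summed input into an on/off-like activation between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialized connection weights (untrained, for illustration only).
W_hidden = rng.normal(size=(4, 3))   # 4 inputs feed 3 hidden neurons
W_output = rng.normal(size=(3, 1))   # 3 hidden neurons feed 1 output

def forward(inputs):
    """Pass data through the network: input layer -> hidden layer -> output."""
    hidden = sigmoid(inputs @ W_hidden)   # second part: layers of simulated neurons
    output = sigmoid(hidden @ W_output)   # third part: a score for some pattern
    return output

x = np.array([0.2, 0.9, 0.1, 0.5])        # first part: the raw input data
print(forward(x))                          # a single number between 0 and 1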

In the 1970s and 1980s, this form of computer architecture, the artificial neural network, was usually called parallel distributed processing (PDP). Scientists at the time claimed that these setups computed faster and modeled human cognition more faithfully than the architectures that had come before.92

The machines that most of us use are programmed with logic. To solve a problem, they follow rigid either/or, yes/no routes as they test out alternatives. This is how Simon’s BACON program worked, trying out different ratios of distances and times, cubed or squared and so on, until it found the one that held for all the data. This is one type of machine learning, in which a machine learns by means of rules and symbols fed into it, such as equations or symbolic representations of the way people solve problems.

Artificial neural networks, by contrast, need not be extensively preprogrammed, unlike our laptops, and they don’t manipulate symbols. Instead they are based on another sort of machine learning, in which data is fed into the machine with no explicit instructions. In other words, the machine learns by itself. Henceforth this is what I mean by “machine learning.” In the early days, scientists trained these machines by feeding data such as variously angled lines into their single layer of neurons. The machine would then compare an input letter of the alphabet with these shapes and identify it as, for example, a T. To begin with, it was a triumph whenever an artificial neural network managed to recognize a letter of the alphabet.
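A toy example of this kind of learning, sketched in Python: a single-layer perceptron that learns to tell a T-shaped pixel pattern from an L-shaped one purely from examples. The patterns and the training loop are illustrative, not a reconstruction of any historical system.

```python
# 3x3 pixel grids for the letters T and L, flattened into lists of 0s and 1s.
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]
L = [1, 0, 0,
     1, 0, 0,
     1, 1, 1]

examples = [(T, 1), (L, 0)]          # label 1 means "this is a T", 0 means "not a T"
weights = [0.0] * 9
bias = 0.0

def predict(pixels):
    """Fire (output 1) if the weighted sum of the pixels crosses the threshold."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0

# Classic perceptron learning rule: nudge the weights whenever a prediction is wrong.
for _ in range(10):
    for pixels, label in examples:
        error = label - predict(pixels)
        if error != 0:
            weights = [w + error * p for w, p in zip(weights, pixels)]
            bias += error

print(predict(T), predict(L))        # expected output: 1 0
```

No rule about what a T looks like is ever written down; the distinction emerges from the examples alone, which is the sense of “machine learning” used from here on.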

In the 1990s, artificial neural networks were developed that could read the numbers on checks, which was of great use for banks—but that was as far as they went. These early machines had trouble finding patterns, an essential aspect of data analysis. And there wasn’t enough data available to train them—data such as that later provided by social media like Facebook—nor were the machines powerful enough.

Scientists were also trying to develop programs that could translate from one language to another, working on a word-for-word basis, by painstakingly inputting rules of grammar. But the problem seemed insurmountable.

AI seemed to overpromise and underdeliver, as well as being of little use in real-world situations such as reading documents, translating from foreign languages, and finding patterns in data. Funding dried up and people lost interest, leading to the so-called AI winter.93 All this would dramatically change in the twenty-first century with the advent of more powerful computers and algorithms, together with a cornucopia of data from archives, social media, the public web (government, regulatory, banks, health care services, stocks and bonds), and medical records, to name but a few. And so we hurtled into the age of big data.

Notes