8

The Digital World

image

Prototype of the Automatic Computing Engine (ACE) based on a design by Alan Turing.

Alan Turing invented digital computing and proposed the idea of artificial intelligence, which also provides bases for theories of how we perceive, think, and use language. Geoffrey Hinton has shown that learning can occur by changes of connections in networks of artificial neurons, which generalize and construct inner models from being offered many examples. The means by which Google now offers translations from one language to another is based not on rules of the kind Chomsky proposed, but on learning of this kind.

The Turing Machine

The first article by which Alan Turing became famous was published in 1936. In it he demonstrated that any problem that is computable by a human can be computed by a machine, for which he gave specifications. This machine has become known as the Turing Machine, and it is the basis of all computers. A pilot version of one of the earliest computers, the Automatic Computing Engine designed by Turing, is shown in the chapter opening photo.

Turing’s innovation is fundamental to modern life. If you have a smart phone, it’s a small computer, based on Turing’s idea. Computation became fundamental to cognitive science as well. The new cognitive science was a real revolution. Before Turing’s work, psychological theories involved metaphors or analogies. Memory, for instance, was said to include a trace like that which occurs in a photograph or a tape recorder, stored somewhere in the brain. Turing’s new idea, which has become fundamental to cognitive science, was to program computers actually to remember, to make decisions, and to think.1 Theories began to be produced that were not just about mental processes. They were based on processes that do what the human mind does. The principle was, and is, that both the mind and a computer make models of the world and, from such models, draw inferences and generate actions.

Although Alan Turing and Kenneth Craik were both at Cambridge at the same time, they seem not to have met. Their ideas were in many ways similar. Craik, too, had thought of the mental models he proposed as being like calculating machines. But these two young people were at colleges on opposite sides of the university.

Turing would cross to the other side of the university to attend courses given by Ludwig Wittgenstein, who had written in Tractatus (4.01): “The proposition is a model of the reality as we think it is.”2 In 1936 Turing went to Princeton University, in the United States, to complete his PhD. When World War II started in 1939, he returned to England and began work at the secret British Government Code and Cypher School at Bletchley Park, a forty-minute train journey from London’s Euston Station.

Computation at Bletchley Park

During World War II, Turing’s idea of mind-like models that could be programmed in computers was developed at Bletchley Park to crack the codes used by Nazi commanders to send messages to their military forces. The initial focus of Turing and his team was on a coding machine called Enigma, which was used by the Germans.3 It had typewriter keys and, above them, letters in the same typewriter layout, each with a light behind it that lit up to show the coded output letter when a key was pressed. The machine allowed an operator to type in a message, the “plain text,” in which each letter was transformed into a coded output letter, seen as the letter whose light came on.

One part of the Enigma mechanism worked by means of a plugboard that the operator would set up in a different configuration each day, using a coding book that all the operators possessed. It enabled the machine to be set up in some 150,000,000,000,000 ways. A second mechanism was based on three rotors, chosen by the operator out of a set of five. The rotors worked so that with each letter typed in, they first routed the electrical signal to produce a coded output letter and then rotated to a new setting, and hence a new code for the next letter. To choose three rotors from a set of five, in order, gives 60 possibilities. The number of settings of the three rotors, each with 26 positions, is 26 × 26 × 26 = 17,576. So, given these processes, the overall number of configurations was 150,000,000,000,000 × 60 × 17,576: a very large number. If a human being were to work through all these possibilities at the rate of one per second, it would take around five million million years to do so. The Nazis thought that Enigma codes were unbreakable.
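
The arithmetic can be checked with a short calculation, using the rounded plugboard figure cited above:

```python
from math import perm

# Rounded plugboard count cited in the text (~150 million million ways).
plugboard_settings = 150_000_000_000_000

# Choosing 3 rotors out of 5, where order matters: 5 * 4 * 3 = 60.
rotor_choices = perm(5, 3)

# Each of the three rotors has 26 positions: 26 ** 3 = 17,576.
rotor_positions = 26 ** 3

total = plugboard_settings * rotor_choices * rotor_positions
print(f"{total:.3g} configurations")   # about 1.58e+20

# At one setting per second, an exhaustive search by hand would take
# far longer than the age of the universe.
years = total / (60 * 60 * 24 * 365)
print(f"{years:.3g} years")            # about 5.02e+12
```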

Before the War, Polish code-breakers knew of the Enigma machine and had started work on decoding its messages. They knew that each coding operation was symmetrical: if an A were input and the coded output was R, then if an R were input its coded output was A. Because of this pairing, to decode a message, the receiving operator would type the coded text into a properly set up Enigma machine, and the original German plain-text message appeared in the sequence of the machine’s letters lighting up one at a time. The Polish code-breakers also knew that the instructions for how to set the rotors, which were different for each day, were sent over the airwaves as three-letter signals, transmitted twice.

With Gordon Welchman, a Cambridge mathematician, Alan Turing was able to build on the work of the Polish code-breakers, making inferences from the pairings of plain-text and coded letters of the initial rotor-setting messages and other regularities in the system. Based on a Polish prototype, called the Bomba, Turing and Welchman constructed an improved version, the Bombe, which was a model of the Enigma machine. It was designed to work backwards from coded letter sequences, picked up as Morse code messages over the radio, to the settings of the plugboard and rotors, and hence to the messages that had been sent.

The Bombe was a specialized computer, constructed to search its model of the Enigma machine. The task for Turing, Welchman, and their colleagues was to exploit regularities of the Enigma machine to reduce the amount of searching that had to be done, and to use the Bombe to work through possible settings much faster than a human could. The goal was to eliminate settings that produced contradictions, for instance settings that would not work because of German spelling rules, until the model could offer a small number of hypotheses about what the settings were, from which humans could infer the plain-text words in German. Search was then, and continues to be, central to computational models of mind.

It’s said that Turing’s work in decoding messages from Hitler and his commanders shortened the War by as much as two years, and hence saved millions of lives. By the end of the War, ten sizable electronic computers, called Colossus, were in use at Bletchley Park. But during the Cold War years, everything that had taken place there was kept secret. Only a tiny number of people understood the whole operation. One consequence was that the history of computation had huge holes, which have only gradually been filled.

The Imitation Game

The 1936 paper on the idea of the Turing Machine, which could calculate anything that a human being could calculate, was fundamental for the development of cognitive science. In 1950 Turing published another paper, which took the idea further and began the study of artificial intelligence. The paper was entitled “Computing machinery and intelligence.” In it Turing outlined what he called “the imitation game.” As described in the paper, two humans, let’s call them Amy and Beatrice, and a computer, let’s call it Chloe, interact with each other by means of teletypes; nowadays they would send text messages. Beatrice answers questions that Amy puts to her. Chloe is an imitation of a human mind that can think, in the sense that Craik outlined, by operating on models; it, too, answers Amy’s questions. Amy knows that one of the two, Beatrice or Chloe, is a human and the other is a computer. From their answers she must work out which is which. Turing proposed that in the future a person in the role of Amy would not be able to tell the difference between a human being and a computer-based model of a human.

Here is part of a conversation that Turing imagined.4

QUESTION: Please write me a sonnet on the subject of the Forth Bridge.

ANSWER: Count me out on this one. I never could write poetry.

QUESTION: Add 34957 to 70764.

ANSWER: (Pause about 30 seconds and then give as answer) 105621.

QUESTION: Do you play chess?

ANSWER: Yes.

QUESTION: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?

ANSWER: (After a pause of 15 seconds) R-R8 mate.

What do you think? Are these answers being given by Beatrice the human or Chloe the computer? The answer to the first question is cute, but ambiguous. As to the second, we know that computers can add numbers. They are better at it than people, and they can do it quickly. If Chloe the computer were answering, she would have inserted the 30-second pause to imitate a human. What about the third question, about the chess problem? If Beatrice were answering, she would make a mental model of the chessboard with the positions of the three pieces on it. Chess players can do this; the better they are at chess, the easier they find this kind of mental modeling. Chess was among the earliest problems on which artificial intelligence programmers worked. In 1997, a computer program called Deep Blue beat the reigning chess champion Garry Kasparov in a six-game match.5 Deep Blue was a chess-playing program, with mental models of a chessboard, of chess pieces, and of chess rules, that ran on an IBM computer.

A 2014 film directed by Morten Tyldum, called The Imitation Game, is mainly about Alan Turing’s code-breaking at Bletchley Park. A clever idea in this film is that we human beings often speak to each other in codes. Turing was dedicated to truth, and wondered why we don’t just tell each other what we mean. In the film, he realizes that he has spoken in code to the mathematician Joan Clarke, who came to work at Bletchley Park. Turing and Clarke went together to the cinema a few times, and Turing proposed marriage. Clarke accepted. Turing felt affection for her and was pleased for the two of them to visit both her parents and his, but his model of himself was that he was homosexual. When he told her this, she was unfazed. How would you have decoded these messages? Turing later thought he should not marry, so the relationship had to be adjusted. He and Clarke remained close friends.

There are now many computer models that can give answers to a range of questions in a human-like way, and can even hold conversations. Turing’s idea, which he put forward with “the imitation game,” has become known as the Turing Test. So far no computer program has passed it over a range of questions, but many experts think that this will happen in time, perhaps a fairly short time.

There are deeper issues to consider. Computers and artificial intelligence brought a new principle into psychology. If you understand some mental process, really understand it, then you should be able to write a computer-based version of this process. Now, for instance, following some of the principles that Helmholtz laid out, as discussed in chapter 1, robots have been programmed to perceive an environment and its objects as they move around in it.

To do good research in cognitive science, the ability to write computer programs has become central. It works like this. To understand a psychological process—to perceive, to hold a conversation, to reason—one can put together parts that have been discovered by experimentation and other methods into computational models. Programming such working models can then give one lovely insights into what it takes for the mind to work as it does. Writing programs enables one to create understandings and new hypotheses about mental processes in a way that nothing else can. We can think of this principle as synthesis, and put it alongside the principle of analysis that derives from hypothesis and experimentation.

Artificial Intelligence

Among the most common computational models, or simulations, today are the kinds used in video games. If you play such a game, or watch a trailer for one, you will see a landscape through which you or your avatar can move. You may see castles and ravines. You may decide to go along this path, or through that gate. Birds of prey may hover above. Sword-wielding warriors may approach you. All this is made possible by understandings developed over the last fifty years of the relation between three-dimensional models and projections of these models onto two-dimensional surfaces such as the screens of your computer or television set. Understanding how to use these models has derived from developments in the cognitive understanding of perception.

The people who write the programs for video games start with three-dimensional models for which they can specify the coordinates of principal parts. Think of a map. A map reference is a coordinate in an east-west direction and a coordinate in a north-south direction. In three-dimensional space another coordinate is needed, too, in the up-down direction. The three coordinates define exactly where some part of an object is, for instance the tip of a sword. Then, with coordinates for other parts of the model object, one can know where the whole object is in three-dimensional space and how it is oriented. Then, by algebra, one can work out how images of the object would look when projected on a two-dimensional screen, from any viewpoint. And it can be calculated how the object will look when it moves or changes.
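
The projection step can be sketched in a few lines. This is a minimal perspective projection; the viewer distance and screen scale are illustrative parameters, not those of any particular game engine:

```python
def project(point, viewer_distance=5.0, screen_scale=100.0):
    """Map a 3D point (x, y, z) onto 2D screen coordinates.

    The viewpoint sits viewer_distance in front of the scene, so points
    with larger z (farther away) land proportionally closer to the center.
    """
    x, y, z = point
    depth = z + viewer_distance
    return (screen_scale * x / depth, screen_scale * y / depth)

# The tip of a sword at (1, 2, 5), as it would appear on the screen:
print(project((1.0, 2.0, 5.0)))   # → (10.0, 20.0)
```

Moving the object or the viewpoint just means changing the coordinates and recomputing: the same algebra, repeated for every visible point, many times a second.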

Video games and virtual reality involve solving puzzles and, often, interacting with adversaries. The games derive from Craik’s idea of models, and Turing’s ideas of how to program computational implementations of them. The computational processors that video-gamers use to play their games conduct billions of algebraic calculations to generate from their models the two-dimensional views of the simulated world that the gamers see as they move through it.

In 1952, having done more than almost anyone else to help win the war against the Nazis, and having designed computers at the National Physical Laboratory in Teddington, near London (a prototype of which is seen in the photo at the beginning of this chapter), and in Manchester, Turing was charged with homosexual practices. He was subjected to chemical castration by injection of female hormones. In 1954, at the age of forty-one, he died, perhaps by suicide. He was pardoned posthumously for an activity that few people now think should ever have been made illegal.

Turing’s principles of search over many possibilities in a model, and of narrowing the search space, became basic to artificial intelligence. It was by these methods that Deep Blue was able to think ahead through variations for each move and their possible replies to beat Garry Kasparov at chess. More recently the process of search, done in clever ways to narrow the search space, is what provides the oomph inside Google’s success on the Internet.

The New Cognitive Science

In 1985, Howard Gardner published The Mind’s New Science, a brilliant introduction to the cognitive approach to mind. In it he discusses research by Bartlett and Piaget. He mentions Craik, and reviews Johnson-Laird’s experiments on reasoning, based on Craik’s ideas. He writes, however, that “the logical-mathematical work that ultimately had the greatest import for cognitive science was carried out by … Alan Turing.”6

A proponent of the new cognitive approach was George Miller. Gardner tells us of Miller’s proposal that September 11, 1956, was the date of the real turning point,7 the day on which cognitive science really got started, when a meeting was held at the Massachusetts Institute of Technology (MIT) on information theory. Noam Chomsky presented a paper at this meeting, called “Three models of language,” in which he sketched his ideas on transformational grammars. Allen Newell and Herbert Simon presented the first proof of a theorem by a computer: the “Logic theory machine.” George Miller presented his idea that short-term memory has approximately seven slots in it, with each slot able to carry something as simple as a number or as complex as a concept. The idea would issue in a paper called “The magical number seven, plus or minus two.” Miller came away from the meeting with a “strong conviction … that human experimental psychology, theoretical linguistics, and computer simulation of cognitive processes were all pieces of a larger whole.”8

With Eugene Galanter and Karl Pribram, Miller published Plans and the Structure of Behavior. They weren’t quite bold enough to call it Plans and the Structure of Mind, but it was a step toward showing how the human mind is indeed based on structures, and on plans and other processes that can produce outcomes. A plan is made by starting from a goal in a mental model of the world as one would like it to be, then working backward through a series of states of the world to the current state. These states are stored, and for each one an action is conceived to accomplish it. To enact the plan, each step is taken in reverse order, from the current state to the goal. At last the mind was beginning to be seen as made up of operations that are able to construct outcomes that are not so much behavior, but that much more important entity, action.
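
A plan of this kind can be sketched as a small program. The states and actions here (making tea) are invented for illustration; the point is the backward chain from goal to current state, then enactment in reverse:

```python
# Each entry: an outcome state -> (the action that achieves it,
#                                  the state that must hold first).
steps_to = {
    "tea made":      ("pour water on leaves", "water boiled"),
    "water boiled":  ("boil the kettle",      "kettle filled"),
    "kettle filled": ("fill the kettle",      "start"),
}

def make_plan(goal, current="start"):
    # Work backward from the goal, storing the action needed at each state...
    chain = []
    state = goal
    while state != current:
        action, prerequisite = steps_to[state]
        chain.append(action)
        state = prerequisite
    # ...then enact the steps in reverse order, from the current state on.
    return list(reversed(chain))

print(make_plan("tea made"))
# → ['fill the kettle', 'boil the kettle', 'pour water on leaves']
```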

To start with, the only method in artificial intelligence was to write programs by hand, as sequences of instructions based mainly on “if-then”: if this is the case, then do that, or if the result of a computation is this, then the state of the model-world is that. It was a development of Piaget’s idea that in adulthood a person thinks by means of logical operations. Programs were written to analyze visual scenes, starting with visual images and making inferences about the structure of the three-dimensional world. A good deal of what we know about vision can now be performed computationally and this has been helpful in understanding how the human visual system works.9 Another informative application has been for language in which programs have been written to implement grammatical rules, including those of the kind Chomsky proposed.10
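
A hand-written program of this “if-then” kind might look as follows; the rules are a toy example, not a reconstruction of any historical system:

```python
# Each rule: if a condition holds in the model world, then do that action.
rules = [
    ("red",   "stop"),
    ("green", "go"),
]

def decide(world_state):
    for condition, action in rules:
        if world_state == condition:
            return action
    return "wait"   # default when no rule fires

print(decide("red"))    # → stop
print(decide("amber"))  # → wait
```

Everything such a program knows has to be written in by hand, one rule at a time, which is exactly the limitation that learning-based approaches later addressed.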

image

Figure 13. Simplified diagram of a system that learns by adjusting the strengths of connections among artificial neurons in layers. Source: Diagram by Keith Oatley.

A different approach has been pioneered by David Rumelhart, Geoffrey Hinton, and colleagues. According to this approach, operations are not based on logic and rules but on artificial neurons, which have activations. The simulated neurons are simpler than the brain’s neurons, but they embody an idea, which physiologists have established, that information is stored in the strengths of the connections among them. In this kind of system, illustrated in figure 13, there is a set of input neurons (equivalent to receptors), layers of intermediate, or hidden, neurons, and output neurons (equivalent to motor neurons) that implement actions.

Imagine that in a network of the kind pictured in figure 13, the pieces of information given to the input neurons (at the bottom of the diagram) are digits (ten of them rather than just the three represented in the figure), and that the network’s task is to classify each as odd or even. Imagine digits being offered to the system, one by one, as activations of one of the ten input neurons. Imagine then that activation of the output neuron at the top right of the diagram is taken to mean that the input digit is “even,” while activation of the output neuron at the top left means it’s “odd.” When each example digit is presented—say 2, 4, 6—the programmers have arranged that a signal is sent backward (a process called back-propagation) from the right-hand output neuron to change the strengths of the connections in the network in the hidden layers. Connection strengths that tend to activate the “even” output neuron are increased, and connection strengths of those that tend to activate the “odd” output neuron are decreased. Many iterations of back-propagation are applied for each input digit, with each one reducing errors in the system’s connections by small amounts. The result is that, when any digit is offered to it, the system will indicate whether it is even or odd. This is called supervised learning, because someone has had to specify whether each output is correct. The approach is like learning by reinforcement, of the kind the behaviorists proposed.
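
A toy version of such a network can be written out in full. The sketch below (one hidden layer of sigmoid units, trained by back-propagation on the ten digits coded as one-hot inputs) is a minimal illustration, far simpler than any practical system; the layer sizes, learning rate, and number of passes are arbitrary choices:

```python
import math
import random

random.seed(0)
N_IN, N_HID, N_OUT = 10, 8, 2   # one-hot digit in; "odd" and "even" units out
LR = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weight matrices: each row holds one unit's input weights plus a bias weight.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]

def forward(x):
    xb = x + [1.0]   # append a constant input for the bias
    h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w1]
    hb = h + [1.0]
    o = [sigmoid(sum(w * v for w, v in zip(ws, hb))) for ws in w2]
    return h, o

def train_step(x, target):
    h, o = forward(x)
    # Error signal at the outputs, then propagated backward to the hidden layer.
    d_out = [(t - a) * a * (1 - a) for t, a in zip(target, o)]
    d_hid = [sum(d_out[k] * w2[k][j] for k in range(N_OUT)) * h[j] * (1 - h[j])
             for j in range(N_HID)]
    hb, xb = h + [1.0], x + [1.0]
    for k in range(N_OUT):              # strengthen or weaken each connection
        for j in range(N_HID + 1):
            w2[k][j] += LR * d_out[k] * hb[j]
    for j in range(N_HID):
        for i in range(N_IN + 1):
            w1[j][i] += LR * d_hid[j] * xb[i]

def one_hot(d):
    return [1.0 if i == d else 0.0 for i in range(10)]

# Supervised learning: for every example the correct answer is specified.
for _ in range(2000):
    for d in range(10):
        train_step(one_hot(d), [float(d % 2 == 1), float(d % 2 == 0)])

def classify(d):
    _, o = forward(one_hot(d))
    return "odd" if o[0] > o[1] else "even"

print([classify(d) for d in range(10)])   # each digit now classified correctly
```

Each pass through the examples changes the connection strengths by small amounts; only after many passes does the network settle into weights that classify every digit.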

Deep Learning

Far more important, now, is unsupervised learning, in which there are no signals to say whether an output has been right or wrong, no reinforcements. Instead, huge networks can be offered millions of digitized pictures as inputs, and from these inputs the systems learn regularities in the visual world, which become embodied in the strengths of the systems’ connections. The systems work not by logical operations, but by forming distributed mental models of these regularities. This unsupervised learning occurs by a process related to the making of associations, as proposed by David Hartley in the eighteenth century. Generalizations are based on associations between things that are close in time and place. Two hundred years after Hartley, the idea was augmented by a proposal from Donald Hebb, that connections among neurons are strengthened when neurons fire together.
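
Hebb’s proposal can be written as a one-line learning rule; the learning rate here is an arbitrary illustrative value:

```python
ETA = 0.1   # learning rate (illustrative)

def hebb_update(w, pre, post):
    """Strengthen a connection when the two neurons it joins fire together."""
    return w + ETA * pre * post

w_together, w_apart = 0.0, 0.0
for _ in range(10):
    w_together = hebb_update(w_together, pre=1.0, post=1.0)  # fire together
    w_apart = hebb_update(w_apart, pre=1.0, post=0.0)        # never co-fire

print(round(w_together, 2), w_apart)   # → 1.0 0.0
```

Things that repeatedly occur together end up strongly connected; things that do not, stay unconnected. That is the associative core on which unsupervised learning builds.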

Geoffrey Hinton calls the new unsupervised mode—the making of associations by changing connections in a neural net—“deep learning.”11 For one system of this kind, designed by Hinton and his colleagues, input is given to the input-receptor layer from a digital video camera directed at a real scene. If the system moves the camera, then in the same way as occurs when the human eyes move, the system knows how much change to expect in the input image. From such movements, changes of pattern on the input-receptor layer allow the system to calculate the changes of the three-dimensional coordinates of the parts of the objects that gave rise to the input pattern: perhaps corners of objects, perhaps pieces of bright reflection. In this way the system can build three-dimensional models of objects in the world.

Hinton went to work for Google and, using one of his deep learning systems of visual perception, Google will show you pictures, from the Internet, of something you request verbally. You can try it yourself. Go to Google and type “Houses.” Now select “Images.” You will see a range of images of houses. If you type “Jewelry,” you will see pictures of jewelry. Houses are not too difficult. Features that are regularly associated with them include straight-line edges, rectangular windows, and so on. But jewelry? Such images might have features that are bright; but light-bulbs are bright. The jewelry might be on people’s necks, or ears, or laid out in rows. The system is able to generalize from millions of examples to show you a wide range of images of jewelry, in the different ways in which it might appear.

To see is to pick up information from the two-dimensional patches of activation of receptors of the retinae, in patterns that act as cues and prompt us to construct three-dimensional models of the world. In this kind of way, Hinton’s system is the modern version of unconscious inference as explained by Helmholtz, on which we reflected in chapter 1.

An important accomplishment of deep learning has been in translation of text from one language to another, say English to French. This kind of translation used to be done computationally by making lists of phrases and, by hand-coding, matching each English phrase to its equivalent in French. In the kind of unsupervised learning networks invented by Hinton and his colleagues, sentences have been input from Wikipedia, with a total of half a billion words. The system finds what other words each input word is associated with in sentences, and makes generalizations, that is to say mental models, which are meaningful deep structures. So, when Google Translate is now given a sentence in English, it creates something like a thought, distributed among the network’s connections: something like a meaning of what the writer might have intended.12 To do the translation, the model is run in reverse, and offers an output in French based on this inner meaning.

In figure 14, you can see a map of meaning derived from the kind of system that is used by Google Translate: the closeness of associations of 2,500 English words from the input from Wikipedia, derived from vector models, and represented as spatial closeness on the map. In the figure, the cluster near the top and just to the right of center is of place names. At the top of it is Virginia, close to Missouri, and to Washington. At the bottom of this cluster, with the two words very close in meaning-space to each other, are Vietnam and Iraq.

On Google, now, you can find examples of something in which you are interested by typing in words and phrases. Using a meaning-based system, in the future you will be able to ask Google to find passages based on meaning: as Hinton suggests, you might want to find passages that seem to support efforts to slow down climate change but which really try to undermine the evidence that climate change is occurring.

image

Figure 14. Joseph Turian’s map of meaning derived from input of half a billion words from Wikipedia, and their associations with each other in the sentences that were input. Source: Joseph Turian from Collobert, R. & Weston, J. (2008, July 5–8). A unified architecture for natural language processing: Deep neural networks with multitask learning. Paper presented at the ICML ‘08 Proceedings of the 25th international conference on machine learning, Helsinki. © Joseph Turian, with permission.

In another application, Ethan Fast and colleagues have analyzed more than a billion words from novels, to see what human beings do with objects. Their system, Augur, trains models that predict human activities from their contexts, for instance, eating food with a friend, attending a meeting, taking a selfie. In human evaluations, Augur’s predictions were rated as 94 percent sensible.

Among controversies prompted by artificial intelligence is the question of whether the operations of computers to perceive, to translate languages, and to play games are helpful in understanding how humans do such things. Current conceptions support the idea that perception occurs in the way that Helmholtz proposed, and that the properties of neurons, as discovered in neurophysiological experiments, are important. Other views are that the ways in which computers do things need have no relation to how the human mind does them.

In the world beyond cognitive science, big controversies arise about whether artificial intelligence will take away people’s jobs and cause unemployment. In an interview with the BBC, the physicist Stephen Hawking said:

Humans, who are limited by slow biological evolution, couldn’t compete, and would be superseded … The development of full artificial intelligence could spell the end of the human race.13

When computers pass the Turing Test and become difficult to distinguish from humans in their mental abilities, will they, as Hawking suggests, go further and surpass us as human beings? Hinton’s answer is that this will not be a matter of science and technology. It will depend on political arrangements.

We might decide how to use computationally based abilities of a human-like kind for purposes that Francis Bacon proposed, for the betterment of the human state.14 On the other hand, we might decide that the new artificial beings are too threatening to us and do away with them. To make decisions of this kind we must do research not just on how to make machines that work in mind-like ways, but also on social cognition, to understand how beneficial arrangements can be created in society, and be chosen by people as worthwhile. The issue has begun to be aired in Isaac Asimov’s short-story collection I, Robot, and in films like Alex Garland’s Ex Machina. Might new beings, based on silicon rather than carbon, have feelings and rights?

During human development, technology has helped. Gutenberg’s introduction of the technology of printing to Europe led to widespread education, which in turn enabled people to read. Among the books people read were novels, with a surge that started in the nineteenth century. In the twentieth century, Virginia Woolf wrote, “In or about December 1910, human character changed.” In part she was referring to the inwardness of novels, in their exploration of character. In an article in the New York Review of Books, Edward Mendelson wrote that Woolf was “a hundred years premature.” He says that “human character changed on or about December 2010 when everyone, it seemed, started carrying a smart-phone.”15 He went on to say that human life is no longer mainly private, relational, inward, and that more of it has become public, broadcast, outward. In civil life, new methods include surveillance and monitoring. In international life, drones are employed along with other devices that have long-distance effects on others.

Might there be new ways to think about how to take part in our new digital world? Might we, for instance, use some of the new digital connectedness among us to join in with more engagement and consequence, to help make political decisions about how we should live?16

Deep Principles for Psychology

The preceding implications of digital technologies are timely and important, but there are also other fundamental principles for psychology to reflect upon.

Psychological theories employed in artificial intelligence are not just analogies, such as that memory is like a tape recorder; the models that are produced can actually work in human-like ways. The success of deep learning, based on neurons and their connections, has demonstrated that learning can take place in the kind of way that it does for babies: by encountering many examples, with generalizations made by associating things that are close in time and place. Ideas such as those of Jean Piaget and Noam Chomsky were that thinking occurs by an inner machinery of logic or of rules. It seems now, however, that logic and rules are not themselves the bases of thinking and conversation. Rather than being inner mechanisms, they are summaries of outcomes.

The best current hypothesis is that mental models are first constructed as inscrutable intuitions based on associations in distributed neural networks, and that these intuitions can then be offered into consciousness or externalized to others (and ourselves) as verbal and non-verbal languages of our conversations and interactions.