THINKING AND REASONING
As a graduate student at Johns Hopkins, I coauthored several articles with former Harvard professor Stephen Kosslyn. He was perhaps the leading thinker on how people use mental imagery in their thought processes. For example, if you ask someone, “What shape are a German Shepherd’s ears?” most people will report that they conjure up an image of a German Shepherd from memory, zoom in on the dog’s head, and finally see that the ears are pointy.1
Observations like these led to a debate about whether people have something like pictures in their heads or whether they instead store a set of facts, and the analysis of those facts merely makes them feel as if they see pictures in their heads. This introspective observation spurred a high-profile, spirited debate in the academic community that included cognitive psychologists, philosophers, and computer scientists and spawned numerous journal articles. Steven Pinker, a Harvard professor and author of popular books about language and thought, was on our side (pictures in the head), and Geoffrey Hinton, who is now considered the father of deep learning, weighed in on the other side (the analysis of facts). There was never a winner of the debate, although both sides claimed victory. One thing that no one argues about is that human thought processes are complex.
Adults and children alike automatically apply commonsense reasoning to their knowledge of the world to make sense of even the simplest of utterances. For example, suppose you hear these sentences:
The toddler dashed into the street. The child’s father ran after him frantically.
Suppose I then ask this:
Why did the father run after the child?
or
Why was the father frantic?
To answer these questions using only the information provided in the sentences, you would need to reason based on your commonsense knowledge of the world. You know that cars drive on streets. You understand that a child who runs into the street risks getting hit by a car. You know that if a vehicle strikes a child, the child is likely to be seriously injured. You also understand that the greatest fear of parents is that something terrible will happen to their children.
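To make that chain of background knowledge concrete, here is a minimal sketch (the facts, rule names, and chaining procedure are illustrative inventions, not a description of any real commonsense system):

```python
# A toy forward-chaining illustration of the background knowledge a reader uses
# to answer "Why was the father frantic?" Every fact and rule name here is a
# hypothetical simplification, invented for this sketch.

facts = {"child_in_street"}                              # stated directly in the sentences
rules = [
    ({"child_in_street"}, "child_may_be_hit_by_car"),    # cars drive on streets
    ({"child_may_be_hit_by_car"}, "child_in_danger"),    # being hit causes serious injury
    ({"child_in_danger"}, "parent_is_frantic"),          # parents fear harm to their children
]

# Naive forward chaining: keep applying rules until no new facts appear.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("parent_is_frantic" in facts)   # True, but only because of the background rules
```

The answer never appears in the two sentences themselves; it emerges only when background facts like these are chained together.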
Commonsense reasoning is just one form of human thinking. Some other forms of thinking are planning, imagination, abstract reasoning, and causal reasoning. People plan for a wide variety of tasks every day. We create plans for mundane tasks like cooking breakfast and driving to work. We create plans for less mundane tasks like beating corporate competitors. People can use their imaginations to predict the future state of their environments. For example, we can imagine what will happen if we let the dog out when a cat is in the yard. People use abstract reasoning when they solve problems, put things in perspective, and empathize with their fellow human beings. We use causal reasoning to make sense of cause-and-effect relationships.
Researchers attempt to imbue AI systems with thinking and reasoning capabilities using two different strategies. The first is to create tests that require human-level thinking and reasoning and to build systems that can pass those tests. The second is to build thinking and reasoning capabilities directly into AI systems.
TESTS FOR THINKING AND REASONING
Researchers build thinking and reasoning tests for two reasons: First, they want to use these tests to identify AGI systems that can think and reason like people. Second, they hope that, by turning deep learning loose on these tests, deep learning systems will magically acquire world knowledge and reasoning skills.
THE TURING TEST
In 1950, Alan Turing proposed a test of human-level intelligence that has come to be known as the Turing test.2 It is illustrated in figure 13.1.
Figure 13.1 The Turing test
The test involves three entities located in three separate rooms: an interrogator, a person, and a computer. They communicate via teletype; the modern version, of course, would use computer chat. The interrogator has a conversation with the person and one with the computer. If the interrogator cannot tell the difference between the person and the computer, Turing reasoned, then the computer must have human-level intelligence.
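As a rough illustration of that protocol (nothing Turing specified, and with placeholder respondent functions standing in for the person and the computer), the setup can be sketched like this:

```python
# A hypothetical sketch of the imitation game: the interrogator poses the same
# questions to two hidden respondents and must guess which one is the machine.
import random

def person(question):
    return "Pointy, they stand straight up."   # stands in for a human's typed reply

def chatbot(question):
    return "Why do you ask about that?"        # canned, ELIZA-style reply

def imitation_game(questions):
    hidden = {"A": person, "B": chatbot}       # the labels hide who is who
    transcripts = {label: [(q, respond(q)) for q in questions]
                   for label, respond in hidden.items()}
    guess = random.choice(list(hidden))        # a real interrogator studies the transcripts
    return transcripts, guess == "B"           # True only if the machine is identified

transcripts, machine_caught = imitation_game(["What shape are a German Shepherd's ears?"])
```

If, over many such conversations, interrogators can do no better than that random guess, the computer passes the test.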
All of today’s natural language processing systems rely on statistical analysis of word occurrences; none have commonsense knowledge or reasoning capabilities. Therefore, none of these systems should fool a judge who probes for general world knowledge and the ability to reason based on that knowledge.
However, ELIZA did fool many people in the mid-1960s. During the thirty-year run of the Loebner Prize, some of the entrants also managed to fool some of the judges. These successful deceptions were mostly due to clever strategies devised by developers. Each chatbot has a limited repertoire of recognized questions and canned responses to them. Developers create ingenious strategies for fooling the judges, such as dodging a question and then getting mad when the judge repeats it.
NATURAL LANGUAGE INFERENCE
Since the early 2000s, AI researchers have tried to define tests that avoid the pitfalls of the Turing test and that can only be passed using commonsense reasoning.
The Recognizing Textual Entailment (RTE) Challenge was a competition run annually from 2005 through 2011, the first four years as a stand-alone event in Europe and the last three years as a track in the Text Analysis Conference in the US. These challenges provided a training table in which each row was composed of two texts, and the system’s job was to learn to determine whether the first text entailed the second, that is, whether the second text can be inferred from the first. For example, the system would need to determine whether the following first passage entails the second.
Passage 1: Claims by a French newspaper that seven-time Tour de France winner Lance Armstrong had taken EPO were attacked as unsound and unethical by the director of the Canadian laboratory whose tests saw Olympic drug cheat Ben Johnson hit with a lifetime ban.
Passage 2: Lance Armstrong is a Tour de France winner.
For people, this is an easy task because we can reason that a “seven-time Tour de France winner” is also “a Tour de France winner.” Even this trivial reasoning is difficult for computers. Unfortunately, the example does not make a good test of human-level reasoning, because a computer can simply match the words using a rule that says, “If all the words in passage two are also in passage one, then answer ‘yes.’” More generally, the RTE Challenges were susceptible to AI systems that learned to use simple word-oriented strategies, ELIZA-like word patterns, and simple matching of entities. These simple strategies work so well that human-like reasoning is not required to get a correct response.3 Since then, researchers have made many attempts to build tests that truly require reasoning. Each test has either been debunked as being susceptible to simple word-oriented strategies or is a test on which AI systems perform poorly.4
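Here is what that word-matching shortcut looks like as code, as a minimal sketch (the tiny stopword list and crude tokenization are my own simplifications, not any actual RTE entry):

```python
# A sketch of the word-matching shortcut: answer "yes" when every content word
# in passage 2 also appears in passage 1. The stopword list and tokenization
# below are illustrative assumptions.

STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "by"}

def content_words(text):
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    return {t for t in tokens if t not in STOPWORDS}

def naive_entailment(passage1, passage2):
    return content_words(passage2) <= content_words(passage1)

p1 = ("Claims by a French newspaper that seven-time Tour de France winner "
      "Lance Armstrong had taken EPO were attacked as unsound and unethical by "
      "the director of the Canadian laboratory whose tests saw Olympic drug "
      "cheat Ben Johnson hit with a lifetime ban.")
p2 = "Lance Armstrong is a Tour de France winner."

print(naive_entailment(p1, p2))   # True, with no understanding of either sentence
```

No knowledge of cycling, doping, or newspapers is involved, yet the rule returns the right answer for this pair.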
BUILDING THINKING AND REASONING CAPABILITIES INTO AI SYSTEMS
Many researchers have tried to build thinking and reasoning capabilities directly into AI systems. These researchers have tried to build systems that plan, use their imagination, and perform both abstract and commonsense reasoning.
PLANNING
AI researchers discuss planning primarily in the context of reinforcement learning. Reinforcement learning systems that steer cars, play games, and control robots are said to plan.
There are two key traits that distinguish people (and hypothetical AGI systems) from today’s reinforcement learning systems: First, people can do many different planning tasks, whereas a reinforcement learning system can do only one task. Second, people can plan for tasks they have never encountered previously.
Reinforcement learning systems can only plan for a single specific task that is defined by an environment specification composed of states, actions, a reward function, and other factors. The system learns a policy function that performs that specific task. Unlike people, the system cannot use what it learned about doing the task to help it perform similar tasks in the future.
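To see how narrow such a specification is, consider a toy sketch (the corridor task, its states, and the policy below are invented purely for illustration):

```python
# A toy environment specification: states, actions, and a reward function for a
# four-cell corridor in which position 3 is the goal. Everything here is a
# hypothetical stand-in for a real reinforcement learning task.

states = [0, 1, 2, 3]
actions = ["left", "right"]

def step(state, action):
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

# The learned policy is just a mapping from this task's states to actions;
# it says nothing whatsoever about any other task.
policy = {0: "right", 1: "right", 2: "right", 3: "right"}

state, total_reward = 0, 0.0
while state != 3:
    state, reward = step(state, policy[state])
    total_reward += reward
# total_reward == 1.0: the policy solves this corridor and nothing else
```

A person who learns this corridor could immediately handle a longer one, a maze, or an entirely different errand; the policy table cannot.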
As Roger Schank pointed out,5 people can create an ad hoc plan for what to do when the teacher says you are getting a C grade, for when a police officer pulls you over for speeding, and for many other unexpected situations people encounter daily. Reinforcement learning systems cannot do any of these.
IMAGINATION
If someone is at a furniture store deciding which couch to buy, they are probably attempting to visualize what each sofa would look like in their home. People use their knowledge of the world to make predictions and plan their interactions with their environment. This imagination capability is essential in many situations, including driving a car, walking down a sidewalk, and playing basketball. You can think of imagination as using our models of the world to create hypothetical possibilities.
AI researchers have attempted to build imagination-based prediction into AI systems. In one instance, a team of Carnegie Mellon researchers6 developed a method of making visual predictions from images, such as an image of a car approaching an intersection. They started with a set of 183 YouTube car chase videos. The first step was to identify the objects in the images using a method for identifying objects and object parts developed by another group of Carnegie Mellon researchers.7 From the videos, the system then learned to predict the transitions of the objects from one frame to another. It used the transition probabilities to determine the constraints under which the different objects interact with one another, and this enabled the system to “imagine” possible future states of each object.
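Stripped of the vision components, the transition-learning idea can be sketched roughly as follows (the discretized object states, example sequences, and sampling scheme are my assumptions for illustration, not the researchers’ actual method):

```python
# A rough, hypothetical sketch: learn frame-to-frame transition counts over
# discretized object states, then "imagine" futures by sampling from them.
from collections import Counter, defaultdict
import random

# Pretend each detected object's state per frame has been discretized like this.
observed_sequences = [
    [("car", "approaching"), ("car", "entering"), ("car", "turning_left")],
    [("car", "approaching"), ("car", "entering"), ("car", "going_straight")],
]

# Count how often one state follows another across all training videos.
transitions = defaultdict(Counter)
for seq in observed_sequences:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def imagine(state, horizon=2):
    """Sample a plausible future trajectory from the learned transition counts."""
    trajectory = [state]
    for _ in range(horizon):
        options = transitions.get(trajectory[-1])
        if not options:
            break
        trajectory.append(random.choices(list(options), weights=options.values())[0])
    return trajectory

print(imagine(("car", "approaching")))
```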
Imagination systems like this one offer significant engineering benefits in areas such as self-driving cars and robotics. However, the “imagination” of these systems is nothing like human imagination for several reasons: First, these systems apply only to a single task in a single environment. In contrast, people use their imaginations for a wide variety of tasks, many of which they have not previously learned but that they can figure out how to perform. Second, the systems only learn limited information about a handful of objects. People can imagine trajectories for a vast number of objects. Finally, people have a great deal of world knowledge that they apply to their imaginations and predictions. Parents imagine what their children will be like when they grow up. This imagining is not merely identifying what kind of occupation their children will have, like doctor or lawyer. Instead, this type of imagination involves predictions about morality, spirituality, likes and dislikes, and much more. Human imagination drives scientific thought experiments, even in a dream state. German chemist Friedrich Kekulé dreamed of a snake eating its tail, woke up, and used that image as the basis for a hypothesis about the structure of the molecule benzene that turned out to be correct. Human imagination is nothing like what is labeled “imagination” in AI systems.
ABSTRACT REASONING
People learn to apply abstract reasoning at a young age. For example, children learn that, across categories of animals, the number of legs and the presence of a tail vary quite a lot, whereas within a category such as dogs, the number of legs is usually the same. Color, by contrast, varies quite a lot both between and within categories.8 Yet most humans have no trouble identifying a dog.
A group of Google DeepMind researchers9 demonstrated what appears to be abstract reasoning in neural networks. They developed a system that could perform abstract reasoning tasks like those found in Raven’s Progressive Matrices, a visual IQ test given to people; for example, they tested whether a machine could learn a concept like monotonic increase. The training table included series of lines of increasing color intensity, and they designed the network to encourage relation-level comparisons. They found that the network was able to apply the concept to objects it had not encountered during training, such as a set of lines of increasing color intensity but in a different color than the lines in the training set. However, it failed on more complex abstract reasoning tests. For example, it was unable to apply the concept to a new attribute, such as an increase in size rather than color.
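The relation itself is simple to state; the hard part for the network was discovering it from raw images and applying it to new panels. As a hand-written illustration of the concept (the attribute dictionaries below are invented; the DeepMind system saw pixels, not labeled attributes):

```python
# An illustrative check for the "monotonic increase" relation across panels.
# The panel encodings are hypothetical; they are not how the network sees the task.

panels_new_color = [{"color_intensity": 0.1}, {"color_intensity": 0.4}, {"color_intensity": 0.7}]
panels_new_attr  = [{"size": 1}, {"size": 2}, {"size": 3}]   # increase in size, not color

def monotonic_increase(panels, attribute):
    values = [p.get(attribute) for p in panels]
    return all(v is not None for v in values) and all(a < b for a, b in zip(values, values[1:]))

print(monotonic_increase(panels_new_color, "color_intensity"))  # True: same relation, unseen values
print(monotonic_increase(panels_new_attr, "color_intensity"))   # False: the rule never mentions size
```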
While the results are impressive, this is still just a narrow AI system that can only reason in a specific task context. It is not an AGI system that can reason abstractly like a human.
COMMONSENSE KNOWLEDGE AND REASONING
Another area that has seen considerable AI research is the acquisition of commonsense knowledge. Renowned developmental psychologist Jean Piaget showed that babies learn the concept of object permanence at around eight or nine months of age. If you show a nine-month-old baby a toy and then hide it under a blanket, the baby will search for it. This search indicates that the baby has developed the concept of an object in the real world that exists even though the baby cannot see it.10
Children learn commonsense knowledge about object properties such as gravity, friction, elasticity, inertia, support, containment, and magnetism. They also learn to apply commonsense reasoning to this knowledge. For example, children learn that if they drop a glass of milk, the glass will break, and the milk will spill. And they can predict that if they take a paper towel and rub it against the floor, the floor will become dry again while the towel gets wet. Researchers collectively term this knowledge intuitive physics.11 They have studied both children’s learning of intuitive physics and the development of AI systems for learning and using intuitive physics.
A team of Facebook researchers12 used a 3D game engine to create small towers of wooden blocks with a level of stability that would cause them either to fall over or to stay upright. They then trained a network to predict whether the blocks would fall and to predict the trajectories and end locations of the blocks. They found that the system was able not only to make these predictions for the block towers in the training set but also to apply that learning to towers with additional blocks, performing at a level comparable to humans. The researchers argued that they could use this general approach to develop systems that learn commonsense physical intuitions about the world.
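As a crude sketch of that training setup (with a center-of-mass heuristic standing in for the 3D game engine and a perceptron standing in for the deep network, both my own simplifications):

```python
# A toy stand-in for the block-tower experiment: simulate towers, label whether
# they fall, train a linear classifier, then test on taller towers.
import random

def simulate_tower(num_blocks):
    """Random horizontal offsets between stacked blocks; a large total offset topples the tower."""
    offsets = [random.uniform(-0.6, 0.6) for _ in range(num_blocks)]
    falls = abs(sum(offsets)) > 0.5            # crude stand-in physics, not a real engine
    return offsets, falls

def features(offsets):
    return [abs(sum(offsets)), max(abs(o) for o in offsets), 1.0]   # bias term last

def train_perceptron(samples, epochs=20):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for offsets, falls in samples:
            x, target = features(offsets), (1.0 if falls else -1.0)
            if sum(wi * xi for wi, xi in zip(w, x)) * target <= 0:
                w = [wi + target * xi for wi, xi in zip(w, x)]
    return w

w = train_perceptron([simulate_tower(3) for _ in range(500)])

# As in the reported result, evaluate on towers with more blocks than the training set.
test = [simulate_tower(5) for _ in range(200)]
correct = sum((sum(wi * xi for wi, xi in zip(w, features(o))) > 0) == falls for o, falls in test)
print(correct / len(test))
```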
Even if we eventually succeed in creating such an intuitive physics engine, it still does not represent a significant step toward the type of commonsense reasoning needed for AGI. Why? The physics engine just becomes a module that the AGI can call upon to take physical descriptions as input and predict trajectories and other physical events as output. However, just as a human must decide when to use a calculator, there still needs to be a reasoning capability that decides when to use the physics engine or any other module.
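The point can be made schematically (both functions below are hypothetical placeholders):

```python
# Even with a flawless physics module, something else must first recognize
# that a given question is about physics at all. Both functions are placeholders.

def physics_engine(scene_description):
    # Assume this module works perfectly: physical descriptions in, trajectories out.
    return f"predicted trajectories for: {scene_description}"

def answer(question, scene_description):
    # The unsolved part is this dispatch decision, not the module itself.
    if any(word in question.lower() for word in ("fall", "collide", "trajectory")):
        return physics_engine(scene_description)
    return "some other module is needed (and something must know which one)"

print(answer("Will the tower of blocks fall?", "three stacked wooden blocks"))
```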
Researchers have pursued thinking and reasoning capabilities in AI systems in two ways: First, they have tried to define tasks that can be accomplished only by true thinking and reasoning, as opposed to simple strategies like word matching. Unfortunately, each time a system starts to perform well, other researchers discover that the results can be attributed to simple strategies. Second, they have tried to build cognitive functions directly into AI systems. However, they have succeeded only in building systems that perform very narrowly defined tasks.
Although these researchers may label the resulting systems as having planning, imagination, abstract reasoning, and commonsense reasoning capabilities, the way people plan, imagine, and reason is far broader and more general than the way these narrow AI systems perform these cognitive functions.