End Notes

This book has presented my thoughts on AI and the mind. It was never intended to be a “scholarly” work on the subject, and the citations here are not going to change things much. Although I have tried to be clear about what I believe and why, I have not spent much time on opposing views, or on the many controversies surrounding AI. My hope is that the reader will use the references below as a starting point for a more balanced study of the issues raised by the topics in this book.

Chapter 1: What Kind of AI?

Although modern AI is only sixty years old, its history is full of twists and turns. The Pamela McCorduck book [80] is a good early history, but the one by Nils Nilsson [89] covers more territory, and can also be found online.

The field of AI is so heterogeneous that it might seem out of the question to have a single comprehensive textbook. And yet the book by Stuart Russell and Peter Norvig [105] is precisely that. It introduces all the major technical directions, including history and philosophical background. Its bibliography alone—thirty pages in a tiny font—is worth the price of admission. But the book is quite demanding technically, and so is most useful to upper-level university undergraduates. My own textbook [72] presents a much gentler introduction, and is perhaps better suited for a nontechnical audience—at least for the small part of AI that it covers.

For information on recent trends in AI technology, there is, as far as I know, no good book; things are simply happening too fast. A reader’s best bet is to look for articles in recent newspapers and magazines, a number of which can be found online, including information about Toyota’s investment, the OpenAI project, and other similar initiatives. For an introduction to the technological prospects of what I am calling AML, I suggest searching online for “unsupervised learning” or “deep learning” or even “AI technology.” (Another option is to search online for major players in the area, such as my colleague Geoffrey Hinton.) To learn about machine learning more broadly, an excellent (though advanced) textbook is [86]. Work on the recognition of cats in images can be found in [66].

The term GOFAI is due to the philosopher John Haugeland [54]. The classic 1958 paper by John McCarthy [79] can be found online. But it also appears in many collections, including [9], [75], [81], and [121], books that include a number of other influential papers.

The Turing Test was first presented in [119] and has been a subject of debate ever since. It is discussed in every AI textbook I know of, and in books such as [108]. The Chinese Room argument can be found in [106], including replies from a number of commentators. My own Summation Room counterargument appears in [70].

Chapter 2: The Big Puzzle

The human mind is such a fascinating subject that it is not surprising to find books on it from a very wide range of perspectives. Three nontechnical books that I have found especially thought-provoking are by Steven Pinker [94], Daniel Kahneman [63], and Daniel Dennett [31]. Other general books that are perhaps more idiosyncratic are by Jerry Fodor [42] (arguing mostly against Pinker), Marvin Minsky [83] (presenting a unique take on the structure of the mind), and a wonderfully engaging book by the reporter Malcolm Gladwell [49] (covering some of the same ground as Kahneman).

Brain research is also covered extensively in the popular press, of course. What we can expect to learn from brain imaging (like fMRI) is discussed in [12]. The plasticity of the brain is the subject of [33]. The distributed nature of neural representation (part of my argument about why it may prove so difficult to reverse-engineer a neuron) is described in [58] and [114]. The counterargument that at least some neurons may represent one single thing (a location, in the case of so-called “place cells”) is discussed in [91].

As to other accounts of human behavior, its basis in evolution is presented forcefully in [101], and the genetic aspects of evolution are discussed in [25]. The foundation of human behavior in language and symbols is argued in [26], and human language more generally is discussed in [93]. The topic of the evolution of language itself is presented in [64].

Finally, Daniel Dennett’s first presentation of his design stance idea can be found in [28].

Chapter 3: Knowledge and Behavior

The topic of knowledge has been a mainstay of philosophical study since the time of the ancient Greeks. Two collections of papers in this area are [50] and [104]. A more mathematical analysis of knowledge can be found in [57], work that has found further application in computer science [41].

Regarding the relation between knowledge and belief, the classical view (from Plato) is that knowledge is simply belief that is true and justified (that is, held for the right reasons). This view is disputed in a famous paper by Edmund Gettier [45]. As to the propositional attitudes more generally, the main focus in the philosophical literature is on how sentences that mention these attitudes do not appear to follow the usual rules of logic. See, for example, [98].

Philosophers and psychologists sometimes also distinguish implicit from explicit belief [32]. In the example given in the section “Intelligent behavior” in this chapter, the moment Henry discovers that his keys are not in his pocket, although he does not yet explicitly believe they are on the fridge, he implicitly believes it, in the sense that the world that he imagines is one where the keys are there. To put it another way, it is a consequence of what he believes that his keys are on the fridge, even if he does not yet realize it. Mathematical treatments of the two notions can be found in [68], [27], and [40].

The Frederic Bartlett quote on thinking is from [7]. An earlier influential piece by the same author is [6]. Zenon Pylyshyn discusses cognitive penetrability and much else in a wonderful book [97]. Daniel Dennett presents the intentional stance in [29]. (A shorter version with commentary can be found in [30].) The quote by Nicholas Humphrey is from [59]. Finally, Noam Chomsky’s competence/performance distinction appeared in [14].

Chapter 4: Making It and Faking It

One of the themes of this chapter is that being intelligent is not really the same thing as being able to fool someone into believing it (for example, in the Imitation Game). But from a purely evolutionary standpoint, the distinction is not so clear. A good case can be made that human intelligence, and language in particular, evolved for the purpose of winning mates in some sort of mental arms race, where pretense and deception get to play a very central role. The primary function of language would be for things like hearsay, gossip, bragging, posturing, and the settling of scores. The rest of what we do with language (on our better days, presumably) might be no more than a pleasant offshoot. See, for example, [35] and [101] for more along these lines.

The ELIZA program by Joseph Weizenbaum is described in [122]. The Oliver Miller interview is drawn from an online post at http://thoughtcatalog.com/?s=eliza. Weizenbaum became disheartened by how his work was interpreted by some as a possible jumping-off point for serious psychological use (for example, in work like [19]), which he discusses in his book [123]. The Loebner competition is described in [16] by Brian Christian, who ended up playing the role of the human in one competition. The EUGENE GOOSTMAN program is described in a number of online news articles, for example, New Scientist (25 June 2012) and Wired (9 June 2014). The Scott Aaronson interview is drawn from a blog post at http://www.scottaaronson.com/blog/?p=1858.

My crocodile and baseball examples were first presented in [69] and used again in [73]. The closed-world assumption is explained in [20] and [100].

I first presented Winograd schemas in [71], and in more detail in [74]. The first example schema (which is due to Terry Winograd) appeared in [125]. A collection of more than one hundred schema questions from various sources can be found online at https://www.cs.nyu.edu/davise/papers/WS.html. An actual competition based on Winograd schemas was announced by Nuance Communications in July 2014, and is scheduled to take place in July 2016. (For more information, including rules, see http://commonsensereasoning.org/winograd.html.) An alternative test along similar lines is the textual entailment challenge [22].

Chapter 5: Learning with and without Experience

The fact that much of what we come to know is learned through direct experience has given some philosophers pause regarding whether an intelligence without some sort of body is even possible. For example, how could such an intelligence truly understand a word like “hungry” if it cannot connect the word to actual sensations? This is an instance of what is sometimes called the symbol grounding problem [53]. How can we understand what a word means if all we have are words that refer to other words that refer to other words? (This “juggling words” issue will come up again in the context of Helen Keller in the next chapter.) Of course the answer suggested by Alan Turing is that we should not even try to answer questions like these. The term “understand” is just too vague and contentious. We should instead ask if it is possible to get an AI program to behave the way people do regarding the words in question. Just what sort of inapt behavior do we expect the symbol grounding problem to lead to?

How we manage to learn a language at all is still quite mysterious. Specifically, how do we explain how children are able to master the complexities of grammar given the relatively limited data available to them? The issue was called the poverty of the stimulus by Noam Chomsky [15]. His (controversial) proposal is that children are born with something called universal grammar allowing them to quickly pick up the actual grammar of the language they first hear.

Regarding the learning of behavior, it is interesting that critics of GOFAI like Alan Mackworth [77] and Rodney Brooks [11] focus on how animals (including people) are able to act in the world with real sensors and real effectors without the benefit of language or symbols. The animals appear to acquire some sort of procedural knowledge (knowledge formed by doing), as opposed to the more declarative knowledge (knowledge that can be expressed in declarative sentences) that GOFAI focuses on. But it is important to keep the Big Puzzle issue in mind on this, and the fact that “action in the world” is a very broad category. It certainly includes things like riding a bicycle and controlling a yo-yo, but also things like caring for a pet canary and collecting rare coins. Clearly both types of knowledge are necessary. See [126] on this issue, and [115] for a neuroscience perspective.

The quote from S. I. Hayakawa on the power of reading is from [55]. (Hayakawa also contrasts learning through experience and learning through language.) The quote from Isaac Newton is from a letter dated 1676. (A collection of his letters is [120].)

Chapter 6: Book Smarts and Street Smarts

This chapter is about the importance of what we learn and pass on through written texts, in spite of the fact that we often dismiss it as mere “book knowledge,” some sort of minor quirk in our makeup. Plainly other animals do not write books the way we do, and this places a strong limit on the kinds of technology they can develop. On animals and technology, see [109]. For interesting reading on ant supercolonies (regarding aggression and nonaggression), see [46].

For more on young children using language to deal with language issues and problems, see [111]. This is an ability that Don Perlis argues is the very essence of conversational competence [92].

Helen Keller’s life story can be found in [78]. Parts I and II are by Keller herself. Part III is taken from letters and reports by Anne Sullivan (from which the letter in this chapter was quoted). William Rapaport’s analysis of the relevance of Keller’s story to AI work is in [99]. I believe there is still a lot to learn from her about the human mind and the human spirit.

Chapter 7: The Long Tail and the Limits to Training

One of the themes of this chapter is how common sense allows people to deal with situations that are completely unlike those they have experienced before. But Nassim Taleb argues that people, and investors in particular, are actually very bad at dealing with these “black swans” [117]. There is no contradiction here. Common sense tries to deal with new situations as they arise, but investors have the thornier task of somehow appraising all the possible situations that might arise. In deciding to invest in a stock, an investor has to try to put numbers on all the possible things that could cause the stock to go down. As Taleb argues, people are not very good at factoring in the black swans. For example, in the British National Corpus mentioned in the text, it might seem completely safe to bet against seeing words that have less than a one-in-ten-million chance of appearing, but in fact, there are so many of them that this would be very risky. Information on the British National Corpus can be found online at http://www.natcorp.ox.ac.uk/. The observation concerning the large presence of rare words in this corpus is due to Ernie Davis, and further details can be found in his textbook [24], p. 274.
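To get a feel for the arithmetic behind this, here is a minimal sketch in Python. The numbers are invented purely for illustration (they are not Davis’s figures): any individual rare word is a safe bet to miss, yet collectively the rare words carry so much probability mass that some rare word is nearly certain to turn up.

    # Invented numbers, for illustration only.
    num_rare_words = 2_000_000   # assumed count of distinct rare words
    p_each = 1e-7                # assumed per-token chance of each one
    total_rare_mass = num_rare_words * p_each   # 0.2: a fifth of all tokens

    # Even in a text of just 100 words, the chance of seeing
    # no rare word at all is essentially zero.
    tokens = 100
    p_no_rare = (1 - total_rare_mass) ** tokens
    print(f"chance of no rare word in {tokens} tokens: {p_no_rare:.1e}")

Betting against any one of these words is safe; betting against all of them at once is the risky part.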

Skill and expertise were the focus of considerable AI research in the 1970s. The principal idea was to build so-called expert systems that would use a collection of if-then rules, obtained from experts in interviews, to duplicate what those experts would do in their specialized areas. (A minimal sketch of this rule-firing idea appears after the quotation below.) See [61] and [56], for example. Chess experts are considered in [110] and [103]. The experiment involving a chess expert playing a game of chess while adding numbers was performed by Hubert Dreyfus and Stuart Dreyfus [37]. Here is what they say:

We recently performed an experiment in which an international master, Julio Kaplan, was required rapidly to add numbers presented to him audibly at the rate of about one number per second, while at the same time playing five-second-a-move chess against a slightly weaker, but master level, player. Even with his analytical mind completely occupied by adding numbers, Kaplan more than held his own against the master in a series of games.

Their critique of AI and the expert system approach, as well as other general philosophical observations about experts and novices, can be found in [36].
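As promised above, here is a toy illustration of the if-then rule idea behind expert systems. The rules and facts are invented for this sketch (they are not drawn from [61] or [56]); the engine simply fires any rule whose conditions all hold, over and over, until nothing new can be concluded (so-called forward chaining).

    # Each rule pairs a set of conditions with a single conclusion.
    rules = [
        ({"fever", "cough"}, "flu_suspected"),
        ({"flu_suspected", "short_of_breath"}, "refer_to_doctor"),
    ]

    def forward_chain(facts, rules):
        """Fire rules repeatedly until no rule adds a new conclusion."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in rules:
                if conditions <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    # Both flu_suspected and refer_to_doctor end up concluded.
    print(forward_chain({"fever", "cough", "short_of_breath"}, rules))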

Chapter 8: Symbols and Symbol Processing

This chapter, nominally about symbols and symbol processing, is really an introduction to computer science, an area of study that started with Alan Turing’s work on Turing machines [118]. There are researchers who argue for a nonsymbolic form of AI (in [113], for example), but what they are really talking about is symbol processing where the symbols happen to stand for numbers (as in the symbolic algebra example), rather than non-numeric concepts (as in the symbolic logic example).

Those two examples raise interesting questions that are at the heart of computer science. A single computational problem can often be solved in different ways with quite different properties, and computer scientists spend much of their time studying algorithms, that is, the different ways of solving computational problems [52].

In the case of symbolic algebra, the standard algorithm for solving a system of equations is called Gaussian elimination (see [76], for example). It has the property that for n equations with n variables, a solution can be calculated in about n³ steps. This means that solving systems of equations with thousands and even millions of variables is quite feasible.
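For the curious, here is a minimal sketch of Gaussian elimination in Python. It shows only the basic idea (eliminate one variable at a time, then back-substitute), without the pivoting and numerical care that a serious implementation, such as those discussed in [76], would need.

    def gaussian_elimination(A, b):
        """Solve Ax = b, where A is an n-by-n list of lists of floats
        and b is a list of n floats. Assumes no zero on the diagonal."""
        n = len(A)
        # Forward elimination: zero out everything below the diagonal.
        for i in range(n):
            for j in range(i + 1, n):
                factor = A[j][i] / A[i][i]
                for k in range(i, n):
                    A[j][k] -= factor * A[i][k]
                b[j] -= factor * b[i]
        # Back substitution: solve for each variable, last row first.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(A[i][k] * x[k] for k in range(i + 1, n))
            x[i] = (b[i] - s) / A[i][i]
        return x

    # Example: x + 2y = 5 and 3x + 4y = 11 gives x = 1, y = 2.
    print(gaussian_elimination([[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0]))

The three nested loops are where the roughly n³ step count comes from.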

But in the case of symbolic logic, perhaps the best algorithm found so far is one called DPLL (see [47], for example). In this case, it is known that there are logic problems with n variables where a solution will require about 2ⁿ steps [51]. (The proof of this involves a variant of DPLL based on the resolution rule [102], mentioned in the text.) This means that solving logic problems with as few as one hundred variables may end up being impractical for even the fastest of computers.
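Here is a minimal sketch of the DPLL idea in Python, using the common convention that a variable is coded as a positive integer and its negation as the corresponding negative integer. Real DPLL solvers add many refinements (pure-literal elimination, careful choices of which literal to split on, and so on); this sketch shows only unit propagation and case splitting.

    def dpll(clauses):
        """Satisfiability of a formula in conjunctive normal form.
        A clause is a frozenset of nonzero integers: variable v
        appears as v, and its negation as -v."""
        if not clauses:
            return True    # every clause satisfied
        if frozenset() in clauses:
            return False   # an empty clause cannot be satisfied
        # Unit propagation: a one-literal clause forces that literal.
        for c in clauses:
            if len(c) == 1:
                (lit,) = c
                return dpll(assign(clauses, lit))
        # Case split: pick a literal and try it both ways.
        lit = next(iter(clauses[0]))
        return dpll(assign(clauses, lit)) or dpll(assign(clauses, -lit))

    def assign(clauses, lit):
        """Simplify the clauses, assuming the literal lit is true."""
        return [frozenset(c - {-lit}) for c in clauses if lit not in c]

    # (p or q) and (not p or q) and (not q) is unsatisfiable:
    print(dpll([frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]))

The case split is where the 2ⁿ worst case comes from: in the worst case, both branches must be explored for each of the n variables.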

This raises two issues. First, we might wonder if there exists an algorithm that does significantly better than DPLL. It turns out that a mathematically precise version of this question is equivalent to the famous P = NP question, first posed by Stephen Cook in the 1970s [21]. Despite the best efforts of thousands of computer scientists and mathematicians since then, nobody knows the answer to the question. Because of its connection to many other computational problems, it is considered to be the most important open problem in computer science [43].

The second issue concerns the long tail phenomenon from the previous chapter. The way DPLL works is that it performs a systematic search through all the logical possibilities. Interestingly enough, on randomly constructed test cases, the number of steps DPLL takes to do this is almost invariably small. In fact, the required number of steps on test cases appears to behave quite like the long-tailed numeric example seen in the section “A numeric example” in this chapter. It is virtually impossible to estimate the number of steps that might be required by DPLL in practice, because as more and more sample test cases are considered, the average value just keeps getting higher. For more on this, see [48].
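This runaway-average behavior is easy to see in a simulation. The sketch below is purely illustrative (the distribution is an invented stand-in, not actual DPLL data): it draws hypothetical step counts from a heavy-tailed Pareto distribution whose mean is infinite, so the running average tends to keep climbing as more samples arrive, rather than settling down.

    import random

    random.seed(0)
    total = 0.0
    for n in range(1, 1_000_001):
        # A tail exponent below 1 means the distribution has no finite mean.
        total += random.paretovariate(0.9)
        if n in (100, 10_000, 1_000_000):
            print(f"average of first {n} samples: {total / n:.1f}")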

For additional material on the idea of teaching children about following procedures systematically, see [124].

Chapter 9: Knowledge-Based Systems

A biography of Gottfried Leibniz, one of the most fascinating thinkers of all time, is [1]. But much of his thinking is scattered among the more than ten thousand letters he wrote, often while he was on the road. A better introduction to his thought can be found in the articles on him in The Encyclopedia of Philosophy [38].

Charles Darwin talks about the evolution of the eye in [23]. Here is what he says:

To suppose that the eye with all its inimitable contrivances for adjusting the focus for different distances, for admitting different amounts of light, and for the correction of spherical and chromatic aberration, could have been formed by natural selection seems, I freely confess, absurd in the highest degree.

The evolution of the mind itself is discussed in [34] (with commentary) and in [26].

The knowledge representation hypothesis is quoted from Brian Smith’s PhD thesis [112]. Credit for the idea is usually given to John McCarthy, but other AI researchers were clearly on the same track. For Allen Newell and Herb Simon, the emphasis was more on the symbolic aspects, and their version is called the physical symbol system hypothesis, which they put this way: “A physical symbol system has the necessary and sufficient means for general intelligent action” [88]. (Marvin Minsky was one of the early AI researchers who saw strong limitations to both the logical and numerical approaches to AI, and argued for a rich amalgamation of approaches [84].)

There are textbooks dedicated to the knowledge representation and reasoning subarea of AI, such as [10] and [3]. There are also conferences in the area held biennially (see http://kr.org). Early readings on the subject can be found in [9]. Logic as a unifying theme for AI in general is presented in [44] and [96], and is further discussed in [85] and [60]. Marvin Minsky’s quote on logic is taken from [82], p. 262. For a look at reasoning from the psychology side, see [62]. Regarding the advantages of a probabilistic approach to reasoning with degrees of belief, see [90], for example.

On the issue of actually building a large-scale knowledge base, one long-term project along these lines is CYC [67]. (It is difficult to say precisely what has and has not been achieved with CYC because there has only been limited outside access to the work, and nothing like a controlled scientific study.) Other related but more specialized efforts are HALO [4] and AURA [13] from SRI International, as well as ARISTO [17] from the Allen Institute for AI. For a review of the prospects for automatically extracting knowledge from text on the web, see [39].

Finally, one recent attempt at reconciling the logical approach of GOFAI and the more statistical ones seen in AML can be found in the symposium described online at https://sites.google.com/site/krr2015/.

Chapter 10: AI Technology

This chapter touches on only some of the thorny issues regarding the future of AI. A much more comprehensive book on the subject is [5]. The idea of a technological singularity is presented by Raymond Kurzweil in [65] and further discussed in [107]. The potential dangers of AI are mentioned in interviews with Stephen Hawking (on 2 December 2014, online at http://www.bbc.com/news/), Elon Musk (on 8 October 2014, online at http://www.vanityfair.com/news/), and Bill Gates (on 28 January 2015, online at https://www.reddit.com/r/IAmA/comments/).

The three laws of science-fiction writer Arthur C. Clarke (who was coauthor with Stanley Kubrick of the screenplay for 2001) were presented in [18] and are as follows:

  1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
  2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
  3. Any sufficiently advanced technology is indistinguishable from magic.

The three laws of robotics by science-fiction writer Isaac Asimov were first presented in a story in [2]. (They are said to be quoted from a robotics handbook to be published in 2058.) They are the following:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Many of Asimov’s subsequent stories were concerned with what could go wrong with a robot obeying these laws.

Marvin Minsky discusses his involvement with the movie 2001 in an interview in [116]. John McCarthy is quoted by Raj Reddy in a 2000 lecture that is transcribed online at http://www.rr.cs.cmu.edu/InfiniteMB.doc. The Margaret Thatcher quote is from an interview reported in Woman’s Own magazine (23 September 1987). Alan Turing’s early involvement with chess is described in [8]. The story of DEEP BLUE is presented in [87]. For more on the Katharine Hepburn movie quote, and the idea of humans rising above what evolution has given them, see [95].