In our discussion of the status of beliefs in chapter 3, we alluded to a class of systems where knowledge was not just a convenient way of describing behavior (the intentional stance, that is), but played a more causal role, somewhat like the role of gas in the running of a car.
To understand how knowledge can play such a role, we need to first discuss symbols and symbol processing. And a good place to get started is high-school mathematics. (Readers who never ever want to see anything like high-school mathematics again should probably jump ahead to the section “Why symbols?” later in this chapter.)
Consider the following simple word problem:
Two years ago, Johnny was seven years old.
How old is he now?
Easy as pie. No pencil and paper needed. Johnny is obviously nine years old. In high school we are taught how to solve problems like these in a systematic fashion. This has the advantage of allowing us to deal with other more intricate examples whose answers will not be so obvious, like this one:
Tommy is six years older than Suzy.
Two years ago, he was three times her age.
What is Suzy’s age now?
This is what is called algebra, of course. Returning to the first problem, we want Johnny’s age, and the trick was to figure out how to express the clue as an equation involving Johnny’s age, which for conciseness, we call x:
x – 2 = 7 or Johnny’s age (in years) less 2 is 7
Of course, the name x we chose here is not important. But we suppress unimportant details to make the clue easier to write and manipulate. Once we have done this, we want to process the equation in some way so as to end up with one of the form

x = V

where V is some number. What we learn in high school, without too much razzmatazz, is how to do this:

x – 2 = 7 ⇒ (x – 2) + 2 = 7 + 2 ⇒ x + 0 = 9 ⇒ x = 9
This gives the age as being nine, as desired.
For the second problem, the procedure is similar, using x now to stand for Tommy’s age and y to stand for Suzy’s age, which is what we are looking for. We start with two equations given by the two clues:
x = y + 6 or Tommy’s age is Suzy’s age plus 6
x – 2 = 3(y – 2) or Tommy’s age less 2 is 3 times Suzy’s age less 2
The procedure this time might go something like this. Since the two left-hand sides and the two right-hand sides are equal, we can subtract the first equation from the second, simplifying each side separately:
x – 2 – x ⇒ x – x – 2 ⇒ (x – x) – 2 ⇒ 0 – 2 ⇒ –2
3(y – 2) – (y + 6) ⇒ 3y – 6 – y – 6 ⇒ (3y – y) – 6 – 6 ⇒ 2y – 12
At this point, we have reduced the two equations above to a single equation with one variable, –2 = 2y – 12, which we can now work on as before.
Presto! With a modicum of algebra, we are able to determine on the basis of the given clues that Suzy must be five years old. With a little more work, we could determine that Tommy is eleven and, sure enough, that two years ago, Tommy (who was then nine) was three times as old as Suzy (who was then three).
Similar procedures can be used to solve for three unknowns given three clues, and indeed for sixty-seven unknowns given sixty-seven suitable clues. Furthermore, it turns out that a very small number of rules concerning equality, addition, and multiplication are sufficient to account for all the steps needed to solve systems of equations like these with any number of unknowns.
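To see just how mechanical this is, here is a minimal sketch in Python of the elimination procedure, using exact fractions so that no accuracy is lost. The function name solve_linear and the coefficient encoding are our own illustration, not anything from the text; the only moves it makes are the ones just described: swapping equations, multiplying both sides by a number, and adding equal terms to each side.

```python
# A minimal sketch of mechanical equation solving (Gaussian elimination)
# over exact fractions. Each equation is a list of coefficients plus a
# constant: "x = y + 6" is rewritten as 1x - 1y = 6, and
# "x - 2 = 3(y - 2)" as 1x - 3y = -4.
from fractions import Fraction

def solve_linear(rows):
    """rows[i] = [a1, ..., an, c], encoding a1*x1 + ... + an*xn = c.
    Assumes the system has exactly one solution."""
    rows = [[Fraction(v) for v in row] for row in rows]
    n = len(rows)
    for i in range(n):
        # Find an equation with a nonzero coefficient for variable i.
        pivot = next(r for r in range(i, n) if rows[r][i] != 0)
        rows[i], rows[pivot] = rows[pivot], rows[i]
        # Multiply both sides so the coefficient of variable i becomes 1.
        p = rows[i][i]
        rows[i] = [v / p for v in rows[i]]
        # Subtract equals from equals to eliminate variable i elsewhere.
        for r in range(n):
            if r != i and rows[r][i] != 0:
                f = rows[r][i]
                rows[r] = [v - f * w for v, w in zip(rows[r], rows[i])]
    return [row[-1] for row in rows]

# Tommy (x) and Suzy (y): x - y = 6 and x - 3y = -4.
print(solve_linear([[1, -1, 6], [1, -3, -4]]))
# -> [Fraction(11, 1), Fraction(5, 1)]: Tommy is 11, Suzy is 5
```

The same function handles sixty-seven equations in sixty-seven unknowns without change; only the lists get longer.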
Overall, algebra is a form of symbol processing. We start with a string of characters, such as “x–2=3(y–2); x=y+6,” manipulate it according to some well-established rules, and end up with another string of characters, “x=11; y=5,” that suits our purpose.
But not all symbol processing is numeric. Consider the following problem:
At least one of Alice, Bob, or Carol is guilty of a crime.
If Alice is guilty then so is Bob.
If Alice is not guilty, then neither is Carol.
Is Bob guilty?
We can solve this problem too, not using algebra, but what is called symbolic logic. Instead of equations, we represent the clues using logical formulas. Let A, B, C stand for Alice being guilty, Bob being guilty, and Carol being guilty, respectively. Then the clues, numbered for reference, are as follows:

1. (A ∨ B ∨ C)
2. (A ⊃ B)
3. (¬A ⊃ ¬C)
This is a lot like arithmetic, except using symbols like ∨ and ¬ instead of symbols like + and =. Roughly speaking, the ∨ means “or,” the ¬ means “not,” and the ⊃ means “if … then …” What is taught in a course on symbolic logic (but normally not in high school) is how to manipulate the three clues to end up with new conclusions.
Here is one version of such a procedure. Each “if … then” clue is first rewritten as an “or” (so clue 2 becomes (¬A ∨ B), and clue 3 becomes (A ∨ ¬C), writing ¬¬A simply as A); then pairs of formulas are combined by canceling a symbol against its negation:

4. (¬A ∨ B) rewriting clue 2
5. (A ∨ ¬C) rewriting clue 3
6. (B ∨ C) from 1 and 4, canceling A
7. (A ∨ B) from 5 and 6, canceling C
8. B from 4 and 7, canceling A (and collapsing B ∨ B to B)
On the final line, the procedure ends up concluding B, that is, that Bob must be guilty according to the given clues. (It is interesting to observe that symbolic logic allows us to draw this conclusion about Bob even though the puzzle does not provide enough information to determine the guilt status of Alice or Carol.)
As was the case with algebra, there are a small number of rules that are sufficient to obtain all the desired conclusions from a collection of clues like these. (For more details on the two rules used in the example above, see figure 8.1.) This is again symbol processing: we start with a given string of characters, such as “(A ⊃ B); A; (B ⊃ C),” manipulate it according to some rules, and end up with another string of characters, “C,” that we may be happier with.
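Here, as a hedged illustration only, is how that kind of rule-following might look in Python. The encoding (clauses as sets of numbered literals, with a minus sign for ¬) and the function name derive are our own; the sketch simply applies the cancellation rule exhaustively and checks whether B appears.

```python
# A minimal sketch of mechanical deduction. Each clue is a clause: a set of
# literals joined by "or", with negative numbers standing for "not". The
# "if ... then" clues have already been rewritten as "or"s.
A, B, C = 1, 2, 3  # Alice, Bob, Carol are guilty

clues = {
    frozenset({A, B, C}),  # at least one of them is guilty
    frozenset({-A, B}),    # if Alice is guilty then so is Bob
    frozenset({A, -C}),    # if Alice is not guilty, then neither is Carol
}

def derive(clauses):
    """Apply the cancellation rule until no new formulas appear."""
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for lit in c1:
                    if -lit in c2:  # c1 says "lit or ...", c2 says "not lit or ..."
                        new.add((c1 - {lit}) | (c2 - {-lit}))
        if new <= clauses:
            return clauses
        clauses = clauses | new

print(frozenset({B}) in derive(clues))  # -> True: Bob must be guilty
```

Checking the same closure for {A} or {-A} comes up empty, matching the observation above that the clues leave the guilt of Alice (and Carol) unsettled.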
How do people do this kind of symbol processing, either the algebra or the logic? We are certainly not born with the skill. The answer is that we are taught a procedure to follow that will do the job: the patterns to look for, the steps to take.
In the case of equations, we are taught basic arithmetic at a very early stage (starting around Grade 1), and by the time we get to high school, we have learned to evaluate any expression consisting of numbers, additions, and multiplications, including negative numbers (or subtractions) and fractions (or divisions). But this is still just arithmetic, not algebra. In Grade 9 or so, we make a conceptual leap and learn to simplify symbolic expressions that include variables using, for example, the fact that (E – E) simplifies to 0, or that (E + 0) simplifies to E, even when the expression E contains variables. This algebraic simplification includes all of arithmetic as a special case. In Grade 10 or so, we are ready to be taught the rest of what we need to handle equations: we can start with an equation, with symbolic expressions on each side of an “=” symbol, and we can get a new equation by simplifying either expression and, more powerfully, by adding or multiplying equal terms to each side.
Maybe one of the most interesting things about symbol processing, like the symbolic algebra and logic above, is that it does not really take any ingenuity to carry it out. It certainly does take ingenuity to go from word problems to symbolic expressions, either equations or logical formulas. But from there, the procedures can be carried out purely mechanically. All that is needed is to pay attention to the details without making mistakes. In fact, a machine can do it. It is not hard to write small computer programs that take one string of characters as input and produce the desired string of characters as output.
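For instance, here is a toy Python program, our own illustration, that takes one string of characters like “x-2=7” as input and produces the string “x=9” as output. It knows only the single pattern from the first word problem, but within that pattern it needs no ingenuity at all:

```python
# A minimal sketch of string-in, string-out symbol processing: solve an
# equation of the simple form "x-a=b" or "x+a=b" by adding or subtracting
# equal amounts to both sides. (This handles only one toy pattern.)
def solve_string(equation):
    left, right = equation.replace(" ", "").split("=")
    if left.startswith("x-"):   # x - a = b  becomes  x = b + a
        return "x=" + str(int(right) + int(left[2:]))
    if left.startswith("x+"):   # x + a = b  becomes  x = b - a
        return "x=" + str(int(right) - int(left[2:]))
    raise ValueError("pattern not handled in this sketch")

print(solve_string("x-2=7"))  # -> x=9, Johnny's age
```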
This is a major insight, and one that has implications in education. A mistake that is sometimes made in mathematical education is a failure to distinguish between the parts of mathematics that are purely mechanical from the parts that require ingenuity. Students need to get a much better sense of when they have reduced a problem to a point where all they need to do is follow a procedure systematically. This idea of following a procedure could be taught at a very early age for a wide variety of procedures, not necessarily algebraic ones. In fact, the procedures need not even be meaningful. What needs to be taught is more like a certain kind of mental discipline: get the details right, keep track of where you are, watch for mistakes. This skill carries over to many areas, not just mathematical ones. But of course, this part of mathematics should not be confused with the parts where ingenuity is required. Practice is essential there too. But this time, the lesson can stop as soon as the problem is reduced to a purely mechanical one.
That so much of mathematics is indeed mechanical is a fact that intrigued the philosopher Gottfried Leibniz (as we will see in the next chapter) as well as Alan Turing (discussed below).
Why symbols?
Our use of what we have been calling a “symbol” in the previous two sections is quite different from what is normally understood by the term.
Here are some examples of symbols that are much more common: a wedding ring is a symbol for a marriage; a red traffic light is a symbol for the stop command; a silhouette of a person wearing a dress is a symbol for a woman’s washroom; the three characters XIV are a symbol (in Roman numerals) for the number fourteen; the word chien is a symbol (in French) for the concept of a dog.
So we normally take symbols to be objects in one domain that, by virtue of similarity or convention, stand for something else: objects in another domain. The first domain is usually concrete and easily seen or heard, while the second one may be less concrete, less accessible, and sometimes purely abstract. In the examples mentioned, the symbols are used to communicate something: we place the silhouette of a person on a washroom door, we say the word chien, we write down the numeral XIV. In each case, we use the symbol to get information across to someone else.
The symbols considered in the algebra or logic problems above, however, are quite different. When we write down something like “x = y + 6” in algebra, it is not because we are trying to tell somebody something. We might be working on the equations quietly by ourselves.
If not for communication, why do we write the symbols down? As we already said, the written aspect is secondary. We can sometimes do the algebra in our heads, and we only need to write things down when there is too much to keep track of.
Another thing to observe is that we can often omit many of these symbols. Position in an arrangement of symbols can sometimes do the job for us. We use position when we write down ordinary numerals. For example, we write a number like “237” using just three digits instead of writing the symbols “2 × 10² + 3 × 10¹ + 7 × 10⁰.” Position in the sequence of digits is used to represent the powers of ten. For binary numbers, there are only two symbols, the binary digits 0 and 1, and position does all the rest.
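A small Python sketch (ours, not the text’s) makes the role of position explicit: the same loop reads any numeral, and the base decides what each position is worth.

```python
# A minimal sketch of positional notation: each digit's place determines
# which power of the base it is multiplied by.
def numeral_to_number(digits, base=10):
    value = 0
    for d in digits:              # leftmost digit first
        value = value * base + int(d)
    return value

print(numeral_to_number("237"))           # -> 237, i.e. 2*10**2 + 3*10**1 + 7*10**0
print(numeral_to_number("1101", base=2))  # -> 13; with bits, position does all the rest
```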
But why do we use the symbols at all, then?
The answer was first made clear by Alan Turing in the 1930s: writing down a number as a string of symbols allows us to talk about a mapping from numbers to numbers as operations that transform one string of symbols into another string of symbols. And these operations over strings of symbols are what we now call digital computation. We write down numbers as an arrangement of symbols because we want to perform a computation.
Computation is one of those concepts that is easy to recognize but hard to define in the abstract. It is tempting to say that computation is what computers do, but people perform computations too (in fact, in Turing’s writings, when he talked about a “computer” he meant a person, not a machine), and computers do things that are not computations (like delivering email and printing files).
When we see a person doing a subtraction, say, crossing out a digit in the thousands column, decrementing it by one, and adding ten to the digit in the hundreds column, we recognize it as a computation. When we see a car bumping into a table, and the table then lurching ahead, we recognize it as something that is not a computation.
Roughly speaking, when a process is computational, it should involve a sequence of small steps, each restricted in scale and extent. The subtraction of two numbers proceeds column by column; the more columns there are, the longer we expect the process to take. The transfer of momentum when a car bumps into a table we imagine as happening to the entire body at once. We do not picture the far end of the table reacting to the change in momentum at a time that depends on the size of the table.
(But this is only roughly right. We can imagine arithmetic where many column operations are done in parallel, as they are in modern computers. We also have computational simulations of physical processes, where the change of momentum when a car bumps into a table is broken down into small components of the two physical bodies and then reassembled piece by piece.)
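To make the contrast concrete, here is the column-by-column subtraction described above as a small Python sketch (our own illustration); notice that every step touches at most two adjacent columns.

```python
# A minimal sketch of subtraction by borrowing: work right to left, one
# column at a time. Assumes the top number is the larger of the two, so no
# borrow is ever needed past the leftmost column.
def subtract(top, bottom):
    a = list(map(int, top))
    b = list(map(int, bottom.rjust(len(top), "0")))
    for i in range(len(a) - 1, -1, -1):
        if a[i] < b[i]:       # not enough in this column:
            a[i - 1] -= 1     # decrement the column to the left by one
            a[i] += 10        # and add ten to this column
        a[i] -= b[i]
    return "".join(map(str, a)).lstrip("0") or "0"

print(subtract("3024", "1371"))  # -> 1653; the more columns, the longer it takes
```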
But instead of trying to define the concept of computation in the abstract, what has proven to be much more productive is to take a design stance, that is, to lay out a specific model of what computation should be thought to be, analyze its properties, and then decide over time whether the definition is good enough.
This is what Turing set out to do. He wanted to show that, contrary to what had been suggested by some mathematicians, there were numbers that could be defined precisely but could not be calculated purely mechanically. To prove this, he needed to be clear about what it meant to calculate a number in a mechanical way. This is what he did when he first presented what we now call Turing machines.
What Turing’s account revealed is that among all the mathematical functions from numbers to numbers, only some of them could be described as the result of small localized operations over symbolic representations of the numbers.
Here is the idea: We imagine a number (a positive integer, for simplicity) written out as a string of binary digits (or bits) on a tape. This is considered to be the input to the machine. The Turing machine then gets to move a tape head back and forth as often as it likes, and at any point, it can read the bit on the tape under the tape head, and perhaps overwrite it with another. All the movement of the tape head and the reading and writing of bits is determined by a table of instructions, fixed in advance. If the Turing machine ever announces that it has finished, the final string of bits on the tape, interpreted again as a number written in binary, is considered to be the output of the machine. A function F from numbers to numbers is said to be Turing computable if there is a Turing machine that can take any number x in binary as input and produce F(x) in binary as its output.
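Here is a minimal Python sketch, ours rather than Turing’s, of the machine just described: a tape of symbols, a head that moves one square at a time, and a fixed table of instructions. This particular table adds one to a binary number.

```python
# A minimal sketch of a Turing machine: a fixed instruction table maps
# (state, symbol under the head) to (symbol to write, head move, next state).
def run(table, tape):
    tape = list(tape) + [" "]        # one blank square past the input
    head, state = 0, "scan"
    while state != "halt":
        write, move, state = table[(state, tape[head])]
        tape[head] = write           # overwrite the symbol under the head
        head += move                 # move the head one square left or right
        if head < 0:                 # ran off the left edge: grow the tape
            tape.insert(0, " ")
            head = 0
    return "".join(tape).strip()

increment = {                        # add one to a binary number
    ("scan", "0"):  ("0", +1, "scan"),    # walk right to the end of the digits
    ("scan", "1"):  ("1", +1, "scan"),
    ("scan", " "):  (" ", -1, "carry"),   # found the end: start adding one
    ("carry", "1"): ("0", -1, "carry"),   # 1 plus a carry is 0, carry again
    ("carry", "0"): ("1", +1, "halt"),    # 0 plus a carry is 1, finished
    ("carry", " "): ("1", +1, "halt"),    # the carry ran past the leftmost digit
}

print(run(increment, "1011"))  # -> 1100 (eleven plus one is twelve)
```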
One thing that was clear from the outset is that there is nothing special about binary numbers in the definition of a Turing machine. We need to imagine the machine as working on symbols drawn from a finite alphabet, but really we are talking about a mapping from one string of symbols (the input) to another string of symbols (the output), with binary numbers no more than a useful special case.
This is extremely significant, since we want to consider computations that have nothing to do with numbers. The Turing machine itself has no way of guessing what the symbols are intended to stand for, or indeed whether they stand for anything. (Many people would prefer it if we used the term “symbol” only when we have in mind something being represented. Symbols should always mean something. Perhaps we should be using a more neutral term like “character.” Be that as it may, the more liberal use of “symbol” is widespread enough that we will stick with it here.)
Having an alphabet with just two symbols is not necessary either, although it does work well in mechanical terms, since we can use physical features like a light being on or off, a switch being open or closed, a voltage being high or low. There is also no reason to restrict ourselves to linear arrangements of symbols. We write numerals as linear sequences of digits, but a system of equations might use a two-dimensional array of symbols, and there are many other useful two-dimensional symbolic structures.
Digital images, for example, are two-dimensional arrays of symbols called pixels (picture elements), where each pixel is intended to stand for one small part of an image in a two-dimensional grid. In the simplest case, each pixel is a single bit, indicating light or dark at that point in the image, like the image of the number seven seen on the left in figure 8.2 (with zero displayed as blank, and one displayed as @). The only difference between this coarse image and a photorealistic picture of a visual scene is the scale: high-resolution pictures use many more pixels (like the other two images of seven seen in figure 8.2), and many more bits per pixel to encode luminosity and color. (A high-resolution picture may involve many millions of bits.)
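In Python, such an image is nothing more than a nested array of bits. The tiny 5 × 5 “seven” below is our own toy example, not the one in figure 8.2, displayed the same way (zero as blank, one as @).

```python
# A minimal sketch of a one-bit image: a two-dimensional array of pixels.
seven = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

for row in seven:
    print("".join("@" if bit else " " for bit in row))
```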
Digital videos are similar and can be thought of as three-dimensional arrays of pixels, where the third dimension is for time. A pixel in this case stands for a small part of a two-dimensional grid at one specific interval of time (itself divided into a one-dimensional grid).
All these symbolic representations can ultimately be encoded as linear sequences of bits and therefore processed by Turing machines. For example, a two-dimensional array of numbers can be laid out on a tape row by row, with each number itself laid out as a sequence of bits. It is not difficult to get a Turing machine to work on a sequence like this as a two-dimensional array of numbers.
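The layout trick is simple position arithmetic, as in this sketch (again our own illustration) of the row-by-row encoding:

```python
# A minimal sketch of laying a two-dimensional array out on a one-dimensional
# tape, row by row, and recovering any cell by position alone.
grid = [[1, 2, 3],
        [4, 5, 6]]

tape = [value for row in grid for value in row]  # -> [1, 2, 3, 4, 5, 6]
ncols = len(grid[0])

def cell(row, col):
    return tape[row * ncols + col]   # position encodes both dimensions

print(cell(1, 2))  # -> 6, the same as grid[1][2]
```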
One question that inevitably comes up is this: why do theoreticians still talk about Turing machines? Has modern computer technology not gone far beyond that quaint vision of a machine with a movable head? The answer is simple: every digital computer built to date can be shown to be a special case of a Turing machine. Whatever it can do, a Turing machine can do as well. Or to be more precise, any function over strings of symbols computable by a modern digital computer is also Turing computable. Most scientists believe that this will be true of any computer we ever build. (There are other calculating devices such as “analogue computers” and “quantum computers” that are less obviously cases of Turing machines. In the case of analogue computers, they have limitations in terms of accuracy that greatly reduce their generality. As for quantum computers, they do appear to be special cases of Turing machines, but with properties that make them interestingly faster for certain tasks.)
So in the end, symbols, as we are using them here, are not for communication at all. Their job is to serve as the locus of computation. There can be no digital computation without symbols. The symbols may or may not stand for numbers; they may or may not stand for anything. But if they do, their meaning depends on their position or arrangement with other such symbols.