7    Limits: Perfect Programs

The previous chapter described various ways in which programs are imperfect and thus fall short of what we might want them to do. In contrast, in this chapter we take up the ways in which programs can be flawless but nevertheless unable to do what we want them to do. It may be surprising, but even perfect programs still have limits.

Environment

The first problem is that the program may well be correct at a particular instant of time, but we also have to consider the environment of the program—everything in the universe that isn’t the program but with which it interacts. The evolving environment of a program leads to additional challenges beyond the ones we’ve already considered.

Most changes in the environment are irrelevant. For example, consider a program that I might write to print “Hello world” when I choose to activate it. Such a program’s operation or utility would be unaffected by a change in the price of coffee, the love affairs of movie stars, or the phase of the planet Jupiter. However, that program does depend on assumptions about the computer on which it’s running, the operating system services available on that computer, and the display device on that computer. A naïve programmer might think that his or her simple “Hello world” program would be totally immune to changes in its environment, and over short timescales that view would be correct. But if we consider a “Hello world” program that might have been written on a Bell Labs or MIT computer in the 1970s, there is no way to take that exact program and run it, unchanged, on a readily available modern computer.

There are two ways to get the program running on a modern machine. The first, a process called porting, requires that we modify the program to adjust to its new environment. The second, a process called emulation, requires that we create an additional layer of software that recreates the 1970s-era environment the old program expects. We can adjust the program to the new environment, or adjust the environment to the old program, or some combination—but we can’t just count on today’s correct program remaining tomorrow’s correct program indefinitely into the future.

This phenomenon of program/environment mismatches developing over time goes by the colloquial name “bit rot” among programmers. The term implies that software decays over time, which is certainly what it seems like experientially—even though programs are actually made of a wonderful nonphysical “material” that never wears out or deteriorates.

Big Problems

We will be concerned later in the book with how to scale up systems to tackle big problems. So another important limit on even a perfectly constructed program is that the problem to be solved might be too big for the available resources. What do we mean by saying that a problem might be too big? Suppose that we have some measure of how big a problem is. For example:

•  If we are looking for one instance of a word in a particular (single) book, then looking in a different, longer book is probably harder.

•  If we are considering a particular single word in a particular single book, then looking for all the occurrences of the word is probably harder than just looking for any one occurrence.

Just knowing that one problem is vaguely harder than another is not much help, though. We would like to identify what the limiting elements are. We are familiar with this issue in other contexts. Suppose that we have a car capable of speeds up to 100 miles per hour, and we need to drive 10 miles through the city. Can we simply divide 10 miles by our 100 mph speed and conclude that the driving time will be 0.1 hour (6 minutes)? Probably not. In a typical city, there will be speed limits well below 100 mph for the entire route, and traffic lights at some number of intersections are likely to stop us for some of the time as well.

Suppose that we tried such a drive and found that it actually took us 24 minutes instead of 6 minutes to go 10 miles; and further suppose that we found that speed unsatisfactory. Would it be sensible to buy a much more expensive car that could go twice as fast, 200 mph? No, because the car’s top speed isn’t the limiting factor. The limiting factor is something else, and we would need to identify and correct it to get any significant improvement in performance.

In general, it’s not easy to identify what the limiting element is in a program. If there is no other information available, finding the limiting element involves a careful analysis of the program’s activities, particularly as the inputs are increased in size and/or number. The measurement of program performance is called benchmarking, and improving the performance of programs is called tuning. The concepts involved in benchmarking and tuning are not very different from what people do when racing cars—indeed, the terminology is largely borrowed from there.
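To make this concrete, here is a minimal benchmarking sketch in Python. The function being measured, a hypothetical find_word, is invented for illustration and stands in for whatever program we care about; the idea is simply to run the same operation at increasing input sizes and record the elapsed time.

```python
import time

def find_word(word, text):
    # A stand-in for whatever program we want to measure: report
    # whether the word occurs anywhere in the text.
    return word in text

def benchmark(sizes):
    # Time the same operation at increasing input sizes, to see
    # how the cost grows as the input grows.
    for n in sizes:
        text = "lorem ipsum " * n
        start = time.perf_counter()
        find_word("needle", text)
        elapsed = time.perf_counter() - start
        print(f"n = {n:>9,}: {elapsed:.6f} seconds")

benchmark([1_000, 10_000, 100_000, 1_000_000])
```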

Computational Complexity

However, computer scientists also have an approach to understanding and improving performance that is quite different from the steps taken to improve performance in race cars. That approach is called computational complexity. Professional programmers design programs with an eye on how they scale up to large problems, making design choices with an awareness of the computational complexity of various potential solutions. A good education in computer science provides an awareness of these issues and the consequent ability to make sensible choices about key implementation concerns. However, simply knowing “how to program” doesn’t necessarily imply a good understanding of these topics.

Computational complexity hinges on two tricks:

  1. We think about a program’s performance only in terms of the elements that will be dominant as the problem gets bigger—the real bottlenecks or limits at large scale.
  2. We characterize that performance by a kind of mathematical analogy. Rather than developing a detailed analysis of the program’s performance, we understand whether it is similar to one of a small number of common mathematical functions.

Just as we can sort socks into piles of black, white, and colored socks without worrying too much about exact colors or patterns, we can sort programs into piles based on how quickly they run out of resources as the problem of interest gets bigger. As with the sock-sorting, there may be some tricky edge cases, but we can still use the approach to get a better sense of what we have—and we know we’ll at least distinguish black socks from white socks.

Instead of sorting socks into piles based on color, we’ll sort programs into groups based on their complexity. The “color” of a program is the mathematical approximation that is closest to the program’s behavior, and that “color” is what is meant by the complexity of the program. Here the word “complexity” is not just an informal term for something complicated; instead, it has a precise meaning for how hard a problem is.

Ignoring Constants

Let’s return to the first “trick” of computational complexity to understand it better. When we’re looking at performance, we want to focus on a limiting factor—we don’t want to be distracted by something that’s not the “big issue.” As we noted with the example of the fast car going across town, it doesn’t make any difference to speed up something that’s not the limiting factor.

We’re going to turn that reasoning around and assume for our discussion of computational complexity that we are focused on cases where the size of the input drives the overall cost. We’re assuming that lots of good detective work has already removed quirky or unrepresentative performance, so that we’re now dealing with a cleaned-up situation where the performance of the program is basically determined in some way by the size of its input.

If we consider a program to sort numbers, the size of the input would be how many numbers we want to have sorted. We call that input size n. In this example, it’s the number of items to be sorted, but in other cases it could be something else. We drop any aspect of the cost that doesn’t include n. That is, if there’s some fixed overhead that we always have to pay that doesn’t change with the problem size, we ignore it. The technical term for this perspective is that we ignore constants—we are interested in how the cost grows as n grows, and any constant overhead simply drops out of that comparison because it doesn’t grow at all.

Perhaps more surprisingly, we also ignore constants when they are “only” multiplying the cost. For the purpose of this analysis, 2n is the same as n, which is the same as n/2—they’re all just “variants” of n. The technical term for this perspective is that we also ignore constant factors.

Ignoring constants and constant factors might seem strange. Is it justifiable? Yes and no. As a way of getting to core scaling issues, it’s absolutely reasonable. Recall that what we’re heading toward (our second “trick”) is grouping the behavior of different programs into “piles” that share an underlying growth characteristic. As we will see, the differences between those families completely dominate the differences within each family. We might think of the underlying complexity as a vehicle and the constant factors as the effort of the vehicle’s driver or rider. If we’re comparing a sports car to a bicycle, the crucial comparison is between the vehicles, not how hard the rider or driver is pushing the pedals. And it’s important not to be misled by a hard-pedaling cyclist vs. a light-footed driver into thinking that the car is slower.
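A minimal sketch makes the point with two hypothetical cost functions: a linear cost burdened with a constant factor of 1,000 and a quadratic cost with no constant factor at all. For small n the quadratic cost looks cheaper, but past the crossover point the linear program wins no matter what the constants are.

```python
# Two hypothetical cost functions: linear with a large constant factor,
# quadratic with none. The growth family, not the constant, determines
# which is cheaper once n is large.

def linear_cost(n):
    return 1000 * n

def quadratic_cost(n):
    return n * n

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>7,}:  1000·n = {linear_cost(n):>14,}   n² = {quadratic_cost(n):>14,}")
```

At n = 1,000 the two costs meet; beyond that point, no constant factor can rescue the quadratic program.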

However, once we are past this core analysis—understanding which “vehicle family” corresponds to the program’s behavior—constant factors and overhead come roaring back to our attention. A theorist specializing in computational complexity might dismiss a given approach as “only changing a constant factor,” and thus irrelevant; but making something run twice as fast (or perhaps even just a few percent faster) can make a huge difference to real-world applications of computing.

For example, in the annual ranking of the world’s supercomputers, the difference between being the fastest in the world vs. being at the bottom of the list is meaningless from the perspective of a theorist. In a discussion among theorists, the distinction between these is merely some constant factor, which in a theoretical analysis is dropped as insignificant. However, if you have a specific problem you want to be solved and it can be solved on either computer, you will be much happier getting to solve it a thousand times faster on the fast machine and you will cheerfully ignore the theorists’ comments that this difference “doesn’t matter.”

Categories of Complexity

Let’s assume that we are given the complexity of a program rather than having to somehow figure it out. It’s useful to compare the growth rates of some plausible complexity functions to get an intuition for how they behave. Table 7.1 shows some values of complexity functions that a program might have.

Table 7.1

Some examples of how problem sizes grow with n

n       n²      n³       2ⁿ        10ⁿ                    n!
1       1       1        2         10                     1
2       4       8        4         100                    2
3       9       27       8         1,000                  6
10      100     1,000    1,024     10,000,000,000         3,628,800
11      121     1,331    2,048     100,000,000,000        39,916,800
12      144     1,728    4,096     1,000,000,000,000      479,001,600

The column labeled n represents the size of the input, like how many numbers need to be sorted. The values in the other columns give an idea of how many steps are required if the program falls into that category for its complexity. (If mathematical functions are not your friends, just think of these as slightly weird names we’re giving to some useful growth trends: they could have names like “Carol” or “Ted” instead, if you prefer.) The crucial part for our exploration is just to see how big or small some of the numbers in the table get, not to spend a lot of time with the underlying math. In particular, we notice that the columns to the right have crazily large numbers in them.

If you are acquainted with spreadsheets, it’s not hard to build a chart like this yourself. Making your own chart means you can explore these numbers and functions yourself or test out some other functions for comparison.
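If you’d rather use a few lines of code than a spreadsheet, here is a minimal Python sketch that regenerates the table; the functions and values of n simply mirror Table 7.1.

```python
import math

# Evaluate each growth trend from Table 7.1 at a few values of n.
functions = [
    ("n",   lambda n: n),
    ("n²",  lambda n: n ** 2),
    ("n³",  lambda n: n ** 3),
    ("2ⁿ",  lambda n: 2 ** n),
    ("10ⁿ", lambda n: 10 ** n),
    ("n!",  math.factorial),
]

print("".join(f"{name:>20}" for name, _ in functions))
for n in [1, 2, 3, 10, 11, 12]:
    print("".join(f"{f(n):>20,}" for _, f in functions))
```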

Our simple, no-real-math-involved observation is that it makes a big difference which column of this table most closely resembles our program’s behavior. Some of these trends grow very much faster than others.

In chapter 3 we mentioned Moore’s law and the surprising way that computing performance has improved over time. Despite its startling effects, Moore’s law does not promise growth as fast as some of the possible complexity categories we’ve just seen. We might find ourselves taking the tone of a spoiled child and complaining that Moore’s law “only” promises repeated doublings (that is, the 2ⁿ column). Although it is a remarkable benefit to receive, we could want more: any of the really fast-growing problems will stay out of our reach even with the help of Moore’s law.

We haven’t said much about how to establish a connection between a particular process and the closest growth trend representing its computational complexity—and we won’t, because that’s a hard topic that doesn’t yield much value to nonspecialists. There are three important insights to summarize:

  1. Problems typically grow harder as they get bigger.
  2. Not all problems grow harder in the same way or at the same rate.
  3. Some problems can grow faster than our ability to supply resources to solve those problems.

Uncomputability

One of the great mind-benders beloved of amateur philosophers is the notion of limits on omnipotent beings. One way of raising this issue is to assume an omnipotent God. Then ask, can God create an object that’s so large that even God can’t lift it?

The problem with answering is that no matter how we answer, we seem to have put limits on the defined-to-be-omnipotent being. If we say “yes,” we are saying that God can’t lift the object, so God is limited and thus not omnipotent. If we say “no,” we are saying that God can’t create the object, so God is limited and thus not omnipotent. The trouble seems to lie in our casual assumption that “omnipotent” is a coherent concept: designating God as “omnipotent” is where the problem starts. As we’ll see, there are some vaguely similar logical problems in considering the power of computation.

To recap, to get to this point we have considered a variety of limits on programs and imagined our way past each. So now we are considering a perfectly specified, perfectly implemented program with an unchanging environment that has also been checked to ensure that it has sufficient resources. One final limitation relates to what is impossible to express as a computation, or uncomputability. This limitation can seem esoteric and bizarre when first described. However, uncomputability turns out to have some important practical implications.

A common first reaction to the idea of uncomputability is to reject it. What can it possibly mean to say that there are problems that can’t be computed? We can readily grasp the idea that it might be hard to capture requirements, hard to write correct programs, hard to ensure there are sufficient resources to execute the program. But the idea that there are well-specified problems that are nevertheless impossible to compute seems to violate what we’ve implied previously. As we first considered the digital world (chapter 2) we saw that we could use bits to represent analog data; no matter how complex the wave form, if we were willing to use enough bits we could produce as good a copy as we wanted. Similarly, when we examined programs and processes (chapter 3) we saw that we could build up very elaborate processes from many tiny steps. So after all of that rah-rah positive attitude about computing, it seems like a real step backward—a contradiction, even—to say that there are things we just can’t do.

Formal Logic

Indeed, if we were to assert that everything is computable, we would be in distinguished company. Famed mathematician David Hilbert made a comparable incorrect assumption in 1928 when he posed his “decision problem” (which sounds much more impressive in the original German, where it is “Entscheidungsproblem”).

Part of the intellectual ferment at the time was an effort to build all of mathematics from a foundation of logic. Logic is the study of valid reasoning, and its relevance for this discussion is in the idea of formal logic. Formal logic starts from distinguishing the form of an argument from its particular content. The form is the structure or skeleton of the argument; in contrast, the content is what the argument is about.

For example, we can make a variety of claims about a subject like elephants, some of which will be valid and some of which will be invalid. Similarly, we can make a variety of claims about a subject like strawberries, with some being valid while others are invalid. The key insight of formal logic is that the validity of a statement may be determined by its form, not its content. What does that mean? There are circumstances in which a claim about strawberries is related to our knowledge about strawberries: for example, “Ripe strawberries are red.” That particular piece of knowledge is relevant only to strawberries and their color; it doesn’t let us determine much about anything else.

In contrast, consider the two statements: “No strawberry is an elephant” and “No elephant is a strawberry.” Taken together, these two statements and the rules of formal logic allow us to conclude that “nothing is both an elephant and a strawberry.” Better yet, that same structure is applicable to something other than elephants and strawberries. For example, we can also reason about the unrelatedness of giraffes and grapes, or indeed even move away from animals and fruits to (say) cars and vegetables.

Seen from the perspective of everyday experience, it can seem quite peculiar and excessively philosophical to make this separation between form and content. However, with the benefit of hindsight we can appreciate that it’s exactly this bizarre-seeming emphasis on abstraction and rules that led to today’s computer age. Computers are very much embodiments of formal logic, applicable to many different kinds of content.
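As a loose illustration of that separation, here is a minimal Python sketch: the form of the argument is written once, and the content, whether elephants and strawberries or cars and vegetables, is just data plugged into it. (The example sets are invented for illustration.)

```python
# The form of the argument, written once: from "no A is a B" and
# "no B is an A," conclude that nothing is both an A and a B.
# The content is just whatever sets we plug in.

def nothing_is_both(a_things, b_things):
    return len(set(a_things) & set(b_things)) == 0

print(nothing_is_both({"elephant"}, {"strawberry"}))           # True
print(nothing_is_both({"giraffe"}, {"grape"}))                 # True
print(nothing_is_both({"sedan", "coupe"}, {"carrot", "pea"}))  # True
```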

No Solution to Hilbert’s Problem

To return to Hilbert’s ambition: the thinking at the time was roughly that a mathematical system could start with a few simple assumptions (what mathematicians would call axioms) and some combining rules. The rules would allow those axioms to produce additional logical statements that were also true (proven) as a consequence of the axioms. With the right axioms and rules in place, all of mathematics could follow as consequences.

In that context, Hilbert posed the task of constructing a decision process that could be given a statement in a particular kind of logic and answer “yes” or “no” depending on whether the statement was or wasn’t reachable by applying the rules to the axioms. Computer science arguably dates its birth to this problem. Two foundational figures of computer science—Alan Turing and Alonzo Church—independently used their own computational models to explore this problem. We have already encountered Turing’s computational model in the form of the Turing machine when we first looked at processes (chapter 3). Then we saw Church’s computational model in the form of lambda calculus, when we looked at names (chapter 4). As we saw there, lambda calculus is not at all machinelike. Instead, it “operates” on the basis of substituting values for names.

Remarkably, despite the surface differences in their approaches, Church and Turing reached the same conclusion: Hilbert’s problem is not solvable. Church and Turing in turn were building on a startling discovery that yet another logician (Kurt Gödel) had made a few years earlier about the intrinsic limits of mathematics and logic.

In all of these cases, the limits arise from applying the system to itself—that is, the limits of a universal system arise from the ability of a universal system to turn a mirror on itself. The next few sections develop the slightly convoluted thinking required to demonstrate this limit. If you like logic puzzles, this is great stuff; if you’re not all that keen on logic, you may want to skip forward a few sections to “The Halting Problem,” where we talk about what this means for real-world systems.

Russell’s Paradox

To work our way up to this sort of self-mirroring and the problems that arise, let’s consider an even earlier logic problem called Russell’s paradox. Russell’s paradox arises from the work of Bertrand Russell, yet another famous logician and philosopher who was a contemporary of Hilbert, Gödel, Church, and Turing.

One way of talking about Russell’s paradox involves the clean-shaven men in a small town with a single male, clean-shaven barber. The barber shaves all those men in town, and only those men, who do not shave themselves.

The paradox arises from trying to classify the barber within the stated rules:

•  If he shaves himself, then by the rule he is not shaved by the barber; but he is the barber, so he doesn’t shave himself. Or

•  If he is shaved by the barber, then (being the barber) he shaves himself; but by the rule, men who shave themselves are not shaved by the barber.

With either decision, we arrive at a contradiction. How annoying!
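We can even watch the contradiction happen mechanically. In this minimal Python sketch, which assumes nothing beyond the rule as stated, the question “does the barber shave himself?” is defined in terms of its own negation, so any attempt to answer it chases itself forever.

```python
def shaved_by_barber(shaves_self):
    # The town rule: the barber shaves exactly those men who do
    # not shave themselves.
    return not shaves_self()

def barber_shaves_himself():
    # For the barber, "shaves himself" and "is shaved by the barber"
    # are the same question, so the definition refers to itself.
    return shaved_by_barber(barber_shaves_himself)

# barber_shaves_himself()  # RecursionError: no consistent answer exists
```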

We can acknowledge that this is a paradox, but it seems a bit pointless and pedantic. We might well wonder about its significance to the problem of computability that we are trying to approach. Part of the answer is that the elements in play here will also matter for computability. In particular, to have this paradox we need to have three elements together:

  1. Universality (a rule that applies to all the clean-shaven men);
  2. Negation (those who shave themselves are not shaved by the barber);
  3. Self-reference (men who shave themselves).

If we eliminate any one of these elements, the paradox disappears:

•  If we eliminate universality, the barber can simply be an exception to the rule.

•  If we eliminate negation, the barber can be both shaved by himself and shaved by the barber.

•  Finally, if we eliminate self-reference, then we would simply be discussing the set of men who are or aren’t shaved by the barber, which can certainly include the barber.

Halting vs. Diverging

Russell’s paradox is a fine logic puzzle, but why does it have any relevance to computability? We are going to take those same ideas of universality, negation, and self-reference and apply them to the “answerability” of computational questions.

We start with the idea that each process either gives an answer or it doesn’t. If a process gives an answer, we say that it halts—it doesn’t take any more steps after it comes up with the answer. If a process never gives an answer, we say it diverges—that is, it just fiddles around taking meaningless steps endlessly.
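In code the distinction is easy to illustrate; a minimal sketch:

```python
def halting_process():
    # Takes a few steps, produces an answer, and stops: it halts.
    return 2 + 2

def diverging_process():
    # Loops endlessly and never produces an answer: it diverges.
    while True:
        pass
```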

Then we create a process about processes, with a special kind of “clairvoyant” quality. This clairvoyant process can be given a program as its input, and will determine (somehow) whether that program’s corresponding process halts or diverges.

Let’s pause here and consider what we’ve done. It doesn’t feel like we’ve done anything wrong—yet. We’ve just said that we are going to build a program that looks at other programs. Examining a program to determine whether it halts or not seems like something useful that a program can do, much the same way that cutting hair is something useful that a person can do. Indeed, there are real-world examples of programs that examine other programs to determine if the examined programs contain errors. We can view this clairvoyant program as a very simple abstract example of the larger class of programs that examine programs.

Determining whether a program halts is less familiar than cutting hair, but otherwise nothing here seems alarming. Just as it might take a lot of training to cut hair well, it might take a lot of sophistication to build this clairvoyant program. But in the best traditions of mathematics, we’ll just assume we can solve the problem rather than worrying about how!

Building a Paradox

For the next step in our pursuit of paradox, we build a negated version of that clairvoyant process. The negated clairvoyant process halts if the input process diverges and diverges if the input process halts. In contrast to the unknown complexity of building a clairvoyant process, the negation step seems easy and obviously implementable. Whatever you were going to do, just do the opposite.
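Here is a minimal Python sketch of the construction. The function halts is the assumed clairvoyant process; it is purely hypothetical, and the whole point of the argument is that it cannot actually be written. We give it two parameters, the program to examine and the input that program will receive, so that the self-application step in the next paragraph can be expressed.

```python
def halts(program, argument):
    # The assumed clairvoyant process: somehow decides whether
    # program(argument) would ever finish. Purely hypothetical;
    # the argument we are building shows it cannot be written.
    ...

def negated_clairvoyant(program):
    # Do the opposite of whatever the clairvoyant process predicts
    # the program would do when given itself as input.
    if halts(program, program):
        while True:   # the examined program would halt, so we diverge
            pass
    else:
        return        # the examined program would diverge, so we halt

# Next we ask: what does negated_clairvoyant(negated_clairvoyant) do?
```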

The final step in assembling this paradox is to ask what happens when the negated clairvoyant process is given itself as input. After all, the negated clairvoyant process is just a process, and its corresponding program can be examined just like any other program to determine whether it halts or diverges.

But the situation is dire when we do apply the negated clairvoyant process to itself. If the negated clairvoyant process halts when given itself, then it diverges; but if it diverges when given itself, it halts. Ugh! There is a contradiction, which we can trace back to our assumption that we could build the clairvoyant process.

Once again we used the combination of universality, negation, and self-reference to achieve a contradictory result.

The Halting Problem

What does this contradiction mean? Our seemingly harmless assumption that we could determine the halting behavior of a program now looks like it was the first step on a road to ruin. We’ve now exposed what computer scientists call the halting problem.

Its real-world implication is that we have to be a little concerned about whether our computational system is powerful enough to let us play this trick. If this trick is possible, then in general we can’t tell whether a program will work correctly. We know that any universal model of computation will have this problem. After all, if a system is sufficiently powerful to write any program, then it’s also powerful enough that we can write the program(s) required to play this trick on any supposedly clairvoyant program.

If we really want to be able to guarantee whether programs will work, we can weaken the model of computation so that it’s not universal, and then perhaps we can’t build the contradictory example. But people who have tried that path have found that it is frustrating to write programs in such a mode—the elements that give a language expressive power are the same as the elements that are problematic in terms of predicting execution.

Indeed, a part of what Gödel proved in the larger realm of mathematics is that there is essentially a trade-off between completeness and consistency. If a mathematical system is powerful enough to prove all true statements about arithmetic, then it must also prove some statements that contradict each other. But if the system proves only statements that are consistent with one another, then there are some true statements about arithmetic that it cannot prove.

The halting problem itself is narrower than Gödel’s much broader concerns about mathematical systems, but it still feels very philosophical. At this point it seems unlikely to be of any practical concern. However, we will see that it is quite important in chapter 22 when we start to consider defending against attackers.