A. Exiled Theory

Among some of our engineering colleagues, "theory" has a bad reputation, seeming to have connotations of "impractical" and "divorced from reality." This reputation is unfortunate, particularly in computer science. Computation itself is an abstract, mathematical concept. The rigor of mathematics lets us talk precisely about it. In this sense, "theoretical" computer science can lead us toward, rather than away from, the heart of what's really going on here.

The tools of theoretical computer science help illuminate many aspects of the security craft. In this appendix, we provide deeper discussion of some of the topics in this space, broached briefly earlier in the book.

A.1 Relations, Orders, and Lattices

The primary security models in this Orange Book world depend on the mathematical concepts of orders and lattices. We quickly review these definitions here.

A relation R on a set S is a subset of S × S, although we usually write it in infix notation. That is, we write aRb when a is related to b under R.

An order on a set is a relation R that satisfies these properties:

- Reflexivity: xRx for every x.
- Antisymmetry: if xRy and yRx, then x = y.
- Transitivity: if xRy and yRz, then xRz.

A nice intuitive example of an order is ≤ on the integers.

We say that an order is total when, for any distinct x, y, we have either xRy or yRx. (Again, think of ≤ on the integers.) In contrast, a partial order does not require that all distinct pairs be orderable. Some pairs can be incomparable.

(If one wants to be pedantic about it, a total order is merely a special case of a partial order.)
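
For a concrete feel, here is a minimal Python sketch of a partial order: divisibility on the positive integers, under which some pairs are incomparable. (The function names are ours, just for illustration.)

    def divides(a: int, b: int) -> bool:
        """The relation aRb, here meaning "a divides b"."""
        return b % a == 0

    def comparable(a: int, b: int) -> bool:
        """In a partial order, x and y are comparable if xRy or yRx."""
        return divides(a, b) or divides(b, a)

    print(comparable(2, 8))   # True:  2 divides 8
    print(comparable(4, 6))   # False: neither divides the other; 4 and 6 are incomparable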

In a partial order, an element c is the least upper bound of elements a and b if c satisfies these properties (writing ≤ for the order relation):

- a ≤ c;
- b ≤ c;
- if d is any other upper bound of a and b (that is, a ≤ d and b ≤ d), then c ≤ d.

Intuitively, the first two conditions mean that c is indeed an upper bound of a and b; the third condition means that it is "smaller" than any other upper bound of a and b. We can similarly define a greatest lower bound.

A lattice is a partially ordered set such that any two elements have a least upper bound and a greatest lower bound. Infinite lattices can be a bit strange to think about. However, suppose instead that we have a finite one, built from taking the transitive closure of a simple relation, and we draw it as a directed graph:

The resulting structure will actually look a bit like what we call a lattice in ordinary English. Each pair of nodes needs to have a unique least upper bound, which forces us to have a regular lattice-work structure. The graph itself will have distinctive top and bottom elements. Figure A.1 sketches these concepts.

Figure A.1. This directed graph is an example of a simple lattice, where n1 ≤ n2 if and only if a directed path exists from n1 to n2. We see that node e is the greatest lower bound of nodes b and c. Nodes e, h, i, and j are all lower bounds of these two nodes, but h ≤ e, i ≤ e, and j ≤ e.

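
To make least upper bounds concrete, here is a minimal Python sketch that computes them in a small finite poset. The four-element poset below is a made-up example, not the lattice of Figure A.1.

    # A four-element poset {bottom, a, b, top}, given by its "less than or equal to"
    # relation, already reflexively and transitively closed.
    ELEMENTS = {"bottom", "a", "b", "top"}
    LEQ = {(x, x) for x in ELEMENTS} | {
        ("bottom", "a"), ("bottom", "b"), ("bottom", "top"),
        ("a", "top"), ("b", "top"),
    }

    def leq(x, y):
        return (x, y) in LEQ

    def least_upper_bound(a, b):
        ubs = [c for c in ELEMENTS if leq(a, c) and leq(b, c)]
        # The least upper bound is the upper bound that is <= every other upper bound.
        least = [c for c in ubs if all(leq(c, d) for d in ubs)]
        return least[0] if least else None   # None would mean "no LUB": not a lattice

    print(least_upper_bound("a", "b"))        # top
    print(least_upper_bound("bottom", "a"))   # a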

A.2 Functions

A mathematical function f from a set D to a set R is a map that takes each element x ∈ D to some element f(x) ∈ R. A partial function can leave some f(x) undefined. The set D here is called the domain of f; the set R is its range. Sometimes, this is all expressed mathematically as

f : D → R.

Sometimes, the value f(x) ∈ R is said to be the image of x under f; the value x is said to be the preimage of the value f(x).

There are no requirements that D and R be distinct; folks often talk about just one set (e.g., "a function on the integers") when D = R.

We sometimes distinguish some special properties of functions. For example, a function f is one-to-one when distinct elements of D always map to distinct elements of R; it is onto when every element of R is the image of some element of D. A function that is both one-to-one and onto gives a one-to-one correspondence between D and R.

We might blunder and somehow manage to introduce an f such that it's not clear which element of R some x maps to. In this case, our function f is not well defined.
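
A small Python sketch may help pin down these terms; here we treat a finite (partial) function as a dictionary from its domain to its range. The particular sets and values are made up for illustration.

    # A partial function f on the domain D = {1, 2, 3}: f(3) is left undefined.
    D = {1, 2, 3}
    f = {1: "a", 2: "b"}

    image_of_2 = f[2]                                     # "b" is the image of 2 under f
    preimages_of_b = [x for x in D if f.get(x) == "b"]    # [2]: the preimage of "b"
    undefined_at = [x for x in D if x not in f]           # [3]: where f is partial

    print(image_of_2, preimages_of_b, undefined_at)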

A.3 Computability Theory

Given a function f : D → R, our first reaction, as computer people, might be to think about how we would compute it: how we would write a program that takes x ∈ D as an input and spits out f(x).

The field of the theory of computation has developed precise mathematical characterizations of what "programs" can do. These characterizations lead to a surprising, counterintuitive result: There exist well-defined functions that cannot be computed.

We quickly present this result.

A.3.1 Things Can Be Uncountable

We start with countability.

Recall the set of natural numbers:

ℕ = {1, 2, …}.

The set ℕ is clearly infinite; other sets are infinite as well. Mathematically, we say that a set S is countable when we can put S in one-to-one correspondence with ℕ. One might ask: Are there any infinite sets that are not countable? The answer is yes. For example, consider the real numbers between 0 and 1:

R = {r ∈ ℝ : 0 < r < 1}.

Since each element of R is between 0 and 1, we can write each in decimal as 0, followed by a decimal point, followed by some number of digits, perhaps a nonterminating sequence of them. We'll also define a Twist() function that twists a digit to be something very different:

Twist(d) = 2 if d ≥ 5;   Twist(d) = 7 if d ≤ 4.

(Any rule that changes each digit into a very different digit will do.)

We can show that R is not countable, using something called Cantor's diagonalization technique. If R were countable, we could enumerate it as r1, r2, … and hit the entire set. We could use this enumeration and the decimal representation to build a table: Row n would contain rn, and column k would contain the kth decimal digit of rn. (See Figure A.2.)

Figure A.2. If the real numbers between 0 and 1 were countable, we could enumerate them as r1, r2, ….


From this table, we can construct another element of R (call it s) by going along the diagonal but twisting things at each step—we let the nth digit of s be Twist(d), where d is the nth digit of rn. (See Figure A.3.) This new number s is clearly in R. But it also cannot be in this table—for if s = rm for some m, what would the mth digit of s be? Cool, eh?

Figure A.3. By moving along the diagonal and twisting the digit there so it's different, we produce a real number s that cannot be in this enumeration.

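
The diagonal construction is easy to mimic in code. The following Python sketch uses the same Twist rule as above on a toy four-entry "enumeration"; the table values are made up purely for illustration.

    def twist(d: int) -> int:
        """Make a digit "very different," as in the text."""
        return 2 if d >= 5 else 7

    def diagonal(table):
        """Given the first n digits of the first n enumerated reals, return n digits
        of a real that differs from the kth enumerated real in its kth digit."""
        return [twist(table[k][k]) for k in range(len(table))]

    # A toy "enumeration": row k holds the first four decimal digits of r_(k+1).
    table = [
        [1, 4, 1, 5],   # 0.1415...
        [7, 1, 8, 2],   # 0.7182...
        [3, 3, 6, 3],   # 0.3363...
        [5, 0, 0, 9],   # 0.5009...
    ]
    print(diagonal(table))   # [7, 7, 2, 2]: differs from row k in digit k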

A.3.2 Things Can Be Uncomputable

We can use the same principle to show that there are uncomputable functions. Let P be the set of possible programs and I be the set of possible inputs to these programs. When we get serious about formalizing P and I, we end up with sets that are countable. So, we can use these facts to build a table: Row n talks about program pn; column k talks about what happens when we run programs on input ik. In entry (n, k), we write Halt or Not Halt, depending on whether pn(ik) halts or runs forever. (See Figure A.4.)

Figure A.4. Because the Turing machines are countable, we can enumerate them (down the rows here). Because their possible inputs are countable, we can enumerate them as well (across the columns here). Each machine on each input either halts or doesn't; in this table, we show what happens.


We can define a new Twist() function:

Twist(Halt) = Not Halt;   Twist(Not Halt) = Halt.

From this table, we can construct a function f by going along the diagonal but twisting things at each step:

f(in) = Twist(pn(in)).

(See Figure A.5.)

Figure A.5. By moving along the diagonal and twisting the entry there so it's different, we produce a function f that cannot be computed by a Turing machine in this enumeration.


Semantically, this function f returns Halt on in exactly when pn does not halt on input in. Consequently, no program can compute f! All possible programs appear in this table. So, if f were computable, it must be computed by pm for some m. However, how can pm(im) behave the same as f(im)? If pm(im) halts, then f(im) must say Not Halt, so pm ≠ f; if pm(im) does not halt, then f(im) must halt, so pm ≠ f. Cool, eh?
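
The contradiction can also be sketched in code. The halts() oracle below is hypothetical (no such program can exist; that is the point), and the rest is the classic "twisted diagonal" program built on top of it.

    def halts(program_source: str, program_input: str) -> bool:
        """Hypothetical oracle: True iff the program halts on the input.
        No real implementation can exist; that is what we are proving."""
        raise NotImplementedError

    def paradox(program_source: str) -> None:
        """The "twisted diagonal": do the opposite of what halts() predicts
        for this program run on its own source."""
        if halts(program_source, program_source):
            while True:       # predicted to halt, so loop forever
                pass
        return                # predicted to loop forever, so halt

    # Feeding paradox its own source code: it can neither halt nor loop without
    # contradicting the oracle's answer, so the oracle halts() cannot exist.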

The uncomputable problem of deciding, yes or no, whether a given program halts on a given input is called the Halting Problem. One can use this idea as a springboard into a world of fascinating theoretical and practical work on uncomputable functions. (Sipser's Introduction to the Theory of Computation covers these problems in detail [Sip97].) As a young postdoc, one of us was once asked to write a program (for the U.S. Postal Service) that essentially solved the Halting Problem. Citing Alan Turing's classic 1930s paper [Tur37] in our final report [Smi05] was cause for much mirth (at least among the authors of the report).

A.4 Frameworks

Section A.3 showed how functions exist that cannot be computed. This existence can be useful when we want to talk rigorously about hard tasks: For example, we can show that, in order to carry out some task, we would have to solve the Halting Problem. This is an example of a computation-theoretic argument framework.

Theory also gives us other frameworks to reason about things. Among computable functions, we can start rigorously reasoning about the time and space resources (and types of computational devices) it takes to compute them. This theory gives us a way to talk about the complexity of computable problems. Informally, we say a problem has complexity O(f(n)) if the resources it requires to solve an instance of size n grow no faster than f(n), asymptotically. A problem has complexity Ω(f(n)) if the required resources grow at least as fast as f(n), asymptotically. (Note, however, that complexity theory usually assumes that the problem's parameters are written down in binary.)
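
That parenthetical matters. As a quick Python illustration (our own, with a made-up example number), trial-division factoring of N takes roughly √N steps, which sounds tame until we remember that the input is the roughly log2 N bits of N, making the work exponential in the input size.

    import math

    def smallest_factor(N: int) -> int:
        """Trial division: about sqrt(N) steps in the worst case."""
        d = 2
        while d * d <= N:
            if N % d == 0:
                return d
            d += 1
        return N   # N is prime

    N = (1 << 31) - 1                         # a 31-bit prime (2**31 - 1)
    print(N.bit_length(), math.isqrt(N))      # 31 bits; about 46,340 trial divisions
    print(smallest_factor(N))                 # 2147483647: no nontrivial factor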

The resulting framework gives another way to talk about hard tasks. For example, we can show that doing some task requires computing a function that, although computable, takes more resources than are feasible. This is an example of a complexity-theoretic argument framework.

Many aspects of cryptography rest on foundations even weaker than complexity theory. For one example, no clear complexity-theoretic bound underlies the security of SHA-1. For another, everyone suspects that factoring is hard, but no one really knows; furthermore, breaking RSA might be even easier than factoring. Thus, we also sometimes see a tacit "cryptographic" framework—where one simply hopes that certain key functions are intractable. (We then argue that breaking our scheme requires breaking one of these hopefully intractable functions.)

Chapter 7 discussed yet another framework: the use of information theory to characterize the raw information present in a message. If the information isn't there, then the adversary can't get it. However, if the information is there, that doesn't mean that the adversary can get it. When it comes to considering computation in the real world, information theory has a glaring deficiency—it neglects to take into account the feasibility (or even the possibility) of actually extracting the information from the message.

Conundrum

As a thought exercise about these frameworks, suppose that Alice wants to send a 20-character ASCII message M to Bob. She can choose many encodings:

(The satisfiability problem comes from complexity theory: One is given a Boolean formula consisting of ANDs and ORs of variables and their negations and needs to answer: Is there a truth assignment to the variables that satisfies this formula?)
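
As a toy Python illustration of satisfiability (our own example formula), the three-clause formula below is satisfiable, and a brute-force search over all truth assignments finds the satisfying assignments; the catch is that brute force takes 2^n tries for n variables.

    from itertools import product

    def formula(x: bool, y: bool, z: bool) -> bool:
        # (x OR y) AND ((NOT x) OR y) AND ((NOT y) OR z)
        return (x or y) and ((not x) or y) and ((not y) or z)

    # Brute force: try all 2**3 truth assignments.
    satisfying = [bits for bits in product([False, True], repeat=3) if formula(*bits)]
    print(satisfying)   # [(False, True, True), (True, True, True)]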

How much information about M is present in each of Alice's formats? Information theory tells us that M1, M2, and M3 contain all the information. However, decoding M1 is intractable, and decoding M2 is, in general, not even computable. If Alice wants to minimize information exposed to an eavesdropper, she should choose M4—even though that gives away half the message right away; any sensible implementer would choose M3.

(Whenever we asked a local theoretician about this problem, he told us we were giving him a headache.)

A.5 Quantum Physics and Quantum Computation

Chapter 8 talked about how, if we use the properties of quantum mechanics to build new kinds of computers, we can factor efficiently—and thus break RSA. This section gives some more detail.

A.5.1 Evolution of Physics

To tell this story, let's broaden our scope from cryptography to the universe. How does the universe work?

Throughout history, humans have wanted a mathematical model that seems "natural" and "intuitive" and that also actually explains the way the universe works. Searching for such a model raises deeper philosophical issues. Does the universe work this way because the model is natural? Or does the model seem natural (to us) because that's the way the universe works?

Newtonian mechanics was a model that proved rather satisfying. It explained how balls bounced. It gave us calculus. Unfortunately, the Newtonian model did not quite explain all observable phenomena. This shortfall gave rise, in the early part of the twentieth century, to relativity: a clean model that explains such things as how the measured speed of light remains constant. Relativity isn't quite as "obvious" as the Newtonian model, but if one accepts a few things on faith, then it's still somewhat intuitive.

However, when scientists started examining the physics of very small stuff, things got strange. Explaining this strangeness led to a new model: quantum mechanics. The quantum model is elaborate, bizarre, and unintuitive—there is no "obvious" reason why the universe should work that way. The model appears sufficiently artificial and complex that a graduate student who proposed this as a theory to his or her adviser would probably be gently encouraged to go think a bit more.

However strange, the quantum model is the simplest model that we've come up with that actually explains the observed phenomena. We now delve into this strange world.

A.5.2 Quantum Mechanics

We start with a simplified sketch. Consider an object O that might be in any state in some set SO. For example, O might be a photon, with SO being various polarizations; O might be a 1-bit register, with SO = {0, 1}. For simplicity, we'll assume that SO is finite. Intuitively, we think of the object O as actually being in exactly one state s ∈ SO. For example, the photon has a vertical polarization right now, or the bit equals 1.

The Natural Intuitive Model

We might describe the state of the object O with a vector, with an entry vs for each s ∈ SO. The entry is 1 for the state s that O actually is in and 0 for all other entries.

Perhaps we're not certain which state the object is in. In this case, we might assign each s ∈ SO a number P(s) describing the probability that object O is in state s. For each s, P(s) is a real number in the range 0 ≤ P(s) ≤ 1; over all s ∈ SO, the P(s) values sum to 1. We can extend our vector description to handle this uncertainty, simply by setting each entry vs = P(s). The object O is in exactly one of these states, but we don't know which one.

Suppose instead that we have two of these objects: O1 and O2. If these two objects interact, we simply need to figure out what state each was in beforehand and then what state each goes to after the interaction. If we have uncertainty about what states they are in, we simply need to work through the probabilities. That is, we work through each choice of s1 ∈ SO1 and s2 ∈ SO2.

For example, suppose that t1 and t2 are the new states of O1 and O2, respectively, when O1 is in s1 and O2 is in s2. If v1 and v2 are the vectors of O1 and O2, respectively, then the probability that O1 is in s1 and O2 is in s2 is v1[s1] · v2[s2]. So, we add v1[s1] · v2[s2] to the probability that O1 goes to t1 and also to the probability that O2 goes to t2. And so on.
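
Here is a small Python sketch of this classical bookkeeping for two independent 1-bit objects; the probability values and the "swap" interaction rule are made up for illustration.

    # Probability vectors over the states {0, 1} for two independent 1-bit objects.
    v1 = [0.25, 0.75]    # O1 is in state 0 with probability 0.25, in state 1 with 0.75
    v2 = [0.50, 0.50]

    def interact(s1, s2):
        """A toy interaction rule: the two objects swap their bits."""
        return s2, s1

    new_joint = [[0.0, 0.0], [0.0, 0.0]]
    for s1 in (0, 1):
        for s2 in (0, 1):
            # The chance of this combination beforehand is v1[s1] * v2[s2];
            # add it to the chance of whatever pair of states the combination goes to.
            t1, t2 = interact(s1, s2)
            new_joint[t1][t2] += v1[s1] * v2[s2]

    print(new_joint)   # [[0.125, 0.375], [0.125, 0.375]]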

The Unnatural Quantum Model

The quantum model is like this intuitive model but branches out in a few surprising directions.

As before, if object O can be in one state from a set SO of states, we can describe its state with a vector v consisting of an entry for each element of SO. However, now, we express uncertainty by using complex numbers[1]. Each such entry is called an amplitude; the squared moduli[2] of the amplitudes sum to 1.

Quantum mechanics adopts the convention of writing states using "| ⟩" notation: for example, |0⟩ and |1⟩ for a 1-bit register. We then describe the state of the object as w|0⟩ + v|1⟩, where w, v ∈ ℂ are the amplitudes. Naively, we might suspect that these amplitudes are like probabilities; if the object has more than one nonzero amplitude, then it is still in one state, but which state that is, is somehow reflected by the amplitude. However, it is stranger than that—more than one nonzero amplitude means that we cannot know what state the object is in, because it literally is in more than one state. This arrangement is called a superposition of states.

An object's ability to be in multiple states simultaneously leads to some surprising things. For one thing, we might wonder why we never actually see an object in more than one state at one time. Something called the collapse of the wave function is responsible for this. If we make an observation of an object that is in a superposition of states, the object collapses into one state. Which state we see is determined probabilistically by the amplitudes—if an object is in w|0⟩ + v|1⟩, the object goes to |0⟩ with probability |w|² and to |1⟩ with probability |v|². What's even stranger here is that the object stays in the state we see. Observing the object changes the state of the object—its state becomes the state we observe.

Being in more than one state at a time also affects what happens when two objects interact. In the simpler uncertainty model, we merely had to think about the probability that objects O1 and O2 were in some particular states and then figure out what happens in that combination. However, with superposition of states, each state in O1's superposition interacts with each state in O2's superposition, all at the same time. Every state in the superposition contributes in an interaction. Furthermore, because these amplitudes are complex, we can end up with constructive interference and destructive interference between states. In fact, these phenomena caused the optical conundrums that helped motivate quantum theory in the first place.
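
A short Python sketch (using NumPy; the particular amplitude values are our own illustration) of the difference: amplitudes are complex, observation probabilities are their squared moduli, and amplitudes, unlike probabilities, can cancel.

    import numpy as np

    # A qubit in the superposition w|0> + v|1>, with complex amplitudes.
    w, v = (1 + 1j) / 2, (1 - 1j) / 2
    print((w * w.conjugate()).real, (v * v.conjugate()).real)   # 0.5 0.5: the observation probabilities

    # Interference: two equal-and-opposite amplitude contributions to an outcome cancel,
    # something nonnegative probabilities could never do.
    paths_to_1 = np.array([0.5 + 0.5j, -0.5 - 0.5j])
    print(abs(paths_to_1.sum()) ** 2)                           # 0.0: destructive interference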

A.5.3 The Root-Not Gate

To understand the computational implications of this model, let's consider a new kind of logic gate: ROOT-NOT. (Deutsch gives a nice discussion of this concept [Deu89].) We're familiar with the comfortable world of Boolean gates. Can we build a Boolean gate G such that G composed with G is NOT? Can we build a Boolean gate G that flips coins?

Our initial answer is: of course not! If G ∘ G = NOT, then we must have:

G(G(0)) = 1;   G(G(1)) = 0.

We can then work through the possibilities. If G(1) = 1, then G(G(1)) = 1, which breaks the rule. So, G(1) = 0. Similarly, we can establish that G(0) = 1. But then G(G(1)) = G(0) = 1, which also breaks the rule. So, we're stuck.

However, suppose that we build gates that use complex amplitudes and superpositions of states and operate on qubits. (The field defines a qubit to be a superposition of |0⟩ and |1⟩.) Then we might build a G that acts on the |0⟩ and |1⟩ components of its input as follows:

G|0⟩ = ((1 + i)/2) |0⟩ + ((1 − i)/2) |1⟩
G|1⟩ = ((1 − i)/2) |0⟩ + ((1 + i)/2) |1⟩

What happens when we run G on |0⟩ and then run G again on the result?

G(G(|0⟩)) = ((1 + i)/2) G|0⟩ + ((1 − i)/2) G|1⟩
          = (((1 + i)² + (1 − i)²)/4) |0⟩ + (2(1 + i)(1 − i)/4) |1⟩
          = 0 · |0⟩ + 1 · |1⟩
          = |1⟩

When the wave function collapses, we see 1! Similarly, G(G(|1⟩)) yields |0⟩. G composed with itself is NOT—hence G is ROOT-NOT.

What happens if we feed |0⟩ to G and observe the result? If we look at the amplitudes, we see that we will observe |0⟩ with probability 1/2 and |1⟩ with probability 1/2. So, one stage of G flips a coin, but running two stages of G inverts the original input. Yes, this is counterintuitive. But yes, this is how the universe works.
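
We can check this arithmetic directly. The following Python sketch (using NumPy) builds the matrix of amplitudes for the G given above and verifies both the coin flip after one stage and the NOT after two.

    import numpy as np

    # The ROOT-NOT gate as a matrix of amplitudes (the G defined above).
    G = np.array([[1 + 1j, 1 - 1j],
                  [1 - 1j, 1 + 1j]]) / 2

    ket0 = np.array([1, 0])
    once = G @ ket0
    twice = G @ once

    print(np.abs(once) ** 2)   # [0.5 0.5]: one stage of G is a fair coin flip
    print(twice)               # [0.+0.j 1.+0.j]: two stages of G turn |0> into |1>
    print(G @ G)               # the NOT matrix [[0, 1], [1, 0]] (as complex numbers)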

This ROOT-NOT gate gives a flavor of the power of quantum computing. Compare things like ROOT-NOT to standard Boolean gates—we can do things in a few steps that are much more complicated in the traditional model.