A. Exiled Theory

Among some of our engineering colleagues, "theory" has a bad reputation, seeming to have connotations of "impractical" and "divorced from reality." This reputation is unfortunate, particularly in computer science. Computation itself is an abstract, mathematical concept. The rigor of mathematics lets us talk precisely about it. In this sense, "theoretical" computer science can lead us toward, rather than away from, the heart of what's really going on here.

The tools of theoretical computer science help illuminate many aspects of the security craft. In this appendix, we provide deeper discussion of some of the topics in this space, broached briefly earlier in the book.

A.1 Relations, Orders, and Lattices

The primary security models in this Orange Book world depend on the mathematical concepts of orders and lattices. We quickly review these definitions here.

A relation R on a set S is a subset of S × S, although we usually write it in infix notation. That is, we write aRb when a is related to b under R.

An order on a set is a relation R that satisfies these properties:

- Reflexivity: xRx for every x.
- Antisymmetry: if xRy and yRx, then x = y.
- Transitivity: if xRy and yRz, then xRz.

A nice intuitive example of an order is ≤ on the integers.

We say that an order is total when, for any distinct x, y, we have either xRy or yRx. (Again, think of ≤ on the integers.) In contrast, a partial order does not require that all distinct pairs be orderable. Some pairs can be incomparable.

(If one wants to be pedantic about it, a total order is merely a special case of a partial order.)
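
For a concrete feel, here is a minimal Python sketch of a partial order: divisibility on the positive integers, under which some pairs are incomparable. (The function names are ours, just for illustration.)

    def divides(a: int, b: int) -> bool:
        """The relation aRb, here meaning "a divides b"."""
        return b % a == 0

    def comparable(a: int, b: int) -> bool:
        """In a partial order, x and y are comparable if xRy or yRx."""
        return divides(a, b) or divides(b, a)

    print(comparable(2, 8))   # True:  2 divides 8
    print(comparable(4, 6))   # False: neither divides the other; 4 and 6 are incomparable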

In a partial order, an element c is the least upper bound of elements a and b if c satisfies these properties (writing ≤ for the order relation):

- a ≤ c;
- b ≤ c;
- if d is any other upper bound of a and b (that is, a ≤ d and b ≤ d), then c ≤ d.

Intuitively, the first two conditions mean that c is indeed an upper bound of a and b; the third condition means that it is "smaller" than any other upper bound of a and b. We can similarly define a greatest lower bound.

A lattice is a partially ordered set such that any two elements have a least upper bound and a greatest lower bound. Infinite lattices can be a bit strange to think about. However, suppose instead that we have a finite one, built from taking the transitive closure of a simple relation, and we draw it as a directed graph:

The resulting structure will actually look a bit like what we call a lattice in ordinary English. Each pair of nodes needs to have a unique least upper bound, which forces us to have a regular lattice-work structure. The graph itself will have distinctive top and bottom elements. Figure A.1 sketches these concepts.

Figure A.1. This directed graph is an example of a simple lattice, where n1 ≤ n2 if and only if a directed path exists from n1 to n2. We see that node e is the greatest lower bound of nodes b and c. Nodes e, h, i, and j are all lower bounds of these two nodes, but h ≤ e, i ≤ e, and j ≤ e.

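
To make least upper bounds concrete, here is a minimal Python sketch that computes them in a small finite poset. The four-element poset below is a made-up example, not the lattice of Figure A.1.

    # A four-element poset {bottom, a, b, top}, given by its "less than or equal to"
    # relation, already reflexively and transitively closed.
    ELEMENTS = {"bottom", "a", "b", "top"}
    LEQ = {(x, x) for x in ELEMENTS} | {
        ("bottom", "a"), ("bottom", "b"), ("bottom", "top"),
        ("a", "top"), ("b", "top"),
    }

    def leq(x, y):
        return (x, y) in LEQ

    def least_upper_bound(a, b):
        ubs = [c for c in ELEMENTS if leq(a, c) and leq(b, c)]
        # The least upper bound is the upper bound that is <= every other upper bound.
        least = [c for c in ubs if all(leq(c, d) for d in ubs)]
        return least[0] if least else None   # None would mean "no LUB": not a lattice

    print(least_upper_bound("a", "b"))        # top
    print(least_upper_bound("bottom", "a"))   # a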

A.2 Functions

A mathematical function f from a set D to a set R is a map that takes each element x ∈ D to some element f(x) ∈ R. A partial function can leave some f(x) undefined. The set D here is called the domain of f; the set R is its range. Sometimes, this is all expressed mathematically as

f : D → R.

Sometimes, the value f(x) ∈ R is said to be the image of x under f; the value x is said to be the preimage of the value f(x).

There are no requirements that D and R be distinct; folks often talk about just one set (e.g., "a function on the integers") when D = R.

We sometimes distinguish some special properties of functions. For example, a function f is one-to-one when distinct elements of D always map to distinct elements of R; it is onto when every element of R is the image of some element of D. A function that is both one-to-one and onto gives a one-to-one correspondence between D and R.

We might blunder and somehow manage to introduce an f such that it's not clear which element of R some x maps to. In this case, our function f is not well defined.
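
A small Python sketch may help pin down these terms; here we treat a finite (partial) function as a dictionary from its domain to its range. The particular sets and values are made up for illustration.

    # A partial function f on the domain D = {1, 2, 3}: f(3) is left undefined.
    D = {1, 2, 3}
    f = {1: "a", 2: "b"}

    image_of_2 = f[2]                                     # "b" is the image of 2 under f
    preimages_of_b = [x for x in D if f.get(x) == "b"]    # [2]: the preimage of "b"
    undefined_at = [x for x in D if x not in f]           # [3]: where f is partial

    print(image_of_2, preimages_of_b, undefined_at)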

A.3 Computability Theory

Given a function f : D → R, our first reaction, as computer people, might be to think about how we would compute it: how we would write a program that takes x ∈ D as an input and spits out f(x).

The field of the theory of computation has developed precise mathematical characterizations of what "programs" can do. These characterizations lead to a surprising, counterintuitive result: There exist well-defined functions that cannot be computed.

We quickly present this result.

A.3.1 Things Can Be Uncountable

We start with countability.

Recall the set of natural numbers:

ℕ = {1, 2, …}.

The set ℕ is clearly infinite; other sets are infinite as well. Mathematically, we say that a set S is countable when we can put S in one-to-one correspondence with ℕ. One might ask: Are there any infinite sets that are not countable? The answer is yes. For example, consider the real numbers between 0 and 1:

R = {r ∈ ℝ : 0 < r < 1}.

Since each element of R is between 0 and 1, we can write each in decimal as 0, followed by a decimal point, followed by some number of digits, perhaps a nonterminating sequence of them. We'll also define a Twist() function that twists a digit to be something very different:

Twist(d) = 2 if d ≥ 5;   Twist(d) = 7 if d ≤ 4.

(Any rule that changes each digit into a very different digit will do.)

We can show that R is not countable, using something called Cantor's diagonalization technique. If R were countable, we could enumerate it as r1, r2, … and hit the entire set. We could use this enumeration and the decimal representation to build a table: Row n would contain rn, and column k would contain the kth decimal digit of rn. (See Figure A.2.)

Figure A.2. If the real numbers between 0 and 1 were countable, we could enumerate them as r1, r2, ….


From this table, we can construct another element of R (call it s) by going along the diagonal but twisting things at each step—we let the nth digit of s be Twist(d), where d is the nth digit of rn. (See Figure A.3.) This new number s is clearly in R. But it also cannot be in this table—for if s = rm for some m, what would the mth digit of s be? Cool, eh?

Figure A.3. By moving along the diagonal and twisting the digit there so it's different, we produce a real number s that cannot be in this enumeration.

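
The diagonal construction is easy to mimic in code. The following Python sketch uses the same Twist rule as above on a toy four-entry "enumeration"; the table values are made up purely for illustration.

    def twist(d: int) -> int:
        """Make a digit "very different," as in the text."""
        return 2 if d >= 5 else 7

    def diagonal(table):
        """Given the first n digits of the first n enumerated reals, return n digits
        of a real that differs from the kth enumerated real in its kth digit."""
        return [twist(table[k][k]) for k in range(len(table))]

    # A toy "enumeration": row k holds the first four decimal digits of r_(k+1).
    table = [
        [1, 4, 1, 5],   # 0.1415...
        [7, 1, 8, 2],   # 0.7182...
        [3, 3, 6, 3],   # 0.3363...
        [5, 0, 0, 9],   # 0.5009...
    ]
    print(diagonal(table))   # [7, 7, 2, 2]: differs from row k in digit k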

A.3.2 Things Can Be Uncomputable

We can use the same principle to show that there are uncomputable functions. Let P be the set of possible programs and I be the set of possible inputs to these programs. When we get serious about formalizing P and I, we end up with sets that are countable. So, we can use these facts to build a table: Row n talks about program pn; column k talks about what happens when we run programs on input ik. In entry (n, k), we write Halt or Not Halt, depending on whether pn(ik) halts or runs forever. (See Figure A.4.)

Figure A.4. Because the Turing machines are countable, we can enumerate them (down the rows here). Because their possible inputs are countable, we can enumerate them as well (across the columns here). Each machine on each input either halts or doesn't; in this table, we show what happens.


We can define a new Twist() function:

Twist(Halt) = Not Halt;   Twist(Not Halt) = Halt.

From this table, we can construct a function f by going along the diagonal but twisting things at each step:

f(in) = Twist(pn(in)).

(See Figure A.5.)

Figure A.5. By moving along the diagonal and twisting the entry there so it's different, we produce a function f that cannot be computed by a Turing machine in this enumeration.


Semantically, this function f returns Halt on in exactly when pn does not halt on input in. Consequently, no program can compute f! All possible programs appear in this table. So, if f were computable, it must be computed by pm for some m. However, how can pm(im) behave the same as f(im)? If pm(im) halts, then f(im) must say Not Halt, so pm ≠ f; if pm(im) does not halt, then f(im) must halt, so pm ≠ f. Cool, eh?
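
The contradiction can also be sketched in code. The halts() oracle below is hypothetical (no such program can exist; that is the point), and the rest is the classic "twisted diagonal" program built on top of it.

    def halts(program_source: str, program_input: str) -> bool:
        """Hypothetical oracle: True iff the program halts on the input.
        No real implementation can exist; that is what we are proving."""
        raise NotImplementedError

    def paradox(program_source: str) -> None:
        """The "twisted diagonal": do the opposite of what halts() predicts
        for this program run on its own source."""
        if halts(program_source, program_source):
            while True:       # predicted to halt, so loop forever
                pass
        return                # predicted to loop forever, so halt

    # Feeding paradox its own source code: it can neither halt nor loop without
    # contradicting the oracle's answer, so the oracle halts() cannot exist.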

The uncomputable problem of deciding, yes or no, whether a given program halts on a given input is called the Halting Problem. One can use this idea as a springboard into a world of fascinating theoretical and practical work on uncomputable functions. (Sipser's Introduction to the Theory of Computation covers these problems in detail [Sip97].) As a young postdoc, one of us was once asked to write a program (for the U.S. Postal Service) that essentially solved the Halting Problem. Citing Alan Turing's classic 1930s paper [Tur37] in our final report [Smi05] was cause for much mirth (at least among the authors of the report).

A.4 Frameworks

Section A.3 showed how functions exist that cannot be computed. This existence can be useful when we want to talk rigorously about hard tasks: For example, we can show that, in order to carry out some task, we would have to solve the Halting Problem. This is an example of a computation-theoretic argument framework.

Theory also gives us other frameworks to reason about things. Among computable functions, we can start rigorously reasoning about the time and space resources (and types of computational devices) it takes to compute them. This theory gives us a way to talk about the complexity of computable problems. Informally, we say a problem has complexity O(f(n)) if the resources it requires to solve an instance of size n grow no faster than f(n), asymptotically. A problem has complexity Ω(f(n)) if the required resources grow at least as fast as f(n), asymptotically. (Note, however, that complexity theory usually assumes that the problem's parameters are written down in binary.)
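
That parenthetical matters. As a quick Python illustration (our own, with a made-up example number), trial-division factoring of N takes roughly √N steps, which sounds tame until we remember that the input is the roughly log2 N bits of N, making the work exponential in the input size.

    import math

    def smallest_factor(N: int) -> int:
        """Trial division: about sqrt(N) steps in the worst case."""
        d = 2
        while d * d <= N:
            if N % d == 0:
                return d
            d += 1
        return N   # N is prime

    N = (1 << 31) - 1                         # a 31-bit prime (2**31 - 1)
    print(N.bit_length(), math.isqrt(N))      # 31 bits; about 46,340 trial divisions
    print(smallest_factor(N))                 # 2147483647: no nontrivial factor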

The resulting framework gives another way to talk about hard tasks. For example, we can show that doing some task requires computing a function that, although computable, takes more resources than are feasible. This is an example of a complexity-theoretic argument framework.

Many aspects of cryptography rest on foundations even weaker than complexity theory. For one example, no clear complexity-theoretic bound underlies the security of SHA-1. For another, everyone suspects that factoring is hard, but no one really knows; furthermore, breaking RSA might be even easier than factoring. Thus, we also sometimes see a tacit "cryptographic" framework—where one simply hopes that certain key functions are intractable. (We then argue that breaking our scheme requires breaking one of these hopefully intractable functions.)

Chapter 7 discussed yet another framework: the use of information theory to characterize the raw information present in a message. If the information isn't there, then the adversary can't get it. However, if the information is there, that doesn't mean that the adversary can get it. When it comes to considering computation in the real world, information theory has a glaring deficiency—it neglects to take into account the feasibility (or even the possibility) of actually extracting the information from the message.

Conundrum

As a thought exercise about these frameworks, suppose that Alice wants to send a 20-character ASCII message M to Bob. She can choose many encodings:

(The satisfiability problem comes from complexity theory: One is given a Boolean formula consisting of ANDs and ORs of variables and their negations and needs to answer: Is there a truth assignment to the variables that satisfies this formula?)
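
As a toy Python illustration of satisfiability (our own example formula), the three-clause formula below is satisfiable, and a brute-force search over all truth assignments finds the satisfying assignments; the catch is that brute force takes 2^n tries for n variables.

    from itertools import product

    def formula(x: bool, y: bool, z: bool) -> bool:
        # (x OR y) AND ((NOT x) OR y) AND ((NOT y) OR z)
        return (x or y) and ((not x) or y) and ((not y) or z)

    # Brute force: try all 2**3 truth assignments.
    satisfying = [bits for bits in product([False, True], repeat=3) if formula(*bits)]
    print(satisfying)   # [(False, True, True), (True, True, True)]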

How much information about M is present in each of Alice's formats? Information theory tells us that M1, M2, and M3 contain all the information. However, decoding M1 is intractable, and decoding M2 is, in general, not even computable. If Alice wants to minimize information exposed to an eavesdropper, she should choose M4—even though that gives away half the message right away; any sensible implementer would choose M3.

(Whenever we asked a local theoretician about this problem, he told us we were giving him a headache.)

A.5 Quantum Physics and Quantum Computation

Chapter 8 talked about how, if we use the properties of quantum mechanics to build new kinds of computers, we can factor efficiently—and thus break RSA. This section gives some more detail.

A.5.1 Evolution of Physics

To tell this story, let's broaden our scope from cryptography to the universe. How does the universe work?

Throughout history, humans have wanted a mathematical model that seems "natural" and "intuitive" and that also actually explains the way the universe works. Searching for such a model raises deeper philosophical issues. Does the universe work this way because the model is natural? Or does the model seem natural (to us) because that's the way the universe works?

Newtonian mechanics was a model that proved rather satisfying. It explained how balls bounced. It gave us calculus. Unfortunately, the Newtonian model did not quite explain all observable phenomena. This shortfall gave rise, in the early part of the twentieth century, to relativity: a clean model that explains such things as how the measured speed of light remains constant. Relativity isn't quite as "obvious" as the Newtonian model, but if one accepts a few things on faith, then it's still somewhat intuitive.

However, when scientists started examining the physics of very small stuff, things got strange. Explaining this strangeness led to a new model: quantum mechanics. The quantum model is elaborate, bizarre, and unintuitive—there is no "obvious" reason why the universe should work that way. The model appears sufficiently artificial and complex that a graduate student who proposed this as a theory to his or her adviser would probably be gently encouraged to go think a bit more.

However strange, the quantum model is the simplest model that we've come up with that actually explains the observed phenomena. We now delve into this strange world.

A.5.2 Quantum Mechanics

We start with a simplified sketch. Consider an object O that might be in any state in some set SO. For example, O might be a photon, with SO being various polarizations; O might be a 1-bit register, with SO = {0, 1}. For simplicity, we'll assume that SO is finite. Intuitively, we think of the object O as actually being in exactly one state s ∈ SO. For example, the photon has a vertical polarization right now, or the bit equals 1.

The Natural Intuitive Model

We might describe the state of the object O with a vector, with an entry vs for each s ∈ SO. The entry is 1 for the state s that O actually is in and 0 for all other entries.

Perhaps we're not certain which state the object is in. In this case, we might assign each s ∈ SO a number P(s) describing the probability that object O is in state s. For each s, P(s) is a real number in the range 0 ≤ P(s) ≤ 1; over all s ∈ SO, the P(s) values sum to 1. We can extend our vector description to handle this uncertainty, simply by setting each entry vs = P(s). The object O is in exactly one of these states, but we don't know which one.

Suppose instead that we have two of these objects: O1 and O2. If these two objects interact, we simply need to figure out what state each was in beforehand and then what state each goes to after the interaction. If we have uncertainty about what states they are in, we simply need to work through the probabilities. That is, we work through each choice of s1 ∈ SO1 and s2 ∈ SO2.

For example, suppose that t1 and t2 are the new states of O1 and O2, respectively, when O1 is in s1 and O2 is in s2. If v1 and v2 are the vectors of O1 and O2, respectively, then the probability that O1 is in s1 and O2 is in s2 is v1[s1] · v2[s2]. So, we add v1[s1] · v2[s2] to the probability that O1 goes to t1 and also to the probability that O2 goes to t2. And so on.
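
Here is a small Python sketch of this classical bookkeeping for two independent 1-bit objects; the probability values and the "swap" interaction rule are made up for illustration.

    # Probability vectors over the states {0, 1} for two independent 1-bit objects.
    v1 = [0.25, 0.75]    # O1 is in state 0 with probability 0.25, in state 1 with 0.75
    v2 = [0.50, 0.50]

    def interact(s1, s2):
        """A toy interaction rule: the two objects swap their bits."""
        return s2, s1

    new_joint = [[0.0, 0.0], [0.0, 0.0]]
    for s1 in (0, 1):
        for s2 in (0, 1):
            # The chance of this combination beforehand is v1[s1] * v2[s2];
            # add it to the chance of whatever pair of states the combination goes to.
            t1, t2 = interact(s1, s2)
            new_joint[t1][t2] += v1[s1] * v2[s2]

    print(new_joint)   # [[0.125, 0.375], [0.125, 0.375]]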

The Unnatural Quantum Model

The quantum model is like this intuitive model but branches out in a few surprising directions.

As before, if object O can be in one state from a set SO of states, we can describe its state with a vector v consisting of an entry for each element of SO. However, now, we express uncertainty by using complex numbers[1]. Each such entry is called an amplitude; the squared moduli[2] of the amplitudes sum to 1.

Quantum mechanics adopts the convention of writing states using "| ⟩" notation: for example, |0⟩ and |1⟩ for a 1-bit register. We then describe the state of the object as w|0⟩ + v|1⟩, where w, v ∈ ℂ are the amplitudes. Naively, we might suspect that these amplitudes are like probabilities; if the object has more than one nonzero amplitude, then it is still in one state, but which state that is, is somehow reflected by the amplitude. However, it is stranger than that—more than one nonzero amplitude means that we cannot know what state the object is in, because it literally is in more than one state. This arrangement is called a superposition of states.

An object's ability to be in multiple states simultaneously leads to some surprising things. For one thing, we might wonder why we never actually see an object in more than one state at one time. Something called the collapse of the wave function is responsible for this. If we make an observation of an object that is in a superposition of states, the object collapses into one state. Which state we see is determined probabilistically by the amplitudes—if an object is in w|0⟩ + v|1⟩, the object goes to |0⟩ with probability |w|² and to |1⟩ with probability |v|². What's even stranger here is that the object stays in the state we see. Observing the object changes the state of the object—its state becomes the state we observe.

Being in more than one state at a time also affects what happens when two objects interact. In the simpler uncertainty model, we merely had to think about the probability that objects O1 and O2 were in some particular states and then figure out what happens in that combination. However, with superposition of states, each state in O1's superposition interacts with each state in O2's superposition, all at the same time. Every state in the superposition contributes in an interaction. Furthermore, because these amplitudes are complex, we can end up with constructive interference and destructive interference between states. In fact, these phenomena caused the optical conundrums that helped motivate quantum theory in the first place.
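
A short Python sketch (using NumPy; the particular amplitude values are our own illustration) of the difference: amplitudes are complex, observation probabilities are their squared moduli, and amplitudes, unlike probabilities, can cancel.

    import numpy as np

    # A qubit in the superposition w|0> + v|1>, with complex amplitudes.
    w, v = (1 + 1j) / 2, (1 - 1j) / 2
    print((w * w.conjugate()).real, (v * v.conjugate()).real)   # 0.5 0.5: the observation probabilities

    # Interference: two equal-and-opposite amplitude contributions to an outcome cancel,
    # something nonnegative probabilities could never do.
    paths_to_1 = np.array([0.5 + 0.5j, -0.5 - 0.5j])
    print(abs(paths_to_1.sum()) ** 2)                           # 0.0: destructive interference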

A.5.3 The Root-Not Gate

To understand the computational implications of this model, let's consider a new kind of logic gate: ROOT-NOT. (Deutsch gives a nice discussion of this concept [Deu89].) We're familiar with the comfortable world of Boolean gates. Can we build a Boolean gate G such that G composed with G is NOT? Can we build a Boolean gate G that flips coins?

Our initial answer is: of course not! If G ∘ G = NOT, then we must have:

G(G(0)) = 1;   G(G(1)) = 0.

We can then work through the possibilities. If G(1) = 1, then G(G(1)) = 1, which breaks the rule. So, G(1) = 0. Similarly, we can establish that G(0) = 1. But then G(G(1)) = G(0) = 1, which also breaks the rule. So, we're stuck.

However, suppose that we build gates that use complex amplitudes and superpositions of states and operate on qubits. (The field defines a qubit to be a superposition of |0⟩ and |1⟩.) Then we might build a G that acts on the |0⟩ and |1⟩ components of its input as follows:

G|0⟩ = ((1 + i)/2) |0⟩ + ((1 − i)/2) |1⟩
G|1⟩ = ((1 − i)/2) |0⟩ + ((1 + i)/2) |1⟩

What happens when we run G on |0⟩ and then run G again on the result?

G(G(|0⟩)) = ((1 + i)/2) G|0⟩ + ((1 − i)/2) G|1⟩
          = (((1 + i)² + (1 − i)²)/4) |0⟩ + (2(1 + i)(1 − i)/4) |1⟩
          = 0 · |0⟩ + 1 · |1⟩
          = |1⟩

When the wave function collapses, we see 1! Similarly, G(G(|1⟩)) yields |0⟩. G composed with itself is NOT—hence G is ROOT-NOT.

What happens if we feed |0⟩ to G and observe the result? If we look at the amplitudes, we see that we will observe |0⟩ with probability 1/2 and |1⟩ with probability 1/2. So, one stage of G flips a coin, but running two stages of G inverts the original input. Yes, this is counterintuitive. But yes, this is how the universe works.
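
We can check this arithmetic directly. The following Python sketch (using NumPy) builds the matrix of amplitudes for the G given above and verifies both the coin flip after one stage and the NOT after two.

    import numpy as np

    # The ROOT-NOT gate as a matrix of amplitudes (the G defined above).
    G = np.array([[1 + 1j, 1 - 1j],
                  [1 - 1j, 1 + 1j]]) / 2

    ket0 = np.array([1, 0])
    once = G @ ket0
    twice = G @ once

    print(np.abs(once) ** 2)   # [0.5 0.5]: one stage of G is a fair coin flip
    print(twice)               # [0.+0.j 1.+0.j]: two stages of G turn |0> into |1>
    print(G @ G)               # the NOT matrix [[0, 1], [1, 0]] (as complex numbers)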

This ROOT-NOT gate gives a flavor of the power of quantum computing. Compare things like ROOT-NOT to standard Boolean gates—we can do things in a few steps that are much more complicated in the traditional model.