Why Do We Believe Theorems?
ANDRZEJ PELC
Most common mathematical practice deals with proofs of theorems. As authors, mathematicians1 invent proofs and try to write them down rigorously; as readers, they try to verify and understand proofs of other mathematicians; as referees and journal editors, they assess the interest and value of proofs, and as teachers, they explain proofs to novices.
Why are mathematicians so concerned with proofs? Or, reformulated in a more direct way: why do we prove theorems? This question serves as the title of the paper by Rav [1999]. The most obvious answer is: we prove theorems to convince ourselves and others that they are true. While this is often, indeed, the direct reason for proving a theorem, the convincing power is far from being the unique role of proofs in mathematical practice. Often the new ideas and techniques conveyed by a proof are much more important than the theorem for which the proof was originally invented. Rav formulates it succinctly:
Proofs are for the mathematician what experimental procedures are for the experimental scientist: in studying them one learns of new ideas, new concepts, new strategies—devices which can be assimilated for one’s own research and be further developed. [1999, p. 20]
and illustrates this epistemic function of proofs by well-chosen examples.
Nevertheless, the role of proofs as a means of convincing the mathematical community of the validity of theorems is very important. While proofs can also serve other purposes, only proofs can directly serve this purpose. Since in this paper we discuss the reasons for believing theorems, rather than discussing the epistemic value of proofs, our perspective on proofs is much more restrictive than that of [Rav, 1999]. It dealt with the question ‘Why do we prove theorems?’ and so was concerned with all aspects of proofs, both as ways to convince that the proved theorem is correct, and as repositories of mathematical knowledge. Our question (the title of this paper is deliberately modeled on the title of Rav [1999]) is ‘Why do we believe theorems?’ Thus we are only interested in the ‘convincing’ role of proofs.
Before announcing our position concerning reasons for believing theorems, we need to explain two crucial terms that will be used throughout the paper: proof and derivation. In choosing these particular terms, we follow [Rav, 1999]. (The expressions informal proof instead of proof and formal proof instead of derivation could be also used, as they convey the sense we want to give to those terms, but we find Rav’s terminology more convenient.) By proofs we will mean the arguments used in mathematical practice in order to justify the correctness of theorems. Since this simply reports the meaning adopted by the mathematical community, there is no formal definition of this term. On the other hand, the term derivation is used in its formal logical sense. Recall that a formalized language is described first. It has an appropriate alphabet (containing, among other things, logical symbols). Terms and formulae of this language are defined as specific strings of symbols of this alphabet. Then a formalized theory T is defined in this language by specifying a—possibly infinite—recursive set2 of formulae called axioms that form the basis of the theory. A finite set of transformations called inference rules is specified. These transformations permit us to obtain a new formula from a finite set of formulae. One example of an inference rule is modus ponens. Finally, a (linear) derivation is a finite sequence of formulae such that every term of this sequence is either an axiom or is obtained from a set of earlier terms of this sequence by applying one of the inference rules. A formula of the above formalized language is called a theorem of the theory T if it is the last formula in some derivation. The crucial characteristic of this formal definition is that, because of the fact that the set of axioms is recursive (although possibly infinite), there is a mechanical way to verify whether a given sequence of formulae of the formalized language is a derivation or not. (This should not be confused with the possibility of a mechanical method of verifying whether a given formula is a theorem of T or not. The well-known undecidability theorem states that the latter cannot be verified by a mechanical method for sufficiently strong theories.) This feature of derivations is the reason for their important role in the construction of formalized theories. Once an (alleged) derivation is presented, there is a mechanical way to verify its correctness, and in the case of a positive verdict, to assert that the last formula of the derivation is indeed a theorem of the theory T.
Since the formulation of Hilbert’s program, most of the discussions concerning reasons for confidence in mathematical theorems are conducted along the axis between formalists and their opponents. In a nutshell, the first group asserts that derivations underlying proofs actually carried out in mathematical practice are the reason to believe those theorems, while the second group dismisses or significantly weakens the role of those derivations in building confidence in theorems, and points to other features of proofs, such as the analysis of the meaning of mathematical notions, as well as to social factors, as responsible for belief in mathematical statements. Hence, while the radical interpretation of Hilbert’s program, as expressed in the famous Problem Number 2 stated in Hilbert’s address3 in Paris in 1900, has been shown to be infeasible by the later negative results of Gödel, a more moderate interpretation, trying to harvest confidence gained from some link between proofs and derivations, is still present and lively in discussions of the topic.
This controversy between formalists and their opponents is well exemplified by a discussion between Jody Azzouni and Yehuda Rav in a sequence of papers ([Rav, 1999], [Azzouni, 2004], and [Rav, 2007]). In this discussion Azzouni presents a moderate formalist point of view which he calls the derivation-indicator view, while Rav opposes his arguments. Azzouni’s view is well summarized in the abstract of his [2004], where he says:
A version of Formalism is vindicated: Ordinary mathematical proofs indicate (one or another) mechanically checkable derivation of theorems from the assumptions those ordinary mathematical proofs presuppose. The indicator view explains why mathematicians agree so readily on results established by proofs in ordinary language that are (palpably) not mechanically checkable.
The present paper is no exception to the rule that reasons for mathematical beliefs are discussed in terms of the controversy between formalists and non-formalists. We choose Azzouni [2004] as a protagonist of the formalist point of view because his position is moderate:4 we want to show that even such a moderate position is impossible to maintain. However, our opposition to the views expressed by Azzouni [2004] is presented from a very different standpoint from that of Rav [2007] and is stronger. To summarize our position, we will argue that:
No link between derivations and proofs can contribute to increasing confidence in theorems in the present state of knowledge.
When Rav says:
Consequently, when it comes to the nature of the logical justification of mathematical arguments in proofs, with Azzouni putting his faith in formal derivations, even if just indicated, and further maintaining that on this basis proofs are recognized by mathematicians as valid, as they see them hence being mechanically checkable, against such formalist-mechanist claims I have objections to voice, both on technical and on historical grounds. As opposed to various formalist views, I hold that mathematical proofs are cemented via arguments based on the meaning of the mathematical terms that occur in them, which by their very conceptual nature cannot be captured by formal calculi. [2007, p. 294] (emphasis in original)
we think that his critique is too mild. Indeed, we will argue that not only do mathematical proofs contain ingredients that cannot be captured by formal calculi but, in fact, no link between those proofs and the hypothetical derivations underlying them can be established in the case of many theorems. Thus, while Rav argues that there is added epistemic value in proofs with respect to derivations, we make a much stronger claim: in the case of many theorems, derivations cannot contribute any epistemic value at all.
We will argue that, in the present state of knowledge, it is impossible to gain any confidence in most mathematical theorems from hypothetical derivations underlying their proofs. Hence we will argue against the statement of Azzouni who says:
it’s derivations, derivations in one or another algorithmic system, which underlie what’s characteristic of mathematical practice: in particular, the social conformity of mathematicians with respect to whether one or another proof is or isn’t (should be, or shouldn’t be) convincing. [2004, p. 83]
Our impossibility argument concerning such a gain of confidence is based on considering lengths of hypothetical derivations underlying proofs.
It should be stressed that our impossibility claim is quite strong. We want to argue that not only do mathematicians not in practice use derivations to get or increase confidence in their results, but that in the present state of knowledge it is theoretically impossible to achieve such a gain of confidence in the case of most interesting mathematical theorems.
Since we want to argue against the statement of Azzouni quoted above, we need to try to understand what he means by saying that derivations ‘underlie what’s characteristic of mathematical practice.’ First we must ask: what exactly is an algorithmic system and derivations in which algorithmic system? Azzouni [2004] provides a kind of answer to the first question, saying in the footnote to the above quote: ‘An algorithmic system is one where the recognition procedure for proofs is mechanically implementable. By no means are algorithmic systems restricted to language-based axiom systems.’ He is more vague as far as the choice of a specific system is concerned:
. . . the picture doesn’t require mathematicians, in any case, when studying a subject-matter, to remain within the confines of a single (algorithmic) system—indeed, if anything, mathematicians are required to transcend such systems by embedding them in larger ones. The derivation indicated (by the application of new tools to a given subject matter) can be a derivation of the weaker system the mathematician started with, or it can be a derivation of a stronger system (some of) the terms of which are taken to pick out the same items supposedly referred to in the weaker system. [2004, p. 93]
Nevertheless, for a given theorem in mind, we have to focus on a particular ‘algorithmic system’ (while agreeing that this system could be modified several times in the course of developing the proof of the theorem). Derivations in an axiomatized theory, such as Zermelo-Fraenkel set theory with the Axiom of Choice (ZFC) are in principle mechanically checkable (when written correctly); hence we will use the system ZFC as an example of an algorithmic system in which derivations are ‘indicated’ by usual proofs, to use Azzouni’s terminology. ZFC is a good example because many mathematicians believe that most of the mathematical lore could be in principle formalized in this system. Actually, defining other mathematical notions (such as functions, relations, algebraic systems) in terms of sets is a standard mathematical practice which can be considered as a step in the direction of such a formalization. (Further steps are usually not made.)
Next we should see what Azzouni means by saying that proofs ‘indicate’ derivations. This point is somewhat obscure, as a precise definition of this relation between proofs and derivations is never given in his paper. To be sure, Azzouni is well aware of the fact that ‘mechanically recognizable derivations . . . (generally) aren’t ones (that can be) exhibited in practice’ [2004, p. 95] (emphasis original). However, the very fact of his stressing that derivations are not exhibited in practice points to the important distinction between practical considerations and a theoretical accessibility of such derivations. This is also implied by his statement ‘it’s derivation which provides the skeleton for (the flesh of) proof ’ (p. 95, emphasis original). In order to provide such a skeleton, a derivation should be at least theoretically accessible for scrutiny, and hence it should be at least theoretically possible to record and verify it. Since Azzouni stresses mechanical checkability as an important feature of derivations, it is therefore fair to assume that, whatever could be the precise meaning of the phrase ‘proofs indicate derivations’, it should be theoretically possible to check such indicated derivations mechanically.
Now comes a crucial remark that forms the basis of our impossibility argument. Since Azzouni (and other formalists) claim that confidence concerning theorems is somehow linked to derivations underlying proofs, the burden of justifying that such (indicated) derivations can (at least theoretically) be recorded and mechanically checked is on their side, in every case when such a gain of confidence is claimed. A caveat is in order here. Formalists do not need actually to exhibit an indicated derivation, but they should justify that such a derivation can theoretically be recorded and mechanically checked. This theoretical possibility is a necessary condition for such a gain of confidence and hence should be justified by anybody claiming the gain. There is an important asymmetry with respect to the ‘burden of justification’. In order to counter this formalist claim for a given theorem, there is no need to show that such a mechanically checkable derivation is theoretically impossible; it is enough to show that the formalist side has not justified the theoretical possibility of mechanical checking of such a derivation. We will argue that this is the case, and hence that the claimed confidence gain cannot be harvested.
Before proceeding with our impossibility argument, a disclaimer is in order. Our argument refers to ‘complex’ ‘deep’ theorems, as opposed to simple corollaries from definitions, such as the unicity of the neutral element in groups, or simple geometric theorems, e.g., that the sum of angles of a triangle totals π. For the latter two examples it is not hard to imagine a derivation in ZFC; it is probably even possible (although very likely a terribly boring task) to write it explicitly. Notice, however, that confidence in these simple results is extremely high anyway; so the additional potential gain would be small, if not non-existent. Of course, the above notions of ‘complex’ ‘deep’ theorems are very fuzzy, which is the reason we put them in quotation marks. However, most mathematicians would agree that Fermat’s Last Theorem, proved by Wiles [1995], is both complex and deep (no need of quotation marks in this case). Hence we will use this theorem, referred to as FLT, as an example in our impossibility argument.
We should also stress that the choice of ZFC as the underlying axiomatic system in which derivations are indicated by usual proofs is only given as an example. It could be replaced by any normally used system such as Peano Arithmetic, Kelley-Morse theory of classes, etc., whose axioms and inference rules people find intuitively obvious. Of course one might create an artificial system, e.g., one containing FLT as an axiom (in which the derivation of FLT would have length one), but such manipulations cannot possibly increase confidence in this theorem.
The starting point of our argument is the following controversy between Rav [1999] and Azzouni [2004]. Rav writes:
In reading a paper or monograph it often happens—as everyone knows too well—that one arrives at an impasse, not seeing why a certain claim B is to follow from claim A, as its author affirms. Let us symbolise the author’s claim by ‘A → B’. (The arrow functions iconically: there is an informal logical path from A to B. It does not denote formal implication.) Thus, in trying to understand the author’s claim, one picks up paper and pencil and tries to fill in the gaps. After some reflection on the background theory, the meaning of the terms and using one’s general knowledge of the topic, including eventually some symbol manipulation, one sees a path from A to A1, from A1 to A2, . . . , and finally from An to B. This analysis can be written schematically as follows:
A → A1, A1 → A2, . . . , An → B.
Explaining the structure of the argument to a student or non-specialist, the other may still fail to see why, for instance, A1 ought to follow from A. So again we interpolate A → A', A'→ A1. But the process of interpolations for a given claim has no theoretical upper bound. In other words, how far has one to analyse a claim of the form ‘from property A, B follows’ before assenting to it depends on the agent. There is no theoretical reason to warrant the belief that one ought to arrive at an atomic claim C → D which does not allow or necessitate any further justifying steps between C and D. This is one of the reasons for considering proofs as infinitary objects. Both Brouwer and Zermelo, each for different reasons, stressed the infinitary character of proofs. [1999, p. 14]
and Azzouni responds:
What’s going on? Why should anyone think that a finitary piece of mathematical reasoning, a step in a proof, say, corresponds to something infinitary (if, that is, we attempt to translate it into a derivation)? [2004, p. 97]
While we agree with the main thought of Rav [1999], it seems that the use of the word ‘infinitary’ may have been exaggerated. It is this word and not the argument itself that seems to have provoked the controversy. What we consider as the crux of Rav’s reasoning is that there is no clearly defined upper bound on the number of justifying steps. If the argument is presented in this way, there is no need to invoke any infinitary character of the proof, that raised Azzouni’s objections. Our impossibility argument will be based on this important distinction. (In the sequel we never make any claims about an infinitary character of proofs or derivations. Like Azzouni, we believe that these are finite objects.)
For any theorem T whose derivation (or many derivations) in ZFC is indicated (according to Azzouni’s terminology), by its ‘ordinary’ published proof, let L(T) denote the length of the shortest possible derivation (not only among those derivations indicated by the particular considered proof, but among all possible derivations of T in (ZFC). Such derivations exist since, by assumption, they are indicated by the ordinary proof. Hence the integer L(T) is well defined. We conjecture that in the case of most ‘complex’ ‘deep’ theorems T, and in particular in the case of FLT, it is impossible to provide (and justify) any upper bound on L(T). We agree that this is a bold statement but we would like to challenge a skeptical reader to provide (and justify!) any upper bound on L(FLT). Should it be a million? a quadrillion? 101010 ? What would be the justification of the choice of any such number? Actually, of any number at all? To be sure, we have not proved that it is impossible to give such an estimate, we have only conjectured it. Nevertheless, and this time it is not a conjecture, but a statement of a fact: nobody (including formalists claiming confidence gains from derivations underlying proofs) has ever given any such estimate. This simple observation is the first crucial claim in our argument. Our conjecture is stronger and says that such an estimate is impossible to establish, but it will not be used in the argument: due to the asymmetry with respect to the ‘burden of proof’, the fact that such an estimate has not been given will be enough for our reasoning.
The next step of our argument requires the definition of a very large integer number M, an extremely large number indeed. We are not concerned with determining the size of this number. The only thing that matters for our purposes is that no derivation of length larger than M could ever be actually written down or verified mechanically, even in theory. We will proceed with a series of estimates leading to the definition of M. First consider the period of time known as the Planck time (cf. [Halliday et al., 1996]). This interval of time, call it tP, of length of order 10–43 seconds, is a lower bound on the duration of any observable physical event, and hence it is also a lower bound on the time needed for any conceivable processing device to record or verify one step of a derivation. Next, switching to the macroscale, consider a future state of the universe precluding any information-processing activity. A good candidate for such a state would be the heat death of the universe, i.e., when the universe has reached maximum entropy. Let τ denote an upper bound (in seconds) on the time until this state of the universe is reached. Let Q = τ/tP. Hence Q is an upper bound on the number of steps of a derivation that can ever be recorded or verified by a single processing device. It is of course possible that many processing devices cooperate in verifying some derivation in parallel, each processing unit working on a different part of the derivation. Hence we need a third estimate, an upper bound on the number of elementary particles in the observable universe. Call this number P. Clearly, any processing device must be composed of at least one particle, hence the number of such conceivable devices is at most P. Finally let M = P. Q. It follows that the maximum number of derivation steps that could ever be recorded or verified is bounded by M. In other words, a derivation of length larger than M could never be verified, even if the entire observable universe were converted into a machine totally devoted to the verification of this single derivation.
It should be noted that all our estimates leading to the definition of M are grossly exaggerated: the time of recording or verifying one step of a derivation is much larger than tP, any computing activity in the universe would stop sooner than after τ seconds, and the number of processing devices in a hypothetical verifying machine is much smaller than P. As a consequence, our claim concerning the number M could be even made about a much smaller number. However, we chose to use the above exaggerated estimates because, for the purpose of our argument, we only need to indicate some number bounding the length of a theoretically verifiable derivation, and the number M has the advantage of being rather simple to define and of not leaving any doubt concerning the validity of our claim about it.
Consider a derivation of length larger than M. The hypothetical existence of such a derivation of a theorem T could not possibly contribute to our confidence in T because we could never have any kind of access (even theoretically) to all the terms of such an extremely large sequence of formulae, and hence we could never verify that it is indeed a correct derivation. Trying to gain any confidence concerning the validity of theorem T from the existence of such a hypothetical derivation is (metaphorically speaking) similar to getting insight into a black hole.
Let us now call a theorem T reachable, if L(T) ≤ M. In other words, a theorem is reachable, if and only if there exists its derivation in ZFC of length at most M. It follows from what was said above that reachability of a theorem is a necessary condition for gaining any confidence in it from some hypothetical derivation of this theorem.
Now we are ready for the third, final step of our argument. Consider the theorem FLT. Since no upper bound on L(FLT) has been justifiably provided, the answer to the question of whether FLT is reachable is unknown. Given the fact that the number M is so enormous, one would be tempted to give the answer ‘yes’ to this question, i.e., to establish M as an upper bound on L(FLT). However, as observed before, this has never been done (and seems impossible to do, although we do not need this stronger conjecture in our argument). To avoid misunderstandings: we are not claiming that FLT is not reachable, we are only arguing that the answer to this question is unknown.
We can now conclude that, since in the case of FLT (and the reasoning would be similar in the case of many other ‘complex’ ‘deep’ theorems) we are unable to decide—in the present state of knowledge—whether FLT is reachable, there can be no gain of confidence from indicating any derivation of FLT. Indeed, as observed at the beginning of our argument, the side claiming such a gain has the burden of justifying that for some derivation of FLT it is theoretically possible to check it mechanically. This however would imply reachability of FLT, which, as previously argued, is not known to be true.
We would like to present the following analogy. A biologist studying animals’ behavior or a trainer working with animals considers such notions as fear, anger, pain, or sexual attraction, and tries to explain various actions of animals (attacking, running away, mating, etc.) in these terms. However, these actions could also be explained at some lower level, the level of biochemical reactions in the animal’s brain, or even at a subatomic level, involving statistical information about motions of particles. It is clear that these lower levels (especially the subatomic one) are not very useful in explaining and predicting the animal’s behavior: they are too detailed. Likewise, a proof of a theorem should be explained at a ‘macroscopic’ level, involving mathematical objects and principal techniques, rather than at a ‘subatomic’ derivation level. However, we would like to observe that the analogy ends at this point. Indeed, the subatomic level in the case of living organisms can be (indirectly) observed. Hence one may argue that (with suitable hypothetical tools) we may be able to gain additional biological knowledge studying such a microscopic level. In the case of mathematical theorems, the accessibility of this ‘subatomic’ level, i.e., the level of derivations, is quite different. The core of the problem is their mechanical verifiability. As previously argued in the case of particular theorems these objects may be too large to be written, perceived (in any possible sense), let alone scrutinized for mechanical verification. So while in biology it is unlikely but theoretically possible that the subatomic level can provide additional insight into animals’ behavior, such a possibility (even theoretical) need not exist in mathematics in the case of many theorems, for a simple but crucial reason: the size of objects at the ‘subatomic’ level (i.e., at the level of derivations) may be too large to be examined with the aim of verification by a human mind or by any technological device.
Finally it should be stressed that, as previously mentioned, our impossibility argument concerns the present state of knowledge. In other words, we argued that it is now impossible (even theoretically) to gain additional confidence concerning the validity of theorems such as FLT by ‘indicating’ their derivations. Indeed, we based our argument on the observation that the question of whether FLT is reachable has not been settled until now. This impossibility at present is enough to refute Azzouni’s [2004] claim, which is made about such gains of confidence at present. It is of course hard to predict how the situation could change with additional insights in the future. If, for example, short derivations of all published mathematical theorems were discovered one day (a very unlikely but not completely impossible scenario) then the situation would change dramatically, and Azzouni’s claim might become substantiated. This, however, does not concern our discussion whose purpose was to refute the possibility of such confidence gains now, a claim made by the formalist side.
Thanks to Jody Azzouni for enlightening discussions concerning the subject of this paper and to anonymous referees whose important remarks permitted us to correct the arguments and improve presentation. The remaining flaws remain, of course, solely the responsibility of the author. This research was partially supported by NSERC discovery grant and by the Research Chair in Distributed Computing at the Université du Québec en Outaouais.
1. The term ‘mathematician’ is used here in a large sense and encompasses all scientists adopting the methodology of proofs taken from ‘mainstream’ mathematics. Hence apart from, say, algebraists or topologists, it includes, e.g., mathematical logicians and theoretical computer scientists but excludes users of mathematics such as engineers or physicists, who are interested in mathematical results but not in the way they are obtained.
2. This is a technical term whose informal meaning is that there exists a mechanical method of verifying whether a given sentence is a member of the set or not.
3. ‘I wish to designate the following as the most important among the numerous questions which can be asked with respect to the axioms: To prove that they are not contradictory, that is, that a finite number of logical steps based upon them can never lead to a contradictory result.’ Hilbert [1900] (emphasis in original).
4. For example, Azzouni’s abstract quoted above goes on to say: ‘Mechanically checkable derivations in this way structure ordinary mathematical practice without its being the case that ordinary mathematical proofs can be “reduced to” such derivations.’
AZZOUNI, J. [2004]: ‘The derivation-indicator view of mathematical practice’, Philosophia Mathematica (3) 12, 81–105.
HALLIDAY, D., M. RESNICK, and J. WALKER [1996]: Fundamentals of Physics. New York: Wiley.
HILBERT, D. [1900]: ‘Mathematische Probleme’, Nachrichten Königl. Gesell. Wiss. Göttingen, Math.-Phys. Klasse, 253–297; English trans. Mary W. Newson, ‘Mathematical problems’, Bull. Amer. Math. Soc. 8 (1902), 437–479.
RAV, Y. [1999]: ‘Why do we prove theorems?’ Philosophia Mathematica (3) 7, 5–41.
———. [2007]: ‘A critique of a formalist-mechanist version of the justification of arguments in mathematicians’ proof practices’, Philosophia Mathematica (3) 15, 291–320.
WILES, A. [1995]: ‘Modular elliptic curves and Fermat’s Last Theorem’, Annals of Mathematics 141, 443–551.