APPENDIX *vii
Zero Probability and the Fine-Structure of Probability and of Content

In the book, a sharp distinction is made between the idea of the probability of a hypothesis, and its degree of corroboration. It is asserted that if we say of a hypothesis that it is well corroborated, we do not say more than that it has been severely tested (it must be thus a hypothesis with a high degree of testability) and that it has stood up well to the severest tests we were able to design so far. And it is further asserted that degree of corroboration cannot be a probability, because it cannot satisfy the laws of the probability calculus. For the laws of the probability calculus demand that, of two hypotheses, the one that is logically stronger, or more informative, or better testable, and thus the one which can be better corroborated, is always less probable—on any given evidence—than the other. (See especially sections 82 and 83.)

Thus a higher degree of corroboration will, in general, be combined with a lower degree of probability; which shows not only that we must distinguish sharply between probability (in the sense of the probability calculus) and degree of corroboration or confirmation, but also that the probabilistic theory of induction, or the idea of an inductive probability, is untenable.

The impossibility of an inductive probability is illustrated in the book (sections 80, 81, and 83) by a discussion of certain ideas of Reichenbach’s, Keynes’s and Kaila’s. One result of this discussion is that in an infinite universe (it may be infinite with respect to the number of distinguishable things, or of spatio-temporal regions), the probability of any (non-tautological) universal law will be zero.

(Another result was that we must not uncritically assume that scientists ever aim at a high degree of probability for their theories. They have to choose between high probability and high informative content, since for logical reasons they cannot have both; and faced with this choice, they have so far always chosen high informative content in preference to high probability—provided that the theory has stood up well to its tests.)

By ‘probability’, I mean here either the absolute logical probability of the universal law, or its probability relative to some evidence; that is to say, relative to a singular statement, or to a finite conjunction of singular statements. Thus if a is our law, and b any empirical evidence, I assert that

(1)

p(a) = 0

and also that

(2)

p(a, b) = 0

These formulae will be discussed in the present appendix.

The two formulae, (1) and (2), are equivalent. For as Jeffreys and Keynes observed, if the ‘prior’ probability (the absolute logical probability) of a statement a is zero, then so must be its probability relative to any finite evidence b, since we may assume that for any finite evidence b, we have p(b) ≠ 0. For p(a) = 0 entails p(ab) = 0, and since p(a, b) = p(ab)/p(b), we obtain (2) from (1). On the other hand, we may obtain (1) from (2); for if (2) holds for any evidential b, however weak or ‘almost tautological’, we may assume that it also holds for the zero-evidence, that is to say, for the tautology t = b ∨ b̄; and p(a) may be defined as equal to p(a, t).
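The step from (1) to (2) is simple arithmetic and can be checked on a toy model. The sketch below assumes a hypothetical four-point probability space (the values 3/5 and 2/5 are arbitrary illustrations); it verifies that p(a) = 0 forces p(a, b) = p(ab)/p(b) = 0 whenever p(b) ≠ 0.

```python
from fractions import Fraction

# A toy probability space over the four conjunctions of a, b and their
# negations.  We stipulate p(a) = 0 by giving both a-conjunctions
# probability zero; the remaining mass (arbitrary here) goes to not-a.
p = {
    ("a", "b"): Fraction(0),
    ("a", "not-b"): Fraction(0),
    ("not-a", "b"): Fraction(3, 5),
    ("not-a", "not-b"): Fraction(2, 5),
}

p_a = p[("a", "b")] + p[("a", "not-b")]    # p(a) = 0 ............ formula (1)
p_b = p[("a", "b")] + p[("not-a", "b")]    # p(b) = 3/5, i.e. p(b) != 0
p_ab = p[("a", "b")]                       # p(a) = 0 entails p(ab) = 0

p_a_given_b = p_ab / p_b                   # p(a, b) = p(ab)/p(b) = 0 ... (2)
print(p_a, p_a_given_b)                    # 0 0
```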

There are many arguments in support of (1) and (2). First, we may consider the classical definition of probability as the number of the favourable possibilities divided by that of all (equal) possibilities. We can then derive (2), for example, if we identify the favourable possibilities with the favourable evidence. It is clear that, in this case, p(a, b) = 0; for the favourable evidence can only be finite, while the possibilities in an infinite universe must clearly be infinite. (Nothing depends here on ‘infinity’, for any sufficiently large universe will yield, with any desired degree of approximation, the same result; and we know that our universe is extremely large, compared with the amount of evidence available to us.)

This simple consideration is perhaps a little vague, but it can be considerably strengthened if we try to derive (1), rather than (2), from the classical definition. We may to this end interpret the universal statement a as entailing an infinite product of singular statements, each endowed with a probability which of course must be less than unity. In the simplest case, a itself may be interpreted as such an infinite product; that is to say, we may put a = ‘everything has the property A’; or in symbols, ‘(x)Ax’, which may be read ‘for whatever value of x we may choose, x has the property A’.1 In this case, a may be interpreted as the infinite product a = a1a2a3 ... where ai = Aki, and where ki is the name of the ith individual of our infinite universe of discourse.

We may now introduce the name ‘aⁿ’ for the product of the first n singular statements, a1a2 ... an, so that a may be written

a = lim aⁿ   (n → ∞)

and (see page 346)

(3)

p(a) = lim p(aⁿ)   (n → ∞)

It is clear that we may interpret aⁿ as the assertion that, within the finite sequence of elements k1, k2, ... kn, all elements possess the property A. This makes it easy to apply the classical definition to the evaluation of p(aⁿ). There is only one possibility that is favourable to the assertion aⁿ: it is the possibility that all the n individuals ki, without exception, possess the property A rather than the property non-A. But there are in all 2ⁿ possibilities, since we must assume that it is possible for any individual ki either to possess the property A or the property non-A. Accordingly, the classical theory gives

(4c)

p(aⁿ) = 1/2ⁿ

But from (3) and (4c), we obtain immediately (1).
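The counting behind (4c) can be made fully explicit. The following sketch enumerates, for small n, the 2ⁿ possible assignments of A or non-A to the individuals k1, ..., kn, and confirms that exactly one of them is favourable to aⁿ:

```python
from fractions import Fraction
from itertools import product

def p_an(n):
    """Classical probability of a^n: favourable possibilities divided by
    all (equal) possibilities.  Each of the n individuals may possess A
    or non-A, giving 2**n possibilities; only the assignment in which
    every individual possesses A is favourable to a^n."""
    worlds = list(product(("A", "non-A"), repeat=n))
    favourable = [w for w in worlds if all(prop == "A" for prop in w)]
    return Fraction(len(favourable), len(worlds))

for n in (1, 5, 10):
    print(n, p_an(n))       # p(a^n) = 1/2^n: 1/2, 1/32, 1/1024
# As n grows without bound, p(a^n) tends to 0 -- which, with (3), gives (1).
```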

The ‘classical’ argument leading to (4c) is not entirely adequate, although it is, I believe, essentially correct.

The inadequacy lies merely in the assumption that A and non-A are equally probable. For it may be argued—correctly, I believe—that since a is supposed to describe a law of nature, the various ai are instantiation statements, and thus more probable than their negations which are potential falsifiers. (Cf. note *1 to section 28.) This objection, however, relates to an inessential part of the argument. For whatever probability—short of unity—we attribute to A, the infinite product a will have zero probability (assuming independence, which will be discussed later on). Indeed, we have struck here a particularly trivial case of the one-or-zero law of probability (which we may also call, with an allusion to neuro-physiology, ‘the all-or-nothing principle’). In this case it may be formulated thus: if a is the infinite product of a1, a2,..., where p(ai) = p(aj), and where every ai is independent of all others, then the following holds:

(4)

p(a) = 0, unless p(ai) = 1 for every i, in which case p(a) = 1.

But it is clear that p(a) = 1 is unacceptable (not only from my point of view but also from that of my inductivist opponents who clearly cannot accept the consequence that the probability of a universal law can never be increased by experience). For ‘all swans are black’ would have the probability 1 as well as ‘all swans are white’—and similarly for all colours; so that ‘there exists a black swan’ and ‘there exists a white swan’, etc., would all have zero probability, in spite of their intuitive logical weakness. In other words, p(a) = 1 would amount to asserting on purely logical grounds with probability 1 the emptiness of the universe.

Thus (4) establishes (1).

Although I believe that this argument (including the assumption of independence to be discussed below) is incontestable, there are a number of much weaker arguments which do not assume independence and which still lead to (1). For example we might argue as follows.

It was assumed in our derivation that for every ki, it is logically possible that it has the property A, and alternatively, that it has the property non-A: this leads essentially to (4). But one might also assume, perhaps, that what we have to consider as our fundamental possibilities are not the possible properties of every individual in the universe of n individuals, but rather the possible proportions with which the properties A and non-A may occur within a sample of individuals. In a sample of n individuals, the possible proportions with which A may occur are: 0, 1/n,..., n/n. If we consider the occurrences of any of these proportions as our fundamental possibilities, and thus treat them as equi-probable (‘Laplace’s distribution’2), then (4) would have to be replaced by

(5)

p(aⁿ) = 1/(n + 1)

Although from the point of view of a derivation of (1), formula (5) is much weaker than (4c), it still allows us to derive (1)—and it allows us to do so without identifying the observed cases as the favourable ones or assuming that the number of observed cases is finite.
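The difference between the two countings is easily exhibited. Under Laplace’s distribution the fundamental possibilities are not the 2ⁿ assignments of properties but the n + 1 possible proportions; a short sketch:

```python
from fractions import Fraction

def p_an_laplace(n):
    """Laplace's distribution: the n+1 possible proportions 0/n, ..., n/n
    with which A may occur in a sample of n are treated as equi-probable.
    Only the proportion n/n is favourable to a^n, so p(a^n) = 1/(n+1)."""
    proportions = [Fraction(k, n) for k in range(n + 1)]
    favourable = [q for q in proportions if q == 1]
    return Fraction(len(favourable), len(proportions))

for n in (1, 10, 100):
    print(n, p_an_laplace(n))    # 1/2, 1/11, 1/101 ... formula (5)
# The limit is again 0, though approached far more slowly than 1/2^n;
# either counting therefore yields (1).
```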

A very similar argument leading to (1) would be the following. We may consider the fact that every universal law a entails (and is therefore at most equally as probable as) a statistical hypothesis h of the form ‘p(x, y) = 1’, and that the absolute probability of h may be calculated with the help of Laplace’s distribution, with the result p(h) = 0. (Cf. appendix *ix, the Third Note, especially *13.) But since a entails h, this leads to p(a) = 0; that is to say, to (1).

To me, this proof appears the simplest and most convincing: it makes it possible to uphold (4) and (5), by assuming that (4) applies to a and (5) to h.

So far our considerations were based on the classical definition of probability. But we arrive at the same result if instead we adopt as our basis the logical interpretation of the formal calculus of probability. In this case, the problem becomes one of dependence or independence of statements.

If we again regard a as the logical product of the singular statements a1, a2, ..., then the only reasonable assumption seems to be that, in the absence of any (other than tautological) information, we must consider all these singular statements as mutually independent of one another, so that ai may be followed by aj or by its negation, āj, with the probabilities

p(aj, ai) = p(aj)   and   p(āj, ai) = p(āj).

Every other assumption would amount to postulating ad hoc a kind of after-effect; or in other words, to postulating that there is something like a causal connection between ai and aj. But this would obviously be a non-logical, a synthetic assumption, to be formulated as a hypothesis. It thus cannot form part of a purely logical theory of probability.

The same point may be put a little differently thus: in the presence of some hypothesis, h say, we may of course have

(6)

p(aiaj, h) ≠ p(ai, h)p(aj, h)

For h may inform us of the existence of a kind of after-effect. Consequently, we should then have

(7)

p(aj, aih) ≠ p(aj, h)

since (7) is equivalent to (6). But in the absence of h, or if h is tautologous or, in other words, if we are concerned with absolute logical probabilities, (7) must be replaced by

(8)

p(aj, ai) = p(aj)

which means that ai and aj are independent, and which is equivalent to

(9)

p(aiaj) = p(ai)p(aj)

But the assumption of mutual independence leads, together with p(ai) < 1, as before to p(a) = 0; that is to say, to (1).

Thus (8), that is, the assumption of the mutual independence of the singular statements ai leads to (1); and mainly for this reason, some authors have, directly or indirectly, rejected (8). The argument has been, invariably, that (8) must be false because if it were true, we could not learn from experience: empirical knowledge would be impossible. But this is incorrect: we may learn from experience even though p(a) = p(a, b) = 0; for example, C(a, b)—that is to say, the degree of corroboration of a by the tests b—may none the less increase with new tests. (Cf. appendix *ix). Thus this ‘transcendental’ argument fails to hit its target; at any rate, it does not hit my theory.3

But let us now consider the view that (8) is false, or in other words, that

p(aj, ai) ≠ p(aj)

is valid, and consequently

p(aiaj) ≠ p(ai)p(aj)

and also the following:

(+)

p(an, aⁿ⁻¹) > p(an)

This view asserts that once we have found some ki to possess the property A, the probability increases that another kj possesses the same property; and even more so if we have found the property in a number of cases. Or in Hume’s terminology, (+) asserts ‘that those instances’ (for example, kn), ‘of which we have had no experience, are likely to resemble those, of which we have had experience’.

The quotation, except for the words ‘are likely to’, is taken from Hume’s criticism of induction.4 And Hume’s criticism fully applies to (+), or its italicized verbal formulation. For, Hume argues, ‘even after the observation of the frequent constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’.5 If anybody should suggest that our experience entitles us to draw inferences from observed to unobserved objects, then, Hume says, ‘I wou’d renew my question, why from this experience we form any conclusion beyond those past instances, of which we have had experience’. In other words, Hume points out that we get involved in an infinite regress if we appeal to experience in order to justify any conclusion concerning unobserved instances—even mere probable conclusions, as he adds in his Abstract. For there we read: ‘It is evident that Adam, with all his science, would never have been able to demonstrate that the course of nature must continue uniformly the same.... Nay, I will go farther, and assert that he could not so much as prove by any probable arguments that the future must be conformable to the past. All probable arguments are built on the supposition that there is conformity betwixt the future and the past, and therefore can never prove it.’6 Thus (+) is not justifiable by experience; yet in order to be logically valid, it would have to be of the character of a tautology, valid in every logically possible universe. But this is clearly not the case.

Thus (+), if true, would have the logical character of a synthetic a priori principle of induction, rather than of an analytic or logical assertion. But it does not quite suffice even as a principle of induction. For (+) may be true, and p(a) = 0 may be valid none the less. (An example of a theory which accepts (+) as a priori valid—though, as we have seen, (+) must be synthetic—and which at the same time accepts p(a) = 0, is Carnap’s.7)

An effective probabilistic principle of induction would have to be even stronger than (+). It would have to allow us, at least, to conclude that for some fitting singular evidence b, we may obtain p(a, b) > 1/2; or in words, that a may be made, by accumulating evidence in its favour, more probable than its negation. But this is only possible if (1) is false, that is to say, if we have p(a) > 0.

A more direct disproof of (+) and a proof of (2) can be obtained from an argument which Jeffreys gives in his Theory of Probability, § 1.6.8

Jeffreys discusses a formula which he numbers (3) and which in our symbolism amounts to the assertion that, provided p(bi, a) = 1 for every i ≤ n, so that p(abⁿ) = p(a), the following formula must hold:

(10)

p(b1) · p(b2, b¹) · p(b3, b²) · ... · p(bn, bⁿ⁻¹) ≥ p(a)

Discussing this formula, Jeffreys says (I am still using my symbols in place of his): ‘Thus, with a sufficient number of verifications, one of three things must happen: (1) The probability of a on the information available exceeds 1. (2) It is always 0. (3) p(bn, bⁿ⁻¹) will tend to 1.’ To this he adds that case (1) is impossible (trivially so), so that only (2) and (3) remain. Now I say that the assumption that case (3) holds universally, for some obscure logical reasons (and it would have to hold universally, and indeed a priori, if it were to be used in induction), can be easily refuted. For the only condition needed for deriving (10), apart from 0 < p(bi) < 1, is that there exists some statement a such that p(bn, a) = 1. But this condition can always be satisfied, for any sequence of statements bi. For assume that the bi are reports on penny tosses; then it is always possible to construct a universal law a which entails the reports of all the n−1 observed penny tosses, and which allows us to predict all further penny tosses (though probably incorrectly).9 Thus the required a always exists; and there always is also another law, a′, yielding the same first n−1 results but predicting, for the nth toss, the opposite result. It would be paradoxical, therefore, to accept Jeffreys’s case (3), since for a sufficiently large n we would always obtain p(bn, bⁿ⁻¹) close to 1, and also (from another law, a′) p(b̄n, bⁿ⁻¹) close to 1. Accordingly, Jeffreys’s argument, which is mathematically inescapable, can be used to prove his case (2), which happens to coincide with my own formula (2), as stated at the beginning of this appendix.10
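Jeffreys’s trichotomy rests on the telescoping identity p(bⁿ) = p(b1) · p(b2, b¹) · ... · p(bn, bⁿ⁻¹). The sketch below assumes, purely for illustration, that the bi are reports on independent fair penny tosses, so that every factor equals 1/2; the product then tends to 0 while its factors never approach 1, leaving only Jeffreys’s case (2):

```python
from fractions import Fraction

def p_bn(n):
    """p(b^n) for n reports on independent fair penny tosses."""
    return Fraction(1, 2) ** n

n = 20
# The factors p(b_i, b^{i-1}) = p(b^i)/p(b^{i-1}) of the telescoping product.
factors = [p_bn(i) / p_bn(i - 1) for i in range(1, n + 1)]

lhs = Fraction(1)
for f in factors:
    lhs *= f
assert lhs == p_bn(n)                 # the telescoping identity

# Formula (10): the product is >= p(a) for any a with p(b_i, a) = 1.
# Here every factor stays at 1/2 instead of tending to 1 (case (3)),
# while the product tends to 0; hence p(a) = 0 -- case (2).
print(all(f == Fraction(1, 2) for f in factors))
```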

We may sum up our criticism of (+) as follows. Some people believe that, for purely logical reasons, the probability that the next thing we meet will be red increases in general with the number of red things seen in the past. But this is a belief in magic—in the magic of human language. For ‘red’ is merely a predicate; and there will always be predicates A and B which both apply to all the things so far observed, but lead to incompatible probabilistic predictions with respect to the next thing. These predicates may not occur in ordinary languages, but they can always be constructed. (Strangely enough, the magical belief here criticized is to be found among those who construct artificial model languages, rather than among the analysts of ordinary language.) By thus criticizing (+) I am defending, of course, the principle of the (absolute logical) independence of the various ai from any combination aiaj ...; that is to say, my criticism amounts to a defence of (4) and (1).

There are further proofs of (1). One of them, which is fundamentally due to an idea of Jeffreys and Wrinch,11 will be discussed more fully in appendix *viii. Its main idea may be put (with slight adjustments) as follows.

Let e be an explicandum, or more precisely, a set of singular facts or data which we wish to explain with the help of a universal law. There will be, in general, an infinite number of possible explanations—even an infinite number of explanations (mutually exclusive, given the data e) such that the sum of their probabilities (given e) cannot exceed unity. But this means that the probability of almost all of them must be zero—unless, indeed, we can order the possible laws in an infinite sequence, so that we can attribute to each a positive probability in such a way that their sum converges and does not exceed unity. And it means, further, that to laws which appear earlier in this sequence, a greater probability must be attributed (in general) than to laws which appear later in the sequence. We should therefore have to make sure that the following important consistency condition is satisfied:

Our method of ordering the laws must never place a law before another one if it is possible to prove that the probability of the latter is greater than that of the former.

Jeffreys and Wrinch had some intuitive reasons to believe that a method of ordering the laws satisfying this consistency condition may be found: they proposed to order the explanatory theories according to their decreasing simplicity (‘simplicity postulate’), or according to their increasing complexity, measuring complexity by the number of the adjustable parameters of the law. But it can be shown (and it will be shown in appendix *viii) that this method of ordering, or any other possible method, violates the consistency condition.
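The ordering problem can be made concrete. The following sketch attributes to the k-th law of some hypothetical ordering the probability 2^−(k+1) (a geometric assignment chosen merely for illustration): the sum never exceeds unity, earlier laws necessarily receive larger probabilities, and for any ε > 0 all but finitely many laws fall below ε.

```python
from fractions import Fraction
from itertools import islice

def geometric_assignment():
    """Attribute to the k-th law (k = 0, 1, 2, ...) of some fixed ordering
    the probability 2**-(k+1), so that the total never exceeds unity."""
    k = 0
    while True:
        yield Fraction(1, 2 ** (k + 1))
        k += 1

probs = list(islice(geometric_assignment(), 50))

assert sum(probs) < 1                        # partial sums stay below unity
assert probs == sorted(probs, reverse=True)  # earlier laws: larger probability

eps = Fraction(1, 1000)
print(sum(1 for q in probs if q >= eps))     # only the first 9 laws reach 1/1000
```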

Thus we obtain p(a, e) = 0 for all explanatory hypotheses, whatever the data e may be; that is to say, we obtain (2), and thereby indirectly (1).

(An interesting aspect of this last proof is that it is valid even in a finite universe, provided our explanatory hypotheses are formulated in a mathematical language which allows for an infinity of (mutually exclusive) hypotheses. For example, we may construct the following universe.12 On a much extended chessboard, little discs or draught pieces are placed by somebody according to the following rule: there is a mathematically defined function, or curve, known to him but not to us, and the discs may be placed only in squares which lie on the curve; within the limits determined by this rule, they may be placed at random. Our task is to observe the placing of the discs, and to find an ‘explanatory theory’, that is to say, the unknown mathematical curve, if possible, or one very close to it. Clearly, there will be an infinity of possible solutions any two of which are incompatible, although some of them will be indistinguishable with respect to the discs placed on the board. Any of these theories may, of course, be ‘refuted’ by discs placed on the board after the theory was announced. Although the ‘universe’—that of possible positions—may here be chosen to be a finite one, there will nevertheless be an infinity of mathematically incompatible explanatory theories. I am aware, of course, that instrumentalists or operationalists might say that the differences between any two theories determining the same squares would be ‘meaningless’. But apart from the fact that this example does not form part of my argument—so that I need really not reply to this objection—the following should be noted. It will be possible, in many cases, to give ‘meaning’ to these ‘meaningless’ differences by making our mesh sufficiently fine, i.e. by subdividing our squares.)

The detailed discussion of the fact that my consistency condition cannot be satisfied will be found in appendix *viii. I will now leave the problem of the validity of formulae (1) and (2), in order to proceed to the discussion of a formal problem arising from the fact that these formulae are valid, so that all universal theories, whatever their content, have zero probability.

There can be no doubt that the content or the logical strength of two universal theories can differ greatly. Take the two laws a1 = ‘All planets move in circles’ and a2 = ‘All planets move in ellipses’. Owing to the fact that all circles are ellipses (with eccentricity zero), a1 entails a2, but not vice versa. The content of a1 is greater by far than the content of a2. (There are, of course, other theories, and logically stronger ones, than a1; for example, ‘All planets move in concentric circles round the sun’.)

The fact that the content of a1 exceeds that of a2 is of the greatest significance for all our problems. For example, there are tests of a1—that is to say, attempts to refute a1 by discovering some deviation from circularity—which are not tests of a2; but there could be no genuine test of a2 which would not, at the same time, be an attempt to refute a1. Thus a1 can be more severely tested than a2, it has the greater degree of testability; and if it stands up to its more severe tests, it will attain a higher degree of corroboration than a2 can attain.

Similar relationships may hold between two theories, a1 and a2, even if a1 does not logically entail a2, but entails instead a theory to which a2 is a very good approximation. (Thus a1 may be Newton’s dynamics and a2 may be Kepler’s laws which do not follow from Newton’s theory, but merely ‘follow with good approximation’; see also section *15 of my Postscript.) Here too, Newton’s theory is better testable, because its content is greater.13

Now our proof of (1) shows that these differences in content and in testability cannot be expressed immediately in terms of the absolute logical probability of the theories a1 and a2, since p(a1) = p(a2) = 0. And if we define a measure of content, C(a), by C(a) = 1 − p(a), as suggested in the book, then we obtain, again, C(a1) = C(a2), so that the differences in content which interest us here remain unexpressed by these measures. (Similarly, the difference between a self-contradictory statement aā and a universal theory a remains unexpressed, since p(aā) = p(a) = 0, and C(aā) = C(a) = 1.14)

All this does not mean that we cannot express the difference in content between a1 and a2 in terms of probability, at least in some cases. For example, the fact that a1 entails a2 but not vice versa would give rise to

p(a2, a1) = 1;   p(a1, a2) < 1,

even though we should have, at the same time, p(a1) = p(a2) = 0.

Thus we should have

p(a1, a2) < p(a2, a1)

which would be an indication of the greater content of a1.

The fact that there are these differences in content and in absolute logical probability which cannot be expressed immediately by the corresponding measures may be expressed by saying that there is a ‘fine structure’ of content, and of logical probability, which may allow us to differentiate between greater and smaller contents and absolute probabilities even in cases where the measures C(a) and p(a) are too coarse, and insensitive to the differences; that is, in cases where they yield equality. In order to express this fine structure, we may use the symbols ‘≻’ (‘is higher’) and ‘≺’ (‘is lower’) in place of the ordinary symbols ‘>’ and ‘<’. (We may also use ‘≽’, or ‘is higher or equally high’, and ‘≼’.) The use of these symbols can be explained by the following rules:

(1) ‘C(a) ≻ C(b)’ and thus its equivalent ‘p(b) ≻ p(a)’ may be used to state that the content of a is greater than that of b, at least in the sense of the fine structure of content. We shall thus assume that C(a) ≻ C(b) entails C(a) ≽ C(b), and that this in turn entails C(a) ≮ C(b), that is to say, the falsity of C(a) < C(b). None of the opposite entailments hold.

(2) C(a) ≽ C(b) and C(a) ≼ C(b) together entail C(a) = C(b), but C(a) = C(b) is compatible with C(a) ≻ C(b), or with C(a) ≺ C(b), and, of course, also with C(a) ≽ C(b) and with C(a) ≼ C(b).

(3) C(a) > C(b) always entails C(a) ≻ C(b).

(4) Corresponding rules will hold for p(a) ≻ p(b), etc.

The problem now arises of determining the cases in which we may say that C(a) ≻ C(b) holds even though we have C(a) = C(b). A number of cases are fairly clear; for example, unilateral entailment of b by a. More generally, I suggest the following rule:

If for all sufficiently large finite universes (that is, for all universes with more than N members, for some sufficiently large N), we have C(a) > C(b), and thus, in accordance with rule (3), C(a) ≻ C(b), we retain C(a) ≻ C(b) for an infinite universe even if, for an infinite universe, we obtain C(a) = C(b).

This rule seems to cover most cases of interest, although perhaps not all.15

The problem of a1 = ‘All planets move in circles’ and a2 = ‘All planets move in ellipses’ is clearly covered by our rule, and so is even the case of comparing a1 and a3 = ‘All planets move in ellipses with an eccentricity other than zero’; for p(a3) > p(a1) will hold in all sufficiently large finite universes (of possible observations, say) in the simple sense that there are more possibilities compatible with a3 than with a1.
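The rule may be illustrated by a crude finite model (the numbers here are hypothetical, chosen only for the sketch): suppose each observation can yield one of 10 distinguishable orbit shapes, of which 2 count as circles and 6 as ellipses, the circles among them. In every finite universe of n observations the law restricted to circles then has strictly smaller probability, and so strictly greater content, than the law permitting ellipses:

```python
from fractions import Fraction

# Hypothetical discretization: 10 possible orbit shapes per observation,
# 2 of them circles, 6 of them ellipses (the circles included).
SHAPES, CIRCLES, ELLIPSES = 10, 2, 6

def p_law(compatible, n):
    """Classical probability, in a universe of n observations, of the law
    asserting that every observation falls among the 'compatible' shapes."""
    return Fraction(compatible, SHAPES) ** n

for n in (1, 5, 20):
    assert p_law(CIRCLES, n) < p_law(ELLIPSES, n)  # p(a1) < p(a2), every finite n

# Both probabilities tend to 0 with growing n; the rule retains the
# strict finite-universe inequality as the fine-structure relation
# p(a1) 'is lower than' p(a2) in the infinite case.
print(float(p_law(CIRCLES, 20)), float(p_law(ELLIPSES, 20)))
```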

*

The fine-structure of content and of probability here discussed not only affects the limits, 0 and 1, of the probability interval, but it affects in principle all probabilities between 0 and 1. For let a1 and a2 be universal laws with p(a2) = 0 and p(a1) ≺ p(a2), as before; let b be not entailed by either a1 or a2 or their negations; and let 0 < p(b) = r < 1. Then we have

 
p(a1 ∨ b) = p(a2 ∨ b) = p(b) = r

and at the same time

p(b) ≺ p(a1 ∨ b) ≺ p(a2 ∨ b)

Similarly we have

p(ā1b) = p(ā2b) = p(b) = r

and at the same time

p(ā2b) ≺ p(ā1b) ≺ p(b)

since p(ā1) ≻ p(ā2), although of course p(ā1) = p(ā2) = 1. Thus we may have, for every b such that p(b) = r, a c1 such that p(c1) = p(b) and p(c1) ≻ p(b), and also a c2 such that p(c2) = p(b) and p(c2) ≺ p(b).

The situation here discussed is important for the treatment of the simplicity or the dimension of a theory. This problem will be further discussed in the next appendix.

Addendum, 1972

In the last paragraph of the preceding Appendix, I hinted that the idea of a fine-structure of probability may be of significance for the comparison of the simplicity and dimension of theories. But the opposite also holds. The simplicity and especially the dimension of a theory are significant for the theory of its fine-structure, as emerges from the first pages of the following Appendix.

The dimension of a theory is relative to a field of application and thus to a set of problems for which the theory offers some solution. (The same relativization will be relevant to the fine-structure of theories, and thus to their ‘goodness’.)

1 ‘x’ is here an individual variable ranging over the (infinite) universe of discourse. We may choose, for example, a = ‘All swans are white’ = ‘for whatever value of x we may choose, x has the property A’, where ‘A’ is defined as ‘being white or not being a swan’. We may also express this slightly differently, by assuming that x ranges over the spatio-temporal regions of the universe, and that ‘A’ is defined by ‘not inhabited by a non-white swan’. Even laws of more complex form—say of a form like ‘(x)(y)(xRy → xSy)’—may be written ‘(x)Ax’, since we may define ‘A’ by ‘Ax ↔ (y)(xRy → xSy)’.

We may perhaps come to the conclusion that natural laws have another form than the one here described (cf. appendix *x): that they are logically still stronger than is here assumed; and that, if forced into a form like ‘(x)Ax’, the predicate A becomes essentially non-observational (cf. notes *1 and *2 to the ‘Third Note’, reprinted in appendix *ix) although, of course, deductively testable. But in this case, our considerations remain valid a fortiori.

2 It is the assumption underlying Laplace’s derivation of his famous ‘rule of succession’; this is why I call it ‘Laplace’s distribution’. It is an adequate assumption if our problem is one of mere sampling; it seems inadequate if we are concerned (as was Laplace) with a succession of individual events. See also appendix *ix, points 7 ff. of my ‘Third Note’; and note 10 to appendix *viii.

3 An argument which appeals to the fact that we possess knowledge or that we can learn from experience, and which concludes from this fact that knowledge or learning from experience must be possible, and further, that every theory which entails the impossibility of knowledge, or of learning from experience, must be false, may be called a ‘transcendental argument’. (This is an allusion to Kant.) I believe that a transcendental argument may indeed be valid if it is used critically—against a theory which entails the impossibility of knowledge, or of learning from experience. But one must be very careful in using it. Empirical knowledge in some sense of the word ‘knowledge’, exists. But in other senses—for example in the sense of certain knowledge, or of demonstrable knowledge—it does not. And we must not assume, uncritically, that we have ‘probable’ knowledge–knowledge that is probable in the sense of the calculus of probability. It is indeed my contention that we do not have probable knowledge in this sense. For I believe that what we may call ‘empirical knowledge’, including ‘scientific knowledge’, consists of guesses, and that many of these guesses are not probable (or have a probability zero) even though they may be very well corroborated. See also my Postscript, sections *28 and *32.

4 Treatise of Human Nature, 1739–40, book i, part iii, section vi (the italics are Hume’s). See also my Postscript, note 1 to section *2 and note 2 to section *50.

5 loc. cit., section xii (the italics are Hume’s). The next quotation is from loc. cit., section vi.

6 Cf. An Abstract of a Book lately published entitled A Treatise of Human Nature, 1740, ed. by J. M. Keynes and P. Sraffa, 1938, p. 15. Cf. note 2 to section 81. (The italics are Hume’s.)

7 Carnap’s requirement that his ‘lambda’ (which I have shown to be the reciprocal of a dependence measure) must be finite entails our (+); cf. his Continuum of Inductive Methods, 1952. Nevertheless, Carnap accepts p(a) = 0, which according to Jeffreys would entail the impossibility of learning from experience. And yet, Carnap bases his demand that his ‘lambda’ must be finite, and thus that (+) is valid, on precisely the same transcendental argument to which Jeffreys appeals—that without it, we could not learn from experience. See his Logical Foundations of Probability, 1950, p. 565, and my contribution to the Carnap volume of the Library of Living Philosophers, ed. by P. A. Schilpp, especially note 87. This is now also in my Conjectures and Refutations, 1963.

8 I translate Jeffreys’s symbols into mine, omitting his H since nothing in the argument prevents us from taking it to be either tautological or at least irrelevant; in any case, my argument can easily be restated without omitting Jeffreys’s H.

9 Note that there is nothing in the conditions under which (10) is derived which would demand the bi to be of the form ‘B(ki)’, with a common predicate ‘B’, and therefore nothing to prevent our assuming that bi = ‘ki is heads’ and bj = ‘kj is tails’. Nevertheless, we can construct a predicate ‘B’ so that every bi has the form ‘B(ki)’: we may define B as ‘having the property heads, or tails, respectively, if and only if the corresponding element of the sequence determined by the mathematical law a is 0, or is 1, respectively’. (It may be noted that a predicate like this can be defined only with respect to a universe of individuals which are ordered, or which may be ordered; but this is of course the only case that is of interest if we have in mind applications to problems of science. Cf. my Preface, 1958, and note 2 to section *49 of my Postscript.)

10 Jeffreys himself draws the opposite conclusion: he adopts as valid the possibility stated in case (3).

11 Philos. Magazine 42, 1921, pp. 369 ff.

12 A similar example is used in appendix *viii, text to note 2.

13 Whatever C. G. Hempel may mean by ‘confirming evidence’ of a theory, he clearly cannot mean the result of tests which corroborate the theory. For in his papers on the subject (Journal of Symbolic Logic 8, 1943, pp. 122 ff., and especially Mind 54, 1945, pp. 1 ff. and 97 ff.; 55, 1946, pp. 79 ff.), he states (Mind 54, pp. 102 ff.) among his conditions for adequacy the following condition (8.3): if e is confirming evidence of several hypotheses, say h1 and h2, then h1 and h2 and e must form a consistent set of statements.

But the most typical and interesting cases tell against this. Let h1 and h2 be Einstein’s and Newton’s theories of gravitation. They lead to incompatible results for strong gravitational fields and fast-moving bodies, and therefore contradict each other. And yet, all the known evidence supporting Newton’s theory is also evidence supporting Einstein’s, and corroborates both. The situation is very similar for Newton’s and Kepler’s theories, or Newton’s and Galileo’s. (Also, any unsuccessful attempt to find a red or yellow swan corroborates both the following two theories which contradict each other in the presence of the statement ‘there exists at least one swan’: (i) ‘All swans are white’ and (ii) ‘All swans are black’.)

Quite generally, let there be a hypothesis h, corroborated by the result e of severe tests, and let h1 and h2 be two incompatible theories each of which entails h. (h1 may be ah, and h2 may be āh.) Then any test of h is one of both h1 and h2, since any successful refutation of h would refute both h1 and h2; and if e is the report of unsuccessful attempts to refute h, then e will corroborate both h1 and h2. (But we shall, of course, look for crucial tests between h1 and h2.) With ‘verifications’ and ‘instantiations’, it is, of course, otherwise. But these need not have anything to do with tests.
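The entailment claim in this note can be checked mechanically. The following sketch (the boolean encoding is mine, not Popper’s) models h1 as ah and h2 as āh over all truth assignments, and verifies that each entails h while the two are jointly inconsistent—so a refutation of h refutes both:

```python
from itertools import product

# Encode h1 = a & h and h2 = (not a) & h as boolean functions.
def h1(a, h):
    return a and h

def h2(a, h):
    return (not a) and h

assignments = list(product([True, False], repeat=2))

# h1 entails h: every assignment making h1 true also makes h true.
entails_1 = all(h for a, h in assignments if h1(a, h))

# h2 entails h likewise.
entails_2 = all(h for a, h in assignments if h2(a, h))

# h1 and h2 are incompatible: no assignment satisfies both.
incompatible = not any(h1(a, h) and h2(a, h) for a, h in assignments)
```

Since h1 and h2 each entail h, any assignment falsifying h falsifies both, which is the sense in which every test of h is a test of both.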

Yet quite apart from this criticism, it should be noted that in Hempel’s model language identity cannot be expressed; see his paper in The Journal of Symbolic Logic 8, 1943, the last paragraph on p. 143, especially line 5 from the end of the paper, and p. 21 of my Preface, 1958. For a simple (‘semantical’) definition of instantiation, see the last footnote of my note in Mind 64, 1955, p. 391.

14 That a self-contradictory statement may have the same probability as a consistent synthetic statement is unavoidable in any probability theory if applied to some infinite universe of discourse: this is a simple consequence of the multiplication law, which demands that p(a1a2 ... an) must tend to zero, provided all the ai are mutually independent. Thus the probability of tossing n successive heads is, according to all probability theories, 1/2^n, which tends to zero as the number of throws tends to infinity.

A similar problem of probability theory is this. Put into an urn n balls marked with the numbers 1 to n, and mix them. What is the probability of drawing a ball marked with a prime number? The well-known solution of this problem, like that of the previous one, tends to zero when n tends to infinity; which means that the probability of drawing a ball marked with a divisible number becomes 1, for n→∞, even though there is an infinite number of balls with non-divisible numbers in the urn. This result must be the same in any adequate theory of probability. One must not, therefore, single out a particular theory of probability, such as the frequency theory, and criticize it as ‘at least mildly paradoxical’ because it yields this perfectly correct result. (A criticism of this kind will be found in W. Kneale’s Probability and Induction, 1949, p. 156.) In view of our last ‘problem of probability theory’—that of drawing numbered balls—Jeffreys’s attack on those who speak of the ‘probability distribution of prime numbers’ seems to me equally unwarranted. (Cf. Theory of Probability, 2nd edition, p. 38, footnote.)
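The claim that the probability of drawing a prime-numbered ball tends to zero can be illustrated numerically; a minimal sketch (the function name and the sieve are mine, for illustration), which computes the fraction of primes among 1..n:

```python
def prime_fraction(n):
    """Fraction of the balls 1..n that carry a prime number,
    counted with a sieve of Eratosthenes."""
    if n < 2:
        return 0.0
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return sum(sieve) / n

# The fraction shrinks as n grows (roughly like 1/ln n, by the prime
# number theorem), so the probability of drawing a prime tends to 0.
fractions_by_n = {n: prime_fraction(n) for n in (10, 100, 1000)}
```

For n = 10, 100, 1000 the fractions are 0.4, 0.25 and 0.168 respectively, falling toward zero even though the urn always contains infinitely many further primes as n grows.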

15 Related problems are discussed in considerable detail in John Kemeny’s very stimulating paper ‘A Logical Measure Function’, Journal of Symb. Logic 18, 1953, pp. 289 ff. Kemeny’s model language is the second of three to which I allude on p. xxiv of my Preface, 1958. It is, in my opinion, by far the most interesting of the three. Yet as he shows on p. 294, his language is such that infinitistic theorems—such as the principle that every number has a successor—must not be demonstrable in it. It thus cannot contain the usual system of arithmetic.