16

Statistics and Probability

1 INTRODUCTION

In many branches of engineering and physics, processes are encountered which are not known with exactness. In the study of phenomena of this type, the applications of the methods of statistics and of the theory of probability are imperative. In this chapter, some of the basic elementary concepts and methods employed in the theory of statistics and probability will be given. The notion of probability is a very difficult concept whose fundamental aspects we shall not discuss. Simple cases and examples will be given to introduce its use in a reasonable way. For the reader who wishes to study the vast subject of statistics and probability further, a fairly complete and representative set of references to the more important modern treatises on the subject is given at the end of the chapter.

2 STATISTICAL DISTRIBUTIONS

When a series of measurements is made on a set of objects, the set of objects being measured is called a population. A population may be, for example, the animals living on a farm, the group of people living in a certain city, the automobiles produced by a factory in a year, the set of results of a physical or chemical experiment, etc.

If a single quantity is measured for each member of a population, this quantity is called a variate. Variates are usually denoted in the mathematical literature by symbols such as x or y. The possible observed values of a variate x form a finite sequence or set of numbers x1, x2, x3, . . . , xn. The numbers xk, for example, may be the heights of a group of n individuals or the prices of a certain make of automobile in a dealer’s showroom.

FREQUENCY

In a given population, the number of times a particular value xr of a variate is observed is called the frequency fr of xr. For example, an automobile dealer may have nine cars in his showroom; three cars are priced at $3,500, four at $4,000, and two at $2,000. In this case we have x1 = $3,500, f1 = 3, x2 = $4,000, f2 = 4, and x3 = $2,000, f3 = 2. The total number of automobiles in the population is N, where

N = f1 + f2 + f3 = 3 + 4 + 2 = 9        (2.1)

The total value S of the automobiles is

S = f1x1 + f2x2 + f3x3 = 3($3,500) + 4($4,000) + 2($2,000) = $30,500        (2.2)

The average or mean price x̄ is

x̄ = S/N = $30,500/9 ≈ $3,389        (2.3)
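The arithmetic of this example is easily checked by machine. The following short Python sketch recomputes N, S, and the mean price from the frequency table of the showroom example (the numbers are those quoted above).

    # Prices and frequencies from the showroom example above.
    prices = [3500, 4000, 2000]
    freqs = [3, 4, 2]

    N = sum(freqs)                                  # total number of cars, Eq. (2.1)
    S = sum(f * x for f, x in zip(freqs, prices))   # total value, Eq. (2.2)
    mean_price = S / N                              # mean price, Eq. (2.3)

    print(N, S, round(mean_price, 2))               # 9 30500 3388.89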

THE MEAN OF A DISTRIBUTION

The most elementary statistical quantity defined by a distribution is the arithmetic mean x̄ of the variate x, usually known as the mean. This is the sum of the values of x for each member of the population divided by N. Therefore,

x̄ = (1/N) Σr fr·xr        (2.4)

where the sum extends over r = 1, 2, . . . , n. Let us suppose that a is a typical value of the variate x, and let us define

dr = xr − a        r = 1, 2, . . . , n        (2.5)

It is thus evident that dr is the deviation of xr from the typical value a. The variate d has a distribution fr over the values dr. The mean d̄ of d for this distribution is

d̄ = (1/N) Σr fr·dr        (2.6)

d̄ is simply related to x̄; this can be seen by writing (2.6) in the form

d̄ = (1/N) Σr fr(xr − a) = (1/N) Σr fr·xr − a = x̄ − a        (2.7)

Therefore we have

x̄ = a + d̄        (2.8)

The following relation is sometimes useful.

images

Equation (2.8) is useful in computing x̄ quickly, provided that we choose a to be a number close to the mean.

3 SECOND MOMENTS AND STANDARD DEVIATION

In statistical investigations, it is of interest to know whether or not a distribution is closely grouped around its mean. One of the most useful quantities for estimating the spread of a distribution is the standard deviation σ. Before we define σ it is instructive to note that d̄ as given by (2.7) is the first moment of a distribution about a given value a of the variate. We note from (2.8) that if a = x̄, then the first moment d̄ = 0.

THE SECOND MOMENT

The second moment of a distribution about a given value a of the variate is the average over the population of the squares of the differences dr = xr − a between a and the values of the variate xr. It is given by

(1/N) Σr fr·dr² = (1/N) Σr fr(xr − a)²        (3.1)

It is apparent that if most of the measured values of x for the population are approximately equal to a, this second moment will be small, while if a differs considerably from the observed values of x, it will be large, since all the terms in (3.1) are positive. If we now take a = x̄ in (3.1), that is, if we take the second moment about the mean x̄ of the population, this second moment μ2 is called the variance. The variance is therefore given by

μ2 = (1/N) Σr fr(xr − x̄)²        (3.2)

The variance gives an estimate of the spread of the values of x about their mean. The variance is not directly comparable with x̄, since it is of the dimension of x².

THE STANDARD DEVIATION

The positive square root of μ2 is the standard deviation σ; since it is of the same dimension as x, σ gives us an estimate of the spread of the values of x which can be compared directly with x̄. By definition, we therefore have

σ = +√μ2 = [(1/N) Σr fr(xr − x̄)²]^1/2        (3.3)

where we have made use of (2.9).

If (3.3) is expanded, we obtain the following result:

σ² = (1/N) Σr fr·xr² − 2x̄·(1/N) Σr fr·xr + x̄² = (1/N) Σr fr·xr² − x̄²        (3.4)

It is thus evident that the variance μ2 and the second moment about any value a are related by

μ2 = (1/N) Σr fr(xr − a)² − (x̄ − a)²        (3.5)

In order to calculate the standard deviation σ, it is usually easier to calculate the second moment about a conveniently chosen value a and then use (3.5) than to use (3.3) directly.
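The shortcut expressed by (3.5) is easily verified numerically. The Python sketch below (the frequency data are invented for illustration) computes the variance directly from (3.2) and again by way of the second moment about a conveniently chosen value a; the two results agree.

    # Invented frequency data, for illustration only.
    xs = [62, 64, 66, 68, 70]
    fs = [5, 12, 20, 9, 4]

    N = sum(fs)
    mean = sum(f * x for f, x in zip(fs, xs)) / N

    # Direct computation of the variance, Eq. (3.2).
    var_direct = sum(f * (x - mean) ** 2 for f, x in zip(fs, xs)) / N

    # Shortcut of Eq. (3.5): second moment about a convenient value a.
    a = 66
    second_moment = sum(f * (x - a) ** 2 for f, x in zip(fs, xs)) / N
    var_shortcut = second_moment - (mean - a) ** 2

    print(round(var_direct, 6), round(var_shortcut, 6), round(var_direct ** 0.5, 4))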

Since σ² is the mean value of the squares of the deviations of the variate from its average value x̄, σ (the “root mean square” deviation) gives us an over-all measure of the deviation of the set from the average x̄ without reference to sign. There are two other features of the population which are sometimes found useful. Suppose that a frequency diagram has been constructed in which the ordinates represent the number of readings lying in successive intervals. The interval in which the ordinate attains its maximum clearly corresponds to the most frequent or “most fashionable” value of x among the set. This value is called the mode; in general, it is not identical with the average or mean, but it will be if the frequency curve is symmetric about the mean value. A frequency curve may have more than one mode, but we are here concerned only with cases in which a single mode exists.

We may also arrange our data in ascending order of magnitude and divide them in two at the halfway point, so that as many measurements lie above this division as below it. This value is called the median; it is frequently useful in the analysis of experimental observations.

4 DEFINITIONS OF PROBABILITY

A POSTERIORI PROBABILITY

Let us take an ordinary coin and flip it N times. Let the number of times the “heads” side of the coin comes up be denoted by Nh; we may then write the following ratio:

Nh/N        (4.1)

When this experiment is performed with a balanced coin, it is found that

lim(N→∞) Nh/N = 1/2        (4.2)

Because of the result (4.2), we say that the probability of a coin coming up heads when it is flipped is 1/2.

This simple case can be generalized to the following situation. Consider an experiment that is performed N times. Let the number of times that the experiment succeeds be Ns; then if the ratio

Ns/N        (4.3)

approaches the limit

lim(N→∞) Ns/N = Ps        (4.4)

we say that the probability of the success of the experiment is Ps.

Because in this case the probability Ps is measured by performing the experiment and is not predicted, Ps is called the a posteriori probability that the experiment will be successful.
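The a posteriori definition can be illustrated by a simple simulation of coin flipping; as the number of flips grows, the ratio Nh/N settles near 1/2. The sketch below is only an illustration, and the random seed and flip counts are arbitrary choices.

    import random

    random.seed(1)                      # arbitrary seed, for repeatability
    for N in (100, 10_000, 1_000_000):
        heads = sum(random.random() < 0.5 for _ in range(N))
        print(N, heads / N)             # the ratio Nh/N approaches 1/2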

A PRIORI PROBABILITY

Let us now perform another experiment that has n equally likely outcomes, and let the number of these outcomes that count as successes of the experiment be ns. If we now predict that the ratio

ns/n        (4.5)

is such that

ps = ns/n        (4.6)

then ps is called the a priori probability that the experiment will succeed.

Of course, the two types of probability should always give the same answer. If the a posteriori probability comes out to be different from the a priori probability, it will be concluded that an error has been made in the analysis. In the following discussion, no distinction will be made between the two types of probability.

5 FUNDAMENTAL LAWS OF PROBABILITY

In this section a simple discussion of the fundamental laws of probability will be given. Let P(A) be the probability of an event called A occurring when a certain experiment is performed. By the definition of probability, P(A) is a fraction that lies between 0 and 1. We also have by definition that if

P(A) = 1        (5.1)

the event A is certain to happen. However, if

P(A) = 0        (5.2)

the event A is certain not to happen.

COMPOUND EVENTS

To illustrate more complex possibilities, let us consider an experiment with N equally likely outcomes that involves two events A and B. Let us use the following notation:

                n1 = the number of outcomes in which A only occurs

                n2 = the number of outcomes in which B only occurs

                n3 = the number of outcomes in which both A and B occur

                n4 = the number of outcomes in which neither A nor B occurs.

                N = n1 + n2 + n3 + n4

We see that by the use of the above notation we may define the following probabilities:

P(A) = (n1 + n3)/N = the probability that the event A occurs        (5.3)

P(B) = (n2 + n3)/N = the probability that the event B occurs        (5.4)

P(AB) = n3/N = the probability that both A and B occur        (5.5)

P(A + B) = (n1 + n2 + n3)/N = the probability that A or B or both occur        (5.6)

P(A/B) = n3/(n2 + n3) = the probability that A occurs, given that B has occurred        (5.7)

P(B/A) = n3/(n1 + n3) = the probability that B occurs, given that A has occurred        (5.8)

From the above definitions, we see that

P(A + B) = (n1 + n2 + n3)/N = (n1 + n3)/N + (n2 + n3)/N − n3/N        (5.9)

or

P(A + B) = P(A) + P(B) − P(AB)        (5.10)

We also have

P(AB) = n3/N = [(n1 + n3)/N][n3/(n1 + n3)] = [(n2 + n3)/N][n3/(n2 + n3)]        (5.11)

or

P(AB) = P(A)P(B/A) = P(B)P(A/B)        (5.12)

MUTUALLY EXCLUSIVE EVENTS

If A and B are mutually exclusive events so that the occurrence of one precludes the occurrence of the other, then the probability of their joint occurrence is zero and we have

P(AB) = 0        (5.13)

For this case (5.10) reduces to the equation

P(A + B) = P(A) + P(B)        (5.14)

STATISTICALLY INDEPENDENT EVENTS

If the events A and B are statistically independent, then the probability of the occurrence of A does not depend on the occurrence of B and vice versa; we then have

P(A/B) = P(A)        and        P(B/A) = P(B)        (5.15)

In this case, Eq. (5.12) reduces to

P(AB) = P(A)P(B)        (5.16)

If the events A and B are statistically independent, then the additive law of probability (5.10) becomes

P(A + B) = P(A) + P(B) − P(A)P(B)        (5.17)

As an example of the use of (5.17), let it be required to compute the probability that when one card is drawn from each of two decks, at least one will be an ace. In this case the events A and B are the drawing of an ace from the first and from the second deck of 52 cards, respectively. Therefore,

P(A) = P(B) = 4/52 = 1/13        (5.18)

Since the events A and B are statistically independent, we have from (5.17) the following result:

P(A + B) = 1/13 + 1/13 − (1/13)² = 25/169 ≈ 0.148        (5.19)
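The result (5.19) may be checked by enumerating the 52 × 52 equally likely pairs of cards, one from each deck, and counting the pairs that contain at least one ace (4 aces being assumed in each deck). A short Python sketch of this check follows.

    from fractions import Fraction

    # One card from each of two 52-card decks; cards 0-3 represent the aces.
    favorable = sum(1 for c1 in range(52) for c2 in range(52)
                    if c1 < 4 or c2 < 4)

    p_enumerated = Fraction(favorable, 52 * 52)
    p_formula = Fraction(1, 13) + Fraction(1, 13) - Fraction(1, 13) ** 2   # Eq. (5.17)
    print(p_enumerated, p_formula, p_enumerated == p_formula)              # 25/169 True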

BAYES’S THEOREM

Some interesting relations among conditional probabilities may be deduced by means of Eq. (5.12). If we solve (5.12) for P(B/A), we obtain

P(B/A) = P(AB)/P(A)        (5.20)

However, we also have from (5.12) the result

P(AB) = P(B)P(A/B)        (5.21)

If we substitute this into (5.20), we obtain

P(B/A) = P(B)P(A/B)/P(A)        (5.22)

Let us now write a relation similar to (5.22) but with B replaced by C. We thus obtain

P(C/A) = P(C)P(A/C)/P(A)        (5.23)

We now divide Eq. (5.22) by (5.23) and obtain the result

P(B/A)/P(C/A) = P(B)P(A/B)/[P(C)P(A/C)]        (5.24)

This result is a form of Bayes’s theorem in the theory of probability.

6 DISCRETE PROBABILITY DISTRIBUTIONS

In Secs. 2 and 3 we discussed observed or empirical distributions. The theory of probability is concerned with the prediction of distributions when some basic law is assumed to govern the behavior of the variate. A simple example of a law of probability is that governing the behavior of a perfect die. The fact that the die is symmetric with respect to its six faces suggests very strongly that each of the faces is equally likely to end facing upward if we make an unbiased throw of the die. The probability p of any particular face appearing is thus equal to 1/6. A similar situation is that involving the probability p that a particular card is on top of a thoroughly shuffled pack of 52 cards; in this case p = 1/52. The fact that if a fair die is thrown, the probability distribution is pk = 1/6 for k = 1, 2, . . . , 6 is the definition of a fair die. The statistical problem then becomes one of deciding whether or not any particular die is fair.

TRIALS

When a card is drawn, a die is thrown, or the height of a man selected from a certain group of men is measured, we are said to be making a trial. Let us make a series of completely independent trials on a system, and let there be a finite number n of possible results of a trial. Let these n possible results be denoted by x1, x2, x3, . . . , xn. If in the series of trials the result x1 occurs m1 times, the result x2 occurs m2 times, and in general the result xr occurs mr times (r = 1, 2, 3, . . . , n), then

m1 + m2 + m3 + · · · + mn = N        (6.1)

Now if, as N becomes large, it is found that the proportion of any result xr measured by mr/N tends to a definite value pr, so that

lim(N→∞) mr/N = pr        (6.2)

then pr is the probability of the result xr.

Since we cannot in practice make an infinite number of trials, we cannot determine pr exactly by making a series of trials. It is therefore evident that a law of probability that specifies definite values of the probabilities pr is never established with certainty by a series of trials, although it may be strongly suggested. The assumption of a law of probability corresponding to a series of trials is therefore a hypothesis which may or may not be confirmed by further trials.

If we divide (6.1) by N, we obtain

m1/N + m2/N + m3/N + · · · + mn/N = 1        (6.3)

We now let N → ∞ in (6.3) and use the result (6.2) to obtain

p1 + p2 + p3 + · · · + pn = 1        (6.4)

As an example of the above discussion, we note that the possible results of a throw of a die are x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 5, x6 = 6. For an unbiased die, we assume the probability law pr = 1/6. If in a large number of throws the values of mr/N (r = 1, 2, 3, . . . , 6) do not tend to the value 1/6, we conclude that the die is biased.

As an example of the use of an assumed law of probability let us consider the following question: “What is the probability that the face number 1 appears at least once in n throws of a die?”

Since we assume that the probability of the face 1 appearing in one throw is 1/6, the probability of not obtaining a 1 in one throw is 5/6. The probability of not obtaining a 1 in two throws is (5/6)², and of not obtaining a 1 in n throws is (5/6)^n. The probability p of throwing a 1 at least once in n throws is therefore p = 1 − (5/6)^n. In this calculation we assume that the result of each throw is independent of the result of any other throw and use (5.16).
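The result p = 1 − (5/6)^n is easy to evaluate and to confirm by simulation; the following sketch does both (the throw counts and the number of simulated experiments are arbitrary choices).

    import random

    random.seed(2)
    trials = 100_000                       # number of simulated experiments
    for n in (1, 2, 4, 10):
        exact = 1 - (5 / 6) ** n           # probability of at least one "1" in n throws
        hits = sum(any(random.randint(1, 6) == 1 for _ in range(n))
                   for _ in range(trials))
        print(n, round(exact, 4), round(hits / trials, 4))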

7 ELEMENTS OF THE THEORY OF COMBINATIONS AND PERMUTATIONS

Since the mathematical theory of probability treats questions involving the relative frequency with which certain groups of objects may be conceived as arranged within a population, it is necessary to review the theory of the number of ways in which various subgroups may be selected from the members of a larger group.

In dealing with objects in groups one is led to consider two kinds of arrangements, according to whether the order of the objects in the groups is or is not taken into account. Some elementary results in the theory of combinations and permutations will now be reviewed.

PERMUTATIONS AND COMBINATIONS

The number of arrangements, or permutations, of n objects is n!. This can be seen as follows: the first position can be occupied by any of the n objects, the second by any of the n − 1 remaining objects, and so on. Therefore, if we denote the number of permutations of n objects by nPn, we have

nPn = n(n − 1)(n − 2) · · · (2)(1) = n!        (7.1)

where n! = (1)(2)(3) · · · (n – 1)n is the factorial of n. By definition, 0! = 1 and 1! = 1.

THE NUMBER OF PERMUTATIONS OF n THINGS TAKEN r AT A TIME

The number of permutations of n different things taken r at a time is denoted by the symbol nPr and is found easily as follows. The first place of any arrangement may have any one of the n things, so that there are n choices for the first place. The second place may have any one of the remaining n − 1 things, the third any of the remaining n − 2, and so on, to the rth place, which may have any of the remaining n − r + 1. It follows that

nPr = n(n − 1)(n − 2) · · · (n − r + 1) = n!/(n − r)!        (7.2)

THE NUMBER OF COMBINATIONS OF n THINGS TAKEN r AT A TIME

The combinations of n things taken r at a time are defined as the possible selections of r of the n things, no account being taken of the order of the arrangement. Thus the combinations of three things a, b, c, taken two at a time, are ab, ac, and bc. The combination ba is the same as ab, and so on.

The number of combinations of n things taken r at a time is denoted by the symbol nCr and is found as follows. Any combination of the r things can be arranged in r! ways, since this is the number of arrangements of r things taken r at a time, or rPr. If every combination is so treated, we clearly obtain nPr arrangements, and it therefore follows that

r!·nCr = nPr        (7.3)

If we divide (7.3) by r!, we obtain

nCr = nPr/r! = n!/[r!(n − r)!]        (7.4)

The quantity nCr = n!/[r!(n − r)!] is called the binomial coefficient; it is shown in works on algebra that if n is a positive integer, we may write

(p + q)^n = Σ(r=0→n) nCr·p^r·q^(n−r)        (7.5)

Equation (7.5) for n = 2, 3, 4, . . . is known as the binomial theorem.

PROPERTIES OF THE BINOMIAL COEFFICIENT nCr = C(n,r)

The following interesting and useful properties of the binomial coefficient nCr can be established:

C(n, r) = C(n, n − r)        (7.6)

C(n, r) + C(n, r − 1) = C(n + 1, r)        (7.7)

C(n, 0) + C(n, 1) + C(n, 2) + · · · + C(n, n) = 2^n        (7.8)

C(n, 0) − C(n, 1) + C(n, 2) − · · · + (−1)^n C(n, n) = 0        (7.9)

As a simple example of the use of Eq. (7.4), let it be required to solve the following problem. There are n points in a plane, no three being on a line. Find the number of lines passing through pairs of points.

We may solve this problem by realizing that since no three points lie on a line, any line can be considered as a combination of two of the n points. The number of lines is therefore nC2 = n!/[2!(n − 2)!] = n(n − 1)/2.
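These counting functions are available directly in the Python standard library (math.perm and math.comb, Python 3.8 and later). The sketch below evaluates n!, nPr, nCr, and the number of lines determined by n points for an arbitrary choice of n and r.

    import math

    n, r = 7, 3                      # arbitrary illustrative values
    print(math.factorial(n))         # n!  = 5040, permutations of n objects, Eq. (7.1)
    print(math.perm(n, r))           # nPr = 7*6*5 = 210, Eq. (7.2)
    print(math.comb(n, r))           # nCr = 210/3! = 35, Eq. (7.4)

    # Lines through pairs of n points, no three collinear: nC2 = n(n - 1)/2.
    print(math.comb(n, 2), n * (n - 1) // 2)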

8 STIRLING’S APPROXIMATION FOR THE FACTORIAL

In many phases of the mathematical theory of probability, the factorial of large numbers occurs. A classical result given by Stirling is of paramount importance in the computation of the factorial of large numbers. The formula given by Stirling was obtained in the 1730s and may be stated in the following form:

n! ≈ √(2πn)·n^n·e^(−n)        (8.1)

The error of this formula for finite values of n is of the order of 100/(12n) percent.

In addition to being a very useful method for the computation of factorial numbers, Stirling’s formula is very useful in a theoretical sense. Its use in the computation of the factorial of large numbers may be seen by considering the computation of 10!. Instead of multiplying together a large number of integers, we have merely to evaluate expression (8.1) by means of a table of logarithms, which involves far fewer operations. For example, for n = 10 we obtain the value 3,598,696 from Stirling’s expression (using seven-figure tables), while the exact value of 10! obtained by direct multiplication is 3,628,800. The value obtained by the use of Stirling’s formula is thus of the order of 0.8 percent too low.
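The accuracy of Stirling’s formula is easily examined numerically. The sketch below compares n! with √(2πn)·n^n·e^(−n) and prints the percentage error together with the estimate 100/(12n) percent.

    import math

    def stirling(n):
        """Stirling's approximation, Eq. (8.1)."""
        return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

    for n in (5, 10, 20, 50):
        exact = math.factorial(n)
        approx = stirling(n)
        error = 100 * (exact - approx) / exact       # percent by which (8.1) is low
        print(n, f"{error:.3f}", f"{100 / (12 * n):.3f}")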

A more exact formula for the computation of factorial numbers is the following one:

√(2πn)·n^n·e^(−n) < n! < √(2πn)·n^n·e^(−n)·e^(1/12n)        (8.2)

It is thus evident that the expressions n! and √(2πn)·n^n·e^(−n) differ only by a small percentage of error when the value of n is large. The factor e^(1/12n) gives us an estimate of the degree of accuracy of the approximation.

In order to obtain this remarkable formula let us try to estimate the area under the curve y = ln x. This curve has the general form depicted in Fig. 8.1. If we integrate ln x by parts we obtain the result

An = ∫(1→n) ln x dx = [x ln x − x] from x = 1 to x = n = n ln n − n + 1        (8.3)

An is the exact area under the curve shown in Fig. 8.1.

If we erect ordinates y1, y2, . . . , yn at x = x1 = 1, x = x2 = 2, . . . , x = xn = n, the trapezoidal rule for the area Tn under the curve gives

Tn = (1/2)y1 + y2 + y3 + · · · + yn−1 + (1/2)yn        (8.4)

Fig. 8.1 (the curve y = ln x and the inscribed trapezoids)

Since we have

yk = ln xk = ln k        k = 1, 2, 3, . . . , n        (8.5)

we obtain the following expression for Tn on substituting (8.5) into (8.4):

Tn = (1/2) ln 1 + ln 2 + ln 3 + · · · + ln (n − 1) + (1/2) ln n        (8.6)

By the definition of the factorial number n!, we have

n! = (1)(2)(3) · · · (n − 1)(n)        (8.7)

Therefore if we take the natural logarithm of (8.7), we obtain

ln (n!) = ln 2 + ln 3 + · · · + ln (n − 1) + ln n        (8.8)

If we now compare (8.6) with (8.8) it can be seen that

Tn = ln (n!) − (1/2) ln n        (8.9)

The exact area An and the approximate area Tn are of the same order of magnitude. Since the trapezoids used in computing Tn lie below the curve y = ln x, it is apparent that the true area An exceeds the trapezoidal area Tn. Let us write

An = Tn + an        where an > 0        (8.10)

If we solve (8.10) for ln(n!), we obtain

ln (n!) = An + (1/2) ln n − an = (n + 1/2) ln n − n + (1 − an)        (8.11)

Let us write

bn = e^(1−an)        (8.12)

We therefore have as a consequence of (8.11) and the definition of the natural logarithm the result

n! = bn·n^(n+1/2)·e^(−n)        (8.13)

It can be shown that the quantity an is bounded and is a monotonic increasing function of n. Therefore the quantity bn is bounded and is a monotonic decreasing function of n. It can be shown that

lim(n→∞) bn = √(2π)        (8.14)

If we now use the limit (8.14) in (8.13), we obtain the result

n! ≈ √(2π)·n^(n+1/2)·e^(−n) = √(2πn)·n^n·e^(−n)        (8.15)

This is Stirling’s result. For finite values of n, (8.15) gives a result for n! that is of the order of 100/(12n) percent too small. For more precise results, the inequality (8.2) may be used. The factor e^(1/12n) gives us an estimate of the degree of accuracy of the approximation.

9 CONTINUOUS DISTRIBUTIONS

In many problems encountered in the mathematical theory of probability, the possible values of a variate x are not discrete but lie in a continuous range. For example, the heights of adults normally take values in the continuous range from 4.5 to 6.5 ft. Even though observed heights are classified as taking one of a finite discrete set of values, any theoretical probability law should refer to the continuous range. If, for example, a law of probability predicted heights only at certain discrete intervals, it would not be adequate for comparison with a set of measurements made to the nearest centimeter.

A law of probability for a continuous variate is therefore expressed in terms of a probability function ϕ(x), such that ϕ(x)dx is the probability of a trial giving a result in the infinitesimal range [x – (dx/2), x + (dx/2)] for all values of x. The probability P(x1,x2) of the result lying between values x1 and x2 is then

P(x1, x2) = ∫(x1→x2) ϕ(x) dx        (9.1)

Since the total probability is unity, we have

∫ ϕ(x) dx = 1        (9.2)

where the integral is taken over the complete range of the variate x. Equation (9.2) is analogous to Eq. (6.4) for discrete distributions; integration over the probability function replaces summation over the discrete probabilities.

10 EXPECTATION, MOMENTS, AND STANDARD DEVIATION, CHEBYSHEV’S INEQUALITY

The results of Secs. 2 and 3 apply equally well to probabilities. The mean of a probability distribution for a variate x is called the expectation E(x) and is defined by

E(x) = Σr pr·xr / Σr pr        (10.1)

Since the sum of the probabilities is unity, we have

Σr pr = 1        (10.2)

Therefore the expectation E(x) as given by (10.1) takes the form

E(x) = Σr pr·xr        (10.3)

The expectation of a probability distribution of a continuous variate is defined by the equation

E(x) = ∫ x·ϕ(x) dx        (10.4)

where the integration takes place over the whole range of x.

The second moment about a of a discrete probability distribution is defined by the equation

E[(x − a)²] = Σr pr(xr − a)²        (10.5)

The second moment about a of a continuous probability distribution is defined by the equation

E[(x − a)²] = ∫ (x − a)²·ϕ(x) dx        (10.6)

The variance μ2 is given by (10.5) or (10.6) with a = E(x). The standard deviation σ is given by

σ = +√μ2 = {E(x²) − [E(x)]²}^1/2        (10.7)

The standard deviation and the second moment about a are related in the same manner as in (3.5) by the equation

σ² = E[(x − a)²] − [E(x) − a]²        (10.8)

where

images

so that

images

It is interesting to note that when a = E(x), then (10.10) becomes

images

As an example of the above definitions consider the following illustration. Let us suppose that cards numbered 1, 2, 3, . . . , n are put in a box and thoroughly shuffled; then one card is drawn at random. If we assume that each card is equally likely to be drawn, so that the law of probability of drawing the cards is

pr = 1/n        r = 1, 2, 3, . . . , n

then the expectation E(x) of the number x on the card is

E(x) = Σ(r=1→n) r·(1/n) = (1/n)·[n(n + 1)/2] = (n + 1)/2

The second moment about x = 0 of the probability distribution of x is, by (10.5),

E(x²) = Σ(r=1→n) r²·(1/n) = (1/n)·[n(n + 1)(2n + 1)/6] = (n + 1)(2n + 1)/6

The variance is given by (10.7) in the form

μ2 = E(x²) − [E(x)]² = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12

It can be seen that for large values of n we have

E(x) ≈ n/2        σ ≈ n/(2√3) ≈ 0.29n

It is to be expected that the values of E(x) and σ will become proportional to n when n is large.
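For the numbered-card experiment the expectation and variance found above are easily confirmed by direct summation; the sketch below does this for n = 10 (an arbitrary choice).

    n = 10                                        # arbitrary number of cards
    p = 1 / n                                     # each card equally likely to be drawn

    mean = sum(r * p for r in range(1, n + 1))
    second_moment = sum(r * r * p for r in range(1, n + 1))
    variance = second_moment - mean ** 2

    print(mean, (n + 1) / 2)                      # E(x) = (n + 1)/2
    print(round(variance, 6), (n * n - 1) / 12)   # mu2 = (n^2 - 1)/12
    print(round(variance ** 0.5, 4))              # standard deviation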

As an example of a continuous distribution, consider a variate x which is equally likely to take any value in the range (0, X). In such a case the probability function ϕ(x) must equal a constant ϕ0 in this range and zero elsewhere. In order to satisfy (9.2), we must have

ϕ0·X = 1        or        ϕ0 = 1/X

The expectation E(x) of x is, by (10.4),

E(x) = ∫(0→X) x·(1/X) dx = X/2

The variance is given by (10.6) with a = E(x) = X/2, so that

μ2 = ∫(0→X) (x − X/2)²·(1/X) dx = X²/12

Therefore the standard deviation is σ = X/√12 = X/(2√3) ≈ 0.289X.
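The moments of this uniform (rectangular) distribution can also be checked by a crude numerical integration; the sketch below uses a midpoint rule with X = 4 as an arbitrary range.

    X = 4.0                                   # arbitrary length of the range (0, X)
    steps = 100_000
    dx = X / steps
    phi0 = 1.0 / X                            # constant probability function

    mids = [(k + 0.5) * dx for k in range(steps)]
    mean = sum(x * phi0 * dx for x in mids)
    var = sum((x - X / 2) ** 2 * phi0 * dx for x in mids)

    print(round(mean, 6), X / 2)              # E(x) = X/2
    print(round(var, 6), X ** 2 / 12)         # mu2 = X^2/12
    print(round(var ** 0.5, 6), X / (2 * 3 ** 0.5))   # sigma = X/(2*sqrt(3))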

CHEBYSHEV’S INEQUALITY

Let us consider a variate x with probability function f(x). Let μ be the mean value and σ the standard deviation of x. We shall prove that if we select any number a, the probability that x differs from its mean value μ by more than a is less than σ²/a². This means that x is unlikely to differ from μ by more than a few standard deviations, and that therefore the standard deviation σ gives an estimate of the spread of the distribution. For example, if a = 2σ, we find that the probability for x to differ from μ by more than 2σ is less than 1/4.

Proof By the definition of the standard deviation σ, we have

σ² = Σ (x − μ)²·f(x)

where the sum extends over all possible values of x.

If we now sum only over those values of x for which |x − μ| exceeds a, the sum can only decrease, so that we have

Σ(|x−μ|>a) (x − μ)²·f(x) < σ²        (10.22)

If we replace (x − μ)² by a² in (10.22), the sum is further decreased and we have

a²·Σ(|x−μ|>a) f(x) < σ²        (10.23)

We may write (10.23) in the form

Σ(|x−μ|>a) f(x) < σ²/a²        (10.24)

However, Σ(|x−μ|>a) f(x) is the sum of the probabilities of all values of x which differ from μ by more than a, so that (10.24) states that this probability is less than σ²/a². This result is known in the literature as Chebyshev’s inequality.
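Chebyshev’s inequality holds for any distribution, so it can be checked against any convenient discrete law. The sketch below uses the fair-die distribution (an arbitrary choice) and verifies that the probability of a deviation larger than a never exceeds σ²/a².

    # Fair-die distribution: values 1..6, each with probability 1/6.
    values = range(1, 7)
    f = {x: 1 / 6 for x in values}

    mu = sum(x * f[x] for x in values)
    sigma2 = sum((x - mu) ** 2 * f[x] for x in values)

    for a in (1.5, 2.0, 2.5, 3.0):
        prob = sum(f[x] for x in values if abs(x - mu) > a)
        print(a, round(prob, 4), "<=", round(sigma2 / a ** 2, 4))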

11 THE BINOMIAL DISTRIBUTION

Let us suppose that a trial can have only two possible results, one called a “success” with probability p, and the other called a “failure” with probability q = 1 − p. If we make n independent trials of this type in succession, then the probability of a particular sequence of r successes and (n − r) failures is p^r·q^(n−r). However, the number of sequences with r successes and (n − r) failures is

nCr = n!/[r!(n − r)!]        (11.1)

Therefore the probability of there being exactly r successes in n trials is

P(r) = nCr·p^r·q^(n−r)        (11.2)

It will be noticed that P(r) is the term containing p^r in the binomial expansion of (p + q)^n. For this reason the probability distribution (11.2) is known as the binomial distribution.

Example 1 If the probability that a man aged 60 will live to 70 is p = 0.65, what is the probability that 8 out of 10 men now aged 60 will live to 70? This is exactly the problem proposed above, with p = 0.65, r = 8, and n = 10. The answer is, therefore,

P(8) = 10C8·(0.65)^8·(0.35)² = 45·(0.65)^8·(0.35)² ≈ 0.176

Example 2 What is the probability of throwing an ace exactly three times in four trials with a single die? In this case we have n = 4, r = 3, and since there is one chance in six of throwing an ace on a single trial, we have p = 1/6. Therefore we have

P(3) = 4C3·(1/6)³·(5/6) = 20/1296 = 5/324 ≈ 0.015

Example 3 What is the probability of throwing a deuce exactly three times in three trials? In this case n = 3, r = 3, and p = 1/6; therefore we have the result

P(3) = 3C3·(1/6)³ = 1/216 ≈ 0.0046
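The three examples are single evaluations of the binomial formula (11.2); the sketch below carries them out with math.comb so that the quoted numerical answers can be checked.

    import math

    def binomial(r, n, p):
        """Probability of exactly r successes in n independent trials, Eq. (11.2)."""
        return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

    print(round(binomial(8, 10, 0.65), 4))   # Example 1: about 0.176
    print(round(binomial(3, 4, 1 / 6), 4))   # Example 2: about 0.0154
    print(round(binomial(3, 3, 1 / 6), 4))   # Example 3: 1/216, about 0.0046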

THE MEAN OF THE BINOMIAL DISTRIBUTION

The expectation or the mean number of successes r̄ is given by equation (10.1) in the form

r̄ = Σ(r=0→n) r·P(r) / Σ(r=0→n) P(r)        (11.5)

We have

Σ(r=0→n) P(r) = Σ(r=0→n) nCr·p^r·q^(n−r) = (p + q)^n = 1        (11.6)

If we denote the partial derivative with respect to p by Dp, we note that

r·nCr·p^r·q^(n−r) = p·Dp[nCr·p^r·q^(n−r)]        (11.7)

Since the denominator of (11.5) is unity, we have

r̄ = Σ(r=0→n) r·nCr·p^r·q^(n−r)        (11.8)

The series in (11.8) is summed by treating p and q as independent variables. If we now use the result (11.7), we may write (11.8) in the form

r̄ = p·Dp[(p + q)^n] = np(p + q)^(n−1)        (11.9)

We now substitute q = 1 − p in (11.9) and obtain the result

r̄ = np        (11.10)

Therefore the mean number of successes for the binomial distribution is np.

THE STANDARD DEVIATION OF THE BINOMIAL DISTRIBUTION

We may compute the variance of the binomial distribution by the use of Eq. (10.8). The second moment about zero of the distribution of the number of successes can be found by a method similar to that used in computing the first moment above (we leave the derivation as an exercise for the reader and quote the result):

E(r²) = Σ(r=0→n) r²·nCr·p^r·q^(n−r) = npq + n²p²        (11.11)

As a consequence of (10.8) we have

σ² = E(r²) − r̄² = npq + n²p² − (np)² = npq        (11.12)

Therefore the standard deviation of the total number of successes in n trials is

σ = √(npq) = √[np(1 − p)]        (11.13)

12 THE POISSON DISTRIBUTION

The Poisson distribution is derived as the limit of the binomial distribution when the number of trials n is very large and the probability of success p is very small. For the binomial distribution we have seen that the mean number of successes is given by

r̄ = np        (12.1)

We now investigate the limiting form taken by the binomial distribution as the number of trials n tends to infinity and the probability p tends to zero in such a manner that

np = m = const        (12.2)

If we substitute p = m/n in the expression (11.2) for the binomial distribution, we obtain the equation

P(r) = {n!/[r!(n − r)!]}·(m/n)^r·(1 − m/n)^(n−r)        (12.3)

This expression may be written in the form

P(r) = (m^r/r!)·[n(n − 1)(n − 2) · · · (n − r + 1)/n^r]·(1 − m/n)^n·(1 − m/n)^(−r)        (12.4)

Now for any fixed r we have, as n tends to infinity,

n(n − 1) · · · (n − r + 1)/n^r → 1        (1 − m/n)^(−r) → 1        (1 − m/n)^n → e^(−m)        (12.5)

We therefore see that the limiting form of (12.4) as n tends to infinity is

P(r) = (m^r·e^(−m))/r!        (12.6)

This is known as the Poisson distribution.

We note that

Σ(r=0→∞) P(r) = e^(−m)·Σ(r=0→∞) m^r/r! = e^(−m)·e^m = 1        (12.7)

as it should.
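The limiting passage from the binomial to the Poisson distribution can be watched numerically by holding m = np fixed while n grows; the sketch below does this for m = 2 (an arbitrary choice).

    import math

    m = 2.0                                   # fixed mean m = np

    def binomial(r, n, p):
        return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

    def poisson(r, m):
        return m ** r * math.exp(-m) / math.factorial(r)

    for n in (10, 100, 10_000):
        p = m / n
        print(n, [round(binomial(r, n, p), 5) for r in range(5)])
    print("Poisson", [round(poisson(r, m), 5) for r in range(5)])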

THE MEAN OF THE POISSON DISTRIBUTION

The mean r̄ of the Poisson distribution may be computed by the equation

r̄ = Σ(r=0→∞) r·P(r) = e^(−m)·Σ(r=1→∞) m^r/(r − 1)! = m·e^(−m)·Σ(s=0→∞) m^s/s! = m        (12.8)

THE STANDARD DEVIATION OF THE POISSON DISTRIBUTION

The standard deviation of the Poisson distribution may be obtained by taking the limiting value of the standard deviation of the binomial distribution as we let p tend to zero. Equation (11.12) may be written in the form

σ² = npq = np(1 − p) = m(1 − p)        (12.9)

We therefore have

σ² = lim(p→0) m(1 − p) = m        (12.10)

Therefore,

σ = √m        (12.11)

We thus see that the standard deviation of the Poisson distribution is √m, the square root of the mean.

13 THE NORMAL OR GAUSSIAN DISTRIBUTION

A most important probability distribution that arises as a limit of the binomial distribution is a continuous distribution known as the normal or gaussian distribution, with the probability function

ϕ(x) = [1/(σ√(2π))]·e^(−(x−μ)²/2σ²)        (13.1)

We shall show that μ is the mean of the distribution and σ is the standard deviation of the normal or gaussian distribution. If the function ϕ(x) is to be a probability density function, it is implied that

∫(−∞→∞) ϕ(x) dx = 1        (13.2)

In order to verify that (13.2) is true, let us compute

A = ∫(−∞→∞) [1/(σ√(2π))]·e^(−(x−μ)²/2σ²) dx        (13.3)

In order to simplify (13.3), we make the substitution

t = (x − μ)/(σ√2)        (13.4)

in the integral (13.3) and find that it is transformed into

A = (1/√π)·∫(−∞→∞) e^(−t²) dt        (13.5)

From a table of definite integrals we obtain the result

∫(−∞→∞) e^(−t²) dt = √π        (13.6)

If we now substitute (13.6) into (13.5), we find that A = 1.

THE MEAN VALUE OF THE NORMAL DISTRIBUTION

We may compute the mean of the normal distribution by the use of Eq. (10.4), which in this case takes the form

E(x) = [1/(σ√(2π))]·∫(−∞→∞) x·e^(−(x−μ)²/2σ²) dx        (13.7)

To simplify (13.7), let us introduce the following change in variable:

t = (x − μ)/(σ√2)        or        x = μ + σ√2·t        (13.8)

With this change in variable, (13.7) becomes

E(x) = (1/√π)·∫(−∞→∞) (μ + σ√2·t)·e^(−t²) dt = (μ/√π)·∫(−∞→∞) e^(−t²) dt + (σ√2/√π)·∫(−∞→∞) t·e^(−t²) dt        (13.9)

From a table of definite integrals, we have the results

∫(−∞→∞) e^(−t²) dt = √π        ∫(−∞→∞) t·e^(−t²) dt = 0        (13.10)

If we substitute the results (13.10) into (13.9), we obtain

E(x) = μ        (13.11)

We have therefore verified that μ is the mean or expected value of the distribution ϕ(x) of (13.1).

THE STANDARD DEVIATION OF THE NORMAL DISTRIBUTION

The variance μ2 = σ² of the normal distribution may be obtained by the use of (10.6), placing a = μ and thus obtaining

μ2 = ∫(−∞→∞) (x − μ)²·ϕ(x) dx        (13.12)

If we now substitute ϕ(x) as given by (13.1) into (13.12) and make use of the change in variable

t = (x − μ)/(σ√2)        (13.13)

we obtain

μ2 = (2σ²/√π)·∫(−∞→∞) t²·e^(−t²) dt        (13.14)

The integral

I = ∫(−∞→∞) t²·e^(−t²) dt        (13.15)

may be integrated by parts to obtain

I = [−(t/2)·e^(−t²)] from t = −∞ to t = ∞ + (1/2)·∫(−∞→∞) e^(−t²) dt = √π/2        (13.16)

Fig. 13.1 (the normal or gaussian distribution curve ϕ(x))

If the above value for I is substituted into (13.14),

μ2 = (2σ²/√π)·(√π/2) = σ²        (13.17)

Therefore σ correctly represents the standard deviation of the normal or gaussian distribution (Fig. 13.1).

The maximum value of ϕ(x) is located at x = μ; ϕ(μ) = 1/(σ√(2π)), which is approximately 0.4/σ. When x = μ ± σ, the value of ϕ(x) is ϕ(μ)·e^(−1/2), slightly less than 0.25/σ. It is easy to show that the points x = μ ± σ are the points of inflection of the normal distribution curve ϕ(x). It is of great interest to know the probability that the deviation y = x − μ from the mean is within certain limits. The probability that y lies between the limits σt1 and σt2 is

P(σt1 < y < σt2) = (1/√(2π))·∫(t1→t2) e^(−t²/2) dt        (13.18)

By placing t1 = −a and t2 = a in (13.18) and using a table of definite integrals, it can be shown that the probability that the deviation y exceeds 2σ in absolute value is less than 0.05, and the probability that y exceeds 3σ is of the order of 0.003. In statistics, an observation which is improbable is said to be significant. For a normally distributed variate, it is usual to regard a single observation with |x − μ| > 2σ as significant and one with |x − μ| > 3σ as highly significant.

The normal or gaussian distribution may be derived as a limiting form of the binomial distribution when n becomes very large and p remains constant.

The normal or gaussian distribution (13.1) is sometimes also known as the normal error law, although some statisticians object that it is not in fact a normal occurrence in nature, because it is valid only as a limiting case for exceptionally large numbers. Sufficiently large numbers are, however, available in problems of applied physics, such as the kinetic theory of gases and fluctuations in the magnitude of an electric current, and so the gaussian distribution seems more real to physicists and engineers than to social and biological scientists. It is interesting to note that the eminent mathematical physicist H. Poincaré, in the preface to his “Thermodynamique” (Paris, 1892), makes the laconic remark, “Everybody firmly believes in the law of errors because mathematicians imagine that it is a fact of observation, and observers that it is a theorem of mathematics.”

The application of the gaussian distribution to the theory of errors is made reasonable by the fact that any process of random sampling tends to produce a gaussian distribution of sample means, even if the whole population from which the samples are drawn does not have a normal distribution. Hence it is usual to assume that the gaussian distribution will be a useful approximation to the distribution of errors of observation, provided that all causes of systematic error can be excluded.

THE AREA UNDER SOME PART OF THE GAUSSIAN DISTRIBUTION FUNCTION

The area under some part of the gaussian distribution function is frequently needed. That is, we must evaluate the integral of ϕ(x) between finite values of x. For example, let it be required to evaluate the integral I where

I = ∫(x1→x2) ϕ(x) dx = [1/(σ√(2π))]·∫(x1→x2) e^(−(x−μ)²/2σ²) dx        (13.19)

In order to simplify the above integral, let

t = (x − μ)/σ        (13.20)

With this change in variable, (13.19) becomes

I = (1/√(2π))·∫(z1→z2) e^(−t²/2) dt        (13.21)

where

z1 = (x1 − μ)/σ        z2 = (x2 − μ)/σ        (13.22)

Now the function F(z) given by

F(z) = (1/√(2π))·∫(0→z) e^(−t²/2) dt        (13.23)

has been extensively tabulated. It is easy to see that the integral I is given by

I = F(z2) − F(z1)        (13.24)

in terms of the function F(z).
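In place of a printed table of F(z), the error function in the Python standard library gives the same areas, since the tabulated function (13.23) satisfies F(z) = ½ erf(z/√2). The sketch below computes the probability that a normal variate lies between x1 and x2; the values of μ and σ are those of the height problem that follows.

    import math

    def F(z):
        # F(z) = (1/sqrt(2*pi)) * integral from 0 to z of exp(-t**2/2) dt, Eq. (13.23)
        return 0.5 * math.erf(z / math.sqrt(2))

    def normal_area(x1, x2, mu, sigma):
        z1 = (x1 - mu) / sigma
        z2 = (x2 - mu) / sigma
        return F(z2) - F(z1)                 # Eq. (13.24)

    mu, sigma = 67.0, 2.5                    # mean and sigma of the height problem below
    print(round(normal_area(mu - sigma, mu + sigma, mu, sigma), 4))          # about 0.6827
    print(round(normal_area(mu - 2 * sigma, mu + 2 * sigma, mu, sigma), 4))  # about 0.9545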

PROBLEM ILLUSTRATING A TYPICAL STATISTICAL DISTRIBUTION

As an interesting typical problem involving a statistical distribution, consider that the heights of 500 men are measured to the nearest images in. The frequencies fr of heights xr (in inches) are given in the following table.

Heights of 500 men

images

Show that the distribution of the heights of the 500 men given in the above table is approximately normal, with mean 67 in. and standard deviation 2.5 in. Assuming this normal distribution of heights, which of the following measurements of heights, considered individually, are significant or highly significant: 63 in., 72 in., 69.5 in., 58.5 in., 73 in., 75 in. ?

14 DISTRIBUTION OF A SUM OF NORMAL VARIATES

If two variates x and y are distributed normally, their sum u = x + y is also distributed normally. To prove this, we shall refer each distribution to its mean and assume that the distribution functions of the deviations υ = x − E(x) and w = y − E(y) are

ϕ1(υ) = [1/(σ1√(2π))]·e^(−υ²/2σ1²)        (14.1)

ϕ2(w) = [1/(σ2√(2π))]·e^(−w²/2σ2²)        (14.2)

The probability that (υ + w) takes a particular value z is found by integrating over the probabilities that υ and w take particular values summing to z. This gives

ϕ(z) = ∫(−∞→∞) ϕ1(υ)·ϕ2(z − υ) dυ = [1/(2πσ1σ2)]·∫(−∞→∞) e^(−υ²/2σ1²)·e^(−(z−υ)²/2σ2²) dυ        (14.4)

If we now let

images

the integral in (14.4) becomes

images

Hence the distribution (14.4) is

ϕ(z) = {1/[√(2π)·√(σ1² + σ2²)]}·e^(−z²/[2(σ1² + σ2²)])        (14.7)

We therefore see that z = υ + w is normally distributed about the mean value zero, with standard deviation given by

σ = √(σ1² + σ2²)        (14.8)

The distribution of u = x + y, therefore, has the expectation E(u) = E(x) + E(y), with a standard deviation given by (14.8).

15 APPLICATIONS TO EXPERIMENTAL MEASUREMENTS

Let x1, x2, . . . , xn be a set of n measurements. The quantities xi, i = 1, 2, . . . , n, may be, for example, n determinations of the velocity of light in a certain medium. If we let z be a typical value of the measurements xi, then di is the deviation of the measurement xi from z, where

di = xi − z        i = 1, 2, . . . , n        (15.1)

Let y represent the sum of the squares of the deviations of the measurements xi from the typical value z, so that

y = Σ(i=1→n) di² = Σ(i=1→n) (xi − z)²        (15.2)

We now wish to choose the typical value z so that the sum of the squares of the deviations y is a minimum. We therefore minimize (15.2) in the usual way by computing the derivative

dy/dz = −2·Σ(i=1→n) (xi − z)        (15.3)

On setting the derivative dy/dz equal to zero and solving for z, we find

z = (1/n)·Σ(i=1→n) xi = x̄        (15.4)

That is, we find that the arithmetic mean of the measurements xi is the typical value that minimizes y. It will be noted that x̄ gives the “best” typical value of the xi quantities in the least-square sense.

The minimum value of y, ymin, is obtained by substituting z = x̄ in (15.2), thus obtaining

ymin = Σ(i=1→n) (xi − x̄)²        (15.5)

If we divide (15.5) by n, we obtain the variance σ² of the set of measurements xi, so that

σ² = (1/n)·Σ(i=1→n) (xi − x̄)²        (15.6)

The square root of (15.6) is the standard deviation σ of the set of measurements xi, so that

σ = [(1/n)·Σ(i=1→n) (xi − x̄)²]^1/2        (15.7)
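Equations (15.4) to (15.7) translate directly into a few lines of Python. The sketch below (the measurements are invented for illustration) computes the mean and standard deviation of a set of measurements and verifies that the mean minimizes the sum of the squared deviations.

    # Invented measurements, for illustration only.
    x = [299795.0, 299798.0, 299791.0, 299801.0, 299796.0, 299799.0]
    n = len(x)

    mean = sum(x) / n                                  # Eq. (15.4)
    y_min = sum((xi - mean) ** 2 for xi in x)          # Eq. (15.5)
    sigma = (y_min / n) ** 0.5                         # Eq. (15.7)

    # Any other typical value z gives a larger sum of squared deviations.
    for z in (mean - 2.0, mean, mean + 2.0):
        y = sum((xi - z) ** 2 for xi in x)
        print(round(z, 3), round(y, 3))

    print("mean =", round(mean, 3), " sigma =", round(sigma, 3))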

16 THE STANDARD DEVIATION OF THE MEAN

We now turn to a very important question that arises in the theory of statistics: “What is the relation between the standard deviation of the individual measurements and the standard deviation of the mean of a set of measurements?”

In order to answer this question, we will assume that we have a parent distribution of errors which follows the gaussian distribution. (The justification for assuming that the distribution is gaussian is that in practice errors are usually found to follow the gaussian law.) We suppose first that we take a sample of N measurements from the parent distribution. We then take another sample, or set of measurements. If we now compute the mean and standard deviation of both sets, these quantities will in general be different. That is, the mean and variance of a sample of N observations are not in general equal to the mean and variance of the parent distribution.

The process to be followed is to take M sets of N measurements, each with its own mean and standard deviation, and then compute the standard deviation of the means. The standard deviation of the means provides an indication of the reliability of any one of the means.

To facilitate the calculation let the following notation be introduced:

            σ = the standard deviation of the individual measurements

            σm = the standard deviation of the means (of the various samples)

            xsi = measurement i in the set s

            x̄s = the mean of the set s

            x̄ = the mean of all the measurements

            dsi = xsi − x̄ = the deviation of the measurement xsi from the overall mean x̄

            Ds = x̄s − x̄ = the deviation of the mean x̄s of the set s from the overall mean x̄

Since we are taking M sets of measurements with N measurements in each set, there will be n = MN measurements in the total. The variance of the individual measurements is given by

σ² = (1/MN)·Σ(s=1→M) Σ(i=1→N) (xsi − x̄)²        (16.1)

The variance of the means is given by

σm² = (1/M)·Σ(s=1→M) Ds²        (16.2)

The deviations Ds of the means can be expressed in terms of the deviations dsi of the individual observations as follows:

Ds = x̄s − x̄ = (1/N)·Σ(i=1→N) dsi        (16.3)

We also have

dsi = xsi − x̄        (16.4)

If we now substitute the square of (16.4) into (16.1), we obtain

σ² = (1/MN)·Σ(s=1→M) Σ(i=1→N) dsi²        (16.5)

The substitution of the square of (16.3) into (16.2) gives the result

σm² = (1/MN²)·Σ(s=1→M) [Σ(i=1→N) dsi]²        (16.6)

After some reductions, it can be seen that (16.6) may be written in the following form:

σm² = (1/MN²)·Σ(s=1→M) Σ(i=1→N) dsi²        (16.7)

If we now compare (16.5) with (16.7), we see that

σm² = σ²/N        (16.8)

or

σm = σ/√N        (16.9)

Therefore the variance of the mean of a set of N measurements is simply the variance of the individual measurements divided by the number of measurements.

It may be mentioned that in order to reduce (16.6) to the form (16.7) it is necessary to assume that the cross-product terms are negligibly small. This is true for the gaussian distribution, and since experimental measurements so often obey the gaussian distribution, formula (16.8) is a very useful one.
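The relation σm = σ/√N is easily demonstrated by simulation: draw M sets of N normally distributed measurements, compute the mean of each set, and compare the spread of those M means with σ/√N. The sketch below does this; the parameters and the random seed are arbitrary.

    import random
    import statistics

    random.seed(3)
    M, N = 2000, 25                  # M sets of N measurements each (arbitrary)
    sigma = 1.5                      # standard deviation of the individual measurements

    set_means = [statistics.fmean(random.gauss(10.0, sigma) for _ in range(N))
                 for _ in range(M)]

    sigma_m = statistics.pstdev(set_means)
    print(round(sigma_m, 4), round(sigma / N ** 0.5, 4))   # the two should nearly agree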

PROBLEMS

    1. What is the probability that the number 1 appears at least once in n throws of a die?

    2. A hunter finds that on the average he kills once in three shots. He fires three times at a duck; on the assumption that his a priori probability of killing on each shot is 1/3, what is the probability that he kills the duck?

    3. Prove the following theorem: If in any continuously varying process a certain characteristic is present to the extent of one in T units, then the probability that the characteristic does not occur in a sample of t units is e–t/T. (Consider the Poisson distribution.)

    4. It is known that 100 liters of water have been polluted with 106 bacteria. If 1 cc of water is drawn off, what is the probability that the sample is not polluted?

    5. An aircraft company carries on the average P passengers M miles for every passenger killed. What is the probability of a passenger completing a journey of m miles in safety ?

    6. Given the result of Prob. 5, estimate an apparently reasonable premium to pay in order that if a passenger is killed in such a flight, his heir would receive $10,000.

    7. If during the weekend road traffic 100 cars per hr pass along a certain road, each taking 1 min to cover it, find the probability that at any given instant no car will be on this road. (NOTE : Evidently no car must have entered the road during the previous minute; however, on the average a car enters every 3,600/100 = 36 sec. It is thus apparent that the required probability is e–60/36.)

    8. In a completed book of 1,000 pages, 500 typographical errors occur. What is the probability that four specimen pages selected for advertisement are free from errors ?

    9. Criticize the following statements:

(a) The sun rises once per day; hence the probability that it will not rise tomorrow is e–1.

(b) The probability that it will rise at least once is 1 – e–1.

10. Let it be supposed that in the manufacture of firecrackers there is a certain defect that causes only three-fourths of them to explode when lighted. What is the probability that if we choose five of these firecrackers, they will all explode ?

11. It has been observed in human reproduction that twins occur approximately once in 100 births. If the number of babies in a birth are assumed to follow a Poisson distribution, calculate the probability of the birth of quintuplets. What is the probability of the birth of octuplets ?

12. A coin is tossed 10,000 times; the results are 5,176 heads and 4,824 tails. Is this a reasonable result for a symmetric coin, or is it fairly conclusive evidence that the coin is defective ?

HINT: Calculate the total probability for more than 5,176 heads in 10,000 tosses. To do this a gaussian distribution may be assumed.

13. If a set of measurements is distributed in accordance with a gaussian distribution, find the probability that any single measurement will fall between m − σ/2 and m + σ/2.

14. The “probable error” of a distribution is defined as the error such that the probability of occurrence of an error whose absolute value is less than this value is 1/2. Determine the probable error for the gaussian distribution and express it as a multiple of σ.

15. An object undergoes a simple harmonic motion with amplitude A and angular frequency ω according to the equation x = A sin ωt, where x represents the displacement of the object from equilibrium. Calculate the mean and standard deviation of the position and of the speed of the object.

16. Among a large number of eggs, 1 percent were found to be rotten. In a dozen eggs, what is the probability that none is rotten ?

17. A certain quantity was measured N times, and the mean and its standard deviation were computed. If it is desired to increase the precision of the result (decrease σ) by a factor of 2, how many additional measurements should be made?

18. A group of underfed chickens were observed for 50 consecutive days and found to lay the following number of eggs:

images

Show that this is approximately a Poisson distribution. Calculate the mean and standard deviation directly from the data. Compare with the standard deviation predicted by the Poisson distribution.

19. On the average, how many times must a die be thrown until one gets a 6 ?

HINT: Use the binomial distribution.

20. When 100 coins are tossed, what is the probability that exactly 50 are heads ?

HINT: Use the binomial distribution.

21. A bread salesman sells on the average 20 cakes on a round of his route. What is the probability that he sells an even number of cakes ?

HINT: Assume the Poisson distribution.

22. How thick should a coin be to have a 1/3 chance of landing on an edge?

HINT: In order to get an approximate solution to this problem, one may imagine the coin inscribed inside a sphere, where the center of the coin is the center of the sphere. The coin may be regarded as a right circular cylinder. A random point on the surface of the sphere is now chosen; if the radius from that point to the center strikes the edge, the coin is said to have fallen on an edge.

23. If a stick is broken into two at random, what is the average length of the smaller piece ?

24. Shuffle an ordinary deck of 52 playing cards containing four aces and then turn up cards from the top until the first ace appears. On the average, how many cards are required to produce the first ace ?

25. If n grains of wheat are scattered in a haphazard manner over a surface of S units of area, show that the probability that A units of area will contain R grains of wheat is

P(R) = [(nA/S)^R·e^(−nA/S)]/R!

HINT: n·dS/S represents the infinitely small probability that the small space dS contains a grain of wheat. If the selected space be A units of area, we may suppose each dS to be a trial; the number of trials will therefore be A/dS. Hence we must substitute An/S for np in the Poisson distribution P = (np)^r·e^(−np)/r!

26. A basketball player succeeds in making a basket three tries out of four. How many times must he try for a basket in order to have probability greater than 0.99 of making at least one basket?

27. An unpopular instructor who grades “on the curve” computes the mean and standard deviation of the grades in his class, and then he assumes a normal or gaussian distribution with this μ and σ. He then sets the borderlines between the grades as follows: C from μ − σ/2 to μ + σ/2, B from μ + σ/2 to μ + 3σ/2, A from μ + 3σ/2 up, and similarly D and F below the mean. Find the percentages of the students receiving the various grades. Where should the borderlines be set to give the following percentages: A and F, 10 percent; B and D, 20 percent; C, 40 percent?

28. A popular lady receives an average of four telephone calls a day. What is the probability that on a given day she will receive no telephone calls ? Just one call ? Exactly four calls ?

29. The Poisson distribution is used mainly for small values of μ = np. For large values of μ, the Poisson distribution (as well as the binomial) is fairly well approximated by the normal or gaussian distribution. Show that

P(x) = (μ^x·e^(−μ))/x! ≈ [1/√(2πμ)]·e^(−(x−μ)²/2μ)        for large μ

The peak of the graph of the Poisson distribution is at x = μ; note that the approximating normal curve is centered at x = μ. The equation given above gives a good approximation in the central region (around x = μ) where the probability is large.

REFERENCES

1939. von Mises, Richard: “Probability, Statistics and Truth,” The Macmillan Company, New York.

1955. Cramer, Harald: “The Elements of Probability Theory,” John Wiley & Sons, Inc., New York.

1957. Feller, William: “An Introduction to Probability Theory and Its Applications,”2d ed., vol. I, John Wiley & Sons, Inc., New York.

1960. Parzen, E.: “Modern Probability Theory and Its Applications,” John Wiley & Sons, Inc., New York.

1963. Levy, H., and Roth, L.: “Elements of Probability,” Oxford University Press, London.

1965. Fry, T. C.: “Probability and Its Engineering Uses,” 2d ed., D. Van Nostrand Company, Inc., New York.

1966. Feller, William: “An Introduction to Probability Theory and Its Applications,” vol. II, John Wiley & Sons, Inc., New York.

See R. Courant, “Differential and Integral Calculus,” 2d ed., vol. I, pp. 361–364, Inter-science Publishers, Inc., New York, 1938.

See Harald Cramer, “The Elements of Probability Theory,” pp. 97–101, John Wiley & Sons, Inc., New York, 1955.
