Example 2.44

Sums of Independent Binomial Random Variables

If $X$ and $Y$ are independent binomial random variables with parameters $(n, p)$ and $(m, p)$, respectively, then what is the distribution of $X + Y$?

Solution: The moment generating function of $X + Y$ is given by

$$\phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t) = (pe^t + 1 - p)^n (pe^t + 1 - p)^m = (pe^t + 1 - p)^{m+n}$$

But $(pe^t + (1-p))^{m+n}$ is just the moment generating function of a binomial random variable having parameters $m + n$ and $p$. Thus, this must be the distribution of $X + Y$. ■
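As a quick numerical sanity check, the following Python sketch computes the distribution of $X + Y$ by direct convolution of the two binomial mass functions and compares it with the $\mathrm{Bin}(m+n, p)$ mass function; the parameter values $n = 8$, $m = 5$, $p = 0.3$ are arbitrary choices for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, m, p = 8, 5, 0.3   # arbitrary illustrative parameters

# P(X + Y = k) by convolving the two pmfs; it should match Bin(m + n, p).
for k in range(n + m + 1):
    conv = sum(binom_pmf(j, n, p) * binom_pmf(k - j, m, p)
               for j in range(max(0, k - m), min(n, k) + 1))
    print(f"k={k:2d}  convolution={conv:.6f}  Bin(m+n,p)={binom_pmf(k, m + n, p):.6f}")
```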

Example 2.45

Sums of Independent Poisson Random Variables

Calculate the distribution of $X + Y$ when $X$ and $Y$ are independent Poisson random variables with means $\lambda_1$ and $\lambda_2$, respectively.

Solution:

$$\phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t) = e^{\lambda_1(e^t - 1)} e^{\lambda_2(e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)}$$

Hence, $X + Y$ is Poisson distributed with mean $\lambda_1 + \lambda_2$, verifying the result given in Example 2.37. ■
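A brief simulation in the same spirit (arbitrary means $\lambda_1 = 2$ and $\lambda_2 = 3.5$, using NumPy and SciPy) compares the empirical distribution of the sum with the Poisson$(\lambda_1 + \lambda_2)$ mass function.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
lam1, lam2 = 2.0, 3.5                      # arbitrary illustrative means
total = rng.poisson(lam1, 100_000) + rng.poisson(lam2, 100_000)

# The empirical frequencies of the sum should match Poisson(lam1 + lam2).
for k in range(12):
    print(f"k={k:2d}  empirical={np.mean(total == k):.4f}  "
          f"Poisson(lam1+lam2)={poisson.pmf(k, lam1 + lam2):.4f}")
```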

Example 2.46

Sums of Independent Normal Random Variables

Show that if $X$ and $Y$ are independent normal random variables with parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, respectively, then $X + Y$ is normal with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.

Solution:

$$\phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t) = \exp\left\{\frac{\sigma_1^2 t^2}{2} + \mu_1 t\right\}\exp\left\{\frac{\sigma_2^2 t^2}{2} + \mu_2 t\right\} = \exp\left\{\frac{(\sigma_1^2 + \sigma_2^2) t^2}{2} + (\mu_1 + \mu_2) t\right\}$$

which is the moment generating function of a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. Hence, the result follows since the moment generating function uniquely determines the distribution. ■
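The following Python sketch (with arbitrary parameter choices) simulates independent normals and checks that the sample mean and variance of $X + Y$ are close to $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu1, var1, mu2, var2 = 1.0, 4.0, -2.0, 9.0   # arbitrary illustrative parameters
x = rng.normal(mu1, np.sqrt(var1), 100_000)
y = rng.normal(mu2, np.sqrt(var2), 100_000)
s = x + y

# The sample moments of X + Y should be near mu1 + mu2 and var1 + var2.
print("sample mean:", s.mean(), "  expected:", mu1 + mu2)
print("sample var: ", s.var(),  "  expected:", var1 + var2)
```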

Example 2.47

The Poisson Paradigm

We showed in Section 2.2.4 that the number of successes that occur in $n$ independent trials, each of which results in a success with probability $p$, is, when $n$ is large and $p$ small, approximately a Poisson random variable with parameter $\lambda = np$. This result, however, can be substantially strengthened. First, it is not necessary that the trials have the same success probability, only that all the success probabilities are small. To see that this is the case, suppose that the trials are independent, with trial $i$ resulting in a success with probability $p_i$, where all the $p_i$, $i = 1, \ldots, n$, are small. Letting $X_i$ equal 1 if trial $i$ is a success, and 0 otherwise, it follows that the number of successes, call it $X$, can be expressed as

$$X = \sum_{i=1}^n X_i$$

Using that $X_i$ is a Bernoulli (or binary) random variable, its moment generating function is

$$E[e^{tX_i}] = p_i e^t + 1 - p_i = 1 + p_i(e^t - 1)$$

Now, using the result that, for $x$ small,

$$e^x \approx 1 + x$$

it follows, because $p_i(e^t - 1)$ is small when $p_i$ is small, that

$$E[e^{tX_i}] = 1 + p_i(e^t - 1) \approx \exp\{p_i(e^t - 1)\}$$

Because the moment generating function of a sum of independent random variables is the product of their moment generating functions, the preceding implies that

$$E[e^{tX}] \approx \prod_{i=1}^n \exp\{p_i(e^t - 1)\} = \exp\left\{\sum_i p_i(e^t - 1)\right\}$$

But the right side of the preceding is the moment generating function of a Poisson random variable with mean $\sum_i p_i$, thus arguing that this is approximately the distribution of $X$.

Not only is it not necessary for the trials to have the same success probability for the number of successes to approximately have a Poisson distribution, they need not even be independent, provided that their dependence is weak. For instance, recall the matching problem (Example 2.31) where $n$ people randomly select hats from a set consisting of one hat from each person. By regarding the random selections of hats as constituting $n$ trials, where we say that trial $i$ is a success if person $i$ chooses his or her own hat, it follows that, with $A_i$ being the event that trial $i$ is a success,

$$P(A_i) = \frac{1}{n} \qquad \text{and} \qquad P(A_i \mid A_j) = \frac{1}{n-1}, \quad j \neq i$$

Hence, whereas the trials are not independent, their dependence appears, for large $n$, to be weak. Because of this weak dependence, and the small trial success probabilities, it would seem that the number of matches should approximately have a Poisson distribution with mean 1 when $n$ is large, and this is shown to be the case in Example 3.23.

The statement that “the number of successes in $n$ trials that are either independent or at most weakly dependent is, when the trial success probabilities are all small, approximately a Poisson random variable” is known as the Poisson paradigm. ■
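To see the paradigm at work numerically, the sketch below simulates $n = 100$ independent trials with small, unequal success probabilities (drawn arbitrarily from the interval $(0, 0.1)$) and compares the distribution of the number of successes with the Poisson distribution having mean $\sum_i p_i$.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
p = rng.uniform(0.0, 0.1, size=100)        # 100 small, unequal success probabilities
lam = p.sum()

# Each row is one realization of the 100 independent trials; X = number of successes.
successes = (rng.random((50_000, p.size)) < p).sum(axis=1)

for k in range(12):
    print(f"k={k:2d}  empirical={np.mean(successes == k):.4f}  "
          f"Poisson(sum p_i)={poisson.pmf(k, lam):.4f}")
```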

Remark

For a nonnegative random variable $X$, it is often convenient to define its Laplace transform $g(t)$, $t \geq 0$, by

$$g(t) = \phi(-t) = E[e^{-tX}]$$

That is, the Laplace transform evaluated at $t$ is just the moment generating function evaluated at $-t$. The advantage of dealing with the Laplace transform, rather than the moment generating function, when the random variable is nonnegative is that if $X \geq 0$ and $t \geq 0$, then

$$0 \leq e^{-tX} \leq 1$$

That is, the Laplace transform is always between 0 and 1. As in the case of moment generating functions, it remains true that nonnegative random variables that have the same Laplace transform must also have the same distribution. ■

It is also possible to define the joint moment generating function of two or more random variables. This is done as follows. For any $n$ random variables $X_1, \ldots, X_n$, the joint moment generating function, $\phi(t_1, \ldots, t_n)$, is defined for all real values of $t_1, \ldots, t_n$ by

$$\phi(t_1, \ldots, t_n) = E\left[e^{(t_1 X_1 + \cdots + t_n X_n)}\right]$$

It can be shown that $\phi(t_1, \ldots, t_n)$ uniquely determines the joint distribution of $X_1, \ldots, X_n$.

Example 2.48

The Multivariate Normal Distribution

Let $Z_1, \ldots, Z_n$ be a set of $n$ independent standard normal random variables. If, for some constants $a_{ij}$, $1 \leq i \leq m$, $1 \leq j \leq n$, and $\mu_i$, $1 \leq i \leq m$,

$$\begin{aligned}
X_1 &= a_{11}Z_1 + \cdots + a_{1n}Z_n + \mu_1,\\
X_2 &= a_{21}Z_1 + \cdots + a_{2n}Z_n + \mu_2,\\
&\;\;\vdots\\
X_i &= a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i,\\
&\;\;\vdots\\
X_m &= a_{m1}Z_1 + \cdots + a_{mn}Z_n + \mu_m
\end{aligned}$$

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution.

It follows from the fact that the sum of independent normal random variables is itself a normal random variable that each $X_i$ is a normal random variable with mean and variance given by

$$E[X_i] = \mu_i, \qquad \mathrm{Var}(X_i) = \sum_{j=1}^n a_{ij}^2$$

Let us now determine

$$\phi(t_1, \ldots, t_m) = E[\exp\{t_1 X_1 + \cdots + t_m X_m\}]$$

the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that since $\sum_{i=1}^m t_i X_i$ is itself a linear combination of the independent normal random variables $Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are, respectively,

$$E\left[\sum_{i=1}^m t_i X_i\right] = \sum_{i=1}^m t_i \mu_i$$

and

$$\mathrm{Var}\left(\sum_{i=1}^m t_i X_i\right) = \mathrm{Cov}\left(\sum_{i=1}^m t_i X_i, \sum_{j=1}^m t_j X_j\right) = \sum_{i=1}^m \sum_{j=1}^m t_i t_j \,\mathrm{Cov}(X_i, X_j)$$

Now, if $Y$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then

$$E[e^Y] = \phi_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}$$

Thus, we see that

$$\phi(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^m t_i \mu_i + \frac{1}{2}\sum_{i=1}^m \sum_{j=1}^m t_i t_j\,\mathrm{Cov}(X_i, X_j)\right\}$$

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from a knowledge of the values of $E[X_i]$ and $\mathrm{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$. ■
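The joint moment generating function above lends itself to a numerical check. The following Python sketch (arbitrary dimensions, coefficient matrix $A = (a_{ij})$, means, and evaluation point $t$) generates $X = AZ + \mu$ and compares a Monte Carlo estimate of $E[\exp\{\sum_i t_i X_i\}]$ with the closed-form expression, using $\mathrm{Cov}(X_i, X_j) = (AA^{\mathsf T})_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4                                 # arbitrary dimensions
A = 0.5 * rng.normal(size=(m, n))           # arbitrary coefficients a_ij
mu = np.array([1.0, -0.5, 2.0])             # arbitrary means mu_i
t = np.array([0.2, -0.1, 0.15])             # point at which the joint mgf is evaluated

Z = rng.normal(size=(500_000, n))           # independent standard normals
X = Z @ A.T + mu                            # X_i = a_i1 Z_1 + ... + a_in Z_n + mu_i

cov = A @ A.T                               # Cov(X_i, X_j) = sum_k a_ik a_jk
formula = np.exp(t @ mu + 0.5 * t @ cov @ t)
estimate = np.mean(np.exp(X @ t))
print("Monte Carlo:", estimate, "  formula:", formula)
```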

2.6.1 The Joint Distribution of the Sample Mean and Sample Variance from a Normal Population

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. The random variable $S^2$ defined by

$$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}$$

is called the sample variance of these data. To compute $E[S^2]$ we use the identity

$$\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2 \tag{2.21}$$

which is proven as follows:

$$\begin{aligned}
\sum_{i=1}^n (X_i - \bar X)^2 &= \sum_{i=1}^n (X_i - \mu + \mu - \bar X)^2\\
&= \sum_{i=1}^n (X_i - \mu)^2 + n(\mu - \bar X)^2 + 2(\mu - \bar X)\sum_{i=1}^n (X_i - \mu)\\
&= \sum_{i=1}^n (X_i - \mu)^2 + n(\mu - \bar X)^2 + 2(\mu - \bar X)(n\bar X - n\mu)\\
&= \sum_{i=1}^n (X_i - \mu)^2 + n(\mu - \bar X)^2 - 2n(\mu - \bar X)^2
\end{aligned}$$

and Identity (2.21) follows.

Using Identity (2.21) gives

$$\begin{aligned}
E[(n-1)S^2] &= \sum_{i=1}^n E[(X_i - \mu)^2] - nE[(\bar X - \mu)^2]\\
&= n\sigma^2 - n\,\mathrm{Var}(\bar X)\\
&= (n-1)\sigma^2 \qquad \text{from Proposition 2.4(b)}
\end{aligned}$$

Thus, we obtain from the preceding that

$$E[S^2] = \sigma^2$$
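A quick unbiasedness check: the sketch below (arbitrary $\mu$, $\sigma$, and sample size) repeatedly draws samples of size $n$, computes $S^2$ with the $n-1$ divisor, and averages; the result should be close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 3.0, 2.0, 10                 # arbitrary parameters and sample size
samples = rng.normal(mu, sigma, size=(200_000, n))

s2 = samples.var(axis=1, ddof=1)            # sample variance with the n-1 divisor
print("average S^2:", s2.mean(), "  sigma^2:", sigma**2)
```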

We will now determine the joint distribution of the sample mean $\bar X = \sum_{i=1}^n X_i / n$ and the sample variance $S^2$ when the $X_i$ have a normal distribution. To begin we need the concept of a chi-squared random variable.

Definition 2.2

If $Z_1, \ldots, Z_n$ are independent standard normal random variables, then the random variable $\sum_{i=1}^n Z_i^2$ is said to be a chi-squared random variable with $n$ degrees of freedom.

We shall now compute the moment generating function of $\sum_{i=1}^n Z_i^2$. To begin, note that

$$\begin{aligned}
E[\exp\{tZ_i^2\}] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\,dx\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\,dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1}\\
&= \sigma\\
&= (1 - 2t)^{-1/2}
\end{aligned}$$

Hence,

$$E\left[\exp\left\{t\sum_{i=1}^n Z_i^2\right\}\right] = \prod_{i=1}^n E[\exp\{tZ_i^2\}] = (1 - 2t)^{-n/2}$$
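The formula $(1 - 2t)^{-n/2}$ can be checked by Monte Carlo for any $t < 1/2$; the following sketch uses the arbitrary values $n = 5$ and $t = 0.2$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, t = 5, 0.2                               # t must satisfy t < 1/2 for the mgf to exist
chi2 = (rng.normal(size=(500_000, n)) ** 2).sum(axis=1)

print("Monte Carlo E[exp(t * sum Z_i^2)]:", np.mean(np.exp(t * chi2)))
print("(1 - 2t)^(-n/2):                  ", (1 - 2 * t) ** (-n / 2))
```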

Now, let $X_1, \ldots, X_n$ be independent normal random variables, each with mean $\mu$ and variance $\sigma^2$, and let $\bar X = \sum_{i=1}^n X_i / n$ and $S^2$ denote their sample mean and sample variance. Since the sum of independent normal random variables is also a normal random variable, it follows that $\bar X$ is a normal random variable with expected value $\mu$ and variance $\sigma^2/n$. In addition, from Proposition 2.4,

$$\mathrm{Cov}(\bar X, X_i - \bar X) = 0, \qquad i = 1, \ldots, n \tag{2.22}$$

Also, since $\bar X, X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X$ are all linear combinations of the independent standard normal random variables $(X_i - \mu)/\sigma$, $i = 1, \ldots, n$, it follows that the random variables $\bar X, X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X$ have a joint distribution that is multivariate normal. However, if we let $Y$ be a normal random variable with mean $\mu$ and variance $\sigma^2/n$ that is independent of $X_1, \ldots, X_n$, then the random variables $Y, X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X$ also have a multivariate normal distribution, and by Equation (2.22), they have the same expected values and covariances as the random variables $\bar X, X_i - \bar X$, $i = 1, \ldots, n$. Thus, since a multivariate normal distribution is completely determined by its expected values and covariances, we can conclude that the random vectors $(Y, X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X)$ and $(\bar X, X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X)$ have the same joint distribution, thus showing that $\bar X$ is independent of the sequence of deviations $X_i - \bar X$, $i = 1, \ldots, n$.

Since $\bar X$ is independent of the sequence of deviations $X_i - \bar X$, $i = 1, \ldots, n$, it follows that it is also independent of the sample variance

$$S^2 \equiv \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}$$

To determine the distribution of $S^2$, use Identity (2.21) to obtain

$$(n-1)S^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2$$

Dividing both sides of this equation by $\sigma^2$ yields

$$\frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar X - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} \tag{2.23}$$

Now, $\sum_{i=1}^n (X_i - \mu)^2/\sigma^2$ is the sum of the squares of $n$ independent standard normal random variables, and so is a chi-squared random variable with $n$ degrees of freedom; it thus has moment generating function $(1 - 2t)^{-n/2}$. Also, $[(\bar X - \mu)/(\sigma/\sqrt{n})]^2$ is the square of a standard normal random variable and so is a chi-squared random variable with one degree of freedom; it thus has moment generating function $(1 - 2t)^{-1/2}$. In addition, we have previously seen that the two random variables on the left side of Equation (2.23) are independent. Therefore, because the moment generating function of the sum of independent random variables is equal to the product of their individual moment generating functions, we obtain that

$$E\left[e^{t(n-1)S^2/\sigma^2}\right](1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}$$

or

$$E\left[e^{t(n-1)S^2/\sigma^2}\right] = (1 - 2t)^{-(n-1)/2}$$

But because $(1 - 2t)^{-(n-1)/2}$ is the moment generating function of a chi-squared random variable with $n-1$ degrees of freedom, we can conclude, since the moment generating function uniquely determines the distribution of the random variable, that this is the distribution of $(n-1)S^2/\sigma^2$.

Summing up, we have shown the following.

Proposition 2.5

If $X_1, \ldots, X_n$ are independent and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar X$ and the sample variance $S^2$ are independent. $\bar X$ is a normal random variable with mean $\mu$ and variance $\sigma^2/n$; $(n-1)S^2/\sigma^2$ is a chi-squared random variable with $n-1$ degrees of freedom.
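Proposition 2.5 lends itself to a direct simulation check. The sketch below (arbitrary $\mu$, $\sigma$, and $n$) verifies that $\bar X$ and $S^2$ are uncorrelated across many samples, as independence implies, and that $(n-1)S^2/\sigma^2$ has the distribution function of a chi-squared random variable with $n-1$ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
mu, sigma, n = 1.0, 2.0, 6                  # arbitrary parameters and sample size
data = rng.normal(mu, sigma, size=(200_000, n))

xbar = data.mean(axis=1)
s2 = data.var(axis=1, ddof=1)

# Independence implies zero correlation between the sample mean and sample variance.
print("corr(Xbar, S^2):", np.corrcoef(xbar, s2)[0, 1])

# (n-1)S^2/sigma^2 should follow a chi-squared distribution with n-1 degrees of freedom.
w = (n - 1) * s2 / sigma**2
print("P(W <= 5): empirical", np.mean(w <= 5), "  chi2 cdf", chi2.cdf(5, n - 1))
```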

2.7 The Distribution of the Number of Events that Occur

Consider arbitrary events $A_1, \ldots, A_n$, and let $X$ denote the number of these events that occur. We will determine the probability mass function of $X$. To begin, for $1 \leq k \leq n$, let

$$S_k = \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k})$$

equal the sum of the probabilities of all the $\binom{n}{k}$ intersections of $k$ distinct events, and note that the inclusion-exclusion identity states that

$$P(X > 0) = P\left(\bigcup_{i=1}^n A_i\right) = S_1 - S_2 + S_3 - \cdots + (-1)^{n+1} S_n$$

Now, fix $k$ of the $n$ events, say $A_{i_1}, \ldots, A_{i_k}$, and let

$$A = \bigcap_{j=1}^k A_{i_j}$$

be the event that all $k$ of these events occur. Also, let

$$B = \bigcap_{j \notin \{i_1, \ldots, i_k\}} A_j^c$$

be the event that none of the other $n - k$ events occur. Consequently, $AB$ is the event that $A_{i_1}, \ldots, A_{i_k}$ are the only events to occur. Because

$$A = AB \cup AB^c$$

we have

$$P(A) = P(AB) + P(AB^c)$$

or, equivalently,

$$P(AB) = P(A) - P(AB^c)$$

Because $B^c$ occurs if at least one of the events $A_j$, $j \notin \{i_1, \ldots, i_k\}$, occurs, we see that

$$B^c = \bigcup_{j \notin \{i_1, \ldots, i_k\}} A_j$$

Thus,

$$P(AB^c) = P\left(A \bigcup_{j \notin \{i_1, \ldots, i_k\}} A_j\right) = P\left(\bigcup_{j \notin \{i_1, \ldots, i_k\}} A A_j\right)$$

Applying the inclusion-exclusion identity gives

$$\begin{aligned}
P(AB^c) &= \sum_{j \notin \{i_1, \ldots, i_k\}} P(AA_j) - \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2})\\
&\quad + \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(AA_{j_1}A_{j_2}A_{j_3}) - \cdots
\end{aligned}$$

Using that $A = \bigcap_{j=1}^k A_{i_j}$, the preceding shows that the probability that the $k$ events $A_{i_1}, \ldots, A_{i_k}$ are the only events to occur is

$$\begin{aligned}
P(A) - P(AB^c) &= P(A_{i_1} \cdots A_{i_k}) - \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j)\\
&\quad + \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2})\\
&\quad - \sum_{j_1 < j_2 < j_3 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2} A_{j_3}) + \cdots
\end{aligned}$$

Summing the preceding over all sets of $k$ distinct indices yields

$$\begin{aligned}
P(X = k) &= \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) - \sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j)\\
&\quad + \sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2}) - \cdots
\end{aligned} \tag{2.24}$$

First, note that

$$\sum_{i_1 < \cdots < i_k} P(A_{i_1} \cdots A_{i_k}) = S_k$$

Now, consider

$$\sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j)$$

The probability of every intersection of $k+1$ distinct events $A_{m_1}, \ldots, A_{m_{k+1}}$ will appear $\binom{k+1}{k}$ times in this multiple summation. This is so because each choice of $k$ of its indices to play the role of $i_1, \ldots, i_k$ and the other to play the role of $j$ results in the addition of the term $P(A_{m_1} \cdots A_{m_{k+1}})$. Hence,

$$\sum_{i_1 < \cdots < i_k} \sum_{j \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_j) = \binom{k+1}{k} \sum_{m_1 < \cdots < m_{k+1}} P(A_{m_1} \cdots A_{m_{k+1}}) = \binom{k+1}{k} S_{k+1}$$

Similarly, because the probability of every intersection of $k+2$ distinct events $A_{m_1}, \ldots, A_{m_{k+2}}$ will appear $\binom{k+2}{k}$ times in $\sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2})$, it follows that

$$\sum_{i_1 < \cdots < i_k} \sum_{j_1 < j_2 \notin \{i_1, \ldots, i_k\}} P(A_{i_1} \cdots A_{i_k} A_{j_1} A_{j_2}) = \binom{k+2}{k} S_{k+2}$$

Repeating this argument for the rest of the multiple summations in (2.24) yields the result

$$P(X = k) = S_k - \binom{k+1}{k} S_{k+1} + \binom{k+2}{k} S_{k+2} - \cdots + (-1)^{n-k} \binom{n}{k} S_n$$

The preceding can be written as

$$P(X = k) = \sum_{j=k}^n (-1)^{k+j} \binom{j}{k} S_j$$

Using this we will now prove that

$$P(X \geq k) = \sum_{j=k}^n (-1)^{k+j} \binom{j-1}{k-1} S_j$$

The proof uses a backwards mathematical induction that starts with $k = n$. Now, when $k = n$ the preceding identity states that

$$P(X \geq n) = S_n$$

which is true, since $P(X \geq n) = P(X = n) = S_n$ by the formula for $P(X = k)$ with $k = n$. So assume that

$$P(X \geq k+1) = \sum_{j=k+1}^n (-1)^{k+1+j} \binom{j-1}{k} S_j$$

But then

$$\begin{aligned}
P(X \geq k) &= P(X = k) + P(X \geq k+1)\\
&= \sum_{j=k}^n (-1)^{k+j} \binom{j}{k} S_j + \sum_{j=k+1}^n (-1)^{k+1+j} \binom{j-1}{k} S_j\\
&= S_k + \sum_{j=k+1}^n (-1)^{k+j} \left[\binom{j}{k} - \binom{j-1}{k}\right] S_j\\
&= S_k + \sum_{j=k+1}^n (-1)^{k+j} \binom{j-1}{k-1} S_j\\
&= \sum_{j=k}^n (-1)^{k+j} \binom{j-1}{k-1} S_j
\end{aligned}$$

which completes the proof.
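Both identities are exact and easy to check by brute force on a small finite probability space. The following Python sketch builds $n = 5$ arbitrary events as random subsets of a 30-point uniform sample space, computes the $S_k$ by enumeration, and compares the formulas for $P(X = k)$ and $P(X \geq k)$ with direct counting.

```python
from itertools import combinations
from math import comb
import random

random.seed(0)
N, n = 30, 5                                    # sample-space size and number of events
omega = range(N)
events = [set(random.sample(omega, random.randint(5, 20))) for _ in range(n)]

def prob(s):                                    # uniform probability on omega
    return len(s) / N

# S[k] = sum over all k-subsets of events of the probability of their intersection
S = [None] + [sum(prob(set.intersection(*(events[i] for i in idx)))
                  for idx in combinations(range(n), k))
              for k in range(1, n + 1)]

# X(w) = number of events that contain the outcome w
counts = [sum(w in e for e in events) for w in omega]

for k in range(1, n + 1):
    direct_eq = sum(c == k for c in counts) / N
    formula_eq = sum((-1) ** (k + j) * comb(j, k) * S[j] for j in range(k, n + 1))
    direct_ge = sum(c >= k for c in counts) / N
    formula_ge = sum((-1) ** (k + j) * comb(j - 1, k - 1) * S[j] for j in range(k, n + 1))
    print(f"k={k}: P(X=k) {direct_eq:.4f} vs {formula_eq:.4f}   "
          f"P(X>=k) {direct_ge:.4f} vs {formula_ge:.4f}")
```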

2.8 Limit Theorems

We start this section by proving a result known as Markov’s inequality.

Proposition 2.6

Markov’s Inequality

If $X$ is a random variable that takes only nonnegative values, then for any value $a > 0$,

$$P\{X \geq a\} \leq \frac{E[X]}{a}$$

Proof

We give a proof for the case where $X$ is continuous with density $f$.

$$\begin{aligned}
E[X] &= \int_0^\infty x f(x)\,dx\\
&= \int_0^a x f(x)\,dx + \int_a^\infty x f(x)\,dx\\
&\geq \int_a^\infty x f(x)\,dx\\
&\geq \int_a^\infty a f(x)\,dx\\
&= a \int_a^\infty f(x)\,dx\\
&= a P\{X \geq a\}
\end{aligned}$$

and the result is proven.

As a corollary, we obtain the following.

Proposition 2.7

Chebyshev’s Inequality

If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then, for any value $k > 0$,

$$P\{|X - \mu| \geq k\} \leq \frac{\sigma^2}{k^2}$$

Proof

Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s inequality (with $a = k^2$) to obtain

$$P\{(X - \mu)^2 \geq k^2\} \leq \frac{E[(X - \mu)^2]}{k^2}$$

But since $(X - \mu)^2 \geq k^2$ if and only if $|X - \mu| \geq k$, the preceding is equivalent to

$$P\{|X - \mu| \geq k\} \leq \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

and the proof is complete.

The importance of Markov’s and Chebyshev’s inequalities is that they enable us to derive bounds on probabilities when only the mean, or both the mean and the variance, of the probability distribution are known. Of course, if the actual distribution were known, then the desired probabilities could be exactly computed, and we would not need to resort to bounds.
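To get a feel for how conservative these bounds can be, the sketch below compares them with exact tail probabilities for an exponential random variable with rate 1 (mean 1, variance 1); the distribution and the values of $a$ and $k$ are arbitrary choices for illustration.

```python
from scipy.stats import expon

mean, var = 1.0, 1.0                        # Exponential(rate 1): mean 1, variance 1

# Markov bound on P(X >= a) versus the exact tail probability
for a in (2, 4, 8):
    print(f"P(X >= {a}): Markov bound {mean / a:.3f}, exact {expon.sf(a):.3f}")

# Chebyshev bound on P(|X - mean| >= k) versus the exact probability
for k in (1, 2, 3):
    exact = expon.sf(mean + k) + expon.cdf(max(mean - k, 0.0))
    print(f"P(|X - mu| >= {k}): Chebyshev bound {var / k**2:.3f}, exact {exact:.3f}")
```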

Example 2.49

Suppose that we know that the number of items produced in a factory during a week is a random variable with mean 500.

(a) What can be said about the probability that this week’s production will exceed 1000?

(b) If the variance of a week’s production is known to equal 100, then what can be said about the probability that this week’s production will be between 400 and 600?

Solution: Let $X$ be the number of items that will be produced in a week.

(a) By Markov’s inequality,

$$P\{X \geq 1000\} \leq \frac{E[X]}{1000} = \frac{500}{1000} = \frac{1}{2}$$

(b) By Chebyshev’s inequality,

$$P\{|X - 500| \geq 100\} \leq \frac{\sigma^2}{(100)^2} = \frac{1}{100}$$

Hence,

$$P\{|X - 500| < 100\} \geq 1 - \frac{1}{100} = \frac{99}{100}$$

and so the probability that this week’s production will be between 400 and 600 is at least 0.99. ■

The following theorem, known as the strong law of large numbers, is probably the most well-known result in probability theory. It states that the average of a sequence of independent random variables having the same distribution will, with probability 1, converge to the mean of that distribution.

Theorem 2.1

Strong Law of Large Numbers

Let $X_1, X_2, \ldots$ be a sequence of independent random variables having a common distribution, and let $E[X_i] = \mu$. Then, with probability 1,

$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \qquad \text{as } n \to \infty$$
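The convergence is easy to visualize numerically. The following Python sketch draws i.i.d. Uniform(0, 1) random variables (an arbitrary choice, with $\mu = 0.5$) and prints the running average at several values of $n$; by the strong law it should settle near 0.5.

```python
import numpy as np

rng = np.random.default_rng(8)
mu = 0.5                                    # mean of a Uniform(0, 1) random variable
x = rng.random(1_000_000)                   # i.i.d. Uniform(0, 1) draws

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n={n:>9}: running average = {running_mean[n - 1]:.5f}   (mu = {mu})")
```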