
Conditional Probability and Conditional Expectation

Abstract

One of the most useful concepts in probability theory is that of conditional probability and conditional expectation. The reason is twofold. First, in practice, we are often interested in calculating probabilities and expectations when some partial information is available; hence, the desired probabilities and expectations are conditional ones. Secondly, in calculating a desired probability or expectation it is often extremely useful to first “condition” on some appropriate random variable.

Keywords

Conditional Probability; Conditional Expectation; Computing Expectations by Conditioning; Computing Probabilities by Conditioning; Conditional Variance Formula

3.1 Introduction

One of the most useful concepts in probability theory is that of conditional probability and conditional expectation. The reason is twofold. First, in practice, we are often interested in calculating probabilities and expectations when some partial information is available; hence, the desired probabilities and expectations are conditional ones. Secondly, in calculating a desired probability or expectation it is often extremely useful to first “condition” on some appropriate random variable.

3.2 The Discrete Case

Recall that for any two events $E$ and $F$, the conditional probability of $E$ given $F$ is defined, as long as $P(F)>0$, by

$$P(E\mid F)=\frac{P(EF)}{P(F)}$$

Hence, if $X$ and $Y$ are discrete random variables, then it is natural to define the conditional probability mass function of $X$ given that $Y=y$, by

$$p_{X\mid Y}(x\mid y)=P\{X=x\mid Y=y\}=\frac{P\{X=x,\,Y=y\}}{P\{Y=y\}}=\frac{p(x,y)}{p_Y(y)}$$

for all values of $y$ such that $P\{Y=y\}>0$. Similarly, the conditional probability distribution function of $X$ given that $Y=y$ is defined, for all $y$ such that $P\{Y=y\}>0$, by

$$F_{X\mid Y}(x\mid y)=P\{X\le x\mid Y=y\}=\sum_{a\le x}p_{X\mid Y}(a\mid y)$$

Finally, the conditional expectation of $X$ given that $Y=y$ is defined by

$$E[X\mid Y=y]=\sum_x xP\{X=x\mid Y=y\}=\sum_x x\,p_{X\mid Y}(x\mid y)$$

In other words, the definitions are exactly as before with the exception that everything is now conditional on the event that $Y=y$. If $X$ is independent of $Y$, then the conditional mass function, distribution, and expectation are the same as the unconditional ones. This follows, since if $X$ is independent of $Y$, then

$$p_{X\mid Y}(x\mid y)=P\{X=x\mid Y=y\}=P\{X=x\}$$
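These definitions translate directly into computation: given a table of joint probabilities, each conditional quantity is obtained by dividing out the marginal of $Y$. The following sketch uses a small hypothetical joint pmf, chosen only for illustration, to compute $p_{X\mid Y}(x\mid y)$, $F_{X\mid Y}(x\mid y)$, and $E[X\mid Y=y]$.

```python
# Sketch: conditional pmf, distribution function, and expectation for a
# discrete pair (X, Y).  The joint pmf below is hypothetical, chosen only
# to illustrate the formulas of this section.
joint = {  # joint[(x, y)] = P{X = x, Y = y}
    (0, 0): 0.10, (1, 0): 0.20, (2, 0): 0.10,
    (0, 1): 0.05, (1, 1): 0.25, (2, 1): 0.30,
}

def p_Y(y):
    """Marginal pmf of Y: sum the joint pmf over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def p_X_given_Y(x, y):
    """p_{X|Y}(x|y) = p(x, y) / p_Y(y), defined when p_Y(y) > 0."""
    return joint.get((x, y), 0.0) / p_Y(y)

def F_X_given_Y(x, y):
    """F_{X|Y}(x|y) = sum over a <= x of p_{X|Y}(a|y)."""
    return sum(p_X_given_Y(a, y) for (a, yy) in joint if yy == y and a <= x)

def E_X_given_Y(y):
    """E[X | Y = y] = sum over x of x * p_{X|Y}(x|y)."""
    return sum(x * p_X_given_Y(x, y) for (x, yy) in joint if yy == y)

print(E_X_given_Y(1))   # (0*0.05 + 1*0.25 + 2*0.30) / 0.60 = 1.4166...
```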

Example 3.2

If $X_1$ and $X_2$ are independent binomial random variables with respective parameters $(n_1,p)$ and $(n_2,p)$, calculate the conditional probability mass function of $X_1$ given that $X_1+X_2=m$.

Solution: With $q=1-p$,

$$\begin{aligned}P\{X_1=k\mid X_1+X_2=m\}&=\frac{P\{X_1=k,\,X_1+X_2=m\}}{P\{X_1+X_2=m\}}\\&=\frac{P\{X_1=k,\,X_2=m-k\}}{P\{X_1+X_2=m\}}\\&=\frac{P\{X_1=k\}P\{X_2=m-k\}}{P\{X_1+X_2=m\}}\\&=\frac{\binom{n_1}{k}p^kq^{n_1-k}\binom{n_2}{m-k}p^{m-k}q^{n_2-m+k}}{\binom{n_1+n_2}{m}p^mq^{n_1+n_2-m}}\end{aligned}$$

where we have used that $X_1+X_2$ is a binomial random variable with parameters $(n_1+n_2,p)$ (see Example 2.44). Thus, the conditional probability mass function of $X_1$, given that $X_1+X_2=m$, is

$$P\{X_1=k\mid X_1+X_2=m\}=\frac{\binom{n_1}{k}\binom{n_2}{m-k}}{\binom{n_1+n_2}{m}} \tag{3.1}$$

The distribution given by Equation (3.1), first seen in Example 2.35, is known as the hypergeometric distribution. It is the distribution of the number of blue balls that are chosen when a sample of $m$ balls is randomly chosen from an urn that contains $n_1$ blue and $n_2$ red balls. (To intuitively see why the conditional distribution is hypergeometric, consider $n_1+n_2$ independent trials that each result in a success with probability $p$; let $X_1$ represent the number of successes in the first $n_1$ trials and let $X_2$ represent the number of successes in the final $n_2$ trials. Because all trials have the same probability of being a success, each of the $\binom{n_1+n_2}{m}$ subsets of $m$ trials is equally likely to be the success trials; thus, the number of the $m$ success trials that are among the first $n_1$ trials is a hypergeometric random variable.) ■
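A quick numerical check of Equation (3.1): the ratio of binomial probabilities on the left reduces to the hypergeometric probability on the right, and in particular does not depend on $p$. The parameter values in the sketch below are arbitrary.

```python
# Sketch: numerically check Equation (3.1) -- the conditional pmf of X1
# given X1 + X2 = m is hypergeometric.  Parameters below are arbitrary.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n1, n2, p, m = 5, 7, 0.3, 4
denom = binom_pmf(m, n1 + n2, p)          # P{X1 + X2 = m}
for k in range(m + 1):
    conditional = binom_pmf(k, n1, p) * binom_pmf(m - k, n2, p) / denom
    hypergeom = comb(n1, k) * comb(n2, m - k) / comb(n1 + n2, m)
    print(k, round(conditional, 6), round(hypergeom, 6))  # agree; p cancels
```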

Example 3.3

If $X$ and $Y$ are independent Poisson random variables with respective means $\lambda_1$ and $\lambda_2$, calculate the conditional expected value of $X$ given that $X+Y=n$.

Solution: Let us first calculate the conditional probability mass function of $X$ given that $X+Y=n$. We obtain

$$P\{X=k\mid X+Y=n\}=\frac{P\{X=k,\,X+Y=n\}}{P\{X+Y=n\}}=\frac{P\{X=k,\,Y=n-k\}}{P\{X+Y=n\}}=\frac{P\{X=k\}P\{Y=n-k\}}{P\{X+Y=n\}}$$

where the last equality follows from the assumed independence of $X$ and $Y$. Recalling (see Example 2.37) that $X+Y$ has a Poisson distribution with mean $\lambda_1+\lambda_2$, the preceding equation equals

$$\begin{aligned}P\{X=k\mid X+Y=n\}&=\frac{e^{-\lambda_1}\lambda_1^k}{k!}\,\frac{e^{-\lambda_2}\lambda_2^{n-k}}{(n-k)!}\left[\frac{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^n}{n!}\right]^{-1}\\&=\frac{n!}{(n-k)!\,k!}\,\frac{\lambda_1^k\lambda_2^{n-k}}{(\lambda_1+\lambda_2)^n}\\&=\binom{n}{k}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-k}\end{aligned}$$

In other words, the conditional distribution of $X$ given that $X+Y=n$ is the binomial distribution with parameters $n$ and $\lambda_1/(\lambda_1+\lambda_2)$. Hence,

$$E[X\mid X+Y=n]=n\frac{\lambda_1}{\lambda_1+\lambda_2}$$
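The conclusion can also be supported by simulation: keep only those sampled pairs with $X+Y=n$ and average the retained values of $X$. The sampler and parameter values below are arbitrary choices for illustration.

```python
# Sketch: Monte Carlo check that E[X | X + Y = n] equals
# n * lam1 / (lam1 + lam2) for independent Poissons.
import math, random

def poisson(lam, rng):
    """Sample Poisson(lam) by inverting its cdf (fine for small lam)."""
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

rng = random.Random(0)
lam1, lam2, n = 2.0, 3.0, 6
total, count = 0, 0
for _ in range(500_000):
    x, y = poisson(lam1, rng), poisson(lam2, rng)
    if x + y == n:               # keep only samples with X + Y = n
        total += x
        count += 1
print(total / count)             # close to n * lam1 / (lam1 + lam2) = 2.4
```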

Conditional expectations possess all of the properties of ordinary expectations. For example, such identities as

$$E\left[\sum_{i=1}^n X_i\,\Big|\,Y=y\right]=\sum_{i=1}^n E[X_i\mid Y=y],\qquad E[h(X)\mid Y=y]=\sum_x h(x)P(X=x\mid Y=y)$$

remain valid.

Example 3.4

There are $n$ components. On a rainy day, component $i$ will function with probability $p_i$; on a nonrainy day, component $i$ will function with probability $q_i$, for $i=1,\ldots,n$. It will rain tomorrow with probability $\alpha$. Calculate the conditional expected number of components that function tomorrow, given that it rains.
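A sketch of the solution, using the linearity property just noted: writing the number of functioning components as a sum of indicators $X_i$, where $X_i$ equals 1 if component $i$ functions tomorrow and 0 otherwise,

$$E\left[\sum_{i=1}^n X_i\,\Big|\,\text{rain}\right]=\sum_{i=1}^n E[X_i\mid \text{rain}]=\sum_{i=1}^n p_i$$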

3.3 The Continuous Case

If $X$ and $Y$ have a joint probability density function $f(x,y)$, then the conditional probability density function of $X$, given that $Y=y$, is defined for all values of $y$ such that $f_Y(y)>0$, by

$$f_{X\mid Y}(x\mid y)=\frac{f(x,y)}{f_Y(y)}$$

To motivate this definition, multiply the left side by $dx$ and the right side by $(dx\,dy)/dy$ to get

$$\begin{aligned}f_{X\mid Y}(x\mid y)\,dx&=\frac{f(x,y)\,dx\,dy}{f_Y(y)\,dy}\\&\approx\frac{P\{x\le X\le x+dx,\,y\le Y\le y+dy\}}{P\{y\le Y\le y+dy\}}\\&=P\{x\le X\le x+dx\mid y\le Y\le y+dy\}\end{aligned}$$

In other words, for small values $dx$ and $dy$, $f_{X\mid Y}(x\mid y)\,dx$ is approximately the conditional probability that $X$ is between $x$ and $x+dx$ given that $Y$ is between $y$ and $y+dy$.

The conditional expectation of $X$, given that $Y=y$, is defined for all values of $y$ such that $f_Y(y)>0$, by

$$E[X\mid Y=y]=\int_{-\infty}^{\infty}x\,f_{X\mid Y}(x\mid y)\,dx$$

Example 3.7

The joint density of $X$ and $Y$ is given by

$$f(x,y)=\begin{cases}\frac{1}{2}ye^{-xy},&0<x<\infty,\ 0<y<2\\[2pt]0,&\text{otherwise}\end{cases}$$

What is $E[e^{X/2}\mid Y=1]$?

Solution: The conditional density of $X$, given that $Y=1$, is given by

$$f_{X\mid Y}(x\mid 1)=\frac{f(x,1)}{f_Y(1)}=\frac{\frac{1}{2}e^{-x}}{\int_0^{\infty}\frac{1}{2}e^{-x}\,dx}=e^{-x}$$

Hence, by Proposition 2.1,

$$E\left[e^{X/2}\mid Y=1\right]=\int_0^{\infty}e^{x/2}f_{X\mid Y}(x\mid 1)\,dx=\int_0^{\infty}e^{x/2}e^{-x}\,dx=2$$
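A minimal numerical check of Example 3.7, using simple Riemann sums rather than symbolic integration; the truncation point and step size are arbitrary choices.

```python
# Sketch: numerically confirm E[e^{X/2} | Y = 1] = 2 for the density in
# Example 3.7, using Riemann sums (no external libraries assumed).
import math

def f(x, y):
    """Joint density (1/2) y e^{-xy} on 0 < x < inf, 0 < y < 2."""
    return 0.5 * y * math.exp(-x * y) if (x > 0 and 0 < y < 2) else 0.0

dx = 1e-4
xs = [i * dx for i in range(1, 200_000)]          # truncate x at 20
f_Y1 = sum(f(x, 1.0) for x in xs) * dx            # marginal f_Y(1)
cond_exp = sum(math.exp(x / 2) * f(x, 1.0) for x in xs) * dx / f_Y1
print(cond_exp)                                    # approximately 2
```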

3.4 Computing Expectations by Conditioning

Let us denote by $E[X\mid Y]$ that function of the random variable $Y$ whose value at $Y=y$ is $E[X\mid Y=y]$. Note that $E[X\mid Y]$ is itself a random variable. An extremely important property of conditional expectation is that for all random variables $X$ and $Y$

$$E[X]=E\big[E[X\mid Y]\big] \tag{3.2}$$

If $Y$ is a discrete random variable, then Equation (3.2) states that

$$E[X]=\sum_y E[X\mid Y=y]P\{Y=y\} \tag{3.2a}$$

while if $Y$ is continuous with density $f_Y(y)$, then Equation (3.2) says that

$$E[X]=\int_{-\infty}^{\infty}E[X\mid Y=y]f_Y(y)\,dy \tag{3.2b}$$

We now give a proof of Equation (3.2) in the case where $X$ and $Y$ are both discrete random variables.

Proof of Equation (3.2) When X and Y Are Discrete. We must show that

$$E[X]=\sum_y E[X\mid Y=y]P\{Y=y\} \tag{3.3}$$

Now, the right side of the preceding can be written

$$\begin{aligned}\sum_y E[X\mid Y=y]P\{Y=y\}&=\sum_y\sum_x xP\{X=x\mid Y=y\}P\{Y=y\}\\&=\sum_y\sum_x x\frac{P\{X=x,\,Y=y\}}{P\{Y=y\}}P\{Y=y\}\\&=\sum_y\sum_x xP\{X=x,\,Y=y\}\\&=\sum_x x\sum_y P\{X=x,\,Y=y\}\\&=\sum_x xP\{X=x\}\\&=E[X]\end{aligned}$$

and the result is obtained. ■

One way to understand Equation (3.3) is to interpret it as follows. It states that to calculate $E[X]$ we may take a weighted average of the conditional expected value of $X$ given that $Y=y$, each of the terms $E[X\mid Y=y]$ being weighted by the probability of the event on which it is conditioned.
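This weighted-average interpretation is easy to verify numerically. The sketch below checks Equation (3.2a) on a small, hypothetical joint pmf (the probabilities are arbitrary).

```python
# Sketch: check Equation (3.2a) on a small, hypothetical joint pmf.
joint = {  # joint[(x, y)] = P{X = x, Y = y}; values chosen arbitrarily
    (1, 0): 0.15, (2, 0): 0.25, (1, 1): 0.35, (3, 1): 0.25,
}
ys = {y for _, y in joint}

def p_Y(y):
    return sum(p for (x, yy), p in joint.items() if yy == y)

def E_X_given_Y(y):
    return sum(x * p / p_Y(y) for (x, yy), p in joint.items() if yy == y)

lhs = sum(x * p for (x, _), p in joint.items())            # E[X] directly
rhs = sum(E_X_given_Y(y) * p_Y(y) for y in ys)             # E[E[X|Y]]
print(lhs, rhs)   # both equal 1.75
```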

The following examples will indicate the usefulness of Equation (3.2).

Example 3.9

Sam will read either one chapter of his probability book or one chapter of his history book. If the number of misprints in a chapter of his probability book is Poisson distributed with mean 2 and if the number of misprints in his history chapter is Poisson distributed with mean 5, then assuming Sam is equally likely to choose either book, what is the expected number of misprints that Sam will come across?

Solution: Letting $X$ denote the number of misprints and letting

$$Y=\begin{cases}1,&\text{if Sam chooses his history book}\\2,&\text{if Sam chooses his probability book}\end{cases}$$

then

$$E[X]=E[X\mid Y=1]P\{Y=1\}+E[X\mid Y=2]P\{Y=2\}=5\left(\tfrac{1}{2}\right)+2\left(\tfrac{1}{2}\right)=\tfrac{7}{2}$$

The random variable $\sum_{i=1}^N X_i$, equal to the sum of a random number $N$ of independent and identically distributed random variables that are also independent of $N$, is called a compound random variable. As just shown in Example 3.10, the expected value of a compound random variable is $E[X]E[N]$. Its variance will be derived in Example 3.19.
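The identity $E\big[\sum_{i=1}^N X_i\big]=E[X]E[N]$ is easy to check by simulation. In the sketch below the particular distributions, a geometric $N$ and die-roll summands, are arbitrary choices made only for illustration.

```python
# Sketch: simulate a compound random variable  S = X_1 + ... + X_N  with
# N geometric(p) (so E[N] = 1/p) and X_i uniform on {1, ..., 6}
# (so E[X] = 3.5), and compare the sample mean with E[N]E[X] = 3.5 / p.
import random

rng = random.Random(7)
p = 0.25

def sample_S():
    n = 1
    while rng.random() >= p:        # number of trials until first "success"
        n += 1
    return sum(rng.randint(1, 6) for _ in range(n))

sims = 100_000
print(sum(sample_S() for _ in range(sims)) / sims)   # approx 3.5 / 0.25 = 14
```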

Example 3.11

The Mean of a Geometric Distribution

A coin, having probability $p$ of coming up heads, is to be successively flipped until the first head appears. What is the expected number of flips required?

Solution: Let $N$ be the number of flips required, and let

$$Y=\begin{cases}1,&\text{if the first flip results in a head}\\0,&\text{if the first flip results in a tail}\end{cases}$$

Now,

$$\begin{aligned}E[N]&=E[N\mid Y=1]P\{Y=1\}+E[N\mid Y=0]P\{Y=0\}\\&=pE[N\mid Y=1]+(1-p)E[N\mid Y=0]\end{aligned} \tag{3.4}$$

However,

$$E[N\mid Y=1]=1,\qquad E[N\mid Y=0]=1+E[N] \tag{3.5}$$

To see why Equation (3.5) is true, consider $E[N\mid Y=1]$. Since $Y=1$, we know that the first flip resulted in heads and so, clearly, the expected number of flips required is 1. On the other hand, if $Y=0$, then the first flip resulted in tails. However, since the successive flips are assumed independent, it follows that, after the first tail, the expected additional number of flips until the first head is just $E[N]$. Hence $E[N\mid Y=0]=1+E[N]$. Substituting Equation (3.5) into Equation (3.4) yields

$$E[N]=p+(1-p)(1+E[N])$$

or

$$E[N]=1/p$$

Because the random variable $N$ is a geometric random variable with probability mass function $p(n)=p(1-p)^{n-1}$, its expectation could easily have been computed from $E[N]=\sum_{n=1}^{\infty}np(n)$ without recourse to conditional expectation. However, if you attempt to obtain the solution to our next example without using conditional expectation, you will quickly learn what a useful technique “conditioning” can be.
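As a small computational aside, the conditioning recursion of Equation (3.4)–(3.5) can itself be iterated numerically; the value of $p$ below is arbitrary.

```python
# Sketch: the conditioning recursion E[N] = p + (1 - p)(1 + E[N]) can be
# solved by simple fixed-point iteration; it converges to 1/p.
p = 0.2
EN = 0.0
for _ in range(200):                    # iterate the recursion
    EN = p + (1 - p) * (1 + EN)
print(EN, 1 / p)                        # both approximately 5.0
```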

Example 3.12

A miner is trapped in a mine containing three doors. The first door leads to a tunnel that takes him to safety after two hours of travel. The second door leads to a tunnel that returns him to the mine after three hours of travel. The third door leads to a tunnel that returns him to the mine after five hours. Assuming that the miner is at all times equally likely to choose any one of the doors, what is the expected length of time until the miner reaches safety?

Solution: Let $X$ denote the time until the miner reaches safety, and let $Y$ denote the door he initially chooses. Now,

$$\begin{aligned}E[X]&=E[X\mid Y=1]P\{Y=1\}+E[X\mid Y=2]P\{Y=2\}+E[X\mid Y=3]P\{Y=3\}\\&=\tfrac{1}{3}\big(E[X\mid Y=1]+E[X\mid Y=2]+E[X\mid Y=3]\big)\end{aligned}$$

However,

$$E[X\mid Y=1]=2,\qquad E[X\mid Y=2]=3+E[X],\qquad E[X\mid Y=3]=5+E[X] \tag{3.6}$$

To understand why this is correct, consider, for instance, $E[X\mid Y=2]$, and reason as follows. If the miner chooses the second door, then he spends three hours in the tunnel and then returns to the mine. But once he returns to the mine the problem is as before, and hence his expected additional time until safety is just $E[X]$. Hence $E[X\mid Y=2]=3+E[X]$. The argument behind the other equalities in Equation (3.6) is similar. Hence,

$$E[X]=\tfrac{1}{3}\big(2+3+E[X]+5+E[X]\big)\quad\text{or}\quad E[X]=10$$
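A simulation sketch of the miner's situation agrees with the value $E[X]=10$ obtained by conditioning; the random seed and number of runs are arbitrary.

```python
# Sketch: simulate the trapped miner of Example 3.12 and compare the
# average escape time with the answer E[X] = 10 obtained by conditioning.
import random

rng = random.Random(42)
travel = {1: 2, 2: 3, 3: 5}      # hours for each door; doors 2, 3 loop back

def escape_time():
    t = 0
    while True:
        door = rng.randint(1, 3)         # each door equally likely
        t += travel[door]
        if door == 1:                    # door 1 leads to safety
            return t

sims = 100_000
print(sum(escape_time() for _ in range(sims)) / sims)   # approximately 10
```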

Example 3.13

Multinomial Covariances

Consider $n$ independent trials, each of which results in one of the outcomes $1,\ldots,r$, with respective probabilities $p_1,\ldots,p_r$, $\sum_{i=1}^r p_i=1$. If we let $N_i$ denote the number of trials that result in outcome $i$, then $(N_1,\ldots,N_r)$ is said to have a multinomial distribution. For $i\ne j$, let us compute

$$\text{Cov}(N_i,N_j)=E[N_iN_j]-E[N_i]E[N_j]$$

Because each trial independently results in outcome $i$ with probability $p_i$, it follows that $N_i$ is binomial with parameters $(n,p_i)$, giving that $E[N_i]E[N_j]=n^2p_ip_j$. To compute $E[N_iN_j]$, condition on $N_i$ to obtain

$$E[N_iN_j]=\sum_{k=0}^n E[N_iN_j\mid N_i=k]P(N_i=k)=\sum_{k=0}^n kE[N_j\mid N_i=k]P(N_i=k)$$

Now, given that $k$ of the $n$ trials result in outcome $i$, each of the other $n-k$ trials independently results in outcome $j$ with probability

$$P(j\mid\text{not }i)=\frac{p_j}{1-p_i}$$

thus showing that the conditional distribution of $N_j$, given that $N_i=k$, is binomial with parameters $\left(n-k,\frac{p_j}{1-p_i}\right)$. Using this yields

$$\begin{aligned}E[N_iN_j]&=\sum_{k=0}^n k(n-k)\frac{p_j}{1-p_i}P(N_i=k)\\&=\frac{p_j}{1-p_i}\left(n\sum_{k=0}^n kP(N_i=k)-\sum_{k=0}^n k^2P(N_i=k)\right)\\&=\frac{p_j}{1-p_i}\big(nE[N_i]-E[N_i^2]\big)\end{aligned}$$

Because $N_i$ is binomial with parameters $(n,p_i)$,

$$E[N_i^2]=\text{Var}(N_i)+(E[N_i])^2=np_i(1-p_i)+(np_i)^2$$

Hence,

$$\begin{aligned}E[N_iN_j]&=\frac{p_j}{1-p_i}\big[n^2p_i-np_i(1-p_i)-n^2p_i^2\big]\\&=\frac{np_ip_j}{1-p_i}\big[n-np_i-(1-p_i)\big]\\&=n(n-1)p_ip_j\end{aligned}$$

which yields the result

$$\text{Cov}(N_i,N_j)=n(n-1)p_ip_j-n^2p_ip_j=-np_ip_j$$
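A Monte Carlo estimate of $\text{Cov}(N_i,N_j)$ matches $-np_ip_j$; the number of trials, outcome probabilities, and sampling scheme below are arbitrary choices for illustration.

```python
# Sketch: Monte Carlo estimate of Cov(N_i, N_j) for a multinomial sample,
# compared with the formula -n * p_i * p_j.  Parameters are arbitrary.
import random

rng = random.Random(3)
n, probs = 10, [0.2, 0.3, 0.5]          # r = 3 outcomes
i, j = 0, 1                             # compare counts of outcomes 1 and 2

def multinomial_counts():
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for k, p in enumerate(probs):   # invert the outcome distribution
            acc += p
            if u < acc:
                counts[k] += 1
                break
    return counts

sims = 200_000
samples = [multinomial_counts() for _ in range(sims)]
mean_i = sum(s[i] for s in samples) / sims
mean_j = sum(s[j] for s in samples) / sims
cov = sum(s[i] * s[j] for s in samples) / sims - mean_i * mean_j
print(cov, -n * probs[i] * probs[j])    # both approximately -0.6
```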

Example 3.14

The Matching Rounds Problem

Suppose in Example 2.31 that those choosing their own hats depart, while the others (those without a match) put their selected hats in the center of the room, mix them up, and then reselect. Also, suppose that this process continues until each individual has his own hat. Let $R_n$ denote the number of rounds that are necessary when $n$ individuals are initially present, and let $S_n$ denote the total number of selections made by these $n$ individuals. (a) Find $E[R_n]$. (b) Find $E[S_n]$.

Solution: (a) It follows from the results of Example 2.31 that no matter how many people remain there will, on average, be one match per round. Hence, one might suggest that $E[R_n]=n$. This turns out to be true, and an induction proof will now be given. Because it is obvious that $E[R_1]=1$, assume that $E[R_k]=k$ for $k=1,\ldots,n-1$. To compute $E[R_n]$, start by conditioning on $X_n$, the number of matches that occur in the first round. This gives

$$E[R_n]=\sum_{i=0}^n E[R_n\mid X_n=i]P\{X_n=i\}$$

Now, given a total of $i$ matches in the initial round, the number of rounds needed will equal 1 plus the number of rounds that are required when $n-i$ persons are to be matched with their hats. Therefore,

$$\begin{aligned}E[R_n]&=\sum_{i=0}^n(1+E[R_{n-i}])P\{X_n=i\}\\&=1+E[R_n]P\{X_n=0\}+\sum_{i=1}^n E[R_{n-i}]P\{X_n=i\}\\&=1+E[R_n]P\{X_n=0\}+\sum_{i=1}^n(n-i)P\{X_n=i\}\quad\text{by the induction hypothesis}\\&=1+E[R_n]P\{X_n=0\}+n(1-P\{X_n=0\})-E[X_n]\\&=E[R_n]P\{X_n=0\}+n(1-P\{X_n=0\})\end{aligned}$$

where the final equality used the result, established in Example 2.31, that $E[X_n]=1$. Since the preceding equation implies that $E[R_n]=n$, the result is proven. (b) For $n\ge 2$, conditioning on $X_n$, the number of matches in round 1, gives

$$\begin{aligned}E[S_n]&=\sum_{i=0}^n E[S_n\mid X_n=i]P\{X_n=i\}\\&=\sum_{i=0}^n(n+E[S_{n-i}])P\{X_n=i\}\\&=n+\sum_{i=0}^n E[S_{n-i}]P\{X_n=i\}\end{aligned}$$

where $E[S_0]=0$. To solve the preceding equation, rewrite it as

$$E[S_n]=n+E\big[S_{n-X_n}\big]$$

Now, if there were exactly one match in each round, then it would take a total of $1+2+\cdots+n=n(n+1)/2$ selections. Thus, let us try a solution of the form $E[S_n]=an+bn^2$. For the preceding equation to be satisfied by a solution of this type, for $n\ge 2$, we need

$$an+bn^2=n+E\big[a(n-X_n)+b(n-X_n)^2\big]$$

or, equivalently,

$$an+bn^2=n+a(n-E[X_n])+b\big(n^2-2nE[X_n]+E[X_n^2]\big)$$

Now, using the results of Example 2.31 and Exercise 72 of Chapter 2 that $E[X_n]=\text{Var}(X_n)=1$, the preceding will be satisfied if

$$an+bn^2=n+an-a+bn^2-2nb+2b$$

and this will be valid provided that $b=\frac{1}{2}$, $a=1$. That is,

$$E[S_n]=n+n^2/2$$

satisfies the recursive equation for $E[S_n]$.

The formal proof that $E[S_n]=n+n^2/2$, $n\ge 2$, is obtained by induction on $n$. It is true when $n=2$ (since, in this case, the number of selections is twice the number of rounds and the number of rounds is a geometric random variable with parameter $p=1/2$). Now, the recursion gives

$$E[S_n]=n+E[S_n]P\{X_n=0\}+\sum_{i=1}^n E[S_{n-i}]P\{X_n=i\}$$

Hence, upon assuming that $E[S_0]=E[S_1]=0$, $E[S_k]=k+k^2/2$ for $k=2,\ldots,n-1$, and using that $P\{X_n=n-1\}=0$, we see that

$$\begin{aligned}E[S_n]&=n+E[S_n]P\{X_n=0\}+\sum_{i=1}^n\big[n-i+(n-i)^2/2\big]P\{X_n=i\}\\&=n+E[S_n]P\{X_n=0\}+(n+n^2/2)(1-P\{X_n=0\})-(n+1)E[X_n]+E[X_n^2]/2\end{aligned}$$
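As an aside, a simulation of the matching rounds process supports the values $E[R_n]=n$ and $E[S_n]=n+n^2/2$ obtained above; the group size and number of runs below are arbitrary.

```python
# Sketch: simulate the matching rounds problem and compare the sample
# means of R_n (rounds) and S_n (total selections) with n and n + n^2/2.
import random

rng = random.Random(11)

def one_run(n):
    """Return (rounds, selections) until everyone has matched their hat."""
    remaining, rounds, selections = n, 0, 0
    while remaining > 0:
        rounds += 1
        selections += remaining
        perm = list(range(remaining))
        rng.shuffle(perm)                       # random reselection of hats
        matches = sum(1 for k, h in enumerate(perm) if k == h)
        remaining -= matches                    # matched people depart
    return rounds, selections

n, sims = 6, 50_000
runs = [one_run(n) for _ in range(sims)]
print(sum(r for r, _ in runs) / sims, n)                 # E[R_n] = n = 6
print(sum(s for _, s in runs) / sims, n + n * n / 2)     # E[S_n] = 24
```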