2 Discrete random variables

Summary. Discrete random variables are studied via their probability mass functions. This leads to the definition of the ‘mean value’ or ‘expectation’ of a random variable. There are discussions of variance, and of functions of random variables. Methods are presented for calculating expectations, including the use of conditional expectation.

2.1 Probability mass functions

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we are often interested in situations involving some real-valued function $X$ acting on $\Omega$. For example, let $\mathcal{E}$ be the experiment of throwing a fair die once, so that $\Omega = \{1, 2, 3, 4, 5, 6\}$, and suppose that we gamble on the outcome of $\mathcal{E}$ in such a way that the profit is, say,

$$\begin{cases} -1 & \text{if the outcome is } 1, 2, \text{ or } 3, \\ 0 & \text{if the outcome is } 4, \\ 2 & \text{if the outcome is } 5 \text{ or } 6, \end{cases}$$

where negative profits are positive losses. If the outcome is $\omega$, then our profit is $X(\omega)$, where $X : \Omega \to \mathbb{R}$ is defined by

$$X(\omega) = \begin{cases} -1 & \text{if } \omega = 1, 2, 3, \\ 0 & \text{if } \omega = 4, \\ 2 & \text{if } \omega = 5, 6. \end{cases}$$

The mapping X is an example of a ‘discrete random variable’.

More formally, a discrete random variable $X$ on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is defined to be a mapping $X : \Omega \to \mathbb{R}$ such that¹

$$\text{the image } X(\Omega) = \{X(\omega) : \omega \in \Omega\} \text{ is a countable subset of } \mathbb{R},$$

(2.1)

$$\{\omega \in \Omega : X(\omega) = x\} \in \mathcal{F} \quad \text{for each } x \in \mathbb{R}.$$

(2.2)

The word ‘discrete’ here refers to the condition that $X$ takes only countably many values in $\mathbb{R}$.² Condition (2.2) is obscure at first sight, and the point here is as follows. A discrete random variable $X$ takes values in $\mathbb{R}$, but we cannot predict the actual value of $X$ with certainty since the underlying experiment $\mathcal{E}$ involves chance. Instead, we would like to measure the probability that $X$ takes a given value, $x$ say. To this end, we note that $X$ takes the value $x$ if and only if the result of $\mathcal{E}$ lies in that subset of $\Omega$ which is mapped into $x$, namely the subset $X^{-1}(x) = \{\omega \in \Omega : X(\omega) = x\}$. Condition (2.2) postulates that all such subsets are events, in that they belong to $\mathcal{F}$, and are therefore assigned probabilities by $\mathbb{P}$.
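For readers who like to experiment, conditions (2.1) and (2.2) translate directly into computation on a finite sample space: $\mathbb{P}(X = x)$ is found by summing $\mathbb{P}(\omega)$ over the preimage $\{\omega : X(\omega) = x\}$. Here is a minimal Python sketch for the fair-die example above (the profit values are the illustrative ones chosen there):

    from fractions import Fraction

    omega_space = [1, 2, 3, 4, 5, 6]
    prob = {w: Fraction(1, 6) for w in omega_space}   # fair die

    def X(w):
        # the illustrative profit mapping from the example above
        return -1 if w <= 3 else (0 if w == 4 else 2)

    def pmf(x):
        # P(X = x) = P({omega : X(omega) = x}), summed over the preimage
        return sum(prob[w] for w in omega_space if X(w) == x)

    print(pmf(-1), pmf(0), pmf(2))   # 1/2 1/6 1/3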

The most interesting things about a discrete random variable are the values which it may take and the probabilities associated with these values. If $X$ is a discrete random variable on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, then its image $\operatorname{Im} X$ is the image of $\Omega$ under $X$, that is, the set of values taken by $X$.

Henceforth, we abbreviate events of the form $\{\omega \in \Omega : X(\omega) = x\}$ to the more convenient form $\{X = x\}$.

Definition

The (probability) mass function (or pmf) of the discrete random variable $X$ is the function $p_X : \mathbb{R} \to [0, 1]$ defined by

$$p_X(x) = \mathbb{P}(X = x).$$

(2.4)

Thus, $p_X(x)$ is the probability that the mapping $X$ takes the value $x$. Note that $\operatorname{Im} X$ is countable for any discrete random variable $X$, and

$$p_X(x) = 0 \quad \text{if } x \notin \operatorname{Im} X,$$

(2.5)

$$\sum_{x \in \operatorname{Im} X} p_X(x) = 1.$$

(2.6)

Equation (2.6) is sometimes written as

$$\sum_{x} p_X(x) = 1,$$

in the light of the fact that only countably many values of x make non-zero contributions to this sum. Condition (2.6) essentially characterizes mass functions of discrete random variables in the sense of the following theorem.

Theorem 2.7

Let $S = \{s_i : i \in I\}$ be a countable set of distinct real numbers, and let $\{\pi_i : i \in I\}$ be a collection of real numbers satisfying

$$\pi_i \ge 0 \quad \text{for } i \in I, \qquad \sum_{i \in I} \pi_i = 1.$$

There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a discrete random variable $X$ on $(\Omega, \mathcal{F}, \mathbb{P})$ such that the probability mass function of $X$ is given by

$$p_X(s_i) = \pi_i \quad \text{for } i \in I, \qquad p_X(s) = 0 \quad \text{for } s \notin S.$$

Proof

Take $\Omega = S$, take $\mathcal{F}$ to be the set of all subsets of $\Omega$, and define

$$\mathbb{P}(A) = \sum_{i : s_i \in A} \pi_i \quad \text{for } A \in \mathcal{F}.$$

Finally, define $X : \Omega \to \mathbb{R}$ by $X(\omega) = \omega$ for $\omega \in \Omega$.

This theorem is very useful, since for many purposes it allows us to forget about sample spaces, event spaces, and probability measures; we need only say ‘let $X$ be a random variable taking the value $s_i$ with probability $\pi_i$, for $i \in I$’, and we can be sure that such a random variable exists without having to construct it explicitly.
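In computational terms, Theorem 2.7 corresponds to the familiar fact that a random variable with any prescribed mass function can be simulated directly. A minimal Python sketch (the values $s_i$ and weights $\pi_i$ below are arbitrary illustrative choices):

    import random

    s = [-1.5, 0.0, 2.0, 7.0]     # a finite set of distinct values s_i
    pi = [0.1, 0.2, 0.3, 0.4]     # non-negative weights pi_i summing to 1

    # random.choices returns s[i] with probability pi[i]: a realization of
    # the random variable whose existence Theorem 2.7 guarantees.
    print(random.choices(s, weights=pi, k=10))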

In the next section, we present a list of some of the most common types of discrete random variables.

  

Exercise 2.8

If $X$ and $Y$ are discrete random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, show that $U$ and $V$ are discrete random variables on this space also, where

$$U(\omega) = X(\omega) + Y(\omega), \qquad V(\omega) = X(\omega) Y(\omega), \qquad \text{for } \omega \in \Omega.$$

Exercise 2.9

Show that if $\mathcal{F}$ is the power set of $\Omega$, then all functions which map $\Omega$ into a countable subset of $\mathbb{R}$ are discrete random variables.

Exercise 2.10

If $E$ is an event of the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, show that the indicator function of $E$, defined to be the function $1_E$ on $\Omega$ given by

$$1_E(\omega) = \begin{cases} 1 & \text{if } \omega \in E, \\ 0 & \text{if } \omega \notin E, \end{cases}$$

is a discrete random variable.

Exercise 2.11

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space in which

$$\Omega = \{1, 2, 3, 4, 5, 6\}, \qquad \mathcal{F} = \{\emptyset, \{1, 3, 5\}, \{2, 4, 6\}, \Omega\},$$

and let $U$, $V$, $W$ be functions on $\Omega$ defined by

$$U(\omega) = \omega, \qquad V(\omega) = \begin{cases} 1 & \text{if } \omega \text{ is even}, \\ 0 & \text{if } \omega \text{ is odd}, \end{cases} \qquad W(\omega) = \omega^2,$$

for $\omega \in \Omega$. Determine which of $U$, $V$, $W$ are discrete random variables on the probability space.

Exercise 2.12

For what value of $c$ is the function $p$, defined by

$$p(k) = \begin{cases} c/\{k(k + 1)\} & \text{if } k = 1, 2, \ldots, \\ 0 & \text{otherwise}, \end{cases}$$

a mass function?

  

2.2 Examples

Certain types of discrete random variables occur frequently, and we list some of these. Throughout this section, $n$ is a positive integer, $p$ is a number in $[0, 1]$, and $q = 1 - p$. We never describe the underlying probability space.

Bernoulli distribution. This is the simplest non-trivial distribution. We say that the discrete random variable $X$ has the Bernoulli distribution with parameter $p$ if the image of $X$ is $\{0, 1\}$, so that $X$ takes the values 0 and 1 only.

Such a random variable $X$ is often called simply a coin toss. There exists $p \in [0, 1]$ such that

$$\mathbb{P}(X = 0) = 1 - p, \qquad \mathbb{P}(X = 1) = p,$$

(2.13)

and the mass function of $X$ is given by

$$p_X(0) = 1 - p, \qquad p_X(1) = p, \qquad p_X(x) = 0 \quad \text{for } x \ne 0, 1.$$

Coin tosses are the building blocks of probability theory. There is a sense in which the entire theory can be constructed from an infinite sequence of coin tosses.

Binomial distribution. We say that $X$ has the binomial distribution with parameters $n$ and $p$ if $X$ takes values in $\{0, 1, \ldots, n\}$ and

$$\mathbb{P}(X = k) = \binom{n}{k} p^k q^{n-k} \quad \text{for } k = 0, 1, \ldots, n.$$

(2.14)

Note that (2.14) gives rise to a mass function satisfying (2.6) since, by the binomial theorem,

$$\sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p + q)^n = 1.$$

Poisson distribution. We say that $X$ has the Poisson distribution with parameter $\lambda$ ($\lambda > 0$) if $X$ takes values in $\{0, 1, 2, \ldots\}$ and

$$\mathbb{P}(X = k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad \text{for } k = 0, 1, 2, \ldots.$$

(2.15)

Again, this gives rise to a mass function since

$$\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} = e^{\lambda} e^{-\lambda} = 1.$$

Geometric distribution. We say that $X$ has the geometric distribution with parameter $p \in (0, 1]$ if $X$ takes values in $\{1, 2, 3, \ldots\}$ and

$$\mathbb{P}(X = k) = p q^{k-1} \quad \text{for } k = 1, 2, \ldots.$$

(2.16)

As before, note that

$$\sum_{k=1}^{\infty} p q^{k-1} = \frac{p}{1 - q} = 1.$$

Negative binomial distribution. We say that $X$ has the negative binomial distribution with parameters $n$ and $p \in (0, 1]$ if $X$ takes values in $\{n, n + 1, n + 2, \ldots\}$ and

$$\mathbb{P}(X = k) = \binom{k - 1}{n - 1} p^n q^{k-n} \quad \text{for } k = n, n + 1, \ldots.$$

(2.17)

As before, note that

$$\sum_{k=n}^{\infty} \binom{k - 1}{n - 1} p^n q^{k-n} = p^n (1 - q)^{-n} = 1,$$

using the binomial expansion of $(1 - q)^{-n}$; see Theorem A.3.
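Each of these verifications can also be checked numerically. The following Python sketch evaluates the four non-trivial mass functions and confirms that each sums to (approximately) 1; the parameter values are arbitrary, and the infinite sums are truncated where the tails are negligible:

    from math import comb, exp, factorial

    n, p, lam = 10, 0.3, 2.5
    q = 1 - p

    binomial = sum(comb(n, k) * p**k * q**(n - k) for k in range(n + 1))
    poisson = sum(lam**k * exp(-lam) / factorial(k) for k in range(100))
    geometric = sum(p * q**(k - 1) for k in range(1, 2000))
    neg_binom = sum(comb(k - 1, n - 1) * p**n * q**(k - n)
                    for k in range(n, 2000))

    print(binomial, poisson, geometric, neg_binom)   # all approximately 1.0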

Example 2.18

Here is an example of some of the above distributions in action. Suppose that a coin is tossed $n$ times, and there is probability $p$ that heads appears on each toss. Representing heads by H and tails by T, the sample space is the set $\Omega$ of all ordered sequences of length $n$ containing the letters H and T, where the $k$th entry of such a sequence represents the result of the $k$th toss. The set $\Omega$ is finite, and we take $\mathcal{F}$ to be the set of all subsets of $\Omega$. For each $\omega \in \Omega$, we define the probability that $\omega$ is the actual outcome by

$$\mathbb{P}(\omega) = p^{h(\omega)} q^{t(\omega)},$$

where $h(\omega)$ is the number of heads in $\omega$ and $t(\omega) = n - h(\omega)$ is the number of tails. Similarly, for any $A \in \mathcal{F}$,

$$\mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}(\omega).$$

For $i = 1, 2, \ldots, n$, we define the discrete random variable $X_i$ by

$$X_i(\omega) = \begin{cases} 1 & \text{if the } i\text{th toss in } \omega \text{ is a head}, \\ 0 & \text{otherwise}. \end{cases}$$

Each $X_i$ takes values in $\{0, 1\}$ and has mass function given by

$$p_{X_i}(0) = \mathbb{P}(\{\omega \in \Omega : \omega_i = \mathrm{T}\}), \qquad p_{X_i}(1) = \mathbb{P}(\{\omega \in \Omega : \omega_i = \mathrm{H}\}),$$

where $\omega_i$ is the $i$th entry in $\omega$. Thus

$$p_{X_i}(1) = \sum_{\omega : \omega_i = \mathrm{H}} p^{h(\omega)} q^{t(\omega)} = p \sum_{k=0}^{n-1} \binom{n - 1}{k} p^k q^{n-1-k} = p (p + q)^{n-1} = p,$$

and

$$p_{X_i}(0) = 1 - p_{X_i}(1) = q.$$

Hence, each $X_i$ has the Bernoulli distribution with parameter $p$. We have derived this fact in a cumbersome manner, but we believe these details to be instructive.

Let

$$S_n = X_1 + X_2 + \cdots + X_n,$$

which is to say that $S_n(\omega) = X_1(\omega) + X_2(\omega) + \cdots + X_n(\omega)$ for $\omega \in \Omega$. Clearly, $S_n$ is the total number of heads which occur, and $S_n$ takes values in $\{0, 1, \ldots, n\}$ since each $X_i$ equals 0 or 1. Also, for $k = 0, 1, \ldots, n$, we have that

$$\mathbb{P}(S_n = k) = \mathbb{P}(\{\omega : h(\omega) = k\}) = \binom{n}{k} p^k q^{n-k},$$

(2.19)

and so $S_n$ has the binomial distribution with parameters $n$ and $p$.
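The identification of the distribution of $S_n$ can be corroborated by simulation: repeat the $n$-toss experiment many times and compare the empirical frequencies of $S_n$ with (2.19). A Python sketch with arbitrary parameters:

    import random
    from math import comb

    n, p, trials = 10, 0.4, 100_000
    counts = [0] * (n + 1)
    for _ in range(trials):
        s = sum(random.random() < p for _ in range(n))   # S_n for one run
        counts[s] += 1

    for k in range(n + 1):
        empirical = counts[k] / trials
        exact = comb(n, k) * p**k * (1 - p)**(n - k)     # (2.19)
        print(k, round(empirical, 4), round(exact, 4))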

If $n$ is very large and $p$ is very small, but $np$ is a ‘reasonable size’ ($np = \lambda$, say), then the distribution of $S_n$ may be approximated by the Poisson distribution with parameter $\lambda$, as follows. For fixed $k = 0, 1, 2, \ldots$, write $p = \lambda/n$ and suppose that $n$ is large to find that

$$\mathbb{P}(S_n = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$

(2.20)

This approximation may be useful in practice. For example, consider a single page of the Guardian newspaper containing, say, 5000 characters, and suppose that the typesetter flips a coin before setting each character and then deliberately mis-sets this character whenever the coin comes up heads. If the coin comes up heads with probability $10^{-3}$ on each flip, then this is equivalent to taking $n = 5000$ and $p = 10^{-3}$ in the above example, giving that the number $S_n$ of deliberate mistakes has the binomial distribution with parameters 5000 and $10^{-3}$. It may be easier (and not too inaccurate) to use (2.20) rather than (2.19) to calculate probabilities. In this case, $\lambda = np = 5$ and so, for example,

$$\mathbb{P}(S_n = 0) \approx e^{-5} \approx 0.0067.$$
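To see how good the approximation (2.20) is for parameters of this order, one can simply compare the two mass functions; the Python sketch below uses the illustrative values $n = 5000$ and $p = 10^{-3}$ from above:

    from math import comb, exp, factorial

    n, p = 5000, 1e-3
    lam = n * p                                          # lambda = np = 5

    for k in range(6):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)     # exact, (2.19)
        poiss = lam**k * exp(-lam) / factorial(k)        # approx, (2.20)
        print(k, round(binom, 6), round(poiss, 6))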

Example 2.21

Suppose that we toss the coin of the previous example until the first head turns up, and then we stop. The sample space now is

$$\Omega = \{\omega_0, \omega_1, \omega_2, \ldots\} \cup \{\omega_\infty\},$$

where $\omega_k$ represents the outcome of $k$ tails followed by a head, and $\omega_\infty$ represents an infinite sequence of tails with no head. As before, $\mathcal{F}$ is the set of all subsets of $\Omega$, and $\mathbb{P}$ is given by the observation that

$$\mathbb{P}(\omega_k) = q^k p \quad \text{for } k = 0, 1, 2, \ldots.$$

Let $Y$ be the total number of tosses in this experiment, so that $Y(\omega_k) = k + 1$ for $k = 0, 1, 2, \ldots$ and $Y(\omega_\infty) = \infty$. If $p > 0$, then

$$\mathbb{P}(Y = k) = \mathbb{P}(\omega_{k-1}) = p q^{k-1} \quad \text{for } k = 1, 2, \ldots,$$

showing that $Y$ has the geometric distribution with parameter $p$.
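Example 2.21 translates directly into a simulation: toss until the first head appears, record the number of tosses, and compare the empirical frequencies with (2.16). A Python sketch with an arbitrary $p$:

    import random

    def tosses_until_first_head(p):
        k = 1
        while random.random() >= p:   # a tail: toss again
            k += 1
        return k

    p, trials = 0.3, 100_000
    results = [tosses_until_first_head(p) for _ in range(trials)]
    for k in range(1, 6):
        empirical = results.count(k) / trials
        exact = p * (1 - p)**(k - 1)                     # (2.16)
        print(k, round(empirical, 4), round(exact, 4))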

Example 2.22

If we carry on tossing the coin in the previous example until the $n$th head has turned up, then a similar argument shows that, if $p > 0$, the total number of tosses required has the negative binomial distribution with parameters $n$ and $p$.

  

Exercise 2.23

If $X$ is a discrete random variable having the Poisson distribution with parameter $\lambda$, show that the probability that $X$ is even is $\frac{1}{2}(1 + e^{-2\lambda})$.

Exercise 2.24

If $X$ is a discrete random variable having the geometric distribution with parameter $p$, show that the probability that $X$ is greater than $k$ is $q^k$, where $q = 1 - p$.

  

2.3 Functions of discrete random variables

Let $X$ be a discrete random variable on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and let $g : \mathbb{R} \to \mathbb{R}$. It is easy to check that $Y = g(X)$ is a discrete random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ also, defined by

$$Y(\omega) = g(X(\omega)) \quad \text{for } \omega \in \Omega.$$

Simple examples are the random variables $aX + b$ (for $a, b \in \mathbb{R}$) and $X^2$, given by

$$(aX + b)(\omega) = a X(\omega) + b, \qquad X^2(\omega) = X(\omega)^2.$$

If $Y = g(X)$, the mass function of $Y$ is given by

$$p_Y(y) = \mathbb{P}(g(X) = y) = \sum_{x : g(x) = y} \mathbb{P}(X = x),$$

(2.25)

since there are only countably many non-zero contributions to this sum. Thus, if $Y = aX + b$ with $a \ne 0$, then

$$p_Y(y) = \mathbb{P}(aX + b = y) = \mathbb{P}\left(X = \frac{y - b}{a}\right) = p_X\left(\frac{y - b}{a}\right),$$

while if $Y = X^2$, then

$$p_Y(y) = \begin{cases} p_X(\sqrt{y}\,) + p_X(-\sqrt{y}\,) & \text{if } y > 0, \\ p_X(0) & \text{if } y = 0, \\ 0 & \text{if } y < 0. \end{cases}$$
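Formula (2.25) amounts to pushing the mass function of $X$ forward through $g$, accumulating the masses of all $x$ with $g(x) = y$. A dictionary-based Python sketch (the mass function of $X$ here is an arbitrary example):

    def pmf_of_gX(p_X, g):
        # p_Y(y) = sum of p_X(x) over all x with g(x) = y, as in (2.25)
        p_Y = {}
        for x, mass in p_X.items():
            y = g(x)
            p_Y[y] = p_Y.get(y, 0) + mass
        return p_Y

    p_X = {-2: 0.2, -1: 0.3, 0: 0.1, 1: 0.25, 2: 0.15}
    print(pmf_of_gX(p_X, lambda x: x * x))
    # note p_Y(4) = p_X(-2) + p_X(2) = 0.35, as the y > 0 case predicts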

  

Exercise 2.26

Let $X$ be a discrete random variable having the Poisson distribution with parameter $\lambda$, and let $Y = X^2$. Find the mass function of $Y$.

  

2.4 Expectation

Consider a fair die. If it were thrown a large number of times, each of the possible outcomes $1, 2, \ldots, 6$ would appear on about one-sixth of the throws, and the average of the numbers observed would be approximately

$$\tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3\tfrac{1}{2},$$

which we call the mean value. This notion of mean value is easily extended to more general distributions as follows.

Definition 2.27

If $X$ is a discrete random variable, the expectation of $X$ is denoted by $\mathbb{E}(X)$ and defined by

$$\mathbb{E}(X) = \sum_{x \in \operatorname{Im} X} x \, \mathbb{P}(X = x)$$

(2.28)

whenever this sum converges absolutely, in that $\sum_{x \in \operatorname{Im} X} |x| \, \mathbb{P}(X = x) < \infty$.

Equation (2.28) is often written

$$\mathbb{E}(X) = \sum_{x} x \, p_X(x),$$

and the expectation of $X$ is often called the expected value or mean of $X$.³ The reason for requiring absolute convergence in (2.28) is that the image $\operatorname{Im} X$ may be an infinite set, and we need the summation in (2.28) to take the same value irrespective of the order in which we add up its terms.

The physical analogy of ‘expectation’ is the idea of ‘centre of gravity’. If masses with weights $m_1, m_2, \ldots$ are placed at the points $x_1, x_2, \ldots$ of $\mathbb{R}$, then the position of the centre of gravity is $\sum_i x_i m_i \big/ \sum_i m_i$, or $\sum_i x_i \pi_i$, where $\pi_i = m_i \big/ \sum_j m_j$ is the proportion of the total weight allocated to position $x_i$.
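Computationally, (2.28) is nothing but a weighted sum over the support, identical to a centre-of-gravity calculation. A one-function Python sketch:

    def expectation(pmf):
        # E(X) = sum of x * P(X = x) over the image of X, as in (2.28)
        return sum(x * px for x, px in pmf.items())

    die = {x: 1 / 6 for x in range(1, 7)}
    print(expectation(die))   # 3.5, the mean value of a fair die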

If $X$ is a discrete random variable (on some probability space) and $g : \mathbb{R} \to \mathbb{R}$, then $Y = g(X)$ is a discrete random variable also. According to the above definition, we need to know the mass function of $Y$ before we can calculate its expectation. The following theorem provides a useful way of avoiding this tedious calculation.

Theorem 2.29 (Law of the subconscious statistician)

If $X$ is a discrete random variable and $g : \mathbb{R} \to \mathbb{R}$, then

$$\mathbb{E}(g(X)) = \sum_{x \in \operatorname{Im} X} g(x) \, \mathbb{P}(X = x),$$

whenever this sum converges absolutely.

Intuitively, this result is rather clear, since $g(X)$ takes the value $g(x)$ when $X$ takes the value $x$, an event which has probability $\mathbb{P}(X = x)$. A more formal proof proceeds as follows.

Proof

Writing $I$ for the image of $X$, we have that $Y = g(X)$ has image $g(I) = \{g(x) : x \in I\}$. Thus

$$\mathbb{E}(Y) = \sum_{y \in g(I)} y \, \mathbb{P}(Y = y) = \sum_{y \in g(I)} y \sum_{x \in I : g(x) = y} \mathbb{P}(X = x) = \sum_{x \in I} g(x) \, \mathbb{P}(X = x),$$

if the last sum converges absolutely.

Two simple but useful properties of expectation are as follows.

Theorem 2.30

Let $X$ be a discrete random variable and let $a, b \in \mathbb{R}$.

(a) If $\mathbb{P}(X \ge 0) = 1$ and $\mathbb{E}(X) = 0$, then $\mathbb{P}(X = 0) = 1$.

(b) We have that $\mathbb{E}(aX + b) = a\,\mathbb{E}(X) + b$.

Proof

(a) Suppose the assumptions hold. By the definition (2.28) of $\mathbb{E}(X)$, we have that $x \, \mathbb{P}(X = x) \ge 0$ for all $x \in \operatorname{Im} X$, and these non-negative terms sum to $\mathbb{E}(X) = 0$. Therefore, $\mathbb{P}(X = x) = 0$ for $x \ne 0$, and the claim follows.

(b) This is a simple consequence of Theorem 2.29 with $g(x) = ax + b$.

Here is an example of Theorem 2.29 in action.

Example 2.31

Suppose that $X$ is a random variable with the Poisson distribution, parameter $\lambda$, and we wish to find the expected value of $Y = 2^X$, say. Without Theorem 2.29, we would have to find the mass function of $Y$. Actually this is not difficult, but it is even easier to apply the theorem to find that

$$\mathbb{E}(2^X) = \sum_{k=0}^{\infty} 2^k \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(2\lambda)^k}{k!} = e^{-\lambda} e^{2\lambda} = e^{\lambda}.$$
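A quick numerical check of this calculation in Python (the function $g(x) = 2^x$ is the illustrative choice made above, and the Poisson sum is truncated where its tail is negligible):

    from math import exp, factorial

    lam = 1.7
    # E(2^X) computed via Theorem 2.29
    e_2X = sum(2**k * lam**k * exp(-lam) / factorial(k) for k in range(100))
    print(e_2X, exp(lam))   # the two values agree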

The expectation $\mathbb{E}(X)$ of a discrete random variable $X$ is an indication of the ‘centre’ of the distribution of $X$. Another important quantity associated with $X$ is the ‘variance’ of $X$, and this is a measure of the degree of dispersion of $X$ about its expectation $\mathbb{E}(X)$.

Definition 2.32

The variance $\operatorname{var}(X)$ of a discrete random variable $X$ is defined by

$$\operatorname{var}(X) = \mathbb{E}\big((X - \mathbb{E}(X))^2\big).$$

(2.33)

We note that, by Theorem 2.29,

$$\operatorname{var}(X) = \sum_{x \in \operatorname{Im} X} (x - \mu)^2 \, \mathbb{P}(X = x),$$

(2.34)

where $\mu = \mathbb{E}(X)$. A rough motivation for this definition is as follows. If the dispersion of $X$ about its expectation is very small, then $|X - \mu|$ tends to be small, giving that $\operatorname{var}(X)$ is small also; on the other hand, if there is often a considerable difference between $X$ and its mean, then $|X - \mu|$ may be large, giving that $\operatorname{var}(X)$ is large also.

Equation (2.34) is not always the most convenient way to calculate the variance of a discrete random variable. We may expand the term $(x - \mu)^2$ in (2.34) to obtain

$$\operatorname{var}(X) = \sum_{x} (x^2 - 2\mu x + \mu^2) \, \mathbb{P}(X = x) = \sum_{x} x^2 \, \mathbb{P}(X = x) - 2\mu \sum_{x} x \, \mathbb{P}(X = x) + \mu^2 = \mathbb{E}(X^2) - \mu^2,$$

where $\mu = \mathbb{E}(X)$ as before. Thus we obtain the useful formula

$$\operatorname{var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2.$$

(2.35)
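The equality of (2.34) and (2.35) is easily checked on any particular mass function. A Python sketch, reusing the dictionary representation above:

    def expectation(pmf):
        return sum(x * px for x, px in pmf.items())

    def variance(pmf):
        mu = expectation(pmf)
        via_2_34 = sum((x - mu)**2 * px for x, px in pmf.items())    # (2.34)
        via_2_35 = sum(x * x * px for x, px in pmf.items()) - mu**2  # (2.35)
        return via_2_34, via_2_35

    die = {x: 1 / 6 for x in range(1, 7)}
    print(variance(die))   # both entries equal 35/12 = 2.9166...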

Example 2.36

If $X$ has the geometric distribution with parameter $p$ ($0 < p \le 1$), the mean of $X$ is

$$\mathbb{E}(X) = \sum_{k=1}^{\infty} k p q^{k-1} = \frac{p}{(1 - q)^2} = \frac{1}{p},$$

and the variance of $X$ is

$$\operatorname{var}(X) = \sum_{k=1}^{\infty} k^2 p q^{k-1} - \frac{1}{p^2}$$

by (2.35).⁴ Now,

$$\sum_{k=1}^{\infty} k^2 q^{k-1} = \frac{2}{(1 - q)^3} - \frac{1}{(1 - q)^2} = \frac{2}{p^3} - \frac{1}{p^2}$$

by Footnote 4, giving that

$$\operatorname{var}(X) = p \left(\frac{2}{p^3} - \frac{1}{p^2}\right) - \frac{1}{p^2} = \frac{1 - p}{p^2} = \frac{q}{p^2}.$$
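These two formulae are easy to corroborate by simulation, reusing the toss-until-first-head experiment of Example 2.21 (parameters arbitrary):

    import random

    p, trials = 0.25, 200_000
    q = 1 - p
    samples = []
    for _ in range(trials):
        k = 1
        while random.random() >= p:   # a tail: toss again
            k += 1
        samples.append(k)

    mean = sum(samples) / trials
    var = sum((s - mean)**2 for s in samples) / trials
    print(mean, 1 / p)     # both near 1/p = 4
    print(var, q / p**2)   # both near q/p^2 = 12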

  

Exercise 2.37

If $X$ has the binomial distribution with parameters $n$ and $p$, show that

$$\mathbb{E}(X) = np, \qquad \mathbb{E}(X(X - 1)) = n(n - 1)p^2,$$

and deduce the variance of X.

Exercise 2.38

Show that $\operatorname{var}(aX + b) = a^2 \operatorname{var}(X)$ for $a, b \in \mathbb{R}$.

Exercise 2.39

Find $\mathbb{E}(X)$ and $\mathbb{E}(X(X - 1))$ when $X$ has the Poisson distribution with parameter $\lambda$, and hence show that the Poisson distribution has variance equal to its mean.

  

2.5 Conditional expectation and the partition theorem

Suppose that $X$ is a discrete random variable on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and that $B$ is an event with $\mathbb{P}(B) > 0$. If we are given that $B$ occurs, then this information affects the probability distribution of $X$. That is, probabilities such as $\mathbb{P}(X = x)$ are replaced by conditional probabilities such as $\mathbb{P}(X = x \mid B)$.

Definition 2.40

If $X$ is a discrete random variable and $\mathbb{P}(B) > 0$, the conditional expectation of $X$ given $B$ is denoted by $\mathbb{E}(X \mid B)$ and defined by

$$\mathbb{E}(X \mid B) = \sum_{x \in \operatorname{Im} X} x \, \mathbb{P}(X = x \mid B)$$

(2.41)

whenever this sum converges absolutely.

Just as the partition theorem, Theorem 1.48, expressed probabilities in terms of conditional probabilities, so expectations may be expressed in terms of conditional expectations.

Theorem 2.42 (Partition theorem)

If $X$ is a discrete random variable and $\{B_1, B_2, \ldots\}$ is a partition of the sample space such that $\mathbb{P}(B_i) > 0$ for each $i$, then

$$\mathbb{E}(X) = \sum_{i} \mathbb{E}(X \mid B_i) \, \mathbb{P}(B_i)$$

(2.43)

whenever this sum converges absolutely.

Proof

The right-hand side of (2.43) equals, by (2.41),

$$\sum_{i} \bigg( \sum_{x} x \, \mathbb{P}(X = x \mid B_i) \bigg) \mathbb{P}(B_i) = \sum_{x} x \sum_{i} \mathbb{P}(\{X = x\} \cap B_i) = \sum_{x} x \, \mathbb{P}(X = x) = \mathbb{E}(X).$$

We close this chapter with an example of this partition theorem in use.

Example 2.44

A coin is tossed repeatedly, and heads appears at each toss with probability $p$, where $0 < p < 1$. Find the expected length of the initial run (this is a run of heads if the first toss gives heads, and of tails otherwise).

Solution

Let $H$ be the event that the first toss gives heads, and $H^c$ the event that the first toss gives tails. The pair $H$, $H^c$ forms a partition of the sample space. Let $X$ be the length of the initial run. It is easy to see that

$$\mathbb{P}(X = k \mid H) = p^{k-1} q \quad \text{for } k = 1, 2, \ldots,$$

since if $H$ occurs, then $X = k$ if and only if the first toss is followed by exactly $k - 1$ heads and then a tail. Similarly,

$$\mathbb{P}(X = k \mid H^c) = q^{k-1} p \quad \text{for } k = 1, 2, \ldots.$$

Therefore,

$$\mathbb{E}(X \mid H) = \sum_{k=1}^{\infty} k p^{k-1} q = \frac{q}{(1 - p)^2} = \frac{1}{q},$$

and similarly,

$$\mathbb{E}(X \mid H^c) = \frac{1}{p}.$$

By the partition theorem, Theorem 2.42,

$$\mathbb{E}(X) = \mathbb{E}(X \mid H) \, \mathbb{P}(H) + \mathbb{E}(X \mid H^c) \, \mathbb{P}(H^c) = \frac{p}{q} + \frac{q}{p}.$$
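The answer $p/q + q/p$ is easily corroborated by simulating the initial run directly (parameters arbitrary):

    import random

    def initial_run_length(p):
        first = random.random() < p                # True means heads
        k = 1
        while (random.random() < p) == first:      # same face as first toss
            k += 1
        return k

    p, trials = 0.3, 200_000
    q = 1 - p
    mean = sum(initial_run_length(p) for _ in range(trials)) / trials
    print(mean, p / q + q / p)   # both near 2.76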

  

Exercise 2.45

Let $X$ be a discrete random variable and let $g$ be a function from $\mathbb{R}$ to $\mathbb{R}$. If $x$ is a real number such that $\mathbb{P}(X = x) > 0$, show formally that

$$\mathbb{E}(g(X) \mid X = x) = g(x),$$

and deduce from the partition theorem, Theorem 2.42, that

$$\mathbb{E}(g(X)) = \sum_{x} g(x) \, \mathbb{P}(X = x).$$

Exercise 2.46

Let $N$ be the number of tosses of a fair coin up to and including the appearance of the first head. By conditioning on the result of the first toss, show that $\mathbb{E}(N) = 2$.

  

2.6 Problems

1.  If $X$ has the Poisson distribution with parameter $\lambda$, show that

$$\mathbb{E}\big(X(X - 1) \cdots (X - k + 1)\big) = \lambda^k$$

for $k = 1, 2, \ldots$.

2.  Each toss of a coin results in heads with probability $p$ ($p > 0$). If $m_r$ is the mean number of tosses up to and including the $r$th head, show that

$$m_r = p(1 + m_{r-1}) + q(1 + m_r)$$

for $r = 1, 2, \ldots$, with the convention that $m_0 = 0$. Solve this difference equation by the method described in Appendix B.

3.  If $X$ is a discrete random variable and $\mathbb{E}(X^2) = 0$, show that $\mathbb{P}(X = 0) = 1$. Deduce that, if $\operatorname{var}(X) = 0$, then $\mathbb{P}(X = \mu) = 1$, whenever $\mu = \mathbb{E}(X)$ is finite.

4.  For what values of $c$ and $\alpha$ is the function $p$, defined by

$$p(k) = \begin{cases} c k^{-\alpha} & \text{if } k = 1, 2, \ldots, \\ 0 & \text{otherwise}, \end{cases}$$

a mass function?

5.  Lack-of-memory property. If $X$ has the geometric distribution with parameter $p$, show that

$$\mathbb{P}(X > m + n \mid X > m) = \mathbb{P}(X > n)$$

for $m, n = 0, 1, 2, \ldots$.

We say that $X$ has the ‘lack-of-memory property’ since, if we are given that $X > m$, then the distribution of $X - m$ is the same as the original distribution of $X$. Show that the geometric distribution is the only distribution concentrated on the positive integers with the lack-of-memory property.

6.  The random variable $N$ takes non-negative integer values. Show that

$$\mathbb{E}(N) = \sum_{k=0}^{\infty} \mathbb{P}(N > k),$$

provided that the series on the right-hand side converges.

A fair die having two faces coloured blue, two red and two green, is thrown repeatedly. Find the probability that not all colours occur in the first k throws.

Deduce that, if $N$ is the random variable which takes the value $n$ if all three colours occur in the first $n$ throws but only two of the colours in the first $n - 1$ throws, then the expected value of $N$ is $\frac{11}{2}$. (Oxford 1979M)

7.  Coupon-collecting problem. There are c different types of coupon, and each coupon obtained is equally likely to be any one of the c types. Find the probability that the first n coupons which you collect do not form a complete set, and deduce an expression for the mean number of coupons you will need to collect before you have a complete set.

*8. An ambidextrous student has a left and a right pocket, each initially containing n humbugs. Each time he feels hungry, he puts a hand into one of his pockets and, if it is not empty, he takes a humbug from it and eats it. On each occasion, he is equally likely to choose either the left or right pocket. When he first puts his hand into an empty pocket, the other pocket contains H humbugs.

Show that if $p_h$ is the probability that $H = h$, then

$$p_h = \binom{2n - h}{n} \left(\frac{1}{2}\right)^{2n - h},$$

and find the expected value of $H$, by considering

$$\sum_{h=0}^{n} (2n - h) p_h,$$

or otherwise. (Oxford 1982M)

9.  The probability of obtaining a head when a certain coin is tossed is p. The coin is tossed repeatedly until n heads occur in a row. Let X be the total number of tosses required for this to happen. Find the expected value of X.

10.  A population of $N$ animals has had a certain number $a$ of its members captured, marked, and then released. Show that the probability $P_n$ that it is necessary to capture $n$ animals in order to obtain $m$ which have been marked is

$$P_n = \frac{a}{N} \binom{a - 1}{m - 1} \binom{N - a}{n - m} \bigg/ \binom{N - 1}{n - 1},$$

where $m \le n \le N - a + m$. Hence, show that

$$\sum_{n=m}^{N - a + m} \binom{a - 1}{m - 1} \binom{N - a}{n - m} \bigg/ \binom{N - 1}{n - 1} = \frac{N}{a},$$

and that the expectation of $n$ is $m(N + 1)/(a + 1)$. (Oxford 1972M)

¹ If $A \subseteq \Omega$ and $X : \Omega \to \mathbb{R}$, the image of $A$ is the set $X(A) = \{X(\omega) : \omega \in A\}$ of values taken by $X$ on $A$.

² A slightly different but morally equivalent definition of a discrete random variable is a function $X : \Omega \to \mathbb{R}$ satisfying (2.2) for which there exists a countable subset $S \subseteq \mathbb{R}$ with $\mathbb{P}(X \in S) = 1$.

³ One should be careful to avoid ambiguity in the use (or not) of parentheses. For example, we shall sometimes write $\mathbb{E}X$ for $\mathbb{E}(X)$, and $\mathbb{E}X^2$ for $\mathbb{E}(X^2)$; note that $\mathbb{E}X^2$ denotes $\mathbb{E}(X^2)$ and not $(\mathbb{E}X)^2$.

⁴ To sum a series such as $\sum_k k^2 q^{k-1}$, just note that, if $|q| < 1$, then $\sum_{k=0}^{\infty} q^k = (1 - q)^{-1}$, and hence, by differentiation, $\sum_{k=1}^{\infty} k q^{k-1} = (1 - q)^{-2}$. The relevant property of power series is that they may be differentiated term by term within their circle of convergence. Repeated differentiation of $(1 - q)^{-1}$ yields formulae for $\sum_k k(k - 1) q^{k-2}$ and similar expressions.