6

Binomial Probability Distributions

6.1 Binomial Random Variables

A researcher takes a simple random sample of 20 elementary school students. How many students in the sample will have asthma? A new treatment for breast cancer is used in 150 cases. How many of these individuals will survive five or more years? In a random sample of 100 leukemia cases, how many will have a history of exposure to high-tension electric transmission lines? The random number of “successes” in each of these scenarios is described by binomial probability mass functions (pmfs).

Binomial pmfs are a family of probability models that apply to random variables in which:

•   There are n independent observations.

•   Each observation can be characterized as a “success” or “failure.”

•   The probability of “success” for each observation is a constant p.

Single random occurrences that can be characterized as either a “success” or “failure” are called Bernoulli trials. The total number of successes in a series of n independent Bernoulli trials when each trial has probability of success p is a binomial random variable. Thus, binomial random variables are characterized by two parameters:

1.   n (the number of independent Bernoulli trials)

2.   p (the probability of success for each trial)
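The definition above can be sketched as a small simulation: each Bernoulli trial is a single success-or-failure draw, and a binomial random variable is the count of successes across n such draws. This is only an illustrative sketch; the function names, the seed, and the values n = 4 and p = 0.75 are assumptions chosen for the demonstration.

```python
import random


def bernoulli_trial(p, rng):
    """Return 1 ("success") with probability p, otherwise 0 ("failure")."""
    return 1 if rng.random() < p else 0


def binomial_draw(n, p, rng):
    """Count the successes in n independent Bernoulli trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))


# Assumed illustration values: n = 4 trials, p = 0.75, repeated 10,000 times.
rng = random.Random(1)  # fixed seed so the run is repeatable
draws = [binomial_draw(4, 0.75, rng) for _ in range(10_000)]

# Relative frequency of each possible number of successes (0 through 4).
for x in range(5):
    print(x, draws.count(x) / len(draws))
```

With many repetitions, the relative frequencies should settle near the binomial probabilities for the same n and p.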

ILLUSTRATIVE EXAMPLE

Four patients binomial example. Four patients are treated with an intervention that is successful 75% of the time. The random number of successes is a binomial random variable with parameters n = 4 and p = 0.75.

ILLUSTRATIVE EXAMPLE

Asthma survey binomial example. Four percent of the children in a large school population have asthma. The number of asthmatics in a simple random sample of n = 20 from this population is a binomial random variable with n = 20 and p = 0.04.

Exercises

6.1      Tay-Sachs. Tay-Sachs is a metabolic disorder that is inherited as an autosomal recessive trait. Both recessive alleles are necessary for expression of the disease. Therefore, when each parent is a carrier, there is a one in four chance of transmitting the genetic disorder to each offspring.

(a)   Let X represent the number of offspring affected in three consecutive conceptions from Tay-Sachs carrier parents. Is X a binomial random variable? Explain your response.

(b)   A carrier couple is unaware of their carrier state. Let X represent the number of children conceived before an affected offspring is encountered. Is X a binomial random variable? Explain.

6.2      Breast cancer. Say the lifetime probability of developing female breast cancer in a population is 1 in 10. Let X represent the number of women among 5102 women, selected randomly from this population, who ultimately develop breast cancer. Explain why X is a binomial random variable.

6.2 Calculating Binomial Probabilities

We will use the notation X ~ b(n, p) to refer to a binomial random variable with parameters n and p. The tilde (~) is read “distributed as,” so “X ~ b(n, p)” is read “X is distributed as a binomial random variable with parameters n and p.” For example, X ~ b(4, 0.75) is read “X is a binomial random variable with n = 4 and p = 0.75” (like the four patients binomial example).

Binomial probabilities are calculated with the formula:

$$\Pr(X = x) = {}_nC_x \cdot p^x \cdot q^{n-x}$$

where nCx is the binomial coefficient (described below), n is the number of independent Bernoulli trials, p is the probability of success for each trial, and q = 1 − p.

The binomial coefficient nCx (loosely called the “choose function”) determines the number of different ways to choose x items out of n (“n choose x”). Its formula is:

$${}_nC_x = \frac{n!}{x!\,(n-x)!}$$

where the factorial function x! = x · (x − 1) · (x − 2) · … · 1. For example, 4! = 4 · 3 · 2 · 1 = 24. By definition 0! = 1.

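ILLUSTRATIVE EXAMPLES

Binomial coefficients. Three examples of nCx are presented.

•   How many ways can you choose two items out of three? Solution:

${}_3C_2 = \frac{3!}{2!\,(3-2)!} = \frac{3 \cdot 2 \cdot 1}{(2 \cdot 1)(1)} = 3$. This means that you can choose two items out of three in three different ways. Label the items A, B, and C: you can choose {A, B}, {A, C}, or {B, C}.

•   How many ways can we choose two items out of four? Solution:

${}_4C_2 = \frac{4!}{2!\,(4-2)!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)} = 6$

•   How many ways can we choose five items out of seven? Solution:

${}_7C_5 = \frac{7!}{5!\,(7-5)!} = \frac{7 \cdot 6}{2 \cdot 1} = 21$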
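The three binomial coefficients in this example can be checked with a short script. The sketch below applies the factorial formula directly and compares it with Python's built-in `math.comb` (available in Python 3.8 and later); the helper name `choose` is an assumption made for illustration.

```python
import math


def choose(n, x):
    """Binomial coefficient from the factorial formula n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))


for n, x in [(3, 2), (4, 2), (7, 5)]:
    # math.comb computes the same quantity without explicit factorials.
    print(f"{n}C{x} =", choose(n, x), "| math.comb:", math.comb(n, x))
```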

We are now ready to calculate binomial probabilities.

ILLUSTRATIVE EXAMPLE

Binomial probabilities (Four patients). We have established a scenario in which four patients are treated with an intervention that is successful 75% of the time. The number of patients who respond to treatment is a binomial random variable with parameters n = 4 and p = 0.75. It follows that the probability of a failure is q = 1 − p = 1 − 0.75 = 0.25.

What is the probability of observing no successes among the four treatments?

Solution: $\Pr(X = 0) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_4C_0 \cdot 0.75^0 \cdot 0.25^{4-0} = 1 \cdot 1 \cdot 0.0039 = 0.0039$

What is the probability of one success?

Solution: $\Pr(X = 1) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_4C_1 \cdot 0.75^1 \cdot 0.25^{4-1} = 4 \cdot 0.75 \cdot 0.015625 = 0.0469$

The probability of two successes is

$\Pr(X = 2) = {}_4C_2 \cdot 0.75^2 \cdot 0.25^{4-2} = 6 \cdot 0.5625 \cdot 0.0625 = 0.2109$

The probability of three successes is

$\Pr(X = 3) = {}_4C_3 \cdot 0.75^3 \cdot 0.25^{4-3} = 4 \cdot 0.4219 \cdot 0.25 = 0.4219$

The probability of four successes is

$\Pr(X = 4) = {}_4C_4 \cdot 0.75^4 \cdot 0.25^{4-4} = 1 \cdot 0.3164 \cdot 1 = 0.3164$

Table 6.1 lists this pmf in tabular form.

TABLE 6.1 Probability mass function for X ~ b(4, 0.75) in tabular form.

Number of successes x    Pr(X = x)
0                        0.0039
1                        0.0469
2                        0.2109
3                        0.4219
4                        0.3164
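The pmf for the four patients example can also be generated programmatically. The following is a minimal sketch of the binomial formula; the function name `binomial_pmf` is an assumption, not something defined in this text.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p): nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)


n, p = 4, 0.75  # four patients example
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"Pr(X = {x}) = {prob:.4f}")

# The probabilities of all possible outcomes should sum to 1.
print("Sum of probabilities:", sum(pmf))
```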

ILLUSTRATIVE EXAMPLE

Binomial probabilities (Asthma survey). We take a simple random sample of n = 20 from a population in which the prevalence of asthma is 0.04. The number of asthmatics in a given random sample is denoted X ~ b(20, 0.04). Since p = 0.04, q = 1 − 0.04 = 0.96.

What is the probability of observing no asthmatics in a sample?

Solution: $\Pr(X = 0) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_{20}C_0 \cdot 0.04^0 \cdot 0.96^{20-0} = 1 \cdot 1 \cdot 0.4420 = 0.4420$

What is the probability of observing one asthmatic?

Solution: $\Pr(X = 1) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_{20}C_1 \cdot 0.04^1 \cdot 0.96^{20-1} = 20 \cdot 0.04 \cdot 0.4604 = 0.3683$

What is the probability of observing two asthmatics?

Solution: $\Pr(X = 2) = {}_{20}C_2 \cdot 0.04^2 \cdot 0.96^{20-2} = 190 \cdot 0.0016 \cdot 0.4796 = 0.1458$

To complete the pmf, we would need to calculate Pr(X = 3), Pr(X = 4), …, Pr(X = 20). We will not demonstrate all these calculations, but do show the results as Figure 6.1.

FIGURE 6.1 Probability mass function for binomial random variable with p = 0.04 and n = 20.
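The remaining probabilities for the asthma survey need not be worked by hand. As a rough sketch (the helper name and the text-bar display are assumptions made for illustration), the full pmf for n = 20 and p = 0.04 can be computed in a loop:

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.04  # asthma survey example
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Show the first few probabilities with a crude text bar (one '#' per 0.01).
for x in range(7):
    bar = "#" * round(pmf[x] * 100)
    print(f"{x:2d}  {pmf[x]:.4f}  {bar}")

# Probability left over for X = 7 through X = 20 combined.
print("Pr(X >= 7) =", sum(pmf[7:]))
```

Nearly all of the probability sits on the first few values, which is why the distribution is strongly skewed to the right.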

6.3 Cumulative Probabilities

A cumulative probability is the probability of observing a certain value or less. For example, for a discrete random variable that takes whole-number values starting at 0, the cumulative probability of 2 is Pr(X ≤ 2) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2).

ILLUSTRATIVE EXAMPLE

Cumulative probability (Asthma survey). The number of asthmatics in the asthma survey discussed in the prior illustration varies according to a binomial distribution with n = 20 and p = 0.04. What is the cumulative probability of observing two asthmatics in a sample? We established in the prior illustration box that Pr(X = 0) = 0.4420, Pr(X = 1) = 0.3683, and Pr(X = 2) = 0.1458.

Solution: Pr(X ≤ 2) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2) = 0.4420 + 0.3683 + 0.1458 = 0.9561
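A cumulative probability is simply a running sum of pmf values, which the following sketch makes explicit. The function names are assumptions introduced for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Pr(X <= x): add the pmf values for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Asthma survey: cumulative probability of 2 when n = 20 and p = 0.04.
print(f"Pr(X <= 2) = {binomial_cdf(2, 20, 0.04):.4f}")
```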

ILLUSTRATIVE EXAMPLE

Cumulative probability (Four patients). The number of patients responding to treatment in the four patients illustration is a binomial random variable with n = 4 and p = 0.75. What is the cumulative probability of three successes? Table 6.1 lists the pmf for the random variable.

Solution: Pr(X ≤ 3) = Pr(X = 0) + Pr(X = 1) + Pr(X = 2) + Pr(X = 3) = 0.0039 + 0.0469 + 0.2109 + 0.4219 = 0.6836. Figure 6.2 shows this cumulative probability as the shaded region in the left “tail” of the distribution.

FIGURE 6.2 Cumulative probabilities correspond to areas under the curve to the left of the value. This pmf shows the cumulative probability of 3 for X ~ b(4, 0.75).

The cumulative distribution function (cdf) of a random variable gives the cumulative probabilities for all values of the variable. Table 6.2 lists the cdf for the four patients example in tabular form.

TABLE 6.2 Cumulative probability function for X ~ b(4, 0.75) in tabular form.

Number of successes x    Pr(X ≤ x)
0                        0.0039
1                        0.0508
2                        0.2617
3                        0.6836
4                        1.0000

6.4 Probability Calculators

Because binomial probabilities can be tedious to calculate, some textbooks include tables of binomial probabilities for selected values of n and p. These tables are limited in that they are very bulky and cannot show all possible combinations of n and p. It is for this reason that this text does not include binomial tables; instead, it occasionally uses freeware probability calculators (such as StaTable® [a]) for this purpose. In addition, Microsoft Excel® will calculate binomial probabilities.

For example, Excel’s BINOMDIST function [b] can be used to calculate the probability of three successes for X ~ b(4, 0.75) with this formula: =BINOMDIST(3,4,0.75,0). The cumulative probability of 3 uses the formula =BINOMDIST(3,4,0.75,1).
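For readers without a spreadsheet at hand, the same two calculations can be sketched in Python. The function below is a hypothetical analog written for illustration; it mirrors the argument order used in the spreadsheet formulas above (number of successes, trials, probability, cumulative flag) but is not part of Excel or of any library.

```python
import math


def binomdist(x, n, p, cumulative):
    """Hypothetical Python analog of the spreadsheet call BINOMDIST(x, n, p, cumulative)."""

    def pmf(k):
        return math.comb(n, k) * p**k * (1 - p) ** (n - k)

    if cumulative:
        # Cumulative flag set: Pr(X <= x)
        return sum(pmf(k) for k in range(x + 1))
    # Cumulative flag not set: Pr(X = x)
    return pmf(x)


print(binomdist(3, 4, 0.75, 0))  # probability of exactly three successes
print(binomdist(3, 4, 0.75, 1))  # cumulative probability of three
```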

Exercises

6.3      Tay-Sachs inheritance. Exercise 6.1 introduced facts about Tay-Sachs inheritance. If both parents are carriers, there is a one in four chance the offspring inherits both alleles necessary for expression of the disease. Suppose a carrier couple plans on having three children. Build the probability mass function (pmf) for the number of conceptions that will receive Tay-Sachs genes from both parents.

6.4      Tay-Sachs couples. Approximately 1 in 28 people of Ashkenazi Jewish descent are Tay-Sachs carriers. In randomly sampling one man and one woman from this population, what is the probability neither is a Tay-Sachs carrier? What is the probability both are carriers?

6.5      Telephone survey. A telephone survey uses a random digit dialing machine to call subjects. The random digit dialing machine is expected to reach a live person 15% of the time. In eight attempts, what is the probability of achieving exactly two successful calls?

6.6      Telephone survey, two or fewer. In the telephone survey technique introduced in Exercise 6.5, what is the probability of two or fewer successful calls in eight attempts?

6.5 Expected Value and Variance of a Binomial Random Variable

There are general formulas for calculating the mean (expected value) and variance of a discrete random variable. Those formulas can also be applied to binomial random variables, but shortcut formulas exist. The shortcut formula for the mean of a binomial random variable is:

$$\mu = np$$

The shortcut formula for the variance of a binomial random variable is:

$$\sigma^2 = npq$$

where q = 1 − p.

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{npq}$$

ILLUSTRATIVE EXAMPLE

Expected value and variance of a binomial random variable (Asthma survey). The asthma survey considered in previous illustrative examples addresses a binomial random variable with n = 20 and p = 0.04. It follows that q = 1 − 0.04 = 0.96. The expected value of this random variable is μ = np = 20 · 0.04 = 0.8. It has variance σ² = npq = 20 · 0.04 · 0.96 = 0.768 and standard deviation σ = √0.768 ≈ 0.876.

ILLUSTRATIVE EXAMPLE

Expected value and variance of a binomial random variable (Four patients example). The four patients illustrative example has established a binomial random variable in which n = 4 and p = 0.75. Therefore, this distribution has μ = np = 4 · 0.75 = 3 and σ² = npq = 4 · 0.75 · (1 − 0.75) = 0.75. Its standard deviation is σ = √0.75 ≈ 0.866. Figure 6.3 depicts μ as the balancing point of the pmf.

FIGURE 6.3 The expected value μ of X ~ b(4, 0.75).
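One way to build confidence in the shortcut formulas is to compare them with the general definitions for a discrete random variable, taken here to be the standard probability-weighted sums μ = Σ x · Pr(X = x) and σ² = Σ (x − μ)² · Pr(X = x). The sketch below does this for both running examples; the function names are assumptions.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def moments_from_pmf(n, p):
    """Mean and variance from the general probability-weighted sums."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, variance


for n, p in [(20, 0.04), (4, 0.75)]:
    mean, variance = moments_from_pmf(n, p)
    q = 1 - p
    print(f"b({n}, {p}): general mean = {mean:.4f}, shortcut np = {n * p:.4f}")
    print(f"  general variance = {variance:.4f}, shortcut npq = {n * p * q:.4f}")
    print(f"  standard deviation = {math.sqrt(n * p * q):.4f}")
```

Both routes should agree apart from floating-point rounding; the shortcut simply avoids summing over every possible value of x.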

Exercises

6.7      Tay-Sachs. Exercises 6.1 and 6.3 considered the random number of Tay-Sachs cases in three pregnancies from a carrier couple (X ~ b(3, 0.25)). What are the expected value and variance of this random variable?

6.8      Telephone survey. Exercise 6.5 introduced a random digit dialing machine with p = 0.15 for each call attempt.

(a)   What are the expected value and variance for the number of contacts in eight attempts?

(b)   What are the expected value and variance for 50 attempts?

6.6 Using the Binomial Distribution to Help Make Judgments About the Role of Chance

The binomial distribution can be used to assess whether a given number of successes in a sample would be surprising under specified conditions. As an example, consider that 1 in 10 women will develop breast cancer during their lifetime. In a simple random sample of n = 3, the number of cases X ~ b(3, 0.10). Suppose we take a simple random sample of three women and all three ultimately develop breast cancer. Is this just “bad luck” or should we suspect something is awry? To address this problem, first ask, “What is the probability of seeing three cases under these conditions?” If the sampling process is truly random, the number of cases will follow a binomial distribution with n = 3 and p = 0.10, and the probability of observing three cases is:

$$\Pr(X = 3) = {}_3C_3 \cdot 0.10^3 \cdot 0.90^{3-3} = 1 \cdot 0.001 \cdot 1 = 0.001$$

Therefore, only 1 in 1000 such samples will have three cases. This is unlikely (but could still happen). This is the “chance explanation.” [c] At least two other explanations can be entertained. These are the “assumed value of p is wrong explanation” and the “biased selection explanation.” The “assumed value of p is wrong explanation” suggests that p is greater than anticipated. The initial probability model assumed p was equal to 0.10. If we had made our selection from a subgroup with greater underlying risk (for example, from a family with a higher than typical risk of breast cancer), then X ~ b(3, 0.10) would not hold.

The last explanation is the “biased selection explanation.” The binomial distribution assumes that the observations were derived by a simple random sample. If the sample had instead intentionally over-sampled breast cancer cases, it would not be random and the binomial model would no longer hold.
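To see how strongly the probability of three cases in three women depends on the assumed value of p, the calculation can be repeated for several values. In the sketch below, p = 0.10 is the value used in the text; the larger values are hypothetical risks chosen only for comparison.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# 0.10 is the assumed population risk; 0.20 and 0.40 are hypothetical
# higher-risk subgroups used for illustration only.
for p in [0.10, 0.20, 0.40]:
    print(f"p = {p:.2f}: Pr(X = 3) = {binomial_pmf(3, 3, p):.4f}")
```

Observing three cases becomes far less surprising as p increases, which is why a mistaken value of p competes with chance as an explanation.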

In summary, three alternative explanations are presented for the observed finding.

1.   The chance explanation

2.   The assumed value of p is wrong explanation

3.   The biased selection explanation

Given the limited available information, all are plausible explanations for the observation. Let’s apply this line of reasoning to another example.

ILLUSTRATIVE EXAMPLE

Using the binomial model to question the role of chance (Asthma survey). Start with the assumption that the prevalence of asthma in a school population is 0.04. Select at random 20 students from the school. Therefore, the random number of asthmatics in a given sample X ~ b(20, 0.04).

Suppose we find two asthmatics in a sample. The expected number of asthmatics in the sample μ = np = (20)(0.04) = 0.8, so we have seen more than expected. We are aware that some samples are going to randomly capture more cases than others. What is the probability of seeing two or more cases under these conditions?

To assess the role of chance, we determine the probability of seeing at least two cases. This is Pr(X ≥ 2) = Pr(X = 2) + Pr(X = 3) + ··· + Pr(X = 20). Because this is a lengthy series of calculations, we make use of the fact that Pr(X ≥ 2) = 1 − Pr(X ≤ 1) and calculate:

$\Pr(X = 0) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_{20}C_0 \cdot 0.04^0 \cdot 0.96^{20-0} = 1 \cdot 1 \cdot 0.4420 = 0.4420$

$\Pr(X = 1) = {}_nC_x \cdot p^x \cdot q^{n-x} = {}_{20}C_1 \cdot 0.04^1 \cdot 0.96^{20-1} = 20 \cdot 0.04 \cdot 0.4604 = 0.3683$

Therefore, Pr(X ≤ 1) = Pr(X = 0) + Pr(X = 1) = 0.4420 + 0.3683 = 0.8103 and Pr(X ≥ 2) = 1 − Pr(X ≤ 1) = 1 − 0.8103 = 0.1897 (nearly 19%). Figure 6.4 displays this as the shaded region in the right tail of the distribution.

The chance of seeing two or more cases in a sample is fairly high (nearly 1 in 5). Therefore, chance is a plausible explanation of the finding.

FIGURE 6.4 Probability of 2 or more on X ~ b(20, 0.04).
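The complement shortcut used in this example can be sketched in code and checked against the long-form sum. The function names are assumptions introduced for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ b(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(x, n, p):
    """Pr(X >= x), computed as 1 - Pr(X <= x - 1)."""
    return 1 - sum(binomial_pmf(k, n, p) for k in range(x))


n, p = 20, 0.04  # asthma survey example
shortcut = upper_tail(2, n, p)
long_form = sum(binomial_pmf(k, n, p) for k in range(2, n + 1))

print(f"1 - Pr(X <= 1)          = {shortcut:.4f}")
print(f"Pr(X = 2) + ... + Pr(X = 20) = {long_form:.4f}")
```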

Summary Points (Binomial Probability Distributions)

1.   Binomial distributions are a family of probability mass functions (pmfs) that describe the random number of “successes” among n independent trials, where the probability of “success” in each trial is consistently p.

2.   Binomial random variables have two parameters: n (number of observations) and p (probability of success for each observation).

3.   The notation X ~ b(n, p) is read as “X is distributed as a binomial random variable with parameters n and p.”

4.   The probability of observing x successes for a binomial random variable X is given by $\Pr(X = x) = {}_nC_x \cdot p^x \cdot q^{n-x}$, where ${}_nC_x = \frac{n!}{x!\,(n-x)!}$ and q = 1 − p.

5.   The cumulative probability of an event is the probability of observing a given value x or less, that is, Pr(X ≤ x).

Vocabulary

Bernoulli trial

Binomial

Binomial coefficient (nCx)

Cumulative distribution function (cdf)

Factorial function (x!)

Parameters

Probability mass function (pmf)

X ~ b(n, p)

Review Questions

6.1      Binomial distributions have two parameters. Name them.

6.2      What does the symbol X represent in the statement X ~ b(n, p)?

6.3      What does the symbol ~ represent in the statement X ~ b(n, p)?

6.4      What does the symbol n represent in the statement X ~ b(n, p)?

6.5      What does the symbol p represent in the statement X ~ b(n, p)?

6.6      What is a Bernoulli trial?

6.7      What does “4C2 = 6” mean in plain language?

6.8      What does X represent in the statement Pr(X = x)?

6.9      What does x represent in the statement Pr(X = x)?

6.10    How do you read the statement “Pr(X = x)”?

6.11    How do you read the statement “X ~ b(n, p)”?

6.12    By definition, 0! = ____

6.13    Determine the value of 7!/6! without using a calculator.

6.14    What does q represent in the context of binomial distributions?

6.15    Fill in the blank: Pr(X ≤ x) represents the _____________ probability of x.

6.16    What symbol is used to represent the mean (expected value) of a binomial distribution?

6.17    What symbol is used to represent the variance of a binomial distribution?

6.18    Fill in the blank: The expected number of successes μ for a binomial random variable X ~ b(n, p) is equal to n × ____.

Exercises

6.9      Prevalence 76.8%. The prevalence of a trait is 76.8%.

(a)   In a simple random sample of n = 5, how many individuals are expected to exhibit this characteristic?

(b)   How many would you expect to see with this characteristic in a simple random sample of n = 10?

(c)   What is the probability of seeing nine or more individuals with this characteristic in an SRS of n = 10?

6.10    Smoking on campus. Suppose 20% of the students on campus smoke. You select two students at random. In what percentage of samples will both students be smokers?

6.11    Prevalence 10%. The prevalence of a condition in a population is 10%. You take a simple random sample of 15 people from this population. Let X represent the number of individuals in the sample with the condition in question.

(a)   Describe the distribution X.

(b)   What is the probability of seeing no cases in a sample?

(c)   What is the probability of seeing one case?

(d)   What is the probability of seeing one or fewer cases?

(e)   What is the probability of at least two cases?

6.12    Herpes simplex-2. Suppose 7.5% of a population is infected with Herpes simplex-2 virus (HSV2). You select seven individuals at random from the population. What is the probability of finding at least one HSV2-positive individual in your sample? [Hint: First find Pr(X = 0). Then make use of the fact that Pr(X ≥ 1) = 1 − Pr(X = 0).]

6.13    Linda’s omelets. Linda hears a story on National Public Radio stating that one in six eggs in the United States is contaminated with Salmonella. If Salmonella contamination occurs independently within and between egg cartons and Linda makes a three-egg omelet, what is the probability that her omelet will contain at least one Salmonella-contaminated egg?

6.14    Electromagnetic fields. Twenty-five percent of the children in a community are exposed to high levels of electromagnetic field (EMF) radiation. You select at random a control series of 20 children from this neighborhood. Construct the pmf that describes the number of children in the sample that are exposed to high EMF levels.

6.15    Decayed teeth. A child has 20 deciduous teeth. Two of her teeth are decayed. Given that this is all that you currently know about the child’s dentition, how many different possible combinations of decayed teeth might she have?

6.16    False positives. A rapid screening test for HIV has a false positive rate of 0.5%. This means that the probability of a false positive test is 0.005. If the test is used in 500 HIV-free individuals, what is the expected number of false positives in the sample? In addition, what is the probability of encountering no false positives in the sample?

6.17    Human papillomavirus. In a particular population, 20% of the individuals are human papillomavirus carriers. Select four individuals at random from this population.

(a)   Build the pmf for the number of HPV+ individuals in the sample.

(b)   What is the probability of finding at least one carrier in the sample?

6.18    Random 7s. The digits (0 through 9) in Appendix Table A occur randomly throughout the table. Thus, a value of 7 has a 0.1 chance of occurring in any single slot anywhere in the table.

(a)   Each line in Appendix Table A has 50 slots. On average, how many 7s do we expect to find in a given line?

(b)   What is the probability of finding exactly five 7s in a given line?

______________

[a] Cytel Inc., 675 Massachusetts Avenue, Cambridge, MA 02139-3309. Available: www.cytel.com

[b] See Microsoft Excel, www.microsoft.com.

[c] We can also call this the “bad luck explanation” because it was just a matter of bad luck to have selected three cases.