Basic Biostatistics: Statistics for Public Health Practice, Second Edition

7	Normal Probability Distributions

The prior chapter considered the most popular type of probability distribution that applies to discrete random variables—the binomial distributions. This chapter considers the most popular type of probability distribution that applies to continuous random variables—the Normal distributions. The popularity of the Normal distribution can be explained by four facts: (a) some natural phenomena follow Normal distributions; (b) some natural phenomena are approximately Normal; (c) some can be transformed to approximate Normality; and (d) the variability associated with random sampling tends toward Normality even in non-Normal populations (Section 8.2).

The mathematics of the Normal distribution was developed in the 18th century by Abraham de Moivre and Pierre-Simon Laplace. Johann Carl Friedrich Gauss popularized the use of the distribution in the early 19th century, which explains why it is sometimes called the Gaussian distribution.

7.1 Normal Distributions

A Heuristic Example

While discrete random variables were described with chunky probability mass functions, continuous random variables are described with smooth probability density functions (pdfs). Figure 7.1 depicts a histogram with an overlying Normal probability density function curve. Normal distributions are recognized by their bell shape. Notice that a large percentage of the curve’s area is located near its center and that its tails approach the horizontal axis as asymptotes.

Figure 7.2 displays the same distribution with the histogram bars shaded for subjects that are less than 9 years of age.^a The area of the shaded bars makes up approximately one third of the histogram; about one third of the sample is less than 9. When working with Normal curves, we drop the histogram and look only at the area under the curve (Figure 7.3). As introduced in Section 5.4, the area under the pdf’s curve corresponds to the probability of the specified range.

FIGURE 7.1 Histogram with overlying Normal curve.

Characteristics of Normal Distributions

Normal distributions are defined by this mathematical function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where μ represents the mean of the distribution and σ represents its standard deviation.^b This function applies to a family of distributions with each family member identified by parameters μ and σ. Because there are many different Normal distributions, each with its own μ and σ, let X ~ N(μ, σ) represent a specific member of the Normal distribution family. As before, the symbol “~” is read as “distributed as.” For example, X ~ N(100, 15) is read as “X is a Normal random variable with mean 100 and standard deviation 15.”
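The minimal Python sketch below shows how the formula behaves. It assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides a Normal density. The helper `normal_pdf` is a hypothetical name that codes the formula above term by term. The loop compares it with the library's density for X ~ N(100, 15). Keep in mind that the height of a density is not itself a probability; probabilities come from areas under the curve.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-((x - mu) / sigma)**2 / 2) / (sigma * sqrt(2 * pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Compare the hand-coded formula with the standard library for X ~ N(100, 15).
wais = NormalDist(mu=100, sigma=15)
for x in (70, 85, 100, 115, 130):
    print(x, round(normal_pdf(x, 100, 15), 5), round(wais.pdf(x), 5))
```

The two columns agree. Values equally far from μ (for example 85 and 115) get the same density, which reflects the curve's symmetry.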

FIGURE 7.2 Proportion less than 9 shaded darker color.

FIGURE 7.3 Proportion less than 9 (area under curve).

FIGURE 7.4 Normal distributions with different means.

Changing μ shifts the distribution on its horizontal axis. Figure 7.4 displays Normal curves with different means.

The standard deviation σ determines the spread of the distribution. Figure 7.5 depicts two Normal curves with different standard deviations.

You can get a rough idea of the size of the distribution’s standard deviation σ by identifying its points of inflection. These are the points where the Normal curve reverses the direction of its bend: it curves downward (concave down) near the peak and curves upward (concave up) in the tails. Figure 7.6 shows the location of these inflection points on a Normal curve. Trace the curve with your finger; a point of inflection is where you feel the bend of the curve reverse. Once you’ve identified the inflection points, use these landmarks to identify points that are one standard deviation above and below the mean (μ ± σ) on the curve’s horizontal axis.
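Inflection points can also be located numerically. This sketch assumes Python 3.8+ and uses the gestational-length distribution X ~ N(39, 2) from later in this chapter. It approximates the second derivative of the density with a central finite difference and reports where that second derivative changes sign. The finite-difference approach and the helper name `second_difference` are illustrative choices. The sign changes should fall one standard deviation on either side of the mean: at 39 − 2 = 37 weeks and 39 + 2 = 41 weeks.

```python
from statistics import NormalDist

def second_difference(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

dist = NormalDist(mu=39, sigma=2)  # gestational length, weeks

# Scan 30.00 to 48.00 weeks; True where the curve is concave up.
xs = [30 + 0.01 * i for i in range(1801)]
signs = [second_difference(dist.pdf, x) > 0 for x in xs]

# An inflection point is where the concavity flips.
inflections = [round(xs[i], 1) for i in range(1, len(xs)) if signs[i] != signs[i - 1]]
print(inflections)  # [37.0, 41.0]
```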

images

FIGURE 7.5 Normal distributions with different standard deviations.

FIGURE 7.6 Points of inflection.

The 68–95–99.7 Rule

The 68–95–99.7 rule is used as a guide when working with Normal random variables. This rule says that:

• 68% of the area under the Normal curve lies in the region μ ± σ.

• 95% of the area under the Normal curve lies in the region μ ± 2σ.

• 99.7% of the area under the Normal curve lies in the region μ ± 3σ.

Figure 7.7 shows this visually.
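The percentages in the rule are rounded. The quick check below assumes Python 3.8+ `statistics.NormalDist`. It computes the exact areas by subtracting cumulative probabilities. It also shows that the areas do not depend on μ and σ, using the WAIS distribution N(100, 15) introduced below as the comparison. The function name `coverage` is hypothetical.

```python
from statistics import NormalDist

def coverage(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the N(mu, sigma) curve within mu +/- k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Standard Normal versus N(100, 15): the areas agree
    print(k, round(coverage(k), 4), round(coverage(k, 100, 15), 4))
```

The exact area within μ ± 2σ is about 95.45%. The interval that encloses 95% exactly is μ ± 1.96σ (`coverage(1.96)` is about 0.9500), and this value reappears with Table B.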

Although μ and σ vary from Normal random variable to Normal random variable, you can always depend on the 68–95–99.7 rule to predict the percentage of individuals who fall in these ranges. Here is an example.

FIGURE 7.7 The 68–95–99.7 rule.

ILLUSTRATIVE EXAMPLE

68–95–99.7 rule (WAIS). The Wechsler Adult Intelligence Scale (WAIS) is a commonly used intelligence test that is calibrated to produce a Normal distribution of scores with μ = 100 and σ = 15 for various age groups.^c Based on the 68–95–99.7 rule, we can say that:

• 68% of the scores lie in the range 100 ± 15 = 85 to 115.

• 95% lie in the range 100 ± (2)(15) = 70 to 130.

• 99.7% lie in the range 100 ± (3)(15) = 55 to 145.

Figure 7.8 depicts these landmarks.

Focus on the middle 95% of scores in Figure 7.8. This is defined by the range 70 to 130. Five percent of the scores lie outside this range. Because the distribution is symmetrical, half of the remaining 5% (2.5%) of the scores are below 70 (in the left tail of the distribution) and the other 2.5% are above 130 (in the right tail). Figure 7.9 depicts these tails.
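The symmetry argument can be checked numerically. This sketch assumes Python 3.8+ `statistics.NormalDist`. The area below 70 and the area above 130 come out equal. Each is about 2.28% rather than exactly 2.5%, because the rule's 95% is itself a rounded figure.

```python
from statistics import NormalDist

wais = NormalDist(mu=100, sigma=15)

left_tail = wais.cdf(70)        # Pr(X < 70)
right_tail = 1 - wais.cdf(130)  # Pr(X > 130), by the complement rule
print(round(left_tail, 4), round(right_tail, 4))  # 0.0228 0.0228
```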

FIGURE 7.8 Distribution of Wechsler adult intelligence scale.

FIGURE 7.9 The symmetry of the Normal curve allows us to make statements about probabilities above and below certain cut points.

ILLUSTRATIVE EXAMPLE

Tails of the Normal distribution (Gestational length). Gestation is the period of pregnancy from conception to delivery.^d Uncomplicated human gestations (without intervention) vary according to a Normal distribution with μ = 39 weeks and σ = 2 weeks.^e According to the “95 part” of the 68-95-99.7 rule, we can say that 95% of uncomplicated human gestations will fall in the range μ ± 2σ = 39 ± 2 · 2 = 39 ± 4 = 35 to 43 weeks. This leaves 5% of the values outside of this range. Since the distribution is symmetrical, we can say that 2.5% of gestations will be fewer than 35 weeks (in the left tail of the distribution) and 2.5% will be more than 43 weeks (in the right tail).

Reexpression

Many random variables encountered in nature are not Normal.^f We can, however, often make random variables more Normal by reexpressing them with a mathematical transformation. Many types of mathematical transforms are available for this purpose (e.g., logs, exponents, powers, roots, and quadratics). Logarithmic transformations are particularly useful for bringing in the right tails of distributions with positive skews. The use of logarithmic transformations is common, so let us take this opportunity for a brief review of the subject.

There are different kinds of logarithms, depending on their base. The two most popular logarithms are common logarithms (base 10) and natural logarithms (base e; e is approximately equal to 2.71828…).^g

Logarithms are exponents of their base. For example, the common log (base 10) of 100 is 2 because 10² = 100.

By the same token, the natural logarithm (base e) of 100 is approximately 4.60517 because e^4.60517… ≈ 2.71828^4.60517… ≈ 100. This is expressed as:

$$\ln(100) \approx 4.60517$$

No matter the base, log_base(base) = 1 and log_base(1) = 0, since base¹ = base and base⁰ = 1.
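These relationships can be checked with Python's standard `math` module. In this sketch, `math.log` with one argument is the natural log, and its optional second argument sets the base.

```python
import math

common = math.log10(100)        # 2.0, because 10**2 = 100
natural = math.log(100)         # natural log (base e), about 4.60517
round_trip = math.exp(natural)  # exponentiating undoes the log: about 100

print(common, round(natural, 5), round(round_trip, 6))
print(math.log(10, 10), math.log10(1), math.log(1))  # log_base(base) = 1, log_base(1) = 0
```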

Here is an example of how a logarithmic transformation is used in practice:

ILLUSTRATIVE EXAMPLE

Log transformation (Prostate-specific antigen). Prostate-specific antigen (PSA) and its isoforms are used to screen for prostate cancer in men. PSA values do not vary according to a Normal distribution, but their logarithms do, approximately. A reference lab finds that natural-log-transformed PSA values for 50- to 60-year-old disease-free men vary according to a Normal distribution with a mean of −0.3 and standard deviation of 0.80.^h According to the 68–95–99.7 rule, 95% of the ln(PSA) values in this population will be in the interval −0.3 ± (2)(0.80) = −1.9 to 1.3. Figure 7.10 depicts this distribution. Notice that 2.5% of men will have ln(PSA) values less than −1.90 and 2.5% will have values above 1.30. We exponentiate these limits to convert them back to their original scale: e^−1.9 = 0.15 and e^1.3 = 3.67. The upper limit can now serve as a cutoff for PSA screening.ⁱ
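The back-transformation in this example can be sketched in Python. The sketch assumes 3.8+ `statistics.NormalDist`, whose `mean` and `stdev` attributes return μ and σ. The limits are first computed on the ln(PSA) scale with the 68–95–99.7 rule. They are then exponentiated to return to the PSA scale.

```python
import math
from statistics import NormalDist

ln_psa = NormalDist(mu=-0.3, sigma=0.80)  # ln(PSA), disease-free men aged 50 to 60

# 68-95-99.7 rule on the log scale: mu +/- 2 sigma
low_ln = ln_psa.mean - 2 * ln_psa.stdev
high_ln = ln_psa.mean + 2 * ln_psa.stdev

# Exponentiate to return to the original PSA scale
low_psa, high_psa = math.exp(low_ln), math.exp(high_ln)
print(round(low_ln, 2), round(high_ln, 2))    # -1.9 1.3
print(round(low_psa, 2), round(high_psa, 2))  # 0.15 3.67
```

The exponential function preserves order, so the same 2.5% of men lie beyond each back-transformed limit. On the original scale, however, the limits are no longer symmetric around e^−0.3 ≈ 0.74.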

images

FIGURE 7.10 PSA levels in 50- to 60-year-old men.

Section 7.4 includes additional examples of logarithmic reexpressions.

Exercises

7.1 Heights of 10-year-olds. Heights of 10-year-old male children follow a Normal distribution with μ = 138 centimeters and σ = 7 centimeters.^j Create a sketch of a Normal curve depicting this distribution. Mark the horizontal axis with values that are one standard deviation above and below the mean. (Locate points of inflection on the curve as accurately as possible.) Then, use the 68–95–99.7 rule to determine the middle 68% of values. What percent of values fall below this range? What percent fall above this range?

7.2 Heights of 10-year-olds. This exercise continues the work we began in the prior exercise. What range of values will capture the middle 95% of heights? How tall are the tallest 2.5%?

7.3 Visualizing the distribution of gestational length. The gestation length illustrative example presented earlier in this chapter established that uncomplicated pregnancies vary according to a Normal distribution with μ = 39 weeks and σ = 2 weeks. Sketch a Normal curve depicting this distribution. Label its horizontal axis with landmarks that are ±σ and ±2σ on either side of the mean. Remember to use the curve’s inflection points to establish distances for landmarks.

7.2 Determining Normal Probabilities

We often need to determine probabilities for Normal values that do not fall exactly ±1σ, ±2σ, or ±3σ from the mean. To accomplish this, we first standardize the values and then use a Standard Normal table to look up the associated probability. A Standard Normal table lists cumulative probabilities for a Normal random variable with μ = 0 and σ = 1.

Standardizing Values

Standardizing a Normal value transforms it to a Normal scale with mean 0 and standard deviation 1. This special Normal distribution is called a Z-distribution, and the transformed value is called a z-score; Z ~ N(0, 1). The formula is:

$$z = \frac{x - \mu}{\sigma}$$

where x is the value you want to standardize, μ is the mean of the distribution, and σ is its standard deviation. The z-score tells you the distance the value falls from the mean in standard deviation units. Values that are larger than the mean will have positive z-scores. Values that are smaller than the mean will have negative z-scores. For example, a z-score of 1 tells you that the value is one standard deviation above the mean. A z-score of −2 tells you that the value is two standard deviations below the mean.
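Below is a minimal Python sketch of the standardization formula, applied to the two illustrative examples that follow. The function name `z_score` is hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from mu, measured in standard deviation units."""
    return (x - mu) / sigma

print(round(z_score(95, 100, 15), 2))  # WAIS score of 95: -0.33
print(z_score(36, 39, 2))              # 36-week gestation: -1.5
```

A positive result marks a value above the mean. A negative result marks a value below it.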

ILLUSTRATIVE EXAMPLE

Standardization (WAIS). Recall that Wechsler adult intelligence scores vary according to a Normal distribution with μ = 100 and σ = 15. What is the z-score of a value of 95?

Solution: z = (95 − 100)/15 = −0.33. This indicates the score is 0.33 standard deviations below the mean.

ILLUSTRATIVE EXAMPLE

Standardization (Gestational length). We have established that uncomplicated human pregnancies have a gestation period that is approximately Normal with μ = 39 weeks and σ = 2 weeks. What is the z-score for a pregnancy that lasts 36 weeks?

Solution: A pregnancy that is 36 weeks in length corresponds to z = (36 − 39)/2 = −1.50. This gestation is 1.5 standard deviations below the mean.

The Standard Normal Table

After the Normal random variable has been standardized, we can find its cumulative probability with a Standard Normal table (z table). Our Standard Normal (z) table appears as Appendix Table B. It is also available online. Figure 7.11 shows a portion of this table.

The setup of this table may seem strange. The ones and tenths places of the z-score appear in the left column of the table. The hundredths place appears in the top row of the table. Table entries represent cumulative probabilities, or areas under the curve to the left of the z-score. As an example, Figure 7.11 highlights the entry for z = 1.96 (row 1.9, column 0.06). This point has a cumulative probability of 0.9750. The figure below the table depicts this graphically.

FIGURE 7.11 Portion of Table B highlighting z = 1.96.

Table B lists cumulative probabilities for z-scores between −3.49 and 3.49. Values less than or equal to −3.50 have a cumulative probability of about 0.0002 or less, and values greater than or equal to 3.50 have a cumulative probability of about 0.9998 or more.
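Software can stand in for Table B. This sketch assumes Python 3.8+ `statistics.NormalDist`, whose `cdf` method returns the area to the left of a value. It does three things:

* reproduces the highlighted entry for z = 1.96;
* rebuilds the 1.9 row of the table;
* checks the values just beyond the table's edge.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal, mu = 0 and sigma = 1

print(f"{Z.cdf(1.96):.4f}")  # 0.9750, the highlighted entry

# One row of a z table: the row label gives the ones and tenths places,
# and each column adds a hundredths digit (0.00, 0.01, ..., 0.09).
row_1_9 = [round(Z.cdf(1.9 + j / 100), 4) for j in range(10)]
print(row_1_9)

# Just beyond the table's range
print(f"{Z.cdf(-3.5):.4f} {Z.cdf(3.5):.4f}")  # 0.0002 0.9998
```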

FIGURE 7.12 Pr(a ≤ Z ≤ b) by subtraction.

Probabilities for Ranges of Normal Random Variables

You can determine probabilities for ranges of Normal random variables by standardizing the range and using the z table to determine the enclosed area under the curve. Here is a procedure you can use for this purpose:

1. State the problem.

2. Standardize values.

3. Sketch the curve and shade the probability area.

4. Use Table B to determine the probability.

You can determine probabilities between any two points on a Normal distribution as follows. Let a represent the lower boundary of the interval and b represent the upper boundary, both expressed as z-scores. The probability of seeing a value between a and b is:

$$\Pr(a \le Z \le b) = \Pr(Z \le b) - \Pr(Z \le a)$$

Figure 7.12 is a schematic of this approach.

ILLUSTRATIVE EXAMPLE

Probabilities for ranges (Gestational length). A prior illustrative example established that uncomplicated human gestation varies according to a Normal distribution with μ = 39 weeks and σ = 2 weeks. What is the probability that an uncomplicated pregnancy selected at random lasts 41 weeks or fewer? What is the probability it lasts more than 41 weeks?

1. State. Let X represent gestational length: X ~ N(39, 2). We want to determine Pr(X ≤ 41)^k and Pr(X > 41).

2. Standardize. The z-score associated with a 41-week gestation is z = (41 − 39)/2 = 1.00.

3. Sketch. Figure 7.13 shows the sketch of the Normal distribution for this problem. The horizontal axis is scaled with values for both the original variable and Standard z-scores. The region corresponding to Pr(X ≤ 41) is shaded dark blue. The region corresponding to Pr(X > 41) is shaded light blue.

4. Use Table B. Table B tells us that Pr(Z ≤ 1) = 0.8413. Therefore, 84.13% of pregnancies last 41 weeks or fewer. Because the area under the curve sums to 1: (Area under the curve to the right) = 1 − (Area under the curve to the left). Therefore, Pr(X > 41) = 1 − Pr(X ≤ 41) = 1 − 0.8413 = 0.1587 and 15.87% of the pregnancies last more than 41 weeks.
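Step 4 can be reproduced in Python as a sketch, assuming 3.8+ `statistics.NormalDist`. Working on the original scale with N(39, 2) gives the same areas as looking up Pr(Z ≤ 1) in Table B.

```python
from statistics import NormalDist

gestation = NormalDist(mu=39, sigma=2)  # X ~ N(39, 2), weeks

p_le_41 = gestation.cdf(41)  # Pr(X <= 41), same as Pr(Z <= 1)
p_gt_41 = 1 - p_le_41        # complement rule: area to the right
print(f"{p_le_41:.4f} {p_gt_41:.4f}")  # 0.8413 0.1587
```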

images

FIGURE 7.13 Distribution of gestational length. About 84% of pregnancies are less than or equal to 41 weeks. About 16% are at least 41 weeks in length.

Exercises

7.4 Standard Normal probabilities. Use Table B to find the probabilities listed here. In each instance, sketch the Normal curve and shade the area under the curve associated with the probability.

(a) Pr(Z < −0.64)

(b) Pr(Z > −0.64)

(d) Pr(−0.64 < Z < 1.65)

7.5 Heights of 10-year-old boys. We’ve established that heights of 10-year-old boys vary according to a Normal distribution with μ = 138 cm and σ = 7 cm.

(a) What proportion of this population is less than 150 cm tall?

(b) What proportion is less than 140 cm in height?

7.6 Heights of 20-year-olds. Heights of 20-year-old men vary approximately according to a Normal distribution with μ = 176.9 cm and σ = 7.1 cm.

(a) What percentage of U.S. men are at least 6 feet tall? (Six feet ≈ 183 cm.)

(b) Heights of 20-year-old women vary approximately according to a Normal with μ = 163.3 cm and σ = 6.5 cm. What percentage of U.S. women are at least 6 feet tall?

ILLUSTRATIVE EXAMPLE

Normal probability between points (Gestational length). Births that occur before 35 weeks of gestation are considered premature. Those more than 40 weeks are considered “postdate.” What proportion of pregnancies are neither premature nor postdate? Previous illustrative examples established that gestational lengths for uncomplicated pregnancies vary according to a Normal distribution with μ = 39 weeks and σ = 2 weeks.

Solution:

1. State. We propose to find Pr(35 ≤ X ≤ 40).

2. Standardize. The z-score for the lower boundary is z = (35 − 39)/2 = −2.00. The z-score for the upper boundary is z = (40 − 39)/2 = 0.50.

3. Sketch. Figure 7.14 shows the Normal sketch with landmarks indicated.

4. Use Table B. Based on Table B, Pr(Z ≤ −2.00) = 0.0228 and Pr(Z ≤ 0.50) = 0.6915. It follows that Pr(−2 ≤ Z ≤ 0.5) = 0.6915 − 0.0228 = 0.6687. About two thirds of gestations fall in this range.
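A short Python sketch can mirror the steps of this example, assuming 3.8+ `statistics.NormalDist`. It computes the probability twice. The first time it standardizes, as Table B requires. The second time it works directly on the original scale. Both routes give the same answer, which shows that standardizing does not change it.

```python
from statistics import NormalDist

mu, sigma = 39, 2  # gestational length, weeks
a, b = 35, 40      # premature and postdate cutoffs

# Standardize, then subtract Standard Normal cumulative probabilities
Z = NormalDist()
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
via_z = Z.cdf(z_b) - Z.cdf(z_a)

# Same subtraction on the original scale
X = NormalDist(mu, sigma)
via_x = X.cdf(b) - X.cdf(a)

print(z_a, z_b, round(via_z, 4), round(via_x, 4))  # -2.0 0.5 0.6687 0.6687
```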

images

FIGURE 7.14 Gestational length between 35 and 40 weeks.

7.3 Finding Values that Correspond to Normal Probabilities

Table B can also be used to find values that correspond to Normal probabilities. Here is a four-step procedure that can be used for this purpose:

1. State the problem.

2. Use Table B to look up the z-score for the given probability.

3. Sketch the distribution with associated landmarks.

4. Unstandardize the z-score using the formula x = μ + zσ. (We have merely rearranged the standardization formula z = (x − μ)/σ to solve for x.)

Terminology and Notation

Step 2 of this process requires us to find the z-score associated with a stated probability. It is helpful to establish notation and terminology when discussing this part of the procedure. Let z_p denote a z-score with cumulative probability p. This z-score value is greater than p × 100% of the z-scores and is thus called a z-percentile. For example, z_0.90 is the 90th percentile of the Standard Normal distribution. To find this z-score, scan the entries in Table B for the cumulative probability closest to 0.9000. In this instance, the closest cumulative probability is 0.8997. This has an associated z-score of 1.28; therefore, z_0.90 = 1.28.
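The table scan can be imitated in Python, assuming 3.8+ `statistics.NormalDist`. The hypothetical helper `z_percentile_from_table` searches the same grid as Table B (z = −3.49 to 3.49 in steps of 0.01). It returns the z-score whose four-decimal cumulative probability is closest to p. The library's `inv_cdf` method gives the unrounded percentile for comparison.

```python
from statistics import NormalDist

Z = NormalDist()

def z_percentile_from_table(p: float) -> float:
    """Mimic a Table B lookup: the grid z with rounded cdf closest to p."""
    grid = [round(-3.49 + i / 100, 2) for i in range(699)]  # -3.49 ... 3.49
    return min(grid, key=lambda z: abs(round(Z.cdf(z), 4) - p))

print(z_percentile_from_table(0.90))  # 1.28, as found by scanning Table B
print(round(Z.inv_cdf(0.90), 4))      # 1.2816, the unrounded z_0.90
```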

Exercises

7.7 45th percentile on a Standard Normal curve. What is the 45th percentile on a Standard Normal distribution? (Use Table B to look up z_0.45.)

7.8 64th percentile on a Normal z-curve. What is the 64th percentile on a Standard Normal distribution? What notation denotes this value?

7.9 Middle 50% of WAISs. Recall that the Wechsler Adult Intelligence Scale scores are calibrated to vary according to a Normal distribution with μ = 100 and σ = 15. What Wechsler scores cover the middle 50% of the population? In other words, identify the 25th percentile and 75th percentile of the population.

7.10 Top 10 and 1% of the WAIS. How high must a WAIS score be to be in the top 10% of scores? How high must it be to rank in the top 1%?

7.11 Death row inmate. An inmate on death row in the state of Illinois has a WAIS score of 51. What percentage of people have a score below this level?

ILLUSTRATIVE EXAMPLE

Finding values that correspond to Normal probabilities (Heights of women). What height does a woman have to be in order to be in the 90th percentile of heights? In other words, how tall does a woman have to be to be taller than 90% of women?

Solution:

1. State the problem. We have established in a prior exercise that heights of women in the United States vary according to a Normal distribution with μ = 163.3 cm and σ = 6.5 cm (approximately so). Let X represent heights of U.S. women: X ~ N(163.3, 6.5). We want to find the value of x such that Pr(X ≤ x) = 0.9000.

2. Use Table B to scan for a cumulative probability that is closest to 0.9000. As noted, z_0.90 = 1.28.

3. Sketch. Figure 7.15 is a drawing for this problem. The value we are looking for is 1.28 standard deviations above the mean.

4. Unstandardize. x = μ + z_0.90σ = 163.3 + (1.28)(6.5) = 171.62. Therefore, a woman who is 171.62 cm tall (about 5′7½″) is taller than 90% of women in the population.
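The unstandardizing step can be sketched in Python, assuming 3.8+ `statistics.NormalDist`, whose `inv_cdf` method returns the value with a given cumulative probability. The table value z_0.90 = 1.28 gives 171.62 cm. The unrounded inverse gives a value about 0.01 cm higher.

```python
from statistics import NormalDist

mu, sigma = 163.3, 6.5           # heights of U.S. women, cm (approximately Normal)
heights = NormalDist(mu, sigma)

x_table = mu + 1.28 * sigma      # x = mu + z_p * sigma, with z_0.90 from Table B
x_exact = heights.inv_cdf(0.90)  # unrounded 90th percentile

print(round(x_table, 2), round(x_exact, 2))  # 171.62 171.63
print(round(heights.cdf(x_table), 4))        # 0.8997, Table B's entry for z = 1.28
```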

FIGURE 7.15 90th percentile on X ~ N(163.3, 6.5) is 171.62.

7.4 Assessing Departures from Normality

It is important to establish that the random variable being assessed is approximately Normal before applying methods in this chapter. There are several ways to make this assessment.

First, look at the shape of the distribution with a stemplot or histogram. If the distribution is asymmetrical or otherwise clearly departs from the typical bell shape of a Normal curve, avoid applying the methods in this chapter.

We may also examine the distribution with a Normal probability (Q-Q) plot. The idea of a Q-Q plot is to graph observed values against expected Normal z-scores. If the distribution is approximately Normal, points on the Q-Q plot will form a diagonal line. Deviations from the diagonal indicate departures from Normality. Expected z-scores are derived by finding the percentile rank of each value and converting it to a Normal z-percentile. For example, the median of a data set has a percentile rank of 50, which converts to z_0.50 = 0.00. The 25th percentile of a data set (Q1) converts to z_0.25 = −0.67. The 75th percentile (Q3) converts to z_0.75 = 0.67 (and so on). More generally, we need a rule to assign percentile ranks to the values of an empirical distribution. One such rule is to rank the data from 1 (smallest) to n (largest) and, for the value with rank r, use z_p as its expected z-score, where p = (r − 1⁄3)/(n + 1⁄3). For example, if a data set has 10 observations, the lowest-ranking value has p = (1 − 1⁄3)/(10 + 1⁄3) = 0.0645, with an expected z-score of z_0.0645 = −1.52.
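The plotting-position rule can be sketched in Python, assuming 3.8+ `statistics.NormalDist`; the function names are hypothetical.

* `expected_z_scores` applies p = (r − 1/3)/(n + 1/3) to ranks 1 through n.
* `qq_points` pairs each sorted observation with its expected z-score. These pairs are the points a Q-Q plot draws.

```python
from statistics import NormalDist

def expected_z_scores(n: int) -> list:
    """Expected Normal z-scores for ranks r = 1..n, using p = (r - 1/3) / (n + 1/3)."""
    Z = NormalDist()
    return [Z.inv_cdf((r - 1 / 3) / (n + 1 / 3)) for r in range(1, n + 1)]

def qq_points(data) -> list:
    """Pair each sorted observation with its expected z-score: (z, value)."""
    ordered = sorted(data)
    return list(zip(expected_z_scores(len(ordered)), ordered))

zs = expected_z_scores(10)
print([round(z, 2) for z in zs])  # first value: -1.52
```

The expected z-scores are symmetric around 0. With an odd number of observations, the median receives an expected z-score of 0.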

Here are examples of Q-Q plots:

• Figure 7.16 demonstrates an approximately Normal distribution. Points on the Q-Q plot adhere well to the diagonal line.

images

FIGURE 7.16 Histogram and Q-Q plot, approximately Normal data. Graph produced with SPSS for Windows, Rel. 11.0.1.2001. Chicago: SPSS Inc. Reprint Courtesy of International Business Machines Corporation.

• Figure 7.17 depicts a distribution with a pronounced negative skew. The Q-Q plot forms an upward curve.

images

FIGURE 7.17 Negative skew. Graph produced with SPSS for Windows, Rel. 11.0.1.2001. Chicago: SPSS Inc. Reprint Courtesy of International Business Machines Corporation.

• Figure 7.18 depicts a distribution with a positive skew. Positive skews form downward-curving Q-Q plots.

images

FIGURE 7.18 Positive skew, histogram, and Q-Q plot. Graph produced with SPSS for Windows, Rel. 11.0.1.2001. Chicago: SPSS Inc. Reprint Courtesy of International Business Machines Corporation.

• Figure 7.19 shows a leptokurtic distribution (i.e., a distribution with long skinny tails). This forms an S-shaped Q-Q plot.

images

FIGURE 7.19 Leptokurtic = high peak with skinny tails. Graph produced with SPSS for Windows, Rel. 11.0.1.2001. Chicago: SPSS Inc. Reprint Courtesy of International Business Machines Corporation.

A platykurtic distribution (a flat top with short, light tails) is not illustrated, but would form a reverse S on the Q-Q plot.

Figure 7.20 shows the skewed data initially presented in Figure 7.18 reexpressed on a natural logarithmic scale. This distribution is approximately Normal, which now permits the use of Normal probability methods with these data.

images

FIGURE 7.20 Same data as in Figure 7.18 but with data re-expressed on a log scale. Graph produced with SPSS for Windows, Rel. 11.0.1.2001. Chicago: SPSS Inc. Reprint Courtesy of International Business Machines Corporation.
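Simulated data can illustrate the effect of a log reexpression. This sketch does not use the data behind Figures 7.18 and 7.20. Instead, it draws hypothetical right-skewed values from a lognormal distribution with Python's `random.lognormvariate`; their natural logs are therefore Normal by construction. It then compares a moment-based skewness coefficient (the mean cubed z-score) before and after taking logs. A value near 0 indicates a symmetric distribution. The `skewness` helper is an illustrative choice.

```python
import math
import random
from statistics import fmean, pstdev

def skewness(values):
    """Moment coefficient of skewness: mean of cubed z-scores (0 for symmetric data)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

random.seed(7)
# Hypothetical right-skewed data: lognormal values, whose logs are Normal
raw = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

The raw values show a strong positive skew. Their logs are nearly symmetric, which is the pattern Figures 7.18 and 7.20 display.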

Summary Points (Normal Probability Distributions)

1. Normal probability distributions are a family of probability density functions (pdfs) characterized by their symmetry, bell shape, points of inflection at μ − σ and μ + σ, and horizontal asymptotes as they approach the X axis on either side.

2. Normal pdfs have two parameters: mean μ and standard deviation σ. Each member of the Normal family is distinguished by its μ and σ. μ determines the location of a Normal pdf and σ determines its spread.

3. The notation X ~ N(μ, σ) is read “random variable X is distributed as a Normal random variable with mean μ and standard deviation σ.”

4. The area under the curve (AUC) between any two points on a Normal curve corresponds to the probability of observing a value between these two points.

5. The 68–95–99.7 rule:

(a) 68% of the area under a Normal curve lies within μ ± σ.

(b) 95% of the area under a Normal curve lies within μ ± 2σ.

(c) 99.7% of the area under a Normal curve lies within μ ± 3σ.

6. Many phenomena in nature are not Normally distributed but can be re-expressed on a different scale to approximate a Normal distribution.

7. To determine Normal probabilities for a given range of values for X ~ N(μ, σ):

(a) State the problem.

(b) Standardize: z = (x − μ)/σ.

(c) Sketch the curve and shade the probability area.

(d) Use Table B or a software application to determine the AUC.

8. To determine values that correspond to Normal probabilities:

(a) State the problem.

(b) Use Table B or a software application to look up the z_p value associated with the desired probability.

(c) Sketch the distribution with associated landmarks.

(d) Unstandardize the value: x = μ + σz_p.

9. Major departures from Normality in data can often be detected by visually inspecting histograms and/or Q-Q plots.

Vocabulary

Area under the curve

Expected z-scores

Normal probability (Q-Q) plots

Points of inflection

Probability density function (pdf)

Standardize

Standard Normal table (z table)

z-percentile (z_p)

z-score

μ (mean)

σ (standard deviation)

68–95–99.7 rule

Review Questions

7.1 Fill in the blank: Normal distributions are centered on the value of _____________.

7.2 What is an inflection point?

7.3 Fill in the blank: Normal curves have inflection points that are one _____________ above and below μ.

7.4 Fill in the blanks: _____% of area under a Normal curve is within μ ± σ, _____% is within μ ± 2σ, and ____% is within μ ± 3σ.

7.5 The total area under a Normal curve sums to exactly _____________.

7.6 Normal pdfs have two parameters. Name them.

7.7 What parameter controls the location of the Normal curve?

7.8 What parameter controls the spread of the Normal curve?

7.9 Fill in the blank: The area under the curve between any two points on a Normal density curve corresponds to the _____________of values within that range.

7.10 Fill in the blank: The area under the curve to the left of a point on a Normal density corresponds to the _____________probability of that value.

7.11 How many different Normal distributions are there?

7.12 How many different Standard Normal distributions are there?

7.13 What does “X ~ N(μ, σ)” mean?

7.14 What is the mean of the Standard Normal distribution?

7.15 What is the standard deviation of the Standard Normal distribution?

7.16 Fill in the blank: The Standard Normal random variable is often referred to as a ____ variable. (Answer is a letter.)

7.17 Fill in these blanks: Z ~ N(____, ____)

7.18 Use the 68–95–99.7 rule to determine Pr(Z < −2). (Z table not required.)

7.19 Use the 68–95–99.7 rule to determine Pr(Z > −2). (Z table not required.)

7.20 In the notation z_p, what does the subscript p represent?

7.21 z_0.50 = ? (Z table not required; use your knowledge of Standard Normal curves.)

7.22 z_0.025 = ? (Z table not required.)

7.23 z_0.16 = ? (Z table not required.)

7.24 z_0.84 = ? (Z table not required.)

Exercises

7.12 Standard Normal proportions. Use Table B to find the proportion of a Standard Normal distribution that is:

(a) below −1.42

(b) above 1.42

(d) between 1.42 and 1.25

7.13 Alzheimer brains. The weight of brains from Alzheimer cadavers varies according to a Normal distribution with mean 1077 g and standard deviation 106 g.^l The weight of an Alzheimer-free brain averages 1250 g. What proportion of brains with Alzheimer disease will weigh more than 1250 g?

7.14 Coliform levels. Water samples from a particular site demonstrate a mean coliform level of 10 organisms per liter with standard deviation 2. Values vary according to a Normal distribution. What percentage of samples will contain more than 15 organisms?

7.15 Z-percentiles. Find the following z-percentiles:

(a) z_0.10

(b) z_0.35

(d) z_0.85

(e) z_0.999

7.16 Gestation (99th percentile). Recall that gestation in uncomplicated human pregnancies from conception to birth varies according to a Normal distribution (approximately so) with a mean of 39 weeks and standard deviation of 2 weeks (Figure 7.3). What is the 99th percentile on this distribution? That is, what gestational length is greater than or equal to 99% of the other normal gestations?

7.17 Gestation less than 32 weeks. Recall that uncomplicated human gestational length is approximately Normally distributed with μ = 39 weeks and σ = 2 weeks. What percentage of gestations are less than 32 weeks long?

7.18 Coliform levels (90th percentile). Exercise 7.14 addressed coliform levels in water samples from a particular site. The coliform levels were assumed to vary according to a Normal distribution with a mean of 10 organisms per liter and a standard deviation of 2 organisms per liter. What is the 90th percentile on this distribution?

7.19 A six-foot seven-inch tall man. Have you ever wondered why a man who is 6′ 7″ tall (79″) seems so much taller than a man who is 5′ 10″ (70″) even though he is only 13% taller in relative terms? [(79″ − 70″)/70″ = 0.13 = 13%.] Let us assume that male height is Normally distributed with μ = 70 inches and σ = 3 inches. What proportion of men are 5′ 10″ or taller? What proportion of men are 6′ 7″ or taller?

7.20 College entrance exams. The SAT and ACT are standardized tests for college admission in the United States. Both tests include components that measure reading comprehension. Suppose that SAT critical reading scores are Normally distributed with a mean of 510 and standard deviation of 115. In contrast, ACT reading scores are Normally distributed with a mean of 20.5 and standard deviation of 5. Sam takes the SAT reading test and scores 660. Dave takes the ACT test and scores 28. Who had the superior score, Sam or Dave?

7.21 |Z| ≥ 2.56. What proportion of Standard Normal Z-values are greater than 2.56? What proportion are less than −2.56? What proportion are either below −2.56 or above 2.56?

7.22 BMI. Body mass index (BMI) is equal to “weight in kilograms” divided by “height in meters squared.” A study by the National Center for Health Statistics suggested that women between the ages of 20 and 29 in the United States have a mean BMI of 26.8 with a standard deviation of 7.4. Let us assume that these BMIs are Normally distributed.

(a) A BMI of 30 or greater is classified as obese. What proportion of women in this age range are obese according to this definition?

(b) A BMI less than 18.5 is considered to be underweight. What proportion of women are underweight?

7.23 MCATs. Suppose that scores on the biological sciences section of the Medical College Admissions Test (MCAT) are Normally distributed with a mean of 9.2 and standard deviation of 2.2. Successful applicants to become medical students had a mean score of 10.8 on this portion of the test. What percentage of applicants had a score of 10.8 or greater?

______________

^a This includes those that are up to 8.9999… years of age.

^b μ and σ are the parameters of the distribution.

^c WAIS “IQ” scores follow a bell curve, but it is not necessarily true that “intelligence” has a Normal distribution; what we call intelligence is not likely to be captured by a single test score.

^d It may also be defined as the period from the last menstrual period to delivery.

^e Mittendorf, R., Williams, M. A., Berkey, C. S., & Cotter, P. F. (1990). The length of uncomplicated human gestation. Obstetrics & Gynecology, 75(6), 929–932; Durham, J. (2002). Calculating due dates and the impact of mistaken estimates of gestational age. Retrieved February 2006 from http://transitiontoparenthood.com/ttp/birthed/duedatespaper.htm.

^f Elveback, L. R., Guillier, C. L., & Keating, F. R., Jr. (1970). Health, normality, and the ghost of Gauss. JAMA, 211(1), 69–75.

^g We will work on the natural log scale (base e) unless otherwise specified.

^h Sibley, P. E. C. (2001). Reference range analysis—Lessons from PSA. Retrieved December 10, 2005, from www.dpcweb.com/documents/news&views/tech_reports_pdfs/zb204-a.pdf5700.

ⁱ Low levels of PSA are of no health concern.

^j United States Growth Charts. Retrieved February 24, 2006, from www.cdc.gov/nchs/about/major/nhanes/growthcharts/zscore/zscore.htm. Values have been rounded and fit to a Normal distribution.

^k It makes no difference whether we state this probability as Pr(X ≤ 41) or Pr(X < 41) because Pr(X = 41) = 0. See Section 5.4.

^l Dusheiko, S. D. (1973). [The pathologic anatomy of Alzheimer’s disease]. Zhurnal nevropatologii i psikhiatrii imeni S.S. Korsakova, 73(7), 1047–1052.