Basic Biostatistics: Statistics for Public Health Practice, Second Edition

16	Inference About a Proportion

16.1 Proportions

This chapter considers the analysis of a categorical response derived from a single simple random sample. We consider only categorical variables with two possible responses in this chapter. This type of response is common in public health research, where many health outcomes are binary by nature (e.g., diseased/not diseased) and others are classified into groups based on a cutoff point (e.g., hypertensive/normotensive).

We take a simple random sample (SRS) of n individuals from a population in which p × 100% of the individuals are classified as successes. The proportion of successes in a sample, denoted $\hat{p}$ (“p hat”), is:

$$\hat{p} = \frac{x}{n}$$

where x represents the number of successes observed in the sample. Sample proportion $\hat{p}$ is an unbiased estimator of population proportion p.

ILLUSTRATIVE EXAMPLE

Data (Prevalence of smoking in a community). A random-digit dialing technique is used to select 57 individuals from a community. Seventeen of 57 individuals in the sample are classified as smokers.

Therefore, the proportion of smokers in the sample is $\hat{p} = 17/57 = 0.298$. Assuming the sample is an SRS and the information is properly classified, this is an unbiased estimate of the prevalence of smoking in the population.

Notes

1. Counts and proportions. While quantitative data are often described with sums and averages, descriptive statistics for categorical outcomes are based on counts and proportions (Figure 16.1).

2. Incidence and prevalence. We are particularly interested in two specific types of proportions: prevalence proportions and incidence proportions. A prevalence proportion is the proportion of individuals in a population affected by a condition at a particular time. An incidence proportion is the proportion of individuals at risk who go on to develop a condition over a specified period of time. Synonyms for incidence proportion include cumulative incidence and average risk.

3. Proportions are a type of average. Consider a binary outcome in which each success is coded “1” and each failure is coded “0.” Here are 10 such observations:

[Listing of ten binary observations: two coded 1 and eight coded 0.]

Notice that n = 10, ∑x_i = 2, and $\bar{x} = \sum x_i / n = 2/10 = 0.2$. Also notice that $\hat{p} = x/n = 2/10 = 0.2$. This shows the equivalency of $\bar{x}$ and $\hat{p}$. A sample proportion is an average of zeros and ones.

4. Statistics and parameters. The proportion in the sample (denoted $\hat{p}$) is a statistic. The proportion in the population (denoted p) is a parameter. Principles of statistical inference introduced in Chapters 8 through 10 apply. The population is sampled; statistics are calculated in the sample; sample statistics are then used to help infer population parameters (Figure 16.2). In previous chapters, knowledge of the sampling distribution of $\bar{x}$ was used to help infer population mean μ. In this chapter, we will use knowledge about the sampling distribution of $\hat{p}$ to help infer parameter p.
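Note 3 above can be checked with a few lines of Python. This is a minimal sketch: the order of the ten 0/1 codes is made up for illustration, and only the totals (n = 10 with two successes) come from the text.

```python
# Ten hypothetical binary observations: 1 = success, 0 = failure.
# The ordering is an assumption; only n = 10 and two successes are from the text.
observations = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

n = len(observations)          # sample size
x = sum(observations)          # number of successes (sum of the 0/1 codes)
p_hat = x / n                  # sample proportion, x / n
x_bar = sum(observations) / n  # arithmetic mean of the 0/1 codes

print(f"n = {n}, x = {x}, p_hat = {p_hat}, x_bar = {x_bar}")
```

Both calculations give the same number, which is the sense in which a proportion is an average of zeros and ones.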


FIGURE 16.1 Summarizing data.

FIGURE 16.2

Exercises

16.1 AIDS-related risk factor. A national study of AIDS risk factors used a random-digit dialing technique to contact study participants. Among the 2673 heterosexual adults in the sample, 170 reported two or more sexual partners in the past 12 months.^a

(a) Describe the population to which inferences will be made. What population parameter will be estimated? Calculate the prevalence of multiple sexual partners in the sample.

(b) Practical problems of bias can pose a threat to the validity of a study of this type. What specific types of selection biases might evolve from the sampling method used by this study?

(c) This study relied on the truthfulness of responses. Although the investigators tried their best to make respondents feel comfortable, they could not be assured that the information was fully accurate. If an information bias existed in responses, hypothesize whether it would tend to overestimate or underestimate the prevalence of the risk factor.

16.2 Patient preference. An investigation uses two different methods of nonvolitional muscle function testing to study the effects of parenteral nutrition in debilitated patients. As part of the study, subjects were asked if they preferred method A or B muscle testing (in terms of comfort). Of the eight patients expressing a preference, seven preferred method A.^b

(a) To what population will inferences be made? What parameter will be estimated?

(b) Using intuition, do you believe the current evidence (seven of eight) is strong enough to conclude a preference for method A in the entire patient population? (Later in the chapter, we will address this question more formally.) Either an affirmative or a negative response is acceptable as long as you explain your reasoning.

16.2 The Sampling Distribution of a Proportion

Inferential methods in this chapter rest on using the binomial probability mass function to model the random number of successes in a given sample (Chapter 6). The Normal approximation to the binomial can be used when the sample is large (Section 8.3). Because these distributions were covered elsewhere, only a brief review is presented here.

Binomial random variables are based on counting the number of successes in n independent Bernoulli trials. The probability of success in each trial is assumed to be a constant p. The random number of successes (X) will vary according to a binomial distribution with parameters n and p: X ~ b(n,p). Random variable X has expected value μ = np and standard deviation $\sigma = \sqrt{npq}$, where q = 1 − p. Probabilities are calculated with this formula:

$$\Pr(X = x) = {}_nC_x \, p^x q^{n-x}$$

where ${}_nC_x = \frac{n!}{x!\,(n-x)!}$, n represents the number of independent trials, p represents the probability of success for each trial, and q = 1 − p.

Because calculation of binomial probabilities can be tedious, a Normal approximation to the binomial can be used in large samples. This will simplify calculations with only a minimal and inconsequential loss of accuracy. For our purposes, the sample is considered sufficiently large to use this Normal approximation when npq ≥ 5.

The Normal approximation to the binomial states that, in large samples, binomial random variable X varies according to a Normal distribution with μ = np and $\sigma = \sqrt{npq}$. This is equivalent to saying that sample proportion $\hat{p}$ varies according to a Normal distribution with μ = p and standard deviation $\sigma = \sqrt{pq/n}$.

Figure 16.3 depicts a simulation of a sampling distribution of a proportion. Repeated samples of size n are taken from a population in which p × 100% of the individuals are classified as successes. Some of the samples produce values of $\hat{p}$ that are less than the actual value of p, and some produce values of $\hat{p}$ that are larger than p. The distribution of sample proportions is approximately Normal with μ = p and $\sigma = \sqrt{pq/n}$. This sampling distribution model will be used to help draw inferences about the unknown value of population parameter p.
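A simulation of this kind can be sketched in Python using only the standard library. The values p = 0.3, n = 100, and 5000 repeated samples are arbitrary assumptions chosen for illustration (npq = 21, so the Normal approximation should apply); the function name is hypothetical.

```python
import random
import statistics

def simulate_sample_proportions(p, n, n_samples, seed=1):
    """Draw n_samples simple random samples of size n and return each p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(n_samples):
        # Each individual is a success with probability p (a Bernoulli trial).
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p, n = 0.3, 100  # assumed population proportion and sample size
p_hats = simulate_sample_proportions(p, n, n_samples=5000)

mean_p_hat = statistics.mean(p_hats)       # should be close to p
sd_p_hat = statistics.stdev(p_hats)        # should be close to sqrt(pq/n)
theoretical_sd = (p * (1 - p) / n) ** 0.5  # sqrt(pq/n)

print(f"mean of p-hats = {mean_p_hat:.4f} (p = {p})")
print(f"SD of p-hats   = {sd_p_hat:.4f} (sqrt(pq/n) = {theoretical_sd:.4f})")
```

The simulated mean and standard deviation should land near p and $\sqrt{pq/n}$, respectively, which is what the sampling distribution model predicts.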

Exercises

16.3 AIDS-related risk factor. Exercise 16.1 introduced a problem in which a random sample of 2673 adult heterosexuals were asked about presence of an HIV risk factor.

(a) Describe the sampling distribution of the number of individuals who are positive for this risk factor.

(b) Do you think a Normal approximation can be used to characterize the sampling distribution? Explain your reasoning.

16.4 Patient preference. For the problem about patient preference described in Exercise 16.2, characterize the sampling distribution of the number of patients preferring method A. Do you think a Normal approximation could be used to describe this sampling distribution?


FIGURE 16.3 Simulation of a sampling distribution of proportions showing superimposed Normal approximation.

16.3 Hypothesis Test, Normal Approximation Method

With large samples, a Normal approximation to the binomial distribution can be used to test a proportion for statistical significance. Again, we use the npq rule as a rough guide to determine whether the sample is large enough to use a Normal approximation (Section 8.3). This rule states that it is acceptable to use the Normal approximation to the binomial when npq ≥ 5, where q = 1 − p. Under these conditions, the sampling distribution of $\hat{p}$ is approximately Normal with mean p and standard deviation $\sqrt{pq/n}$, as depicted in Figure 16.4.

The testing procedure is based on treating $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$ as a Standard Normal random variable. Here is a step-by-step guide to the procedure.

A. Hypotheses. The null hypothesis is H₀: p = p₀, where p₀ represents the proportion under the null hypothesis. The value of p₀ comes from the research question itself, not from the data. The alternative hypothesis is either H_a: p > p₀ (one sided to the right), H_a: p < p₀ (one sided to the left), or H_a: p ≠ p₀ (two sided).

B. Test statistic. After confirming that a Normal approximation to the binomial can be used (by checking whether np₀q₀ ≥ 5), calculate:

$$z_{\text{stat}} = \frac{\hat{p} - p_0}{SE_{\hat{p}}}$$

where $\hat{p}$ represents the sample proportion, p₀ represents the value of p under the null hypothesis, and $SE_{\hat{p}} = \sqrt{p_0 q_0 / n}$, where q₀ = 1 − p₀.

FIGURE 16.4 Sampling distribution of a proportion, Normal approximation.

Optional: A continuity correction^c can be incorporated into the test statistic as follows:

$$z_{\text{stat},c} = \frac{|\hat{p} - p_0| - \frac{1}{2n}}{\sqrt{p_0 q_0 / n}}$$

This continuity correction is introduced because the P-value produced by the continuity-corrected z method better approximates the probability that would be derived by an exact binomial procedure (Section 16.4) had one been pursued. This is especially true in small samples.

C. P-value. Convert the z-statistic to a P-value in the usual fashion (with either Table B or a software utility). An additional z table, Appendix Table F, is included in the back of the book to make it easier to look up two-tailed P-values directly from the |z_stat|. If a one-sided P-value is needed when using Table F, divide the table entry by 2.

D. Significance level. The difference is said to be statistically significant at the α-level of significance when P ≤ α, in which case H₀ is rejected. Keep in mind that failure to reject the null hypothesis should not be construed as its acceptance.

E. Conclusion. The test results are addressed in the context of the data and research question.

ILLUSTRATIVE EXAMPLE

z-test of a proportion. We propose to test whether the prevalence of smoking considered in the prior illustration is significantly higher than that of the United States as a whole. The prevalence of smoking in U.S. adults according to an NCHS report is 0.21.^d An SRS of 57 individuals from a particular community reveals 17 smokers. Thus, $\hat{p}$ = 17/57 = 0.298. A two-sided test is used to determine whether this is significantly different from 0.21.

Solution:

A. Hypotheses. H₀: p = 0.21 against H_a: p ≠ 0.21

B. Test statistic. Before calculating the z_stat, we confirm that a Normal approximation to the binomial can be used. Notice that n = 57, p₀ = 0.21, so q₀ = 1 − 0.21 = 0.79. Therefore, np₀q₀ = 57 · 0.21 · 0.79 = 9.5, showing the sample to be large enough to apply the Normal approximation test.

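(a) $SE_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{(0.21)(0.79)}{57}} = 0.0540$

(b) $z_{\text{stat}} = \frac{\hat{p} - p_0}{SE_{\hat{p}}} = \frac{0.298 - 0.21}{0.0540} = 1.63$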

C. P-value. Figure 16.5 depicts the sampling distribution of $\hat{p}$ under the null hypothesis. Table F is used to convert the z_stat to P = 0.1031.


FIGURE 16.5 P-value, smoking prevalence illustrative example.

D. Significance level. The evidence against H₀ is not sufficient to warrant its rejection at α = 0.10.

E. Conclusion. We do not have enough evidence here to conclude that the prevalence of smoking in this community is any different than the nationally reported average (P = 0.103).

Use of the continuity correction results in $z_{\text{stat},c} = 1.47$, P (two sided) = 0.1416. This does not materially change our conclusion.
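The smoking-prevalence test can be reproduced with a short Python sketch. The function name `z_test_proportion` is hypothetical, and the two-sided P-value is obtained from the standard library's complementary error function rather than from Table F. Because the code carries unrounded values of $\hat{p}$ and z, its P-values differ slightly in the third decimal place from the table-based figures in the text.

```python
import math

def z_test_proportion(x, n, p0, continuity=False):
    """Two-sided z-test of H0: p = p0 (Normal approximation to the binomial).

    Returns (|z|, two-sided P-value).
    """
    q0 = 1 - p0
    if n * p0 * q0 < 5:
        # npq rule: the sample is too small for the Normal approximation.
        raise ValueError("n*p0*q0 < 5; use an exact binomial procedure")
    p_hat = x / n
    se = math.sqrt(p0 * q0 / n)           # standard error under H0
    diff = abs(p_hat - p0)
    if continuity:
        diff = max(diff - 1 / (2 * n), 0.0)  # optional continuity correction
    z = diff / se
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(z / math.sqrt(2))
    return z, p_two_sided

z, p_value = z_test_proportion(17, 57, 0.21)
z_c, p_value_c = z_test_proportion(17, 57, 0.21, continuity=True)
print(f"uncorrected: z = {z:.2f}, P = {p_value:.3f}")
print(f"corrected:   z = {z_c:.2f}, P = {p_value_c:.3f}")
```

Neither P-value falls below 0.10, which agrees with the conclusion reached in the illustrative example.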

Exercises

16.5 AIDS-related risk factor. Exercises 16.1 and 16.3 considered a survey in which 170 of 2673 individuals (6.4%) reported having two or more sexual partners in the prior 12 months. This study was completed in the early 1990s. Suppose an earlier study (completed in the 1970s) suggested that the prevalence of this attribute in the population was 7.5%. Is the observed proportion significantly different from the prior prevalence? Use a two-sided alternative hypothesis. Show all hypothesis-testing steps.

16.6 AIDS-related risk factor (Continuity-corrected z-statistic). Recalculate the P-value for Exercise 16.5 using the continuity-corrected z-statistic. Is the P-value from the continuity-corrected z-statistic larger or smaller than that from the non-corrected z-statistic?

16.4 Hypothesis Test, Exact Binomial Method

The z-test for proportions is accurate only in large samples. When working with small samples, an exact test based on binomial calculations is required. Here are the steps of Fisher’s exact test.

A. Hypotheses. The hypothesis statements are the same as used in the prior section (i.e., H₀: p = p₀).^e

B. Test statistic. The test statistic is the observed count of successes in the sample, denoted x.

C. P-value. The P-value is the probability of observing x successes or a value more extreme than x under the conditions set forth by the null hypothesis.

(a) For one-sided alternative hypotheses to the right, P = Pr(X ≥ x) = Pr(X = x) + Pr(X = x + 1) + ··· + Pr(X = n), assuming X ~ b(n, p₀).

(b) For one-sided alternative hypotheses to the left, P = Pr(X ≤ x) = Pr(X = x) + Pr(X = x − 1) + ··· + Pr(X = 0), assuming X ~ b(n, p₀).

(c) For two-sided alternative hypotheses, P = Pr(X ≤ x₁) + Pr(X ≥ x₂), where x₁ or x₂ is equal to x and the other x_i is a value as extreme in the opposite direction.^f

D. Significance level. The P-value is compared to various type I error thresholds (α) to gauge the strength of evidence against the null hypothesis.

E. Conclusion. The test results are addressed in the context of the data and research question.

ILLUSTRATIVE EXAMPLE

Exact binomial test (Fisher’s tea challenge). A memorable story in statistical lore has R. A. Fisher drinking tea with an heiress when the heiress claims she can discern by taste alone whether milk is added to the cup before or after the tea is poured.^g Fisher challenges the heiress to a test in which he gives her eight cups of tea in random order and tells her that four had the tea added first and the remaining four had the milk added first. She correctly identifies the order of adding the milk in six of the eight attempts. Can we say from this test that the heiress is doing better than random guessing?

A. Hypotheses. With random guessing, there is a 50/50 chance of guessing correctly for each trial in this experiment. Therefore, H₀: p = 0.5. With better-than-random guessing, the taster has better than a 50% chance of guessing correctly, H_a: p > 0.5.

B. Test statistic. Note that the npq rule derives 8 · 0.5 · 0.5 = 2, confirming the need for an exact binomial procedure. The test statistic for the exact test is observed number of successes x = 6.

C. P-value. P-value = Pr(X ≥ 6) while assuming X is a binomial random variable with n = 8 and p = 0.5. Therefore, we calculate:

• Pr(X = 6) = (₈C₆)(0.5)⁶(1 − 0.5)⁸⁻⁶ = (28)(0.0156)(0.25) = 0.1094

• Pr(X = 7) = (₈C₇)(0.5)⁷(1 − 0.5)⁸⁻⁷ = (8)(0.0078)(0.5) = 0.0313

• Pr(X = 8) = (₈C₈)(0.5)⁸(1 − 0.5)⁸⁻⁸ = (1)(0.0039)(1) = 0.0039

The (one-sided) P-value = Pr(X ≥ 6) = 0.1094 + 0.0313 + 0.0039 = 0.1446. Figure 16.6 depicts this graphically.

D. Significance level. The evidence against the null hypothesis is not significant at the α = 0.10 level.

E. Conclusion. Selecting six of the eight “milk-in-tea sequences” correctly under the stated conditions is not significantly better than mere “50/50” guessing (P = 0.1446).
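The one-sided exact P-value for the tea challenge can be computed directly from the binomial formula. The following is a minimal Python sketch; the function names are hypothetical. Because it sums unrounded probabilities, it returns 37/256 ≈ 0.1445, whereas the hand calculation, which adds terms already rounded to four decimals, gives 0.1446.

```python
import math

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ b(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_p_right(x, n, p0):
    """One-sided exact P-value for Ha: p > p0, i.e., Pr(X >= x)."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

def exact_p_left(x, n, p0):
    """One-sided exact P-value for Ha: p < p0, i.e., Pr(X <= x)."""
    return sum(binom_pmf(k, n, p0) for k in range(0, x + 1))

# Tea challenge: x = 6 correct out of n = 8, H0: p = 0.5, Ha: p > 0.5
p_value = exact_p_right(6, 8, 0.5)
print(f"Pr(X >= 6) = {p_value:.4f}")
```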


FIGURE 16.6 Exact P-value (shaded), tea challenge illustrative example.

Notes

1. How many correct guesses would provide good evidence against H₀? In the tea challenge illustrative example, we may ask, “How many correct identifications would provide good evidence against guessing?” Because eight of eight would occur only 0.0039 of the time with random guessing, this would provide strong evidence against H₀. Seven of eight correct guesses would derive a one-sided P-value = Pr(X = 7) + Pr(X = 8) = 0.0313 + 0.0039 = 0.0352, which is also good evidence against H₀.

2. Mid-P correction. The discrete nature of the binomial distribution causes Fisher’s test to produce P-values that are a bit too large; as a result, the test rejects a true H₀ less than α × 100% of the time (i.e., it is conservative). A better P-value can be achieved by including only half of Pr(X = x) in the calculation of the P-value. This method is called the Mid-P test. For the tea challenge illustrative example, the Mid-P procedure uses ½·Pr(X = 6) = ½·0.1094 = 0.0547 in its calculation of the P-value. Thus, P_Mid-P = ½·Pr(X = 6) + Pr(X = 7) + Pr(X = 8) = 0.0547 + 0.0313 + 0.0039 = 0.0899. Figure 16.7 depicts this graphically.


FIGURE 16.7 Mid-P P-value, tea challenge illustrative example.

3. Software utilities. In practice, exact tests are usually calculated with statistical packages and software utilities. Figure 16.8 is a screenshot from WinPepi’s Describe program (option A)^h with fields filled in for testing the tea-challenge data. Figure 16.9 is the output from the program. Results replicate those of our prior calculations (one-tailed Fisher’s P = 0.145 and one-tailed P_Mid-P = 0.090).

4. OK to use exact tests in large samples. Exact procedures are necessary when testing small samples and also may be used in large samples when computational software is available.
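For readers without access to a dedicated utility, the calculations in Notes 1 and 2 can be sketched in Python. The function names are hypothetical, and the code is an illustration of the Mid-P idea for a right-sided alternative only, not a replacement for a tested statistical package.

```python
import math

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ b(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def mid_p_right(x, n, p0):
    """One-sided Mid-P value for Ha: p > p0.

    Counts only half of Pr(X = x), plus all of Pr(X > x).
    """
    tail_beyond = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    return 0.5 * binom_pmf(x, n, p0) + tail_beyond

# Note 2: Mid-P value for six of eight correct, H0: p = 0.5
mid_p = mid_p_right(6, 8, 0.5)

# Note 1: ordinary one-sided exact P-value for seven of eight correct
p_seven = binom_pmf(7, 8, 0.5) + binom_pmf(8, 8, 0.5)

print(f"Mid-P (6 of 8)   = {mid_p:.4f}")
print(f"Exact P (7 of 8) = {p_seven:.4f}")
```

The unrounded Mid-P value is 23/256 ≈ 0.0898, which agrees with the hand-calculated 0.0899 up to rounding of the individual terms.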


FIGURE 16.8 WinPepi’s data-entry screen for exact binomial test, tea challenge illustrative example. Abramson J.H. (2011). WINPEPI updated: computer programs for epidemiologists, and their teaching potential. Epidemiologic Perspectives & Innovations, 8(1), 1.

FIGURE 16.9 WinPepi’s output for tea challenge illustrative example. Abramson J.H. (2011). WINPEPI updated: computer programs for epidemiologists, and their teaching potential. Epidemiologic Perspectives & Innovations, 8(1), 1.

Exercises

16.7 Patient preference, Fisher’s method. Exercises 16.2 and 16.4 considered a problem in which seven of eight patients expressed a preference for medical procedure A compared to medical procedure B. A Normal approximation test was precluded because of the small sample size. Test the hypothesis of equal preference for medical procedure A and medical procedure B with Fisher’s procedure.

16.8 Patient preference, exact binomial test, Mid-P method. What is the exact Mid-P P-value for the problem in Exercise 16.7?

16.5 Confidence Interval for a Population Proportion

Plus-Four Method

We will use a plus-four method as our primary method for calculating confidence intervals for proportions. This method is based on the Wilson score test,ⁱ is simple to calculate, is more accurate than standard Normal approximation methods, and can be used in samples as small as n = 10.^j

We start by adding four imaginary observations to the data set before calculating the confidence interval; the imaginary sample size ñ = n + 4. Half of these imaginary observations go into the numerator of the proportion, so the imaginary number of successes is $\tilde{x} = x + 2$. The plus-four proportion is $\tilde{p} = \tilde{x}/\tilde{n}$. Now use $\tilde{p}$ as the center of the confidence interval,^k so the (1−α)100% confidence interval for p is:

$$\tilde{p} \pm z_{1-\alpha/2} \cdot SE_{\tilde{p}}$$

where $SE_{\tilde{p}} = \sqrt{\tilde{p}\tilde{q}/\tilde{n}}$ and $\tilde{q} = 1 - \tilde{p}$. Use $z_{1-\alpha/2}$ = 1.645 for 90% confidence, $z_{1-\alpha/2}$ = 1.96 for 95% confidence, and $z_{1-\alpha/2}$ = 2.576 for 99% confidence.

After calculating the confidence interval for population proportion p, interpret the results in the context of the data and research question.

ILLUSTRATIVE EXAMPLE

Confidence interval for population proportion, plus-four method (Smoking prevalence). Recall the survey that found 17 smokers in an SRS of n = 57 in a particular community. What is the 95% confidence interval for the prevalence of smoking in this population?

Solution:

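• $\tilde{n} = 57 + 4 = 61$, $\tilde{x} = 17 + 2 = 19$, $\tilde{p} = 19/61 = 0.3115$, $\tilde{q} = 1 - 0.3115 = 0.6885$, and $SE_{\tilde{p}} = \sqrt{(0.3115)(0.6885)/61} = 0.0593$.

• The 95% confidence interval for p = 0.3115 ± (1.96)(0.0593) = 0.3115 ± 0.1162 = (0.1953 to 0.4277) or about (20% to 43%).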

We can conclude with 95% confidence that the prevalence of smoking in this community is between 20% and 43%.

Note: Keep in mind that confidence intervals and P-values address random sampling errors only and do not protect us from systematic forms of errors as might arise from selection bias and information bias.
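The plus-four calculation is simple enough to script. Here is a minimal Python sketch; the function name `plus_four_ci` and the lookup table `Z_VALUES` are hypothetical, and the z-values are the three given in this section.

```python
import math

# z-values for the confidence levels listed in this section
Z_VALUES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def plus_four_ci(x, n, confidence=0.95):
    """Plus-four confidence interval for a population proportion."""
    z = Z_VALUES[confidence]
    n_tilde = n + 4                   # add four imaginary observations
    p_tilde = (x + 2) / n_tilde       # two of them counted as successes
    q_tilde = 1 - p_tilde
    se = math.sqrt(p_tilde * q_tilde / n_tilde)
    return p_tilde - z * se, p_tilde + z * se

lower, upper = plus_four_ci(17, 57)   # smoking prevalence example
print(f"95% CI for p: ({lower:.4f} to {upper:.4f})")
```

For the smoking data this reproduces the interval of roughly 0.195 to 0.428 calculated by hand above.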

Exact Confidence Intervals

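With very small samples (n < 10), the confidence interval for population proportion p should be based directly on the binomial distribution. These methods are analogous to the exact binomial tests presented in Section 16.4.^l Calculations can be based on the relation between the F-distribution and binomial distribution by using the following formulas for the lower confidence limit (LCL) and upper confidence limit (UCL)^m:

$$LCL = \frac{x}{x + (n - x + 1)\,F_{df_1,\,df_2,\,1-\alpha/2}}$$

where df₁ = 2(n − x + 1) and df₂ = 2x and

$$UCL = \frac{(x + 1)\,F_{df_1',\,df_2',\,1-\alpha/2}}{(n - x) + (x + 1)\,F_{df_1',\,df_2',\,1-\alpha/2}}$$

where df₁′ = 2(x + 1) and df₂′ = 2(n − x). Here F with subscripts denotes the 1 − α/2 quantile of the F-distribution with the indicated degrees of freedom.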

A Mid-P adjustment analogous to the one reported in the prior section can be incorporated in the confidence interval.ⁿ We will use the WinPepi Describe.exe program to perform our calculations.

ILLUSTRATIVE EXAMPLE

Confidence interval for proportion, exact method (Tea-challenge). Recall the tea-tasting challenge illustration based on whether a person could predict whether milk was added to a cup before or after tea is poured. The taster proved right in six of eight trials. Based on this finding, determine the proportion that the tester can correctly identify in the long run.

Data are entered into WinPepi’s Describe program (option A), as shown in Figure 16.8. Figure 16.10 contains output from the program showing confidence intervals calculated at three levels of confidence (90%, 95%, and 99%) according to three different methods (exact Mid-P, exact Fisher’s, and Wilson’s). The formulas presented previously correspond to the output labeled “(Fisher’s).” The 95% confidence interval by this method is 0.349 to 0.968. The 95% confidence interval by the Mid-P method is 0.388 to 0.956, reflecting its less-conservative approach.

FIGURE 16.10 WinPepi output showing confidence intervals for p, “tea challenge” illustrative data. Abramson J.H. (2011). WINPEPI updated: computer programs for epidemiologists, and their teaching potential. Epidemiologic Perspectives & Innovations, 8(1), 1.
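As an alternative to the F-distribution formulas, the exact (Fisher's) limits can be found numerically by searching for the values of p at which each binomial tail probability equals α/2. The Python sketch below takes that route using bisection; it is a swapped-in technique, not the algorithm WinPepi uses, and the function names are hypothetical. It does not implement the Mid-P adjustment.

```python
import math

def binom_cdf(k, n, p):
    """Pr(X <= k) for X ~ b(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=60):
    """Bisection search for a root of f on [lo, hi].

    Assumes f(lo) and f(hi) have opposite signs.
    """
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Exact binomial confidence interval for p (no Mid-P adjustment)."""
    alpha = 1 - confidence
    # Lower limit: the p for which Pr(X >= x) = alpha/2
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p for which Pr(X <= x) = alpha/2
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

lower, upper = exact_ci(6, 8)   # tea challenge: six of eight correct
print(f"95% exact CI for p: ({lower:.3f} to {upper:.3f})")
```

For the tea-challenge data the search should land on the same limits reported in the output labeled "(Fisher's)," about 0.349 to 0.968.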

Exercises

16.9 AIDS-related risk factor. Use the plus-four method to calculate a 95% confidence interval for the prevalence of multiple sexual partners in adult heterosexuals using the information in Exercise 16.1. Recall that 170 of the 2673 subjects reported this behavior.

16.10 AIDS-related risk factor, 90% confidence interval. Calculate a 90% confidence interval for the problem presented in Exercise 16.9.

16.11 Patient preference. Exercise 16.2 stated that seven of eight patients expressed a preference for a particular medical procedure. Use an exact procedure to calculate a 95% confidence interval for p.

16.6 Sample Size and Power

Concepts and methods needed to determine the sample size requirements for estimating and testing means were introduced in Sections 9.6, 10.3, and 11.7. Similar methods apply when determining the sample size requirements for estimating and testing proportions. We will approach this problem from three interrelated angles. We ask:

• What is the sample size needed to estimate p with a given margin of error?

• What is the sample size needed to test a proportion for significance at a stated α-level with given power?

• What is the power of a significance test of a proportion given a stated sample size and α-level?

Sample Size Requirements for Estimating p with Margin of Error m

Define margin of error m as half the confidence interval width. For large samples,

$$m \approx z_{1-\alpha/2}\sqrt{\frac{pq}{n}}$$

where q = 1 − p and $z_{1-\alpha/2}$ is the value of a Standard Normal variable with cumulative probability 1 − α/2. Because the value of m depends on p, we must make an educated guess of its value before calculating the sample size requirement. Let p* represent this educated guess. Rearranging the formula $m = z_{1-\alpha/2}\sqrt{pq/n}$ to solve for n derives the following sample size requirement:

$$n = \frac{z_{1-\alpha/2}^{2}\; p^{*} q^{*}}{m^{2}}$$

where q* = 1 − p*. Round results up to the next integer to make sure the margin of error is no greater than m.

ILLUSTRATIVE EXAMPLE

Sample-size requirement, confidence interval. The smoking prevalence example in this chapter (n = 57, x = 17) estimated a population proportion of 0.30 with margin of error (assuming 95% confidence) equal to ±0.12. How large a sample would be needed to derive an estimate with a margin of error of ±0.05?

Solution: $n = \frac{(1.96)^2 (0.30)(0.70)}{(0.05)^2} = 322.7$. Use a sample of 323 observations.

How large a sample would be needed to shrink this margin of error to ±0.03?

Solution: $n = \frac{(1.96)^2 (0.30)(0.70)}{(0.03)^2} = 896.4$. Use a sample of 897 observations.

When no educated guess of p is available, use p* = 0.50 to provide a “worst-case scenario” estimate. This will ensure that enough data are collected to achieve a margin of error no greater than the one required. Figure 16.11 shows the relation between the sample size requirements to achieve a margin of error of 0.05 at various assumed values of p*, demonstrating that sample size requirements are at their maximum when p* = 0.5.

FIGURE 16.11 Sample size requirements for a study to estimate a proportion with margin of error 0.05. The required sample size has its maximum when p = 0.5.
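The sample size requirement for a given margin of error can be computed with a one-line function. This Python sketch uses a hypothetical function name and defaults to z = 1.96 (95% confidence) and the worst-case guess p* = 0.5.

```python
import math

def sample_size_for_margin(m, p_star=0.5, z=1.96):
    """Sample size needed to estimate p with margin of error m.

    p_star is the educated guess for p; 0.5 gives the worst case.
    The result is rounded up to the next integer.
    """
    q_star = 1 - p_star
    return math.ceil(z**2 * p_star * q_star / m**2)

n_05 = sample_size_for_margin(0.05, p_star=0.30)   # margin of error 0.05
n_03 = sample_size_for_margin(0.03, p_star=0.30)   # margin of error 0.03
n_worst = sample_size_for_margin(0.05)             # worst case, p* = 0.5
print(n_05, n_03, n_worst)
```

The first two results match the illustrative example (323 and 897 observations). The worst-case requirement for m = 0.05 is larger than the requirement with p* = 0.30, consistent with Figure 16.11.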

Sample Size Requirement for Testing a Proportion

In testing a proportion from a single sample, let p₀ represent the population proportion under the null hypothesis and p₁ represent the population proportion under the alternative hypothesis. To test H₀: p = p₀ versus H_a: p = p₁ with 1 − β power at a two-sided α-level of significance, use a sample of size:

$$n = \left(\frac{z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}}{p_1 - p_0}\right)^{2}$$

where q₀ = 1 − p₀ and q₁ = 1 − p₁. Round results derived from this formula up to the next integer to ensure the required power.

ILLUSTRATIVE EXAMPLE

Sample-size requirement, hypothesis test. A prior illustrative example in this chapter had us conduct a two-sided test of H₀: p = 0.21 using a sample of n = 57. How large a sample is needed to detect a proportion of 0.31 in the population (i.e., p₁ = 0.31) at α = 0.05 (two sided) with 90% power?

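Solution: $n = \left(\frac{1.96\sqrt{(0.21)(0.79)} + 1.28\sqrt{(0.31)(0.69)}}{0.31 - 0.21}\right)^{2} = 193.3$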

The sample should include 194 observations.
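This sample size calculation can be sketched in Python. The function name is hypothetical, and the Normal quantiles are taken from the standard library's `statistics.NormalDist` rather than from a table, so intermediate values differ slightly from a hand calculation that uses rounded z-values.

```python
import math
from statistics import NormalDist

def sample_size_for_test(p0, p1, alpha=0.05, power=0.90):
    """Sample size for a two-sided test of H0: p = p0 against p = p1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # z for two-sided alpha
    z_beta = std_normal.inv_cdf(power)           # z for the required power
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)

n_required = sample_size_for_test(0.21, 0.31, alpha=0.05, power=0.90)
print(n_required)
```

For p₀ = 0.21 and p₁ = 0.31 this gives the same requirement of 194 observations.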

Power

The sample-size formula is rearranged to determine the power of a test of H₀: p = p₀ as follows:

$$1 - \beta = \Phi\left(\frac{|p_1 - p_0|\sqrt{n} - z_{1-\alpha/2}\sqrt{p_0 q_0}}{\sqrt{p_1 q_1}}\right)$$

where p₀ is the value of the population proportion under the null hypothesis, p₁ is the value of the population proportion under the alternative hypothesis, n is the sample size, α is the significance level of the test, and Φ(z) represents the cumulative probability of a Standard Normal random variable.^o For a one-sided alternative hypothesis, replace $z_{1-\alpha/2}$ with $z_{1-\alpha}$ in this formula.

ILLUSTRATIVE EXAMPLE

Power. A test of H₀: p = 0.21 using a sample of n = 57 was not significant (P = 0.103). Suppose the true prevalence in the population p₁ was 0.31. What was the power of the test assuming a two-sided α-level of 0.05?

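Solution:

$$1 - \beta = \Phi\left(\frac{|0.31 - 0.21|\sqrt{57} - 1.96\sqrt{(0.21)(0.79)}}{\sqrt{(0.31)(0.69)}}\right) = \Phi(-0.09) = 0.4641$$

(from Table B). The power of the test under the stated conditions was about 46%.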
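The power calculation can also be sketched in Python, with Φ evaluated by the standard library's `statistics.NormalDist` instead of Table B. The function name and its `two_sided` flag are hypothetical.

```python
import math
from statistics import NormalDist

def power_of_test(p0, p1, n, alpha=0.05, two_sided=True):
    """Approximate power of a z-test of H0: p = p0 when the true p is p1."""
    std_normal = NormalDist()
    # Critical z: 1 - alpha/2 quantile (two sided) or 1 - alpha (one sided)
    tail = alpha / 2 if two_sided else alpha
    z_crit = std_normal.inv_cdf(1 - tail)
    numerator = (abs(p1 - p0) * math.sqrt(n)
                 - z_crit * math.sqrt(p0 * (1 - p0)))
    return std_normal.cdf(numerator / math.sqrt(p1 * (1 - p1)))

power = power_of_test(0.21, 0.31, n=57, alpha=0.05)
print(f"power = {power:.2f}")
```

With n = 57 the power is about 0.46, so a true prevalence of 0.31 would be detected less than half the time at this sample size.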

Summary Points (Inference About a Proportion)

1. This chapter addresses the analysis of a binary categorical response.

2. Descriptive statistics are limited to counts and proportions.

3. The proportion of individuals at risk who develop an outcome over a specified period of time is the incidence proportion or, more informally, “risk.” The proportion of individuals with a particular condition at a given point in time is the prevalence proportion.

4. The parameter of interest is population proportion p. When data are based on an SRS of size n, population proportion p is synonymous with binomial parameter p (Chapter 6).

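5. Sample proportion $\hat{p}$ is the point estimator of parameter p.

6. A (1 − α)100% confidence interval for p is $\tilde{p} \pm z_{1-\alpha/2} \cdot SE_{\tilde{p}}$, where $\tilde{p} = \frac{x + 2}{n + 4}$ and $SE_{\tilde{p}} = \sqrt{\tilde{p}\tilde{q}/\tilde{n}}$, with $\tilde{q} = 1 - \tilde{p}$ and $\tilde{n} = n + 4$. This method is referred to as the “plus-four method” and is reliable in samples as small as n = 10. Smaller samples require exact binomial procedures.

7. A test of H₀: p = p₀ (where the value of p₀ is derived from the research question) is conducted with $z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$ when the sample is large. Small samples require exact binomial procedures.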

8. Sample size requirements and power

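(a) To derive a confidence interval with a margin of error m, the sample size requirement is $n = \frac{z_{1-\alpha/2}^{2}\; p^{*} q^{*}}{m^{2}}$, where p* represents an educated guess for p.

(b) To conduct a two-sided test of H₀: p = p₀ with 1 − β power at a given α-level, the sample size requirement is

$$n = \left(\frac{z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}}{p_1 - p_0}\right)^{2}$$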

Vocabulary

Cumulative incidence

Exact test

Fisher’s exact test

Incidence proportions (cumulative incidence or average risk)

Mid-P test

Normal approximation to the binomial

Plus-four method

Prevalence proportions

Proportion (the sample proportion is denoted $\hat{p}$; the population proportion is denoted p)

Risk

Sampling distribution of a proportion

Review Questions

16.1 What symbol denotes the sample proportion?

16.2 What symbol denotes the population proportion? (It also denotes one of the binomial parameters.)

16.3 What is the complement of p?

16.4 Fill in the blank: While quantitative data are described with sums and averages, categorical data are described with counts and ______.

16.5 What term refers to the proportion of individuals at risk of developing a disease who develop the condition over a specified period of time?

16.6 What term refers to the proportion of individuals who have a condition at a particular point in time?

16.7 Fill in the blank: A proportion is an ______ of ones and zeros, where “one” represents presence of the condition and “zero” represents its absence.

16.8 Fill in the blanks: The random number of successes in n independent Bernoulli trials has a binomial distribution with parameters n and p. When the sample is large, the random number of successes can also be described by a ______ distribution with μ = ______ and σ = ______.

16.9 What does the symbol p₀ represent in the expression H₀: p = p₀?

16.10 Where does the value of p₀ come from when stating H₀?

16.11 Exact inferential procedures for counts and proportions are based on what probability mass function?

16.12 Select the best response: The plus-four method of calculating a confidence interval for p is based on ______ score method.

(a) Student’s

(b) Fisher’s

16.13 The plus-four confidence interval for p adds ______ (a number) imaginary successes to the numerator of the proportion estimate and ______ (a number) imaginary observations to the denominator to derive $\tilde{p}$.

16.14 Fill in the blanks: For Normal-based methods, the margin of error m is equal to approximately ______ the confidence interval width.

16.15 In determining the sample size requirements for estimating p, we specify p* as an educated guess for population proportion p. When no such educated guess is available, we let p* = 0.5. What is the justification for doing this?

16.16 What factors determine the sample size requirements for estimating population proportion p?

16.17 What is the distinction between selection bias and information bias?

16.18 True or false? Confidence intervals compensate for selection bias.

16.19 True or false? Confidence intervals compensate for random sampling error.

16.20 True or false? P-values address information bias.

16.21 True or false? P-values address random sampling error.

Exercises

16.12 Drove when drinking alcohol. The Youth Risk Behavior Surveillance survey for 2005 estimated that, within a 30-day period, 10% of the adolescent population had driven or ridden in a car or other vehicle when the driver had been drinking alcohol.^p

(a) The overall response rate was about 67%. How might this bias the results of the survey?

(b) A separate validation study documented fair to good repeatability of responses when the questionnaire was administered on separate occasions.^q This does not guarantee validity (responses can be repeatedly inaccurate) but does provide some reassurance. Provide examples of things we need to consider when thinking about the accuracy of responses.

16.13 Cerebral tumors and cell phone use. In a case-control study on cerebral tumors and cell phone use, tumors occurred more frequently on the same side of the head where cellular telephones had been used in 26 of 41 cases.^r Test the hypothesis that there is an equal distribution of contralateral and ipsilateral tumors in the population. Use a two-sided alternative. Show all hypothesis-testing steps.

16.14 BRCA1 mutations in familial breast cancer cases. Of 169 women having breast cancer and a familial risk factor, 27 had an inherited BRCA1 mutation.^s Based on this information, estimate the prevalence of BRCA1 mutation in women with familial breast cancer. Include a 95% confidence interval for the prevalence.
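
For readers who want to check confidence-interval work with software, the following is a minimal sketch of the plus-four method, assuming Python's standard library is available. The function name `plus_four_ci` is hypothetical, and the data are the 17 smokers among 57 individuals from the chapter's illustrative example rather than the data of any exercise.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x, n, confidence=0.95):
    """Plus-four confidence interval for a population proportion.

    Adds 2 imaginary successes and 4 imaginary observations before
    applying the usual Normal-based interval. (Hypothetical helper.)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g., about 1.96 for 95%
    p_tilde = (x + 2) / (n + 4)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    lower = max(0.0, p_tilde - z * se)  # keep limits inside [0, 1]
    upper = min(1.0, p_tilde + z * se)
    return lower, upper


lower, upper = plus_four_ci(17, 57)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")  # (0.195, 0.428)
```

Passing a different `confidence` value (for example, 0.90) changes only the z multiplier, so the same sketch can be reused for other confidence levels.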

16.15 Insulation workers. Twenty-six cancer deaths were observed in a cohort of 556 insulation workers. Based on national statistics, a cohort of this size and age distribution was expected to experience 14.4 incident cases during the observation period. Therefore, under the null hypothesis, p = p₀ = 14.4/556 = 0.02590. Test whether the observed incidence is significantly greater than expected. Show all hypothesis-testing steps.

16.16 AIDS-related risk factor. In a study of AIDS-related risk factors, 5 of 2673 heterosexual respondents reported a history of receiving a blood transfusion or having a sexual partner from a high-risk group.^t Assume this is an SRS of U.S. heterosexuals. Provide a 95% confidence interval for the prevalence of this combined risk factor.

16.17 Kidney cancer survival. An oncologist treats 40 kidney cancer cases. Sixteen of the cases survive at least 5 years. Historically, one in five cases were expected to survive this long. Test whether there has been a significant improvement in survival.

16.18 Leukemia gender preference. A simple random sample of 262 leukemia cases consisted of 150 males and 112 females. Does this provide evidence of a gender preference for the disease? (Observed proportion, male p̂ = 150/(150 + 112) = 0.5725. Test H₀: p = 0.5.)
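
The arithmetic of a Normal-approximation (z) test of H₀: p = p₀ can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `one_sample_z_test` is hypothetical, no continuity correction is applied, and the numbers (30 successes in 100 observations, p₀ = 0.25) are made up and do not come from any exercise.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x, n, p0):
    """z statistic and two-sided P-value for H0: p = p0.

    Uses the standard error under the null hypothesis and no
    continuity correction. (Hypothetical helper.)
    """
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    z = (p_hat - p0) / se0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Made-up data: 30 successes in 100 observations, tested against p0 = 0.25
z, p_value = one_sample_z_test(30, 100, 0.25)
print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")
```

For a one-sided alternative, the P-value is the single tail area in the direction of the alternative rather than twice the tail area.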

16.19 Sample-size requirement. You are planning a study to estimate a population proportion with 95% confidence. How many individuals do you need to study to achieve a margin of error of no greater than 6%? A reasonable estimate for the population proportion is not available before the study, so assume p* = 0.50 to ensure adequate precision.

16.20 Sample-size requirement. As in the prior exercise, you are planning a study that intends to estimate a population proportion with 95% confidence. How many individuals do you need if you intend to cut your margin of error in half (i.e., m = 0.03)? Why is the sample size requirement for this exercise four times that of Exercise 16.19?
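
Sample-size calculations of this kind can be checked with a short sketch. It assumes the usual Normal-based formula n = z²p*(1 − p*)/m², rounded up to the next whole individual. The function name `sample_size_for_proportion` and the margins of error used below (0.05 and 0.025) are illustrative choices, deliberately different from the values in the exercises.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(m, p_star=0.5, confidence=0.95):
    """Sample size needed to estimate p with margin of error m.

    p_star is the educated guess for p; 0.5 is the most conservative
    choice because p*(1 - p*) is largest there. (Hypothetical helper.)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_star * (1 - p_star) / m**2)


# Illustrative margins of error (not the values from the exercises)
print(sample_size_for_proportion(0.05))   # 385
print(sample_size_for_proportion(0.025))  # 1537
```

Because m enters the formula as m², halving the margin of error roughly quadruples the required sample size; the small departure from an exact factor of four comes from rounding up.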

16.21 Alternative medicine. According to an April 29, 1998, New York Times article, a nationwide telephone survey conducted for a managed alternative care company found that of 1500 adults interviewed, 660 said they would use alternative medicine if traditional medical care failed to produce the desired results.^u Calculate a 95% confidence interval for the population proportion. (Assume data represent an SRS of a defined population.)

16.22 Perinatal growth failure. Failure to grow normally during the first year of life (perinatal growth failure) was observed in 33 of 249 very-low birth-weight babies.^v Calculate a 90% confidence interval for p based on this information. Assume data were derived by an SRS.

16.23 Perinatal growth failure. Among the 33 perinatal growth failure cases discussed in the prior exercise, eight (24.2%) had very-low intelligence test scores when tested at 8 years of age. In normal birth-weight babies, we’d expect 2.5% of the population to exhibit this trait. Perform a one-sided exact binomial test to address whether the observation is statistically significant. Show all hypothesis-testing steps.
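
A one-sided exact binomial P-value is the probability, under H₀, of observing at least as many successes as were seen. The sketch below sums the binomial probabilities directly. The function name `binomial_upper_tail` is hypothetical, and the example (8 or more successes in 10 trials when p₀ = 0.5) is made up for illustration.

```python
from math import comb


def binomial_upper_tail(x, n, p0):
    """P(X >= x) for X ~ Binomial(n, p0): the one-sided exact P-value
    when the alternative is p > p0. (Hypothetical helper.)
    """
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(x, n + 1)
    )


# Made-up example: 8 or more successes in 10 trials when p0 = 0.5
p_value = binomial_upper_tail(8, 10, 0.5)
print(f"one-sided exact P = {p_value:.4f}")  # 0.0547
```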

16.24 Binge drinking in U.S. colleges. Alcohol abuse is a serious problem on college campuses. A nationwide survey of students at 4-year colleges found that 3314 of the 17,096 student respondents met the criterion for being a “frequent binge drinker” (five or more drinks in a row three or more times in the past 2-week period).^w Assume data represent an SRS of students at 4-year colleges.

(a) Calculate the observed prevalence of frequent binge drinking.

(b) Calculate a 95% confidence interval for p.

16.25 Incidence of improvement. Of 75 patients in a clinical study, 20 showed spontaneous improvement within a month. Calculate the 1-month incidence proportion of improvement. Include a 95% confidence interval for the proportion. Assume that the data represent an SRS of a defined clinical population.

16.26 Sample size requirements. How large a sample is needed to estimate the incidence of female breast cancer in a population with 95% confidence and a margin of error that is no greater than 1% (0.01)? Assume that the expected incidence proportion in the population is 3% (0.03). How large a sample would be needed if you were willing to settle for 90% confidence? What was the effect of decreasing the required level of confidence?

16.27 Familial history of breast cancer, sample size requirements. Suppose we want to test the hypothesis that women with a family history of breast cancer are at a higher risk of developing breast cancer than women who do not have this family history. Let us assume that 3% (0.03) of women who do not have a family history of breast cancer will develop the disease in their 60s. We take a simple random sample of women entering their 60s who have a family history of breast cancer.

(a) How large a sample would be needed to detect a different breast cancer risk in our study population if women with a family history of the disease actually had a risk of 5% (0.05) compared to the 3% expected in other women? Let α = 0.05 (two-sided) while seeking a statistical power of 90% (0.90).

(b) How large a sample would be needed if the true incidence among women with a familial history is actually 6%? Again, let α = 0.05 (two-sided) and seek 90% power.

(c) Based on your findings in parts (a) and (b) of this exercise, characterize the relationship between the expected differences under the null and alternative hypotheses and the sample size requirements of the study.

(d) Replicate the conditions expressed in part (a) of this exercise, except for insisting on an α-level of 0.01 (two-sided) for your test. How many individuals must now be studied?

(e) Characterize the relationship between the required α-level and the sample size requirement.
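
The sample size needed to test a proportion can be sketched as follows. This assumes one common Normal-approximation formula, n = [(z₁₋α/₂√(p₀q₀) + z₁₋β√(p₁q₁)) / (p₁ − p₀)]², which may differ in detail from formulas given elsewhere. The function name `sample_size_for_test` is hypothetical, and the proportions in the example (0.5 versus 0.6) are made up so as not to overlap with the exercise.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_test(p0, p1, alpha=0.05, power=0.90):
    """Approximate n for a two-sided test of H0: p = p0 when the true
    proportion is p1. (Hypothetical helper; Normal approximation.)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / abs(p1 - p0)) ** 2)


# Made-up example: p0 = 0.5 versus p1 = 0.6
print(sample_size_for_test(0.5, 0.6))              # alpha = 0.05, power = 0.90
print(sample_size_for_test(0.5, 0.7))              # larger difference to detect
print(sample_size_for_test(0.5, 0.6, alpha=0.01))  # stricter alpha level
```

Comparing the three printed values shows the direction of each relationship: a larger difference between p₀ and p₁ lowers the required n, while a smaller α raises it.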

16.28 Power. Determine the power of a test of H₀: p = 0.10 when:

(a) p is actually 0.20, α = 0.05 (two-sided), and n = 25.

(b) p is actually 0.15, α = 0.01 (two-sided), and n = 100.
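
Power calculations of this type can be sketched with a Normal approximation, assuming the formula 1 − β ≈ Φ[(|p₁ − p₀|√n − z₁₋α/₂√(p₀q₀)) / √(p₁q₁)]. This approximation ignores the very small chance of rejecting H₀ in the wrong direction. The function name `power_one_sample` is hypothetical, and the inputs below (p₀ = 0.5, p₁ = 0.6) are made up rather than taken from the exercise.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sample(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: p = p0 when the
    true proportion is p1. (Hypothetical helper; Normal approximation.)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    numerator = abs(p1 - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1 - p0))
    return NormalDist().cdf(numerator / sqrt(p1 * (1 - p1)))


# Made-up example: p0 = 0.5, true p = 0.6
print(f"n = 100: power = {power_one_sample(0.5, 0.6, 100):.2f}")
print(f"n = 259: power = {power_one_sample(0.5, 0.6, 259):.2f}")
```

Power rises with n, with the size of the difference between p₀ and p₁, and with a more lenient α level.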

16.29 Freshman binge drinking. A survey of drinking in college students found 1802 binge drinkers among 5266 U.S. freshmen completing a survey.^x

(a) The prevalence of binge drinking in this sample is p̂ = 1802/5266 = 0.342, or 34.2%. Assume that the data represent an SRS of U.S. freshmen. Also assume that the responders provided accurate information. Using this information, determine the prevalence of binge drinking in U.S. freshmen with 95% confidence.

(b) Calculate the 99% confidence interval for the prevalence. Interpret the result.

(c) A study from the late 1990s found that 20% of freshmen engaged in binge drinking. Assume that 20% is an accurate and reliable estimate for the prevalence of binge drinking in the 1990s. Do the current data provide reliable evidence of an increase in the prevalence of binge drinking since the 1990s?

______________

^a Catania, J. A., Coates, T. J., Stall, R., Turner, H., Peterson, J., Hearst, N., et al. (1992). Prevalence of AIDS-related risk factors and condom use in the United States. Science, 258(5085), 1101–1106.

^b Brooks, S. D., Gerstman, B. B., Sucher, K. P., & Kearns, P. J. (1998). The reliability of muscle function analysis using different methods of stimulation. Journal of Parenteral and Enteral Nutrition, 22(5), 331–334.

^c The continuity correction adjusts for the fit of the smooth Normal curve to the chunky binomial function. The continuity-corrected statistic produces larger P-values than the regular test statistic.

^d National Center for Health Statistics. (2006). Early Release of Selected Estimates Based on Data from the January–September 2005 National Health Interview Survey. Retrieved November 2006 from www.cdc.gov/nchs/data/nhis/earlyrelease/200603_08.pdf.

^e The hypotheses can also be stated in terms of expected number of successes, that is, H₀: μ = np₀.

^f Some statisticians merely double the one-sided P-value for a two-sided test. For justification, see Dupont, W. D. (1986). Sensitivity of Fisher’s exact test to minor perturbations in 2 × 2 contingency tables. Statistics in Medicine, 5(6), 629–635.

^g Even this simple problem raises interesting statistical questions such as “How many cups should be tested? Should the cups be paired? In what order should the cups be presented? What should be done about chance variation in temperature, sweetness, and so on?” What conclusions could be drawn from a perfect score? Source: Box, J. F. (1978). R. A. Fisher, the Life of a Scientist. New York: John Wiley.

^h Abramson, J. H. (2004). WINPEPI (PEPI-for-Windows): Computer programs for epidemiologists. Epidemiologic Perspectives & Innovations, 1(1), 6.

^i Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22, 209–212. Also see page 14 in Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. (2nd ed.). New York: John Wiley & Sons.

^j Agresti, A., & Coull, B. A. (1998). Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician, 52(2), 119–126. Agresti, A., & Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4), 280–288. Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101–117.

^k When n is large, p̃ ≈ p̂.

^l Clopper, C. J., & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404–413.

^m Rothman, K. J., & Boice, J. D., Jr. (1979). Epidemiologic Analysis with a Programmable Calculator. Washington, D.C.: U.S. Government Printing Office. p. 31.

^n Berry, G., & Armitage, P. (1995). Mid-P confidence intervals: A brief review. The Statistician, 44(4), 417–423.

^o Use Table B or a statistical utility to look up this cumulative probability.

^p Eaton, D. K., Kann, L., Kinchen, S., Ross, J., Hawkins, J., Harris, W. A., et al. (2006). Youth risk behavior surveillance—United States, 2005. MMWR Surveillance Summaries, 55(5), 1–108. The actual survey used a multistage sampling method. Reported results here have been simplified through “reverse engineering” to accommodate a design effect of about 2.

^q Brener, N. D., Kann, L., McManus, T., Kinchen, S. A., Sundberg, E. C., & Ross, J. G. (2002). Reliability of the 1999 youth risk behavior survey questionnaire. Journal of Adolescent Health, 31(4), 336–342.

^r Muscat, J. E., Malkin, M. G., Thompson, S., Shore, R. E., Stellman, S. D., McRee, D., et al. (2000). Handheld cellular telephone use and risk of brain cancer. JAMA, 284(23), 3001–3007.

^s Couch, F. J., DeShano, M. L., Blackwood, M. A., Calzone, K., Stopfer, J., Campeau, L., et al. (1997). BRCA1 mutations in women attending clinics that evaluate the risk of breast cancer. New England Journal of Medicine, 336(20), 1409–1415.

^t Catania, J. A., Coates, T. J., Stall, R., Turner, H., Peterson, J., Hearst, N., et al. (1992). Prevalence of AIDS-related risk factors and condom use in the United States. Science, 258(5085), 1101–1106.

^u Brody, J. E. (1998, April 28). Alternative medicine makes inroads, but watch out for curves. New York Times.

^v Hack, M., Breslau, N., Weissman, B., Aram, D., Klein, N., & Borawski, E. (1991). Effect of very-low birth-weight and subnormal head size on cognitive abilities at school age. New England Journal of Medicine, 325(4), 231–237.

^w Wechsler, H., Davenport, A., Dowdall, G., Moeykens, B., & Castillo, S. (1994). Health and behavioral consequences of binge drinking in college. A national survey of students at 140 campuses. JAMA, 272(21), 1672–1677.

^x Stahlbrandt, H., Andersson, C., Johnsson, K. O., Tollison, S. J., Berglund, M., & Larimer, M. E. (2008). Cross-cultural patterns in college student drinking and its consequences—A comparison between the USA and Sweden. Alcohol and Alcoholism, 43(6), 698–705.