11

Inference About a Mean

11.1 Estimated Standard Error of the Mean

The prior three chapters relied on z-procedures to help infer population means. z-procedures, in turn, rely on knowing the value of the population standard deviation (σ) before data are collected. This chapter introduces methods that do away with this often unrealistic condition.

With z-procedures, the population standard deviation σ is used to calculate the standard error of the mean using the formula $SE_{\bar{x}} = \sigma/\sqrt{n}$. In this chapter, we replace the population standard deviation σ with the sample standard deviation s to derive the estimated standard error of the mean:

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Both $\sigma/\sqrt{n}$ and $s/\sqrt{n}$ are referred to as standard errors, even though they have slightly different formulas.
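As a minimal sketch of the standard error formula and its two rearrangements (solving for n and solving for s), the following Python snippet may help. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not data from this chapter.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def n_for_target_se(s, target_se):
    """Smallest n whose standard error does not exceed target_se.

    Rearranges SE = s / sqrt(n) into n = (s / SE)^2 and rounds up.
    """
    return math.ceil((s / target_se) ** 2)

def sd_from_se(se, n):
    """Recover the standard deviation from a reported SE: s = SE * sqrt(n)."""
    return se * math.sqrt(n)

# Hypothetical values, for illustration only
s, n = 12.0, 36
se = standard_error(s, n)
print(se)                        # 12 / 6 = 2.0
print(n_for_target_se(s, 1.5))   # (12 / 1.5)^2 = 64
print(sd_from_se(se, n))         # 2.0 * 6 = 12.0
```

Because n enters through its square root, halving the standard error requires roughly four times as many observations.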

Exercises

11.1    Blood pressure. A study of 35 individuals found mean systolic blood pressure inline with sample standard deviation s = 10.3 mmHg.

(a)   Calculate the standard error of the mean based on this information.

(b)   How many individuals would you need to study to decrease the standard error of the mean to 1 mmHg? [Hint: Rearrange the standard error formula to solve for n.]

11.2    Published report. An article published in the American Journal of Public Health that reported the relation between tall stature and cardiovascular disease mortality included a table with the column heading “Mean Height, cm. (SE).” One entry in the table based on n = 1243 individuals was “173.2 (0.2).” From this information, determine the standard deviation of height in this group. [Hint: Rearrange the standard error formula to solve for s.]

11.2 Student’s t-Distributions

Using $s/\sqrt{n}$ to estimate the standard error of the mean adds an element of uncertainty to inferential procedures. To accommodate this additional uncertainty, we use a t-distribution instead of a Standard Normal z-distribution when making inferences. t-distributions were introduced in 1908 by William Sealy Gosset (1876–1937), writing under the pseudonym “Student,” and are hence referred to as Student’s t-distributions.

t-distributions resemble the Standard Normal distribution. They are bell shaped and centered on 0. However, t-distributions have more area in their tails than the Standard Normal z-distribution. These broader tails accommodate the additional uncertainty that comes from estimating σ with s.

The t-distributions are a family of distributions whose members share common characteristics. Each member of the family is identified by its degrees of freedom (df). Figure 11.1 displays t-distributions with 1, 9, and ∞ degrees of freedom. As the number of degrees of freedom increases, t-distributions become increasingly like a Standard Normal distribution. A t-distribution with ∞ degrees of freedom is a Standard Normal z-distribution. t-distributions with (say) 60 or more degrees of freedom are nearly indistinguishable from the Standard Normal (z) distribution. Therefore, with large samples, it matters little whether you use a t-procedure or a z-procedure.

Table C in the appendix of this book lists landmarks (critical values) on t-distributions. Each row in this table refers to a t-distribution with a particular df. Each column corresponds to a cumulative probability on the distribution. Entries in the table are values for these t-percentiles.


FIGURE 11.1 t probability density functions with 1, 9, and ∞ degrees of freedom.

Figure 11.2 illustrates how Table C is used. We focus on the row for 9 degrees of freedom and column for a cumulative probability of 0.975. The table entry at the intersection of these points is 2.262, indicating that the 97.5th percentile on a t-distribution with 9 degrees of freedom (t9,0.975) is equal to 2.262. A visual depiction of this is shown in Figure 11.2.

Notation: Let tdf,p denote a t critical value with df degrees of freedom and cumulative probability p. Figure 11.2 highlights t9,0.975 = 2.262. Notice that this t-value has a right-tail probability of 0.025.


FIGURE 11.2 Table C and t9,0.975 = 2.262, illustrative example.
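When Table C is not at hand, t-percentiles can be approximated numerically. The sketch below uses only the Python standard library: it integrates the t probability density function with Simpson's rule and then inverts the cumulative probability by bisection. The function names (t_pdf, t_cdf, t_percentile) are hypothetical, and the method is one reasonable numerical approach, not the procedure used to build Table C.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """Cumulative probability Pr(T <= x), using Simpson's rule on [0, |x|]."""
    a = abs(x)
    steps = 2 * max(500, math.ceil(100 * a))   # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                       # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_percentile(p, df):
    """Approximate t_{df,p}, the value with cumulative probability p."""
    lo, hi = -1000.0, 1000.0                   # assumed search bracket
    for _ in range(60):                        # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_percentile(0.975, 9), 3))   # 2.262, matching Figure 11.2
```

By symmetry, t_percentile(0.025, 9) returns the same magnitude with a negative sign.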

ILLUSTRATIVE EXAMPLE

Understanding Table C. Suppose we want to identify the values of a t-random variable with 9 degrees of freedom that capture the middle 80% of values on t9. Start by sketching the curve, which is bell shaped with a mean of 0 and inflection points approximately one standard deviation above and below the mean. Because the curve is symmetrical, we look for the critical values in Table C that cut off the bottom 10% and top 10% of the curve. These are the 10th percentile and 90th percentile of the t9 distribution. Table C lets us know that t9,0.90 = 1.383. By symmetry, the corresponding point on the left-hand side of the curve is t9,0.10 = −1.383. Figure 11.3 depicts these relationships graphically.


FIGURE 11.3 The 10th and 90th percentiles on t9.

Exercises

11.3    Sketch a curve. Use Table C to determine the values on a t22 distribution that captures the middle 95% of the area under the curve. Sketch the curve showing the t-values on the horizontal axis and associated tail areas.

11.4    t-percentiles. Use Table C to determine the value of a t-random variable with 19 degrees of freedom and a cumulative probability of 0.95. In addition, determine the value of t19,0.05.

11.5    Probabilities not in Table C. There are times you may need to determine a probability for a t-random variable that does not appear in Table C. For example, you may need to determine the probability a t-random variable with 8 degrees of freedom is greater than 2.65. Even though this value is not in Table C, you can still get a good idea of its probability by bracketing it between two landmarks in the table. In this case, 2.65 is bracketed between t8,0.975 (2.306) and t8,0.99 (2.896). Therefore, it has a cumulative probability that is a little bigger than 0.975 and a little smaller than 0.99. What is the probability of t8 > 2.65? In other words, what is the probability of observing a t random variable with 8 df that is greater than 2.65?

11.6    Upper tail. Determine the probability that a t-random variable with 8 df is greater than 2.98.

11.7    Software utility programs. The Internet has many free applets that can determine exact probabilities for t-random variables. You can also use WinPepi WhatIs.exe or Microsoft Excel’s TDIST function for this purpose. Use one of these software applications to find:

(a)   Pr(T8 ≥ 2.65)

(b)   Pr(T8 ≥ 2.98)

(c)   Pr(T19 ≤ 2.98)

11.3 One-Sample t-Test

When σ is known, $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ has a Standard Normal (z) distribution. When σ is not known, $(\bar{x} - \mu)/(s/\sqrt{n})$ has a t-distribution with (n − 1) degrees of freedom. This is the basis of the one-sample t-test. Here are the steps of the procedure:

A.  Hypothesis statements. The null hypothesis is H0: μ = μ0, where μ0 represents the mean under the null hypothesis. The alternative hypothesis is either Ha: μ > μ0 (one sided to the right), Ha: μ < μ0 (one sided to the left), or Ha: μ ≠ μ0 (two sided).

B.  Test statistic. $t_{\text{stat}} = \dfrac{\bar{x} - \mu_0}{SE_{\bar{x}}}$, where $SE_{\bar{x}} = s/\sqrt{n}$. This t-statistic has df = n − 1.

C.  P-value. Use Table C or a software utility to convert the tstat to a P-value. For one-sided alternatives, P = Pr(t ≥ |tstat|). For two-sided alternatives, P = 2 × Pr(t ≥ |tstat|). Recall that small P-values provide good evidence against H0 (Section 9.3).

D.  Significance level. The test is said to be significant at the α-level when P ≤ α (Section 9.4).

E.  Conclusion. The test results are interpreted in the context of the data and research question.

As was the case with one-sample z-tests, the one-sample t-test assumes that the data were generated by an SRS and that the sampling distribution of $\bar{x}$ is Normal.

ILLUSTRATIVE EXAMPLE

One-sample t-test (SIDS birth weights). We want to know whether the mean birth weight of full-term infants who ultimately died of SIDS is significantly different from that of other full-term births. A sample of n = 10 SIDS cases demonstrated the following birth weights:

image

Based on this information, we calculate $\bar{x} = 2890.5$ g and s = 720.0 g. By comparison, the mean weight of other full-term births in the region during this period was 3300 g. We test whether the birth weight in this sample is significantly different from a population mean of 3300 g. A two-sided test is demonstrated.

A.  Hypotheses. H0: μ = 3300 g versus Ha: μ ≠ 3300 g.

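B.  Test statistic. The estimated standard error of the mean is $SE_{\bar{x}} = s/\sqrt{n} = 720.0/\sqrt{10} = 227.68$, so

$$t_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{2890.5 - 3300}{227.68} = -1.80 \quad \text{with } df = n - 1 = 9.$$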

C.  P-value. The two-sided P-value is twice the area under the curve to the right of |−1.80| on a t-distribution with 9 df. Because there is no entry for 1.80 in Table C, we bracket the tstat between 1.383 (right tail = 0.10) and 1.833 (right tail = 0.05). Therefore, the one-sided P-value is between 0.10 and 0.05 and the two-sided P-value is between 0.20 and 0.10. Using a software utility, we determine P = 0.105. Figure 11.4 depicts the test statistic and associated P-value regions for the problem.


FIGURE 11.4 Two-tailed P-value, SIDS illustrative example.

D.  Significance level. The observed difference is not significant at α = 0.10. Results fall just short of “marginal significance.”

E.  Conclusion. The mean birth weight in this sample of 10 SIDS infants ($\bar{x} = 2890.5$ g) was not significantly different from that of the general population of infants (μ = 3300 g) at conventional levels of α (P = 0.105).
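The steps of this example can be retraced in code. The following is a minimal sketch using only the Python standard library; it works from the summary statistics quoted in this chapter (mean 2890.5 g, s = 720.0 g, n = 10, μ0 = 3300 g). The helper names are hypothetical, and the tail probability is approximated by Simpson's rule rather than by any particular statistical package.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df):
    """Pr(T >= t) for t >= 0: one half minus the Simpson's-rule area on [0, t]."""
    steps = 2 * max(500, math.ceil(100 * t))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t(xbar, s, n, mu0):
    """Return (SE, t statistic, df, one-sided P, two-sided P)."""
    se = s / math.sqrt(n)
    t_stat = (xbar - mu0) / se
    df = n - 1
    p_one = t_upper_tail(abs(t_stat), df)
    return se, t_stat, df, p_one, 2 * p_one

# Summary statistics from the SIDS illustrative example
se, t_stat, df, p_one, p_two = one_sample_t(2890.5, 720.0, 10, 3300)
print(f"SE = {se:.2f}, tstat = {t_stat:.2f}, df = {df}, two-sided P = {p_two:.3f}")
```

The two-sided P-value printed here should fall close to the P = 0.105 reported in the example; small differences in the third decimal place can arise because the example rounds the tstat to −1.80 before looking up the probability.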

Exercises

11.8    P-value from tstat. A test of H0: μ = 0 based on n = 16 calculates tstat = 2.44.

(a)   Determine the degrees of freedom for the test statistic.

(b)   Provide the t-values from the Table C that bracket the tstat.

(c)   What is the approximate one-sided P-value for the problem?

(d)   What is the two-sided P-value?

11.9    Critical values for a t-statistic. The term critical value is often used to refer to the value of a test statistic that determines statistical significance at some fixed α level for a test. For example, ±1.96 are the critical values for a two-tailed z-test at α = 0.05. In performing a t-test based on 21 observations, what are the critical values for a one-tailed test when α = 0.05? That is, what values of the tstat will give a one-sided P-value that is less than or equal to 0.05? What are the critical values for a two-tailed test at α = 0.05?

11.10  BMI. Body mass index is defined as $\text{BMI} = \text{weight (kg)} / \text{height (m)}^2$. An adult BMI of 24 is considered desirable. A study of 12 adults that used a one-sample t-test to address H0: μ = 24 reported a tstat of 2.16.

(a)   What is the one-sided P-value for this test?

(b)   What is the two-sided P-value?

11.11  Menstrual cycle length. Menstrual cycle lengths (days) in an SRS of nine women are as follows: {31, 28, 26, 24, 29, 33, 25, 26, 28}. Use this data to test whether mean menstrual cycle length differs significantly from a lunar month. (A lunar month is 29.5 days.) Assume that population values vary according to a Normal distribution. Use a two-sided alternative. Show all hypothesis-testing steps.

11.4 Confidence Interval for μ

The t confidence interval formula is similar to the z confidence interval formula (Section 10.2) except that it uses s in place of σ and tn−1,1−α/2 in place of z1−α/2. A (1 − α)100% confidence interval for μ is provided by:

$$\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}}$$

where $\bar{x}$ is the sample mean, tn−1,1−α/2 is the value of a t random variable with n − 1 df and a cumulative probability of 1 − (α/2), and $SE_{\bar{x}} = s/\sqrt{n}$.

ILLUSTRATIVE EXAMPLE

Confidence intervals for μ (SIDS). Let us return to the birth weight illustrative data concerning 10 SIDS cases. We have established that $\bar{x} = 2890.5$ and s = 720.0. Confidence intervals for μ at 90%, 95%, and 99% levels of confidence will be calculated.

•   The standard error of the mean is $SE_{\bar{x}} = s/\sqrt{n} = 720.0/\sqrt{10} = 227.684$.

•   df = 10 − 1 = 9.

•   For 90% confidence, use t9,0.95 = 1.833 (Table C), so the 90% confidence interval for μ is 2890.5 ± (1.833)(227.684) = 2890.5 ± 417.3 = (2473.2 to 3307.8) g.

•   For 95% confidence, use t9,0.975 = 2.262, so the 95% confidence interval for μ is 2890.5 ± (2.262)(227.684) = 2890.5 ± 515.0 = (2375.5 to 3405.5) g.

•   For 99% confidence, use t9,0.995 = 3.250, so the 99% confidence interval for μ is 2890.5 ± (3.250)(227.7) = 2890.5 ± 740.0 = (2150.5 to 3630.5) g.

Keep in mind that the confidence interval is used to help infer population mean μ, not sample mean $\bar{x}$.
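The three intervals above can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the t critical values are simply the Table C entries for 9 df quoted in this example rather than values computed by the code.

```python
import math

def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval for mu: xbar +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = t_crit * se
    return xbar - margin, xbar + margin

# Table C critical values for df = 9, as quoted in the text
t_table_df9 = {0.90: 1.833, 0.95: 2.262, 0.99: 3.250}

for level, t_crit in t_table_df9.items():
    lower, upper = t_confidence_interval(2890.5, 720.0, 10, t_crit)
    print(f"{level:.0%} CI: {lower:.1f} to {upper:.1f} g")
```

Running the loop shows the interval widening as the confidence level rises, because a larger t critical value multiplies the same standard error.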

Exercises

11.12  t-values for confidence. What is the value of tn−1,1−α/2 when calculating a 95% confidence interval for μ based on n = 28? What is tn−1,1−α/2 for 90% confidence?

11.13  Menstrual cycle length. Exercise 11.11 calculated the mean length of menstrual cycles in an SRS of n = 9 women. The data revealed $\bar{x} = 27.78$ days with standard deviation s = 2.906 days.

(a)   Calculate a 95% confidence interval for the mean menstrual cycle length.

(b)   Based on the confidence interval you just calculated, is the mean menstrual cycle length significantly different from 28.5 days at α = 0.05 (two sided)? Is it significantly different from μ = 30 days at the same α-level? Explain your reasoning. (Section 10.4 considered the relationship between confidence intervals and significance tests. The same rules apply here.)

11.5 Paired Samples

Data

With paired samples, each data point in one sample is matched to a unique point in a second sample. Here are examples of studies that employ paired samples:

•   Pretest/posttest studies in which a factor is measured before and after an intervention in the same set of individuals.

•   Cross-over trials, in which subjects start on one treatment and then switch to a different treatment.

•   Pair-matches, in which subjects in one sample are matched to subjects in a separate sample based on specific criteria.

Here is an illustrative example of a cross-over trial.

ILLUSTRATIVE EXAMPLE

Data (Oat bran and LDL cholesterol). A cross-over trial sought to learn whether oat bran cereal lowered low-density lipoprotein (LDL) cholesterol in hypercholesterolemic men. Twelve subjects completed the study. Half were randomly assigned a diet that included oat bran cereal. The other half were assigned a diet that included corn flakes. Dietary interventions were applied for 2 weeks, after which LDL levels (mmol/L) were recorded. Subjects were then crossed-over to the alternative diet for 2 weeks. LDL was once again measured. Table 11.1 lists data from the study.

TABLE 11.1 “Oat bran” illustrative data. LDL cholesterol (mmol/L) on the corn flake diet (CORNFLK), oat bran diet (OATBRAN), and their difference (DELTA).

tab

Data from Anderson, J. W., Spencer, D. B., Hamilton, C. C., Smith, S. F., Tietyen, J., Bryant, C. A., et al. (1990). Oat-bran cereal lowers serum total and LDL cholesterol in hypercholesterolemic men. American Journal of Clinical Nutrition, 52(3), 495–499. Data are stored in the file oatbran.sav.

Paired samples are analyzed by creating a new variable to hold within-pair differences. Call this new variable DELTA. Table 11.1 lists DELTA values created by subtracting each oat bran value from each corn flake value for each individual. Positive DELTA values reflect lower LDL on the oat bran diet.

Exploration and Description

It is often a good idea to start the analysis by plotting the data. A stemplot of the DELTA values for the oat bran illustrative data looks like this:

image

This shows that DELTA values range from −0.4 to 0.8 mmol/L. Ten of the 12 subjects (83%) lowered their LDL on oat bran. There are no apparent outliers.

We will attach the subscript d to descriptive statistics to denote application to the DELTA variable. The illustrative data show a mean decline of $\bar{x}_d = 0.3808$ mmol/L with standard deviation sd = 0.4335 (nd = 12) while on oat bran.

Hypothesis Test

When samples are paired, t-procedures introduced earlier in the chapter are applied toward difference variable DELTA. Here’s the procedure for testing a mean difference:

A.  Hypotheses. The population mean difference is denoted μd. In testing “no mean difference,” the null hypothesis is H0: μd = 0. The alternative hypothesis is one of the following: Ha: μd > 0 (one sided to the right), Ha: μd < 0 (one sided to the left), or Ha: μd ≠ 0 (two sided). In practice, most tests are two sided.

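B.  Test statistic. The test statistic is $t_{\text{stat}} = \dfrac{\bar{x}_d - \mu_0}{SE_{\bar{x}_d}}$, where μ0 = 0 under the null hypothesis of no mean difference and $SE_{\bar{x}_d} = s_d/\sqrt{n_d}$. This test statistic has df = nd − 1, where nd represents the number of paired observations.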

C.  P-value. The tstat is converted to a P-value with Table C or a software utility. Small P-values provide good evidence against H0 (Section 9.3).

D.  Significance level. The difference is said to be statistically significant at the α-level of significance when P ≤ α. By convention, P-values less than 0.10 are said to be marginally significant, those less than 0.05 are said to be significant, and those less than 0.01 are said to be highly significant (Section 9.4).

E.  Conclusion. The test results are interpreted in the context of the data and research question.

ILLUSTRATIVE EXAMPLE

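Paired t-test (Oat bran). We test the oat bran illustrative data for significance. We have already established that nd = 12, $\bar{x}_d = 0.3808$, and sd = 0.4335.

A.  Hypotheses. H0: μd = 0 versus Ha: μd ≠ 0

B.  Test statistic. $SE_{\bar{x}_d} = s_d/\sqrt{n_d} = 0.4335/\sqrt{12} = 0.1251$ and $t_{\text{stat}} = \dfrac{\bar{x}_d - \mu_0}{SE_{\bar{x}_d}} = \dfrac{0.3808 - 0}{0.1251} = 3.04$ with df = nd − 1 = 12 − 1 = 11.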

C.  P-value. Figure 11.5 illustrates the sampling distribution of the test statistic under the null hypothesis. Under this hypothesis, both the tstat and $\bar{x}_d$ have an expected value of 0. The observed mean difference of 0.3808 is 3.04 standard errors above 0. The one-tailed P-value is between 0.005 and 0.01 (Table C). The two-tailed P-value is twice this: 0.01 < P < 0.02. Statistical software derives P = 0.011 (two tailed). This provides good evidence against the null hypothesis.


FIGURE 11.5 Two-tailed P-value, oat bran illustrative example. Sampling distribution under the null hypothesis.

D.  Significance. The results are statistically significant at α = 0.05 (reject H0) but not quite at α = 0.01 (retain H0).

E.  Conclusion. LDL levels on the oat bran diet were significantly lower than those on the corn flake diet (P = 0.011).

Confidence Interval for μd

A (1 − α)100% confidence interval for μd is provided by:

$$\bar{x}_d \pm t_{n_d-1,\,1-\alpha/2} \cdot SE_{\bar{x}_d}$$

where $SE_{\bar{x}_d} = s_d/\sqrt{n_d}$.

ILLUSTRATIVE EXAMPLE

Confidence intervals for μd (Oat bran). 90%, 95%, and 99% confidence intervals for the oat bran illustrative data set are calculated. We have already established $\bar{x}_d = 0.3808$, $SE_{\bar{x}_d} = 0.1251$, and df = 12 − 1 = 11.

•   For 90% confidence, use t11,0.95 = 1.796 (Table C); the 90% confidence interval for population mean difference μd is 0.3808 ± (1.796)(0.1251) = 0.3808 ± 0.2247 = (0.1561 to 0.6055) mmol/L.

•   For 95% confidence, use t11,0.975 = 2.201 (Table C); the 95% confidence interval for population mean difference μd is 0.3808 ± (2.201)(0.1251) = 0.3808 ± 0.2753 = (0.1055 to 0.6561) mmol/L.

•   For 99% confidence, use t11,0.995 = 3.106; the 99% confidence interval for μd is 0.3808 ± (3.106)(0.1251) = 0.3808 ± 0.3886 = (−0.0078 to 0.7694) mmol/L.
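The paired calculations in this section, from the standard error of the mean difference through the confidence intervals, can be sketched in Python as follows. The variable names are hypothetical; the summary statistics and the Table C critical values for 11 df are those quoted in the oat bran examples. Because the code carries the unrounded standard error, its interval limits may differ from the hand calculations in the last decimal place.

```python
import math

# Summary statistics for DELTA from the oat bran illustrative example
n_d, xbar_d, s_d = 12, 0.3808, 0.4335

se_d = s_d / math.sqrt(n_d)    # standard error of the mean difference
t_stat = (xbar_d - 0) / se_d   # test statistic for H0: mu_d = 0
df = n_d - 1

print(f"SE = {se_d:.4f}, tstat = {t_stat:.2f}, df = {df}")

# Table C critical values for df = 11, as quoted in the text
t_table_df11 = {0.90: 1.796, 0.95: 2.201, 0.99: 3.106}

intervals = {}
for level, t_crit in t_table_df11.items():
    margin = t_crit * se_d
    intervals[level] = (xbar_d - margin, xbar_d + margin)
    lower, upper = intervals[level]
    print(f"{level:.0%} CI for mu_d: {lower:.4f} to {upper:.4f} mmol/L")
```

Only the 99% interval includes 0, which agrees with the test result: significant at α = 0.05 but not at α = 0.01.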

Exercises

11.14  Placebo effect in Parkinson’s disease patients. The placebo effect occurs when a patient experiences a perceived benefit after receiving an inert substance. To help understand the mechanism behind this phenomenon in Parkinson’s disease patients, investigators measured striatal RAC binding at a key point in the brains of six subjects. RAC binding was reduced by an average of 0.326 units on a placebo in the six subjects (sd = 0.181). Test this difference for statistical significance.

11.15  Water fluoridation. A study looked at the number of cavity-free children per 100 in 16 North American cities BEFORE and AFTER public water fluoridation projects. Table 11.2 lists the data.

(a)   Calculate delta values for each city. Then construct a stemplot of these differences. Interpret your plot.

(b)   What percentage of cities showed an improvement in their cavity-free rate?

(c)   Estimate the mean change with 95% confidence.

TABLE 11.2 Cavity-free children per 100 in 16 North American cities before and after public water fluoridation projects.

AFTER    BEFORE
 49.2      18.2
 30.0      21.9
 16.0       5.2
 47.8      20.4
  3.4       2.8
 16.8      21.0
 10.7      11.3
  5.7       6.1
 23.0      25.0
 17.0      13.0
 79.0      76.0
 66.0      59.0
 46.8      25.6
 84.9      50.4
 65.2      41.2
 52.0      21.0

Source unknown. Data stored online in FLUORIDE.SAV.

11.6 Conditions for Inference

The t-procedures in this chapter rely on the following underlying conditions:

•   Data are derived by an SRS of individual or paired observations.

•   Measurements of the response are valid.

•   The sampling distribution of the mean or mean difference is Normal.

Numerous studies have shown that t-procedures are robust against departures from the Normality condition, especially when a two-sided alternative hypothesis is used and the sample is large. This can be traced in part to the central limit theorem (Section 8.2). Rough guidelines for how large a sample needs to be to compensate for non-Normality in the population (Section 9.5) are:

•   When the population is Normal, you can use t-procedures on samples of any size.

•   When the population is mound shaped and symmetrical, you can use t-procedures on samples as small as 5 to 10.

•   When the population is skewed, t-procedures should be reserved for large samples (roughly 30 to 100 observations, depending on the severity of the skew).

ILLUSTRATIVE EXAMPLE

Can a t-procedure be used? Figure 11.6 displays stemplots for three data sets. Which of these data sets can support t-procedures?

•   Stemplot A has a positive skew and outlier. It has only six observations. In this situation, t-procedures should be avoided.

•   Stemplot B has n = 25 observations, is mound shaped, and has a modest negative skew. There are no outliers. It is okay to use the t-procedures on these data.

•   Stemplot C is highly skewed with a high outlier. There are only 13 observations; it would be imprudent to use t-procedures on these data.


FIGURE 11.6 “Can a t-procedure be used?” illustrative examples.

11.7 Sample Size and Power

The sample size requirements of a study can be approached from a confidence interval or hypothesis testing perspective. Let us start by considering the sample requirements for confidence intervals.

Sample Size for a Confidence Interval

To limit the margin of error of a (1 − α)100% confidence interval for μ (or μd) to m, the sample size should be no less than:

$$n = \left(\frac{z_{1-\alpha/2}\,\sigma}{m}\right)^2$$

where σ is the population standard deviation, z1−(α/2) is the Standard Normal deviate for (1 − α)100% confidence, and m is the desired margin of error. Results from this formula should be rounded up to the next integer to achieve the stated level of precision.

This formula is accurate when n ≥ 30 because t30+, 1−(α/2) ≈ z1−(α/2). When n < 30, apply adjustment factor f = (df + 3)/(df + 1) to compensate for the difference between z and t.

ILLUSTRATIVE EXAMPLE

Sample size, confidence interval. The oat bran illustrative example calculated a 95% confidence interval for μd that had a margin of error of 0.2753 mmol/L. How large a study is needed to achieve a margin of error of 0.2 mmol/L with 95% confidence? We will use the sample standard deviation from the study as our estimate of σ.

Solution: $n = \left(\dfrac{z_{1-\alpha/2}\,\sigma}{m}\right)^2 = \left(\dfrac{(1.96)(0.4335)}{0.2}\right)^2 = 18.05$. Round this up to the next integer, so n = 19. Because this number is less than 30, apply adjustment factor f = (df + 3)/(df + 1) = (18 + 3)/(18 + 1) = 1.105 to the final result. Therefore, use n = 1.105 × 19 = 21.
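The sample size calculation for a confidence interval can be sketched in Python with the standard library's statistics.NormalDist supplying the Standard Normal deviate. The function name is hypothetical, and rounding the adjusted value up to an integer is an assumption on our part; the text states only that the unadjusted result is rounded up.

```python
import math
from statistics import NormalDist

def n_for_margin(sigma, m, confidence=0.95):
    """Return (unadjusted n, n after the small-sample adjustment).

    Unadjusted n = (z * sigma / m)^2, rounded up. When that n is below 30,
    the factor f = (df + 3) / (df + 1) with df = n - 1 is applied.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = math.ceil((z * sigma / m) ** 2)
    adjusted = n
    if n < 30:
        df = n - 1
        adjusted = math.ceil(n * (df + 3) / (df + 1))  # rounding up is assumed
    return n, adjusted

# Oat bran example: s = 0.4335 as the estimate of sigma, margin of error 0.2
print(n_for_margin(0.4335, 0.2))   # (19, 21)
```

With df = n − 1 the factor equals (n + 2)/n, so in this sketch the adjustment amounts to adding two observations to the unadjusted sample size.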

Sample Size for a Hypothesis Test

The sample size required to test H0: μ = μ0 against Ha: μ = μa depends on the desired power of the test (1 − β), desired level of significance (α), size of the mean difference worth detecting (Δ = μ0 − μa), and standard deviation of the response variable (σ). To achieve these conditions use:

$$n = \frac{\sigma^2\left(z_{1-\beta} + z_{1-\alpha/2}\right)^2}{\Delta^2}$$

For one-sided tests, use z1−α in place of z1−(α/2) in the formula. Results from this equation should be rounded up to the next integer to achieve the stated level of power. For paired t-tests, use σDELTA in place of σ. Apply adjustment factor f = (df + 3)/(df + 1) when n < 30 to compensate for the difference between z and t.

ILLUSTRATIVE EXAMPLE

Sample size requirement, one-sample t-test (SIDS). How large a sample is needed to test the SIDS data presented earlier in this chapter with 90% power at α = 0.05 two sided? We want to detect a mean difference in birth weight (Δ) of 300 g (about 2/3 pound). Let us use sample standard deviation s (720.0) as a reasonable estimate of σ.

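Solution: $n = \dfrac{\sigma^2 (z_{1-\beta} + z_{1-\alpha/2})^2}{\Delta^2} = \dfrac{(720^2)(1.28 + 1.96)^2}{300^2} = 60.5$. Round up to 61 to ensure adequate power. Because the sample size exceeds 30, there is no need to apply adjustment factor f.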

ILLUSTRATIVE EXAMPLE

Sample size requirement, paired t-test (Oat bran). How large a sample is needed to test the oat bran data presented earlier in the chapter with 80% power at α = 0.05 two sided? We want to detect a mean change of 0.2 mmol/L and will assume a standard deviation of 0.4 mmol/L.

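Solution: $n = \dfrac{\sigma^2 (z_{1-\beta} + z_{1-\alpha/2})^2}{\Delta^2} = \dfrac{(0.4^2)(0.84 + 1.96)^2}{0.2^2} = 31.4$. Round up to 32 to ensure adequate power. Because the sample size exceeds 30, there is no need to apply adjustment factor f.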
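Both sample size examples for hypothesis tests can be checked with a short Python sketch. The function name and its keyword arguments are hypothetical; the Standard Normal deviates come from statistics.NormalDist rather than Table B, so intermediate values differ slightly from the hand calculations while the rounded-up answers agree.

```python
import math
from statistics import NormalDist

def n_for_test(sigma, delta, power, alpha=0.05, two_sided=True):
    """Sample size n = sigma^2 (z_{1-beta} + z_alpha)^2 / delta^2, rounded up."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_power = z(power)
    n = math.ceil(sigma ** 2 * (z_power + z_alpha) ** 2 / delta ** 2)
    if n < 30:
        # Small-sample adjustment f = (df + 3) / (df + 1), with df = n - 1
        df = n - 1
        n = math.ceil(n * (df + 3) / (df + 1))
    return n

# SIDS example: sigma = 720, delta = 300, 90% power, alpha = 0.05 two sided
print(n_for_test(720, 300, power=0.90))   # 61

# Oat bran example: sigma = 0.4, delta = 0.2, 80% power, alpha = 0.05 two sided
print(n_for_test(0.4, 0.2, power=0.80))   # 32
```

The required sample size depends on σ and Δ only through their ratio, which is why both examples have similar requirements despite very different measurement units.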

Power

The method for determining the power of a hypothesis test initially presented in Section 9.6 applies with minor modification. The power of the test is approximately:

$$1 - \beta \approx \Phi\left(-z_{1-\alpha/2} + \frac{|\Delta|\sqrt{n}}{\sigma}\right)$$

where Φ(z) is the cumulative probability of a Standard Normal random variable (Table B), α is the desired significance level, Δ is the difference worth detecting, and σ is the standard deviation of the response variable.

ILLUSTRATIVE EXAMPLE

Power of t-test (SIDS). What is the probability a study with n = 10 will detect a mean difference of 300 g in birth weight in the SIDS population compared to the general population? Let us assume σ = 720 and use a two-sided α-level of 0.05.

Solution: $1 - \beta \approx \Phi\left(-1.96 + \dfrac{|300|\sqrt{10}}{720}\right) = \Phi(-0.64) = 0.2611$. The power of this test is about 26%.
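The approximate power calculation can be sketched in Python as follows. The function name is hypothetical, and the sketch uses the same Normal approximation as the formula in this section rather than an exact calculation based on the noncentral t-distribution.

```python
import math
from statistics import NormalDist

def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided test: Phi(-z_{1-alpha/2} + |delta| sqrt(n) / sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(-z_crit + abs(delta) * math.sqrt(n) / sigma)

# SIDS example: n = 10, delta = 300 g, sigma = 720 g, alpha = 0.05 two sided
print(round(approx_power(10, 300, 720), 2))   # 0.26

# Power at the sample size of 61 derived for 90% power
print(round(approx_power(61, 300, 720), 2))   # 0.9
```

Comparing the two calls shows how sharply power rises with sample size: the same 300 g difference that a study of 10 would detect about one time in four would be detected about nine times in ten with 61 observations.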

Summary Points (Inference about a Mean)

1.   Data are a quantitative response variable derived by a single SRS or match-pair sample.

2.   Begin the analysis by exploring and describing the data with graphical techniques (e.g., stemplot and boxplot) and summary statistics (e.g., mean, standard deviation, and sample size).

3.   In most practical situations, population standard deviation σ is not known. This invalidates z-procedures and requires the use of Student t-procedures instead.

(a)   t-probability density functions (pdfs) look like a Standard Normal “z” curve except for the fact that they have slightly (almost imperceptibly) broader tails.

(b)   t-distributions are a family of pdfs distinguished by their degrees of freedom (df). As the df increases, the distribution becomes more and more Normal.

(c)   A t-distribution with infinite degrees of freedom is a Standard Normal z-distribution.

4.   One-sample and paired-sample t-tests

(a)   H0: μ = μ0, where μ0 represents the population mean or paired mean difference under the null hypothesis. The alternative hypothesis may be stated in a two-sided (Ha: μ ≠ μ0) or one-sided way (Ha: μ < μ0 or Ha: μ > μ0).

(b)   $t_{\text{stat}} = \dfrac{\bar{x} - \mu_0}{SE_{\bar{x}}}$, where $SE_{\bar{x}} = s/\sqrt{n}$, with df = n − 1.

(c)   Use Table C or a computer applet to convert the tstat to a P-value. The one-sided P-value is the area under the curve to the right of the |tstat|. The two-sided P-value is twice this amount.

(d)   Consider the level of statistical significance.

(e)   Formulate a conclusion in the context of the data and research question.

5.   A (1 − α)100% confidence interval for μ is given by $\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}}$.

(a)   Keep in mind that the confidence interval seeks to capture μ, not $\bar{x}$.

(b)   The procedure has a (1 − α)100% chance of producing an interval that captures μ and an α chance of producing one that does not.

(c)   The margin of error of the confidence interval is given by the “± value” in the formula.

(d)   Narrow confidence intervals indicate that the sample mean as an estimate of the population mean is precise.

6.   Inferential methods for matched-pair data are the same as those for single-sample data except that inferences are directed toward the difference variable DELTA.

7.   t-procedures require the following conditions:

(a)   SRSs or a reasonable approximation thereof.

(b)   Source population is Normal or the sample is large. t-procedures are known to be robust when the sample is large because of the central limit theorem.

(c)   The measurements are valid.

8.   The power and sample size methods for z-procedures introduced in the prior chapter can be used in this chapter after applying an adjustment factor to compensate for the differences between z and t.

Vocabulary

Critical value

Degrees of freedom (df)

DELTA

One-sample t-test

Paired samples

Standard error of the mean

Student’s t-distributions

t-distribution (t-probability density function)

t-percentiles

t-statistic

Review Questions

11.1    When do you use a t-procedure instead of a z-procedure to help infer a mean?

11.2    Describe the shape, location, and spread of t-distributions.

11.3    How many different t-distributions are there?

11.4    The mean of a t-distribution is equal to _______.

11.5    How do t-distributions differ from Standard Normal z-distributions?

11.6    Select the best response: The total area under a t-curve is equal to

(a)   −1

(b)   0

(c)   1

11.7    Select the best response: In the notation tdf,p, the subscript df represents the

(a)   degrees of freedom for the t-distribution.

(b)   probability of t.

(c)   cumulative probability of t (AUC to the left of the t-value).

11.8    Select the best response: In the notation tdf,p, the subscript p represents the

(a)   degrees of freedom for the t-distribution.

(b)   probability of t.

(c)   cumulative probability of t (AUC to the left of the t-value).

11.9    Determine the value of t8,0.50 without the aid of a t-table.

11.10  t9,0.90 = 1.383; therefore, t9,0.10 = ? (t-table not required)

11.11  A t-distribution with 60 or more degrees of freedom is very nearly a _________ distribution.

11.12  The standard error of the mean is equal to the sample standard deviation divided by the square root of _______.

11.13  Select the best response: In the statement H0: μ = μ0, μ0 represents the value of the population mean when the null hypothesis is ______.

(a)   true

(b)   false

(c)   either true or false

11.14  Select the best response: With matched-pair samples, the null hypothesis is most often H0: μd = ______.

(a)   −1

(b)   0

(c)   1

11.15  A one-sample t-procedure with 35 observations has this many degrees of freedom.

11.16  The P-value for a two-sided t-test is equal to

(a)   the area under the curve to the right of the t-statistic.

(b)   twice the area under the curve to the right of the t-statistic.

(c)   twice the area under the curve to the right of the absolute value of the t-statistic.

11.17  Select the best response: P-values assume that

(a)   the null hypothesis is true.

(b)   the null hypothesis is false.

(c)   the null hypothesis is neither true nor false.

11.18  Select the best response: A t-test derives a P-value of 0.06. The P-value represents the probability that

(a)   the null hypothesis is true.

(b)   the null hypothesis is false.

(c)   we would see the data or data that are more extreme assuming the null hypothesis is true.

11.19  Select the best response: A 95% confidence interval for μ is used to infer the value of the

(a)   sample mean.

(b)   population mean.

(c)   population standard deviation.

11.20  Select the best response: A 95% confidence interval for a mean is −0.91 to 1.36. From this we can infer with 95% confidence that the

(a)   population mean is 0.

(b)   population mean is greater than 0.

(c)   population mean is greater than −0.91.

11.21  Select the best response: A 95% confidence interval for a mean is 0.86 to 1.66. From this we can infer with 95% confidence that the

(a)   population mean is 1.

(b)   population mean is greater than 1.

(c)   neither of the above

11.22  Select the best response: A 95% confidence interval for a mean is 0.91 to 1.36. From this we can infer with 95% confidence that the

(a)   population mean is 0.

(b)   population mean is greater than 0.

(c)   population mean is less than 0.

11.23  Select the best response: A 95% confidence interval for a mean is 0.91 to 1.36. From this we can infer with 95% confidence that the population mean is

(a)   not more than 0.91.

(b)   not less than 0.91.

(c)   not less than 1.36.

11.24  Select the best response: Paired samples can be achieved via

(a)   pretest/posttest samples.

(b)   matching closely on extraneous factors when sampling.

(c)   both “a” and “b.”

11.25  Select the best response: Paired t-procedures focus on the data in the

(a)   first sample in the pair.

(b)   second sample in the pair.

(c)   differences between the first and second samples in the pair.

11.26  Select the best response: These conditions are needed for valid t-procedures:

(a)   SRS of individual observations or paired differences (or a reasonable approximation thereof)

(b)   Normality of the sampling distribution of the mean

(c)   both “a” and “b”

11.27  Select the best response: The sampling distribution of a mean will be approximately Normal even when the population is not exactly Normal as long as the sample is

(a)   representative.

(b)   large.

(c)   small.

11.28  List the determinants of the sample size requirements for estimating μ with a margin of error m.

11.29  List the determinants of the sample size requirements when testing a mean at a given α level.

11.30  List the determinants of the power of a t-test.

Exercises

11.16  t-percentiles. Use Table C to determine the following t-values:

(a)   $t_{24,0.975}$

(b)   $t_{674,0.99}$ (Suggestion: Because there is no row for df = 674, use the row with df = 100 to derive a conservative estimate for the t critical value).

(c)   $t_{24,0.05}$

11.17  Large t-statistic. A t-test calculates tstat = 6.60. Assuming the study had more than just a few observations, you do not need a t table or software utility to draw a conclusion about the test. What is this conclusion, and why is a look-up table unnecessary?

11.18  Sketch and shade. In testing H0: μ = 0, you find inline and s = 1.497 based on n = 50. Calculate the tstat for the test. Sketch a t-curve as accurately as possible. Then place this tstat on your curve. Without using Table C, do you think these results would be surprising if H0 were true?

11.19  Vector control in an African village. A study of vector control in an African village found that the mean sprayable surface area was 249 square feet with standard deviation 39.82 square feet in a simple random sample of n = 100 homes.

(a)   Calculate a 95% confidence interval for μ.

(b)   Would it be correct to say that 95% of all the homes in the village have sprayable surfaces between the lower confidence limit and upper confidence limit? Explain.

11.20  Calcium in sound teeth. The calcium content values in a sample of n = 5 sound teeth (% calcium) are {33.4, 36.2, 34.8, 35.2, 35.5}. Provide a 99% confidence interval for μ. (Assume the data represent an SRS of healthy adult teeth.)

11.21  Boy height. An SRS of n = 26 boys between the ages of 13 and 14 has a mean height of 63.8 inches with a standard deviation of 3.1 inches. Calculate a 95% confidence interval for the mean height of the population.

11.22  Body weight, high school girls. Body weights expressed as a percentage of ideal in an SRS of n = 9 girls are as follows: {114, 100, 104, 94, 114, 105, 103, 105, 96}.

(a)   Plot the data as a stemplot. (Use an axis multiplier of 10 and split stem values.) Are there any outliers or major departures from Normality in these data?

(b)   Calculate a 95% confidence interval for population mean μ. Show all work.

(c)   What is the margin of error of your confidence interval?

(d)   How large a sample would be needed to reduce this margin of error to three?

11.23  Faux pas. Eight junior high school students were taken to a shopping mall. The number of socially inappropriate behaviors (faux pas) by each student was counted. The students were then enrolled in a program designed to promote social skills. After completing the programs, the subjects were again taken to the shopping mall and the number of social faux pas was again counted. Table 11.3 lists data from this experiment.

TABLE 11.3 Data for Exercise 11.23. Number of faux pas before and after an intervention.

OBS.    VISIT1    VISIT2
1       5         4
2       13        11
3       17        12
4       3         3
5       20        14
6       18        14
7       8         10
8       15        9

Data are fictitious. Data file = FAUXPAS.SAV.

(a)   Calculate the change in the number of faux pas within individuals (i.e., calculate DELTA for each observation).

(b)   Calculate the means and standard deviations for visit 1, visit 2, and their differences (DELTAS).

(c)   Create a stemplot of the differences (DELTAS). Interpret your plot.

(d)   Would you use t-procedures on these data?

(e)   Test the mean decline for statistical significance. Use a two-sided test. Show all hypothesis-testing steps.
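For readers who want to check their hand calculations for Exercise 11.23 with software, the arithmetic of a paired analysis can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes DELTA is defined as VISIT1 minus VISIT2 (so that positive values represent declines), and the variable names are chosen here for illustration only. The resulting t-statistic would still need to be compared with Table C to obtain a P-value.

```python
import math
import statistics

# Data transcribed from Table 11.3 (number of faux pas per student).
visit1 = [5, 13, 17, 3, 20, 18, 8, 15]
visit2 = [4, 11, 12, 3, 14, 14, 10, 9]

# Assumed direction: DELTA = VISIT1 - VISIT2, so a positive value is a decline.
# Subtracting the other way only flips the sign of the t-statistic.
deltas = [before - after for before, after in zip(visit1, visit2)]

n = len(deltas)
mean_delta = statistics.mean(deltas)
sd_delta = statistics.stdev(deltas)   # sample standard deviation (n - 1 denominator)
se_delta = sd_delta / math.sqrt(n)    # estimated standard error of the mean difference
t_stat = mean_delta / se_delta        # test statistic for H0: mean difference = 0
df = n - 1                            # degrees of freedom for the paired t-procedure

print(f"deltas = {deltas}")
print(f"mean = {mean_delta:.3f}, sd = {sd_delta:.3f}, se = {se_delta:.3f}")
print(f"t_stat = {t_stat:.2f} with {df} df")
```

The sketch works only on the differences, which is the defining feature of a paired t-procedure: once DELTA has been formed, the analysis is a one-sample problem.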

11.24  Power. A researcher fails to find a significant difference in mean blood pressure in 36 matched pairs. The standard deviation of the differences was 5 mmHg. What was the power of the test to find a mean difference of 2.5 mmHg at α = 0.05 (two sided)?
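A power calculation of the kind requested in Exercise 11.24 can be checked with a few lines of code. The sketch below assumes the usual Normal approximation for a two-sided test, power ≈ Φ(−z + |Δ|√n / σ), where z is the 1 − α/2 quantile of the Standard Normal distribution; it ignores the negligible contribution of the opposite tail. The function name `approx_power` is hypothetical and introduced only for this illustration.

```python
import math
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample (or paired) test.

    delta: mean difference worth detecting
    sigma: standard deviation of the observations (or of the paired differences)
    n:     number of observations (or pairs)
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96 when alpha = 0.05
    shift = abs(delta) * math.sqrt(n) / sigma           # difference in standard error units
    return NormalDist().cdf(-z_crit + shift)


# Values taken from Exercise 11.24: 36 pairs, sd of differences 5 mmHg,
# difference worth detecting 2.5 mmHg, alpha = 0.05 (two sided).
power = approx_power(delta=2.5, sigma=5, n=36)
print(f"approximate power = {power:.2f}")
```

Varying `n` or `delta` in the call shows how quickly power changes with sample size and with the size of the difference being sought.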

11.25  Beware α = 0.05. Two trials looked at red wine consumption in lowering cholesterol levels in hypercholesterolemic men. In each trial, 25 men consumed 8 ounces of red wine for 14 days.

(a)   In trial A, the 25 subjects lowered their cholesterol by an average of 5% (standard deviation = 11.9%). In testing H0: μ = 0, tstat = 2.10 with 24 df. Is this study statistically significant at α = 0.05 (two sided)?

(b)   In trial B, 25 different subjects lowered their cholesterol by 5% with standard deviation 12.2% (tstat = 2.05 with 24 df). Is this result statistically significant at α = 0.05?

(c)   Is it reasonable to come to different conclusions for trial A and trial B?
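The two t-statistics quoted in Exercise 11.25 can be reproduced from the summary statistics alone. The following sketch is illustrative only; the helper name `t_from_summary` is hypothetical, and judging significance still requires comparing each result with the appropriate critical value from Table C.

```python
import math


def t_from_summary(mean_change, sd, n, mu0=0.0):
    """One-sample t-statistic computed from a mean, standard deviation, and sample size."""
    se = sd / math.sqrt(n)            # estimated standard error of the mean
    return (mean_change - mu0) / se   # distance from the null value in standard errors


# Summary statistics as stated in the exercise (percentage reduction in cholesterol).
t_a = t_from_summary(mean_change=5.0, sd=11.9, n=25)   # trial A
t_b = t_from_summary(mean_change=5.0, sd=12.2, n=25)   # trial B

print(f"trial A: t_stat = {t_a:.2f} with 24 df")
print(f"trial B: t_stat = {t_b:.2f} with 24 df")
```

Seeing the two statistics side by side makes it clear how little the trials differ, which is the point of part (c).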

11.26  Benign prostatic hyperplasia, quality of life. Benign prostatic hyperplasia is a noncancerous enlargement of the prostate gland that adversely affects the quality of life of millions of men. A study of a minimally invasive procedure for the treatment of this condition looked at pretreatment quality of life (QOL_BASE) and quality of life after 3 months on treatment (QOL_3MO). Table 11.4 lists data for 10 subjects chosen at random from this study.

TABLE 11.4 Data for Exercises 11.26 and 11.27. Variables are as follows:

QOL_BASE = quality of life at baseline (coded 0 = Delighted, 1 = Pleased, 2 = Mostly Satisfied, 3 = Mixed, 4 = Mostly Dissatisfied, 5 = Unhappy, 6 = Terrible)
QOL_3MO = Quality of life after 3 months of treatment (same codes)
MAXFLO_B = maximum urine flow at baseline (urine flow measurement scale misplaced)
MAXFLO3M = maximum urine flow after 3 months of treatment

Source: Simple random sample of a data set provided by student Joanne Morales. Data are stored online in BPH-SAMP.SAV.

(a)   Calculate differences in quality of life scores (DELTA) for each subject.

(b)   Explore the differences with a stemplot. Discuss your exploration.

(c)   Calculate the mean and standard deviation of the difference. Then test the mean difference for statistical significance. Use a two-sided alternative hypothesis.q

11.27  Benign prostatic hyperplasia, maximum flow. Table 11.4 also contains data for maximum urine flow at baseline (MAXFLO_B) and maximum urine flow after 3 months of treatment (MAXFLO3M). Test the mean difference in this outcome for statistical significance.

11.28  NASA experiment. A NASA study compared two methods of determining white blood cell counts in laboratory animals. Table 11.5 lists results for 42 paired observations. Calculate DELTA values for each observation and plot these differences as a stemplot. Based on this plot, do you think the methods are interchangeable?

TABLE 11.5 Data for Exercise 11.28. White blood cell counts (×1000 dL) by Celdyne method and Unopett method, n = 42.

Source: Data from student Adam Seddiqi. Data stored online in the file SEDDIQ.SAV.

11.29  Therapeutic touch.r Proponents of an alternative medical treatment known as therapeutic touch claim that each person has a human energy field (HEF) that can be perceived and manipulated by touch. Therapists trained to recognize HEF-related perceptions are said to be particularly adept at manipulating HEFs. In an experiment that started out as a fourth-grade science fair project, therapeutic touch practitioners were tested under blind conditions to see whether they could correctly identify whether the HEF of an unseen hand hovered over their left or right hand (Figure 11.7). Fifteen therapeutic touch therapists underwent an initial set of 10 trials each. If HEF perception through therapeutic touch were possible, the therapists should each have been able to detect the experimenter’s hand in 10 (100%) of 10 trials. Chance alone would produce a mean score of 5 (of 10). However, the n = 15 touch therapists correctly identified the location of the hand an average of 4.67 times (standard deviation 1.74). Calculate a 95% confidence interval for the mean number of correct identifications of the HEF. Is the confidence interval compatible with random guessing?

FIGURE 11.7 Experimenter (right) hovers hand over one of the therapeutic touch practitioner’s hands (left). The towel in the picture blinds the observation, preventing the therapeutic touch practitioner from seeing the location of the experimenter’s hand. Drawing by Pat Linse. Published with the permission of the artist and Skeptic magazine.
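The confidence interval requested in Exercise 11.29 follows the pattern "point estimate ± t critical value × standard error." A minimal sketch of that arithmetic is given below. The critical value of 2.145 for 14 df at 95% confidence is entered by hand as a table lookup, which is an assumption of this sketch rather than something computed by the code, and the variable names are illustrative.

```python
import math

# Summary statistics as stated in Exercise 11.29.
n = 15
mean_correct = 4.67
sd = 1.74

# Assumed table lookup: t critical value for df = n - 1 = 14 and 95% confidence.
t_crit = 2.145

se = sd / math.sqrt(n)        # estimated standard error of the mean
margin = t_crit * se          # margin of error
lower = mean_correct - margin
upper = mean_correct + margin

print(f"se = {se:.3f}, margin of error = {margin:.3f}")
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Whether the chance value of 5 falls inside the printed limits is the comparison the exercise asks the reader to interpret.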

11.30  Therapeutic touch, n = 28. This exercise is an extension of Exercise 11.29. We add 13 observations to the initial 15, bringing the total sample size to 28. Each observation consists of 10 attempts to identify a human energy field, as previously discussed (see Figure 11.7). The number of correct identifications out of 10 was {1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8}.

(a)   Plot the data as a stemplot. Are there any clear departures from Normality? Can you use t-procedures on these data? Explain your reasoning.

(b)   Provide a 95% confidence interval for the mean number of correct identifications.

______________

a Perhaps it would be more accurate to call $s/\sqrt{n}$ the estimated standard error, but this distinction often gets lost in practice.

b Langenberg, C., Shipley, M. J., Batty, G. D., & Marmot, M. G. (2005). Adult socioeconomic position and the association between height and coronary heart disease mortality: Findings from 33 years of follow-up in the Whitehall Study. American Journal of Public Health, 95(4), 628–632.

c Student. (1908). The probable error of a mean. Biometrika, VI, 1–25.

d The standard deviation of a t-distribution is actually a little more than one, but this won’t be detectable in a sketch. The standard deviation of a t-distribution is $\sqrt{df/(df - 2)}$ for df > 2. For example, a t-distribution with 10 df has $\sigma = \sqrt{10/8} \approx 1.12$.

e Because t tables do not have negative t-values, you must use your knowledge about the symmetry of the curve to determine lower percentile points.

f See, for example, www.stat.tamu.edu/applets/tdemo.html.

g Abramson, J. H. (2004). WINPEPI (PEPI-for-Windows): Computer programs for epidemiologists. Epidemiologic Perspectives & Innovations, 1(1), 6.

h You lose the 1 degree of freedom in using s as an estimate for σ.

i CDC. (2006). About BMI for Adults. Retrieved on July 15, 2006, from www.cdc.gov/NCCdphp/dnpa/bmi/adult_BMI/about_adult_BMI.htm.

j Table C lists confidence levels for t random variables in its bottom row.

k When creating the DELTA variable, it makes no difference which sample is subtracted from which. You must, however, be consistent and keep track of the direction of differences.

l de la Fuente-Fernandez, R., Ruth, T. J., Sossi, V., Schulzer, M., Calne, D. B., & Stoessl, A. J. (2001). Expectation and dopamine release: Mechanism of the placebo effect in Parkinson’s disease. Science, 293(5532), 1164–1166.

m This refers to the fact that the results remain substantially true even when the condition is not perfectly met.

n Skewed distributions may be mathematically transformed to Normalize the distribution (Sections 7.1 and 7.4).

o You may need to estimate σ with a value of s from published sources or a pilot investigation.

p Lachin, J. M. (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2(2), 93–113.

q Because it may have been imprudent to use a t-procedure in this instance, I redid the analysis with a nonparametric (Wilcoxon signed rank) test, which does not require Normality. This test yielded P = 0.014, leading to a conclusion similar to that of the t-procedure.

r Rosa, L., Rosa, E., Sarner, L., & Barrett, S. (1998). A close look at therapeutic touch. JAMA, 279(13), 1005–1010.