12.5 Statistical Testing

Learning Objectives

After Chapter 12.5, you will be able to:

- Compare and contrast the null and alternative hypotheses
- Interpret a p-value relative to a chosen significance level (α)
- Distinguish type I errors, type II errors, power, and confidence
- Construct and interpret a confidence interval from a sample mean and standard deviation

Hypothesis testing and confidence intervals allow us to draw conclusions about populations based on our sample data. Both are interpreted in the context of probability and of what we deem an acceptable risk of error.

Hypothesis Testing

Hypothesis testing begins with an idea about what may be different between two populations. We have a null hypothesis, which is always a hypothesis of equivalence. In other words, the null hypothesis says that two populations are equal, or that a single population can be described by a parameter equal to a given value. The alternative hypothesis may be nondirectional (that the populations are not equal) or directional (for example, that the mean of population A is greater than the mean of population B).

The most common hypothesis tests are z- and t-tests, which rely on the standard normal distribution or the closely related t-distribution. From the data collected, a test statistic is calculated and compared to a table to determine the probability of obtaining a statistic at least that extreme by random chance alone, under the assumption that the null hypothesis is true. This probability is our p-value. We then compare the p-value to a significance level (α); 0.05 is commonly used. If the p-value is greater than α, we fail to reject the null hypothesis and conclude that there is no statistically significant difference between the two populations. If the p-value is less than α, we reject the null hypothesis and state that there is a statistically significant difference between the two populations. In other words, results are termed statistically significant only when the null hypothesis is rejected.
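
To see these steps in code, here is a minimal sketch in Python using scipy; the two samples, the equal-variance assumption, and the two-sided independent t-test are illustrative choices, not data from the text.

```python
# Minimal sketch of a two-sample t-test; the data below are
# hypothetical values invented for illustration.
from scipy import stats

sample_a = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]  # hypothetical measurements
sample_b = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0]  # hypothetical measurements

alpha = 0.05  # significance level

# Two-sided t-test of the null hypothesis that the population means are equal
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis "
          "(statistically significant difference)")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```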

Incorrectly rejecting a true null hypothesis is called a type I error, and the value of α is the level of risk we are willing to accept of making one. In other words, a type I error is the likelihood that we report a difference between two populations when one does not actually exist. A type II error occurs when we incorrectly fail to reject a false null hypothesis. In other words, a type II error is the likelihood that we report no difference between two populations when one actually exists. The probability of a type II error is symbolized by β. The probability of correctly rejecting a false null hypothesis (reporting a difference between two populations when one actually exists) is referred to as power, and is equal to 1 − β. Finally, the probability of correctly failing to reject a true null hypothesis (reporting no difference between two populations when none exists) is referred to as confidence, and is equal to 1 − α. These conditions are summarized in Table 12.1.

Table 12.1. Results of Hypothesis Testing

                              Truth About the Population
Conclusion Based on Sample    H0 true (no difference)    Ha true (difference exists)
Reject H0                     Type I error (α)           Power (1 − β)
Fail to reject H0             Confidence (1 − α)         Type II error (β)
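
The quantities in Table 12.1 can also be estimated empirically. The sketch below is a hypothetical Monte Carlo check (the sample size, means, and trial count are arbitrary choices for illustration): when the two populations are identical, the fraction of rejections approximates α; when they truly differ, it approximates the power, 1 − β.

```python
# Monte Carlo estimate of type I error rate and power; all parameters
# are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

def rejection_rate(mean_b):
    """Fraction of trials in which the t-test rejects H0: means are equal."""
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)       # population A: mean 0, SD 1
        b = rng.normal(mean_b, 1.0, n)    # population B: mean mean_b, SD 1
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / trials

# H0 true (identical populations): rejection rate estimates alpha (~0.05)
print("Estimated type I error rate:", rejection_rate(0.0))
# Ha true (means differ by 0.5 SD): rejection rate estimates power (1 - beta)
print("Estimated power:", rejection_rate(0.5))
```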

Confidence Intervals

Confidence intervals are essentially the reverse of hypothesis testing. With a confidence interval, we determine a range of values around the sample mean. Rather than finding a p-value, we begin with a desired confidence level (95% is standard) and use a table to find the corresponding z- or t-score. Multiplying the z- or t-score by the standard deviation (strictly speaking, by the standard error of the mean, which is the standard deviation divided by the square root of the sample size; we use the standard deviation here for simplicity), and then adding and subtracting this product from the mean, creates the range of values. For example, consider a population for which we wish to know the mean age. We draw a sample from that population and find that the sample mean is 30, with a standard deviation of 3. If we wish to have 95% confidence, the corresponding z-score (which would be provided on Test Day) is 1.96. Thus, the range runs from 30 − (1.96)(3) = 24.12 to 30 + (1.96)(3) = 35.88. We can then report with 95% confidence that the true mean age of the population from which this sample was drawn lies between 24.12 and 35.88.
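
As a quick check, the following Python sketch reproduces the interval above, using scipy to look up the critical z-score instead of a table.

```python
# Reproduces the 95% confidence interval from the example above,
# following the text's simplification of multiplying by the standard deviation.
from scipy import stats

mean, sd = 30, 3          # sample mean and standard deviation from the example
confidence = 0.95

# Two-tailed critical z-score: 2.5% in each tail -> ppf(0.975) ~ 1.96
z = stats.norm.ppf(1 - (1 - confidence) / 2)

lower, upper = mean - z * sd, mean + z * sd
print(f"z = {z:.2f}; {confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
# -> z = 1.96; 95% CI: (24.12, 35.88)
```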