CHAPTER 11

Inferential Statistics


CHAPTER OBJECTIVES

By the end of this chapter, students will be able to:

1.  Discuss the need for statistics to explain data.

2.  Describe the differences between scientific hypothesis, research questions, null hypothesis, and alternative hypothesis.

3.  Explain the difference between one-tailed and two-tailed inferential statistical tests.

4.  Describe when it is appropriate to use an independent t-test and when it is appropriate to use a paired-samples t-test.

5.  Explain the purpose of correlation coefficients.

6.  Create a scenario to explain the benefits of confidence intervals.

7.  Define type I and type II errors.


KEY TERMS

alternative hypothesis

chi-square

coefficients

confidence intervals

correlations

inferential statistics

null hypothesis

one-tailed tests

t-tests

two-tailed tests

type I and type II errors

INTRODUCTION

This chapter begins with why we need to use statistics to understand data. The first section defines the concepts of the scientific hypothesis, null hypothesis, and alternative hypothesis. The next section explains the difference between one-tailed and two-tailed inferential statistical tests. Building upon this knowledge, various statistical tests are introduced using multiple examples. The chapter concludes by defining type I and type II errors. Because it is a common belief among students that learning statistics is difficult, this chapter attempts to present the new terms and definitions as clearly as possible with plenty of applied examples. If at any time, a concept is not understood, stop and review the previous section. Using this technique builds a foundation of basic definitions of statistics for future use.

TYPES OF STATISTICS

Statistics is a branch of mathematics concerned with the collection, analysis, and interpretation of data.1 Statistics are used in evaluation and research across most disciplines, including natural science, social science, business, and government. Statistics are used to answer questions related to data. There are two basic types of statistics: descriptive and inferential. Descriptive statistics organize and summarize data without the use of complicated mathematical equations. In this chapter, inferential statistics are defined in detail. For now, think of inferential statistics as the process of drawing conclusions about a population from a random sample of data taken from that population. For example, because it is impossible to study the driving habits of every driver in one county, evaluators survey a random sample of drivers and generalize about all drivers in that county based on the sample. Unlike descriptive statistics, inferential statistics use mathematical equations to generate probabilities about a population. These probabilities help program planners and evaluators make important decisions related to their proposed goals and objectives. Using Microsoft Excel, this chapter presents a few basic equations or statistical tests as a way to illustrate how inferential statistics are applied to data.

THE NEED FOR STATISTICS

Let’s explore why statistics are important to use and understand. For evaluators, statistics are essential for understanding the data. Without using inferential statistics, it is not possible to determine patterns, make associations, and draw conclusions from the data collected from the sample. For example, without using inferential statistics, evaluators would have no way of knowing if one community-based participatory program was more effective in reducing obesity rates than another community-based participatory program. Evaluators could not see patterns in indoor air quality and asthma rates in a community. Evaluators would not know which type of smoking cessation program was most effective in reducing smoking rates among young adults. Engineers would not know which model of automobile seatbelt was most effective in saving lives. Manufacturers would not know the level of sun protection factor (SPF) needed to provide adequate protection against sun exposure. The list goes on and on, but it is evident that inferential statistics are used to move science forward and thus improve quality of life. Lastly, it is important to note that statistics not only let us know if there are associations and patterns with which to draw conclusions, but also let us know how likely it is that those associations and conclusions are due to chance or random error within a particular sample. Although it involves more sophisticated statistical techniques than are discussed in this text, higher-level statistics inform evaluators not only that a change took place, but also, sometimes, how the change took place.

INFERENTIAL STATISTICS

Now let’s take the next step in describing inferential statistics in greater detail. As with all evaluations, it is necessary to begin with a question or a goal statement. Inferential statistics are no different in that each statistical project begins with a question. Think of the question as the road map. Travelers begin each trip by determining their destination. Whether traveling by car, train, bus, plane, or ship, travelers plot their journey by purchasing tickets or programming the GPS device. The same is true in evaluations. Evaluators do not collect data without first developing a question or roadmap for their study.

For example, suppose you were part of a team that wanted to ask, “Do university staff (nonfaculty or administration) employees who intentionally walk around the building every few hours report having improved concentration, improved productivity, less desire for caffeine, and less desire for snacks than those staff employees who remain at their desks for most of the day?” This university has two campus locations of approximately equal size: North Campus and South Campus. As previously stated, it is not feasible to study an entire population of university staff employees, so evaluators collect data from a randomly selected sample of the population. As the name implies, evaluators use inferential statistics to “infer,” or understand, what the sample data reveal about the larger population. For example, let’s suppose that there are about 4000 university staff employees on each of the two campus locations, and of that population approximately 1500 staff employees at each campus have job titles that imply that they sit at a desk most of the time. Because it is impossible to recruit all 3000 employees with desk job titles, the evaluators randomly recruit 300 staff employees with job titles that imply desk jobs, 150 from each campus. After signing the institutional review board (IRB)–approved informed consent documents, the 150 staff employees on the North Campus are asked to complete a monthly survey about their perceived concentration level at work, productivity level, desire for caffeine, and desire for snacks, and the 150 staff employees on the South Campus are encouraged to walk around the building for 15 minutes three times per day and complete the same monthly survey.
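The random recruitment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee ID lists, the fixed seed, and the variable names are hypothetical and are not part of the study.

```python
import random

# Hypothetical ID lists: 1,500 desk-job staff employees per campus.
north_desk_staff = [f"N{i:04d}" for i in range(1, 1501)]
south_desk_staff = [f"S{i:04d}" for i in range(1, 1501)]

rng = random.Random(42)  # fixed seed (an assumption) so the draw can be repeated

# Simple random sample without replacement: 150 employees from each campus.
north_sample = rng.sample(north_desk_staff, 150)
south_sample = rng.sample(south_desk_staff, 150)

print(len(north_sample) + len(south_sample))  # 300 recruited employees
```

Because every desk-job employee on a campus has the same chance of being drawn, the sample is more likely to resemble the population it came from.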

Prior to the initiation of the actual study, evaluators use inferential statistics to determine if the recruited staff employees have the same characteristics as most other non-recruited staff employees. For example, evaluators would compare the age, gender, weight and height, length of service, and job titles of both groups. This type of information allows evaluators to determine if their study sample is representative of the population of all university staff employees. It is important to make sure that the sample is representative of the entire population, because you want any improvement in concentration and productivity, or any decrease in desire for caffeine and snacks, to reflect the effect of walking for 15 minutes three times a day rather than the particular characteristics of the university staff employees included in the evaluation.

After the data are collected, the evaluators use inferential statistics to infer whether the walking: (1) improves concentration, (2) improves productivity, (3) decreases desire for caffeine, and (4) decreases desire for snacks as perceived by the staff employees. This simple example illustrates how evaluators move from asking the initial research question, to recruiting individuals, to collecting data, to using inferential statistics to determine if the staff employees at the large university perceive a change in work habits by simply walking for 15 minutes three times per day.

Before moving on to the next section, let’s discuss a little more about statistical tests. Because there are hundreds of statistical tests, evaluators select the appropriate type of statistical test based on their questions and the type of data. By using inferential statistics, they draw conclusions related to the data. However, inexperienced evaluators can easily pick a test from the drop-down menus available in complex statistical computer software programs, and this method does not ensure that the correct statistical test was chosen. Without thoroughly understanding the purpose of various statistical tests, it is easy to receive an incorrect computer-generated result due to choosing the wrong test. For this reason, it is important to consult with a statistician before choosing a test to ensure that the correct statistical test is chosen.

In this section, the discussion defines research questions, null hypothesis, and alternative hypothesis. All of these terms are useful in understanding inferential statistics.

Development of Research Questions

Because evaluation begins with goals and objectives, it is necessary for the evaluator to convert the objectives into research questions. Think of the research questions as defined by the question of interest in the evaluation project. Research questions provide a clear and concise roadmap on which to focus the study. When creating a research question, it is important to address the issue of what topic or evidence is being supported or refuted. Research questions need to be stated as testable questions that can be specifically studied in an investigation. Let’s review how to convert objectives into research questions:

Objective: By May 2015, 100% of the 400 participating children younger than 14 years of age with asthma will report the day and time of their asthma episodes and the pollen index of their geographical location.

Research Question:

Weak: What is the relationship between asthma and air quality?

Strong: For 400 participating children younger than 14 years of age with asthma, is there a relationship between the day and time of their asthma episodes and the pollen index of their geographical location?

Objective: By January 2015, 100% of the 300 elderly residents over the age of 65 who completed the senior safe driving course offered by the State Department of Transportation will report receiving fewer driving citations over the next 12 months.

Research Question:

Weak: Does completing the senior driving course reduce the number of driving citations?

Strong: For the 300 participating elderly residents over the age of 65, is there a relationship between completing the senior safe driving course offered by the State Department of Transportation and the number of driving citations received over the next 12 months?

Null Hypothesis and Alternate Hypothesis

Let’s begin by defining a hypothesis as an educated guess based on prior observation, knowledge, or experience that can be supported or refuted through observations or experiments. Hypotheses make predictions that can be duplicated with future research. After the same research is repeated multiple times, enough evidence is collected to support or disprove the scientific hypothesis.2 For example, it took decades of research to gather enough evidence to link tobacco usage to lung cancer. Now let’s apply the concept of research questions to the term null hypothesis. The word null means no difference or no association. The best way to remember the null hypothesis is to understand that the evaluator is trying to disprove or refute the statement that there is no difference or no association or no relationship. In other words, when using inferential statistics, the way that evaluators support their hypothesis is to refute the null hypothesis.

Remember that the evaluators study a representative sample from the population to find evidence to refute the null hypothesis. You may often hear another term used for the research question, called the alternate hypothesis, when evaluators are discussing the null hypothesis. For example, evaluators must assume their alternate hypothesis is wrong until they find sufficient evidence to the contrary.3 By using inferential statistics, evaluators make a decision to reject or fail to reject the null hypothesis. However, what does that mean? Let’s look at a few examples, so these concepts begin to make sense.

Example 1

Research question: For the 300 participating elderly residents over the age of 65, is there a relationship between completing the senior safe driving course offered by the State Department of Transportation and the number of driving citations received over the next 12 months?

Null hypothesis: There is no difference in the number of driving citations received over a 12-month period for elderly individuals who completed the safe driving course and those who did not complete the safe driving course.

Alternate hypothesis: There is a difference in the number of driving citations received over a 12-month period for the elderly who completed the safe driving course and the elderly who did not complete the safe driving course.

Here is what you need to be thinking when you read this null hypothesis: Using inferential statistics to analyze the data collected from the two groups of elderly drivers, there is enough evidence to reject (or fail to reject) the null hypothesis.

Example 2

Research question: At the end of 3 months on the Lost-It weight management program, do women working the day shift or women working the night shift lose more weight?

Null hypothesis: At the end of 3 months on the Lost-It weight management program, there is no difference in weight loss between women working the day shift and women working the night shift.

Alternate hypothesis: At the end of 3 months on the Lost-It weight management program, there is a difference in weight loss between women working the day shift and women working the night shift.

Here is what you need to be thinking when you read this null hypothesis: Using inferential statistics to analyze the data collected from the day-shift women and night-shift women on the Lost-It weight management program, there is enough evidence to reject (or fail to reject) the null hypothesis.

Let’s try one more example to verify your understanding of the concepts. Research question: At the end of 3 months, do students using the Quick Learn System achieve higher Graduate Record Exam (GRE) practice test scores than students not using the Quick Learn System?

Null hypothesis: At the end of 3 months, there is no difference in the GRE practice test scores between students using the Quick Learn System and the students not using the Quick Learn System.

Alternate hypothesis: At the end of 3 months, there is a difference in the GRE practice test scores between students using the Quick Learn System and the students not using the Quick Learn System.

Here is what you need to be thinking when you read this null hypothesis: Using inferential statistics to analyze the data collected from the students using the Quick Learn System and students not using the Quick Learn System, there is enough evidence to reject (or fail to reject) the null hypothesis.

Why can’t evaluators state that they “accept” the null hypothesis? Why is the phrase “fail to reject” used instead of “accept”? It is easier to understand the answer to this question by looking back at Example 2. Rejecting the null hypothesis means that the evaluation provided evidence to support the notion that there is a difference in weight loss between day-shift female workers and night-shift female workers, at the end of 3 months on the Lost-It weight management program, that is not due to chance alone. Failing to reject the null hypothesis means that the research failed to provide evidence of a difference in weight loss between day-shift female workers and night-shift female workers at the end of 3 months. It does not prove that no difference exists. Therefore, when the null hypothesis is rejected, the evidence from the data analysis does not support the null hypothesis.4,5 Evaluators should never “accept” the null hypothesis. Doing so would say that they are 100% sure of the null hypothesis in all situations.

BASIC INFERENTIAL STATISTICAL TESTS

One-Tailed and Two-Tailed Statistical Tests

Now that you are starting to understand when to reject or fail to reject the null hypothesis, it is time to introduce the terms one-tailed and two-tailed tests. Let’s begin with stating that evaluators determine whether to use a one-tailed or two-tailed test when they state the null hypothesis. Even though this discussion is placed before the sections on specific tests, the majority of statistical tests allow evaluators to select the use of a one-tailed test or a two-tailed test. This discussion begins by defining a one-tailed test.

One-Tailed Test

Evaluators use a one-tailed test when their hypothesis reflects a specific direction. See Figure 11-1. The white area shows the 95% of all values that, if obtained, lead evaluators to fail to reject the null hypothesis. The gray area shows the 5% of possible values that would reject the null hypothesis, also called the critical area. The critical value shown with an arrow is discussed in detail later in the chapter. One-tailed tests may have the critical value and area on either the far left side or the far right side, depending on the null hypothesis, but only on one side of the normal curve. If evaluators select a one-tailed test, the obtained or calculated value might fall on the extreme positive or extreme negative side, depending on what the evaluator is studying. Because it is possible for evaluators to falsely reject the null hypothesis, evaluators select a one-tailed test only when they have reason to believe that the difference falls in a specific direction.

FIGURE 11-1 One-tailed statistical test.

Null hypothesis: When using the Quick Learn System, there is no increase in the GRE practice test scores of the students.

Alternate hypothesis: When using the Quick Learn System, there is an increase in the GRE practice test scores of the students.

Evaluators stating this null hypothesis are interested in only the GRE practice test scores when using the Quick Learn System. Evaluators conducting this study are confident that the Quick Learn System improves the GRE practice test scores for students. In this example, evaluators must be so confident in the increased GRE practice test scores that they choose to ignore the possibility that some students using the Quick Learn System may get more confused and may actually decrease their GRE practice test scores and thus fall outside the critical area. Without high confidence based on previous evaluation results, using a one-tailed test is a serious mistake. For example, students may use the Quick Learn System and their test scores may decrease. Evaluators need to be certain that the Quick Learn System is not just based on sales hype and does not cause a decreased GRE practice test score prior to using a one-tailed test.

Two-Tailed Test

When using a two-tailed test, the normal curve is divided into three sections: the large middle section represents 95% of all possible values and each of the two side sections represents 2.5% of the possible values. The total of the possible values represented underneath the normal curve should be equal to 100%. Figure 11-2 illustrates what is called a two-tailed test, because the 5% chance is divided equally into 2.5% in the two tails of the curve.
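To see how splitting the 5% changes the cutoff, the short sketch below finds the critical values on a standard normal curve (mean 0, standard deviation 1) using Python's standard library. The use of the standard normal curve and the variable names are assumptions made for illustration; the chapter's own critical values come from the tables for each specific test.

```python
from statistics import NormalDist

alpha = 0.05  # the 5% critical area described in the text
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: all 5% sits in one tail (here, the right tail).
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

# Two-tailed: 2.5% in each tail, so the cutoff moves farther from the center.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_cutoff, 3))  # 1.645
print(round(two_tailed_cutoff, 3))  # 1.96
```

The two-tailed cutoff is larger, which is why an obtained value has to be more extreme to reject the null hypothesis when the direction of the difference is not specified in advance.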

FIGURE 11-2 Two-tailed statistical test.

Using Excel for Statistics

Now it is time to use Microsoft Excel to learn a few basic statistical calculations. First, it is necessary to add the Excel Analysis ToolPak to your software by using the directions provided in Figure 11-3. Once you have added the Analysis ToolPak, it is easier to follow the examples in the remainder of this chapter. Let’s begin by introducing a statistical test called the chi-square test. The symbol χ² is used in the literature to refer to this statistical test.

Chi-Square Test

The type of data used for a chi-square test is called nominal, because it categorizes the data. Examples of nominal data are ZIP codes, gender, types of trees, names of cars, and so on. Nominal data are also called categorical data, because evaluators can place the data into categories. Evaluators use the chi-square test to analyze nominal data, such as frequencies.5

FIGURE 11-3 Installing Excel Analysis ToolPak.

Used with permission from Microsoft.

There are several types of chi-square statistical tests. This discussion explains the one-sample chi-square or “goodness-of-fit” test. The one-sample chi-square test requires nominal or categorical data for two variables. Each variable has two, three, or four levels. For this example, one variable is dichotomous and offers only two responses: yes or no. The second variable is related to age. The categorical data must be independent. Independence is when there is no chance that one responding individual could correctly respond to two response choices for the same survey questions. Review the following survey questions:

1.  At what age did your child complete the required series of immunizations?

a. 6 months to 12 months

b. 12 months to 24 months

c. 24 months or later

2.  Please mark the group that best describes your child’s age:

a. 6 months to 11 months

b. 12 months to 23 months

c. 24 months or older

The first example is not independent or mutually exclusive, because a responding individual could correctly mark (a) and (b) for completion of the required immunizations at 12 months or mark (b) and (c) for completion of the required immunizations at 24 months. The violation of independence does not allow evaluators to know how many respondents are incorrectly placed in the wrong category for analysis.5 The second example shows no violation of independence because there is no overlap.
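The overlap problem can be checked mechanically. The sketch below is a hypothetical helper, not part of the survey itself. It represents each response choice as an inclusive range of months and lists every choice that a given age would fit.

```python
# Response choices as inclusive (low, high) ranges in months; None = open-ended.
question_1 = [(6, 12), (12, 24), (24, None)]  # overlapping wording
question_2 = [(6, 11), (12, 23), (24, None)]  # non-overlapping wording


def matching_choices(age_months, choices):
    """Return the indexes of every choice that contains the given age."""
    return [
        i
        for i, (low, high) in enumerate(choices)
        if age_months >= low and (high is None or age_months <= high)
    ]


print(matching_choices(12, question_1))  # [0, 1] -> two valid answers
print(matching_choices(12, question_2))  # [1]    -> exactly one answer
```

Any age that returns more than one index reveals a pair of choices that are not mutually exclusive.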

Before looking at the actual data, let’s review the null hypothesis for this survey question:

Null hypothesis: There is no difference between the numbers of individuals in each of the three age groups. Alternate hypothesis: There is a difference between the numbers of individuals in each of the three age groups.

Note that the above alternate hypothesis does not state how big a difference there is between the three groups, just that there is a difference. Now let’s look at some actual data entered into Excel for 60 individuals who responded to the survey question (see Figure 11-4). The codebook for this survey is as follows: 1 = 6 to 11 months; 2 = 12 to 23 months; and 3 = 24 months or older. Table 11-1 shows the data in three groups.

TABLE 11-1 Actual Data Shown by Age

FIGURE 11-4 Survey codebook data.

Used with permission from Microsoft.

TABLE 11-2 Calculation of Chi-Square

Using these data, evaluators want to know if the ages of responding individuals are equally distributed, so they conduct a chi-square test to answer this question. Of course, with this example, the reader can merely look at the numbers to see that the distribution is uneven. However, the purpose of the simple example is to explain the process. When using real datasets, the process is the same, but the datasets are much larger, and it is not as evident if there is a difference. The first column of Table 11-2 matches the actual or observed data from Table 11-1. The second column shows what the evaluators expect if the 60 individuals were evenly distributed, or 20 individuals in each category. Conduct the calculations shown in the last column to verify that you understand the equations.
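The calculation can also be sketched in Python. The observed counts below are hypothetical and are not the values from Table 11-1; they only illustrate the general formula, in which the chi-square value is the sum over all categories of (observed − expected)² divided by expected.

```python
# Hypothetical observed counts for the three age groups (60 respondents).
# These are NOT the values from Table 11-1.
observed = [24, 18, 18]

# Under the null hypothesis, respondents are spread evenly: 60 / 3 = 20 each.
expected = [sum(observed) / len(observed)] * len(observed)

# Chi-square = sum over categories of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_square, 2))  # 1.2
```

Each category contributes its own term, so a single badly mismatched category can produce a large chi-square value on its own.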

The next step is a bit confusing, so read this section carefully. Most statistical textbooks have an appendix that provides a variety of tables called critical value tables that correspond to specific types of statistical tests. Although how the critical values are calculated is beyond the scope of this chapter, these values are percentage points used in inferential statistics to reject or fail to reject the null hypothesis.6 The critical value tables are provided in this chapter for ease of understanding. Critical value tables may also be easily found on the Internet. For the chi-square test, evaluators refer to the Table of Critical Chi-Square Values (see Table 11-3).7

To use Table 11-3, follow these steps:

1.  The first step in reading Table 11-3 is to determine the degrees of freedom. This concept is based on a mathematical formula that is beyond the scope of this introductory chapter. For this chapter, degrees of freedom are defined as the number of choices (e.g., 6 to 11 months, 12 to 23 months, and 24 months or older in this example) minus 1 for the overall category of age. In this example, there are three age group choices and one group called the age variable, so 3 – 1 = 2 degrees of freedom.8

2.  Look at the far left column and locate 2 degrees of freedom. The critical chi-square value is 5.9915.

3.  Decision: Look back at Table 11-2 to find the obtained χ² value of 1.23.

4.  Compare the obtained χ² to the critical χ² value of 5.9915. The obtained value is less than the critical value. Therefore, evaluators fail to reject the null hypothesis. In other words, there is no statistically significant difference between the numbers of individuals in each of the three age groups. In chi-square tests, the obtained value must be larger than the critical value in order to reject the null hypothesis.

Let’s try another example to confirm your understanding of chi-square tests. The following example asks the same survey question, but yields different responses from the different participating individuals (see Table 11-4 and Table 11-5).

Once again, calculate the degrees of freedom as 3 – 1 = 2. Look back at Table 11-3 at 2 degrees of freedom and find the critical chi-square value of 5.9915.

Decision: Compare the obtained χ² value of 30.14 to the critical χ² value of 5.9915. The obtained value is greater than the critical value. Therefore, evaluators reject the null hypothesis. There is a statistically significant difference in the age group of individuals who answer the survey.
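The decision rule used in both examples can be written out as a short sketch. It relies on a shortcut that holds only for 2 degrees of freedom, where the upper-tail probability of the chi-square distribution is exp(−x/2). The 5% significance level is assumed here because it reproduces the critical value of 5.9915 quoted above, and the function name is hypothetical.

```python
import math

alpha = 0.05  # assumed significance level

# Shortcut valid ONLY for 2 degrees of freedom: the chi-square upper-tail
# probability is exp(-x / 2), so the critical value is -2 * ln(alpha).
critical_value = -2 * math.log(alpha)


def decide(obtained, critical):
    """Apply the decision rule: reject only if obtained exceeds critical."""
    return "reject" if obtained > critical else "fail to reject"


print(round(critical_value, 4))       # 5.9915
print(decide(1.23, critical_value))   # fail to reject
print(decide(30.14, critical_value))  # reject
```

For any other number of degrees of freedom, the critical value should be read from a critical value table rather than from this shortcut.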

t-Tests

t-tests are used to determine if the mean scores of two groups are statistically different. There are two types of t-tests: independent and paired samples. Independent t-tests look at two groups of individuals (or any other data being measured) examined only once in time. For example, two groups defined by a categorical variable (public high school seniors and private high school seniors) take the SAT (continuous data) one time. The evaluators conduct an independent t-test to determine if there is a statistically significant difference between the mean score of the public school seniors and the mean score of the private school seniors.

Paired-samples t-tests look at one group of individuals tested twice, such as with a pretest and posttest. For example, high school students are given the written driving test on the first day of the driver’s education course to determine their knowledge. A pretest mean score is calculated for the group of high school students. The high school students attend the 6-week driver’s education course, after which they complete a posttest to determine how much information was learned during the driver’s education course. For a paired-samples t-test, the pretest mean score is compared to the posttest mean score to determine if there is a statistically significant difference between the two mean scores. Now let’s explore each of the two types of t-tests in more detail.
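As a rough illustration of the paired-samples idea, the sketch below computes an obtained t-value from pretest and posttest scores. The six pairs of scores are hypothetical, and the calculation uses the standard paired formula (mean of the differences divided by its standard error), which the chapter itself does not present.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pretest and posttest scores for the same six students,
# listed in the same order so each pair belongs to one student.
pretest = [60, 65, 70, 55, 62, 68]
posttest = [72, 70, 78, 66, 70, 75]

# A paired-samples t-test works on the within-student differences.
differences = [post - pre for pre, post in zip(pretest, posttest)]
n = len(differences)
mean_diff = mean(differences)
df = n - 1  # one group measured twice, so n - 1 degrees of freedom

# Obtained t-value: mean difference divided by its standard error.
t_stat = mean_diff / (stdev(differences) / sqrt(n))

print(df, round(t_stat, 2))
```

The key design point is that the two lists must stay aligned: shuffling one of them would pair each pretest with the wrong student's posttest and make the differences meaningless.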

TABLE 11-3 Table of Critical Chi-Square Values

TABLE 11-4 Actual Data Shown by Age

TABLE 11-5 Calculation of Chi-Square

Independent t-Test

Independent t-tests determine if there is a statistical difference between the mean scores for two groups examined only one time. For example, let’s say that evaluators want to know if the Lost-It weight management plan enabled women who work the day shift or women who work the night shift to lose more weight in 12 weeks. As always in inferential statistics, evaluators begin by stating the null hypothesis.

Null hypothesis: There is no difference in weight between female day-shift workers and female night-shift workers at baseline prior to starting the 12-week Lost-It weight management program.

Alternate hypothesis: There is a difference in weight between female day-shift workers and female night-shift workers at baseline prior to starting the 12-week Lost-It weight management program.

Look at Figure 11-5 to review the baseline weight for all the participants in the program. In Excel, look at the top line to locate the formula for calculating the baseline mean score for all participants: fx =SUM(C2:C21)/20. Before entering the formula, place the cursor in the cell where you wish the mean score to appear, then type the formula and press Enter. Now look at row 22 to see that the mean score for all participants is 188.6 pounds at baseline. Although Figure 11-5 familiarizes the reader with Excel formulas, it does not provide a baseline mean score by day shift and night shift. Note that in Figure 11-5, the codebook denotes column A as the case number and column B as the participant’s work shift, with 1 = day-shift female workers and 2 = night-shift female workers.

To calculate the baseline mean weight by work shift, it is necessary to sort the data. To sort in Excel, highlight columns A, B, and C, go to the toolbar and click on Data, then click on Sort. Sort by adding levels as shown in Figure 11-6. Click OK.

Figure 11-6 shows that the evaluators sorted the data, so all the female day-shift workers’ data are grouped first, followed by the female night-shift workers’ data. From here, evaluators use the formula to calculate the mean score provided in Figure 11-7. The female day-shift workers’ mean baseline weight is 187 pounds and the female night-shift workers’ mean baseline weight is 190.2 pounds. The female day-shift workers had a lower baseline weight than the female night-shift workers. However, this may be due to simple chance. Evaluators want to know if the difference between the baseline weight mean scores is statistically significant, so evaluators conduct an independent t-test to reject or fail to reject the null hypothesis.
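For readers working outside Excel, the same group means can be obtained without sorting. The sketch below is one possible approach in Python. The six rows are hypothetical and are not the data from Figure 11-5, although the shift codes follow the chapter's codebook.

```python
from statistics import mean

# Hypothetical rows (case number, work shift, baseline weight in pounds).
# Shift codes follow the chapter's codebook: 1 = day shift, 2 = night shift.
rows = [
    (1, 1, 180.0), (2, 2, 195.0), (3, 1, 190.0),
    (4, 2, 185.0), (5, 1, 185.0), (6, 2, 199.0),
]

# Group the weights by shift code instead of sorting the sheet by hand.
weights_by_shift = {}
for _case, shift, weight in rows:
    weights_by_shift.setdefault(shift, []).append(weight)

group_means = {shift: mean(w) for shift, w in weights_by_shift.items()}
print(group_means)  # {1: 185.0, 2: 193.0}
```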

FIGURE 11-5 Mean baseline weight for all participants.

Used with permission from Microsoft.

To calculate the independent t-test, go to Formulas on the toolbar, then click on More Functions, then Statistical, then TTEST. The box shown in Figure 11-8 appears. In the box, enter B2:B11 in Variable 1 Range (day-shift workers) and B12:B21 in Variable 2 Range (night-shift workers). When you click OK, the results are as shown in Figure 11-9.

FIGURE 11-6 Sorting data in Excel.

Used with permission from Microsoft.

FIGURE 11-7 Baseline mean weight female day- and night-shift workers.

Used with permission from Microsoft.

FIGURE 11-8 Excel and independent t-tests.

Used with permission from Microsoft.

FIGURE 11-9 Results of independent t-test.

Used with permission from Microsoft.

To help the reader understand Figure 11-9, most lines are explained. The lines without explanation are beyond the scope of this chapter.

Line 1: t-Test: Two-Sample Assuming Equal Variances.

Line 3: Variable 1 = day-shift worker data; Variable 2 = night-shift worker data.

Line 4: Mean is the average baseline weight for each group (day-shift and night-shift workers) as calculated in Figure 11-7.

Line 5: Variance.

Line 6: Observations: 10 day-shift and 10 night-shift workers.

Line 7: Pooled Variance.

Line 8: Hypothesized Mean Difference.

Line 9: df (degrees of freedom) is 20 (10 day-shift workers + 10 night-shift workers) observations – 2 groups (day-shift workers and night-shift workers) = 18.

Line 10: t-Stat is –0.2046 and is called the obtained t-value. It is calculated by Excel using a formula. (Note: It is possible to calculate a t-test by hand, but the formula is beyond the scope of this introductory chapter, so the Excel formula is illustrated.)

Line 11: p-value (one tail).

Line 12: t-critical (one tail).

Line 13: p-value (two tail): Calculated by Excel; probability of t-value happening by chance is 0.84013.

Line 14: t-critical (two tail): Look at Table 11-6.

Note: Before proceeding to making a decision about the null hypothesis, it is important to become familiar with Table 11-6. As previously mentioned, there are critical value tables for most types of inferential statistic calculations.

Now let’s follow the steps to make a decision regarding the null hypothesis for the t-test:

1.  The first step in reading Table 11-6 is to determine the number of cases. In this example, there are 20 cases or participants. As previously stated, the statistical term degrees of freedom (df) for the t-test is calculated by subtracting 1 for each group from the total number of cases. In this example, there are two groups (day- and night-shift workers), so 20 – 2 = 18 degrees of freedom.

TABLE 11-6 Table of Critical t-Values

images

2.  Locate 18 degrees of freedom on Table 11-6. The critical t-test value is 2.1009.

3.  Look back at Figure 11-9 to find the calculated or obtained t-value of –0.2046.

4.  Compare the obtained t-value of –0.2046 to the critical t-value of 2.1009.

5.  If the obtained t-value does not exceed the critical value, the evaluators fail to reject the null hypothesis. Evaluators report that the null hypothesis is the best explanation and there is no difference in weight between female day-shift and night-shift workers at baseline prior to starting the 12-week Lost-It weight management program. In other words, the decision is that there is no significant difference between the baseline weight of day- and night-shift workers.

The mean is 187 pounds for day-shift workers and 190.2 pounds for night-shift workers. Because the obtained t-test value of –0.2046 is between –2.1009 and +2.1009, evaluators fail to reject the null hypothesis. There is no statistically significant difference between the baseline weight of the female day-shift and night-shift workers.
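The comparison in steps 4 and 5 can be written as a short rule. The Python sketch below is one way to express it for a two-tailed test; the function name is an assumption made for this illustration, and the numbers are the obtained and critical t-values reported above.

```python
def decide_by_critical_value(obtained_t, critical_t):
    """Two-tailed rule: reject only if the obtained t is more extreme than the critical t."""
    if abs(obtained_t) > abs(critical_t):
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Values reported in the chapter for the first example (18 degrees of freedom).
decision = decide_by_critical_value(-0.2046, 2.1009)
print(decision)
```

Taking the absolute value handles negative t-values such as –0.2046, which is why the sign of the obtained t-value does not change the decision in a two-tailed test.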

Now let’s return to the example, using Figure 11-10. An alternative way to decide whether to reject or fail to reject the null hypothesis is to look at line 13 in Figure 11-9, P(T<=t) two-tail = 0.8401. The P stands for probability value. To reject the null hypothesis, the p-value must be less than 5%, or 0.05. In this example p = 0.8401, which is larger than 0.05, so evaluators fail to reject the null hypothesis because there is not enough evidence against it.

This second example of an independent t-test uses the same null hypothesis with a different dataset.

Null hypothesis: There is no difference in weight between day- and night-shift workers at baseline prior to starting the 12-week Lost-It weight management program. Alternate hypothesis: There is a difference in weight between day- and night-shift workers at baseline prior to starting the 12-week Lost-It weight management program.

Figure 11-11 shows the data. The baseline mean score for 20 day-shift workers is 149.6 pounds and the baseline mean score for 20 night-shift workers is 200.8 pounds.

As with the previous example, in Excel, select the t-Test: Two-Sample Assuming Equal Variance (see Figure 11-12). When you click OK, the results are as shown in Figure 11-13.

To help the reader understand Figure 11-13, each line of interest is explained here:

Line 1: t-Test: Two-Sample Assuming Equal Variance was selected by the researcher.

Line 3: Variable 1 = day-shift female worker data; Variable 2 = night-shift female worker data.

Line 4: Mean is the average baseline weight for day-shift female workers (149.6 pounds) and night-shift female workers (200.75 pounds).

Line 6: Observations: 20 female day-shift workers and 20 female night-shift workers.

FIGURE 11-10 Failing to reject null hypothesis.

images

Data from One- and Two-Tailed Tests from Cliffs Notes. Available at: http://www.cliffsnotes.com/math/statistics/principles-of-testing/one-and-twotailed-tests.

FIGURE 11-11 Mean baseline weight for female day- and night-shift workers.

images

Used with permission from Microsoft.

Line 8: Hypothesized Mean Difference.

Line 9: df (degrees of freedom) is 40 observations (20 day-shift female workers + 20 night-shift female workers) – 2 groups (female day-shift workers and female night-shift workers, each weighed one time) = 38.

Line 10: t-Stat is –4.636, calculated by Excel, and is called the obtained t-value.

Line 11: p-value (one tail).

Line 12: t-critical (one tail).

Line 13: p-value (two tail): Calculated by Excel as 4.113E-05, or 0.00004113, which is the probability of the t-value happening by chance. (Hint: In Excel, E-05 means the decimal point moves five places to the left, so 4.113E-05 is written as 0.00004113.)

Line 14: t-critical (two tail): Look at Table 11-6. The critical t-value is 2.024.
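Scientific notation such as E-05 is easy to misread, so it can help to convert it to an ordinary decimal. The short Python sketch below is an illustration only; it assumes the value Excel displays is 4.113E-05 and writes it out in full before applying the 0.05 rule.

```python
# 4.113E-05 is how a spreadsheet displays a very small number in scientific notation.
p_value = float("4.113E-05")

# Fixed-point formatting shows the same number written out in full.
written_out = f"{p_value:.8f}"
print(written_out)

# The two-tailed decision rule compares the p-value with 0.05.
is_significant = p_value < 0.05
print(is_significant)
```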

Compare the obtained t-value of –4.636 to the critical t-value of 2.024. If the obtained t-value is more extreme than the critical t-value, the evaluators reject the null hypothesis. Evaluators report that there is a significant difference in weight between female day-shift workers and female night-shift workers at baseline prior to starting the 12-week Lost-It weight management program. In other words, evaluators can say that the difference is unlikely to be due to chance.

The mean is 149.6 pounds for female day-shift workers and 200.75 pounds for female night-shift workers. Because the t-test value of –4.636 is outside the range between –2.02439 and +2.02439, evaluators reject the null hypothesis (see Figure 11-14). Because the t-value is in the gray critical area, evaluators have enough evidence to reject the null hypothesis. There is a statistically significant difference between the baseline weight of the female day-shift workers and female night-shift workers.5
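Although the chapter relies on Excel for the calculation, the obtained t-value can be approximated from summary statistics. The Python sketch below assumes the equal-variance (pooled) form of the independent t-test, which is the option selected in Excel, and uses the rounded means from this example together with the standard deviations reported later in the confidence interval discussion (32.14 for day shift and 37.43 for night shift). The function name is hypothetical. Because the inputs are rounded, the result is close to, but not exactly, the –4.636 shown by Excel.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) independent t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Day shift first, night shift second, using the rounded values reported in the chapter.
t_stat, df = pooled_t_statistic(149.6, 32.14, 20, 200.75, 37.43, 20)
print(round(t_stat, 2), df)
```

The t-value is negative only because the day-shift mean is entered first and is the smaller of the two means.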

Now it is time for you to practice this new skill with another set of data. It is important to practice using Excel.

Null hypothesis: There is no difference in satisfaction survey scores between the online (online = 1) and telephone (telephone = 2) methods of how individuals purchased Affordable Care Act (ACA) health insurance. Alternate hypothesis: There is a difference in satisfaction survey scores between the online and telephone methods of how individuals purchased Affordable Care Act (ACA) health insurance.

Let’s test your skills. Using the dataset in Figures 11-15 and 11-16, answer the following questions:

1.  Given that online = 1 and telephone = 2, what is the satisfaction mean score for using the online method and the satisfaction mean score for the telephone method? (Answer: online = 4.4 mean score and telephone = 6.06 mean score.)

2.  Because there are 15 online observations and 15 telephone observations, what is the degree of freedom for this independent t-test? (Answer: 28 degrees of freedom.)

FIGURE 11-12 t-test: Two-sample assuming equal variances.

images

Used with permission from Microsoft.

FIGURE 11-13 Results of Example Two independent t-test.

images

Used with permission from Microsoft.

FIGURE 11-14 Illustration of rejecting the null hypothesis.

images

FIGURE 11-15 Method/satisfaction.

images

Used with permission from Microsoft.

3.  For this independent t-test, assume a two-sample with equal variances. What is the t statistic? (Answer: t-Stat = –2.797.)

4.  For this independent t-test, assume a two-sample with equal variances, what is the t-critical two-tail? (Answer: t-critical two-tail = 2.048.)

5.  For this independent t-test, do you reject or fail to reject the null hypothesis? (Answer: Reject the null hypothesis.)

Paired-Samples t-Test

Paired-samples t-tests compare the mean scores of one group twice. For example, let’s say that evaluators want to know if the Lost-It weight management program was effective in helping day-shift females (column A) lose weight between their initial weigh-in (column B) and at the end of the program after 12 weeks (column C). See Figure 11-17.

Null hypothesis: There is no difference in weight between the baseline weight and the 12th week weight for day-shift workers participating in the Lost-It weight management program.

Alternate hypothesis: There is a difference in weight between the baseline weight and the 12th week weight for day shift workers participating in the Lost-It weight management program.

To help the reader understand Figure 11-18, each line of interest is explained here:

Line 1: t-Test: Paired Two Sample for Means was selected by the researcher.

FIGURE 11-16 t-test: Two-sample assuming equal variances.

images

Used with permission from Microsoft.

FIGURE 11-17 Comparison of baseline and 12-week data.

images

Used with permission from Microsoft.

Line 3: Variable 1 = female pre-weight data; Variable 2 = female post-weight data.

Line 4: Mean is the average pre-weight for day-shift workers (187 pounds) and post-weight (173.9 pounds).

Line 6: Observations: 10.

Line 8: Hypothesized Mean Difference.

Line 9: df (degrees of freedom) is 10 observations (female day-shift workers, each weighed twice: pretest and post-test) – 1 group = 9.

Line 10: t-Stat is 7.694, calculated by Excel, and is called the obtained t-value.

Line 11: p-value (one tail).

Line 12: t-critical (one tail).

Line 13: p-value (two tail): Calculated by Excel as 0.00003018 and is the probability of the t-value happening by chance. (Hint: In Excel, E-05 means the decimal point moves five places to the left.)

Line 14: t-critical (two tail): Look at Table 11-6. The critical t-value is 2.2621.

FIGURE 11-18 Results of paired-samples t-test.

images

Used with permission from Microsoft.

For females, the average baseline weight is 187 pounds and the average weight at the 12th week is 173.9 pounds. The paired-samples t-test yields a t-value of 7.6940. With 9 degrees of freedom, the critical t-value is 2.2621. Look at Figure 11-19 to determine the decision. Because the obtained t-value is in the gray critical area, evaluators reject the null hypothesis. In other words, there is a statistically significant difference between the pre-weight and post-weight for day-shift workers who participated in the Lost-It weight management program.5
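To see what Excel does behind the scenes, a paired-samples t-value can be sketched in a few lines of Python: subtract each person's second score from the first, then divide the mean difference by its standard error. The weights below are hypothetical values for five participants, not the data in Figure 11-17, and the function name is an assumption made for this illustration.

```python
import math


def paired_t_statistic(before, after):
    """Paired-samples t statistic (before minus after) and its degrees of freedom."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = sum(differences) / n
    # The sample variance of the differences uses n - 1 in the denominator.
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Hypothetical weights in pounds for five participants; not the chapter's data.
baseline = [200, 190, 180, 170, 160]
week_12 = [190, 185, 170, 165, 150]
t_stat, df = paired_t_statistic(baseline, week_12)
print(round(t_stat, 3), df)
```

Note that the degrees of freedom equal the number of pairs minus 1, which matches the 10 – 1 = 9 used in the chapter's example.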

FIGURE 11-19 Illustration of paired-samples t-test.

images

Data from One- and Two-Tailed Tests from Cliffs Notes. Available at: http://www.cliffsnotes.com/math/statistics/principles-of-testing/one-and-twotailed-tests.

Lastly, let’s determine whether to reject or fail to reject the null hypothesis based on the p-value. Look at Figure 11-18 and see that the two-tailed p-value is 0.00003018. Because the p-value is less than 0.05, evaluators reject the null hypothesis.

Let’s practice your skills with the paired-samples t-test, using the following example:

Null hypothesis: There is no difference in the 20-question knowledge survey pretest mean score for individuals attending a 6-week diabetic course and the identical 20-question knowledge survey posttest mean score upon completion of the 6-week diabetic course.

Alternate hypothesis: There is a difference in the 20-question survey pretest mean score for individuals attending a 6-week diabetic course and the identical 20-question survey posttest mean score upon completion of the 6-week diabetic course.

Now let’s see if you can correctly answer a few questions related to the dataset (see Figure 11-20):

1.  If the pretest = 1 and the posttest = 2, what is the mean score for both variables? (Answer: Pretest mean score is 12.2 and posttest mean score is 14.8.)

FIGURE 11-20 Pretest/posttest.

images

Used with permission from Microsoft.

2.  Because there are 15 pretest scores and 15 post-test scores, how many degrees of freedom are used? (Answer: 14. It is 14 because for paired-samples t-tests, there is only one variable [knowledge survey] observed twice [pretest and posttest].)

3.  For this paired two samples for mean t-test, what is the t-statistic? (Answer: t-Stat = –7.171.)

4.  For this paired two samples for mean t-test, what is the t-critical two-tail statistic? (Answer: t-Critical two tail = 2.1447.)

5.  Would you reject or fail to reject the null hypothesis? (Answer: Reject the null hypothesis.)

Correlation Coefficients

Statistical tests help tell evaluators if there are relationships or associations between two variables that are measured. For example, does a new type of treadmill affect the leg strength of marathon runners as compared to older models? Is there a relationship between wearing ear-buds daily for a minimum of 4 hours per day during work and the amount of hearing loss after 1 year? Though very important, statistical tests can only tell us if there is or is not a relationship. Correlation coefficients provide more information, such as the strength of the association or whether it has a positive or negative effect on the variables.

Correlation coefficients define a numerical relationship between two continuous variables (also called a Pearson product-moment correlation). There are other types of correlations, but this chapter introduces bivariate correlations that show the relationship between two continuous variables. Let’s begin this discussion by providing a simple example of how correlations are used in everyday life. When parents take their child to the pediatrician, the child is weighed and measured. These data are plotted on the child’s growth chart in his medical records. The pediatrician is interested in knowing if the child is proportional for his age for height and weight. This would help tell the doctor if the child is growing normally as compared to other children of the same age group. If the plot on the growth chart shows the child in the 90th percentile for height and the 40th percentile for weight, pediatricians recommend healthy choices of food in order to add calories to the child’s diet because he is somewhat underweight. If, on the other hand, the percentiles were reversed, pediatricians recommend a reduction in calories because the child’s weight exceeds the recommended weight for the child’s height. At each visit, pediatricians plot the height and weight trends for each child. Pediatricians are more concerned about the child following a consistent pattern over time (e.g., 40th percentile for height and weight or 90th percentile for height and weight). Pediatricians become concerned when children’s growth patterns are inconsistent. Keep in mind that growth charts are one of several criteria used by pediatricians to determine the overall health of a child (see Figure 11-21).9

The numerical value of a correlation is called a correlation coefficient and depicts two concepts: direction and strength. Direction is depicted as positive or negative. Positive direction occurs when variables on the x-axis increase as variables on the y-axis increase. For example, pediatric growth charts show height and weight, both of which are continuous data. Negative direction occurs when variables on the x-axis increase as variables on the y-axis decrease. For example, data show that as duration of exercise increases, weight decreases. Strength is described as how dispersed the dots are on the scatter plot. Wide dispersion means less strength and narrow dispersion shows greater strength. Scatter plots describe the direction and strength of the correlations shown on a graph (see Figure 11-22). If the scatter plot dots cluster tightly around a straight line, the correlation is strong, with a coefficient close to –1.00 or +1.00. A coefficient of exactly +1.00 means the dots fall on a straight line, so each increase on the x-axis is matched by a proportional increase on the y-axis. The reverse is true for negative correlations. As the scatter plot dots become more widely dispersed, the strength weakens and the coefficient moves toward 0. For example, in Figure 11-22A, the dots line up in a straight line perfectly.10 The direction is positive and the strength is +1.00. Now turn to Box 11-1.

FIGURE 11-21 Birth to 36 months growth chart for girls.

images

Reproduced from Centers for Disease Control. Growth Charts. Birth to 36 months: Girls length-for-age and weight-for-age percentiles. Available at http://www.cdc.gov/growthcharts/data/set1clinical/cj41l018.pdf.

FIGURE 11-22 Six scatter plots of continuous data showing strength and direction of correlations.

images

Data from Online Statistics Education. Available at: http://onlinestatbook.com/chapter4/pearson.html

BOX 11-1 Practice Your Skills

Answer the following questions by referring to Figure 11-22.

1.  Which correlation is the strongest?

a. –0.75

b. +0.48

c. –0.20

d. +0.65

2.  Which correlation is the strongest?

a. +0.18

b. +0.20

c. –0.30

d. –0.18

Answers: 1. (a), –0.75 is the strongest correlation of the choices. See Figure 11-22 Example D. 2. (c), –0.30 is the strongest correlation of the choices. See Figure 11-22 Example E.

Computing Correlation Coefficients

When determining a correlation, the result of the calculation is called a correlation coefficient and is reported in statistical writing as r. Let’s begin with a null hypothesis using the data in Figure 11-23.

Null hypothesis: There is no correlation (r = 0) between height and weight for female day- and night-shift workers. Alternate hypothesis: There is a correlation between height and weight for female day- and night-shift workers.

Next, enter the formula to calculate a correlation coefficient in the function (fx) toolbar in Microsoft Excel. The formula on the left is the correlation coefficient for height and weight. The formula on the right is the correlation coefficient for weight and BMI (see Figure 11-24). Information on how to calculate BMI can be found in Box 11-2.

When the CORREL formula is entered into the fx toolbar, the box shown in Figure 11-25 appears. Enter Array 1 for height and Array 2 for weight. Repeat the same process for CORREL for weight and body mass index (BMI; see Figure 11-25).

FIGURE 11-23 Data for height, weight, and BMI.

images

Used with permission from Microsoft.

Let’s explore Figure 11-24 further:

1.  Look at Table 11-7.12

2.  The degrees of freedom are calculated: 19 (number of individuals in the data) – 2 (number of variables: height and weight) = 17.

3.  Using 17 degrees of freedom, locate the critical value of 0.4555.

From Figure 11-24, the formula result (obtained r value) is +0.8303. The obtained value is more extreme than the critical value and therefore falls in the critical area for rejecting the null hypothesis (see Figure 11-26). Evaluators reject the null hypothesis. There is a statistically significant correlation between height and weight for female day- and night-shift workers.
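The calculation that CORREL performs can be sketched in Python as a check on the spreadsheet result. This is a minimal illustration of the Pearson product-moment formula; the heights and weights are hypothetical and are not the data in Figure 11-23, and the function name pearson_r is made up for this example.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical heights (inches) and weights (pounds); not the data in Figure 11-23.
heights = [60, 62, 64, 66, 68]
weights = [115, 120, 135, 140, 155]
r = pearson_r(heights, weights)

# Degrees of freedom for a correlation: number of individuals minus 2 variables.
df = len(heights) - 2
print(round(r, 4), df)
```

As in the chapter's example, the obtained r would then be compared with the critical value for the matching degrees of freedom in a table such as Table 11-7.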

It is essential to remember that a correlation does not mean that one variable causes the other to change; it simply means that the two variables are numerically related or associated. For example, gaining weight does not cause the individual to grow taller. In another example, more people drown when the temperature is over 90 degrees. In this case, the high temperature does not cause drowning; rather, more people swim when the weather is warm. In reverse, just because more people are swimming does not make the temperature rise.

FIGURE 11-24 Formulas to calculate correlation coefficient.

images

Used with permission from Microsoft.

Confidence Intervals

Now that you have been introduced to the concept of rejecting and failing to reject the null hypothesis with chi-square tests, t-tests, and correlation coefficients, it is time to present the topic of confidence intervals. The confidence interval provides more information than simply rejecting or failing to reject the null hypothesis. By rejecting the null hypothesis, evaluators know that there is a statistically significant difference between the mean scores, while the confidence interval provides an estimate of the magnitude of the difference.

Let’s explore this definition in greater detail for improved understanding. Remember back at the beginning of this chapter, you learned that in inferential statistics, evaluators are generally unable to gather data from the whole population, so they select a representative sample of individuals to participate in their study. Evaluators collect and analyze their data. If they use a t-test in the analysis, they would calculate a mean score. This mean score is derived from the sample, not from the whole population. Because evaluators cannot study the whole population, they are always left wondering how well the mean score for the sample estimates the actual mean score for the whole population. To ease their speculation, they calculate a confidence interval. Confidence intervals provide a range, including a lower and upper limit, of where the mean score of the population is likely to be contained.13 Because evaluators only have data from the sample, they estimate the confidence interval range from the data collected from the sample representing the population.14

So, how do the evaluators decide their level of certainty in the calculated confidence interval? Good question. First, evaluators set the desired level of their “confidence,” and this value is represented by a percentage. In most cases, evaluators set the level of confidence at 95%, so they can state that they are 95% certain that the lower and upper range of the calculated confidence interval captures the mean score of the population. In other words, if evaluators repeated the study many times with samples of the same size taken from the same population, about 95% of the confidence intervals calculated from those samples would contain the mean score of the population.15

BOX 11-2 BMI Calculator and Information

The National Heart, Lung, and Blood Institute (http://nhlbisupport.com/bmi/bminojs.htm)11 provides a free BMI calculator, if you wish to calculate your own BMI. BMI is a measure of body fat based on height and weight.

BMI categories:

•  Normal weight = 18.5–24.9

•  Overweight = 25–29.9

•  Obesity = BMI of 30 or greater

Courtesy of The National Heart, Lung and Blood Institute (http://nhlbisupport.com/bmi/bminojs.htm)
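For readers who want to see the arithmetic behind a BMI calculator, the Python sketch below uses the standard U.S.-units formula (703 times weight in pounds, divided by height in inches squared) and the categories listed in Box 11-2. The function names and the example person are hypothetical, and the "Underweight" label for values below 18.5 is an addition that does not appear in the box.

```python
def bmi_from_pounds_and_inches(weight_lb, height_in):
    """BMI using the U.S.-units formula: 703 x weight (lb) / height (in) squared."""
    return 703 * weight_lb / height_in**2


def bmi_category(bmi):
    """Map a BMI value to the categories listed in Box 11-2."""
    if bmi < 18.5:
        return "Underweight"  # not listed in Box 11-2; added here as an assumption
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obesity"


# Hypothetical person: 187 pounds and 66 inches tall.
bmi = bmi_from_pounds_and_inches(187, 66)
category = bmi_category(bmi)
print(round(bmi, 1), category)
```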

FIGURE 11-25 Correlation arrays.

images

Used with permission from Microsoft.

For this example, the data from the second example for the independent t-test (Figure 11-13) are used to present the basic equation to introduce confidence intervals. Because Excel does not provide a simple formula for confidence intervals, this example uses a combination of Excel data and an equation that can be calculated by hand. (Note: Most specialized statistical software programs provide drop-down menus for ease of calculating confidence intervals.)

The following steps explain the equation used to calculate a confidence interval:

1.  To calculate a confidence interval, you need to use Excel formulas to determine the values for mean scores, standard deviation, sample size, degrees of freedom, and the critical t-value.

TABLE 11-7 Critical Values for Correlation Coefficients

images

Data from Quantitative Psychology at Middle Tennessee State University. Available at: http://capone.mtsu.edu/dkfuller/tables/correlationtable.pdf.

FIGURE 11-26 Correlation coefficient for rejecting null hypothesis.

images

Mean scores: Day-shift female workers: 149.6 pounds

Excel: images

Night-shift female workers: 200.75 pounds

Excel: images

Standard deviation: Day-shift female workers: 32.14

Excel: images

Night-shift female workers: 37.43

Excel: images

Sample size: Day-shift workers: 20 and night-shift workers: 20

Degrees of freedom: 40 – 2 = 38

Critical t-value: 2.0243

2.  To understand the confidence interval formula, it is necessary to know the meaning of each symbol:

$\bar{X}_1$ = Mean score 1 (night-shift workers: 200.75 pounds)

$\bar{X}_2$ = Mean score 2 (day-shift workers: 149.6 pounds)

$s_1^2$ = Standard deviation of night-shift workers (37.43) squared = 1,401

$s_2^2$ = Standard deviation of day-shift workers (32.14) squared = 1,033

$n_1$ = Number of night-shift workers = 20

$n_2$ = Number of day-shift workers = 20
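The values listed above can be combined in a short Python sketch. It assumes the common form of the interval for a difference between two means: the mean difference plus or minus the critical t-value times the square root of (variance 1 divided by n1, plus variance 2 divided by n2). That form is an assumption on our part, chosen because it reproduces the interval reported in the decision that follows. The function name is hypothetical, and the means are entered as day shift minus night shift to match the negative values in the decision.

```python
import math


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, critical_t):
    """Confidence interval for (mean_a - mean_b) from summary statistics."""
    difference = mean_a - mean_b
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    margin = critical_t * standard_error
    return difference - margin, difference + margin


# Day shift minus night shift, with the critical t-value for 38 degrees of freedom.
lower, upper = mean_difference_ci(149.6, 32.14, 20, 200.75, 37.43, 20, 2.0243)
print(round(lower, 2), round(upper, 2))
```

Because the inputs are rounded, the limits may differ from the reported values in the second decimal place.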

Decision: Evaluators are 95% confident that the mean difference between day-shift worker baseline weight and night-shift worker baseline weight is between –73.48 pounds and –28.81 pounds (see Figure 11-27). Why are these numbers negative? The answer is because the female day-shift workers started the Lost-It weight management program weighing between 28.81 and 73.48 pounds less than the female night-shift workers.16

If program evaluators were aware of this weight difference, they could conduct the evaluation with a specific focus on the weight-loss challenges faced by the female night-shift workers that are different from those of the female day-shift workers.

Now let’s compare the limited information received from conducting an independent t-test (Figure 11-13) and the expanded information received from adding the confidence interval (Figures 11-24 and 11-25). As shown in Figure 11-28, evaluators have enough evidence to reject the null hypothesis and to state that there is a statistically significant difference between the baseline weight of the day-shift and night-shift workers.5

FIGURE 11-27 Confidence interval lower level/confidence interval upper level.

images

FIGURE 11-28 Rejecting the null hypothesis.

images

However, Figure 11-29 gives us the same information, plus it includes added information about the magnitude of the difference between the female day-shift workers’ baseline weight and the female night-shift workers’ baseline weight with the inclusion of the confidence intervals. Now evaluators have enough evidence to reject the null hypothesis and to state with 95% confidence that the mean weight difference between day-shift and night-shift workers for the population is between 28.81 pounds and 73.48 pounds.

Now let’s consider two last important concepts about confidence intervals. First, if the range between the lower and upper values includes 0, the evaluators fail to reject the null hypothesis. For example, if the lower value is –1.89 and the upper value is +3.73, the spread includes 0, so “fail to reject” is correct. Second, when the confidence interval is extremely wide (e.g., –218.09 to –6.79), the sample size is likely too small and needs to be increased for a more precise estimate of the population value.
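The first of these two concepts can be expressed as a simple check. The Python sketch below is an illustration with a hypothetical function name; it uses the interval from the example in this paragraph and the interval from the day-shift and night-shift decision.

```python
def decide_from_interval(lower, upper):
    """Fail to reject when the interval for the mean difference contains 0."""
    if lower <= 0 <= upper:
        return "fail to reject the null hypothesis"
    return "reject the null hypothesis"


# The interval from the example in the text includes 0.
print(decide_from_interval(-1.89, 3.73))

# The interval for the day-shift and night-shift baseline weights does not include 0.
print(decide_from_interval(-73.48, -28.81))
```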

FIGURE 11-29 Rejecting the null hypothesis with confidence intervals.

images

TABLE 11-8 Summary of Type I and Type II errors

images

Adapted from Ary D, Jacobs LC, Razavieh A. Introduction to Research in Education, 6th ed. Belmont, CA: Wadsworth/Thomson Learning; 2002; and Salkind NJ. Statistics for People Who Think They Hate Statistics, 2nd ed. Thousand Oaks, CA: Sage; 2010.

TYPE I AND TYPE II ERRORS

Type I and type II errors build on the concepts of statistical significance level and rejecting the null hypothesis. Evaluators make a type I error when they falsely reject the null hypothesis when looking at differences between the intervention group and the control group. For example, suppose the null hypothesis is “For factory workers with lower back pain, there is no difference between wearing the usual narrow foam back support belt and the improved, wider and flexible back support belt for treating their chronic lower back pain.” Evaluators commit a type I error when they report that there is a difference between the foam belt and the wide belt, when in fact there is no difference.5,17

It is helpful to think of type II errors as the opposite of type I errors. Type II errors occur when evaluators fail to reject the null hypothesis when there is a difference between the control and study groups. For example, evaluators commit a type II error when they report that there is no difference between the narrow foam belt and the wide belt on the outcome variable (lower back pain) being studied, when in fact there is a difference. This is a serious error because the Occupational Safety and Health Administration might recommend the use of either back support belt based on evaluators’ conclusions that both back support belts are the same, when in fact the back support belts are different.5,17 Type II errors are also defined as failing to reject a false null hypothesis. See Table 11-8 for a summary of these definitions.18 Practice your skills with Box 11-3. See Box 11-4 for more resources.
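The four possible outcomes can be summarized as a small decision rule. The Python sketch below is one way to express the definitions above; the function name and argument names are hypothetical. In practice, evaluators do not know whether the null hypothesis is really true, so this is a tool for working through practice problems rather than for analyzing real data.

```python
def classify_decision(null_is_true, null_rejected):
    """Label an evaluator's decision given the (normally unknown) truth of the null hypothesis."""
    if null_is_true and null_rejected:
        return "Type I error"  # rejected a true null hypothesis
    if not null_is_true and not null_rejected:
        return "Type II error"  # failed to reject a false null hypothesis
    return "Correct decision"


# Back support belt example: the belts really differ, but evaluators report no difference.
print(classify_decision(null_is_true=False, null_rejected=False))
```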

SUMMARY

This chapter introduces inferential statistics. It begins by defining scientific hypothesis, research questions, null hypothesis, and alternative hypothesis. How to conduct basic inferential statistical tests using Excel was introduced using chi-square tests and t-tests as examples. After defining correlation coefficients, the chapter concluded with a discussion of confidence intervals and type I and type II errors. The purpose of this chapter is to provide a brief introduction to using this technique and build a foundation of basic definitions of statistics for future use.

CASE STUDY

At the large State University Student Health Center, the administration hired an evaluation team to determine if they should install instant hand sanitizer dispensers near the elevators in the student residence halls as a way to decrease the spread of upper respiratory infections (URIs) and flu symptoms among students. At the beginning of the fall semester, instant hand sanitizers were installed in three of the high-rise student residence halls on the east side of campus and not in the three high-rise student residence halls on the west side of campus. Each time a student came into the student health center with flu or URI symptoms, while waiting to be seen, they were asked to complete a brief survey.

The survey questions were:

1.  Do you live in the east residence halls or west residence halls?

a. I live in the east campus residence halls.

b. I live in the west campus residence halls.

2.  In the past 2 weeks, how many times have you visited friends in the east campus residence halls?

a. 0 visits

b. 1 visit

c. 2 visits

d. 3 visits

e. 4 visits

f. 5 or more visits

3.  In the past 2 weeks, how many times have you visited friends in the west campus residence halls?

a. 0 visits

b. 1 visit

c. 2 visits

d. 3 visits

e. 4 visits

f. 5 or more visits

BOX 11-3 Type I and Type II Errors Practice Problems

Practice with a few more examples.

Example 1

Null hypothesis: There is no difference between the Scholastic Assessment Test (SAT) scores of 1st-year university freshmen and 1st-year community college freshmen.

Data report: The SAT scores of the 1st-year university freshmen are higher than the SAT scores of the 1st-year community college freshmen.

Evaluators’ decision: Reject the null hypothesis, because the null hypothesis is false.

Choose one of the following options to describe the evaluators’ decision:

a. Correct decision

b. Type I error

c. Type II error

Example 2

Null hypothesis: There is no difference between the indoor air quality in homes before and after the new air filter systems are installed in the attic of each home.

Data report: The indoor air quality is decreased after the installation of the attic air filters.

Evaluators’ decision: The evaluator fails to reject the null hypothesis.

Choose one of the following options to describe the evaluators’ decision:

a. Correct decision

b. Type I error

c. Type II error

Example 3

Null hypothesis: There is no difference in job satisfaction between the night-shift factory workers and the day-shift factory workers.

Data report: The job satisfaction is the same between night-shift and day-shift factory workers.

Evaluators’ decision: Evaluators reject the null hypothesis when the null hypothesis is really true.

Choose one of the following options to describe the evaluators’ decision:

a. Correct decision

b. Type I error

c. Type II error

Answers: Example 1 = a, Example 2 = c, Example 3 = b.
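The three practice problems follow the same decision table, which can be sketched in a few lines of Python. The function name and its two true/false arguments are hypothetical and chosen only for illustration. Keep in mind that, outside of a practice problem, evaluators never know for certain whether the null hypothesis is true.

```python
def classify_decision(null_is_true, rejected_null):
    """Label an evaluator's decision about a null hypothesis.

    Both arguments are True/False values (hypothetical names, for illustration).
    """
    if null_is_true and rejected_null:
        return "Type I error"      # rejected a null hypothesis that is true
    if not null_is_true and not rejected_null:
        return "Type II error"     # failed to reject a null hypothesis that is false
    return "Correct decision"


# The three examples from this box
print(classify_decision(null_is_true=False, rejected_null=True))   # Example 1
print(classify_decision(null_is_true=False, rejected_null=False))  # Example 2
print(classify_decision(null_is_true=True, rejected_null=True))    # Example 3
```

The fourth combination, failing to reject a null hypothesis that is true, is also a correct decision.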

BOX 11-4 Websites Providing Free Confidence Interval and Sample Size Calculators

http://www.surveysystem.com/sscalc.htm

http://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/samplesizecalculators.aspx

http://www.raosoft.com/samplesize.html

http://www.nss.gov.au/nss/home.nsf/NSS/0A4A642C712719DCCA2571AB00243DC6?opendocument

http://www.gifted.uconn.edu/siegle/research/samples/samplecalculator.htm

http://www.macorr.com/sample-size-calculator.htm

http://www.custominsight.com/articles/random-sample-calculator.asp

4.  If you notice an instant hand sanitizer dispenser at an elevator, how likely are you to use it?

a. I always reach out and apply the hand sanitizer when I see a dispenser.

b. I sometimes reach out and apply the hand sanitizer when I see a dispenser.

c. I never apply the hand sanitizer when I see a dispenser.

5.  Since August, how many times have you come to the student health center because you had cold or flu symptoms?

a. Today is my first visit for cold or flu symptoms.

b. 2 times

c. 3 times

d. 4 times

e. 5 times

f. 6 times

6.  What is your age?

a. Younger than 18 years

b. 18 years old

c. 19 years old

d. 20 years old

e. Older than 20 years

Null hypothesis: There is no difference in the number of student health center visits for cold and flu symptoms between the students living in the east campus residence halls and the students living in the west campus residence halls. See Figure 11-30.

FIGURE 11-30 Pilot study data.


Case Study Discussion Questions

1.  What is the mean for number of student health clinic visits? (Answer: 1.8.)

2.  What is the mode for age? (Answer: 2, the code for 18 years old.)

3.  Using an independent t-test, what is the mean number of clinic visits for east residence hall students and for west residence hall students? (Answer: east mean = 1.5 visits and west mean = 2.06 visits.)

4.  Using the independent t-test, do you reject or fail to reject the null hypothesis? (Answer: fail to reject; the difference between the two group means is not statistically significant.)

5.  Create three more null hypotheses to test using this dataset.
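For readers who would like to check this kind of calculation outside of Microsoft Excel, the following Python sketch works through the same steps: an overall mean, a mode, and an independent t-test. The numbers are hypothetical and are not the pilot study data shown in Figure 11-30. The function name is also an assumption made for illustration. The test uses the pooled-variance form of the independent t-test and compares the result with the two-tailed critical value from a t-table for 8 degrees of freedom at an alpha level of 0.05.

```python
import math
import statistics

# Hypothetical survey responses, NOT the pilot data in Figure 11-30.
east_visits = [1, 2, 1, 2, 1]   # clinic visits, east residence hall students
west_visits = [2, 3, 1, 2, 2]   # clinic visits, west residence hall students
age_codes = [2, 2, 3, 2, 4, 3, 2, 1, 2, 3]  # 1 = younger than 18, 2 = 18, ...


def independent_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) using the pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t_stat, df


overall_mean = statistics.mean(east_visits + west_visits)
age_mode = statistics.mode(age_codes)
t_stat, df = independent_t(east_visits, west_visits)

# Critical t for a two-tailed test at alpha = 0.05 with df = 8, from a t-table.
CRITICAL_T = 2.306
decision = "reject" if abs(t_stat) > CRITICAL_T else "fail to reject"

print(f"mean visits = {overall_mean}, mode of age code = {age_mode}")
print(f"t = {t_stat:.3f}, df = {df}: {decision} the null hypothesis")
```

With these made-up numbers, the obtained t value is smaller in size than the critical value, so the sketch fails to reject the null hypothesis. A different dataset would need its own degrees of freedom and critical value.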

STUDENT ACTIVITIES

This section provides practice questions to help you become familiar with the concepts you have learned in this chapter.

Turn the following problem statements into an appropriate research question, a null hypothesis, and an alternate hypothesis.

1.  Leila is a community organizer in a low-income urban center. She believes that there are far fewer grocery stores that offer fresh fruits and vegetables in her neighborhood than in other, wealthier neighborhoods in the same city. If this is true, she would like to try to encourage local grocers to carry more fruits and vegetables.

Research question:

Null hypothesis:

Alternate hypothesis:

2.  Carl is a nursing administrator in the emergency room of a busy hospital. He thinks that patients have better satisfaction with their care and nurses make fewer mistakes when nurses are allowed to take a nap when they are on the night shift. If this is true, he would like to allow nurses to take naps on long breaks.

Research question:

Null hypothesis:

Alternate hypothesis:

3.  Samuel works for the local YMCA. He believes that retired adults who participate in group exercises at the YMCA have lower blood pressure than those who participate in exercise on their own. He thinks that not only does the exercise help them physically, but socializing with others helps lower their stress. If this is so, he would like to institute more group classes for retirees.

Research question:

Null hypothesis:

Alternate hypothesis:

Based on the following alternate hypotheses, should the researcher use a one- or two-tailed statistical test?

4.  Alternate hypothesis: There are fewer grocery stores that provide fresh fruits and vegetables in District A than in District B.

5.  Alternate hypothesis: There is a significant difference in patient satisfaction when nurses are allowed to take a nap while on night shift.

6.  Alternate hypothesis: Retired adults who participate in exercise on their own at the YMCA have higher blood pressure than those who participate in group exercise.

After running the appropriate statistical tests for several research questions, decide which null hypothesis to reject, which to fail to reject, and which to accept.

7.  After running a chi-square statistical test you find that the obtained chi-square value is 6.0123 and the critical chi-square value is 5.9915. What is your judgment?

8.  After running a two-tailed t-test, you obtain a p-value of 0.035. What is your judgment?

9.  Based on a visual inspection of the normal curve in Figure 11-31, what is your judgment?

10.  Evaluators have run a confidence interval. Their upper limit was 0.75 and their lower limit was –1.03. What is your judgment?

Answers

1.  Research question: Are there significantly fewer grocery stores that offer fresh fruits and vegetables in Leila’s neighborhood than in a wealthier neighborhood within the city?

Null hypothesis: There is no difference in the number of grocery stores that offer fresh fruits and vegetables.

Alternate hypothesis: There is a significant difference in the number of grocery stores that offer fresh fruits and vegetables.

2.  Research question: Do nurses provide better care to emergency room patients if they are allowed to take a nap on night shift?

FIGURE 11-31 Normal curve.


Null hypothesis: There is no difference in patient satisfaction and number of mistakes when nurses are allowed to take a nap while on night shift.

Alternate hypothesis: There is a significant difference in patient satisfaction and the number of mistakes when nurses are allowed to take a nap while on night shift.

3.  Research question: Do retired adults participating in group exercise classes at the YMCA have lower blood pressure than those retired adults who only participate in solo exercise at the YMCA?

Null hypothesis: There is no difference in blood pressure between retired adults who participate in group exercise and those who do their exercises on their own.

Alternate hypothesis: There is a difference in blood pressure between retired adults who participate in group exercise and those who do their exercises on their own.

4.  One-tailed statistical test. This is because we are asking if there is directionality in the difference (i.e., that there are fewer stores in one neighborhood than in another).

5.  Two-tailed statistical test. This is because we are not assuming that there are better or worse patient satisfaction outcomes, only that there are different outcomes. Therefore, we would expect that any extreme differences could be on either end of the normal curve.

6.  One-tailed statistical test. This is because we are asking if there is directionality in the difference (i.e., that solo exercisers have higher blood pressure than those who participate in group exercise).

7.  Reject the null hypothesis. The obtained chi-square value is larger than the critical chi-square value.

8.  The judgment depends on how the p-value was calculated. If 0.035 is the two-tailed p-value (for example, the value Excel's T.TEST function returns when tails is set to 2), it already accounts for both sides of the normal curve and is compared directly with the alpha level of 0.05. Because 0.035 is less than 0.05, we reject the null hypothesis. If 0.035 is instead a one-tailed p-value being used to judge a two-tailed test, the alpha level is split in two (0.025 for each side of the normal curve). Because 0.035 is greater than 0.025, we fail to reject the null hypothesis.

9.  Reject the null hypothesis. Because the obtained test statistic falls in the critical area of the normal curve, we would reject the null hypothesis.

10.  Fail to reject the null hypothesis. Because the confidence interval includes 0, we must fail to reject the null hypothesis.
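The decision rules used in answers 7 and 10 can be written as two short Python functions. This is only a sketch: the function names and parameters are hypothetical, and the null value of 0 assumes that the confidence interval describes a difference between two groups.

```python
def chi_square_decision(obtained, critical):
    """Reject the null hypothesis when the obtained value exceeds the critical value."""
    return "reject" if obtained > critical else "fail to reject"


def interval_decision(lower, upper, null_value=0.0):
    """Fail to reject when the confidence interval contains the null value."""
    return "fail to reject" if lower <= null_value <= upper else "reject"


print(chi_square_decision(obtained=6.0123, critical=5.9915))  # question 7
print(interval_decision(lower=-1.03, upper=0.75))             # question 10
```

Both functions only compare numbers that have already been calculated. The obtained chi-square value, the critical value, and the confidence limits still come from the statistical test itself.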

REFERENCES

1.  Lincoln LM. Think and Explain with Statistics. Boston, MA: Addison-Wesley Publications; 1986.

2.  Zimmerman KA. What Is a Scientific Hypothesis. Live Science. Available at: http://www.livescience.com/21490-what-is-a-scientific-hypothesis-definition-of-hypothesis.html. Accessed May 19, 2014.

3.  Lane DM. HyperStat Online. Rice University. Available at: http://davidmlane.com/hyperstat/A29337.html. Accessed May 19, 2014.

4.  Durham College. Statistics: The Null and Alternate Hypotheses. A Student Academic Learning Services Guide. Available at: http://www.durhamcollege.ca/wp-content/uploads/STAT_nullalternate_hypothesis.pdf. Accessed May 29, 2014.

5.  Salkind NJ. Statistics for People Who Think They Hate Statistics. 2nd ed. Thousand Oaks, CA: Sage Publications; 2010.

6.  Stockburger DW. One and Two-tailed t Tests. Psychological Statistics at Missouri State. Available at: http://www.psychstat.missouristate.edu/intro-book/sbk25m.htm. Accessed May 29, 2014.

7.  Statistics Mentor. The T-Table Critical Values. Available at: http://www.statisticsmentor.com/tables/table_t.htm. Accessed November 3, 2012.

8.  Blair RC, Taylor RA. Biostatistics for the Health Sciences. 1st ed. Upper Saddle River, NJ: Pearson Prentice Hall; 2008.

9.  Dallal GE. The Little Handbook of Statistical Practice. Seattle, WA: Amazon Digital Services; 2012.

10.  Centers for Disease Control and Prevention. Growth Charts. Available at: http://www.cdc.gov/growthcharts/. Accessed May 19, 2014.

11.  Weathington BL, Cunningham CJL, Pettinger DJ. Understanding Business Research. New York, NY: John Wiley & Sons, Inc.; 2012.

12.  The National Heart, Lung, and Blood Institute. Calculate Your Body Mass Index. Available at: http://nhlbisupport.com/bmi/bminojs.htm. Accessed May 19, 2014.

13.  National Institute of Standards and Technology. What Are Confidence Intervals? Available at: http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm. Accessed November 3, 2012.

14.  Yale University. Confidence Intervals. Available at: http://www.stat.yale.edu/Courses/1997-98/101/confint.htm. Accessed May 19, 2014.

15.  Utah Office of Public Health Assessment. Confidence Intervals in Public Health. Available at: http://health.utah.gov/opha/IBIShelp/ConfInts.pdf. Accessed November 3, 2012.

16.  Statistics Lectures. Confidence Intervals for Independent Samples t-Test. Available at: http://www.statisticslectures.com/topics/ciindependent-samplest/. Accessed May 19, 2014.

17.  Campbell RB. University of Northern Iowa. Type I and II Error. Available at: http://www.cs.uni.edu/~campbell/stat/inf5.html#TI. Accessed May 19, 2014.

18.  Ary D, Jacobs LC, Razavieh A. Introduction to Research in Education. 6th ed. Belmont, CA: Wadsworth/Thomson Learning; 2002.