CHAPTER 10

Populations and Samples


CHAPTER OBJECTIVES

By the end of this chapter, students will be able to:

•  Define population and sample.

•  Discuss how probability relates to inferential statistics.

•  Discuss hypothesis testing and type I and II errors.

•  Analyze how each component of sample size determination influences the other components.

•  Evaluate the differences between probability and nonprobability sampling.

•  Discuss how sampling bias affects the study results.


KEY TERMS

margin of error

nonprobability sampling

nonresponse sampling bias

null hypothesis

population

probability sampling

sample size

type I and II errors

INTRODUCTION

Evaluators plan, implement, and evaluate programs, but they are not always able to study the entire population. For example, if the evaluation involves an entire community, evaluators must select a subset or sample of the population. This chapter introduces how probability and inferential statistics are used to make predictions from a random sample selected from the population. The next step involves making decisions that influence the sample size calculation. Once those decisions are finalized, the chapter moves into how to recruit individuals. The recruitment process depends on probability or nonprobability sample designs. Probability sampling follows strict criteria in which every individual in the population has a known chance of being selected for the sample. Nonprobability sampling is more flexible, inclusive, and not based on chance procedures. With this overview in mind, let’s begin the chapter with a discussion about populations and samples.

POPULATIONS AND SAMPLES

Let’s step back and look closely at the relationship between populations and samples. In statistics, the word population is defined as the group or collection of interest for the research. A population does not necessarily consist of humans. Populations may be soybean crop yields in Iowa, bacteria counts in culture dishes, or cancer clusters within brain tissue. Now let’s focus on what a sample is. As previously stated, a sample is a subset of a population. The closer the sample is to representing the whole population, the more accurate the inferences or assumptions that evaluators can make about the population. If the sample does not represent the population, then the assumptions are less likely to be accurate. As discussed later in this chapter, it is important that the sample is randomly selected from the population to increase the chance that its characteristics match those of the population.1

PROBABILITY AND INFERENTIAL STATISTICS

Probability describes how likely outcomes are when individual events are random. For example, the result of the coin toss at the beginning of a football game is a single, random event. However, if the same coin is tossed 100 times, an approximately equal pattern of heads and tails develops (e.g., 46 heads and 54 tails). If the same coin is tossed 1,000 times, the proportions come even closer to equal (e.g., 489 heads and 511 tails). Probability allows evaluators to predict long-run outcomes based on repeating the process, which in this case is tossing a coin.1
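
For readers who want to verify this pattern, the following short Python sketch (with an arbitrary random seed for reproducibility) simulates a fair coin and shows the proportion of heads settling near 50% as the number of tosses grows.

```python
import random

random.seed(1)  # arbitrary seed so the output is reproducible
for n in (10, 100, 1000, 100000):
    heads = sum(random.random() < 0.5 for _ in range(n))  # simulate n fair tosses
    print(f"{n:>6} tosses: {heads} heads ({heads / n:.1%})")
```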

The word infer means to assume or understand. Inferential statistics are mathematical procedures used to make predictions about a whole population based on data collected from a random sample selected from that population.2 These predictions are based on estimates or probabilities rather than absolute facts. Evaluators use characteristics of the sample (statistics) to estimate the characteristics of the population (parameters). When evaluators use inferential statistics, they accept a certain degree of error, because they are never 100% accurate in making assumptions about the population based on the sample. For example, if researchers at a global health pharmaceutical company develop a new highly effective prophylactic malaria drug with few side effects, even after completing laboratory and animal studies, they can only infer the medication’s effectiveness for humans. If they knew the exact human reaction to the medication, it would be like coin tossing: patients would either experience side effects or not after ingesting the new medication. Because it is not possible to give this new malaria medication to the entire global population of individuals taking prophylactic malaria medication, evaluators select a random sample of such individuals to evaluate the extent of side effects. The evaluators conduct the evaluation, gather data, and use inferential statistics to estimate the degree of side effects of the new prophylactic for the population of individuals needing to take malaria medication. Because evaluators accept a certain degree of error regarding the effectiveness and possible side effects, the global health pharmaceutical company states in marketing materials that side effects may vary among patients.

Null Hypothesis

Another component of inferential statistics is the development of a hypothesis. Keep in mind that not all evaluations utilize inferential statistics. However, when evaluators plan to use inferential statistics, it is helpful to develop hypothesis statements. A hypothesis is a statement of what the evaluator expects the relationships to be among the study variables.2 The four parts of a hypothesis are described by the following questions:2

•  Is the hypothesis aligned with the current knowledge available on the topic?

•  What is the expected relationship between the study variables?

•  Is the hypothesis able to be tested?

•  Is the hypothesis clearly stated?

Let’s look at examples of hypothesis statements that answer each of the previously stated questions:

•  Example 1: Because 30 minutes of moderate exercise performed three times per week helps to control weight, it is hypothesized that adults who participate in a community-based, 30-minute aerobic exercise program three times per week will achieve greater weight control than adults who do not participate in the community-based exercise program.

•  Example 2: Because outdoor air pollution is a risk factor for childhood asthma, it is hypothesized that more children living in a city with consistently high rates of outdoor air pollution will be diagnosed with asthma than children living in cities with low rates of outdoor air pollution.

In evaluations, the term null hypothesis is also used. The null hypothesis states that there is no relationship between the variables.3 Furthermore, the null hypothesis assumes that if a relationship is found, it is only by chance. Using the first example, the null hypothesis would state that there is no relationship between participation in the 30-minute aerobic exercise program three times per week and adults’ rate of weight control. When using the null hypothesis, evaluators want their data from the random sample to reject the null hypothesis. They want their data to show that more frequent participation in the exercise program leads to greater weight control for adults.

Evaluators must reject or fail to reject the null hypothesis. However, keep in mind that evaluators make this decision based on the sample that they have rather than the population, so their results are always based on incomplete information and are subject to error. If evaluators fail to reject the null hypothesis, they are saying there is a chance that there is no relationship. Using Example 1, consider the possibility that there is no relationship between aerobic exercise and weight control. It might be that the observed weight control has nothing to do with the community-based aerobic exercise program itself but rather with the fact that the exercise program is offered at the same time as the community farmers’ market held in the parking lot of the community center. The exercise participants are purchasing and consuming more fresh fruits and vegetables than they did prior to participating in the community-based exercise program. Either way, evaluators reject or fail to reject the null hypothesis, and their decision may be correct or incorrect. Decisions made regarding the null hypothesis have consequences that are labeled type I and type II errors.

Type I errors occur when the null hypothesis is true (there is no relationship between variables), and evaluators reject it.3 In Example 1, evaluators report that there is a relationship between aerobic exercise and weight control, thereby rejecting the null hypothesis (that there is no relationship). The community center acts on the evaluation report and begins to offer the 30-minute aerobic exercise program daily. However, 1 year later, the community center observes that the adults only participate in the aerobic exercise program on the days when the farmers’ market is open. Another evaluation team repeats the evaluation and declares the first evaluation was incorrect because they made a costly type I error (rejection of a true null hypothesis).

Type II errors occur when the null hypothesis is false (there is a real relationship between variables), and evaluators fail to reject it.3 From the previous example, evaluators report that there is no relationship between participation in aerobic exercise and weight control, thereby failing to reject the null hypothesis when they should have. Their report states that the evaluation found insufficient evidence to increase the number of days that the aerobic exercise is offered at the community center. However, 1 year later, the community center observes that the rate of weight control remains the same among the adults participating in the aerobic exercise program. Another evaluation team repeats the evaluation and declares the first study was incorrect because they made a type II error (retention of a false null hypothesis). See Table 10-1.

Level of Significance

Before leaving the topic of type I and type II errors, it is necessary to understand the concept of level of significance.

Evaluators who fear making type I errors might be tempted to always fail to reject the null hypothesis; likewise, to avoid type II errors, they might be tempted to always reject it. Because neither of these options is feasible, evaluators make decisions based on the level of significance. Level of significance is the statistical risk that evaluators are willing to accept when deciding to reject or fail to reject the null hypothesis. If evaluators set the level of significance at 0.05, they are willing to accept that 5 out of 100 times the null hypothesis will be rejected when it is true (a type I error). If the level of significance is set at 0.001, the risk of a type I error is 1 in 1000. Evaluators select the level of significance based on the type of evaluation and the severity of type I and II errors.4 In social science, the level of significance is typically set at 0.05, because such evaluations frequently involve human behaviors with numerous confounding variables (e.g., age, gender, race/ethnicity, geographical location, environment, education, income, behavior, culture). In natural science, the level of significance is typically set at 0.01 or 0.001, because studies are conducted in controlled environments (e.g., constant temperature, light, humidity, sound, vibration).
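
The meaning of the level of significance can be checked by simulation. The following Python sketch, using hypothetical normally distributed measurements, repeatedly compares two groups drawn from the same population (so the null hypothesis is true) and counts how often a test at the 0.05 level rejects; the long-run rejection rate, which is the type I error rate, lands near 5%.

```python
import math, random

random.seed(2)  # arbitrary seed for reproducibility
n, trials, rejections = 30, 10000, 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]  # same population, so H0 is true
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)  # two-sample z (known sigma = 1)
    if abs(z) > 1.96:  # critical value for a two-tailed test at alpha = 0.05
        rejections += 1  # rejecting a true null hypothesis: a type I error
print(f"Observed type I error rate: {rejections / trials:.3f}")  # approximately 0.05
```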

Let’s recap this complex but important section before moving on to determining sample size. The reader should understand that a sample represents the population, and if the sample does not represent the population, then the assumptions made from that sample are less accurate. Evaluators use inferential statistics to estimate the characteristics of the population based on the characteristics of the sample. In doing so, they accept a certain degree of error. Evaluators develop a statement, called a hypothesis, to predict the expected relationship among the study variables. The null hypothesis states that there is no relationship among the variables. Based on data gathered from the sample, evaluators reject or fail to reject the null hypothesis at the conclusion of their evaluation. Evaluators make a type I error when they reject a null hypothesis that is true, and a type II error when they fail to reject a null hypothesis that is false. The level of significance allows evaluators to set the risk they are willing to accept of rejecting the null hypothesis when there is no actual relationship (a type I error).

TABLE 10-1 Summary of Type I and Type II Errors

Decision | Null hypothesis is true | Null hypothesis is false
Reject the null hypothesis | Type I error | Correct decision
Fail to reject the null hypothesis | Correct decision | Type II error

Reproduced from Ary/Jacobs/Cheser/Razavieh/Sorenson. Introduction to Research in Education, 8E. © 2010 South-Western, a part of Cengage Learning, Inc. Reproduced with permission. www.cengage.com/permissions

SAMPLE SIZE CONSIDERATIONS

Why should an evaluator be concerned about the number of individuals in the sample for their evaluation? Let’s look at this question from several viewpoints. As previously stated in this chapter, evaluators need to select individuals for the sample who represent the population being studied. For example, if the whole population is 10,000, it does not seem logical for evaluators to select 10 individuals for the sample and expect these 10 individuals to represent the entire population. On the other hand, it does not seem necessary or feasible to select 9000 individuals for the sample from a population of 10,000. But then evaluators ask: If 10 are too few and 9000 are too many, how many individuals are needed to adequately represent the population? There are no easy answers to this question. The following discussion introduces different aspects of how to determine the appropriate sample size for various types of evaluations. However, it is important for readers to realize that the topics presented in this chapter interact with each other, so determining the ideal sample size is never simple. Readers are not expected to be fully familiar with all of these topics in advance. This section defines each of the following terms:

•  Margin of error

•  Population, sample, and variability

•  Confidence level

•  Budget and budget justification

•  Timeline

Margin of Error

Margin of error is about precision: it is the range, expressed as plus or minus a percentage, within which the true population value is expected to fall. Generally, evaluators do not use a margin of error over 5%. The more precision required, the smaller the acceptable margin of error. For example, if evaluators wish to know the level of employee satisfaction within an organization, they need to know how many surveys to mail to a random sample of employees so that the results represent all departments. With a 5% margin of error, a reported satisfaction rate of, say, 70% means that the true rate among all employees is expected to fall between 65% and 75%. The best way to think of this concept is to watch the evening news during a national election. Reporters say, “With 36% of the votes counted, we predict that Senator Jones is going to win the election for Colorado.” At the bottom of the screen, networks post the margin of error as ±5%, which means that the exit polls suggest Senator Jones will receive between 31% and 41% of the votes. Networks report this margin of error at a 95% confidence level, meaning they expect the interval to capture the true result 95% of the time.5

In Table 10-2, under margin of error, there are four columns labeled with a percentage. Evaluators select the margin of error across the top and population size from the far-left column. For example, if evaluators accept a 5% margin of error and the population size is approximately 5000, the sample size needed is 357. If they wish to increase the level of precision to a 2.5% margin of error, the sample size increases to 1176. Note that decreasing the margin of error by half, from 5% to 2.5%, increased the required sample size more than threefold, from 357 to 1176.
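
Tables such as Table 10-2 are generated from a standard formula. The following Python sketch implements one common version (z = 1.96 for a 95% confidence level, with p = 0.5 as the most conservative variability assumption, plus a finite-population correction); it reproduces the two values cited above.

```python
import math

def sample_size(population, margin_of_error, z=1.96, p=0.5):
    # Infinite-population sample size, then the finite-population correction.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(sample_size(5000, 0.05))   # 357, as in Table 10-2
print(sample_size(5000, 0.025))  # 1176
```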

TABLE 10-2 Calculation of Sample Size from the Population

images

Modified from Sample Size Table. Paul Boyd, The Research Advisors. Available at: http://www.research-advisors.com/tools/SampleSize.htm. Reprinted with permission.

Population, Sample, and Variability

Let’s go back and revisit how to use Table 10-2 to determine an adequate sample size. To understand Table 10-2, we begin by defining population size and referring back to the previous discussion on margin of error. Population is defined as the total number of individuals in the group or community from which the sample is drawn. For example, if there are approximately 3500 employees where the survey data are collected, the sample size ranges from 346 to 2565, depending on the margin of error. If the exact population size is not listed, use the next-highest listed population size. The value in the next column is the sample size based on the margin of error. It is worth noting that for small populations, the required sample size approaches the entire population. Of course, using the entire population (e.g., 200 or fewer individuals) as the sample is only possible with adequate funding. For example, if evaluators wanted to interview all Hurricane Sandy first responders who were on the scene within the first 24 hours, they would determine the total number of individuals from personnel records and then attempt to contact all of them. Such interviews would provide in-depth and personal insights into how the evacuation efforts were perceived by the local first responders rather than by individuals who arrived later from other states. Lastly, the sample size needs to be large enough that the data analysis is able to detect a change among the individuals in the sample. For example, suppose evaluators are trying to determine whether new air filters placed in particular hybrid cars decrease the degree of traffic exhaust fume smells reported by drivers more than the currently installed air filters in the same hybrid cars. If the sample size of drivers is too small, the data analysis would likely be unable to show whether the new air filters are better, worse, or the same as the current air filters.

Variability is another aspect of populations. Variability is defined as the similarities and differences among the members of the population. If the population is composed of diverse (heterogeneous) individuals, as in many community-based evaluations, a larger sample size is required to obtain a given level of precision. If the individuals in the population are similar (homogeneous), such as African American females with doctoral degrees in epidemiology, a smaller sample size is needed to obtain the same level of precision.

Confidence Level

Confidence level is defined as how much certainty evaluators have that the sample is a true reflection of the population. In other words, evaluators are confident that if other evaluators conducted the same evaluation with a different sample from the same population, both evaluations would yield approximately the same results. In social science, evaluators generally use a 95% confidence level, but in natural science, evaluators use a 99% level of confidence. Why the difference? Social science evaluators are studying behaviors of individuals, animals, communities, and populations, whereas in natural science, experiments are more likely conducted in a controlled (temperature, humidity, light, vibration, etc.) laboratory environment, so results are more precise. A 95% confidence level means that 95 out of 100 samples from the same population would yield approximately the same results. In other words, evaluators are willing to accept the risk that 5 times out of 100, the results are different (see Box 10-1).
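
The 95-out-of-100 interpretation can also be illustrated by simulation. This Python sketch, using a hypothetical population with a known mean and standard deviation, draws 100 samples and counts how many 95% confidence intervals for the mean capture the true mean; the count should be close to 95.

```python
import math, random

random.seed(3)  # arbitrary seed for reproducibility
true_mean, sd, n, hits = 50, 10, 100, 0  # hypothetical population values
for _ in range(100):
    sample = [random.gauss(true_mean, sd) for _ in range(n)]
    m = sum(sample) / n
    se = sd / math.sqrt(n)  # standard error (sigma treated as known for simplicity)
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        hits += 1  # this interval captured the true mean
print(f"{hits} of 100 intervals captured the true mean")
```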

Budget and Budget Justification

When determining sample size, evaluators make decisions based on their available funding. Evaluators need to consider all costs associated with sample size prior to final decisions. Let’s explore how data collection, data entry, and data analysis affect the costs of sample size and selection.

Data Collection

What type of quantitative data will be used in the evaluation?

There are two types of quantitative data: primary and secondary data. Primary data (e.g., surveys) are collected by evaluators. Secondary data are data that were previously collected by someone else that evaluators analyze anew.

Evaluators gain permission to explore information within the existing data set. Quantitative secondary data include existing large national data sets collected by federal organizations, such as the Centers for Disease Control and Prevention or the National Cancer Institute. Secondary data might also include personnel records or insurance claims within a large organization.

Evaluators also consider the availability of the selected individuals. In the easiest cases, evaluators have a convenient sample of easily accessible individuals. In more difficult cases, the individuals under study are limited in number or harder to identify.

Data Entry

How will the data be entered for analysis? If the survey data were collected online, limited data entry costs will be involved. If the data were collected via paper surveys, there are costs associated with data entry.

BOX 10-1 Case Study: Confidence Level

The Newark Airport wants to know if it should increase the number of vendors offering a variety of caffeinated beverages in each of its concourses. An evaluation team is hired to evaluate the type and amount of caffeinated beverage purchases in each concourse. Evaluators hired five undergraduate students to stand near the check-out line at 10 different beverage vendor locations in the Newark Airport. After obtaining institutional review board (IRB) approval, the undergraduate students randomly asked people to participate by simply telling the students what type of caffeinated beverages they purchased. Each participant received a $2 coupon for any vendor in the airport. Each undergraduate student collected 20 responses, so the sample size was a total of 100. Evaluators analyzed the data and found that one undergraduate student’s data showed 5 out of 20 measurements yielding much higher rates of caffeine consumption than the mean (average) of the other 95 measurements. When questioned about the data, the student reported that on her concourse two flights to Canada had been delayed for more than 8 hours, and passengers were purchasing more caffeinated beverages to stay awake to avoid missing their connecting flight to snowy Canada. The delayed passengers were bored and therefore willing to participate in the evaluation. Evaluators showed the undergraduate students that if the study were conducted 100 times, 95% of the time the amount of caffeinated beverage purchases would be approximately the same for the true population. Evaluators explained that they are willing to accept that 5 times out of 100 (5%), the sample contains a segment of the population that consumes more caffeinated beverages than the rest of the population.

Quantitative data require data cleaning, which involves looking at the data spreadsheet and determining what percentage of the data are entered incorrectly or missing from the survey. These errors are corrected by going back to the original survey and correcting the data-entry mistakes. Evaluators randomly select about 10–15% of the surveys to double-check the data entry for validation. Look at Table 10-3 for a sample of a data spreadsheet used in quantitative analysis. Find the data-entry errors; there are eight in total. The answers are located at the end of the chapter.

Data Analysis

Evaluators need to consider the data analysis that they will be conducting when selecting the sample size. If descriptive statistics are used (e.g., frequencies and means), then nearly any sample size over 30 is generally sufficient. On the other hand, some complex statistical analyses require a larger sample size (e.g., 200–500). To save time and resources, evaluators consult biostatisticians prior to initiating any project to ensure that the sampling is adequate for the planned data analyses. Finally, sample size formulas provide the minimum number of responses needed. However, many evaluators add 10–30% to the sample size to compensate for the inability to contact individuals and for missing data. Evaluators plan for this increase when calculating the research budget.6

The cost of data analysis is based on volume of data, the amount of data cleaning needed, complexity of the data analysis performed, and whether or not there is a need to hire a biostatistics consultant to assist with the analysis (see Table 10-4).

Timeline

In addition to the budget and budget justification, it is important to create a timeline for each evaluation. Although there are multiple types of timelines, each one serves the purpose of keeping the project or study on track (see Table 10-5).

PROBABILITY AND NONPROBABILITY SAMPLES

Now let’s explore the two most common types of samples. Let’s begin with the basic differences between probability and nonprobability samples. With probability samples, evaluator judgment plays no role in sample selection: participants are selected based on strict, objective criteria, such as a random number list. With nonprobability samples, there is an element of judgment in the selection process. With these two basic differences in mind, let’s delve deeper into how probability samples are selected. Keep in mind that prior to the initiation of any type of sampling, an approved IRB application is required.

Probability Sampling

The first step in probability sampling is to determine the population of interest. Evaluators make this decision based on the goals and objectives. For example, the population may be based on one or a combination of potential demographic or other specific variables, such as geographical location, gender, age group, ethnicity, religious affiliation, economic status, marital status, type of employment, or environmental exposures.

TABLE 10-3 Sample Spreadsheet for Quantitative Analysis

images

Once the population is framed, the second step is to determine the different layers within the subset of the population. For example, suppose an evaluation objective looks at 10 years of toxic environmental exposure data and asks whether there are different survival rates beyond 12 months for patients with lung cancer based on three things: their years of occupational exposure at time of diagnosis, type of initial symptoms at diagnosis, and age at diagnosis. The first step limits the population of study from all types of cancer patients with occupational exposure to just lung cancer patients. The second step limits the dataset to patients who survived at least 12 months after their diagnosis. Within the 12-month survivors, the third step groups the survivors by years of occupational exposure, then initial symptoms, and then age at diagnosis. Each subsequent step adds another layer until the appropriate groups are formed for further data analyses. This type of probability sampling is systematic and objective. There are no vague decisions, because each sample subject falls into one group or the other at every step of the decision tree.7

TABLE 10-4 Simple Sample Budget: Quantitative Data

images

Note: Personnel costs for this simple budget sample do not include fringe benefits, health insurance, or student tuition.

Budget Justification for Quantitative Data

Personnel:

An evaluator works 20 hours per week for 50 weeks at $80 per hour, for a total of $80,000.

One student works 25 hours per week for 50 weeks at $20 per hour, for a total of $25,000.

A biostatistics consultant works 10 hours per week for the last 10 weeks at $150 per hour, for a total of $15,000.

Sample:

Printing for 2000 recruitment flyers at $0.18 each will cost $360.

Printing and stapling 1000 copies of the survey at $2.15 each will cost $2150.

Incentives for 1000 participants at $10 each will cost $10,000.

Total: $132,510
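
As a quick check, the budget justification above can be recomputed in a few lines; this Python sketch simply redoes the arithmetic behind the total.

```python
personnel = (20 * 50 * 80      # evaluator: 20 h/wk x 50 wk x $80/h = $80,000
             + 25 * 50 * 20    # student: 25 h/wk x 50 wk x $20/h = $25,000
             + 10 * 10 * 150)  # biostatistician: 10 h/wk x 10 wk x $150/h = $15,000
sample_costs = (2000 * 0.18    # recruitment flyers = $360
                + 1000 * 2.15  # printed surveys = $2,150
                + 1000 * 10)   # participant incentives = $10,000
print(f"Total: ${personnel + sample_costs:,.0f}")  # Total: $132,510
```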

TABLE 10-5 Sample Timeline: Simple 12-Month Timeline for Quantitative Data

Month | Activity | Person Responsible
1st | Establish a funding account for the study; advertise for employment positions. | Principal investigator (PI)
2nd | Hire staff. | PI
3rd | Develop survey; submit application for institutional review board (IRB) approval for human subjects research. | PI
4th | Develop recruitment strategies for participants. | PI
5th | Establish a way to provide gift card incentives for participation. | PI and student
6th | Quantitative: Print surveys or post online. | PI and student
7th | Conduct pilot study. | PI and student
8th | Revise and submit revisions to the IRB for approval. | PI and student
9th | Quantitative: Collect survey data. | PI and student
10th | Continue to collect data until sample size is achieved. | PI and student
11th | Begin to clean data; begin data analysis. | PI, student, and data expert
12th | Finalize data analysis. | PI, student, and data expert
Wrap-up | Write, edit, and submit final report. | PI, student, and data expert

There are several related sampling designs to consider, including simple random sampling, quota sampling, and proportionate stratified random sampling. (Note that quota sampling, because it recruits participants to fill fixed targets rather than selecting them at random, is strictly speaking a nonprobability technique; it is discussed here because it is built directly on population proportions.)

Simple Random Sampling

Simple random sampling is the most basic way of selecting a group of individuals from a population. The key to simple random sampling is that every individual has an equal chance of being selected from the population. For example, if college administrators want to randomly select a group of graduate students to interview, every active graduate student enrolled in that college must have an equal chance at getting selected. To perform simple random sampling, evaluators use a table of random numbers or a computerized random number generator to select the sample of individuals (see Figure 10-1).

Simple random sampling has several advantages. It is a straightforward procedure to perform, and evaluators are unable to influence the sample selection (a source of bias), because the procedure is controlled by the computer software. However, simple random sampling is not feasible for very large populations, because it requires a complete list of the population, such as the names of every individual living in a city or county.
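
A random number generator makes the procedure trivial in practice. The following minimal Python sketch (the enrollment list and sample size are hypothetical) selects a simple random sample in one line.

```python
import random

enrolled = [f"student_{i}" for i in range(1, 2001)]  # hypothetical list of 2,000 students
interviewees = random.sample(enrolled, 25)  # each student has an equal chance of selection
```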

Quota Sampling

Quota sampling is a technique that uses a small sample that matches characteristics of the target population. For example, suppose the 2010 U.S. Census data show that the Greenville County population is composed of 62% White, 18% African American, 12% Hispanic, 6% Asian, and 2% Other racial and ethnic groups and has a gender breakdown that is 51% females and 49% males. Rather than survey the entire population, evaluators recruit a sample of individuals who resemble the racial and gender composition of Greenville County. See Table 10-6 for an example of quota sampling using a sample size of 500.

Using the information in Table 10-6, once the evaluators had 158 white females, they stopped recruiting any additional white females even if more were interested in participating. In this situation, where they have “met the quota,” the evaluators would continue to recruit individuals in the other cells.
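
The cell targets in Table 10-6 follow directly from the census proportions. This Python sketch recomputes them by multiplying the sample size by the gender and race/ethnicity proportions and rounding.

```python
race = {"White": 0.62, "African American": 0.18, "Hispanic": 0.12,
        "Asian": 0.06, "Other races/ethnicities": 0.02}
gender = {"Female": 0.51, "Male": 0.49}
n = 500  # total sample size

for g, g_prop in gender.items():
    for r, r_prop in race.items():
        # e.g., White, Female: round(500 * 0.51 * 0.62) = 158, as in Table 10-6
        print(f"{r}, {g}: {round(n * g_prop * r_prop)}")
```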

Proportionate Stratified Random Sampling

With stratified random sampling, evaluators divide the entire population into different subgroups, such as age, gender, or geographical location, which are called strata. The evaluators then randomly select subjects from each stratum using the same predetermined sampling fraction. It is important that individuals in each stratum not overlap with other strata. For example, in Table 10-7, individuals could not be in both the 70–79 age stratum and the 80–89 stratum.8,9 Individuals may only be in one stratum, so all individuals over age 70 have an equal chance of being randomly selected for study. Stratified random sampling is commonly used for demographic variables, such as age, income, educational attainment, gender, religion, and ethnicity. There are two advantages to using stratified random sampling: (1) evaluators obtain representation from subgroups in the given population; and (2) a smaller, but representative, sample size saves evaluators money, time, and effort. For example, if evaluators wish to study the elderly population in a retirement community with 1,000 residents, they might use proportionate stratified random sampling, as sketched below. It is essential to use the same sampling fraction across strata of different sizes, so that each group is proportionately represented. This sampling ensures no overlap within the sample.
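
Here is a minimal Python sketch of proportionate stratified random sampling for the retirement-community example. The stratum sizes (600, 300, and 100 residents) are hypothetical; the essential points are that the same 10% sampling fraction is applied to every stratum and that no resident appears in more than one stratum.

```python
import random

strata = {"70-79": [f"resident_{i}" for i in range(600)],
          "80-89": [f"resident_{i}" for i in range(600, 900)],
          "90+":   [f"resident_{i}" for i in range(900, 1000)]}
fraction = 0.10  # the SAME sampling fraction for every stratum

sample = {name: random.sample(members, round(len(members) * fraction))
          for name, members in strata.items()}
# Yields 60, 30, and 10 residents, preserving the population's age proportions.
```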

Nonprobability Sampling

Unlike probability sampling, nonprobability sampling does not use random selection. With nonprobability sampling, evaluators do not know if the selected sample truly represents the population; therefore, it is less rigorous, less accurate, and less generalizable to the larger population. However, there are situations where evaluators desire to select samples of individuals who represent a specific expertise, demographic characteristic, or condition. This purposeful sampling would not be feasible or practical if evaluators used a random sampling design. The following discussion covers several types of nonprobability samples including accidental or convenience sampling, purposive sampling, nonproportional quota sampling, expert sampling, heterogeneity sampling, snowball sampling, and systematic sampling (see Figure 10-2).

Accidental or Convenience Sampling

Accidental or convenience sampling is also called intercept sampling or “person on the street” sampling. A few examples of convenience sampling include emailing an online survey to all undergraduate students, asking for personal opinions about a specific topic by interviewing students as they walk through the student union (hence, intercept sampling), mailing a paper survey to all public health directors in a state to gain their opinion about a new state policy on immunizations, and interviewing voters as they exit a polling precinct location. Of course, there are problems with this type of sampling, because it is not representative of the population. However, it gives evaluators a quick and convenient way to gather data. For example, if evaluators want to know the opinion of college students on U.S. healthcare reform, evaluators ask students a few questions as they pass through the student union at a large public university. The sample is not representative of the entire student body, but it is convenient, fast, and relatively inexpensive and provides quick data collection.2

FIGURE 10-1 Directions for using Microsoft Excel to randomize individuals.

images

Used with permission from Microsoft.

Purposive Sampling

With purposive sampling, evaluators select individuals with a specific purpose in mind. This type of sampling is commonly used by marketing or political advertisement agencies. For example, if marketing researchers want to know the opinion of African American males between the ages of 20 and 40 about a particular presidential candidate, evaluators might stand in the parking lot near a football stadium to recruit this specific, purposive sample. After approaching prospective participants, evaluators verify their eligibility criteria and then quickly ask a few select questions.10 In health studies, evaluators use purposive sampling to deliberately include individuals who may typically be excluded from the research.11 For example, purposive sampling would be used if evaluators chose to include males who are the primary head of household and single parents with two or more children for a parenting evaluation.

TABLE 10-6 Quota Sampling

 

Race/Ethnicity | Female (n = 255; 51%) | Male (n = 245; 49%)
White | 158 | 152
African American | 46 | 44
Hispanic | 31 | 29
Asian | 15 | 15
Other races/ethnicities | 5 | 5

TABLE 10-7 Example of Proportionate Stratified Random Sampling

images

FIGURE 10-2 Nonprobability sampling.

images

Nonproportional Quota Sampling

Nonproportional quota sampling is similar to, but less restrictive than, quota sampling. With nonproportional sampling, evaluators specify the minimum number of individuals in each category, but these quotas are not required to match the proportions in the population. Evaluators want to ensure that all groups are represented in the study.10 A 2007 study used nonproportional quota sampling to recruit women at risk for HIV based on their ethnicity and number of sexual partners. Evaluators used word of mouth, community organizations, and media sources to recruit equal percentages of White, Black, and Hispanic women, even though these proportions were not representative of the demographics of the community from which the women came. Of the recruited women, 29% were in single-partner relationships and the other 71% were in multiple-partner relationships. For this study, nonproportional quota sampling allowed adequate representation of women at risk for HIV.12

Expert Sampling

Expert sampling is defined as recruiting a group of individuals with known expertise or experience in a specific discipline. This type of sampling is also called a “panel of experts.”10 There are two common reasons for evaluators to use expert sampling. First, evaluators ask the experts for their opinion of the proposed evaluation to gain further insight into solutions or potential pitfalls. Second, evaluators ask the experts to support or refute specific positions of interest. This method allows evaluators to defend their decisions based on expert opinion rather than merely guessing.13,14

Heterogeneity Sampling

Heterogeneity sampling is defined as seeking a wide range of different and diverse opinions. In heterogeneity sampling, evaluators are recruiting a diversity of ideas rather than diversity among participating individuals. Evaluators are seeking unique and unusual opinions.10 For example, evaluators may use heterogeneity sampling to seek opinions about healthcare reform. They want the full range of opinions rather than a picture of how the typical individual voted on the healthcare reform amendment on the ballot.

Snowball Sampling

Snowball sampling is when evaluators identify one individual who meets the inclusion criteria for the study and then ask that individual if he or she knows someone else meeting the inclusion criteria whom they could also ask to join the study or evaluation. With snowball sampling, the sample is not representative, but it can be valuable for specific studies.10,15

For example, evaluators want to learn more about how county health departments transitioned from pre-HIV/AIDS procedures to post-HIV/AIDS procedures, so they plan to interview health department directors who worked in the health department from the mid-1980s into the mid-1990s. Snowball sampling poses some ethical issues that must be considered, such as revealing an individual’s lifestyle choices or medical conditions without his or her express consent. For example, if the purpose of the evaluation is to determine the level of employment discrimination experienced by former prisoners, the evaluators would begin by interviewing a known former prisoner. After the interview, the evaluators would ask if the individual knew other former prisoners who might wish to participate in the interview to increase the sample size. This technique would pose ethical issues if the referred individual had not yet made the decision to be identified as a former prisoner. There are numerous other groups of individuals who may not wish to be identified, such as individuals living with HIV/AIDS or other diseases, individuals diagnosed with mental health challenges, or unemployed or homeless individuals. Evaluators need to be aware of such ethical issues when using the snowball sampling technique.

Systematic Sampling

Systematic sampling involves selecting every nth case from a population list (the population list may be electronic or on paper). For example, if evaluators wish to draw 100 employee files from a list of 3000 employees, they would start by calculating the sampling interval: 3000/100 = 30. Evaluators would then select the starting point by randomly choosing a number between 1 and 30 (one full sampling interval). If the random number selected is 6, evaluators begin by selecting the 6th employee file and then move through the 3000 employees by selecting every 30th file. They would select files 6, 36, 66, 96, 126, 156, and so on up to 3000. Prior to using this method of sampling, if the files are paper, the evaluators must know whether the files are arranged in alphabetical order or by first date of employment. If the employee files are in alphabetical order, systematic sampling may not yield a representative sample, because some traditional ethnic family names may be overrepresented in the sample while other names are skipped. If the employee files are arranged by month, day, and year of first date of employment, systematic sampling is a good choice for selecting a sample from the population.2
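
The selection rule translates directly into code. This Python sketch draws the same systematic sample described above: an interval of 30, a random start within the first interval, then every 30th file.

```python
import random

population_size, n = 3000, 100
k = population_size // n  # sampling interval: 3000 / 100 = 30
start = random.randint(1, k)  # random start within the first interval, e.g., 6
selected = list(range(start, population_size + 1, k))
print(len(selected), selected[:5])  # 100 files, e.g., [6, 36, 66, 96, 126]
```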

SAMPLING BIAS

When evaluators choose a sampling design, they must be aware of potential bias. Bias is defined as an error caused by systematically selecting one individual or outcome over another.16 Sampling bias occurs when the selected individuals do not represent the population. When sampling bias occurs, evaluators are not able to generalize their findings to the whole population that was supposed to be studied.2 For example, suppose evaluators wanted to study the effectiveness of e-cigarettes as a method of smoking cessation for adults, but they recruited adults from a large inner-city Medicaid clinic. This type of sample results in sampling bias, because adults receiving Medicaid are likely to have different socioeconomic circumstances and smoking habits than adults not receiving Medicaid. Even when evaluators select the ideal sampling design, they may encounter bias after the data are collected. After evaluators recognize that data are missing, they determine whether the missing data are cause for concern about nonresponse bias. There are two types of nonresponse bias: item nonresponse and unit nonresponse.17

Item Nonresponse Bias

Item nonresponse occurs when individuals complete a survey but leave some questions blank (see Table 10-8).

When this situation happens, evaluators are faced with making a decision about what to do about the missing responses. In this situation, there are three common ways to handle the missing data: case deletion, mean replacement, and item deletion.

Case Deletion

With case deletion, evaluators determine what percentage of item nonresponse they are willing to tolerate. This percentage changes depending on the type of survey and sampling design. For example, if an individual completes only 50 out of the 100 survey questions, evaluators may agree that the entire survey should be removed from the data set. However, if an individual leaves 2 out of 100 questions blank, evaluators would keep the other 98 responses. Prior to making the final decision on what percentage of blank survey questions is allowed in the data, evaluators calculate the overall response rate of the survey questions. If the response rate is low, they are less likely to delete an entire survey with a few blank questions, because that would remove the completed responses as well.

Many problems related to item nonresponse are reduced or eliminated by conducting a pilot study. In the pilot study, the exact survey to be used in the study is given to a small sample of individuals to “test” the survey and note any errors. After the pilot study is complete, evaluators revise the survey as needed and print the final survey or post the survey online. Even after conducting several survey pilot-testing sessions, it is unlikely but still possible for errors to occur.

TABLE 10-8 Sample of Data Spreadsheet with Missing Data

images

If the survey is available online for 5 days, evaluators scan through the responses as data become available. Upon review, evaluators determine whether individuals complete every question, leave the same questions blank, or leave random questions blank. If many respondents leave the same question blank, then evaluators investigate the specific question and identify problems with inappropriate wording, response choices, or other possible errors. After the problem is identified, evaluators decide what to do about it. If time and funding permit, evaluators correct the one question and repeat the survey online or reprint it for the remaining data-collection sites. As a last resort, evaluators simply delete the question from the survey.

Mean Replacement

Mean replacement is another way to handle item nonresponse. The word mean in statistics refers to the average: the mean is calculated by adding the responses and then dividing the sum by the number of responses. With mean replacement, evaluators calculate the mean response for each survey question; then, wherever an individual left a survey question blank, evaluators fill in the blank with the mean of the other individuals’ responses. For example, 179 out of 200 individuals answered the Likert-scale question “How would you rate your health today? 5 = excellent, 4 = very good, 3 = good, 2 = fair, and 1 = poor.” The mean score (average) for the 179 respondents was 3.8 for this question. Evaluators would fill in 3.8 as the response for the 21 individuals who left this question blank. The same procedure is repeated for each missing response. The disadvantage of this solution is that it pulls the survey responses closer to the mean than they might have been if the individuals had actually answered the question. For example, those individuals who left the question blank may have felt poorly and did not wish to complete the survey at all. Their actual responses may have been quite different from the replacement mean score, but evaluators have no way to verify the information.
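
A minimal Python sketch of mean replacement for a single Likert item follows; the response values are hypothetical, and None marks a blank answer.

```python
responses = [5, 4, 3, None, 4, 5, None, 3, 4]  # hypothetical item responses
answered = [r for r in responses if r is not None]
mean_score = sum(answered) / len(answered)  # mean of the 7 actual answers (4.0)
imputed = [r if r is not None else round(mean_score, 1) for r in responses]
```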

Item Deletion

Item deletion is defined as deleting a single problematic question from the analysis.

For example, 87 out of 200 college students answered the Likert-scale question “How many sexual partners have you had in the last 30 days? 7 = six or more sexual partners, 6 = five sexual partners, 5 = four sexual partners, 4 = three sexual partners, 3 = two sexual partners, 2 = one sexual partner, 1 = I did not have any sexual partners in the last 30 days, 0 = I prefer not to respond.” If only 87 (43.5%) college students responded, mean replacement is not appropriate. A better choice is to report that 43.5% responded with a mean score of 2.8 (or between one and two sexual partners) in the last 30 days; however, 56.5% selected “I prefer not to respond.”

Unit Nonresponse Bias

In unit nonresponse bias, the unit refers to a type or group of individuals that does not respond to a survey. The following examples illustrate several types of unit nonresponse bias.

Unit nonresponse bias occurs when the sampling design, such as quota sampling, is unable to obtain responses from certain segments of the sample. For example, unit nonresponse happens when evaluators are seeking 50 individuals in each of four different ZIP codes to represent different socioeconomic variations in the same county: 33611 (urban), 33647 (suburban), 33628 (rural), and 33690 (new development). In the rural ZIP code of 33628, only 18 of the 50 individuals completed the interview. The lack of respondents in one segment causes a bias, because the 32 nonrespondents in the rural area may have provided responses different from those of the 18 individuals who did respond.

When evaluators collect survey data, they keep track of when each survey is returned. When paper copy surveys are returned, evaluators label each survey with the date it was received. In addition, each survey is numbered in consecutive order. For online surveys, the survey software assigns a date and time to each survey upon submission. During data analyses, evaluators determine if the early responders are different from the late responders. Because late responders are considered to be more like nonresponders, evaluators estimate how the nonresponders may have answered. This information provides an estimate of the unit nonresponse bias. For example, let’s say that an online survey was emailed to all university employees. Evaluators found that administrative assistants responded first without any reminder emails, while administrators responded last and only after one or two reminder emails. Evaluators use these findings to make further predictions about the profile and reasons why some individuals respond early rather than late.

Instead of comparing early and late responders, evaluators investigate the demographic groups or profiles (e.g., gender, age, ethnicity, and income level) of survey respondents. In the data analysis, evaluators compare survey responses of various demographic characteristics. In some studies, evaluators compare survey response data to U.S. Census data. If data differ, there may be a nonresponse bias in the study data. For example, survey questions ask for respondents’ ZIP code, annual household income, and mode of transportation used.

If the majority of respondents in the same ZIP code inflate their annual household income, evaluators would note this difference when compared to the U.S. Census. If 90% of respondents in the same ZIP code respond that they drive a car as their mode of transportation, evaluators could verify the ZIP code and mode of transportation with the U.S. Census. If bias is noted, evaluators would need to question the accuracy of the other survey responses from the same respondents. This type of comparison is called database verification: evaluators compare survey responses with data in a verifiable database, such as U.S. Census data. This process allows evaluators to verify the accuracy of the data and to estimate nonresponse bias. In another example, clinic patients are invited to complete a survey. In the survey, respondents are asked to provide their height and weight. Their responses are verified against the electronic medical record containing the respondents’ actual height and weight. With community surveys, database verification is usually not possible.12

All of these techniques allow evaluators to detect when bias likely exists in their data. Discussion of how statisticians adjust the data analysis to account for unit nonresponse bias is beyond the scope of this chapter. However, reducing nonresponse bias in the survey development phase saves time and data adjustment in the data analysis phase.12

SUMMARY

This chapter began by defining populations and samples. It then discussed how populations and samples relate to probability, hypothesis testing, and error within inferential statistics. Next, it discussed sampling strategies, including important considerations for determining appropriate sample sizes. Finally, the chapter focused on several types of nonresponse sampling bias as well as methods used to avoid sampling bias in research studies.

CASE STUDY

An evaluation team from a large Florida university was hired by the U.S. Federal Emergency Management Agency (FEMA) to conduct a post–BP oil spill evaluation. FEMA requested that the evaluation involve small business owners from Florida, Alabama, Mississippi, and Louisiana affected by the massive 2010 oil spill in the Gulf of Mexico. The purpose of the evaluation is to determine the level of satisfaction regarding FEMA services during the oil spill, immediately after the oil well was capped, and a few years after the oil spill; the current rate of employment of affected small business owners; and the current perception of the water and beach sand quality along the Gulf of Mexico shoreline.

For this evaluation, evaluators selected a complex sample design. Because this evaluation involves qualitative and quantitative data collection, several different sample designs were used. Evaluators decided to begin the evaluation by contacting and interviewing gulf coast small business owners residing in the four affected states. However, the evaluation team was told by FEMA that due to privacy regulations and pending litigation, the list of small business owners is not available, so evaluators devised another recruitment strategy.

Phase 1: Prior to the initiation of phase 1, evaluators obtained IRB approval for this research. For phase 1, snowball sampling was selected to identify a few small business owners along the gulf coast in the four states. Evaluators asked those individuals for the names of other small business owners still living along the gulf coast. The disadvantage of this sampling technique is that the sample is not representative of all gulf coast small business owners, but it remains valuable for the first phase of this research. During this phase, evaluators conducted in-depth interviews with small business owners to gain an overall snapshot of their perceptions of FEMA services during and immediately after the oil spill and their current perceptions in relation to their small businesses. Identical questions were asked of each small business owner interviewed. The business owners were eager to participate in the interviews and gladly shared names of other small business owners along the gulf coast. The convenience of snowball sampling outweighed the disadvantages for phase 1 of the project. Upon completion of each digitally recorded interview, transcriptionists were available to transcribe the interviews for immediate review by the evaluation team. The general themes of the phase 1 interviews paved the way for phase 2.

Phase 2: Before the BP oil spill, there were estimated to be approximately 25,500 small business owners along the coastline in the four affected states. However, after the BP oil spill, approximately 9000 small business owners remained. The required sample size is 370 with a 5% margin of error. However, researchers added 30% to the sample to allow for missing data and for the inability to contact some small business owners due to inaccurate contact information.

Because demographic data specific to coastline small business owners were not available, the evaluation team decided to oversample the small business owner population.

Number of small business owners: 9000

Margin of error: 5%

Final sample size: 370

Additional 30% for missing data: 111

Total: 481

In phase 2, researchers contacted the Small Business Owner Association (SBOA) in each of the four states to purchase its member mailing list. Because the SBOA is a private association, it is allowed to sell mailing lists of members. Although not all small business owners are members of the SBOA, these lists serve as an adequate representation of all small business owners in the four gulf states. Because the survey will be mailed to SBOA members at their home addresses, the evaluation team can identify mailing addresses that reflect cities and small towns along the coastline in the four states. The small business owners may choose to participate by completing and returning the survey, or they may decline to participate.

Based on the general themes from phase 1, evaluators developed survey questions for phase 2. The 80-question survey included, but was not limited to, questions on demographic information (e.g., age, ethnicity, marital status, gender), years of owning a small business on the coastline, location of the small business, type of small business, reasons for staying on or leaving the coastline after the BP oil spill, and how owning a small business has changed during the owner’s career. The last question in the survey asked respondents if they would be interested in participating in a 30-minute follow-up telephone interview.

Phase 3: Assuming that the sample size of 481 was achieved in phase 2, researchers utilized random sampling and quota sampling (an equal number from each state) in phase 3. Researchers had funding to conduct 100 thirty-minute telephone interviews. This process involved dividing the surveys into four piles based on the location of the small business: Florida, Alabama, Mississippi, and Louisiana. The surveys in each pile were assigned consecutive numbers starting with 1. Using a random number generator, survey numbers were pulled until the quota was fulfilled (see Table 10-9).18

Interview questions for phase 3 were generated from survey responses in phase 2. Telephone interviews were conducted to verify if the survey data matched the actual perceptions of small business owners living on the coast who had been affected by the BP oil spill.

Case Study Discussion Questions

1.  List the various types of sampling techniques combined in this case study.

2.  Describe the process of how the researchers will coordinate and conduct 100 thirty-minute telephone interviews.

STUDENT ACTIVITY

Matthew is hired by the March of Dimes Foundation of Nevada to evaluate one of their programs. This educational program has been run in several cities in Nevada to lower the percentage of babies born prematurely and with low birth weight. It invites pregnant women to enroll in a prenatal health class at no charge; it has been in place for 2 years. The class is offered at local women’s hospitals and allows women to schedule their prenatal checkups for the same day that they attend the half-hour class. The class covers many issues, from eating healthy foods and exercising to the importance of stress reduction and social support. The class also offers important prenatal vitamins to the women. Over the last 2 years, 500 women have taken part in the educational classes in different cities. The March of Dimes Foundation of Nevada would like to see whether this program is doing as good a job at lowering rates of prematurity and low birth weight among women in poor physical and socioeconomic conditions as other groups endeavoring to lower these rates. Matthew decides to conduct a survey. To do this he must take a sample.

1.  What does Matthew need to consider before he starts his research project?

2.  What list does he need to have before he can select his sample?

3.  What type of sampling method should Matthew use for this study? Why did you choose this method?

4.  What is the appropriate sample size for this study if researchers want to have a 5% margin of error? (Use the following table.)

5.  The following table lists one of the strata. Determine the number of individuals who will need to be included in the sample.

TABLE 10-9 Quota Sampling Based on Florida Demographics

images

Data from How to measure variability in a dataset. Stat Trek. Available at: http://stattrek.com/sampling/variance.aspx

images

6.  Suppose one of the clinics had a fire, and some of the patient files were lost. From talking to the prenatal educator, you know that this clinic regularly sees members from a nearby Native American community. You want to make sure that you are including an adequate number of Native Americans. Before completing your sampling frame and starting the research project, you ask a current participant in the program who is a member of this Native American community if she knows of other members who have gone through the program. She gives you the names of four other mothers, who then collectively give you an additional 17 names. What type of sampling is this?

Answers

1.  Matthew needs to consider how many mothers he will investigate (the sample size).

2.  Before he can select his sample, Matthew needs a list of all of the women who have attended the program over the last 2 years. This is called his sampling frame. Matthew will take the sample from the sampling frame.

3.  The appropriate sampling method for Matthew to choose is stratified random sampling, because he would like to know more about the physical and socioeconomic conditions of the mothers who have gone through the class in the last 2 years. Matthew could look at age, race, income, and education level as different strata.

4.  The appropriate sample size for this study would be 217 participants.

5.

images

images

6.  This is called snowball sampling, where one person who met the criteria you were looking for informed you about others they knew who might also meet that criteria.

REFERENCES

1.  Downing D, Clark J. Statistics: The Easy Way, 2nd ed. Hauppauge, NY: Barron’s Educational Series, Inc.; 1989.

2.  Ary D, Jacobs L, Razavieh A, Sorenson C. Introduction to Research in Education, 8th ed. Belmont, CA: Wadsworth/Cengage Learning; 2010.

3.  Blair R, Taylor R. Biostatistics for the Health Sciences. Upper Saddle River, NJ: Pearson Prentice Hall; 2008.

4.  Creative Research Systems. Significance in Statistics and Surveys. Available at: http://www.surveysystem.com/signif.htm. Accessed May 18, 2014.

5.  The Research Advisors. Sample Size Table. Available at: http://www.research-advisors.com/tools/SampleSize.htm. Accessed May 18, 2014.

6.  Israel G. Determining Sample Size. University of Florida, Institute of Food and Agricultural Studies Extension. Available at: http://edis.ifas.ufl.edu/pd006. Accessed May 18, 2014.

7.  Doherty M. Probability versus non-probability sampling in sample surveys. New Zeal Stat Rev. 1994;(7):21–28.

8.  Castillo J. Stratified Sampling Method. Experiment Resources. Available at: http://www.experiment-resources.com/stratified-sampling.html. Accessed May 18, 2014.

9.  Stat Trek. Stratified random sampling. Available at: http://stattrek.com/survey-research/stratified-sampling.aspx. Accessed May 18, 2014.

10.  Trochim W. Nonprobability Sampling. Research Methods Knowledge Base. Available at: http://www.socialresearchmethods.net/kb/sampnon.php. Accessed May 18, 2014.

11.  Barbour R. Checklists for improving rigor in qualitative research: A case of the tail wagging the dog? Brit Med J. 2001;322:115–17.

12.  Morrow K, Vargas S, Rosen R, Christensen A, Salomon L, Shulman L, Barroso C, Fava J. The utility of non-proportional quota sampling for recruiting at-risk women for microbicide research. AIDS Behav. 2007;11:586–95.

13.  Statistics Solutions. Sampling. Available at: http://www.statisticssolutions.com/academic-solutions/resources/dissertation-resources/sample-size-calculation-and-sample-size-justification/. Accessed September 26, 2013.

14.  Stewart D, Strasser G. Expert role assignment and information sampling during collective recall and decision making. J Pers Soc Psychol. 1995;69(4):619–28.

15.  Browne K. Snowball sampling: Using social networks to research non-heterosexual women. Int J Soc Res Meth. 2005;8(1):47–60.

16.  Panzeri S, Magri C, Carraro L. Sampling Bias. Scholarpedia. Available at: http://www.scholarpedia.org/article/Sampling_bias. Accessed May 18, 2014.

17.  Groves R. Nonresponse rates and nonresponse bias in household surveys. Public Opin Q. 2006;70(5):646–675.

18.  Stat Trek. How to measure variability in a dataset. Available at: http://stattrek.com/sampling/variance.aspx. Accessed May 18, 2014.

ANSWERS to Table 10-3 Sample Spreadsheet for Quantitative Analysis

images