6 Overview of Methods of Data Collection
CENSUS
SAMPLE SURVEY
EXPERIMENT
OBSERVATIONAL STUDY
In the real world, time and cost considerations usually make it impossible to analyze an entire population. Does the government question you and your parents before announcing the monthly unemployment rates? Does a television producer check every household’s viewing preferences before deciding whether a pilot program will be continued? In studying statistics we learn how to estimate population characteristics by considering a sample. For example, later in this book we will see how to estimate population means and proportions by looking at sample means and proportions.
To derive conclusions about the larger population, we need to be confident that the sample we have chosen represents that population fairly. Analyzing the data with computers is often easier than gathering the data, but the frequently quoted “Garbage in, garbage out” applies here. Nothing can help if the data are badly collected. Unfortunately, many of the statistics with which we are bombarded by newspapers, radio, and television are based on poorly designed data collection procedures.
A census is a complete enumeration of an entire population. In common use, it is often thought of as an official attempt to contact every member of the population, usually with details regarding age, marital status, race, gender, occupation, income, years of school completed, and so on. Every 10 years the U.S. Bureau of the Census divides the nation into nine regions and attempts to gather information about everyone in the country. A massive amount of data is obtained, but even with the resources of the U.S. government, the census is not complete. For example, many homeless people are always missed, or counted at two temporary residences, and there are always households that do not respond even after repeated requests for information. It is estimated that the 2010 census missed about 2.1% of Black Americans and 1.5% of Hispanics, together accounting for some 1.5 million people.
In most studies, in both the private and public sectors, a complete census is unreasonable because of the time and cost involved. Furthermore, attempts to gather complete data have been known to lead to carelessness. Finally, and most important, a well-designed, well-conducted sample survey is far superior to a poorly designed study involving a complete census. For example, a poorly worded question might give meaningless data even if everyone in the population answers.
The census tries to count everyone; it is not a sample. A sample survey aims to obtain information about a whole population by studying a part of it, that is, a sample. The goal is to gather information without disturbing or changing the population. Numerous procedures are used to collect data through sampling, and much of the statistical information distributed to us comes from sample surveys. Often, controlled experiments are later undertaken to demonstrate relationships suggested by sample surveys.
The one thing that most quickly invalidates a sample and makes useful information impossible to obtain is bias. A sample is biased if in some critical way it does not represent the population. The main technique to avoid bias is to incorporate randomness into the selection process. Randomization protects us from effects and influences, both known and unknown. Finally, larger random samples give more precise estimates, but what is critical is the sample size itself, not the percentage or fraction of the population that is sampled. That is, a random sample of size 500 from a population of size 1,000,000 is essentially just as representative as a random sample of size 500 from a population of size 100,000.
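The claim that sample size matters more than the fraction of the population sampled can be checked with a small simulation. The sketch below is illustrative only: the function name, the 30% "true" proportion, the number of trials, and the seed are all assumptions made for the example, not values from the text. It repeatedly draws simple random samples without replacement and reports how much the sample proportion varies from sample to sample.

```python
import random
import statistics


def sample_proportion_spread(pop_size, sample_size, true_p=0.30,
                             trials=1000, seed=1):
    """Standard deviation of the sample proportion over many random samples.

    Hypothetical setup: the first round(true_p * pop_size) members of the
    population have the characteristic of interest; everyone else does not.
    """
    rng = random.Random(seed)
    with_trait = round(true_p * pop_size)
    proportions = []
    for _ in range(trials):
        # Simple random sample without replacement, identified by index.
        chosen = rng.sample(range(pop_size), sample_size)
        hits = sum(1 for i in chosen if i < with_trait)
        proportions.append(hits / sample_size)
    return statistics.stdev(proportions)


if __name__ == "__main__":
    for pop_size in (100_000, 1_000_000):
        spread = sample_proportion_spread(pop_size, 500)
        print(f"population {pop_size:>9,}, n = 500: spread = {spread:.4f}")
    # Quadrupling the sample size roughly halves the spread.
    spread = sample_proportion_spread(100_000, 2000)
    print(f"population   100,000, n = 2000: spread = {spread:.4f}")
```

Under these assumptions, both runs with n = 500 should give a spread close to sqrt(0.30 × 0.70 / 500) ≈ 0.0205, even though one sample covers 0.5% of its population and the other only 0.05%. Only the change in sample size makes a noticeable difference.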
In a controlled study, called an experiment, the researcher should randomly divide subjects into appropriate groups. Some action is taken on one or more of the groups, and the response is observed. For example, patients may be randomly given unmarked capsules of either aspirin or acetaminophen and the effects of the medication measured. Experiments often have a treatment group and a control group; in the ideal situation, neither the subjects nor the researcher knows which group is which. The Salk vaccine experiment of the 1950s, in which half the children received the vaccine and half were given a placebo, with not even their doctors knowing who received what, is a classic example of this double-blind approach. Controlled experiments can indicate cause-and-effect relationships.
The critical principles behind good experimental design include control (outside of who receives what treatments, conditions should be as similar as possible for all involved groups), blocking (the subjects can be divided into representative groups to bring certain differences directly into the picture), randomization (unknown and uncontrollable differences are handled by randomizing who receives what treatments), replication (treatments need to be repeated on a sufficient number of subjects), and generalizability (ability to repeat an experiment in a variety of settings).
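Blocking and randomization can be combined: subjects are first grouped by a characteristic thought to matter, and treatments are then assigned at random within each group. The following is a minimal sketch of that idea, not a prescribed procedure. The function name, the age-group blocks, and the even treatment/control split are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def blocked_assignment(subjects, block_of, seed=0):
    """Randomly split each block of subjects into treatment and control.

    subjects: hashable subject identifiers.
    block_of: function returning the block label for a subject
              (the blocking variable is an assumption of this example).
    """
    rng = random.Random(seed)

    # Blocking: group subjects that share the same block label.
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    # Randomization: within each block, shuffle and split in half.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for subject in members[:half]:
            assignment[subject] = "treatment"
        for subject in members[half:]:
            assignment[subject] = "control"
    return assignment


if __name__ == "__main__":
    # Hypothetical subjects: (id, age group).
    people = [(i, "under 40" if i < 10 else "40 and over") for i in range(20)]
    groups = blocked_assignment(people, block_of=lambda person: person[1])
    for person in people:
        print(person, groups[person])
```

Because the split happens inside each block, both age groups are equally represented in the treatment and control groups, so a difference in response cannot be attributed to age imbalance. Chance alone decides who within a block receives the treatment.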
Sample surveys are one example of what are called observational studies. In an observational study the researcher does not choose who goes into the treatment and control groups. For example, a researcher cannot ethically tell 100 people to smoke three packs of cigarettes a day and 100 others to smoke only one pack per day; the researcher can only observe people who habitually smoke these amounts. In observational studies the researcher strives to determine which variables affect the noted response. While results may suggest relationships, it is difficult to conclude cause and effect.
Observational studies are primary, vital sources of data; however, they are a poor method of measuring the effect of change. To evaluate responses to change, one must impose change, that is, perform an experiment. Furthermore, observational studies on the impact of some variable on another variable often fail because explanatory variables are confounded with other variables.
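A simulation can illustrate how confounding misleads an observational comparison and how random assignment removes the problem. Everything in the sketch below is invented for illustration: the lurking variable, the probabilities, and the response scale are assumptions, and the "exposure" is deliberately given no effect at all on the response.

```python
import random
import statistics


def difference_in_means(n=10_000, randomized=False, seed=42):
    """Mean response of exposed subjects minus mean response of unexposed.

    A hypothetical lurking variable raises the response by 10 units.
    The exposure itself has zero true effect.
    """
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        lurking = rng.random() < 0.5
        if randomized:
            # Experiment: a coin flip decides who is exposed.
            is_exposed = rng.random() < 0.5
        else:
            # Observational study: subjects with the lurking trait
            # are far more likely to choose the exposure themselves.
            is_exposed = rng.random() < (0.8 if lurking else 0.2)
        response = 50 + (10 if lurking else 0) + rng.gauss(0, 5)
        (exposed if is_exposed else unexposed).append(response)
    return statistics.mean(exposed) - statistics.mean(unexposed)


if __name__ == "__main__":
    print("observational difference:", round(difference_in_means(), 2))
    print("experimental difference: ",
          round(difference_in_means(randomized=True), 2))
```

In the observational version, the exposed group contains mostly subjects with the lurking trait, so it shows a sizable difference in mean response even though the exposure does nothing. With random assignment the lurking trait is spread evenly across both groups and the difference shrinks toward zero.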
SUMMARY
A complete census is usually unreasonable because of time and cost constraints.
Population characteristics (called parameters) are estimated by considering statistics from a sample.
Analysis of badly gathered sample data is usually a meaningless exercise.
A sample is biased if in some critical way it does not represent the population.
The main technique to avoid bias is to incorporate randomness into the selection process.
Experiments involve applying a treatment to one or more groups and observing the responses.
Observational studies involve observing responses to choices people make.
QUESTIONS ON TOPIC SIX: OVERVIEW OF METHODS OF DATA COLLECTION
Multiple-Choice Questions
Directions: The questions or incomplete statements that follow are each followed by five suggested answers or completions. Choose the response that best answers the question or completes the statement.
1. When travelers change airlines during connecting flights, each airline receives a portion of the fare. Several years ago, the major airlines used a sample trial period to determine what percentage of certain fares each should collect. Using these statistical results to determine fare splits, the airlines now claim huge savings over previous clerical costs. Which of the following is true?
(A) The airlines ran an experiment using a trial period for the control group.
(B) The airlines ran an experiment using fare splits as treatments.
(C) The airlines ran an observational study using the calculations from a trial period as a sample.
(D) The airlines ran an observational study, but fare splits were a confounding variable.
(E) The airlines tried to gather a census but ended up with a sample.
2. Which of the following is not true?
(A) In an experiment some treatment is intentionally forced on one group to note the response.
(B) In an observational study information is gathered on an already existing situation.
(C) Sample surveys are observational studies, not experiments.
(D) While observational studies may suggest relationships, it is usually not possible to conclude cause and effect because of the lack of control over possible confounding variables.
(E) A complete census is the only way to establish a cause-and-effect relationship absolutely.
3. In one study on the effect of niacin on cholesterol level, 100 subjects who acknowledged being long-time niacin takers had their cholesterol levels compared with those of 100 people who had never taken niacin. In a second study, 50 subjects were randomly chosen to receive niacin and 50 were chosen to receive a placebo. Which of the following is a true statement?
(A) The first study was a controlled experiment, while the second was an observational study.
(B) The first study was an observational study, while the second was a controlled experiment.
(C) Both studies were controlled experiments.
(D) Both studies were observational studies.
(E) Each study was part controlled experiment and part observational study.
4. In one study subjects were randomly given either 500 or 1000 milligrams of vitamin C daily, and the number of colds they came down with during a winter season was noted. In a second study people responded to a questionnaire asking about the average number of hours they sleep per night and the number of colds they came down with during a winter season. Which of the following is a true statement?
(A) The first study was an experiment without a control group, while the second was an observational study.
(B) The first study was an observational study, while the second was a controlled experiment.
(C) Both studies were controlled experiments.
(D) Both studies were observational studies.
(E) None of the above is a correct statement.
5. In a 1992 London study, 12 out of 20 migraine sufferers were given chocolate whose flavor was masked by peppermint, while the remaining eight sufferers received a similar-looking, similar-tasting tablet that had no chocolate. Within 1 day, five of those receiving chocolate complained of migraines, while no complaints were made by any of those who did not receive chocolate. Which of the following is a true statement?
(A) This study was an observational study of 20 migraine sufferers in which it was noted how many came down with migraines after eating chocolate.
(B) This study was a sample survey in which 12 out of 20 migraine sufferers were picked to receive peppermint-flavored chocolate.
(C) A census of 20 migraine sufferers was taken, noting how many were given chocolate and how many developed migraines.
(D) A study was performed using chocolate as a placebo to study one cause of migraines.
(E) An experiment was performed comparing a treatment group that was given chocolate to a control group that was not.
6. Suppose you wish to compare the average class size of mathematics classes to the average class size of English classes in your high school. Which is the most appropriate technique for gathering the needed data?
(A) Census
(B) Sample survey
(C) Experiment
(D) Observational study
(E) None of these methods is appropriate.
7. Two studies are run to compare the experiences of families living in high-rise public housing to those of families living in townhouse subsidized rentals. The first study interviews 25 families who have been in each government program for at least 1 year, while the second randomly assigns 25 families to each program and interviews them after 1 year. Which of the following is a true statement?
(A) Both studies are observational studies because of the time period involved.
(B) Both studies are observational studies because there are no control groups.
(C) The first study is an observational study, while the second is an experiment.
(D) The first study is an experiment, while the second is an observational study.
(E) Both studies are experiments.
8. Two studies are run to determine the effect of low levels of wine consumption on cholesterol level. The first study measures the cholesterol levels of 100 volunteers who have not consumed alcohol in the past year and compares these values with their cholesterol levels after 1 year, during which time each volunteer drinks one glass of wine daily. The second study measures the cholesterol levels of 100 volunteers who have not consumed alcohol in the past year, randomly picks half the group to drink one glass of wine daily for a year while the others drink no alcohol for the year, and finally measures their levels again. Which of the following is a true statement?
(A) The first study is an observational study, while the second is an experiment.
(B) The first study is an experiment, while the second is an observational study.
(C) Both studies are observational studies, but only one uses both randomization and a control group.
(D) The first study is a census of 100 volunteers, while the second study is an experiment.
(E) Both studies are experiments.