For reference only.
SECTION I
Questions 1–40
Spend 90 minutes on this part of the exam.
Directions: The questions or incomplete statements that follow are each followed by five suggested answers or completions. Choose the response that best answers the question or completes the statement.
1. Suppose that the regression line for a set of data, ŷ = mx + 3, passes through the point (2, 7). If x̄ and ȳ are the sample means of the x- and y-values, respectively, then ȳ =
(A) x̄.
(B) x̄ – 2.
(C) x̄ + 3.
(D) 2x̄ + 3.
(E) 3.5x̄ + 3.
2. A study is made to determine whether more hours of academic studying lead to higher point scoring by basketball players. In surveying 50 basketball players, it is noted that the 25 who claim to study the most hours have a higher point average than the 25 who study less. Based on this study, the coach begins requiring the players to spend more time studying. Which of the following is a correct statement?
(A) While this study may have its faults, it still does prove causation.
(B) There could well be a confounding variable responsible for the seeming relationship.
(C) While this is a controlled experiment, the conclusion of the coach is not justified.
(D) To get the athletes to study more, it would be more meaningful to have them put in more practice time on the court to boost their point averages, as higher point averages seem to be associated with more study time.
(E) No proper conclusion is possible without somehow introducing blinding.
3. The longevity of people living in a certain locality has a standard deviation of 14 years. What is the mean longevity if 30% of the people live longer than 75 years? Assume a normal distribution for life spans.
(A) 61.00
(B) 67.65
(C) 74.48
(D) 82.35
(E) The mean cannot be computed from the information given.
4. Which of the following is a correct statement about correlation?
(A) If the slope of the regression line is exactly 1, then the correlation is exactly 1.
(B) If the correlation is 0, then the slope of the regression line is undefined.
(C) Switching which variable is called x and which is called y changes the sign of the correlation.
(D) The correlation r is equal to the slope of the regression line when z-scores for the y-variable are plotted against z-scores for the x-variable.
(E) Changes in the measurement units of the variables may change the correlation.
5. Which of the following are affected by outliers?
I. Mean
II. Median
III. Standard deviation
IV. Range
V. Interquartile range
(A) I, III, and V
(B) II and IV
(C) I and V
(D) III and IV
(E) I, III, and IV
6. An engineer wishes to determine the quantity of heat being generated by a particular electronic component. If she knows that the standard deviation is 2.4, how many of these components should she consider to be 99% sure of knowing the mean quantity to within ±0.6?
(A) 27
(B) 87
(C) 107
(D) 212
(E) 425
7. A company that produces facial tissues continually monitors tissue strength. If the mean strength from sample data drops below a specified level, the production process is halted and the machinery inspected. Which of the following would result from a Type I error?
(A) Halting the production process when sufficient customer complaints are received.
(B) Halting the production process when the tissue strength is below specifications.
(C) Halting the production process when the tissue strength is within specifications.
(D) Allowing the production process to continue when the tissue strength is below specifications.
(E) Allowing the production process to continue when the tissue strength is within specifications.
8. Two possible wordings for a questionnaire on a proposed school budget increase are as follows:
I. This school district has one of the highest per student expenditure rates in the state. This has resulted in low failure rates, high standardized test scores, and most students going on to good colleges and universities. Do you support the proposed school budget increase?
II. This school district has one of the highest per student expenditure rates in the state. This has resulted in high property taxes, with many people on fixed incomes having to give up their homes because they cannot pay the school tax. Do you support the proposed school budget increase?
One of these questions showed that 58% of the population favor the proposed school budget increase, while the other question showed that only 13% of the population support the proposed increase. Which produced which result and why?
(A) The first showed 58% and the second 13% because of the lack of randomization as evidenced by the wording of the questions.
(B) The first showed 13% and the second 58% because of a placebo effect due to the wording of the questions.
(C) The first showed 58% and the second 13% because of the lack of a control group.
(D) The first showed 13% and the second 58% because of response bias due to the wording of the questions.
(E) The first showed 58% and the second 13% because of response bias due to the wording of the questions.
9. Suppose that for a certain Caribbean island in any 3-year period the probability of a major hurricane is 0.25, the probability of water damage is 0.44, and the probability of both a hurricane and water damage is 0.22. What is the probability of water damage given that there is a hurricane?
(A) 0.25 + 0.44 – 0.22
(B)
(C) 0.25 + 0.44
(D)
(E) 0.25 + 0.44 + 0.22
10. A union spokesperson is trying to encourage a college faculty to join the union. She would like to argue that faculty salaries are not truly based on years of service as most faculty believe. She gathers data and notes the following scatterplot of salary versus years of service.
Which of the following most correctly interprets the overall scatterplot?
(A) The faculty member with the fewest years of service makes the lowest salary, and the faculty member with the most service makes the highest salary.
(B) A faculty member with more service than another has a greater salary than the other.
(C) There is a strong positive correlation with little deviation.
(D) There is no clear relationship between salary and years of service.
(E) While there is a strong positive correlation, there is a distinct deviation from the overall pattern for faculty with fewer than ten years of service.
11. Two random samples of students are chosen, one from those taking an AP Statistics class and one from those not. The following back-to-back stemplots compare the GPAs.
Which of the following is true about the ranges and standard deviations?
(A) The first set has both a greater range and a greater standard deviation.
(B) The first set has a greater range, while the second has a greater standard deviation.
(C) The first set has a greater standard deviation, while the second has a greater range.
(D) The second set has both a greater range and a greater standard deviation.
(E) The two sets have equal ranges and equal standard deviations.
12. In a group of 10 scores, the largest score is increased by 40 points. What will happen to the mean?
(A) It will remain the same.
(B) It will increase by 4 points.
(C) It will increase by 10 points.
(D) It will increase by 40 points.
(E) There is not sufficient information to answer this question.
13. Suppose X and Y are random variables with µx = 32, σx = 5, µy = 44, and σy = 12. Given that X and Y are independent, what are the mean and standard deviation of the random variable X + Y?
(A) µx+y = 76, σx+y = 8.5
(B) µx+y = 76, σx+y = 13
(C) µx+y = 76, σx+y = 17
(D) µx+y = 38, σx+y = 17
(E) There is insufficient information to answer this question.
14. Suppose you toss a fair die three times and it comes up an even number each time. Which of the following is a true statement?
(A) By the law of large numbers, the next toss is more likely to be an odd number than another even number.
(B) Based on the properties of conditional probability, the next toss is more likely to be an even number given that three in a row have been even.
(C) Dice actually do have memories, and thus the number that comes up on the next toss will be influenced by the previous tosses.
(D) The law of large numbers tells how many tosses will be necessary before the percentages of evens and odds are again in balance.
(E) The probability that the next toss will again be even is 0.5.
15. A pharmaceutical company is interested in the association between advertising expenditures and sales for various over-the-counter products. A sales associate collects data on nine products, looking at sales (in $1000) versus advertising expenditures (in $1000). The results of the regression analysis are shown below.
Which of the following gives a 90% confidence interval for the slope of the regression line?
(A) 12.633 ± 1.415(0.378)
(B) 12.633 ± 1.895(0.378)
(C) 123.800 ± 1.414(1.798)
(D) 123.800 ± 1.895(1.798)
(E) 123.800 ± 1.645(1.798/√9)
16. Suppose you wish to compare the AP Statistics exam results for the male and female students taking AP Statistics at your high school. Which is the most appropriate technique for gathering the needed data?
(A) Census
(B) Sample survey
(C) Experiment
(D) Observational study
(E) None of these is appropriate.
17. Jonathan obtained a score of 80 on a statistics exam, placing him at the 90th percentile. Suppose five points are added to everyone’s score. Jonathan’s new score will be at the
(A) 80th percentile.
(B) 85th percentile.
(C) 90th percentile.
(D) 95th percentile.
(E) There is not sufficient information to answer this question.
18. To study the effect of music on piecework output at a clothing manufacturer, two experimental treatments are planned: day-long classical music for one group versus day-long light rock music for another. Which one of the following groups would serve best as a control for this study?
(A) A third group for which no music is played
(B) A third group that randomly hears either classical or light rock music each day
(C) A third group that hears day-long R & B music
(D) A third group that hears classical music every morning and light rock every afternoon
(E) A third group in which each worker has earphones and chooses his or her own favorite music
19. Suppose H0: p = 0.6, and the power of the test for Ha: p = 0.7 is 0.8. Which of the following is a valid conclusion?
(A) The probability of committing a Type I error is 0.1.
(B) If Ha is true, the probability of failing to reject H0 is 0.2.
(C) The probability of committing a Type II error is 0.3.
(D) All of the above are valid conclusions.
(E) None of the above are valid conclusions.
20. Following is a histogram of the numbers of ties owned by bank executives.
Which of the following is a correct statement?
(A) The median number of ties is five.
(B) More than four executives own over eight ties each.
(C) An executive is equally likely to own fewer than five ties or more than seven ties.
(D) One tie is a reasonable estimate for the standard deviation.
(E) Removing all the executives with three, nine, and ten ties may change the median.
21. Which of the following is a binomial random variable?
(A) The number of tosses before a “5” appears when tossing a fair die.
(B) The number of points a hockey team receives in 10 games, where two points are awarded for wins, one point for ties, and no points for losses.
(C) The number of hearts out of five cards randomly drawn from a deck of 52 cards, without replacement.
(D) The number of motorists not wearing seat belts in a random sample of five drivers.
(E) None of the above.
22. Company I manufactures bomb fuses that burn an average of 50 minutes with a standard deviation of 10 minutes, while company II advertises fuses that burn an average of 55 minutes with a standard deviation of 5 minutes. Which company’s fuse is more likely to last at least 1 hour? Assume normal distributions of fuse times.
(A) Company I’s, because of its greater standard deviation
(B) Company II’s, because of its greater mean
(C) For both companies, the probability that a fuse will last at least 1 hour is 15.9%
(D) For both companies, the probability that a fuse will last at least 1 hour is 84.1%
(E) The problem cannot be solved from the information given.
23. Which of the following is not important in the design of experiments?
(A) Control of confounding variables
(B) Randomization in assigning subjects to different treatments
(C) Use of a lurking variable to control the placebo effect
(D) Replication of the experiment using sufficient numbers of subjects
(E) All of the above are important in the design of experiments.
24. The travel miles claimed in weekly expense reports of the sales personnel at a corporation are summarized in the following boxplot.
Which of the following is the most reasonable conclusion?
(A) The mean and median numbers of travel miles are roughly equal.
(B) The mean number of travel miles is greater than the median number.
(C) Most of the claimed numbers of travel miles are in the [0, 200] interval.
(D) Most of the claimed numbers of travel miles are in the [200, 240] interval.
(E) The left and right whiskers contain the same number of values from the set of personnel travel mile claims.
25. Which of the following statements about residuals is true?
(A) Influential scores have large residuals.
(B) If the linear model is good, the number of positive residuals will be the same as the number of negative residuals.
(C) The mean of the residuals is always zero.
(D) If the correlation is 0, there will be a distinct pattern in the residual plot.
(E) If the correlation is 1, there will not be a distinct pattern in the residual plot.
26. Four pairs of data are used in determining a regression line ŷ = 3x + 4. If the four values of the independent variable are 32, 24, 29, and 27, respectively, what is the mean of the four values of the dependent variable?
(A) 68
(B) 84
(C) 88
(D) 100
(E) The mean cannot be determined from the given information.
27. According to one poll, 12% of the public favor legalizing all drugs. In a simple random sample of six people, what is the probability that at least one person favors legalization?
(A) 6(0.12)(0.88)^5
(B) (0.88)^6
(C) 1 – (0.88)^6
(D) 1 – 6(0.12)(0.88)^5
(E) 6(0.12)(0.88)^5 + (0.88)^6
28. Sampling error occurs
(A) when interviewers make mistakes resulting in bias.
(B) because a sample statistic is used to estimate a population parameter.
(C) when interviewers use judgment instead of random choice in picking the sample.
(D) when samples are too small.
(E) in all of the above cases.
29. A telephone executive instructs an associate to contact 104 customers using their service to obtain their opinions in regard to an idea for a new pricing package. The associate notes the number of customers whose names begin with A and uses a random number table to pick four of these names. She then proceeds to use the same procedure for each letter of the alphabet and combines the 4 × 26 = 104 results into a group to be contacted. Which of the following is a correct conclusion?
(A) Her procedure makes use of chance.
(B) Her procedure results in a simple random sample.
(C) Each customer has an equal probability of being included in the survey.
(D) Her procedure introduces bias through sampling error.
(E) With this small a sample size, it is better to let the surveyor pick representative customers to be surveyed based on as many features such as gender, political preference, income level, race, age, and so on, as are in the company’s data banks.
30. The graph below shows cumulative proportions plotted against GPAs for high school seniors.
What is the approximate interquartile range?
(A) 0.85
(B) 2.25
(C) 2.7
(D) 2.75
(E) 3.1
31. PCB contamination of a river by a manufacturer is being measured by amounts of the pollutant found in fish. A company scientist claims that the fish contain only 5 parts per million, but an investigator believes the figure is higher. The investigator catches six fish that show the following amounts of PCB (in parts per million): 6.8, 5.6, 5.2, 4.7, 6.3, and 5.4. In performing a hypothesis test with H0: µ = 5 and Ha: µ > 5, what is the test statistic?
(A)
(B)
(C)
(D)
(E)
32. The distribution of weights of 16-ounce bags of a particular brand of potato chips is approximately normal with a standard deviation of 0.28 ounce. How does the weight of a bag at the 40th percentile compare with the mean weight?
(A) 0.40 ounce above the mean
(B) 0.25 ounce above the mean
(C) 0.07 ounce above the mean
(D) 0.07 ounce below the mean
(E) 0.25 ounce below the mean
33. In general, how does tripling the sample size change the confidence interval size?
(A) It triples the interval size.
(B) It divides the interval size by 3.
(C) It multiplies the interval size by 1.732.
(D) It divides the interval size by 1.732.
(E) This question cannot be answered without knowing the sample size.
34. Which of the following statements is false?
(A) Like the normal distribution, the t-distributions are symmetric.
(B) The t-distributions are lower at the mean and higher at the tails, and so are more spread out than the normal distribution.
(C) The greater the df, the closer the t-distributions are to the normal distribution.
(D) The smaller the df, the better the 68–95–99.7 Rule works for t-models.
(E) The area under all t-distribution curves is 1.
35. A study on school budget approval among people with different party affiliations resulted in the following segmented bar chart:
Which of the following is greatest?
(A) Number of Democrats who are for the proposed budget
(B) Number of Republicans who are against the budget
(C) Number of Independents who have no opinion on the budget
(D) The above are all equal.
(E) The answer is impossible to determine without additional information.
36. The sampling distribution of the sample mean is close to the normal distribution
(A) only if both the original population has a normal distribution and n is large.
(B) if the standard deviation of the original population is known.
(C) if n is large, no matter what the distribution of the original population.
(D) no matter what the value of n or what the distribution of the original population.
(E) only if the original population is not badly skewed and does not have outliers.
37. What is the probability of a Type II error when a hypothesis test is being conducted at the 10% significance level (α = 0.10)?
(A) 0.05
(B) 0.10
(C) 0.90
(D) 0.95
(E) There is insufficient information to answer this question.
38.
Above is the dotplot for a set of numbers. One element is labeled X. Which of the following is a correct statement?
(A) X has the largest z-score, in absolute value, of any element in the set.
(B) A modified boxplot will plot an outlier like X as an isolated point.
(C) A stemplot will show X isolated from two clusters.
(D) Because of X, the mean and median are different.
(E) The IQR is exactly half the range.
39. A 2008 survey of 500 households concluded that 82% of the population uses grocery coupons. Which of the following best describes what is meant by the poll having a margin of error of 3%?
(A) Three percent of those surveyed refused to participate in the poll.
(B) It would not be unexpected for 3% of the population to begin using coupons or stop using coupons.
(C) Between 395 and 425 of the 500 households surveyed responded that they used grocery coupons.
(D) If a similar survey of 500 households were taken weekly, a 3% change in each week’s results would not be unexpected.
(E) It is likely that between 79% and 85% of the population use grocery coupons.
40.
The above two-way table summarizes the results of a survey of high school seniors conducted to determine if there is a relationship between whether or not a student is taking AP Statistics and whether he or she plans to attend a public or a private college after graduation. Which of the following is the most reasonable conclusion about the relationship between taking AP Statistics and the type of college a student plans to attend?
(A) There appears to be no association since the proportion of AP Statistics students planning to attend public schools is almost identical to the proportion of students not taking AP Statistics who plan to attend public schools.
(B) There appears to be an association since the proportion of AP Statistics students planning to attend public schools is almost identical to the proportion of students not taking AP Statistics who plan to attend public schools.
(C) There appears to be an association since more students plan to attend private than public schools.
(D) There appears to be an association since fewer students are taking AP Statistics than are not taking AP Statistics.
(E) These data do not address the question of association.
If there is still time remaining, you may review your answers.
SECTION II
Part A
QUESTIONS 1–5
Spend about 65 minutes on this part of the exam.
Percentage of Section II grade—75
Directions: You must show all work and indicate the methods you use. You will be graded on the correctness of your methods and on the accuracy of your results and explanations.
1. Ten volunteer male subjects are to be used for an experiment to study four drugs (aloe, camphor, eucalyptus oil, and benzocaine) and a placebo with regard to itching relief. Itching is induced on a forearm with an itch stimulus (cowage), a drug is topically administered, and the duration of itching is recorded.
(a) If the experiment is to be done in one sitting, how would you assign treatments for a completely randomized design?
(b) If 5 days are set aside for the experiment, with one sitting a day, how would you assign treatments for a randomized block design where the subjects are the blocks?
(c) What limits are there on generalization of any results?
2. A comprehensive study of more than 3,000 baseball games results in the graph below showing relative frequencies of runs scored by home teams.
(a) Calculate the mean and the median.
(b) Between the mean and the median, is the one that is greater what was to be expected? Explain.
(c) What is the probability that in 4 randomly selected games, the home team is shut out (scores no runs) in at least one of the games?
(d) If X is the random variable for the runs scored per game by home teams, its standard deviation is 2.578. Suppose 200 games are selected at random, and x̄, the mean number of runs scored by the home teams, is calculated. Describe the sampling distribution of x̄.
3. A study is performed to explore the relationship, if any, between 24-hour urinary metabolite 3-methoxy-4-hydroxyphenylglycol (MHPG) levels and depression in bipolar patients. The MHPG level is measured in micrograms per 24 hours while the manic-depression (MD) scale used goes from 0 (manic delirium), through 5 (euthymic), up to 10 (depressive stupor). A partial computer printout of regression analysis with MHPG as the independent variable follows:
Average MHPG = 1243.1 with SD = 384.9
Average MD = 5.4 with SD = 2.875
95% confidence interval for slope b1 = (–0.0096, –0.0016)
(a) Calculate the slope of the regression line and interpret it in the context of this problem.
(b) Find the equation of the regression line.
(c) Calculate and interpret the value of r^2 in the context of this problem.
(d) What does the correlation say about causation in the context of this problem?
4. An experiment is run to test whether daily stimulation of specific reflexes in young infants will lead to earlier walking. Twenty infants were recruited through a pediatrician’s service, and were randomly split into two groups of ten. One group received the daily stimulation while the other was considered a control group. The ages (in months) at which the infants first walked alone were recorded.
Is there statistical evidence that infants walk earlier with daily stimulation of specific reflexes?
5. A city sponsors two charity runs during the year, one 5 miles and the other 10 kilometers. Both runs attract many thousands of participants. A statistician is interested in comparing the types of runners attracted to each race. She obtains a random sample of 100 runners from each race, calculates five-number summaries of the times, and displays the results in the parallel boxplots below.
(a) Compare the distributions of times above.
The statistician notes that 5 miles equals 8 kilometers and so decides to multiply the times from the 5-mile event by 10/8 and then compare boxplots. This display is below.
(b) How do the distributions now compare?
(c) Given that the 10-kilometer race was a longer distance than the 5-mile race, was the change from the first set of boxplots to the second set as expected? Explain.
(d) Based on the boxplots, would you expect the difference in mean times in the first set of parallel boxplots to be less than, greater than, or about the same as the difference in mean times in the second set of parallel boxplots? Explain.
SECTION II
Part B
QUESTION 6
Spend about 25 minutes on this part of the exam.
Percentage of Section II grade—25
6. A random sample of 250 SAT mathematics scores tabulates as follows:
(a) Find a 95% confidence interval for the proportion of scores over 500.
(b) Test the null hypothesis that the data follow a normal distribution with µ = 500 and σ = 100.
(c) Assume the above data comes from a normally distributed population with unknown µ and σ. In this case, the sampling distribution of (n – 1)s^2/σ^2 follows a chi-square distribution. If computer output of the above data yields n = 250, x̄ = 504.8, and s = 114.4, and if in this problem the χ^2-values for tails of 0.025 and 0.975 are 294.6 and 207.2, respectively, find a 95% confidence interval for the standard deviation σ.
If there is still time remaining, you may review your answers.