Let's say that Sue's course-work mark is 60. Now, suppose we make a scatter diagram for the course-work and exam marks of the other students. We can then see what kind of exam scores were made by other students who scored 60 in course-work. The range of their exam marks gives us an estimate for Sue's.
However, as I indicated above, the precision of this estimate depends on the degree of correlation. For instance, either of the two scatter diagrams below might describe the relationship. In which case (A or B) would it be possible to give the more precise estimate of Sue's exam score?
[Two scatter diagrams, (A) and (B): exam marks plotted against course-work marks]

* * * * * *
The more precise estimate can be made in relationship (B). Reading up the vertical line from a mark of 60 on the course-work scale, we see the students who scored that mark and then read across to see their exam marks. In (A), such students scored between 24 and 56. In (B) they scored between 28 and 44. So, if (A) represented the correlation in Sue's group, we'd have to estimate her exam score as 'between 24 and 56' (a range of 32); but if (B) represented her group, our estimate would be 'between 28 and 44' (thus narrowing the range to 16).
Clearly, the less the vertical spread of the scattered dots, the more precise our estimates; and, of course, the less the spread, the greater the correlation. Perfect (100%) estimates or predictions are possible only when all the dots lie on a straight line (as they did with our example of the radii and circumferences of circles). With perfect correlation, we can say exactly what value of one variable will go with any given value of the other.
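If you have access to a computer, the point can be seen in a small simulation. The Python sketch below uses artificial marks and correlation levels invented purely for illustration (they are not taken from any real class): it generates two sets of correlated course-work and exam marks and compares the spread of exam marks among students who, like Sue, scored close to 60 on course-work.

```python
import numpy as np

# A minimal simulation illustrating why stronger correlation gives more
# precise predictions. Two artificial "classes" are generated, one with a
# weak correlation (about 0.4) and one with a strong correlation (about 0.9),
# between course-work and exam marks.
rng = np.random.default_rng(0)

def simulate(r, n=500, mean=55.0, sd=15.0):
    """Draw n (course-work, exam) pairs of marks with correlation r."""
    cov = [[sd**2, r * sd**2], [r * sd**2, sd**2]]
    marks = rng.multivariate_normal([mean, mean], cov, size=n)
    return marks[:, 0], marks[:, 1]

for r in (0.4, 0.9):
    course, exam = simulate(r)
    near_60 = exam[(course >= 55) & (course <= 65)]   # students like Sue
    print(f"r = {r}: exam marks of students near 60 on course-work "
          f"spread over about {near_60.max() - near_60.min():.0f} marks")
```

The stronger the correlation, the narrower the band of exam marks among "students like Sue", and so the more precise the estimate we could offer her.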
Hence, one approach to prediction is to 'reduce' the data to a straight line. We ask: 'What is the "underlying" straight line from which all these points deviate?' We look for the so-called 'LINE OF BEST FIT'. This acts rather like a measure of central tendency, in that it attempts to average out the variations.
Here, for example, is where I'd put the 'line of best fit' on one of the two scatter diagrams we just looked at. I could then use this line to make a precise estimate of the exam mark corresponding to any given course-work mark.
[Scatter diagram: exam marks (0-100, vertical axis) against course-work marks (0-100, horizontal axis), with the line of best fit drawn in]
If we took this line to represent the relationship, what exam mark would correspond to Sue's course-work mark of 60?
* * * * * *
The corresponding exam score would be 36.
The problem is, of course: where does the line of best fit belong? We can sketch it in 'by eye', as I did above. In doing so - trying to find a sensible straight path through the dots - we must try to judge whether, on average, the dots would deviate by as much on one side of a possible line as they would on the other. No easy task! And you can be sure that we'd all draw rather different lines of best fit, with the result, of course, that we'd all make slightly different estimates of Sue's missing mark. And, again, the weaker the correlation, the bigger the scatter, and the more we'd differ in where we drew the line - see how difficult it would be to decide in the other scatter diagram on page 177 - and so the more we'd differ in our estimates.
Not surprisingly, then, techniques exist for calculating a position for lines of best fit. (They must, for example, pass through the mean score on both variables.) Such lines are called REGRESSION LINES. The term was introduced by the nineteenth-century British scientist, Francis Galton. In studying the relationship between the heights of sons and their fathers, he found that, while taller-than-average fathers tended to have taller-than-average sons (and smaller fathers, smaller sons), the sons tended to be nearer the average height of all men than were their fathers. This he called a 'regression to mediocrity' - a falling back towards the average. He and his friend Karl Pearson (who introduced the correlation coefficient) developed a variety of techniques for studying such relationships, and these became known as regression techniques.
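For the curious, here is a rough Python sketch of one common way such a position is calculated - the 'least-squares' slope and intercept, which this book does not work through in detail. The marks are invented for illustration. Notice that the fitted line does pass exactly through the mean of both variables, as mentioned above.

```python
# A sketch of calculating a regression (least-squares) line from invented
# course-work and exam marks.
course = [35, 42, 50, 55, 60, 64, 70, 78, 85, 90]   # x: course-work marks
exam   = [20, 28, 30, 35, 36, 42, 44, 50, 58, 62]   # y: exam marks

n = len(course)
mean_x = sum(course) / n
mean_y = sum(exam) / n

# slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
# intercept a = mean_y - b * mean_x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(course, exam)) / \
    sum((x - mean_x) ** 2 for x in course)
a = mean_y - b * mean_x

print(f"slope = {b:.2f}, intercept = {a:.2f}")
print("line passes through the means:", abs((b * mean_x + a) - mean_y) < 1e-9)
```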
Like any other straight line on a graph, a regression line can be described by an equation. This is called a 'regression equation'. For instance, letting x stand for the course-work marks and y for the exam marks, the regression equation for the line in the diagram above would be y = ⅞x - 17. Values of y (the exam marks) can then be estimated directly from the equation without looking at the scatter diagram any further.
Try using the equation yourself. Suppose a student made a course-work mark (x) of 80. What would you estimate as his exam mark (y)?
* * * * * *
Since exam marks (y) and course-work marks (x) are connected by the regression equation y = ⅞x - 17, a course-work mark (x) of 80 suggests an exam mark of:
y = ⅞(80) - 17 = 70 - 17 = 53
As you can check from the scatter diagram, this seems to agree with the value we'd read off, using the regression line directly.
[Scatter diagram: exam marks (0-100, vertical axis) against course-work marks (0-100, horizontal axis), with the regression line used to read off the prediction]
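Assuming the equation reconstructed above (y = ⅞x - 17), the prediction step can be written as a one-line Python function. This is only an illustration of applying the equation, not a method set out in the book.

```python
def predict_exam(course_work_mark):
    """Estimate an exam mark from the regression equation y = (7/8)x - 17."""
    return (7 / 8) * course_work_mark - 17

print(predict_exam(80))   # 53.0, as worked out in the text
print(predict_exam(60))   # 35.5, close to the 36 read off the line for Sue
```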
Of course, you may well wonder at the wisdom of making precise-looking estimates on the basis of a regression line, especially when the points are actually scattered quite far away from it. In fact, any such estimate could only be stated with honesty if accompanied by an indication of the possible error - do you remember the idea of a confidence interval?
An alternative approach to prediction is to display the scatter in a table. In our scatter diagrams so far, we have assumed that no two members of the sample had the same pair of values, e.g. that no two students both scored 50 on course-work and 40 in the exam. So each dot has represented a different member of the sample. But, in practice, especially with large samples, it's quite likely that, on several occasions, a particular pair of values would be shared by more than one member of the sample. To represent this diagrammatically, we'd need a three-dimensional chart - with columns instead of dots, varying in height according to the number of times each particular pair of values was observed. Another solution is to use a table like the one below:
[Table: Practical and theory marks of 118 students - theory marks cross-tabulated against practical marks]
This table shows that, for example, of the ten students who scored 7 in the practical test, three scored 8 on the theory test, four scored 7, two scored 6 and one scored 5. We might then predict, on the basis of this sample, that a student who scored 7 on the practical would have a 4/10 or 40% chance of scoring 7 in the theory test. We might also point out that, although the mean theory score of students scoring 7 on practical is 6.9, such a student has a 30% chance of scoring 8 on theory.
Again, we might notice that the mean theory score of students scoring 5 on practical is 5.5. However, suppose we think of a wider population of similar students who might have taken the practical test but not the theory. If we can generalize from the sample in this table, what is the probability that such a student who scored 5 on the practical will score (i) less than 5; (ii) more than 5 on the theory test?
* * * * * *
Of the twenty students who scored 5 on practical, five scored less than 5 on theory and eleven scored more. So the probability of a student with 5 on practical scoring (i) less than that on theory is 5/20 or 25%, while the probability of his scoring (ii) more is 11/20 or 55%.
Notice that the table makes clear the evidence on which our prediction is based. While a precise estimate can be offered (e.g. the mean score), the probability of greater or lesser scores is plainly visible.
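If you care to experiment, the table-based approach can be imitated with a few lines of Python. The (practical, theory) pairs below are invented - only the row for a practical mark of 7 follows the counts quoted above - but the calculation is exactly that of reading along a row of the table.

```python
from collections import Counter

# Invented (practical, theory) mark pairs; the practical-mark-7 row matches
# the counts described in the text (three 8s, four 7s, two 6s, one 5).
pairs = [(7, 8), (7, 8), (7, 8), (7, 7), (7, 7), (7, 7), (7, 7),
         (7, 6), (7, 6), (7, 5),
         (5, 4), (5, 6), (5, 5), (5, 7), (5, 6), (5, 3)]

counts = Counter(pairs)   # cell frequencies of the cross-tabulation

def p_theory_given_practical(theory, practical):
    """P(theory mark | practical mark), estimated from the sample."""
    row_total = sum(c for (p, t), c in counts.items() if p == practical)
    return counts[(practical, theory)] / row_total

print(p_theory_given_practical(7, 7))   # 0.4 -> a 40% chance, as in the text
print(p_theory_given_practical(8, 7))   # 0.3 -> a 30% chance
```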
If we were to draw such a table for a large sample, covering as wide a mark-range (0-100) as we envisaged for the pairs of course-work/exam marks, we'd probably group the data. Thus, as you'll see in the example below, one cell might show that 6 students scored between 60 and 69 on course-work and between 50 and 59 on the exam. Precise information is lost. We no longer show exactly what course-work mark and exam mark each of these students obtained. But the data are easier to handle, and the only precision lost in the estimates we'd make is a fairly spurious kind of precision anyway.
[Table: Course-work and final exam marks of 123 students - course-work marks (columns, grouped in tens) cross-tabulated against final exam marks (rows, grouped in tens)]
Sue, with her course-work score of 60, would obviously be compared with one of the twenty students in the '60-69' column.
It would seem that there is only a …/20 chance of her having made the same score or higher on the final exam. Indeed, there is a 4/20 or 20% chance that she may have scored as low as 30-39. There is even a 1/20 or 5% chance that her exam score may be between 20 and 29. What is her most likely exam score; and what is its probability?
* * * * * *
Sue's most likely exam score is 50-59 marks. Of the twenty students comparable with her on course-work marks, more (6) fell into that exam-mark range than into any other - the probability of such a score is 6/20 or 30%. (Sue also benefits from the way we happen to have grouped the marks. If she'd scored 59 rather than 60, she'd have fallen in the 50-59 course-work group, and her most likely exam score would be only 40-49.)
Don't forget, however, that we have a very limited sample: only twenty students in Sue's range of course-work marks. Yet we wish to generalize to such students in general, and to Sue in particular. Sue, for all we know, may have special qualities that would enable her to score 100 in the exam. It appears improbable, but it remains possible.
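As a further illustration only (again with invented marks, not the 123 students of the table), here is how the grouping into 10-mark bands and the resulting probabilities for someone in Sue's band might be computed.

```python
from collections import Counter

# Invented (course-work, exam) mark pairs, collapsed into 10-mark bands.
marks = [(62, 55), (65, 38), (68, 71), (60, 52), (63, 24), (61, 48),
         (83, 77), (45, 40), (52, 46), (66, 58), (69, 63), (64, 50)]

def band(mark):
    """Collapse a mark into its 10-mark class, e.g. 63 -> '60-69'."""
    low = (mark // 10) * 10
    return f"{low}-{low + 9}"

# conditional distribution of exam bands for students in Sue's course-work band
sues_band = band(60)
exam_bands = Counter(band(exam) for course, exam in marks
                     if band(course) == sues_band)
total = sum(exam_bands.values())
for exam_band, count in sorted(exam_bands.items()):
    print(f"P(exam in {exam_band} | course-work in {sues_band}) = {count}/{total}")
```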
This seems a suitable point at which to begin drawing this book to a conclusion. For we are back where we began, distinguishing between samples and populations, and asking how reasonable it is to generalize from one to the other. And, in any case, what are the risks in predicting for just one individual (Sue) from the population? Statistics, as I'm sure you'll have gathered, is far better equipped to make inferences about things 'in general' and 'in the long run' than about particular things on a particular occasion. This, though, is one of its most important lessons. Human beings are bound to generalize, expecting the characteristics, differences, trends and associations we have noted in one instance to be repeated in apparently similar instances. Without some such expectation of constancy, we could not survive. But all such expectations must be hedged about with probability. They must be tentative rather than absolute. We must expect them to be confounded much of the time. Unless we can, to some extent at least, expect things to be different from what we expect, then we cannot learn from experience. And that is also a path to stagnation and extinction.
Postscript
In the preceding pages I've been very much aware that we've been skimming the surface of most issues. This is because my aim was to give you a bird's-eye-view of the field of statistical concepts, rather than take you crawling, like a calculating snake, through the undergrowth of statistical computation. If you feel I've raised more questions in your mind than I've answered, I shan't be surprised or apologetic. The library shelves groan with the weight of the books in which you'll find answers to such questions (see the Bibliography, pages 191-5). Even though you'll be well aware that statistics is a subject with ramifications and inter-connections far beyond anything we've discussed here, you should be pretty confident that you've 'got the feel' of its main lines of concern.
If you have worked through the book to this point, you should now have a basic grasp of the concepts and terminology of statistics. It should be of great help to you in: (1) reading a research report (or a newspaper) with some expectation that you'll see the point of any statistics it may contain; (2) describing your research interests to a professional statistician in terms that would enable him to offer you technical advice; and (3) turning to books or courses on statistical calculation with a view to learning the techniques yourself.
Unless you have to pass an exam in statistics, you may well need little more than you have already learned. Although students in many subjects must take courses in statistics, much of what they are taught often seems of doubtful relevance to their other activities. Certainly research has shown (if it can be relied upon!) that social scientists tend to retain only a partial knowledge of the statistics they were taught. But perhaps they remember all that is useful to them. For when social scientists do make liberal use of statistics in their research reports, one often gets the impression they are being used to make the work look more 'scientific' (and therefore believable) rather than to clarify its meaning.
To conclude, I'll briefly review the 'bird's-eye-view' and add a final note of caution.
Review
Statistics is a means of coming to conclusions in the face of uncertainty. It enables us to recognize and evaluate the errors involved in quantifying our experience, especially when generalizing from what is known of some small group (a sample) to some wider group (the population).
Statistical analysis begins in a description of the sample. We may find diagrams a revealing way of describing a sample and comparing it with other distributions. But we are particularly interested in getting a measure of the sample's central tendency and, where appropriate, a measure of its dispersion. The two most important such measures (with quantity-variables) are the arithmetic mean and the standard deviation. These are of particular value in defining the normal distribution: a symmetrical, bell-shaped curve.
Once we have the mean and the standard deviation, we can compare values from two different distributions (in z-units) and we can estimate the percentage of observations in a distribution that would fall above and below various values of the variable. We can also infer parameter values (in the population) based upon the statistics (from the sample), using the concept of standard error to decide the confidence interval within which we believe the true population value (e.g. a mean or a proportion) to lie. We would commonly quote a range in which we are 95% certain the true value lies, or a wider one about which we can be 99% confident.
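As a reminder of how such an interval is arrived at, here is a minimal Python sketch using invented sample values and the normal approximation (mean ± 1.96 standard errors). With a sample as small as this, a t-value rather than 1.96 would strictly be more appropriate; the sketch shows only the logic.

```python
import math

# Invented sample of marks.
sample = [52, 61, 48, 55, 67, 59, 46, 63, 58, 51, 66, 49]

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample SD
se = sd / math.sqrt(n)                                          # standard error

low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.1f}, standard error = {se:.2f}")
print(f"95% confidence interval for the population mean: {low:.1f} to {high:.1f}")
```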
Following the same principles, we can compare two (or more) samples, and ask whether they are sufficiently similar to have come from the same population. Or is the difference between them big enough to signify a real difference in populations - one that would be repeated in subsequent pairs of samples chosen in the same way? We make a null hypothesis that the samples come from the same population and that the difference has arisen purely by chance. Tests enable us to determine the plausibility of this hypothesis. If the probability of getting two samples so different from one population is less than 5%, we may reject the hypothesis. If we want to be more careful still, we may prefer not to reject the null hypothesis (i.e. not recognize the difference as a real one) unless it is so big that its probability in two samples from the same population is less than 1%. Such differences are said to be significant (even though they may not be important in any practical sense).
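A bare-bones sketch of that logic, using a z-test on the difference between two invented samples of marks, might look like this. (With samples this small a t-test would normally be preferred; the figures and the test are illustrative only.)

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Two invented samples.
a = [54, 61, 58, 49, 65, 57, 62, 53, 59, 60, 56, 63]
b = [48, 52, 45, 57, 50, 43, 55, 47, 51, 49, 46, 44]

def mean_and_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mean_a, var_a = mean_and_var(a)
mean_b, var_b = mean_and_var(b)

# standard error of the difference between the two sample means
se_diff = math.sqrt(var_a / len(a) + var_b / len(b))
z = (mean_a - mean_b) / se_diff
p = 2 * (1 - normal_cdf(abs(z)))          # two-tailed probability

print(f"difference = {mean_a - mean_b:.1f}, z = {z:.2f}, p = {p:.4f}")
print("reject null hypothesis at 5% level" if p < 0.05 else "do not reject")
```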
There are also tests to establish whether there appears to be a significant difference anywhere among a group of more than two samples. This involves the analysis of variance: comparing the variance between groups with the variance within groups. When dealing with categories rather than with quantity-variables, and asking whether there is a significant difference between two samples in proportions rather than means, we use a non-parametric technique called the chi-square test. This compares the frequency with which we'd expect certain observations to occur, if chance only were operating, with the frequency that actually occurred. (Such non-parametric techniques are essential when dealing with category-variables and may in other cases be advisable when we can't be sure that the parent population is normally distributed.)
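The chi-square calculation itself is simple enough to show in a few lines. The 2 × 2 table of counts below is invented for illustration.

```python
# A sketch of the chi-square idea: compare observed frequencies with the
# frequencies we'd expect if only chance were operating.
observed = [[30, 20],    # e.g. sample A: 30 'yes', 20 'no'
            [18, 32]]    #      sample B: 18 'yes', 32 'no'

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# With one degree of freedom, a chi-square above about 3.84 corresponds to a
# probability below 5%, so a difference as big as this one would be called
# significant.
print(f"chi-square = {chi_square:.2f}")
```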
Finally, we are often interested in the relationship between related pairs of values from two different variables (for example, people's heights and weights). Correlation is a measure of such a relationship, and the correlation coefficient indicates its strength, on a scale from −1 and +1 (equally strong) down to zero. Scatter diagrams are a useful way of displaying correlation, but may need to be replaced by tables when the same pair of values is recorded for several members of the sample. With regression techniques, we can use the relationship observed in the sample to predict, for the population, values of one variable that would correspond with given values of the other variable. The likely accuracy of such predictions increases with the strength of the correlation. Even when the correlation is very strong and predictions are firm, we cannot use that fact to prove that one variable causes the other, even if we can explain a causal connection. Variable X may cause variable Y, or vice versa, or both may be determined by another variable, Z, or the mathematical relationship may be a coincidence. As usual in statistics, however, the data can lend support (or deny it) to a reasoned argument along one of these lines, but absolute proof is never forthcoming.
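For completeness, here is a short sketch of the (product-moment) correlation coefficient computed from invented height and weight pairs. The data and the wording are mine, not the book's; only the calculation itself is standard.

```python
import math

# Invented pairs of heights (cm) and weights (kg).
heights = [152, 158, 163, 170, 174, 180, 185, 168, 176, 160]
weights = [51, 56, 61, 66, 72, 77, 83, 64, 74, 58]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# r = sum of cross-products of deviations, divided by the product of the
# square roots of the two sums of squared deviations.
covariance = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
sd_h = math.sqrt(sum((h - mean_h) ** 2 for h in heights))
sd_w = math.sqrt(sum((w - mean_w) ** 2 for w in weights))

r = covariance / (sd_h * sd_w)
print(f"correlation coefficient r = {r:.2f}")   # close to +1: a strong relationship
```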
Caution
Finally, a few words of caution. It has been pointed out, over and over again, that 'There are lies, damned lies, and statistics!' or 'Figures don't lie but liars use figures!' or, less hysterically (but despite what I said at the end of the last paragraph), 'You can prove anything with statistics.' How to Lie with Statistics by Darrell Huff and Irving Geis (Norton, 1954) is the classic book on this topic; the warnings it conveys are both deadly serious and amusingly presented.
In general, it is well to remember that a person who uses statistics may be someone with an axe to grind. He may be propping up a weak case with rather dubious figures which he hopes will impress or intimidate any potential critics. For example, a British politician recently claimed that '50% more teachers considered that educational standards had fallen rather than risen over the previous five years.' This pronouncement is worrying but rather obscure: 50% more ... than what? In fact, it was based on a survey in which 36% of the teachers believed that standards of pupil achievement had fallen, 24% believed they had risen, 32% believed they had remained the same, and 8% didn't know. Clearly, the politician ignored the 'stayed the same' and the 'don't know' groups (a cool 40% of the sample) to arrive at his '50% more' figure. Had he wished to propagate a rosier view of British educational standards, he might, with equal validity, have pointed out that no less than 64% of teachers do not believe that standards have fallen. On the other hand, he might have painted an even blacker picture by saying that 76% of teachers do not believe standards have risen!
But please don't think that untrustworthy statistics are found only in the rantings of politicians and the cajolements of advertisers. Just this week, as I put the finishing touches to this chapter, the leading French opinion poll stands accused of suppressing the fact that 77% of the French people polled thought that immigrants should be sent home; and publishing instead the false figure of 57% - a figure that would be more acceptable to their clients (the French government).
Even scientific researchers are subject to human failings. They are most unlikely to attempt out-and-out deception - though the life-work of a famous, recently deceased British psychologist is currently alleged by some critics to be based on fraudulent experimental and statistical data. But even they may get carried away by the desire to 'prove' a pet theory and, in the process, overlook the inadequacy or bias in a particular sample, or choose inappropriate tests or insufficiently stringent levels of significance. Studies of journal articles (in the psychology field, for example) have indicated many such lapses. Indeed, a 'classic' 1971 study by a Finnish medical researcher (which suggested that one's chance of heart disease would be reduced by eating vegetable margarine rather than butter) is now being attacked for just such statistical weaknesses.
Of course, you cannot be expected to sniff out the transgressors. You simply do not have sufficient technical expertise. Nor do I want to encourage you to be unduly suspicious. Your problem will chiefly lie in understanding and interpreting people's statistics, not in trying to catch them. At the same time, your interpretation will be colored by what you have learned in this book: for instance, the possibility of bias in samples, the distinction between significance and importance, the fact that correlation does not imply causation, and so on. Needless to say, if you should find yourself producing statistics for other people, to back up an argument of your own, I assume you will strive for all the honesty you'd wish for from others.
In short: as a consumer of statistics, act with caution; as a producer, act with integrity.
Bibliography
Here is a small (and not at all random!) sample from the vast population of books in which you might follow up what you have learned so far. I begin with a small 'general' section, listing books that should be helpful to the reader who wants to go a little bit further. I follow this with sections listing books that apply statistical thinking to each of several particular subject-areas. This is not to imply, however, that such books could be of help only to students of the subject under which they are listed. You might well find the exposition of some authors to be so good that you'd be prepared to overlook the fact that their examples were not drawn from your subject. (In any case, my classification has to be somewhat arbitrary, since some of the books could appear in more than one category.) However, what I am really assuming is that, having endured my general introduction, you'll be most concerned to hear from authors who tackle problems in your own subject-area.
General Introductions
Freund, J. E. Statistics: A First Course, 3rd ed. Englewood Cliffs, N.J.: Prentice-Hall, 1981.
Haber, A., and Runyon, R. P. General Statistics, 3rd ed. Reading, Mass.: Addison-Wesley, 1977.
Huff, D., and Geis, I. How to Lie with Statistics. New York: Norton, 1954.
Sanders, D. H., and others. Statistics: A Fresh Approach. New York: McGraw-Hill, 1979.
Behavioral Sciences, Psychology and Education
Crocker, A. C. Statistics for Teachers. Atlantic Highlands, N.J.: Humanities Press, 1974.
Gehring, R. E. Basic Behavioral Statistics. Boston: Houghton Mifflin, 1978.
Gellman, E. S. Statistics for Teachers. New York: Harper & Row, 1973.
Guilford, J. P., and Fruchter, B. Fundamental Statistics in Psychology and Education, 6th ed. New York: McGraw-Hill, 1977.
Hardyck, C. D., and Petrinovich, L. F. Introduction to Statistics for the Behavioral Sciences, 2d ed. New York: Holt, Rinehart & Winston, 1976.
Lewis, D. G. Statistical Methods in Education. New York: International Publications Service, 1967.
Lynch, M. D., and Huntsberger, D. V. Elements of Statistical Inference for Psychology and Education. Boston: Houghton Mifflin, 1976.
McCall, R. B. Fundamental Statistics for Psychology, 2d ed. New York: Harcourt Brace Jovanovich, 1975.
Popham, W. J., and Sirotnik, K. A. Educational Statistics, 2d ed. New York: Harper & Row, 1973.
Slakter, M. J. Statistical Inference for Educational Researchers. Reading, Mass.: Addison-Wesley, 1972.
Siegel, S. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1972.
Business and Management
Braverman, J. D., and Stewart, W. C. Statistics for Business and Economics. New York: John Wiley, 1973.
Broster, E. J. Glossary of Applied Management and Financial Statistics. New York: Crane-Russak, 1974.
Levin, Richard I. Statistics for Management, 2d ed. Englewood Cliffs, N.J.: Prentice-Hall, 1981.
Shao, S. Statistics for Business and Economics, 3d ed. Columbus, Oh.: Merrill, 1976.
Thirkettle, G. L. (Wheldon's) Business Statistics. Philadelphia: International Ideas, 1972.
Economics
Beals, R. E. Statistics for Economists. Chicago: Rand McNally, 1972.
Davies, B., and Foad, J. N. Statistics for Economics. Exeter, N.H.: Heinemann, 1977.
Jolliffe, F. R. Commonsense Statistics for Economists and Others. Boston: Routledge & Kegan Paul, 1974.
Thomas, J. J. Introduction to Statistical Analysis for Economists. New York: John Wiley, 1973.
Geography
Ebdon, D. Statistics in Geography. Totowa, N.J.: Biblio Dist., 1977.
King, L. J. Statistical Analysis in Geography. Englewood Cliffs, N.J.: Prentice-Hall, 1969.
Norcliffe, G. B. Inferential Statistics for Geographers. New York: Halsted Press, 1977.
History
Dollar, C. M. Historian's Guide to Statistics. Huntington, N.Y.: Krieger, 1974.
Floud, R. Introduction to Quantitative Method for Historians. New York: Methuen, 1973.
Medical and Biological Sciences
Brown, B. W. Statistics: A Biomedical Introduction. New York: John Wiley, 1977.
Colquhoun, D. Lectures on Biostatistics. New York: Oxford University Press, 1971.
Hill, A. B. A Short Textbook of Medical Statistics, 10th ed. Philadelphia: Lippincott, 1977.
Hills, M. Statistics for Comparative Studies. New York: Methuen, 1974.
Mather, K. Statistical Analysis in Biology. New York: Methuen, 1972.
Mosiman, J. E. Elementary Probability for Biological Sciences. New York: Appleton Century Crofts, 1968.
Physical Sciences and Technology
Bury, K. V. Statistical Models in Applied Science. New York: John Wiley, 1975.
Chatfield, C. Statistics for Technology, 2d ed. New York: Methuen, 1979.
Eckschlager, K. Errors, Measurements and Results in Chemical Analysis. New York: Van Nostrand, 1969.
Hald, A. Statistical Theory with Engineering Applications. New York: John Wiley, 1952.
Johnson, N. L. Statistics and Experimental Design in Engineering, 2d ed. New York: John Wiley, 1977.
Koch, G. S., and Link, R. F. Statistical Analysis of Geological Data. New York: John Wiley, 1970.
Till, R. Statistical Methods for the Earth Scientist. New York: Halsted Press, 1978.
Young, H. D. Statistical Treatment of Experimental Data. New York: McGraw-Hill, 1962.
Sociology and Social Sciences
Anderson, T. R., and Zelditch, M. Basic Course in Statistics with Sociological Applications, 2d ed. New York: Appleton Century Crofts, 1968.
Blalock, H. M. Social Statistics, 2d ed. New York: McGraw-Hill, 1972.
Davis, J. A. Elementary Survey Analysis. Englewood Cliffs, N.J.: Prentice-Hall, 1971.
Hays, W. L. Statistics for the Social Sciences, 2d ed. New York: Holt, Rinehart & Winston, 1973.
Levin, J. Elementary Statistics in Social Research, 2d ed. New York: Harper & Row, 1977.
Ott, L., and others. Statistics: A Tool for the Social Sciences, 2d ed. N. Scituate, Mass.: Duxbury Press, 1978.
Moser, C. A., and Kalton, G. Survey Methods in Social Investigation, 2d ed. New York: Basic Books, 1972.
Mueller, J. H. Statistical Reasoning in Sociology, 3d ed. Boston: Houghton Mifflin, 1977.
Palumbo, D. J. Statistics in Political and Behavioral Science, 2d ed. New York: Columbia University Press, 1977.
Tufte, E. Data Analysis for Politics and Policy. Englewood Cliffs, N.J.: Prentice-Hall, 1974.