Abelson, Robert P. 1995. Statistics as Principled Argument. Hillsdale, NJ: Lawrence Erlbaum.
Abelson, who taught at Yale University for 42 years, provides an excellent discussion of how to think through, and with, statistics.
Frey, Bruce. 2006. Statistics Hacks: Tips and Tools for Measuring the World and Beating the Odds. Sebastopol, CA: O’Reilly.
Statistics Hacks is a collection of entertaining short essays that use everyday examples to introduce statistical concepts, from testing the randomness or lack thereof in your iPod’s “random” shuffle feature to using Benford’s law to detect fabricated data.
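The Benford's-law check mentioned in that annotation takes only a line or two to state: the expected share of leading digit d is log10(1 + 1/d). A minimal sketch, with helper names and sample values of my own invention rather than Frey's:

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)

def leading_digit_freqs(values):
    """Observed share of each leading digit 1-9 among nonzero decimal values."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Genuine financial or population data tend to start with 1 about 30% of the
# time; fabricated figures with roughly uniform leading digits stand out.
```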
Huff, Darrell. 1954. How to Lie with Statistics. Repr., New York: W.W. Norton, 1993.
Originally published in 1954, Huff’s work remains a classic introduction to how even the simplest statistical techniques can be used to mislead, confuse, or even outright lie. Readers who can look past the dated examples and (in particular) stereotypical illustrations will find this slim volume an excellent resource and a lot of fun as well.
Levitt, Steven D., and Stephen J. Dubner. 2005. Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. New York: HarperCollins.
In this New York Times bestseller, a University of Chicago economist uses economic theory and statistical analysis to examine questions from the existence of cheating in sumo wrestling to whether legalizing abortion lowered the crime rate. Although written for the public, Freakonomics has been adopted as required reading at some universities.
Salsburg, David. 2001. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W.H. Freeman.
This popular history examines the application of statistics and probability to scientific problems in the twentieth century, shaping the story around the lives and accomplishments of pioneers such as Ronald Fisher, Karl Pearson, and Jerzy Neyman.
Tucker, Martha A., and Nancy D. Anderson. 2004. Guide to Information Sources in Mathematics and Statistics. Westport, CT: Libraries Unlimited.
This is a guide to sources of information about mathematics and statistics; the target market is librarians, but researchers will also find it useful. Categories include finding tools, journals, reference books, biographical and historical materials, and math books for science collections (for example, applications of math to other disciplines).
Carmines, Edward G., and Richard A. Zeller. 1979. Reliability and Validity Assessment. Thousand Oaks, CA: Sage.
One of the earliest entries in the Sage “little green books” series, this volume introduces classical methods for assessing reliability and validity and gives a brief discussion of factor analytic methods.
Fleming, Thomas R. 2005. “Surrogate endpoints and FDA’s accelerated approval process.” Health Affairs 24 (January/February): 67–78.
Fleming examines the use of surrogate endpoints in clinical trials intended to provide definitive evidence about the benefits of drugs and other treatments, and describes several situations in which a treatment apparently effective on surrogate endpoints might not be effective with regard to a true clinical endpoint.
Hand, D.J. 2004. Measurement Theory and Practice: The World Through Quantification. London: Arnold.
Hand provides an excellent discussion of the theory and practice of measurement, including chapters devoted to special problems in the fields of psychology, medicine, the physical sciences, and economics and the social sciences.
Michiels, Stefan, Aurelie Le Maitre, Marc Buyse, Tomasz Burzykowski, Emilie Maillard, Jan Bogaerts, et al. 2009. “Surrogate endpoints for overall survival in locally advanced head and neck cancer: Meta-analyses of individual patient data.” The Lancet Oncology 10 (April): 341–350.
In an article based on 104 clinical trials, Michiels and colleagues examine the usefulness of two surrogate endpoints in evaluating the success of treating locally advanced head and neck squamous-cell cancer. They conclude that event-free survival correlates more closely than does locoregional control with overall survival (the true clinical endpoint).
Uebersax, John. “Kappa coefficients.” http://www.john-uebersax.com/stat/kappa.htm.
Uebersax provides a thorough discussion of the strengths and weaknesses of kappa as part of his discussion of agreement statistics in general.
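As a minimal illustration of the statistic Uebersax discusses, Cohen's kappa for two raters can be computed from a square agreement table. This sketch and its example counts are hypothetical, not drawn from his site:

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] = number of items rater A placed in category i
    and rater B placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: the product of each category's marginal proportions.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# cohens_kappa([[20, 5], [10, 15]]) returns 0.4:
# 70% raw agreement against 50% expected by chance.
```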
Hacking, Ian. 2001. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press.
This volume was written as an introductory text for philosophy students but will be appreciated by anyone who would like a verbal, rather than mathematical, introduction to the basic ideas of statistics.
Mendenhall, William, et al. 2008. Introduction to Probability and Statistics. 13th ed. Pacific Grove, CA: Duxbury Press.
This is a popular probability and statistics textbook for students who have not taken calculus.
Packel, Edward W. 2006. The Mathematics of Games and Gambling. Washington, D.C.: Mathematical Association of America.
Packel traces the connections between games and gambling (including backgammon, roulette, and poker) and mathematics and statistics in a manner that assumes only standard high school preparation in mathematics. Many illustrations and exercises are included.
Ross, Sheldon. 2005. A First Course in Probability. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Ross provides a basic introduction to probability theory, illustrated with many examples, for students who have taken elementary calculus.
Cohen, J. 1994. “The earth is round (p < .05).” American Psychologist 49: 997–1003.
This is a classic article by one of the most vocal critics of the enshrinement of alpha = 0.05 as an absolute indicator of statistical significance or the lack thereof.
Dorofeev, Sergey, and Peter Grant. 2006. Statistics for Real-Life Sample Surveys: Non-Simple-Random Samples and Weighted Data. Cambridge: Cambridge University Press.
This is a well-written guide to sampling and the analysis of survey data when simple random sampling is not possible (which is most of the time).
Mosteller, Frederick, and John W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA: Addison Wesley.
This classic textbook in inferential statistics includes a chapter on data transformation.
National Institute of Standards and Technology. Engineering Statistics Handbook: Gallery of Distributions. http://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm.
This is a nice presentation of 19 common statistical distributions, including ample illustrations, formulas, and common uses for each.
Peterson, Ivars. 1997. “Sampling and the census: Improving the decennial count.” Science News (October 11).
This clearly written article discusses problems with the data collection efforts of the U.S. census and the controversy over using sampling as part of the process.
Rice Virtual Lab in Statistics. “Simulations/Demonstrations.” http://onlinestatbook.com/stat_sim/index.html.
This Internet site has links to many Java simulations demonstrating statistical concepts, including the central limit theorem, confidence intervals, and data transformations.
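Demonstrations like the site's central limit theorem simulation are easy to reproduce offline. This sketch is mine, not the site's Java code; it draws repeated sample means from a uniform distribution and shows that they cluster tightly around the population mean of 0.5:

```python
import random
import statistics

def sample_means(n_samples, sample_size, rng):
    """Means of repeated samples drawn from Uniform(0, 1)."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]

# Per the central limit theorem, the means are approximately normal with
# standard deviation shrinking as 1/sqrt(sample_size).
```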
Cleveland, William S. 1993. Visualizing Data. Summit, NJ: Hobart Press.
This book discusses effective graphical presentation of data with many examples; it also includes a discussion of the visual and psychological principles that lie behind effective graphical presentation of information.
Erceg-Hurn, David M., and Vikki M. Mirosevich. 2008. “Modern statistical methods: An easy way to maximize the accuracy and power of your research.” American Psychologist 63: 591–601.
This discusses robust statistical methods, including trimmed means, and argues for their wider use.
Robbins, Naomi. 2004. Creating More Effective Graphs. Hoboken, NJ: Wiley.
An easy-to-use guide that shows good and bad examples of graphs presenting the same information, this book always has an eye to using graphical techniques to communicate statistical information more effectively.
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.
This book is a landmark that forever changed the way researchers use graphics to display information. Admirers of Tufte’s sometimes contentious approach will want to check out his other works as well, including Beautiful Evidence (2006).
Wand, M.P. 1997. “Data-based choice of histogram bin width.” The American Statistician 51(1): 59–64.
Not for the faint of heart or the mathematically underprepared, this article is a thorough technical investigation of various rules for determining the appropriate number of bins for a histogram.
Wilkins, Jesse L.M. 2000. “Why divide by N-1?” Illinois Mathematics Teacher (Fall): 13–18. https://scholar.vt.edu/access/content/user/wilkins/Public/IMT.pdf.
This is a clear and detailed explanation of a question that invariably arises in statistics classes and proves surprisingly difficult to answer: why, when calculating the sample variance, do we divide by (n − 1) rather than n?
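Wilkins's question can also be answered empirically. A simulation along the following lines (an illustration of my own, not taken from the article) shows that dividing by n − 1 makes the average sample variance match the population variance, while dividing by n systematically understates it:

```python
import random

def mean_sample_variance(pop_sampler, n, trials, divisor_offset, rng):
    """Average sample variance over many trials.

    divisor_offset = 0 divides by n; divisor_offset = 1 divides by n - 1.
    """
    total = 0.0
    for _ in range(trials):
        sample = [pop_sampler(rng) for _ in range(n)]
        m = sum(sample) / n
        total += sum((x - m) ** 2 for x in sample) / (n - divisor_offset)
    return total / trials

# For Uniform(0, 1) the population variance is 1/12; dividing by n
# averages only about (n - 1)/n of that value.
```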
Agresti, Alan. 2002. Categorical Data Analysis. 2nd ed. Hoboken, NJ: Wiley.
This is the standard textbook for advanced classes on categorical data analysis. It can be heavy going for the beginner but is clearly written and covers everything from 2×2 tables to linear models.
Davenport, Ernest C., and Nader A. El-Sanhurry. 1991. “Phi/phimax: Review and synthesis.” Educational and Psychological Measurement 51(4): 821–828.
This article discusses the range of phi in relation to different data distributions and investigates a potential solution.
Fisher, R.A. 1925. “Applications of ‘student’s’ distribution.” Metron 5: 90–104.
This discusses testing for differences between means using the characteristics of the t distribution.
Gosset, William Sealy. 1908. “The probable error of a mean.” Biometrika 6(1): 1–25.
This is the original paper describing characteristics of the t distribution.
Senn, S., and W. Richardson. 1994. “The first t-test.” Statistics in Medicine 13(8): 785–803.
This article is about the first application of the t-test in a medical clinical trial.
Case, Anne, and Christina Paxson. 2008. “Stature and status: Height, ability, and labor market outcomes.” Journal of Political Economy 116(3): 499–532.
This article discusses the positive relationship between height and income, arguing that this observed relationship is due to the positive relationship between height and cognitive ability.
Holland, Paul W. 1986. “Statistics and causal inference.” Journal of the American Statistical Association 81(396): 945–960.
This describes the problematic relationship between the need to determine causal inference and the statistical tools available to analyze certain types of data.
Spearman, C. 1904. “The proof and measurement of association between two things.” American Journal of Psychology 15: 72–101.
This is perhaps the most influential paper on measures of association in the history of psychology.
Stanton, Jeffrey M. 2001. “Galton, Pearson, and the peas: A brief history of linear regression for statistics instructors.” Journal of Statistics Education 9(3).
This is a very readable introduction to the development of ideas underlying correlation and regression.
Cohen, J., P. Cohen, S.G. West, and L.S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
This is an excellent textbook introduction to simple and multiple regression.
Dunteman, George H., and Moon-Ho R. Ho. 2006. An Introduction to Generalized Linear Models. Thousand Oaks, CA: SAGE Publications.
One of the Sage “little green books,” this slim (72 pages) volume provides an excellent overview of the general linear model for those who are comfortable with reading mathematical equations.
Galton, Francis. 1886. “Regression towards mediocrity in hereditary stature.” Journal of the Anthropological Institute 15: 246–263. http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf.
This is the original paper on regression to the mean.
Glass, G.V., P.D. Peckham, and J.R. Sanders. 1972. “Consequences of failure to meet assumptions underlying the analysis of variance and covariance.” Review of Educational Research 42: 237–288.
This is a technical paper on the assumptions underlying ANOVA and ANCOVA and the consequences for the analysis when they are not met.
Fisher, R.A. 1921. “Studies in crop variation. I. An examination of the yield of dressed grain from Broadbalk.” Journal of Agricultural Science 11: 107–135.
This covers the original experiments and formulation underlying ANOVA.
Miller, G.A., and J.P. Chapman. 2001. “Misunderstanding analysis of covariance.” Journal of Abnormal Psychology 110(1): 40–48.
This is a clear discussion of the appropriate use of ANCOVA and what this technique can and can’t do for a research project.
Achen, Christopher H. 1982. Interpreting and Using Regression. Thousand Oaks, CA: Sage Publications.
A Sage “little green book,” this offers an excellent introduction to the correct (and cautious) interpretation of multiple linear regression models.
Jaccard, James, Robert Turrisi, and C.K. Wan. 1990. Interaction Effects in Multiple Regression. Thousand Oaks, CA: Sage Publications.
Another Sage “little green book,” this one offers a straightforward synthesis of theory and practice regarding interaction effects in regression models.
O’Brien, R.M. 2007. “A caution regarding rules of thumb for variance inflation factors.” Quality & Quantity 41: 673–690.
O’Brien argues that applying conventional rules of thumb exaggerates the problems caused by multicollinearity and that typical solutions to perceived multicollinearity can cause more problems than they solve.
Bates, Douglas M., and Donald G. Watts. 1988. Nonlinear Regression Analysis and Its Applications. New York: Wiley.
This is a very practical textbook introduction to curve fitting and nonlinear modeling.
Efron, Bradley. 1982. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
This is a classic textbook on resampling methods.
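The core bootstrap idea Efron describes fits in a few lines: resample the data with replacement many times, recompute the statistic each time, and read a confidence interval off the percentiles of the replicates. A sketch with hypothetical data; the percentile interval shown is only one of several variants Efron covers:

```python
import random

def bootstrap_ci(data, stat, n_boot, alpha, rng):
    """Percentile bootstrap confidence interval for stat(data)."""
    reps = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```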
Hosmer, David W., and Stanley Lemeshow. 2000. Applied Logistic Regression, 2nd ed. New York: Wiley.
This is a practical presentation of logistic regression and its applications for advanced students and specialists.
Gould, Stephen Jay. 1996. The Mismeasure of Man. New York: W.W. Norton & Company.
This excellent book sets out the historical context of intelligence testing and the (mis)use of various multivariate techniques in the understanding of individual differences.
Hartigan, J.A. 1975. Clustering Algorithms. New York: Wiley.
This book is a modern classic with complete coverage of foundation concepts in clustering, including distance measures, with sufficient detail to implement all the algorithms.
Conover, W.J. 1999. Practical Nonparametric Statistics. Hoboken, NJ: Wiley.
This is one book that lives up to its title; it’s a great reference for people who need to learn how to do the appropriate nonparametric test for a particular situation and don’t want a lengthy theoretical discussion of each statistic. Conover’s book includes a handy chart for finding nonparametric equivalents for a parametric test.
HealthKnowledge. “Parametric and non-parametric tests for comparing two or more groups.” http://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests.
This is a series of handy charts to help you locate the appropriate nonparametric statistics for different analytic situations, produced as part of an online public health course by the Department of Health of the United Kingdom.
Mann, H.B., and D.R. Whitney. 1947. “On a test of whether one of two random variables is stochastically larger than the other.” Annals of Mathematical Statistics 18: 50–60.
This paper extends Wilcoxon’s rank-sum test to unequal sample sizes; the result is now known as the Wilcoxon–Mann–Whitney U test.
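The U statistic at the heart of these papers can be computed by brute force over all pairs. A sketch of my own (the original paper works with ranks, which is equivalent but faster):

```python
def mann_whitney_u(xs, ys):
    """U statistic for sample xs versus sample ys.

    Counts pairs in which the x value exceeds the y value;
    ties count one half.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# The two one-sided statistics always satisfy
# mann_whitney_u(a, b) + mann_whitney_u(b, a) == len(a) * len(b).
```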
Wilcoxon, F. 1945. “Individual comparisons by ranking methods.” Biometrics Bulletin 1: 80–83.
This is the original paper describing the rank-sum test for equal sample sizes, the basis of the Wilcoxon–Mann–Whitney U test.
Wilcoxon, F. 1957. Some Rapid Approximate Statistical Procedures. Stamford, CT: American Cyanamid. Revised with R.A. Wilcox, 1964.
These are the original and revised editions of the monograph describing Wilcoxon’s signed rank test, including a table of critical values.
Clemen, Roger T. 2001. Making Hard Decisions: An Introduction to Decision Analysis. Pacific Grove, CA: Duxbury Press.
This textbook emphasizes the logical and philosophical problems behind decision making while discussing different approaches to decision analysis.
The Economist Newspaper. 1997. Numbers Guide: The Essentials of Business Numeracy. Hoboken, NJ: Wiley.
This handy pocket guide describes numerical operations useful in business, including index numbers, interest and mortgage problems, forecasting, hypothesis testing, decision theory, and linear programming.
Gordon, Robert J. 1999. “The Boskin Commission Report and its aftermath.” Paper presented at the Conference on the Measurement of Inflation, Cardiff, Wales. http://faculty-web.at.northwestern.edu/economics/gordon/346.pdf.
This summarizes criticisms regarding the U.S. Consumer Price Index, including those identified by the 1995 Boskin Commission report, which suggested that the CPI overstated inflation.
Shumway, Robert, and David S. Stoffer. 2006. Time Series Analysis and Its Applications: With R Examples. New York: Springer.
This popular time series textbook includes code in R (a free computer language) to execute time series analyses.
Tague, Nancy. 2005. The Quality Toolbox. 2nd ed. Milwaukee, WI: American Society for Quality.
This reference book provides an overview and brief history of Quality Improvement (QI), followed by an alphabetical guide to QI tools, including standard statistical and graphical procedures such as the box plot and hypothesis testing, and more specialized tools such as control charts and fishbone diagrams.
Cohen, Jacob. 1992. “A power primer.” Psychological Bulletin 112(1): 155–159.
This very readable introduction to power concepts is prefaced by a review of research, by Cohen and others, documenting the neglect of power considerations in published studies.
Ahrens, Wolfgang, and Iris Pigeot, Eds. 2004. Handbook of Epidemiology. New York: Springer.
This guide to epidemiology consists of chapters on specialized topics written by experts in each field. The chapter on sample size calculations and power analysis includes formulas and examples for the most common study designs used in medicine and epidemiology.
Hennekens, Charles H., and Julie E. Buring. 1987. Epidemiology in Medicine. Boston: Little, Brown.
This is an easy-to-read introduction to epidemiology, from basic concepts through study design and types of analysis.
Pagano, Marcello, and Kimberlee Gauvreau. 2000. Principles of Biostatistics. 2nd ed. Pacific Grove, CA: Duxbury Press.
This introduction to biostatistics is suitable for an undergraduate course; it’s less detailed and easier to use than Rosner’s text.
Rosner, Bernard. 2005. Fundamentals of Biostatistics. 6th ed. Pacific Grove, CA: Duxbury Press.
This is an excellent introduction to biostatistics for graduate students or those who are willing to grapple with more theoretical details than are provided in Pagano and Gauvreau’s text.
Rothman, Kenneth J., et al. 2008. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins.
This is a very thorough discussion of epidemiology, including several chapters written by guest authors, for students willing and able to grapple with the subject.
Crocker, Linda, and James Algina. 2006. Introduction to Classical and Modern Test Theory. Independence, KY: Wadsworth.
This is an updated version of a standard textbook that is strongest in its descriptions of models based on classical test theory.
Ebel, R.L. 1965. Measuring Educational Achievement. Englewood Cliffs, NJ: Prentice Hall.
This text is the source of the rules to interpret item discrimination that are cited in Chapter 16.
Embretson, Susan, and Steven Reise. 2000. Item Response Theory for Psychologists. Mahwah, NJ: Erlbaum.
This is an introductory textbook that takes an intuitive approach to IRT, with many graphical displays and analogies with classic measurement approaches.
Hambleton, Ronald K., et al. 1991. Fundamentals of Item Response Theory. Thousand Oaks, CA: Sage Publications.
This provides a very clear introduction to item response theory that explains how it overcomes some of the limitations of classic test theory.
Tanner, David E. 2001. Assessing Academic Achievement. Boston: Allyn and Bacon.
This straightforward text, written for teachers and administrators, covers the major issues in academic testing and evaluation; it discusses contemporary issues (authentic assessment, high-stakes testing, computer-adaptive testing) as well as traditional topics such as classic test theory and norm-referenced versus criterion-referenced assessment.
Boslaugh, Sarah. 2004. An Intermediate Guide to SPSS Programming: Using Syntax for Data Management. Thousand Oaks, CA: Sage.
Boslaugh covers the basic aspects of data management for people who will be managing and analyzing data by using SPSS and includes the code to perform many tasks.
Cody, Ron. 1999. Cody’s Data Cleaning Techniques Using SAS Software. Cary, NC: SAS Institute.
Cody presents techniques for checking and cleaning data by using SAS, including many examples of standard procedures and the SAS code to carry them out.
Hernandez, M.J. 2003. Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design. 2nd ed. Upper Saddle River, NJ: Addison Wesley.
This is a good guide to the theory and practice of setting up databases, discussed in terms of principles applicable to any database rather than instructions in using any particular software product.
Levesque, Raynald. Raynald’s SPSS Pages. http://www.spsstools.net/.
A website run by the experienced SPSS programmer Raynald Levesque, loaded with tips, tricks, and sample code.
Little, Roderick J.A., and Donald B. Rubin. 2002. Statistical Analysis with Missing Data. 2nd ed. Hoboken, NJ: Wiley.
Little and Rubin wrote the book on missing data, and this is the standard reference on the subject. However, it’s not for the faint of heart and assumes considerable mathematical sophistication on the part of the reader.
Christensen, Larry B. 2006. Experimental Methodology, 10th ed. Boston: Allyn & Bacon.
This is a very readable and comprehensive introduction to research and experimental design with a focus on educational and psychological topics.
Fisher, R.A. 1990. Statistical Methods, Experimental Design, and Scientific Inference: A Re-issue of Statistical Methods for Research Workers, the Design of Experiments, and Statistical Methods and Scientific Inference. Oxford: Oxford University Press.
If you want to read the original rationale for many of the designs and issues described in this chapter, there is no better place than the original source.
The Framingham Heart Study. http://www.framinghamheartstudy.org/.
This is the official website of one of the largest, longest, and most famous prospective cohort studies in the history of medicine.
Martin, F., and D. Siddle. 2003. “The interactive effects of alcohol and temazepam on P300 and reaction time.” Brain and Cognition 53(1): 58–65.
This is the article used as an example of research design in Chapter 18.
Robinson, W.S. 1950. “Ecological correlations and the behavior of individuals.” American Sociological Review 15(3): 351–357. Reprinted in the International Journal of Epidemiology (2009). http://ije.oxfordjournals.org/content/early/2009/01/28/ije.dyn357.full.pdf+html.
This is a classic paper on the ecological fallacy, demonstrated with data correlating literacy with race and national origin.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The central role of the propensity score in observational studies for causal effects.” Biometrika 70: 41–55.
This is the article in which Rosenbaum and Rubin introduced the concept of the propensity score, which is now commonly used in case control studies in medical research.
Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Florence, KY: Wadsworth Publishing.
This is an updated version of the classic text on research design; not the best book for beginners but essential for anyone who really wants to understand the issues.
Wolff, Alexander, Albert Chen, and Tim Smith. 2002. “That Old Black Magic.” Sports Illustrated 96 (January): 50–62.
This article examines the validity of claims of the “Sports Illustrated jinx,” a phenomenon often cited as a classic case of regression to the mean.
Alley, Michael. 2003. The Craft of Scientific Presentations. New York: Springer.
A book-length consideration of different styles of scientific presentation (for example, informative versus persuasive), this book has many examples as well as general principles for what makes a presentation successful or unsuccessful.
Lamontagne, Mario. “Planning a scientific presentation.” http://www.biomech.uottawa.ca/english/teaching/apa6905/lectures/presentation-style.pdf.
This is a slide presentation on how to create good slide presentations, with humorous illustrations of some ways to go wrong as well.
“Slides from NISS/ASA Technical Writing Workshop for Young Researchers . . . and Some Other Stuff.” (August 2007). http://www.public.iastate.edu/~vardeman/RTGWritingStuff.html.
This collection of slides and other resources about writing scientific articles for professional journals comes from a workshop sponsored by the National Institute of Statistical Sciences (NISS) and the American Statistical Association. The primary audience is students and young researchers writing their first article, but there’s plenty of advice that will be useful to more experienced writers as well.
Ternes, Reuben. 2011. “Writing with statistics.” Purdue Online Writing Lab. http://owl.english.purdue.edu/owl/resource/672/1/.
This is a basic guide to communicating with statistics, intended for undergraduate students. The OWL (Online Writing Lab) has other useful information for scientific writers, including a guide to writing abstracts, guides to the major citation systems, and guides to writing in medicine, nursing, and engineering.
The Open Notebook. http://www.theopennotebook.com.
A website devoted to science journalism, The Open Notebook focuses on writing for general audiences and presents a combination of technical advice and behind-the-scenes looks at the process behind the writing of well-known articles and books (such as Rebecca Skloot’s The Immortal Life of Henrietta Lacks).
United Nations Economic Commission for Europe. 2009. Making Data Meaningful.
This three-part series is written for managers, public relations officers, statisticians, and others who communicate statistical information to the public and other nontechnical audiences. Part 1 explains how to turn statistical information into a story that will capture the public’s imagination and communicate important information, Part 2 discusses how to present statistics (both verbally and graphically), and Part 3 discusses media relations.
Good, Phillip I., and James W. Hardin. 2006. Common Errors in Statistics (and How to Avoid Them). Hoboken, NJ: Wiley.
This is a guide to avoiding common mistakes in statistical methodology and reasoning.
Alderson, Phil, and Sally Green, eds. 2009. The Cochrane Collaboration Learning Material for Reviewers. http://www.cochrane-net.org/openlearning/.
This includes a clear discussion of publication bias, written to support the efforts of The Cochrane Collaboration, an international organization whose purpose is to support informed decision making in health care.
Darrell Huff’s How to Lie with Statistics, cited as a general reference at the start of this appendix, is also highly relevant to this chapter.