The Traditional Introduction to Statistics
A traditional introduction to statistical thinking and methods is the two-semester probability and statistics course offered in mathematics and statistics departments. This course provides an introduction to calculus-based probability and statistical inference. The first half of the course is an introduction to probability, including discrete, continuous, and multivariate distributions. The chapters on functions of random variables and sampling distributions lead naturally into statistical inference, including point estimation and hypothesis testing, regression models, design of experiments, and ANOVA models.
Although this traditional course remains popular, it devotes little attention to how the inferential material is applied in modern statistical practice. There are benefits to discussing methods of estimation such as maximum likelihood, and optimal inference such as a best hypothesis test, but students learn little about statistical computation and simulation-based inferential methods. As Cobb (2015) states, there appears to be a disconnect between the statistical content we teach and statistical practice.
Developing a New Course
The development of any new statistics course should be consistent with current thinking of faculty dedicated to teaching statistics at the undergraduate level. Cobb (2015) argues that we need to deeply rethink our undergraduate statistics curriculum from the ground up. Towards this general goal, Cobb (2015) proposes “five imperatives” that can help the process of creating this new curriculum. These imperatives are to: (1) flatten prerequisites, (2) seek depth in understanding fundamental concepts, (3) embrace computation in statistics, (4) exploit the use of context to motivate statistical concepts, and (5) implement research-based learning.
Why Bayes?
There are good reasons for introducing the Bayesian perspective at the calculus-based undergraduate level. First, many believe that the Bayesian approach provides a more intuitive and straightforward introduction to statistical inference than the frequentist approach. Given that the students are already learning probability, Bayes’ rule provides a useful way of using probability to update beliefs from data. Second, given the large growth in applied Bayesian work in recent years, it is desirable to introduce undergraduate students to some modern Bayesian applications of statistical methodology. The timing of a Bayesian course is right, given the ready availability of Bayesian instructional material and the increasing number of Bayesian computational resources.
We propose that Cobb’s five imperatives can be implemented through a Bayesian statistics course. Simulation provides an attractive “flattened prerequisites” strategy for performing inference: in a Bayesian inferential calculation, one avoids difficult integration by simulating a large number of values from the posterior distribution and summarizing this simulated sample. Moreover, by teaching the fundamentals of Bayesian inference for conjugate models together with simulation-based inference, students gain a deeper understanding of Bayesian thinking. Familiarity with simulation methods in the conjugate case prepares students for the use of simulation algorithms later for more advanced Bayesian models.
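As a small preview of this strategy, here is a minimal R sketch, assuming a hypothetical Beta(3, 3) prior for a proportion and 12 successes observed in 20 trials. The conjugate posterior is Beta(15, 11), and a large simulated sample summarizes it without any integration:

set.seed(123)
# Simulate 10,000 draws from the Beta(15, 11) posterior
p_sim <- rbeta(10000, shape1 = 15, shape2 = 11)
# Summarize the simulated sample: posterior mean and a 90% interval
mean(p_sim)
quantile(p_sim, c(0.05, 0.95))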
One advantage of a Bayesian perspective is the opportunity to incorporate expert opinion through the prior distribution, which allows students to “exploit context” beyond a traditional statistical analysis. This text introduces strategies for constructing priors both when one has substantial prior information and when one has little prior knowledge.
To further “exploit context”, we introduce one particular Bayesian success story: the use of hierarchical modeling to simultaneously estimate parameters from several groups. In many applied statistical analyses, a common problem is to combine estimates from several groups, often with certain groups having limited amounts of available data. Through interesting applications, we introduce hierarchical modeling as an effective way to achieve partial pooling of the separate estimates.
Thanks to a number of general-purpose software programs available for Bayesian MCMC computation (e.g., OpenBUGS, JAGS, NIMBLE, and Stan), students are able to learn and apply more advanced Bayesian models for complex problems. We believe it is important to introduce students to at least one of these programs, which “flattens the prerequisite” of computational experience and “embraces computation”. The main task in using these programs is writing a script defining the Bayesian model; the fitting is then implemented by a single function that inputs the model description, the data and prior parameters, and any tuning parameters of the algorithm. By writing the script defining the full Bayesian model, students gain a deeper understanding of the sampling and prior components of the model. Moreover, the use of this software for sophisticated models such as hierarchical models lowers the bar for students implementing these methods. The focus of the students’ work is not the computation but rather the summarization and interpretation of the MCMC output. Students interested in the nuts and bolts of the MCMC algorithms can further their learning through directed research or independent study.
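As an illustration of this workflow, the following sketch fits a normal mean model using the rjags interface to JAGS; the data values and prior parameters are hypothetical, chosen only for illustration, and other interfaces follow the same pattern. The script specifies the sampling and prior components, and a single function call does the fitting:

library(rjags)
# Model script: sampling component and priors
modelString <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, phi)   # sampling model; JAGS uses a precision phi
  }
  mu ~ dnorm(0, 0.001)      # vague prior on the mean
  phi ~ dgamma(0.1, 0.1)    # vague prior on the precision
}
"
y <- c(5.1, 4.8, 6.2, 5.5, 4.9)    # hypothetical data
model <- jags.model(textConnection(modelString),
                    data = list(y = y, N = length(y)))
update(model, 1000)                # burn-in
post <- coda.samples(model, variable.names = "mu", n.iter = 5000)
summary(post)                      # summarize the MCMC output

The students’ attention then stays on the simulated output in post, which they summarize and interpret to communicate their findings.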
Last, we believe all aspects of a Bayesian analysis are communicated best through interesting case studies. In a good case study, one describes the background of the study and the inferential or predictive problems of interest. In a Bayesian applied analysis in particular, one learns about the construction of the prior to represent expert opinion, the development of the likelihood, and the use of the posterior distribution to address the questions of interest. We therefore propose the inclusion of fully-developed case studies in a Bayesian course for students’ learning and practice. Based on our teaching experience, having students work on a course project is the best way for them to learn, resonating with Cobb’s “teach through research”.
Audience and Structure of this Text
This text is intended for students with a background in calculus but not necessarily any experience in programming. Chapters 1 through 6 resemble the material in a traditional probability course, including foundations, conditional probability, discrete and continuous distributions, and joint distributions. Simulation-based approximations are introduced throughout these chapters to expose students to new and complementary ways of understanding probability and probability distributions, as well as to programming in R.
Although there are applications of Bayes’ rule in the probability chapters, the main Bayesian inferential material begins in Chapters 7 and 8 with a discussion of inference and prediction methods for a single binomial proportion and a single normal mean. The foundational elements of Bayesian inference are described in these two chapters, including the construction of a subjective prior, the computation of the likelihood and posterior distributions, and the summarization of the posterior for different types of inference. Exact posterior distributions based on conjugacy, and approximations based on Monte Carlo simulation, are introduced and compared. Predictive distributions are described both for predicting future data and for implementing model checking.
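As a small illustration of this comparison, consider a hypothetical Beta(1, 1) prior and 15 successes in 25 trials, so that the conjugate posterior is Beta(16, 11). The posterior probability that the proportion exceeds 0.5 can be computed exactly and approximated by Monte Carlo simulation:

set.seed(456)
# Exact posterior probability that p > 0.5 from the Beta(16, 11) posterior
1 - pbeta(0.5, 16, 11)
# Monte Carlo approximation of the same probability
mean(rbeta(10000, 16, 11) > 0.5)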
Chapters 9 through 13 depend heavily on simulation algorithms. Chapter 9 provides an overview of Markov chain Monte Carlo (MCMC) algorithms with a focus on Gibbs sampling and Metropolis-Hastings algorithms. We also introduce the Just Another Gibbs Sampler (JAGS) software, enabling students to gain a deeper understanding of the sampling and prior components of a Bayesian model and to stay focused on summarizing and interpreting the MCMC output when communicating their findings.
Chapter 10 describes the fundamentals of hierarchical modeling where one wishes to combine observations from related groups. Chapters 11 and 12 illustrate Bayesian inference, prediction, and model checking for linear and logistic regression models. Chapter 13 describes several interesting case studies motivated by some historical Bayesian studies and our own research. JAGS is the main software in these chapters for implementing the MCMC inference.
For the interested reader, there is a wealth of good texts describing Bayesian modeling at different levels and directed to various audiences. Berry (1996) is a nice presentation of Bayesian thinking for an introductory statistics class, and Gelman et al. (2013) and Hoff (2009) are good descriptions of Bayesian methodology at the graduate level.
Resources
The following website hosts the datasets and R scripts for all chapters and maintains a current errata list:
https://monika76five.github.io/ProbBayes/
A special R package, ProbBayes (Albert (2019)), containing all of the datasets and special functions for the text, is available on GitHub. The package can be installed using the install_github() function from the devtools package:
library(devtools)
install_github("bayesball/ProbBayes")
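Once installed, the package is loaded in the usual way:

library(ProbBayes)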
Teaching materials, including lecture slides, videos, homework assignments, and labs from an undergraduate Bayesian statistics course taught at one of the authors’ institutions, are available at:
https://github.com/monika76five/BayesianStatistics
Acknowledgments
The authors are very grateful to Dalene Stangl, who played an important role in our collaboration, and our editor, John Kimmel, who provided us with timely reviews that led to significant improvements of the manuscript. We wish to thank our partners, Anne and Hao, for their patience and encouragement as we were working on the manuscript. Jingchen Hu also thanks the group of students taking Bayesian Statistics at Vassar College during Spring 2019, who tried out some of the material.