Inside IES Research

Notes from NCER & NCSER

Going beyond existing menus of statistical procedures: Bayesian multilevel modeling with Stan

For nearly 15 years, NCER has supported the development and improvement of innovative methodological and statistical tools and approaches that will better enable applied education researchers to conduct high-quality, rigorous education research. This blog spotlights the work of Andrew Gelman, a professor of statistics and political science at Columbia University, and Sophia Rabe-Hesketh, a professor of statistics at the School of Education at the University of California, Berkeley. IES has supported their research on hierarchical modeling and Bayesian computation for many years. In this interview blog, Drs. Gelman and Rabe-Hesketh reflect on how Bayesian modeling applies to educational data and describe the general principles and advantages of Bayesian analysis.

What motivates your research on hierarchical modeling and Bayesian computation?

Education data can be messy. We need to adjust for covariates in experiments and observational studies, and we need to be able to generalize from non-random, non-representative samples to populations of interest.

The general motivation for multilevel modeling is that we are interested in local parameters, such as public opinion by states, small-area disease incidence rates, individual performance in sports, school-district-level learning loss, and other quantities that vary among people, across locations, and over time. In non-Bayesian settings, the local parameters are called random effects, varying intercepts/slopes, or latent variables.

Bayesian and non-Bayesian models differ in how completely the researcher using them must specify the probability distributions of the parameters. In non-Bayesian models, typically only the data model (also called the likelihood function) must be specified. The underlying parameters, such as the variances of random intercepts, are treated as unknown constants. On the other hand, the Bayesian approach requires specifying a full probability model for all parameters.  

A researcher using Bayesian inference encodes additional assumptions about all parameters into prior distributions, then combines information about the parameters from the data model with information from the prior distributions. This results in a posterior distribution for each parameter, which, compared to non-Bayesian model results, provides more information about the appropriateness of the model and supports more complex inferences.
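To make the combination of prior and data concrete, here is a minimal sketch of the simplest case: a single normally distributed estimate with a known standard error and a normal prior. The function name and all numbers are hypothetical and chosen only for illustration; realistic multilevel models are fit by simulation rather than by a closed-form formula like this one.

```python
def posterior_normal(prior_mean, prior_sd, data_mean, data_se):
    """Posterior mean and sd for a normal mean, given a normal prior
    and a normal data model with known standard error."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return post_mean, post_var ** 0.5


# Hypothetical example: a school's estimated effect is 8 points (se 5),
# and the prior says effects are centered at 0 with sd 5.
post_mean, post_sd = posterior_normal(prior_mean=0.0, prior_sd=5.0,
                                      data_mean=8.0, data_se=5.0)
print(round(post_mean, 2), round(post_sd, 2))
```

Because the prior and the data are equally precise in this example, the posterior mean falls halfway between them, and the posterior standard deviation is smaller than either the prior sd or the standard error.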

What advantages are there to the Bayesian approach?

Compared to other estimates, Bayesian estimates are based on many more assumptions. One advantage of this is greater stability at small sample sizes. Another advantage is that Bayesian modeling can be used to produce flexible, practice-relevant summaries from a fitted model that other approaches cannot produce. For instance, when modeling school effectiveness, researchers using the Bayesian approach can rely on the full probability model to justifiably obtain the rankings of schools or the probabilities that COVID-related declines in NAEP mean test scores for a district or state have exceeded three points, along with estimates for the variability of these summaries. 
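As a rough illustration of how such summaries are computed, the sketch below works from posterior draws. The district names, means, and standard deviations are hypothetical, and the draws are simulated independently here as a stand-in for the joint draws that a fitted model would return.

```python
import random

random.seed(1)

# Hypothetical stand-in for posterior draws of score declines (in points).
# In practice these would come from a fitted model, not from random.gauss.
n_draws = 4000
assumed_posteriors = {
    "District A": (4.0, 1.0),   # (posterior mean, posterior sd), assumed
    "District B": (2.5, 1.5),
    "District C": (3.2, 0.8),
}
draws = {
    name: [random.gauss(mean, sd) for _ in range(n_draws)]
    for name, (mean, sd) in assumed_posteriors.items()
}

# Probability that each district's decline exceeds three points:
# the share of posterior draws above the threshold.
prob_exceeds = {
    name: sum(d > 3.0 for d in district_draws) / n_draws
    for name, district_draws in draws.items()
}

# Probability that each district has the largest decline,
# found by ranking the districts within every draw.
names = list(draws)
largest_counts = {name: 0 for name in names}
for i in range(n_draws):
    worst = max(names, key=lambda name: draws[name][i])
    largest_counts[worst] += 1
prob_largest = {name: count / n_draws
                for name, count in largest_counts.items()}

print(prob_exceeds)
print(prob_largest)
```

The same pattern applies to any quantity that can be computed from a single draw: calculate it draw by draw, then summarize the results.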

Further, Bayesian inference supports generalizability and replicability by freely allowing uncertainty from multiple sources to be integrated into models. Without allowing for uncertainty, it’s difficult to understand what works for whom and why. A familiar example is predicting student grades in college courses. A regression model can be fit to obtain a forecast with uncertainty based on past data on the students, and then this can be combined with student-specific information. Uncertainties in the forecasts for individual students or groups of students will be dependent and can be captured by a joint probability model, as implemented by posterior simulations. This contrasts with likelihood-based (non-Bayesian) inference where predictions and their uncertainty are typically considered only conditionally on the model parameters, with maximum likelihood estimates plugged in. Ignoring uncertainty leads to standard error estimates that are too small on average (see this introduction to Bayesian multilevel regression for a detailed demonstration and discussion of this phenomenon).
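The understatement of uncertainty can be seen in a deliberately simple sketch: predicting one new score from a normal model with a known residual standard deviation and a flat prior on the mean. These modeling choices and all numbers are assumptions made for illustration, not the models discussed in the linked introduction.

```python
import math
import random
import statistics

random.seed(2)

sigma = 10.0   # residual sd, assumed known for simplicity
n = 5          # a small sample, where the difference is most visible
y = [random.gauss(70.0, sigma) for _ in range(n)]  # simulated past scores
ybar = statistics.fmean(y)

# Plug-in prediction: treat the estimated mean as if it were exact,
# so the only uncertainty is the residual sd.
plug_in_sd = sigma

# Posterior predictive: with a flat prior, the mean has posterior
# N(ybar, sigma / sqrt(n)). Draw the mean first, then a new score.
post_sd_mu = sigma / math.sqrt(n)
predictive_draws = []
for _ in range(20000):
    mu = random.gauss(ybar, post_sd_mu)
    predictive_draws.append(random.gauss(mu, sigma))
posterior_predictive_sd = statistics.stdev(predictive_draws)

# Closed-form value for this simple model, for comparison.
exact_predictive_sd = sigma * math.sqrt(1 + 1 / n)

print(plug_in_sd, round(posterior_predictive_sd, 2),
      round(exact_predictive_sd, 2))
```

In this setting the plug-in standard deviation is too small by a factor of sqrt(1 + 1/n), a gap that shrinks as the sample grows.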

What’s an important disadvantage to the Bayesian approach?

Specifying a Bayesian model requires the user to make more decisions than specifying a non-Bayesian model. Until recently, many of these decisions had to be implemented using custom programming, so the Bayesian approach had a steep learning curve. Users who were not up to the programming and debugging task had to work within some restricted class of models that had already been set up with existing software. 

This disadvantage is especially challenging in education research, where we often need to adapt and expand our models beyond a restricted class to deal with statistical challenges such as imperfect treatment assignments, nonlinear relations, spatial correlations, and mixtures, along with data issues such as missingness, students changing schools, guessing on tests, and predictors measured with error.

How did your IES-funded work address this disadvantage?

In 2011, we developed Stan, our open-source Bayesian software, with funding from a Department of Energy grant on large-scale computing. With additional support from the National Science Foundation and IES, we have developed model types, workflows, and case studies for education researchers and also improved Stan’s computational efficiency.

By combining a state-of-the-art inference engine with an expressive modeling language, Stan allows education researchers to build their own models, starting with basic linear and logistic regressions, then adding components of variation and uncertainty, and expanding as needed to capture the challenges that arise in the applied problems at hand. We recommend the use of Stan as part of a Bayesian workflow of model building, checking, and expansion, making use of graphs of data and fitted models.

Stan can be accessed using R, Python, Stata, Julia, and other software. We recommend getting started by looking at the Stan case studies. We also have a page on Stan for education research and a YouTube channel.

In terms of dealing with the issues that arise in complex educational data, where do we stand today?

Put all this together, and we are in the business of fitting complex models in an open-ended space that goes beyond any existing menu of statistical procedures. Bayesian inference is a flexible way to fit such models, and Stan is a flexible tool that we have developed, allowing general models to be fit in reasonable time using advanced algorithms for statistical computing. As always with research, there are many loose ends and there is more work to be done, but we can now routinely fit, check, and display models of much greater generality than was previously possible, facilitating the goals of understanding processes in education.


This blog was produced by Charles Laurin (Charles.Laurin@ed.gov), NCER program officer for the Statistical and Research Methodology in Education grant program.