# Inside IES Research

### Notes from NCER & NCSER

For nearly 15 years, NCER has supported the development and improvement of innovative methodological and statistical tools and approaches that will better enable applied education researchers to conduct high-quality, rigorous education research. This blog spotlights the work of Andrew Gelman, a professor of statistics and political science at Columbia University, and Sophia Rabe-Hesketh, a professor of statistics at the School of Education at the University of California, Berkeley. IES has supported their research on hierarchical modeling and Bayesian computation for many years. In this interview blog, Drs. Gelman and Rabe-Hesketh reflect on how Bayesian modeling applies to educational data and describe the general principles and advantages of Bayesian analysis.

What motivates your research on hierarchical modeling and Bayesian computation?

Education data can be messy. We need to adjust for covariates in experiments and observational studies, and we need to be able to generalize from non-random, non-representative samples to populations of interest.

The general motivation for multilevel modeling is that we are interested in local parameters, such as public opinion by state, small-area disease incidence rates, individual performance in sports, school-district-level learning loss, and other quantities that vary among people, across locations, and over time. In non-Bayesian settings, the local parameters are called random effects, varying intercepts/slopes, or latent variables.
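The idea of estimating local parameters can be sketched with a toy partial-pooling calculation. All numbers below are hypothetical, and the grand mean and between-district standard deviation are treated as known for simplicity; a real multilevel analysis would estimate them from the data.

```python
import numpy as np

# Hypothetical district-level estimates (e.g., mean learning loss in points)
# and their standard errors. These values are made up for illustration.
y = np.array([2.0, -1.0, 4.0, 0.5])   # noisy district estimates
se = np.array([1.5, 2.0, 1.0, 2.5])   # standard errors of those estimates

mu = y.mean()   # grand mean across districts (plug-in, for illustration)
tau = 1.5       # assumed between-district standard deviation

# Precision-weighted shrinkage: each local estimate is pulled toward the
# grand mean, and noisier districts (larger se) are pulled more.
w = tau**2 / (tau**2 + se**2)
theta_hat = w * y + (1 - w) * mu
print(theta_hat)
```

Each pooled estimate `theta_hat[j]` lands between the raw district estimate and the grand mean, which is the basic behavior multilevel models provide for local parameters.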

Bayesian and non-Bayesian models differ in how completely the researcher using them must specify the probability distributions of the parameters. In non-Bayesian models, typically only the data model (also called the likelihood function) must be specified. The underlying parameters, such as the variances of random intercepts, are treated as unknown constants. On the other hand, the Bayesian approach requires specifying a full probability model for all parameters.

A researcher using Bayesian inference encodes additional assumptions about all parameters into prior distributions, then combines information about the parameters from the data model with information from the prior distributions. This results in a posterior distribution for each parameter, which, compared to non-Bayesian model results, provides more information about the appropriateness of the model and supports more complex inferences.
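The way a prior distribution and a data model combine into a posterior can be shown in the simplest conjugate case: a normal prior for an unknown mean with a normal data model and known residual standard deviation. All numeric values here are hypothetical.

```python
import numpy as np

# Hypothetical problem: estimate a school's mean test-score gain, theta.
# Prior: theta ~ Normal(mu0, tau0^2); data model: y_i ~ Normal(theta, sigma^2).
mu0, tau0 = 0.0, 5.0                      # assumed prior mean and sd
sigma = 10.0                              # assumed (known) residual sd
y = np.array([4.0, 7.0, 5.0, 9.0, 3.0])  # hypothetical observed gains

n = len(y)
ybar = y.mean()

# Conjugate normal-normal update: precisions (inverse variances) add,
# and the posterior mean is a precision-weighted average of prior and data.
post_prec = 1 / tau0**2 + n / sigma**2
post_var = 1 / post_prec
post_mean = post_var * (mu0 / tau0**2 + n * ybar / sigma**2)

print(post_mean, np.sqrt(post_var))
```

The posterior mean falls between the prior mean and the sample mean, and the posterior standard deviation is smaller than the prior's: the data model and the prior each contribute information, exactly as described above.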

What advantages are there to the Bayesian approach?

Compared to other estimates, Bayesian estimates are based on many more assumptions. One advantage of this is greater stability at small sample sizes. Another advantage is that Bayesian modeling can be used to produce flexible, practice-relevant summaries from a fitted model that other approaches cannot produce. For instance, when modeling school effectiveness, researchers using the Bayesian approach can rely on the full probability model to justifiably obtain the rankings of schools or the probabilities that COVID-related declines in NAEP mean test scores for a district or state have exceeded three points, along with estimates for the variability of these summaries.
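Summaries like "the probability the decline exceeded three points" are computed directly from posterior draws. The sketch below fakes the draws with normal random numbers purely for illustration; in a real analysis they would come from an MCMC sampler fit to the full probability model, and the district names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior draws for mean score declines (in points) in three
# hypothetical districts. In practice these come from the fitted model.
draws = {
    "District A": rng.normal(2.5, 0.8, 4000),
    "District B": rng.normal(4.1, 0.9, 4000),
    "District C": rng.normal(3.2, 1.1, 4000),
}

# Probability that each district's decline exceeds three points:
# just the fraction of posterior draws above the threshold.
p_exceed = {d: float((s > 3.0).mean()) for d, s in draws.items()}

# Ranking with uncertainty: the probability each district has the
# largest decline, computed draw by draw.
names = list(draws.keys())
stacked = np.column_stack([draws[d] for d in names])
p_largest = {names[i]: float((stacked.argmax(axis=1) == i).mean())
             for i in range(len(names))}

print(p_exceed)
print(p_largest)
```

Because these summaries are fractions of posterior draws, they automatically come with the variability of the underlying parameters built in, rather than being computed from point estimates alone.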

Further, Bayesian inference supports generalizability and replicability because it allows uncertainty from multiple sources to be integrated into a single model. Without allowing for uncertainty, it’s difficult to understand what works for whom and why. A familiar example is predicting student grades in college courses. A regression model can be fit to obtain a forecast with uncertainty based on past data on the students, and then this can be combined with student-specific information. Uncertainties in the forecasts for individual students or groups of students will be dependent and can be captured by a joint probability model, as implemented by posterior simulations. This contrasts with likelihood-based (non-Bayesian) inference, where predictions and their uncertainty are typically computed conditionally on the model parameters, with maximum likelihood estimates plugged in. Ignoring that parameter uncertainty leads to standard error estimates that are too small on average (see this introduction to Bayesian multilevel regression for a detailed demonstration and discussion of this phenomenon).
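The contrast between posterior-predictive and plug-in forecasts can be demonstrated numerically. The sketch below assumes we already have posterior draws of a regression's intercept, slope, and residual standard deviation (faked here with normal draws; every number is hypothetical) and compares the spread of the full predictive distribution with the plug-in forecast.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in posterior draws for a grade-prediction regression
# y = alpha + beta * x + error. In practice these come from a fitted
# Bayesian model; here they are faked for illustration.
n_draws = 5000
alpha = rng.normal(70.0, 2.0, n_draws)            # intercept draws (assumed)
beta = rng.normal(0.5, 0.1, n_draws)              # slope draws (assumed)
sigma = np.abs(rng.normal(8.0, 0.5, n_draws))     # residual-sd draws (assumed;
                                                  # abs() guards against a
                                                  # vanishingly rare negative draw)

x_new = 80.0  # new student's prior-course score (hypothetical)

# Full posterior predictive: each draw propagates parameter uncertainty
# (alpha, beta, sigma vary across draws) plus residual noise.
y_pred = rng.normal(alpha + beta * x_new, sigma)

# Plug-in forecast: point estimates only, so only residual noise remains.
y_plugin = rng.normal(70.0 + 0.5 * x_new, 8.0, n_draws)

print(y_pred.std(), y_plugin.std())
```

The posterior-predictive standard deviation exceeds the plug-in one, which is the "standard errors too small on average" phenomenon in miniature: conditioning on point estimates discards the uncertainty in the parameters themselves.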

What’s an important disadvantage to the Bayesian approach?

Specifying a Bayesian model requires the user to make more decisions than specifying a non-Bayesian model. Until recently, many of these decisions had to be implemented using custom programming, so the Bayesian approach had a steep learning curve. Users who were not up to the programming and debugging task had to work within some restricted class of models that had already been set up with existing software.

This disadvantage is especially challenging in education research, where we often need to adapt and expand our models beyond a restricted class to deal with statistical challenges such as imperfect treatment assignments, nonlinear relations, spatial correlations, and mixtures, along with data issues such as missingness, students changing schools, guessing on tests, and predictors measured with error.