IES Blog

Institute of Education Sciences

Going beyond existing menus of statistical procedures: Bayesian multilevel modeling with Stan

For nearly 15 years, NCER has supported the development and improvement of innovative methodological and statistical tools and approaches that will better enable applied education researchers to conduct high-quality, rigorous education research. This blog spotlights the work of Andrew Gelman, a professor of statistics and political science at Columbia University, and Sophia Rabe-Hesketh, a professor of statistics at the School of Education at the University of California, Berkeley. IES has supported their research on hierarchical modeling and Bayesian computation for many years. In this interview blog, Drs. Gelman and Rabe-Hesketh reflect on how Bayesian modeling applies to educational data and describe the general principles and advantages of Bayesian analysis.

What motivates your research on hierarchical modeling and Bayesian computation?

Education data can be messy. We need to adjust for covariates in experiments and observational studies, and we need to be able to generalize from non-random, non-representative samples to populations of interest.

The general motivation for multilevel modeling is that we are interested in local parameters, such as public opinion by states, small-area disease incidence rates, individual performance in sports, school-district-level learning loss, and other quantities that vary among people, across locations, and over time. In non-Bayesian settings, the local parameters are called random effects, varying intercepts/slopes, or latent variables.

Bayesian and non-Bayesian models differ in how completely the researcher using them must specify the probability distributions of the parameters. In non-Bayesian models, typically only the data model (also called the likelihood function) must be specified. The underlying parameters, such as the variances of random intercepts, are treated as unknown constants. On the other hand, the Bayesian approach requires specifying a full probability model for all parameters.  

A researcher using Bayesian inference encodes additional assumptions about all parameters into prior distributions, then combines information about the parameters from the data model with information from the prior distributions. This results in a posterior distribution for each parameter, which, compared to non-Bayesian model results, provides more information about the appropriateness of the model and supports more complex inferences.
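
To make this concrete, here is a minimal varying-intercept model for students nested in schools, written in the Stan language we discuss below. The variable names and the specific normal priors are illustrative choices for this sketch, not recommendations:

```stan
data {
  int<lower=1> N;                          // number of students
  int<lower=1> J;                          // number of schools
  array[N] int<lower=1, upper=J> school;   // school of each student
  vector[N] y;                             // outcome, e.g., a test score
}
parameters {
  real mu;                 // average school mean
  real<lower=0> tau;       // between-school standard deviation
  real<lower=0> sigma;     // within-school standard deviation
  vector[J] alpha;         // school-level intercepts (the "local parameters")
}
model {
  // Priors: in the Bayesian approach, every parameter gets a distribution.
  mu ~ normal(0, 10);
  tau ~ normal(0, 5);
  sigma ~ normal(0, 5);
  alpha ~ normal(mu, tau);             // population model for the local parameters
  y ~ normal(alpha[school], sigma);    // data model (the likelihood)
}
```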

What advantages are there to the Bayesian approach?

Compared to other estimates, Bayesian estimates are based on many more assumptions. One advantage of this is greater stability at small sample sizes. Another advantage is that Bayesian modeling can be used to produce flexible, practice-relevant summaries from a fitted model that other approaches cannot produce. For instance, when modeling school effectiveness, researchers using the Bayesian approach can rely on the full probability model to justifiably obtain the rankings of schools or the probabilities that COVID-related declines in NAEP mean test scores for a district or state have exceeded three points, along with estimates for the variability of these summaries. 
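
As a sketch of how such summaries fall out of the posterior, the following generated quantities block could be appended to the varying-intercept model above. Averaging each indicator over posterior draws gives the corresponding posterior probability; the 3-point threshold and all names here are illustrative, not part of any official NAEP analysis:

```stan
generated quantities {
  array[J] int school_rank;       // 1 = highest school mean, within each draw
  array[J] int below_by_3;        // is school j more than 3 points below average?
  for (j in 1:J) {
    school_rank[j] = J - rank(alpha, j);   // rank() counts elements below alpha[j]
    below_by_3[j] = (mu - alpha[j]) > 3;   // averaging this over draws gives the probability
  }
}
```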

Further, Bayesian inference supports generalizability and replicability by freely allowing uncertainty from multiple sources to be integrated into models. Without allowing for uncertainty, it’s difficult to understand what works for whom and why. A familiar example is predicting student grades in college courses. A regression model can be fit to obtain a forecast with uncertainty based on past data on the students, and then this can be combined with student-specific information. Uncertainties in the forecasts for individual students or groups of students will be dependent and can be captured by a joint probability model, as implemented by posterior simulations. This contrasts with likelihood-based (non-Bayesian) inference where predictions and their uncertainty are typically considered only conditionally on the model parameters, with maximum likelihood estimates plugged in. Ignoring uncertainty leads to standard error estimates that are too small on average (see this introduction to Bayesian multilevel regression for a detailed demonstration and discussion of this phenomenon).
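
One simple way to see this dependence is through posterior simulation. The sketch below, again building on the model above, jointly predicts outcomes for two hypothetical new students from the same previously unseen school; because both forecasts share each posterior draw of alpha_new and sigma, their uncertainties are dependent, which plugging in point estimates would miss:

```stan
generated quantities {
  real alpha_new = normal_rng(mu, tau);          // intercept for an unseen school
  real y_new_1 = normal_rng(alpha_new, sigma);   // forecast for hypothetical student 1
  real y_new_2 = normal_rng(alpha_new, sigma);   // forecast for hypothetical student 2
}
```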

What’s an important disadvantage to the Bayesian approach?

Specifying a Bayesian model requires the user to make more decisions than specifying a non-Bayesian model. Until recently, many of these decisions had to be implemented using custom programming, so the Bayesian approach had a steep learning curve. Users who were not up to the programming and debugging task had to work within some restricted class of models that had already been set up with existing software. 

This disadvantage is especially challenging in education research, where we often need to adapt and expand our models beyond a restricted class to deal with statistical challenges such as imperfect treatment assignments, nonlinear relations, spatial correlations, and mixtures, along with data issues such as missingness, students changing schools, guessing on tests, and predictors measured with error.

How did your IES-funded work address this disadvantage?

In 2011, we developed Stan, our open-source Bayesian software, with funding from a Department of Energy grant on large-scale computing. With additional support from the National Science Foundation and IES, we have developed model types, workflows, and case studies for education researchers and also improved Stan’s computational efficiency.

By combining a state-of-the-art inference engine with an expressive modeling language, Stan allows education researchers to build their own models, starting with basic linear and logistic regressions and then adding components of variation and uncertainty and expanding as needed to capture challenges that arise in applied problems at hand.  We recommend the use of Stan as part of a Bayesian workflow of model building, checking, and expansion, making use of graphs of data and fitted models.
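
As an illustration of such a starting point, here is a plain linear regression in Stan with weakly informative priors (again, illustrative choices). Swapping the common intercept for school-level intercepts, as in the varying-intercept sketch above, is one example of the incremental expansion we describe:

```stan
data {
  int<lower=1> N;
  vector[N] x;           // a single predictor, e.g., a pretest score
  vector[N] y;           // outcome
}
parameters {
  real a;                // intercept
  real b;                // slope
  real<lower=0> sigma;   // residual standard deviation
}
model {
  a ~ normal(0, 10);
  b ~ normal(0, 10);
  sigma ~ normal(0, 5);
  y ~ normal(a + b * x, sigma);
}
```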

Stan can be accessed using R, Python, Stata, Julia, and other software. We recommend getting started by looking at the Stan case studies. We also have a page on Stan for education research and a YouTube channel.

In terms of dealing with the issues that arise in complex educational data, where do we stand today?

Put all this together, and we are in the business of fitting complex models in an open-ended space that goes beyond any existing menu of statistical procedures. Bayesian inference is a flexible way to fit such models, and Stan is a flexible tool we have developed that allows general models to be fit in reasonable time using advanced algorithms for statistical computing. As always with research, there are many loose ends and more work to be done, but we can now routinely fit, check, and display models of much greater generality than was previously possible, furthering the goal of understanding processes in education.


This blog was produced by Charles Laurin (Charles.Laurin@ed.gov), NCER program officer for the Statistical and Research Methodology in Education grant program.

What We are Learning from NAEP Data About Use of Extended Time Accommodations

For students with learning disabilities, many of whom may take more time to read and process information than non-disabled peers, an extended time accommodation (ETA) is often used on standardized assessments. In 2021, IES awarded a grant for researchers to use response process data from the NAEP mathematics assessment to explore the test-taking behavior of middle school students with disabilities, including their use of accommodations such as ETA. In this blog, we interview Dr. Xin Wei from Digital Promise to see what she and Dr. Susu Zhang from University of Illinois at Urbana-Champaign are learning from their study.

The researchers have delved into the performance, process, and survey data of the eighth graders who took the digital NAEP mathematics test in 2017. Their recent article presents a quasi-experimental study examining the differences in these data across three distinct profiles of students with learning disabilities (LDs)—students with LD who received and utilized ETAs, students with LD who were granted ETAs but did not use them, and students with LD who did not receive ETAs.

The key findings from their study are as follows:

  • Students with LDs who used their ETAs performed statistically significantly better than their peers with LDs who were not granted ETAs and those who received ETAs but did not use them. They also engaged more with the test, as demonstrated by more frequent actions, revisits to items, and greater use of universal design features, such as the drawing tool and text-to-speech functionality, on most of the math items compared to students who were not granted extended time.
  • Students with LDs who had ETAs but chose not to use them performed significantly worse than their peers with LDs who were not granted extended time.
  • Students with LDs who were granted ETAs saw the best performance with an additional 50% time (45 minutes compared to the usual 30 minutes provided to students without ETA).
  • Students who were given extra time, regardless of whether they used it, reported less time pressure, higher math interest, and greater enjoyment of math.
  • There were certain item types for which students who used ETAs performed more favorably.

We recently discussed the results of the study with Dr. Wei to learn more.


Which types of items on the test favored students who used extended time and why do you think they benefited?


The assessment items for which ETAs were particularly beneficial were not only complex but also inherently time-consuming. For example, students need to complete four sub-questions for item 5, drag six numbers to the correct places for item 6, type answers into four places to complete an equation for item 9, type in a constructed-response answer for item 11, and complete a multiple-choice question and type answers in eight places to complete item 13.

For students with LDs, who often have slower processing speeds, these tasks become even more time-intensive. The additional time allows students to engage with each element of the question thoroughly, ensuring they have the opportunity to fully understand and respond to each part. This extended time is not just about accommodating different processing speeds; it's about providing the necessary space for these students to engage with and complete tasks that are intricate and time-consuming by design.

Why did you decide to look at the additional survey data NAEP collects on math interest and enjoyment in your study of extended time?

These affective factors are pivotal to academic success, particularly in STEM fields. Students who enjoy the subject matter tend to perform better, pursue related fields, and continue learning throughout their lives. This is especially relevant for students with LDs, who often face heightened test anxiety and lower interest in math, which can be exacerbated by the pressure of timed assessments. Our study's focus on these affective components revealed that students granted extra time reported a higher level of math interest and enjoyment even if they did not use the extra time. ETAs appear to alleviate the stress tied to time limits, offering dual advantages by not only aiding in academic achievement but also by improving attitudes toward math. ETAs could be a low-cost, high-impact accommodation that not only addresses academic needs but also contributes to emotional health.

What recommendations do you have based on your findings for classroom instruction?

First, it is crucial to prioritize extra time for students with LDs to enhance their academic performance and engagement. This involves offering flexible timing for assignments and assessments to reduce anxiety and foster a greater interest in learning. Teachers should be encouraged to integrate Universal Design for Learning principles into their instructional methods, emphasizing the effective use of technology, such as text-to-speech tools and embedded digital highlighters and pencils for doing scratchwork. Professional development for educators is essential to deepen their proficiency in using digital learning tools. Additionally, teachers should motivate students to use the extra time for thorough problem-solving and to revisit math tasks for accuracy. Regularly adjusting accommodations to meet the evolving needs of students with LDs is vital in creating an inclusive learning environment where every student can achieve success.

What are the implications of the study findings for education equity?

Our study demonstrates that ETAs offer more than just a performance boost: they provide psychological benefits, reducing stress and enhancing interest in and enjoyment of the subject matter. This is vital for students with LDs, who often face heightened anxiety and performance pressure. To make the system more equitable, we need a standardized policy for accommodations that ensures all students who require ETAs receive them. We must consider the variable needs of all students and question the current practices and policies that create inconsistencies in granting accommodations. If the true aim of assessments is to gauge student abilities, time is a factor that should not become a barrier.


U.S. Department of Education Resources

Learn more about the Department’s resources to support schools, educators, and families in making curriculum, instruction, and assessment accessible for students with disabilities.

Learn more about conducting research using response process data from the 2017 NAEP Mathematics Assessment.

 

This interview blog was produced by Sarah Brasiel (Sarah.Brasiel@ed.gov), a program officer in the National Center for Special Education Research.

New Standards to Advance Equity in Education Research

One year ago, IES introduced a new equity standard and associated recommendations to its Standards for Excellence in Education Research (SEER). The intent of this standard, as well as the other eight SEER standards, is to complement IES’s focus on rigorous evidence building with guidance and supports for practices that have the potential to make research transformational. The addition of equity to SEER is part of IES’s ongoing mission to improve academic achievement and access to educational opportunities for all learners (see IES Diversity Statement). IES is mindful, however, that to authentically and rigorously integrate equity into research, education researchers may need additional resources and tools. To that end, IES hosted a Technical Working Group (TWG) meeting of experts to gather input on existing tools and resources that the education community could use to implement the new SEER equity standard in their research and to identify notable gaps where new tools and resources are needed. A summary of the TWG panel discussion and recommendations is now available.

The TWG panel recommended several relevant resources and provided concrete suggestions for ways IES can support education researchers’ learning and growth, including training centers, coaching sessions, webinars, checklists, and new resource development, acknowledging that different researchers may need different kinds of support. The meeting summary includes a mix of recommendations for tools and resources, along with important considerations and best-practice recommendations for researchers as they work to embed equity in their research.

The new SEER equity standard and accompanying recommendations have been integrated throughout the current FY 2024 Request for Applications. By underscoring the importance of equity, IES aims to ensure that the research it supports is both rigorous and relevant to the needs of all learners.


This blog was written by NCER program officer Christina Chhin. If you have questions or feedback regarding the equity TWG, please contact Christina Chhin (Christina.Chhin@ed.gov) or Katina Stapleton (Katina.Stapleton@ed.gov), co-chair of the IES Diversity Council. If you have any questions or feedback regarding the equity standard or associated recommendations, please email NCEE.Feedback@ed.gov.

Encouraging the Use of LGBTQI+ Education Research Data

Until recently, limited data existed in education research focused on the lesbian, gay, bisexual, transgender, queer, and intersex (LGBTQI+) community and their experiences. As this area of interest continues to grow, education researchers are learning how to effectively collect these data, interpret their implications, and use them to help improve the educational outcomes of LGBTQI+ identifying students. In this blog post, we review current federal recommendations for data collection and encourage researchers to submit FY 2024 applications focused on the educational experiences and outcomes of LGBTQI+ identifying students.

Collecting Data on Sexual Orientation and Gender Identities

In January 2023, the Office of the Chief Statistician of the United States released a report with recommendations on how to effectively design federal statistics surveys to account for sexual orientation and gender identities (SOGI). While this report is for a federal audience, the recommendations are relevant and useful for education researchers who wish to measure the identities and experiences of those in the LGBTQI+ community. Some suggestions include—

  • Provide multiple options for sexual orientation identification (for example, gay/lesbian, straight, bisexual, use other term)
  • Provide a two-question set in order to measure gender identity—one asking for sex assigned at birth, and one for current self-identification
  • Provide write-in response and multiple-response options for SOGI-related questions
  • Allow respondents to proceed through the survey if they choose not to answer unless answers to any of these items are critical for data collection

Education researchers looking to incorporate SOGI data into their studies can also use existing SOGI data collected by the National Center for Education Statistics (NCES) to support their research. A new NCES blog outlines the studies that collect SOGI information and shares some initial findings from those data.

Funding Opportunities for Research to Improve Outcomes of LGBTQI+ students

In alignment with the SEER Equity Standard, IES encourages researchers to submit applications to the FY 2024 research grant competitions that support the academic and social behavioral outcomes of students who identify as LGBTQI+. IES is especially interested in research proposals that involve—

  • Describing the educational experiences and outcomes of LGBTQI+ students
  • Creating safe and inclusive learning environments that support the needs of all LGBTQI+ students
  • Identifying promising practices for school-based health services and supports, especially mental health services, that are accessible to and supportive of LGBTQI+ students
  • Identifying systems-level approaches that reduce barriers to accessing and participating in high quality learning environments for LGBTQI+ students

Check out our funding opportunities page for more information about our FY 2024 requests for applications. If you have specific questions about the appropriateness of your research for a specific FY 2024 research competition, please contact the relevant program officer listed in the request for applications.


This blog is part of a 3-part Inside IES Research blog series on sexual orientation and gender identity in education research in observance of Pride month. The other posts discuss the feedback from the IES LGBTQI+ Listening and Learning session and the first ever learning game featuring a canonically nonbinary character.

This blog was produced by Virtual Student Federal Service intern Audrey Im with feedback from IES program officers Katina Stapleton (NCER - Katina.Stapleton@ed.gov) and Katherine Taylor (NCSER - Katherine.Taylor@ed.gov) and NCES project officers Elise Christopher (Elise.Christopher@ed.gov) and Maura Spiegelman (Maura.Spiegelman@ed.gov).

English Learners: Analyzing What Works, for Whom, and Under What Conditions?

April is National Bilingual/Multilingual Learner Advocacy Month! In this guest blog, Dr. Ryan Williams, principal researcher at the American Institutes for Research, describes his IES-funded project, which uses a broad systematic review and meta-analysis to identify factors that help explain variation in the effects programs have on English learner student outcomes.

Over the past two decades, empirical research on programs that support English language and multilingual learners has surged. Many of the programs that researchers have studied are designed to support English literacy development and are tailored to the unique needs of English learners. Other programs are more general, but researchers often study program impacts on English learners in addition to impacts on a broader population of students. Relatively few attempts have been made to identify common findings across this literature. Even fewer attempts have been made to identify meaningful sources of variation that drive program impacts for English learner students—that is, understanding what works, for whom, and under what conditions. To help provide educators and policymakers with answers to those important questions, we conducted a systematic review and meta-analysis of the effectiveness of programs and strategies that may support English language learner students.

Our Systematic Review Process

We conducted a broad search that combed through electronic databases, unpublished ‘grey’ literature (for example, working papers, conference presentations, or research briefs), and sources that required hand-searching, such as organizational websites. After documenting our primary decision-making factors within a review protocol, we applied a set of rigorous criteria to select studies for inclusion in the meta-analysis. We ultimately identified 83 studies that met our inclusion criteria. Each was a randomized field study that included English learner students in grades PK-12 and student academic learning outcomes such as English literacy, mathematics, science, and social studies. Each of the included studies was systematically coded to capture characteristics about the research methods, students and schools, settings, programs, outcome measures, and, importantly, the program impacts that the studies reported. We then conducted a meta-analysis to understand the relationships between the characteristics we coded and the program impacts.
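
To give a sense of what such an analysis involves, the following is a generic random-effects meta-regression of the kind often used for this purpose, sketched in the Stan modeling language featured earlier in this blog series. It is an illustration of the general technique only, not the exact model used in this study, and all variable names are hypothetical:

```stan
data {
  int<lower=1> I;               // number of studies
  int<lower=0> K;               // number of coded study characteristics
  vector[I] d;                  // estimated program impacts (effect sizes)
  vector<lower=0>[I] se;        // standard errors of those estimates
  matrix[I, K] x;               // coded moderators (program type, grade band, ...)
}
parameters {
  real b0;                      // average program impact
  vector[K] b;                  // how impacts vary with study characteristics
  real<lower=0> tau;            // residual between-study standard deviation
  vector[I] theta;              // true study-level impacts
}
model {
  b0 ~ normal(0, 1);            // weakly informative priors (illustrative)
  b ~ normal(0, 1);
  tau ~ normal(0, 1);
  theta ~ normal(b0 + x * b, tau);  // impacts depend on coded characteristics
  d ~ normal(theta, se);            // each estimate is noisy around its true impact
}
```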

Preliminary Findings

We are still finalizing our analyses; however, our initial results revealed several interesting findings.

  • Programs that included support for students to develop their first language skills tended to produce larger improvements in student learning. This is consistent with prior research suggesting that supporting first language development can lead to improved learning in core content areas. However, the initial findings from this meta-analysis build on the prior research by providing empirical evidence across a large number of rigorous studies.
  • There are some particularly promising practices for educators serving English learner students. These promising practices include the use of content differentiation, the use of translation in a student’s first language, and a focus on writing. Content differentiation aligns with best practices for teaching English learners, which emphasize the importance of providing instruction that is tailored to language proficiency levels and academic needs. The use of first language translation can be helpful for English learner students, as it can support their ability to access and comprehend academic content while they are still building their English proficiency. Focusing on writing can also be particularly important for English learners, as writing is often the last domain of language proficiency for students to develop. Our preliminary findings that English learner writing skills are responsive when targeted by instructional programs may hold implications for how to focus support for students who are nearing but not yet reaching English proficiency.
  • The type of test used to measure program impact was related to the size of the program impact on student learning that studies found. Specifically, we found that it is reasonable to expect smaller program impacts when examining state standardized tests and larger impacts for other types of tests. This is consistent with findings from prior meta-analyses based on more general student populations and demonstrates that the same applies when studying program impacts for English learner students. Statewide standardized tests are typically designed to cover a broad range of state content standards and thus may not reflect improvements in more specific areas of student learning targeted by a given program. On the other hand, researcher-developed tests may align too closely with a program and may not reflect broader, policy-relevant changes in learning. Our initial evidence suggests that to understand program impacts for English learner students—or any group of students—we may want to use established, validated assessments but not only consider statewide standardized tests.

Next Steps

In terms of next steps, we will complete the meta-analysis work this summer and focus on disseminating the findings through multiple avenues, including a journal publication, review summaries on the AIR website, and future conference proceedings. In addition, we are working to deepen our understanding of the relationships identified in this study and explore promising avenues for practice and future research.

If you’d like to continue learning and see the results of this study, please check back at AIR’s Methods of Synthesis and Integration Center project page.


This blog was produced by Helyn Kim (Helyn.Kim@ed.gov), program officer for the English Learners portfolio, NCER.