Information on IES-Funded Research
Grant Closed

Improving the Power of Education Experiments with Auxiliary Data

NCER
Program: Statistical and Research Methodology in Education
Program topic(s): Core
Award amount: $576,429
Principal investigator: Johann Gagnon-Bartsch
Awardee: University of Michigan
Year: 2021
Award period: 3 years 7 months (03/01/2021 - 10/15/2024)
Project type: Methodological Innovation
Award number: R305D210031

Purpose

The purpose of this project is to develop novel methodology to estimate treatment effects from randomized controlled trials (RCTs), while incorporating large observational remnant data and cutting-edge machine learning prediction algorithms to improve precision. The statistical precision of effect estimates from an RCT is limited by the RCT's sample size, which itself is typically subject to a number of practical constraints, such as cost. In many cases, RCT estimates may be too imprecise to guide policy or inform science, and this problem is particularly acute in the case of subgroup analyses.
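The core idea can be illustrated with a small simulation. The sketch below is a simplified illustration under stated assumptions, not the project's actual estimator or the API of its software: it fits a prediction model on simulated remnant data only (ordinary least squares stands in for a machine learning algorithm), then compares a plain difference in means with a difference in means of the prediction residuals within a simulated RCT. Because the model never sees RCT outcomes or treatment assignments, subtracting its predictions does not bias the estimate. All names, sample sizes, and the data-generating model are hypothetical.

```python
import numpy as np


def simulate(seed=0, n_remnant=5000, n_rct=200, true_effect=0.3):
    """Return (unadjusted, residualized) effect estimates for one simulated trial."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -0.5, 0.8])  # hypothetical covariate effects

    # Remnant: units outside the RCT, with covariates and outcomes observed.
    X_rem = rng.normal(size=(n_remnant, 3))
    y_rem = X_rem @ beta + rng.normal(size=n_remnant)

    # Fit the prediction model on the remnant only (OLS as a stand-in for ML).
    design_rem = np.column_stack([np.ones(n_remnant), X_rem])
    coef, *_ = np.linalg.lstsq(design_rem, y_rem, rcond=None)

    # RCT: half of the units are randomly assigned to treatment.
    X = rng.normal(size=(n_rct, 3))
    z = rng.permutation(np.repeat([0, 1], n_rct // 2))
    y = X @ beta + true_effect * z + rng.normal(size=n_rct)

    # Predictions depend only on pre-treatment covariates and remnant data.
    y_hat = np.column_stack([np.ones(n_rct), X]) @ coef
    resid = y - y_hat

    unadjusted = y[z == 1].mean() - y[z == 0].mean()
    residualized = resid[z == 1].mean() - resid[z == 0].mean()
    return unadjusted, residualized


# Repeat the simulated trial to compare the two estimators' sampling behavior.
estimates = np.array([simulate(seed=s) for s in range(500)])
print("mean estimate (unadjusted, residualized):", estimates.mean(axis=0))
print("std. dev.     (unadjusted, residualized):", estimates.std(axis=0, ddof=1))
```

In this toy setup both estimators center on the simulated effect, while the residualized estimator varies less from trial to trial, which is the kind of precision gain the project targets.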

Project Activities

The research team developed statistical methods and data science tools to combine data from RCTs in education research with auxiliary data gathered from large administrative databases: that is, covariate and outcome data on students or schools that did not participate in the RCT. Precision gains from these data allowed the research team to increase the effective sample size of trials, thereby increasing statistical power and allowing for reduced trial costs, more efficient use of resources, or both. The added precision also allowed for improved subgroup analyses and estimates of effect variability in reanalyses of prior studies, resulting in broader generalizability of the results. The team specifically addressed common RCT designs and data structures in education research, including paired-cluster randomized trials and longitudinal data measurements. The team also addressed the practical challenge of sharing data under privacy constraints and produced software implementing their methods for third-party education researchers.
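One common way to read "effective sample size" is as the sample size an unadjusted analysis would need to match the precision of the adjusted one. The sketch below assumes that reading, along with the textbook approximation that an adjustment explaining a fraction R² of the outcome variance scales the estimator's variance by roughly (1 - R²). Neither the function names nor this approximation come from the project itself.

```python
def effective_sample_size(n, var_unadjusted, var_adjusted):
    """Sample size an unadjusted analysis would need to match the adjusted variance.

    Assumes the estimator's variance scales as 1/n (hypothetical helper).
    """
    if n <= 0 or var_unadjusted <= 0 or var_adjusted <= 0:
        raise ValueError("n and both variances must be positive")
    return n * (var_unadjusted / var_adjusted)


def effective_sample_size_from_r2(n, r_squared):
    """Approximate effective sample size when adjustment explains r_squared of variance."""
    if n <= 0 or not 0 <= r_squared < 1:
        raise ValueError("need n > 0 and 0 <= r_squared < 1")
    return n / (1.0 - r_squared)


# Hypothetical example: a 200-unit trial whose adjustment halves the variance.
print(effective_sample_size(200, var_unadjusted=0.04, var_adjusted=0.02))
print(effective_sample_size_from_r2(200, r_squared=0.5))
```

Under these assumptions, halving the variance is equivalent to doubling the number of trial participants.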

Key outcomes

The main products of this research were peer-reviewed articles and conference papers detailing the novel statistical methods, as well as flexible, user-friendly, open-source software intended for use by applied education researchers. 

People and institutions involved

Project contributors

Neil Heffernan III

Co-principal investigator

Adam Sales

Co-principal investigator

Products and publications

The main software product of this research is flexible, user-friendly, open-source software that is available to and readily usable by applied education researchers.

Software:

  • https://github.com/manncz/dRCT – R package that implements the core statistical methodology developed for the project.
  • https://github.com/Bakri-1/loop_shiny_app – Shiny app for users of ASSISTments
  • https://github.com/jaylinlowe/dRCTpower – Shiny app for power calculations

Publications:

Published Journal Articles:

Gagnon-Bartsch*, J. A., Sales*, A. C., Wu, E., Botelho, A. F., Erickson, J. A., Miratrix, L. W., & Heffernan, N. T. (2023). Precise unbiased estimation in randomized experiments using auxiliary observational data. Journal of Causal Inference, 11(1), 20220011. https://doi.org/10.1515/jci-2022-0011

Sales, A. C., Prihar, E. B., Gagnon-Bartsch, J. A., & Heffernan, N.T. (2023). Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results. Journal of Educational Data Mining, 15(2), 53–85. https://doi.org/10.5281/zenodo.8016854 

Published Conference Papers:

Sales, A.C., Prihar, E., Gagnon-Bartsch, J., Gurung, A., Heffernan, N.T. (2022). More Powerful A/B Testing Using Auxiliary Data and Deep Learning. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium. AIED 2022. Lecture Notes in Computer Science, vol 13356. Springer, Cham. https://doi.org/10.1007/978-3-031-11647-6_107 

Pham, D. M., Vanacore, K. P., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). LOOL: Towards Personalization with Flexible & Robust Estimation of Heterogeneous Treatment Effects. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 376-384). 

Lowe, J., Mann, C. Z., Wang, J., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). Power Calculations for Randomized Controlled Trials with Auxiliary Observational Data. In Proceedings of the 17th International Conference on Educational Data Mining.

Mann, C. Z., Wang, J., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). Using Publicly Available Auxiliary Data to Improve Precision of Treatment Effect Estimation in a Randomized Efficacy Trial. In Proceedings of the 17th International Conference on Educational Data Mining.

Pei, Y., Sales, A., & Gagnon-Bartsch, J. (2024). Boosting Precision in Educational A/B Tests Using Auxiliary Information and Design-Based Estimators. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 990-993). 

Journal Articles In Press:

Mann, C. Z., Sales, A. C., and Gagnon-Bartsch, J. A. (2025). Combining observational and experimental data for causal inference considering data privacy. To appear in Journal of Causal Inference. Draft available at https://arxiv.org/abs/2308.02974 

Related projects

Direct Adjustment in Combination With Robust or Nonlinear Regression: Software and Methods for RDDs, RCTs and Matched Observational Studies

R305D210029

Questions about this project?

To ask additional questions about this project or provide feedback, please contact the program officer.

 

Tags

Academic Achievement, Data and Assessments
