Project Activities
The research team developed statistical methods and data science tools that combine data from randomized controlled trials (RCTs) in education research with auxiliary data drawn from large administrative databases, that is, covariate and outcome data on students or schools that did not participate in the RCT. The precision gained from these auxiliary data allowed the research team to increase the effective sample size of trials, which raises statistical power and allows trial costs to be reduced, resources to be used more efficiently, or both. The added precision also improved subgroup analyses and estimates of treatment effect variability when prior studies were reanalyzed, broadening the generalizability of their results. The team specifically addressed designs and data structures common in education RCTs, including paired-cluster randomized trials and longitudinal measurements, as well as the practical challenge of sharing data under privacy constraints. The team also produced software implementing these methods for third-party education researchers.
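A minimal sketch of why auxiliary data can add precision without introducing bias, assuming Bernoulli randomization with a known treatment probability `p` and a prediction `yhat` for each participant that comes from a model fit only to non-participants (so it cannot depend on who was treated). Subtracting that prediction before an inverse-probability contrast leaves the estimator unbiased, and its variance shrinks when the predictions track outcomes well. This is a simplified illustration, not the estimator implemented in the project's software; the function names and the simulated data are hypothetical.

```python
import random
import statistics


def ipw_contrast(y, treated, offset, p):
    """Design-based estimate of the average treatment effect.

    Each unit contributes (y_i - offset_i) * u_i, where u_i = 1/p for treated
    units and -1/(1 - p) for controls. As long as offset_i is fixed before
    treatment is assigned, the estimate is unbiased under Bernoulli(p)
    randomization, whatever the offsets are.
    """
    total = 0.0
    for yi, ti, oi in zip(y, treated, offset):
        u = 1.0 / p if ti else -1.0 / (1.0 - p)
        total += (yi - oi) * u
    return total / len(y)


def simulate(n=200, p=0.5, effect=0.3, reps=2000, seed=1):
    """Re-randomize one fixed (hypothetical) population many times."""
    rng = random.Random(seed)
    # Hypothetical covariate x; yhat stands in for predictions from a model
    # trained on auxiliary (non-participant) data, so it ignores treatment.
    x = [rng.gauss(0, 1) for _ in range(n)]
    yhat = [2.0 * xi for xi in x]
    control = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]  # potential outcomes
    treat = [ci + effect for ci in control]

    plain, adjusted = [], []
    for _ in range(reps):
        t = [rng.random() < p for _ in range(n)]
        y = [tr if ti else co for tr, co, ti in zip(treat, control, t)]
        plain.append(ipw_contrast(y, t, [0.0] * n, p))   # no auxiliary data
        adjusted.append(ipw_contrast(y, t, yhat, p))     # subtract predictions
    return plain, adjusted


if __name__ == "__main__":
    plain, adjusted = simulate()
    print("unadjusted: mean %.3f, sd %.3f"
          % (statistics.mean(plain), statistics.stdev(plain)))
    print("adjusted:   mean %.3f, sd %.3f"
          % (statistics.mean(adjusted), statistics.stdev(adjusted)))
```

Both estimators average close to the true effect of 0.3 across re-randomizations, but the adjusted one has a much smaller standard deviation, which is the sense in which auxiliary data raise the effective sample size.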
Key outcomes
The main products of this research were peer-reviewed articles and conference papers detailing the novel statistical methods, as well as flexible, user-friendly, open-source software intended for use by applied education researchers.
Products and publications
The software developed in this project is flexible, user-friendly, open source, and designed to be readily usable by applied education researchers.
Software:
- https://github.com/manncz/dRCT – R package that implements the core statistical methodology developed for the project.
- https://github.com/Bakri-1/loop_shiny_app – Shiny app for users of ASSISTments
- https://github.com/jaylinlowe/dRCTpower – Shiny app for power calculations
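As a rough illustration of the kind of calculation a power tool for trials with auxiliary data involves, the sketch below uses a normal approximation for a two-arm difference in means and assumes that predictions from auxiliary data explain a share `r2` of outcome variance, so the residual variance is `sigma**2 * (1 - r2)`. This is not the method used by the app listed above; the formula choice and function names are assumptions made for illustration.

```python
from statistics import NormalDist


def approx_power(effect, sigma, n_treat, n_control, r2=0.0, alpha=0.05):
    """Approximate two-sided power for a difference in means.

    r2 is the (assumed) share of outcome variance explained by predictions
    from auxiliary data; adjustment shrinks the residual variance to
    sigma**2 * (1 - r2). The far rejection tail is ignored.
    """
    se = (sigma**2 * (1 - r2) * (1 / n_treat + 1 / n_control)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_crit)


def effective_sample_size(n, r2):
    """Sample size an unadjusted trial would need to match the adjusted SE."""
    return n / (1 - r2)


if __name__ == "__main__":
    print(approx_power(0.2, 1.0, 200, 200))          # no auxiliary data
    print(approx_power(0.2, 1.0, 200, 200, r2=0.5))  # predictions explain half
    print(effective_sample_size(400, 0.5))
```

With 200 students per arm and an effect of 0.2 standard deviations, power is about 0.52 without adjustment and about 0.81 when the auxiliary predictions explain half the outcome variance; under these assumptions the 400-student trial behaves like an unadjusted trial of 800.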
Project website:
Publications:
Published Journal Articles:
Gagnon-Bartsch*, J. A., Sales*, A. C., Wu, E., Botelho, A. F., Erickson, J. A., Miratrix, L. W., & Heffernan, N. T. (2023). Precise unbiased estimation in randomized experiments using auxiliary observational data. Journal of Causal Inference, 11(1), 20220011. https://doi.org/10.1515/jci-2022-0011
Sales, A. C., Prihar, E. B., Gagnon-Bartsch, J. A., & Heffernan, N. T. (2023). Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results. Journal of Educational Data Mining, 15(2), 53–85. https://doi.org/10.5281/zenodo.8016854
Published Conference Papers:
Sales, A. C., Prihar, E., Gagnon-Bartsch, J., Gurung, A., & Heffernan, N. T. (2022). More Powerful A/B Testing Using Auxiliary Data and Deep Learning. In M. M. Rodrigo, N. Matsuda, A. I. Cristea, & V. Dimitrova (Eds.), Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium (AIED 2022), Lecture Notes in Computer Science, vol. 13356. Springer, Cham. https://doi.org/10.1007/978-3-031-11647-6_107
Pham, D. M., Vanacore, K. P., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). LOOL: Towards Personalization with Flexible & Robust Estimation of Heterogeneous Treatment Effects. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 376–384).
Lowe, J., Mann, C. Z., Wang, J., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). Power Calculations for Randomized Controlled Trials with Auxiliary Observational Data. In Proceedings of the 17th International Conference on Educational Data Mining.
Mann, C. Z., Wang, J., Sales, A. C., & Gagnon-Bartsch, J. A. (2024). Using Publicly Available Auxiliary Data to Improve Precision of Treatment Effect Estimation in a Randomized Efficacy Trial. In Proceedings of the 17th International Conference on Educational Data Mining.
Pei, Y., Sales, A., & Gagnon-Bartsch, J. (2024). Boosting Precision in Educational A/B Tests Using Auxiliary Information and Design-Based Estimators. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 990–993).
Journal Articles In Press:
Mann, C. Z., Sales, A. C., & Gagnon-Bartsch, J. A. (2025). Combining observational and experimental data for causal inference considering data privacy. Journal of Causal Inference (to appear). Preprint available at https://arxiv.org/abs/2308.02974