Project Activities
People and institutions involved
IES program contact(s)
Products and publications
Journal article, monograph, or newsletter
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... and Riddell, A. (2017). Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1).
Chen, Q., Gelman, A., Tracy, M., Norris, F. H., and Galea, S. (2015). Incorporating the Sampling Design in Weighting Adjustments for Panel Attrition. Statistics in Medicine, 34(28): 3637-3647.
Gelman, A. (2010). Bayesian Statistics Then and Now. Statistical Science, 25(2): 162-165.
Gelman, A. (2011). Induction and Deduction in Bayesian Data Analysis. Markets and Morals, 2: 67-78.
Gelman, A., and Shalizi, C. (2013). Philosophy and the Practice of Bayesian Statistics. British Journal of Mathematical and Statistical Psychology, 66(1): 8-38.
Gelman, A., and Unwin, A. (2013). Infovis and Statistical Graphics: Different Goals, Different Looks. Journal of Computational and Graphical Statistics, 22(1): 2-28.
Gelman, A., and Unwin, A. (2013). Tradeoffs in Information Graphics. Journal of Computational and Graphical Statistics, 22(1): 45-49.
Kropko, J., Goodrich, B., Gelman, A., and Hill, J. (2014). Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches. Political Analysis, 22(4): 497-519.
Lock, K., and Gelman, A. (2010). Bayesian Combination of State Polls and Election Forecasts. Political Analysis, 18(3): 337-348.
Supplemental information
Co-Principal Investigator: Hill, Jennifer
The project developed, extended, and tested strategies for multiple imputation of missing data. The project's goals were: (1) investigating the properties of imputation models and algorithms; (2) developing diagnostics that reveal problems with imputations in real time; (3) developing models and algorithms that are more likely to create appropriate imputations; (4) creating software in both R and Stata that is reliable and usable by non-statisticians yet can accommodate the needs of more sophisticated modelers; and (5) testing the diagnostics, models, and algorithms in applied research. An important part of testing the developed software is comparing the performance of multiple imputation with that of simpler missing data strategies; these comparisons are designed to help identify when multiple imputation is worth using.
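As a rough illustration of the imputation strategy being studied (this is a minimal sketch, not the project's actual R or Stata software), multiple imputation by chained equations can be written in a few lines: each variable with missing values is regressed on the others in turn, and missing entries are refilled from the fitted regression plus residual noise, so repeated runs yield multiple distinct completed datasets. The function names here (`chained_imputation`, `multiply_impute`) are hypothetical.

```python
import numpy as np

def chained_imputation(X, n_iter=10, rng=None):
    """Minimal sketch of multiple imputation by chained equations.

    Each column with missing values is regressed on the other columns;
    its missing entries are refilled from the fitted regression plus
    residual noise, so repeated calls give distinct imputations.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    # Initialize missing entries with column means.
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            sigma = resid.std(ddof=1) if obs.sum() > 2 else 0.0
            noise = rng.normal(0.0, sigma, miss[:, j].sum())
            X[miss[:, j], j] = A[miss[:, j]] @ beta + noise
    return X

def multiply_impute(X, m=5, seed=0):
    """Return m completed datasets for downstream combining (Rubin's rules)."""
    return [chained_imputation(X, rng=seed + k) for k in range(m)]
```

Running `multiply_impute` with `m=5` gives five completed datasets; analyses are run on each and the results combined, which is the workflow the simpler missing-data strategies are compared against.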
The research regarding goal 2 will improve or develop graphical and numerical diagnostics that can be used to identify problems with parametric assumptions, flag likely violations of structural assumptions, monitor convergence of the fitting algorithm, and determine situations in which the implicit model from the chained regressions is not close to any joint distribution.
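One simple numerical diagnostic of the kind described (a sketch under assumed names, not the project's actual diagnostics) compares the marginal distribution of imputed values against observed values for each variable; a large standardized gap between the two means can flag a likely violation of the imputation model's parametric assumptions.

```python
import numpy as np

def imputation_diagnostic(X_incomplete, X_completed):
    """Compare observed vs. imputed marginal summaries per variable.

    A large standardized gap between observed and imputed means can flag
    problems with the imputation model's parametric assumptions; this is
    a numerical stand-in for the usual overlaid-histogram plots.
    """
    X_incomplete = np.asarray(X_incomplete, dtype=float)
    X_completed = np.asarray(X_completed, dtype=float)
    report = {}
    for j in range(X_incomplete.shape[1]):
        miss = np.isnan(X_incomplete[:, j])
        if not miss.any():
            continue  # nothing was imputed in this column
        obs = X_incomplete[~miss, j]
        imp = X_completed[miss, j]
        gap = (imp.mean() - obs.mean()) / (obs.std(ddof=1) + 1e-12)
        report[j] = {"n_missing": int(miss.sum()),
                     "standardized_mean_gap": float(gap)}
    return report
```

A gap far from zero does not prove the imputations are wrong (imputed and observed distributions can legitimately differ when data are not missing completely at random), but it is a cheap, real-time prompt to inspect that variable more closely.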
The software development work for goal 4 will incorporate the results from the work done for goals 1–3 while also providing an accessible user interface. This interface will help researchers identify potential problems at the outset (e.g., perfect correlation among predictors) and choose an appropriate model, accommodating complications such as interactions and transformations. The software will also implement other missing data strategies (including a range of multiple imputation models and algorithms) to allow for comparisons, and will be available both in the open-source statistical environment R and as a stand-alone, platform-independent package.
The work done for goal 5 will engage education researchers to test and apply the software to multiple datasets with varied study designs and missing data patterns. Additionally, efforts will be made to establish a catalog of scenarios and examples where current missing data imputation algorithms fail.
Questions about this project?
To answer additional questions about this project or provide feedback, please contact the program officer.