Project Activities
The research related to goals 1 and 3 will lead to better missing data models and algorithms. This work focused on four objectives. First, the researchers explored the efficacy of imputation algorithms relative to simpler strategies, particularly in the context of randomized experiments. The second objective was to identify the conditions under which chained imputation algorithms can fail and the modeling choices that avoid those conditions. The third was to examine the properties of competing models that accommodate a wider variety of data structures (e.g., time series and multilevel data) and non-ignorable missing data mechanisms, and then to implement the most useful of these models. The final objective was to increase computational efficiency when implementing the most useful models.
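As a rough illustration of what a chained imputation algorithm does, the sketch below cycles between two variables, regressing each on the other and overwriting its missing entries with the fitted values. It is a minimal, deterministic sketch under stated assumptions, not the project's software: the names `fit_line` and `chained_impute` are hypothetical, only two variables are handled, and no random draws are added, so it yields a single completed dataset rather than proper multiple imputations.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    Assumes xs is not constant (otherwise the slope is undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Impute missing cells in a list of [x, y] pairs (None marks missing).

    Hypothetical helper: illustrates the chained-regression idea only.
    """
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]

    # Start from the simple strategy: fill each gap with the column mean.
    for j in (0, 1):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i, row in enumerate(data):
            if missing[i][j]:
                row[j] = mean_j

    # Cycle through the columns, regressing each on the other.
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j
            # Fit only on rows where column j was actually observed.
            fit_rows = [r for i, r in enumerate(data) if not missing[i][j]]
            if len(fit_rows) == len(data):
                continue  # nothing to impute in this column
            a, b = fit_line([r[k] for r in fit_rows], [r[j] for r in fit_rows])
            for i, row in enumerate(data):
                if missing[i][j]:
                    row[j] = a + b * row[k]
    return data
```

With more than two variables, each pass would regress a column on all of the others, and a full implementation would draw imputations from a predictive distribution instead of plugging in fitted values.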
The research regarding goal 2 will improve or develop graphical and numerical diagnostics that can be used to identify problems with parametric assumptions, flag likely violations of structural assumptions, monitor convergence of the fitting algorithm, and determine situations in which the implicit model from the chained regressions is not close to any joint distribution.
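One widely used numerical check for convergence of an iterative fitting algorithm is the potential scale reduction factor, which compares variation between independent chains with variation within them. The source does not say which diagnostics the project adopts, so the sketch below is an illustrative assumption; the function name `potential_scale_reduction` is hypothetical, and this is the basic (non-split) form of the statistic.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic (non-split) potential scale reduction factor for one scalar.

    chains: list of equal-length lists, one per independent run.
    Values well above 1 suggest the runs have not yet mixed.
    Assumes the within-chain variance is not zero.
    """
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    within = mean(variance(c) for c in chains)          # W
    between = n * variance([mean(c) for c in chains])   # B
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return sqrt(pooled / within)
```

In an imputation setting, the monitored scalar could be, for example, the mean of the imputed values for one variable, recorded at each iteration of each chain.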
The software development work for goal 4 will incorporate the results from goals 1–3 and provide an accessible user interface. The interface will help researchers identify potential problems at the outset (e.g., perfect correlation among predictors) and choose an appropriate model while accommodating complications such as interactions and transformations. It will also let researchers implement other missing data strategies (including a range of multiple imputation models and algorithms) for comparison. The software will be made available both in the open-source statistical environment R and as a stand-alone, platform-independent package.
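A check for perfect correlation among predictors, of the kind mentioned above, could look roughly like the following. This is a minimal sketch under stated assumptions and not the project's interface: the names `pearson` and `perfectly_correlated_pairs` and the tolerance `tol` are hypothetical, and the check is pairwise only, so it would miss a predictor that is an exact linear combination of several others.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def perfectly_correlated_pairs(columns, tol=1e-8):
    """Return pairs of column names whose |correlation| is within tol of 1.

    columns: dict mapping a predictor name to its list of complete values.
    """
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(abs(r) - 1.0) < tol:  # NaN compares False, so it is skipped
            flagged.append((name_a, name_b))
    return flagged
```

For instance, a dataset that records the same height in both centimeters and inches would have that pair flagged before any imputation model is fit.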
The work done for goal 5 will engage education researchers to test and apply the software to multiple datasets with varied study designs and missing data patterns. Additionally, efforts will be made to establish a catalog of scenarios and examples where current missing data imputation algorithms fail.
Products and publications
Journal article, monograph, or newsletter
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... and Riddell, A. (2017). Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1).
Chen, Q., Gelman, A., Tracy, M., Norris, F. H., and Galea, S. (2015). Incorporating the Sampling Design in Weighting Adjustments for Panel Attrition. Statistics in Medicine, 34(28): 3637-3647.
Gelman, A. (2010). Bayesian Statistics Then and Now. Statistical Science, 25(2): 162-165.
Gelman, A. (2011). Induction and Deduction in Bayesian Data Analysis. Rationality, Markets and Morals, 2: 67-78.
Gelman, A., and Shalizi, C. (2013). Philosophy and the Practice of Bayesian Statistics. British Journal of Mathematical and Statistical Psychology, 66(1): 8-38.
Gelman, A., and Unwin, A. (2013). Infovis and Statistical Graphics: Different Goals, Different Looks. Journal of Computational and Graphical Statistics, 22(1): 2-28.
Gelman, A., and Unwin, A. (2013). Tradeoffs in Information Graphics. Journal of Computational and Graphical Statistics, 22(1): 45-49.
Kropko, J., Goodrich, B., Gelman, A., and Hill, J. (2014). Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches. Political Analysis, 22(4): 497-519.
Lock, K., and Gelman, A. (2010). Bayesian Combination of State Polls and Election Forecasts. Political Analysis, 18(3): 337-348.