Grant Closed

Practical Solutions for Missing Data and Imputation

NCER
Program: Statistical and Research Methodology in Education
Program topic(s): Core
Award amount: $904,972
Principal investigator: Andrew Gelman
Awardee: Columbia University
Year: 2009
Project type: Methodological Innovation
Award number: R305D090006

Purpose

Missing data are ubiquitous in education research studies. The literature discusses the shortcomings of simple missing data approaches such as complete case analysis and the inclusion of indicators for missing data; however, the use of these practices remains widespread. Multiple imputation is an increasingly common approach to handling missing data, but there are outstanding research questions regarding the most reliable methods for implementing it and when it is worthwhile to invest in this technique. In addition, researchers may have a legitimate reluctance to use an algorithm whose steps and outcomes they do not completely understand.
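
As a concrete point of reference, the sketch below applies the three approaches named above (complete case analysis, a missing-data indicator, and multiple imputation) to a small simulated data set. It uses the openly available R package mice purely for illustration; it is not the software developed under this award, and the data and variable names are hypothetical.

```r
# Illustrative comparison of three ways to handle a missing covariate.
# The CRAN package 'mice' is used only as an example of multiple imputation.
library(mice)

set.seed(1)
n <- 500
x <- rnorm(n)                       # covariate, partially observed below
y <- 2 + 0.5 * x + rnorm(n)         # outcome, fully observed
x[runif(n) < 0.3] <- NA             # make roughly 30% of x missing
dat <- data.frame(y = y, x = x)

# 1. Complete case analysis: rows with missing x are silently dropped.
fit_cc <- lm(y ~ x, data = dat)

# 2. Missing-data indicator: fill in a constant and add a missingness dummy.
dat_ind <- transform(dat,
                     x_miss = as.integer(is.na(x)),
                     x      = ifelse(is.na(x), 0, x))
fit_ind <- lm(y ~ x + x_miss, data = dat_ind)

# 3. Multiple imputation: create m completed data sets, analyze each, pool.
imp    <- mice(dat, m = 5, printFlag = FALSE, seed = 1)
fit_mi <- pool(with(imp, lm(y ~ x)))

summary(fit_cc)$coefficients
summary(fit_ind)$coefficients
summary(fit_mi)
```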

Project Activities

The research related to goals 1 and 3 (see the project goals listed under Supplemental information) led to better missing data models and algorithms. This work focused on four objectives. First, the researchers explored the relative efficacy of imputation algorithms compared with simpler strategies, particularly in the context of randomized experiments. The second objective was to identify the conditions under which chained imputation algorithms can fail and to determine modeling choices that do not violate these conditions. The third was to examine the properties of competing models that accommodate a wider variety of data structures (e.g., time series and multilevel data) and non-ignorable missing data mechanisms, and then to implement the most useful of these models. The final objective was to increase computational efficiency when implementing these models.
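
The sketch below shows what a chained (fully conditional specification) imputation looks like in practice, with a separate conditional model for each incomplete variable. The mice package and the simulated variables are illustrative assumptions, not the project's own algorithms or data.

```r
# Chained-equations imputation with mixed variable types: each incomplete
# variable is imputed in turn from its own conditional model, given current
# values of the other variables. 'mice' is used only as an illustration.
library(mice)

set.seed(2)
n     <- 400
age   <- rnorm(n, 40, 10)
treat <- factor(rbinom(n, 1, 0.5), labels = c("control", "treatment"))
score <- 50 + 0.3 * age + 5 * (treat == "treatment") + rnorm(n, 0, 5)
dat   <- data.frame(age = age, treat = treat, score = score)

# Introduce missingness in all three variables.
dat$age[runif(n)   < 0.15] <- NA
dat$treat[runif(n) < 0.10] <- NA
dat$score[runif(n) < 0.20] <- NA

# Inspect the missing-data pattern before modeling.
md.pattern(dat)

# Explicit conditional model for each variable: predictive mean matching for
# the continuous variables, logistic regression for the binary factor.
meth <- make.method(dat)
meth[c("age", "score")] <- "pmm"
meth["treat"]           <- "logreg"

imp <- mice(dat, m = 10, method = meth, maxit = 20,
            printFlag = FALSE, seed = 2)

# Pool the treatment-effect estimate across the completed data sets.
summary(pool(with(imp, lm(score ~ age + treat))))
```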

People and institutions involved

IES program contact(s)

Allen Ruby

Associate Commissioner for Policy and Systems
NCER

Products and publications

Journal article, monograph, or newsletter

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., ... and Riddell, A. (2017). Stan: A Probabilistic Programming Language. Journal of Statistical Software, 76(1).

Chen, Q., Gelman, A., Tracy, M., Norris, F. H., and Galea, S. (2015). Incorporating the Sampling Design in Weighting Adjustments for Panel Attrition. Statistics in Medicine, 34(28), 3637-3647.

Gelman, A. (2010). Bayesian Statistics Then and Now. Statistical Science, 25(2), 162-165.

Gelman, A. (2011). Induction and Deduction in Bayesian Data Analysis. Markets and Morals, 2, 67-78.

Gelman, A., and Shalizi, C. (2013). Philosophy and the Practice of Bayesian Statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8-38.

Gelman, A., and Unwin, A. (2013). Infovis and Statistical Graphics: Different Goals, Different Looks. Journal of Computational and Graphical Statistics, 22(1), 2-28.

Gelman, A., and Unwin, A. (2013). Tradeoffs in Information Graphics. Journal of Computational and Graphical Statistics, 22(1), 45-49.

Kropko, J., Goodrich, B., Gelman, A., and Hill, J. (2014). Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches. Political Analysis, 22(4), 497-519.

Lock, K., and Gelman, A. (2010). Bayesian Combination of State Polls and Election Forecasts. Political Analysis, 18(3), 337-348.

Supplemental information

Co-principal investigator: Jennifer Hill

The project developed, extended, and tested strategies for multiple imputation of missing data. The project's goals were to (1) investigate the properties of imputation models and algorithms; (2) develop diagnostics that reveal problems with imputations in real time; (3) develop models and algorithms that are more likely to create appropriate imputations; (4) create software in both R and Stata that is reliable and usable by non-statisticians yet can accommodate the needs of more sophisticated modelers; and (5) test the diagnostics, models, and algorithms in applied research. An important part of testing the developed software is comparing the performance of multiple imputation with that of simpler missing data strategies; these tests are designed to help identify when multiple imputation is worth using.
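
The small simulation below sketches the kind of comparison described above: when missingness depends on an observed covariate, a complete-case estimate of a mean is biased, while multiple imputation that uses the covariate roughly recovers the truth. The mice package and the simulated quantities are illustrative assumptions only, not the project's test suite.

```r
# Simulation sketch: complete-case mean vs. multiple-imputation mean
# when the probability of missingness depends on an observed covariate.
library(mice)

set.seed(3)
n <- 2000
x <- rnorm(n)                          # fully observed background variable
y <- 10 + 3 * x + rnorm(n)             # true population mean of y is 10

# Units with low x are more likely to be missing y (missing at random).
p_miss <- plogis(-1 - 1.5 * x)
y_obs  <- ifelse(runif(n) < p_miss, NA, y)
dat    <- data.frame(x = x, y = y_obs)

mean(dat$y, na.rm = TRUE)              # complete-case estimate: biased upward

imp <- mice(dat, m = 20, printFlag = FALSE, seed = 3)
est <- sapply(1:20, function(i) mean(complete(imp, i)$y))
mean(est)                              # pooled MI estimate: close to 10
```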

The research regarding goal 2 will improve or develop graphical and numerical diagnostics that can be used to identify problems with parametric assumptions, flag likely violations of structural assumptions, monitor convergence of the fitting algorithm, and determine situations in which the implicit model from the chained regressions is not close to any joint distribution.
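
As an illustration of the kinds of diagnostics described here, the sketch below shows convergence trace plots and observed-versus-imputed distribution plots as implemented in the mice package (using its small built-in nhanes example data). These are stand-ins for, not reproductions of, the diagnostics developed by the project.

```r
# Sketch of common imputation diagnostics, using 'mice' for illustration.
library(mice)

imp <- mice(nhanes, m = 5, maxit = 20, printFlag = FALSE, seed = 4)

# Convergence of the chained-equations algorithm: trace plots of the mean
# and standard deviation of imputed values across iterations should mix
# well and show no systematic trend.
plot(imp)

# Parametric assumptions: compare the distribution of imputed values (red)
# with the observed values (blue) for each incomplete variable.
densityplot(imp)
stripplot(imp, pch = 20)
```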

The software development work for goal 4 will incorporate the results from the work done for goals 1–3 while also including an accessible user interface. This user interface will help researchers identify potential problems at the outset (e.g., perfect correlation among predictors) and choose an appropriate model, accommodating complications such as interactions and transformations. The software will also provide the ability to implement other missing data strategies (including a range of multiple imputation models and algorithms) to allow for comparisons, and it will be available both in the open-source statistical environment R and as a stand-alone, platform-independent package.
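
A minimal sketch of one such pre-imputation check, flagging perfectly correlated predictor pairs before an imputation model is run, is given below. The function and variable names are hypothetical and do not come from the project's software.

```r
# Hypothetical pre-imputation check: flag pairs of numeric predictors whose
# pairwise correlation is (numerically) perfect before fitting any
# imputation model.
check_perfect_correlation <- function(data, tol = 1e-8) {
  num <- data[sapply(data, is.numeric)]
  r   <- cor(num, use = "pairwise.complete.obs")
  bad <- which((abs(abs(r) - 1) < tol) & upper.tri(r), arr.ind = TRUE)
  if (nrow(bad) == 0) return(invisible(NULL))
  data.frame(var1 = rownames(r)[bad[, 1]],
             var2 = colnames(r)[bad[, 2]],
             correlation = r[bad])
}

# Example: 'score_pct' is just a rescaled copy of 'score_raw', so that
# pair is flagged before imputation.
dat <- data.frame(score_raw = c(10, 20, NA, 40),
                  score_pct = c(10, 20, NA, 40) / 50,
                  age       = c(25, 31, 28, 40))
check_perfect_correlation(dat)
```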

The work done for goal 5 will engage education researchers to test and apply the software to multiple datasets with varied study designs and missing data patterns. Additionally, efforts will be made to establish a catalog of scenarios and examples where current missing data imputation algorithms fail.

Questions about this project?

For additional questions about this project or to provide feedback, please contact the program officer.

Tags

Data and Assessments; Mathematics
