Project Activities
Despite advances in educational data analysis, a ubiquitous problem is that explanatory variables and outcomes may be missing at any level of the data. Because methods for efficiently handling such missing data in multilevel data and hierarchical models were not widely available, the project sought to draw on the researchers’ recent advances in developing: (1) methods for efficient analysis of two-level data, (2) a generally applicable approach for three-level data, (3) software to estimate the model and impute missing data, and (4) an efficient method for three-level data in which outcomes and covariates at any level may be missing.
Products and publications
Book chapter
Shin, Y. (2013). Efficient Handling of Predictors and Outcomes Having Missing Values. In L. Rutkowski, M. von Davier, and D. Rutkowski (Eds.), A Handbook of International Large-Scale Assessment Data Analysis (pp. 451-479). Boca Raton, FL: CRC Press.
Journal article, monograph, or newsletter
Shin, Y. (2012). Do Black Children Benefit More From Small Classes? Multivariate Instrumental Variable Estimators With Ignorable Missing Data. Journal of Educational and Behavioral Statistics, 37(4): 543-574.
Shin, Y., and Raudenbush, S.W. (2010). A Latent Cluster-Mean Approach to the Contextual Effects Model With Missing Data. Journal of Educational and Behavioral Statistics, 35(1): 26-53.
Shin, Y., and Raudenbush, S.W. (2011). The Causal Effect of Class Size on Academic Performance: Multivariate Instrumental Variable Estimators With Tennessee Class Size Data Missing at Random. Journal of Educational and Behavioral Statistics, 36(2): 154-185.
Shin, Y., and Raudenbush, S.W. (2013). Efficient Analysis of "Q"-Level Nested Hierarchical General Linear Models Given Ignorable Missing Data. International Journal of Biostatistics, 9(1): 109-133.
Supplemental information
Co-Principal Investigator: Shin, Yongyun
In experimental research, the dominant design involves randomly assigning classrooms or schools to treatments. Therefore, the key explanatory variables are at the classroom or school level, while the outcome is measured at the individual level. In most cases, classrooms or schools are matched or blocked prior to randomization, so the design will often have two or more levels of variation. The longitudinal follow-up of students generates an additional level. Hierarchical models, also known as multilevel models, are appropriate for the analysis of such data. Similarly, educational surveys involve multi-stage samples. Because of student mobility across classrooms, schools, or school districts, the analysis may require a cross-classified hierarchical model.
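To make the data structure concrete, the following is a minimal sketch of a two-level design of the kind described above: treatment is assigned at the school level, the outcome is measured per student, and some outcomes are missing. Everything in it is an illustrative assumption rather than part of the project: the function names, the sample sizes, the effect size, and the choice to delete outcomes completely at random. The estimate at the end is a naive comparison of observed school means, not the efficient missing-data method or the software the project developed.

```python
import random
import statistics


def simulate_two_level(n_schools=40, n_students=25, effect=0.3,
                       school_sd=0.5, student_sd=1.0, miss_rate=0.2, seed=0):
    """Simulate students nested in schools, with treatment randomized by school.

    All parameter values are hypothetical. Returns a list of
    (school_id, treatment, outcome) rows; a missing outcome is None.
    """
    rng = random.Random(seed)
    # Randomize at the school level: half of the schools receive treatment.
    school_ids = list(range(n_schools))
    rng.shuffle(school_ids)
    treated_ids = set(school_ids[: n_schools // 2])

    rows = []
    for school in range(n_schools):
        treat = 1 if school in treated_ids else 0
        school_effect = rng.gauss(0, school_sd)  # level-2 (school) variation
        for _ in range(n_students):
            # Level-1 (student) outcome: treatment + school effect + noise.
            y = effect * treat + school_effect + rng.gauss(0, student_sd)
            if rng.random() < miss_rate:
                y = None  # assumption: missing completely at random
            rows.append((school, treat, y))
    return rows


def school_means(rows):
    """Map school_id -> (treatment, mean of that school's observed outcomes)."""
    by_school = {}
    for school, treat, y in rows:
        if y is not None:
            by_school.setdefault(school, (treat, []))[1].append(y)
    return {s: (t, statistics.mean(ys)) for s, (t, ys) in by_school.items()}


rows = simulate_two_level()
means = school_means(rows)
treated = [m for t, m in means.values() if t == 1]
control = [m for t, m in means.values() if t == 0]

# Naive observed-case estimate: difference in average school means.
estimate = statistics.mean(treated) - statistics.mean(control)
n_missing = sum(y is None for _, _, y in rows)
print(f"students: {len(rows)}, missing outcomes: {n_missing}")
print(f"difference in school means: {estimate:.3f}")
```

Because the treatment varies only between schools, the comparison is made on school-level means rather than on individual students. Missing values in covariates or at higher levels, which this sketch does not simulate, are where a simple summary like this breaks down.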
Specifically, the project (1) tested, validated, and disseminated free software for the case of two- or three-level continuous data with missing values at any level; (2) developed, tested, and refined new methods for cross-classified models and discrete outcomes; and (3) ran a series of workshops to train researchers to use these methods.