Technical Methods Report: What to Do When Data Are Missing in Group Randomized Controlled Trials - 3. Selected Techniques for Addressing Missing Data in RCT Impact Analysis

NCEE 2009-0049
October 2009

3. Selected Techniques for Addressing Missing Data in RCT Impact Analysis

This chapter describes a selected set of techniques that are available to educational researchers to deal with the problem of missing data in group randomized trials.²⁵ As discussed in Chapter 1, the methods were selected based on a review of several recent articles by experts in the field (Graham, 2009; Schafer & Graham, 2002; Allison, 2002; Peugh & Enders, 2004)²⁶ as well as a review of the techniques that have been used in RCTs recently sponsored by the U.S. Department of Education.²⁷

As shown in the chart below, some of the methods discussed in this chapter can only be used to address missing data for the dependent or outcome "Y" variable (e.g., student post-test scores), others are only applicable for missing data on the independent "X" variables (e.g., student demographics and pretest score), while some can be used to address missing data problems for both types of variables.

Methods Discussed	Can be used for missing data in X Variables	Can be used for missing data in Y Variables
Imputation Methods	√	√
Maximum Likelihood Estimation	√	√
Dummy Variable Adjustment	√	
Weighting Methods²⁸		√
"Fully-Specified" Regression Models		√
Selection Modeling		√
Pattern Mixture Modeling		√

The discussion of these different methods is organized into two parts. The first deals with what we refer to as "standard" missing data methods that are in common use, particularly when one can assume that missing data are MAR:²⁹ imputation methods, maximum likelihood estimation, dummy variable adjustment, weighting methods, and fully-specified regression models. The second focuses on two methods that have been developed to address situations where the missing data can be considered to be NMAR:³⁰ selection modeling and pattern-mixture modeling. In this second part, we also discuss sensitivity testing, which can be used to enhance the reporting of RCT findings under either missing data circumstance.
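
As a reading aid, the chart and the two-part organization described above can be encoded as a small lookup table. The following Python sketch is illustrative only: the names `Method`, `METHODS`, and `methods_for`, and the group labels "standard" and "nmar", are hypothetical and do not come from the report. The entries simply transcribe the chart, so weighting methods are marked for Y variables only, as in the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One row of the chart, plus the part of the chapter that covers it."""
    name: str
    handles_x: bool   # chart marks it for missing X (independent) variables
    handles_y: bool   # chart marks it for missing Y (outcome) variables
    group: str        # "standard" (first part) or "nmar" (second part)


# Transcribed from the chart; the labels in `group` are assumptions of this sketch.
METHODS = [
    Method("Imputation Methods", True, True, "standard"),
    Method("Maximum Likelihood Estimation", True, True, "standard"),
    Method("Dummy Variable Adjustment", True, False, "standard"),
    Method("Weighting Methods", False, True, "standard"),
    Method('"Fully-Specified" Regression Models', False, True, "standard"),
    Method("Selection Modeling", False, True, "nmar"),
    Method("Pattern Mixture Modeling", False, True, "nmar"),
]


def methods_for(variable, group=None):
    """Return names of methods the chart marks for 'X' or 'Y' variables.

    If `group` is given ("standard" or "nmar"), restrict to that part
    of the chapter.
    """
    if variable not in ("X", "Y"):
        raise ValueError("variable must be 'X' or 'Y'")
    selected = []
    for m in METHODS:
        applies = m.handles_x if variable == "X" else m.handles_y
        if applies and (group is None or m.group == group):
            selected.append(m.name)
    return selected


if __name__ == "__main__":
    print("Missing X:", methods_for("X"))
    print("Missing Y (NMAR-oriented):", methods_for("Y", group="nmar"))
```

Calling `methods_for("X")` lists the three methods the chart marks for independent variables, and passing `group="nmar"` narrows the list to the two modeling approaches covered in the second part.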

²⁵ General issues in conducting group randomized trials, independent of missing data, are covered in many excellent publications. See for example Klar & Donner (2001) from the medical literature, and Bloom (2005) concerning social policy experiments. For a thorough discussion of missing data issues and analysis options in studies that randomize individuals rather than groups see Carpenter & Kenward (2007).
²⁶ As discussed elsewhere, we intentionally include methods that are commonly criticized in the literature—particularly listwise deletion and simple mean value imputation—for two reasons: the use of these methods is widespread in education (see Peugh & Enders, 2004); and because we are focusing on RCTs as conducted in education, we want to understand how different missing data strategies performed within this unique context, including methods that may have shortcomings in more general applications.
²⁷ For example, case deletion is commonly used to address missing outcomes (e.g., Bernstein, et al., 2009; Corrin, et al., 2009; Garet, et al., 2008), but studies sometimes use multiple imputation (e.g., Campuzano, et al., 2009) or re-weighting (e.g., Wolf, et al., 2009) to address the missing data problem.
²⁸ Weighting methods could in theory be used to address both missing X variables and missing Y variables. However, in our experience, RCTs in education never use weighting methods to address missing X variables. In our opinion, this may be because researchers are reluctant to drop observations with missing values of the X variables and re-weight the observed sample members.
²⁹ These methods in principle can also produce unbiased impact estimates in the NMAR situation with the specification of appropriate models for the missing data mechanism. But, as discussed later, knowing what model is appropriate is the difficulty.
³⁰ While these two methods constitute ways to model missing data, they do not of themselves provide a means of estimating impacts. For that purpose, they have to be combined with other estimation techniques such as maximum likelihood. It is also worth noting that while they were developed to meet the challenges of NMAR data, these models can also be applied to the MAR situation.