Technical Methods Report: What to Do When Data Are Missing in Group Randomized Controlled Trials

NCEE 2009-0049
October 2009

Appendix D: Full Set of Simulation Results

This appendix presents all tables of estimates produced in conducting the simulations reported in Chapter 4 and described in more detail in Appendix C. For these simulations, we generated 1,000 data sets, each with its own pattern of missing data. By letting missing data occur at random (within defined probabilities) many times and then averaging the statistical results across the 1,000 data sets, we ensure that the simulation findings, and the conclusions drawn from them about the performance of the different missing data methods examined, do not hinge on any single random pattern of missingness. The replications also give us distributions of the impact estimates and their standard errors that reflect the sampling variability built into the data (and present in real data).

As described in Appendix C, the simulations use different scenarios, defined by (a) the nature of the missing data mechanism; (b) the missing data rate (5 percent or 40 percent); and (c) whether data are missing for students within schools or for entire schools. The appendix therefore contains 12 tables.

Each table consists of two panels:

  1. Panel A, which shows the simulation results for situations where the pretest is missing for a fraction of the sample.
  2. Panel B, which shows the simulation results for situations where the post-test is missing for a fraction of the sample.

The goal of these simulations was to estimate the bias in the impact estimates and standard errors that results from using different approaches to addressing missing data. Since bias is defined as the difference between the expected value of an estimator and the true value of the parameter it estimates, we estimated the bias in the two key estimates as follows:

  • Impact estimate. For the impact estimate, we estimated the bias by subtracting the true impact of 0.20 from the average of the impact estimates across the 1,000 samples.
  • Standard error. For the estimate of the standard error of the impact estimate, we estimated the bias by subtracting an empirical estimate of the true standard error (the standard deviation of the 1,000 impact estimates) from the average of the standard error estimates across the 1,000 samples.
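The two bias calculations above can be sketched in a few lines. This is a minimal illustration, not the code used for the report: the function name `estimate_bias` is hypothetical, and it assumes the sample standard deviation (divisor n - 1) is used as the benchmark for the true standard error; with 1,000 replications, the population version (`statistics.pstdev`, divisor n) would differ only by a factor of about 1.0005.

```python
import statistics


def estimate_bias(impact_estimates, se_estimates, true_impact=0.20):
    """Hypothetical helper: simulation-based bias of an impact estimate and its SE.

    impact_estimates: the impact estimate from each simulated data set
    se_estimates: the estimated standard error from each simulated data set
    true_impact: the impact built into the simulated data (0.20 here)
    """
    if len(impact_estimates) != len(se_estimates) or len(impact_estimates) < 2:
        raise ValueError("need two equal-length lists with at least 2 replications")
    # Bias of the impact estimate: average estimate minus the true impact.
    impact_bias = statistics.fmean(impact_estimates) - true_impact
    # Benchmark for the true standard error: the spread of the impact
    # estimates across replications (sample standard deviation, n - 1 divisor).
    empirical_se = statistics.stdev(impact_estimates)
    # Bias of the standard error: average estimated SE minus that benchmark.
    se_bias = statistics.fmean(se_estimates) - empirical_se
    return impact_bias, se_bias
```

Applied to the 1,000 impact estimates and the 1,000 standard error estimates from one scenario, the function returns the two quantities defined in the bullets above, in that order.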

Note that each table begins by displaying the estimates from simulations in which none of the data were missing. These estimates do not match the true parameter values exactly because of sampling error. For example, the impact estimate with no missing data equals 0.203, which differs slightly from the true impact of 0.200. When none of the data are missing, the impact estimates and standard error estimates are unbiased, so any non-zero bias estimates in this row are entirely due to sampling error.
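A small Monte Carlo experiment illustrates why the no-missing-data row can still show a non-zero bias. The sketch below is a simplified, hypothetical stand-in for the design in Appendix C, not a reproduction of it: it assumes 40 schools of 20 students each, an intraclass correlation of 0.15, outcomes with total variance 1, schools assigned to treatment alternately (equivalent to random assignment here, because school effects are drawn independently), and the difference in treatment and control school means as the impact estimator. It also reports the Monte Carlo standard error of the impact-bias estimate (the standard deviation of the impact estimates divided by the square root of the number of replications), which gives a yardstick for how far from zero that estimate can drift through sampling error alone.

```python
import math
import random
import statistics


def simulate_trial(rng, n_schools=40, n_students=20, true_impact=0.20, icc=0.15):
    """One hypothetical group-randomized trial with no missing data."""
    school_means = {0: [], 1: []}
    for j in range(n_schools):
        treated = j % 2  # alternate assignment: half the schools are treated
        school_effect = rng.gauss(0.0, math.sqrt(icc))
        outcomes = [
            true_impact * treated + school_effect + rng.gauss(0.0, math.sqrt(1 - icc))
            for _ in range(n_students)
        ]
        school_means[treated].append(statistics.fmean(outcomes))
    t, c = school_means[1], school_means[0]
    # Impact estimate: difference in mean school means (treatment minus control).
    impact = statistics.fmean(t) - statistics.fmean(c)
    # Standard error of that difference, computed from school-level variances.
    se = math.sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
    return impact, se


def run_simulation(n_reps=1000, seed=2009, true_impact=0.20):
    """Replicate the trial and estimate the bias of the impact and its SE."""
    rng = random.Random(seed)
    results = [simulate_trial(rng, true_impact=true_impact) for _ in range(n_reps)]
    impacts = [impact for impact, _ in results]
    ses = [se for _, se in results]
    impact_bias = statistics.fmean(impacts) - true_impact
    se_bias = statistics.fmean(ses) - statistics.stdev(impacts)
    # Monte Carlo standard error of the impact-bias estimate.
    mc_se = statistics.stdev(impacts) / math.sqrt(n_reps)
    return impact_bias, se_bias, mc_se
```

Because no data are missing, the estimator is unbiased by construction, so the impact bias returned by `run_simulation()` should typically fall within a few Monte Carlo standard errors of zero without being exactly zero.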
