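To assess the performance of different missing data methods, we set objective standards and applied them in interpreting the results. This appendix describes these standards and how they were chosen. In summary, an impact estimate is classified as having “high bias” if the absolute value of its bias exceeds 0.05 standard deviations, and a standard error estimate is classified as having “high bias” if it is more than 20 percent smaller or more than 33 percent larger than the true standard error.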
A. Bias Standards for the Impact Estimates
To summarize the performance of the different methods, we identify those that yielded bias that would be considered “high” relative to a benchmark set by the What Works Clearinghouse (WWC). In developing its attrition standards, as well as its standards for baseline equivalence, the WWC decided that bias in the impact estimate of more than 0.05 standard deviations was unacceptably large (US ED, 2008, pp. 14, 30-31). Like most performance standards, this threshold is inherently arbitrary. However, because the WWC plays a large role in assessing the quality of impact studies in education, we adopted this threshold for assessing whether missing data methods yield bias that is large or small.
Specifically, in our simulations, we classified an impact estimate as having “high bias” if the absolute value of its bias was greater than 0.05 standard deviations. Because this threshold is based on the WWC's attrition standards, simulation results showing that a method yields bias below 0.05 standard deviations can be taken as evidence that the method produces estimates with a level of bias the WWC treats as acceptable.
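The classification rule for impact estimates can be sketched in Python as follows. This is a minimal illustration, assuming that bias is measured as the average impact estimate across simulation replicates minus the true impact; the function names and the sample estimates are hypothetical and are not taken from the simulations reported here.

```python
from statistics import mean

# WWC threshold, in standard deviation (effect size) units.
WWC_BIAS_THRESHOLD = 0.05


def impact_bias(estimates, true_impact):
    """Assumed definition: mean of the simulated impact estimates minus the true impact."""
    return mean(estimates) - true_impact


def classify_impact_bias(estimates, true_impact, threshold=WWC_BIAS_THRESHOLD):
    """Return 'high bias' if |bias| exceeds the threshold, otherwise 'low bias'."""
    bias = impact_bias(estimates, true_impact)
    return "high bias" if abs(bias) > threshold else "low bias"


# Hypothetical estimates from five replicates for one missing data method.
estimates = [0.21, 0.19, 0.22, 0.20, 0.18]
print(classify_impact_bias(estimates, true_impact=0.20))
```

The comparison uses a strict inequality (`>`) to match the rule that bias "greater than" 0.05 standard deviations counts as high.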
B. Bias Standards for the Estimated Standard Errors of the Impact Estimates
Missing data can lead to biased standard errors as well as biased impact estimates. In addition, impact and standard error estimates both contribute to the hypothesis test of whether an impact is statistically significant: the t-statistic equals the estimated impact divided by the estimated standard error. Therefore, for our simulations, we decided it would be useful to set standards for assessing the magnitude of the bias in the standard errors. In interpreting the results from the simulations, we apply these standards to assess whether a given missing data method produced a standard error for the impact estimate with “high bias” or “low bias.”
Building on the standard for impact estimates described in Section A, we classified a standard error estimate as having “high bias” if, when the impact estimate is unbiased, the standard error would bias the t-statistic more than a 0.05 standard deviation bias in the impact estimate (in either direction) would when the standard error is unbiased. In this way, we rely entirely on the WWC's attrition standard to determine whether to classify the bias in the impact estimate or the standard error as large or small.
To calculate the bias thresholds for the estimated standard errors, let SE equal the true standard error of the impact estimate, given the extent of missing data and the choice of method for addressing missing data. Then, for a true impact of 0.20 standard deviations, the t-statistic used to test the null hypothesis of zero impact is given in equation (1):

$$t = \frac{0.20}{SE} \tag{1}$$
How much bias in the t-statistic is introduced by a bias of 0.05 standard deviations in the impact estimate? If the bias is positive (that is, the impact estimate converges to 0.25 instead of the true impact of 0.20), equation (2) gives the resulting t-statistic. A positive bias of 0.05 standard deviations yields a t-statistic that is 25 percent larger than it would be with an unbiased impact estimate:

$$t_{+0.05} = \frac{0.25}{SE} = 1.25 \times \frac{0.20}{SE} \tag{2}$$
If the bias is negative (that is, the impact estimate converges to 0.15 instead of the true impact of 0.20), equation (3) gives the resulting t-statistic. A negative bias of 0.05 standard deviations yields a t-statistic that is 25 percent smaller than it would be with an unbiased impact estimate:

$$t_{-0.05} = \frac{0.15}{SE} = 0.75 \times \frac{0.20}{SE} \tag{3}$$
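The 25 percent figures in equations (2) and (3) can be checked numerically. The sketch below uses the true impact of 0.20 from the text and an arbitrary, hypothetical true standard error of 0.10; because SE appears in every denominator, the ratios do not depend on that choice.

```python
TRUE_IMPACT = 0.20  # true impact in standard deviation units, as in the text
BIAS = 0.05         # WWC bias threshold


def t_statistic(impact, se):
    """t-statistic for the null hypothesis of zero impact: impact / SE."""
    return impact / se


se = 0.10  # hypothetical true standard error; the ratios below do not depend on it

t_unbiased = t_statistic(TRUE_IMPACT, se)
t_pos = t_statistic(TRUE_IMPACT + BIAS, se)  # impact estimate converges to 0.25
t_neg = t_statistic(TRUE_IMPACT - BIAS, se)  # impact estimate converges to 0.15

# Ratio of the biased t-statistic to the unbiased one.
print(round(t_pos / t_unbiased, 2))
print(round(t_neg / t_unbiased, 2))
```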
The results from equations (2) and (3) can be used to set thresholds for bias in the standard errors. Equation (4) shows the magnitude of the standard error ($SE_{+0.05}$) necessary, when there is no bias in the impact estimate, to generate the same bias in the t-statistic as a positive 0.05 standard deviation bias in the impact estimate when there is no bias in the standard error:

$$\frac{0.20}{SE_{+0.05}} = \frac{0.25}{SE} \tag{4}$$

which is equivalent to

$$SE_{+0.05} = \frac{0.20}{0.25}\,SE = 0.80\,SE$$
This implies that the standard error would need to be 20 percent smaller than the true standard error to have the same effect on the t-statistic as a positive bias in the impact estimate of 0.05 standard deviations.
Similarly, equation (5) shows the magnitude of the standard error ($SE_{-0.05}$) necessary, when there is no bias in the impact estimate, to generate the same bias in the t-statistic as a negative 0.05 standard deviation bias in the impact estimate when there is no bias in the standard error:

$$\frac{0.20}{SE_{-0.05}} = \frac{0.15}{SE} \tag{5}$$

which is equivalent to

$$SE_{-0.05} = \frac{0.20}{0.15}\,SE \approx 1.33\,SE$$
This implies that the standard error would need to be 33 percent larger than the true standard error to have the same effect on the t-statistic as a negative bias in the impact estimate of 0.05 standard deviations.
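The two standard error thresholds can be combined into a single classification rule, sketched below in Python. The function names are hypothetical, and the rule assumes, as in equations (4) and (5), that the impact estimate is unbiased and the true impact is 0.20 standard deviations. An estimated standard error is flagged as “high bias” if it falls below 0.80 or above about 1.33 times the true standard error.

```python
def se_thresholds(true_impact=0.20, bias_threshold=0.05):
    """Return (lower, upper) bounds on SE_estimated / SE_true.

    Solves true_impact / SE_biased = (true_impact +/- bias) / SE for SE_biased / SE,
    following equations (4) and (5).
    """
    lower = true_impact / (true_impact + bias_threshold)  # SE_{+0.05} / SE
    upper = true_impact / (true_impact - bias_threshold)  # SE_{-0.05} / SE
    return lower, upper


def classify_se_bias(estimated_se, true_se, true_impact=0.20, bias_threshold=0.05):
    """Return 'high bias' if the SE ratio falls outside the thresholds, else 'low bias'."""
    lower, upper = se_thresholds(true_impact, bias_threshold)
    ratio = estimated_se / true_se
    return "high bias" if ratio < lower or ratio > upper else "low bias"


print(se_thresholds())                # approximately (0.8, 1.333)
print(classify_se_bias(0.07, 0.10))   # estimated SE 30 percent too small
print(classify_se_bias(0.11, 0.10))   # estimated SE 10 percent too large
```

Because the bounds are the ratios 0.20/0.25 and 0.20/0.15, they depend on the assumed true impact of 0.20 standard deviations. The sketch therefore takes the true impact as a parameter rather than hard-coding the 20 and 33 percent figures.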