Technical Methods Report: What to Do When Data Are Missing in Group Randomized Controlled Trials - Appendix E: Standards for Judging the Magnitude of the Bias for Different Missing Data Methods

NCEE 2009-0049
October 2009

Appendix E: Standards for Judging the Magnitude of the Bias for Different Missing Data Methods

To assess the performance of different missing data methods, we set objective standards and applied those standards in interpreting the results. This appendix describes these standards and how they were chosen. In summary:

We relied on the attrition standards of the What Works Clearinghouse.
An impact estimate was considered to have “high bias” if the absolute value of the bias was greater than 0.05 of a standard deviation of the outcome measure.
A standard error estimate was considered to have “high bias” if it biased the t-statistic by as much as a 0.05 standard deviation bias in the impact estimate would.

A. Bias Standards for the Impact Estimates
To summarize the performance of the different methods, we identify those that yielded bias that would be considered “high” relative to a benchmark set by the What Works Clearinghouse (WWC). In developing its attrition standards, as well as its standards for baseline equivalence, the WWC decided that bias in the impact estimate of more than 0.05 standard deviations was unacceptably large (US ED, 2008, pp. 14, 30-31). Like most performance standards, this threshold is inherently arbitrary. However, because the WWC plays a large role in assessing the quality of impact studies in education, we adopted this threshold for assessing whether missing data methods yield bias that is large or small.

Specifically, in our simulations, we classified an impact estimate as having “high bias” if the absolute value of the bias was greater than 0.05 standard deviations. Because this threshold is based on the WWC's attrition standards, simulation results showing that a particular method yields bias below 0.05 standard deviations can be treated as evidence that the method produced estimates with a level of bias that the WWC treats as acceptable.⁸⁷
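
The classification rule above can be illustrated with a minimal Python sketch. It assumes that bias is measured as the average impact estimate across simulation replications minus the true impact, in standard deviation units. The function names and the replicate values are hypothetical and are not taken from the report; the true impact of 0.20 mirrors the value used in the worked example in Section B.

```python
from statistics import mean

# Threshold adopted from the WWC attrition standards (standard deviation units).
WWC_BIAS_THRESHOLD_SD = 0.05


def impact_bias(estimates, true_impact):
    """Average estimate across replications minus the true impact (assumed definition)."""
    return mean(estimates) - true_impact


def has_high_impact_bias(bias_sd, threshold=WWC_BIAS_THRESHOLD_SD):
    """Return True when the absolute bias is strictly greater than the threshold."""
    return abs(bias_sd) > threshold


# Hypothetical replicate estimates for illustration only, not results from the report.
estimates = [0.27, 0.25, 0.29, 0.26]
bias = impact_bias(estimates, true_impact=0.20)
print(f"bias = {bias:.4f}, high bias = {has_high_impact_bias(bias)}")
```

Because the rule uses a strict inequality, a bias of exactly 0.05 standard deviations would not be flagged in this sketch.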

B. Bias Standards for the Estimated Standard Errors of the Impact Estimates
Missing data can lead to biased standard errors, as well as biased impact estimates. In addition, impact and standard error estimates both contribute to the hypothesis test of whether an impact is statistically significant: the t-statistic equals the estimated impact divided by the estimated standard error. Therefore, for our simulations, we decided it would be useful to set standards for assessing the magnitude of the bias in the standard errors. In interpreting the results from the simulations, we apply these standards to assess whether a given missing data method produced a standard error for the impact estimate with “high bias” or “low bias.”

Building on the chosen standards for impact estimates described in Section A, we classified a standard error estimate as having “high bias” if, when the impact estimate has zero bias, it would yield a t-statistic with as much bias as would result from an impact estimate whose absolute bias is greater than 0.05 standard deviations when the standard error has zero bias. In this way, we rely entirely on the WWC's attrition standard to determine whether to classify the bias in the impact estimate or standard error as large or small.

To calculate the bias thresholds for the estimated standard errors, let SE equal the true standard error of the impact estimate, given the extent of missing data and the choice of method for addressing missing data.⁸⁸ Then the t-statistic used to test the null hypothesis of zero impact is given in equation (1) below:

$$t = \frac{0.20}{SE} \tag{1}$$

How much bias in the t-statistic is introduced by a bias in the impact estimate of 0.05 standard deviations? If the bias is positive (that is, the impact estimate converges to 0.25 instead of the true impact of 0.20), then equation (2) shows the value of the t-statistic that results from this bias. This equation shows that a positive bias of 0.05 standard deviations yields a t-statistic that is 25 percent larger than it would be with an unbiased impact estimate:

$$t^{+0.05} = \frac{0.25}{SE} = 1.25 \times \frac{0.20}{SE} \tag{2}$$

If the bias is negative (that is, the impact estimate converges to 0.15 instead of the true impact of 0.20), then equation (3) shows the value of the t-statistic that results from this bias. This equation shows that a negative bias of 0.05 standard deviations yields a t-statistic that is 25 percent smaller than it would be with an unbiased impact estimate:

$$t^{-0.05} = \frac{0.15}{SE} = 0.75 \times \frac{0.20}{SE} \tag{3}$$
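
The 25 percent figures in equations (2) and (3) depend on the true impact of 0.20 used in this example. As a sketch in notation that is not the report's own ($\beta$ for the true impact and $b$ for the bias in the impact estimate are symbols assumed here), the ratio of the biased to the unbiased t-statistic is:

$$\frac{(\beta + b)/SE}{\beta/SE} = 1 + \frac{b}{\beta}$$

With $\beta = 0.20$ and $b = \pm 0.05$, this ratio is 1.25 or 0.75. Under a smaller true impact, the same 0.05 bias would distort the t-statistic proportionally more.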

The results from equations (2) and (3) can be used to set thresholds for bias in the standard errors. Equation (4) shows the magnitude of the standard error ($SE^{+0.05}$) that, when there is no bias in the impact estimate, generates the same bias in the t-statistic as a positive 0.05 standard deviation bias in the impact estimate does when there is no bias in the standard error:

$$\frac{0.20}{SE^{+0.05}} = \frac{0.25}{SE}, \quad \text{which is equivalent to} \quad SE^{+0.05} = \frac{0.20}{0.25}\,SE = 0.80\,SE \tag{4}$$

This implies that the standard error would need to be 20 percent smaller than the true standard error to have the same effect on the t-statistic as a positive bias in the impact estimate of 0.05 standard deviations.

Similarly, equation (5) shows the magnitude of the standard error ($SE^{-0.05}$) that, when there is no bias in the impact estimate, generates the same bias in the t-statistic as a negative 0.05 standard deviation bias in the impact estimate does when there is no bias in the standard error:

$$\frac{0.20}{SE^{-0.05}} = \frac{0.15}{SE}, \quad \text{which is equivalent to} \quad SE^{-0.05} = \frac{0.20}{0.15}\,SE \approx 1.33\,SE \tag{5}$$

This implies that the standard error would need to be 33 percent larger than the true standard error to have the same effect on the t-statistic as a negative bias in the impact estimate of 0.05 standard deviations.
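
The two thresholds can be checked numerically. The following Python sketch recomputes the 0.80 and 1.33 multipliers from the true impact of 0.20 and the 0.05 bias threshold, then applies them to an estimated standard error. The function names and example values are hypothetical, and treating only values strictly beyond the thresholds as “high bias” is an assumption of this sketch.

```python
def se_multiplier_matching_impact_bias(true_impact, impact_bias):
    """Ratio of biased SE to true SE that shifts the t-statistic as much as
    the given impact bias would (with an unbiased standard error)."""
    return true_impact / (true_impact + impact_bias)


def has_high_se_bias(estimated_se, true_se, true_impact=0.20, threshold=0.05):
    """Flag an estimated SE whose ratio to the true SE falls outside the band
    implied by a bias of +/- threshold in the impact estimate."""
    lower = se_multiplier_matching_impact_bias(true_impact, +threshold)  # about 0.80
    upper = se_multiplier_matching_impact_bias(true_impact, -threshold)  # about 1.33
    ratio = estimated_se / true_se
    return ratio < lower or ratio > upper


lower = se_multiplier_matching_impact_bias(0.20, +0.05)
upper = se_multiplier_matching_impact_bias(0.20, -0.05)
print(f"lower multiplier = {lower:.2f}, upper multiplier = {upper:.2f}")

# Hypothetical standard errors for illustration only.
print(has_high_se_bias(estimated_se=0.07, true_se=0.10))  # ratio 0.70 is below the band
print(has_high_se_bias(estimated_se=0.12, true_se=0.10))  # ratio 1.20 is inside the band
```

The band is asymmetric because the standard error enters the t-statistic as a divisor: shrinking it by 20 percent inflates the t-statistic by 25 percent, while deflating the t-statistic by 25 percent requires enlarging it by about 33 percent.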

⁸⁷ It is important to note that this does not mean that studies that employ the missing data method in question would necessarily, if reviewed by the WWC, be determined to have met the WWC's standards. In fact, the WWC has no standards specifying which missing data methods are acceptable and which are unacceptable (US ED, 2008).
⁸⁸ Note that this is not the same as the standard error of the impact estimate that researchers would have obtained with complete data.