The Late Pretest Problem in Randomized Control Trials of Education Interventions

NCEE 2009-4033
October 2008

Chapter 8: Summary and Conclusions

This paper has examined theoretical and empirical issues related to the inclusion of late pretests in posttest impact models for clustered RCT designs in a school setting. Including late pretests will increase the precision of the estimated posttest impacts but could also introduce bias, because a late pretest may already reflect part of the treatment effect. Accordingly, the theoretical work used a loss function approach to examine the conditions under which biased estimators that include late pretests are likely to produce impact estimates closer to the truth than unbiased estimators that either exclude the pretests or use uncontaminated test score data from other sources. The empirical work quantified the variance-bias tradeoffs for several commonly used impact estimators.
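In this framework, a natural loss function is mean squared error, which weights squared bias and variance equally; the notation below is a schematic illustration rather than the paper's exact formulation:

    \mathrm{Loss}(\hat{\beta}) \;=\; E\big[(\hat{\beta} - \beta)^2\big] \;=\; \mathrm{Bias}(\hat{\beta})^2 + \mathrm{Var}(\hat{\beta})

Under this criterion, an estimator that includes a late pretest is preferred whenever the variance reduction it buys exceeds the squared bias it introduces.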

The first research question that the paper addressed is: Under what conditions should late pretest data be collected and included in the posttest impact models? The answer to this question is clear: from a loss function perspective, estimators that include late pretests will typically be preferred to estimators that exclude them. This finding is supported by both the theoretical and empirical work, and it holds under most reasonable assumptions about the growth trajectory of impacts and the timing of pretest collection. In particular, the two most common pretest-posttest estimators, the difference-in-differences (DID) estimator and the analysis of covariance (ANCOVA) estimator, will typically yield smaller loss function values than the posttest-only estimator. This remains true even if the early treatment effect is a relatively large fraction of the expected posttest impact, and it holds for designs in which schools, classrooms, or students are the unit of random assignment.
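In schematic form, the three estimators differ only in how the pretest enters the model. For a student with posttest score Y, pretest score X, and treatment indicator T (with cluster-level error components suppressed for brevity), the models are roughly:

    Posttest-only:  Y = \alpha + \beta T + \epsilon
    DID:            Y - X = \alpha + \beta T + \epsilon
    ANCOVA:         Y = \alpha + \beta T + \gamma X + \epsilon

The DID estimator restricts the pretest coefficient to one, while ANCOVA estimates \gamma from the data. When the pretest is collected late, X already embeds part of the treatment effect, so the DID and ANCOVA estimates of \beta net out that early effect, which is the source of the bias discussed above.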

Another finding from the analysis is that the ANCOVA estimator will typically have smaller biases and smaller variances than the more restrictive DID estimator. Thus, the ANCOVA approach will often be preferred to the DID approach, because it will generate estimators with smaller loss function values.
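A stylized calculation suggests why ANCOVA cannot lose on variance. Assuming, for illustration, equal pretest and posttest variances \sigma^2 and a pretest-posttest correlation \rho, the residual variances are approximately:

    DID:     2\sigma^2 (1 - \rho)
    ANCOVA:  \sigma^2 (1 - \rho^2)

Because 1 - \rho^2 = (1 - \rho)(1 + \rho) \le 2(1 - \rho) whenever \rho \le 1, the ANCOVA residual variance is never larger, and the two coincide only when \rho = 1.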

The second research question that this paper addressed is: If pretest data are to be collected in education RCTs, what are the statistical power losses when late pretests are included in the estimation models? The answer is that, relative to a design with uncontaminated pretests, power losses with late pretests can be large, even if pretest contamination is modest. Thus, school sample sizes for RCTs in the education field should be increased to offset these power losses if pretest data are expected to be collected several months after the start of the school year.
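To see the mechanics of such an adjustment, the sketch below applies a standard minimum detectable effect size (MDES) formula for a two-level school-randomized design. Treating pretest contamination as a simple deflation of the pretest's effective R² is an assumption made here for illustration, and all parameter values are hypothetical:

    import math

    def mdes(n_schools, students_per_school, icc, r2_school, r2_student,
             p_treat=0.5, multiplier=2.8):
        """Approximate MDES for a design that randomizes schools.

        The multiplier of about 2.8 is a common rule of thumb for
        80 percent power at a 5 percent two-sided significance level.
        """
        j, n = n_schools, students_per_school
        pq = p_treat * (1 - p_treat)
        var = (icc * (1 - r2_school) / (pq * j)
               + (1 - icc) * (1 - r2_student) / (pq * j * n))
        return multiplier * math.sqrt(var)

    # Hypothetical values: 60 schools of 60 students, ICC = 0.15, and an
    # uncontaminated pretest R^2 of 0.60 that contamination deflates to 0.45.
    clean = mdes(60, 60, icc=0.15, r2_school=0.60, r2_student=0.60)
    late = mdes(60, 60, icc=0.15, r2_school=0.45, r2_student=0.45)
    print(f"MDES, uncontaminated pretest: {clean:.3f}")
    print(f"MDES, late pretest:           {late:.3f}")
    print(f"School sample inflation to restore MDES: {(late / clean) ** 2:.2f}x")

Because the MDES scales with the inverse square root of the number of schools, the ratio of the two MDES values, squared, gives the factor by which the school sample would need to grow to restore the original precision.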

The final research question that this paper addressed is: Instead of collecting pretest data, is it preferable to collect uncontaminated baseline test score data from alternative sources? The answer is generally "no." Under the assumption that R² values for these alternative test scores are somewhat smaller than those for the pretests, the ANCOVA estimator will tend to dominate the UANCOVA estimator (the analogous estimator that uses the uncontaminated alternative scores as the baseline covariate), as long as test score impacts do not grow very quickly early in the school year. These somewhat surprising results hold because even relatively small increases in R² values will likely offset estimator biases and variance increases due to the collinearity of the model covariates.
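The intuition is that the variance of the impact estimator scales roughly with the residual variance, that is, with (1 - R²). As a purely hypothetical illustration, if the pretest yields R² = 0.80 while the alternative baseline score yields R² = 0.70, the residual variance ratio is

    \frac{1 - 0.70}{1 - 0.80} = \frac{0.30}{0.20} = 1.5

so the UANCOVA estimator would carry roughly 50 percent more variance, which, consistent with the finding above, is typically enough to outweigh the modest bias that the late pretest introduces.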

The results comparing the ANCOVA and UANCOVA estimators, however, will not hold if R² values using school records and pretest data are similar. Bloom et al. (2005) and Cook et al. (2008) provide preliminary evidence that aggregate school-level R² values using school records data can be large, but this issue has not been systematically explored in the literature. Thus, comparing R² values using pretest and school records data is an important area for future research. Another important topic for future research is the relative cost of obtaining the two types of data. To the extent that school records data are cheaper to collect than pretest data, the UANCOVA estimator could be preferred to the ANCOVA estimator if the loss functions account not only for variance and bias, but also for data collection costs, as sketched below.
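One simple way to formalize such a cost-aware comparison, offered here as an illustrative extension rather than an analysis from the paper, is to append a cost term to the loss function:

    \mathrm{Loss}(\hat{\beta}) \;=\; \mathrm{Bias}(\hat{\beta})^2 + \mathrm{Var}(\hat{\beta}) + \lambda \, C

where C is the data collection cost of the design underlying the estimator and \lambda converts cost into squared effect size units. The UANCOVA estimator would then be preferred whenever its cost savings, scaled by \lambda, exceed its precision disadvantage.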

Another important issue that affects the findings is the growth trajectory of test score impacts over the school year. Although it is reasonable to assume that impacts grow linearly (the most agnostic assumption) or quadratically, there may be contexts where test score impacts grow very quickly and then level off. In these instances, the biased estimators may perform worse than the unbiased ones. To build a base of knowledge about actual patterns of impact growth, future studies could be designed to administer tests at several points throughout the school year.
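The stakes of the trajectory assumption can be illustrated by computing the fraction of the posttest impact already realized at the pretest date under different growth curves. The functional forms and parameter values below are illustrative assumptions, not estimates from this paper:

    import math

    # Fraction of the final (posttest) impact realized by time t, where t
    # is the fraction of the school year elapsed when the pretest is given.
    def linear(t):
        return t  # the most agnostic assumption

    def quadratic(t):
        return t ** 2  # slow start, faster growth later

    def fast_then_level(t, k=8.0):
        # rapid early growth that levels off (hypothetical saturation curve)
        return (1 - math.exp(-k * t)) / (1 - math.exp(-k))

    t = 0.25  # pretest administered one quarter into the school year
    for curve in (linear, quadratic, fast_then_level):
        print(f"{curve.__name__:>15}: {curve(t):.2f} of the posttest impact")

Under the fast-then-level curve, most of the impact (about 0.86 in this example) is already embedded in the pretest, so the contamination bias can dwarf the variance savings; under linear or quadratic growth, the contaminated share is far smaller.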

Finally, the methods developed in this paper could also be applied to examine the late pretest problem for RCTs in fields other than education. The main conclusions presented here, however, could differ in other contexts due to differences in the growth trajectory of treatment effects, the timing of pretest data collection, pretest-posttest correlations, and other key parameter values.
