The Late Pretest Problem in Randomized Control Trials of Education Interventions

NCEE 2009-4033
October 2008

Chapter 3: Measuring the Variance-Bias Tradeoff

The main advantage of including late pretests in the posttest impact models is that they can substantially improve the precision of the impact estimates; the main disadvantage is that they could yield biased impact estimates. This paper uses two related loss functions to quantify this variance-bias tradeoff for a posttest impact estimator ŷ. The first loss function is the mean square error (MSE):

MSE(ŷ) = Var(ŷ) + [Bias(ŷ)]²     (1)

where Var(ŷ) is the variance of the estimator, y is the true posttest impact, and Bias(ŷ) = E(ŷ) − y is the bias of the estimator. An estimator is preferred to another if it has a lower MSE value.
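As a minimal illustration of how (1) compares estimators, the Python sketch below evaluates the MSE for two hypothetical estimators; the variance and bias values are invented for this example and are not taken from the paper.

```python
def mse(var, bias):
    """Mean square error of an impact estimator, per equation (1):
    MSE = Var + Bias^2."""
    return var + bias ** 2

# Hypothetical estimators (effect-size units; values illustrative only):
# A excludes the late pretest (unbiased, less precise); B includes it
# (more precise, but possibly biased downward).
print(mse(var=0.0100, bias=0.00))   # 0.0100
print(mse(var=0.0064, bias=-0.04))  # 0.0080 -> B preferred on MSE grounds
```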

The second loss function, which is typically used in the design stage of impact evaluations to determine appropriate sample sizes, is the minimum detectable impact (MDI). The MDI represents the smallest true program impact that can be detected with a high probability. I follow the usual practice of standardizing minimum detectable impacts into effect size units (that is, as a proportion of the standard deviation of the outcome measure, also known as Cohen's d) to facilitate the comparison of findings across outcomes that are measured on different scales (Cohen 1988). For example, an impact of 5 points on a test with a standard deviation of 20 points corresponds to an effect size of 0.25. Hereafter, minimum detectable impacts in effect size units are denoted as MDEs.

To develop manageable MDE formulas for biased estimators, it is assumed that under the null hypothesis of no impacts on posttest scores, there are no impacts on late pretest scores. This assumption rules out early positive or negative intervention effects that disappear by the follow-up test date. It considerably simplifies the MDE calculations because all estimators are then unbiased under the null hypothesis, so Type I error rates remain the same across estimators.

Under this assumption, the MDE formula for ŷ can be obtained by first noting that, for significance level α, the critical value for the t-statistic under the null hypothesis of no impact on posttest scores is T⁻¹(1 − α/2) for a two-tailed test and T⁻¹(1 − α) for a one-tailed test, where T⁻¹(·) is the inverse of the Student's t distribution function with df degrees of freedom. For a given MDI value, statistical power for a two-tailed test under the alternative hypothesis H1: y = MDI can then be expressed as follows:

β = 1 − T(T⁻¹(1 − α/2) − [MDI + Bias(ŷ)] / √Var(ŷ))     (2)

where β is the preset statistical power level (for example, 80 percent) and T(·) is the Student's t distribution function with df degrees of freedom. The MDE formula for ŷ can then be obtained by solving for MDI in (2) and dividing the result by the standard deviation of the posttest score (θ₁):

MDE(ŷ) = [Factor(α, β, df) · √Var(ŷ) − Bias(ŷ)] / θ₁     (3)

where Factor(·) is [T⁻¹(1 − α/2) + T⁻¹(β)] for a two-tailed test and [T⁻¹(1 − α) + T⁻¹(β)] for a one-tailed test. Factor(·) becomes larger as α and df decrease and as β increases (see Schochet 2008). If α = .05 and β = .80 (typical assumptions) and df > 40, Factor(·) is about 2.5 for a one-tailed test and 2.8 for a two-tailed test. An estimator is preferred to another if it has a lower MDE value.
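The sketch below, assuming SciPy's Student's t distribution, reproduces these Factor(·) values and evaluates the MDE formula in (3); the Var(ŷ), Bias(ŷ), and θ₁ inputs are hypothetical. It also checks consistency with equation (2): at MDI = Factor(·) · √Var(ŷ) − Bias(ŷ), computed power equals β.

```python
import math
from scipy.stats import t

def factor(alpha, beta, df, two_tailed=True):
    """Factor(alpha, beta, df): the multiplier on the standard error,
    T^-1(1 - alpha/2) + T^-1(beta) for a two-tailed test."""
    tail = alpha / 2 if two_tailed else alpha
    return t.ppf(1 - tail, df) + t.ppf(beta, df)

def mde(var, bias, theta1, alpha=0.05, beta=0.80, df=1000, two_tailed=True):
    """MDE in effect-size units, per equation (3):
    [Factor * sqrt(Var) - Bias] / theta_1."""
    return (factor(alpha, beta, df, two_tailed) * math.sqrt(var) - bias) / theta1

# Factor(.) values quoted in the text (alpha = .05, beta = .80, df > 40):
print(factor(0.05, 0.80, df=1000))                    # ~2.80 (two-tailed)
print(factor(0.05, 0.80, df=1000, two_tailed=False))  # ~2.49 (one-tailed)

# Hypothetical example: Var = 0.0064, downward bias of 0.02, theta_1 = 1:
print(mde(var=0.0064, bias=-0.02, theta1=1.0))        # ~2.80*0.08 + 0.02 = 0.244

# Consistency check with equation (2): power at MDI equals beta.
se, bias, alpha, df = 0.08, -0.02, 0.05, 1000
mdi = factor(alpha, 0.80, df) * se - bias
crit = t.ppf(1 - alpha / 2, df)
print(1 - t.cdf(crit - (mdi + bias) / se, df))        # ~0.80
```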

The MDE formula in (3) is appropriate only when the posttest impact estimators are unbiased or biased downward (that is, when Bias(ŷ) ≤ 0), so that there is a variance-bias tradeoff when comparing estimators. In these cases, relative to the MSE criterion, the MDE criterion tends to place more weight on the variance component and less weight on the bias component: the standard error enters (3) multiplied by Factor(·), which is typically between 2.5 and 2.8, whereas the bias enters with a weight of one.
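To make this difference in weighting concrete, the sketch below reuses the mse and mde helpers from the earlier snippets to compare two hypothetical estimators in effect size units (θ₁ = 1). All numbers are invented for illustration: A is unbiased but less precise, B is more precise but biased downward, and the two criteria rank them differently.

```python
# Hypothetical estimators in effect-size units (theta_1 = 1); values are
# illustrative only and do not come from the paper.
est_a = dict(var=0.0100, bias=0.00)   # unbiased, less precise
est_b = dict(var=0.0025, bias=-0.09)  # precise, biased downward

for name, est in [("A", est_a), ("B", est_b)]:
    print(name, round(mse(**est), 4), round(mde(theta1=1.0, **est), 3))
# A: MSE = 0.0100, MDE ~ 0.28
# B: MSE = 0.0106, MDE ~ 0.23
# The MSE criterion prefers A (B's squared bias outweighs its variance
# savings); the MDE criterion prefers B (its smaller standard error dominates).
```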

Finally, the MSE and MDE criteria do not account for pretest data collection costs, so this paper does not consider these costs when comparing estimators.
