Pretest-posttest experimental designs are often used to examine the impacts of educational interventions on student achievement test scores. In these designs, a test is administered to students in the fall of the school year (the pretest) and again at a spring follow-up (the posttest). Average treatment effects are then estimated either by examining treatment-control differences on pretest-posttest gain scores or by including pretests as covariates in posttest regression models.
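The two estimators described above can be sketched on simulated data. This is an illustrative toy example, not any particular study's specification: the sample size, true effect, and pretest-posttest correlation are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: fall pretest x, spring posttest y,
# with an assumed true treatment effect tau = 0.25 standard deviations.
n, tau = 5000, 0.25
treat = rng.integers(0, 2, n)
x = rng.normal(0, 1, n)                            # fall pretest
y = 0.7 * x + tau * treat + rng.normal(0, 1, n)    # spring posttest

# Estimator 1: treatment-control difference in pretest-posttest gain scores.
gain = y - x
est_gain = gain[treat == 1].mean() - gain[treat == 0].mean()

# Estimator 2: pretest as a covariate -- regress the posttest on a
# treatment indicator and the pretest (an ANCOVA-style model).
X = np.column_stack([np.ones(n), treat, x])
est_ancova = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(est_gain, est_ancova)  # both estimate the same average treatment effect
```

With random assignment both estimators are unbiased for the average treatment effect; they differ in precision, which is why the covariate version is usually preferred when the pretest is available.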
In clustered randomized control trials (RCTs) in the education field, the availability of pretests on individual students is critical for obtaining, at reasonable cost, precise posttest impact estimates (Schochet 2008; Bloom et al. 2005). In these RCTs, groups (such as schools or classrooms) rather than students are typically randomly assigned to the treatment or control conditions. This clustering considerably reduces statistical power due to the dependency of student outcomes within groups. The inclusion of pretests in the analysis, however, can substantially increase precision levels, because group-level pretest-posttest correlations tend to be large. Schochet (2008), for example, demonstrates that for a design in which schools are the unit of random assignment, about 44 total schools are required to detect an impact of 0.25 standard deviations if pretests are used in the analysis, compared to about 86 schools if pretest data are not available. This occurs because pretests tend to explain a large proportion of the variance in posttest scores.
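The school-count comparison above reflects a standard sample-size formula for cluster designs, in which covariates shrink the unexplained between- and within-school variance. The sketch below uses that formula with illustrative parameter values (multiplier, intraclass correlation, students per school, and pretest R-squared are all assumptions, not Schochet's exact inputs), which yield school counts of the same order as the 86-versus-44 comparison.

```python
import math

# Illustrative cluster-RCT sample-size sketch; all parameter values assumed.
M = 2.8      # multiplier for 80% power, 5% two-tailed test
MDES = 0.25  # target minimum detectable effect size (in standard deviations)
P = 0.5      # fraction of schools randomly assigned to treatment
n = 60       # students tested per school (assumed)
rho = 0.15   # intraclass correlation of posttest scores (assumed)

def required_schools(r2_school, r2_student):
    """Total schools needed to detect MDES, given the shares of
    between-school and within-school posttest variance explained
    by covariates such as a pretest."""
    var_term = rho * (1 - r2_school) + (1 - rho) * (1 - r2_student) / n
    return math.ceil((M / MDES) ** 2 * var_term / (P * (1 - P)))

print(required_schools(0.0, 0.0))  # no pretest available
print(required_schools(0.5, 0.5))  # pretest explains half the variance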
For logistical reasons, however, pretests on individual students are typically collected after the start of the school year. In these cases, including late pretests in the analysis could bias the posttest impact estimates in the presence of early treatment effects. Because of variance gains, however, these biased estimators could yield impact estimates that tend to be closer to the truth than unbiased estimators that exclude the late pretests. Thus, the issue of whether to collect and use late pretest data in RCTs involves a variance-bias tradeoff.
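A small Monte Carlo sketch can make the variance-bias tradeoff concrete. Here the late pretest already reflects a fraction of the treatment effect, so adjusting for it biases the impact estimate downward, yet the large variance reduction can still leave it closer to the truth on average. All parameter values (effect size, contamination share, noise levels) are toy assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

reps, n = 2000, 200
tau, early = 0.25, 0.10  # true effect; share realized by the late pretest date

est_excl, est_incl = [], []
for _ in range(reps):
    treat = rng.integers(0, 2, n)
    ability = rng.normal(0, 1, n)
    # Late pretest: contaminated by the early part of the treatment effect.
    x = ability + early * tau * treat + rng.normal(0, 0.3, n)
    y = ability + tau * treat + rng.normal(0, 0.3, n)  # posttest
    # Unbiased estimator: posttest mean difference, excluding the pretest.
    est_excl.append(y[treat == 1].mean() - y[treat == 0].mean())
    # Biased but lower-variance estimator: adjust for the late pretest.
    X = np.column_stack([np.ones(n), treat, x])
    est_incl.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

def rmse(est):
    return float(np.sqrt(np.mean((np.array(est) - tau) ** 2)))

# Under these toy parameters the biased estimator typically has lower RMSE.
print(rmse(est_excl), rmse(est_incl))
```

Whether the biased estimator wins depends on how large the early treatment effect is relative to the variance gain, which is precisely the tradeoff the paper quantifies.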
This paper is the first to systematically examine, both theoretically and empirically, the late pretest problem in education RCTs for several commonly-used impact estimators. The paper addresses three main research questions:
The theory presented in this paper is based on a unified regression approach for group-based RCTs that is anchored in the causal inference and hierarchical linear modeling (HLM) literature. The empirical analysis quantifies the late pretest problem in education RCTs using simulations that are based on key parameter values found in the literature that pertain to achievement test scores of elementary school and preschool students in low-performing school districts. The focus on test scores is consistent with accountability provisions of the No Child Left Behind Act of 2001, and the ensuing federal emphasis on testing interventions to improve reading and mathematics scores of young students.
The rest of this paper is organized into seven chapters. Chapter 1 discusses the late pretest problem in more detail, and Chapter 2 discusses two measures for quantifying the variance-bias tradeoff when late pretests are included in the impact models. Chapter 3 discusses the considered school-based designs, and Chapter 4 presents the causal inference statistical theory underlying the late pretest problem. Chapter 5 applies this theory to several commonly-used impact estimators, and Chapter 6 presents simulation results. Finally, Chapter 7 presents a summary and conclusions.