Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations

NCEE 2008-4026
August 2008

Chapter 7: Illustrative Precision Calculations

In this section, I bring together the formulas presented above and combine them with key design parameter values from the literature to obtain illustrative minimum detectable effect (MDE) calculations for RD designs in the education field. The focus is on standardized test scores of elementary school and preschool students in low-performing schools. MDEs are calculated for each design considered above (using the multilevel versions of Designs II and III).

Presentation and Assumptions

Tables 7.1 and 7.2 display, under various assumptions and for each of the RD designs considered, the total number of schools required to achieve precision targets of 0.20, 0.25, and 0.33 standard deviations. These benchmarks are typically used in impact evaluations of educational interventions to balance statistical rigor against study costs (Schochet 2008; Hill et al. 2007). Table 7.1 assumes that the cutoff score is at the center of the score distribution and that the treatment and control group samples are balanced. Table 7.2 assumes that the cutoff is at a tertile of the score distribution, yielding a 2:1 split of the research sample. Table 7.3 presents figures comparable to those in Table 7.1 for the RA design.

Because the amount and quality of baseline data vary across evaluations, the power calculations are conducted assuming R2 values of 0, 0.20, 0.50, and 0.70 at each group level. The R2 value of 0.50 is conservative if pretests are available for analysis; the more optimistic 0.70 figure has sometimes been found in the literature (Schochet 2008; Bloom et al. 2005a).

To keep the presentation manageable, RD design effects are presented assuming that scores are normally distributed. As discussed, for a given treatment-control sample split, the RD design effect does not vary much with the score distribution or the location of the cutoff score. Thus, the results presented here are broadly applicable but could easily be revised using the alternative score distributions or parameter values discussed above.
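
To illustrate this point, the minimal Python sketch below computes the RD design effect for standard-normal scores at several cutoff locations, using the expression 1/(1 - rho^2) discussed in the earlier chapters, where rho is the correlation between the treatment indicator and the score under a linear score specification. The function name rd_design_effect is illustrative, not part of the report.

    # Minimal sketch: RD design effect for standard-normal scores.
    # Assumes the linear-in-score specification, where the design effect is
    # 1 / (1 - rho^2) and rho is the correlation between the treatment
    # indicator and the score.
    from scipy.stats import norm

    def rd_design_effect(q):
        """Design effect when a fraction q of the sample is treated."""
        zc = norm.ppf(1 - q)                        # cutoff on the score scale
        rho = norm.pdf(zc) / (q * (1 - q)) ** 0.5   # corr(treatment, score)
        return 1.0 / (1.0 - rho ** 2)

    for q in (0.50, 1 / 3, 0.25):
        print(f"{q:.2f}: {rd_design_effect(q):.2f}")  # ~2.75, ~2.47, ~2.17

The design effect declines only moderately as the cutoff moves away from the median, consistent with the claim above.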

The estimates also assume:

  • A two-tailed test at 80 percent power and a 5 percent significance level
  • The intervention is being tested in a single grade with an average of 3 classrooms per school per grade and an average of 23 students per classroom. Thus, the sample contains 69 students per school.
  • 80 percent of students in the sample will provide follow-up (posttest) data, so that posttest data are available for about 55 students per school.
  • ICC values of 0.15 at the school and classroom levels (which are consistent with the empirical findings in Schochet 2008, Hedges and Hedberg 2007, and Bloom et al. 2005a)
  • An ICC value of 0.15 pertaining to the variance of treatment effects across schools in Designs IV and V (Schochet 2008)
  • A sharp RD design rather than a fuzzy RD design (that is, all units comply with their treatment assignments)
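
To illustrate how these assumptions combine, the sketch below computes required school sample sizes for the school-based Design III. It is a minimal Python sketch, not the report's actual computation: it uses a two-level simplification (school and student variance components only) of the multilevel variance formulas from the earlier chapters, a multiplier of 1.96 + 0.84 = 2.80 for the assumed test, and the normal-score RD design effect of about 2.75. The function and variable names (schools_needed, var_factor, and so on) are illustrative.

    # Minimal sketch: schools required under Design III (treatment assigned
    # at the school level), using a two-level simplification of the report's
    # variance formulas. Parameter values follow the assumptions listed above.
    from scipy.stats import norm

    M = norm.ppf(0.975) + norm.ppf(0.80)  # multiplier: 1.96 + 0.84 ~ 2.80
    icc = 0.15                            # school-level ICC
    r2 = 0.50                             # regression R2 at each level
    m = 0.80 * 3 * 23                     # posttest students per school (~55)
    p = 0.50                              # treatment share (balanced split)

    def schools_needed(mde, design_effect=1.0):
        """Schools needed to detect an impact of `mde` standard deviations."""
        var_factor = (icc * (1 - r2) + (1 - icc) * (1 - r2) / m) / (p * (1 - p))
        return round(design_effect * var_factor * (M / mde) ** 2)

    de_rd = 1.0 / (1.0 - norm.pdf(0.0) ** 2 / (p * (1 - p)))  # ~2.75 (normal scores)
    print(schools_needed(0.25))         # RA design: ~42 schools
    print(schools_needed(0.25, de_rd))  # RD design: ~114 schools

With these inputs, the sketch roughly reproduces the Design III figures cited in the Results section below: about 42 schools under RA and 114 under RD for an MDE of 0.25, and about 39, 66, and 131 schools under RD for an MDE of 0.33 with R2 values of 0.70, 0.50, and 0.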

Results

The key results can be summarized as follows:

  • Much larger sample sizes are typically required under RD than RA designs. Consider the most commonly used design in education-related impact studies, in which equal numbers of schools are assigned to treatment or control status. Under this design, about 114 total schools (57 treatment and 57 control) are required to yield an MDE of 0.25 standard deviations, assuming a regression R2 value of 0.50 (Design III; Table 7.1). The corresponding figure for the RA design is only 42 total schools (Table 7.3). Similarly, for the classroom-based Design II, the required number of schools is 45 for the RD design (Table 7.1), compared to only 16 for the RA design (Table 7.3).
  • Because of resource constraints, school-based RD designs may be feasible only for interventions that are likely to have relatively large effects (about 0.33 standard deviations or more). Under Design III, 66 schools (33 treatment and 33 control) are required to achieve an MDE of 0.33 standard deviations (assuming an R2 value of 0.50; Table 7.1). This number is comparable to the number of schools typically included in large-scale experimental impact evaluations funded by the U.S. Department of Education.
  • A 2:1 split of the sample has only a small effect on statistical power. The required school sample sizes are similar in Tables 7.1 and 7.2. This occurs because, as discussed, a balanced sample allocation yields larger RD design effects than an unbalanced allocation, but also yields smaller variances under the RA design; these two effects are largely offsetting (a numerical check follows this list).
  • R2 values matter. The viability of RD designs in education research hinges critically on the availability of detailed baseline data at the aggregate school or individual student level—and in particular, pretest data—that can be used as covariates in the regression models to improve R2 values. For instance, for the school-based Design III, the number of schools required to achieve an MDE of 0.33 standard deviations is 39 if the R2 value is 0.70, 66 if the R2 value is 0.50, and 131 for a zero R2 value (Table 7.1).
  • RD designs may be most viable for less-clustered designs where classrooms or students are the unit of treatment assignment. For example, under the classroom-based Design II, 45 schools are required to achieve an MDE of 0.25 standard deviations, assuming an R2 value of 0.50 (Table 7.1). The comparable figure for the classroom-based Design V (with random school effects) rises only modestly, to 53, because, as discussed, RD design effects are smaller for this design than for Design II. For the student-level Design I, the comparable figure is only 13 schools (Table 7.1).
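
To put a rough number on the offset noted in the third bullet, recall that the variance of the RD impact estimator scales with the design effect divided by p(1 - p), where p is the treated share, so the balanced and 2:1 allocations can be compared directly. A minimal sketch, again assuming standard-normal scores (the function name rd_variance_factor is illustrative):

    # Minimal sketch: RD variance scale factor, design_effect / (p * (1 - p)),
    # for a balanced split (median cutoff) versus a 2:1 split (tertile cutoff).
    from scipy.stats import norm

    def rd_variance_factor(p):
        zc = norm.ppf(1 - p)                        # cutoff on the score scale
        rho_sq = norm.pdf(zc) ** 2 / (p * (1 - p))  # corr(treatment, score)^2
        return (1.0 / (1.0 - rho_sq)) / (p * (1 - p))

    print(round(rd_variance_factor(0.50), 1))   # ~11.0 (balanced split)
    print(round(rd_variance_factor(2 / 3), 1))  # ~11.1 (2:1 split)

The two factors are nearly identical, which is why the required school counts in Tables 7.1 and 7.2 are so similar.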
