Chapter 7: Illustrative Precision Calculations
In this section, I collate formulas from above and use key design parameter values
from the literature to obtain illustrative MDE calculations for RD designs in the
education field. The focus is on standardized test scores of elementary school and
preschool students in low-performing schools. MDEs are calculated for each design
considered above (where I use the multilevel versions of Designs II and III).
Presentation and Assumptions
Tables 7.1 and 7.2
display, under various assumptions and for each of the considered RD designs, the
total number of schools required to achieve precision targets (MDEs) of 0.20, 0.25,
and 0.33 of a standard deviation. These benchmarks are typically used in impact
evaluations of educational interventions because they balance statistical rigor and
study costs (Schochet 2008; Hill et al. 2007). In
Table 7.1, it is assumed that the score cutoff
is at the center of the score distribution and that the treatment and control group
samples are balanced. In Table 7.2, it is assumed
that the cutoff is at a tertile of the score distribution and that there is a 2:1
split of the research samples. Table 7.3 presents
comparable figures to those in Table 7.1 for
the RA design.
Because the amount and quality of baseline data vary across evaluations, the power
calculations are conducted assuming R² values of 0, 0.20, 0.50,
and 0.70 at each group level. The R² value of 0.50 is conservative
if pretests are available for analysis; the more optimistic 0.70 figure has sometimes
been found in the literature (Schochet 2008; Bloom et al. 2005a).
To keep the presentation manageable, RD design effects are presented assuming that
scores are normally distributed. As discussed, for a given treatment-control sample
split, the RD design effect does not vary much according to the score distribution
or location of the cutoff score. Thus, the results that are presented are broadly
applicable, but could easily be revised using the alternative score distributions
or parameter values that were discussed above.
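To make the design-effect assumption concrete, the sketch below computes the RD design effect for a standard-normal score, using the common variance-inflation formulation 1/(1 − ρ²), where ρ is the correlation between the treatment indicator and the score; this formulation and the two cutoff placements are assumptions intended to mirror the discussion above, and the numbers would shift under a different parameterization.

```python
from scipy.stats import norm

def rd_design_effect(cutoff_quantile):
    """RD design effect for a standard-normal score, assuming the common
    formulation DE = 1 / (1 - rho^2), where rho is the correlation between
    the treatment indicator T = 1{score >= cutoff} and the score itself."""
    c = norm.ppf(cutoff_quantile)            # cutoff on the z-score scale
    p = 1.0 - cutoff_quantile                # share of units above the cutoff (treated)
    cov_ts = norm.pdf(c)                     # Cov(T, S) = E[S * 1{S >= c}] = phi(c)
    rho = cov_ts / (p * (1.0 - p)) ** 0.5    # Corr(T, S); Var(T) = p(1-p), Var(S) = 1
    return 1.0 / (1.0 - rho ** 2)

print(round(rd_design_effect(0.50), 2))   # cutoff at the median, 1:1 split: ~2.75
print(round(rd_design_effect(1 / 3), 2))  # cutoff at a tertile, 2:1 split: ~2.47
```

Under these assumptions the design effect is about 2.75 with the cutoff at the center of the score distribution and about 2.47 with the cutoff at a tertile, which illustrates why the location of the cutoff changes the results only modestly.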
The estimates also rest on the following assumptions, which the sketch after this list pulls together into an illustrative calculation:
- A two-tailed test at 80 percent power and a 5 percent significance level
- The intervention is being tested in a single grade with an average of 3 classrooms
per school per grade and an average of 23 students per classroom. Thus, the sample
contains 69 students per school.
- 80 percent of students in the sample will provide follow-up (posttest) data, so
that posttest data are available for about 55 students per school.
- ICC values of 0.15 at the school and classroom levels (which are consistent with
the empirical findings in Schochet 2008, Hedges and Hedberg 2007, and Bloom et al.
2005a)
- An ICC value of 0.15 pertaining to the variance of treatment effects across schools
in Designs IV and V (Schochet 2008)
- A sharp RD design rather than a fuzzy RD design (that is, that all units comply
with their treatment assignments)
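Pulling these assumptions together, the sketch below performs an illustrative school-level MDE calculation. It uses a simplified two-level (school and student) variance formula with a multiplicative RD design effect; the formula and the normal-approximation multiplier are simplifying assumptions rather than the report's exact multilevel expressions, so the output should be read as a rough cross-check of the tabulated figures rather than a reproduction of them.

```python
from scipy.stats import norm

def mde_school_design(n_schools, students_per_school=55, icc=0.15, r2=0.50,
                      p_treat=0.50, design_effect=2.75,
                      alpha=0.05, power=0.80):
    """Approximate MDE (in standard deviation units) for a design in which whole
    schools are assigned to the treatment or control group.

    Uses a simplified two-level (school/student) variance formula and a
    multiplicative RD design effect; the defaults mirror the assumptions listed
    above, but the formula itself stands in for the report's multilevel expressions."""
    # Normal-approximation multiplier for a two-tailed test (~2.80 at 80 percent
    # power and 5 percent significance); the report may instead use a t-based
    # factor that depends on the number of schools.
    factor = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    # Variance of the impact estimate, in effect-size units, under random assignment.
    var_ra = (icc * (1 - r2) + (1 - icc) * (1 - r2) / students_per_school) / (
        p_treat * (1 - p_treat) * n_schools)
    # The RD design effect inflates this variance; the MDE scales with its square root.
    return factor * (design_effect * var_ra) ** 0.5

print(round(mde_school_design(114), 2))                    # RD design: ~0.25
print(round(mde_school_design(42, design_effect=1.0), 2))  # RA benchmark: ~0.25
```

With these default inputs, 114 schools under the RD design and 42 schools under the RA design (design effect of 1) both come out near an MDE of 0.25 standard deviations, in line with the Design III comparison discussed below. Because (1 − R²) multiplies both variance components in this simplified formula, the required number of schools scales roughly in proportion to (1 − R²), which is the pattern visible in the R² comparisons reported below.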
Results
The key results can be summarized as follows:
- Much larger sample sizes are typically required under RD than RA designs.
Consider the most commonly used design in education-related impact studies where
equal numbers of schools are assigned to treatment or control status. Under this
design, about 114 total schools (57 treatment and 57 control) are required to yield
an MDE of 0.25 standard deviations, assuming a regression R²
value of 0.50 (Design III; Table 7.1). The corresponding
figure for the RA design is only 42 total schools (Table
7.3). Similarly, for the classroom-based Design II, the required number of
schools is 45 for the RD design (Table 7.1),
compared to only 16 for the RA design (Table 7.3).
- Because of resource constraints, school-based RD designs may only be
feasible for interventions that are likely to have relatively large effects (about
0.33 standard deviations or more). Under Design III, 66 schools (33
treatment and 33 control) are required to achieve an MDE of 0.33 standard deviations
(assuming an R² value of 0.50; Table
7.1). This number is comparable to the number of schools that are typically
included in large-scale experimental impact evaluations funded by the U.S. Department
of Education.
- A 2:1 split of the sample has a small effect on statistical power.
The required school sample sizes are similar in Tables
7.1 and 7.2. This occurs because, as discussed,
a balanced sample allocation yields larger RD design effects than an unbalanced
allocation but also yields smaller variances under the RA design; these two effects
are largely offsetting (see the numerical sketch at the end of this section).
- R² values matter. The viability of
RD designs in education research hinges critically on the availability of detailed
baseline data at the aggregate school or individual student level—and in particular,
pretest data—that can be used as covariates in the regression models to improve
R² values. For instance, for the school-based Design III, the
number of schools required to achieve an MDE of 0.33 standard deviations is 39 if
the R² value is 0.70, 66 if the R² value
is 0.50, and 131 for a zero R² value (Table 7.1).
- RD designs may be most viable in less-clustered settings where classrooms
or students are the unit of treatment assignment. For example, under
the classroom-based Design II, 45 schools are required to achieve an MDE of 0.25
standard deviations, assuming an R² value of 0.50 (Table
7.1). The comparable figure for the classroom-based Design V (with random
school effects) increases only modestly, to 53, because, as discussed, RD design
effects are smaller for this design than for Design II. For the student-level Design I,
the comparable figure is only 13 schools (Table
7.1).
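As a closing illustration of the offsetting forces behind the 2:1 result above, the sketch below compares, for a standard-normal score, the product of the allocation term 1/(p(1 − p)) and the assumed RD design effect 1/(1 − ρ²) for a centered cutoff with a balanced split and for a tertile cutoff with a 2:1 split; the functional form is the same assumption used in the earlier sketches.

```python
from scipy.stats import norm

def rd_variance_factor(cutoff_quantile):
    """Variance of the RD impact estimate, up to a common constant: the
    allocation term 1/(p(1-p)) times the assumed RD design effect
    1/(1 - rho^2) for a standard-normal score."""
    c = norm.ppf(cutoff_quantile)
    p = 1.0 - cutoff_quantile
    rho = norm.pdf(c) / (p * (1.0 - p)) ** 0.5
    design_effect = 1.0 / (1.0 - rho ** 2)
    return design_effect / (p * (1.0 - p))

balanced = rd_variance_factor(0.50)     # cutoff at the median, 1:1 split
two_to_one = rd_variance_factor(1 / 3)  # cutoff at a tertile, 2:1 split
print(round(two_to_one / balanced, 2))  # ~1.01: nearly identical precision
```

Under these assumptions, the variance of the RD impact estimate is only about 1 percent larger with the 2:1 allocation than with the balanced allocation, which is consistent with the similar school counts in Tables 7.1 and 7.2.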