Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations

NCEE 2008-4026
August 2008

Chapter 4: Aggregated Designs: RD Design Theory and Design

Theoretical Underpinnings

This paper considers both RD and RA designs in which n study units are each assigned to a single treatment or control condition (for simplicity, the comparison group under the RD design is hereafter referred to as the "control" group). The sample contains np treatment units and n(1-p) control units, where p is the sampling rate to the treatment group (0 < p < 1).

Let YTi be the "potential" outcome for unit i in the treatment condition and YCi be the potential outcome for unit i in the control condition. Potential outcomes for the n study units are assumed to be random draws from potential treatment and control outcome distributions in the study population. The means of these distributions are denoted by μT for potential treatment outcomes and μC for potential control outcomes. It is assumed further that Scorei— the variable that is used to assign units to a research status under the RD design—is a random draw from the population score distribution with mean μS and variance σS2. To consistently compare statistical power under the RD and RA designs, it is assumed that the score variable is also available for the RA design.2

The difference between the two potential outcomes, (YTi - YCi), is the unit-level treatment effect, and the average treatment effect parameter (ATE) under this "superpopulation" causal inference model is ATE = E(YT - YC) = μT - μC. The unit-level treatment effects, and hence the ATE parameter, cannot be calculated directly because, for each unit, the potential outcome is observed in either the treatment or the control condition, but not in both. Formally, if Ti is a treatment status indicator variable that equals 1 for treatments and 0 for controls, then the observed outcome for a unit, yi, can be expressed as follows:

(2)    y_i = T_i Y_Ti + (1 - T_i) Y_Ci

The simple relation in (2) forms the basis for the theory underlying both the RA and RD designs.
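
The mechanics of (2) can be illustrated with a small simulation of the superpopulation model. The sketch below is purely illustrative: the sample size, means, standard deviation, and sampling rate are hypothetical values, and Python with numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for this illustration only
n, p = 1000, 0.5                      # number of units and treatment sampling rate
mu_T, mu_C, sigma = 12.0, 10.0, 5.0   # potential-outcome means and common SD

# Potential outcomes: random draws from the treatment and control outcome distributions
Y_T = rng.normal(mu_T, sigma, n)
Y_C = rng.normal(mu_C, sigma, n)

# Treatment status indicator (random assignment, for concreteness)
T = (rng.random(n) < p).astype(int)

# Relation (2): each unit reveals only one of its two potential outcomes
y = T * Y_T + (1 - T) * Y_C

print("ATE parameter (mu_T - mu_C):        ", mu_T - mu_C)
print("Mean unit-level effect (infeasible):", (Y_T - Y_C).mean())  # never observable in practice
print("Treatment-control difference in y:  ", y[T == 1].mean() - y[T == 0].mean())
```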

In what follows, constant treatment effects are assumed within the population, which implies (1) the same variance, σ2, for the random variables YTi and YCi, and (2) the same covariance, σSY (and associated correlation, ρSY) between Scorei and YTi and YCi. These assumptions are consistent with ordinary least squares (OLS) methods that are typically used to estimate program impacts in education research, and are required to ensure that variances based on OLS methods are justified by the Neyman model of causal inference (Freedman 2008; Schochet 2007).

The RA and RD designs differ in the treatment assignment process. Under the RA design, treatment status, TiRA, is assigned randomly to study units, whereas under the RD design, treatment status, TiRD, is assigned depending on whether Scorei is larger or smaller than a cutoff value K. This paper considers RD designs with the following treatment assignment rule:

(3)    T_i^RD = 1 if Score_i ≥ K, and T_i^RD = 0 if Score_i < K

All results apply, however, if, instead, the treatment were offered to those with scores less than K. For simplicity, the same cutoff value is assumed within and across study sites.
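
A minimal sketch of the two assignment mechanisms, using hypothetical values for the sample size, sampling rate, cutoff, and score distribution, is:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 1000, 0.5, 50.0            # hypothetical sample size, sampling rate, and cutoff

score = rng.normal(50.0, 10.0, n)    # assumed score distribution (mu_S = 50, sigma_S = 10)

T_RA = (rng.random(n) < p).astype(int)   # RA design: treatment status assigned at random
T_RD = (score >= K).astype(int)          # RD design, rule (3): treated if Score_i >= K
```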

Next, the RA and RD designs are discussed in more detail. The RA design is discussed first because it provides the foundation for examining statistical power under the RD design.

The RA Design
Under the RA design, the difference in expected observed outcomes between treatments and controls can be calculated using (2) as follows:

E(y_i | T_i^RA = 1) - E(y_i | T_i^RA = 0) = E(Y_Ti | T_i^RA = 1) - E(Y_Ci | T_i^RA = 0) = μ_T - μ_C,

where the last equality holds because of random assignment (the conditional means of the potential outcomes equal the unconditional means μT and μC). Accordingly, the treatment-control difference in mean observed outcomes, (ȳ_T^RA - ȳ_C^RA), is an unbiased estimator for the ATE parameter.

This simple differences-in-means ATE estimator can also be obtained by rearranging (2) and applying OLS methods to the following regression equation:

(4)    y_i = α_0 + α_1 T_i^RA + u_i

where α_0 = μ_C and α_1 = (μ_T - μ_C). The error term u_i = T_i^RA (Y_Ti - μ_T) + (1 - T_i^RA)(Y_Ci - μ_C) has mean zero and variance σ^2 and is uncorrelated with T_i^RA.

Although not needed to produce unbiased estimates, Scorei can be included as an “irrelevant” variable in the regression equation to improve the precision of the impact estimates. The true model is still (4), but the estimation model is now:

(5)    y_i = α_0 + α_1 T_i^RA + α_2 Score_i + e_i

where ei is an error term (conditional on Scorei) with variance σe2. OLS methods yield consistent estimates of α1 in (5) because TiRA and Scorei are asymptotically uncorrelated due to random assignment (Schochet 2007; Yang and Tsiatis 2001). As discussed below, the model in (5) is used to compare the RA and RD designs.
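
The precision argument can be checked with a short simulation that fits both (4) and (5) by OLS. The data-generating values below are hypothetical, and statsmodels is assumed to be available; including Scorei leaves the impact estimate essentially unchanged but reduces its standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p, ATE = 2000, 0.5, 2.0                      # hypothetical values
score = rng.normal(50.0, 10.0, n)

T = (rng.random(n) < p).astype(int)             # random assignment
# Outcome with a constant treatment effect and an (assumed) linear score relationship
y = 10.0 + ATE * T + 0.4 * (score - 50.0) + rng.normal(0.0, 3.0, n)

m4 = sm.OLS(y, sm.add_constant(T)).fit()                             # model (4): no score
m5 = sm.OLS(y, sm.add_constant(np.column_stack([T, score]))).fit()   # model (5): score included

print("alpha_1 without score:", round(m4.params[1], 3), "SE:", round(m4.bse[1], 3))
print("alpha_1 with score:   ", round(m5.params[1], 3), "SE:", round(m5.bse[1], 3))
```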

The RD Design
Figure 4.1 displays graphically the theory underlying the RD design, where hypothetical posttest data (averaged to the unit level) are plotted against hypothetical treatment assignment scores (for example, pretest scores). The figure also displays fitted regression lines based on the observed data for treatments and controls, assuming constant treatment effects. The estimated impact under the RD design is the vertical difference between the two regression lines at the hypothetical score cutoff value of 50 (that is, at the point of discontinuity). The regression line for potential treatment group outcomes can be obtained by extending the regression line for the treatment group over the full score distribution, and similarly for potential control group outcomes. These extended regression lines pertain also to the fitted regression lines under the RA design, where units are randomly assigned across the entire score distribution.

Hahn, Todd, and Van der Klaauw (2001) formally prove that if the conditional expectations E(YTi|Scorei = S) and E(YCi|Scorei = S) are continuous in S (as in Figure 4.1), the average causal effect of the treatment at the cutoff score K can be identified by comparing average observed outcomes immediately to the right and left of K:

ATE_K = lim_{S↓K} E(y_i | Score_i = S) - lim_{S↑K} E(y_i | Score_i = S)

Using (2), this average causal effect, ATEK, can be expressed in terms of potential outcomes as follows:

(6)    ATE_K = E(Y_Ti - Y_Ci | Score_i = K)

Equation (6) suggests that impact estimates under the RD design generalize to a population that is typically narrower (units with scores right around the cutoff score value) than under the RA design (units with scores that cover the full score distribution). In our case, the ATEK parameter equals the ATE parameter because of the constant treatment effects assumption, but this equality will not necessarily hold in general. The ATEK parameter can also be interpreted as a marginal average treatment effect (MATE) parameter (Heckman and Vytlacil 2005), which addresses whether a marginal expansion of the program is warranted for units with scores just beyond the cutoff value.
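
The identification result can be mimicked numerically by comparing mean observed outcomes in a narrow window on each side of the cutoff. The sketch below uses a deliberately large hypothetical sample and an arbitrary window half-width; with realistic sample sizes such windows contain few units, which motivates the modeling approach discussed next.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, ATE = 200_000, 50.0, 2.0                # hypothetical values; n is deliberately large
score = rng.normal(50.0, 10.0, n)
T = (score >= K).astype(int)                  # RD assignment rule

# Outcomes that vary smoothly with the score (assumed linear relationship)
y = 10.0 + ATE * T + 0.4 * score + rng.normal(0.0, 3.0, n)

h = 0.25                                      # hypothetical window half-width
right = y[(score >= K) & (score < K + h)].mean()   # just above the cutoff
left  = y[(score <  K) & (score >= K - h)].mean()  # just below the cutoff
print("local mean difference near K:", round(right - left, 2))  # approaches ATE_K as h -> 0
```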

The RD-RA analogy is not exact, however, because under the RD design, chance alone may not fully determine which units are on either side of the cutoff. Furthermore, in many practical applications, there are not enough observations around the cutoff to obtain precise impact estimates. Thus, observations further from the cutoff are typically included in RD study samples (as in Figure 4.1). In these situations—which are the focus of this paper—treatment effects must be estimated using parametric or nonparametric methods where potential outcomes are modeled as a smooth function of the assignment scores. Unbiased impact estimates will result only if this outcome-score relationship is modeled correctly. Thus, unlike RA designs, RD designs hinge critically on the validity of key modeling assumptions.
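
The sensitivity to functional-form assumptions can be illustrated by generating data in which the outcome-score relationship is nonlinear (here, an assumed cubic term) and then fitting a misspecified linear-in-score model. All parameter values are hypothetical, and statsmodels is assumed to be available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, K, ATE = 5000, 50.0, 2.0                     # hypothetical values
score = rng.uniform(20.0, 80.0, n)
T = (score >= K).astype(int)                    # RD assignment rule

# Assumed data-generating process with a cubic outcome-score relationship
y = 5.0 + ATE * T + 0.10 * score + 0.0003 * (score - K) ** 3 + rng.normal(0.0, 3.0, n)

# Correctly specified model (includes the cubic term) ...
X_cubic = sm.add_constant(np.column_stack([T, score, (score - K) ** 3]))
# ... versus a misspecified linear-in-score model
X_linear = sm.add_constant(np.column_stack([T, score]))

print("true impact at the cutoff:", ATE)
print("cubic fit  alpha_1:", round(sm.OLS(y, X_cubic).fit().params[1], 2))
print("linear fit alpha_1:", round(sm.OLS(y, X_linear).fit().params[1], 2))   # biased here
```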

For the analysis, it is assumed that the true functional form relationship between potential outcomes and scores is linear:

(7a)    Y_Ti = α_0 + α_1 + α_2 Score_i + η_i
(7b)    Y_Ci = α_0 + α_2 Score_i + η_i

The same slope coefficient, α2, applies to both (7a) and (7b) because of the constant treatment effects assumption, and it is the same coefficient as in (5) for the RA design. A linear specification is adopted because it is a reasonable starting point for analyzing data from RD designs and it simplifies the variance and power calculations. Furthermore, the linear specification is consistent with the local linear regression approach (Fan and Gijbels 1996) that has become increasingly popular in the literature for analyzing data under RD designs. It is also likely to hold approximately if the score is a pretest. The exact outcome-score relationship will depend on the specific design application, but the linearity (and constant treatment effects) assumptions will likely provide a lower bound on RD design effects.
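
For concreteness, one common implementation of the local linear approach fits a weighted linear regression within a bandwidth around the cutoff, with the discontinuity captured by the treatment indicator. The sketch below uses a hypothetical bandwidth and triangular kernel weights; it is an illustration of the general technique, not the specific estimator used in this report.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, K, ATE = 5000, 50.0, 2.0                    # hypothetical values
score = rng.normal(50.0, 10.0, n)
T = (score >= K).astype(int)                   # RD assignment rule
y = 10.0 + ATE * T + 0.4 * score + rng.normal(0.0, 3.0, n)

h = 5.0                                        # hypothetical bandwidth
s = score - K                                  # score centered at the cutoff
keep = np.abs(s) <= h                          # restrict to units near the cutoff
w = 1.0 - np.abs(s[keep]) / h                  # triangular kernel weights

# Local linear fit: intercept, treatment indicator, centered score, and their interaction
X = sm.add_constant(np.column_stack([T[keep], s[keep], T[keep] * s[keep]]))
fit = sm.WLS(y[keep], X, weights=w).fit()
print("local linear estimate of the impact at K:", round(fit.params[1], 3))
```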

Using (2), equations (7a) and (7b) yield the following regression model for the RD design:

(8)    y_i = α_0 + α_1 T_i^RD + α_2 Score_i + η_i

where η_i is an error term with variance σ_η^2 and mean zero.3

In Appendix B, it is proved (for the more general multilevel models) that the OLS estimator α̂1 in (8) is a consistent estimator of the ATEK parameter (and of the ATE parameter in our case), assuming that the model is specified correctly.4 Importantly, this result holds even if Scorei is correlated with ηi (for example, due to measurement error in Scorei), because conditional on Scorei, TiRD and ηi are independent. Thus, although the estimates of α0 and α2 will be asymptotically biased if Scorei is correlated with ηi, the estimator for α1 will be asymptotically unbiased. A similar situation occurs under the RA design in (5).
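
In practice, α1 in (8) is estimated by OLS on the observed data. A minimal sketch, again with hypothetical parameter values and statsmodels assumed, is:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, K, ATE = 2000, 50.0, 2.0                    # hypothetical values
score = rng.normal(50.0, 10.0, n)              # assumed score distribution
T_RD = (score >= K).astype(int)                # assignment rule (3)

# Linear potential-outcome model (7a)/(7b) with a constant treatment effect
y = 10.0 + ATE * T_RD + 0.4 * score + rng.normal(0.0, 3.0, n)

# Model (8): y_i = alpha_0 + alpha_1 * T_i^RD + alpha_2 * Score_i + eta_i
X = sm.add_constant(np.column_stack([T_RD, score]))
fit = sm.OLS(y, X).fit()
print("alpha_1 (impact at the cutoff):", round(fit.params[1], 3), "SE:", round(fit.bse[1], 3))
```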


2 Neyman (1923) considered a finite population RA model where YTi and YCi are assumed to be fixed for the study population and where the only source of randomness is treatment status. This paper considers a “superpopulation” version of the model (see, for example, Schochet 2007). Note that a finite population version of the RD model would need to assume that Scorei is random (for example, due to measurement error).
3 It is often convenient to include (Scorei -K) in the model rather than Scorei (especially if score-by-treatment interactions are included as covariates) so that α1 always represents the treatment effect at the cutoff score. This scaling, however, has no effect on the results presented in this paper, and thus, the simpler specification in (8) is used.
4 Rubin (1977) and Griliches and Ringstad (1971) provide proofs of this result for nonclustered designs in a slightly different context (see also Cappelleri et al. 1991).