Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations

NCEE 2008-4026
August 2008

Chapter 4: Aggregated Designs: RD Design Theory and Design

Theoretical Underpinnings

This paper considers both RD and RA designs in which n study units are each assigned to a single treatment or control condition (for simplicity, the comparison group under the RD design is hereafter referred to as the "control" group). The sample contains np treatment units and n(1-p) control units, where p is the sampling rate to the treatment group (0 < p < 1).

Let YTi be the "potential" outcome for unit i in the treatment condition and YCi be the potential outcome for unit i in the control condition. Potential outcomes for the n study units are assumed to be random draws from potential treatment and control outcome distributions in the study population. The means of these distributions are denoted by μT for potential treatment outcomes and μC for potential control outcomes. It is assumed further that Scorei— the variable that is used to assign units to a research status under the RD design—is a random draw from the population score distribution with mean μS and variance σS2. To consistently compare statistical power under the RD and RA designs, it is assumed that the score variable is also available for the RA design.2

The difference between the two potential outcomes, (YTi - YCi), is the unit-level treatment effect, and the average treatment effect parameter (ATE) under this "superpopulation" causal inference model is ATE = E(YT - YC) = μT - μC. The unit-level treatment effects, and hence the ATE parameter, cannot be calculated directly because, for each unit, the potential outcome is observed in either the treatment or the control condition, but not in both. Formally, if Ti is a treatment status indicator variable that equals 1 for treatments and 0 for controls, then the observed outcome for a unit, yi, can be expressed as follows:

(2)    y_i = T_i Y_Ti + (1 - T_i) Y_Ci

The simple relation in (2) forms the basis for the theory underlying both the RA and RD designs.
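
The mechanics of (2) can be illustrated with a small simulation of the superpopulation model. The sketch below is purely illustrative: the sample size, means, standard deviation, and sampling rate are hypothetical values, and Python with numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for this illustration only
n, p = 1000, 0.5                      # number of units and treatment sampling rate
mu_T, mu_C, sigma = 12.0, 10.0, 5.0   # potential-outcome means and common SD

# Potential outcomes: random draws from the treatment and control outcome distributions
Y_T = rng.normal(mu_T, sigma, n)
Y_C = rng.normal(mu_C, sigma, n)

# Treatment status indicator (random assignment, for concreteness)
T = (rng.random(n) < p).astype(int)

# Relation (2): each unit reveals only one of its two potential outcomes
y = T * Y_T + (1 - T) * Y_C

print("ATE parameter (mu_T - mu_C):        ", mu_T - mu_C)
print("Mean unit-level effect (infeasible):", (Y_T - Y_C).mean())  # never observable in practice
print("Treatment-control difference in y:  ", y[T == 1].mean() - y[T == 0].mean())
```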

In what follows, constant treatment effects are assumed within the population, which implies (1) the same variance, σ2, for the random variables YTi and YCi, and (2) the same covariance, σSY (and associated correlation, ρSY) between Scorei and YTi and YCi. These assumptions are consistent with ordinary least squares (OLS) methods that are typically used to estimate program impacts in education research, and are required to ensure that variances based on OLS methods are justified by the Neyman model of causal inference (Freedman 2008; Schochet 2007).

The RA and RD designs differ in the treatment assignment process. Under the RA design, treatment status, TiRA, is assigned randomly to study units, whereas under the RD design, treatment status, TiRD, is assigned depending on whether Scorei is larger or smaller than a cutoff value K. This paper considers RD designs with the following treatment assignment rule:

(3)    T_i^RD = 1 if Score_i ≥ K, and T_i^RD = 0 if Score_i < K

All results apply, however, if, instead, the treatment were offered to those with scores less than K. For simplicity, the same cutoff value is assumed within and across study sites.
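
A minimal sketch of the two assignment mechanisms, using hypothetical values for the sample size, sampling rate, cutoff, and score distribution, is:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 1000, 0.5, 50.0            # hypothetical sample size, sampling rate, and cutoff

score = rng.normal(50.0, 10.0, n)    # assumed score distribution (mu_S = 50, sigma_S = 10)

T_RA = (rng.random(n) < p).astype(int)   # RA design: treatment status assigned at random
T_RD = (score >= K).astype(int)          # RD design, rule (3): treated if Score_i >= K
```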

Next, the RA and RD designs are discussed in more detail. The RA design is discussed first because it provides the foundation for examining statistical power under the RD design.

The RA Design
Under the RA design, the difference in expected observed outcomes between treatments and controls can be calculated using (2) as follows:

E(y_i | T_i^RA = 1) - E(y_i | T_i^RA = 0) = E(Y_Ti | T_i^RA = 1) - E(Y_Ci | T_i^RA = 0) = μ_T - μ_C,

where the last equality holds because of random assignment (the conditional means of the potential outcomes equal the unconditional means μT and μC). Accordingly, the treatment-control difference in mean observed outcomes, (ȳ_T^RA - ȳ_C^RA), is an unbiased estimator for the ATE parameter.

This simple differences-in-means ATE estimator can also be obtained by rearranging (2) and applying OLS methods to the following regression equation:

(4)    y_i = α_0 + α_1 T_i^RA + u_i

where α_0 = μ_C and α_1 = (μ_T - μ_C). The error term u_i = T_i^RA (Y_Ti - μ_T) + (1 - T_i^RA)(Y_Ci - μ_C) has mean zero and variance σ^2 and is uncorrelated with T_i^RA.

Although not needed to produce unbiased estimates, Scorei can be included as an “irrelevant” variable in the regression equation to improve the precision of the impact estimates. The true model is still (4), but the estimation model is now:

(5)    y_i = α_0 + α_1 T_i^RA + α_2 Score_i + e_i

where ei is an error term (conditional on Scorei) with variance σe2. OLS methods yield consistent estimates of α1 in (5) because TiRA and Scorei are asymptotically uncorrelated due to random assignment (Schochet 2007; Yang and Tsiatis 2001). As discussed below, the model in (5) is used to compare the RA and RD designs.
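
The precision argument can be checked with a short simulation that fits both (4) and (5) by OLS. The data-generating values below are hypothetical, and statsmodels is assumed to be available; including Scorei leaves the impact estimate essentially unchanged but reduces its standard error.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p, ATE = 2000, 0.5, 2.0                      # hypothetical values
score = rng.normal(50.0, 10.0, n)

T = (rng.random(n) < p).astype(int)             # random assignment
# Outcome with a constant treatment effect and an (assumed) linear score relationship
y = 10.0 + ATE * T + 0.4 * (score - 50.0) + rng.normal(0.0, 3.0, n)

m4 = sm.OLS(y, sm.add_constant(T)).fit()                             # model (4): no score
m5 = sm.OLS(y, sm.add_constant(np.column_stack([T, score]))).fit()   # model (5): score included

print("alpha_1 without score:", round(m4.params[1], 3), "SE:", round(m4.bse[1], 3))
print("alpha_1 with score:   ", round(m5.params[1], 3), "SE:", round(m5.bse[1], 3))
```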

The RD Design
Figure 4.1 displays graphically the theory underlying the RD design, where hypothetical posttest data (averaged to the unit level) are plotted against hypothetical treatment assignment scores (for example, pretest scores). The figure also displays fitted regression lines based on the observed data for treatments and controls, assuming constant treatment effects. The estimated impact under the RD design is the vertical difference between the two regression lines at the hypothetical score cutoff value of 50 (that is, at the point of discontinuity). The regression line for potential treatment group outcomes can be obtained by extending the regression line for the treatment group over the full score distribution, and similarly for potential control group outcomes. These extended regression lines pertain also to the fitted regression lines under the RA design, where units are randomly assigned across the entire score distribution.

Hahn, Todd, and Van der Klaauw (2001) formally prove that if the conditional expectations E(YTi|Scorei = S) and E(YCi|Scorei = S) are continuous in S (as in Figure 4.1), the average causal effect of the treatment at the cutoff score K can be identified by comparing average observed outcomes immediately to the right and left of K:

ATE_K = lim_{S↓K} E(y_i | Score_i = S) - lim_{S↑K} E(y_i | Score_i = S)

Using (2), this average causal effect, ATEK, can be expressed in terms of potential outcomes as follows:

(6)    ATE_K = E(Y_Ti - Y_Ci | Score_i = K)

Equation (6) suggests that impact estimates under the RD design generalize to a population that is typically narrower (units with scores right around the cutoff score value) than under the RA design (units with scores that cover the full score distribution). In our case, the ATEK parameter equals the ATE parameter because of the constant treatment effects assumption, but this equality will not necessarily hold in general. The ATEK parameter can also be interpreted as a marginal average treatment effect (MATE) parameter (Heckman and Vytlacil 2005), which addresses whether a marginal expansion of the program is warranted for units with scores just beyond the cutoff value.
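
The identification result can be mimicked numerically by comparing mean observed outcomes in a narrow window on each side of the cutoff. The sketch below uses a deliberately large hypothetical sample and an arbitrary window half-width; with realistic sample sizes such windows contain few units, which motivates the modeling approach discussed next.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, ATE = 200_000, 50.0, 2.0                # hypothetical values; n is deliberately large
score = rng.normal(50.0, 10.0, n)
T = (score >= K).astype(int)                  # RD assignment rule

# Outcomes that vary smoothly with the score (assumed linear relationship)
y = 10.0 + ATE * T + 0.4 * score + rng.normal(0.0, 3.0, n)

h = 0.25                                      # hypothetical window half-width
right = y[(score >= K) & (score < K + h)].mean()   # just above the cutoff
left  = y[(score <  K) & (score >= K - h)].mean()  # just below the cutoff
print("local mean difference near K:", round(right - left, 2))  # approaches ATE_K as h -> 0
```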

The RD-RA analogy is not exact, however, because under the RD design, chance alone may not fully determine which units are on either side of the cutoff. Furthermore, in many practical applications, there are not enough observations around the cutoff to obtain precise impact estimates. Thus, observations further from the cutoff are typically included in RD study samples (as in Figure 4.1). In these situations—which are the focus of this paper—treatment effects must be estimated using parametric or nonparametric methods where potential outcomes are modeled as a smooth function of the assignment scores. Unbiased impact estimates will result only if this outcome-score relationship is modeled correctly. Thus, unlike RA designs, RD designs hinge critically on the validity of key modeling assumptions.
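
The sensitivity to functional-form assumptions can be illustrated by generating data in which the outcome-score relationship is nonlinear (here, an assumed cubic term) and then fitting a misspecified linear-in-score model. All parameter values are hypothetical, and statsmodels is assumed to be available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, K, ATE = 5000, 50.0, 2.0                     # hypothetical values
score = rng.uniform(20.0, 80.0, n)
T = (score >= K).astype(int)                    # RD assignment rule

# Assumed data-generating process with a cubic outcome-score relationship
y = 5.0 + ATE * T + 0.10 * score + 0.0003 * (score - K) ** 3 + rng.normal(0.0, 3.0, n)

# Correctly specified model (includes the cubic term) ...
X_cubic = sm.add_constant(np.column_stack([T, score, (score - K) ** 3]))
# ... versus a misspecified linear-in-score model
X_linear = sm.add_constant(np.column_stack([T, score]))

print("true impact at the cutoff:", ATE)
print("cubic fit  alpha_1:", round(sm.OLS(y, X_cubic).fit().params[1], 2))
print("linear fit alpha_1:", round(sm.OLS(y, X_linear).fit().params[1], 2))   # biased here
```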

For the analysis, it is assumed that the true functional form relationship between potential outcomes and scores is linear:

(7a)    Y_Ti = α_0 + α_1 + α_2 Score_i + η_i
(7b)    Y_Ci = α_0 + α_2 Score_i + η_i

The same slope coefficient, α2, applies to both (7a) and (7b) because of the constant treatment effects assumption, and it is the same coefficient as in (5) for the RA design. A linear specification is adopted because it is a reasonable starting point for analyzing data from RD designs and it simplifies the variance and power calculations. Furthermore, the linear specification is consistent with the local linear regression approach (Fan and Gijbels 1996) that has become increasingly popular in the literature for analyzing data under RD designs. It is also likely to hold approximately if the score is a pretest. The exact outcome-score relationship will depend on the specific design application, but the linearity (and constant treatment effects) assumptions will likely provide a lower bound on RD design effects.
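
For concreteness, one common implementation of the local linear approach fits a weighted linear regression within a bandwidth around the cutoff, with the discontinuity captured by the treatment indicator. The sketch below uses a hypothetical bandwidth and triangular kernel weights; it is an illustration of the general technique, not the specific estimator used in this report.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, K, ATE = 5000, 50.0, 2.0                    # hypothetical values
score = rng.normal(50.0, 10.0, n)
T = (score >= K).astype(int)                   # RD assignment rule
y = 10.0 + ATE * T + 0.4 * score + rng.normal(0.0, 3.0, n)

h = 5.0                                        # hypothetical bandwidth
s = score - K                                  # score centered at the cutoff
keep = np.abs(s) <= h                          # restrict to units near the cutoff
w = 1.0 - np.abs(s[keep]) / h                  # triangular kernel weights

# Local linear fit: intercept, treatment indicator, centered score, and their interaction
X = sm.add_constant(np.column_stack([T[keep], s[keep], T[keep] * s[keep]]))
fit = sm.WLS(y[keep], X, weights=w).fit()
print("local linear estimate of the impact at K:", round(fit.params[1], 3))
```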

Using (2), equations (7a) and (7b) yield the following regression model for the RD design:

(8)    y_i = α_0 + α_1 T_i^RD + α_2 Score_i + η_i

where η_i is an error term with variance σ_η^2 and mean zero.3

In Appendix B, it is proved (for the more general multilevel models) that the OLS estimator α̂1 in (8) is a consistent estimator of the ATEK parameter (and of the ATE parameter in our case), assuming that the model is specified correctly.4 Importantly, this result holds even if Scorei is correlated with ηi (for example, due to measurement error in Scorei), because conditional on Scorei, TiRD and ηi are independent. Thus, although the estimates of α0 and α2 will be asymptotically biased if Scorei is correlated with ηi, the estimator for α1 will be asymptotically unbiased. A similar situation occurs under the RA design in (5).
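
In practice, α1 in (8) is estimated by OLS on the observed data. A minimal sketch, again with hypothetical parameter values and statsmodels assumed, is:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, K, ATE = 2000, 50.0, 2.0                    # hypothetical values
score = rng.normal(50.0, 10.0, n)              # assumed score distribution
T_RD = (score >= K).astype(int)                # assignment rule (3)

# Linear potential-outcome model (7a)/(7b) with a constant treatment effect
y = 10.0 + ATE * T_RD + 0.4 * score + rng.normal(0.0, 3.0, n)

# Model (8): y_i = alpha_0 + alpha_1 * T_i^RD + alpha_2 * Score_i + eta_i
X = sm.add_constant(np.column_stack([T_RD, score]))
fit = sm.OLS(y, X).fit()
print("alpha_1 (impact at the cutoff):", round(fit.params[1], 3), "SE:", round(fit.bse[1], 3))
```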


2 Neyman (1923) considered a finite population RA model where YTi and YCi are assumed to be fixed for the study population and where the only source of randomness is treatment status. This paper considers a “superpopulation” version of the model (see, for example, Schochet 2007). Note that a finite population version of the RD model would need to assume that Scorei is random (for example, due to measurement error).
3 It is often convenient to include (Scorei -K) in the model rather than Scorei (especially if score-by-treatment interactions are included as covariates) so that α1 always represents the treatment effect at the cutoff score. This scaling, however, has no effect on the results presented in this paper, and thus, the simpler specification in (8) is used.
4 Rubin (1977) and Griliches and Ringstad (1971) provide proofs of this result for nonclustered designs in a slightly different context (see also Cappelleri et al. 1991).