- Chapter 1: Introduction
- Chapter 2: Measuring Statistical Power
- Chapter 3: Considered Designs
- Chapter 4: Aggregated Designs: RD Design Theory and Design
- Chapter 5: Multilevel RD Designs
- Chapter 6: Selecting the Score Range for the Sample
- Chapter 7: Illustrative Precision Calculations
- Chapter 8: Summary and Conclusions
- References
- List of Tables
- List of Figures
- Appendix A
- Appendix B

This paper considers both RD and RA designs in which *n* study units are assigned to either a single treatment condition or a control condition (for simplicity, the comparison group under the RD design is hereafter referred to as the "control" group). The sample contains *np* treatment units and *n(1 - p)* control units, where *p* is the rate of assignment to the treatment group (0 < p < 1).

Let *Y_{Ti}* be the "potential" outcome for unit i in the treatment condition and *Y_{Ci}* be the potential outcome for unit i in the control condition. The observed outcome, *y_{i}*, can then be written as:

y_{i} = T_{i}Y_{Ti} + (1 - T_{i})Y_{Ci},   (1)

where *T_{i}* is an indicator that equals 1 if unit i is assigned to the treatment group and 0 otherwise.

The difference between the two potential outcomes, (*Y_{Ti}* - *Y_{Ci}*), is the causal effect of the treatment for unit i. Rearranging (1) yields:

y_{i} = Y_{Ci} + (Y_{Ti} - Y_{Ci})T_{i}.   (2)

The simple relation in (2) forms the basis for the theory underlying both the RA and RD designs.
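The switching relation in (2) can be checked directly with a short simulation (a minimal sketch; the sample size, effect size, and variable names are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Potential outcomes for each unit: Y_Ci ~ N(0, 1), constant effect of 0.5
y_c = rng.normal(0.0, 1.0, n)
y_t = y_c + 0.5

# Treatment indicator; here assignment is random with p = 0.5
t = rng.binomial(1, 0.5, n)

# Relation (2): the observed outcome switches between the two potentials
y_obs = y_c + (y_t - y_c) * t

# The difference in observed group means recovers the constant effect
diff = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print(diff)
```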

In what follows, constant treatment effects are assumed within the population, which implies (1) the same variance, σ^{2}, for the random variables *Y_{Ti}* and *Y_{Ci}*, and (2) that the unit-level treatment effect equals the average treatment effect, *ATE* = (μ_{T} - μ_{C}), where μ_{T} = *E*(*Y_{Ti}*) and μ_{C} = *E*(*Y_{Ci}*).

The RA and RD designs differ in the treatment assignment process. Under the RA design, treatment status, *T_{i}^{RA}*, is assigned randomly to study units, whereas under the RD design, treatment status, *T_{i}^{RD}*, is determined by whether a unit's value on the assignment variable, *Score_{i}*, equals or exceeds a cutoff value *K*: *T_{i}^{RD}* = 1 if *Score_{i}* ≥ *K* and 0 otherwise.

All results apply, however, if, instead, the treatment were offered to those with scores less than *K*. For simplicity, the same cutoff value is assumed within and across study sites.

Next, the RA and RD designs are discussed in more detail. The RA design is discussed first because it provides the foundation for examining statistical power under the RD design.

**The RA Design**

Under the RA design, the difference in expected observed outcomes between treatments and controls can be calculated using (2) as follows:

*E*(y_{i} | T_{i}^{RA} = 1) - *E*(y_{i} | T_{i}^{RA} = 0) = (μ_{T} - μ_{C}) + [*E*(Y_{Ci} | T_{i}^{RA} = 1) - *E*(Y_{Ci} | T_{i}^{RA} = 0)] = μ_{T} - μ_{C},   (3)

where the last equality holds because of random assignment (the bracketed selection term is zero). Accordingly, (Ȳ_{T}^{RA} - Ȳ_{C}^{RA}), the difference in mean observed outcomes between treatment and control units, is an unbiased estimator of the *ATE* parameter.

This simple differences-in-means *ATE* estimator can also be obtained by rearranging (2) and applying OLS methods to the following regression equation:

y_{i} = α_{0} + α_{1}T_{i}^{RA} + u_{i},   (4)

where α_{0} = μ_{C} and α_{1} = (μ_{T} - μ_{C}). The error term *u_{i}* = (Y_{Ci} - μ_{C}) has mean zero and variance σ^{2}, and is independent of treatment status because of random assignment.

Although not needed to produce unbiased estimates, *Score_{i}* can be included as an "irrelevant" variable in the regression equation to improve the precision of the impact estimates. The true model is still (4), but the estimation model is now:

y_{i} = α_{0} + α_{1}T_{i}^{RA} + α_{2}Score_{i} + e_{i},   (5)

where *e_{i}* is an error term (conditional on *Score_{i}*) with mean zero and variance σ_{e}^{2} ≤ σ^{2}; the variance reduction is larger the more strongly scores predict outcomes.
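The precision gain from adding the score to the model can be verified numerically. This is a rough sketch with made-up parameter values; the helper computes conventional OLS standard errors from first principles:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(1)
n = 5_000
score = rng.normal(50.0, 10.0, n)           # e.g., a pretest score
t = rng.binomial(1, 0.5, n).astype(float)   # random assignment
y = 10.0 + 2.0 * t + 0.8 * score + rng.normal(0.0, 5.0, n)

ones = np.ones(n)
b4, se4 = ols(np.column_stack([ones, t]), y)          # short model, as in (4)
b5, se5 = ols(np.column_stack([ones, t, score]), y)   # score added, as in (5)

# Both models estimate the same impact, but the second has a smaller SE
print(b4[1], se4[1], b5[1], se5[1])
```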

**The RD Design**

Figure 4.1 displays graphically the theory underlying
the RD design, where hypothetical posttest data (averaged to the unit level) are
plotted against hypothetical treatment assignment scores (for example, pretest scores).
The figure also displays fitted regression lines based on the observed data for
treatments and controls, assuming constant treatment effects. The estimated impact
under the RD design is the vertical difference between the two regression lines
at the hypothetical score cutoff value of 50 (that is, at the point of discontinuity).
The regression line for *potential* treatment group outcomes can be obtained
by extending the regression line for the treatment group over the full score distribution,
and similarly for potential control group outcomes. These extended regression lines correspond to the fitted regression lines under the RA design, where units are randomly assigned across the entire score distribution.

Hahn, Todd, and Van der Klaauw (2001) formally prove that if the conditional expectations *E*(*Y_{Ti}* | *Score_{i}* = *s*) and *E*(*Y_{Ci}* | *Score_{i}* = *s*) are continuous in *s* at the cutoff *K*, then the RD design identifies the average causal effect of the treatment for units with scores at the cutoff.

Using (2), this average causal effect, *ATE_{K}*, can be expressed in terms of potential outcomes as follows:

*ATE_{K}* = *E*(Y_{Ti} - Y_{Ci} | Score_{i} = K).   (6)

Equation (6) suggests that impact estimates under the RD design generalize to a population that is typically narrower (units with scores right around the cutoff score value) than under the RA design (units with scores that cover the *full* score distribution). In our case, the *ATE_{K}* parameter equals the *ATE* parameter because of the constant treatment effects assumption.

The RD-RA analogy is not exact, however, because under the RD design, chance alone may not fully determine which units are on either side of the cutoff. Furthermore, in many practical applications, there are not enough observations around the cutoff to obtain precise impact estimates. Thus, observations further from the cutoff are typically included in RD study samples (as in Figure 4.1). In these situations—which are the focus of this paper—treatment effects must be estimated using parametric or nonparametric methods where potential outcomes are modeled as a smooth function of the assignment scores. Unbiased impact estimates will result only if this outcome-score relationship is modeled correctly. Thus, unlike RA designs, RD designs hinge critically on the validity of key modeling assumptions.
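The functional form risk can be illustrated with a small simulation (all numbers are invented for illustration): the true outcome-score relationship is quadratic and the true treatment effect is zero, yet a linear RD model reports a sizable spurious "impact"; adding the correct quadratic term removes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
K = 70.0  # hypothetical cutoff

score = rng.uniform(0.0, 100.0, n)
t = (score >= K).astype(float)         # deterministic RD assignment
# True model: quadratic in the score, with a TRUE treatment effect of zero
y = 0.002 * (score - K) ** 2 + rng.normal(0.0, 1.0, n)

def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)
b_lin = fit(np.column_stack([ones, t, score]), y)
b_quad = fit(np.column_stack([ones, t, score, (score - K) ** 2]), y)

# The misspecified linear model attributes curvature to the treatment dummy;
# the correctly specified model estimates an effect near zero
print(b_lin[1], b_quad[1])
```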

For the analysis, it is assumed that the true functional form relationship between potential outcomes and scores is linear:

Y_{Ci} = α_{0} + α_{2}Score_{i} + η_{i},   (7a)
Y_{Ti} = (α_{0} + α_{1}) + α_{2}Score_{i} + η_{i}.   (7b)
The same slope coefficient, α_{2}, applies to both (7a) and (7b) because of the constant treatment effects assumption, and is the same coefficient as in (5) for the RA design. A linear specification is adopted because it is a reasonable starting point for an analysis of data from RD designs and simplifies the variance and power calculations. Furthermore, the linear specification is consistent with the local linear regression approach (Fan and Gijbels 1996) that has become increasingly popular in the literature for analyzing data under RD designs, and it is likely to hold approximately if the score is a pretest. The exact outcome-score relationship will depend on the specific design application, but the linearity (and constant treatment effects) assumptions will likely provide a lower bound on RD design effects.

Using (2), equations (7a) and (7b) yield the following regression model for the RD design:

y_{i} = α_{0} + α_{1}T_{i}^{RD} + α_{2}Score_{i} + η_{i},   (8)

where η_{i} is a mean zero error term with variance σ_{η}^{2}.^{3}
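When the linear model is correct, OLS on (8) recovers the effect at the cutoff even though assignment is deterministic. A minimal sketch (parameter values invented; the cutoff of 50 echoes Figure 4.1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
K = 50.0  # cutoff, as in Figure 4.1

score = rng.uniform(0.0, 100.0, n)
t_rd = (score >= K).astype(float)     # RD assignment rule
# Linear potential-outcome model (7a)-(7b) with constant effect alpha_1 = 3
y = 5.0 + 3.0 * t_rd + 0.4 * score + rng.normal(0.0, 4.0, n)

# OLS on model (8): intercept, treatment dummy, and score
X = np.column_stack([np.ones(n), t_rd, score])
alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha_hat[1])  # estimate of ATE_K
```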

In Appendix B, it is proved (for the more general multilevel models) that the OLS estimator α̂_{1} in (8) yields a consistent estimator of the *ATE_{K}* parameter (and hence of the *ATE* parameter, given the constant treatment effects assumption).