Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations - Chapter 6: Selecting the Score Range for the Sample

NCEE 2008-4026
August 2008

Chapter 6: Selecting the Score Range for the Sample

A central issue in designing RD studies is choosing the range of scores from which to select the sample (that is, the score bandwidth around the cutoff score). In some evaluations where the sample universe is small, a large proportion of study-eligible units (with a wide range of scores) must be included in the sample for power reasons, and the estimation of impacts must rely heavily on modeling assumptions. This was the case, for example, in the Early Reading First (Jackson et al. 2007) and Reading First (Bloom et al. 2005b) evaluations, where the sample universe consisted of a relatively small number of grantees who applied for program funds. In other designs, however, there may be many potential study units available, but only a subsample can be included in the study for cost reasons. In these cases, how should study units be sampled?

The key advantage of selecting a narrow bandwidth around the cutoff score is that this approach will likely yield impact estimates with little bias, because the correct posttest-score relationship can usually be specified (and is likely to be approximately linear). Increasing the bandwidth could increase bias if the posttest-score relationship varies across different regions of the score distribution, thereby making the modeling more difficult.
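
The bias argument can be illustrated with a small numerical sketch. The example below is hypothetical and not taken from the report: it assumes a noise-free cubic posttest-score relationship, a true impact of 0.25, treatment assigned to units at or above a cutoff of zero, and separate linear fits on each side of the cutoff. The names `fit_line` and `rd_impact` are introduced here for illustration only. Under these assumptions, the linear specification is nearly correct close to the cutoff and increasingly wrong as the bandwidth widens.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_impact(scores, outcomes, bandwidth, cutoff=0.0):
    """Difference in fitted intercepts at the cutoff, using separate linear
    fits on each side and only units within `bandwidth` of the cutoff."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical, noise-free data (an assumption for illustration):
# units at or above the cutoff are treated, and the outcome is cubic in the score.
TRUE_IMPACT = 0.25
scores = [i / 100 for i in range(-200, 201)]
outcomes = [s ** 3 + (TRUE_IMPACT if s >= 0 else 0.0) for s in scores]

# Bias of the linear RD estimate for progressively wider bandwidths.
for h in (0.25, 0.5, 1.0, 2.0):
    bias = rd_impact(scores, outcomes, h) - TRUE_IMPACT
    print(f"bandwidth={h:4.2f}  bias={bias:+.4f}")
```

In this stylized setting the absolute bias grows with the bandwidth because a straight line is a progressively poorer approximation to the curved relationship. With real data, sampling noise would be present as well, so the same exercise would show the variance side of the tradeoff too.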

There are, however, three main disadvantages of using a narrower rather than a wider bandwidth. First, for a given sample size, a narrower bandwidth could yield less precise impact estimates if the outcome-score relationship can be correctly modeled using a wider range of scores. For instance, as discussed above, if scores have a truncated normal distribution and p=0.50, the RD design effect tends to decrease as the bandwidth increases (although this pattern does not hold for all score distributions and cutoff values). Second, in instances where there is a limited sample around the cutoff score, widening the bandwidth could yield larger samples, thereby increasing statistical precision.
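
The truncated normal case can be sketched numerically. The code below assumes that the RD design effect takes the form 1 / (1 - rho_TS^2), where rho_TS is the correlation between treatment status and the score; that formula is defined elsewhere in the report rather than in this chapter, so it is treated here as an assumption. It further assumes a standard normal score truncated symmetrically at plus or minus `half_width` standard deviations, with the cutoff at the center so that p=0.50. The function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def rd_design_effect(half_width):
    """Assumed design effect 1 / (1 - rho_TS^2) for a standard normal score
    truncated to [-half_width, half_width], with the cutoff at 0 (p = 0.50)."""
    b = half_width
    mass = math.erf(b / math.sqrt(2.0))                 # P(-b <= Z <= b)
    # E[S] = 0 by symmetry, so Cov(T, S) = E[S * T] = integral of s*pdf(s) over [0, b].
    cov_ts = (normal_pdf(0.0) - normal_pdf(b)) / mass
    var_s = 1.0 - 2.0 * b * normal_pdf(b) / mass        # variance of the truncated score
    var_t = 0.25                                        # p * (1 - p) with p = 0.50
    rho_sq = cov_ts ** 2 / (var_t * var_s)
    return 1.0 / (1.0 - rho_sq)


# Design effect for progressively wider bandwidths (in standard deviation units).
for b in (0.5, 1.0, 2.0, 3.0):
    print(f"half-width={b:3.1f} SD  design effect={rd_design_effect(b):.3f}")
```

Under these assumptions, a very narrow bandwidth makes the score distribution nearly uniform, and the design effect approaches 4; as the bandwidth widens toward the full normal distribution, it falls toward about 2.75.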

A third disadvantage of using a narrow bandwidth is that the study will have less basis for extrapolating impact findings to units with scores farther from the cutoff. In theory, impact findings from the RD design generalize only to units with scores near the cutoff value. However, the estimated parametric regression lines for the treatment and control groups could be extended to obtain impact estimates for units over a wider score range (see Figure 4.1). These extrapolations are likely to be more defensible if the bandwidth is wider rather than narrower (that is, if the sample contains units that cover a broad range of scores).

The choice of the appropriate bandwidth could involve a tradeoff between variance and bias. Methods have been developed for assessing the optimal score bandwidth after data have been collected. For example, Ludwig and Miller (2007) propose a cross-validation criterion that selects the bandwidth to minimize the average squared distance between actual outcome values and predicted values from the fitted regression lines. A variant of this approach is to estimate weighted regressions where kernel functions are used to assign larger weights to data points closer to the cutoff than to those farther from the cutoff (Porter 2003).
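
A cross-validation criterion of this general kind can be sketched as follows. This is a simplified illustration, not necessarily the exact procedure in Ludwig and Miller (2007) or Porter (2003). It assumes that each unit's outcome is predicted from a linear fit to units on the same side of the cutoff whose scores lie within the bandwidth and farther from the cutoff than the unit itself, which mimics prediction at a boundary. The optional triangular kernel is one possible weighting choice. All function names, parameters, and the simulated data are hypothetical.

```python
import random


def weighted_line_predict(xs, ys, ws, x0):
    """Predict the outcome at x0 from a weighted least-squares line."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        return mean_y  # no usable spread in scores; fall back to the weighted mean
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    return mean_y + (sxy / sxx) * (x0 - mean_x)


def cv_criterion(scores, outcomes, h, cutoff=0.0, kernel="uniform", min_points=3):
    """Average squared prediction error for bandwidth h (smaller is better)."""
    data = list(zip(scores, outcomes))
    squared_errors = []
    for s_i, y_i in data:
        # Use only units on the same side of the cutoff, within h of unit i,
        # and on the side of unit i that faces away from the cutoff.
        if s_i < cutoff:
            window = [(s, y) for s, y in data if s_i - h <= s < s_i]
        else:
            window = [(s, y) for s, y in data if s_i < s <= s_i + h]
        if len(window) < min_points:
            continue  # too few units to fit a line
        xs, ys = zip(*window)
        if kernel == "triangular":
            # Larger weights for units closer to the prediction point.
            ws = [max(0.0, 1.0 - abs(x - s_i) / h) for x in xs]
        else:
            ws = [1.0] * len(xs)
        if sum(ws) <= 0.0:
            continue
        prediction = weighted_line_predict(xs, ys, ws, s_i)
        squared_errors.append((y_i - prediction) ** 2)
    if not squared_errors:
        return float("inf")
    return sum(squared_errors) / len(squared_errors)


def select_bandwidth(scores, outcomes, candidates, **kwargs):
    """Return the candidate bandwidth with the smallest criterion value."""
    return min(candidates, key=lambda h: cv_criterion(scores, outcomes, h, **kwargs))


# Hypothetical simulated data: curved outcome-score relationship plus noise.
random.seed(0)
scores = [random.uniform(-2.0, 2.0) for _ in range(300)]
outcomes = [1.0 + 0.5 * s + 0.3 * s ** 3 + (0.25 if s >= 0 else 0.0)
            + random.gauss(0.0, 0.5) for s in scores]

candidates = [0.25, 0.5, 1.0, 2.0]
chosen = select_bandwidth(scores, outcomes, candidates)
chosen_kernel = select_bandwidth(scores, outcomes, candidates, kernel="triangular")
print("selected bandwidth (uniform weights):", chosen)
print("selected bandwidth (triangular kernel):", chosen_kernel)
```

The windows never cross the cutoff, so the criterion is not contaminated by the discontinuity itself. In practice, the criterion is often computed only for units reasonably close to the cutoff; that restriction is omitted here to keep the sketch short.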

These same approaches could be used to select the appropriate bandwidth in the design phase of RD studies if pertinent secondary data are available for analysis. In these cases, criteria for selecting the bandwidth should include the goodness-of-fit statistics based on the cross-validation models, available bandwidth sample sizes, and external validity considerations.
