Technical Methods Report: Statistical Power for Regression Discontinuity Designs in Education Evaluations

NCEE 2008-4026
August 2008

Chapter 2: Measuring Statistical Power

An important part of any evaluation design is the statistical power analysis, which demonstrates how well the design of the study will be able to distinguish real impacts from chance differences. To determine appropriate sample sizes for impact evaluations, researchers typically calculate minimum detectable impacts, which represent the smallest program impacts—average treatment and comparison group differences—that can be detected with a high probability. In addition, it is common to standardize minimum detectable impacts into effect size units—that is, as a percentage of the standard deviation of the outcome measures (also known as Cohen's d)—to facilitate the comparison of findings across outcomes that are measured on different scales (Cohen 1988). Hereafter, minimum detectable impacts in effect size units are denoted as "MDEs."

Mathematically, the MDE formula can be expressed as follows:

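$$\mathrm{MDE} = \mathrm{Factor}(\alpha, \beta, df) \cdot \frac{\sqrt{\mathrm{Var}(\text{impact})}}{\sigma} \qquad (1)$$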

where Var(impact) is the variance of the impact estimate, σ is the standard deviation of the outcome measure, and Factor(.) is a constant that is a function of the significance level (α), statistical power (β), and the number of degrees of freedom.¹ Factor(.) becomes larger as α and df decrease and as β increases (see Table A.1).
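As a rough illustration of these definitions, the sketch below computes Factor(.) and the MDE in Python, assuming the MDE takes the form Factor(.) × sqrt(Var(impact)) / σ implied by the definitions above. It is not the report's own code. To stay within the standard library it uses the large-df (normal) approximation to the t distribution and takes upper-tail critical values. For small df, a t quantile function (for example `scipy.stats.t.ppf`) would replace `inv_cdf`. The function names and the numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde_factor(alpha=0.05, power=0.80, two_tailed=True):
    """Approximate Factor(alpha, beta, df) for large df.

    Assumption: the t distribution is replaced by the standard normal,
    which is reasonable only when df is large.
    """
    z = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    # Critical value for the test plus the quantile associated with power.
    return z.inv_cdf(1 - tail) + z.inv_cdf(power)


def mde(var_impact, sigma, alpha=0.05, power=0.80, two_tailed=True):
    """Minimum detectable impact in effect size (standard deviation) units."""
    return mde_factor(alpha, power, two_tailed) * sqrt(var_impact) / sigma


# Hypothetical inputs: impact variance of 0.01 and an outcome SD of 1.
example_mde = mde(var_impact=0.01, sigma=1.0)
print(round(mde_factor(), 3), round(example_mde, 3))
```

Under these assumptions the two-tailed factor is about 2.80 and the one-tailed factor about 2.49. Both grow as α shrinks or as the required power rises, consistent with the behavior described above.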

As an example, consider an experimental design with a single treatment and control group and α=.05 and β=.80. In this case, for a given sample size and design structure, there is an 80 percent probability that a two-sample t-test will yield a statistically significant impact estimate at the 5 percent significance level if the true impact were equal to the MDE value in equation (1).
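The interpretation in this example can be checked numerically. The sketch below is an illustration under stated assumptions rather than the report's procedure. It uses the large-df normal approximation in place of the t distribution and a hypothetical standard error of 0.10 standard deviations for the impact estimate. It then computes the probability of rejecting the null hypothesis of zero impact when the true impact equals the MDE.

```python
from statistics import NormalDist


def power_two_tailed(true_effect, se, alpha=0.05):
    """Probability that a two-tailed test rejects a zero-impact null.

    Assumption: the impact estimate is approximately normal with standard
    error `se` (large-df approximation to the t-test).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    # Rejection occurs in either tail of the sampling distribution.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


se = 0.10  # hypothetical standard error in effect size units
factor = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
mde_value = factor * se  # MDE for alpha = .05 (two-tailed) and power = .80

power_at_mde = power_two_tailed(mde_value, se)
power_at_half_mde = power_two_tailed(mde_value / 2, se)
print(round(power_at_mde, 3), round(power_at_half_mde, 3))
```

When the true impact equals the MDE, the computed power is approximately 0.80. A true impact half that size is detected far less often, which is why the MDE is a useful benchmark for choosing sample sizes.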

This approach to measuring statistical power differs slightly from the one used in Cappelleri et al. (1994), who apply Fisher's Z transformation to the partial correlation coefficient between the outcome measure and treatment status. This difference in metric accounts for the small differences between comparable results in this paper and those in Cappelleri et al. (1994).

¹ Specifically, Factor(.) can be expressed as [T^-1(α) + T^-1(β)] for a one-tailed test and [T^-1(α/2) + T^-1(β)] for a two-tailed test, where T^-1(.) is the inverse of the Student's t distribution function with df degrees of freedom (see Murray 1998 and Bloom 2004 for derivations of these formulas). Equation (1) ignores the estimation error in the standard deviation.