- Chapter 1: Introduction
- Chapter 2: Measuring Statistical Power
- Chapter 3: Considered Designs
- Chapter 4: Aggregated Designs: RD Design Theory and Design
- Chapter 5: Multilevel RD Designs
- Chapter 6: Selecting the Score Range for the Sample
- Chapter 7: Illustrative Precision Calculations
- Chapter 8: Summary and Conclusions
- References
- List of Tables
- List of Figures
- Appendix A
- Appendix B

An important part of any evaluation design is the statistical power analysis, which
demonstrates how well the design of the study will be able to distinguish real impacts
from chance differences. To determine appropriate sample sizes for impact evaluations,
researchers typically calculate minimum detectable impacts, which represent the
smallest program impacts—average treatment and comparison group differences—that
can be detected with a high probability. In addition, it is common to standardize
minimum detectable impacts into *effect size units*—that is, as a percentage
of the standard deviation of the outcome measures (also known as Cohen's *d*)—to
facilitate the comparison of findings across outcomes that are measured on different
scales (Cohen 1988). Hereafter, minimum detectable
impacts in effect size units are denoted as "MDEs."

Mathematically, the MDE formula can be expressed as follows:

MDE = Factor(α, β, df) × √Var(impact) / σ   (1)

where *Var(impact)* is the variance of the impact estimate, σ is the
standard deviation of the outcome measure, and *Factor(.)* is a constant
that is a function of the significance level (α), statistical power (β),
and the number of degrees of freedom.^{1} *Factor(.)*
becomes larger as α and *df* decrease and as β increases (see
Table A.1).
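The pieces of equation (1) can be computed directly. As a sketch (not the paper's own code), the function below uses the large-*df* normal approximation, under which *Factor(.)* for a two-sided test is the sum of the critical value z₁₋α/₂ and the quantile of the power level; the function names and defaults are illustrative choices, not from the source.

```python
from math import sqrt
from statistics import NormalDist


def mde_factor(alpha: float, power: float) -> float:
    """Factor(.) under the large-df normal approximation:
    the sum of the two-sided critical value and the power quantile."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def mde(var_impact: float, sigma: float,
        alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable impact in effect-size (Cohen's d) units,
    following equation (1): Factor(.) * sqrt(Var(impact)) / sigma."""
    return mde_factor(alpha, power) * sqrt(var_impact) / sigma
```

For α = .05 and power of .80, `mde_factor` returns roughly 2.80 (that is, 1.96 + 0.84), so halving the variance of the impact estimate shrinks the MDE by a factor of √2, not 2. With finite degrees of freedom, *t* quantiles would replace the normal ones and the factor would be somewhat larger, as in Table A.1.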

As an example, consider an experimental design with a single treatment and control
group and α=.05 and β=.80. In this case, for a given sample size and design
structure, there is an 80 percent probability that a two-sample *t*-test
will yield a statistically significant impact estimate at the 5 percent significance
level if the true impact equals the MDE value in equation (1).
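This interpretation of the MDE can be checked by simulation. The sketch below (illustrative code, not from the source) draws repeated treatment and control samples whose true difference equals the MDE for that design, applies a two-sample test using the large-sample normal approximation to the *t*-test, and reports the rejection rate, which should be close to 0.80.

```python
import random
from math import sqrt


def simulated_power(n: int, true_impact_d: float, crit: float = 1.96,
                    reps: int = 2000, seed: int = 0) -> float:
    """Share of replications in which a two-sample z-test (large-n
    approximation to the t-test) rejects at the 5 percent level, given a
    true impact of `true_impact_d` SDs, n units per arm, and outcome SD 1."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        treat = [rng.gauss(true_impact_d, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        diff = sum(treat) / n - sum(control) / n
        se = sqrt(2.0 / n)  # SE of the difference when both arm SDs are 1
        if abs(diff / se) > crit:
            rejections += 1
    return rejections / reps


# For this simple design, Var(impact) = 2/n, so the MDE in effect-size
# units is Factor(.) * sqrt(2/n) with Factor(.) ≈ 2.80.
power_at_mde = simulated_power(n=100, true_impact_d=2.80 * sqrt(2 / 100))
```

With 100 units per arm, the simulated rejection rate lands near the nominal 80 percent, up to Monte Carlo error.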

This approach for measuring statistical power differs slightly from the one used in Cappelleri et al. (1994), who apply Fisher's Z transformation to the partial correlation coefficient between the outcome measure and treatment status. This difference in metric accounts for the small differences between comparable results in this paper and those in Cappelleri et al. (1994).