An important part of any evaluation design is the statistical power analysis, which shows how well the study design can distinguish real impacts from chance differences. To determine appropriate sample sizes for impact evaluations, researchers typically calculate minimum detectable impacts, which represent the smallest program impacts—average treatment and comparison group differences—that can be detected with a high probability. In addition, it is common to standardize minimum detectable impacts into effect size units—that is, as a percentage of the standard deviation of the outcome measure (also known as Cohen's d)—to facilitate the comparison of findings across outcomes that are measured on different scales (Cohen 1988). Hereafter, minimum detectable impacts in effect size units are denoted as "MDEs."
Mathematically, the MDE formula can be expressed as follows:

MDE = Factor(α, β, df) × √Var(impact) / σ,　(1)
where Var(impact) is the variance of the impact estimate, σ is the standard deviation of the outcome measure, and Factor(.) is a constant that is a function of the significance level (α), statistical power (β), and the number of degrees of freedom.1 Factor(.) becomes larger as α and df decrease and as β increases (see Table A.1).
As an example, consider an experimental design with a single treatment and control group and α=.05 and β=.80. In this case, for a given sample size and design structure, there is an 80 percent probability that a two-sample t-test will yield a statistically significant impact estimate at the 5 percent significance level if the true impact equals the MDE value in equation (1).
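The calculation above can be sketched in a few lines of Python. The snippet below uses the large-sample normal approximation to Factor(.), namely z₁₋α/₂ + z_β for a two-tailed test (with finite degrees of freedom, t-quantiles would make Factor(.) slightly larger, as noted in the text); the function names and the illustrative variance and standard deviation values are assumptions for illustration, not part of the paper.

```python
from statistics import NormalDist

def mde_factor(alpha=0.05, power=0.80):
    """Large-sample approximation to Factor(.): z_{1-alpha/2} + z_{power}
    for a two-tailed test. With small df, t-quantiles would be larger."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)

def mde(var_impact, sigma, alpha=0.05, power=0.80):
    """MDE in effect size units: Factor(.) * sqrt(Var(impact)) / sigma."""
    return mde_factor(alpha, power) * var_impact ** 0.5 / sigma

# With alpha=.05 and power=.80, Factor(.) is roughly 2.80
print(mde_factor())
# Illustrative inputs: Var(impact)=0.01, outcome SD sigma=1.0
print(mde(0.01, 1.0))
```

For α=.05 and β=.80 the factor is about 2.8, which matches the familiar rule of thumb that the MDE is roughly 2.8 standard errors of the impact estimate.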
This approach for measuring statistical power differs slightly from the one used in Cappelleri et al. (1994), who apply Fisher's Z transformation to the partial correlation coefficient between the outcome measure and treatment status. This difference in metric accounts for the small differences between comparable results in this paper and those in Cappelleri et al. (1994).