This paper has examined, both theoretically and empirically, whether typical large-scale, school-based RCTs of education interventions have sufficient statistical power to estimate associations between teacher practice mediators and student gain scores. These exploratory analyses are of interest because they can quantitatively link impact estimates on teachers and students, as postulated by the study’s conceptual model.

The theoretical part of the paper developed asymptotic formulas for the statistical power to detect mediator effects under two regression approaches. First, the paper considered a simple OLS (correlational) approach, which can easily accommodate multiple mediators but may yield biased estimates due to omitted variables, simultaneity, and measurement error. To help avoid these biases, the paper also considered an IV approach, in which treatment status serves as an instrument for the mediator. For both approaches, the power formulas incorporate precision losses due to measurement error in the mediator.
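The bias contrast between the two approaches can be illustrated with a small simulation. The sketch below is a hypothetical, single-level data-generating process, not the paper's multilevel model: treatment `T` shifts a true mediator, the mediator is observed with classical measurement error, an unobserved factor affects both the mediator and the outcome, and treatment affects the outcome only through the mediator, so the exclusion restriction holds by construction. All parameter names and values (`a`, `b`, `confound`, `error_sd`) are illustrative assumptions.

```python
import random


def _cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def simulate_ols_vs_iv(n=100_000, a=1.0, b=0.5, confound=0.8,
                       error_sd=1.0, seed=2024):
    """Return (OLS slope, IV slope) from one simulated sample.

    Hypothetical data-generating process (not the paper's model):
      T      ~ Bernoulli(0.5)                 randomized treatment
      M_true = a*T + u,      u ~ N(0, 1)      true teacher-practice mediator
      M_obs  = M_true + e,   e ~ N(0, error_sd^2)  mediator with measurement error
      Y      = b*M_true + v, v = confound*u + N(0, 1)  gain score; u is an omitted variable
    T affects Y only through M_true, so the exclusion restriction holds.
    """
    rng = random.Random(seed)
    t, m_obs, y = [], [], []
    for _ in range(n):
        ti = 1.0 if rng.random() < 0.5 else 0.0
        u = rng.gauss(0.0, 1.0)
        m_true = a * ti + u
        v = confound * u + rng.gauss(0.0, 1.0)
        t.append(ti)
        m_obs.append(m_true + rng.gauss(0.0, error_sd))
        y.append(b * m_true + v)

    # OLS slope of Y on the observed mediator: cov(M, Y) / var(M).
    ols = _cov(m_obs, y) / _cov(m_obs, m_obs)
    # IV (Wald) slope using treatment status as the instrument: cov(T, Y) / cov(T, M).
    iv = _cov(t, y) / _cov(t, m_obs)
    return ols, iv


ols_slope, iv_slope = simulate_ols_vs_iv()
```

Under these assumed settings, the OLS slope has probability limit 1.425 / 2.25 ≈ 0.63 rather than the true 0.5 (the upward omitted-variable bias outweighs the attenuation from measurement error), while the IV slope converges to 0.5. The simulation speaks only to bias; it says nothing about the precision of either estimator in samples of realistic size.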

In the empirical analysis, the theoretical formulas were used to simulate the likely statistical power of mediator analyses under the considered models. A key finding is that, for typical RCTs with about 60 total study schools, OLS methods will yield precise estimates of mediator effects only if two stringent conditions hold. First, the reliability of the observed teacher practice mediator, as defined in equation (13a), must be at least 0.50. Second, the correlation between the mediator and the estimated classroom effects must be at least 0.45; that is, the mediator must explain a substantial share of the classroom-level variation in student gain scores.
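How mediator reliability and the mediator-outcome correlation jointly drive power can be sketched with a deliberately simplified calculation. This is not the paper's formula (which accounts for classrooms nested within schools and for error in the estimated classroom effects). It assumes one independent observation per unit, classical measurement error that attenuates the true correlation `rho` to `rho * sqrt(reliability)`, and a two-sided Fisher z test. The function name `mediator_power` is hypothetical.

```python
import math
from statistics import NormalDist


def mediator_power(n_units, rho, reliability, alpha=0.05):
    """Approximate two-sided power to detect a nonzero mediator-outcome correlation.

    Hypothetical simplifications (not the paper's formulas):
      * each of n_units (e.g., schools) contributes one independent observation;
      * measurement error attenuates the true correlation rho to rho * sqrt(reliability);
      * the test statistic is Fisher's z, atanh(r) * sqrt(n_units - 3).
    """
    if n_units <= 3:
        raise ValueError("n_units must exceed 3")
    std_normal = NormalDist()
    r_obs = rho * math.sqrt(reliability)              # attenuated correlation
    z_effect = math.atanh(r_obs) * math.sqrt(n_units - 3)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # two-sided critical value
    # Probability of rejecting in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Compare power at 60 units for the threshold values and for weaker ones.
power_threshold = mediator_power(60, rho=0.45, reliability=0.50)
power_weaker = mediator_power(60, rho=0.30, reliability=0.40)
```

Because the observed correlation scales with the square root of reliability, a drop in reliability and a drop in the true correlation compound each other; the absolute power values from this sketch should not be read as reproducing the paper's results.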

In practice, however, these conditions may be hard to meet. First, Raudenbush et al. (2008) demonstrate that mediators currently available from classroom observation data may have reliabilities below 0.50, owing to considerable variability in rater measurements and in teacher practices over the school day. Second, as discussed in this paper, studies of educational interventions often find weak associations between classroom practices and student outcomes, suggesting that mediator-test score correlations may be considerably lower than 0.45. Thus, producing precise estimates of mediator effects with the OLS approach would more likely require about 150-200 schools.
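The link between weaker mediators and larger required samples can be sketched by inverting a simplified Fisher z power calculation. As with any such back-of-the-envelope calculation, this assumes one independent observation per unit and a correlation attenuated to `rho * sqrt(reliability)`; it ignores the clustering in the paper's design and is not intended to reproduce the 150-200 figure. The function name `required_units` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def required_units(rho, reliability, power=0.80, alpha=0.05):
    """Approximate number of units needed to detect an attenuated correlation.

    Hypothetical simplifications: one independent observation per unit and an
    observed correlation of rho * sqrt(reliability). Solves
        atanh(r_obs) * sqrt(n - 3) = z_{1 - alpha/2} + z_{power}
    for n, ignoring the negligible rejection probability in the opposite tail.
    """
    std_normal = NormalDist()
    r_obs = rho * math.sqrt(reliability)
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil((z_sum / math.atanh(r_obs)) ** 2 + 3)


n_strong = required_units(rho=0.45, reliability=0.50)
n_weak = required_units(rho=0.30, reliability=0.40)
```

Since the required sample scales roughly with the inverse square of the attenuated correlation, modest declines in reliability or in the true correlation translate into disproportionately large increases in the number of units needed.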

The conditions under which the OLS approach yields unbiased estimates seem unlikely to hold in practice. The IV approach may therefore be preferable, because its key identifying condition, the exclusion restriction that all intervention effects on student gain scores must operate through the mediator, may be plausible for some interventions. However, the IV approach has very little statistical power for mediator analyses. It also has other limitations: suitable instruments are hard to find when the model includes multiple mediators, only between-school mediator effects can be identified, and full population causal effects can be estimated only under certain conditions, such as constant treatment effects.
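Why the IV approach has so little power can be seen from a standard result: holding the structural residual variance fixed, the variance of the IV slope exceeds that of the OLS slope by a factor of 1 / corr(T, M)^2, where T is treatment status and M the mediator. The sketch below assumes, hypothetically, that M = d * T + u with Var(u) = 1 (so `effect_size` d is the treatment impact on the mediator in standard-deviation units) and ignores clustering. The function name is illustrative.

```python
import math


def iv_sample_size_multiplier(effect_size, p_treat=0.5):
    """Factor by which an IV analysis needs more units than OLS for equal precision.

    Hypothetical simplification: M = effect_size * T + u with Var(u) = 1, T is a
    binary treatment with share p_treat treated, and both estimators share the
    same structural residual variance. Then
        Var(IV) / Var(OLS) = 1 / corr(T, M)^2 = 1 + 1 / (effect_size^2 * p * (1 - p)).
    """
    pq = p_treat * (1 - p_treat)
    corr_tm = effect_size * math.sqrt(pq) / math.sqrt(1 + effect_size ** 2 * pq)
    return 1 / corr_tm ** 2


multiplier = iv_sample_size_multiplier(0.25)
```

For example, a hypothetical treatment impact of 0.25 standard deviations on the mediator, with half the schools treated, gives a multiplier of 1 + 1 / (0.0625 × 0.25) = 65: the IV analysis would need roughly 65 times as many schools as an OLS analysis of equal precision.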

Thus, the results from this paper suggest that unless the sample contains a large number of schools (about 150-200), regression-based mediator analyses are likely to be informative only if new mediators can be developed that have higher reliabilities and stronger associations with student learning measures. Even with such improved measures, however, mediator analyses will need to rely on OLS methods, which could produce biased estimates, because the IV approach would require prohibitively large samples.

The findings from this paper may have implications for the types of mediators that RCTs currently collect and for the budgets allocated to collecting these expensive data. For instance, measures of implementation fidelity may have descriptive value for RCTs by helping to interpret the impact findings. However, measures of teacher practices may be of less use if there is little chance of detecting significant mediator-test score relationships. In these cases, the evaluation may have sufficient power to detect impacts on the teacher practice mediators and on student test scores separately, but would have little basis for quantitatively linking these two sets of outcomes and impacts. Thus, such classroom practice mediators may be of little help in confirming the study’s conceptual model and identifying the teacher practices most associated with student learning gains.