Technical Methods Report: Do Typical RCTs of Education Interventions Have Sufficient Statistical Power for Linking Impacts on Teacher Practice and Student Achievement Outcomes? - Chapter 5: Empirical Analysis

NCEE 2009-4065
October 2009

Chapter 5: Empirical Analysis - Empirical Results

MDE Results

For context, this section first presents MDEs for impacts on test score gains and a study mediator using OLS estimates of α₁ in (1) and β₁ in (2). The section then presents simulation results from the statistical power analysis for γ₁.

Using (1) and (2) and the methods discussed above and in Schochet (2008a), the MDE formulas for student test score gains and a classroom-level mediating outcome are as follows:

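(27) $\mathrm{MDE}(\text{Test Score Gains}) = 2.802\sqrt{\mathrm{Var}(\hat{\alpha}_{1,OLS})/\sigma_y^2} = 2.802\sqrt{\dfrac{\mathit{deff}_B}{ncmp(1-p)}}$,

and

(28) $\mathrm{MDE}(\text{Mediator}) = 2.802\sqrt{\mathrm{Var}(\hat{\beta}_{1,OLS})/\sigma_M^2} = 2.802\sqrt{\dfrac{\mathit{deff}_M}{\lambda ncmp(1-p)}}$,

where $\mathit{deff}_B = [1 + \rho_1(m-1) + \rho_2(cm-1)]$ and $\mathit{deff}_M = [1 + \lambda(\psi c - 1)]$.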

For typical RCT samples of 60 schools and 180 classrooms split evenly between the treatment and control groups and using the assumptions from above, the MDE on student gain scores is 0.27 (Table 5.1). With these samples, the MDE on a study mediator is 0.51 if λ=1 (that is, in the absence of measurement error), 0.66 if λ=0.5 and 0.98 if λ=0.2 (Table 5.1). With 300 study schools, the corresponding MDEs are about half as large.
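As a rough illustration of how (27) behaves, the sketch below codes the formula directly. It rests on two assumptions that are not stated in this section: that the multiplier 2.802 corresponds to a two-tailed test at the 5 percent level with 80 percent power under a normal approximation, and that the values of m, ρ1, and ρ2 are hypothetical placeholders rather than the report's assumptions. The function names are likewise illustrative. The sketch therefore does not reproduce Table 5.1; it only shows that the MDE shrinks with the square root of the number of schools, which is why moving from 60 to 300 schools roughly halves it.

```python
from math import sqrt
from statistics import NormalDist


def mde_factor(alpha=0.05, power=0.80):
    """Multiplier for a two-tailed test under a normal approximation (assumed)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def deff_b(rho1, rho2, c, m):
    """Design effect deff_B = 1 + rho1*(m - 1) + rho2*(c*m - 1) from the text."""
    return 1 + rho1 * (m - 1) + rho2 * (c * m - 1)


def mde_gain(n, c, m, p, rho1, rho2, factor=2.802):
    """MDE for test score gains, equation (27), in standard deviation units."""
    return factor * sqrt(deff_b(rho1, rho2, c, m) / (n * c * m * p * (1 - p)))


# Hypothetical inputs chosen only for illustration; NOT the report's assumptions.
# c=3 follows from 180 classrooms in 60 schools; m, rho1, rho2 are placeholders.
params = dict(c=3, m=20, p=0.5, rho1=0.05, rho2=0.10)

mde_60 = mde_gain(n=60, **params)
mde_300 = mde_gain(n=300, **params)

print(round(mde_factor(), 4))       # about 2.8016, close to the 2.802 in the text
print(round(mde_300 / mde_60, 3))   # about 0.447, that is, sqrt(60 / 300)
```

The ratio in the last line does not depend on the placeholder values, because the design effect cancels when only n changes.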

Statistical Power Results for Mediator Effects

What power levels are likely for exploratory RCT analyses that aim to estimate associations between teacher practice and student achievement measures? To help answer this question, Tables 5.2 to 5.4 present the number of schools required to detect targeted mediator effects with power levels (probabilities) ranging from 0.60 to 0.90. Figures are presented for mediator effects within schools, between schools, and overall. In addition, figures are presented separately for mediator reliability values of 0.2, 0.5, and 1.0 (as defined in equation [13a]). Table 5.2 presents figures assuming that the teacher practice mediator explains 10 percent of the variance in classroom effects, while Tables 5.3 and 5.4 assume corresponding values of 20 percent and 5 percent, respectively. Figures for the between-school mediator effects are presented for both the OLS and IV estimators.

The two main empirical findings can be summarized as follows:

Finding 1: For typical RCTs with about 60 total study schools, the OLS approach will yield estimates of overall and within-school mediator effects with sufficient power under two stringent conditions: (1) the reliability of the mediator must be relatively large (at least 0.50), and (2) the mediator must explain a relatively large share of the classroom-level variation in student test score gains (at least 20 percent).

For instance, if λ_rel = 0.5 and the teacher practice mediator explains 20 percent of the variance in classroom effects, a statistical power level of 80 percent could be achieved with 43 schools for the overall mediator effect and 53 schools for the within-school mediator effect (middle panel of Table 5.3). Stated differently, with 43 (53) schools, the RCT would have an 80 percent probability of finding a statistically significant overall (within-school) mediator effect. In contrast, if the reliability of the mediator were instead 0.2, the numbers of required schools would be 108 and 135, respectively (bottom panel of Table 5.3). Similarly, if the mediator explains only 10 percent of the variance in classroom effects, a power level of 80 percent could be achieved with 60 study schools only if λ_rel were close to 1 (Table 5.2).
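The school counts quoted above are close to what would follow if the required number of schools were inversely proportional to the reliability of the mediator. The sketch below checks this reading against the four quoted figures only. The proportionality rule and the function name are assumptions made for illustration, not a result stated in the report.

```python
# Required schools at 80 percent power, as quoted in the text
# (Table 5.3, mediator explains 20 percent of the variance in classroom effects).
quoted = {
    0.5: {"overall": 43, "within_school": 53},
    0.2: {"overall": 108, "within_school": 135},
}


def rescale_schools(n_ref, rel_ref, rel_new):
    """Assumed rule of thumb: required schools scale with 1 / reliability."""
    return n_ref * rel_ref / rel_new


for effect, n_ref in quoted[0.5].items():
    approx = rescale_schools(n_ref, rel_ref=0.5, rel_new=0.2)
    print(effect, round(approx, 1), "vs quoted", quoted[0.2][effect])
```

Under this assumed rule the rescaled values are 107.5 and 132.5, compared with the quoted 108 and 135, so the approximation is close but not exact.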

These two conditions are intuitive. They imply that there must be a strong association between the mediator and student gain scores (so that the mediator is capturing key dimensions of teacher practices), and that there is sufficient signal in the observed mediator (that is, high reliability) so that this strong association can be estimated precisely.

Importantly, as discussed, these conditions are stringent. The finding that the mediator must explain at least 20 percent of the variation in estimated classroom effects implies a relatively high correlation of 0.45 between the two measures. Furthermore, Raudenbush et al. (2008) demonstrate that the reliability of teacher practice measures as defined in (13a) may not be high. Thus, in practice, it is more likely that 150 to 200 schools would be required to produce precise overall and within-school mediator associations using the OLS approach (Tables 5.2 and 5.4).
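The correlation of 0.45 follows from the share of variance explained, assuming that this share is the R-squared from relating classroom effects to the single mediator, so that the correlation is its square root. The same arithmetic, applied to the 10 percent and 5 percent scenarios in Tables 5.2 and 5.4, gives the correspondingly weaker correlations:

```latex
r = \sqrt{R^2}: \qquad
\sqrt{0.20} \approx 0.45, \qquad
\sqrt{0.10} \approx 0.32, \qquad
\sqrt{0.05} \approx 0.22 .
```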

Finding 2: For typical RCT samples, the IV approach will yield estimates with very little statistical power for detecting between-school mediator associations. Even in the most favorable of the considered scenarios (where λ_rel = 1 and the mediator explains 20 percent of the classroom-level variation in student test scores), more than 500 schools would be required under the IV approach to achieve a statistical power level of 80 percent (top panel of Table 5.3). Furthermore, more than 100 schools would be required under this best-case scenario even if the impact on the mediator were 1.4 standard deviations (so that the treatment status indicator would explain about 50 percent of the variance in the mediator; not shown). Under less favorable scenarios, hundreds or even thousands of schools would be required (Tables 5.2 and 5.4).

This low power occurs because the denominator of the asymptotic variance of the IV estimator includes the squared correlation between M_i and T_i, which, as discussed, is likely to be small. Thus, although the IV approach can adjust for the simultaneity and omitted variable biases that are likely to plague the OLS estimators, it has very little statistical power for mediator analyses.
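To make the role of the squared correlation concrete, the sketch below computes it for a binary treatment indicator and converts it into an approximate variance inflation factor. It assumes that the impact on the mediator is expressed in standard deviation units of the pooled mediator distribution, that half the sample is treated (p = 0.5), and that the textbook large-sample comparison of IV and OLS variances under homoskedastic errors applies, in which the IV variance is larger by a factor of one over the squared correlation. The function names are illustrative, and this is not the report's exact variance formula.

```python
def r2_treatment_on_mediator(impact_sd, p=0.5):
    """Squared correlation between a binary T and M.

    Assumes impact_sd is the treatment impact on M in units of the
    pooled (total) standard deviation of M, and p is the treated share.
    """
    return p * (1 - p) * impact_sd ** 2


def iv_variance_inflation(impact_sd, p=0.5):
    """Approximate ratio Var(IV) / Var(OLS) = 1 / corr(M, T)^2 (assumed)."""
    return 1.0 / r2_treatment_on_mediator(impact_sd, p)


# An impact of 1.4 standard deviations, as in the text:
print(round(r2_treatment_on_mediator(1.4), 2))   # 0.49, about 50 percent

# An impact of 0.51 standard deviations, the mediator MDE quoted for 60 schools:
print(round(r2_treatment_on_mediator(0.51), 3))  # about 0.065
print(round(iv_variance_inflation(0.51), 1))     # about 15.4
```

Under these assumptions, even an impact as large as the mediator MDE leaves the treatment indicator explaining under 7 percent of the variance in the mediator, which is consistent with the very large school counts reported for the IV approach.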
