Technical Methods Report: Do Typical RCTs of Education Interventions Have Sufficient Statistical Power for Linking Impacts on Teacher Practice and Student Achievement Outcomes?

NCEE 2009-4065
October 2009

Chapter 5: Empirical Analysis - Additional Assumptions for the Statistical Power Calculations

The statistical power calculations were conducted using the following "real-world" assumptions: (1) a two-tailed test, (2) a 5 percent significance level, (3) a balanced allocation of schools to the treatment and control groups (p = 0.5), (4) an average of 3 classrooms per school (c = 3), (5) an average of 23 students per classroom, (6) data on student test score gains are available for 80 percent of students in the sample (so that m = 23 × 0.8 = 18.4), and (7) data on mediating outcomes are available for all teachers.
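The sample-size arithmetic behind these assumptions can be sketched as follows. This is a minimal illustration, not the report's own code; the class name `PowerAssumptions` and its field names are hypothetical labels for items (1) through (6), and the per-school count simply multiplies c by m.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerAssumptions:
    """Hypothetical container for the design assumptions listed in the text."""
    alpha: float = 0.05                # (1)-(2) two-tailed test at the 5 percent level
    p: float = 0.5                     # (3) share of schools assigned to treatment
    classrooms_per_school: int = 3     # (4) c
    students_per_classroom: int = 23   # (5) average class size
    response_rate: float = 0.80        # (6) share of students with test score gains

    @property
    def m(self) -> float:
        """Average number of students per classroom with outcome data."""
        return self.students_per_classroom * self.response_rate

    @property
    def students_per_school(self) -> float:
        """Average analysis sample per school: c classrooms of m students each."""
        return self.classrooms_per_school * self.m


a = PowerAssumptions()
print(f"m = {a.m:.1f}")
print(f"students per school = {a.students_per_school:.1f}")
```

Running it prints m = 18.4 and 55.2 students per school; changing `response_rate` shows directly how missing test score data shrinks the effective classroom size.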

The statistical power calculations also required real-world assumptions about the values of several additional parameters that enter the non-centrality parameter formulas, as discussed next.

Reliability-Related Parameters (λrel, λ, and λB). The reliability of a teacher practice mediator, λrel as defined in (13a), will likely depend on the nature of the mediator and the study design. For example, reliability may differ for a mediator constructed using classroom observation data, principal ratings, or teacher survey data. Because of this uncertainty, the power calculations were conducted assuming reliability values of 0.2, 0.5, and 1.0. Although perfect reliability is never attainable in practice, the value of 1.0 is included as a best-case scenario.

The 0.2 and 0.5 values are in the range of plausible values for λrel reported in Raudenbush et al. (2008), based on an analysis of Classroom Assessment Scoring System (CLASS) data. Raudenbush et al. (2008) estimated the measurement error variances in (13) using the observed variation in instructional climate scores across raters and time segments. The 0.2 to 0.5 reliability values are lower than those usually reported for commonly used classroom observation protocols. This is because the reliability values found in the literature are typically based on the internal consistency of item responses, and do not typically account for variation in scores across raters and time segments.
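To illustrate why reliability estimates that treat rater and time-segment variation as error can fall well below internal-consistency estimates, here is a minimal sketch using the classical definition of reliability as the true-score share of observed-score variance. The variance components are made-up numbers chosen for illustration, and this generic ratio is an assumption: it is not necessarily identical to the report's definition of λrel in (13a).

```python
def reliability(true_var: float, error_vars: list[float]) -> float:
    """Classical reliability: true-score variance / total observed-score variance."""
    return true_var / (true_var + sum(error_vars))


# Hypothetical variance components for a classroom observation score.
TRUE_VAR = 1.0        # variance of teachers' true practice scores
ITEM_ERROR = 0.25     # item-level inconsistency (what internal consistency captures)
RATER_ERROR = 1.0     # disagreement across raters
SEGMENT_ERROR = 1.5   # fluctuation across observed time segments

# Internal consistency counts only item-level noise as error.
internal = reliability(TRUE_VAR, [ITEM_ERROR])
# Also counting rater and time-segment variation as error lowers reliability.
full = reliability(TRUE_VAR, [ITEM_ERROR, RATER_ERROR, SEGMENT_ERROR])

print(f"internal-consistency reliability: {internal:.2f}")
print(f"reliability with rater and segment error: {full:.2f}")
```

With these invented components the two figures are 0.80 and 0.27, so the same instrument can look highly reliable on internal consistency while landing in the 0.2 to 0.5 range once rater and occasion error are counted.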

Finally, for simplicity, the same parameter values are used for λ, λrel, and λB even though these parameters may differ in practice.

The ratios ψ and ψObs. These parameters represent the extent to which mean mediator values vary across schools, and they enter the design effect formulas. As discussed, these parameters can be obtained from ICC estimates for the mediator. These ICCs, however, are not typically reported in evaluation reports, and there is no literature that collates such ICC estimates from previous studies. Thus, to obtain plausible ICC values, classroom observation mediators from two large school-based education RCTs were analyzed: (1) the Evaluation of the Effectiveness of Selected Supplemental Reading Comprehension Interventions (James-Burdumy et al. 2009), and (2) the Evaluation of Comprehensive Teacher Induction Programs (Glazerman et al. 2008). The Reading Comprehension study used the Expository Reading Comprehension (ERC) Classroom Observation Instrument, and the Teacher Induction study used the Vermont Classroom Observation Tool (Saginor and Hyjek 2005).

The ICC estimates for the mediators differ considerably between the two studies. The ICC estimates for the Reading Comprehension study are 0.21 for the interactive teaching scale, 0.33 for the strategy instruction scale, 0.26 for the effective instruction behavioral scale, and 0.20 for the classroom management scale. The ICC estimates for the Teacher Induction study are 0.11 for the lesson content scale, 0.01 for the classroom culture scale, and 0.08 for the lesson implementation scale.

Due to this variation, a conservative mediator ICC value of 0.15 was assumed for the analysis, which implies an estimate of about 0.5 for ψ. This 0.5 value was also assumed for ψObs (although ψObs and ψ may differ in practice).
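The reported ICCs can be summarized to show where the assumed value of 0.15 sits. This is a simple check using only the estimates listed above; the dictionary keys are shorthand labels, and the step from the ICC to ψ is not reproduced because its formula is not restated in this part of the report.

```python
from statistics import mean

# Mediator ICC estimates reported for the two RCTs.
reading_comprehension = {
    "interactive teaching": 0.21,
    "strategy instruction": 0.33,
    "effective instruction behavioral": 0.26,
    "classroom management": 0.20,
}
teacher_induction = {
    "lesson content": 0.11,
    "classroom culture": 0.01,
    "lesson implementation": 0.08,
}

rc_mean = mean(list(reading_comprehension.values()))
ti_mean = mean(list(teacher_induction.values()))
pooled = list(reading_comprehension.values()) + list(teacher_induction.values())
ASSUMED_ICC = 0.15

print(f"Reading Comprehension mean ICC: {rc_mean:.3f}")
print(f"Teacher Induction mean ICC:     {ti_mean:.3f}")
print(f"Range across all scales: {min(pooled):.2f} to {max(pooled):.2f}")
print("Assumed ICC lies between the study means:", ti_mean < ASSUMED_ICC < rc_mean)
```

The study means are 0.25 and about 0.07, so 0.15 falls between them rather than at either extreme.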

$R^2_{MB,T}$ and $R^2_{M,T}$ values. The $R^2_{MB,T}$ parameter is the population squared correlation between $M_i$ and $T_i$, and is a function of the size of the treatment effect on the mediator. To obtain plausible values for this parameter, it is convenient to use the relation from (2) that $R^2_{MB,T} = \beta_{1,\text{eff}}^2\,p(1-p)$, where $\beta_{1,\text{eff}}^2 = \beta_1^2/\sigma^2_{MB}$ is the squared impact on $M_i$ measured in effect size (standard deviation) units. Thus, estimates of $R^2_{MB,T}$ can be obtained using estimates of $\beta_{1,\text{eff}}^2$.
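The relation $R^2_{MB,T} = \beta_{1,\text{eff}}^2\,p(1-p)$ is easy to evaluate directly. A minimal sketch, assuming the impact is already expressed in standard deviation units; the function name is illustrative.

```python
def r2_from_effect_size(beta_eff: float, p: float = 0.5) -> float:
    """Population R^2 between a binary treatment indicator and the mediator.

    beta_eff is the impact on the mediator in standard deviation units and
    p is the share of schools assigned to treatment.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return beta_eff ** 2 * p * (1.0 - p)


# Balanced allocation maximizes p(1 - p) at 0.25, so even a 0.5 SD impact
# explains only about 6 percent of the mediator variance.
print(r2_from_effect_size(0.5))         # 0.0625
print(r2_from_effect_size(0.5, p=0.3))  # smaller, because p(1 - p) = 0.21
```

Because p(1 - p) never exceeds 0.25, an unbalanced design can only lower $R^2_{MB,T}$ for a given impact.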

Two similar approaches were used to obtain plausible values for $\beta_{1,\text{eff}}^2$. First, a "rule of thumb" from the IV literature is that if the statistic $F = t^2 = \hat{\beta}_1^2/\widehat{\mathrm{Var}}(\hat{\beta}_1)$ from (2) is at least 10, then $T_i$ can be considered a strong instrument for $M_i$ (see Murray 2006 and Stock et al. 2002). With 60 study schools (a typical sample size), setting F = 10 implies that $\beta_{1,\text{eff}} = 0.66$ and, thus, that $R^2_{MB,T} = 0.11$ (see (28)). The second approach is to set $\beta_{1,\text{eff}}$ equal to the minimum detectable impact in effect size units (MDE) for the mediator. With 60 schools, this approach yields $\beta_{1,\text{eff}} = \text{MDE} = 0.51$ and $R^2_{MB,T} = 0.07$ (see (28)).
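The two approaches can be checked numerically. The sketch below reproduces the reported $R^2_{MB,T}$ values from effect sizes of 0.66 and 0.51 with p = 0.5 and, using $F = t^2$, backs out the standard error of the effect-size impact estimate that F = 10 implies at 0.66. It does not rederive 0.66 or 0.51 themselves, since those depend on variance formulas (see (28)) that this sketch does not reproduce.

```python
import math


def r2_from_effect_size(beta_eff: float, p: float = 0.5) -> float:
    """R^2 between treatment status and the mediator: beta_eff^2 * p * (1 - p)."""
    return beta_eff ** 2 * p * (1.0 - p)


# Approach 1: strong-instrument rule of thumb, F = t^2 = 10.
F_THRESHOLD = 10.0
beta_rule_of_thumb = 0.66
# F = beta^2 / Var(beta_hat), so the implied standard error is beta / sqrt(F).
implied_se = beta_rule_of_thumb / math.sqrt(F_THRESHOLD)

# Approach 2: set the impact equal to the mediator MDE.
beta_mde = 0.51

print(f"F = 10 rule: R^2 = {r2_from_effect_size(beta_rule_of_thumb):.2f}, "
      f"implied SE = {implied_se:.3f}")
print(f"MDE rule:    R^2 = {r2_from_effect_size(beta_mde):.2f}")
```

The output matches the text (0.11 and 0.07), which is why a value of 0.10 is a reasonable middle ground.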

Based on these analyses, an $R^2_{MB,T}$ value of 0.10 was used for the simulations. Importantly, this small value implies that the variance of the IV estimator will be large, because $R^2_{MB,T}$ enters the denominator of the IV variance formulas. Furthermore, this term remains small, and so continues to inflate the IV variance, unless the impact on the mediator is unrealistically large. For example, the impact on the mediator would need to be 1.4 standard deviations to yield an $R^2_{MB,T}$ value of 0.5, and 1.8 standard deviations to yield a value of 0.8.
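Inverting $R^2_{MB,T} = \beta_{1,\text{eff}}^2\,p(1-p)$ gives the impact needed for a target $R^2_{MB,T}$, which reproduces the 1.4 and 1.8 standard deviation figures. The sketch also prints 1/R² as a rough gauge of how the denominator term scales the IV variance; treating every other term in the variance formula as fixed is a simplifying assumption, and the function name is illustrative.

```python
import math


def effect_size_for_r2(r2: float, p: float = 0.5) -> float:
    """Impact (SD units) needed for treatment status to explain a share r2 of mediator variance."""
    return math.sqrt(r2 / (p * (1.0 - p)))


for target in (0.10, 0.5, 0.8):
    impact = effect_size_for_r2(target)
    # Holding every other term fixed (a simplification), IV variance scales as 1 / R^2.
    print(f"R^2 = {target:.2f}: impact {impact:.2f} SD, variance factor {1.0 / target:.2f}")
```

Moving from $R^2_{MB,T}$ = 0.10 to 0.5 cuts this factor from 10 to 2, but requires raising the impact from about 0.63 to about 1.41 standard deviations.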

Finally, because $R^2_{M,T} = \psi R^2_{MB,T}$, an $R^2_{M,T}$ value of 0.05 was used for the simulations, obtained by multiplying the assumed values ψ = 0.50 and $R^2_{MB,T}$ = 0.10.