Technical Methods Report: Do Typical RCTs of Education Interventions Have Sufficient Statistical Power for Linking Impacts on Teacher Practice and Student Achievement Outcomes? - Chapter 5: Empirical Analysis - Additional Assumptions for the Statistical Power Calculations

NCEE 2009-4065
October 2009

Chapter 5: Empirical Analysis - Additional Assumptions for the Statistical Power Calculations

The statistical power calculations were conducted using the following "real-world" assumptions: (1) a two-tailed test, (2) a 5 percent significance level, (3) a balanced allocation of schools to the treatment and control groups (p = 0.5), (4) an average of 3 classrooms per school (c = 3), (5) an average of 23 students per classroom, (6) data on student test score gains are available for 80 percent of students in the sample (so that m = 18.2), and (7) data on mediating outcomes are available for all teachers.

The statistical power calculations also required real-world assumptions on values for several additional parameters that enter the non-centrality parameter formulas, as discussed next.

Reliability-Related Parameters (λ_rel, λ, and λ_B). The reliability of a teacher practice mediator, λ_rel as defined in (13a), will likely depend on the nature of the mediator and the study design. For example, reliability may differ for a mediator constructed using classroom observation data, principal ratings, or teacher survey data. Because of this uncertainty, the power calculations were conducted assuming reliability values of 0.2, 0.5, and 1.0. Although perfect reliability is never attainable, reliability values of 1 are used in the analysis as a best-case scenario.

The 0.2 and 0.5 values are in the range of plausible values for λ_rel reported in Raudenbush et al. (2008) based on an analysis of Classroom Assessment Scoring System (CLASS) data. Raudenbush et al. (2008) estimated the measurement error variances in (13) using the observed variation in instructional climate scores across raters and time segments. The 0.2 to 0.5 reliability values are lower than those usually reported for commonly-used classroom observation protocols. This is because the reliability values found in the literature are typically based on the internal consistency of item responses, and do not typ

Finally, for simplicity, the same parameter values are used for λ, λ_rel, and λ_B even though these parameters may differ in practice.

The ratios ψ and ψ_Obs. These parameters represent the extent to which mean mediator values vary across schools, and enter the design effect formulas. As discussed, these parameters can be obtained from ICC estimates for the mediator. These ICCs, however, are not typically reported in study reports, and there is no literature that collates such ICC estimates from previous studies. Thus, to obtain plausible ICC values, classroom observation mediators were analyzed from two large school-based education RCTs: (1) the Evaluation of the Effectiveness of Selected Supplemental Reading Comprehension Interventions (James-Burdumy et al. 2009), and (2) the Evaluation of Comprehensive Teacher Induction Programs (Glazerman et al. 2008). The Reading Comprehension study used the Expository Reading Comprehension (ERC) Classroom Observation Instrument, and the Teacher Induction study used the Vermont Classroom Observation Tool (Saginor and Hyjek 2005).

The ICC estimates for the mediators differ for the two studies. The ICC estimates for the Reading Comprehension study are 0.21 for the interactive teaching scale, 0.33 for the strategy instruction scale, 0.26 for the effective instruction behavioral scale, and 0.20 for the classroom management scale. The ICC estimates for the Teacher Induction study are 0.11 for the lesson content scale, 0.01 for the classroom culture scale, and 0.08 for the lesson implementation scale.

Due to this variation, a conservative mediator ICC value of 0.15 was assumed for the analysis, which implies an estimate of about 0.5 for ψ. This 0.5 value was also assumed for ψ_Obs (although ψ_Obs and ψ may differ in practice).
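As a rough illustration of where the assumed ICC of 0.15 sits relative to the estimates reported above, the following minimal Python sketch tabulates the seven scale-level ICCs and computes unweighted means. The dictionary layout, the variable names, and the use of simple unweighted means are assumptions made for illustration only; they are not part of the report's analysis.

```python
from statistics import mean

# ICC estimates as reported in the text, grouped by study.
# The grouping structure and key names are illustrative choices.
icc_estimates = {
    "Reading Comprehension": {
        "interactive teaching": 0.21,
        "strategy instruction": 0.33,
        "effective instruction behavioral": 0.26,
        "classroom management": 0.20,
    },
    "Teacher Induction": {
        "lesson content": 0.11,
        "classroom culture": 0.01,
        "lesson implementation": 0.08,
    },
}

ASSUMED_ICC = 0.15  # value assumed in the report's power analysis

# Unweighted mean ICC within each study (an illustrative summary only).
study_means = {
    study: mean(scales.values()) for study, scales in icc_estimates.items()
}

# Unweighted mean across all seven scales.
all_values = [v for scales in icc_estimates.values() for v in scales.values()]
overall_mean = mean(all_values)

for study, value in study_means.items():
    print(f"{study}: mean ICC = {value:.3f}")
print(f"All scales: mean ICC = {overall_mean:.3f}, "
      f"range = [{min(all_values):.2f}, {max(all_values):.2f}]")
print(f"Assumed ICC for the analysis: {ASSUMED_ICC:.2f}")
```

Under these assumptions, the assumed value of 0.15 falls between the two study-level means and slightly below the unweighted mean across all seven scales.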

R_{M_B,T}² and R_{M,T}² values. The R_{M_B,T}² parameter is the population squared correlation between M_i and T_i, and is a function of the size of the treatment effect on the mediator. To obtain plausible values for this parameter, it is convenient to use the relation from (2) that R_{M_B,T}² = β_1,eff² p(1-p), where β_1,eff² = β₁²/σ_{M_B}² is the squared impact on M_i measured in effect size (standard deviation) units. Thus, estimates of R_{M_B,T}² can be obtained using estimates of β_1,eff².

Two similar approaches were used for obtaining plausible values for β_1,eff². First, a "rule-of-thumb" from the IV literature is that if the F = t² = β̂₁²/Vâr(β̂₁) statistic from (2) is 10, then T_i can be considered to be a strong instrument for M_i (see Murray 2006 and Stock et al. 2002). With 60 study schools (a typical sample size), this condition implies that β_1,eff = 0.66 and, thus, that R_{M_B,T}² = 0.11 (see (28)). The second approach is to set β_1,eff equal to the minimum detectable impact in effect size units (MDE) for the mediator. With 60 schools, this approach yields β_1,eff = MDE = 0.51 and R_{M_B,T}² = 0.07 (see (28)).

Based on these analyses, an R_{M_B,T}² value of 0.10 was used for the simulations. Importantly, this small R_{M_B,T}² value suggests that the variance of the IV estimator will be large, because R_{M_B,T}² enters the denominator of the IV variance formulas. Furthermore, this denominator term will matter unless the impact on the mediator is unrealistically large. For example, the impact on the mediator would need to be 1.4 standard deviations to yield an R_{M_B,T}² value of 0.5, and 1.8 standard deviations to yield an R_{M_B,T}² value of 0.8.
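The conversions between the impact on the mediator and R_{M_B,T}² quoted above can be checked with a short Python sketch. It assumes only the relation R_{M_B,T}² = β_1,eff² p(1-p) with p = 0.5; the function names are hypothetical and chosen for illustration. It does not reproduce the derivation of the 0.66 and 0.51 effect sizes, which depends on equation (28).

```python
import math


def r2_from_effect_size(beta_eff: float, p: float = 0.5) -> float:
    """R-squared between mediator and treatment implied by an impact
    of beta_eff standard deviations: beta_eff^2 * p * (1 - p)."""
    return beta_eff ** 2 * p * (1.0 - p)


def effect_size_from_r2(r2: float, p: float = 0.5) -> float:
    """Inverse of the relation above: the impact (in standard deviation
    units) needed to produce a given R-squared."""
    return math.sqrt(r2 / (p * (1.0 - p)))


# Effect sizes reported in the text for the two approaches (60 schools).
for label, beta_eff in (("F = 10 rule of thumb", 0.66), ("MDE", 0.51)):
    print(f"{label}: beta_eff = {beta_eff:.2f} -> "
          f"R2 = {r2_from_effect_size(beta_eff):.2f}")

# Impacts needed to reach the larger R-squared values cited in the text.
for target_r2 in (0.5, 0.8):
    print(f"R2 = {target_r2:.1f} requires an impact of "
          f"{effect_size_from_r2(target_r2):.1f} standard deviations")
```

With p = 0.5 the factor p(1-p) equals 0.25, so even an impact of one full standard deviation yields an R² of only 0.25, which is why the denominator term remains small for realistic impacts.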

Finally, because R_{M,T}² = ψR_{M_B,T}², an R_{M,T}² value of 0.05 was used for the simulations, which was obtained by multiplying estimates of ψ = 0.50 and R_{M_B,T}² = 0.10.
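The scaling step can be written out as a small Python sketch, assuming the relation R_{M,T}² = ψR_{M_B,T}² and the parameter values stated in the text. The constant and function names are illustrative. The extra rows for 0.11 and 0.07 simply apply the same ψ to the two candidate values derived earlier and are not values used in the report's simulations.

```python
PSI = 0.50      # ratio assumed in the text
R2_MB_T = 0.10  # R-squared value used for the simulations


def scale_r2(psi: float, r2_mb_t: float) -> float:
    """Scale the R-squared for M_B by psi to obtain the R-squared for M."""
    return psi * r2_mb_t


# Value used for the simulations.
r2_m_t = scale_r2(PSI, R2_MB_T)
print(f"R2_MT = {PSI:.2f} * {R2_MB_T:.2f} = {r2_m_t:.2f}")

# Illustration only: the same scaling applied to the two candidate values.
for candidate in (0.11, 0.07):
    print(f"R2_MBT = {candidate:.2f} -> R2_MT = {scale_r2(PSI, candidate):.3f}")
```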
