Technical Methods Report: Do Typical RCTs of Education Interventions Have Sufficient Statistical Power for Linking Impacts on Teacher Practice and Student Achievement Outcomes?

NCEE 2009-4065
October 2009

Chapter 1: Introduction

Randomized control trials (RCTs) in the education field often test interventions that aim to improve teacher practices, with the ultimate goal of increasing student academic achievement. These interventions typically provide enhanced services to teachers, such as training in a new reading or math curriculum, mentoring services, or the introduction of new technologies or materials in the classroom. Consequently, the conceptual model for these RCTs posits that improvements in student outcomes are mediated by treatment-induced improvements in teacher practices.

Given this conceptual model, RCTs often collect data on mediating teacher practice outcomes (using classroom observation protocols, videotaping, principal ratings, and teacher logs or surveys) and on student outcomes (such as achievement test scores). These data are then typically used to estimate impacts (mean treatment-control differences) on both sets of outcomes.

For these RCTs, there is also often interest in analyses that link the impact estimates on the teacher practice and student outcomes (Baron and Kenny 1986; Gamse et al. 2008; Holland 1988; Jackson et al. 2007; MacKinnon and Dwyer 1993; Sobel 2008). These exploratory mediator analyses typically use regression methods to estimate the association between the two sets of outcomes. They aim to assess the extent to which the data support the study's conceptual model, and to identify pathways (the specific dimensions of teacher practice represented by the mediators and their subscales) through which the intervention improves the classroom environment and student learning.

In education RCTs, sample sizes are typically selected so that the study will have sufficient power to detect impacts on student outcomes, and in particular on student achievement test scores, that are deemed educationally meaningful and attainable (for example, 0.25 standard deviations). In assessing appropriate sample sizes, some RCTs also consider power levels for detecting impacts on teacher practice outcomes. There is a growing literature in the education field on methods to calculate statistical power for detecting impacts on student outcomes (Hedges and Hedberg 2007; Raudenbush 1997; Schochet 2008) and on mediating outcomes (Raudenbush et al. 2008).
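The kind of impact power calculation described above can be sketched with the standard variance formula for a design that randomizes whole schools, using a normal approximation rather than the t distribution. This is a minimal sketch, not the formulas developed in this report or in the cited sources: the function names (`impact_se`, `mdes`, `power_at`) and all inputs (60 schools, 60 tested students per school, an intraclass correlation of 0.15, no covariates) are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def impact_se(n_schools, students_per_school, icc, p_treat=0.5,
              r2_school=0.0, r2_student=0.0):
    """Standard error of the impact estimate, in effect size (SD) units,
    when whole schools are randomly assigned (hypothetical helper)."""
    denom = p_treat * (1 - p_treat) * n_schools
    between = icc * (1 - r2_school) / denom                      # school-level variance
    within = (1 - icc) * (1 - r2_student) / (denom * students_per_school)  # student-level variance
    return (between + within) ** 0.5

def mdes(se, alpha=0.05, power=0.80):
    """Minimum detectable effect size for a two-tailed test (normal approximation)."""
    return (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)) * se

def power_at(effect, se, alpha=0.05):
    """Two-tailed power to detect a true impact of `effect` SD units."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    ratio = effect / se
    return _Z.cdf(ratio - z_crit) + _Z.cdf(-ratio - z_crit)

# Hypothetical inputs: 60 schools, 60 tested students per school, ICC = 0.15.
se = impact_se(n_schools=60, students_per_school=60, icc=0.15)
print(f"MDES = {mdes(se):.3f} SD; power at 0.25 SD = {power_at(0.25, se):.2f}")
```

Under these assumed inputs the sketch gives an MDES a little under 0.30 standard deviations. School- or student-level covariates (the hypothetical `r2_school` and `r2_student` arguments) would shrink the standard error. With roughly 58 school-level degrees of freedom, a t-based calculation would give a slightly larger MDES than this normal approximation.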

There is also a large literature on methods for calculating statistical power for regression coefficients under non-clustered designs (see Cohen 1977, 1988; Kramer and Thiemann 1987; MacCallum et al. 1996; and Rogers and Hopkins 1988). However, the literature has not addressed statistical power for regression-based mediator analyses under the large-scale clustered RCT designs typically used in education research. Such methods are needed to assess whether typical RCT samples (for example, 60 schools and 180 classrooms) have sufficient power to detect associations between teacher practice mediators and student outcomes of the magnitudes likely to hold in practice. This issue is important because it could influence decisions about the scope of data collection for teacher practice measures, which tends to be very costly, especially if classroom observations are conducted and videotapes and observation protocols are coded for scale construction. If power levels for mediator analyses are low (that is, if there is little chance of finding significant mediator-test score relationships), the teacher practice mediators may have limited value for the study beyond a heuristic, qualitative linking of the mediating and student outcomes (and, hence, impacts).

This report is the first to systematically examine, both theoretically and empirically, the calculation of statistical power for regression-based mediator analyses for clustered RCTs in the education area. The focus is on the most commonly used clustered design, in which schools are randomly assigned to a single treatment or control condition. The report develops formulas for calculating statistical power for mediator analyses using two regression approaches: (1) a simple ordinary least squares (OLS) approach in which the student outcome is regressed on a single mediator, and (2) an instrumental variables (IV) approach in which treatment status is used as an instrument for the mediator. The formulas also incorporate the effects of measurement error in the mediator. Finally, the report uses these formulas to simulate the statistical power of mediator analyses that aim to associate teacher practice and student test score outcomes. This analysis attempts to answer the key question: How many study schools are required to ensure that RCTs of education interventions have enough statistical power for linking teacher practice and student achievement outcomes?
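The contrast between the OLS and IV approaches can be illustrated on simulated school-level data. This is a hypothetical example, not the estimators or power formulas derived in this report: it assumes a made-up data-generating process in which an unobserved school factor `u` affects both the mediator and the outcome, and the mediator is observed with classical measurement error. With a single binary instrument and no covariates, two-stage least squares reduces to the Wald ratio (the impact on the outcome divided by the impact on the mediator), which is what `wald_iv` computes; `ols_slope`, `wald_iv`, and `simulate_schools` are illustrative names.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def wald_iv(t, m, y):
    """IV slope using binary treatment status t as the instrument for m.

    With one binary instrument and no covariates, two-stage least squares
    equals this ratio: the impact on y divided by the impact on m.
    """
    def impact(v):
        treated = [vi for ti, vi in zip(t, v) if ti == 1]
        control = [vi for ti, vi in zip(t, v) if ti == 0]
        return fmean(treated) - fmean(control)
    return impact(y) / impact(m)

def simulate_schools(n_schools, a=0.5, b=0.4, me_sd=0.5, seed=2009):
    """Hypothetical school-level data-generating process (not the report's model).

    a     -- impact of treatment on the true mediator
    b     -- true effect of the mediator on the student outcome
    me_sd -- SD of measurement error in the observed mediator
    """
    rng = random.Random(seed)
    t = [1] * (n_schools // 2) + [0] * (n_schools - n_schools // 2)
    rng.shuffle(t)                        # random assignment of schools
    m_obs, y = [], []
    for tj in t:
        u = rng.gauss(0, 1)               # unobserved factor affecting both m and y
        m_true = a * tj + 0.5 * u + rng.gauss(0, 1)
        y.append(b * m_true + 0.5 * u + rng.gauss(0, 1))
        m_obs.append(m_true + rng.gauss(0, me_sd))  # mediator observed with error
    return t, m_obs, y

t, m_obs, y = simulate_schools(n_schools=200_000)
print(f"OLS slope: {ols_slope(m_obs, y):.3f}  IV slope: {wald_iv(t, m_obs, y):.3f}  (true b = 0.4)")
```

In this assumed setup the OLS slope converges to roughly 0.5 rather than the true 0.4, because confounding by `u` outweighs the attenuation caused by measurement error, whereas the IV slope converges to 0.4 provided treatment affects the outcome only through this one mediator. The very large simulated sample serves only to show where the estimators converge; with 60 schools the IV estimate would be far noisier, which is the power question this report takes up.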

The remainder of this report consists of five chapters. Chapter 2 defines the term "mediator" as used in this report, and Chapter 3 discusses the theoretical framework for the analysis. Chapter 4 develops formulas for calculating statistical power under the OLS and IV regression frameworks, Chapter 5 presents the statistical power simulation results, and Chapter 6 presents a summary and conclusions.
