NCEE 2009-4065 October 2009

## Chapter 5: Empirical Analysis - Identifying Plausible $R^2$ Values

To obtain benchmark $R^2$ values for the analysis, it is convenient to use estimates from the literature on the proportion of the total variance in student gain scores that is due to classroom-level variation in gain scores (the $\rho_1$ and $\rho_2$ parameters from above, and the ICC parameters in Figure 3.1). As discussed, these ICCs are likely to provide an upper bound on the extent to which classroom-level mediators can explain the variation in student gain scores.

Chiang (2009) presents a host of ICC estimates, both drawn from the literature and computed from new data sources. The estimates pertain to fall-spring test score gains on various math, reading, and language arts tests for elementary school students. Most, but not all, of the studies were performed in low-income schools.

The ICCs in Chiang (2009) vary across studies, reflecting differences in study samples and achievement tests. The classroom-level ICCs range from 0.02 to 0.15, and the school-level ICCs range from 0.05 to 0.20. Using mean values of $\rho_1 = 0.05$ and $\rho_2 = 0.10$, about 15 percent of the variance in student gain scores (the sum $\rho_1 + \rho_2$) can be explained by differences in classroom effects within and between schools.

A measured mediator can be expected to capture only particular dimensions of teacher practices, and thus to explain only a fraction of the 15 percent variation in classroom effects within and between schools (this fraction is denoted by $R^2_{CE,M}$ in Figure 3.1). For example, Jacob and Lefgren (2005) found that principal assessments of teachers explained only about 10 percent of the variation in classroom effects on reading and math. Similarly, Aaronson et al. (2007) found that a host of teacher characteristics (including age, gender, race, educational background, tenure, and total experience) together explained only about 20 percent of the variation in classroom effects. Thus, it is likely that even a strong predictor of classroom effects could explain only a portion of this variation. Furthermore, mediator subscales, which can help determine which practices matter, may explain even less.

Based on this literature, the power calculations were conducted assuming that the mediator explains 10 percent of the 15 percent variation in classroom effects (that is, $R^2_{CE,M} = 0.10$ in Figure 3.1). This implies a benchmark $R^2$ value of 1.5 percent for the mediator effect $\gamma_1$, which can be obtained using the relation $R^2_{y,M} = \mathrm{ICC} \times R^2_{CE,M} = 0.15 \times 0.10$ in Figure 3.1. The calculations were also conducted using a more optimistic $R^2$ value of 3 percent ($R^2_{CE,M} = 0.20$) and a less optimistic $R^2$ value of 0.75 percent ($R^2_{CE,M} = 0.05$). Similarly, using values of $\rho_1 = 0.05$ and $\rho_2 = 0.10$, the power calculations assumed target $R^2$ values of 0.005, 0.01, and 0.0025 (benchmark, more optimistic, and less optimistic, respectively) for the analysis of mediator effects within schools ($\gamma_{1W}$), and 0.01, 0.02, and 0.005 for the analysis of mediator effects between schools ($\gamma_{1B}$).
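The arithmetic behind these target values can be reproduced with a short sketch. It assumes only the relation stated in the text (target $R^2$ equals the relevant ICC multiplied by the share of classroom-effect variance the mediator explains). The names `target_r2`, `benchmark_table`, and the scenario labels are hypothetical and chosen for illustration; they do not come from the report.

```python
# Illustrative sketch only: reproduces the target R^2 arithmetic described in the text.
# All names below are hypothetical; the numeric inputs are the values quoted above.

RHO_1 = 0.05  # mean ICC used for the within-school analysis
RHO_2 = 0.10  # mean ICC used for the between-school analysis

# Share of the variance in classroom effects explained by the mediator (R^2_{CE,M})
SCENARIOS = {"benchmark": 0.10, "more_optimistic": 0.20, "less_optimistic": 0.05}


def target_r2(icc, r2_ce_m):
    """Share of total gain-score variance explained by the mediator: ICC * R^2_{CE,M}."""
    return icc * r2_ce_m


def benchmark_table():
    """Target R^2 values for the overall, within-school, and between-school analyses."""
    iccs = {"overall": RHO_1 + RHO_2, "within": RHO_1, "between": RHO_2}
    return {
        analysis: {
            name: round(target_r2(icc, share), 4)  # rounding removes float noise
            for name, share in SCENARIOS.items()
        }
        for analysis, icc in iccs.items()
    }


if __name__ == "__main__":
    for analysis, values in benchmark_table().items():
        print(analysis, values)
```

Under these inputs the overall benchmark is 0.015 (1.5 percent), which matches the value quoted in the text, and the within- and between-school rows match the remaining target values.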

Finally, viewing these target $R^2$ values as squared correlations also suggests that they are nontrivial. For instance, the assumption that the mediator can explain 10 percent of the variance in estimated classroom effects implies a correlation of 0.32 between these two measures. Similarly, the assumption that the mediator can explain 20 percent of the variance in estimated classroom effects implies a correlation of 0.45, which is larger than those typically found in practice (Perez-Johnson et al. 2009).
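The conversion from an assumed share of variance explained to an implied correlation is a square root. The sketch below illustrates it; the function name `implied_correlation` is hypothetical, and the result is the magnitude of the correlation, since the sign cannot be recovered from $R^2$.

```python
import math


def implied_correlation(r2):
    """Magnitude of the correlation implied by a squared correlation r2 in [0, 1]."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie between 0 and 1")
    return math.sqrt(r2)


if __name__ == "__main__":
    # The 0.10 and 0.20 shares of classroom-effect variance discussed in the text
    for r2 in (0.10, 0.20):
        print(f"R^2 = {r2:.2f} -> |r| = {implied_correlation(r2):.2f}")
```

Rounded to two decimals, the two cases give 0.32 and 0.45, which agrees with the correlations cited above.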
