Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions

NCEE 2009-0061
August 2009

1. Introduction

In randomized control trials (RCTs) of educational interventions, random assignment is often performed at the school or classroom level rather than at the student level. These group-based designs are common because RCTs in the education field often test interventions that provide enhanced services to teachers (for example, training in a new reading or math curriculum or mentoring services) or that affect the entire school (for example, a school-wide social and character development program or restructuring initiative). For these types of interventions, it is infeasible to randomly assign the treatment directly to students.

Under these group-based designs, data are typically collected on students. Thus, the statistical procedures used to estimate average treatment effects (ATEs) and their standard errors from student-level data must account for the potential correlation of the outcomes of students within the same groups. In particular, the standard errors of the ATE estimators must be inflated to account for design effects due to clustering.
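As a rough illustration of why clustering inflates standard errors, the sketch below uses the textbook design-effect approximation 1 + (m - 1) * rho, where m is a common cluster size and rho is the intraclass correlation. This formula is not taken from this report; it assumes equal-sized clusters and a known rho, and the function names and numeric values are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * rho.

    Assumes equal-sized clusters of size m and an intraclass
    correlation rho between 0 and 1 (both are illustrative assumptions).
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def clustered_se(naive_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a standard error that ignored clustering by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical values: 25 students per classroom, rho = 0.15.
    deff = design_effect(25, 0.15)
    print(f"design effect: {deff:.2f}")
    print(f"adjusted SE:   {clustered_se(0.05, 25, 0.15):.4f}")
```

Under these assumed values, even a modest intraclass correlation multiplies the variance several times over, which is why an analysis that treats students as independent can badly understate the uncertainty of the ATE estimate.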

Over the past 40 years, a large statistical literature spanning multiple disciplines has discussed the estimation of treatment effects under two-stage clustered designs (see, for example, Rao 1972, Harville 1977, Laird and Ware 1982, Hsiao 1986, Liang and Zeger 1986, Baltagi and Chang 1994, Murray 1998, Raudenbush and Bryk 2002, Wooldridge 2002, and De Leeuw and Meijer 2008). These models have a number of labels, including random effects models, random coefficient models, one-way models, variance components models, panel models, hierarchical linear models (HLM), and linear mixed models. A number of statistical packages have been developed to estimate these models using analysis of variance (ANOVA), maximum likelihood (ML), restricted ML (REML), generalized estimating equations (GEE), and other methods.

This paper contributes to this literature by discussing the estimation and interpretation of the ATE parameter under clustered RCTs using the non-parametric model of causal inference that underlies experimental designs. This model was introduced for non-clustered designs by Neyman (1923) and later developed in Rubin (1974, 1977) and Holland (1986). This paper extends this theory to two-stage clustered RCTs and develops regression equations that are consistent with it. The analysis focuses on continuous outcomes (such as test scores) and discusses relevant ATE parameters assuming that the outcome data are either (1) fixed for the study population (a finite-population model) or (2) random draws from population outcome distributions (the more common super-population model). Appropriate estimation methods and asymptotic moments are discussed for each model, and the methods are linked to the following commonly used statistical packages: SAS, Stata, R, SUDAAN, and HLM. The paper considers both simple differences-in-means models and models that include baseline covariates.

Finally, ATEs and their standard errors are estimated with each of the considered methods, using data from five recent large-scale clustered RCTs in the education area. The purpose of this analysis is to examine the robustness of study findings to alternative estimation approaches. This is important because education researchers typically employ the statistical packages and estimation routines with which they are most comfortable, and published articles in the evaluation literature rarely report impact results using alternative estimation schemes. Thus, this paper can provide information to education researchers about the assumptions underlying commonly used ATE estimation methods, how these methods work, and the sensitivity of impact findings to alternative estimation strategies. The goal is not to identify the best methods, but to discuss options and interpretation.

The rest of this paper is organized into six chapters. Chapter 2 discusses the Neyman causal inference model, and Chapters 3 and 4 discuss the estimation of the ATE parameter under the finite- and super-population models, respectively. Chapter 5 discusses methods for estimating variance components for the super-population model, and Chapter 6 presents findings from the empirical analysis. The final chapter presents a summary and conclusions.
