Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions

NCEE 2009-0061
August 2009

Chapter 2. The Neyman Causal Inference Model

This chapter discusses the Neyman finite-population (FP) and super-population (SP) causal inference models under two-stage clustered designs—the most common designs used in education RCTs. The focus is on continuous outcomes. The theory is then used to derive regression equations for estimating the ATE parameters.

The Neyman Finite-Population Model for Two-Stage Clustered Designs

Consider an experimental design where n schools (or classrooms) are randomly assigned to either a single treatment or control condition. The sample contains np treatment and n(1 - p) control group schools, where p is the sampling rate to the treatment group (0 < p < 1). It is assumed that the sample contains mi students from school i and that there are M = Σi mi total students in the sample. It is assumed that student outcomes are not affected by the treatment status of other students.
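
As a concrete illustration (not part of the report), the following Python sketch, assuming numpy and using purely hypothetical sample sizes and a rate of p = 0.5, sets up such a clustered sample and randomly assigns np schools to the treatment condition:

import numpy as np

# Illustrative sketch (hypothetical numbers): a sample of n schools,
# each with m_i students, where n*p schools are randomized to treatment.
rng = np.random.default_rng(0)

n = 20                                      # number of schools
p = 0.5                                     # sampling rate to the treatment group
m = rng.integers(low=30, high=61, size=n)   # m_i students per school
M = m.sum()                                 # total students in the sample

# Randomly assign exactly n*p schools to treatment (T_i = 1), the rest to control.
T = np.zeros(n, dtype=int)
T[rng.choice(n, size=int(n * p), replace=False)] = 1

print(f"Treatment schools: {T.sum()}, control schools: {n - T.sum()}, total students: {M}")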

It is assumed for now that the n schools and M students define the population universe—the FP model considered by Neyman for non-clustered designs. Let YTij be the "potential" outcome for student j in school i in the treatment condition and YCij be the potential outcome for the student in the control condition. The difference between the two fixed potential outcomes, (YTij - YCij), is the student-level treatment effect, and the ATE parameter, β1, is the average treatment effect over all students:

(1) β1 = (1/M) Σi Σj (YTij - YCij), where the sums run over the n schools and the mi students in each school.

This ATE parameter cannot be calculated directly because potential outcomes for each student cannot be observed in both the treatment and control conditions. Formally, if Ti is a treatment status indicator variable that equals 1 for treatment schools and 0 for control schools, then the observed outcome for a student, yij, can be expressed as follows:

(2) yij = Ti YTij + (1 - Ti) YCij.
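
A short numerical sketch of (1) and (2) may help fix ideas. In the hypothetical example below (numpy assumed), the potential outcomes are generated once and then treated as fixed; only the school-level treatment indicators are random:

import numpy as np

# Illustrative sketch (hypothetical numbers): fixed potential outcomes, random assignment.
rng = np.random.default_rng(1)

n, p = 20, 0.5
m = rng.integers(30, 61, size=n)                 # students per school
school = np.repeat(np.arange(n), m)              # school index for each student
M = m.sum()

# Fixed potential outcomes for every student (generated once, then held fixed).
Y_C = rng.normal(50, 10, size=M)                 # Y_Cij
Y_T = Y_C + rng.normal(2, 3, size=M)             # Y_Tij, with heterogeneous effects

beta1 = (Y_T - Y_C).mean()                       # equation (1): ATE over all M students

# One realization of the random assignment, and the observed outcomes in (2).
T_school = np.zeros(n, dtype=int)
T_school[rng.choice(n, size=int(n * p), replace=False)] = 1
T = T_school[school]                             # student-level T_i
y = T * Y_T + (1 - T) * Y_C                      # y_ij = T_i*Y_Tij + (1 - T_i)*Y_Cij

print(f"ATE parameter beta1 = {beta1:.3f}")
print(f"Difference in observed means = {y[T == 1].mean() - y[T == 0].mean():.3f}")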

Importantly, the potential outcomes in (2) are fixed and the only source of randomness is Ti. Thus, under the Neyman model, the ATE parameter pertains only to those students and schools at the time the study was conducted. Stated differently, the impact findings have internal validity but do not necessarily generalize beyond the study sample. This approach can be justified on the grounds that schools are usually purposively selected for education RCTs, and thus may be a self-selected sample of schools that are willing to participate and that are deemed suitable for the study based on their student and teacher populations and typical service offerings. Similarly, students in the study sample may not be representative of all students in the study schools, because they may be a nonrandom subset of students whose parents consented to participate in the study, who provided follow-up data, and who did not leave the study schools between baseline and follow-up.1

Under this fixed population scenario, researchers are to be agnostic about whether the study results have external validity. Policymakers and other users of the study results can decide whether the impact evidence is sufficient to adopt the intervention on a broader scale, perhaps by examining the similarity of the observable characteristics of schools and students in the study samples to their own contexts, and using results from subgroup and implementation analyses.

Following the approach for non-clustered designs used by Freedman (2008) and Schochet (2009), a regression model for (2) can be constructed by re-writing (2) as follows:

(3) yij = β0 + β1(Ti - p) + ηij, where

  • β0 = p ȲT + (1 - p) ȲC and β1 = ȲT - ȲC are parameters to be estimated, where ȲT and ȲC are the means of the potential treatment and control outcomes over all M students
  • ηij = αij + τij(Ti - p) is an "error" term, where αij = p(YTij - ȲT) + (1 - p)(YCij - ȲC) and τij = (YTij - ȲT) - (YCij - ȲC).2

The error term ηij is a function of two terms: (1) αij, the expected observed outcome for the student relative to the expected mean observed outcome; and (2) τij, the student-level treatment effect relative to the ATE. Note that αij and τij sum to zero over all students. This model is non-parametric because it does not depend on the distributions of the potential outcomes.
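
The decomposition in (3) can also be verified numerically. The sketch below, again with hypothetical fixed potential outcomes (numpy assumed), checks that β0 + β1(Ti - p) + ηij reproduces the observed outcomes for a realized assignment and that αij and τij each sum to zero over all students:

import numpy as np

# Numerical check of the decomposition in (3), using hypothetical fixed potential outcomes.
rng = np.random.default_rng(2)

n, p = 20, 0.5
m = rng.integers(30, 61, size=n)
school = np.repeat(np.arange(n), m)
M = m.sum()
Y_C = rng.normal(50, 10, size=M)
Y_T = Y_C + rng.normal(2, 3, size=M)

Ybar_T, Ybar_C = Y_T.mean(), Y_C.mean()
beta0 = p * Ybar_T + (1 - p) * Ybar_C
beta1 = Ybar_T - Ybar_C
alpha = p * (Y_T - Ybar_T) + (1 - p) * (Y_C - Ybar_C)   # alpha_ij
tau = (Y_T - Ybar_T) - (Y_C - Ybar_C)                   # tau_ij

# One random assignment of schools, then the observed outcomes and the error term.
T_school = np.zeros(n, dtype=int)
T_school[rng.choice(n, size=int(n * p), replace=False)] = 1
T = T_school[school]
y = T * Y_T + (1 - T) * Y_C
eta = alpha + tau * (T - p)                             # eta_ij in (3)

print(np.allclose(y, beta0 + beta1 * (T - p) + eta))    # True: (3) reproduces y_ij
print(np.allclose([alpha.sum(), tau.sum()], 0.0))       # True: both sum to zero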

The model in (3) does not satisfy key assumptions of the usual random effects model: ηij does not have mean zero (over all possible treatment assignment configurations); to the extent that τij varies across students, ηij is heteroscedastic; Cov(ηij, ηij′) is not constant for students in the same school; Cov(ηij, ηi′j′) is nonzero for students in different schools (i ≠ i′); and ηij is correlated with the regressor (Ti - p):

Eij) =αij, Varij) =τij2p(1 -p), Covijηij′) =τijτij′p(1 -p),
Covijηi′j′) =τijτi′j′p(1 -p)/(n -1), E[(Ti -piij′] =τijp(1 -p).

Note that in this model, the error terms for students within the same schools are correlated only because they have the same treatment status, not because they face similar environments.
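
These moments can be checked by Monte Carlo: re-randomize the schools many times while holding the potential outcomes fixed, and compare the simulated moments of ηij with the formulas above. The sketch below uses hypothetical data and examines one student, plus a pair of students in different schools:

import numpy as np

# Monte Carlo check of the moments above (hypothetical data; numpy assumed).
rng = np.random.default_rng(3)

n, p, reps = 20, 0.5, 50000
m = rng.integers(30, 61, size=n)
school = np.repeat(np.arange(n), m)
M = m.sum()
Y_C = rng.normal(50, 10, size=M)
Y_T = Y_C + rng.normal(2, 3, size=M)
alpha = p * (Y_T - Y_T.mean()) + (1 - p) * (Y_C - Y_C.mean())
tau = (Y_T - Y_T.mean()) - (Y_C - Y_C.mean())

a, b = 0, M - 1                      # two students in different schools
eta_a, eta_b = np.empty(reps), np.empty(reps)
for r in range(reps):
    T_school = np.zeros(n, dtype=int)
    T_school[rng.choice(n, size=int(n * p), replace=False)] = 1
    T = T_school[school]
    eta = alpha + tau * (T - p)
    eta_a[r], eta_b[r] = eta[a], eta[b]

print("E(eta_a):   sim", eta_a.mean().round(3), " formula", alpha[a].round(3))
print("Var(eta_a): sim", eta_a.var().round(3), " formula", (tau[a] ** 2 * p * (1 - p)).round(3))
print("Cov(a, b):  sim", np.cov(eta_a, eta_b)[0, 1].round(4),
      " formula", (-tau[a] * tau[b] * p * (1 - p) / (n - 1)).round(4))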

Importantly, the model in (3) should not be confused with a fixed effects model, where cluster effects are treated as fixed and cluster-level dummy variables are included in the model. Rather, the model treats cluster-level effects as random, because the randomness of treatment status enters through the model error term.

Finally, (3) implicitly assumes that schools are weighted by their student sample sizes. An alternative specification is to weight schools equally. In this case, the ATE parameter is the difference between the average of the school-level mean potential treatment outcomes, (1/n) Σi (1/mi) Σj YTij, and the corresponding average of the school-level mean potential control outcomes, (1/n) Σi (1/mi) Σj YCij. This ATE parameter pertains to the average school effect in the sample rather than to the average student effect. This weighting scheme will yield different impact estimates than the student-weighted analysis in (3) if student sample sizes vary across schools and impacts vary with school sample size.
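
The sketch below contrasts the two ATE parameters in a hypothetical example in which impacts grow with school size, so that the student-weighted and school-weighted parameters differ:

import numpy as np

# Illustrative contrast between the two weighting schemes (hypothetical data).
rng = np.random.default_rng(4)

n = 20
m = rng.integers(10, 101, size=n)                # school sizes vary widely
school = np.repeat(np.arange(n), m)
effect_by_school = 0.05 * m                      # impact grows with school size
Y_C = rng.normal(50, 10, size=m.sum())
Y_T = Y_C + effect_by_school[school]

# Student-weighted ATE: average effect over all M students, as in (1).
ate_students = (Y_T - Y_C).mean()

# School-weighted ATE: average of school-level mean effects.
school_means = np.array([(Y_T - Y_C)[school == i].mean() for i in range(n)])
ate_schools = school_means.mean()

print(f"Student-weighted ATE: {ate_students:.3f}")
print(f"School-weighted ATE:  {ate_schools:.3f}")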

The Super-Population Model for Two-Stage Clustered Designs

We now consider a SP version of the Neyman causal inference model where the study schools and students are assumed to be random samples from broader populations (see Imbens and Rubin 2007 and Schochet 2008, 2009). This framework is typically used to estimate impacts under clustered RCTs in the education area, and is consistent with popular linear mixed model approaches, such as HLM.

Under this framework, students are nested within schools. Let ZTi be the potential outcome (mean posttest score) for school i in the treatment condition and ZCi be the potential outcome for school i in the control condition. Potential outcomes for the n study schools are assumed to be random draws from potential treatment and control outcome distributions in the study super-population. The means and variances of these distributions are assumed to be finite and are denoted by μT and σuT² for potential treatment outcomes and μC and σuC² for potential control outcomes. These two outcome distributions also define the distribution of school-level treatment effects in the super-population, which is assumed to have mean μτ and variance στ².

Suppose next that mi students are sampled from the student super-population in study school i. The potential student-level outcomes YTij and YCij are now assumed to be random draws from student-level potential outcome distributions (conditional on the school-level potential outcomes), with respective means ZTi and ZCi and respective variances σeT² > 0 and σeC² > 0.
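
The following sketch simulates this two-stage data-generating process with hypothetical parameter values (for simplicity, the school-level treatment and control potential outcomes are drawn independently, which the SP model does not require):

import numpy as np

# Sketch of the super-population data-generating process (hypothetical parameters).
rng = np.random.default_rng(5)

n, p = 20, 0.5
m = rng.integers(30, 61, size=n)

mu_T, mu_C = 52.0, 50.0            # super-population means (ATE = 2.0)
sigma_uT, sigma_uC = 5.0, 5.0      # between-school standard deviations
sigma_eT, sigma_eC = 10.0, 10.0    # within-school standard deviations

# School-level potential outcomes Z_Ti and Z_Ci.
Z_T = rng.normal(mu_T, sigma_uT, size=n)
Z_C = rng.normal(mu_C, sigma_uC, size=n)

# Student-level potential outcomes, drawn conditional on the school-level ones.
school = np.repeat(np.arange(n), m)
Y_T = rng.normal(Z_T[school], sigma_eT)
Y_C = rng.normal(Z_C[school], sigma_eC)

# Random assignment of schools and observed outcomes, as in (2).
T_school = np.zeros(n, dtype=int)
T_school[rng.choice(n, size=int(n * p), replace=False)] = 1
T = T_school[school]
y = T * Y_T + (1 - T) * Y_C
print(f"Estimated impact (difference in means): {y[T == 1].mean() - y[T == 0].mean():.3f}")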

Under the SP model, the ATE parameter is μτ = E(ZTi - ZCi) = μT - μC. Thus, the impact findings are now assumed to generalize to the super-population of schools that are "similar" to the study schools. How should one interpret this super-population? Does it pertain to the study schools over the "long term" for a broader universe of students and school staff that change over time? Does it pertain to a broader set of schools in the study districts? To similar schools nationwide? The answers to these questions will likely depend on the context (and may not exist), but researchers should be aware that the usual approach for estimating treatment effects in education research makes the implicit assumption of external validity to a school universe that is likely to be vaguely defined. Nonetheless, this approach can be justified on the grounds that policymakers may generalize the findings anyway, especially if the study provides a primary basis for deciding whether to implement the tested interventions more broadly. Furthermore, this approach is more consistent with the Bayesian view that assessing intervention effects is a dynamic process that takes place in a context of continuously increasing knowledge.

As before, we can use (2) to express observed student outcomes in terms of potential outcomes, and can rearrange terms to yield the following regression model:

(4) yij = α0 + α1Ti + (ui + eij), where

  • α0 = μC and α1 = μT - μC (the ATE parameter) are coefficients to be estimated
  • ui = Ti(ZTi - μT) + (1 - Ti)(ZCi - μC) is a school-level error term, where E(ui) = 0, E(Tiui) = 0, Var(ui|Ti = 1) = σuT², and Var(ui|Ti = 0) = σuC²
  • eij = Ti(YTij - ZTi) + (1 - Ti)(YCij - ZCi) is a student-level error term, where E(eij) = 0, E(Tieij) = E(uieij) = 0, Var(eij|Ti = 1) = σeT², and Var(eij|Ti = 0) = σeC².

Furthermore, if we define δij = ui + eij as the total error term, then:

Var(δij|Ti = 1) = σuT² + σeT²,  Var(δij|Ti = 0) = σuC² + σeC²,  Cov(δij, δi′j′) = 0 for i ≠ i′,
Cov(δij, δij′|Ti = 1) = σuT²,  Cov(δij, δij′|Ti = 0) = σuC².

Thus, this model is the usual random effects model with an exchangeable block-diagonal variance-covariance matrix for the error vector, except that the variances and covariances are allowed to differ for treatments and controls.
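
The implied covariance structure can be written out explicitly. The sketch below builds the block-diagonal variance-covariance matrix for a tiny hypothetical example (three schools with two students each, the first school treated), using treatment- and control-specific variance components:

import numpy as np

# Sketch of the exchangeable block-diagonal covariance matrix implied by the
# moments above (hypothetical variance components).
sigma_uT2, sigma_eT2 = 4.0, 9.0      # treatment-group variance components
sigma_uC2, sigma_eC2 = 5.0, 8.0      # control-group variance components

m = [2, 2, 2]                        # students per school
T_school = [1, 0, 0]                 # first school treated, others control

blocks = []
for mi, t in zip(m, T_school):
    su2, se2 = (sigma_uT2, sigma_eT2) if t == 1 else (sigma_uC2, sigma_eC2)
    # Within a school: Var = su2 + se2 on the diagonal, Cov = su2 off the diagonal.
    blocks.append(np.full((mi, mi), su2) + se2 * np.eye(mi))

# Assemble the block-diagonal matrix; errors are uncorrelated across schools.
M_tot = sum(m)
V = np.zeros((M_tot, M_tot))
start = 0
for B in blocks:
    k = B.shape[0]
    V[start:start + k, start:start + k] = B
    start += k

print(V)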

Finally, note that (4) can also be derived using the following two-level HLM model (Bryk and Raudenbush, 1992):

Level 1: yij = zi + eij
Level 2: zi = α0 + α1Ti + ui

where zi = TiZTi + (1 - Ti)ZCi is the observed school-level outcome, Level 1 corresponds to students, and Level 2 corresponds to schools. Inserting the Level 2 equation into the Level 1 equation yields (4). Thus, the HLM approach is consistent with the SP causal inference theory.
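
As an illustration of this connection, the sketch below simulates data from model (4) with hypothetical parameter values and fits a random-intercept model using statsmodels' MixedLM (one possible mixed-model routine; note that, unlike the model above, this standard routine assumes common variance components for the treatment and control groups). The coefficient on T estimates the ATE parameter α1:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate data from model (4) with alpha_0 = 50 and alpha_1 = 2 (hypothetical values).
rng = np.random.default_rng(6)

n, p = 40, 0.5
m = rng.integers(30, 61, size=n)
school = np.repeat(np.arange(n), m)
T_school = np.zeros(n, dtype=int)
T_school[rng.choice(n, size=int(n * p), replace=False)] = 1
T = T_school[school]

u = rng.normal(0, 3, size=n)[school]          # school-level errors u_i
e = rng.normal(0, 10, size=m.sum())           # student-level errors e_ij
y = 50 + 2 * T + u + e

# Random intercept for each school plays the role of u_i; coefficient on T estimates alpha_1.
df = pd.DataFrame({"y": y, "T": T, "school": school})
result = sm.MixedLM.from_formula("y ~ T", data=df, groups=df["school"]).fit()
print(result.summary())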


1 For cost reasons, in education RCTs, follow-up data are not usually collected for students in the baseline sample who leave the study districts.
2 In (3), the term (Ti -p) is used rather than Ti because it simplifies the mathematical proofs presented later in this paper, but this centering has no effect on the findings.