Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions

NCEE 2009-0061
August 2009

Chapter 4: ATE Parameter Estimation for the Super-Population Model

This chapter examines ATE parameter estimation for the SP model with and without baseline covariates, where it is assumed that error variances are the same in the treatment and control conditions: $\sigma_{uT}^2 = \sigma_{uC}^2 = \sigma_u^2$ and $\sigma_{eT}^2 = \sigma_{eC}^2 = \sigma_e^2$. This assumption is commonly applied and greatly simplifies the presentation.

This chapter focuses on generalized least squares (GLS) methods that are typically used to provide consistent and efficient estimators for $\alpha_1$ in (4). However, the chapter starts with a discussion of the OLS approach (which produces consistent but inefficient estimates) so the SP and FP estimators can be compared using a common approach. Methods for estimating variance components to obtain feasible GLS estimates are discussed in Chapter 5.

Super-Population Model Without Covariates

The SP model in (4) for students in school i can be expressed in vector notation as follows:

(14) $y_i = \alpha_0 + \alpha_1 T_i + \delta_i$,

where $\Omega_i^* = E(\delta_i \delta_i')$ is an $m_i \times m_i$ positive definite variance-covariance matrix with diagonal terms $\sigma_u^2 + \sigma_e^2$ and off-diagonal terms $\sigma_u^2$. The estimation of this model using OLS and GLS methods is discussed next.
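To make the structure of $\Omega_i^*$ concrete, the following Python sketch builds the matrix for a single school and checks that it implies a variance of $\sigma_u^2 + \sigma_e^2/m_i$ for the school mean. The function names and the numeric values are illustrative assumptions and do not come from the report.

```python
def school_covariance(m_i, sigma_u2, sigma_e2):
    """Return the m_i x m_i matrix (nested lists) with sigma_u2 + sigma_e2
    on the diagonal and sigma_u2 everywhere else."""
    return [
        [sigma_u2 + sigma_e2 if j == k else sigma_u2 for k in range(m_i)]
        for j in range(m_i)
    ]


def variance_of_school_mean(omega):
    """Variance of the simple mean of the m_i outcomes: (1' Omega 1) / m_i^2."""
    m_i = len(omega)
    total = sum(sum(row) for row in omega)  # sum of all matrix elements
    return total / m_i ** 2


# Hypothetical values chosen only for illustration.
omega = school_covariance(m_i=4, sigma_u2=0.2, sigma_e2=0.8)
var_mean = variance_of_school_mean(omega)
```

With these illustrative inputs, `var_mean` equals 0.2 + 0.8/4 = 0.4 up to floating-point rounding.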

OLS Methods
Standard methods (see, for example, Schochet 2008) can be used to show that as $n$ increases to infinity, the OLS estimator $\hat\alpha_{1,SR} = (\bar y_T - \bar y_C)$ is asymptotically normal with mean $\alpha_1$ and an asymptotic variance that can be estimated as follows:

(15) [Equation image not recoverable from the source: estimated asymptotic variance of $\hat\alpha_{1,SR}$.]

where $\hat\sigma_u^2$ and $\hat\sigma_e^2$ are estimators for $\sigma_u^2$ and $\sigma_e^2$, respectively. Note that this variance is minimized if $p = 0.5$ and $m_i = m$ for all schools (that is, for balanced designs).

The term in parentheses in (15) can be computed by summing the elements of $\hat\Omega_i^*$ across schools and dividing by $M^2$, where $\hat\Omega_i^*$ is an estimator for $\Omega_i^*$. Thus, (15) is comparable to the $S_T^2$ and $S_C^2$ terms in (6) for the FP model. An important difference between the two models is that the FP variance also contains $S_\tau^2$, which reduces the variance, whereas the SP variance has no comparable term. In theory, then, the variance may be somewhat smaller under the FP model. This is expected, because the SP model assumes external validity, with an associated loss in statistical precision. However, as noted, it is difficult to estimate $S_\tau^2$ for clustered designs; thus, precision gains for the FP model cannot typically be realized in practice.

GLS Methods
Consider a generic regression model where the covariate and variance matrices for school $i$ are denoted by $X_i$ and $\Omega_i$, respectively. The feasible GLS estimator of the parameter vector $\alpha$ is then:

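(16) $\hat\alpha_{GLS} = \left( \sum_{i=1}^{n} X_i' \hat\Omega_i^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i' \hat\Omega_i^{-1} y_i$,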

where $\hat\Omega_i$ is an estimator for $\Omega_i$.

In our case $X_i = [K \;\; T_i]$, so (16) reduces to

$\hat\alpha_{1,GLS} = \dfrac{\sum_{i:T_i=1} w_i \bar y_i}{\sum_{i:T_i=1} w_i} - \dfrac{\sum_{i:T_i=0} w_i \bar y_i}{\sum_{i:T_i=0} w_i}$,

where $\bar y_i$ is the mean outcome in school $i$ and $w_i = [\hat\sigma_u^2 + (\hat\sigma_e^2/m_i)]^{-1}$ is the associated school-level weight. This is a weighted difference-in-means estimator, where the weights are the inverses of the estimated variances of the school-level means.
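The weighted difference-in-means form can be sketched in a few lines of Python. This is a minimal illustration that treats the variance components as already estimated; the function names, the data, and the variance-component values are hypothetical.

```python
def school_weight(m_i, sigma_u2, sigma_e2):
    """Inverse of the variance of a school mean: 1 / (sigma_u2 + sigma_e2 / m_i)."""
    return 1.0 / (sigma_u2 + sigma_e2 / m_i)


def weighted_mean(values, weights):
    """Weighted average of values using the supplied weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def gls_impact(school_means, sizes, treated, sigma_u2, sigma_e2):
    """Weighted mean of treatment-school means minus weighted mean of
    control-school means, with weights w_i."""
    weights = [school_weight(m, sigma_u2, sigma_e2) for m in sizes]
    rows = list(zip(school_means, weights, treated))
    y_t = [y for y, _, d in rows if d == 1]
    w_t = [w for _, w, d in rows if d == 1]
    y_c = [y for y, _, d in rows if d == 0]
    w_c = [w for _, w, d in rows if d == 0]
    return weighted_mean(y_t, w_t) - weighted_mean(y_c, w_c)


# Hypothetical data: four schools, two per research group.
school_means = [0.50, 0.30, 0.10, 0.20]
sizes = [10, 40, 20, 20]
treated = [1, 1, 0, 0]
impact = gls_impact(school_means, sizes, treated, sigma_u2=0.1, sigma_e2=0.9)

# With equal school sizes the weights cancel and the estimator is the
# simple difference between the averages of the school means.
impact_balanced = gls_impact(school_means, [20] * 4, treated, 0.1, 0.9)
```

In the unbalanced call, the larger treatment school (40 students) pulls the treatment mean toward its own mean of 0.30; in the balanced call, the weights drop out entirely.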

The weights can also be expressed, up to a factor common to all schools, as $w_i = [ICC + \{(1 - ICC)/m_i\}]^{-1}$, where $ICC = \hat\sigma_u^2/(\hat\sigma_u^2 + \hat\sigma_e^2)$ is the estimated intraclass correlation coefficient. The first ICC term inside the brackets is common to all schools, so the weights differ only through the second term. Schools with smaller variances (more sampled students) receive more weight in the analysis than schools with larger variances (fewer sampled students), because the larger schools provide more information on the super-population parameters $\mu_T$ and $\mu_C$. As ICC approaches zero, the SP weights converge to the FP weights where schools are weighted by their sample sizes. Conversely, as ICC approaches one, the SP weights converge to the FP weights where schools are weighted equally. Under the SP approach, it may be reasonable to weight each school district by the size of its school population if random assignment is conducted within school districts.
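The two limiting cases can be checked numerically. The sketch below rescales the ICC-based weights to sum to one so that the cases are directly comparable; the helper names and the school sizes are assumptions made for illustration.

```python
def icc_weight(icc, m_i):
    """ICC form of the school weight: 1 / (ICC + (1 - ICC) / m_i)."""
    return 1.0 / (icc + (1.0 - icc) / m_i)


def normalized_weights(icc, sizes):
    """Weights rescaled to sum to one across schools."""
    raw = [icc_weight(icc, m) for m in sizes]
    total = sum(raw)
    return [w / total for w in raw]


sizes = [10, 30, 60]  # hypothetical school sample sizes

at_zero = normalized_weights(0.0, sizes)   # proportional to m_i
at_one = normalized_weights(1.0, sizes)    # equal across schools
typical = normalized_weights(0.15, sizes)  # between the two extremes
```

For any ICC strictly between zero and one, the largest school still receives the most weight, but its share is smaller than its share of the student sample.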

It is well known that under weak regularity conditions, the feasible GLS estimator is asymptotically normal with mean $\alpha$ and variance $[E(X_i' \Omega_i^{-1} X_i)]^{-1}/n$ (see, for example, Wooldridge 2002). This variance can be estimated as follows:

(17) $\widehat{AsyVar}(\hat\alpha_{GLS}) = \left( \sum_{i=1}^{n} X_i' \hat\Omega_i^{-1} X_i \right)^{-1}$,

which in our case reduces to

(18) $\widehat{AsyVar}(\hat\alpha_{1,GLS}) = \left( \sum_{i:T_i=1} w_i \right)^{-1} + \left( \sum_{i:T_i=0} w_i \right)^{-1}$.

For known $\Omega_i$, the GLS estimator is the best linear unbiased estimator (BLUE) (although this may not hold if $\Omega_i$ is replaced by $\hat\Omega_i$). The ANOVA, ML, REML, and GEE approaches discussed in Chapter 5 yield feasible GLS estimators where estimators for $\sigma_u^2$ and $\sigma_e^2$ are inserted into (16) and (17).
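The reduction of the matrix variance expression to a function of the school weights can be verified numerically. The sketch below assumes that $K$ is a column of ones (an interpretation not spelled out in this section) and uses the fact that, for a matrix with $\sigma_u^2 + \sigma_e^2$ on the diagonal and $\sigma_u^2$ elsewhere, the sum of the elements of its inverse equals $w_i$. Function names and inputs are hypothetical.

```python
def school_weight(m_i, sigma_u2, sigma_e2):
    """w_i = 1 / (sigma_u2 + sigma_e2 / m_i)."""
    return 1.0 / (sigma_u2 + sigma_e2 / m_i)


def variance_from_matrix(sizes, treated, sigma_u2, sigma_e2):
    """(2,2) element of the inverse of sum_i X_i' Omega_i^{-1} X_i
    when X_i holds a column of ones and the treatment indicator."""
    w = [school_weight(m, sigma_u2, sigma_e2) for m in sizes]
    a11 = sum(w)                                           # ones with ones
    a12 = sum(wi for wi, d in zip(w, treated) if d == 1)   # ones with T_i
    a22 = a12                                              # T_i with T_i (T_i^2 = T_i)
    det = a11 * a22 - a12 * a12
    return a11 / det


def variance_from_weights(sizes, treated, sigma_u2, sigma_e2):
    """Closed form: 1 / (sum of treatment weights) + 1 / (sum of control weights)."""
    w = [school_weight(m, sigma_u2, sigma_e2) for m in sizes]
    sum_t = sum(wi for wi, d in zip(w, treated) if d == 1)
    sum_c = sum(wi for wi, d in zip(w, treated) if d == 0)
    return 1.0 / sum_t + 1.0 / sum_c


# Hypothetical unbalanced design with five schools.
sizes = [10, 40, 20, 25, 15]
treated = [1, 1, 0, 0, 1]
v_matrix = variance_from_matrix(sizes, treated, sigma_u2=0.1, sigma_e2=0.9)
v_weights = variance_from_weights(sizes, treated, sigma_u2=0.1, sigma_e2=0.9)
```

The two functions agree up to floating-point rounding, which is the sense in which the variance of the impact estimator depends on the design only through the group-level sums of the weights.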

For a given sample size, the variance in (18) is minimized when $m_i = m$ and $p = 0.5$. Furthermore, if $m_i = m$, the OLS and GLS estimators of $\alpha_1$ are identical and yield the following simple variance estimator:

(19) $\widehat{AsyVar}(\hat\alpha_1) = \dfrac{1}{np(1-p)} \left( \hat\sigma_u^2 + \dfrac{\hat\sigma_e^2}{m} \right)$.

Note that replacing $m$ by the average school sample size $\bar m$ in (19) yields a serviceable variance estimator for designs where sample sizes vary somewhat across schools, which can be seen by setting $m_i = \bar m$ in (18).
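As a rough check on this approximation, the sketch below compares the weight-based variance for a mildly unbalanced design with the balanced-design formula $(\hat\sigma_u^2 + \hat\sigma_e^2/\bar m)/[np(1-p)]$ evaluated at the average school size. It assumes $n$ is the number of schools and $p$ the fraction assigned to treatment; the design and the variance components are hypothetical.

```python
def variance_unbalanced(sizes, treated, sigma_u2, sigma_e2):
    """1 / (sum of treatment weights) + 1 / (sum of control weights),
    with w_i = 1 / (sigma_u2 + sigma_e2 / m_i)."""
    w = [1.0 / (sigma_u2 + sigma_e2 / m) for m in sizes]
    sum_t = sum(wi for wi, d in zip(w, treated) if d == 1)
    sum_c = sum(wi for wi, d in zip(w, treated) if d == 0)
    return 1.0 / sum_t + 1.0 / sum_c


def variance_balanced(n, p, m, sigma_u2, sigma_e2):
    """Balanced-design formula: (sigma_u2 + sigma_e2 / m) / (n p (1 - p))."""
    return (sigma_u2 + sigma_e2 / m) / (n * p * (1.0 - p))


# Hypothetical design: eight schools, sizes varying modestly around 20.
sizes = [18, 22, 20, 19, 21, 20, 23, 17]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
n = len(sizes)
p = sum(treated) / n
m_bar = sum(sizes) / n

exact = variance_unbalanced(sizes, treated, sigma_u2=0.15, sigma_e2=0.85)
approx = variance_balanced(n, p, m_bar, sigma_u2=0.15, sigma_e2=0.85)
relative_gap = abs(exact - approx) / exact
```

In this example the two values differ by less than one percent. The gap would be expected to widen as school sizes become more dispersed, so the check is worth repeating with the sizes from an actual design.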

Super-Population Model With Covariates

Under the SP model with covariates, the covariates $q_{ij}$ as well as the potential outcomes are considered to be random draws from joint super-population distributions. For the estimation model, the covariate matrix is now $X_i = [K \;\; T_i \;\; Q_i]$ and $\Omega_i$ is now conditional on $Q_i$. In principle, the covariates should be considered irrelevant variables because (14) is the true model. This approach, however, considerably complicates the asymptotics for the GLS estimator, because $Q_i$ will tend to be correlated with the error term, and $\Omega_i$ will differ from the true $\Omega_i^*$.⁴

Consequently, the following analysis strays somewhat from the Neyman framework and assumes that the true model contains $Q_i$. In this case, the GLS formulas in (16) and (17) also apply to the SP model with covariates.


4 For the OLS estimator, the first problem can be overcome (as it was for the FP model) and the second problem does not occur. The asymptotic variance of the OLS estimator is similar in form to that for the FP model in (12) but does not include terms comparable to $S_\tau^2$ (not shown).