This chapter examines ATE parameter estimation for the SP model with and without baseline covariates, where it is assumed that error variances are the same in the treatment and control conditions: $\sigma_{uT}^2 = \sigma_{uC}^2 = \sigma_u^2$ and $\sigma_{eT}^2 = \sigma_{eC}^2 = \sigma_e^2$. This assumption is commonly applied and greatly simplifies the presentation.
This chapter focuses on generalized least squares (GLS) methods that are typically used to provide consistent and efficient estimators for $\alpha_1$ in (4). However, the chapter starts with a discussion of the OLS approach (which produces consistent but inefficient estimates) so that the SP and FP estimators can be compared using a common approach. Methods for estimating variance components to obtain feasible GLS estimates are discussed in Chapter 5.
Super-Population Model Without Covariates
The SP model in (4) for students in school $i$ can be expressed in vector notation as follows:

$$y_i = \alpha_0 + \alpha_1 T_i + \delta_i, \tag{14}$$

where $\Omega_i^* = E(\delta_i \delta_i')$ is an $m_i \times m_i$ positive definite variance-covariance matrix with diagonal terms $\sigma_u^2 + \sigma_e^2$ and off-diagonal terms $\sigma_u^2$. The estimation of this model using OLS and GLS methods is discussed next.
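The structure of this covariance matrix can be illustrated with a short sketch. The function name `school_covariance` and the numeric values of the variance components are hypothetical and chosen only for illustration.

```python
import numpy as np

def school_covariance(m_i, sigma_u2, sigma_e2):
    """Return the m_i x m_i covariance matrix of the errors for one school.

    Every pair of students shares the school effect (sigma_u2 on all entries);
    the student-level error adds sigma_e2 on the diagonal only.
    """
    return sigma_u2 * np.ones((m_i, m_i)) + sigma_e2 * np.eye(m_i)

# Hypothetical values: 3 sampled students, sigma_u2 = 0.2, sigma_e2 = 0.8
omega = school_covariance(3, sigma_u2=0.2, sigma_e2=0.8)
print(omega)

# Positive definiteness: all eigenvalues are strictly positive
eigenvalues = np.linalg.eigvalsh(omega)
print(eigenvalues)
```

The diagonal entries equal the total variance and the off-diagonal entries equal the between-school variance, so the matrix is positive definite whenever the student-level variance is positive.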
OLS Methods
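Standard methods (see, for example, Schochet 2008) can be used to show that as $n$ increases to infinity, the OLS estimator $\hat{\alpha}_{1,SR} = (\bar{y}_T - \bar{y}_C)$ is asymptotically normal with mean $\alpha_1$ and asymptotic variance that can be estimated as follows:

$$\widehat{\mathrm{AsyVar}}(\hat{\alpha}_{1,SR}) = \frac{1}{p(1-p)} \left( \frac{\hat{\sigma}_u^2 \sum_{i=1}^{n} m_i^2}{M^2} + \frac{\hat{\sigma}_e^2}{M} \right), \tag{15}$$

where $\hat{\sigma}_u^2$ and $\hat{\sigma}_e^2$ are estimators for $\sigma_u^2$ and $\sigma_e^2$, respectively, and $M = \sum_{i=1}^{n} m_i$ is the total number of students. Note that this variance is minimized if $p = 0.5$ and $m_i = m$ for all schools (that is, for balanced designs).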
The term in parentheses in (15) can be computed by summing the elements of $\hat{\Omega}_i^*$ across schools and dividing by $M^2$, where $\hat{\Omega}_i^*$ is an estimator for $\Omega_i^*$. Thus, (15) is comparable to the $S_T^2$ and $S_C^2$ terms in (6) for the FP model. An important difference between the two models is that the FP variance, unlike the SP variance, contains the $S_\tau^2$ term, which reduces variance. In theory, therefore, the variance may be somewhat smaller under the FP model. This is expected because the SP model assumes external validity, with an associated loss in statistical precision. However, as noted, it is difficult to estimate $S_\tau^2$ for clustered designs; thus, precision gains for the FP model cannot typically be realized in practice.
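The summation described above can be checked numerically. The sketch below assumes that $M$ denotes the total number of sampled students and that each school's covariance matrix has diagonal terms $\sigma_u^2 + \sigma_e^2$ and off-diagonal terms $\sigma_u^2$; the school sizes and variance components are hypothetical.

```python
import numpy as np

def school_covariance(m_i, sigma_u2, sigma_e2):
    """Covariance matrix for one school (assumed compound-symmetric form)."""
    return sigma_u2 * np.ones((m_i, m_i)) + sigma_e2 * np.eye(m_i)

sizes = [10, 25, 40]                # hypothetical school sample sizes m_i
sigma_u2, sigma_e2 = 0.15, 0.85     # hypothetical variance component estimates
M = sum(sizes)                      # assumption: M is the total number of students

# Sum every element of every school's covariance matrix, then divide by M^2
by_summing = sum(school_covariance(m, sigma_u2, sigma_e2).sum() for m in sizes) / M**2

# Each matrix has m_i^2 entries containing sigma_u2 and m_i diagonal entries
# containing an extra sigma_e2, which gives the closed form below
closed_form = sigma_u2 * sum(m**2 for m in sizes) / M**2 + sigma_e2 / M

print(by_summing, closed_form)
```

The two quantities agree because the elements of each school's matrix sum to $m_i^2 \sigma_u^2 + m_i \sigma_e^2$.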
GLS Methods
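Consider a generic regression model where the covariate and variance matrices for school $i$ are denoted by $X_i$ and $\Omega_i$, respectively. The feasible GLS estimator of the parameter vector $\alpha$ is then:

$$\hat{\alpha}_{GLS} = \left( \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} y_i, \tag{16}$$

where $\hat{\Omega}_i$ is an estimator for $\Omega_i$.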
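In our case $X_i = [K \;\; T_i]$, so (16) reduces to

$$\hat{\alpha}_{1,GLS} = \frac{\sum_{i:T_i=1} w_i \bar{y}_i}{\sum_{i:T_i=1} w_i} - \frac{\sum_{i:T_i=0} w_i \bar{y}_i}{\sum_{i:T_i=0} w_i},$$

where $\bar{y}_i$ is the mean outcome in school $i$ and $w_i = [\hat{\sigma}_u^2 + (\hat{\sigma}_e^2/m_i)]^{-1}$ is the associated school-level weight. This is a weighted differences-in-means estimator, where the weights are inverses of the variances of school-level means.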
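The equivalence between the matrix form of the GLS estimator and the weighted differences-in-means form can be sketched numerically. The example below is illustrative only: the data are simulated, the variance components are treated as known rather than estimated, and $K$ is assumed to be a column of ones. All variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_u2, sigma_e2 = 0.2, 0.8               # hypothetical variance components
sizes = np.array([5, 12, 8, 20, 15, 9])      # hypothetical school sizes m_i
treat = np.array([1, 0, 1, 0, 1, 0])         # school-level treatment indicators

# Simulate student outcomes: intercept 1.0, treatment effect 0.5 (arbitrary)
ys = []
for m, t in zip(sizes, treat):
    u = rng.normal(0.0, np.sqrt(sigma_u2))            # school effect
    e = rng.normal(0.0, np.sqrt(sigma_e2), size=m)    # student errors
    ys.append(1.0 + 0.5 * t + u + e)

# Matrix form: accumulate X_i' Omega_i^{-1} X_i and X_i' Omega_i^{-1} y_i
A = np.zeros((2, 2))
b = np.zeros(2)
for y, m, t in zip(ys, sizes, treat):
    X = np.column_stack([np.ones(m), np.full(m, t)])  # [K  T_i], K assumed to be ones
    omega = sigma_u2 * np.ones((m, m)) + sigma_e2 * np.eye(m)
    omega_inv = np.linalg.inv(omega)
    A += X.T @ omega_inv @ X
    b += X.T @ omega_inv @ y
alpha_gls = np.linalg.solve(A, b)            # [alpha_0, alpha_1]

# Weighted differences-in-means form with inverse-variance weights
ybar = np.array([y.mean() for y in ys])
w = 1.0 / (sigma_u2 + sigma_e2 / sizes)
is_t = treat == 1
diff_in_means = (np.average(ybar[is_t], weights=w[is_t])
                 - np.average(ybar[~is_t], weights=w[~is_t]))

print(alpha_gls[1], diff_in_means)
```

The two estimates coincide because each school's vector of ones is an eigenvector of its covariance matrix, so each school contributes to the GLS normal equations only through $w_i$ and $w_i \bar{y}_i$.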
The weights can also be expressed, up to a factor $(\hat{\sigma}_u^2 + \hat{\sigma}_e^2)$ that is common to all schools and cancels in the estimator, as $w_i = [ICC + \{(1 - ICC)/m_i\}]^{-1}$, where $ICC = \hat{\sigma}_u^2/(\hat{\sigma}_u^2 + \hat{\sigma}_e^2)$ is the estimated intraclass correlation coefficient. The first $ICC$ term inside the brackets is common to all schools. Thus, the weights differ due to the second term. Schools with smaller variances (more sampled students) receive more weight in the analysis than schools with larger variances (fewer sampled students), because the larger schools provide more information on the super-population parameters $\mu_T$ and $\mu_C$. As $ICC$ approaches zero, the SP weights converge to the FP weights where schools are weighted by their sample sizes. Conversely, as $ICC$ approaches one, the SP weights converge to the FP weights where schools are weighted equally. Under the SP approach, it may be reasonable to weight each school district by the size of its school population if random assignment is conducted within school districts.
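The limiting behavior of the weights can be illustrated with a small sketch. The function name `icc_weights`, the school sizes, and the variance components are hypothetical; the weights are normalized to sum to one so that the two limiting cases are easy to read.

```python
import numpy as np

def icc_weights(sizes, icc):
    """School weights [ICC + (1 - ICC)/m_i]^(-1), normalized to sum to one."""
    sizes = np.asarray(sizes, dtype=float)
    w = 1.0 / (icc + (1.0 - icc) / sizes)
    return w / w.sum()

sizes = np.array([10, 20, 40])       # hypothetical school sample sizes

w_icc0 = icc_weights(sizes, 0.0)     # ICC = 0: proportional to sample sizes
w_icc1 = icc_weights(sizes, 1.0)     # ICC = 1: schools weighted equally
w_mid = icc_weights(sizes, 0.15)     # intermediate case

# The ICC form is proportional to the inverse-variance form of the weights,
# so the normalized weights are identical (hypothetical variance components)
sigma_u2, sigma_e2 = 0.15, 0.85      # implies ICC = 0.15
w_var = 1.0 / (sigma_u2 + sigma_e2 / sizes)
w_var = w_var / w_var.sum()

print(w_icc0, w_icc1, w_mid, w_var)
```

For intermediate values of the ICC, larger schools still receive more weight, but less than in proportion to their sample sizes.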
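It is well known that under weak regularity conditions, the feasible GLS estimator is asymptotically normal with mean $\alpha$ and variance $E(\sum_i X_i' \Omega_i^{-1} X_i)^{-1}/n$ (see, for example, Wooldridge 2002). This variance can be estimated as follows:

$$\widehat{\mathrm{Var}}(\hat{\alpha}_{GLS}) = \left( \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} X_i \right)^{-1}, \tag{17}$$

which in our case reduces to

$$\widehat{\mathrm{Var}}(\hat{\alpha}_{1,GLS}) = \frac{1}{\sum_{i:T_i=1} w_i} + \frac{1}{\sum_{i:T_i=0} w_i}. \tag{18}$$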
For known $\Omega_i$, the GLS estimator is the best linear unbiased estimator (BLUE) (although this may not hold if $\Omega_i$ is replaced by $\hat{\Omega}_i$). The ANOVA, ML, REML, and GEE approaches discussed in Chapter 5 yield feasible GLS estimators where estimators for $\sigma_u^2$ and $\sigma_e^2$ are inserted into (16) and (17).
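For a given sample size, the variance in (18) is minimized when $m_i = m$ and $p = 0.5$. Furthermore, if $m_i = m$, the OLS and GLS estimators of $\alpha_1$ are identical and yield the following simple variance estimator:

$$\widehat{\mathrm{Var}}(\hat{\alpha}_1) = \frac{\hat{\sigma}_u^2}{np(1-p)} + \frac{\hat{\sigma}_e^2}{np(1-p)m}. \tag{19}$$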
Note that replacing $m$ by the average school sample size $\bar{m}$ in (19) yields a serviceable variance estimator for designs where sample sizes vary somewhat across schools, which can be seen by setting $m_i = \bar{m}$ in (18).
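How close the simple variance estimator is to the weighted one can be sketched as follows. The sketch assumes that the reduced GLS variance is the sum of the inverse total weights in each research group (the form implied by inverse-variance weighting of school means) and that the balanced-design variance is $(\hat{\sigma}_u^2 + \hat{\sigma}_e^2/m)/[np(1-p)]$. Function names and numeric inputs are hypothetical.

```python
import numpy as np

def var_weighted(sizes, treat, sigma_u2, sigma_e2):
    """Variance of the weighted differences-in-means estimator (assumed form)."""
    sizes = np.asarray(sizes, dtype=float)
    treat = np.asarray(treat)
    w = 1.0 / (sigma_u2 + sigma_e2 / sizes)   # inverse variances of school means
    return 1.0 / w[treat == 1].sum() + 1.0 / w[treat == 0].sum()

def var_simple(n, p, m_bar, sigma_u2, sigma_e2):
    """Balanced-design variance evaluated at the average school size (assumed form)."""
    return (sigma_u2 + sigma_e2 / m_bar) / (n * p * (1.0 - p))

sigma_u2, sigma_e2 = 0.2, 0.8                 # hypothetical variance components
treat = np.array([1, 1, 1, 1, 0, 0, 0, 0])    # n = 8 schools, p = 0.5

# Balanced design: the two formulas coincide
balanced = np.full(8, 20)
v_bal = var_weighted(balanced, treat, sigma_u2, sigma_e2)

# Mildly unbalanced design with the same average size (20) in each group
unbalanced = np.array([18, 22, 20, 20, 16, 24, 21, 19])
v_unbal = var_weighted(unbalanced, treat, sigma_u2, sigma_e2)

v_simple = var_simple(n=8, p=0.5, m_bar=20, sigma_u2=sigma_u2, sigma_e2=sigma_e2)
print(v_bal, v_unbal, v_simple)
```

In this hypothetical example the simple formula slightly understates the weighted variance for the unbalanced design, and the gap is small when school sizes vary only modestly.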
Super-Population Model With Covariates
Under the SP model with covariates, the covariates $q_{ij}$ as well as the potential outcomes are considered to be random draws from joint super-population distributions. For the estimation model, the covariate matrix is now $X_i = [K \;\; T_i \;\; Q_i]$ and $\Omega_i$ is now conditional on $Q_i$. In principle, the covariates should be considered irrelevant variables because (14) is the true model. This procedure, however, considerably complicates the asymptotics for the GLS estimator, because $Q_i$ will tend to be correlated with the error term, and $\Omega_i$ will differ from the true $\Omega_i^*$.⁴
Consequently, the following analysis strays somewhat from the Neyman framework and assumes that the true model contains $Q_i$. In this case, the GLS formulas in (16) and (17) also apply to the SP model with covariates.