Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions

NCEE 2009-0061
August 2009

Chapter 3: ATE Parameter Estimation for the Finite-Population Model

This chapter discusses ATE parameter and variance estimation for the FP model with and without baseline covariates. Mathematical proofs of asymptotic results are provided in the appendix. It is assumed for the remainder of this report that the number of clusters (schools) is large enough for the asymptotic results to be approximately valid (see Bingenheimer and Raudenbush, 2004, for a discussion of this issue).

Finite-Population Model Without Covariates

Ordinary least squares (OLS) methods are appropriate for estimating β1 in (3), because the ATE parameter for the FP model pertains to the study sample only. The following lemma provides the asymptotic moments of the OLS estimator.

Lemma 1. The simple OLS estimator for $\beta_1$ under the FP model in (3) is $\hat{\beta}_{1,SR} = \bar{y}_T - \bar{y}_C$, where $\bar{y}_T$ and $\bar{y}_C$ are (unweighted) sample means for the treatment and control groups, respectively. As n increases to infinity for an increasing sequence of finite populations, $\hat{\beta}_{1,SR}$ is asymptotically unbiased. Furthermore, assume that:

[Equation (5)]

where $m$, $S_T^2$, $S_C^2$, and $S_\tau^2$ are fixed, nonnegative, real numbers. Then $\hat{\beta}_{1,SR}$ is asymptotically normal with variance:

[Equation (6)]

The $S_T^2$ and $S_C^2$ terms reflect the extent to which potential outcomes in the treatment and control conditions, respectively, vary and co-vary across students within the same schools. The $S_\tau^2$ term reflects the extent to which treatment effects vary and co-vary across students within schools. Note that if student-level treatment effects are constant, $S_\tau^2 = 0$ and $S_T^2 = S_C^2$.

With heterogeneous treatment effects, $S_\tau^2$ cannot in general be estimated consistently, because doing so requires information on student-level treatment effects, which are unobserved (each student is observed in only one condition). However, because $S_\tau^2 \ge 0$ and enters (6) with a negative sign, ignoring this term yields conservative variance estimators. Following this approach, a consistent estimator for the first two terms on the right-hand side of (6) can be obtained using the population-averaged generalized estimating equation (GEE) approach developed by Liang and Zeger (1986) for clustered data (see also Hardin and Hilbe 2003).

To describe this method in general terms, let $x_{ij}$ be a row vector of explanatory variables for student j in school i (including the intercept and $T_i - p$), let $y_i$ be an $m_i \times 1$ column vector of student outcomes, and let $V_i$ be the assumed ("working") $m_i \times m_i$ covariance matrix for $y_i$. The GEE method estimates the vector of regression parameters $\beta$ by solving the following equation for the score function $S(\beta)$:

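$$S(\beta) = \sum_{i=1}^{n} \left(\frac{\partial \mu_i(\beta)}{\partial \beta}\right)' V_i^{-1} \bigl(y_i - \mu_i(\beta)\bigr) = 0, \tag{7}$$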

where $\mu_i(\beta)$ is the expected value of $y_i$, whose elements are linked to a linear combination of the covariates through a monotonic, differentiable link function $g$, so that $g(\mu_{ij}) = x_{ij}\beta$ and $\mu_{ij} = g^{-1}(x_{ij}\beta)$.

Equation (7) can be solved iteratively using a first-order Taylor series expansion of $S(\hat{\beta})$ around $\beta$. Under this approach, the estimated parameter vector $\hat{\beta}^{(iter+1)}$ at iteration $(iter+1)$ is updated from $\hat{\beta}^{(iter)}$ as follows:

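$$\hat{\beta}^{(iter+1)} = \hat{\beta}^{(iter)} + I_0^{-1}\, S\bigl(\hat{\beta}^{(iter)}\bigr), \tag{8}$$

where

$$I_0 = \sum_{i=1}^{n} \left(\frac{\partial \mu_i(\beta)}{\partial \beta}\right)' V_i^{-1} \left(\frac{\partial \mu_i(\beta)}{\partial \beta}\right)$$

is the information matrix, evaluated at $\hat{\beta}^{(iter)}$. The matrix $I_0$ is sometimes replaced by $J_0 = -\partial S(\beta)/\partial \beta$ (Binder 1983).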
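A minimal Python/NumPy sketch of this iteration follows. It is an illustration under stated assumptions rather than the report's software: it assumes an independence working covariance ($V_i = I$), in which case the school blocks can be stacked and both the score and the information matrix become sums over all students. The function `gee_fisher_scoring` and its arguments `inv_link` (for $g^{-1}$) and `d_inv_link` (for its derivative) are hypothetical names. Each pass adds $I_0^{-1}S(\hat{\beta}^{(iter)})$ to the current estimate, with $I_0 = \sum_i (\partial\mu_i/\partial\beta)'(\partial\mu_i/\partial\beta)$ under $V_i = I$.

```python
import numpy as np


def gee_fisher_scoring(y, X, inv_link, d_inv_link, max_iter=50, tol=1e-10):
    """Solve S(beta) = sum_i (d mu_i/d beta)' V_i^{-1} (y_i - mu_i) = 0 by Fisher scoring.

    Assumes V_i = I (independence working covariance), so the rows of all schools
    can be stacked: y is the length-N outcome vector and X the N x k design matrix.
    inv_link(eta) returns mu = g^{-1}(eta); d_inv_link(eta) returns d mu / d eta.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = inv_link(eta)
        D = X * d_inv_link(eta)[:, None]     # stacked d mu_ij / d beta (N x k)
        score = D.T @ (y - mu)               # S(beta) with V_i = I
        info = D.T @ D                       # I_0 with V_i = I
        step = np.linalg.solve(info, score)  # I_0^{-1} S(beta)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With the identity link (`inv_link=lambda e: e`, `d_inv_link=np.ones_like`), the first pass lands exactly on the OLS solution and the loop stops at the next convergence check; a nonlinear link such as the log link (`np.exp` for both arguments) needs several passes.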

The model-based variance estimator of the solution $\hat{\beta}$ is $I_0^{-1}$. The empirical or robust "sandwich" variance estimator uses the data to correct for potential misspecification of $V_i$ and equals $I_0^{-1} I_1 I_0^{-1}$, where

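$$I_1 = \sum_{i=1}^{n} \left(\frac{\partial \mu_i(\hat{\beta})}{\partial \beta}\right)' V_i^{-1} r_i r_i' V_i^{-1} \left(\frac{\partial \mu_i(\hat{\beta})}{\partial \beta}\right) \tag{9}$$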

and $r_i = y_i - \mu_i(\hat{\beta})$ is an $m_i \times 1$ vector of regression residuals.
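For the identity link with $V_i = I$, $\partial\mu_i/\partial\beta = X_i$ (the rows of the design matrix for school $i$), so $I_0 = \sum_i X_i'X_i$ and $I_1 = \sum_i X_i' r_i r_i' X_i$. The NumPy sketch below accumulates $I_1$ school by school. The function name `cluster_sandwich` and the long data layout (one row per student, with a `school` label on each row) are assumptions made for illustration.

```python
import numpy as np


def cluster_sandwich(y, X, school):
    """OLS fit plus the empirical (sandwich) covariance I_0^{-1} I_1 I_0^{-1}.

    Assumes an identity link and V_i = I, so d mu_i / d beta = X_i and beta-hat is OLS.
    y: length-N outcomes; X: N x k design; school: length-N school labels.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    school = np.asarray(school)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta                        # residuals r_ij
    I0_inv = np.linalg.inv(X.T @ X)         # I_0 = sum_i X_i' X_i
    I1 = np.zeros((X.shape[1], X.shape[1]))
    for s in np.unique(school):
        rows = school == s
        u = X[rows].T @ r[rows]             # X_i' r_i
        I1 += np.outer(u, u)                # X_i' r_i r_i' X_i
    return beta, I0_inv @ I1 @ I0_inv
```

Only residuals from students in the same school are multiplied together, which is what makes the estimator robust to arbitrary within-school correlation as the number of schools grows.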

In our application, we assume (1) an independence working correlation structure (that is, $V_i$ is the identity matrix), (2) an identity link function ($\mu_{ij} = \beta_0 + \beta_1 (T_i - p)$), and (3) the empirical sandwich variance estimator. The ATE estimator for this linear model is then $\hat{\beta}_{1,GEE} = \bar{y}_T - \bar{y}_C$, with the following asymptotic variance estimator:

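$$\widehat{\mathrm{Var}}(\hat{\beta}_{1,GEE}) = \frac{\sum_{i=1}^{n} d_i^{2}\left(\sum_{j=1}^{m_i} r_{ij}\right)^{2}}{\left(\sum_{i=1}^{n} m_i d_i^{2}\right)^{2}}, \tag{10}$$

where $d_i = T_i - \sum_{k=1}^{n} m_k T_k \big/ \sum_{k=1}^{n} m_k$ is the treatment indicator centered at the proportion of sample students in treatment schools, and $r_{ij}$ is the OLS residual for student j in school i (that is, $y_{ij} - \bar{y}_T$ for treatment students and $y_{ij} - \bar{y}_C$ for control students). Because $(\sum_j r_{ij})^2 = \sum_j r_{ij}^2 + \sum_{j \neq k} r_{ij} r_{ik}$, this variance estimator is based on the sums of products and cross-products of OLS residuals for students within the same schools. Table 3.1 displays statistical package routines that use this method.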
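The following NumPy sketch computes $\hat{\beta}_{1,GEE}$ and the sandwich variance for the model with only an intercept and $T_i - p$. It relies on a Frisch-Waugh-Lovell argument (a standard result for the slope in a two-regressor OLS model, stated here as an assumption of the sketch): the slope's sandwich variance equals $\sum_i d_i^2(\sum_j r_{ij})^2 / (\sum_i m_i d_i^2)^2$, where $d_i$ is $T_i$ minus the share of sample students in treatment schools. The function name `gee_ate_and_variance` is hypothetical.

```python
import numpy as np


def gee_ate_and_variance(y, treat, school):
    """ATE (ybar_T - ybar_C) and its cluster-robust sandwich variance.

    Assumes the linear model with intercept and T_i - p, an independence working
    covariance, and the sandwich estimator. y, treat (0/1), and school are
    student-level arrays; treat is constant within a school.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    school = np.asarray(school)
    ybar_t, ybar_c = y[treat == 1].mean(), y[treat == 0].mean()
    r = y - np.where(treat == 1, ybar_t, ybar_c)  # OLS residuals
    share = treat.mean()                          # sum_i m_i T_i / sum_i m_i
    num = den = 0.0
    for s in np.unique(school):
        rows = school == s
        d = treat[rows][0] - share                # d_i
        num += d ** 2 * r[rows].sum() ** 2        # d_i^2 (sum_j r_ij)^2
        den += rows.sum() * d ** 2                # m_i d_i^2
    return ybar_t - ybar_c, num / den ** 2
```

Substituting $d_i = 1 - \pi$ for treatment schools and $d_i = -\pi$ for control schools, where $\pi$ is that student share, the same quantity simplifies to $\sum_{i \in T} R_i^2 / N_T^2 + \sum_{i \in C} R_i^2 / N_C^2$, with $R_i = \sum_j r_{ij}$ and $N_T$ and $N_C$ the numbers of treatment and control students.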

If schools are to be weighted equally under unbalanced designs, the GEE method can be applied by first pre-multiplying the outcome and explanatory variables (including the intercept) by the weights √wij where wij∝1/mi (Pfeffermann et al. 1998). Under this approach, it may be reasonable to also weight each school district equally if random assignment is conducted within school districts.
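The row-scaling device can be sketched as follows (NumPy; `equal_school_weight_ate` is a hypothetical name, and $p$ is taken to be the share of schools assigned to treatment). Multiplying each student's row by $\sqrt{w_{ij}}$ with $w_{ij} = 1/m_i$ and running OLS gives the weighted least squares solution.

```python
import numpy as np


def equal_school_weight_ate(y, treat, school):
    """Impact estimate with each school weighted equally (w_ij proportional to 1/m_i).

    Rows of y and X are multiplied by sqrt(w_ij) and ordinary least squares is run
    on the scaled data, which is equivalent to weighted least squares.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    _, first, inverse, counts = np.unique(
        np.asarray(school), return_index=True, return_inverse=True, return_counts=True
    )
    p = treat[first].mean()                    # share of schools assigned to treatment
    sqrt_w = np.sqrt(1.0 / counts[inverse])    # sqrt(w_ij) with w_ij = 1/m_i
    X = np.column_stack([np.ones_like(y), treat - p])
    beta = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
    return beta[1]
```

Because the treatment regressor takes only two values, the weighted slope reduces to the difference between the average school means in the treatment and control groups, so each school counts equally regardless of its size.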

Importantly, as discussed in Murray (1998), the GEE method should be used only if the number of clusters in each research condition is at least 20. With fewer clusters, simulations show that Type I error rates can differ noticeably from the nominal level.

Finally, for the equal-school weighting scheme, model-free permutation (randomization) tests can also be used to test the strong null hypothesis that all student-level treatment effects are zero (Gail et al., 1996). Under this approach, observed school means are used to construct the distribution of all possible treatment effect estimates under the null hypothesis of no impacts. This is done by (1) allocating schools to all possible combinations of $np$ "pseudo-treatment" schools and $n(1-p)$ "pseudo-control" schools, (2) estimating a treatment effect $\hat{\beta}_1$ for each of the $n!/[(np)!\,(n(1-p))!]$ allocations, (3) sorting these treatment effects from smallest to largest, (4) observing where in the distribution the treatment effect for the actual treatment-control allocation lies, and (5) rejecting the null hypothesis if the actual $\hat{\beta}_1$ lies below the $\alpha/2$ quantile or above the $1-\alpha/2$ quantile of the permutation distribution (which will have mean 0).³ The validity of this method does not rely on a model, but only on correct randomization.
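The five steps can be sketched in Python as below. The function name `permutation_test`, the use of `numpy.quantile` with its default linear interpolation to locate the $\alpha/2$ and $1-\alpha/2$ quantiles, and the `n_draws` option (a random sample of reallocations for the large-$n$ case) are assumptions of this sketch rather than details from the report.

```python
import itertools

import numpy as np


def permutation_test(school_means, treat, alpha=0.05, n_draws=None, seed=0):
    """Permutation test of the strong null using school means (equal school weights).

    school_means: one value per school; treat: actual 0/1 assignment per school.
    If n_draws is None, all allocations of n_t pseudo-treatment schools are
    enumerated; otherwise n_draws random reallocations are used instead.
    Returns the observed effect, the (alpha/2, 1 - alpha/2) quantiles, and a reject flag.
    """
    ybar = np.asarray(school_means, dtype=float)
    treat = np.asarray(treat).astype(bool)
    n, n_t = ybar.size, int(treat.sum())
    observed = ybar[treat].mean() - ybar[~treat].mean()

    def effect(t_idx):
        mask = np.zeros(n, dtype=bool)
        mask[list(t_idx)] = True                 # pseudo-treatment schools
        return ybar[mask].mean() - ybar[~mask].mean()

    if n_draws is None:
        allocations = itertools.combinations(range(n), n_t)
    else:
        rng = np.random.default_rng(seed)
        allocations = (rng.choice(n, n_t, replace=False) for _ in range(n_draws))
    dist = np.sort([effect(a) for a in allocations])
    lo, hi = np.quantile(dist, [alpha / 2, 1 - alpha / 2])
    return observed, (lo, hi), bool(observed < lo or observed > hi)
```

With ten schools and five treated there are 252 allocations, so full enumeration is cheap; for larger $n$, passing, say, `n_draws=10000` replaces enumeration with random reallocations.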

Gail et al. (1996) demonstrate through simulations that Type I error rates of these tests are near nominal levels if n is moderate, p is near 0.5, and variances of the outcomes do not differ substantially across the treatment and control conditions. These conditions are likely to hold in practice. Furthermore, Gail et al. (1996) demonstrate that the procedure performs better using school-level residuals from regression models that include baseline covariates (see below).

Finite-Population Model with Covariates

We now examine ATE estimators when the FP models include fixed covariates, $q_{ij}$, measured in the pre-randomization period. The covariates are not indexed by T or C because, having been measured before random assignment, their values do not depend on treatment status. The covariates could include both school-level covariates and student-level covariates that are centered at school-level means. All covariates are assumed to be centered at grand means.
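One way to build such covariates from a single student-level measure is sketched below in NumPy. The helper name `build_centered_covariates` is hypothetical, and the sketch assumes that "grand mean" refers to the unweighted mean over all sample students.

```python
import numpy as np


def build_centered_covariates(x, school):
    """Hypothetical helper: split a student covariate x into a school-level covariate
    (the school mean) and a within-school deviation, then grand-mean center both.

    Returns an N x 2 array: [school-level covariate, student-level covariate].
    """
    x = np.asarray(x, dtype=float)
    _, inverse = np.unique(np.asarray(school), return_inverse=True)
    school_mean = np.bincount(inverse, weights=x) / np.bincount(inverse)
    school_level = school_mean[inverse]   # school covariate, repeated for each student
    within = x - school_level             # student covariate centered at school mean
    Q = np.column_stack([school_level, within])
    return Q - Q.mean(axis=0)             # center both columns at student-level grand means
```

The within-school deviation already has a grand mean of zero, so the final centering step changes only the school-level column.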

In the Neyman model, the covariates are irrelevant variables because (3) is the true model. Thus, the ATE parameters considered above without covariates pertain also to the models with covariates.

To examine asymptotic moments of the OLS estimator under the FP model with fixed covariates, we assume in addition to (5) that as n approaches infinity:

[Equation (11)]

where $f_{ij}$ is the student's predicted value from a full-sample OLS regression of $\alpha_{ij}$ on $q_{ij}$, $h_{ij}$ is the predicted value from a full-sample OLS regression of $\tau_{ij}$ on $q_{ij}$, and $S_{\alpha f}^2$, $S_{ff}^2$, and $S_{hf}^2$ are fixed, nonnegative real numbers. The following lemma generalizes results in Schochet (2009) and Freedman (2008) to two-stage clustered designs. The proof is provided in the appendix.

Lemma 2. Let β̂1,MR be the multiple regression estimator for β1 under the model in (3) and assume (5) and (11). Then, β̂1,MR is asymptotically normal with mean β1 and variance:

[Equation (12)]

The first bracketed term in (12) is the variance of the OLS estimator under the FP model without covariates. The $(2S_{\alpha f}^2 - S_{ff}^2)$ term is a generalized version of the usual explained sum of squares from a multiple OLS regression, and will typically generate precision gains if the covariates are correlated with potential outcomes. The $2(1-2p)S_{hf}^2$ term pertains to regression-adjusted covariances between $\alpha_{ij}$ and $\tau_{ij}$ for students within the same school. This term will be zero if $p = 0.5$ or if the covariances between potential outcomes are similar in the treatment and control conditions (which would occur, for example, with constant treatment effects); otherwise, this term could have either sign.

A variance estimator for (12) can be obtained using the GEE approach discussed above. Let Xi =(K Ti Qi, where K is an mix1 column of 1s for the intercept, Ti is an mix1 vector containing the Ti -p terms, and Qi is a matrix of covariates for school i. In this case, a variance estimator is:

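$$\widehat{\mathrm{Var}}(\hat{\beta}_{1,MR}) = \left[\Bigl(\sum_{i=1}^{n} X_i'X_i\Bigr)^{-1} \Bigl(\sum_{i=1}^{n} X_i' r_i r_i' X_i\Bigr) \Bigl(\sum_{i=1}^{n} X_i'X_i\Bigr)^{-1}\right]_{22}, \tag{13}$$

where $[\cdot]_{22}$ denotes the diagonal element corresponding to the $\mathbf{T}_i$ column and the residuals $r_i$ are calculated from a full-sample OLS regression of $y_i$ on $X_i$. The permutation tests discussed above could also be used for significance testing (for the equal-school weighting scheme), applied to school-level mean residuals from a regression of the outcomes on the covariates $Q_i$ alone. Residuals from the regression on $X_i$ are not suitable for this purpose, because that regression already absorbs the treatment-control difference.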
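The covariate-adjusted estimator and its sandwich variance can be sketched the same way. In this NumPy sketch, the function name `mr_ate_and_variance` is hypothetical, `Q` is assumed to be a student-level two-dimensional array of grand-mean-centered covariates, and the returned variance is the diagonal element of $(\sum_i X_i'X_i)^{-1}(\sum_i X_i'r_i r_i'X_i)(\sum_i X_i'X_i)^{-1}$ for the $T_i - p$ column, that is, the GEE sandwich with $V_i = I$ and an identity link.

```python
import numpy as np


def mr_ate_and_variance(y, treat, Q, school, p):
    """Regression-adjusted impact estimate and its cluster-robust sandwich variance.

    Stacks X_i = (K, T_i - p, Q_i) over schools, fits OLS, and returns the T - p
    coefficient with the matching diagonal element of
    (sum_i X_i'X_i)^{-1} (sum_i X_i' r_i r_i' X_i) (sum_i X_i'X_i)^{-1}.
    """
    y = np.asarray(y, dtype=float)
    school = np.asarray(school)
    X = np.column_stack([np.ones_like(y), np.asarray(treat, dtype=float) - p, Q])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta                          # full-sample OLS residuals
    bread = np.linalg.inv(X.T @ X)            # (sum_i X_i' X_i)^{-1}
    meat = np.zeros((X.shape[1], X.shape[1]))
    for s in np.unique(school):
        rows = school == s
        u = X[rows].T @ r[rows]               # X_i' r_i
        meat += np.outer(u, u)                # X_i' r_i r_i' X_i
    return beta[1], (bread @ meat @ bread)[1, 1]
```

Passing a `Q` with zero columns (shape `(N, 0)`) reproduces the unadjusted estimator $\bar{y}_T - \bar{y}_C$ and its sandwich variance.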


³ For moderate n (say, n > 30), the number of possible allocations becomes very large. In these cases, the permutation distribution can be estimated from a large random sample of reallocations of school means to the pseudo-treatment and control groups.