Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions - Chapter 3: ATE Parameter Estimation for the Finite-Population Model

NCEE 2009-0061
August 2009

Chapter 3: ATE Parameter Estimation for the Finite-Population Model

This chapter discusses ATE parameter and variance estimation for the FP model with and without baseline covariates. Mathematical proofs of asymptotic results are provided in the appendix. It is assumed for the remainder of this report that sample sizes of clusters are large enough so that asymptotic results are approximately valid (see Bingenheimer and Raudenbush 2004 for a discussion of this issue).

Finite-Population Model Without Covariates

Ordinary least squares (OLS) methods are appropriate for estimating β₁ in (3), because the ATE parameter for the FP model pertains to the study sample only. The following lemma provides the asymptotic moments of the OLS estimator.

Lemma 1. The simple OLS estimator for β₁ under the FP model in (3) is β̂_{1, SR} = (ȳ_T - ȳ_C), where ȳ_T and ȳ_C are (unweighted) sample means for the treatment and control groups, respectively. As n increases to infinity for an increasing sequence of finite populations, β̂_{1, SR} is asymptotically unbiased. Furthermore, assume that:

[Equation (5): display not preserved in this text version.]

where m, S_T², S_C², and S_τ² are fixed, nonnegative, real numbers. Then, β̂_{1, SR} is asymptotically normal with variance:

[Equation (6): display not preserved in this text version.]

The S_T² and S_C² terms pertain to the extent to which potential outcomes vary and co-vary across students within the same schools. The S_τ² term pertains to the extent to which treatment effects vary and co-vary across students within schools. Note that if student-level treatment effects are constant, S_τ² =0 and S_T² = S_C².

With heterogeneous treatment effects, it is difficult to find a consistent estimator for S_τ², because this requires unobserved information on student-level treatment effects. However, because S_τ² ≥ 0 , ignoring this term will provide conservative variance estimators. Following this approach, a consistent estimator for the first two terms on the right-hand side in (6) can be obtained using the population averaged generalized estimating equation (GEE) approach developed by Liang and Zeger (1986) for clustered data (see also Hardin and Hilbe 2003).

To describe this method for general applications, it is assumed that x_ij is a row vector of model baseline covariates (including the intercept and T_i -p), y_i is an m_ix1 column vector of student outcomes, and V_i is the assumed ("working") m_ixm_i covariance structure for y_i. The GEE method for estimating the vector of regression parameters β solves the following equation for the score function S(β):

S(β) = Σ_{i=1}^n (∂μ_i/∂β)′ V_i^-1 (y_i - μ_i(β)) = 0     (7)

where μ_i(β) is the expected value of y_i, which is linked to a linear combination of the covariates through a monotonic, differentiable link function g, where g(μ_ij) = x_ijβ and μ_ij = g^-1(x_ijβ).

Equation (7) can be solved iteratively using a Taylor series expansion of S(β̂) around S(β). Under this approach, the estimated parameter vector β̂^(iter+1) at iteration (iter +1) can be updated from β̂^(iter) as follows:

β̂^(iter+1) = β̂^(iter) + I₀^-1 S(β̂^(iter)),     (8)

where I₀ = Σ_{i=1}^n (∂μ_i/∂β)′ V_i^-1 (∂μ_i/∂β) is the information matrix. The matrix I₀ is sometimes replaced by J₀ = ∂S(β)/∂β (Binder 1983).
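The iterative update described above can be illustrated with a minimal Python sketch. It assumes an identity working covariance (V_i = I) and takes the inverse link and its derivative as arguments. The names `gee_update`, `mu_fn`, and `dmu_fn` are hypothetical, introduced only for this illustration, and the data are simulated. With an identity link the score is linear in β, so a single update from any starting value should land on the OLS solution.

```python
import numpy as np

def gee_update(beta, X_list, y_list, mu_fn, dmu_fn):
    """One scoring update with V_i = identity (an assumption of this sketch).

    Returns (updated beta, score evaluated at the input beta).
    """
    k = beta.size
    score = np.zeros(k)
    info = np.zeros((k, k))
    for X_i, y_i in zip(X_list, y_list):
        eta = X_i @ beta
        D_i = dmu_fn(eta)[:, None] * X_i        # d mu_i / d beta, shape (m_i, k)
        score += D_i.T @ (y_i - mu_fn(eta))     # contribution to S(beta)
        info += D_i.T @ D_i                     # contribution to I0
    return beta + np.linalg.solve(info, score), score

# Identity link: mu = eta, d mu / d eta = 1.
identity = lambda eta: eta
d_identity = lambda eta: np.ones_like(eta)

# Simulated clusters with regressors (1, T_i - p).
rng = np.random.default_rng(2)
p = 0.5
X_list, y_list = [], []
for i in range(8):
    m_i = int(rng.integers(5, 12))
    T_i = i % 2
    X_list.append(np.column_stack([np.ones(m_i), np.full(m_i, T_i - p)]))
    y_list.append(1.0 + 0.4 * T_i + rng.normal(0, 1, m_i))

beta_one_step, _ = gee_update(np.zeros(2), X_list, y_list, identity, d_identity)
_, score_at_solution = gee_update(beta_one_step, X_list, y_list, identity, d_identity)
beta_ols, *_ = np.linalg.lstsq(np.vstack(X_list), np.concatenate(y_list), rcond=None)
print(beta_one_step, beta_ols, score_at_solution)
```

For a nonlinear link, the same function would be called repeatedly until the change in β is negligible.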

The model-based variance estimator of the solution β̂ is I₀^-1. The empirical or robust "sandwich" variance estimator uses the data to correct for the potential misspecification of V_i and equals I₀^-1I₁I₀^-1 where

I₁ = Σ_{i=1}^n (∂μ_i/∂β)′ V_i^-1 r_i r_i′ V_i^-1 (∂μ_i/∂β)     (9)

and r_i =(y_i -μ_i(β̂)) is an m_ix1 vector of regression residuals.

In our application, we assume (1) an independent working correlation structure (that is, V_i is the identity matrix), (2) an identity link function (μ_ij = β₀ + β₁(T_i - p)), and (3) the empirical sandwich variance estimator. The ATE estimator for this linear model is then β̂_{1, GEE} = (ȳ_T - ȳ_C) with the following asymptotic variance estimator:

[Equation (10) and the accompanying definition of d are not preserved in this text version.]

This variance estimator is based on the sums of products and cross-products of OLS residuals for students within the same schools. Table 3.1 displays statistical package routines that use this method.
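As a rough illustration of the sandwich calculation under an independent working correlation and identity link, the following Python sketch fits OLS on (1, T_i - p) and sums the cross-products of residuals within schools. The function name `cluster_sandwich_ate` and the simulated data are hypothetical, and no small-sample correction is applied, so this should be read as a sketch of the general I₀^-1I₁I₀^-1 form rather than the exact estimator in the report.

```python
import numpy as np

def cluster_sandwich_ate(y, t, cluster, p):
    """OLS of y on (1, T - p) with a cluster-robust sandwich variance.

    Returns (beta1_hat, estimated variance of beta1_hat).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    cluster = np.asarray(cluster)
    X = np.column_stack([np.ones_like(y), t - p])
    bread = np.linalg.inv(X.T @ X)              # I0^-1 when V_i is the identity
    beta = bread @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((2, 2))                     # I1
    for c in np.unique(cluster):
        idx = cluster == c
        score = X[idx].T @ resid[idx]           # X_i' r_i for school i
        meat += np.outer(score, score)          # X_i' r_i r_i' X_i
    cov = bread @ meat @ bread
    return beta[1], cov[1, 1]

# Simulated two-level data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n_schools, p = 40, 0.5
sizes = rng.integers(15, 30, size=n_schools)
treat_school = rng.permutation(np.repeat([1, 0], n_schools // 2))
cluster = np.repeat(np.arange(n_schools), sizes)
t = treat_school[cluster]
school_effect = rng.normal(0, 0.5, n_schools)[cluster]
y = 0.3 * t + school_effect + rng.normal(0, 1, cluster.size)

beta1, var1 = cluster_sandwich_ate(y, t, cluster, p)
diff_in_means = y[t == 1].mean() - y[t == 0].mean()
print(beta1, diff_in_means, np.sqrt(var1))
```

The slope coincides with the difference in student-level means, and the variance depends on the data only through the school-level sums of residuals.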

If schools are to be weighted equally under unbalanced designs, the GEE method can be applied by first pre-multiplying the outcome and explanatory variables (including the intercept) by the weights √w_ij where w_ij∝1/m_i (Pfeffermann et al. 1998). Under this approach, it may be reasonable to also weight each school district equally if random assignment is conducted within school districts.
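The effect of the √w_ij pre-multiplication can be checked numerically. The sketch below uses simulated data and hypothetical variable names. It scales the outcome and the regressors (including the intercept) by √(1/m_i) and confirms that the resulting slope equals the difference between the simple averages of school means in the two groups.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools, p = 30, 0.5
sizes = rng.integers(5, 60, size=n_schools)          # unbalanced school sizes m_i
treat_school = rng.permutation(np.repeat([1, 0], n_schools // 2))
cluster = np.repeat(np.arange(n_schools), sizes)
t = treat_school[cluster].astype(float)
y = (0.2 * t + rng.normal(0, 0.6, n_schools)[cluster]
     + rng.normal(0, 1, cluster.size))

# Weights proportional to 1/m_i give every school the same total weight.
w = 1.0 / sizes[cluster]
sw = np.sqrt(w)
X = np.column_stack([np.ones_like(y), t - p])
Xw = X * sw[:, None]                                 # scale intercept and T - p
yw = y * sw                                          # scale the outcome
beta_w, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# Benchmark: difference in the unweighted averages of school means.
school_means = np.bincount(cluster, weights=y) / sizes
diff_school_means = (school_means[treat_school == 1].mean()
                     - school_means[treat_school == 0].mean())
print(beta_w[1], diff_school_means)
```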

Importantly, as discussed in Murray (1998), the GEE method should be used only if the number of clusters in each research condition is at least 20. For smaller sample sizes, simulations demonstrate that the Type I error rate may not be close to the nominal level.

Finally, for the equal-school weighting scheme, model-free permutation (randomization) tests can also be used to test the strong null hypothesis that all student-level treatment effects are zero (Gail et al., 1996). Under this approach, observed school means are used to construct the distribution of all possible treatment effects under the null hypothesis of no impacts. This is done by (1) allocating schools to all possible combinations of np "pseudo-treatment" schools and n(1-p) "pseudo-control" schools, (2) estimating a treatment effect β̂₁ for each of the n!/[(np)!(n(1-p))!] allocations, (3) sorting these treatment effects from smallest to largest, (4) observing where in the distribution the treatment effect for the actual treatment-control allocation lies, and (5) rejecting the null hypothesis if the actual β̂₁ lies below the α/2 quantile or above the 1-(α/2) quantile of the permutation distribution (which will have mean 0).³ The validity of this method does not rely on a model, but only on correct randomization.
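The five steps above can be sketched in Python for a small design in which all allocations can be enumerated. The function name `permutation_test` and the ten school means are hypothetical, chosen only so that the example is easy to follow. For larger n, the enumeration would be replaced by a random sample of reallocations.

```python
import numpy as np
from itertools import combinations

def permutation_test(school_means, treated, alpha=0.05):
    """Exact two-sided permutation test based on school means.

    Returns (observed effect, sorted permutation effects, reject flag).
    """
    school_means = np.asarray(school_means, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    n, n_t = school_means.size, int(treated.sum())
    total = school_means.sum()

    def effect(sum_t):
        # Pseudo-treatment mean minus pseudo-control mean.
        return sum_t / n_t - (total - sum_t) / (n - n_t)

    observed = effect(school_means[treated].sum())
    # Steps (1)-(3): enumerate every allocation and sort the effects.
    effects = np.sort([effect(school_means[list(idx)].sum())
                       for idx in combinations(range(n), n_t)])
    # Steps (4)-(5): compare the observed effect with the tail quantiles.
    lo, hi = np.quantile(effects, [alpha / 2, 1 - alpha / 2])
    reject = bool(observed < lo or observed > hi)
    return observed, effects, reject

# Ten hypothetical school means: five control schools, five treatment schools.
means = [0.0, 0.1, -0.1, 0.2, -0.2, 1.0, 1.1, 0.9, 1.2, 0.8]
treated = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed, effects, reject = permutation_test(means, treated)
print(observed, effects.size, reject)
```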

Gail et al. (1996) demonstrate through simulations that Type I error rates of these tests are near nominal levels if n is moderate, p is near 0.5, and variances of the outcomes do not differ substantially across the treatment and control conditions. These conditions are likely to hold in practice. Furthermore, Gail et al. (1996) demonstrate that the procedure performs better using school-level residuals from regression models that include baseline covariates (see below).

Finite-Population Model with Covariates

We now examine ATE estimators when the FP models include fixed covariates, q_ij, pertaining to the pre-randomization period. The covariates are not indexed by T or C because their values are independent of treatment status due to randomization. The covariates could include both school-level covariates and student-level covariates that are centered at school-level means. All covariates are assumed to be centered at grand means.

In the Neyman model, the covariates are irrelevant variables because (3) is the true model. Thus, the ATE parameters considered above without covariates pertain also to the models with covariates.

To examine asymptotic moments of the OLS estimator under the FP model with fixed covariates, we assume in addition to (5) that as n approaches infinity:

[Equation (11): display not preserved in this text version.]

where f_ij is the student’s predicted value from a full-sample OLS regression of α_ij on q_ij; h_ij is the predicted value from a full-sample OLS regression of τ_ij on q_ij, and S_αf², S_ff², and S_hf² are fixed, nonnegative real numbers. The following lemma generalizes results in Schochet (2009) and Freedman (2008) to two-stage clustered designs. The proof is provided in the appendix.

Lemma 2. Let β̂_1,MR be the multiple regression estimator for β₁ under the model in (3) and assume (5) and (11). Then, β̂_1,MR is asymptotically normal with mean β₁ and variance:

[Equation (12): display not preserved in this text version.]

The first bracketed term in (12) is the variance of the OLS estimator under the FP model without covariates. The (2S_αf² -S_ff²) term is a generalized version of the usual explained sum of squares from a multiple OLS regression, and will typically generate precision gains if the covariates are correlated with potential outcomes. The 2(1-2p)S_hf² term pertains to regression-adjusted covariances between α_ij and τ_ij for students within the same school. This term will be zero if p=0.5 or if the covariances between potential outcomes are similar in the treatment and control conditions (which would occur, for example, with constant treatment effects); otherwise this term could have any sign.

A variance estimator for (12) can be obtained using the GEE approach discussed above. Let X_i = (K T̃_i Q_i), where K is an m_ix1 column of 1s for the intercept, T̃_i is an m_ix1 vector containing the T_i -p terms, and Q_i is a matrix of covariates for school i. In this case, a variance estimator is:

[ (Σ_{i=1}^n X_i′X_i)^-1 (Σ_{i=1}^n X_i′ r_i r_i′ X_i) (Σ_{i=1}^n X_i′X_i)^-1 ]_(2,2)     (13)

where the residuals r_i are calculated from a full-sample OLS regression of y_i on X_i. The permutation tests discussed above could also be used for significance testing using the school-level residuals r_i (for the equal-school weighting scheme).
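A minimal Python sketch of the covariate-adjusted estimator follows. It assumes the sandwich form implied by an independent working correlation and identity link, with no small-sample correction. The function name `adjusted_ate`, the requirement that cluster labels run from 0 to n-1, and the simulated covariates are all assumptions of this illustration. Covariates are centered at their grand means before being placed in X_i.

```python
import numpy as np

def adjusted_ate(y, t, Q, cluster, p):
    """OLS of y on (1, T - p, Q) with a cluster-robust variance for beta1.

    cluster must hold integer labels 0..n-1 (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    Qc = Q - Q.mean(axis=0)                       # center covariates at grand means
    X = np.column_stack([np.ones_like(y), np.asarray(t, dtype=float) - p, Qc])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ (X.T @ y)
    r = y - X @ beta                              # full-sample OLS residuals
    n_clusters = int(cluster.max()) + 1
    scores = np.zeros((n_clusters, X.shape[1]))
    np.add.at(scores, cluster, X * r[:, None])    # row i accumulates X_i' r_i
    cov = bread @ (scores.T @ scores) @ bread
    return beta[1], cov[1, 1]                     # second coefficient is beta1

# Simulated data with one school-level and one student-level covariate.
rng = np.random.default_rng(3)
n_schools, p = 40, 0.5
sizes = rng.integers(15, 30, size=n_schools)
treat_school = rng.permutation(np.repeat([1, 0], n_schools // 2))
cluster = np.repeat(np.arange(n_schools), sizes)
t = treat_school[cluster]
q_school = rng.normal(size=n_schools)[cluster]
q_student = rng.normal(size=cluster.size)
Q = np.column_stack([q_school, q_student])
y = (0.3 * t + 0.8 * q_school + 0.5 * q_student
     + rng.normal(0, 0.3, n_schools)[cluster] + rng.normal(0, 1, cluster.size))

beta1_adj, var1_adj = adjusted_ate(y, t, Q, cluster, p)
print(beta1_adj, np.sqrt(var1_adj))
```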

³ For moderate n (say, n>30), the number of possible allocations becomes very large. In these cases, the permutation distribution can be estimated from a large random sample of reallocations of school means to the pseudo-treatment and control groups.