Technical Methods Report: The Estimation of Average Treatment Effects for Clustered RCTs of Education Interventions - Chapter 4: ATE Parameter Estimation for the Super-Population Model

NCEE 2009-0061
August 2009

Chapter 4: ATE Parameter Estimation for the Super-Population Model

This chapter examines ATE parameter estimation for the SP model with and without baseline covariates, where it is assumed that error variances are the same in the treatment and control conditions: σ_uT² =σ_uC² =σ_u² and σ_eT² =σ_eC² =σ_e². This assumption is commonly applied and greatly simplifies the presentation.

This chapter focuses on generalized least squares (GLS) methods that are typically used to provide consistent and efficient estimators for α₁ in (4). However, the chapter starts with a discussion of the OLS approach (which produces consistent, but inefficient estimates) so the SP and FP estimators can be compared using a common approach. Methods for estimating variance components to obtain feasible GLS estimates are discussed in Chapter 5.

Super-Population Model Without Covariates

The SP model in (4) for students in school i can be expressed in vector notation as follows:

(14) y_i =α₀ +α₁T_i +δ_i,

where Ω^*_i = E(δ_iδ_i^') is an m_i × m_i positive definite variance-covariance matrix with diagonal terms σ_u² +σ_e² and off-diagonal terms σ_u². The estimation of this model using OLS and GLS methods is discussed next.
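
As an illustrative sketch (not part of the original report), the structure of Ω^*_i can be built directly for a hypothetical school. The function names and the variance values are assumptions chosen for illustration only. The check confirms that summing all elements of the matrix gives m_i²σ_u² + m_iσ_e², the school-level quantity that appears when the elements of Ω^*_i are summed across schools.

```python
def compound_symmetric_cov(m, sigma_u2, sigma_e2):
    """Return the m x m matrix with sigma_u2 + sigma_e2 on the diagonal
    and sigma_u2 in every off-diagonal position."""
    return [
        [sigma_u2 + (sigma_e2 if r == c else 0.0) for c in range(m)]
        for r in range(m)
    ]


def sum_elements(matrix):
    """Sum every element of a matrix stored as a list of rows."""
    return sum(sum(row) for row in matrix)


# Hypothetical school with 3 sampled students (values are illustrative).
m, sigma_u2, sigma_e2 = 3, 0.2, 0.8
omega = compound_symmetric_cov(m, sigma_u2, sigma_e2)

# The element sum should match m^2 * sigma_u2 + m * sigma_e2.
closed_form = m**2 * sigma_u2 + m * sigma_e2
assert abs(sum_elements(omega) - closed_form) < 1e-12
```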

OLS Methods
Standard methods (see, for example, Schochet 2008) can be used to show that as n increases to infinity, the OLS estimator α̂_1,SR = (ȳ_T − ȳ_C) is asymptotically normal with mean α₁ and asymptotic variance that can be estimated as follows:

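(15) $$\widehat{\mathrm{AsyVar}}(\hat{\alpha}_{1,SR}) = \frac{1}{p(1-p)} \left( \frac{\sum_{i=1}^{n} \left( m_i^2 \hat{\sigma}_u^2 + m_i \hat{\sigma}_e^2 \right)}{M^2} \right),$$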

where σ̂_u² and σ̂_e² are estimators for σ_u² and σ_e², respectively. Note that this variance is minimized if p = 0.5 and m_i= m for all schools (that is, for balanced designs).

The term in parentheses in (15) can be computed by summing the elements of Ω̂^*_i across schools and dividing by M², where Ω̂^*_i is an estimator for Ω^*_i. In this sense, (15) is comparable to the S_T² and S_C² terms in (6) for the FP model. An important difference between the two models is that the FP variance, unlike the SP variance, contains the S_τ² term, which reduces variance. In theory, then, the variance may be somewhat smaller under the FP model. This is expected, because the SP model assumes external validity, with an associated loss in statistical precision. However, as noted, it is difficult to estimate S_τ² for clustered designs; thus, precision gains for the FP model cannot typically be realized in practice.
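
The computation described above can be sketched in a few lines. This is a minimal illustration, not the report's own code. It assumes that p denotes the proportion of schools assigned to treatment, that M is the total number of sampled students, and that the summed term is scaled by 1/(p(1−p)), which is consistent with the balanced-design variance. The function name and the input values are hypothetical.

```python
def ols_variance(school_sizes, sigma_u2, sigma_e2, p):
    """Sketch of the OLS variance estimator for the difference in means.

    school_sizes: list of m_i (sampled students per school)
    sigma_u2, sigma_e2: estimated between- and within-school variances
    p: assumed proportion of schools assigned to treatment
    """
    total_students = sum(school_sizes)  # M
    # Sum of all elements of each school's covariance matrix:
    # m_i^2 * sigma_u2 + m_i * sigma_e2.
    summed = sum(m * m * sigma_u2 + m * sigma_e2 for m in school_sizes)
    return (summed / total_students**2) / (p * (1.0 - p))


# Two hypothetical designs with the same number of schools and students.
balanced = ols_variance([20] * 10, 0.1, 0.9, p=0.5)
unbalanced = ols_variance([10] * 5 + [30] * 5, 0.1, 0.9, p=0.5)

# Unequal school sizes inflate the variance relative to the balanced design.
assert unbalanced > balanced
```

With equal school sizes the function collapses to (σ̂_u² + σ̂_e²/m)/(np(1−p)), which is why balanced designs minimize this variance for a given sample.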

GLS Methods
Consider a generic regression model where the covariate and variance matrices for school i are denoted by X_i and Ω_i, respectively. The feasible GLS estimator of the parameter vector α is then:

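(16) $$\hat{\alpha}_{GLS} = \left( \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} y_i \right),$$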

where Ω̂_i is an estimator for Ω_i.

In our case X_i = [K T_i], so (16) reduces to

$$\hat{\alpha}_{1,GLS} = \frac{\sum_{i:T_i=1} w_i \bar{y}_i}{\sum_{i:T_i=1} w_i} - \frac{\sum_{i:T_i=0} w_i \bar{y}_i}{\sum_{i:T_i=0} w_i},$$

where ȳ_i is the mean outcome in school i and w_i= [σ̂_u² +(σ̂_e²/m_i)]^-1 is the associated school-level weight. This is a weighted difference-in-means estimator, where the weights are inverses of the variances of school-level means.

The weights can also be expressed, up to a constant factor that cancels in the estimator, as w_i = [ICC +{(1- ICC )/m_i}]^-1, where ICC =σ̂_u²/(σ̂_u² +σ̂_e²) is the estimated intraclass correlation coefficient. The first ICC term inside the brackets is common to all schools. Thus, the weights differ due to the second term. Schools with smaller variances (more sampled students) receive more weight in the analysis than schools with larger variances (fewer sampled students), because the larger schools provide more information on the super-population parameters μ^T and μ^C. As ICC approaches zero, the SP weights converge to the FP weights where schools are weighted by their sample sizes. Conversely, as ICC approaches one, the SP weights converge to the FP weights where schools are weighted equally. Under the SP approach, it may be reasonable to weight each school district by the size of their school population if random assignment is conducted within school districts.
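
The weighting scheme can be illustrated with a short sketch. The function names, the tuple layout for schools, and all data values are hypothetical and chosen only for illustration. The sketch checks that the variance-based and ICC-based weights give the same weighted difference in means, and that the ICC-based weights approach m_i as ICC goes to zero and one as ICC goes to one.

```python
def gls_weight(m, sigma_u2, sigma_e2):
    """Inverse of the variance of a school mean based on m students."""
    return 1.0 / (sigma_u2 + sigma_e2 / m)


def icc_weight(m, icc):
    """The same weight rescaled by the total variance, written with the ICC."""
    return 1.0 / (icc + (1.0 - icc) / m)


def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def weighted_diff_in_means(schools, weight_fn):
    """schools: list of (treated, m_i, school_mean) tuples (hypothetical layout)."""
    groups = {True: ([], []), False: ([], [])}
    for treated, m, ybar in schools:
        means, weights = groups[bool(treated)]
        means.append(ybar)
        weights.append(weight_fn(m))
    return weighted_mean(*groups[True]) - weighted_mean(*groups[False])


# Hypothetical data: two treatment schools and two control schools.
schools = [(1, 10, 0.6), (1, 40, 0.4), (0, 20, 0.1), (0, 30, 0.3)]
sigma_u2, sigma_e2 = 0.5, 1.5
icc = sigma_u2 / (sigma_u2 + sigma_e2)

by_variance = weighted_diff_in_means(
    schools, lambda m: gls_weight(m, sigma_u2, sigma_e2)
)
by_icc = weighted_diff_in_means(schools, lambda m: icc_weight(m, icc))

# The two weight forms differ only by a constant factor, so estimates agree.
assert abs(by_variance - by_icc) < 1e-12
```

Passing `lambda m: icc_weight(m, 0.0)` reproduces weighting by school sample size, and `lambda m: icc_weight(m, 1.0)` reproduces equal weighting of schools.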

It is well known that under weak regularity conditions, the feasible GLS estimator is asymptotically normal with mean α and variance [E(X′_iΩ_i^-1X_i)]^-1/n (see, for example, Wooldridge 2002). This variance can be estimated as follows:

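(17) $$\widehat{\mathrm{AsyVar}}(\hat{\alpha}_{GLS}) = \left( \sum_{i=1}^{n} X_i' \hat{\Omega}_i^{-1} X_i \right)^{-1},$$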

which in our case reduces to

(18) $$\widehat{\mathrm{AsyVar}}(\hat{\alpha}_{1,GLS}) = \frac{1}{\sum_{i:T_i=1} w_i} + \frac{1}{\sum_{i:T_i=0} w_i}.$$
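
The reduction from the general matrix expressions to school-level weights can be sketched as follows. This derivation is an illustration added here, not taken from the report. It assumes that K is an m_i-vector of ones, that T_i is constant within school i, and that Ω_i has the equal-correlation structure described for Ω^*_i. The symbols W_T, W_C, S_T, and S_C are shorthand introduced only for this sketch.

```latex
% Assumptions: K is an m_i-vector of ones, T_i is constant within school i,
% and \Omega_i = \sigma_e^2 I + \sigma_u^2 K K'.
\begin{align*}
\Omega_i K &= (\sigma_e^2 + m_i \sigma_u^2) K
  \;\Rightarrow\; K' \Omega_i^{-1} K = \frac{m_i}{\sigma_e^2 + m_i \sigma_u^2} = w_i,
  \qquad K' \Omega_i^{-1} y_i = w_i \bar{y}_i, \\
\sum_i X_i' \Omega_i^{-1} X_i &=
  \begin{pmatrix} W_T + W_C & W_T \\ W_T & W_T \end{pmatrix},
  \qquad W_T = \sum_{i:T_i=1} w_i, \quad W_C = \sum_{i:T_i=0} w_i, \\
\left( \sum_i X_i' \Omega_i^{-1} X_i \right)^{-1} &=
  \frac{1}{W_T W_C}
  \begin{pmatrix} W_T & -W_T \\ -W_T & W_T + W_C \end{pmatrix}, \\
\hat{\alpha}_1 &= \frac{S_T}{W_T} - \frac{S_C}{W_C},
  \qquad S_T = \sum_{i:T_i=1} w_i \bar{y}_i, \quad S_C = \sum_{i:T_i=0} w_i \bar{y}_i, \\
\mathrm{Var}(\hat{\alpha}_1) &= \frac{W_T + W_C}{W_T W_C} = \frac{1}{W_T} + \frac{1}{W_C}.
\end{align*}
```

The last line is the lower-right element of the inverse matrix, so the variance of the treatment effect estimator depends on the data only through the total weight in each research group.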

For known Ω_i, the GLS estimator is the best linear unbiased estimator (BLUE), although this may not hold if Ω_i is replaced by Ω̂_i. The ANOVA, ML, REML, and GEE approaches discussed in Chapter 5 yield feasible GLS estimators where estimators for σ_u² and σ_e² are inserted into (16) and (17).

For a given sample size, the variance in (18) is minimized when m_i =m and p =0.5. Furthermore, if m_i =m, the OLS and GLS estimators of α₁ are identical and yield the following simple variance estimator:

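(19) $$\widehat{\mathrm{AsyVar}}(\hat{\alpha}_{1}) = \frac{1}{np(1-p)} \left( \hat{\sigma}_u^2 + \frac{\hat{\sigma}_e^2}{m} \right).$$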

Note that replacing m with the average school sample size m̄ in (19) yields a serviceable variance estimator for designs where sample sizes vary somewhat across schools, which can be seen by setting m_i = m̄ in (18).
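
A small numerical sketch can illustrate how close the average-size approximation is when school sizes vary only modestly. It assumes that (18) is the sum of the inverse total weights in the treatment and control groups and that (19) has the balanced form (σ̂_u² + σ̂_e²/m)/(np(1−p)). The function names, school sizes, and variance values are hypothetical.

```python
def gls_variance(treat_sizes, control_sizes, sigma_u2, sigma_e2):
    """Variance of the weighted difference in means with unequal school sizes."""
    def total_weight(sizes):
        return sum(1.0 / (sigma_u2 + sigma_e2 / m) for m in sizes)

    return 1.0 / total_weight(treat_sizes) + 1.0 / total_weight(control_sizes)


def balanced_variance(n, p, m, sigma_u2, sigma_e2):
    """Balanced-design variance, evaluated at a common (or average) size m."""
    return (sigma_u2 + sigma_e2 / m) / (n * p * (1.0 - p))


# Hypothetical design: 10 schools, half treated, sizes varying around 20.
treat_sizes = [18, 20, 22, 20, 20]
control_sizes = [19, 21, 20, 20, 20]
all_sizes = treat_sizes + control_sizes
n = len(all_sizes)
m_bar = sum(all_sizes) / n

exact = gls_variance(treat_sizes, control_sizes, 0.1, 0.9)
approx = balanced_variance(n, 0.5, m_bar, 0.1, 0.9)

# With modest variation in school sizes the two values are very close.
assert abs(exact - approx) < 1e-3
```

When every school has the same size, the two functions return the same value, which mirrors the equivalence of the OLS and GLS variance estimators in balanced designs.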

Super-Population Model With Covariates

Under the SP model with covariates, the covariates q_ij as well as the potential outcomes are considered to be random draws from joint super-population distributions. For the estimation model, the covariate matrix is now X_i =[K T_i Q_i] and Ω_i is now conditional on Q_i. In principle, the covariates should be considered irrelevant variables because (14) is the true model. This procedure, however, considerably complicates the asymptotics for the GLS estimator, because Q_i will tend to be correlated with the error term, and Ω_i will differ from the true Ω^*_i.⁴

Consequently, the following analysis strays somewhat from the Neyman framework and assumes that the true model contains Q_i. In this case, the GLS formulas in (16) and (17) also apply to the SP model with covariates.

⁴ For the OLS estimator, the first problem can be overcome (as it was for the FP model) and the second problem does not occur. The asymptotic variance of the OLS estimator is similar in form to that for the FP model in (12) but does not include terms comparable to S_τ² (not shown).