In this chapter, we use the models in (3) and (4) to discuss ITT estimators in nominal units, because they form the foundation for the CACE and standardized estimators. We focus on the commonly used differences-in-means and analysis of covariance (ANCOVA) estimators, which are used in the empirical analysis.
We make the simplifying assumption that $m_i = m$ for all units (that is, equal cluster sizes). Cluster sizes are often similar for RCTs in the education area (and for the RCTs examined in our empirical work), and variance formulas are much more complex with unequal cluster sizes. Furthermore, the formulas presented in this chapter apply approximately when unit sizes do not vary substantially across units, provided that $m$ is replaced in the formulas by the average unit size $\bar{m}$ (Kish 1965) or, preferably, by the harmonic mean $n / \sum_i (1/m_i)$ (Hedges 2007).
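The two replacement values for $m$ mentioned above can be compared with a short sketch. The cluster sizes below are hypothetical and the function names are illustrative only.

```python
from statistics import mean

def average_cluster_size(sizes):
    """Arithmetic mean of the cluster sizes m_i (the Kish-style replacement)."""
    return mean(sizes)

def harmonic_cluster_size(sizes):
    """Harmonic-mean cluster size n / sum(1 / m_i) (the Hedges-style replacement)."""
    n = len(sizes)
    return n / sum(1.0 / m for m in sizes)

# Hypothetical cluster sizes for three units.
sizes = [20, 25, 30]
m_bar = average_cluster_size(sizes)
m_harmonic = harmonic_cluster_size(sizes)
print(m_bar, round(m_harmonic, 4))
```

The harmonic mean never exceeds the arithmetic mean, so using it gives a slightly smaller effective cluster size when the $m_i$ differ.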
The Simple Differences-In-Means Estimator
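The simple differences-in-means ITT estimator $\hat{\alpha}_{ITT1}$ can be obtained by applying standard regression methods to (3). The resulting estimator is as follows:

(5) $\hat{\alpha}_{ITT1} = \bar{y}_T - \bar{y}_C$,

where $\bar{y}_T$ and $\bar{y}_C$ are the averages of the cluster mean outcomes in the treatment and control groups, respectively. This estimator is the average difference between cluster means across the treatment and control groups.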
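As a minimal sketch of this estimator, the following computes each cluster mean and then takes the difference between the average treatment-group and control-group cluster means. The data and function names are hypothetical.

```python
from statistics import mean

def cluster_means(outcomes_by_cluster):
    """Mean outcome within each cluster."""
    return [mean(ys) for ys in outcomes_by_cluster]

def itt_difference_in_means(outcomes_by_cluster, treated):
    """Average treatment-cluster mean minus average control-cluster mean."""
    means = cluster_means(outcomes_by_cluster)
    treat = [yb for yb, t in zip(means, treated) if t == 1]
    ctrl = [yb for yb, t in zip(means, treated) if t == 0]
    return mean(treat) - mean(ctrl)

# Hypothetical data: four clusters of two students each.
clusters = [[10, 12], [14, 16], [8, 10], [9, 13]]
treated = [1, 1, 0, 0]  # cluster-level treatment indicators T_i
estimate = itt_difference_in_means(clusters, treated)
print(estimate)
```

With equal cluster sizes, this coincides with the difference between the student-level treatment and control means.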
Schochet (2008) shows that $\hat{\alpha}_{ITT1}$ is asymptotically normally distributed with mean $\alpha_{ITT}$ and the following asymptotic variance:
The within-unit (second) variance term in (6) is the conventional variance expression for an impact estimator in a nonclustered design where random assignment is conducted within units. Design effects in a clustered design arise because of the first between-unit variance term, which represents the extent to which mean outcomes vary across units (Murray 1998; Donner and Klar 2000).
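To illustrate how the between-unit term can dominate, the sketch below evaluates a two-term variance of the kind described. It assumes the common two-level form $\sigma_B^2 / [np(1-p)] + \sigma_W^2 / [nmp(1-p)]$, where $p$ is the proportion of units assigned to treatment. Both this form and the symbol $p$ are assumptions made for illustration, not taken from the text above, and the numerical inputs are hypothetical.

```python
def variance_terms(sigma2_b, sigma2_w, n, m, p):
    """Between- and within-unit variance terms under the ASSUMED two-level form.

    sigma2_b, sigma2_w: between- and within-unit variances
    n: number of units, m: (equal) unit size
    p: proportion of units assigned to treatment (assumed symbol)
    """
    scale = n * p * (1 - p)
    between = sigma2_b / scale
    within = sigma2_w / (scale * m)
    return between, within

# Hypothetical inputs: 40 units of 25 students, half treated,
# with 15 percent of the total variance lying between units.
between, within = variance_terms(0.15, 0.85, n=40, m=25, p=0.5)
total = between + within
print(between, within, total)
```

In this hypothetical case the between-unit term is more than four times the within-unit term, even though only 15 percent of the outcome variance lies between units.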
An asymptotically unbiased estimator for the within-unit variance $\sigma_W^2$ is as follows (Cochran 1963; Hedges 2007):
Similarly, an asymptotically unbiased estimator for the between-unit variance $\sigma_B^2$ is:
Note that equation (9) can also be expressed in terms of regression residual sums of squares:
where $\hat{y}_i$ is the predicted value for unit $i$ from the between-unit regression of $y_i$ on $T_i$ and an intercept. Inserting (7) and (8) into (6) yields the following variance estimator for $\hat{\alpha}_{ITT1}$:
This estimator also applies to nonclustered designs where units are defined as students.
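The between-unit regression referred to above can be sketched as follows. The code regresses unit-level outcomes on an intercept and $T_i$ and divides the residual sum of squares by $(n - 2)$. How this residual mean square enters the between-unit variance estimator (for example, any within-unit correction) is not shown here, so the sketch covers the regression step only. The data and function name are hypothetical.

```python
import numpy as np

def between_unit_residual_ms(unit_outcomes, treated):
    """Residual mean square from regressing unit-level outcomes on [1, T_i]."""
    y = np.asarray(unit_outcomes, dtype=float)
    T = np.asarray(treated, dtype=float)
    X = np.column_stack([np.ones_like(T), T])      # intercept and treatment indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    resid = y - X @ beta                           # y_i minus predicted value
    return float(resid @ resid) / (len(y) - 2), beta

# Hypothetical unit-level outcomes for two treatment and two control units.
resid_ms, beta = between_unit_residual_ms([11, 15, 9, 11], [1, 1, 0, 0])
print(resid_ms, beta)
```

Because the only regressors are an intercept and a binary indicator, the predicted values are simply the treatment and control group means, and the coefficient on $T_i$ reproduces the differences-in-means estimate.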
The Analysis of Covariance (ANCOVA) Estimator
The ANCOVA estimator $\hat{\alpha}_{ITT2}$ can be obtained by applying regression methods to (4), where baseline covariates (such as pretests) are included in the analytic models, primarily to improve the precision of the impact estimates. Schochet (2008) shows that $\hat{\alpha}_{ITT2}$ is asymptotically normally distributed with mean $\alpha_{ITT}$ and the following asymptotic variance:
In this expression, $\sigma_{B1}^2$ and $\sigma_{W1}^2$ are between- and within-unit variances, respectively, that are conditional on the covariates. They are smaller than $\sigma_B^2$ and $\sigma_W^2$ by amounts that depend on the size of the outcome-covariate correlations in the joint superpopulation distributions (these are $R^2$ adjustments).
Using methods parallel to those for the simple differences-in-means estimator presented above, a consistent variance estimator for $\hat{\alpha}_{ITT2}$ in (12) is as follows:
where $S_{B1}^2$ is obtained using (10) with the following changes: (1) $\hat{y}_i$ is now the predicted value for unit $i$ from the between-unit regression of $y_i$ on $Q_i = [1 \; T_i \; Z_i]$; and (2) $(n - 2)$ is replaced by $(n - k)$, where $k$ is the rank of the matrix $Q$ whose rows are the $Q_i$. In practice, $T_i$ and $Z_i$ may be weakly correlated due to random sampling and missing data. Thus, (13) can be refined as follows:
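The two changes listed above can be sketched as a between-unit regression on $Q_i = [1 \; T_i \; Z_i]$ with the residual sum of squares divided by $(n - k)$. The sketch covers only this regression step and does not attempt the refinement for correlation between $T_i$ and $Z_i$. The data, the single covariate, and the function name are hypothetical.

```python
import numpy as np

def ancova_between_unit_fit(unit_outcomes, treated, covariates):
    """Between-unit regression of y_i on Q_i = [1, T_i, Z_i]."""
    y = np.asarray(unit_outcomes, dtype=float)
    n = len(y)
    T = np.asarray(treated, dtype=float)
    Z = np.asarray(covariates, dtype=float).reshape(n, -1)  # one column per covariate
    Q = np.column_stack([np.ones(n), T, Z])
    k = int(np.linalg.matrix_rank(Q))                       # rank of Q
    coef, *_ = np.linalg.lstsq(Q, y, rcond=None)
    resid = y - Q @ coef
    return {
        "impact": float(coef[1]),                 # coefficient on T_i
        "k": k,
        "resid_ms": float(resid @ resid) / (n - k),
    }

# Hypothetical unit-level data generated exactly as y = 2 + 3*T + 0.5*Z.
T = [1, 0, 1, 0, 1, 0]
Z = [1, 2, 3, 4, 5, 6]
y = [5.5, 3.0, 6.5, 4.0, 7.5, 5.0]
fit = ancova_between_unit_fit(y, T, Z)
print(fit)
```

Using the rank of $Q$ rather than its column count keeps the degrees of freedom correct if some covariates are collinear.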
Finally, in our empirical work, we also used Stata to estimate more efficient generalized least squares models that allowed for unequal cluster sample sizes. Specifically, we used generalized estimating equation (GEE) methods with the sandwich variance estimator (Liang and Zeger 1986), as well as full and restricted maximum likelihood approaches to general linear mixed models (Littell et al. 1996; Bryk and Raudenbush 1992). The empirical results using these methods are very similar to those presented in this report and thus are not reported.