Project Activities
People and institutions involved
IES program contact(s)
Products and publications
Journal article, monograph, or newsletter
Stapleton, L. M. (2012). Evaluation of conditional weight approximations for two-level models. Communications in Statistics: Simulation and Computation, 41, 182-204.
Stapleton, L. M., & Kang, Y. (2018). Design effects of multilevel estimates from national probability samples. Sociological Methods & Research, 47(3), 430-457.
Additional project information
Previous award details:
Supplemental information
Traditional estimation of multilevel models assumes that schools are selected at random and that students are selected at random within schools. Typical national survey sampling designs violate these assumptions, so parameter estimates and their sampling variances may be biased under traditional estimation. Most national education-related datasets use sampling procedures that are considerably more complex. With a three-stage sample, geographic areas are selected first as primary sampling units (PSUs), schools within those PSUs are then selected as secondary sampling units (SSUs), and finally teachers or students within those SSUs are selected as the ultimate sampling units (USUs). With a two-stage sample, schools are typically selected directly as PSUs. Additionally, at each stage of selection, the population elements are stratified before the sample is drawn. This stratum information may or may not be included in a researcher's statistical model.
Appropriate methods for modeling data from multi-stage stratified sampling designs have been proposed (e.g., multilevel pseudo-maximum likelihood [MPML]), but they have not been tested under conditions similar to those found with national education-related datasets. These methods require sampling weights at both the student and school levels, and these level-1 and level-2 weights are often not included in public-release datasets.
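The stratified two-stage design and the level-1 and level-2 weights described above can be illustrated with a small simulation. The following is a minimal sketch rather than the project's actual data-generating procedure: the function name `draw_two_stage_sample`, the record fields, and the choice of simple random sampling within each stratum and within each school are all assumptions made for illustration.

```python
import random


def draw_two_stage_sample(strata, students_per_school, seed=0):
    """Draw a stratified two-stage sample and attach the known weights.

    `strata` maps a stratum label to (school_sizes, n_schools_to_sample).
    All names and the sampling scheme are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    records = []
    for label, (school_sizes, n_sampled) in strata.items():
        # Level-2 weight: inverse of the school selection probability n_h / N_h.
        school_weight = len(school_sizes) / n_sampled
        for school in rng.sample(range(len(school_sizes)), n_sampled):
            size = school_sizes[school]
            m = min(students_per_school, size)
            # Level-1 conditional weight: inverse of m / N_j within the school.
            student_weight = size / m
            for student in rng.sample(range(size), m):
                records.append({
                    "stratum": label,
                    "school": (label, school),
                    "student": student,
                    "w_school": school_weight,
                    "w_student_given_school": student_weight,
                    # Overall (unconditional) weight: product of the two stages.
                    "w_overall": school_weight * student_weight,
                })
    return records
```

In this sketch, a public-release file would correspond to keeping only `w_overall` and dropping the two stage-specific weights.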
Second, the project determined the best method of approximating level-1 and level-2 sampling weights from the overall (unconditional) sampling weights available on public-release datasets. This was accomplished by comparing the approximated values with the known values from simulated data. Using the simulated data introduced in Aim 1, unconditional USU sampling weights were used to approximate conditional weights for the USU and for the SSU (in a three-stage design) or the PSU (in a two-stage design). Bias in these approximations was assessed by correlating the known weights with the approximations.
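The comparison of approximated and known weights can be sketched as follows. The approximation shown here is only one hypothetical possibility and is not necessarily one of the methods the project evaluated: it assumes that students are sampled with equal probability within a school and that the school's enrollment is available on the file. The function names are illustrative.

```python
from math import sqrt


def approximate_conditional_weights(w_overall, school_size, n_sampled_in_school):
    """Split an overall weight into level-1 and level-2 parts.

    Assumption (hypothetical): equal-probability sampling within the school,
    so the level-1 conditional weight is school_size / n_sampled_in_school.
    """
    w_level1 = school_size / n_sampled_in_school
    w_level2 = w_overall / w_level1
    return w_level1, w_level2


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

With simulated data, the known level-2 weights can be passed to `pearson` together with the approximated ones. A correlation near 1 indicates that the approximation tracks the known weights closely, although a correlation alone does not reveal a constant scaling error.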
Third, the project determined the most robust method of sampling variance estimation by comparing the performance of a sandwich estimator with that of replication methods. Bias found with each technique was expected to vary with the data and sampling conditions. Conditions typical of education-related datasets were examined using Monte Carlo simulations. The simulated data (introduced in Aim 1) were analyzed with the MPML method and with three different approaches to sampling variance estimation (linearized, jackknife replication, and bootstrap replication) to determine which method yields the least bias in sampling variances and provides adequate 95 percent confidence interval coverage rates for the parameters of interest.
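To make the replication idea concrete, the sketch below applies a delete-one-cluster jackknife to a weighted mean. This is a simplified stand-in: the project applied replication to MPML estimates of multilevel model parameters, whereas a weighted mean is used here only because it keeps the example short. The function names and the unstratified design are assumptions.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using sampling `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(clusters):
    """Delete-one-cluster jackknife variance of a weighted mean.

    `clusters` is a list of (values, weights) pairs, one pair per PSU.
    Assumes an unstratified design (a simplification for illustration).
    """
    n_clusters = len(clusters)
    if n_clusters < 2:
        raise ValueError("at least two clusters are required")

    def estimate(subset):
        values = [v for vals, _ in subset for v in vals]
        weights = [w for _, wts in subset for w in wts]
        return weighted_mean(values, weights)

    full = estimate(clusters)
    # Re-estimate with each cluster left out in turn.
    replicates = [
        estimate(clusters[:g] + clusters[g + 1:]) for g in range(n_clusters)
    ]
    factor = (n_clusters - 1) / n_clusters
    return factor * sum((r - full) ** 2 for r in replicates)
```

In a Monte Carlo study, the square root of this variance would supply the standard error for a confidence interval in each replication, and the coverage rate would be the share of replications whose interval contains the true parameter value.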
Fourth, the project examined the performance of the scaled change in chi-squared test statistic in model selection, both when the sampling design was taken into account and when it was not. The fit of the models run for Aims 1 and 3 was compared with the fit of six misspecified models: three over-specified and three under-specified.
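The text does not say which scaled difference statistic was used. Assuming it is the Satorra-Bentler scaled chi-squared difference test, which is a common choice with robust estimators, the computation can be sketched as below. The function name and argument names are illustrative.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-squared difference (assumed form).

    t0, c0, d0: scaled chi-squared, scaling correction factor, and degrees
                of freedom of the nested (more restricted) model.
    t1, c1, d1: the same quantities for the comparison (less restricted) model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    df_diff = d0 - d1
    if df_diff <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / df_diff
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # Multiplying each scaled statistic by its correction factor recovers the
    # unscaled statistic, so the numerator is the unscaled difference.
    statistic = (t0 * c0 - t1 * c1) / cd
    return statistic, df_diff
```

The returned statistic is referred to a chi-squared distribution with `df_diff` degrees of freedom. When both scaling correction factors equal 1, it reduces to the ordinary chi-squared difference.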