Search Funded Research Grants and Contracts

IES Grant

Title:	Approaches for Weighting and Estimation of Public-release Education Data using Two-level Covariance Structure Models
Center:	NCER	Year:	2011
Principal Investigator:	Stapleton, Laura	Awardee:	University of Maryland, College Park
Program:	Statistical and Research Methodology in Education
Award Period:	2 years	Award Amount:	$159,620
Type:	Methodological Innovation	Award Number:	R305D110050
Description:	Previous Award Number: R305D110046
Previous Awardee: University of Maryland, Baltimore County

Purpose: This project identified the best methods for estimating parameters and their sampling variances in multilevel analyses of data collected via the complex sampling designs typically used in education research. Traditional estimation of multilevel models assumes that school data are a function of random selection and that student data are obtained via random selection within schools. Typical national survey sampling designs violate these assumptions, so parameter estimates and their sampling variances may be biased under traditional estimation. For example, most national education-related datasets use sampling procedures that are considerably more complicated. With a three-stage sample, geographic areas are first selected as primary sampling units (PSUs), schools within those PSUs are then selected as secondary sampling units (SSUs), and finally teachers or students within those SSUs are selected as the ultimate sampling units (USUs). With a two-stage sample, schools are typically selected directly as PSUs. Additionally, at each stage of selection, the population elements are stratified before the sample is drawn. This stratum information may or may not be included in a researcher's statistical model. Appropriate methods for modeling data from multi-stage stratified sampling designs have been proposed (e.g., multilevel pseudo-maximum likelihood [MPML]), but they have not been tested under conditions similar to those found in national education-related datasets. These methods require sampling weights at both the student level (level 1) and the school level (level 2), and these weights are often missing from public-release datasets.

Project Activities: The project had four specific aims addressing multilevel analysis with complex sample data.
First, the project used a Monte Carlo simulation to quantify how ignoring the sampling design in a multilevel model affects estimates of parameters and sampling variances. Bias in the estimates was examined across a range of typical sampling designs and population characteristics found in education-related datasets. The simulation study determined the levels of bias in parameter and sampling variance estimates when multilevel covariance structure modeling is applied to complex sample data without accounting for the sample design. The first step in developing the simulation study was an extensive review of education-related datasets to define the values used within the simulation conditions, as explained in the research plan.

Second, the project determined the best method for approximating level-1 and level-2 sampling weights from the overall (unconditional) sampling weights available on public-release datasets. This was accomplished by comparing the approximated values with the known values from simulated data. From the simulated data introduced in Aim 1, unconditional USU sampling weights were used to approximate conditional weights for the USU and for the SSU (in a three-stage design) or PSU (in a two-stage design). Bias in these approximations was assessed by correlating the known weights with the approximations.

Third, the project determined the most robust method of sampling variance estimation by comparing the performance of a sandwich estimator with that of replication methods. The bias of each technique was expected to vary with the data and sampling conditions. Conditions typical of education-related datasets were examined using Monte Carlo simulations. The simulated data (introduced in Aim 1) were analyzed with the MPML method and with three different approaches to sampling variance estimation (linearized, jackknife replication, and bootstrap replication) to determine the method that yields the least bias in sampling variances.
Such a method provides adequate 95 percent confidence interval coverage rates for the parameters of interest.

Fourth, the project examined the performance of the scaled change in chi-squared test statistic in model selection, both when the sampling design was taken into account and when it was not. The fit of the models run for Aims 1 and 3 was compared with the fit of six misspecified models: three over-specified and three under-specified.

Products and Publications

Journal article, monograph, or newsletter

Stapleton, L. M. (2012). Evaluation of conditional weight approximations for two-level models. Communications in Statistics: Simulation and Computation, 41, 182–204.

Stapleton, L. M., & Kang, Y. (2018). Design effects of multilevel estimates from national probability samples. Sociological Methods & Research, 47(3), 430–457.
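
Illustration: The weight approximation problem described under the second aim can be sketched in a few lines of Python. The sketch below is not the project's procedure. It assumes a hypothetical stratified two-stage design (the strata sizes, sampling rates, and enrollments are invented for illustration), builds the overall student weight as the product of a school weight and a within-school student weight, and then applies one naive, hypothetical splitting rule to the overall weights alone. The function names are likewise assumptions. The final step correlates the known level-2 weights with the approximations, in the spirit of the comparison the description mentions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_two_stage_sample(seed=1, students_per_school=20):
    """Generate sampled schools with known conditional weights.

    All design values here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # (schools in the stratum, schools sampled from the stratum)
    strata = [(150, 15), (50, 20)]
    schools = []
    for n_pop, n_sampled in strata:
        school_weight = n_pop / n_sampled  # level-2 weight: 1 / P(school)
        for _ in range(n_sampled):
            enrollment = rng.randint(100, 1000)
            # level-1 weight: 1 / P(student | school), equal-probability draw
            student_weight = enrollment / students_per_school
            schools.append({
                "w2_true": school_weight,
                "w1_true": student_weight,
                # overall (unconditional) weight, as on a public-release file
                "w_overall": [school_weight * student_weight] * students_per_school,
            })
    return schools


def approximate_weights(w_overall):
    """Naive, hypothetical split of overall weights into two levels.

    Treats the school's mean overall weight as the level-2 weight and each
    student's ratio to that mean as the level-1 weight.
    """
    w2_approx = sum(w_overall) / len(w_overall)
    w1_approx = [w / w2_approx for w in w_overall]
    return w1_approx, w2_approx


def weight_recovery_correlation(schools):
    """Correlate known level-2 weights with their approximations."""
    w2_true = [s["w2_true"] for s in schools]
    w2_approx = [approximate_weights(s["w_overall"])[1] for s in schools]
    return pearson(w2_true, w2_approx)


if __name__ == "__main__":
    sample = simulate_two_stage_sample()
    print("correlation of true and approximated level-2 weights:",
          round(weight_recovery_correlation(sample), 3))
```

In this toy design the naive level-2 approximation absorbs the enrollment-driven student weight, so its correlation with the true school weight will generally fall below 1. That gap is the kind of discrepancy a comparison of known and approximated weights is meant to expose.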