Information on IES-Funded Research
Grant Closed

Methods for Parameter Inference, Model Comparison and Incomplete Data in Complex Psychometric Models for NAEP Survey Data

NCER
Program: Statistical and Research Methodology in Education
Program topic(s): Core
Award amount: $476,061
Principal investigator: Laura Salganik
Awardee:
American Institutes for Research (AIR)
Year: 2011
Award period: 2 years 2 months (01/01/2012 - 02/28/2014)
Project type:
Methodological Innovation
Award number: R305D110008

Purpose

This project sought to extend the co-principal investigators' prior work on improving the efficiency, both computational and statistical, of analyses of the large-scale, multilevel National Assessment of Educational Progress (NAEP). The project used data from the 2005 NAEP mathematics survey to address two issues: (1) determining the best-fitting model; and (2) handling missing data in the teacher questionnaire.

Project Activities

The project consisted of two parts. First, it developed methods for parameter inference and model comparison in complex psychometric models designed for the NAEP data. Previously, parameter estimates in the multilevel model-based approach to NAEP survey data were obtained through maximum likelihood (ML) estimation using an extension of the 1981 Bock-Aitkin EM algorithm implemented in LatentGold. In that approach, the precision of the ML estimates is expressed through standard errors obtained by inverting the information matrix. As model complexity increases, however, representing the likelihood by a multivariate normal distribution with curvature given by the inverse information matrix becomes progressively more difficult, and confidence intervals based on these standard errors become increasingly unreliable. To address these limitations, the researchers used Bayesian methods based on Markov chain Monte Carlo (MCMC) to obtain the full joint posterior distribution of all the model parameters. These methods are expected to provide more accurate variability estimates and to allow the models to be compared through either integrated or posterior likelihoods.
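To make the contrast concrete, the following toy sketch (not the project's actual models or software) draws posterior samples for a single binomial proportion with a random-walk Metropolis sampler and reads a 95% credible interval directly from the samples, rather than relying on a normal approximation around the ML estimate. The data, sampler settings, and flat prior are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Hypothetical data: 7 successes in 10 trials -- a small sample where
# normal-approximation (information-matrix) intervals are least reliable.
successes, trials = 7, 10

def log_likelihood(p):
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

# Random-walk Metropolis sampler for the posterior of p under a flat prior.
samples = []
p = 0.5
for _ in range(20000):
    proposal = p + random.gauss(0, 0.1)
    if (0 < proposal < 1 and
            math.log(random.random()) < log_likelihood(proposal) - log_likelihood(p)):
        p = proposal  # accept the proposal
    samples.append(p)  # otherwise keep the current value

samples = samples[5000:]  # discard burn-in
samples.sort()
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]
print(f"posterior mean: {sum(samples) / len(samples):.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

The interval is computed from posterior quantiles, so it needs no assumption that the likelihood surface is quadratic, which is the property that breaks down as the models described above grow more complex.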

People and institutions involved

IES program contact(s)

Allen Ruby

Associate Commissioner for Policy and Systems
NCER

Supplemental information

Co-Principal Investigators: Aitkin, Murray; Aitkin, Irit

In their past work, the researchers fit a range of complex psychometric models to the NAEP test item data using maximum likelihood estimation. They relied on the likelihood ratio test both to reduce large regression models to parsimonious form and to compare competing models with different structures (for example, 3PL and mixture guessing models, or mixture models and multidimensional ability models). The researchers found that maximum likelihood estimates, standard errors, and likelihood ratio tests became increasingly unreliable as model complexity increased. In addition, some of the model comparisons involved non-nested models, forcing them to bootstrap the distribution of the test statistic. This raised concerns that the Type I error probabilities of the bootstrap tests were biased upward, because the estimated parameters used to generate the bootstrap data sets are treated as the true population values. Different models could then support different conclusions about student behavior, which made the inability to rely on the likelihood ratio test to decide between them all the more consequential. In this project, the researchers used Bayesian methods based on Markov chain Monte Carlo to obtain the full joint posterior distribution of all the model parameters, which was then used to obtain accurate variability estimates and to compare the models through either integrated or posterior likelihoods.
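The parametric bootstrap test mentioned above can be sketched in a deliberately simple setting (a single binomial parameter, not the project's psychometric models): fit the null model, generate bootstrap data sets from the fitted parameters as if they were the true values, and compare the observed likelihood ratio statistic to its bootstrap distribution. Treating the fitted parameters as the truth is exactly the step that raised the Type I error concern.

```python
import math
import random

random.seed(1)

# Hypothetical observed data: 62 successes in 100 trials.
# H0 fixes p = 0.5; the alternative estimates p freely.
data = [1] * 62 + [0] * 38
n = len(data)

def lr_stat(xs):
    """Likelihood-ratio statistic for H0: p = 0.5 vs. free p."""
    k = sum(xs)
    m = len(xs)
    phat = k / m
    if 0 < phat < 1:
        ll_alt = k * math.log(phat) + (m - k) * math.log(1 - phat)
    else:
        ll_alt = 0.0  # degenerate MLE: the likelihood at phat is 1
    ll_null = m * math.log(0.5)
    return 2 * (ll_alt - ll_null)

observed = lr_stat(data)

# Parametric bootstrap: simulate data sets under the fitted null model.
# In general the fitted parameters are plugged in as if they were the
# true population values, the assumption flagged as a concern above.
B = 2000
boot = [lr_stat([1 if random.random() < 0.5 else 0 for _ in range(n)])
        for _ in range(B)]
p_value = sum(b >= observed for b in boot) / B
print(f"LR statistic: {observed:.2f}, bootstrap p-value: {p_value:.3f}")
```

For non-nested comparisons the statistic has no convenient reference distribution at all, so this plug-in bootstrap was the only recourse, motivating the move to posterior and integrated likelihoods.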

Regarding the second issue, the role of missing data can be illustrated through the teacher questionnaire of the 2005 NAEP. In the Texas sample, the teacher questionnaire is missing for 25% of schools and 50% of students; in the California sample, it is missing for 7% of schools and 22% of students. In their past work, the researchers included the teacher data in the multilevel analysis using complete-case analysis, in which cases were deleted unless they had values for all variables used in the analysis. As a result, the Texas analysis used only 50% of the student records and the California analysis only 78%, and findings differed depending on whether or not the teacher data were included in the analysis. The project investigated whether NAEP survey data with incomplete categorical explanatory variables can be efficiently and effectively analyzed through either the mixed maximum likelihood approach suggested by Aitkin and Chadwick (2003) or through extensions and/or accelerations of the Bayesian MCMC approach (for example, by using starting values for the model parameters drawn from the posterior distribution based on the complete cases).

Second, the researchers developed methods for efficient use of incomplete data in multilevel models for analyses of the NAEP survey data. Fully Bayesian methods for incomplete data are generalizations of multiple imputation and require MCMC approaches in which the missing data are handled in the same way as in latent class models. The variables with missing data are represented by additional models, either multinomial distributions (for categorical variables) or normal distributions (for continuous variables). These models are needed only for the missing values, which are imputed from their posterior distributions given the current parameters and the values of the other latent variables.
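The imputation step described above can be illustrated with a minimal sketch, assuming a single binary covariate x (standing in for a teacher-questionnaire item) and a binary outcome y; all parameter values are hypothetical, frozen at one MCMC iteration. A missing x is drawn from its conditional posterior, proportional to the prior probability of x times the likelihood of the observed y given x, exactly as a latent class indicator would be drawn.

```python
import random

random.seed(2)

# Assumed current parameter values at one MCMC iteration (illustrative):
p_x = 0.6                        # P(x = 1), a multinomial model for x
p_y_given_x = {0: 0.3, 1: 0.7}   # P(y = 1 | x)

def impute_x(y):
    """Draw a missing x from P(x | y, parameters) ∝ P(x) * P(y | x)."""
    w1 = p_x * (p_y_given_x[1] if y == 1 else 1 - p_y_given_x[1])
    w0 = (1 - p_x) * (p_y_given_x[0] if y == 1 else 1 - p_y_given_x[0])
    return 1 if random.random() < w1 / (w0 + w1) else 0

# Repeated draws for a student with observed y = 1 but missing x.
# In a full sampler this step alternates with updating the parameters.
draws = [impute_x(y=1) for _ in range(10000)]
print(f"estimated P(x = 1 | y = 1): {sum(draws) / len(draws):.3f}")
```

Because the draw conditions on the current parameter values, the imputed values carry their uncertainty forward through the chain, which is what distinguishes this fully Bayesian treatment from a single deterministic fill-in.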

Questions about this project?

To answer additional questions about this project or provide feedback, please contact the program officer.

 

Tags

Data and Assessments, Mathematics

