IES Grant

Title: Addressing Small Sample and Computational Issues in Mixture Models of Repeated Measures Data with Covariance Pattern Mixture Models
Center: NCER Year: 2019
Principal Investigator: McNeish, Daniel Awardee: Arizona State University
Program: Statistical and Research Methodology in Education–Early Career
Award Period: 2 Years (08/01/19–07/31/21) Award Amount: $209,305
Type: Methodological Innovation Award Number: R305D190011

Mixture models of repeated measures data are popular in the education sciences to detect potentially latent classes of growth trajectories. Two types of models are currently used for this purpose: Latent Class Growth Models (LCGMs) and Growth Mixture Models (GMM). These two models occupy opposite ends of the continuum of mixture modeling for repeated measures data. LCGMs are simpler computationally but do not fully model all sources of variance, leading to a semi-parametric approximation of a complex random effect distribution. On the other hand, GMMs fully model all sources of variance between- and within-individuals but tend to be quite difficult computationally because partitioning the variance can complicate inverses within the maximum likelihood estimator.

Though there is no strict sample size cut-off for GMMs, it is common to encounter computational difficulties with sample sizes of 1000 or more. Recent review studies have noted that, when faced with computational issues, empirical researchers often impose constraints on parameters across latent classes or set some parameters to zero, not for theoretical reasons but solely to circumvent computational issues present in the estimation algorithm. Many methodological articles have expressed concern about this practice, but education researchers generally have few alternatives to turn to when computational issues arise, especially with small samples where asymptotics do not hold and Bayesian methods require informed priors. Though LCGMs and GMMs are typically presented as the two options for mixture models of repeated measures data, there is a type of model that occupies the area of the continuum between these models. Part of the complexity of GMMs stems from the fact that they account for between-individual variability with random effects. Though this is in line with common practice in growth modeling in the education sciences, such an approach is not always necessary, and it adds to the computational complexity of an already complex model.

Instead, this project proposes the use of Covariance Pattern Mixture Models (CPMMs). Covariance Pattern models were developed in the 1980s as an alternative to the (then) computationally demanding random effect modeling framework. Covariance Pattern models account for both between- and within-individual variability; however, this is done with a single patterned marginal covariance matrix rather than by partitioning the variance into multiple components via subject-specific random effects. Unlike LCGMs, CPMMs fully model all sources of variability, and unlike GMMs, CPMMs do not require random effects, so the computational difficulty is greatly reduced. Preliminary simulations show that misspecified CPMMs outperform perfectly specified GMMs in terms of class enumeration, classification accuracy, convergence rates, and standard error efficiency when the sample size is 1000 or less.
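As a rough illustration of what a "single patterned marginal covariance matrix" means, the sketch below builds two common patterns (compound symmetry and first-order autoregressive) with NumPy and checks that a random-intercept model implies a compound-symmetric marginal covariance. It is not taken from the project itself: the function names, the four time points, and the variance values are all assumptions chosen for illustration.

```python
import numpy as np


def compound_symmetry(n_times, variance, correlation):
    """Equal variance at every time point, equal correlation between all pairs."""
    corr = np.full((n_times, n_times), correlation)
    np.fill_diagonal(corr, 1.0)
    return variance * corr


def ar1(n_times, variance, correlation):
    """Correlation decays as correlation**lag between time points."""
    times = np.arange(n_times)
    lags = np.abs(np.subtract.outer(times, times))
    return variance * correlation ** lags


def random_intercept_marginal(n_times, intercept_var, residual_var):
    """Marginal covariance implied by a random-intercept model: Z G Z' + sigma^2 I."""
    Z = np.ones((n_times, 1))          # design matrix for the random intercept
    G = np.array([[intercept_var]])    # between-individual variance
    return Z @ G @ Z.T + residual_var * np.eye(n_times)


# Hypothetical values, for illustration only.
n_times = 4
tau2, sigma2 = 2.0, 1.0

implied = random_intercept_marginal(n_times, tau2, sigma2)
direct = compound_symmetry(n_times, tau2 + sigma2, tau2 / (tau2 + sigma2))

# The two matrices coincide: the pattern captures between- and
# within-individual variability without an explicit random effect.
same = np.allclose(implied, direct)
print(same)
print(ar1(n_times, 1.0, 0.5))
```

Each pattern here has two parameters regardless of the number of time points, whereas an unstructured covariance matrix over four time points has ten. In a mixture setting, one such patterned matrix would be specified per latent class in place of class-specific random effect variances.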

The specific aims of this project are to extend study of CPMMs by:

  • Conducting a comprehensive simulation to more fully demonstrate that CPMMs are a viable alternative to (and may outperform) GMMs when sample sizes are smaller.
  • Investigating which criteria are best able to enumerate classes in smaller samples, especially with CPMMs.
  • Determining if the viability of CPMMs is maintained when missing data are present, a common occurrence in empirical studies within educational research.
  • Extending the idea of CPMMs to generalized estimating equations (GEE) mixture models, which have the added benefit of being robust to covariance structure misspecification.