Title: Assessing the Fit of the Statistical Model Used in the National Assessment of Educational Progress
Principal Investigator: Sinharay, Sandip
Awardee: Educational Testing Service (ETS)
Program: Statistical and Research Methodology in Education
Award Period: 2 years (7/1/12-6/30/14)
Award Amount: $390,191
Type: Methodological Innovation
Award Number: R305D120006
Co-Principal Investigator: Matthew Johnson (Teachers College)
The operational analyses in the National Assessment of Educational Progress (NAEP) make heavy use of item response theory (IRT) models. Standard 3.9 of the Standards for Educational and Psychological Testing calls for evidence of model fit whenever an IRT model is used to draw inferences from a data set. It is therefore important to assess the fit of the statistical model used by NAEP, both for quality control and for the long-term, overall improvement of the program. This study will apply three recently developed model fit assessment tools to four NAEP data sets: (1) generalized residual analysis, (2) residual analysis for assessing item fit, and (3) the posterior predictive model checking (PPMC) method. The data sets come from the NAEP 2002 Grade 12 Reading assessment, the NAEP 2009 Grade 8 Mathematics assessment, the NAEP 2009 Grade 12 Science assessment, and the NAEP 2008 long-term trend Mathematics assessment at age 9.
The project will use generalized residual analysis to examine how well the statistical model employed in NAEP predicts several data summaries, such as item proportion correct, item-pair proportion correct, and average proportion-correct scores for subgroups. The PPMC method will likewise be applied to examine how well the NAEP model predicts these summaries. To assess the fit of the NAEP statistical model to individual test items, the research team will use residual analysis. All three applications will take into account the matrix sampling design employed in NAEP, and the performance of the three methods will be compared with that of two existing methods previously used for NAEP data.
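To make the PPMC idea concrete, the sketch below is a minimal, self-contained illustration (not the project's actual implementation): under an assumed Rasch model, each "posterior draw" of the person and item parameters generates a replicated data set, and the observed item proportion correct is compared against its replicated distribution to yield a posterior predictive p-value per item. For brevity, the observed data are simulated and the posterior draws are stand-in perturbations of the generating parameters rather than true MCMC samples.

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items, n_draws = 500, 10, 200

# "True" parameters used to simulate an observed data set
# (a stand-in for real assessment data in this sketch).
theta_true = rng.normal(0, 1, n_persons)  # person abilities
beta_true = rng.normal(0, 1, n_items)     # item difficulties

def rasch_prob(theta, beta):
    """P(correct) under the Rasch model: logistic(theta - beta)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))

observed = (rng.random((n_persons, n_items))
            < rasch_prob(theta_true, beta_true)).astype(int)
obs_pc = observed.mean(axis=0)  # observed item proportion correct

# PPMC loop: for each (pseudo-)posterior draw, simulate a replicated
# data set and recompute the summary statistic. The draws here are noisy
# perturbations of the generating parameters, standing in for samples
# from a real posterior distribution (an assumption of this sketch).
ppp = np.zeros(n_items)  # posterior predictive p-values, one per item
for _ in range(n_draws):
    theta_d = theta_true + rng.normal(0, 0.1, n_persons)
    beta_d = beta_true + rng.normal(0, 0.1, n_items)
    rep = (rng.random((n_persons, n_items))
           < rasch_prob(theta_d, beta_d)).astype(int)
    ppp += (rep.mean(axis=0) >= obs_pc)
ppp /= n_draws

# p-values near 0 or 1 would flag items whose proportion correct the
# model systematically over- or under-predicts.
print(np.round(ppp, 2))
```

Because the observed data are generated from the same model being checked, the p-values should mostly fall well away from 0 and 1; applied to real data, extreme values would indicate misfit for the corresponding items.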
In performing a rigorous assessment of the fit of the NAEP statistical model, the researchers will provide evidence consistent with Standard 3.9 of the Standards for Educational and Psychological Testing. In addition, the model fit assessment has the potential to suggest improvements to the NAEP statistical model and the NAEP test development process, which may enhance the accuracy of the results reported in NAEP. Because several other assessments, such as the International Adult Literacy Survey (IALS), the Trends in International Mathematics and Science Study (TIMSS), and the Progress in International Reading Literacy Study (PIRLS), use essentially the same statistical model as NAEP, the findings of this project will benefit analyses pertinent to those assessments as well.
Journal article, monograph, or newsletter
Chon, K.H., and Sinharay, S. (2014). A Note on the Type I Error Rate of the PARSCALE G² Statistic for Long Tests. Applied Psychological Measurement, 38(3), 245–252.
Haberman, S.J., and Sinharay, S. (2013). Generalized Residuals for General Models for Contingency Tables With Application to Item Response Theory. Journal of the American Statistical Association, 108(504), 1435–1444.
Sinharay, S., and Haberman, S.J. (2014). How Often Is the Misfit of Item Response Theory Models Practically Significant? Educational Measurement: Issues and Practice, 33(1), 23–35.
Van Rijn, P.W., Sinharay, S., Haberman, S.J., and Johnson, M.S. (2016). Assessment of Fit of Item Response Theory Models Used in Large-Scale Educational Survey Assessments. Large-Scale Assessments in Education, 4, 10.