Project Activities
In this project, the research team examined the three above-mentioned equating methods. First, the missing data assumptions for the three methods were described explicitly. Next, the research team compared the three methods using datasets from several operational tests, including those given to U.S. grade school students, which use the NEAT design for equating. For each dataset, the team examined how the three equating methods performed relative to one another when the missing data satisfied the assumptions made by only one of these equating methods. The outcomes from this project were intended to help practitioners who use the NEAT design choose the most appropriate equating method, which in turn supports fair score reporting in tests that employ a NEAT design.
Structured Abstract
Research design and methods
In this project, the research team examined three popular equating methods that can be used with a NEAT design: the poststratification equipercentile equating method (also called the frequency estimation method), the chain equipercentile equating method, and the item response theory observed score equating method. First, the missing data assumptions for the three methods were described explicitly. Next, the research team compared the three methods using datasets from several operational tests, including those given to U.S. grade school students, which use the NEAT design for equating. For each dataset, the team examined how the three equating methods performed relative to one another when the missing data satisfied the assumptions made by only one of these equating methods.
Data analytic strategy
For data from each test, the following three-step procedure was used to compare the methods under different missing data assumptions:
- The "true equating function" under each method is obtained by making the missing data assumptions inherent in that method. For example, a typical missing data assumption made by the poststratification equating method is that, among all examinees who obtained a score of, say, 10 on the anchor test, the proportion who obtained a score of, say, 30 on the test to be equated is the same whether the examinees belong to the new form population or to the old form population.
- The equating function is then computed from the data for each equating method. These are called the "observed equating functions"; they are the standard equating functions that, for example, a testing company employing these methods would compute in an operational equating environment.
- The difference between the two (the true and observed equating functions) at each score point is computed for each pair of true and observed equating functions. The differences are plotted in two-dimensional graphical displays. The differences are summarized using their weighted averages.
The above steps were carried out for several tests and under two alternative methods (linear equating and kernel equating) for continuizing discrete data.
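As a rough illustration of the first and third steps, the sketch below shows how the poststratification assumption could be used to fill in a score distribution that is missing by design, and how the differences between a true and an observed equating function could be summarized by a weighted average. It is a minimal sketch and not the project's actual code: the function names, the synthetic-population weight `w`, and the use of score frequencies as averaging weights are all assumptions made for illustration.

```python
import numpy as np

def poststratified_distribution(joint_p, anchor_q, w=0.5):
    """Score distribution of a test form on a synthetic population.

    joint_p[x, a] : P(X = x, A = a) in the population that took the form.
    anchor_q[a]   : P(A = a) in the population that did NOT take the form.
    w             : hypothetical weight given to the first population.

    Poststratification assumption: P(X = x | A = a) is the same in both
    populations, so the missing distribution can be reconstructed.
    """
    joint_p = np.asarray(joint_p, dtype=float)
    anchor_q = np.asarray(anchor_q, dtype=float)
    anchor_p = joint_p.sum(axis=0)            # P(A = a) where X is observed
    cond = np.divide(joint_p, anchor_p,       # P(X = x | A = a)
                     out=np.zeros_like(joint_p), where=anchor_p > 0)
    dist_p = joint_p.sum(axis=1)              # P(X = x), observed population
    dist_q = cond @ anchor_q                  # P(X = x), imputed population
    return w * dist_p + (1.0 - w) * dist_q

def weighted_average_difference(true_eq, observed_eq, weights):
    """Differences at each score point and their weighted average.

    The weights (assumed here to be score frequencies) are normalized
    to sum to one before averaging.
    """
    true_eq = np.asarray(true_eq, dtype=float)
    observed_eq = np.asarray(observed_eq, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diffs = observed_eq - true_eq
    return diffs, float(np.sum(weights / weights.sum() * diffs))

# Toy numbers, invented purely for illustration.
diffs, avg = weighted_average_difference(
    true_eq=[0.0, 1.0, 2.0, 3.0],
    observed_eq=[0.5, 1.0, 2.5, 3.0],
    weights=[1, 1, 1, 1],
)
print(diffs, avg)
```

The per-score differences returned by `weighted_average_difference` are the quantities that would be plotted against the score scale, while the single weighted average gives a compact summary for each pair of true and observed equating functions.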
Products and publications
Sinharay, S., & Holland, P. W. (2009). The missing data assumptions of the nonequivalent groups with anchor test (NEAT) design and their implications for test equating. ETS Research Report Series, 2009(1), i-53.
Sinharay, S., & Holland, P. W. (2010). A new approach to comparing several equating methods in the context of the NEAT design. Journal of Educational Measurement, 47(3), 261-285.
Sinharay, S., & Holland, P. W. (2010). The missing data assumptions of the NEAT design and their implications for test equating. Psychometrika, 75, 309-327.
** This project was submitted to and funded as an Unsolicited application in FY 2007.