Technical Methods Report: Using State Tests in Education Experiments

NCEE 2009-013
November 2009


Achieve, Inc. Three paths, one destination: Standards-based reform in Maryland, Massachusetts, and Texas. Washington, DC: Author, 2002.

American Educational Research Association. "Standards and tests: Keeping them aligned." Research Points, vol. 1, no. 1, 2003. Washington, DC: Author.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, D.C.: Author, 1999.

Allen, M.J., and W.M. Yen. Introduction to Measurement Theory. Monterey, CA: Brooks/Cole, 1979.

Allison, D.B., R.L. Allison, M.S. Faith, F. Paultre, and F.X. Pi-Sunyer. "Power and money: designing statistically powerful studies while minimizing financial costs." Psychological Methods, vol. 2, 1997, pp. 20–33.

Amrein, A.L., and D.C. Berliner. An Analysis of Some Unintended Consequences of High-Stakes Testing. Tempe, AZ: The Great Lakes Center for Education Research & Practice, Arizona State University, 2002.

Baron, R.M., and D.A. Kenny. "The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations." Journal of Personality and Social Psychology, vol. 51, 1986, pp. 1173-1182.

Bloom, H., L. Richburg-Hayes, and A.R. Black. "Using covariates to improve precision for studies that randomize schools to evaluate educational interventions." Educational Evaluation and Policy Analysis, vol. 29, no. 1, 2007, pp. 30-59.

Bracey, G.W. Put to the Test: An Educator's and Consumer's Guide to Standardized Testing (second edition). Bloomington, IN: Phi Delta Kappa International, 2002.

Goertz, Margaret, Mark Duffy, and Kerstin Carlson Le Floch. "Assessment and Accountability Systems in the 50 States: 1999-2000." Philadelphia, PA: Consortium for Policy Research in Education, Research Report No. 46, March 2001.

Cronin, J. A Study of the Ongoing Alignment of the NWEA RIT Scale with Assessments from the Montana Comprehensive Assessment System (MontCAS). Lake Oswego, OR: Northwest Evaluation Association, 2005. (ERIC Document Reproduction Service No. ED491218).

Cohen, J. Statistical Power Analysis for the Behavioral Sciences (second edition). Hillsdale, NJ: Erlbaum, 1988.

Council of Chief State School Officers (CCSSO). "Key State Education Policies on PK-12 Education: 2004." Washington, DC: CCSSO, 2005.

Council of Chief State School Officers (CCSSO). "Statewide Student Assessment 2007-08 SY: Math, ELA, and Science." Washington, DC: CCSSO, 2008. Retrieved October 24, 2008.

Cooper, H. Synthesizing Research (3rd ed.): A Guide for Literature Reviews. Applied Social Research Methods Series, Volume 2. Thousand Oaks, CA: Sage, 1998.

Cooper, H., and L.V. Hedges. The Handbook of Research Synthesis. New York: Russell Sage, 1994.

Coxe, B. "FCAT Developmental Score Scale." Unpublished memorandum. August 14, 2002. Retrieved December 31, 2008.

Cronbach, L.J., and L. Furby. "How should we measure ‘change'—or should we?" Psychological Bulletin, vol. 74, 1970, pp. 68-80.

Darling-Hammond, L. "Testimony Before the House Education and Labor Committee on the Re-Authorization of No Child Left Behind." Washington, DC, September 10, 2007.

Dong, N., R.A. Maynard, and I. Perez-Johnson. "Averaging Effect Sizes Within and Across Studies of Interventions Aimed at Improving Child Outcomes." Child Development Perspectives, vol. 2, no. 3, 2008, pp. 187-197.

Feuer, M.J., P.W. Holland, B.F. Green, M.W. Bertenthal, and F.C. Hemphill. Uncommon Measures: Equivalence and Linkage Among Educational Tests. Washington, DC: National Academy Press, 1999.

Finn, C.E., M.J. Petrilli, and L. Julian. The State of State Standards 2006. Washington, D.C.: Thomas B. Fordham Institute, 2006.

General Accounting Office (GAO). "No Child Left Behind Act. Most Students with Disabilities Participated in Statewide Assessments, but Inclusion Options Could Be Improved." Washington, D.C.: GAO, July 2005.

Glass, G.V., B. McGaw, and M.L. Smith. Meta-Analysis in Social Research. Beverly Hills, CA: Sage, 1981.

Glazerman, S., D.M. Levy, and D. Myers. "Nonexperimental Replications of Social Experiments: A Systematic Review." Princeton, NJ: Mathematica Policy Research, Inc., 2002.

Goldstein, H. Multilevel Statistical Models. Third edition. London: Edward Arnold, 2003.

Hambleton, R.K., and H. Swaminathan. Item Response Theory: Principles and Applications. Hingham, MA: Kluwer, 1984.

Hambleton, R.K., H. Swaminathan, and H.J. Rogers. Fundamentals of Item Response Theory. Newbury Park, CA: Sage Press, 1991.

Harcourt. Stanford Achievement Test Series, ninth edition, spring norms book. San Antonio, TX: Author, 1997.

Hedges, L.V., and I. Olkin. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press, 1985.

Holland, P.W., and N.J. Dorans. "Linking and Equating." In Educational Measurement (fourth edition), edited by R.L. Brennan. Westport, CT: American Council on Education/Praeger, 2006.

Holland, P.W., and D.B. Rubin. "On Lord's Paradox." In Principals of Modern Psychological Measurement, edited by H. Wainer and S. Messick. Hillsdale, NJ: Lawrence Erlbaum Associates, 1983, pp. 3-25.

Kohn, A. The Case Against Standardized Testing: Raising the Scores, Ruining the Schools. Portsmouth, NH: Heinemann, 2000.

Kolen, M.J., and R.L. Brennan. Test Equating, Scaling, and Linking: Methods and Practices (second edition). New York: Springer, 2004.

Liang, K.Y., and S.L. Zeger. "Longitudinal Data Analysis Using Generalized Linear Models." Biometrika, vol. 73, 1986, pp. 13–22.

Linn, R.L. "Linking results of distinct assessments." Applied Measurement in Education, vol. 6, no. 1, 1993, pp. 83-102.

Lipsey, M.W., and D.B. Wilson. Practical Meta-Analysis. Applied Social Research Methods Series, Volume 49. Thousand Oaks, CA: Sage, 2001.

Littell, R., G. Milliken, W. Stroup, R. Wolfinger, and O. Schabenberger. SAS for Mixed Models (second edition). Cary, NC: SAS Press, 2006.

Lord, F.M., and M.R. Novick. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley Publishing Company, 1968.

Martineau, J. A. "Distorting value added: The use of longitudinal, vertically scaled student achievement data for growth-based, value-added accountability." Journal of Educational and Behavioral Statistics, vol. 31, no. 1, 2006, pp. 35-62.

May, H. "The Reality of Designing Field Experiments in Education: Using Monte Carlo Methods for Power Analysis and Design Decisions." Paper presented at the meeting of the American Educational Research Association, Montreal, Canada, 2005.

May, H., and M.A. Robinson. A Randomized Evaluation of Ohio's Personalized Assessment Reporting System (PARS). Philadelphia, PA: Consortium for Policy Research in Education, 2007.

May, H., and J.A. Supovitz. "Capturing the cumulative effects of school reform: An 11-year study of the impacts of America's Choice on Student Achievement." Educational Evaluation and Policy Analysis, vol. 28, no. 3, 2006, pp. 231-257.

Mislevy, R.J. Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. Princeton, NJ: Educational Testing Service, 1992.

Muthén, L.K., and B.O. Muthén. "How to Use a Monte Carlo Study to Decide on Sample Size and Determine Power." Structural Equation Modeling, vol. 9, no. 4, 2002, pp. 599-620.

National Center for Education Statistics (NCES). "Mapping 2005 State Proficiency Standards onto the NAEP Scales (NCES 2007-482)." Washington, DC: U.S. Department of Education, 2007.

National Center for Education Statistics. "National Assessment of Educational Progress [online data]." Washington, DC: U.S. Department of Education, 2008. Retrieved October 29, 2008.

National Research Council. "Common Standards for K-12 Education? Considering the Evidence: Summary of a Workshop Series." Washington, DC: National Academy of Sciences, 2008.

Neter, J., M.H. Kutner, C.J. Nachtsheim, and W. Wasserman. Applied Linear Statistical Models (fourth edition). Chicago: Irwin, 1996.

Petrilli, M. "The Proficiency Illusion." Presentation to the National Research Council Workshop on Assessing the Role of K-12 Academic Standards in States, April 2008. Retrieved October 29, 2008.

Porter, A., M. Polikoff, and J. Smithson. "Is there a de facto national curriculum? Evidence from state standards." Paper prepared for the National Research Council Workshop on Assessing the Role of K-12 Academic Standards in States, January 2008. Retrieved May 15, 2008.

Raudenbush, S.W., and A.S. Bryk. Hierarchical Linear Models: Applications and Data Analysis Methods (second edition). Thousand Oaks, CA: Sage, 2002.

Riddle, W. Adequate Yearly Progress (AYP): Implementation of the No Child Left Behind Act. Washington, DC: Congressional Research Service, 2005.

Rogosa, D.R., and J.B. Willett. "Demonstrating the reliability of the difference score in the measurement of change." Journal of Educational Measurement, vol. 20, 1983, pp. 335-343.

Rothman, R., J.B. Slattery, J.L. Vranek, and L.B. Resnick. Benchmarking and Alignment of Standards and Testing. (CSE Technical Report 566.) Los Angeles: University of California-Los Angeles, Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, 2002.

Sanders, W. "Comparisons Among Various Educational Assessment Value-Added Models." Paper presented at the National Conference on Value-Added, Columbus, OH, October 2006.

Schmidt, W.H., H.C. Wang, and C.C. McKnight. "Curriculum coherence: An examination of U.S. mathematics and science content standards from an international perspective." Journal of Curriculum Studies, vol. 37, no. 5, 2005, pp. 525-559.

Segal, C. "Motivation, Test Scores, and Economic Success." Unpublished manuscript, 2006. Retrieved December 3, 2008.

Shadish, W., T. Cook, and D. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin, 2002.

Singer, J.D., and J.B. Willett. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press, 2003.

Thurlow, Martha, Christopher Johnson, and Ruth Ryder. "Accountability for Performance in Assessment." Presentation at the National Accountability Conference, New Orleans, LA, October 4-5, 2004.

U.S. Department of Education. "Family Educational Rights and Privacy, Final Rule." 34 CFR Part 99. Federal Register, vol. 73, no. 237, December 2008, pp. 74806–74855. Retrieved October 6, 2009.

U.S. Department of Education. "State and Local Implementation of the No Child Left Behind Act." Washington, DC: ED and RAND, 2007. Retrieved October 22, 2008.

U.S. Department of Education. "Assistance to States for the Education of Children with Disabilities and Preschool Grants for Children with Disabilities; Final Rule." 34 CFR Parts 300 and 301. Federal Register, vol. 71, no. 156, August 2006, pp. 46540–46845. Retrieved October 6, 2009.

Webb, N.L. "Issues Related to Judging the Alignment of Curriculum Standards and Assessments." Applied Measurement in Education, vol. 20, no. 1, 2007, pp. 7-25.

Wildt, A.R., and O. Ahtola. Analysis of Covariance. Thousand Oaks, CA: Sage Publications, 1978.

Wise, S.L., and C.E. DeMars. "Low Examinee Effort in Low-Stakes Assessment: Problems and Potential Solutions." Educational Assessment, vol. 10, no. 1, 2005, pp. 1-17.

Zimmerman, D.W., and R.H. Williams. "Note on the reliability of experimental measures and the power of significance tests." Psychological Bulletin, vol. 100, no. 1, 1986, pp. 123-124.