Literacy Assessments
December 2020

Question

What is the research on the effectiveness of K-12 literacy assessments for monitoring student growth and achievement?

Ask A REL Response

Thank you for your request to our Regional Educational Laboratory (REL) Reference Desk. Ask A REL is a collaborative reference desk service provided by the 10 RELs that, by design, functions much in the same way as a technical reference library. Ask A REL provides references, referrals, and brief responses in the form of citations in response to questions about available education research.

Following an established REL Northwest research protocol, we conducted a search for evidence-based research. The sources included ERIC and other federally funded databases and organizations, research institutions, academic research databases, Google Scholar, and general Internet search engines. For more details, please see the methods section at the end of this document.

The research team has not evaluated the quality of the references and resources provided in this response; we offer them only for your reference. The search included the most commonly used research databases and search engines to produce the references presented here. References are listed in alphabetical order, not necessarily in order of relevance. The research references are not necessarily comprehensive and other relevant research references may exist. In addition to evidence-based, peer-reviewed research references, we have also included other resources that you may find useful. We provide only publicly available resources, unless there is a lack of such resources or an article is considered seminal in the topic area.

References

Chung, S., Espin, C. A., & Stevenson, C. E. (2018). CBM maze-scores as indicators of reading level and growth for seventh-grade students. Reading and Writing, 31(3), 627-648. https://link.springer.com

From the Abstract:
"The technical adequacy of CBM maze-scores as indicators of reading level and growth for seventh-grade secondary-school students was examined. Participants were 452 Dutch students who completed weekly maze measures over a period of 23 weeks. Criterion measures were school level, dyslexia status, scores and growth on a standardized reading test. Results supported the technical adequacy of maze scores as indicators of reading level and growth. Alternate-form reliability coefficients were significant and intermediate to high. Mean maze scores showed significant increase over time, students’ growth trajectories differed, and students’ initial performance levels (intercepts) and growth rates (slopes) were not correlated. Maze reading level and growth were related to reading level and/or growth on criterion measures. A nonlinear model provided a better fit for the data than a linear model. Implications for use of CBM maze-scores for data-based decision-making are discussed."

Clemens, N. H., Shapiro, E. S., Wu, J. Y., Taylor, A. B., & Caskie, G. L. (2014). Monitoring early first-grade reading progress: A comparison of two measures. Journal of Learning Disabilities, 47(3), 254-270. https://ir.nctu.edu.tw

From the Abstract:
"This study compared the validity of progress monitoring slope of nonsense word fluency (NWF) and word identification fluency (WIF) with early first-grade readers. Students ("N" = 80) considered to be at risk for reading difficulty were monitored with NWF and WIF on a 1-2 week basis across 11 weeks. Reading skills at the end of first grade were assessed using measures of passage reading fluency, real and pseudoword reading efficiency, and basic comprehension. Latent growth models indicated that although slope on both measures significantly predicted year-end reading skills, models including WIF accounted for more variance in spring reading skills than NWF, and WIF slope was more strongly associated with reading outcomes than NWF slope. Analyses of student growth plots suggested that WIF slope was more positively associated with later reading skills and discriminated more clearly between students according to successful or unsuccessful year-end reading outcomes. Although both measures may be used to monitor reading growth of at-risk students in early first grade, WIF may provide a clearer index of reading growth. Implications for data-based decision-making are discussed."

Cordray, D., Pion, G., Brandt, C., Molefe, A., & Toby, M. (2012). The impact of the Measures of Academic Progress (MAP) Program on student reading achievement. (NCEE 2013-4000). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. https://eric.ed.gov

From the Abstract:
"This study was designed to address questions from Midwestern states and districts about the extent to which benchmark assessment may affect teachers' differentiated instructional practices and student achievement. Thirty-two elementary schools in five districts in Illinois participated in a two-year randomized controlled trial to assess the effectiveness of the MAP program. Half the schools were randomly assigned to implement the MAP program in grade 4, and the other half were randomly assigned to implement MAP in grade 5. Schools assigned to grade 4 treatment served as the grade 5 control condition, and schools assigned to grade 5 treatment served as the grade 4 control. The results of the study indicate that the MAP program was implemented with moderate fidelity but that MAP teachers were not more likely than control group teachers to have applied differentiated instructional practices in their classes. Overall, the MAP program did not have a statistically significant impact on students' reading achievement in either grade 4 or grade 5."

Foorman, B., Espinosa, A., Wood, C., & Wu, Y. C. (2016). Using computer-adaptive assessments of literacy to monitor the progress of English learner students. (REL 2016-149). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southeast. https://eric.ed.gov

From the Abstract:
"A top education priority in the United States is to address the needs of one of the fastest growing yet lowest performing student populations--English learner students (Capps et al., 2005). English learner students come from homes where a non-English language is spoken and need additional academic support to access the mainstream curriculum. These students account for about 10 percent of the preK-12 student population in the United States (Aud et al., 2013). Spanish-speaking students account for 80 percent of the English learner student population in the United States and, because they live disproportionately in poverty and attend schools with higher percentages of racial/ethnic minority students, students from low-income households, and students with low achievement, Spanish-speaking students are at greater risk of low achievement than other English learner students (Capps et al., 2005). This study examined how teachers and school staff administered computer-adaptive assessments of literacy to English learner students in grades 3-5 and how they used the assessments to monitor students' growth in literacy skills. It presents findings that may aid districts in implementing a computer-adaptive assessment of literacy skills for English learner students as well as for other students. Appendix A presents Additional Tables and Figures."

Foorman, B. R., Kershaw, S., & Petscher, Y. (2013). Evaluating the screening accuracy of the Florida Assessments for Instruction in Reading (FAIR). (REL 2013-008). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southeast. https://eric.ed.gov

From the Abstract:
"Florida requires that students who do not meet grade-level reading proficiency standards on the end-of-year state assessment (Florida Comprehensive Assessment Test, FCAT) receive intensive reading intervention. With the stakes so high, teachers and principals are interested in using screening or diagnostic assessments to identify students with a strong likelihood of failing to meet grade-level proficiency standards on the FCAT. Since 2009 Florida has administered a set of interim assessments (Florida Assessments for Instruction in Reading, FAIR) three times a year (fall, winter, and spring) to obtain information on students' probability of meeting grade-level standards on the end-of-year FCAT. In 2010/11 the Florida Department of Education aligned the FCAT to new standards (Next Generation Sunshine State Standards) and renamed it the FCAT 2.0 but retained the 2009/10 cutscores. In 2011/12 it changed the FCAT 2.0 cutscores. The share of students meeting grade-level standards on the FCAT 2.0 fell to 53 percent in 2012 from 72 percent in 2011. This drop led the Florida Department of Education to partner with the Regional Educational Laboratory Southeast to analyze student performance on the FAIR reading comprehension screen and FCAT 2.0 to determine how well the FAIR and the 2011 FCAT 2.0 scores predict 2012 FCAT 2.0 performance. The study addresses two research questions: (1) What is the association between performance on the 2012 FCAT 2.0 and two scores from the FAIR reading comprehension screen across grades 4-10 and the three FAIR assessment periods (predictive validity)?; and (2) How much does adding the FAIR reading comprehension screen affect identification errors beyond those identified through 2011 FCAT 2.0 scores (screening accuracy)? Performance on the 2012 FCAT 2.0 was found to have a stronger correlation with FCAT success probability scores than with FAIR reading comprehension ability scores. In addition, using 2011 FCAT 2.0 scores alone to predict 2012 FCAT 2.0 scores underidentified 16-24 percent of students as at risk. Adding FAIR reading comprehension ability scores dropped the underidentification rate by 12-20 percentage points. An appendix provides additional statistics."

Fuchs, L. S., & Fuchs, D. (2011). Using CBM for progress monitoring in reading. National Center on Student Progress Monitoring. https://eric.ed.gov

From the Abstract:
"Progress monitoring focuses on individualized decision making in general and special education with respect to academic skill development at the elementary grades. Progress monitoring is conducted frequently (at least monthly) and is designed to: (1) Estimate rates of improvement; (2) Identify students who are not demonstrating adequate progress and therefore require additional or alternative forms of instruction; and/or; and (3) Compare the efficacy of different forms of instruction and thereby design more effective, individualized instructional programs for problem learners. In this manual, the authors discuss one form of progress monitoring: Curriculum-Based Measurement (CBM). CBM provides teachers with an easy and quick method of obtaining empirical information on the progress of their students. With frequently obtained student data, teachers can analyze student scores to adjust student goals and revise their instructional programs. That way, instruction can be tailored to best fit the needs of each student. Research has demonstrated that when teachers use CBM to inform their instructional decision making, students learn more, teacher decision making improves, and students are more aware of their own performance. Included is an annotated bibliography of selected CBM article."

January, S. A., Van Norman, E. R., Christ, T. J., Ardoin, S. P., Eckert, T. L., & White, M. J. (2018). Progress monitoring in reading: Comparison of weekly, bimonthly, and monthly assessments for students at risk for reading difficulties in grades 2–4. School Psychology Review, 47(1), 83-94. https://eric.ed.gov

From the Abstract:
"The present study examined the utility of two progress monitoring assessment schedules (bimonthly and monthly) as alternatives to monitoring once weekly with curriculum-based measurement in reading (CBM-R). General education students (N = 93) in Grades 2-4 who were at risk for reading difficulties but not yet receiving special education services had their progress monitored via three assessment schedules across 1 academic year. Four mixed-factorial analyses of variance tested the effect of progress monitoring schedule (weekly, bimonthly, monthly), grade (2, 3, and 4), and the interaction effect between schedule and grade on four progress monitoring outcomes: intercept, slope, standard error of the estimate, and standard error of the slope. Results indicated that (a) progress monitoring schedule significantly predicted each outcome, (b) grade predicted each progress monitoring outcome except the standard error of the slope, and (c) the effect of schedule on each outcome did not depend on students' grade levels. Overall, findings from this study reveal that collecting CBM-R data less frequently than weekly may be a viable option for educators monitoring the progress of students in Grades 2-4 who are at risk for reading difficulties."

Miller, K. C., Bell, S. M., & McCallum, R. S. (2015). Using reading rate and comprehension CBM to predict high-stakes achievement. Journal of Psychoeducational Assessment, 33(8), 707-718. https://citeseerx.ist.psu.edu

From the Abstract:
"Because of the increased emphasis on standardized testing results, scores from a high-stakes, end-of-year test (Tennessee Comprehensive Assessment Program [TCAP] Reading Composite) were used as the standard against which scores from a group-administered, curriculum-based measure (CBM), Monitoring Instructional Responsiveness: Reading (MIR:R), were compared for 448 third-grade students. A zero-order correlation coefficient of 0.58 (p 0.001) partially defined the relationship between the MIR:R composite score (comprehension rate) and student performance on the TCAP reading composite; a classification analysis yielded the following percentages: sensitivity = 85, specificity = 53. Results from a stepwise multiple-regression equation revealed that the Comprehension score provided moderate predictive validity for TCAP reading composite performance (29% variance accounted for, p 0.001); the rate (Total Words Read) score was less predictive (1% additional variance accounted for, p 0.05). Discussion focuses on the implications of using unidimensional versus multidimensional CBMs for early screening and/or progress monitoring within response to intervention."

Morsy, L., Kieffer, M., & Snow, C. (2010). Measure for measure: A critical consumers' guide to reading comprehension assessments for adolescents—Final report from Carnegie Corporation of New York's Council on Advancing Adolescent Literacy. Carnegie Corporation of New York. https://eric.ed.gov

From the Abstract:
"Although millions of dollars and weeks of instructional time are spent nationally on testing students, educators often have little information on how to choose appropriate assessments of adolescent reading for informing instruction. This guide is designed to meet that need, by drawing together evidence about nine of the most commonly-used, commercially-available reading comprehension assessments and providing a critical view into the strengths and weaknesses of each. In so doing, the authors focus on the utility of assessments for the purposes of screening groups of students to identify those who struggle and diagnosing the specific needs of students who struggle. Motivated primarily by the many questions that the authors receive from principals, literacy coaches, and district curriculum leaders about diagnostic assessment for students in grades four through twelve, this guide aims to provide those decision-makers with the tools they need to make informed decisions. Note on Methodology is appended. (Contains 11 tables and 7 endnotes.) [For related reports, see "Adolescent Literacy Programs: Costs of Implementation. Final Report from Carnegie Corporation of New York's Council on Advancing Adolescent Literacy.]"

Muijselaar, M. M., Kendeou, P., de Jong, P. F., & van den Broek, P. W. (2017). What does the CBM-Maze test measure? Scientific Studies of Reading, 21(2), 120-132. https://www.tandfonline.com

From the Abstract:
"In this study, we identified the code-related (decoding, fluency) and language comprehension (vocabulary, listening comprehension) demands of the CBM-Maze test, a formative assessment, and compared them to those of the Gates-MacGinitie test, a standardized summative assessment. The demands of these reading comprehension tests and their developmental patterns were examined with multigroup structural regression models in a sample of 274 children in Grades 4, 7, and 9. The results showed that the CBM-Maze test relied more on code-related than on language comprehension skills when compared to the Gates-MacGinitie test. These demands were relatively stable across grades."

Methods

Keywords and Search Strings: The following keywords, subject headings, and search strings were used to search reference databases and other sources: Assess* AND (literacy OR reading), (Monitoring OR growth OR progress OR measuring OR achievement), "Literacy assessment", "Adolescent literacy"

Databases and Resources: We searched ERIC for relevant resources. ERIC is a free online library of more than 1.6 million citations of education research sponsored by the Institute of Education Sciences (IES). Additionally, we searched Google Scholar and EBSCO databases (Academic Search Premier, Education Research Complete, and Professional Development Collection).

Reference Search and Selection Criteria

When searching and reviewing resources, we considered the following criteria:

Date of publication: This search and review included references and resources published in the last 10 years.

Search priorities of reference sources: Search priority was given to study reports, briefs, and other documents that are published and/or reviewed by IES and other federal or federally funded organizations, as well as academic databases, including ERIC, EBSCO databases, and Google Scholar.

Methodology: The following methodological priorities/considerations were given in the review and selection of the references:

  • Study types: randomized controlled trials, quasi-experiments, surveys, descriptive data analyses, literature reviews, and policy briefs, generally in this order
  • Target population and samples: representativeness of the target population, sample size, and whether participants volunteered or were randomly selected
  • Study duration
  • Limitations and generalizability of the findings and conclusions

This memorandum is one in a series of quick-turnaround responses to specific questions posed by stakeholders in Alaska, Idaho, Montana, Oregon, and Washington, the region served by the Regional Educational Laboratory (REL) Northwest. It was prepared under Contract ED-IES-17-C-0009 by REL Northwest, administered by Education Northwest. The content does not necessarily reflect the views or policies of IES or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.