
Star Assessments
February 2021


"What does the research say about the usability of Renaissance Star assessments and their use in guiding the instructional practices of teachers?"

Ask A REL Response

Thank you for your request to our Regional Educational Laboratory (REL) Reference Desk. Ask A REL is a collaborative reference desk service provided by the 10 RELs that, by design, functions in much the same way as a technical reference library. Ask A REL provides references, referrals, and brief responses in the form of citations in response to questions about available education research.

Following an established REL Northwest research protocol, we conducted a search for evidence-based research. The sources included ERIC and other federally funded databases and organizations, research institutions, academic research databases, Google Scholar, and general Internet search engines. For more details, please see the methods section at the end of this document.

The research team has not evaluated the quality of the references and resources provided in this response; we offer them only for your reference. The search included the most commonly used research databases and search engines to produce the references presented here. References are listed in alphabetical order, not necessarily in order of relevance. The research references are not necessarily comprehensive and other relevant research references may exist. In addition to evidence-based, peer-reviewed research references, we have also included other resources that you may find useful. We provide only publicly available resources, unless there is a lack of such resources or an article is considered seminal in the topic area.


Burns, M. K., Kanive, R., & DeGrande, M. (2012). Effect of a computer-delivered math fact intervention as a supplemental intervention for math in third and fourth grades. Remedial and Special Education, 33(3), 184–191.

From the Abstract:
"The current study reviews a computer-based math fluency intervention with 216 third- and fourth-grade students who were at risk for math difficulties. The intervention used a computer software program to practice math facts an average of three times per week for 8 to 15 weeks. Data were compared to those of 226 students in a control group. Results indicated that students who participated in the intervention had significantly larger gains on their math scores than those in the control group, and students with severe math problems (at or below the 15th percentile) grew at a rate that was equal to that of students with a pretest score that was between the 15th and 25th percentiles. Moreover, significantly fewer students remained at risk for math failure in the intervention group after participating in the intervention. These data suggest that the computer-based intervention was a useful supplemental math intervention. Suggestions for future research are provided."

Bulut, O., & Cormier, D. C. (2018). Validity evidence for progress monitoring with Star Reading: Slope estimates, administration frequency, and number of data points. Frontiers in Education, 3(68), 1–12.

From the Abstract:
"The increasing use of computerized adaptive tests (CATs) to collect information about students' academic growth or their response to academic interventions has led to a number of questions pertaining to the use of these measures for the purpose of progress monitoring. Star Reading is an example of a CAT-based assessment with considerable validity evidence to support its use for progress monitoring. However, additional validity evidence could be gathered to strengthen the use and interpretation of Star Reading data for progress monitoring. Thus, the purpose of the current study was to focus on three aspects of progress monitoring that will benefit Star Reading users. The specific research questions to be answered are: (a) how robust are the estimation methods in producing meaningful progress monitoring slopes in the presence of outliers; (b) what is the length of the time interval needed to use Star Reading for the purpose of progress monitoring; and (c) how many data points are needed to use Star Reading for the purpose of progress monitoring? The first research question was examined using a Monte Carlo simulation study. The second and third research questions were examined using real data from 6,396,145 students who took the Star Reading assessment during the 2014–2015 school year. Results suggest that the Theil-Sen estimator is the most robust estimator of student growth when using Star Reading. In addition, it appears that five data points and a progress monitoring window of approximately 20 weeks appear to be the minimum parameters for Star Reading to be used for the purpose of progress monitoring. Implications for practice include adapting the parameters for progress monitoring according to a student's current grade-level performance in reading."

Clemens, N. H., Hagan-Burke, S., Luo, W., Cerda, C., Blakely, A., Frosch, J., ... & Jones, M. (2015). The predictive validity of a computer-adaptive assessment of kindergarten and first-grade reading skills. School Psychology Review, 44(1), 76–97.

From the Abstract:
"This study examined the predictive validity of a computer-adaptive assessment for measuring kindergarten reading skills using the STAR Early Literacy (SEL) test. The findings showed that the results of SEL assessments administered during the fall, winter, and spring of kindergarten were moderate and statistically significant predictors of year-end reading and reading-related skills, and they explained 35% to 38% of the variance in a latent variable of word-reading skills. Similar results were observed with a subsample of 71 participants who received follow-up assessments in first grade. End-of-kindergarten analyses indicated that, when added as predictors with SEL, paper-based measures of letter naming, letter-sound fluency, and word-reading fluency improved the amount of explained variance in kindergarten and first-grade year-end word-reading skills. Classification-accuracy analyses found that the SEL literacy classifications aligned with word-reading skills measured by paper-based assessments for students with higher SEL scores, but less alignment was found for students with lower SEL scores. In addition, SEL cut scores showed problematic accuracy, especially in predicting outcomes at the end of first grade. The addition of paper-based assessments tended to improve accuracy over using SEL in isolation. Overall, SEL shows promise as a universal screening tool for kindergarten reading skills, although it may not yet be able to completely replace paper-based assessments of early reading."

Clemens, N. H., Hsiao, Y. Y., Simmons, L. E., Kwok, O. M., Greene, E. A., Soohoo, M. M., ... & Otaiba, S. A. (2019). Predictive validity of kindergarten progress monitoring measures across the school year: Application of dominance analysis. Assessment for Effective Intervention, 44(4), 241–255.

From the Abstract:
"Although several measures are available for monitoring kindergarten reading progress, little research has directly compared them to determine which are superior in predicting year-end reading skills relative to other measures, and how validity may change across the school year as reading skills develop. A sample of 426 kindergarten students who were considered to be at risk for reading difficulty at the start of kindergarten were monitored across the year with a set of paper-based progress monitoring measures and a computer-adaptive test. Dominance analyses were used to determine the extent to which each measure uniquely predicted year-end reading skills relative to other measures. Although the computer-adaptive test was the most dominant predictor at the start of the year over letter sound fluency, letter naming fluency, and phoneme segmentation fluency, letter sound fluency was most dominant by December. Measures of fluency reading real words administered across the second half of the year were dominant to all other assessments. The implications for measure selection are discussed."

Lambert, R., Algozzine, B., & Mc Gee, J. (2014). Effects of progress monitoring on math performance of at-risk students. Journal of Education, Society and Behavioural Science, 527–540.

From the Abstract:
"Aims: In this research, we evaluated the effects of progress monitoring grounded in a commercially-available tool used to customize assignments and keep track of progress in mathematics for students in elementary school. Study Design: We used a randomized controlled trial and multilevel analysis to test the effect of the treatment on the outcome measures while nesting students within their classroom. Place and Duration of Study: Students in three elementary schools in the Midwestern region of the United States were in the study which took place across an academic year. Methodology: We used two-level hierarchical linear models for our analyses because of the nested nature of our data. We compared outcomes across high- and low implementation fidelity treatment group classrooms as well as across treatment and control classrooms. Results: We found statistically significant treatment differences for monthly growth rate and elementary school fidelity of implementation effects were documented. Conclusion: Professionals engaged in progress monitoring use a variety of measures to track student performance and to assist in instructional decision making when data indicate a need for change. We found that the use of a computer-based individualized mathematics assignment and progress monitoring program resulted in improvements in both curriculum based and standardized assessments. The effects of using the system were greater when level of implementation (i.e., intervention fidelity) was higher. The value of progress monitoring and the importance of measuring the relationship between fidelity of implementation and achievement outcomes that we found support prior research."

McBride, J. R., Ysseldyke, J., Milone, M., & Stickney, E. (2010). Technical adequacy and cost benefit of four measures of early literacy. Canadian Journal of School Psychology, 25(2), 189–204.

From the Abstract:
"Technical adequacy and information/cost return were examined for four early reading measures: the Dynamic Indicators of Basic Early Literacy Skills (DIBELS), STAR Early Literacy (SEL), Group Reading Assessment and Diagnostic Evaluation (GRADE), and the Texas Primary Reading Inventory (TPRI). All four assessments were administered to the same students in each of Grades K through 2 over a 5-week period; the samples included 200 students per grade from 7 states. Both SEL and DIBELS were administered twice to establish their retest reliability in each grade. We focused on the convergent validity of each assessment for measuring five critical components of reading development identified by the U.S. National Research Panel: Phonemic awareness, phonics, vocabulary, comprehension, and fluency. DIBELS and TPRI both are asserted to assess all five of these components; GRADE and STAR Early Literacy explicitly measure all except fluency. For all components, correlations among relevant subtests were high and comparable. The pattern of intercorrelations of nonfluency measures with fluency suggests the tests of fluency, vocabulary, comprehension, and word reading are measuring the same underlying construct. A separate cost-benefit study was conducted and showed that STAR Early Literacy was the most cost-effective measure among those studied. In terms of amount of time per unit of test administration or teachers' time, CAT (computerized adaptive testing) in general, and STAR Early Literacy in particular, is an attractive option for early reading assessment."

Monpas-Huber, J. B., & Marysville Public Schools (2015). Just pressing buttons? Validity evidence for the STAR and Smarter Balanced Summative Assessments. The WERA Educational Journal, 8(1), 39–44.

From the Abstract:
"As long as American public school students continue to take tests, those in the field of educational measurement who develop and use tests are exhorted to uphold high standards of practice. For testing experts working in public school districts, this means being very clear about the purposes of various assessments and then being able to describe and produce evidence of validity for those purposes. This paper considers challenges to the validity of the STAR and Smarter Balanced assessments on the grounds of content and anomalies in administration. It then examines correlations between the test scores as evidence of validity. The results show strong correlations."

Nelson, P. M., Van Norman, E. R., Klingbeil, D. A., & Parker, D. C. (2017). Progress monitoring with computer adaptive assessments: The impact of data collection schedule on growth estimates. Psychology in the Schools, 54(5), 463–471.

From the Abstract:
"Although extensive research exists on the use of curriculum-based measures for progress monitoring, little is known about using computer adaptive tests (CATs) for progress-monitoring purposes. The purpose of this study was to evaluate the impact of the frequency of data collection on individual and group growth estimates using a CAT. Data were available for 278 fourth- and fifth-grade students. Growth estimates were obtained when five, three, and two data collections were available across 18 weeks. Data were analyzed by grade to evaluate any observed differences in growth. Further, root mean square error values were obtained to evaluate differences in individual student growth estimates across data collection schedules. Group-level estimates of growth did not differ across data collection schedules; however, growth estimates for individual students varied across the different schedules of data collection. Implications for using CATs to monitor student progress at the individual or group level are discussed."

Ochs, S., Keller-Margulis, M. A., Santi, K. L., & Jones, J. H. (2020). Long-term validity and diagnostic accuracy of a reading computer-adaptive test. Assessment for Effective Intervention, 45(3), 210–225.

From the Abstract:
"Universal screening is the first mechanism by which students are identified as at risk of failure in the context of multitiered systems of supports. This study examined the validity and diagnostic accuracy of a reading computer-adaptive test as a screener to identify state achievement test performance for third through fifth graders (N = 1,696). Single time points and slopes within year and longitudinally were examined. Validity results for single points were moderate (0.60–0.79, p < .002). Validity for slopes and the state test were weak or not significant. Diagnostic accuracy cut scores that maximized sensitivity and specificity yielded high accuracy for single points whereas sensitivity was inadequate for slopes. Practical implications and future directions are presented."


Keywords and Search Strings: The following keywords, subject headings, and search strings were used to search reference databases and other sources: Star, Renaissance, (Assessment? OR screener?), (Instruction OR instructional)

Databases and Resources: We searched ERIC for relevant resources. ERIC is a free online library of more than 1.6 million citations of education research sponsored by the Institute of Education Sciences (IES). Additionally, we searched Google Scholar and EBSCO databases (Academic Search Premier, Education Research Complete, and Professional Development Collection).

Reference Search and Selection Criteria

When searching for and reviewing resources, we considered the following criteria:

Date of publications: This search and review included references and resources published in the last 10 years.

Search priorities of reference sources: Search priority was given to study reports, briefs, and other documents that are published and/or reviewed by IES and other federal or federally funded organizations, as well as academic databases, including ERIC, EBSCO databases, and Google Scholar.

Methodology: The following methodological priorities/considerations were given in the review and selection of the references:

  • Study types: randomized controlled trials, quasi-experiments, surveys, descriptive data analyses, literature reviews, and policy briefs, generally in this order
  • Target population and samples: representativeness of the target population, sample size, and whether participants volunteered or were randomly selected
  • Study duration
  • Limitations and generalizability of the findings and conclusions

This memorandum is one in a series of quick-turnaround responses to specific questions posed by stakeholders in Alaska, Idaho, Montana, Oregon, and Washington, the region served by the Regional Educational Laboratory (REL) Northwest. It was prepared under Contract ED-IES-17-C-0009 by REL Northwest, administered by Education Northwest. The content does not necessarily reflect the views or policies of IES or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.