
The Impacts of a Standards-Based Grading System Emphasizing Formative Assessment, Feedback, and Re-Assessment: A Mixed Methods, Cluster Randomized Control Trial in Ninth Grade Mathematics Classrooms
Steven L. Kramer; Michael A. Posner; Alexander S. Browman; Nancy R. Lawrence; Jennifer Roem; Kathleen Krier (2025). Journal of Research on Educational Effectiveness, v18 n1 p56-87. Retrieved from: https://eric.ed.gov/?id=EJ1492695
Examining 2,736 students, grades 8-9
Single Study Review
Review Details
Reviewed: April 2026
- Single Study Review (findings for Proficiency-based Assessment and Reassessment of Learning Outcomes (PARLO))
- Randomized Controlled Trial
- Meets WWC standards with reservations because it is a cluster randomized controlled trial with high cluster-level attrition, but the analytic intervention and comparison groups satisfy the baseline equivalence requirement.
This review may not reflect the full body of research evidence for this intervention.
Evidence Tier rating based solely on this study. This intervention may achieve a higher tier when combined with the full body of evidence.
Findings
| Outcome measure | Comparison | Period | Sample | Intervention mean | Comparison mean | Significant? | Improvement index | Evidence tier |
|---|---|---|---|---|---|---|---|---|
| Researcher-developed geometry or algebra end-of-course assessment | Proficiency-based Assessment and Reassessment of Learning Outcomes (PARLO) vs. Business as usual | 0 Days | Full sample | 0.33 | 0.00 | Yes | | |
Sample Characteristics
Characteristics of study sample as reported by study author.
- Female: 55%; Male: 45%
- Rural, Suburban, Urban
- Race: Asian 5%; Black 30%; Other or unknown 15%; White 51%
- Ethnicity: Hispanic 8%; Not Hispanic or Latino 92%
- Eligible for Free and Reduced Price Lunch: Other or unknown 100%
Study Details
Setting
This study was conducted in 29 schools across urban, suburban, and rural areas, including public, charter, and Catholic schools from high- and low-performing districts. Two cohorts participated: 20 schools participated in 2010-11 and 2011-12, and 15 schools participated in 2011-12 and 2012-13. The study focused primarily on 9th-grade Algebra and Geometry classrooms, with all teachers in these subjects asked to participate for 2 years.
Study sample
The study randomly assigned 18 schools to the intervention group and 17 to the comparison group. Of these, 14 intervention schools and 15 comparison schools provided outcome data for the study. The sample of students included in the analysis consisted of 1,649 intervention students and 1,087 comparison students. The analytic sample of teachers included 33 intervention teachers and 25 comparison group teachers. More than two-thirds (68 percent) of the teachers were female. Teachers were 88 percent White, 7 percent Black, 5 percent Asian, and 2 percent multiracial or of another unspecified race.
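The school counts above are what drive the "high cluster-level attrition" note in the review rating. The following is a minimal sketch of that arithmetic, using only the counts reported in this section. The function name `attrition_rate` is a hypothetical helper introduced for illustration, and the sketch does not reproduce the WWC's attrition boundaries, which determine whether a given combination of overall and differential attrition counts as high.

```python
def attrition_rate(assigned: int, retained: int) -> float:
    """Share of randomized clusters that did not provide outcome data.

    Hypothetical helper for illustration; not part of any WWC tool.
    """
    if assigned <= 0 or not 0 <= retained <= assigned:
        raise ValueError("retained must be between 0 and assigned, and assigned > 0")
    return (assigned - retained) / assigned


# School counts as reported in the study sample description.
intervention = attrition_rate(assigned=18, retained=14)
comparison = attrition_rate(assigned=17, retained=15)
overall = attrition_rate(assigned=18 + 17, retained=14 + 15)

# Differential attrition: gap between the two groups' attrition rates.
differential = abs(intervention - comparison)

print(f"Intervention school attrition: {intervention:.1%}")
print(f"Comparison school attrition:   {comparison:.1%}")
print(f"Overall school attrition:      {overall:.1%}")
print(f"Differential attrition:        {differential * 100:.1f} percentage points")
```

With these counts, overall school-level attrition comes to about 17.1 percent (6 of 35 schools) and the differential to about 10.5 percentage points.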
Intervention Group
Teachers in the intervention condition used Proficiency-based Assessment and Reassessment of Learning Outcomes (PARLO), a standards-based grading system designed to help students learn course content by organizing grading around clearly defined learning outcomes, formative feedback, and opportunities to show improved mastery over time. PARLO aims to shift grading away from averaging early and later performance and toward grading based on the best evidence of what students have learned by the end of the term. In the study, PARLO teachers organized each semester around about 10 to 15 learning outcomes and shared these outcomes and success criteria with students and families. Teachers used quizzes, exit tickets, observations, and other classroom evidence to judge student progress on each learning outcome and to give students feedback. Students could complete additional learning activities, such as error logs, remediation plans, or in-class catch-up opportunities, and then reassess for full credit. Final grades were based on the number of learning outcomes on which students demonstrated proficient or high-performance work, rather than on a weighted average of points earned across the semester. Teachers participated for two years and developed their own learning outcomes, grading procedures, and classroom routines within the PARLO approach. Students were taught by teachers using PARLO during eighth or ninth grade.
Comparison Group
Teachers in the comparison condition received training on organizing instruction by learning outcomes and providing formative assessment and feedback, which the authors consider necessary but not sufficient components of the PARLO system. However, teachers in the comparison condition did not receive any additional training or implement other key PARLO system features, such as reassessment policies or mastery-based grading.
Support for implementation
Teachers received a stipend for participation. All participating teachers received professional development on organizing instruction by learning outcomes and providing formative feedback during the summer prior to implementation. Teachers in Cohort 1 received 3 days of this professional development, and teachers in Cohort 2 received 2 days. Teachers in the intervention condition also received additional professional development on the other components of PARLO. Cohort 1 teachers in the intervention condition received an additional 3 days of professional development to support PARLO implementation prior to the first year of implementation and an additional 2 days prior to the second year of implementation. Cohort 2 teachers in the intervention condition received an additional 3 days of professional development prior to the first year of implementation and an additional 2 days prior to the second year of implementation. All teachers in the intervention condition also had the opportunity to participate in monthly professional learning community (PLC) meetings during the intervention.
An indicator of the effect of the intervention, the improvement index can be interpreted as the expected change in percentile rank for an average comparison group student if that student had received the intervention.
For more, please see the WWC Glossary entry for improvement index.
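The definition above can be made concrete with a short sketch. As we understand the WWC's published procedures, the improvement index is the percentile rank implied by a standardized effect size under a normal distribution, minus 50. The function name `improvement_index` and the example effect size of 0.25 are assumptions chosen for illustration; they are not values taken from this study.

```python
from statistics import NormalDist


def improvement_index(effect_size: float) -> float:
    """Expected percentile-rank change for an average comparison student.

    Assumes outcomes are normally distributed and that effect_size is a
    standardized mean difference (in standard deviation units).
    Hypothetical helper for illustration only.
    """
    # Percentile of the average intervention student within the comparison
    # distribution, minus the 50th percentile of the average comparison student.
    return (NormalDist().cdf(effect_size) - 0.5) * 100


# Illustrative effect size, not a result from the study.
example = improvement_index(0.25)
print(f"Improvement index for an effect size of 0.25: {example:+.1f}")
```

An effect size of zero gives an index of zero, and the index is symmetric, so a negative effect size of the same magnitude gives the same value with the opposite sign.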
An outcome is the knowledge, skills, and attitudes that are attained as a result of an activity. An outcome measure is an instrument, device, or method that provides data on the outcome.
A finding that is included in the effectiveness rating. Excluded findings may include subgroups and subscales.
The sample on which the analysis was conducted.
The group to which the intervention group is compared, which may include a different intervention, business as usual, or no services.
The timing of the post-intervention outcome measure.
The number of students included in the analysis.
The mean score of students in the intervention group.
The mean score of students in the comparison group.
The WWC considers a finding to be statistically significant if the likelihood that the finding is due to chance alone, rather than a real difference, is less than five percent.
The WWC reviews studies for WWC products, Department of Education grant competitions, and IES performance measures.
The name and version of the document used to guide the review of the study.
The version of the WWC design standards used to guide the review of the study.
The result of the WWC assessment of the study. The rating is based on the strength of evidence of the effectiveness of the intervention. Studies are given a rating of Meets WWC Design Standards without Reservations, Meets WWC Design Standards with Reservations, or Does Not Meet WWC Design Standards.
A related publication that was reviewed alongside the main study of interest.
Study findings for this report.
Based on the direction, magnitude, and statistical significance of the findings within a domain, the WWC characterizes the findings from a study as one of the following: statistically significant positive effects, substantively important positive effects, indeterminate effects, substantively important negative effects, and statistically significant negative effects. For more, please see the WWC Handbook.
Tier 1 Strong indicates strong evidence of effectiveness, Tier 2 Moderate indicates moderate evidence of effectiveness, and Tier 3 Promising indicates promising evidence of effectiveness, as defined in the non-regulatory guidance for ESSA and the regulations for ED discretionary grants (EDGAR Part 77).