ASPIRE to Improve Your Agency’s School Performance Measures!

Mid-Atlantic | February 29, 2024

Measures of school performance—whether used for high-stakes accountability or for lower-stakes diagnostic purposes—have become increasingly complex and technical in the past three decades. State accountability systems that initially examined only student proficiency rates in reading and math now incorporate a wide range of additional measures, including achievement in other subjects, graduation rates, chronic absenteeism, school climate, postsecondary readiness and enrollment, and student achievement growth (or school value-added) alongside proficiency. Meanwhile, many local school districts and charter school authorizers have their own approaches to measuring school performance, which are often equally complex and multidimensional.

There are good reasons for the increasing complexity of school performance frameworks. Schools aim to promote student knowledge in more than just reading and math. Growth and value-added metrics estimate the school’s contribution to student achievement, helping to level the playing field among schools serving widely varied student populations. Graduation and postsecondary enrollment are important signals of students’ preparation for work and life. Students’ and teachers’ perceptions of school climate provide a valuable window into the learning environment. In short, all of these additions to accountability and performance management systems seek to make the systems fairer and more comprehensive.

Even so, as these systems grow more complex, they become more likely to produce unintended consequences that go unnoticed. When the District of Columbia’s Public Charter School Board (PCSB) sought to revise its framework of performance measures used to evaluate DC’s 135 public charter schools, it recognized this risk and consulted REL Mid-Atlantic for analytic support.

Over the past three years, PCSB has been working, in collaboration with charter school leaders and other constituents, to develop its new Annual School Performance Index Report and Evaluation (ASPIRE) framework. REL staff reviewed literature on school performance measures, participated in meetings, and developed presentations to help ensure that the component measures included in the system would be valid, reliable, and robust.

  • Valid measures can support the inferences drawn from them: They actually measure what they claim to measure.
  • Reliable measures are stable: They do not bounce unpredictably due to random error.
  • Robust measures remain valid even after they become consequential: They are not easily manipulated by gaming strategies.

The REL's advice to PCSB was informed by years of work with state and local agencies under the aegis of its community of practice on accountability and school performance measures. That work led to the development of a comprehensive framework for understanding school performance, focused on student outcomes, school impacts on student outcomes, and processes inside schools.

As PCSB considered options for the ASPIRE framework, REL staff provided input on validity, reliability, and robustness.

For example, the REL encouraged PCSB to gauge the reliability of each proposed school performance measure by applying the measure to historical data and examining the correlation within schools from year to year. A low correlation, meaning that schools’ results swing wildly up and down from one year to the next, is a warning sign of unreliability, because a school’s true performance is likely to change relatively slowly over time. This is a particular concern for growth measures, which are inherently less reliable than the underlying achievement measures on which they are based.
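As a rough illustration (not PCSB’s actual analysis), here is a minimal sketch in Python of what such a year-to-year correlation check might look like, using made-up school results and illustrative column names:

```python
import pandas as pd

# Hypothetical data: one row per school per year, with a proposed
# performance measure already computed (all values are illustrative).
scores = pd.DataFrame({
    "school_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "year":      [2021, 2022, 2023] * 3,
    "measure":   [62.0, 64.5, 63.0, 48.0, 71.0, 45.0, 80.0, 79.5, 82.0],
})

# Pivot to one row per school, one column per year.
wide = scores.pivot(index="school_id", columns="year", values="measure")

# Year-to-year correlation across schools: high values suggest the measure
# is stable; low or negative values are a warning of unreliability.
for y1, y2 in zip(wide.columns[:-1], wide.columns[1:]):
    r = wide[y1].corr(wide[y2])
    print(f"Correlation between {y1} and {y2}: {r:.2f}")
```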

PCSB's historical analysis found that variations of its proposed composite summary measure and many of the individual measures demonstrated strong correlations and appropriate consistency from year to year. But this was not true for all measures. Some proposed growth measures showed low (sometimes even negative) year-to-year correlations, casting doubt on their reliability. PCSB, therefore, adopted the REL's suggestion to increase the stability of those measures by calculating them as at least two-year rolling averages rather than single-year measures.
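A rolling average is a simple way to add that stability. The sketch below, again with made-up numbers, shows a two-year rolling average smoothing a volatile growth measure for a single hypothetical school:

```python
import pandas as pd

# Hypothetical history of a volatile growth measure (values are illustrative).
growth = pd.DataFrame({
    "school_id": [1, 1, 1, 1],
    "year":      [2020, 2021, 2022, 2023],
    "measure":   [55.0, 30.0, 60.0, 35.0],
})

# Two-year rolling average within each school; min_periods=1 keeps the
# first year as a single-year value since no prior year is available.
growth["measure_2yr"] = (
    growth.sort_values("year")
          .groupby("school_id")["measure"]
          .transform(lambda s: s.rolling(window=2, min_periods=1).mean())
)
print(growth)
```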

The REL also helped PCSB avoid unintended consequences (including reduced validity and reliability) in creating an approach to incorporate results for different groups of students.

PCSB wants ASPIRE to promote equity by including subgroup-specific results in the ASPIRE framework, and by giving additional weight to disadvantaged groups: students with disabilities, English learners, and at-risk students (where risk is defined by poverty and other characteristics). Other subgroups are identified based on race and ethnicity. REL staff pointed out that using fixed weights—the same in every school—for racial/ethnic subgroups could implicitly produce very different weights for individual members of these subgroups. For example, in a school with twice as many Black students as White students, giving the two groups the same group-level weight would imply that every White student would count twice as much as every Black student—not what PCSB intended. PCSB changed its plans so that weights for racial/ethnic groups in each school will be directly proportional to the size of the groups in the measure, ensuring that each student counts equally.
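The arithmetic behind that concern is easy to see. As a back-of-the-envelope sketch in Python (the enrollment numbers are made up, not PCSB’s data), fixed group-level weights imply unequal per-student weights, while group weights proportional to enrollment give every student the same weight:

```python
# Illustrative school: 200 Black students and 100 White students.
groups = {"Black": 200, "White": 100}
total = sum(groups.values())

# Fixed group-level weights: each group counts 50% regardless of size,
# so each student's implicit weight is 0.5 divided by the group's size.
fixed_per_student = {g: 0.5 / n for g, n in groups.items()}

# Proportional weights: each group's weight equals its share of enrollment,
# so every student's implicit weight is the same (1 / total).
proportional_per_student = {g: (n / total) / n for g, n in groups.items()}

print(fixed_per_student)         # {'Black': 0.0025, 'White': 0.005} -> each White student counts twice as much
print(proportional_per_student)  # {'Black': 0.00333..., 'White': 0.00333...} -> every student counts equally
```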

Another example of the REL's support in stress-testing prospective measures relates to robustness. For assessing student growth, PCSB would like to allow schools to include within-year growth on nationally normed assessments, in addition to year-to-year growth on DC's state accountability tests. REL staff agreed that growth on nationally normed assessments could be usefully complementary but pointed out that any measure of fall-to-spring growth is especially susceptible to gaming if not monitored carefully: It is easier to game a growth measure by artificially depressing a baseline score (for example, by telling students not to care about the test) than by inflating an outcome score. Anecdotal reports suggest that artificially depressing baseline scores is a real phenomenon in places where teachers' evaluation results depend on growth on a self-selected assessment. In contrast, when growth is based on annual assessments, each spring's test is an outcome measure for one grade and a baseline for the next. The school has to care about both, so the incentive to bomb the baseline test is removed (or at least appropriately counterbalanced).

PCSB staff elected to use within-year growth because of the high student mobility rates in DC public schools: Requiring a baseline from the preceding school year would exclude substantial numbers of students who had transferred from schools that didn't use the same nationally normed assessment. Recognizing that bombing the baseline is a vulnerability of fall-to-spring growth measures, the REL suggested that PCSB should monitor fall scores used in growth measures to ensure they aren't substantially lower than those of the same students in the preceding spring.
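One simple way to operationalize that kind of monitoring, sketched here with hypothetical student records and an arbitrary threshold, is to flag students whose fall baseline falls well below their own score from the preceding spring:

```python
import pandas as pd

# Hypothetical student-level records on a nationally normed assessment:
# the prior spring score and the current fall (baseline) score.
students = pd.DataFrame({
    "student_id":   [101, 102, 103, 104],
    "spring_score": [210, 195, 230, 205],
    "fall_score":   [212, 160, 228, 206],
})

# Flag students whose fall baseline is substantially below their own
# preceding spring score; the 20-point threshold is arbitrary and would
# need to reflect the assessment's scale and typical summer learning loss.
THRESHOLD = 20
students["flagged"] = students["spring_score"] - students["fall_score"] > THRESHOLD

print(students[students["flagged"]])
```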

These are just a few examples of how REL Mid-Atlantic staff worked with PCSB staff to ensure that the new ASPIRE framework of school performance measures is as fair and useful as possible. If your agency is interested in having a thought partner on school performance measures, feel free to get in touch and join our community of practice. REL staff would love to help.

Author(s)

Brian Gill
Director for REL Mid-Atlantic

Kirsten James
Public Charter School Board of DC
