
REL Midwest Ask A REL Response

Data Use

April 2018

Question:

What does the research say about how K-12 school districts best measure student academic growth in reading and mathematics?



Response:

Following an established Regional Educational Laboratory (REL) Midwest protocol, we conducted a search for research reports and guides on best practices to measure student academic growth in K-12 school districts. For details on the databases and sources, keywords, and selection criteria used to create this response, please see the Methods section at the end of this memo.

Below, we share a sampling of the publicly accessible resources on this topic. References are listed in alphabetical order, not necessarily in order of relevance. The search conducted is not comprehensive; other relevant references and resources may exist. We have not evaluated the quality of references and resources provided in this response, but offer this list to you for your information only.

Research References

Anderman, E. M., Gimbert, B., O’Connell, A. A., & Riegel, L. (2015). Approaches to academic growth assessment. British Journal of Educational Psychology, 85(2), 138–153. Retrieved from https://eric.ed.gov/?id=EJ1061380

From the ERIC abstract: “Background: There is much interest in assessing growth in student learning. Assessments of growth have important implications and affect many policy decisions at many levels. Aims: In the present article, we review some of the different approaches to measuring growth and examine the implications of their usage. Sample: Samples used in research on growth models typically include students enrolled in public schools that primarily serve kindergarten through the 12th grade. Method: Definitions of growth and gain are reviewed, and five types of growth models are examined: (1) Student Gain Score Model, (2) The Covariate Adjustment Model, (3) The Student Percentile Gain Model—referred to as single-wave value-added models, (4) Univariate Value-Added Response Models, and (5) Multivariate Value-Added Response Models. Results: Modelling approaches are vastly different, whereas Student Gain Models are mathematically and conceptually simple, Multivariate Models are highly complex. Conclusion: Educators assessing growth must make critical decisions about measurement. The type of instrument that is selected and the type of analytic techniques selected are of great importance. Growth must be considered from technical, pedagogical, and policy perspectives.”

Note: REL Midwest was unable to locate a link to the full-text version of this resource. Although REL Midwest tries to provide publicly available resources whenever possible, it was determined that this resource may be of interest to you. It may be found through university or public library systems.
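
The five model types listed in the Anderman et al. abstract range from the mathematically simple to the highly complex. To make the two simplest concrete, the following is a purely illustrative sketch in Python (fabricated scores and variable names of our own; not drawn from the cited study): a gain score is the raw difference between two test administrations, while a covariate adjustment model regresses the current score on the prior score and treats the residual as growth.

```python
"""Illustrative sketch only: contrasting a simple gain score with a
covariate-adjusted growth estimate. All scores are fabricated."""
import numpy as np

rng = np.random.default_rng(0)
prior = rng.normal(200, 10, size=500)             # hypothetical year-1 scale scores
current = 0.8 * prior + rng.normal(45, 8, 500)    # hypothetical year-2 scale scores

# (1) Student gain score model: growth = current score minus prior score.
gain = current - prior

# (2) Covariate adjustment model: regress current on prior; the residual is
# the part of the current score not predicted by prior achievement.
slope, intercept = np.polyfit(prior, current, deg=1)
residual_growth = current - (intercept + slope * prior)

print("mean gain:", round(gain.mean(), 2))
print("mean residual growth (about zero by construction):", round(residual_growth.mean(), 2))
```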

Castellano, K. E., & Ho, A. D. (2013). A practitioner’s guide to growth models. Washington, DC: Council of Chief State School Officers. Retrieved from https://eric.ed.gov/?id=ED551292

From the ERIC abstract: “This Practitioner’s Guide to Growth Models, commissioned by the Technical Issues in Large-Scale Assessment (TILSA) and Accountability Systems & Reporting (ASR), collaboratives of the Council of Chief State School Officers, describes different ways to calculate student academic growth and to make judgments about the adequacy of that growth. It helps to clarify the questions that each model answers best, as well as the limitations of each model. This document is intended to support states as they address the challenges of evolving assessment and accountability systems. This guide does not promote one type of interpretation over another. Rather, it describes growth models in terms of the interpretations they best support and, in turn, the questions they are best designed to answer. The goal of this guide is thus to increase alignment between user interpretations and model function in order for models to best serve their desired purposes: increasing student achievement, decreasing achievement gaps, and improving the effectiveness of educators and schools. The report is divided into two parts. Part I, A Framework for Operational Growth Models, includes seven models: (1) Growth and Growth Models; (2) Growth: Beyond Status; (3) Different Ways to Slice the Data: Status, Improvement, and Growth; (4) What is a Growth Model?; (5) Growth Models of Interest; (6) Critical Questions for Describing Growth Models; and (7) Alternative Growth Model Classification Schemes. Part II, The Growth Models, includes seven chapters: 1: The Gain Score Model; (2) The Trajectory Model; (3) The Categorical Model; (4) The Residual Gain Model; (5) The Projection Model; (6) The Student Growth Percentile Model; and (7) The Multivariate Model. An appendix explains cross-referencing growth model terms.”
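
Several of the models catalogued in the guide, such as the student growth percentile model, are straightforward to approximate. The sketch below is our own simplification with fabricated data, not the guide's methodology: each student's current score is ranked among peers with similar prior scores. Operational student growth percentile systems use quantile regression rather than the coarse prior-score bands used here.

```python
"""Rough illustration of the student growth percentile idea: rank each
student's current score among peers with similar prior scores.
Fabricated data; operational SGPs use quantile regression, not banding."""
import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(1)
prior = rng.normal(200, 10, 1000)
current = 0.8 * prior + rng.normal(45, 8, 1000)

# Band students into deciles of prior achievement ("academic peers").
bands = np.digitize(prior, np.quantile(prior, np.linspace(0.1, 0.9, 9)))

def growth_percentile(i: int) -> float:
    peers = current[bands == bands[i]]
    return percentileofscore(peers, current[i])

print("approximate growth percentile for the first student:",
      round(growth_percentile(0), 1))
```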

Dahlin, M., Xiang, Y., Durant, S., & Cronin, J. (2010). State standards and student growth: Why state standards don’t matter as much as we thought. Portland, OR: Northwest Evaluation Association. Retrieved from https://eric.ed.gov/?id=ED521964

From the ERIC abstract: “The goal of the No Child Left Behind Act (NCLB) was to ensure that states set educational standards in core academic subjects, and to hold schools accountable for ensuring that all students meet these standards. Given the inflexibility of NCLB’s AYP (adequate yearly progress) targets and the federal line drawn in the sand for 2014, critics have expressed concern that states, particularly those with high academic proficiency standards, must eventually choose between easing their standards to the point that even the lowest performing students can meet them, or face increasing federal sanctions of schools, including loss of funds, forced reallocations of students within districts, and eventually, school closures. Other criticisms have focused on NCLB’s use of proficiency rates as the school performance metric, since it only holds schools accountable for the performance of students that are below their state proficiency standards. Students whose performance exceeds their state proficiency standards exert no influence on school outcomes, and NCLB does not require schools to ensure that students performing above state proficiency standards make any kind of progress at all (Loveless, Farkas, and Duffett, 2008). Implicit in these trends are two assumptions. The first assumption is that lowering the proficiency cut scores negatively impacts student performance and growth. The second is that the implied focus of the current accountability system on nearly- or non-proficient (i.e., ‘bubble’) students has negative consequences for higher performing, already-proficient ones. The Kingsbury Center at NWEA (Northwest Evaluation Association) is home to one of the nation’s largest repositories of information about student academic growth, so they examined these two assumptions using growth data collected from hundreds of thousands of students across the country, a small sample from the millions of student records hosted within the Kingsbury Center’s Growth Research Database. Specifically, the authors examined two questions: (1) After accounting for differences attributable to poverty, race, gender, amount of instruction received, and pertinent level school factors, does the difficulty of a state’s proficiency standards bear any relationship to student academic growth?; and (2) Do students that are above their state’s proficiency standard demonstrate less growth, relative to their peers, than do students performing below the level of their state proficiency standards? They investigated these questions separately for four samples of roughly 100,000 students, one sample each for third and eighth grade students, and for reading and mathematics. Across all four samples, the authors found that a student’s status relative to his or her state proficiency bar had an effect on growth, and that students below the proficiency bar showed greater growth than those above it. This tends to validate concerns that NCLB may be focusing the energy of educators on ‘bubble’ students, or students below the state proficiency cut score who might help the school meet its Adequate Yearly Progress requirement if they were to become proficient during the school year. However, the prevailing wisdom that lower proficiency standards lead to poorer student outcomes was observed in only one of four conditions. In the case of third grade mathematics, lower state standards did indeed predict modestly poorer growth. However, this relationship did not persist into middle school, and it was not seen at all in reading. 
In other words, in three of the four cases examined, student growth bore no relationship to whether states set their academic proficiency standards high or low.”

Gill, B., English, B., Furgeson, J., & McCullough, M. (2014). Alternative student growth measures for teacher evaluation: Profiles of early-adopting districts (REL 2014-016). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic. Retrieved from https://eric.ed.gov/?id=ED544796

From the ERIC abstract: “States and districts are beginning to use student achievement growth—as measured by state assessments (often using statistical techniques known as value-added models or student growth models)—as part of their teacher evaluation systems. But this approach has limited application in most states, because their assessments are typically administered only in grades 3-8 and only in math and reading. In response, some districts have turned to alternative measures of student growth. These alternative measures include alternative assessment-based value-added models (VAMs) that use the results of end-of-course assessments or commercially available tests in statistical models, and student learning objectives (SLOs), which are determined by individual teachers, approved by principals, and used in evaluations that do not involve sophisticated statistical modeling. For this report, administrators in eight districts that were early adopters of alternative measures of student growth were interviewed about how they used these measures to evaluate teacher performance. Key findings from the study are: (1) Districts using SLOs chose them as a teacher-guided method of assessing student growth, while those using alternative assessment-based VAMs chose to take advantage of existing assessments; (2) SLOs can be used for teacher evaluation in any grade or subject, but require substantial effort by teachers and principals, and ensuring consistency is challenging; (3) In the four SLO districts, SLOs are required of all teachers across grades K-12, regardless of whether the teachers serve grades or subjects that include district-wide standardized tests; and (4) Alternative student assessments used by VAM districts differ by developer, alignment with specific courses, and coverage of grades and subjects. VAMs applied to end-of-course and commercial assessments create consistent district-wide measures but generally require technical support from an outside provider.”

Goldhaber, D., & Theobald, R. (2012). Do different value-added models tell us the same things? Stanford, CA: Carnegie Foundation for the Advancement of Teaching. Retrieved from https://eric.ed.gov/?id=ED537431

From the ERIC abstract: “There are good reasons for re-thinking teacher evaluation. Evaluation systems in most school districts appear to be far from rigorous. A recent study showed that more than 99 percent of teachers in a number of districts were rated ‘satisfactory,’ which does not comport with empirical evidence that teachers differ substantially from each other in terms of their effectiveness. Likewise, the ratings do not reflect the assessment of the teacher workforce by administrators, other teachers, or students. Evaluation systems that fail to recognize the true differences that are known to exist among teachers greatly hamper the ability of school leaders and policymakers to make informed decisions about such matters as which teachers to hire, what teachers to help, which teachers to promote, and which teachers to dismiss. Thus it is encouraging that policymakers are developing more rigorous evaluation systems, many of which are partly based on student test scores. Yet while the idea of using student test scores for teacher evaluations may be conceptually appealing, there is no universally accepted methodology for translating student growth into a measure of teacher performance. In this brief, the authors review what is known about how measures that use student growth align with one another, and what that agreement or disagreement might mean for policy.”
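
The brief's central question, whether different value-added specifications rank teachers similarly, is commonly examined with rank correlations. The following is an illustrative sketch with fabricated teacher-effect estimates (not the authors' data or models) showing how two sets of estimates might be compared.

```python
"""Illustrative sketch: agreement between teacher effects from two
hypothetical value-added specifications, measured by rank correlation."""
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
true_effect = rng.normal(0, 1, 200)               # hypothetical "true" teacher effects
vam_a = true_effect + rng.normal(0, 0.5, 200)     # estimate from specification A
vam_b = true_effect + rng.normal(0, 0.5, 200)     # estimate from specification B

rho, _ = spearmanr(vam_a, vam_b)
print("Spearman rank correlation between the two specifications:", round(rho, 2))
```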

Goldschmidt, P., Choi, K., & Beaudoin, J. P. (2012). Growth model comparison study: Practical implications of alternative models for evaluating school performance. Washington, DC: Council of Chief State School Officers. Retrieved from https://eric.ed.gov/?id=ED542761

From the ERIC abstract: “The Elementary and Secondary Education Act (ESEA) has had several tangible effects on education and the monitoring of education. There have been both intended and unintended consequences. ESEA’s newer generation of federal programs, such as Race to the Top, and the recent ESEA flexibility guidelines, have continued to push development of methods to accurately and fairly monitor school (and more recently teacher) performance. The purpose of this study is to compare several different growth models and examine empirical characteristics of each. This study differs from previous research comparing various models for accountability purposes in that the focus is broader—it is based on large scale assessment results from four states (Delaware, Hawaii, North Carolina, and Wisconsin) across two cohorts of students (each with three consecutive years of assessment results), and explicitly considers model results with respect to elementary and middle schools. This study addresses the following research questions regarding the performance of the different growth models: (1) Overall, does the model matter?; (2) Do different models lead to different inferences about schools?; (3) How accurately do models classify schools into performance categories?; (4) Are models consistent in classifying schools from one year to the next?; (5) How are models influenced by school intake characteristics (percent ELL, FRL, etc.)?; (6) Do models perform similarly for elementary and middle schools?; and (7) Do models behave similarly across states? The results of these analyses confirm that no single model can unequivocally be assumed to provide the best results. This is not possible for two reasons: one, different models address different questions about schools; and two, the empirical results indicate that context matters when examining models. By context the authors mean that the state in which the model will be run affects how the model may work. State affects include several pieces that are confounded. These include tests scales, testing procedures, student characteristics, and school characteristics. An accountability model should not be unduly influenced by factors outside of schools’ control and models clearly differ in this respect. Distinguishing between a school’s ability to facilitate learning and a school’s performance as a function of advantageous (or challenging) student enrollment characteristics is where statistical machinery provides its biggest benefit. Appended are: (1) Data Quality; (2) Data Elements for Each State; and (3) Detailed Results of Overall Model Impact.”

Hoffer, T. B., Hedberg, E. C., Brown, K. L., Halverson, M. L., Reid-Brossard, P., Ho, A. D., & Furgol, K. (2011). Final report on the evaluation of the Growth Model Pilot Project. Washington, DC: U.S. Department of Education, Office of Planning, Evaluation and Policy Development, Policy and Program Studies Service. Retrieved from https://eric.ed.gov/?id=ED515310

From the ERIC abstract: “The U.S. Department of Education (ED) initiated the Growth Model Pilot Project (GMPP) in November 2005 with the goal of approving up to ten states to incorporate growth models in school adequate yearly progress (AYP) determinations under the Elementary and Secondary Education Act (ESEA). After extensive reviews, nine states were fully approved for the initial phase of the pilot project by the 2007-08 school year: Alaska, Arizona, Arkansas, Delaware, Florida, Iowa, North Carolina, Ohio, and Tennessee. Based on analyses of data provided by the U.S. Department of Education and by the pilot grantee states, this report describes the progress these states made in implementing the GMPP in the 2007-08 school year. The growth models implemented under the GMPP were all designed to augment rather than replace the standard status model and safe-harbor provisions for determining school AYP. The growth models resulted in more schools making AYP than would have been the case using only status and safe-harbor. The results of this analysis showed that schools serving economically disadvantaged student populations in all pilot states except for Arkansas were more likely than more-advantaged schools to make AYP by growth. The results of this analysis show that use of growth models generally added to the number of schools making AYP but that the numbers were not large in almost all pilot states. This study has also shown that the types of growth models states select for federal accountability purposes are consequential and raise some potentially difficult theoretical questions for policymakers. Appendices include: (1) Comparison of GMPP Growth Models with State Accountability Systems; (2) State GMPP Model Summaries; (3) Supplemental Exhibits; and (4) Derivation of the Generic Projection Model Rule for Identifying On-Track Students.”

Murphy, D. L., & Gaertner, M. N. (2014). Evaluating the predictive value of growth prediction models. Educational Measurement: Issues and Practice, 33(2), 5–13. Retrieved from https://eric.ed.gov/?id=EJ1031353

From the ERIC abstract: “This study evaluates four growth prediction models—projection, student growth percentile, trajectory, and transition table—commonly used to forecast (and give schools credit for) middle school students’ future proficiency. Analyses focused on vertically scaled summative mathematics assessments, and two performance standards conditions (high rigor and low rigor) were examined. Results suggest that, when ‘status plus growth’ is the accountability metric a state uses to reward or sanction schools, growth prediction models offer value above and beyond status-only accountability systems in most, but not all, circumstances. Predictive growth models offer little value beyond status-only systems if the future target proficiency cut score is rigorous. Conversely, certain models (e.g., projection) provide substantial additional value when the future target cut score is relatively low. In general, growth prediction models’ predictive value is limited by a lack of power to detect students who are truly on-track. Limitations and policy implications are discussed, including the utility of growth projection models in assessment and accountability systems organized around ambitious college-readiness goals.”

Note: REL Midwest was unable to locate a link to the full-text version of this resource. Although REL Midwest tries to provide publicly available resources whenever possible, it was determined that this resource may be of interest to you. It may be found through university or public library systems.
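
The trajectory model evaluated in this study lends itself to a brief worked example. In the hedged sketch below (hypothetical scores and cut score of our own choosing, not values from the article), a student's most recent annual gain is extrapolated to the target grade and compared with a future proficiency cut score.

```python
"""Illustrative sketch of a trajectory-style on-track determination:
extrapolate the latest annual gain and compare with a future cut score.
All numbers are hypothetical."""

def on_track(current_score: float, annual_gain: float,
             years_to_target: int, future_cut: float) -> bool:
    projected = current_score + annual_gain * years_to_target
    return projected >= future_cut

# A grade 6 student scoring 215 who gained 8 points last year, judged against a
# hypothetical grade 8 cut of 235: prints False (projected 231 falls short).
print(on_track(current_score=215, annual_gain=8, years_to_target=2, future_cut=235))
```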

Shanley, L. (2016). Evaluating longitudinal mathematics achievement growth: Modeling and measurement considerations for assessing academic progress. Educational Researcher, 45(6), 347–357. Retrieved from https://eric.ed.gov/?id=EJ1112165

From the ERIC abstract: “Accurately measuring and modeling academic achievement growth is critical to support educational policy and practice. Using a nationally representative longitudinal data set, this study compared various models of mathematics achievement growth on the basis of both practical utility and optimal statistical fit and explored relationships within and between early and later mathematics growth parameters. Common patterns included a summer lag in achievement between kindergarten and Grade 1 and an association between achievement at kindergarten entry and later achievement. Notably, there were no statistically significant relationships between early and later rates of growth, and there was minimal variability in achievement growth in the late elementary and middle school grades. Challenges related to assessing academic achievement in the middle grades and modeling academic skill development are discussed.”

Note: REL Midwest was unable to locate a link to the full-text version of this resource. Although REL Midwest tries to provide publicly available resources whenever possible, it was determined that this resource may be of interest to you. It may be found through university or public library systems.
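
One analysis the abstract describes, relating early growth rates to later growth rates, can be sketched with fabricated longitudinal data. The example below (our own simplification, not the study's models) fits a linear growth rate for each student in the early and later grade spans and reports their correlation.

```python
"""Illustrative sketch: per-student growth rates in early versus later
grades, and their correlation. Data are fabricated."""
import numpy as np

rng = np.random.default_rng(3)
n_students = 200
grades = np.arange(0, 9)                              # grades K (0) through 8

# Fabricated trajectories: larger annual gains in the early grades.
base = rng.normal(160, 10, (n_students, 1))
annual_gain = np.where(grades < 4, 14.0, 5.0)         # hypothetical gains by grade
scores = (base
          + np.cumsum(np.broadcast_to(annual_gain, (n_students, 9)), axis=1)
          + rng.normal(0, 4, (n_students, 9)))

# Per-student linear growth rates in the early (K-3) and later (5-8) spans.
early = np.array([np.polyfit(grades[:4], s[:4], 1)[0] for s in scores])
later = np.array([np.polyfit(grades[5:], s[5:], 1)[0] for s in scores])
print("correlation of early and later growth rates:",
      round(float(np.corrcoef(early, later)[0, 1]), 2))
```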

Shneyderman, A., & Froman, T. (2015). Using student growth to evaluate teachers: A comparison of three methods. Research brief. Volume 1502. Miami, FL: Research Services, Miami-Dade County Public Schools. Retrieved from https://eric.ed.gov/?id=ED570129

From the ERIC abstract: “In accordance with the federal No Child Left Behind (NCLB) law of 2001, 100% of students were expected to become proficient on state assessments of reading and mathematics by the end of 2013-2014 academic year. Schools that consistently failed to meet the NCLB’s Adequate Yearly Progress requirements were subject to penalties. In 2011, the U.S. Department of Education invited each State educational agency (SEA) to request flexibility regarding specific requirements of the NCLB in exchange for ‘rigorous and comprehensive State-developed plans designed to improve educational outcomes for all students, close achievement gaps, increase equity, and improve the quality of instruction.’ In order to receive flexibility from the NCLB Adequate Yearly Progress requirements, states had to develop and implement ‘high-quality teacher and leader evaluation and support systems that are based on multiple measures, including student growth as a significant factor and other measures of professional practice.’ At the time of the publication of this brief, most states received flexibility waivers. Currently, these states are at different stages in the process of implementing their teacher evaluation and support systems. Many of them use Value-Added Models (VAM) similar to those used in Florida, while others use the Student Growth Percentile (SGP) approach. In this brief the authors compare three methods of teacher evaluation: (1) the State system employing value-added models; (2) a district-level procedure using single-level regression; and (3) a common alternative approach utilizing student growth percentiles. All three methods start by constructing predictions of student test performance based on prior achievement data and student characteristics. At this basic building block level, all three approaches produce virtually identical results. As the methodologies diverge in techniques of aggregation, teacher-level and school-level summary indices begin to separate, but remain remarkably comparable.”
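
All three methods compared in the brief begin from the same building block: a predicted score based on prior achievement, with each student's residual treated as growth and then aggregated. The sketch below illustrates that shared first step with fabricated data and ordinary least squares; it is not any district's or state's actual specification.

```python
"""Illustrative sketch of the shared building block the brief describes:
predict current scores from prior scores, then average student residuals
within each (hypothetical) teacher. Fabricated data."""
import numpy as np

rng = np.random.default_rng(4)
n = 600
prior = rng.normal(200, 10, n)
teacher = rng.integers(0, 30, n)                  # 30 hypothetical teachers
current = 0.8 * prior + rng.normal(45, 8, n)

slope, intercept = np.polyfit(prior, current, 1)
residual = current - (intercept + slope * prior)

# Teacher-level summary: mean residual of the teacher's students.
teacher_effect = {t: residual[teacher == t].mean() for t in range(30)}
print("teacher 0 mean residual:", round(teacher_effect[0], 2))
```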

Warkentien, S., & Silver, D. (2016). Alternative methods for estimating achievement trends and school effects: When is simple good enough? Paper presented at the Society for Research on Educational Effectiveness Conference, Washington, DC. Retrieved from https://eric.ed.gov/?id=ED567595

From the ERIC abstract: “Public schools with impressive records of serving lower-performing students are often overlooked because their average test scores, even when students are growing quickly, are lower than scores in schools that serve higher-performing students. Schools may appear to be doing poorly either because baseline achievement is not easily accounted for or because changing demographic trends result in successive cohorts of students that are not comparable. These situations are common and problematic for practitioners, policymakers, and researchers who are increasingly tasked with identifying, replicating and communicating effective educational practice. The purpose of this study is to (1) explore alternatives to value-added models that are simple to calculate and easy to interpret; and (2) describe the conditions that must be met for such alternatives to provide results that are similar to those found with value-added methods. The authors focus on three alternative methods that have been the subject of prior reports (see Glazerman & Potamites 2011; Castellano & Ho 2013): average gain scores, calculated as the difference between a school’s average score in a given grade in one year and that school’s average score in the previous grade the previous year; average cohort differences, calculated as the difference between a school’s average score in a given grade in one year and that school’s average score in the ‘same’ grade the previous year; and residual gain scores, calculated as the difference between the observed and expected average score in a given grade in one year given the previous year’s average score. The study asks the following questions: (1) How similar are results obtained from average gain, cohort difference, and residual gain methods to results obtained using value-added methods? and (2) Under what conditions can results from average gain, cohort difference, or residual gain scores provide an acceptable substitute for value-added results? The authors focus on the relationship between student background characteristics and growth and student attrition (or student composition changes) across years. The first question is addressed through the use of empirical school district data and the second through simulation studies. The goal of this analysis is to provide practical evidence for districts and states that can inform analyses of achievement growth trends.”
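
The three alternatives defined in the abstract are simple enough to compute directly from school-level averages. The sketch below uses fabricated school-by-grade-by-year averages (array layout and variable names are ours) to show each calculation.

```python
"""Illustrative sketch of the three simple alternatives the paper defines,
computed from fabricated school-level average scores."""
import numpy as np

rng = np.random.default_rng(5)
n_schools = 50
# avg[school, grade, year]: mean score for grades 4-5 over two years
# (grade index 0 = grade 4, 1 = grade 5; year index 0 = last year, 1 = this year).
avg = rng.normal(220, 6, (n_schools, 2, 2))

# Average gain: grade 5 this year minus grade 4 last year (same cohort).
average_gain = avg[:, 1, 1] - avg[:, 0, 0]

# Cohort difference: grade 5 this year minus grade 5 last year (different cohorts).
cohort_difference = avg[:, 1, 1] - avg[:, 1, 0]

# Residual gain: grade 5 this year minus its value predicted from grade 4 last year.
slope, intercept = np.polyfit(avg[:, 0, 0], avg[:, 1, 1], 1)
residual_gain = avg[:, 1, 1] - (intercept + slope * avg[:, 0, 0])

print(average_gain[:3].round(1), cohort_difference[:3].round(1), residual_gain[:3].round(1))
```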

Xiang, Y., & Hauser, C. (2010). School conditional growth model: How to make an “apples to apples” comparison possible? Portland, OR: Northwest Evaluation Association. Retrieved from https://eric.ed.gov/?id=ED521961

From the ERIC abstract: “The purpose of this paper is to offer an analytic perspective to policy makers and educational practitioners regarding how to use longitudinal achievement data to evaluate schools. The authors further discuss the potential practical applications of their models for superintendents, researchers, and policy makers. The premise of the study is that the complexity of the school context can be leveraged within longitudinal growth models to account for more variance than the unconditional counterparts of these models. The following research questions were considered: (1) Do school growth rates differ when school characteristics are taken into account in growth modeling?; (2) Do school growth rates differ when school initial status is taken into account in growth modeling?; and (3) How are schools evaluated differently based on the application of an unconditional model and a conditional model? When researchers and policy makers start to recommend the two-dimension matrix of initial score by rate of change and begin to evaluate schools based on it, this study suggests a three-dimension perspective that considers school contextual characteristics based on their initial status and rate of change. The study is also a demonstration of how schools can be evaluated under the context of a school accountability system in one state. This study can be used to provide a reference of school performance when an amount of growth for a particular school is compared to similar schools in a larger context. School districts, states, or educational funding organizations can use the conditional growth rates to evaluate schools based on the three-dimension matrix shown in this report. Appended are the following tables: (1) schools have different initial status; (2) schools have different percentages of FRL students; (3) schools have different percentages of Minority students; and (4) schools have different teacher student ratios.”
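
The paper's central contrast between unconditional and conditional growth can be sketched with a simple adjustment: regress raw school growth rates on school context variables and rank schools on the residual. The example below is our own simplified stand-in with fabricated data, not the authors' longitudinal hierarchical models.

```python
"""Illustrative sketch: unconditional vs. context-adjusted ("conditional")
school growth. Fabricated data; a simplification of the paper's models."""
import numpy as np

rng = np.random.default_rng(6)
n_schools = 100
frl_pct = rng.uniform(0, 100, n_schools)              # hypothetical % FRL students
initial_status = rng.normal(210, 8, n_schools)        # hypothetical starting averages
raw_growth = 10 - 0.03 * frl_pct + rng.normal(0, 1.5, n_schools)

# Adjust growth for school context with ordinary least squares.
X = np.column_stack([np.ones(n_schools), frl_pct, initial_status])
beta, *_ = np.linalg.lstsq(X, raw_growth, rcond=None)
conditional_growth = raw_growth - X @ beta            # residual after adjustment

# Compare rankings: a school's position can shift once context is accounted for.
raw_rank = np.argsort(np.argsort(-raw_growth))
adj_rank = np.argsort(np.argsort(-conditional_growth))
print("rank change for school 0:", int(raw_rank[0] - adj_rank[0]))
```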

Methods

Keywords and Search Strings

The following keywords and search strings were used to search the reference databases and other sources:

  • Academic growth academic proficiency

  • Measuring academic growth

  • “Northwest Evaluation Association” model

  • descriptor:“achievement gains” descriptor:“school districts”

Databases and Search Engines

We searched ERIC for relevant resources. ERIC is a free online library of more than 1.6 million citations of education research sponsored by the Institute of Education Sciences (IES). Additionally, we searched IES and Google Scholar.

Reference Search and Selection Criteria

When searching for and reviewing resources, we considered the following criteria:

  • Date of publication: References and resources published over the last 15 years, from 2002 to the present, were included in the search and review.

  • Search priorities of reference sources: Search priority is given to study reports, briefs, and other documents that are published or reviewed by IES and other federal or federally funded organizations.

  • Methodology: We used the following methodological priorities/considerations in the review and selection of the references: (a) study types—randomized controlled trials, quasi-experiments, surveys, descriptive data analyses, literature reviews, policy briefs, and so forth, generally in this order; (b) target population, samples (e.g., representativeness of the target population, sample size, volunteered or randomly selected), study duration, and so forth; and (c) limitations, generalizability of the findings and conclusions, and so forth.
This memorandum is one in a series of quick-turnaround responses to specific questions posed by educational stakeholders in the Midwest Region (Illinois, Indiana, Iowa, Michigan, Minnesota, Ohio, Wisconsin), which is served by the Regional Educational Laboratory (REL) Midwest at American Institutes for Research. This memorandum was prepared by REL Midwest under a contract with the U.S. Department of Education’s Institute of Education Sciences (IES), Contract ED-IES-17-C-0007, administered by American Institutes for Research. Its content does not necessarily reflect the views or policies of IES or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.