REL Midwest Ask A REL Response

Educator Effectiveness

August 2017


What does the research say about the relationship between teacher accountability ratings and student outcomes (e.g., attendance, academics, social emotional learning)?


Following an established Regional Educational Laboratory (REL) Midwest protocol, we conducted a search for research reports and descriptive studies on the relationship between teacher accountability ratings and student outcomes, including academic and nonacademic student outcomes. For details on the databases and sources, key words and selection criteria used to create this response, please see the methods section at the end of this memo.

Below, we share a sampling of the publicly accessible resources on this topic. The search conducted is not comprehensive; other relevant references and resources may exist. We have not evaluated the quality of references and resources provided in this response, but offer this list to you for your information only.

Research References

Bacher-Hicks, A., Chin, M., Kane, T. J., & Staiger, D. O. (2015). Validating components of teacher effectiveness: A random assignment study of value-added, observation, and survey scores. Paper presented at the Society for Research on Educational Effectiveness Conference, Washington, DC. Retrieved from

From the ERIC abstract: “Policy changes from the past decade have resulted in a growing interest in identifying effective teachers and their characteristics. This study is the third study to use data from a randomized experiment to test the validity of measures of teacher effectiveness. The authors collected effectiveness measures across three school years from three broad areas: value-added, classroom observation, and student surveys. In the first two years, they collected non-experimental estimates of these measures and, in the third year, they designed a randomized experiment to test the validity of these estimates. Using these data, they answer two questions: (1) Does a combination of these three distinct non-experimental measures identify teachers who, on average, produce higher student achievement gains following random assignment?; and (2) Does the magnitude of the gains correspond with what we would have predicted based on their non-experimental estimates of effectiveness? The analysis sample consisted of 66 fourth- and fifth-grade teachers from four large East coast school districts in the 2010-2011 through 2012-2013 school years. To answer the research questions, the authors first constructed the best linear combination of non-experimental student test score, survey, and classroom observation data from the first two years of the study (2010-11 and 2011-12) to predict teachers’ average contribution to student growth on state standardized math tests another year. They used these predicted outcomes as their non-experimental estimates of teachers’ contributions to student test score growth in 2012-13. Then, they examined actual student growth in 2012-13 (the third year of the study in which [the authors] randomly assigned students to teachers) and compared their non-experimental prediction of growth to actual growth.”

Daley, G., & Kim, L. (2010). A teacher evaluation system that works. Santa Monica, CA: National Institute for Excellence in Teaching. Retrieved from

From the ERIC abstract: “Status quo approaches to teacher evaluation have recently come under increasing criticism. They typically assign most teachers the highest available score, provide minimal feedback for improvement, and have little connection with student achievement growth and the quality of instruction that leads to higher student growth. A more comprehensive approach has been demonstrated for ten years by TAP[TM]: The System for Teacher and Student Advancement. This system includes both classroom observations and student achievement growth measures, provides feedback to teachers for improvement, is aligned to professional development and mentoring support, and provides metrics for performance-based compensation. This paper describes the TAP system, and examines data from a large sample of teachers to assess the distribution of TAP evaluations and their alignment to student achievement growth. We find that TAP evaluations provide differentiated feedback, that classroom observational scores are positively and significantly correlated with student achievement growth, that TAP teachers increase in observed skill levels over time, and that TAP schools show differential retention of effective teachers based on these evaluation scores.”

Forman, K., & Markson, C. (2015). Is “effective” the new “ineffective”? A crisis with the New York State teacher evaluation system. Journal for Leadership and Instruction, 14(2), 5–11. Retrieved from

From the ERIC abstract: “The purpose of this study was to examine the relationship among New York State’s APPR teacher evaluation system, poverty, attendance rates, per pupil spending, and academic achievement. The data from this study included reports on 110 school districts, over 30,000 educators and over 60,000 students from Nassau and Suffolk counties posted on the New York State Education Department’s Data website. The results of this study showed that poverty had a strong negative correlation with performance on the New York State English Language Arts (ELA) and Mathematics assessments among students in grades 3-8. As poverty went up, performance on the State assessments went down. Poverty accounted for over 60 percent of the variance on student performance on both State assessments. The school districts’ APPR teacher evaluation ratings had weak to conflicting correlations with student achievement. The school districts’ percent of teachers rated ‘highly effective’ had a positive correlation with student achievement. However, the strength of the relationship was weak, accounting for only 12.53 and 10.76 percent of the variance on student success on the English Language Arts and Mathematics examinations respectively. The school districts’ percent of teachers rated ‘effective’ had a negative correlation with student achievement. As the percent of teachers rated ‘effective’ went up, student performance on the State assessments went down. The implications of this study suggested that legislators, State education departments, and school districts would better serve students by allocating resources toward programs that alleviate the detrimental effects that poverty has on academic achievement.”

Gallagher, H. A. (2002). The relationship between measures of teacher quality and student achievement: The case of Vaughn Elementary. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Retrieved from

From the ERIC abstract: “This paper reports on a study of the relationship between teacher evaluation scores in a school implementing knowledge- and skills-based pay and classroom student achievement. The study occurred in a California charter elementary school that was 100 percent Title I, 100 percent free/reduced lunch, and had predominantly limited English speaking students. The school had historically low achievement, and for 4 years, it had been implementing a performance evaluation and pay plan under which teachers were evaluated, rated, and paid accordingly. For the study, data were collected on 34 teachers and all of their students for whom 2 years of achievement data were available. Researchers estimated classroom effects, analyzed their relationship to teacher evaluation scores, and examined teacher evaluation scores as level 2 explanatory variables in hierarchical linear models of student achievement. Results indicated that there was a clear difference in the strength of the relationship between teacher evaluation scores and classroom achievement in reading compared with mathematics or language arts.”

Gallant, D. J. (2013). Using first-grade teacher ratings to predict third-grade English language arts and mathematics achievement on a high-stakes statewide assessment. International Electronic Journal of Elementary Education, 5(2), 125–142. Retrieved from

From the ERIC abstract: “Early childhood professional organizations support teachers as the best assessors of students’ academic, social, emotional, and physical development. This study investigates the predictive nature of teacher ratings of first-grade students’ performance on a standards-based curriculum-embedded performance assessment within the context of a state accountability system. The sample includes 4292 elementary school students cross-classified by 131 first-grade and 137 third-grade schools attended. This study uses extant statewide assessment data for students located in a state in the southeastern part of the United States. Controlling for student and school demographic variables in cross-classified random effects multilevel models, first-grade teacher ratings—as reflected by domain scores on a performance assessment—are found to positively and significantly correlate with students’ third-grade academic achievement.”

Jiang, J. Y., Sartain, L., Sporte, S. E., & Steinberg, M. P. (2014). The impact of teacher evaluation reform on student learning: Success and challenges in replicating experimental findings with non-experimental data. Paper presented at the Society for Research on Educational Effectiveness, Washington, DC. Retrieved from

From the ERIC abstract: “One of the most persistent and urgent problems facing education policymakers is the provision of highly effective teachers in all of the nation’s classrooms. Of all school-level factors related to student learning and achievement, the student’s teacher is consistently the most important (Goldhaber 2002; Rockoff 2004; Rivkin, Hanushek, and Kain 2005). Even with substantial within-school variation in teacher effectiveness (Rivkin, Hanushek, and Kain 2005; Aaronson, Barrow, and Sander 2007), historically teacher evaluation systems have inadequately differentiated teachers who effectively improve student learning from lower-performing teachers. In Chicago from 2003 to 2006, for example, nearly all teachers (93 percent) received performance evaluation ratings of ‘Superior’ or ‘Excellent’ (based on a four-tiered rating system) while at the same time 66 percent of CPS schools failed to meet state proficiency standards under Illinois’ accountability system (The New Teacher Project 2007). This study seeks to answer the following research questions about two waves of teacher evaluation reform in Chicago, a pilot (Excellence in Teaching Pilot or EITP) focused on rigorous classroom observations (2008-10) and a fully implemented evaluation system (REACH) that incorporates information from classroom observations and student assessment (2012-13 to present): (1) What does experimental evidence say about the effect teacher evaluation can have on school-level performance in mathematics and reading in elementary schools? and (2) What does experimental evidence say about how teacher evaluation can differentially impact schools with different characteristics (for example, are there greater impacts in lower- or higher-achieving schools)?
Findings from the first wave showed: (1) at the end of the first year of implementing EITP, schools improved student achievement in reading; and (2) more advantaged schools (i.e., schools that were high achieving prior to implementation, schools with lower rates of student poverty) tended to benefit the most from EITP. This finding suggests that an intervention such as teacher evaluation requires high levels of capacity in the school building in order to affect student learning.”

Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Bill & Melinda Gates Foundation. Retrieved from

From the executive summary: “The MET project is working with nearly 3,000 teacher-volunteers in public schools across the country to improve teacher evaluation and feedback. MET project researchers are investigating a number of alternative approaches to identifying effective teaching: systematic classroom observations; surveys collecting confidential student feedback; a new assessment of teachers’ pedagogical content knowledge; and different measures of student achievement. …[Key findings include:] Combining observation scores, student feedback, and student achievement gains was better than graduate degrees or years of teaching experience at predicting a teacher’s student achievement gains with another group of students on the state tests. Whether or not teachers had a master’s degree or many years of experience was not nearly as powerful a predictor of a teacher’s student achievement gains on state tests as was a combination of multiple observations, student feedback, and evidence of achievement gains with a different group of students. Combining observation scores, student feedback, and student achievement gains on state tests also was better than graduate degrees or years of teaching experience in identifying teachers whose students performed well on other measures. Compared with master’s degrees and years of experience, the combined measure was better able to indicate which teachers had students with larger gains on a test of conceptual understanding in mathematics and a literacy test requiring short written responses. In addition, the combined measure outperformed master’s and years of teaching experience in indicating which teachers had students who reported higher levels of effort and greater enjoyment in class.”

Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill & Melinda Gates Foundation. Retrieved from

From the ERIC abstract: “To develop, reward, and retain great teachers, school systems first must know how to identify them. The authors designed the Measures of Effective Teaching (MET) project to test replicable methods for identifying effective teachers. In past reports, the authors described three approaches to measuring different aspects of teaching: student surveys, classroom observations, and a teacher’s track record of student achievement gains on state tests. In those analyses, they could only test each measure’s ability to predict student achievement gains non-experimentally, using statistical methods to control for student background differences. For this report, they put the measures to a more definitive and final test. First, they used the data collected during 2009-10 to build a composite measure of teaching effectiveness, combining all three measures to predict a teacher’s impact on another group of students. Then, during 2010-11, they randomly assigned a classroom of students to each teacher and tracked his or her students’ achievement. They compared the predicted student outcomes to the actual differences that emerged by the end of the 2010-11 academic year. Here’s what the authors found: First, the measures of effectiveness from the 2009-10 school year did identify teachers who produced higher average student achievement following random assignment. Second, the magnitude of the achievement gains they generated was consistent with their expectations.”

Kimball, S. M., White, B., Milanowski, A. T., & Borman, G. (2004). Examining the relationship between teacher evaluation and student assessment results in Washoe County. Peabody Journal of Education, 79(4), 54–78. Retrieved from

From the ERIC abstract: “In this article, we describe findings from an analysis of the relationship between scores on a standards-based teacher evaluation system modeled on the Framework for Teaching (Danielson, 1996) and student achievement measures in a large Western school district. We apply multilevel statistical modeling to study the relationship between the evaluation scores and state and district tests of reading, mathematics, and a composite measure of reading and mathematics. Using a value-added framework, the teacher evaluation scores were included at the 2nd level, or teacher level, of the model when other student and teacher-level characteristics were controlled. This study provided some initial evidence of a positive association between teacher performance, as measured by the evaluation system, and student achievement. The coefficients representing the effects of teacher performance on student achievement were positive and were statistically significant in 4 of 9 grade-test combinations studied.”

Note: REL Midwest was unable to locate a link to the full-text version of this resource. Although REL Midwest tries to provide publicly available resources whenever possible, it was determined that this resource may be of interest to you. It may be found through university or public library systems.

Lash, A., Tran, L., & Huang, M. (2016). Examining the validity of ratings from a classroom observation instrument for use in a district’s teacher evaluation system (REL 2016–135). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West. Retrieved from

From the ERIC abstract: “The purpose of this study was to examine the validity of teacher evaluation scores that are derived from an observation tool, adapted from Danielson’s Framework for Teaching, designed to assess 22 teaching components from four teaching domains. The study analyzed principals’ observations of 713 elementary, middle, and high school teachers in Washoe County School District (Reno, NV). The findings support the use of a single, summative score to evaluate teachers, one that is derived by totaling or averaging all 22 ratings. The findings do not support using domain- or component-level scores to evaluate teachers’ skills, because there was little evidence that these scores measure distinct aspects of teaching. The information that the total score provides predicts the learning of teachers’ students. While the relationship is moderate, it is evidence to support interpreting the observation score as an indicator of teachers’ effectiveness in promoting learning.”

Lazarev, V., Newman, D., & Sharp, A. (2014). Properties of the multiple measures in Arizona’s teacher evaluation model (REL 2015–050). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West. Retrieved from

From the ERIC abstract: “This study explored the relationships among the components of the Arizona Department of Education’s new teacher evaluation model, with a particular focus on the extent to which ratings from the state model’s teacher observation instrument differentiated higher and lower performance. The study used teacher-level evaluation data collected by the Arizona Department of Education from five participating pilot LEAs during the 2012/13 school year. The study relied primarily on descriptive statistics calculated from the results of the different component metrics piloted in these LEAs, as well as analysis of the correlations among these components. Results indicated that teachers’ observation item scores tended to concentrate at the Proficient level (the second-highest score on a four-point scale: Unsatisfactory, Basic, Proficient, and Distinguished), with this level accounting for 62 percent of all observational item scores. In addition, while the strength of the correlation between results from observations and the state’s student academic progress metric was generally low, the correlation varied significantly between high- and low-performing teachers, as well as between certain teacher subgroups.”

Milanowski, A. (2004). The relationship between teacher performance evaluation scores and student achievement: Evidence from Cincinnati. Peabody Journal of Education, 79(4), 33–53. Retrieved from

From the ERIC abstract: “In this article, I present the results of an analysis of the relationship between teacher evaluation scores and student achievement on district and state tests in reading, mathematics, and science in a large Midwestern U.S. school district. Within a value-added framework, I correlated the difference between predicted and actual student achievement in science, mathematics, and reading for students in Grades 3 through 8 with teacher evaluation ratings. Small to moderate positive correlations were found for most grades in each subject tested. When these correlations were combined across grades within subjects, the average correlations were .27 for science, .32 for reading, and .43 for mathematics. These results show that scores from a rigorous teacher evaluation system can be substantially related to student achievement and provide criterion-related validity evidence for the use of the performance evaluation scores as the basis for a performance-based pay system or other decisions with consequences for teachers.”

Taylor, E. S., & Tyler, J. H. (2011). The effect of evaluation on performance: Evidence from longitudinal student achievement data of mid-career teachers (NBER Working Paper No. 16877). Cambridge, MA: National Bureau of Economic Research. Retrieved from

From the ERIC abstract: “The effect of evaluation on employee performance is traditionally studied in the context of the principal-agent problem. Evaluation can, however, also be characterized as an investment in the evaluated employee’s human capital. We study a sample of mid-career public school teachers where we can consider these two types of evaluation effect separately. Employee evaluation is a particularly salient topic in public schools where teacher effectiveness varies substantially and where teacher evaluation itself is increasingly a focus of public policy proposals. We find evidence that a quality classroom-observation-based evaluation and performance measures can improve mid-career teacher performance both during the period of evaluation, consistent with the traditional predictions; and in subsequent years, consistent with human capital investment. However the estimated improvements during evaluation are less precise. Additionally, the effect sizes represent a substantial gain in welfare given the program’s costs.”

Walsh, E., & Lipscomb, S. (2013). Classroom observations from Phase 2 of the Pennsylvania teacher evaluation pilot: Assessing internal consistency, score variation, and relationships with value added (Final Report). Princeton, NJ: Mathematica Policy Research. Retrieved from

From the ERIC abstract: “This report presents findings from Phase 2 of a three-year teacher evaluation pilot conducted by the Pennsylvania Department of Education. Principals evaluated the teaching practices of teachers using The Framework for Teaching, a rubric that includes 22 components grouped into four broad teaching practice domains: (1) planning and preparation, (2) classroom environment, (3) instruction, and (4) professional responsibilities. Although principals did not typically use all 22 components, the report’s findings suggest the fairness of overall scores might not be compromised substantially by principals using different sets of components. Also, across nearly all components, teachers with higher scores on the rubric tended to make larger contributions to student achievement than did teachers with lower scores, as measured by value added. The report’s findings suggest that the rubric measures aspects of teachers’ practices related to growth in student achievement on standardized assessments.”

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (NBER Working Paper No. 17699). Cambridge, MA: National Bureau of Economic Research. Retrieved from

From the ERIC abstract: “This study examined whether being taught by a teacher with a high ‘value-added’ improves a student’s long-term outcomes. The study analyzed more than 20 years of data for nearly one million fourth- through eighth-grade students in a large urban school district. The study reported that having a teacher with a higher level of value-added was associated with higher test scores, lower rates of teen pregnancy, higher probability of college attendance and college quality, higher earnings growth in their 20s, higher rates of saving for retirement, and higher neighborhood quality. The study is not a randomized controlled trial and, therefore, cannot receive the highest rating of meets What Works Clearinghouse (WWC) evidence standards. It used a quasi-experimental design, but did not clearly establish that students with and without high value-added teachers were similar before exposure to the teachers. Once the WWC conducts a more thorough review (forthcoming), it will be able to determine whether the study meets WWC evidence standards with reservations.”

Additional Organizations to Consult

Measures of Effective Teaching (MET) Project –

From the website: “The MET project is a research partnership of academics, teachers, and education organizations committed to investigating better ways to identify and develop effective teaching. Funding is provided by the Bill & Melinda Gates Foundation. The approximately 3,000 MET project teachers who volunteered to open up their classrooms for this work are from the following districts: The Charlotte-Mecklenburg Schools, the Dallas Independent Schools, the Denver Public Schools, the Hillsborough County Public Schools, the New York City Schools, the Memphis Public Schools, and the Pittsburgh Public Schools. Participating teachers and students were enrolled in math and English language arts (ELA) in grades 4 through 8, algebra I at the high school level, biology (or its equivalent) at the high school level, and English in grade 9. Partners include representatives of the following institutions and organizations: American Institutes for Research, Cambridge Education, University of Chicago, The Danielson Group, Dartmouth University, Educational Testing Service, Empirical Education, Harvard University, National Board for Professional Teaching Standards, National Math and Science Initiative, New Teacher Center, University of Michigan, RAND, Rutgers University, University of Southern California, Stanford University, Teachscape, University of Texas, University of Virginia, University of Washington, and Westat.”

Center on Great Teachers and Leaders at American Institutes for Research –

From the website: “The Center on Great Teachers and Leaders (GTL Center) is dedicated to supporting state education leaders in their efforts to grow, respect, and retain great teachers and leaders for all students. The GTL Center continues the work of the National Comprehensive Center for Teacher Quality (TQ Center) and expands its focus to provide technical assistance and online resources designed to build systems that:

  • Support the implementation of college and career standards.
  • Ensure the equitable access of effective teachers and leaders.
  • Recruit, retain, reward, and support effective educators.
  • Develop coherent human capital management systems.
  • Create safe academic environments that increase student learning through positive behavior management and appropriate discipline.
  • Use data to guide professional development and improve instruction.”

National Council on Teacher Quality –

From the website: “The National Council on Teacher Quality is led by this vision: every child deserves effective teachers and every teacher deserves the opportunity to become effective.

For far too many children and teachers, this vision is not the reality. That's because all too often the policies and practices of those institutions with the most authority and influence over teachers and schools—be they state governments, teacher preparation programs, school districts, or teachers unions—fall short. NCTQ focuses on the changes these institutions must make to return the teaching profession to strong health, delivering to every child the education needed to ensure a bright and successful future.”


Keywords and Search Strings

The following keywords and search strings were used to search the reference databases and other sources:

  • Teacher accountability

  • Teacher evaluation

  • Teacher (accountability OR evaluation OR quality)

  • Classroom observations

  • Value-added

  • Student outcomes

  • Academic achievement

  • Student (outcomes OR achievement)

  • Social emotional outcomes

Databases and Search Engines

We searched ERIC for relevant resources. ERIC is a free online library of more than 1.6 million citations of education research sponsored by the Institute of Education Sciences (IES).

Reference Search and Selection Criteria

When we were searching and reviewing resources, we considered the following criteria:

  • Date of the publication: References and resources published over the last 15 years, from 2002 to present, were included in the search and review.

  • Search priorities of reference sources: Search priority is given to study reports, briefs, and other documents that are published or reviewed by IES and other federal or federally funded organizations.

  • Methodology: We used the following methodological priorities/considerations in the review and selection of the references: (a) study types: randomized controlled trials, quasi-experiments, surveys, descriptive data analyses, literature reviews, policy briefs, and so forth, generally in this order; (b) target population, samples (e.g., representativeness of the target population, sample size, volunteered or randomly selected), study duration, and so forth; and (c) limitations, generalizability of the findings and conclusions, and so forth.

This memorandum is one in a series of quick-turnaround responses to specific questions posed by educational stakeholders in the Midwest Region (Illinois, Indiana, Iowa, Michigan, Minnesota, Ohio, Wisconsin), which is served by the Regional Educational Laboratory (REL Midwest) at American Institutes for Research. This memorandum was prepared by REL Midwest under a contract with the U.S. Department of Education’s Institute of Education Sciences (IES), Contract ED-IES-17-C-0007, administered by American Institutes for Research. Its content does not necessarily reflect the views or policies of IES or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.