The content, predictive power, and potential bias in five widely used teacher observation instruments
School districts and states across the Regional Educational Laboratory Mid-Atlantic Region and the country as a whole have been modifying their teacher evaluation systems to identify more and less effective teachers and to provide better feedback for improving instructional practice. The new systems typically include components related to student achievement growth and instruments for observing and rating instructional practice. Many districts and states are considering adopting commercially available instruments for the instructional practice component of their evaluation systems. Yet little data are available to help them choose among available instruments or determine which dimensions of instructional practice merit the greatest emphasis. Most existing data comparing observation instruments, including their statistical characteristics and their relationship to student achievement, come from the Bill & Melinda Gates Foundation's Measures of Effective Teaching project (Kane & Staiger, 2012).

This study examined data from the Measures of Effective Teaching project to address three research questions that might inform district and state decisions about selecting and using five widely used teacher observation instruments: the Classroom Assessment Scoring System, the Framework for Teaching, the Protocol for Language Arts Teaching Observations, the Mathematical Quality of Instruction, and the UTeach Observational Protocol. Specifically, the research questions focused on the major differences and similarities in the dimensions of instructional practice rated by the five instruments, whether some dimensions consistently show stronger correlations with teachers' value-added scores across instruments, and the extent to which the characteristics of students in the classroom affect instrument scores.
Key findings include: (1) Eight of ten dimensions of instructional practice are common across all five examined teacher observation instruments, demonstrating that large parts of the various instruments are conceptually consistent; (2) All seven of the dimensions of instructional practice with quantitative data are modestly but significantly related to teachers' value-added scores; (3) The classroom management dimension is most consistently and strongly related to teachers' value-added scores across instruments, subjects, and grades; and (4) The characteristics of students in the classroom affect teacher observation scores for some instruments and subjects. Observation scores for English language arts classes may be more susceptible to classroom composition effects. For two of the three instruments (Framework for Teaching and Classroom Assessment Scoring System) used to score English language arts instruction, teachers with a larger percentage of racial/ethnic minority students in their classroom tend to receive lower observation scores; a similar effect was observed with the Framework for Teaching for teachers with lower-achieving students. There was no evidence that the composition of students in the classroom affects scores for the Protocol for Language Arts Teaching Observations (the third instrument used to score English language arts instruction), and there was little indication that student characteristics affect observation scores in math classes. The following are appended: (1) Detailed study methodology; (2) Imputation methodology for value-added model estimation; and (3) Supplementary results.
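The classroom-composition analysis behind finding (4) amounts to regressing teachers' observation scores on characteristics of the students they teach: a significant negative coefficient on, say, the percentage of racial/ethnic minority students in the classroom would signal a potential composition effect on scores. The following is a minimal illustrative sketch of that kind of check using synthetic data and ordinary least squares; the study's actual models, variables, and estimation details are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 500

# Synthetic classroom characteristics: share of racial/ethnic minority
# students and mean prior achievement (standardized).
pct_minority = rng.uniform(0, 1, n_teachers)
prior_achievement = rng.normal(0, 1, n_teachers)

# Simulate observation scores with a built-in composition effect:
# teachers with more minority students receive lower scores on average.
true_bias = -0.5
scores = (3.0 + true_bias * pct_minority
          + 0.2 * prior_achievement
          + rng.normal(0, 0.3, n_teachers))

# OLS: regress observation scores on classroom composition.
X = np.column_stack([np.ones(n_teachers), pct_minority, prior_achievement])
beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
intercept, b_minority, b_prior = beta

print(f"coefficient on pct_minority: {b_minority:.2f}")
```

A coefficient near the simulated value of -0.5 indicates the regression recovers the built-in composition effect; with real data, an analyst would also examine standard errors before interpreting such a coefficient as evidence of bias.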
ERIC Descriptors: Classroom Observation Techniques, Content Analysis, Correlation, Educational Practices, Evaluation Methods, Instructional Effectiveness, Instructional Program Divisions, Intellectual Disciplines, Language Arts, Lesson Observation Criteria, Observation, Predictive Validity, Qualitative Research, Regression (Statistics), School Districts, Scoring Rubrics, Statistical Analysis, Statistical Data, Statistical Significance, Student Characteristics, Teacher Effectiveness, Teacher Evaluation, Test Bias, Test Content, Test Reliability, Test Validity, Value Added Models
Mid-Atlantic | Publication Type: Descriptive Study | Publication Date: November 2016