IES Grant

Title: Validating an Observation Protocol for the Evaluation of Special Educators
Center: NCSER Year: 2015
Principal Investigator: Jones, Nathan Awardee: Boston University
Program: Educators and School-Based Service Providers
Award Period: 4 years (7/1/2015-6/30/2019) Award Amount: $1,600,000
Type: Measurement Award Number: R324A150231

Co-Principal Investigators: Mary Brownell (University of Florida) and Courtney Bell (Educational Testing Service)

Purpose: This project is designed to validate Charlotte Danielson's Framework for Teaching (FFT), a widely used observation scheme, for use in the evaluation of special education teachers. Research suggests that teachers play an important role in improving student achievement; as a result, recent research has focused on how to conduct teacher observations and measure instructional quality. Most of this research (including recent validity and reliability studies of the FFT) has been conducted with general education teachers, not special educators. Thus, there is a lack of high-quality, research-based tools to assess the teaching quality of special educators. This study will provide insights into the strengths and limitations of using the FFT with special educators and a greater understanding of the characteristics, processes, and outcomes that are associated with high-quality special education classroom contexts.

Project Activities: Researchers will investigate whether judgments of special educator teaching quality can be made on the basis of FFT scores. In Year 1, research efforts will focus primarily on data collection via 320 videotaped classroom observations of 80 teachers in Grades 3-5 (four observations per teacher). Year 2 will focus on training special education experts to score the classroom observations. Raters will be trained to use the FFT and the Reading Instruction in Special Education (RISE) instruments and will begin scoring videotaped lessons. In Year 3, raters will score all 320 lessons using the FFT and RISE. The end of Year 3 and all of Year 4 will focus on data analysis addressing validity and reliability, and on dissemination activities. Dissemination will include updates to state personnel in charge of coordinating the observation system, as well as publications and presentations.

Products: The products of this project will include evidence of validity and reliability for FFT use with special educators, annual written reports to the state and participating districts, a summative report to the participating state in the final year, and scoring support resources (e.g., a set of questions to encourage rater reflection, brief video clips of exemplary practice) that will lead to more accurate and valid scores in special education settings. Other products will include conference presentations, scholarly journal publications, and distribution of results through the Collaboration for Effective Educator Development, Accountability, and Reform (CEEDAR) Center.

Setting: The research will take place in elementary schools in Rhode Island. Rhode Island collects statewide assessment data on all of its students and establishes student growth percentiles for teachers. Rhode Island incorporates student learning objectives (SLOs) into its evaluation system as a way to measure student learning when summative assessment test scores are not available.

Sample: Eighty special educators teaching third- through fifth-grade students with high-incidence disabilities will participate in this study. These teachers will provide instruction in self-contained, resource, and co-taught classrooms. Observations will be matched as much as possible to the teachers' distribution of instruction.

Assessment: The assessment being examined is the Danielson Framework for Teaching (FFT) evaluation instrument, which includes knowledge and skills required for teachers' classroom practices and a rubric for scoring and assessing teaching performance across all grade levels and content areas. It contains 22 component criteria in four domains of teacher responsibilities: planning and preparation, classroom environment, instruction, and professional responsibilities. Many states, including Rhode Island, use modified versions of FFT, so this project will focus on validating the instrument's use in instruction and other professional responsibilities.

Research Design and Methods: Researchers will use Michael Kane's argument-based validation model with four related sets of inferences: 1) scoring (to what degree do the scores support their intended purpose of assessing teacher quality?), 2) generalization (is the sample of observations collected sufficiently representative of the pool of teachers' lessons?), 3) extrapolation (do FFT scores converge with a broader conception of teaching quality?), and 4) implication (do FFT teacher quality scores support decisions made based on those scores?). Assumptions behind the first three inferences will be tested empirically. Special educators' classrooms will be videotaped and scored using the FFT at four different time points throughout the school year. Observations will capture reading and math instruction to control for subject-specific effects. FFT scoring and data analysis will be conducted by researchers at Educational Testing Service. Four raters will score each videotaped lesson, and raters will be calibrated on a weekly basis through comparison of scores with a master rater. Scores from the videotaped lessons will be used to address the scoring and generalization inferences. The extrapolation inference will be assessed using comparisons to RISE scores, expert rankings, and student growth scores. The implication inference will be based on findings and explored through the disseminated products.
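The weekly calibration step can be illustrated with a minimal sketch. The abstract does not say how raters' scores are compared with the master rater's, so the exact and adjacent agreement rates below are an assumption, as are the function name, the 1-4 rubric scale, and the example scores, all of which are hypothetical.

```python
def agreement_with_master(rater_scores, master_scores):
    """Return (exact, adjacent) agreement rates against a master rater.

    exact:    share of components scored identically
    adjacent: share of components scored within one scale point
              (exact matches count as adjacent too)
    Both statistics are assumed here; the project may use others.
    """
    if len(rater_scores) != len(master_scores) or not rater_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_scores)
    pairs = list(zip(rater_scores, master_scores))
    exact = sum(r == m for r, m in pairs) / n
    adjacent = sum(abs(r - m) <= 1 for r, m in pairs) / n
    return exact, adjacent


# Hypothetical scores on an assumed 1-4 rubric for eight components
# of a single calibration video.
master = [3, 2, 3, 4, 2, 3, 1, 3]
rater = [3, 2, 2, 4, 3, 3, 3, 3]

exact, adjacent = agreement_with_master(rater, master)
print(f"exact agreement: {exact:.3f}, adjacent agreement: {adjacent:.3f}")
```

A rater whose agreement falls below some threshold chosen by the project could then be flagged for retraining; no such threshold is stated in the abstract.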

Control Condition: Due to the nature of the research design, there is no control condition.

Key Measures: The FFT, in its current form, will be used to observe special education teachers. A subsample of the special education teachers who teach reading will also be observed using the RISE observation tool. Student achievement scores will be taken from the New England Common Assessment Program, a state standardized test used for school accountability purposes. Student growth percentiles will be included as determined by the Rhode Island Growth Model. A proximal measure of student achievement will be collected using results of student learning objectives, in which educators are rated by administrators on a scale of 1 (minimal/no attainment) to 5 (exceptional attainment) according to student performance on a set of goals grounded in state standards. Administrative data will be collected on teachers (e.g., type of license held, scores on licensure tests, highest degree, number of years teaching, and demographics) and students (e.g., school information, teachers, special education status, and demographics).

Data Analytic Strategy: Distributions of FFT scores will be examined to determine whether any are skewed or otherwise non-normal. Correlations will be computed to assess relationships within and across domains. Inter-rater reliability will be measured, and individual raters' patterns of scoring will be examined for potential bias. A confirmatory factor analysis will be conducted, with model fit assessed using common goodness-of-fit statistics along with the root mean square error of approximation. Generalizability studies will be conducted to investigate sources of variation in ratings for each teacher. The generalizability studies will also provide an assessment of sources of divergent evidence, that is, the extent to which FFT scores are related to divergent factors such as subject matter or service delivery model. Correlations will be calculated to compare FFT and RISE scores. Spearman's rank correlation and Kendall's tau will be used to compare FFT scores to expert rankings. Pearson correlations will be used to compare FFT scores to student growth percentile scores.
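The three correlation statistics named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's analysis code: the function names are hypothetical, the tau-b variant of Kendall's tau is assumed, and the six teachers' scores, expert ordering, and growth percentiles are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: pair is ignored
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


# Hypothetical data for six teachers (not from the study).
fft_scores = [2.1, 2.8, 3.0, 2.5, 3.4, 1.9]   # mean FFT score per teacher
expert_order = [2, 4, 6, 3, 5, 1]             # higher = judged stronger by experts
growth_percentiles = [35, 52, 60, 48, 71, 30]

print("Spearman rho:", round(spearman(fft_scores, expert_order), 3))
print("Kendall tau-b:", round(kendall_tau_b(fft_scores, expert_order), 3))
print("Pearson r:", round(pearson(fft_scores, growth_percentiles), 3))
```

If expert rankings are recorded with 1 as the strongest teacher, the ranks would need to be reversed first so that agreement between FFT scores and expert judgment appears as a positive correlation.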