J.R. Lockwood & Daniel F. McCaffrey
Analyses of longitudinal student achievement data that attempt to isolate teacher contributions to student learning have become more common as the requisite longitudinal data have become available through increased standardized testing. Longitudinal data can also provide an estimate of how each student would perform regardless of specific teacher assignments. This “general achievement” level is often referred to as a student effect in the mixed model formulation for longitudinal data.
In many applications the student effects are of little interest and serve only to distinguish teacher effects from student inputs to learning. In this study, however, we use these effects to explore two enhancements to the mixed model formulation of value-added models of teacher effects. First, we expand the model to allow teacher effects to depend on each student's general level of achievement, so that a teacher may be estimated to have a larger effect on high-achieving students than on low-achieving students, or vice versa. Second, we use the student effects to relax the assumptions about the nature of missing data by allowing the number of observed scores to depend on the student's random effect. These enhanced models allow us to explore how two assumptions made by existing mixed model formulations - namely, that a single average teacher effect across all students is sufficient for inference and that missing test score data are missing at random - might lead to systematic errors in estimated teacher effects.
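The first enhancement can be written schematically as a random-slope extension of the standard mixed model. This is an illustrative formulation only; the symbols and the particular random-slope form of the interaction are our notation for exposition, not the fitted model's exact specification:

```latex
% Score of student i in year t, taught by teacher j(i,t):
%   \mu_t                  : year/grade mean
%   \delta_i               : student random effect (general achievement)
%   \theta_{j(i,t)}        : average teacher effect
%   \phi_{j(i,t)}\delta_i  : teacher-by-student interaction, letting a
%                            teacher's effect vary with student achievement
y_{it} = \mu_t + \delta_i + \theta_{j(i,t)} + \phi_{j(i,t)}\,\delta_i + \varepsilon_{it}
```

Setting all \(\phi_j = 0\) recovers a standard formulation in which each teacher has a single average effect across all students.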
Our preliminary findings, based on five years of longitudinal data from one cohort of students in a large urban school district, suggest that neither assumption is likely to lead to appreciable bias in estimated teacher effects for this example. First, using a combination of exploratory procedures and model-based inference, we find that the data provide evidence of modestly sized teacher-by-student interactions. The model estimates that variability in effectiveness within individual teachers due to teacher-by-student interactions may account for about 9% of the total variance of teacher effects across all students.
However, interactions of this magnitude would have little impact on a teacher's average effect on potential classrooms, because the variation in average achievement across classrooms is small relative to the variability among scores from individual students. Given the distribution of classroom means in our data and the distribution of estimated interaction terms, we find that the effects of a given teacher on two potential classrooms at nearly opposite ends of the observed distribution of classroom means are very highly correlated (0.98 or higher, depending on grade level). Thus, while our results suggest the presence of teacher-by-student interactions that warrant further investigation, they also suggest that failing to account for these interactions when comparing teachers with differing class compositions is unlikely to have a notable effect on the general ordering of estimated teacher effects.
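Under a random-slope representation of the interaction, the correlation between a teacher's average effects on two classrooms follows directly from the variance components. A minimal sketch: the 9% interaction share comes from the text, but the variance scale and the classroom means of ±0.3 student-level standard deviations are hypothetical values chosen for illustration:

```python
import math

def classroom_effect_corr(var_theta, var_phi, m1, m2):
    """Correlation between a teacher's average effects theta + phi*m on two
    classrooms with mean standardized achievement m1 and m2, assuming an
    independent random intercept theta and random slope phi."""
    cov = var_theta + var_phi * m1 * m2
    v1 = var_theta + var_phi * m1 ** 2
    v2 = var_theta + var_phi * m2 ** 2
    return cov / math.sqrt(v1 * v2)

# Interaction accounts for ~9% of the total per-student teacher effect
# variance; the two classrooms sit near opposite ends of the (assumed)
# distribution of classroom means at +/- 0.3 student-level SDs.
r = classroom_effect_corr(var_theta=0.91, var_phi=0.09, m1=0.3, m2=-0.3)
print(round(r, 3))  # -> 0.982, close to the 0.98 reported in the text
```

Because the classroom means enter only through the small slope-variance term, the correlation stays near 1 unless classrooms differ far more in mean achievement than individual students do in scores.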
We found similar robustness of estimated teacher effects in our enhanced models for missing test score data. The selection model, which allows the number of observed test scores for a student to depend on that student's latent general level of achievement, provided teacher effect estimates that were nearly perfectly correlated with those from the standard model. To check that this robustness was not limited to our chosen selection model, we also explored a pattern mixture model that allowed the means and covariances of test scores to depend on the student's pattern of observed scores. Again, estimated teacher effects from the pattern mixture model were nearly perfectly correlated with those from the standard model. Analytical work suggests that this robustness may result from the mixed model estimating equations downweighting scores from students with incomplete testing information relative to those from students with complete data; scores from students with incomplete data are downweighted because they provide less information about teacher effects. While other missing data models may not necessarily lead to the same robustness, our findings add to a growing body of evidence that missing test score data are not likely to be a major source of bias in estimated teacher effects, provided sufficiently rich statistical models are used for estimation.
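The downweighting can be illustrated with a toy random-intercept calculation: the precision of a student's mean score, and hence the weight a GLS-style estimating equation gives that student, grows with the number of observed scores. The variance values below are assumed for illustration, not estimates from the study:

```python
def student_weight(n_scores, var_student=0.5, var_resid=0.5):
    """GLS-style weight of a student's mean score under a random-intercept
    model: the inverse of Var(mean) = var_student + var_resid / n_scores."""
    return 1.0 / (var_student + var_resid / n_scores)

# Relative weight of students with 1..5 observed scores, normalized so a
# student with all 5 scores has weight 1.
full = student_weight(5)
for k in range(1, 6):
    print(k, round(student_weight(k) / full, 2))  # 0.6, 0.8, 0.9, 0.96, 1.0
```

A student observed only once carries about 60% of the weight of a fully observed student under these assumed variances, so the scores that a nonignorable missingness mechanism could distort contribute less to the estimated teacher effects in the first place.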
While our findings to date are encouraging, they should be interpreted cautiously. We need to replicate our study with data from additional locations. The results would also be strengthened by extending the models to include classroom contextual information in the interaction model and by considering alternative selection models for missing data. Our future work will also include developing diagnostic measures to identify conditions under which assumptions about missing data have the potential to bias results.