Impacts of Comprehensive Teacher Induction: Results From the First Year of a Randomized Controlled Study

NCEE 2009-4034
October 2008

Methods and Data

We used a model-based approach to estimate program impacts. The statistical model explicitly accounts for the hierarchical structure of the data (for example, the nesting of teachers within schools), an approach sometimes referred to as hierarchical linear modeling (HLM). This allowed us to properly specify the units of analysis (teachers and schools) and to obtain unbiased estimates of the standard errors used in hypothesis tests. The model also allows us to control for the effects of a range of teacher and school characteristics on the outcomes of interest, which increases the precision of the estimated treatment effects. The set of benchmark control variables (covariates), which differs for each outcome, is described in the discussion of key study findings.
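
As an illustration only, a two-level model of this general kind can be written as follows. The notation is an assumption introduced here, not the study's exact specification: i indexes teachers, j indexes schools, T_j is a treatment indicator assumed to vary at the school level, x_ij and z_j are teacher and school covariates, and u_j is a school-level random effect.

```latex
% Illustrative two-level (teachers within schools) impact model.
% All symbols are assumptions made for this sketch.
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 T_j
            + \mathbf{x}_{ij}'\boldsymbol{\gamma}
            + \mathbf{z}_{j}'\boldsymbol{\delta}
            + u_j + \varepsilon_{ij}, \\
  u_j &\sim N(0, \tau^2), \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2).
\end{align}
```

In this sketch, \beta_1 is the estimated program impact. The school-level term u_j captures the correlation among teachers in the same school, which is what keeps the standard error of \beta_1 from being understated.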

To test the robustness of the study findings, we conducted several sensitivity tests. These tests included re-estimation of the study’s main impacts with different sets of covariates and sample weights and different statistical model assumptions. We also reported whether the findings would change if we were to use post-hoc adjustments for multiple comparison errors. Multiple comparison errors are those that arise when researchers report on a large number of hypothesis tests, at least some of which may result in falsely rejecting the null hypothesis. Specifically, we applied a method developed by Benjamini and Hochberg (1995) for reducing the rate of false discoveries.
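
The Benjamini and Hochberg (1995) step-up procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own code: the function name, the false discovery rate level `q = 0.05`, and the p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True where the null hypothesis
    is rejected while controlling the false discovery rate at level q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from eight hypothesis tests (not study results).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(p_vals, q=0.05)
print(decisions)
```

With these made-up inputs, five tests fall below an unadjusted 0.05 threshold, but only the two smallest p-values remain significant after the adjustment.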

Findings are pooled across ETS and NTC districts throughout this report because the study was intended to explore the effects of comprehensive teacher induction in general, not the specific impacts of any one program. However, we conducted separate analyses by district type (ETS or NTC) to ensure that the findings were not peculiar to one of the providers.

Data for the study were collected from a variety of sources. We administered a baseline teacher survey in fall 2005, at which time we also requested teachers’ permission to obtain their college entrance examination scores (SAT or ACT). The baseline survey asked teachers about their formal education, professional training, current teaching assignment, and personal background. We surveyed teachers twice during the 2005-2006 school year on the induction activities in which they participated, including questions about duration and intensity of mentoring and professional development as well as questions about satisfaction with and preparedness for different aspects of their current teaching position. We surveyed mentors participating in the comprehensive induction programs on their background characteristics and reviewed program documents from ETS and NTC. Additional detail on these measures is included in the discussion of findings below.

For the study’s core outcomes, we observed the teachers teaching a literacy unit in the classroom in the spring of 2006, collected districts’ student records data at the end of the 2005-2006 school year, and conducted the first of three mobility surveys in fall 2006 to learn about teacher retention. We achieved response rates of over 85 percent on the teacher surveys and observations, although the rates for the control group (for example, 92 percent on the background survey) were not as high as those for the treatment group (97 percent on the same survey). We used nonresponse adjustment weights and sensitivity analyses to address the differential response rates.
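
One common way to build nonresponse adjustment weights is a weighting-class adjustment, in which each respondent is weighted by the inverse of the response rate in his or her group. The sketch below assumes that approach with the treatment and control groups as the only cells. The study does not describe its weighting cells here, and the sample sizes are hypothetical, chosen only to reproduce the 97 and 92 percent response rates quoted above.

```python
def nonresponse_weights(responded_by_group):
    """Map each group to the weight applied to its respondents.

    responded_by_group: dict of group name -> list of booleans,
    one per sampled teacher (True means the teacher responded).
    The weight is (number sampled) / (number responding), that is,
    the inverse of the group's response rate.
    """
    weights = {}
    for group, flags in responded_by_group.items():
        n_sampled = len(flags)
        n_responded = sum(flags)
        if n_responded == 0:
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = n_sampled / n_responded
    return weights


# Hypothetical samples of 100 teachers per group.
sample = {
    "treatment": [True] * 97 + [False] * 3,  # 97 percent response
    "control": [True] * 92 + [False] * 8,    # 92 percent response
}
weights = nonresponse_weights(sample)
print(weights)
```

Under this assumption, control-group respondents receive a slightly larger weight than treatment-group respondents, so that each group's weighted respondents again represent its full sample.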

The instrument used to conduct the observations was the Vermont Classroom Observation Tool (VCOT). The VCOT measures the teacher practices that current research suggests are essential to good teaching or that have been linked to student achievement growth (Cawelti 2004). The VCOT also measures instructional practices that closely reflect those recognized by both the ETS and NTC induction programs, particularly for literacy instruction. We observed eligible study teachers once while they were teaching a literacy unit. The observations lasted between one and two hours, with duration dependent on how the district or school structured its class periods. Observers scored teachers on each of three constructs based on a set of items that are believed to be indicators of good practice: implementation of a lesson, content of a lesson, and classroom culture. Implementation was measured with five items that focused on the effectiveness of instruction and learning that occurred during the lesson. Content was measured with four items that assessed the accuracy, importance, level of abstraction, and connections to other concepts. Classroom culture was measured with seven items that assessed the learning environment, the level of student engagement, the nature of working relationships, and issues of student equity (Saginor and Hyjek 2005). Observers rated the extent of evidence of teacher behavior for each item on a five-point scale showing (1) no evidence, (2) limited evidence, (3) moderate evidence, (4) consistent evidence, or (5) extensive evidence.
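
The scoring structure can be illustrated with a short Python sketch. The item counts and the five-point scale come from the description above. Averaging the items within each construct is an assumption made here for illustration, since this section does not state how item ratings were combined, and the ratings themselves are invented.

```python
# Number of items per construct, as described in the text.
VCOT_ITEM_COUNTS = {"implementation": 5, "content": 4, "culture": 7}
VALID_RATINGS = (1, 2, 3, 4, 5)  # 1 = no evidence ... 5 = extensive evidence


def construct_scores(ratings):
    """Average the item ratings within each construct.

    ratings: dict of construct name -> list of integer item ratings.
    Averaging is an assumed aggregation rule, not one stated in the source.
    """
    scores = {}
    for construct, n_items in VCOT_ITEM_COUNTS.items():
        items = ratings[construct]
        if len(items) != n_items:
            raise ValueError(f"{construct}: expected {n_items} items")
        if any(r not in VALID_RATINGS for r in items):
            raise ValueError(f"{construct}: ratings must be integers 1-5")
        scores[construct] = sum(items) / n_items
    return scores


# Hypothetical ratings for one observed teacher.
example = {
    "implementation": [3, 4, 3, 2, 3],
    "content": [4, 4, 3, 5],
    "culture": [3, 3, 4, 2, 3, 3, 3],
}
scores = construct_scores(example)
print(scores)
```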

We measured student achievement outcomes using district-administered test score data from spring 2006 (post-test) for students taught by study teachers in the 2005-2006 school year, together with the same students’ linked scores from the prior grade in spring 2005 (pre-test).3 We conducted all treatment-control comparisons within grade and within district to ensure that treatment status was not confounded with properties of the test.



3 One district tested students in the fall, so we used data that tracked growth from fall 2005 to fall 2006.