Impact of Two Professional Development Interventions on Early Reading Instruction and Achievement

NCEE 2008-4030
September 2008

Study Design

The 90 study schools were randomly assigned in spring 2005 so that equal numbers within each district received treatment A (the institutes), treatment B (the institutes plus coaching), and no treatment (the district’s “business as usual” PD). A variety of data were collected from the teachers and students in these schools, primarily in the fall and spring of the implementation year (2005-06) and the fall and spring of the follow-up year (2006-07). Based on these data, several outcome measures were constructed:

  • Teachers’ knowledge about reading instruction. The study team administered a Reading Content and Practices Survey (RCPS) to all treatment and control teachers in fall and spring of the implementation year and the spring of the follow-up year.7 Although the overall knowledge score is the main measure for this outcome, we also computed two subscores—a word-level subscore, measuring teachers’ knowledge of word-level components of reading instruction (phonemic awareness, phonics, and fluency), and a meaning-level subscore, measuring teachers’ knowledge of meaning-level components of reading instruction (vocabulary and reading comprehension). The two subscores were included to permit exploration of possible differences in the impact of the PD on the domains of knowledge it addressed.8 The teacher knowledge measures were standardized based on the control group mean and standard deviation so that impacts can be displayed as effect sizes. The first administration of the RCPS (prior to delivery of the PD) was used as a baseline measure of teacher knowledge.
  • Teachers’ use of research-based instructional practices. Trained observers visited all second grade classrooms in study schools in the fall and spring of the implementation year and in the fall of the follow-up year, tallying activities that occurred during each three-minute interval over a full period of reading instruction. Outcome measures derived from the observations of reading instruction included scores for explicit teaching methods, independent student activity (i.e., guided student practice), and differentiation of instruction to address students’ diverse needs, three areas of teachers’ practice that the PD was intended to affect.9 Again, so that the impacts can be displayed as effect sizes, each classroom instruction measure was standardized based on the control group mean and standard deviation.
  • Students’ reading achievement. Students’ reading achievement was the primary outcome for the study. The key measure was the standardized average reading score, obtained from the district assessments. Because the tests used in the six study districts differed, there was no single consistent test metric. The scaled scores reported by the districts were therefore standardized within each district so that they could be compared across districts.10 Standardizing the achievement scores also makes it possible to interpret the impact estimates as effect sizes. The PD interventions might have no impact on average achievement yet still affect the distribution of achievement. For that reason, a secondary, dichotomous measure was constructed. First, the average reading test score in the 2004–2005 school year (the latest baseline year) for all second grade students in the study schools within each district was chosen as a cut-point. Each student’s implementation-year and follow-up-year test scores were then compared to this cut-point, and the student was categorized in each year as scoring either at or above the cut-point or below it. This metric reflects the percentage of students who performed at or above the mean baseline performance level. The analysis based on this measure focused on the impact of the PD treatment on the proportion of students in the study schools scoring at or above that baseline average.
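The standardization and the dichotomous measure described in the list above can be sketched in a few lines. This is a minimal illustration, not the study's own code: the function names and the sample scores are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def standardize(scores, reference_scores):
    """Express scores in standard-deviation units of a reference group."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)  # assumption: population SD
    return [(s - mu) / sigma for s in scores]


def share_at_or_above(scores, reference_scores):
    """Proportion of scores at or above the reference-group mean."""
    cut_point = mean(reference_scores)
    return sum(s >= cut_point for s in scores) / len(scores)


# Hypothetical scaled scores for one district.
baseline = [400, 420, 440, 460, 480]  # 2004-2005 cohort, mean 440
follow_up = [412, 440, 468, 496]      # a later cohort

z_scores = standardize(follow_up, baseline)
# 3 of the 4 later scores are at or above 440, so the share is 0.75.
share = share_at_or_above(follow_up, baseline)
print(z_scores, share)
```

For the teacher knowledge and classroom practice measures, the same `standardize` function would apply with the control group's scores passed as the reference, which is what lets a treatment-control difference be read as an effect size.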

We also surveyed teachers to gather data on their backgrounds and on the amount and type of PD they participated in during the study years. Study staff obtained information on the implementation of the two interventions from observations of the institutes and from logs, maintained by coaches, that recorded the nature of each coach’s interaction with each teacher.

The basic analytic strategy for assessing the impacts of the PD interventions was to compare outcomes for schools that were randomly assigned within each district to each of the three study conditions. Because the data describe students nested within teachers’ classrooms, which are in turn nested within study schools, three-level multilevel models were used to estimate the impacts of professional development on student reading achievement, and two-level models were used to estimate impacts on the teacher measures. The impact models used the sample of teachers and students present in the study schools during the spring 2006 (implementation year) and spring 2007 (follow-up year) data collection periods. The estimates provide an intent-to-treat analysis of the impact of the interventions because they reflect the PD effects on the targeted (or “intended”) sample, whether or not all the teachers in the treatment schools participated fully in the PD provided.
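One generic way to write a three-level impact model of this kind is shown below. It is a sketch under stated assumptions, not the study's exact specification: the symbols are introduced here for illustration, and any baseline covariates the study may have included are omitted.

```latex
% i = student, j = classroom, k = school (notation assumed for illustration)
\[
  Y_{ijk} = \sum_{d=1}^{6} \gamma_d D_{dk}
          + \beta_A T^{A}_{k} + \beta_B T^{B}_{k}
          + v_k + u_{jk} + e_{ijk}
\]
% D_{dk}  : 1 if school k is in district d (random assignment was within district)
% T^A_k   : 1 if school k was assigned to treatment A, else 0
% T^B_k   : 1 if school k was assigned to treatment B, else 0
% v_k, u_{jk}, e_{ijk} : school, classroom, and student random effects,
%                        assumed independent with mean zero
```

Under this sketch, the coefficients on the two treatment indicators are the intent-to-treat impacts of treatments A and B relative to the control group. Because the outcome is standardized, they can be read as effect sizes.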

A summary of the study sample and design is provided in the following text box.

Study Sample and Design Summary

Participants: Six districts, 90 schools, and 270 second grade teachers participated in the study during the year that the PD interventions were implemented. During the follow-up year (which included only data collection), the number of teachers participating was 250 in the fall and 254 in the spring. Participating districts used one of two commonly used scientifically based reading programs. Schools selected for the study were high-poverty urban or urban fringe public elementary schools in which fewer than half the students were designated as English language learners (ELL). Schools were screened out if they were already receiving Reading First funding (and therefore might already be participating in intensive PD) or if they planned to receive this funding during the first year of the study.

Research Design: Within each district, schools were randomly assigned in equal numbers to treatment A, treatment B, or the control group. Each group therefore consisted of 30 schools, with 88 to 93 teachers during the implementation year and 81 to 85 teachers during the follow-up year. School-level student achievement data for the student cohorts from the two years prior to the study were collected from district records as pretest data, and teachers took a teacher knowledge pretest before participating in any study PD. Outcome data consisted of student achievement scores from spring of the implementation and follow-up years, obtained from district records; teacher knowledge scores from posttests administered in spring of the implementation and follow-up years; and classroom observations conducted during fall and spring of the implementation year and during fall of the follow-up year. These data were collected from all three study groups. Because students were clustered within classrooms and classrooms were clustered within schools, effects for the study were estimated using hierarchical linear models.

Outcomes: The study examined impacts on three sets of outcomes: teachers’ knowledge of reading instruction, based on data from the Reading Content and Practices Survey (RCPS); teachers’ reading instructional practices, based on observations by trained observers; and student reading test scores, collected from district records.


7 The outcomes of the teacher knowledge assessment, like those of other achievement or aptitude tests, are scaled in logits, which represent the log of the odds of answering each test item correctly.
8 The word-level material in the PD curriculum emphasized foundational knowledge underlying “best practices” in phonics and fluency instruction, topics believed to be unfamiliar to most teachers (Moats 2002). The meaning-level material in the curriculum emphasized teaching strategies for building students’ vocabularies and comprehension skills, both of which were built into the lesson structure of the core readers the teachers used.
9 The measures of explicit instruction and independent student activity were scaled in logits, paralleling the scales used for the teacher knowledge outcomes. Logits are commonly used in situations in which the purpose is to measure the proportion of occasions in which an event occurs. Each teacher’s logit score represents the log of the odds of the teacher engaging in explicit instruction or independent student activity during each three-minute observation interval. The differentiated instruction measure was not scaled in logits because the majority of teachers did not engage in differentiated instruction during the classroom observation; logits cannot be calculated for zero occurrences.
10 The standardized scores were calculated by subtracting the second grade student reading test average for the district’s study schools in 2004–2005 from each student’s total reading score and then dividing the difference by the standard deviation for the second grade students in the district’s study schools in 2004–2005.
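The log-odds arithmetic behind the logit scales in footnotes 7 and 9 can be illustrated with a short sketch. The function name and the interval counts are hypothetical, and the study's actual scaling procedure may have been model-based rather than this simple empirical calculation.

```python
import math


def logit(occurrences, intervals):
    """Log of the odds that an event occurs in an observation interval.

    Returns None when the proportion is 0 or 1, where the log-odds is
    undefined (the case footnote 9 notes for differentiated instruction).
    """
    if occurrences <= 0 or occurrences >= intervals:
        return None
    p = occurrences / intervals
    return math.log(p / (1 - p))


# Hypothetical teacher observed over 30 three-minute intervals.
print(logit(15, 30))  # event in half the intervals: odds of 1, logit 0.0
print(logit(0, 30))   # no occurrences: None
```

A logit of zero corresponds to an event occurring in half the intervals. Positive values mean it occurred more often than not, and negative values mean less often.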