Study Design
The 90 study schools were randomly assigned in spring 2005 so that equal numbers within
each district received treatment A (the institutes), treatment B (the institutes plus coaching), and no
treatment (the district’s “business as usual” PD). A variety of data were collected from the teachers
and students in these schools, primarily in the fall and spring of the implementation year (2005-06)
and the fall and spring of the follow-up year (2006-07). Based on these data, several outcome
measures were constructed:
- Teachers’ knowledge about reading instruction. The study team administered a
Reading Content and Practices Survey (RCPS) to all treatment and control teachers in
fall and spring of the implementation year and the spring of the follow-up year.7
Although the overall knowledge score is the main measure for this outcome, we also
computed two subscores—a word-level subscore, measuring teachers’ knowledge of
word-level components of reading instruction (phonemic awareness, phonics, and
fluency), and a meaning-level subscore, measuring teachers’ knowledge of meaning-level
components of reading instruction (vocabulary and reading comprehension). The two
subscores were included to permit exploration of possible differences in the impact of
the PD on the domains of knowledge it addressed.8 The teacher knowledge measures
were standardized based on the control group mean and standard deviation so that
impacts can be displayed as effect sizes. The first administration of the RCPS (prior to
delivery of the PD) was used as a baseline measure of teacher knowledge.
- Teachers’ use of research-based instructional practices. Trained observers visited
all second grade classrooms in study schools in the fall and spring of the implementation
year and in the fall of the follow-up year, tallying activities that occurred during each
three-minute interval over a full period of reading instruction. Outcome measures
derived from the observations of reading instruction included scores for explicit teaching
methods, independent student activity (i.e., guided student practice), and differentiation of
instruction to address students’ diverse needs, three areas of teachers’ practice that the PD
was intended to affect.9 Again, so that the impacts can be displayed as effect sizes, each classroom instruction measure was standardized based on the control group mean and
standard deviation.
- Students’ reading achievement. Students’ reading achievement was the primary
outcome for the study. The key measure was the standardized average reading score,
obtained from the district assessments. Because the tests used in the six study districts differed, there was no one consistent test metric. Hence the scaled scores reported by
the districts were standardized within each district so that they can be compared across
districts.10 Standardizing the achievement scores makes it possible to interpret the
impact estimates as effect sizes. Because the PD interventions might affect the achievement
distribution even without changing average achievement, a secondary, dichotomous measure
was also constructed. First, the
average reading test score in the 2004–2005 school year (latest baseline year) for all
second grade students in the study schools within each district was chosen as a cut-point.
Each student’s implementation-year and follow-up-year test scores were compared to
this cut-point, and each student was categorized as scoring above or below it on
each test. This metric
reflects the percentage of students who performed at or above the mean baseline
performance level. The analysis based on this measure focused on the impact of the PD
treatment on the proportion of students with above average achievement in the study
schools.
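The within-district standardization and cut-point classification described above can be sketched in a few lines of Python. This is a minimal illustration with invented scores; the variable names and data are ours, not the study's (the actual scores came from district records, and the cut-point logic follows footnote 10).

```python
from statistics import mean, pstdev

# Hypothetical baseline (2004-2005) second grade reading scores for one
# district's study schools; the study drew these from district records.
baseline_scores = [480, 500, 510, 495, 515, 490, 520, 500]
baseline_mean = mean(baseline_scores)  # centering constant and cut-point
baseline_sd = pstdev(baseline_scores)  # scaling constant

def standardize(score):
    """Footnote 10's calculation: subtract the district's 2004-2005 baseline
    mean and divide by the 2004-2005 baseline SD, so scores are comparable
    across districts and impacts can be read as effect sizes."""
    return (score - baseline_mean) / baseline_sd

def above_cut_point(score):
    """Secondary dichotomous measure: did the student score at or above
    the district's baseline average?"""
    return score >= baseline_mean

# Implementation-year scores for a few hypothetical students.
new_scores = [505, 485, 530]
z_scores = [standardize(s) for s in new_scores]
above = [above_cut_point(s) for s in new_scores]
```

Because each district is standardized against its own baseline cohort, a z-score of 0.25 means the same thing in every district: one quarter of a baseline standard deviation above that district's baseline mean.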
We also surveyed teachers to gather data on their backgrounds and on the amount and type
of PD they participated in during the study years. Study staff obtained information on the
implementation of the two interventions by observing the institutes and from coaches’
logs, which recorded the nature of each coaching interaction with each teacher.
The basic analytic strategy for assessing the impacts of the PD interventions was to compare
outcomes for schools that were randomly assigned within each district to each of the three study
conditions. Because we used data on students, nested within teachers’ classrooms, nested within
study schools, three-level multilevel models were used to estimate the impacts of professional
development on student reading achievement and two-level models were used for estimating impacts
on the teacher measures. The impact model uses the sample of teachers and students present in the
study schools as of the spring 2006 (implementation year) and 2007 (follow-up year) data collection
periods. The estimates provide an intent-to-treat analysis of the impact of the interventions because
they reflect the PD effects on the targeted (or “intended”) sample, whether or not all the teachers in
the treatment schools participated fully in the PD provided.
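As a toy illustration of the intent-to-treat contrast, one can compare outcomes across conditions while keeping every school in its originally assigned group, regardless of how fully its teachers participated. The data and names below are invented, and the simple average of school means stands in for the study's actual three-level hierarchical linear models.

```python
from statistics import mean

# Hypothetical standardized outcomes for students nested in schools within
# one district; each school keeps its original random assignment.
schools = {
    "school_1": ("treatment_A", [0.10, 0.25, -0.05]),
    "school_2": ("treatment_B", [0.30, 0.15, 0.20]),
    "school_3": ("control",     [-0.10, 0.05, 0.00]),
    "school_4": ("treatment_A", [0.05, 0.20, 0.10]),
    "school_5": ("treatment_B", [0.25, 0.10, 0.35]),
    "school_6": ("control",     [0.00, -0.05, 0.10]),
}

def group_mean(condition):
    # Average of school means, reflecting the school as the unit of
    # random assignment; the study's estimates came from multilevel
    # models that also account for clustering within classrooms.
    school_means = [mean(scores) for cond, scores in schools.values()
                    if cond == condition]
    return mean(school_means)

# Intent-to-treat contrasts: assigned condition vs. control, in effect-size
# units (outcomes are already standardized). Schools stay in their assigned
# group even if some teachers did not participate fully in the PD.
impact_A = group_mean("treatment_A") - group_mean("control")
impact_B = group_mean("treatment_B") - group_mean("control")
```

The key intent-to-treat feature is that group membership is defined by the spring 2005 random assignment, not by observed participation, which preserves the comparability that randomization created.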
A summary of the study sample and design is provided in the following text box.
Study Sample and Design Summary
Participants: Six districts, 90 schools, and 270 second grade teachers participated in the study
during the year that the PD interventions were implemented. During the follow-up year (which
included only data collection), the number of teachers participating was 250 in the fall and 254 in
the spring. Participating districts used one of two commonly used scientifically based reading
programs. Schools selected for the study were high-poverty urban or urban fringe public elementary
schools in which fewer than half the students were designated as English language learners (ELL).
Schools were screened out if they were already receiving Reading First funding (and therefore might
already be participating in intensive PD) or if they planned to receive this funding during the first
year of the study.
Research Design: Within each district, schools were randomly assigned in equal numbers to
treatment A, treatment B, or the control group. Each group therefore consisted of 30 schools and
88 to 93 teachers during the implementation year or 81 to 85 teachers during the follow-up year.
School-level student achievement data were collected from district records for student cohorts from
the two years prior to the study as pretest data, and teachers took a teacher knowledge pretest before
participating in any study PD. Outcomes data collected consisted of student achievement scores
from spring of the implementation and follow-up years, obtained from district records; teacher
knowledge scores from posttests administered in spring of the implementation and follow-up years;
and classroom observations conducted during fall and spring of the implementation year and during
fall of the follow-up year. These data were collected from all three study groups. Because students
were clustered within classrooms and classrooms were clustered within schools, effects for the study
were estimated using hierarchical linear models.
Outcomes: The study examined impacts on three sets of outcomes: teachers’ knowledge of
reading instruction, based on data from the Reading Content and Practices Survey (RCPS); teachers’
reading instructional practices, based on observations by trained observers; and student reading test
scores, collected from district records.
7 The outcomes of the teacher knowledge assessment, like other achievement or aptitude tests, are scaled in logits, which represent the log of the odds of getting correct answers to each test item.
8 The word-level material in the PD curriculum emphasized foundational knowledge underlying “best practices” in phonics and fluency instruction, topics believed to be unfamiliar to most teachers (Moats 2002). The meaning-level material in the curriculum emphasized teaching strategies for building students’ vocabularies and comprehension skills, both of which were built into the lesson structure of the core readers the teachers used.
9 The measures of explicit instruction and independent student activity were scaled in logits, paralleling the scales used for the teacher knowledge outcomes. Logits are commonly used in situations in which the purpose is to measure the proportion of occasions in which an event occurs. Each teacher’s logit score represents the log of the odds of the teacher engaging in explicit instruction or independent student activity during each three-minute observation interval. The differentiated instruction measure was not scaled in logits because the majority of teachers did not engage in differentiated instruction during the classroom observation; logits cannot be calculated for zero occurrences.
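The logit scaling in footnotes 7 and 9 can be made concrete with a small example (the counts here are invented): a teacher observed using explicit instruction in 12 of 20 three-minute intervals would receive a score of ln(12/8) ≈ 0.41.

```python
import math

def logit_score(event_intervals, total_intervals):
    """Log of the odds that the behavior occurred in an observation
    interval. Undefined when the count is 0 (or equals the total),
    which is why the differentiated instruction measure was not
    scaled in logits."""
    p = event_intervals / total_intervals
    return math.log(p / (1 - p))

score = logit_score(12, 20)  # ln(12 / 8) = ln(1.5)
```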
10 The standardized scores were calculated by subtracting the 2004–2005 second grade reading test average for the district’s study schools from each student’s total reading score and then dividing the difference by the standard deviation for second grade students in the district’s study schools in 2004–2005.