The Reading First Impact Study uses a regression discontinuity design that capitalizes on the systematic processes some school districts used to allocate Reading First funds once their states had received RF grants.3 A regression discontinuity design is the strongest quasi-experimental method available for estimating program impacts; under certain conditions, all of which are met by the present study, it can produce unbiased impact estimates. Within each district or site:
Once the above conditions have been met, and assuming that the shape of the relationship between schools' ratings and outcomes is correctly modeled, there should be no systematic differences between eligible schools that did and did not receive Reading First grants (Reading First and non-Reading First schools, respectively), other than differences associated with the school ratings used to determine funding decisions. Controlling statistically for schools' ratings therefore removes all systematic pre-existing differences between the two groups. One can then estimate the impact of Reading First by comparing outcomes for Reading First schools and non-Reading First schools in the study sample, controlling for differences in their ratings. Non-Reading First schools in a regression discontinuity analysis thereby play the same role as control schools in a randomized experiment: their regression-adjusted outcomes represent the best indication of what outcomes would have been for the treatment group (in this instance, Reading First schools) in the absence of the program being evaluated.
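The estimation logic described above can be illustrated with a small simulation. The sketch below uses entirely hypothetical data (the ratings, cutoff, and effect size are invented for illustration, not drawn from the study): schools below a rating cutoff receive funding, and the program impact is estimated by regressing the outcome on a treatment indicator while controlling for the rating.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2,000 schools with ratings on a 0-100 scale.
n = 2000
rating = rng.uniform(0, 100, n)            # rating used to award grants
cutoff = 50.0
treated = (rating < cutoff).astype(float)  # lower-rated schools are funded

true_effect = 4.0
# Outcome depends linearly on the rating, plus the treatment effect and noise.
outcome = 20.0 + 0.3 * rating + true_effect * treated + rng.normal(0, 2.0, n)

# Regression discontinuity estimate: regress the outcome on the treatment
# indicator, controlling for the (centered) rating that determined funding.
X = np.column_stack([np.ones(n), treated, rating - cutoff])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
rd_estimate = beta[1]  # coefficient on the treatment indicator
```

Because treatment is fully determined by the rating, controlling for the rating removes the selection bias, and the coefficient on the treatment indicator recovers the simulated program effect (here, a value close to 4.0). This assumes, as the text notes, that the rating-outcome relationship is correctly modeled (linear in this sketch).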
The study sample was selected purposively to meet the requirements of the regression discontinuity design: the study identified sites that had used a systematic rating or ranking process to select their Reading First school grantees. Within these sites, school selection focused on schools as close to the site-specific cut-points as possible, in order to obtain treatment and comparison groups that were as comparable as possible.
The study sample includes 18 study sites: 17 school districts and one state-wide program. Sixteen districts and one state-wide program were selected from among 28 districts and one state-wide program that had demonstrably met the three criteria listed above. One other school district agreed to randomly assign some of its eligible schools to Reading First or a control group. The final selection reflected wide variation in district characteristics and provided enough schools to meet the study's sample size requirements. The regression discontinuity sites provide 238 schools for the analysis, and the randomized experimental site provides 10 schools. Half the schools at each site are Reading First schools and half are non-Reading First schools: in three sites, the study sample includes all of the site's RF schools; in the remaining 15 sites, it includes some, but not all, of them.
At the same time, the study deliberately endeavored to obtain a sample that was geographically diverse and as similar as possible to the population of all RF schools. The final study sample of 248 schools, 125 of which are Reading First schools, represents 44 percent of the Reading First schools in their respective sites (at the time the study selected its sample in 2004). The study's sample of RF schools is large, is quite similar to the population of all RF schools, is geographically diverse, and represents states (and districts) that received their RF grants across the range of RF state award dates. Average Year 1 grants for RF schools in the study sample ranged from about $81,790 to $708,240, with an overall mean of $188,782, which translates to an average of $601 per RF student. For more detailed information about the selection process and the study sample, see the study's Interim Report (Gamse, Bloom, Kemple & Jacob, 2008).
Exhibit ES.1 summarizes the study's three-year, multi-source data collection plan. The present report is based on data for school years 2004-05, 2005-06, and 2006-07. Data collection included student assessments in reading comprehension and decoding, and classroom observations of teachers' instructional practices in reading, teachers' instructional organization and order, and students' engagement with print. Data were also collected through surveys of teachers, reading coaches, and principals, and interviews of district personnel.
Exhibit ES.2 lists the principal domains for the study, the outcome measures within each domain, and the data sources for each measure. These include:
Student reading performance, assessed with the reading comprehension subtest of the Stanford Achievement Test, 10th Edition (SAT 10, Harcourt Assessment, Inc., 2004). The SAT 10 was administered to students in grades one, two and three during fall 2004, spring 2005, spring 2006, and spring 2007, with an average completion rate of 83 percent across all administrations. In the spring of 2007 only, first grade students were assessed with the Test of Silent Word Reading Fluency (TOSWRF, Mather et al., 2004), a measure designed to assess students' ability to decode words from among strings of letters. The average completion rate was 86 percent. Three outcome measures of student reading performance were created from SAT 10 and TOSWRF data.
Classroom reading instruction, assessed in first-grade and second-grade reading classes through an observation system developed by the study team called the Instructional Practice in Reading Inventory (IPRI). Observations were conducted during scheduled reading blocks in each sampled classroom on two consecutive days during each wave of data collection: spring 2005, fall 2005 and spring 2006, and fall 2006 and spring 2007. The average completion rate was 98 percent across all years. The IPRI, which is designed to record instructional behaviors in a series of three-minute intervals, can be used for observations of varying lengths, reflecting the fact that schools' defined reading blocks can and do vary. Most reading blocks are 90 minutes or more. Eight outcome measures of classroom reading instruction were created from IPRI data to represent the components of reading instruction emphasized by the Reading First legislation.5 Six of these measures are reported in terms of the amount of time spent on the various dimensions of instruction. The other two are reported in terms of the proportion of the intervals within each observation in which the relevant instruction was recorded.
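The two kinds of IPRI-derived measures can be sketched as follows. The representation and the instructional-dimension names below are hypothetical simplifications, not the study's actual coding categories: an observation is modeled as a sequence of three-minute intervals, each recording the set of behaviors observed.

```python
# Each observation is a list of three-minute intervals; each interval is the
# set of instructional dimensions the observer recorded during it.
INTERVAL_MINUTES = 3

def time_on_dimension(intervals, dimension):
    """Time-based measure: 3 minutes credited for each interval in which
    the dimension was recorded."""
    return INTERVAL_MINUTES * sum(1 for iv in intervals if dimension in iv)

def proportion_of_intervals(intervals, dimension):
    """Proportion-based measure: share of the observation's intervals in
    which the dimension was recorded."""
    return sum(1 for iv in intervals if dimension in iv) / len(intervals)

# A 90-minute reading block yields 30 three-minute intervals.
observation = [{"phonics"}, {"phonics", "fluency"}, {"comprehension"}] * 10
```

With this illustrative observation, `time_on_dimension(observation, "phonics")` credits 60 of the 90 minutes to phonics, while `proportion_of_intervals(observation, "fluency")` reports that fluency instruction appeared in one third of the intervals. Because both measures are defined per interval, they accommodate reading blocks of varying lengths, as the text notes.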
Student engagement with print. Beginning in fall 2005, the study conducted classroom observations using the Student Time-on-Task and Engagement with Print (STEP) instrument to measure the percentage of students engaged in academic work who are reading or writing print. The STEP observation was completed by recording a time-sampled "snapshot" of student engagement three times in each observed classroom, for a total of three such "sweeps" during each STEP observation. The STEP was used to observe classrooms in fall 2005, spring 2006, fall 2006, and spring 2007, with an average completion rate of 98 percent across all years. One outcome measure was created using STEP data.
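The STEP outcome measure described above amounts to averaging a simple percentage over the snapshot sweeps. A minimal sketch, with invented counts for illustration:

```python
def step_engagement(sweeps):
    """Percentage of observed students engaged with print, averaged over the
    (typically three) snapshot sweeps of one STEP observation. Each sweep is
    a (students_engaged_with_print, students_observed) pair."""
    return sum(100.0 * engaged / total for engaged, total in sweeps) / len(sweeps)

# Hypothetical counts for the three sweeps of one classroom observation.
sweeps = [(12, 20), (15, 20), (9, 20)]
```

Here the three sweeps yield 60, 75, and 45 percent, for an observation-level measure of 60 percent of students engaged with print.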
Professional development in scientifically based reading instruction, amount of reading instruction, supports for struggling readers, and use of assessments. Within these four domains, eight outcome measures were created based on data from surveys of principals, reading coaches, and teachers about school and classroom resources. The eight outcome measures represent aspects of scientifically based reading instruction promoted in the Reading First legislation and guidance. Surveys were fielded in spring 2005 and again in spring 2007, with average completion rates across all respondents of 73 percent in spring 2005 and 86 percent in spring 2007. This final report includes findings from the 2007 surveys only. Additional data were collected by the study team in order to create measures used in correlational analyses. These data include:
The Global Appraisal of Teaching Strategies (GATS), a 12-item checklist designed to measure teachers' instructional strategies related to overall instructional organization and order, is adapted from The Checklist of Teacher Competencies (Foorman and Schatschneider, 2003). Unlike the IPRI, which focuses on discrete teacher behaviors, the GATS was designed to capture global classroom management and environmental factors. Items covered topics such as the teacher's organization of materials, lesson delivery, responsiveness to students, and behavior management. The GATS was completed by the classroom observer immediately after each IPRI observation, meaning that each sampled classroom was rated on the GATS twice in the fall and twice in the spring in both the 2005-2006 school year and the 2006-2007 school year. The GATS was fielded in fall 2005, spring 2006, fall 2006, and spring 2007, with an average completion rate of over 99 percent. A single measure from the GATS data was created for use in correlational analyses.