Achievement Effects of Four Early Elementary School Math Curricula

NCEE 2009-4052
February 2009

Basis for the Current Findings

This report presents results from the first cohort of 39 schools participating in the evaluation, with the goal of answering the following research question: What are the relative effects of different early elementary math curricula on student math achievement in disadvantaged schools? The report also examines whether curriculum effects differ for student subgroups in different instructional settings.

Curricula Included in the Study. A competitive process was used to select four curricula for the evaluation that represent many of the diverse approaches used to teach elementary school math in the United States:

  • Investigations in Number, Data, and Space (Investigations) published by Pearson Scott Foresman (Russell, Economopoulos, Mokros, Kliman, Wright, Clements, Goodrow, Murray, and Sarama 2006)
  • Math Expressions published by Houghton Mifflin Company (Fuson 2006a)
  • Saxon Math (Saxon) published by Harcourt Achieve (Larson 2004)
  • Scott Foresman-Addison Wesley Mathematics (SFAW) published by Pearson Scott Foresman (Charles, Crown, Fennel, Caldwell, Cavanagh, Chancellor, Ramirez, Ramos, Sammons, Schielack, Tate, Thompson, and Van de Walle 2005)

The process for selecting the curricula began with the study team inviting developers and publishers of early elementary school math curricula to submit a proposal to include their curricula in the evaluation. A panel of outside experts in math and math instruction then reviewed the submissions and recommended to IES curricula suitable for the study. The goal of the review process was to identify widely used curricula that draw on different instructional approaches and that hold promise for improving student math achievement.

Study Design. An experimental design was used to evaluate the relative effects of the study’s four curricula. The design randomly assigned schools in each participating district to the four curricula, thereby setting up an experiment in each district. The relative effects of the curricula were calculated by comparing math achievement of students in the four curriculum groups.

The study does not include a control group of schools (a “business as usual” group) that would continue using whatever math curriculum they used before joining the study. The study team decided against including such a control group because it would contain a mix of curricula already in use across the participating districts, making it difficult to compare the effects of the study’s curricula against a single, well-defined alternative.

Participating Districts and Schools. The study compares the effects of the selected curricula on math achievement of students in disadvantaged schools. The study team identified and recruited districts that (1) have Title I schools, (2) are geographically dispersed, and (3) contain at least four elementary schools interested in study participation, so all four of the study’s curricula could be implemented in each district.

Participating sites are not a representative sample of districts and schools, because interested sites are likely to be unique in ways that make it difficult to select a representative sample. Interested districts were willing to use all four of the study’s curricula, allowed the curricula to be randomly assigned to their participating schools, and were willing to have the study team test students and collect other data required by the evaluation (as described below). It would have been extremely costly to recruit a representative sample of districts and schools that met these criteria.

The 39 schools examined in this report are contained in four districts that are geographically dispersed in four states and in three regions of the country (Northeast, Midwest, and West). The districts also fall in areas with different levels of urbanicity.

In this first cohort, curriculum implementation occurred in the first grade during the 2006-2007 school year. Data were collected from the 131 first-grade teachers in the study schools and from 1,309 students—a random sample of about 10 students per classroom, which was sufficient to support the analyses. Each of the four curricula was assigned about 10 schools, containing roughly 33 classrooms and 325 students. The table below presents the exact number of schools, classrooms, and students included in the analysis, in total and by curriculum group.

(Refer to Number of Cohort-One Schools, Classrooms, and Students, in Total and by Curriculum)

An inspection of baseline school, teacher, and student characteristics shows that random assignment achieved its objective of creating four groups with similar characteristics before curriculum implementation began. The baseline characteristics include 7 school characteristics (see Table III.1 in the body of the report), 21 teacher characteristics (see Table II.1 in the body of the report), and 7 student characteristics (see Table III.2 in the body of the report), including student fall math achievement. Statistical tests indicate that none of the school and student characteristics differ significantly at the 5 percent significance level across the curriculum groups.1 One of the 21 teacher characteristics (race) is significantly different across the curriculum groups;2 however, as described in Chapter III, the approach for calculating curriculum effects adjusted for teacher race.

Statistical Power. The smallest effect that can be detected with the first cohort is 0.22, where effect size is defined as a fraction of the standard deviation of the test score. Specifically, the minimum detectable effect (MDE) equals the difference in average student math scores between any two curriculum groups, divided by the pooled standard deviation of the score for the two curricula being compared.3

An MDE of 0.22 means that, for a difference between any two curriculum groups to be detectable in this report, student achievement must differ by at least 15 percent of the gain made by the average first grader from a low-income family. Chapter I provides more details about the computation of the MDE and what it represents.
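The effect-size calculation underlying the MDE can be sketched as follows. The group means and standard deviations below are hypothetical, chosen only to illustrate a difference that sits exactly at the 0.22 detection threshold; the study’s actual MDE computation also adjusts for clustering and multiple comparisons, as described in footnote 3.

```python
import math

def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of the score for two curriculum groups."""
    return math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

def effect_size(mean_a, mean_b, sd_pooled):
    """Difference in average scores, expressed as a fraction of the pooled SD."""
    return (mean_a - mean_b) / sd_pooled

# Hypothetical scale-score summaries for two curriculum groups of 325 students each:
sd = pooled_sd(sd_a=10.0, n_a=325, sd_b=10.0, n_b=325)
es = effect_size(mean_a=45.2, mean_b=43.0, sd_pooled=sd)
print(round(es, 2))  # 0.22 -- a gap this large sits right at the detection threshold
```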

Outcome Measure and Other Data Collection. To measure the achievement effects of the curricula, the study team tested students at the beginning and end of the school year using the math assessment developed for the Early Childhood Longitudinal Study-Kindergarten Class of 1998-99 (ECLS-K) (West, Denton, and Germino-Hausken 2000). The ECLS-K assessment is a nationally normed test that meets the study’s requirements of:

  • assessing knowledge and skills that mathematicians and math educators feel are important for early elementary school students to develop;
  • having accepted standards of validity and reliability;
  • being administered to students individually;
  • being able to measure achievement gains over the study’s grade range (which ultimately will include the first, second, and third grades); and
  • being able to accurately capture achievement of students from a wide range of backgrounds and ability levels.

Another important feature of the ECLS-K assessment is that it is an adaptive test, meaning the items administered are tailored to each student’s achievement level. The test begins with a short, first-stage routing test that broadly measures each examinee’s achievement level. Depending on the score on the routing test, the student is then administered one of three longer, second-stage tests: (1) an easy test, (2) a middle-difficulty test, or (3) a difficult test. Some items on the second-stage tests overlap, and this overlap is used to place the scores from the different tests on the same scale. Item response theory (IRT) techniques (Lord 1980) were used to develop the scale scores, which, according to the test developers, are the appropriate scores to analyze for our purposes (Rock and Pollack 2002).4 Adaptive tests are useful for measuring achievement because they limit the amount of time children are away from their classrooms and reduce the risk of ceiling or floor effects in the test score distribution—something that can have adverse effects on measuring achievement gains.
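The two-stage routing described above can be sketched in a few lines of Python. The cutoff scores here are hypothetical placeholders, not the actual ECLS-K routing rules, which are specified by the test developers (Rock and Pollack 2002):

```python
def second_stage_form(routing_score, low_cut=5, high_cut=9):
    """Assign a second-stage test form from a first-stage routing score.

    The cutoffs (low_cut, high_cut) are hypothetical, chosen only to
    illustrate the three-way routing; they are not the ECLS-K rules.
    """
    if routing_score < low_cut:
        return "easy"
    if routing_score < high_cut:
        return "middle-difficulty"
    return "difficult"

print(second_stage_form(3))   # easy
print(second_stage_form(7))   # middle-difficulty
print(second_stage_form(11))  # difficult
```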

The assessment includes questions in five math content areas: (1) Number Sense, Properties, and Operations; (2) Measurement; (3) Geometry and Spatial Sense; (4) Data Analysis, Statistics, and Probability; and (5) Patterns, Algebra, and Functions. The items in each of the second-stage tests administered to the study’s first graders fall primarily under Number Sense, Properties, and Operations, with the remainder drawn from the other areas. The easy test contained only a few items from each of the remaining areas, whereas the middle-difficulty and difficult tests contained more such items. On the middle-difficulty test, the remaining items were mainly about Patterns, Algebra, and Functions, whereas those on the difficult test were mainly about Data Analysis, Statistics, and Probability.

To help interpret the measured effects of the curricula, teachers were surveyed about curriculum implementation. The survey data are useful for assessing teacher participation in curriculum training, usage of the assigned curriculum, and any supplementation with other materials. Teachers also reported their usage of the essential and secondary features of their assigned curriculum, which was useful for assessing adherence to each curriculum. Demographic information about teachers also was collected through the surveys, and student demographics were obtained from school records.


1 The 5 percent significance level means that, for each characteristic tested, there is no more than a 5 percent probability of finding a difference across the curriculum groups by chance when no true difference exists.
2 At least 93 percent of Investigations, Math Expressions, and Saxon teachers classified themselves as white, whereas 78 percent of SFAW teachers did so.
3 The MDE calculation accounts for the extent to which students in the first cohort are clustered in classrooms and schools according to their baseline achievement, after adjusting for other baseline student, teacher, and school characteristics. The calculation also uses the Tukey-Kramer method (Tukey 1952, 1953; Kramer 1956) to account for the six unique pair-wise comparisons that can be made with the study’s four curricula: (1) Investigations relative to Math Expressions, (2) Investigations relative to Saxon, (3) Investigations relative to SFAW, (4) Math Expressions relative to Saxon, (5) Math Expressions relative to SFAW, and (6) Saxon relative to SFAW.
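The six pair-wise comparisons enumerated in this footnote follow directly from choosing 2 of the study’s 4 curricula; a quick sketch:

```python
from itertools import combinations

curricula = ["Investigations", "Math Expressions", "Saxon", "SFAW"]

# C(4, 2) = 6 unique pair-wise comparisons among the four curricula.
pairs = list(combinations(curricula, 2))
for first, second in pairs:
    print(f"{first} relative to {second}")
print(len(pairs))  # 6
```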
4 Student answers on the assessment were sent to the Educational Testing Service (ETS) for scoring—ETS was a developer of the ECLS-K Mathematics Assessment. A three-parameter IRT model was used to place scores from the different tests students took on the same scale. Reliabilities for the study’s sample (0.93 for the fall score and 0.94 for the spring score) were consistent with the national ECLS-K sample (Rock and Pollack 2002, pp. 5-7 through 5-9)—reliabilities are based on the internal consistency (alpha) coefficients. Also, there were no floor or ceiling effects observed in either the fall or spring scores.
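The three-parameter IRT model mentioned in this footnote has a standard form (Lord 1980): the probability of a correct response is a logistic function of ability, with item-specific discrimination, difficulty, and guessing parameters. The sketch below uses hypothetical item parameters, not the actual ECLS-K item calibrations:

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability that a student
    with ability theta answers correctly an item with discrimination a,
    difficulty b, and guessing parameter c."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# A hypothetical item: moderate discrimination, average difficulty, 20% guessing floor.
for theta in (-2.0, 0.0, 2.0):
    print(round(p_correct_3pl(theta, a=1.0, b=0.0, c=0.2), 2))
# The probability of a correct answer rises with ability theta,
# but never falls below the guessing floor c = 0.2.
```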