Appendix A1.1 Study characteristics: Johnson & Hall, 2003 (quasi-experimental design)
| Characteristic | Description |
|---|---|
| Study citation | Johnson, J., & Hall, M. (2003). Technical report: Houghton Mifflin California math performance evaluation. Raleigh, NC: EDSTAR, Inc. |
| Participants | The participants in this study were second through fifth graders from 16 districts in California. The intervention group included 160¹ schools from eight districts using Houghton Mifflin Mathematics. The comparison group included 137 schools in eight different districts. The intervention group was identified by Houghton Mifflin, which provided the names of eight districts in California that began using Houghton Mifflin Mathematics in 2002. Using data from the Quality Education Database, the California Department of Education, and the American Institutes for Research, comparison districts were matched based on prior math achievement scores, student demographic characteristics, and district sizes. |
| Setting | The participating school districts were located throughout California. |
| Intervention | The intervention group used the 2002 edition of Houghton Mifflin Mathematics and had completed their first year of implementing the curriculum during the 2001-2002 school year. |
| Comparison | There is no information in the study about the specific math programs used in the comparison school districts, except that the schools did not use Houghton Mifflin Mathematics. |
| Primary outcomes and measurement | The outcome measure was the total math score on the California statewide assessment, the Standardized Testing and Reporting (STAR) Stanford 9 test, used during the 2000-01 and 2001-02 school years. (See Appendix A2 for more detailed descriptions of outcome measures.) The study authors reported scores as national percentile ranks, but the WWC reports scaled scores sent by the author in response to a data request, because scaled scores are more direct indicators of performance and do not require extrapolation based on national norms. |
| Teacher training | No information is available on the training or professional development provided to the teachers in the intervention group. |

1 Some of the grade-level analyses contained fewer than 160 intervention schools because not all schools had all grade levels.

Appendix A1.2 Study characteristics: EDSTAR, Inc., 2004 (quasi-experimental design)
| Characteristic | Description |
|---|---|
| Study citation | EDSTAR, Inc. (2004). Large-scale evaluation of student achievement in districts using Houghton Mifflin. Raleigh-Durham, NC: Author. |
| Participants | The participating 519 schools were selected from different regions of the country including the West (California), the Midwest (Illinois, Missouri, and Wisconsin), the Northeast (New Jersey and New York), and the Southeast (South Carolina). The grade levels evaluated varied by state: California, grades 2-5; South Carolina, grades 3-5; Missouri, New Jersey, New York, and Wisconsin, grade 4; Illinois, grades 3 and 5. The authors indicate that no attrition occurred in this study. Due to the confounding of the intervention effect with the effect of other district characteristics,¹ the analysis was limited to a sample of 16 districts (eight pairs) and 212 schools in the three states that had multiple districts in the intervention and comparison groups: California, New Jersey, and South Carolina. |
| Setting | Districts were selected in various states to represent ranges in size, demographic characteristics, and student achievement. Within districts, schools were matched based on size of schools, student achievement level, school socioeconomic level, and school minority level. |
| Intervention | The eight districts in the intervention group had begun using Houghton Mifflin Mathematics in 2002-03. |
| Comparison | The comparison group used one of three types of math programs: reform, traditional, or balanced. The reform programs included Everyday Math, Mathland, and Excel Math. The traditional programs included Saxon and SRA. Scott Foresman 2000, Harcourt-Brace Mathematics, and Silver Burdett comprised the balanced programs. This WWC report focuses on an analysis of a reduced sample of states and therefore includes only comparison groups with balanced (California and South Carolina) and reform (New Jersey) programs. |
| Primary outcomes and measurement | The outcome measures were the state achievement tests used by each state in the study. Due to differences in state tests and state standards, results for each state were analyzed and evaluated separately. (See Appendix A2 for more detailed descriptions of outcome measures.) The study authors reported scores as percent of students at or above proficiency. |
| Teacher training | No information is available on the training or professional development provided to the teachers in the intervention group. |

1 For more information see the WWC Technical Paper on Teacher-Intervention Confound.

Appendix A2 Outcome measures in the mathematics achievement domain
| Outcome measure | Description |
|---|---|
| Standardized Testing and Reporting (STAR) Stanford 9 test | Johnson and Hall (2003) used the 2001 and 2002 Stanford 9 scaled test scores to measure mathematics achievement. The test scores were obtained from the California Department of Education website. |
| State achievement tests | EDSTAR, Inc. (2004) used state achievement tests from California, New Jersey, and South Carolina to measure students' mathematics achievement.¹ For California, the authors used two tests from the Standardized Testing and Reporting (STAR) program of the California Assessment System: the California Standards Test and the Stanford 9 test. In 2003 the Stanford 9 test was replaced by another norm-referenced test, the California Achievement Test (as cited in EDSTAR, Inc., 2004). The California Standards Test was administered to grades 2-9 and the Stanford 9 test was administered to grades 2-11. In New Jersey, the state assessment was the Elementary School Proficiency Assessment (ESPA), which is administered to fourth-grade students. For South Carolina, the authors used results from the Palmetto Achievement Challenge Test, which was administered to students in grades 3-8. |

1 Additional outcome measures (state tests for Illinois, Missouri, and Wisconsin) were reported by the study authors but are not described here because these analyses were excluded from the WWC report due to a confound between the district and the intervention.

Appendix A3 Summary of study findings included in the rating for the mathematics achievement domain¹
| Outcome measure | Study sample | Sample size (schools/districts, except where indicated) | Authors' findings: mean outcome (standard deviation²), Houghton Mifflin Mathematics group³ | Authors' findings: mean outcome (standard deviation²), comparison group³ | WWC calculations: mean difference⁴ (Houghton Mifflin Mathematics − comparison) | WWC calculations: effect size⁵ | WWC calculations: statistical significance⁶ (at α = 0.05) | WWC calculations: improvement index⁷ |
|---|---|---|---|---|---|---|---|---|
| Johnson & Hall, 2003 (quasi-experimental design)⁸ | | | | | | | | |
| CA STAR test: 2002 SAT9 mean scaled scores | 16 California school districts: grade 2 | 297/16 | 592.52 (nr) | 586.12 (nr) | 6.40 | na¹⁰ | ns | na¹⁰ |
| CA STAR test: 2002 SAT9 mean scaled scores | 16 California school districts: grade 3 | 296/16 | 618.04 (nr) | 615.11 (nr) | 2.93 | na¹⁰ | ns | na¹⁰ |
| CA STAR test: 2002 SAT9 mean scaled scores | 16 California school districts: grade 4 | 296/16 | 636.87 (nr) | 632.60 (nr) | 4.27 | na¹⁰ | ns | na¹⁰ |
| CA STAR test: 2002 SAT9 mean scaled scores | 16 California school districts: grade 5 | 293/16 | 657.34 (nr) | 654.13 (nr) | 3.21 | na¹⁰ | ns | na¹⁰ |
| Average⁹ for mathematics achievement (Johnson & Hall, 2003) | | | | | | na¹⁰ | ns | na¹⁰ |
| EDSTAR, Inc., 2004 (quasi-experimental design)⁸ | | | | | | | | |
| NJ ASK4 exam: percent at or above proficiency, 2002-03 | New Jersey: grade 4 | 16/4 | 40.50 (nr) | 37.70 (nr) | 2.80 | na¹⁰ | ns | na¹⁰ |
| SC PACT exam: percent at or above proficiency, 2002-03 | South Carolina: grades 3-5 | 128/8 | 34.30 (nr) | 32.10 (nr) | 2.20 | na¹⁰ | ns | na¹⁰ |
| CA CAT/6 exam: percent at or above proficiency, 2002-03 | California: grades 2-5 | 68/4 | 36.40 (nr) | 38.70 (nr) | -2.30 | na¹⁰ | ns | na¹⁰ |
| Average⁹ for mathematics achievement (EDSTAR, Inc., 2004) | | | | | | na¹⁰ | ns | na¹⁰ |
| Average⁹ for mathematics achievement across all studies | | | | | | na¹⁰ | na | na¹⁰ |

ns = not statistically significant

2 The standard deviation across all students in each group shows how dispersed the participants' outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3 The intervention and control group values are based on information provided by the authors for both the Johnson and Hall (2003) and EDSTAR, Inc. (2004) studies. These values may differ from what appeared in the original studies.

4 Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5 For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6 Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups.

7 The improvement index represents the difference between the percentile rank of the average student in the intervention condition and that of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting favorable results.

8 The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools and for multiple comparisons. For an explanation about the clustering correction, see WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Johnson and Hall (2003) and EDSTAR, Inc. (2004), a correction for clustering was needed, so the statistical significance reported by the WWC may differ from that reported by the study authors.

9 The WWC-computed average effect sizes for each study and for the domain across studies are simple averages rounded to two decimal places. The average improvement indices are calculated from the average effect sizes.

10 Student-level standard deviations were not available for this study. In Johnson & Hall (2003), school-level standard deviations for grades 2 through 5 were 21.56, 20.65, 20.21, and 20.66 for the intervention group and 20.72, 20.00, 19.16, and 19.29 for the comparison group. In EDSTAR, Inc. (2004), school-level standard deviations for the New Jersey, South Carolina, and California samples were 22.00, 15.20, and 18.30 for the intervention group and 21.90, 13.10, and 16.60 for the comparison group. Because the student-level effect size and improvement index could not be computed, the magnitude of the effect size was not considered for rating purposes. However, the statistical significance for this study is comparable to other studies and is included in the intervention rating. For further details, please see Technical Details of WWC-Conducted Computations.
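
The arithmetic behind two of the WWC-calculated columns can be sketched briefly. The Python sketch below recomputes the mean differences from the group means reported in Appendix A3 and shows how an improvement index would follow from an effect size, assuming normally distributed outcomes. The function names and the effect size of 0.25 are hypothetical and serve only as illustration; no student-level effect size could be computed for either study (see note 10), so this is not a reconstruction of any WWC result.

```python
from statistics import NormalDist

# (label, intervention mean, comparison mean, reported difference),
# copied from Appendix A3.
ROWS = [
    ("CA SAT9 grade 2", 592.52, 586.12, 6.40),
    ("CA SAT9 grade 3", 618.04, 615.11, 2.93),
    ("CA SAT9 grade 4", 636.87, 632.60, 4.27),
    ("CA SAT9 grade 5", 657.34, 654.13, 3.21),
    ("NJ ASK4 grade 4", 40.50, 37.70, 2.80),
    ("SC PACT grades 3-5", 34.30, 32.10, 2.20),
    ("CA CAT/6 grades 2-5", 36.40, 38.70, -2.30),
]


def mean_difference(intervention: float, comparison: float) -> float:
    """Intervention mean minus comparison mean, rounded to two decimals."""
    return round(intervention - comparison, 2)


def improvement_index(effect_size: float) -> float:
    """Percentile rank of the average intervention student within the
    comparison distribution, minus 50 (assumes a standard normal)."""
    return 100 * NormalDist().cdf(effect_size) - 50


for label, hm_mean, comp_mean, reported in ROWS:
    diff = mean_difference(hm_mean, comp_mean)
    # Each recomputed difference should agree with the table value.
    assert abs(diff - reported) < 0.005, label
    print(f"{label}: {diff:+.2f}")

# Hypothetical effect size, NOT taken from either study.
hypothetical_effect_size = 0.25
print(f"improvement index: {improvement_index(hypothetical_effect_size):+.1f}")
```

Because the normal cumulative distribution function is bounded by 0 and 1, the index in this sketch always falls between -50 and +50, which matches the range stated in note 7.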
Appendix A4 Houghton Mifflin Mathematics rating for the mathematics achievement domain
The WWC rates the effects of an intervention in a given outcome domain as positive, potentially positive, mixed, no discernible effects, potentially negative, or negative.¹
For the outcome domain of mathematics achievement, the WWC rated Houghton Mifflin Mathematics as having no discernible effects. It did not meet the criteria for positive effects because no studies met WWC evidence standards for a strong design or showed significant, positive effects. Further, it did not meet the criteria for other ratings (potentially positive, mixed, potentially negative, and negative effects) because neither of the two studies showed statistically significant or substantively important effects, either positive or negative.
| Rating received |
|---|
| No discernible effects: No affirmative evidence of effects. |

| Other ratings considered |
|---|
| Positive effects: Strong evidence of a positive effect with no overriding contrary evidence. |
| Potentially positive effects: Evidence of a positive effect with no overriding contrary evidence. |
| Mixed effects: Evidence of inconsistent effects as demonstrated through either of the following criteria. |
| Potentially negative effects: Evidence of a negative effect with no overriding contrary evidence. |
| Negative effects: Strong evidence of a negative effect with no overriding contrary evidence. |

1 For rating purposes, the WWC considers the statistical significance of individual outcomes and the domain-level effect. The WWC also considers the size of the domain-level effect for ratings of potentially positive effects. See the WWC Intervention Rating Scheme for a complete description.

Appendix A5 Extent of evidence by domain
| Outcome domain | Number of studies | Sample size: schools | Sample size: students | Extent of evidence¹ |
|---|---|---|---|---|
| Math achievement | 2 | Over 800 | nr | Medium to large |

nr = not reported

1 A rating of "medium to large" requires at least two studies and two schools across studies in one domain, and a total sample size across studies of at least 350 students or 14 classrooms. Otherwise, the rating is "small."
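
The rule in note 1 can be written as a short decision function. The Python sketch below is a minimal illustration of that rule as worded here, not the WWC's own implementation; the function name, parameter names, and the handling of unreported counts are assumptions. Because the student count is not reported for this domain, the usage example further assumes that every school contributes at least one classroom, so that more than 800 schools implies at least 14 classrooms.

```python
from typing import Optional


def extent_of_evidence(
    n_studies: int,
    n_schools: int,
    n_students: Optional[int] = None,    # None means "not reported"
    n_classrooms: Optional[int] = None,  # None means "not reported"
) -> str:
    """Apply the extent-of-evidence rule stated in note 1 of Appendix A5."""
    enough_studies = n_studies >= 2
    enough_schools = n_schools >= 2
    # Sample-size criterion: at least 350 students OR at least 14 classrooms.
    enough_students = n_students is not None and n_students >= 350
    enough_classrooms = n_classrooms is not None and n_classrooms >= 14
    if enough_studies and enough_schools and (enough_students or enough_classrooms):
        return "medium to large"
    return "small"


# Math achievement domain: 2 studies, over 800 schools, students not reported.
# ASSUMPTION: one classroom per school is used as a lower bound on classrooms.
print(extent_of_evidence(n_studies=2, n_schools=801, n_classrooms=801))
```

Treating an unreported count as failing its criterion is a conservative choice in this sketch: the rating can only reach "medium to large" when at least one of the two sample-size counts is actually known.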

Institute of Education Sciences