What Works Clearinghouse


Middle School Math
July 30, 2007

Methodology

One hundred and fifty-eight studies provided data on 34 middle school math curricula and were classified by the strength of their designs.1 To be fully reviewed, a study had to be a randomized controlled trial or a quasi-experimental design with evidence of equating between the treatment and comparison groups.

Eligibility for review

Quasi-experiments eligible for review include those equating through matching or statistical adjustment, regression discontinuity designs, and single-case designs. However, no studies identified for the middle school math review used regression discontinuity or single-case designs.

In judging the quality of the evidence, the review considered the properties of the measurement instruments used in the studies, the percentage of the original study sample that was lost to follow-up, and any sample characteristics or events that might serve as alternative explanations for the observed effect. For details, see the WWC Evidence Standards. When results were reported for multiple time periods following sample enrollment, the longer-term results were included in the review.

The research evidence for programs that have at least one study meeting WWC evidence standards with or without reservations is summarized in individual intervention reports posted on the WWC website (see http://www.whatworks.ed.gov). So far, 21 studies of 7 middle school programs have met evidence standards with or without reservations. The lack of evidence for the remaining programs does not mean that those programs are ineffective; some programs have not yet been studied using a design that permits the WWC to draw any conclusions about their effectiveness. In addition, some studies were not considered for the rating of effectiveness because they reported insufficient information for the WWC to confirm their statistical findings.

Rating of effectiveness

Each middle school math curriculum that had at least one study meeting WWC standards with or without reservations received a rating of effectiveness for math achievement. The rating of effectiveness aims to characterize the existing evidence base on the intervention within a given domain. The intervention effects based on the research evidence are rated as positive, potentially positive, mixed, no discernible effects, potentially negative, or negative.

The rating of effectiveness takes into account four factors: the quality of the research design, the statistical significance of the findings, the size of the difference between students in the intervention and the comparison conditions, and the consistency in findings across studies (see the WWC Intervention Rating Scheme).

The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools and for multiple comparisons. Because of these corrections, the level of statistical significance as calculated by the WWC may differ from the one originally reported by the study authors. For the formulas that we used to calculate statistical significance, see Technical Details of WWC-Conducted Computations. For an explanation, see the WWC Tutorial on Mismatch. If the average effect size across all outcomes in one study in a single domain is at least 0.25, it is considered substantively important, contributing toward the rating of effectiveness. See the technical appendices of the middle school math intervention report for further details.
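The substantive importance check described above can be sketched in a few lines. This is only an illustration, not the WWC's own computation: the function name and the input format (a list of effect sizes for one study in one domain) are assumptions, and only the 0.25 threshold comes from the text. How effects in the negative direction are treated is not covered here.

```python
from statistics import mean

# Threshold stated in the text for the average effect size.
SUBSTANTIVE_THRESHOLD = 0.25


def is_substantively_important(effect_sizes):
    """Return (average, flag) for one study's outcomes in a single domain.

    `effect_sizes` is a hypothetical list of effect sizes, one per outcome.
    The flag is True when the average is at least 0.25.
    """
    if not effect_sizes:
        raise ValueError("at least one effect size is required")
    average = mean(effect_sizes)
    return average, average >= SUBSTANTIVE_THRESHOLD


# Made-up effect sizes for illustration only.
average, flag = is_substantively_important([0.40, 0.20])
print(f"average effect size = {average:.2f}, substantively important = {flag}")
```

Note that the check uses the average across outcomes, so a study with one large and one small effect can still clear the threshold.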

Extent of evidence

The evidence base rating represents the size and number of independent samples that were assessed for the purposes of analysis of the program effects. A “medium to large” evidence base requires at least two studies and at least two schools across studies, with a combined sample of at least 350 students or 14 classrooms. Otherwise, the evidence base is considered to be “small.” The WWC is currently working to define a “large” evidence base. This rating should not be confused with external validity, as other facets of external validity (such as variations in settings, important subgroups of students, implementation, and outcome measures) were not taken into account for the purposes of this rating.
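The extent-of-evidence rule can be written as a small classifier. This is a sketch under one reading of the rule, namely that the study and school counts must both be met and that either the student count or the classroom count satisfies the sample requirement. The function and parameter names are hypothetical.

```python
def extent_of_evidence(n_studies, n_schools, n_students, n_classrooms):
    """Classify an evidence base as "medium to large" or "small".

    Assumed reading of the rule: at least two studies AND at least two
    schools, AND a combined sample of at least 350 students OR at least
    14 classrooms across studies.
    """
    meets_counts = n_studies >= 2 and n_schools >= 2
    meets_sample = n_students >= 350 or n_classrooms >= 14
    return "medium to large" if meets_counts and meets_sample else "small"


# Made-up counts for illustration only.
print(extent_of_evidence(n_studies=3, n_schools=5, n_students=420, n_classrooms=18))
print(extent_of_evidence(n_studies=1, n_schools=4, n_students=900, n_classrooms=40))
```

Under this reading, the second call is classified as "small" despite its large sample, because a single study cannot satisfy the two-study requirement.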

Improvement index

The WWC computes an improvement index for each individual finding. In addition, within each outcome domain, the WWC computes an average improvement index for each study as well as a domain average improvement index across studies of the same intervention (see the Technical Details of WWC-Conducted Computations). The improvement index represents the difference between the percentile rank of the average student in the intervention condition and the percentile rank of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. Unlike the rating of effectiveness, the improvement index is based only on the size of the difference between the intervention and the comparison conditions.
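One way to picture the improvement index is to convert a standardized effect size into a percentile rank. The sketch below assumes that outcomes are normally distributed and that the average comparison student sits at the 50th percentile; the exact formulas are in the Technical Details of WWC-Conducted Computations, and the function name here is hypothetical.

```python
from statistics import NormalDist


def improvement_index(effect_size):
    """Percentile-point gain of the average intervention student.

    Assumes a standard normal comparison distribution, so the average
    comparison student is at the 50th percentile. The result lies
    between -50 and +50.
    """
    # Percentile rank of the average intervention student within the
    # comparison group's distribution.
    intervention_percentile = NormalDist().cdf(effect_size) * 100
    return intervention_percentile - 50


# An effect size of 0.25 corresponds to a gain of about 10 percentile points.
print(round(improvement_index(0.25), 1))
```

Because the normal distribution function is bounded by 0 and 1, the result stays within the -50 to +50 range given in the text, and an effect size of zero yields an index of zero.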

1 No empirical studies were identified for an additional 14 programs during the time period of this review.


