
An Evaluation of Number Rockets: A Tier 2 Intervention for Grade 1 Students At Risk for Difficulties in Mathematics

Regional need and study purpose

Over the last decade much attention in education has focused on early reading skills (U.S. Department of Education 2008a). A recent National Mathematics Advisory Panel report revealed an urgent need to focus on early math skills as well (U.S. Department of Education 2008b). American students lag behind students of other industrialized nations in math. In the 2003 Trends in International Mathematics and Science Study the United States ranked only 12th among 25 countries in grade 4 math and 15th among 46 countries in grade 8 math (Gonzales et al. 2004). The trend continues among older students; on the 2003 Program for International Student Assessment, American students age 15 placed 27th in math among 39 countries (Lemke et al. 2004). In the Southwest Region specifically, there is a large discrepancy in math performance among the states. On the 2007 National Assessment of Educational Progress, New Mexico (3rd) and Oklahoma (9th) fared relatively well nationally, but Louisiana (41st), Texas (43rd), and Arkansas (44th) ranked substantially lower (U.S. Department of Education 2007). The poor performance of students in math, both nationally and regionally, is of concern to stakeholders and policymakers.

Under the reauthorization of the Individuals with Disabilities Education Act of 2004, schools can now use alternative methods to determine student eligibility for special education services. The act also encourages schools to intervene as soon as students begin to struggle, before their performance declines (Individuals with Disabilities Education Act of 2004; U.S. Department of Education 2008c). One method for intervening early with struggling students—and proposed as an alternative to the ability-achievement discrepancy1 for identifying students with learning disabilities and for reducing referrals for special education services (Individuals with Disabilities Education Act of 2004)—is the response to intervention (RTI) model (Fuchs and Fuchs 2006).

In RTI models schools provide interventions of increasing intensity to struggling students to improve their achievement (Gersten et al. 2009). RTI model interventions are multitiered: tier 1 consists of research-based core instruction delivered in the classroom and differentiated instruction based on individual student needs; tier 2 (and higher) comprises increasing levels of targeted and individualized instruction. Essential to the RTI model are valid and feasible screening measures to identify students at risk of not achieving grade-level performance at the end of the school year (tier 1) and validated interventions provided in addition to classroom instruction (tier 2) for these students (Gersten et al. 2009). Although current legislation indicates that RTI can play a role in determining a child's eligibility for special education, the goal is to provide instructional assistance to students as soon as they begin falling behind.

Empirical studies of response to intervention in reading have demonstrated its feasibility and possible benefits (Burns, Appleton, and Stehouwer 2005; Case, Speece, and Molloy 2003; Vaughn, Linan-Thompson, and Hickman-Davis 2003). These preliminary results, however, were based on quasi-experimental study designs.2 Thus, while they were able to link interventions with improved student achievement, they could not claim that the intervention caused the improvement, as studies employing randomized controlled trials could. A recent review of tier 2 interventions in beginning reading identified 18 studies (of which only 5 were randomized controlled trials) that analyzed the effects of response to intervention on student reading achievement (Wanzek and Vaughn 2007). In math there are four such studies of tier 2 interventions that use random assignment (Newman-Gonchar, Clarke, and Gersten 2009).

Small group tutoring is one of the few tier 2 math interventions that has demonstrated positive results in a smaller-scale study employing random assignment. As its name indicates, the intervention delivers instruction to students in small groups. The current study is an effectiveness evaluation, building on the Fuchs et al. (2005) efficacy study, of the small group tutoring intervention in math for at-risk students in grade 1. Fuchs et al. demonstrated that the intervention is effective across several schools within a single district. The current study will provide rigorous causal evidence of whether the intervention can be effective in four urban districts in four states. If the intervention is effective on this larger scale, it may meet an important need for validated tier 2 math interventions suitable for use within RTI models.

Efficacy trials help researchers determine whether an intervention (in this case, small group tutoring) is feasible and practical for implementation in schools and whether the intervention produces the desired impact for a target population (in this case, improving math achievement for at-risk students). In the Fuchs et al. (2005) efficacy study small group tutoring was shown to be both feasible and practical and to improve student performance in math in one school district.

In effectiveness studies such as the current study researchers replicate interventions that have demonstrated positive effects in smaller efficacy trials. This is done across a variety of settings to determine whether the interventions still demonstrate positive results when implemented on a larger scale.

The goal of this study is to evaluate on a large scale the Fuchs et al. (2005) small group tutoring intervention for at-risk students in grade 1. The study addresses one question: does the small group tutoring intervention improve the math achievement of grade 1 students at risk for difficulties in mathematics?

This study is not designed to provide rigorous causal evidence of program impacts for specific student subgroups, such as English language learner students. The tutoring was available for English-speaking students only and did not address issues that English language learner students at risk in math might have. In addition, the results represent only 70–80 percent of the students in grade 1 at participating schools because not all parents consented to their child's participation in the study. The study was also restricted to four large urban school districts in the Southwest Region. If the intervention is found effective in the current study, it would provide additional evidence (beyond Fuchs et al. 2005) that the intervention is generalizable to other settings and can be implemented at scale. However, the study design does not enable conclusive statements to be made about generalization. The schools and districts were not sampled at random, nor were they representative of any population. Results from this study will not necessarily apply to other settings, such as districts in rural areas or districts with different demographics.

1 The ability-achievement discrepancy is typically defined as a significant difference between a child's intelligence test score and scores on achievement tests (Sattler 2001).

2 The gold standard for research design is the randomized controlled trial (Shadish, Cook, and Campbell 2002). These designs are characterized by random assignment of individuals (or units) to two or more experimental conditions. They provide the strongest evidence of causality, supporting statements that the intervention condition was the cause of an observed effect. Quasi-experimental studies, often those reporting only correlations or regression analyses, do not employ random assignment and may not have control or comparison conditions. The lack of random assignment increases the possibility that other variables, not controlled by the researcher, may have caused any observed effects or relationships. Quasi-experimental studies do not provide the same level of evidence to support causal statements—such as "this intervention caused this outcome"—as do randomized controlled trials.

Intervention description

Instructional interventions in early math should emphasize developing an understanding of numbers, or number sense—the foundation for subsequent math skills (Gersten and Chard 1999).3 In addition to focusing on critical math content, such interventions should incorporate key design principles. Three recent meta-analyses4 incorporated 111 studies on low-achieving students (Baker, Gersten, and Lee 2002) and students with learning disabilities (Kroesbergen and Van Luit 2003; Gersten et al. 2009). Their results point to three critical design principles of effective math interventions: explicit and systematic instruction, visuals and models, and think alouds.5

Addressing all three design principles, small group tutoring incorporates systematic and explicit instruction that provides students with clear steps to follow and visuals and models to work with (Fuchs et al. 2005). The teacher describes and models the steps to solve the problem using examples before the students solve similar problems independently. Teachers also provide individual feedback to students on their work.

In a meta-analysis of research on the effects of grouping practices on reading outcomes for students with reading disabilities, small group instruction was found to result in the largest positive impacts on student achievement (Elbaum et al. 2000). Similarly, in a related meta-analysis for students without disabilities, students who received small group instruction learned significantly more than those not taught in small groups (Lou et al. 1996). Positive impacts for small group instruction in math and science achievement have also been documented (Springer, Stanne, and Donovan 1999), though most of the studies reviewed were quasi-experimental and unable to provide the highest level of evidence.

In the Fuchs et al. (2005) study—one of the four studies focusing on tier 2 interventions for math that did use random assignment—students were assigned to either at-risk or not-at-risk groups based on their screening test scores. Approximately 21 percent of students screened were identified as at risk. In each class students at risk were randomly assigned to receive either small group tutoring (intervention) or no tutoring (control). The intervention group received approximately 16 weeks of small group tutoring. At the end of this period, students' math skills were assessed using several measures—the Woodcock-Johnson III test of calculation (Woodcock, McGrew, and Mather 2001), a test of concepts and applications (Fuchs, Hamlett, and Fuchs 1990), and a test of story problems (Jordan and Hanich 2000). Tutored students received higher scores than control students. The following effect sizes6 demonstrate the magnitude of the impact of the small group tutoring intervention on students' test scores: 0.57 standard deviations on the Woodcock-Johnson III calculation test, 0.67 standard deviations for concepts and applications, and 0.70 standard deviations for story problems. A recent estimate of typical change on standardized test performance from the end of kindergarten to the end of grade 1 in math is an effect size of 1.14 (Hill et al. 2007). The results from Fuchs et al. (2005) are thus roughly equivalent to 10 months of at-risk student progress accomplished in 5 months of instruction.
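
The "10 months in 5 months" framing follows from simple arithmetic. The sketch below converts each reported effect size into months of typical growth; the 10-month school year and the direct proportionality between effect size and months of growth are simplifying assumptions, not study parameters.

```python
# Convert effect sizes (in standard deviation units) into months of
# typical growth, using Hill et al.'s (2007) benchmark of 1.14 standard
# deviations for a full year of end-of-K to end-of-grade-1 growth.
ANNUAL_GROWTH_SD = 1.14   # assumed annual growth benchmark
MONTHS_PER_YEAR = 10      # assumed school-year length in months

def extra_months(effect_size):
    """Months of typical growth equivalent to a given effect size."""
    return effect_size / ANNUAL_GROWTH_SD * MONTHS_PER_YEAR

for measure, es in [("calculation", 0.57),
                    ("concepts and applications", 0.67),
                    ("story problems", 0.70)]:
    print(f"{measure}: about {extra_months(es):.1f} extra months of growth")
```

Under these assumptions, the tutored students' gains of roughly 5 to 6 extra months, on top of the approximately 5 months of typical growth during the tutoring period, yield the report's rough equivalence of 10 months of progress.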

3 Number sense is defined as having important central features such as: "a child's fluidity and flexibility with numbers, the sense of what numbers mean, and an ability to perform mental mathematics and to look at the world and make comparisons" (Gersten and Chard 1999, p. 20).

4 A meta-analysis is "a form of survey research in which research reports, rather than people, are surveyed" (Lipsey and Wilson 2001, p. 1). Meta-analyses encompass several procedures to formally aggregate findings across studies.

5 Think aloud is an instructional strategy in which the teacher verbalizes his or her thought processes while solving a problem.

6 An effect size is a measure of change or of a relationship between two variables, usually expressed in standard deviation units on the normal curve (Cohen 1988). It can be used to express the impact of an intervention. For example, if an intervention had an effect size of 0.50 (and operated equally on all individuals in a normal distribution), then an individual originally at the 10th percentile would move to the 23rd percentile. The relationship between standard deviations and percentile rank is not linear. Given a fixed effect size, the amount an individual would shift in percentile units depends on initial location on the normal curve.
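
The percentile arithmetic in this footnote can be reproduced with the standard normal distribution. This is a sketch only; exact percentile values depend on rounding conventions.

```python
from statistics import NormalDist

def shifted_percentile(start_percentile, effect_size):
    """Percentile rank after adding `effect_size` standard deviations,
    assuming a normal distribution and a uniform shift for everyone."""
    nd = NormalDist()
    z = nd.inv_cdf(start_percentile / 100.0)  # z-score of starting rank
    return 100.0 * nd.cdf(z + effect_size)    # rank after the shift

# An individual at the 10th percentile with a 0.50 effect size moves to
# roughly the low 20s in percentile rank.
print(shifted_percentile(10, 0.50))
# The same effect size moves the median individual further, illustrating
# that the shift in percentile units is not linear.
print(shifted_percentile(50, 0.50))
```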

Study design

For this effectiveness study one district was selected from each of four Southwest Region states (Arkansas, Louisiana, New Mexico, and Texas) to evaluate the Fuchs et al. (2005) intervention in multiple populations. The study was conducted in 76 schools (38 intervention, 38 control) in four urban districts during the 2008/09 academic year. All grade 1 students receiving core math instruction in English in the general classroom were eligible. Parental consent was obtained for approximately 75 percent of eligible students, who were then enrolled in the study. Of the nearly 3,000 students screened, some 994 were identified as at risk (615 intervention, 379 control).7

Before random assignment, schools in districts were matched—based on percentage of students eligible for free or reduced-price lunch (a proxy for family income) and overall school achievement (as measured by state assessments averaged across the last three years)—to ensure that schools with similar student achievement and family incomes were compared. One school in each pair was then randomly assigned to the intervention condition and the other to the control condition.

Intervention schools provided tutoring sessions in addition to the regular core math instruction. Control schools followed business as usual, conducting their regular core math curriculum and classroom activities. District leaders agreed that no formal supplemental math programs other than the small group tutoring intervention would be used outside the classroom in study schools. Beginning in December 2008, at-risk students in intervention schools met in groups of two or three students for three or four 40-minute sessions a week, for a total of 48 lessons over approximately 17 weeks of instruction.

In mid-November 2008, before the intervention began, tutors received training from staff experienced in curriculum development and in training others to deliver tutoring services. In the day-long training session tutors received an overview of the program and key components, particularly the need to adhere strictly to the scripted lessons. Training consisted of explicit instruction followed by iterative role play and peer review. Follow-up coaching was provided by the initial trainers in two subsequent meetings in January and February and by phone and email throughout the intervention period. Tutors were licensed elementary school teachers, often retired teachers or substitute teachers from the district. Tutors met with multiple student groups, but group membership and tutor assignments did not change throughout the study. Tutored students did not miss their regular classroom instruction in math, although they did miss instruction in other disciplines.

This study employs a pre-post design. Students were screened before the tutoring began and will be assessed when it ends to determine whether the intervention group outperformed the control group on a measure of math achievement (the Test of Early Mathematics Ability–Third Edition; TEMA-3).

In October and November 2008 all students in grade 1 who were receiving core math instruction in English and whose parents provided signed consent were individually administered a math screening test. This screening test included six subtests taken from four sources:

Students were ranked using the composite score from the six subtests. The lowest performing 35 percent of students were identified as at risk; the rest of the students, who were not identified as such for this study, did not participate further. In the intervention schools students identified as at risk received the small group tutoring intervention beginning in December 2008 and lasting until April and May 2009.
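
The screening rule can be sketched as a simple rank-and-cut procedure. The tie-breaking and rounding behavior below are assumptions; the study's exact cutoff procedure may differ.

```python
def flag_at_risk(composite_scores, fraction=0.35):
    """Rank students by composite screening score and flag the
    lowest-performing `fraction` as at risk.

    `composite_scores` maps student id -> composite score from the
    six subtests (names are illustrative)."""
    ranked = sorted(composite_scores, key=composite_scores.get)
    cutoff = round(len(ranked) * fraction)
    return set(ranked[:cutoff])

scores = {"s1": 12, "s2": 30, "s3": 7, "s4": 25, "s5": 18, "s6": 22}
# Lowest 35 percent of 6 students -> 2 students flagged as at risk.
print(flag_at_risk(scores))
```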

In empirical studies such as this one, the smaller the sample size, the larger the difference on the final assessment between the control and intervention groups must be before it can be attributed to the intervention rather than to chance. In this study 60 schools were targeted for participation; however, to ensure that enough data were collected to detect even smaller impacts of the tutoring, another 14 schools were added. The additional schools provided a wider variety of participating schools regionally, reduced the risk that school attrition would jeopardize the successful completion of the study, and accounted for the possibility that some tutors would not implement the intervention with fidelity. With the original 60 schools, the study would have an 80 percent statistical probability (power) of detecting an effect of 0.27 standard deviation or greater (the minimal detectable effect size), if such an effect exists. Adding 14 schools lowers the minimal detectable effect size to 0.23. The lower the minimal detectable effect size, the more likely the study is to detect the intervention's impact. The sample size of the current study is large enough to allow a smaller effect to be detected than in the Fuchs et al. (2005) study.
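
The relationship between the number of schools and the minimal detectable effect size can be illustrated with a standard approximation for school-randomized designs. Every input below (the intraclass correlation, students per school, the power multiplier, and the omission of matching and covariate adjustment) is an illustrative assumption, not a parameter of the study's actual power analysis, so the outputs will not match the reported 0.27 and 0.23 exactly.

```python
import math

def mdes_cluster(n_schools, students_per_school, icc, multiplier=2.8):
    """Rough minimal detectable effect size for a balanced two-arm,
    school-randomized design. The multiplier of roughly 2.8 corresponds
    to 80 percent power at alpha = .05, two-tailed. Matching and
    covariate adjustment, which shrink the MDES further, are ignored."""
    variance_term = 4 * (icc + (1 - icc) / students_per_school) / n_schools
    return multiplier * math.sqrt(variance_term)

# More schools -> smaller detectable effect, holding other inputs fixed.
print(mdes_cluster(60, 13, icc=0.10))
print(mdes_cluster(74, 13, icc=0.10))
```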

7 The imbalance in sample sizes between the intervention and control groups was unexpected. Although schools were not matched on enrollment, there was an expectation, everything else remaining the same, that approximately equal numbers of at-risk students would be identified in each of the experimental conditions. Consent forms were identical for both intervention and control schools, and parents were not informed of the assignment status of their child's school. School-level random assignment was conducted before the consent process was initiated; therefore, school personnel knew the assignment status of their school. Because students in control schools would not receive tutoring, the study team believes there was less incentive for school staff to ensure consent forms were returned, leading to lower return rates in control schools. Despite the difference in sample size by condition, no differences between the two groups were found on the baseline screening test scores. In addition, logistic regression analyses indicated that school assignment was not a predictor of whether a student was identified as at risk. Both findings suggest that differential consent form return rates by condition did not introduce differences between the groups in terms of mathematics ability before the start of the intervention.

Key outcomes and measures

The primary outcome of interest in this study is math achievement as measured by the TEMA-3, an individually administered assessment that can detect changes in the performance of below grade-level students (Ginsburg and Baroody 2003). The test was normed on a nationally representative sample and is reported by the publisher to be well validated and widely used as a measure of math achievement (Ginsburg and Baroody 2003). The test was administered in spring 2009 to all students participating in the study.

Data collection approach

All data collection occurred over the course of a single academic year (2008/09). Student roster and demographic information were acquired from districts in August and September 2008. Consent forms were distributed by the classroom teachers to students and parents in September and October.

In October and November 2008 all students with parental consent were assessed using the 25-minute, individually administered, six-subtest math screening test. The assessment teams consisted of individuals (the assessors) with bachelor's degrees and experience in schools. Before the screening test was administered, the research team provided the assessment teams with a half-day training session that included peer practice and peer review. The results were reviewed for quality, and feedback was provided to the assessors within 24 hours of administering the screening tests. Assessors were not informed of schools' status (intervention or control).

During the intervention, implementation fidelity was monitored through tutors' audio recordings of all tutoring sessions. The research team will assess 3 of each tutor's 48 sessions using a quality checklist aligned with critical components of the intervention. Tutors were not told which sessions were preselected for review. This information will be used to describe the quality of implementation of the intervention.

Posttests were conducted in April and May, depending on the district calendar. All students were individually administered one math outcome assessment (TEMA-3), as well as the Woodcock-Johnson III Letter-Word subtest, to assess the impact of the small group tutoring intervention on reading achievement, which could have been affected when tutored students missed classroom instruction in areas other than math.

Analysis plan

Each school pair (intervention-control) in this study represents a mini-experiment. After students complete the posttests, a TEMA-3 score is calculated for each school, representing the average performance of at-risk students in that school. For each school pair the performance of at-risk intervention students on the test is compared with that of at-risk control students. An intervention effect is calculated for each of the 38 school pairs. These pair-level effects are then combined, weighting each pair by its sample size, to calculate an overall intervention effect across all schools and districts.
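
The combining step can be sketched as a weighted average of pair-level differences. Weighting by the number of at-risk students per pair is one simple choice for illustration; the actual analysis would typically use a multilevel model (Raudenbush and Bryk 2002), which weights pairs by precision rather than raw counts.

```python
def overall_effect(pair_results):
    """Combine per-pair intervention effects into one overall effect.

    `pair_results` holds one (intervention_mean, control_mean, n_students)
    tuple per school pair, where the means are school-level average
    TEMA-3 scores for at-risk students (illustrative structure)."""
    weighted_sum = sum((mi - mc) * n for mi, mc, n in pair_results)
    total_n = sum(n for _, _, n in pair_results)
    return weighted_sum / total_n

# Three hypothetical school pairs: two favor the intervention, one does not.
pairs = [(52.0, 48.0, 30), (55.0, 51.0, 20), (49.0, 50.0, 10)]
print(overall_effect(pairs))
```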

Russell Gersten, PhD, Director of Research, REL Southwest; Eric Rolfhus, PhD, Senior Scientist, REL Southwest; and Ben Clarke, PhD, Senior Researcher, Instructional Research Group.

Contact information

Eric Rolfhus
Regional Educational Laboratory Southwest
9901 IH-10 West, Suite 1000
San Antonio, TX 78230
(210) 558-4142
erolfhus@edvanceresearch.com

Region: Southwest

References

Baker, S., Gersten, R., and Lee, D. (2002). A synthesis of empirical research on teaching mathematics to low-achieving students. The Elementary School Journal, 103 (1), 51–73.

Burns, M.K., Appleton, J.J., and Stehouwer, J.D. (2005). Meta-analytic review of Responsiveness-to-Intervention research: examining field-based and research-implemented models. Journal of Psychoeducational Assessment, 23, 381–94.

Case, R., Okamoto, Y., Griffin, S., McKeough, A., Bleiker, C., Henderson, B., and Stephenson, K.M. (1996). The role of central conceptual structures in the development of children's thought. Monographs of the Society for Research in Child Development, 61 (1–2), 83–102.

Case, L.P., Speece, D.L., and Molloy, D.E. (2003). The validity of a response-to-instruction paradigm to identify reading disabilities: A longitudinal analysis of individual differences and contextual factors. School Psychology Review, 32 (4), 557–82.

Clarke, B., Baker, S., Chard, D., and Otterstedt, J. (2006). Developing and validating measures of number sense to identify students at risk for mathematics disabilities (Technical Report 0307). Eugene, OR: Pacific Institutes for Research.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Elbaum, B., Vaughn, S., Hughes, M., Moody, S., and Schumm, J. (2000). How reading outcomes of students with disabilities are related to instructional grouping formats: a meta-analytic review. In R. Gersten, E. Schiller, and S. Vaughn (Eds.), Contemporary Special Education Research. Mahwah, NJ: Lawrence Erlbaum Associates.

Fuchs, L.S., Compton, D.L., Fuchs, D., Paulsen, K., Bryant, J.D., and Hamlett, C.L. (2005). The prevention, identification, and cognitive determinants of math difficulty. Journal of Educational Psychology, 97 (3), 493–513.

Fuchs, D., and Fuchs, L.S. (2006). Introduction to response to intervention: what, why, and how valid is it? Reading Research Quarterly, 41 (1), 92–99.

Fuchs, L.S., Hamlett, C.L., and Fuchs, D. (1990). Curriculum-based math computation and concepts/applications. (Available from L. S. Fuchs, 328 Peabody, Vanderbilt University, Nashville, TN 37203).[Unpublished paper].

Geary, D.C. (1993). Mathematical disabilities: cognitive, neuropsychological, and genetic components. Psychological Bulletin, 114 (2), 345–62.

Gersten, R., and Chard, D. (1999). Number sense: Rethinking arithmetic instruction for students with mathematical disabilities. The Journal of Special Education, 33(1), 18–28.

Gersten, R., Chard, D., Jayanthi, M., Baker, S., Morphy, P., and Flojo, J. (2009). A Meta-analysis of Mathematics Instructional Interventions for Students with Learning Disabilities: A Technical Report. Los Alamitos, CA: Instructional Research Group.

Gersten, R., Compton, D., Connor, C.M., Dimino, J., Santoro, L., Linan-Thompson, S., and Tilly, W.D. (2008). Assisting students struggling with reading: response to intervention and multi-tier intervention for reading in the primary grades. A practice guide. (NCEE 2009-4045). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. Retrieved April 10, 2009, from http://ies.ed.gov/ncee/wwc/publications/practiceguides/.

Ginsburg, H.P., and Baroody, A.J. (2003). Test of early mathematics ability–Third edition. Austin, TX: Pro-Ed.

Gonzales, P., Guzmán, J.C., Partelow, L., Pahlke, E., Jocelyn, L., Kastberg, D., and Williams, T. (2004). Highlights from the Trends in International Mathematics and Science Study (TIMSS) 2003 (NCES 2005–005). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Hill, C.J., Bloom, H.S., Black, A.R., and Lipsey, M.W. (2007, July). Empirical benchmarks for interpreting effect sizes in research. MDRC working papers on research methodology. Washington, DC: MDRC.

Jordan, N.C., and Hanich, L.B. (2000). Mathematical thinking in second grade children with different forms of LD. Journal of Learning Disabilities, 33 (6), 567–78.

Jordan, N.C., Kaplan, D., Locuniak, M.N. and Ramineni, C. (2007). Predicting first-grade math achievement from developmental number sense trajectories. Learning Disabilities Research & Practice, 22 (1), 36–46.

Kroesbergen, E.H., and Van Luit, J.E.H. (2003). Mathematics interventions for children with special education needs: A meta-analysis. Remedial and Special Education, 24 (2), 97–114.

Lemke, M., Sen, A., Pahlke, E., Partelow, L., Miller, D., Williams, T., Kastberg, D., and Jocelyn, L. (2004). International outcomes of learning in mathematics literacy and problem solving: PISA 2003. Results from the U.S. perspective. (NCES 2005–003). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Lipsey, M.W., and Wilson, D.B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.

Lou, Y., Abrami, P.C., Spence, J.C., Poulsen, C., Chambers, B., and d'Apollonia, S. (1996). Within-class grouping: A meta-analysis. Review of Educational Research, 66(4), 423–58.

Newman-Gonchar, R., Clarke, B., and Gersten, R. (2009). A summary of nine key studies: Multi-tier intervention and response to interventions for students struggling in mathematics. Portsmouth, NH: RMC Research Corporation, Center on Instruction.

Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage Publications.

Sattler, J.M. (2001). Assessment of children: Cognitive applications. La Mesa, CA: Jerome M. Sattler Publisher.

Shadish, W.R., Cook, T.D., and Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.

Springer, L., Stanne, M.E., and Donovan, S.S. (1999). Effects of small-group learning on undergraduates in science, mathematics, engineering, and technology: a meta-analysis. Review of Educational Research, 69 (1), 21–51.

U.S. Department of Education. (2007). National Assessment of Educational Progress at grades 4 and 8 (NCES 2007–494). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

U.S. Department of Education. (2008a). Reading First. Washington, DC: U.S. Department of Education. Retrieved November 1, 2008, from http://www.ed.gov/programs/readingfirst/index.html.

U.S. Department of Education. (2008b). Foundations for success: report of the National Mathematics Advisory Panel. Washington, DC: U.S. Department of Education. Retrieved November 1, 2008, from http://www.ed.gov/about/bdscomm/list/mathpanel/reports.html.

U.S. Department of Education. (2008c). Building the legacy: IDEA 2004. Washington, DC: U.S. Department of Education. Retrieved November 1, 2008, from http://idea.ed.gov/explore/home.

Vaughn, S., Linan-Thompson, S., and Hickman-Davis, P. (2003). Response to treatment as a means of identifying students with reading/learning disabilities. Exceptional Children, 69 (4), 391–409.

Wanzek, J., and Vaughn, S. (2007). Research-based implications from extensive early reading interventions. School Psychology Review, 36, 541–61.

Woodcock, R.W., McGrew, K.S., and Mather, N. (2001). Woodcock-Johnson–III. Rolling Meadows, IL: Riverside Publishing Company.
