NCEE Blog

National Center for Education Evaluation and Regional Assistance

Why We Can Still Learn from Evaluations that Find No Effects

By Thomas Wei, Evaluation Team Leader

As researchers, we take little pleasure when the programs and policies we study do not show positive effects on student outcomes. But as J.K. Rowling, author of the Harry Potter series, put it: there are “fringe benefits of failure.” In education research, studies that find no effects can still reveal important lessons and inspire new ideas that drive scientific progress.

On November 2, the Institute of Education Sciences (IES) released a new brief synthesizing three recent large-scale random assignment studies of teacher professional development (PD). As a nation we invest billions of dollars in PD every year, so it is important to assess the impact of those dollars on teaching and learning. These studies are part of an evaluation agenda that IES has developed to advance understanding of how to help teachers improve.

One of the studies focused on second-grade reading teachers, one on fourth-grade math teachers, and one on seventh-grade math teachers. The PD programs in each study emphasized building teachers’ knowledge of content or content-specific pedagogy. The programs combined summer institutes with teacher meetings and coaching during the school year. These programs were compared to the substantially less intensive PD that teachers typically received in study districts.

All three studies found that the PD did not have positive impacts on student achievement. Disappointing? Certainly. But have we at least learned something useful? Absolutely.

For example, the studies found that the PD did have positive impacts on teachers’ knowledge and some instructional practices. This tells us that intensive summer institutes with periodic meetings and coaching during the school year may be a promising format for this kind of professional development. (See the graphic above and chart below, which are from a snapshot of one of the studies.)

But why didn’t the improved knowledge and practice translate to improved student achievement? Educators and researchers have long argued that effective teachers need to have strong knowledge of the content they teach and know how best to convey the content to their students. The basic logic behind the content-focused PD we studied is to boost both of these skills, which were expected to translate to better student outcomes. But the findings suggest that this translation is actually very complex. For example, at what level do teachers need to know their subjects? Does a third-grade math teacher need to be a mathematician, just really good at third-grade math, or somewhere in between? What knowledge and practices are most important for conveying the content to students? How do we structure PD to ensure that teachers can master the knowledge and practice they need?

When we looked at correlational data from these three studies, we consistently found that most of the measured aspects of teachers’ knowledge and practice were not strongly related to student achievement. This reinforces the idea that we may need to find formats for PD that boost knowledge and practice to an even greater degree, or we may need to focus PD on other aspects of knowledge and practice that are more strongly related to student achievement. This is a critical lesson that we hope will inspire researchers, developers, and providers to think more carefully about the logic and design of content-focused PD.

Scientific progress is often incremental and painstaking. It requires continual testing and re-testing of interventions, which sometimes will not have impacts on the outcomes we care most about. But if we are willing to step back and try to connect the dots from various studies, we can still learn a great deal that will help drive progress forward.


Why We Need Large-Scale Evaluations in Education

By Elizabeth Warner, Team Leader for Teacher Evaluations, NCEE

These days, people want answers to their questions faster than ever, and education research and evaluation are no exception. In an ideal world, evaluating the impact of a program or policy would be done quickly and at minimal expense. There is increasing interest in quicker-turnaround, low-cost studies – and IES offers grants specifically for these types of evaluations.

But in education research, quicker isn’t always better. It depends, in part, on the nature of the program you want to study and what you want to learn.

Consider the case of complex, multi-faceted education programs. By their very nature, complex programs may require several years for all of the pieces to be fully implemented. Some programs also may take time to influence behavior and desired outcomes. A careful and thorough assessment of these programs may require an evaluation that draws on a large amount of data, often from multiple sources over an extended period of time. Though such studies can require substantial resources, they are important for understanding whether and why an investment had an intended effect. This is especially true for Federal programs that involve millions of dollars of taxpayer money.

On August 24, IES plans to release a new report from a large-scale, multi-year study of a complex Federal initiative: the Teacher Incentive Fund (TIF).[1] This report – the third from this evaluation – will provide estimated program impacts on student achievement after three years of implementation. The Impact Evaluation of the Teacher Incentive Fund is a $13.7 million study that spans more than six years and has reported interim findings annually since 2014. (The graphic above is from a study snapshot of the first TIF evaluation report.)

A “big study” of TIF was important to do, and here’s why.

Learning from the Teacher Incentive Fund

TIF is a Federal program that provides grants to districts that want to implement performance pay with the goal of improving teacher quality.  TIF grants awarded in 2010 included the requirement that performance pay be based on an educator evaluation system with multiple performance measures consistent with recent research. The grantees were also expected to use the performance measures to guide educator improvement.

The TIF evaluation provides an opportunity not only to learn about impacts on student achievement over time as the grant activities mature but also to get good answers to implementation questions such as:

  • How do districts structure the pay-for-performance bonus component of TIF?
  • Are educators even aware of the TIF pay-for-performance bonuses?
  • Do educators report having opportunities for professional development to learn about the measures and to improve their performance?
  • Do educators change their practice in ways that improve their performance measures? 
  • Are principals able to use pay-for-performance bonuses to hire or retain more effective teachers?

Analyses to address these questions can suggest avenues for program improvement that a small-scale impact evaluation with limited data collection would miss. 

Studying the Initiative Over Time

With four years of data collection, the TIF evaluation is longer than most evaluations conducted by IES. But the characteristics of the TIF program made that extensive data collection important for a number of reasons.


First, TIF’s approach to educator compensation differs enough from the traditional pay structure that it might take time for educators to fully comprehend how it works. They likely need to experience the new performance measures to see how they might score and how that translates into additional money earned. Second, educators might need to receive a performance bonus, or see others receive one, in order to fully believe it is possible to earn a bonus; that experience also might help them better understand what behaviors are needed to earn one. Finally, time might be needed for educators to respond to all of the intended policy levers of the program, particularly those related to recruitment and hiring. (The chart pictured here is from the second report and compares teachers’ understanding of the maximum bonuses they could receive with the actual size of those bonuses. This chart will be updated in the third report.)

The TIF evaluation is designed to estimate an impact over the full length of the grants as well as provide rich information to improve the program. Sometimes, even after a number of years, some aspects of a program are never fully implemented. Thus, it may take time to see whether it is even possible to implement a complex policy like TIF with fidelity. An evaluation that only looks at initial implementation of a complex program may miss important program components that are incorporated or refined with time. Also, it may take several years to determine whether the intended educator behaviors and desired outcomes of the policy are realized.

Learning how and why a policy does or does not work is central to program improvement. For large, complex programs like TIF, this assessment is only possible with a data-rich study conducted over an extended period of time.


[1] Under the newly reauthorized Elementary and Secondary Education Act, this program is now called the Teacher and School Leader Incentive Program, but for this blog, we will call it by its original name.

Sustaining School Improvement

By Thomas Wei, Evaluation Team Leader, NCEE

NOTE: In an effort to turn around the nation’s chronically low-performing schools, the Department of Education injected more than $6 billion into the federal School Improvement Grants (SIG) program over the past several years. SIG schools received a lot of money for a short period of time—up to $6 million over three years—to implement a number of prescribed improvement practices.

What is the prognosis for low-performing schools now that many federal School Improvement Grants (SIG) are winding down? This is an important question that the National Center for Education Evaluation and Regional Assistance (NCEE) addressed through its Study of School Turnaround.

The second and final report from this study was released on April 14 and describes the experiences of 12 low-performing schools as they implemented SIG from 2010 to 2013 (Read a blog post on the first report). Findings are based on analyses of teacher surveys and numerous interviews with other school stakeholders, such as district administrators, principals, assistant principals, members of the school improvement team, instructional coaches, and parents.

After three years of trying a diverse array of improvement activities, ranging from replacing teachers to extending learning time to installing behavioral support systems, most of the 12 schools felt they had changed in primarily positive ways (see chart below from the report).

The report also found that schools with lower organizational capacity in the first year of SIG appeared to boost their capacity by the final year of SIG. At the same time, schools with higher capacity appeared generally able to maintain that capacity.

Many experts believe that organizational capacity is an important indicator of whether a low-performing school can improve (see chart below showing that schools with higher organizational capacity also appeared more likely to sustain improvements). Organizational capacity is indicated by, for example, how strong a leader the principal is, how consistent school policies are with school goals, how much school leaders and staff share clear goals, how much collaboration and trust there is among teachers, and how safe and orderly the school climate is.

Despite these promising results, the report found that the overall prospects for sustaining any improvements appeared to be fragile in most of these 12 schools. The report identified four major risk factors, including (1) anticipated turnover or loss of staff; (2) leadership instability; (3) lack of district support, particularly with regard to retaining principals and teachers; and (4) loss of specific interventions such as professional learning or extended day programs. Most of the case study schools had at least one of these major risk factors, and a number of schools had multiple risk factors.

It is important to note that this study cannot draw any causal conclusions and that it is based on surveys and interviews at a small number of schools that do not necessarily reflect the experiences of all low-performing schools. Still, it raises interesting questions for policymakers as they consider how best to deploy limited public resources in support of future school improvement efforts that will hopefully be long-lasting.

NCEE has a larger-scale study of SIG underway that is using rigorous methods to estimate the impact of SIG on student outcomes. The findings from the case studies report released last week may yield important contextual insights for interpreting the overall impact findings. These impact findings are due out later this year, so stay tuned.

How to Help Low-performing Schools Improve

By Thomas Wei, Evaluation Team Leader

NOTE: Since 2009, the Department of Education has invested more than $6 billion in School Improvement Grants (SIG). SIG provided funds to the nation’s persistently lowest-achieving schools to implement one of four improvement models. Each model prescribed a set of practices, for example: replacing the principal, replacing at least 50 percent of teachers, increasing learning time, instituting data-driven instruction, and using “value-added” teacher evaluations.

Other than outcomes, how similar are our nation’s low-performing schools? The answers to this question could have important implications for how best to improve these, and other, schools. If schools share similar contexts, it may be more sensible to prescribe similar improvement practices than if they have very different contexts.

This is one of the central questions the National Center for Education Evaluation and Regional Assistance is exploring through its Study of School Turnaround. The first report (released in May 2014) described the experiences of 25 case study schools in 2010-2011, which was their first year implementing federal School Improvement Grants (SIG).

The report found that even though the 25 SIG schools all struggled with a history of low performance, they were actually quite different in their community and fiscal contexts, their reform histories, and the root causes of their performance problems. Some schools were situated in what the study termed “traumatic” contexts, with high crime, incarceration, abuse, and severe urban poverty. Other schools were situated in comparatively “benign” contexts with high poverty but limited crime, homes in good repair, and little family instability. All schools reported facing challenges with funding and resources, but some felt it was a major barrier to improvement while others felt it was merely a nuisance. Some schools felt their problems were driven by student behavior, others by poor instruction or teacher quality, and still others by the school’s external context such as crime or poverty.

Given how diverse low-performing schools appear to be, it is worth wondering whether they need an equally diverse slate of strategies to improve. Indeed, the report found that the 25 case study schools varied in their improvement actions even with the prescriptive nature of the SIG models (see the chart above, showing school improvement actions used by sample schools).

It is important to note that this study cannot draw any causal conclusions and that it is based on a small number of schools that do not necessarily reflect the experiences of all low-performing schools. Still, policymakers may wish to keep this finding in mind as they consider how to structure future school improvement efforts.

The first report also found that all but one of the 25 case study schools felt they made improvements in at least some areas after the first year of implementing SIG. Among the issues studied in the second report, released April 14, 2016, is whether these schools were able to build on their improvements in the second and third year of the grant. Read a blog post on the second report.

UPDATED APRIL 18 to reflect release of second report.