Twenty-some odd years ago as a college junior, I screamed in horror watching a friend open a running dishwasher. She wanted to slip in a lightly used fork. I jumped to stop her, yelling “don’t open it, can’t you tell it’s full of water?” She paused briefly, turning to look at me with a “have you lost your mind” grimace, and yanked open the door.
Much to my surprise, nothing happened. A puff of steam. An errant drip, perhaps? But no cascade of soapy water. She slid the fork into the basket, closed the door, and hit a button. The machine started back up with a gurgle, and the kitchen floor was none the wetter.
Until that point in my life, I had no idea how a dishwasher worked. I had been around a dishwasher, but the house I lived in growing up didn’t have one. To me, washing the dishes meant filling the sink with soapy water, something akin to a washer in a laundry. I assumed dishwashers worked on the same principle, using gallons of water to slosh the dishes clean. Who knew?
Lest you think me completely inept, a counterpoint. My first car was a 1979 Ford Mustang. And I quickly learned how that very used car worked when the Mustang’s automatic choke conked out. As it happens, although a choke is necessary to start and run a gasoline engine, that it be “automatic” is not. My father Rube Goldberg-ed up a manual choke in about 15 minutes rather than paying to have it fixed.
My 14-year-old self learned how to tweak that choke “just so” so that I could get to school each morning. First, pull the choke all the way out to start the car, adjusting the fuel-air mixture ever so slightly. Then gingerly slide it back in, micron by micron, as the car warms up and you hit the road. A car doesn’t actually run on liquid gasoline, you see. Cars run on fuel vapor. And before the advent of fuel injection, fuel vapor was courtesy your carburetor and its choke. Not a soul alive who didn’t know how a manual choke worked could have started that car.
You would be forgiven if, by now, you were wondering where I am going with all of this and how it relates to the evaluation of education interventions. To that end, I offer three thoughts for your consideration:
- Knowing that something works is different from knowing how something works.
- Knowing how something works is necessary to put that something to its best use.
- Most education research ignores the how of interventions, dramatically diminishing the usefulness of research to practitioners.
My first argument, that there is a distinction between knowing what works and how something works, is straightforward. Since it began, the What Works Clearinghouse™ has focused on identifying “what works” for educators and other stakeholders, mounting a full-court press on behalf of internal validity. Taken together, Version 4.1 of the WWC Standards and Procedures Handbooks total some 192 pages. As a result, we have substantially greater confidence today than we did a decade ago that when an intervention developer or researcher reports that something worked for a particular group of students, it actually did.
In contrast, WWC standards do not, and as far as I can tell have not ever, addressed the how of an intervention. By “the how” of an intervention, I’m referring to the parts of it that must be working, sometimes “just so,” if its efficacy claims are to be realized. For a dishwasher, it is something like: “a motor turns a wash arm, which sprays dishes with soapy water.” (It is not, as I had thought, “the dishwasher fills with soapy water that washes the mac and cheese down the drain.”) In the case of my Mustang, it was: “the choke controls the amount of air that mixes with fuel from the throttle, before heading to the cylinders.”
If you have been following the evolution of IES’ Standards for Excellence in Education Research, or SEER, and its principles, you recognize “the how” as core components. Most interventions consist of multiple core components that are, and perhaps must be, arrayed in a certain manner if the whole of the thing is to “work.” Depicted visually, core components and their relationships to one another and to the outcomes they are meant to affect form something between a logic model (often too simplistic) and a theory of change (often too complex).
(A word of caution: knowing how something works is also different from knowing why something works. I have been known to ask at work about “what’s in the arrows” that connect various boxes in a logic model. The why lives in those arrows. In the social sciences, those arrows are where theory resides.)
My second argument is that knowing how something works matters, at least if you want to use it as effectively as possible. This isn’t quite as axiomatic as the distinction between “it works” and “how it works,” I realize.
This morning, when starting my car, I didn’t have to think about the complex series of events leading up to me pulling out of the driveway. Key turn, foot down, car go. But when the key turns and the car doesn’t go, then knowing something about how the parts of a car are meant to work together is very, very helpful. Conveniently, most things in our lives, if they work at all, simply do.
Inconveniently, we don’t have that same confidence when it comes to things in education. There are currently 10,677 individual studies in the What Works Clearinghouse (WWC) database. Of those, only about 11 percent meet the WWC’s internal validity standards. Among them, only 445 have at least one statistically significant positive finding. Because the WWC doesn’t consider results from studies that don’t have strong internal validity, it isn’t quite as simple as saying “only about 4 percent of things work in education.” Instead, we’re left with “89 percent of things aren’t tested rigorously enough to have confidence about whether they work, and when tested rigorously, only about 38 percent do.” Between the “file drawer” problem that plagues research generally and our own review of the results from IES efficacy trials, we have reason to believe the true efficacy rate of “what works” in education is much lower.
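For readers who want to check the percentages in the paragraph above, here is a minimal sketch of the arithmetic in Python. It uses only the figures quoted in the post. The variable names are illustrative, and because the post says "about 11 percent," the count of studies meeting standards is an approximation rather than an official WWC figure.

```python
# Figures quoted in the post. The 11 percent share is approximate,
# so the derived count of rigorous studies is an estimate.
total_studies = 10_677
share_meeting_standards = 0.11
studies_with_positive_finding = 445

# Estimated number of studies meeting WWC internal validity standards.
studies_meeting_standards = total_studies * share_meeting_standards

# Share of studies not tested rigorously enough to judge.
share_not_meeting = 1 - share_meeting_standards

# Share with a positive finding, among rigorous studies and among all studies.
positive_rate_among_rigorous = studies_with_positive_finding / studies_meeting_standards
positive_rate_overall = studies_with_positive_finding / total_studies

print(f"Not meeting standards: {share_not_meeting:.0%}")
print(f"Positive among rigorous studies: {positive_rate_among_rigorous:.0%}")
print(f"Positive among all studies: {positive_rate_overall:.0%}")
```

Rounded to whole percentages, the three results are 89, 38, and 4, matching the figures in the text. The gap between the last two numbers is the point: the 4 percent figure treats every unreviewed or low-rigor study as a failure, which the evidence does not support.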
Many things cause an intervention to fail. Some interventions are simply wrong-headed. Some interventions do work, but for only some students. And other interventions would work, if only they were implemented well.
Knowing an intervention’s core components and the relationships among them would, I submit, be helpful in at least that third case. If you don’t know that a dishwasher’s wash arm spins, the large skillet on the bottom rack with its handle jutting to the sky might not strike you as the proximate cause of dirty glasses on the top rack. If you don’t know that a core component of multi-tiered systems of support is progress monitoring, you might not connect the dots between a decision to cut back on periodic student assessments and suboptimal student outcomes.
My third and final argument, that most education research ignores the how of interventions, is based in at least some empiricism. The argument itself is a bit of a journey. One that starts with a caveat, wends its way to dismay, and ends in disappointment.
Here’s the caveat: My take on the relative lack of how in most education research comes from my recent experience trying to surface “what works” in remote learning. This specific segment of education research may well be an outlier. But I somehow doubt it.
Why dismay? Well, as regular readers might recall, in late March I announced plans to support a rapid evidence synthesis on effective practices in remote learning. It seemed simple enough: crowd-source research relevant to the task, conduct WWC reviews of the highest-quality submissions, and then make those reviews available to meta-analysts and other researchers to surface generalizable principles that could be useful to educators and families.
My stated goal had been to release study reviews on June 1. That date has passed, and the focus of this post is not “New WWC Reviews of Remote Learning Released.” As such, you may have gathered that something about my plan has gone awry. You would be right.
Simply, things are taking longer than hoped. It is not for lack of effort. Our teams identified more than 930 studies, screened more than 700 of those studies, and surfaced 250 randomized trials or quasi-experiments. We have prioritized 35 of this last group for review. (For those of you who are thinking some version of “wow, it seems like it might be a waste to not look at 96 percent of the studies that were originally located,” I have some thoughts about that. We’ll have to save that discussion, though, for another blog.)
Our best guess for when those reviews will be widely available is now August 15. Why things are taking as long as they are is, as they say, “complicated.” The June 1 date was unlikely from the start, dependent as it was upon a series of best-case situations in times that are anything but. And at least some of the delay is driven by our emphasis on rigor and steps we take to ensure the quality of our work, something we would not short-change in any event.
Not giving in to my dismay, however, I dug into the 930 studies in our remote learning database to see what I might be able to learn in the meantime. I found that 22 of those studies had already been reviewed by the WWC. “Good news,” I said to myself. “There are lessons to be learned among them, I’m sure.”
And indeed, there was a lesson to be learned, just not the one I was looking for. After reviewing the lot, I found virtually no actionable evidence. That’s not entirely fair. One of the 22 records was a duplicate, two were not relevant, two were not locatable, and one was behind a paywall that even my federal government IP address couldn’t get behind. Because fifteen of the sixteen remaining studies reviewed name-brand products, there was one action I could take in most cases: buy the product the researcher had evaluated.
I went through each article, this time making an imperfect determination about whether the researcher described the intervention’s core components and, if so, arrayed them in a logic model. My codes for core components included one “yes,” two “bordering on yes,” six “yes-ish,” one “not really,” and six “no.” Not surprisingly, logic models were uncommon, with two studies earning a “yes” and two more tallied as “yes-ish.” (You can see now why I am not a qualitative researcher.)
In case there’s any doubt, herein lies my disappointment: if an educator had turned to one of these articles to eke out a tip or two about “what works” in remote learning, they would have been, on average, out of luck. If they did luck out and find an article that described the core components of the tested intervention, there was a vanishingly small chance there would be information on how to put those components together to form a whole. As for surfacing generalizable principles for educators and families across multiple studies? Not without some serious effort, I can assure you.
I have never been more convinced of the importance of core components being well-documented in education research than I am today. As they currently stand, the SEER principles for core components ask:
- Did the researcher document the core components of an intervention, including its essential practices, structural elements, and the contexts in which it was implemented and tested?
- Did the researcher offer a clear description of how the core components of an intervention are hypothesized to affect outcomes?
- Did the researcher's analysis help us understand which components are most important in achieving impact?
More often than not, the singular answer to the questions above is a resounding “no.” That is to the detriment of consumers of research, no doubt. Educators, or even other researchers, cannot turn to the average journal article or research report and divine enough information about what was actually studied to draw lessons for classroom practice. (There are many reasons for this, of course. I welcome your thoughts on the matter.) More importantly, though, it is to the detriment of the supposed beneficiaries of research: our students. We must do better. If our work isn’t ultimately serving them, who is it serving, really?
Commissioner, National Center for Education Evaluation and Regional Assistance
Agency Evaluation Officer, U.S. Department of Education