Earlier this month, I celebrated my fifth anniversary as director of IES, leaving just one year left in my term of office. I could look backward to highlight some of what I think are the biggest accomplishments of those years, but I am not much for retrospection. Rather, I am thinking about priorities for the remaining year.
There are many versions of the wisdom that no one should expect their priorities to survive an encounter with reality (my current favorite is Mike Tyson's: "Everyone has a plan until they get punched in the mouth"). And given the speed of government, a year is not a lot of time. But here are the three items I plan to focus on.
NCADE
My highest priority is establishing ARPA-like activities in IES. I have written before about how IES is charged by Congress to fund high-risk, high-reward R&D projects that will have a game-changing impact on educational outcomes in the United States. This will be achieved by creating a DARPA-like entity focused on education, using rapid-cycle, transformative, and more inclusive applied research.
Ultimately, we hope that the New Essential Education Discoveries (NEED) Act passes this Congress, allowing us to situate those functions within a new IES center, the National Center for Advanced Development in Education (NCADE). Alternatively, Congress could create NCADE through ESRA reauthorization, which may finally have some movement. Even without such legislation, we received $30 million in the Omnibus funding bill that passed at the end of last year to move IES along the path toward developing ARPA-like functions. The President's budget request for the next fiscal year includes another $45 million for the same purpose.
We have been talking with many stakeholders who have experience with DARPA, ARPA-E, and ARPA-H to help us develop our plans. We are also moving to create a "transformation team" within NCER to consolidate some of our most innovative work, laying a foundation for NCADE, and we are beginning to develop and execute a strategic plan for transformative, high-risk/high-reward work.
Further, we are determining how to construct a portfolio that balances short- versus longer-term investments and includes a mix of projects with various risk profiles. One path we are likely to follow involves launching a set of "seedling" projects. Using this approach, we would launch multiple small projects (maybe a dozen) at perhaps $200k each. These seedlings would have a fixed period (less than a year) to achieve a set of agreed-upon goals; the most successful would be eligible for future rounds of funding (a practice similar to our SBIR program). We are considering how best to incorporate a focus on scaling in these later rounds of funding, perhaps modeled on ARPA-E's SCALEUP program.
SLDS v2
When I was commissioner of NCES in the mid-2000s, Congress authorized the state longitudinal data system (SLDS) grant program. Since then, the nation has invested almost $1 billion to build out the current SLDS, with about half of the money appropriated during the fiscal crisis that began in 2008. In short, most of the money was spent over a dozen years ago, and the program has been funded at somewhere around $35 million per year since the ARRA money ran out. Given the speed of technological advances, we are talking about an aging system.
A vision of what SLDS v2 should look like is taking shape.
Like NCADE, I hope to see SLDS v2 included in either the NEED Act or in ESRA reauthorization.
Prize Competitions
In May, we will announce the results of the Digital Learning XPRIZE. This was by far our most successful foray into large-scale prizes. In contrast, we had to cancel a middle-school science prize (much to my regret, given the abysmal performance of American students on NAEP's science assessment) and we had a very difficult time launching our upper elementary school math challenge for students with disabilities.
We have had other successful competitions that were smaller in scale, such as the NAEP Automated Scoring Reading Challenge. In this challenge, six teams were able to successfully predict human scores for open-ended reading items in NAEP. While automated scoring for writing and reading items has been established for over a decade, this challenge made several notable advances. Most notably, every team with a winning entry used large language models (LLMs), the same underlying algorithmic approach used by ChatGPT. No entry using conventional, separately engineered features came close to performing as well.
The winning entries were so close in accuracy that we decided to make three first-place awards. The participants came from diverse backgrounds, including assessment services organizations, researchers, and even an undergraduate (and recent graduate) student team. (Read more here: Successful NLP Approaches to Automated Scoring of NAEP's Reading Assessment.)
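For readers curious what an LLM-based scoring approach looks like in practice, here is a minimal sketch of fine-tuning a pretrained transformer to predict human-assigned scores for open-ended responses. This is not the winners' actual code; the model name, score scale, and example data are illustrative assumptions.

```python
# A minimal sketch, assuming a Hugging Face transformers workflow; the model
# name, score scale, and example data are illustrative, not the challenge's.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical student responses with human-assigned scores on a 0-3 scale.
responses = ["The author argues that recycling saves energy because ...",
             "It got hotter so the ice melted faster ..."]
human_scores = [2, 1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)  # one class per score point

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

train_ds = Dataset.from_dict({"text": responses, "label": human_scores})
train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="naep_scoring_sketch",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
)
trainer.train()  # the fine-tuned model then predicts scores for new responses
```

In practice, teams would train on many thousands of human-scored responses and judge success by how closely the model's predictions agree with human raters on held-out items.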
Building on this success, we recently launched a NAEP Math Automated Scoring Challenge to predict the scores for open-ended math items, which are more challenging for computers and, ironically, easier for humans. But we have roughly 250,000 student responses to 10 NAEP questions, so there is a lot of data for sophisticated modeling approaches. We are working on releasing both the reading and math datasets under restricted-use data licenses for broader research use and are investigating other NAEP datasets for Writing and Civics that hold promise.
A difficult problem with LLMs in education (and beyond) is their lack of transparency, along with potential fairness and bias issues. In the Reading Challenge, we required all participants to submit a Technical Report in which they described the algorithms used and their training results, including analyses to ensure that no bias was present. To me, this openness on the part of researchers and commercial providers demonstrates the potential for future research into applied solutions.
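As an illustration of what such a bias analysis might involve (an assumed workflow, not the challenge's required procedure), one common check is whether machine scores agree with human scores equally well across student subgroups, for example using quadratic weighted kappa:

```python
# Illustrative sketch: compare human-machine agreement across subgroups.
# The data and grouping variable are hypothetical.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

scored = pd.DataFrame({
    "human_score":   [2, 1, 3, 0, 2, 1, 3, 2],
    "machine_score": [2, 1, 2, 0, 2, 2, 3, 2],
    "group":         ["A", "A", "A", "A", "B", "B", "B", "B"],
})

for group, sub in scored.groupby("group"):
    qwk = cohen_kappa_score(sub["human_score"], sub["machine_score"],
                            weights="quadratic")
    print(f"group {group}: quadratic weighted kappa = {qwk:.2f}")

# Large gaps in agreement between groups would flag potential scoring bias
# worth investigating in the model and its training data.
```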
Our experience also demonstrates one feature that makes for successful prize competitions: it's usually not the prize money that matters; rather, it's access to large data sets that attracts entrants. This is rational on the part of potential applicants, since only a few will win a competition, but all gain access to high-quality data. This simple "rule" makes it imperative that NAEP and other large-scale assessments open their data archives as much and as fast as possible. It also reinforces one of the principles I noted above regarding the large data sets collected through SLDS: we need to make them available!
So Not Goodbye—not yet at least.
Clearly, there are miles to go before I leave IES. And I hope to cover as many of those miles as possible over the next year.
As always, feel free to contact me: mark.schneider@ed.gov