Project Activities
This project will advance education research and practice by (1) applying machine learning to text-based course descriptions to classify course content systematically and (2) developing software that education researchers and practitioners can use to apply the project's classification algorithms to course data at very large scale. The classification approach and the corresponding open-source college course mapping tool will open up new possibilities for applied education research on college course-taking and student success.
Structured Abstract
Research design and methods
The project team will apply several hierarchical classification approaches to text-as-data, including both supervised machine learning and generative AI. To train the algorithms, the project will use the human-classified course data from several nationally representative NCES longitudinal studies that are included in the Postsecondary Education Transcript Studies dataset.
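As a rough illustration of what hierarchical classification of course text can look like, the sketch below trains one model to pick a broad 2-digit category and a separate model per category to pick the finer code. It is only a toy under stated assumptions: the naive Bayes classifier is a stand-in for whatever supervised models the project adopts, the generative AI approach is not shown, and the course descriptions, the 4-digit codes, and all function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def fit_hierarchy(texts, codes):
    """Train a top-level model on 2-digit prefixes and one child model per prefix."""
    top = NaiveBayes().fit(texts, [code[:2] for code in codes])
    groups = defaultdict(lambda: ([], []))
    for text, code in zip(texts, codes):
        groups[code[:2]][0].append(text)
        groups[code[:2]][1].append(code)
    children = {p: NaiveBayes().fit(t, c) for p, (t, c) in groups.items()}
    return top, children


def predict_code(model, text):
    """Top-down prediction: choose the 2-digit prefix, then a code within it."""
    top, children = model
    prefix = top.predict(text)
    return children[prefix].predict(text)


# Invented training records: (course description, made-up 4-digit code).
train = [
    ("introduction to calculus limits derivatives", "2701"),
    ("linear algebra matrices vectors", "2702"),
    ("american literature novels poetry", "2301"),
    ("creative writing fiction workshop", "2302"),
]
model = fit_hierarchy([t for t, _ in train], [c for _, c in train])
print(predict_code(model, "calculus derivatives and integrals"))
```

Predicting top-down keeps each child model focused on distinguishing courses within one broad category, at the cost of propagating any mistake made at the top level.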
User Testing: Users will be recruited early in the project period to pilot the software, provide feedback on its usability, and validate their newly classified data. This input will be used to further refine the algorithm. A wider set of user-testers will also be convened toward the end of the project.
Use in Applied Education Research: The software will be useful in any education research that uses postsecondary course-level data. Such applications are numerous, including studies of disparities in course-taking, transfer students, bottleneck and gateway courses, and the long-term consequences of college curriculum.
People and institutions involved
IES program contact(s)
Products and publications
The team will publish an open-source software package that assigns consistent College Course Map (CCM) codes to individual course records. End users will provide a dataset (in CSV form) containing course features, and the tool will return CCM codes for the same course records at the 2-digit, 4-digit, and 6-digit levels (where appropriate), along with estimated confidence levels. The tool will be distributed as a freely available package in R and Python, with wrappers facilitating use from other statistical products. Anyone can download it and run it on their own institutional data, and individual institutions can integrate it into their workflows as they see fit. The tool will be well documented, include example data, and be a reproducible artifact that lasts beyond the end of the grant. It will be disseminated through various platforms and promoted at professional conferences, through professional associations, and through social media.
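The input and output described above might look something like the following sketch, which reads course records from a CSV file and appends codes and confidence values at each level. Everything specific here is an assumption made for illustration: the column names, the representation of a code as a plain 6-digit string, the `predict` callback, and the reading of "where appropriate" as a confidence threshold below which finer levels are left blank. The actual package interface has not been published.

```python
import csv
import io


def ccm_levels(code6, confidences, threshold=0.5):
    """Split a 6-digit code into 2-, 4-, and 6-digit levels.

    A level is reported only if its confidence and that of every coarser
    level meet the threshold; otherwise it is left blank (an assumption).
    """
    if len(code6) != 6 or not code6.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code6!r}")
    row = {}
    keep = True
    for digits in (2, 4, 6):
        conf = confidences[digits]
        keep = keep and conf >= threshold
        row[f"ccm{digits}"] = code6[:digits] if keep else ""
        row[f"conf{digits}"] = f"{conf:.2f}"
    return row


def classify_file(in_file, out_file, predict):
    """Copy each input record to the output with code and confidence columns added."""
    reader = csv.DictReader(in_file)
    extra = ["ccm2", "conf2", "ccm4", "conf4", "ccm6", "conf6"]
    writer = csv.DictWriter(out_file, fieldnames=list(reader.fieldnames) + extra)
    writer.writeheader()
    for record in reader:
        code6, confidences = predict(record)
        record.update(ccm_levels(code6, confidences))
        writer.writerow(record)


def fake_predict(record):
    """Hypothetical stand-in for a trained classifier; returns a made-up code."""
    return "270101", {2: 0.9, 4: 0.7, 6: 0.3}


source = io.StringIO("course_id,title\nM101,Calculus I\n")
result = io.StringIO()
classify_file(source, result, fake_predict)
print(result.getvalue())
```

Returning the original columns alongside the new ones lets users join the coded records back to their own data without a separate key file.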
Publications:
ERIC Citations: Find available citations in ERIC for this award here.
Supplemental information
Co-Principal Investigators: Flaster, Allyson; Jurgens, David