Grant Open

Methods and Software to Classify College Courses at Scale

NCER
Program: Statistical and Research Methodology in Education
Program topic(s): Core
Award amount: $899,226
Principal investigator: Kevin Stange
Awardee: University of Michigan
Year: 2024
Award period: 3 years (09/01/2024 - 08/31/2027)
Project type: Methodological Innovation
Award number: R305D240029

Purpose

Research-ready (de-identified, standardized, and documented) data on the courses students take in college, and on what they learn in them, have historically been difficult to obtain at scale. Research on student course-taking has been hampered by inconsistent course titles and numbers across institutions, and by the sheer amount of manual effort required to classify thousands of unique courses within a single institution. The 'big data revolution,' however, brings automated pattern-recognition methods that we can use to solve this problem.

Project Activities

This project will advance education research and practice by 1) applying machine learning approaches to text-based course descriptions to systematically classify course content and 2) developing software that education researchers and practitioners can use to apply our classification algorithms to course data at a very large scale. The classification approach and the corresponding open-source college course mapping tool will open up new possibilities for applied education research on college course-taking and student success.

Structured Abstract

Research design and methods

The project team will apply several hierarchical classification approaches to course text treated as data, including both supervised machine learning and generative AI. To train the classifiers, the project will use human-classified course data drawn from several nationally representative NCES longitudinal studies and included in the Postsecondary Education Transcript Studies dataset.
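The top-down idea behind hierarchical classification can be illustrated with a small, standard-library-only Python sketch. It is not the project's method: it swaps in a plain multinomial naive Bayes model (one per parent node in the code hierarchy) where the project may use other supervised learners or generative AI, and the course titles and 6-digit codes below are made up (they are not real CCM codes). A root model picks the 2-digit area, a model trained only on that area picks the 4-digit code, and a third model picks the 6-digit code; the confidence reported at each level is the product of the chosen branch probabilities so far.

```python
import math
import re
from collections import Counter, defaultdict

LEVELS = (2, 4, 6)  # code prefix lengths for the 2-, 4-, and 6-digit levels


def tokenize(text):
    """Lowercase course text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, doc):
        """Return {label: posterior probability} for one document."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        n_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract the max before exp for stability
        weights = {k: math.exp(s - top) for k, s in log_scores.items()}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}


class TopDownClassifier:
    """One model per parent node: root -> 2-digit, 2-digit -> 4-digit, 4-digit -> 6-digit."""

    def fit(self, docs, codes):
        self.models = {}
        for i, length in enumerate(LEVELS):
            parent_len = LEVELS[i - 1] if i > 0 else 0
            groups = defaultdict(lambda: ([], []))  # parent prefix -> (docs, child labels)
            for doc, code in zip(docs, codes):
                group_docs, group_labels = groups[code[:parent_len]]
                group_docs.append(doc)
                group_labels.append(code[:length])
            for parent, (group_docs, group_labels) in groups.items():
                self.models[parent] = NaiveBayes().fit(group_docs, group_labels)
        return self

    def predict(self, doc):
        """Return [(code, confidence), ...] from the 2-digit level down."""
        path, parent, confidence = [], "", 1.0
        for _ in LEVELS:
            model = self.models.get(parent)
            if model is None:  # no trained children below this node
                break
            probs = model.predict_proba(doc)
            best = max(probs, key=probs.get)
            confidence *= probs[best]
            path.append((best, confidence))
            parent = best
        return path


# Made-up training data; these are NOT real CCM codes.
titles = [
    "Calculus I", "Calculus II limits derivatives", "Linear Algebra matrices",
    "Introductory Statistics", "Probability and Statistics",
    "Principles of Microeconomics", "Intermediate Microeconomics",
    "Principles of Macroeconomics",
]
codes = ["100101", "100101", "100102", "100501", "100501",
         "200601", "200601", "200602"]

clf = TopDownClassifier().fit(titles, codes)
for code, confidence in clf.predict("Calculus III"):
    print(code, round(confidence, 3))
```

Training a separate model per parent node means a 4-digit decision is only ever made among codes that share the predicted 2-digit prefix, so predictions stay consistent across levels; the trade-off is that an error at the 2-digit level cannot be corrected further down.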

User Testing: Early in the project period, users will be recruited to pilot the software, give feedback on its usability, and validate their newly classified data; the results will be used to further refine our algorithm. A wider set of user-testers will also be convened toward the end of the project.
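One way such a validation could be scored is agreement by level: the share of records whose predicted code matches a human-assigned code on the first 2, 4, and 6 digits. The sketch below assumes codes are 6-digit strings and that a user holds a reference coding to compare against; the function name `agreement_by_level` and the metric itself are illustrative assumptions, not the project's stated validation protocol.

```python
LEVELS = (2, 4, 6)  # code prefix lengths compared at each level


def agreement_by_level(predicted, reference):
    """Share of records whose predicted code matches the human-assigned
    reference code on its first 2, 4, and 6 digits."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("expected two non-empty code lists of equal length")
    scores = {}
    for length in LEVELS:
        hits = sum(p[:length] == r[:length] for p, r in zip(predicted, reference))
        scores[f"{length}-digit"] = hits / len(predicted)
    return scores


# Made-up 6-digit codes, not real CCM codes.
predicted = ["100101", "100102", "200602", "200602"]
reference = ["100101", "100501", "200601", "100101"]
print(agreement_by_level(predicted, reference))
# {'2-digit': 0.75, '4-digit': 0.5, '6-digit': 0.25}
```

Because a match at a finer level implies a match at every coarser level, agreement can only stay the same or fall as the level gets finer, which shows where in the hierarchy a classifier starts to struggle.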

Use in Applied Education Research: The software will be useful in any education research that uses postsecondary course-level data. Such applications are numerous, including studies of disparities in course-taking, transfer students, bottleneck and gateway courses, and the long-term consequences of college curriculum.

People and institutions involved

IES program contact(s)

Elizabeth Albro


Commissioner of Education Research
NCER

Project contributors

Allyson Flaster

Co-principal investigator

David Jurgens

Co-principal investigator

Products and publications

The team will publish an open-source software package that assigns consistent College Course Map (CCM) codes to individual course records. End users will provide a dataset (in CSV format) containing course features, and the tool will return CCM codes for the same course records at the 2-digit, 4-digit, and 6-digit levels (where appropriate), along with estimated confidence levels. The tool will be released as freely available packages for R and Python, with wrappers facilitating use from other statistical software. Anyone will be able to download it and run it on their own institutional data, and individual institutions can integrate it into their workflows as they see fit. The tool will be well documented, include example data, and remain a reproducible artifact beyond the end of the grant. It will be disseminated through various platforms and promoted at professional conferences, through professional associations, and on social media.
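The CSV-in, codes-out workflow described above can be sketched with Python's standard `csv` module. Everything here is hypothetical: the function `classify_csv`, the column names (`title`, `ccm_2`, `ccm_4`, `ccm_6`, `confidence`), and the keyword-based `toy_classify` stand-in for a trained model are illustrations, not the package's actual interface, and the codes are made up rather than real CCM codes. The sketch simply shows how one 6-digit prediction can be reported at all three levels by taking its 2- and 4-digit prefixes.

```python
import csv
import io

# Hypothetical stand-in for a trained model: maps a keyword to a made-up
# 6-digit code and a fixed confidence. These are NOT real CCM codes.
TOY_RULES = {
    "calculus": ("100101", 0.92),
    "statistics": ("100501", 0.85),
    "microeconomics": ("200601", 0.88),
}


def toy_classify(title):
    """Return (six_digit_code, confidence), or (None, 0.0) if no rule fires."""
    words = title.lower().split()
    for keyword, (code, conf) in TOY_RULES.items():
        if keyword in words:
            return code, conf
    return None, 0.0


def classify_csv(in_text, classify=toy_classify, text_column="title"):
    """Read course records from CSV text and return CSV text with the input
    columns plus 2-, 4-, and 6-digit codes and a confidence value."""
    reader = csv.DictReader(io.StringIO(in_text))
    out = io.StringIO()
    fields = reader.fieldnames + ["ccm_2", "ccm_4", "ccm_6", "confidence"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        code, conf = classify(row[text_column])
        code = code or ""  # unclassified records get blank code columns
        row.update(ccm_2=code[:2], ccm_4=code[:4], ccm_6=code,
                   confidence=f"{conf:.2f}")
        writer.writerow(row)
    return out.getvalue()


sample = "course_id,title\nMATH101,Calculus I\nART200,Ceramics Studio\n"
print(classify_csv(sample))
```

Passing the classifier in as a parameter keeps the file handling separate from the model, so the same input/output wrapper could sit in front of any classifier that returns a code and a confidence.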

Publications:

ERIC Citations: Find available citations for this award in ERIC.

Questions about this project?

To answer additional questions about this project or provide feedback, please contact the program officer.

Tags

Education Technology, Postsecondary Education

You may also like

Data file

2020/2022 Beginning Postsecondary Students Longitu...

Data owner(s): National Center for Education Statistics (NCES)
Publication number: NCES 2026013
Thought leadership

Reimagining the Institute of Education Sciences

February 27, 2026 by Matthew Soldner
Works in progress

IES Announces 2025 SBIR Awards to Advance Technolo...

January 20, 2026 by Shirley Huang