IES Grant
Title: Methods and Software to Classify College Courses at Scale
Center: NCER
Year: 2024
Principal Investigator: Stange, Kevin
Awardee: University of Michigan
Program: Statistical and Research Methodology in Education
Award Period: 3 years (09/01/2024 – 08/31/2027)
Award Amount: $899,226
Type: Methodological Innovation
Award Number: R305D240029
Description:

Co-Principal Investigators: Flaster, Allyson; Jurgens, David

Purpose: Research-ready (de-identified, standardized, documented) data on the courses students take and what they learn while attending college has historically been difficult to obtain at scale. Research on student course-taking has been hampered by inconsistent course titles and numbers across institutions and by the sheer amount of human effort required to manually classify thousands of unique courses within a single institution. The "big data revolution," however, brings approaches to automated pattern recognition that can be used to solve this problem.

Project Activities: This project will advance education research and practice by 1) applying machine learning approaches to text-based course descriptions to systematically classify course content and 2) developing software that education researchers and practitioners can use to apply the classification algorithms to course data at a very large scale. The classification approach and the accompanying open-source college course mapping tool will open up new possibilities for applied education research on college course-taking and student success.

Structured Abstract

Research Design and Methods: The project team will apply various hierarchical classification approaches to text-as-data, including both supervised machine learning and generative AI. To train the algorithm, the project will use human-classified course data from several nationally representative NCES longitudinal studies that are included in the Postsecondary Education Transcript Studies dataset.

User Testing: Users will be recruited early in the project period to pilot the software, provide feedback on its usability, and validate their newly classified data; this feedback will be used to further refine the algorithm. A wider set of user-testers will also be convened toward the end of the project.
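The Research Design and Methods paragraph names hierarchical classification of course text without specifying a model. As a minimal sketch of what "hierarchical" can mean here, the Python below trains a top-down classifier: a root model picks a 2-digit code, a model for that 2-digit node picks a 4-digit code, and a model for that 4-digit node picks the 6-digit code. The choice of multinomial naive Bayes, the dotted CIP-style code format ("27.0101"), the confidence rule (product of per-level probabilities), and the toy training data are all assumptions for illustration, not the project's method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric bag-of-words tokenizer (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(doc):
                score += math.log((self.word_counts[label][w] + 1)
                                  / (self.totals[label] + self.vocab_size))
            log_scores[label] = score
        # Softmax over log scores to get normalized posterior probabilities.
        top = max(log_scores.values())
        exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        z = sum(exp.values())
        return {lab: e / z for lab, e in exp.items()}


def levels(code):
    """Split an assumed CIP-style code like '27.0101' into 2-, 4-, 6-digit levels."""
    return [code[:2], code[:5], code]


class TopDownClassifier:
    """One naive Bayes model per parent node, applied from the top level down."""

    def fit(self, docs, codes):
        groups = defaultdict(lambda: ([], []))
        for doc, code in zip(docs, codes):
            path = levels(code)
            for depth, label in enumerate(path):
                parent = path[depth - 1] if depth else ""  # "" is the root node
                groups[parent][0].append(doc)
                groups[parent][1].append(label)
        self.models = {p: NaiveBayes().fit(ds, ls) for p, (ds, ls) in groups.items()}
        return self

    def predict(self, doc):
        """Return [(code, confidence)] per level; confidence is the product of
        the conditional probabilities along the chosen path."""
        path, confidence, parent = [], 1.0, ""
        for _ in range(3):
            probs = self.models[parent].predict_proba(doc)
            label = max(probs, key=probs.get)
            confidence *= probs[label]
            path.append((label, confidence))
            parent = label
        return path


# Toy, hand-made training data; the codes are illustrative, not real CCM assignments.
docs = [
    "Calculus I limits derivatives integrals",
    "Calculus II integration techniques series",
    "Linear algebra matrices vector spaces",
    "Introductory statistics probability sampling inference",
    "Applied regression statistics data analysis",
    "General biology cells genetics evolution",
    "Cell biology membranes organelles",
]
codes = ["27.0101", "27.0101", "27.0102", "27.0501", "27.0501",
         "26.0101", "26.0101"]

clf = TopDownClassifier().fit(docs, codes)
print(clf.predict("Calculus derivatives and integrals"))
```

A top-down design keeps each decision small (only siblings compete), but an early mistake at the 2-digit level propagates to every finer level; flat or hybrid alternatives trade that off differently.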
Use in Applied Education Research: The software will be useful for any education research that uses postsecondary course-level data. Applications are numerous, including studies of disparities in course-taking, transfer students, bottleneck and gateway courses, and the long-term consequences of the college curriculum.

Products and Publications

Products: The team will publish an open-source software package that assigns consistent College Course Map (CCM) codes to individual course records. End users will provide a dataset (in CSV format) containing course features, and the tool will return CCM codes for the same course records at the 2-digit, 4-digit, and 6-digit levels (where appropriate), along with estimated confidence levels. The tool will be released as a freely available package in R and Python, with wrappers to facilitate use from other statistical software. Anyone can download it and run it on their own institutional data, and individual institutions can integrate it into their workflows as they see fit. The tool will be well documented, include example data, and serve as a reproducible artifact that outlasts the grant. It will be disseminated through various platforms and promoted at professional conferences, through professional associations, and on social media.
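The Products paragraph describes the tool's contract: a CSV of course records goes in, and CCM codes at up to three levels plus a confidence estimate come out. The sketch below illustrates that round trip with Python's standard csv module. The column names (course_id, title, description, ccm_2digit, ccm_4digit, ccm_6digit, confidence), the dotted code format, the hard-coded `classify` stand-in, and the rule that a finer level is reported only when its confidence clears `min_confidence` (one possible reading of "where appropriate") are all hypothetical, not the package's actual interface.

```python
import csv
import io

LEVEL_COLUMNS = ["ccm_2digit", "ccm_4digit", "ccm_6digit"]


def classify(text):
    """Hypothetical stand-in for a trained model: returns [(code, confidence)]
    for the 2-, 4-, and 6-digit levels. A real tool would call its classifier here."""
    if "calculus" in text.lower():
        return [("27", 0.95), ("27.01", 0.90), ("27.0101", 0.80)]
    return [("26", 0.60), ("26.01", 0.40), ("26.0101", 0.30)]


def assign_codes(in_csv, out_csv, min_confidence=0.5):
    """Read course records, append code columns, and write them back out."""
    reader = csv.DictReader(in_csv)
    writer = csv.DictWriter(out_csv, fieldnames=reader.fieldnames + LEVEL_COLUMNS + ["confidence"])
    writer.writeheader()
    for row in reader:
        levels = classify(f"{row['title']} {row['description']}")
        out = dict(row, confidence="", **{col: "" for col in LEVEL_COLUMNS})
        for col, (code, conf) in zip(LEVEL_COLUMNS, levels):
            if conf < min_confidence:
                break  # stop at the deepest level the model is confident about
            out[col] = code
            out["confidence"] = f"{conf:.2f}"
        writer.writerow(out)


# Example: in-memory CSV input with hypothetical column names.
courses = io.StringIO(
    "course_id,title,description\n"
    "MATH101,Calculus I,Limits and derivatives\n"
    "BIO150,Special Topics,Seminar\n"
)
result = io.StringIO()
assign_codes(courses, result)
print(result.getvalue())
```

Here MATH101 receives codes at all three levels, while BIO150 stops at the 2-digit level because the stand-in's 4-digit confidence falls below the threshold.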