Project Activities
The research team will use natural language processing and image analysis techniques to identify gender- and race-based messages in the corpus of textbooks used in core subjects in Texas elementary schools between 1985 and 2011. The team will then pair these data with administrative data to assess the relationship between exposure to these messages and educational outcomes for children enrolled in elementary school during that period.
Structured Abstract
Setting
The study focuses on the Texas Department of Education, which uses a standardized, required corpus of textbooks across the entire state and offers rich public-access data on student outcomes. The researchers will analyze the corpus of books used in elementary school core subjects – math, science, social studies, and language arts – over the period 1985–2011.
Sample
These data are publicly available at the cohort-by-subgroup (gender and race) level from the Texas Student Data System. According to these data, over the period of the study, the number of students in Texas public schools ranged from 3.2 to 4.8 million; the percentage of these students who were Black ranged from 12% to 14%, and the percentage who were Latinx ranged from 35% to 53%.
Factors
The factors that will be measured in this project are: 1) the levels of gender- and race-based messages contained in textbooks, as measured by different machine-led methods for estimating them, and 2) the differences in cohort-by-subgroup achievement outcomes across different levels of exposure to these messages.
Research design and methods
To characterize the messages about race and gender contained in the textbooks, researchers will develop and apply frontier methods in natural language processing and image analysis to each textbook. They will report how message levels vary by book, by subject, by grade, and over time. The team will then use an event study design to generate suggestive estimates of how these changes in gender- and race-specific messages may map onto educational achievement. By studying textbook changes across different grades, subjects, and years, and after controlling for secular changes over time, the team will isolate the direct relationship between exposure to gender- and race-based messages and differences in achievement among the student populations, by gender and race. This approach guards against omitted variable bias from unobserved time trends and from more discrete confounders, such as an unrelated policy change that takes effect in the same year as a textbook change.
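As a rough illustration of the event study logic described above, the following Python sketch computes average outcomes by year relative to a textbook change, measured against cells whose textbook did not change. It is a minimal descriptive stand-in under stated assumptions, not the project's estimator: the function name `event_study_means`, the cell labels, and all numbers are hypothetical, and the synthetic data are constructed so that a textbook change raises the outcome by 5 points.

```python
from collections import defaultdict
from statistics import mean


def event_study_means(rows, change_year, window=2):
    """Average outcome gap by event time, relative to the year before a change.

    rows        : iterable of (cell, year, outcome) tuples (hypothetical layout)
    change_year : dict mapping a cell to the year its textbook changed;
                  cells missing from the dict serve as the comparison group
    """
    # Calendar-year means among comparison cells absorb secular time trends.
    comparison = defaultdict(list)
    for cell, year, outcome in rows:
        if cell not in change_year:
            comparison[year].append(outcome)
    comparison_mean = {yr: mean(vals) for yr, vals in comparison.items()}

    # Gap between each changing cell and the comparison mean, indexed by
    # event time k = calendar year minus the year of the textbook change.
    gaps = defaultdict(list)
    for cell, year, outcome in rows:
        if cell in change_year and year in comparison_mean:
            k = year - change_year[cell]
            if -window <= k <= window:
                gaps[k].append(outcome - comparison_mean[year])

    averages = {k: mean(vals) for k, vals in gaps.items()}
    base = averages[-1]  # normalise to the year before the change
    return {k: averages[k] - base for k in sorted(averages)}


# Synthetic data (all values invented): a common upward trend, a fixed offset
# per cell, and a 5-point shift once a cell's textbook changes.
change_year = {"math_g3": 2000, "math_g4": 2002}  # "math_g5" never changes
offsets = {"math_g3": 3, "math_g4": -1, "math_g5": 0}
rows = []
for cell, offset in offsets.items():
    for year in range(1998, 2005):
        changed = cell in change_year and year >= change_year[cell]
        rows.append((cell, year, 50 + 2 * (year - 1998) + offset + (5 if changed else 0)))

estimates = event_study_means(rows, change_year)
for k, value in estimates.items():
    print(f"event time {k:+d}: {value}")
```

In this sketch the comparison-year means play the role of the controls for secular change, and normalising to event time -1 removes fixed differences between cells. A flat profile before the change and a shift afterwards is the pattern an event study looks for; real data would call for a regression framework with standard errors.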
Key measures
The key measures will be the scales measuring gender- and race-based messages in texts, and the following educational achievement data: SAT and ACT test scores; enrollment in AP courses; AP test scores; admission to, enrollment in, and graduation from Texas public four-year colleges; and employment status.
Data analytic strategy
First, researchers will use new methods from machine learning, natural language processing, and image classification to quantify the implicit messages about ability by gender and race contained in the text and images of the textbooks. Second, researchers will use an event study design, collecting estimates from textbook change events, to isolate how changes in these messages lead to differences in educational achievement by race and gender.
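To make the first step more concrete, the sketch below counts how often gendered terms appear in the same sentence as ability-related terms. This is a deliberately crude lexicon-based stand-in, not the machine learning methods the project proposes, which the abstract does not specify. The word lists, the function name `ability_cooccurrence`, and the sample passage are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists chosen only for illustration.
FEMALE_TERMS = {"she", "her", "girl", "girls", "woman", "women"}
MALE_TERMS = {"he", "his", "him", "boy", "boys", "man", "men"}
ABILITY_TERMS = {"smart", "clever", "brilliant", "solved", "solves", "genius"}


def ability_cooccurrence(text):
    """Count sentences that pair a gendered term with an ability term."""
    counts = Counter()
    # Split on sentence-ending punctuation; adequate for a toy example only.
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = set(re.findall(r"[a-z']+", sentence))
        if tokens & ABILITY_TERMS:
            if tokens & FEMALE_TERMS:
                counts["female"] += 1
            if tokens & MALE_TERMS:
                counts["male"] += 1
    return counts


# Invented sample passage, not drawn from any textbook.
sample = (
    "He solved the puzzle. She watched the game. "
    "The clever girl built a robot. His brilliant idea worked. "
    "The boys played outside."
)
counts = ability_cooccurrence(sample)
print(dict(counts))
```

A count like this could be computed per book, subject, grade, and year to produce the kind of message-level series that the event study then relates to outcomes. Sentence-level co-occurrence misses context, negation, and images entirely, which is part of why richer methods are needed.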
Products and publications
Products: This exploratory project will provide evidence and analytical tools for policymakers and researchers to assess and address an understudied potential source of persistent gaps in achievement.
Supplemental information
Co-Principal Investigator: Eble, Alex