Title: Testing Different Methods of Improving the External Validity of Impact Evaluations in Education
Principal Investigator: Olsen, Robert
Awardee: Abt Associates, Inc.
Program: Statistical and Research Methodology in Education
Award Period: 2 years
Award Amount: $489,178
Type: Methodological Innovation
Award Number: R305D100041
Co-Principal Investigator: Bell, Stephen
Purpose: This study was motivated by the observation that most major, multi-site evaluations in education have chosen participating sites (e.g., districts, schools, or grantees) "purposively" and not randomly. This raises possible concerns about the generalizability of the findings from these studies. The goal of this project was to provide evidence regarding the external validity of evaluations that are based on purposive samples.
Project Activities: Part 1 of the study considered how, and under what conditions, evaluations of educational programs that select sites purposively can produce externally valid impact estimates for the program as a whole. To address this question, this project (1) conducted simulations using data from real educational program evaluations to estimate how different the impacts are likely to be between purposive samples of sites and random samples of sites; (2) identified and developed methods for making findings from purposive samples more representative of program sites; (3) assessed the conditions under which these methods produce unbiased impact and standard error estimates; and (4) tested how well these methods work in real evaluations of educational programs.
Part 2 of the study considered how, and under what conditions, evaluations of educational interventions can produce externally valid estimates of the interventions' impacts for schools and districts that did not participate in the evaluation. To address this question, the project (1) identified and developed methods for predicting the impacts of an intervention for sites that are not participating in the evaluation; (2) assessed the conditions under which these methods produce unbiased impact and standard error estimates for these sites; and (3) tested how well these methods work in real evaluations of educational programs, and whether their performance depends on how sites were selected for the evaluation.
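One way to illustrate the prediction problem in Part 2 is a regression of site-level impact estimates on an observed site characteristic, with the fitted line used to predict the impact for a site that did not participate. This is a minimal sketch of that general idea, not the study's actual method, and all site characteristics and impact values below are hypothetical.

```python
# Hypothetical (x, y) pairs: an observed site characteristic (e.g., percent
# of students eligible for free lunch) and the estimated impact at that site.
sites = [(20, 0.05), (40, 0.09), (60, 0.12), (80, 0.16)]

n = len(sites)
mean_x = sum(x for x, _ in sites) / n
mean_y = sum(y for _, y in sites) / n

# Ordinary least squares slope and intercept for a single predictor.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in sites)
         / sum((x - mean_x) ** 2 for x, _ in sites))
intercept = mean_y - slope * mean_x

# Predicted impact for a nonparticipating site whose characteristic is x = 50.
predicted = intercept + slope * 50
print(round(predicted, 4))
```

In practice such a model would use many site characteristics and would only extrapolate reliably if the evaluation sample covers sites similar to the nonparticipating site, which is why Part 2 also asked whether performance depends on how sites were selected.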
In both parts of the study, the researchers reanalyzed data from the National Evaluation of Upward Bound (which randomly selected 70 programs nationwide) and the second Reading First Implementation Study (which collected student information for all students within a state). In Part 1, the study team used these data to simulate both representative and purposive samples, estimate the average impacts from each type of sample, and compare the results. The team then tested whether regression-based methods and weighting methods can "close the gap" and produce impact estimates from the purposive samples that are more similar to the impact estimates from the representative samples. In Part 2, the study team assessed how accurately average impact estimates predict the impacts for individual sites that did not participate in the evaluation but may use evaluation results in deciding whether to adopt the intervention. In addition, the team tested whether regression-based methods or weighting methods can yield improved predictions of the intervention's impact for these sites.
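The weighting idea tested in Part 1 can be sketched with simple post-stratification: reweight sites in a purposive sample so their mix on an observed characteristic matches the full population of program sites, then recompute the average impact. This is a hypothetical illustration of the general technique, not the study's actual code, and all strata and impact values are invented.

```python
# Hypothetical site-level impact estimates from a purposive sample, tagged
# by an observed site characteristic (here, urbanicity).
sample_sites = [
    {"stratum": "urban", "impact": 0.10},
    {"stratum": "urban", "impact": 0.14},
    {"stratum": "urban", "impact": 0.12},
    {"stratum": "rural", "impact": 0.02},
]

# Known (hypothetical) population shares of each stratum among all program sites.
population_shares = {"urban": 0.50, "rural": 0.50}

# Shares of each stratum in the purposive sample.
n = len(sample_sites)
sample_shares = {}
for s in sample_sites:
    sample_shares[s["stratum"]] = sample_shares.get(s["stratum"], 0) + 1 / n

# Weight each site by (population share) / (sample share), so overrepresented
# strata are downweighted and underrepresented strata are upweighted.
weights = [population_shares[s["stratum"]] / sample_shares[s["stratum"]]
           for s in sample_sites]

# Compare the unweighted and weighted average impact estimates.
unweighted = sum(s["impact"] for s in sample_sites) / n
weighted = (sum(w * s["impact"] for w, s in zip(weights, sample_sites))
            / sum(weights))
print(round(unweighted, 3), round(weighted, 3))
```

Because the purposive sample here overrepresents urban sites with larger impacts, the weighted average is pulled toward the rural sites' smaller impact, which is the sense in which weighting can "close the gap" between purposive and representative samples.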
Publications and Products
Journal article, monograph, or newsletter
Bell, S.H., Olsen, R.B., Orr, L.L., and Stuart, E.A. (2016). Estimates of External Validity Bias When Impact Evaluations Select Sites Nonrandomly. Educational Evaluation and Policy Analysis, 38(2), 318–335.
Bell, S.H., and Stuart, E.A. (2016). On the "Where" of Social Experiments: The Nature and Extent of the Generalizability Problem. New Directions for Evaluation, 2016(152), 47–59.
Olsen, R.B., and Orr, L.L. (2016). On the "Where" of Social Experiments: Selecting More Representative Samples to Inform Policy. New Directions for Evaluation, 2016(152), 61–71.
Olsen, R.B., Orr, L.L., Bell, S.H., and Stuart, E.A. (2013). External Validity in Policy Evaluations That Choose Sites Purposively. Journal of Policy Analysis and Management, 32(1), 107–121.