Project Activities
People and institutions involved
IES program contact(s)
Products and publications
ERIC Citations: Find available citations in ERIC for this award.
Journal article, monograph, or newsletter
Bell, S.H., Olsen, R.B., Orr, L.L., and Stuart, E.A. (2016). Estimates of External Validity Bias When Impact Evaluations Select Sites Nonrandomly. Educational Evaluation and Policy Analysis, 38(2), 318-335.
Bell, S.H., and Stuart, E.A. (2016). On the "Where" of Social Experiments: The Nature and Extent of the Generalizability Problem. New Directions for Evaluation, 2016(152), 47-59.
Olsen, R.B., and Orr, L.L. (2016). On the "Where" of Social Experiments: Selecting More Representative Samples to Inform Policy. New Directions for Evaluation, 2016(152), 61-71.
Olsen, R.B., Orr, L.L., Bell, S.H., and Stuart, E.A. (2013). External Validity in Policy Evaluations That Choose Sites Purposively. Journal of Policy Analysis and Management, 32(1), 107-121.
Supplemental information
Co-Principal Investigator: Bell, Stephen
Part 2 of the study considered how, and under what conditions, evaluations of educational interventions can produce externally valid estimates of the interventions' impacts for schools and districts that did not participate in the evaluation. To address this question, the project (1) identified and developed methods for predicting the impacts of an intervention for sites that are not participating in the evaluation; (2) assessed the conditions under which these methods produce unbiased impact and standard error estimates for these sites; and (3) tested how well these methods work in real evaluations of educational programs, including whether their performance depends on how sites were selected for the evaluation.
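One regression-based prediction approach of this kind can be sketched as follows. This is an illustration only, using hypothetical site characteristics and simulated data rather than the study's actual estimators or data: site-level impact estimates are regressed on observed site characteristics within the evaluation sample, and the fitted model is then applied to a non-participating site.

```python
# Illustrative sketch only: regression-based extrapolation of impacts to a
# non-participating site, using hypothetical data (not from the study).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical evaluation sites: two observed characteristics and an
# estimated impact for each site.
n_sites = 70
x = rng.normal(size=(n_sites, 2))                 # e.g., % low-income, baseline score
impact_hat = 0.20 + x @ np.array([0.10, 0.05]) + rng.normal(scale=0.05, size=n_sites)

# Fit impact ~ characteristics by ordinary least squares on the evaluation sample.
X = np.column_stack([np.ones(n_sites), x])
beta_hat, *_ = np.linalg.lstsq(X, impact_hat, rcond=None)

# Predict the impact for a site outside the evaluation with known characteristics.
new_site = np.array([1.0, 0.8, -0.3])             # intercept term + hypothetical values
print("Predicted impact for non-participating site:", (new_site @ beta_hat).round(3))
```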
In both parts of the study, the researchers reanalyzed data from the National Evaluation of Upward Bound (which randomly selected 70 programs nationwide) and the second Reading First Implementation Study (which collected student information for all students within a state). In Part 1 of the study, the study team used these data to simulate both representative and purposive samples, estimate the average impacts from these samples, and compare the results. The team then tested whether regression-based methods and weighting methods can "close the gap" by producing impact estimates from the purposive samples that are more similar to the impact estimates from the representative samples. In Part 2 of the study, the study team assessed how accurately average impact estimates predict the impacts for individual sites that did not participate in the evaluation but may use evaluation results in deciding whether to adopt the intervention. In addition, they tested whether regression-based methods or weighting methods can yield improved predictions of the intervention's impact for these sites.
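The weighting idea can be illustrated with a minimal sketch, again under assumed, simplified conditions: hypothetical data and a simple post-stratification weight rather than the estimators examined in the study. Sites in a purposive sample are weighted so that the distribution of an observed site characteristic matches the population of sites, and the weighted average impact is compared with the unweighted one.

```python
# Illustrative sketch only: post-stratification weighting of a purposive site
# sample toward the population, using hypothetical data (not from the study).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of sites: one characteristic z and a site-level impact.
n_pop = 2000
z = rng.normal(size=n_pop)
impact = 0.15 + 0.10 * z + rng.normal(scale=0.05, size=n_pop)

# Purposive selection: sites with larger z are more likely to join the evaluation.
in_sample = rng.random(n_pop) < 1 / (1 + np.exp(-2 * (z - 0.5)))
z_s, impact_s = z[in_sample], impact[in_sample]

# Post-stratify on quartiles of z defined in the population; weight each sampled
# site by (population share of its stratum) / (sample share of its stratum).
edges = np.quantile(z, [0.25, 0.5, 0.75])
pop_bins = np.digitize(z, edges)
samp_bins = np.digitize(z_s, edges)
weights = np.array([np.mean(pop_bins == b) / np.mean(samp_bins == b) for b in samp_bins])

print("Population mean impact:       ", impact.mean().round(3))
print("Unweighted purposive estimate:", impact_s.mean().round(3))
print("Weighted purposive estimate:  ", np.average(impact_s, weights=weights).round(3))
```

In this toy setup the weighted estimate moves closer to the population mean than the unweighted purposive estimate, which is the sense in which such methods may "close the gap" described above.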
Questions about this project?
To answer additional questions about this project or provide feedback, please contact the program officer.