National Center for Education Evaluation and Regional Assistance


Evaluation of the DC Opportunity Scholarship Program: Second Year Report on Participation
NCEE 2006-4003
April 2006

Appendix A: Congressionally Mandated Evaluation


Impact Analysis

It is well known that the independent effects of school choice on student outcomes are difficult to estimate. Perhaps the most significant difficulty faced by researchers is selection bias: the self-selection of families to even seek out a new school choice for their child, and the mutual student/school decision process that selects students into different types of schools. Because this bias is generally a result of unmeasurable factors, most researchers have preferred the use of a randomized controlled trial (RCT) to a dependence on nonexperimental (nonrandomized) statistical methods. Since the DC Opportunity Scholarship Program provides for the random distribution of scholarships through a lottery, we will use RCT methods to estimate most Program impacts.

Impact Analysis Sample

The RCT approach rests on random assignment or, in the case of the DC Opportunity Scholarship Program evaluation, a lottery to create two statistically equivalent groups of students from among Program applicants: a “treatment” group that receives a scholarship, and a “control” group that does not receive a scholarship. Because the two groups are generated from the same pool of applicants, they are equally likely to be motivated to participate in the Program and to reap any benefits from it. And as long as the pool of applicants is sufficiently large, the random assignment of students into treatment and control groups should produce groups that are similar in other characteristics, both those we can observe and measure (e.g., family income, prior academic achievement) and those we cannot (e.g., motivation to succeed). Random assignment ensures that, in expectation, all observed and unobserved characteristics are equally represented in both groups.

According to the statute, random assignment, the means of creating the treatment and control groups, can be used to help allocate scholarships only under particular circumstances. As a result of these conditions, the impact analysis sample will:

  • Exclude applicants already enrolled in private school when they applied to the OSP. The statute contained no provision to exclude from the Program students who were currently enrolled in private schools but otherwise eligible to participate.25 A substantial number of private school students did apply to the Program, especially in the first year of Program implementation. However, because those students intended to use the DC Opportunity Scholarship to continue to attend private schools, measuring the difference in outcomes between private school applicants who did and did not receive a scholarship through the lottery would likely only answer the question of whether a different type or amount of scholarship funds affects student outcomes. While that question is of some policy interest, it is not the main focus of the evaluation as specified in the legislation. Therefore, applicants already enrolled in private schools when they applied are not part of the impact analysis sample.
  • Exclude any students who were originally assigned to the control group by lottery but subsequently were awarded scholarships by way of follow-up lotteries. The participants in the impact evaluation all sought scholarships to attend private schools. To encourage those randomly assigned to the control group (nonrecipients) to turn out for follow-up data collection, a special lottery is held each spring in which the only eligible students are those originally assigned to the control group who subsequently cooperated with outcome data collection. Because they represent study-induced treatment crossover (i.e., participants assigned to the control condition who are subsequently offered the treatment) and were chosen at random, these students will be excluded from the subsequent impact analysis. Five scholarships were awarded to control group students by lottery in the summer of 2005, when the control group totaled 193 students. Ten scholarships will be awarded to control group students by lottery in the spring of 2006, when the control group will number 911 (i.e., 921 minus 5 control awardees minus 5 high school graduates). Ten or more new scholarships will be awarded to control group students each year after 2006. The number of scholarships awarded to members of the control group needs to be kept modest in the initial years of an experimental evaluation so that the control group remains sufficiently large to enable researchers to identify Program impact.
  • Include only public school applicants in grades where there are more applications than there are available private school slots. A lottery is a fair and efficient way of distributing scholarships when there are too many applicants, but is inappropriate as an allocation device when sufficient scholarship funds and private school slots exist to accommodate all the eligible applicants at certain grade levels. In those grade levels, all applicants will receive scholarships, and they will be excluded from the impact sample.

Thus, the impact evaluation of the DC Opportunity Scholarship Program depends on the extent to which large numbers of eligible DC families with public school students apply to the Program. The treatment and control groups must be of a sufficient size to allow us to detect and measure any difference in outcomes between the two groups (the “impact”) with adequate statistical confidence. A procedure called “power analysis” is used to determine the sample sizes necessary to enable the study to answer the central research questions and to measure Program effects that are large enough to be both meaningful in students’ lives and relevant to policy debates about the efficacy of school choice interventions. At the end of the 18-month initial implementation period, we know the following about the impact sample and study power:

  • Cohort 1 includes 492 eligible applicants who qualify as members of the impact sample. They all were entering grades 6–12 in fall of 2004. A total of 299 are members of the treatment group, and 193 are members of the control group.
  • Cohort 2 includes 1,816 eligible applicants who qualify as members of the impact sample. They cover all eligible grades, K–12. A total of 1,088 are members of the treatment group, and 728 are members of the control group.
  • The combined-cohort impact sample totals 2,308 students, of whom 1,387 are members of the treatment group, and 921 are members of the control group.
  • Preliminary estimates of study power, given these participant numbers, suggest that the analysis will be able to detect even modest but educationally meaningful Program impacts for the combined-cohort sample and for the Cohort 2 sample alone.
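The sample counts above can be checked, and their implications for power roughly illustrated, with a short calculation. This is a minimal sketch and not the study's actual power analysis: the function name, the 5 percent two-sided significance level, the 80 percent power target, and the simple difference-in-means setup (no covariates, attrition, or clustering) are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Impact-sample counts reported in the bullets above.
COHORTS = {
    "Cohort 1": {"treatment": 299, "control": 193},
    "Cohort 2": {"treatment": 1088, "control": 728},
}


def minimum_detectable_effect(n_treat, n_control, alpha=0.05, power=0.80):
    """Smallest standardized effect (in SD units) detectable by a two-sided test.

    Assumption for illustration only: a plain difference in means with no
    covariates, no attrition, and no clustering.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sqrt(1 / n_treat + 1 / n_control)


# Sum the two cohorts to obtain the combined-cohort impact sample.
combined = {
    arm: sum(cohort[arm] for cohort in COHORTS.values())
    for arm in ("treatment", "control")
}

for name, counts in {**COHORTS, "Combined": combined}.items():
    total = counts["treatment"] + counts["control"]
    mde = minimum_detectable_effect(counts["treatment"], counts["control"])
    print(f"{name}: n = {total}, minimum detectable effect = {mde:.3f} SD")
```

Under these simplified assumptions, Cohort 1 alone yields a noticeably larger minimum detectable effect than Cohort 2 or the combined sample, which is in line with the statement that power is adequate for the combined-cohort and Cohort 2 samples.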

General Statistical Approach: Estimating the Impact of the Offer of a Scholarship

Given appropriately sized treatment and control groups, the strategy for analyzing impacts is well established. To motivate the discussion of how we identify the effect of the scholarship Program on test scores, it is useful to begin with a simple representation of the selection problem as a missing data problem, using the potential outcomes approach. This approach defines causal effects in terms of potential outcomes or counterfactuals. Conceptually, the causal effect of treatment—the scholarship—is defined as the difference between the “outcome for individuals assigned to the treatment group” and the “outcome for the treatment group if it had not received the treatment,” or:

(E.1) $E(Y_i \mid X_i, T_i = 1) - E(Y_i \mid X_i, T_i = 0)$

In the case of scholarships, the treatment effect (the effect of the scholarships on academic achievement) would be defined as the difference between “test scores for Program students” and “test scores for Program students if they had not received a scholarship.” The fundamental problem is that a student is never observed simultaneously in both states of the world. What is observed is a student in the treatment group ($T_i = 1$) or in the control group ($T_i = 0$). The outcome in the absence of treatment, $E(Y_i \mid X_i, T_i = 0)$, is then the counterfactual: what would have occurred to those students receiving the scholarships if they had not received them.

If students receiving scholarships were identical to other students in both observable and unobservable characteristics, the counterfactual could be generated directly from an appropriately selected comparison group. Valid comparison groups are rarely found in practice, however. The random assignment of students into the Program generates the counterfactual from the control group—eligible applicants who did not receive a scholarship.26 If correctly implemented, random assignment yields statistically equivalent groups and allows estimation of the Program impact through differences in mean outcomes between the two groups.
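The contrast between a self-selected comparison group and a lottery-generated control group can be illustrated with a small simulation. This is a sketch on invented data: the true effect of 5 points, the outcome formula, and the "motivation" variable are hypothetical and serve only to show why the difference in means is trustworthy under random assignment.

```python
import random
from statistics import mean

random.seed(0)
TRUE_EFFECT = 5.0  # hypothetical scholarship effect, in test-score points
N = 20_000

# Motivation is unobserved by the analyst but raises test scores.
motivation = [random.gauss(0, 1) for _ in range(N)]


def outcome(m, treated):
    """Hypothetical test score for a student with motivation m."""
    return 50 + 10 * m + TRUE_EFFECT * treated + random.gauss(0, 5)


def difference_in_means(treated_flags):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    scores = [outcome(m, t) for m, t in zip(motivation, treated_flags)]
    treated = [y for y, t in zip(scores, treated_flags) if t]
    untreated = [y for y, t in zip(scores, treated_flags) if not t]
    return mean(treated) - mean(untreated)


# Self-selection: only the more motivated families take up the treatment.
self_selected = [m > 0 for m in motivation]
# Lottery: a coin flip decides, independently of motivation.
lottery = [random.random() < 0.5 for _ in range(N)]

naive_estimate = difference_in_means(self_selected)
lottery_estimate = difference_in_means(lottery)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"self-selected comparison: {naive_estimate:.2f}")
print(f"lottery comparison:       {lottery_estimate:.2f}")
```

In this setup the self-selected comparison mixes the effect of the scholarship with the effect of motivation, while the lottery comparison should land close to the true effect because motivation is balanced across the two groups.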

Consistent with this approach is the following basic analytic model of the effects of school choice scholarships on outcomes. Consider first the outcome equation for the test score of student $i$ in year $t$. It is reasonable to assume that test scores ($Y_{it}$) are determined as follows:

(E.2) Equation E.2 (period after Program takes effect)

Equation (E.2) estimates the effect of the offer of a scholarship on student outcomes. Under this model, commonly referred to as the “Intent to Treat” (ITT) estimation, all students who were randomly assigned by virtue of the lottery are included in the analysis, regardless of whether a member of the treatment group uses the scholarship to attend a private school. In E.2, $T_{it}$ is equal to one if the student has the opportunity to participate in the scholarship Program (i.e., the award rather than the actual use of the scholarship) and equal to zero otherwise. $X_i$ is a vector of student characteristics (measured at baseline) known to influence future academic achievement, such as prior test scores, mother’s level of education, and family income. In this model, $\tau$ represents the effect of scholarships on test scores for students in the Program, conditional on $X_i$. With a properly designed RCT, using a judiciously chosen set of statistical controls for characteristics that predict future achievement should improve the precision of the estimated impact.27 That treatment effect, $\tau$, should equal, in expectation, the difference in mean outcomes between the treatment and the control groups.
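The role of baseline covariates in an ITT estimate can be sketched on simulated data. Everything here is an assumption made for illustration: the linear form (score = intercept + coefficient × baseline + effect × offer + noise), the single baseline covariate, the parameter values, and the variable names. The adjusted estimate is computed by partialling the covariate out of both the outcome and the offer indicator, which gives the same coefficient as a least-squares regression of the outcome on the offer and the covariate.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(1)
TRUE_TAU = 3.0  # hypothetical effect of the scholarship offer
N = 4000

baseline = [random.gauss(50, 10) for _ in range(N)]          # prior test score
offered = [float(random.random() < 0.5) for _ in range(N)]   # lottery result
score = [5 + 0.8 * x + TRUE_TAU * t + random.gauss(0, 6)
         for x, t in zip(baseline, offered)]


def residualize(x, y):
    """Residuals from a least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


# Unadjusted ITT: simple difference in mean outcomes.
treated = [y for y, t in zip(score, offered) if t == 1.0]
control = [y for y, t in zip(score, offered) if t == 0.0]
tau_unadjusted = mean(treated) - mean(control)
se_unadjusted = sqrt(variance(treated) / len(treated)
                     + variance(control) / len(control))

# Adjusted ITT: remove the baseline score from outcome and offer,
# then regress the outcome residuals on the offer residuals.
ry = residualize(baseline, score)
rt = residualize(baseline, offered)
srt = sum(t * t for t in rt)
tau_adjusted = sum(y * t for y, t in zip(ry, rt)) / srt
model_resid = [y - tau_adjusted * t for y, t in zip(ry, rt)]
sigma2 = sum(e * e for e in model_resid) / (N - 3)  # 3 estimated coefficients
se_adjusted = sqrt(sigma2 / srt)

print(f"unadjusted ITT: {tau_unadjusted:.2f} (SE {se_unadjusted:.2f})")
print(f"adjusted ITT:   {tau_adjusted:.2f} (SE {se_adjusted:.2f})")
```

Both estimates target the same effect because the offer is randomized; the covariate absorbs outcome variation unrelated to the offer, so the adjusted estimate should come with a smaller standard error.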

Since the initial applicants were randomized within certain relevant subgroups, we will analyze Program impacts using a randomized block design. We are interested in how academic achievement (Y) is affected by the assignment into the scholarship Program within each block (B) or group of size n. The impacts are then estimated as:

(E.3) Equation E.3

where

$i = 1, \ldots, n$ observations and $j = 1, \ldots, b$ blocks (defined by grade and priority status);
$Y_{ji}$ is the outcome for student $i$ in block $j$, at time $t$;
$\mu$ is the overall mean outcome (e.g., test score);
$\tau$ is the treatment (scholarship Program) effect;
$\rho_j$ is the $j$th block effect;
$T_{it}$ is assignment into the scholarship Program;
$B_{ji}$ is the block assignment;
$X_{ji}$ represents observable characteristics, measured at baseline; and
$\varepsilon_{ij}$ is the random error, independent and distributed $N(0, \sigma_\varepsilon^2)$.

This analytical framework follows naturally from the group randomization and is easily implemented and interpreted. $Y$ can be measured in several different dimensions, including test scores, school satisfaction, parental satisfaction, grade completion, and, where appropriate, high school graduation. $\mu$ is the average outcome for all Program members, $\rho_j$ is the average block effect, and $\tau$ is the effect of scholarships on academic achievement.28
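The logic of a randomized block design can be sketched by comparing treatment and control students within each block and then averaging across blocks. This is an illustration only, not necessarily the estimator implied by the regression model above: the records, block labels, and the choice to weight blocks by their size are hypothetical, and covariates are omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (block label, offered scholarship?, outcome score).
# The two blocks have different chances of receiving an offer (2 of 4 vs 2 of 6).
records = [
    ("grades K-5, priority", 1, 52.0), ("grades K-5, priority", 1, 56.0),
    ("grades K-5, priority", 0, 50.0), ("grades K-5, priority", 0, 52.0),
    ("grades 6-8, non-priority", 1, 62.0), ("grades 6-8, non-priority", 1, 64.0),
    ("grades 6-8, non-priority", 0, 60.0), ("grades 6-8, non-priority", 0, 58.0),
    ("grades 6-8, non-priority", 0, 59.0), ("grades 6-8, non-priority", 0, 59.0),
]


def blocked_effect(rows):
    """Size-weighted average of within-block treatment/control differences."""
    by_block = defaultdict(lambda: {0: [], 1: []})
    for block, offered, y in rows:
        by_block[block][offered].append(y)
    total = sum(len(arms[0]) + len(arms[1]) for arms in by_block.values())
    estimate = 0.0
    for arms in by_block.values():
        within_block_diff = mean(arms[1]) - mean(arms[0])
        weight = (len(arms[0]) + len(arms[1])) / total
        estimate += weight * within_block_diff
    return estimate


def pooled_difference(rows):
    """Difference in means that ignores the blocks entirely."""
    treated = [y for _, offered, y in rows if offered == 1]
    control = [y for _, offered, y in rows if offered == 0]
    return mean(treated) - mean(control)


print(f"blocked estimate: {blocked_effect(records):.2f}")
print(f"pooled estimate:  {pooled_difference(records):.2f}")
```

Because the blocks differ both in average scores and in the share of students offered a scholarship, the pooled comparison mixes block differences into the estimate, while the blocked comparison only ever compares students within the same block.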

Estimating the Impact of the Use of Scholarships

Even with a properly implemented RCT, we may expect that not all applicants placed by random assignment into the treatment (scholarship offer) group will actually use the scholarship at a private school. That is, some scholarship recipients may choose not to use their scholarship and instead attend a public school. This type of nonparticipation or underutilization of treatment services has been observed across all RCT settings, including medical trials, job training and health insurance experiments, as well as in previous school scholarship RCTs such as the one conducted for the Milwaukee Parental Choice Program.

Policymakers are typically interested in the effect of scholarship use on student achievement, in addition to the effect of the scholarship offer. To estimate this impact, we will use a model commonly referred to as the “Impact on the Treated” (IOT), which statistically estimates the impact of actual scholarship use, relying on what is called the “Bloom Adjustment.”29 This is possible by using the original comparison of all treatment group members to all control group members but interpreting it in a different way. The new interpretation says that the treatment group’s impact (how its outcomes differ from what would have transpired without a scholarship) has two components:

  • The impact on the decliners, who by definition do not participate in the Program even though offered a scholarship, which can logically be assumed to be zero.
  • The impact on everyone else assigned to the treatment group—i.e., on the OSP participants who make up the rest of the experimentally determined treatment group.

This assumption alone (the presumption that the decliners remain unaffected by their assignment to the treatment group) makes it possible to translate the measured effect of the scholarship Program on the entire treatment group (which the experimental design provides directly, as described above) into an estimate of the average effect of the Program on just the participants. It does not matter what the average effect would have been on the decliners had they participated. Nor does it matter whether decliners have different outcomes than participants due to “selection” or pre-existing differences. Thus, we will use a simple Bloom Adjustment to estimate the impact of the OSP on actual scholarship users.
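The arithmetic behind the adjustment follows from the two components listed above. If a share p of the treatment group uses the scholarship and the impact on decliners is zero, the impact of the offer equals p times the impact on users plus (1 − p) times zero, so the impact on users is the impact of the offer divided by p. A minimal sketch, assuming no control group members use a scholarship; the function name and the example figures are hypothetical and are not results from the evaluation.

```python
def bloom_adjustment(itt_effect, usage_rate):
    """Rescale an offer (ITT) impact into an impact on scholarship users.

    itt_effect: estimated impact of the scholarship offer on the full
        treatment group.
    usage_rate: share of the treatment group that actually used the
        scholarship (must be greater than 0 and at most 1).
    """
    if not 0 < usage_rate <= 1:
        raise ValueError("usage_rate must be in the interval (0, 1]")
    return itt_effect / usage_rate


# Hypothetical illustration: a 3.0-point offer impact with 75 percent usage.
itt = 3.0
usage = 0.75
iot = bloom_adjustment(itt, usage)
print(f"impact of the offer: {itt:.1f} points")
print(f"impact on users:     {iot:.1f} points")
```

The adjustment only rescales the experimental estimate, so the impact on users has the same sign as the impact of the offer and is at least as large in magnitude.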

25 Some of these applicants from private schools were already relying on scholarship funds in order to attend those schools. However, the scholarships they were receiving may have been less generous than those available under the DC Opportunity Scholarship Program.

26 See the following studies, which all use the same data from an evaluation of a New York City privately funded scholarship Program: William G. Howell, Patrick J. Wolf, David E. Campbell, and Paul E. Peterson, “School Vouchers and Academic Performance: Results from Three Randomized Field Trials,” Journal of Policy Analysis and Management 21 (2000): 2; John Barnard, Constantine E. Frangakis, Jennifer L. Hill, and Donald B. Rubin, “Principal Stratification Approach to Broken Randomized Experiments: A Case Study of School Choice Vouchers in New York City,” Journal of the American Statistical Association 98 (2003): 462; Alan B. Krueger and Pei Zhu, Another Look at the New York City School Voucher Experiment, Working Paper Series, Education Research (Princeton, NJ: Princeton University, March 2003).

27 For a spirited debate about the use of this technique in the context of school choice research, see William G. Howell and Paul E. Peterson, “Uses of Theory in Randomized Field Trials: Lessons from School Voucher Research on Disaggregation, Missing Data, and the Generalization of Findings,” American Behavioral Scientist 47 (Jan. 2004): 634-657; Krueger and Zhu, Another Look, 658-698; Paul E. Peterson and William G. Howell, “Efficiency, Bias, and Classification Schemes: A Response to Krueger, A.B. and Zhu, P., ‘Another Look at the New York City School Voucher Experiment,’” Working Paper Series, Education Research (Princeton, NJ: Princeton University, March 2003): 699-717; Alan B. Krueger and Pei Zhu, “Inefficiency, Subsample Selection Bias, and Nonrobustness: A Response to Peterson, P.E. and Howell, W.G., ‘Another Look at the New York City School Voucher Experiment,’” Working Paper Series, Education Research (Princeton, NJ: Princeton University, March 2003): 718-728; Paul E. Peterson and William G. Howell, “Voucher Research Controversy: New Looks at the New York City Evaluation,” Education Next 4 (Spring 2004): 73-78.

28 Depending on the extent to which the randomly assigned applicants are clustered in their schools, some adjustments to the standard error estimates may be necessary.

29 Howard S. Bloom, “Accounting for No-Shows in Experimental Evaluation Designs,” Evaluation Review 8 (1984): 225-246.
