Technical Methods Report: Using State Tests in Education Experiments

A Discussion of the Issues

NCEE 2009-013
November 2009

Whether to Secure Baseline Data

Random assignment in an RCT enables the construction of treatment and control groups that are statistically equivalent prior to the implementation of an intervention. This prior equivalence makes it possible to estimate the impact of an intervention using only posttest data collected from both groups after the intervention is completed. The primary advantage of this approach is that only one wave of data is collected, eliminating the need to link multiple waves of data and the potential errors associated with such linking. The primary disadvantage of posttest-only analyses is lower statistical power: analyses that use more than one wave of data (for example, covariance analyses¹² and repeated measures analyses¹³) typically have much greater statistical power (Shadish, Cook, and Campbell 2002).¹⁴
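
As an illustration of why a pretest raises power, a covariance analysis under student-level random assignment can be written as a regression of the posttest on a treatment indicator and the pretest. The notation below is an assumption introduced for this sketch, not taken from the report: y_i is the posttest, T_i the treatment indicator, x_i the pretest, n the total sample size, P the proportion assigned to treatment, and R² the share of posttest variance explained by the pretest. The variance expression is a standard large-sample approximation that ignores the degree of freedom used by the covariate.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 T_i + \beta_2 x_i + \varepsilon_i, \\
\operatorname{Var}(\hat{\beta}_1) &\approx \frac{\sigma_y^2 \,(1 - R^2)}{n \, P \,(1 - P)} .
\end{aligned}
```

Setting R² = 0 gives the posttest-only case, so under this approximation a pretest that explains half of the posttest variance cuts the variance of the impact estimate roughly in half.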

Increased Statistical Power. Bloom, Richburg-Hayes, and Black (2005) advocate the use of baseline covariates to improve power in multilevel RCTs. They show that even the use of aggregate school-level data (which are very easy to obtain from district and state websites) as a baseline covariate can dramatically increase the power of a school-level RCT. We agree that the benefits of getting baseline data (aggregate or individual-level) will generally outweigh the costs of obtaining such data. In particular, using prior state test results as baseline covariates could lead to a substantial decrease in overall study costs: in many contexts the improvement in power from the baseline tests is so large that the overall sample size can be greatly reduced (thus reducing other data collection costs) while maintaining the target level of statistical power. Baseline data can also help establish the equivalence of treatment and control groups and facilitate assessments of the potential for nonresponse bias in impact estimates.
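
The sample-size argument above can be sketched with a minimum detectable effect size (MDES) approximation commonly used for school-randomized designs with students nested in schools. Everything numeric here is a hypothetical assumption for illustration: the multiplier of 2.8 (roughly 80 percent power at a two-tailed 5 percent significance level with many schools), the intraclass correlation, the R² values, and the function names `mdes_cluster` and `schools_needed`, none of which come from the report.

```python
import math


def mdes_cluster(n_schools, students_per_school, icc,
                 r2_school=0.0, r2_student=0.0,
                 p_treat=0.5, multiplier=2.8):
    """Approximate MDES (in standard deviation units) for a school-level RCT.

    icc        : share of outcome variance that lies between schools
    r2_school  : share of between-school variance explained by the covariate
    r2_student : share of within-school variance explained by the covariate
    multiplier : assumed constant; it actually depends on degrees of freedom
    """
    denom = p_treat * (1 - p_treat) * n_schools
    between = icc * (1 - r2_school) / denom
    within = (1 - icc) * (1 - r2_student) / (denom * students_per_school)
    return multiplier * math.sqrt(between + within)


def schools_needed(target_mdes, students_per_school, icc,
                   r2_school=0.0, r2_student=0.0,
                   p_treat=0.5, multiplier=2.8):
    """Smallest number of schools whose approximate MDES meets the target."""
    per_school_var = (icc * (1 - r2_school)
                      + (1 - icc) * (1 - r2_student) / students_per_school)
    j = multiplier ** 2 * per_school_var / (p_treat * (1 - p_treat) * target_mdes ** 2)
    return math.ceil(j)


if __name__ == "__main__":
    # Hypothetical design: 60 students per school, ICC of 0.15, target MDES of 0.25.
    without = schools_needed(0.25, 60, 0.15)
    with_pretest = schools_needed(0.25, 60, 0.15, r2_school=0.75)
    print("schools needed without baseline covariate:", without)
    print("schools needed with school-level pretest: ", with_pretest)
```

Under these assumed inputs, a school-level pretest that explains three quarters of the between-school variance substantially lowers the number of schools required for the same target MDES. Because the multiplier is held fixed, the sketch slightly understates the sample needed when the number of schools is small.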

Data Linking. Although efforts should be made to maximize statistical power within the available resources, researchers should recognize that substantial effort might be required to link longitudinal data from state assessments. Fortunately, in our experience, the effort and cost of linking multiple waves of data from state assessments are far less than the costs typically associated with increasing statistical power by administering additional waves of external assessments.

Arguably, the benefits associated with one or two additional waves of prior assessment data outweigh the costs even in the worst-case scenario. The worst case involves a state in which student identifiers are assigned at the school or district level, the study involves multiple schools or districts in the state, and student mobility across districts is common. Under these conditions, student records cannot be linked across waves using only a numeric identifier; the linkages must make use of additional identifiers, such as students' names, birthdates, and demographic characteristics. Fortunately, data-linking programs exist that implement probabilistic matching algorithms designed to deal with common database errors or inconsistencies, such as incorrectly keyed ID numbers or birthdates, transposed first and last names, and nicknames (for example, Jon instead of Jonathan).
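
To make the linking problem concrete, the following is a minimal sketch of similarity-based record matching using only the Python standard library. It is a simplified stand-in for the probabilistic matching programs mentioned above, not an implementation of any of them: the record fields, the tiny nickname table, the 0.6/0.4 weights, and the 0.85 threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real study would need a far larger one.
NICKNAMES = {"jon": "jonathan", "liz": "elizabeth", "bill": "william"}


def normalize_name(first, last):
    """Lowercase, expand known nicknames, and sort the two name parts so
    that transposed first and last names compare as equal."""
    tokens = []
    for part in (first, last):
        token = part.strip().lower()
        tokens.append(NICKNAMES.get(token, token))
    return " ".join(sorted(tokens))


def similarity(a, b):
    """String similarity between 0 and 1; tolerant of small keying errors."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b):
    """Weighted similarity of name and birthdate (weights are assumptions)."""
    name = similarity(normalize_name(rec_a["first"], rec_a["last"]),
                      normalize_name(rec_b["first"], rec_b["last"]))
    dob = similarity(rec_a["dob"], rec_b["dob"])
    return 0.6 * name + 0.4 * dob


def link_records(wave1, wave2, threshold=0.85):
    """Pair each wave-1 record with its best-scoring wave-2 record,
    keeping the pair only if the score reaches the threshold."""
    links = []
    for a in wave1:
        best = max(wave2, key=lambda b: match_score(a, b), default=None)
        if best is not None and match_score(a, best) >= threshold:
            links.append((a["id"], best["id"]))
    return links


if __name__ == "__main__":
    pretest = [{"id": "A1", "first": "Jon", "last": "Smith", "dob": "2001-03-14"}]
    posttest = [
        {"id": "B7", "first": "Smith", "last": "Jonathan", "dob": "2001-03-14"},
        {"id": "B8", "first": "Maria", "last": "Lopez", "dob": "2000-11-02"},
    ]
    print(link_records(pretest, posttest))
```

In practice, pairs that score near the threshold would usually be set aside for manual review, and the same wave-2 record should not be allowed to match more than one wave-1 record; both refinements are omitted here to keep the sketch short.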

Other states present the best-case scenario, in which linking problems are minimal because the state has taken great care to implement a longitudinal database that uses state-assigned student identifiers and provides multiyear histories of test scores for individual students. Because data quality and database sophistication vary across states, the costs associated with linking multiple waves of state assessment data must be evaluated separately for each state. Fortunately, the current trend is toward more states developing longitudinal databases.¹⁵

In sum, for any study in which student achievement is an outcome, the cost savings associated with increased power are likely to far exceed the added costs of obtaining and linking prior test score data. Therefore, we conclude that studies should generally aim to collect and use baseline data. If the RCT involves student-level random assignment, efforts should be made to link pretest and posttest scores for individual students. If the RCT involves school- or district-level random assignment, either linked individual-level data or aggregate baseline data (or both) can be used to increase power.


¹² Covariance analysis includes two types of mathematically equivalent models: (1) analysis of covariance (ANCOVA), and (2) linear regression in which a pretest is included as a predictor (also known as a covariate) in the regression model. Here we use the term covariance analysis to refer to either model.
¹³ Repeated measures analyses include several types of analyses in which multiple measurements are available for each individual in the analysis. Popular statistical models for repeated measures include multivariate analysis of variance (MANOVA), multivariate hierarchical linear modeling (MHLM), and growth curve models.
¹⁴ Additional advantages of collecting baseline data include the ability to confirm equivalence of treatment and control groups on observable covariates, and analyses of potential sampling bias due to attrition.
¹⁵ An example of such efforts can be seen in the Statewide Longitudinal Data System (SLDS) grant program funded through the Institute of Education Sciences, which has provided funding to 41 states and the District of Columbia to develop or enhance statewide longitudinal data systems (see http://nces.ed.gov/Programs/SLDS/).