In today’s world, much scientific data is collected automatically from sensors and processed by computers in real time to produce instant analytic results. People grow accustomed to instant data and expect to get things quickly.
At the National Center for Education Statistics (NCES), we are frequently asked why, in a world of instant data, it takes so long to produce and publish data from surveys. Although improvements in the timeliness of federal data releases have been made, there are fundamental differences in the nature of data compiled by automated systems and specific data requested from federal survey respondents. Federal statistical surveys are designed to capture policy-related and research data from a range of targeted respondents across the country, who may not always be willing participants.
This blog is designed to provide a brief overview of the survey data processing framework, but it’s important to understand that the survey design phase is, in itself, a highly complex and technical process. In contrast to a management information system, in which an organization has complete control over data production processes, federal education surveys are designed to represent the entire country and require coordination with other federal, state, and local agencies. After the necessary coordination activities have been concluded, and the response periods for surveys have ended, much work remains to be done before the survey data can be released.
Survey Response
One of the first sources of potential delays is that some jurisdictions or individuals are unable to fill in their surveys on time. Unlike opinion polls and online quizzes, which use anyone who feels like responding to the survey (convenience samples), NCES surveys use rigorously formulated samples meant to properly represent specific populations, such as states or the nation as a whole. In order to ensure proper representation within the sample, NCES follows up with nonresponding sampled individuals, education institutions, school districts, and states to ensure the maximum possible survey participation within the sample. Some large jurisdictions, such as the New York City school district, also have their own extensive survey operations to conclude before they can provide information to NCES. Before the New York City school district, which is larger than about two-thirds of all state education systems, can respond to NCES surveys, it must first gather information from all its schools. Receipt of data from New York City and other large districts is essential to compiling nationally representative data.
Editing and Quality Reviews
Waiting for final survey responses does not mean that survey processing comes to a halt. One of the most important roles NCES plays in survey operations is editing and conducting quality reviews of incoming data, which take place on an ongoing basis. In these quality reviews, a variety of strategies are used to make cost-effective and time-sensitive edits to the incoming data. For example, in the Integrated Postsecondary Education Data System (IPEDS), individual higher education institutions upload their survey responses and receive real-time feedback on responses that are out of range compared to prior submissions or instances where survey responses do not align in a logical way. All NCES surveys use similar logic checks in addition to a range of other editing checks that are appropriate to the specific survey. These checks typically look for responses that are out of range for a certain type of respondent.
Although most checks are automated, some particularly complicated or large responses may require individual review. For IPEDS, the real-time feedback described above is followed by quality review checks that are done after collection of the full dataset. This can result in individualized follow up and review with institutions whose data still raise substantive questions.
Sample Weighting
In order to lessen the burden on the public and reduce costs, NCES collects data from selected samples of the population rather than taking a full census of the entire population for every study. In all sample surveys, a range of additional analytic tasks must be completed before data can be released. One of the more complicated tasks is constructing weights based on the original sample design and survey responses so that the collected data can properly represent the nation and/or states, depending on the survey. These sample weights are designed so that analyses can be conducted across a range of demographic or geographic characteristics and properly reflect the experiences of individuals with those characteristics in the population.
If the survey response rate is too low, a “survey bias analysis” must be completed to ensure that the results will be sufficiently reliable for public use. For longitudinal surveys, such as the Early Childhood Longitudinal Study, multiple sets of weights must be constructed so that researchers using the data will be able to appropriately account for respondents who answered some but not all of the survey waves.
NCES surveys also include “constructed variables” to facilitate more convenient and systematic use of the survey data. Examples of constructed variables include socioeconomic status or family type. Other types of survey data also require special analytic considerations before they can be released. Student assessment data, such as the National Assessment of Educational Progress (NAEP), require that a number of highly complex processes be completed to ensure proper estimations for the various populations being represented in the results. For example, just the standardized scoring of multiple choice and open-ended items can take thousands of hours of design and analysis work.
Privacy Protection
Release of data by NCES carries a legal requirement to protect the privacy of our nation’s children. Each NCES public-use dataset undergoes a thorough evaluation to ensure that it cannot be used to identify responses of individuals, whether they are students, parents, teachers, or principals. The datasets must be protected through item suppression, statistical swapping, or other techniques to ensure that multiple datasets cannot be combined in such a way as to identify any individual. This is a time-consuming process, but it is incredibly important to protect the privacy of respondents.
Data and Report Release
When the final data have been received and edited, the necessary variables have been constructed, and the privacy protections have been implemented, there is still more that must be done to release the data. The data must be put in appropriate formats with the necessary documentation for data users. NCES reports with basic analyses or tabulations of the data must be prepared. These products are independently reviewed within the NCES Chief Statistician’s office.
Depending on the nature of the report, the Institute of Education Sciences Standards and Review Office may conduct an additional review. After all internal reviews have been conducted, revisions have been made, and the final survey products have been approved, the U.S. Secretary of Education’s office is notified 2 weeks in advance of the pending release. During this notification period, appropriate press release materials and social media announcements are finalized.
Although NCES can expedite some product releases, the work of preparing survey data for release often takes a year or more. NCES strives to maintain a balance between timeliness and providing the reliable high-quality information that is expected of a federal statistical agency while also protecting the privacy of our respondents.
By Thomas Snyder