IES Blog

Institute of Education Sciences

From Data Collection to Data Release: What Happens?

In today’s world, much scientific data is collected automatically from sensors and processed by computers in real time to produce instant analytic results. People grow accustomed to instant data and expect to get things quickly.

At the National Center for Education Statistics (NCES), we are frequently asked why, in a world of instant data, it takes so long to produce and publish data from surveys. Although the timeliness of federal data releases has improved, there are fundamental differences between data compiled by automated systems and data requested from federal survey respondents. Federal statistical surveys are designed to capture policy-related and research data from a range of targeted respondents across the country, who may not always be willing participants.

This blog post provides a brief overview of the survey data processing framework, but it’s important to understand that the survey design phase is, in itself, a highly complex and technical process. In contrast to a management information system, in which an organization has complete control over data production processes, federal education surveys are designed to represent the entire country and require coordination with other federal, state, and local agencies. After the necessary coordination activities have been concluded, and the response periods for surveys have ended, much work remains to be done before the survey data can be released.

Survey Response

One of the first sources of potential delays is that some jurisdictions or individuals are unable to fill in their surveys on time. Unlike opinion polls and online quizzes, which use anyone who feels like responding to the survey (convenience samples), NCES surveys use rigorously formulated samples meant to properly represent specific populations, such as states or the nation as a whole. In order to ensure proper representation within the sample, NCES follows up with nonresponding sampled individuals, education institutions, school districts, and states to ensure the maximum possible survey participation within the sample. Some large jurisdictions, such as the New York City school district, also have their own extensive survey operations to conclude before they can provide information to NCES. Before the New York City school district, which is larger than about two-thirds of all state education systems, can respond to NCES surveys, it must first gather information from all its schools. Receipt of data from New York City and other large districts is essential to compiling nationally representative data.

Editing and Quality Reviews

Waiting for final survey responses does not mean that survey processing comes to a halt. One of the most important roles NCES plays in survey operations is editing and conducting quality reviews of incoming data, which take place on an ongoing basis. In these quality reviews, a variety of strategies are used to make cost-effective and time-sensitive edits to the incoming data. For example, in the Integrated Postsecondary Education Data System (IPEDS), individual higher education institutions upload their survey responses and receive real-time feedback on responses that are out of range compared to prior submissions or instances where survey responses do not align in a logical way. All NCES surveys use similar logic checks in addition to a range of other editing checks that are appropriate to the specific survey. These checks typically look for responses that are out of range for a certain type of respondent.

Although most checks are automated, some particularly complicated or large responses may require individual review. For IPEDS, the real-time feedback described above is followed by quality review checks that are done after collection of the full dataset. This can result in individualized follow up and review with institutions whose data still raise substantive questions. 
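To make the two kinds of automated checks described above more concrete, here is a minimal sketch in Python. It is an illustration only and not the actual IPEDS edit logic: the field names, the `Submission` class, and the 25 percent tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical fields; real survey forms contain many more items.
    institution_id: str
    total_enrollment: int
    full_time: int
    part_time: int


def range_flag(current: float, prior: float, tolerance: float = 0.25) -> bool:
    """Flag a value that moved more than `tolerance` (a fraction) from the prior value."""
    if prior == 0:
        return current != 0
    return abs(current - prior) / prior > tolerance


def logic_flag(sub: Submission) -> bool:
    """Flag a response whose parts do not add up to the reported total."""
    return sub.full_time + sub.part_time != sub.total_enrollment


def review(sub: Submission, prior: Submission) -> list:
    """Return human-readable messages for every check the submission fails."""
    flags = []
    if range_flag(sub.total_enrollment, prior.total_enrollment):
        flags.append("total_enrollment out of range vs. prior submission")
    if logic_flag(sub):
        flags.append("full_time + part_time != total_enrollment")
    return flags


prior_year = Submission("X", 1000, 700, 300)
this_year = Submission("X", 1500, 900, 500)
for message in review(this_year, prior_year):
    print(message)
```

In this sketch a flagged response is not rejected. It is simply surfaced for follow-up, which mirrors the individualized review described above.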

Sample Weighting

In order to lessen the burden on the public and reduce costs, NCES collects data from selected samples of the population rather than taking a full census of the entire population for every study. In all sample surveys, a range of additional analytic tasks must be completed before data can be released. One of the more complicated tasks is constructing weights based on the original sample design and survey responses so that the collected data can properly represent the nation and/or states, depending on the survey. These sample weights are designed so that analyses can be conducted across a range of demographic or geographic characteristics and properly reflect the experiences of individuals with those characteristics in the population.
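As a rough illustration of what "constructing weights" can involve, the sketch below uses a common textbook approach: a base weight equal to the inverse of each unit's selection probability, followed by a nonresponse adjustment within weighting cells. This is an assumption made for illustration, not a description of the procedure NCES uses for any particular survey. The cell labels and probabilities are hypothetical.

```python
from collections import defaultdict


def base_weight(selection_probability: float) -> float:
    """Inverse-probability base weight: a unit sampled with probability p represents 1/p units."""
    return 1.0 / selection_probability


def nonresponse_adjusted_weights(sample: list) -> list:
    """Return one final weight per sampled unit.

    Each unit is a dict with keys 'cell', 'prob', and 'responded'.
    Respondents in a cell are scaled up so that they carry the weight of the
    nonrespondents in the same cell; nonrespondents receive a weight of zero.
    """
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for unit in sample:
        w = base_weight(unit["prob"])
        sampled_total[unit["cell"]] += w
        if unit["responded"]:
            respondent_total[unit["cell"]] += w

    weights = []
    for unit in sample:
        if not unit["responded"]:
            weights.append(0.0)
            continue
        cell = unit["cell"]
        # A respondent exists in this cell, so respondent_total[cell] > 0.
        factor = sampled_total[cell] / respondent_total[cell]
        weights.append(base_weight(unit["prob"]) * factor)
    return weights


# Hypothetical sample: two weighting cells with different selection probabilities.
sample = [
    {"cell": "urban", "prob": 0.1, "responded": True},
    {"cell": "urban", "prob": 0.1, "responded": False},
    {"cell": "rural", "prob": 0.5, "responded": True},
    {"cell": "rural", "prob": 0.5, "responded": True},
]
final_weights = nonresponse_adjusted_weights(sample)
print(final_weights)
```

The design choice here is that the adjusted weights of respondents in each cell sum to the base weights of everyone sampled in that cell, so the weighted total still represents the full population the sample was drawn from.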

If the survey response rate is too low, a “survey bias analysis” must be completed to ensure that the results will be sufficiently reliable for public use. For longitudinal surveys, such as the Early Childhood Longitudinal Study, multiple sets of weights must be constructed so that researchers using the data will be able to appropriately account for respondents who answered some but not all of the survey waves.

NCES surveys also include “constructed variables” to facilitate more convenient and systematic use of the survey data. Examples of constructed variables include socioeconomic status or family type. Other types of survey data also require special analytic considerations before they can be released. Student assessment data, such as those from the National Assessment of Educational Progress (NAEP), require that a number of highly complex processes be completed to ensure proper estimations for the various populations being represented in the results. For example, just the standardized scoring of multiple-choice and open-ended items can take thousands of hours of design and analysis work.

Privacy Protection

Release of data by NCES carries a legal requirement to protect the privacy of our nation’s children. Each NCES public-use dataset undergoes a thorough evaluation to ensure that it cannot be used to identify responses of individuals, whether they are students, parents, teachers, or principals. The datasets must be protected through item suppression, statistical swapping, or other techniques to ensure that multiple datasets cannot be combined in such a way as to identify any individual. This is a time-consuming process, but it is incredibly important to protect the privacy of respondents.
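One simple form of suppression can be sketched as follows. This is an illustration of small-cell suppression in a table of counts, a simpler stand-in for the techniques named above, and the threshold of 3 is a hypothetical value rather than an NCES rule.

```python
def suppress_small_cells(counts: dict, threshold: int = 3) -> dict:
    """Return a copy of `counts` in which any cell below `threshold` is replaced by None."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical table of respondent counts by district and grade.
counts = {
    ("District A", "Grade 4"): 120,
    ("District A", "Grade 8"): 2,
    ("District B", "Grade 4"): 45,
}
print(suppress_small_cells(counts))
```

Suppressing small cells alone may not be enough when row or column totals are also published, because a suppressed value can sometimes be recovered by subtraction. That is one reason a disclosure review has to consider how datasets and tables can be combined.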

Data and Report Release

When the final data have been received and edited, the necessary variables have been constructed, and the privacy protections have been implemented, there is still more that must be done to release the data. The data must be put in appropriate formats with the necessary documentation for data users. NCES reports with basic analyses or tabulations of the data must be prepared. These products are independently reviewed within the NCES Chief Statistician’s office.

Depending on the nature of the report, the Institute of Education Sciences Standards and Review Office may conduct an additional review. After all internal reviews have been conducted, revisions have been made, and the final survey products have been approved, the U.S. Secretary of Education’s office is notified 2 weeks in advance of the pending release. During this notification period, appropriate press release materials and social media announcements are finalized.

Although NCES can expedite some product releases, the work of preparing survey data for release often takes a year or more. NCES strives to maintain a balance between timeliness and providing the reliable high-quality information that is expected of a federal statistical agency while also protecting the privacy of our respondents.  

 

By Thomas Snyder

Data Tools for College Professors and Students

Ever wonder what parts of the country produce the most English majors? Want to know which school districts have the most guidance counselors? The National Center for Education Statistics (NCES) has all the tools you need to dig into these and lots of other data!

Whether you’re a student embarking on a research project or a college professor looking for a large data set to use for an assignment, NCES has you covered. Below, check out the tools you can use to conduct searches, download datasets, and generate your own statistical tables and analyses.

 

Conduct Publication Searches

Two search tools help researchers identify potential data sources for their study and explore prior research conducted with NCES data. The Publications & Products Search Tool can be used to search for NCES publications and data products. The Bibliography Search Tool, which is updated continually, allows users to search for individual citations from journal articles that have been published using data from most surveys conducted by NCES.

Key reference publications include the Digest of Education Statistics, which is a comprehensive library of statistical tabulations, and The Condition of Education, which highlights up-to-date trends in education through statistical indicators.

 

Learn with Instructional Modules

The Distance Learning Dataset Training System (DLDT) is an interactive online tool that allows users to learn about NCES data across the education spectrum. DLDT’s computer-based training introduces users to many NCES datasets, explains their designs, and offers technical considerations to facilitate successful analyses. Please see the NCES blog Learning to Use the Data: Online Dataset Training Modules for more details about the DLDT tool.
 




Download and Access Raw Data Files

Users have several options for conducting statistical analyses and producing data tables. Many NCES surveys release public-use raw data files that professors and students can download and analyze using statistical software packages like SAS, STATA, and SPSS. Some data files and syntax files can also be downloaded using NCES data tools:

  • Education Data Analysis Tool (EDAT) and the Online Codebook allow users to download several survey datasets in various statistical software formats. Users can subset a dataset by selecting a survey, a population, and variables relevant to their analysis.
  • Many data files can be accessed directly from the Surveys & Programs page by clicking on the specific survey and then clicking on the “Data Products” link on the survey website.

 

Generate Analyses and Tables

NCES provides several online analysis tools that do not require a statistical software package:

  • DataLab is a tool for making tables and regressions that features more than 30 federal education datasets. It includes three powerful analytic tools:
    • QuickStats—for creating simple tables and charts.
    • PowerStats—for creating complex tables and logistic and linear regressions.
    • TrendStats—for creating complex tables spanning multiple data collection years. This tool also contains the Tables Library, which houses more than 5,000 published analysis tables by topic, publication, and source.



  • National Assessment of Educational Progress (NAEP) Data Explorer can be used to generate tables, charts, and maps of detailed results from national and state assessments. Users can identify the subject area, grade level, and years of interest and then select variables from the student, teacher, and school questionnaires for analysis.
  • International Data Explorer (IDE) is an interactive tool with data from international assessments and surveys, such as the Program for International Student Assessment (PISA), the Program for the International Assessment of Adult Competencies (PIAAC), and the Trends in International Mathematics and Science Study (TIMSS). The IDE can be used to explore student and adult performance on assessments, create a variety of data visualizations, and run statistical tests and regression analyses.
  • Elementary/Secondary Information System (ElSi) allows users to quickly view public and private school data and create custom tables and charts using data from the Common Core of Data (CCD) and Private School Universe Survey (PSS).
  • Integrated Postsecondary Education Data System (IPEDS) Use the Data provides researcher-focused access to IPEDS data and tools that contain comprehensive data on postsecondary institutions. Users can view video tutorials or use data through one of the many functions within the portal, including the following:
    • Data Trends—Provides trends over time for high-interest topics, including enrollment, graduation rates, and financial aid.
    • Look Up an Institution—Allows for quick access to an institution’s comprehensive profile. Shows data similar to College Navigator but contains additional IPEDS metrics.
    • Statistical Tables—Equips power users to quickly get data and statistics for specific measures, such as average graduation rates by state.

 

 

Back to School by the Numbers: 2019–20 School Year

Across the country, hallways and classrooms are full of activity as students return for the 2019–20 school year. Each year, the National Center for Education Statistics (NCES) compiles back-to-school facts and figures that give a snapshot of our schools and colleges for the coming year. You can see the full report on the NCES website, but here are a few “by-the-numbers” highlights. You can also click on the hyperlinks throughout the blog to see additional data on these topics.

The staff of NCES and of the Institute of Education Sciences (IES) hopes our nation’s students, teachers, administrators, school staffs, and families have an outstanding school year!

 

 

56.6 million

The number of students expected to attend public and private elementary and secondary schools this year, slightly more than in the 2018–19 school year (56.5 million).

Overall, 50.8 million students are expected to attend public schools this year. The racial and ethnic profile of public school students includes 23.7 million White students, 13.9 million Hispanic students, 7.7 million Black students, 2.7 million Asian students, 2.1 million students of Two or more races, 0.5 million American Indian/Alaska Native students, and 0.2 million Pacific Islander students.

About 5.8 million students are expected to attend private schools this year.

 

$13,440

The projected per student expenditure in public elementary and secondary schools in 2019–20. Total expenditures for public elementary and secondary schools are projected to be $680 billion for the 2019–20 school year.

 

3.7 million

The number of teachers in fall 2019. There will be 3.2 million teachers in public schools and 0.5 million teachers in private schools.

 

3.7 million

The number of students expected to graduate from high school this school year, including 3.3 million from public schools and nearly 0.4 million from private schools.

 

19.9 million

The number of students expected to attend American colleges and universities this fall—lower than the peak of 21.0 million in 2010. About 13.9 million students will attend four-year institutions and 6.0 million will attend two-year institutions.

 

56.7%

The projected percentage of female postsecondary students in fall 2019, for a total of 11.3 million female students, compared with 8.6 million male students.

 

By Sidney Wilkinson-Flicker

Introducing the 2020 Classification of Instructional Programs (CIP) and Its Website

The National Center for Education Statistics (NCES) is pleased to announce the release of the 2020 Classification of Instructional Programs (CIP), which reflects the various programs of study being offered at postsecondary institutions around the country. This is the sixth edition of the CIP and contains more than 300 new programs of study, which can be searched on the new 2020 CIP website.

The CIP is updated about every 10 years to reflect changes in instructional program structures and the introduction of new fields of study. Beginning next year, postsecondary institutions will use the 2020 CIP when they report the degrees and certificates awarded for the 2020 Integrated Postsecondary Education Data System (IPEDS) Completions Survey.

The CIP is a taxonomy of instructional programs that provides a classification system for the thousands of different programs offered by postsecondary institutions. Its purpose is to facilitate the organization, collection, and reporting of fields of study and program completions. CIP codes and IPEDS Completions Survey data are used by many different groups of people for many different reasons. For instance, economists use the data to study emerging labor pools and identify people with specific training and skills. The business community uses IPEDS Completions Survey data to help recruit minority and female candidates in specialized fields by identifying the numbers of these students who are graduating from specific institutions. Prospective college students can use the data to look for institutions offering specific programs of postsecondary study at all levels, from certificates to doctoral degrees.

To allow sufficient time for institutions to update their reporting systems, NCES is releasing the 2020 CIP and the new website approximately one year before it will be implemented.

 



 

The 2020 CIP website has many features, including multiple search options, an FAQ section, resources, a help page, and contact information. Users can search the 2020 CIP by code or keyword, and the resource page contains lists of new, moved, and deleted CIP codes as well as Word and Excel versions of the 2020 CIP and 2010 CIP. The website also contains an online data tool called the CIP Wizard, which enables users to focus on changes at a specific institution between the 2010 and 2020 CIPs.

 



 

The CIP Wizard requires users to specify an institution by either name or IPEDS ID, a unique identification number assigned by NCES. The Wizard then searches the last 3 years of the IPEDS Completions Survey and compiles the CIP codes used by that institution. The Wizard also crosswalks an institution’s 2010 CIP codes to its 2020 CIP codes and generates a report that sorts the codes into the following categories:

  • No substantive changes—codes that did not change from the previous version of the CIP
  • New codes—codes that were added to this version of the CIP
  • Moved codes—codes that were relocated and have two references: one in the former location  and one in the current location
  • Deleted codes—codes that were removed from the previous version of the CIP

By looking through the CIP Wizard report, an institution can see exactly what changes have been made to the CIP codes it used in the last 3 years of Completions Survey data.
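The categorization step can be pictured with a short sketch. This is not the CIP Wizard's actual implementation: the crosswalk layout, the category labels, and the CIP codes shown are all hypothetical placeholders. The sketch also assumes that new codes are listed for every institution, since they have no 2010 counterpart to match against.

```python
# Hypothetical crosswalk rows: (2010 code, 2020 code, action).
# The codes below are placeholders, not real CIP codes.
CROSSWALK = [
    ("90.0101", "90.0101", "unchanged"),
    ("90.0102", "91.0102", "moved"),
    ("90.0103", None, "deleted"),
    (None, "90.0199", "new"),
]


def categorize(institution_codes, crosswalk):
    """Group crosswalk rows by action for the 2010 codes an institution has used.

    Rows marked 'new' have no 2010 code, so they are always included.
    """
    used = set(institution_codes)
    report = {"unchanged": [], "new": [], "moved": [], "deleted": []}
    for old_code, new_code, action in crosswalk:
        if action == "new" or old_code in used:
            report[action].append((old_code, new_code))
    return report


# Codes a hypothetical institution reported in its last 3 years of completions data.
report = categorize(["90.0101", "90.0102", "90.0103"], CROSSWALK)
for category, rows in report.items():
    print(category, rows)
```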

 



 

The CIP Wizard also suggests new CIP codes that might be of interest to the user, allows the user to export a report as either a Word or Excel file, and creates a file of CIP codes that can be uploaded to an institution’s reporting system.

Over the next several months, NCES will be preparing web-based tutorials on how to use the CIP website and the CIP Wizard. Until then, users can consult the list of frequently asked questions and the detailed help document or submit questions by email to CIP2020@ed.gov.

 

 

By Michelle Coon

Revenues and Expenditures for Public Schools Rebound for Third Consecutive Year in School Year 2015–16

Revenues and expenditures per pupil for elementary and secondary education increased in school year 2015–16 (fiscal year [FY] 2016), continuing a recent upward trend in the amount of money spent on public preK–12 education. This is the third consecutive year that per pupil revenues and expenditures have increased, reversing three consecutive years of declines in spending between FY 10 and FY 13 after adjusting for inflation. The findings come from the recently released Revenues and Expenditures for Public Elementary and Secondary School Districts: School Year 2015–16 (Fiscal Year 2016).

 

 

The national median of total revenues across all school districts was $12,953 per pupil in FY 16, reflecting an increase of 3.2 percent from FY 15, after adjusting for inflation.[1] This increase in revenues per pupil follows an increase of 2.0 percent for FY 15 and 1.6 percent for FY 14. These increases in revenues per pupil between FY 14 and FY 16 contrast with the decreases from FY 10 to FY 13. The national median of current expenditures per pupil was $10,881 in FY 16, reflecting an increase of 2.4 percent from FY 15. Current expenditures per pupil also increased in FY 15 (1.7 percent) and FY 14 (1.0 percent). These increases in median revenues and current expenditures per pupil between FY 14 and FY 16 represent a full recovery in education spending following the decreases from FY 10 to FY 13.
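For readers who want to see the arithmetic behind inflation-adjusted comparisons, the following sketch shows a constant-dollar conversion, a year-over-year percent change, and the compounding of several annual changes. The annual percentages come from the paragraph above. The CPI values and dollar amount in the conversion example are hypothetical and are not the school fiscal year CPI figures NCES uses.

```python
def to_constant_dollars(amount: float, cpi_year: float, cpi_base: float) -> float:
    """Express `amount` from a year with index `cpi_year` in dollars of the base year."""
    return amount * cpi_base / cpi_year


def percent_change(new: float, old: float) -> float:
    """Percent change from `old` to `new`."""
    return (new - old) / old * 100.0


def cumulative_change(annual_percent_changes) -> float:
    """Compound a sequence of annual percent changes into one cumulative percent change."""
    total = 1.0
    for pct in annual_percent_changes:
        total *= 1.0 + pct / 100.0
    return (total - 1.0) * 100.0


# Hypothetical example: $12,000 in a year with CPI 240.0, restated in base-year
# dollars where the base-year CPI is 252.0.
adjusted = to_constant_dollars(12000.0, 240.0, 252.0)
print(adjusted, percent_change(adjusted, 12000.0))

# Annual inflation-adjusted increases reported above for FY 14, FY 15, and FY 16.
revenue_growth = cumulative_change([1.6, 2.0, 3.2])
expenditure_growth = cumulative_change([1.0, 1.7, 2.4])
print(round(revenue_growth, 1), round(expenditure_growth, 1))
```

Because the published annual figures are rounded, the compounded results are approximate: about 6.9 percent for median revenues per pupil and about 5.2 percent for median current expenditures per pupil over the three years.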

The school district finance data can help us understand differences in funding levels for various types of districts. For example, median current expenditures per pupil in independent charter school districts were lower than in noncharter and mixed charter/noncharter school districts in 21 out of the 25 states that were able to report finance data for independent charter school districts. Three of the 4 states where median current expenditures were higher for independent charter school districts had policies that affected charter school spending. The new School District Finance Survey (F-33) data offer researchers extensive opportunities to investigate local patterns of revenues and expenditures and how they relate to conditions for other districts across the country.

 

 

By Stephen Q. Cornman, NCES; Malia Howell, Stephen Wheeler, and Osei Ampadu, U.S. Census Bureau; and Lei Zhou, Activate Research


[1] In order to compare from one year to the next, revenues are converted to constant dollars, which adjusts figures for inflation. Inflation adjustments use the Consumer Price Index (CPI) published by the U.S. Department of Labor, Bureau of Labor Statistics. For comparability to fiscal education data, NCES adjusts the CPI from a calendar year basis to a school fiscal year basis (July through June). See Digest of Education Statistics 2016, table 106.70, https://nces.ed.gov/programs/digest/d16/tables/dt16_106.70.asp.