Implementation Guide for Public Access to Research Data
The Institute of Education Sciences (IES) expressed its commitment to advancing education research through the sharing of scientific data in the IES Policy Statement on Public Access to Data Resulting From IES Funded Grants. The purpose of this document is to describe the implementation of this policy on public access to data and to provide guidance to applicants for preparing the Data Management Plan (DMP) that must outline data sharing and be submitted with the grant application. The DMP should describe a plan to provide discoverable and citable dataset(s) with sufficient documentation to support responsible use by other researchers, and should address four interrelated concerns—access, permissions, documentation, and resources—which must be considered in the earliest stages of planning for the grant.
Rationale for Providing Public Access to Data
IES believes that data sharing is an important component of the scientific process. Data sharing provides opportunities for other researchers to review, confirm, or challenge study findings. In addition, data sharing can enhance scientific inquiry through a variety of other analytic activities, including the use of shared data to: test alternative theories or hypotheses; explore different sets of research questions than those targeted by the original researchers; combine data from multiple sources to provide potential new insights and areas of inquiry; and/or conduct methodological studies to advance education research methods and statistical analyses.
The IES policy on providing access to data is focused on data collected with grant funds provided by the two IES research centers: the National Center for Education Research and the National Center for Special Education Research. Beginning with grant applications submitted in 2012 for Fiscal Year (FY) 2013 awards, researchers applying for Goal Four Scale-Up Evaluation grants under competitions 84.305A for Education Research or 84.324A for Special Education Research were expected to include data management plans in their applications. The requirement for providing public access to data was extended to include Goal Three Efficacy and Replication grants for applications submitted for FY 2015. For FY 2016, the requirement will also apply to the Research Networks on Critical Problems of Policy and Practice competition (84.305N).
Public access to data in this policy refers to final research data. These data are the recorded factual materials commonly accepted in the scientific community as necessary to document and support research findings. For most studies, an electronic file will constitute the final research data. This dataset should include the final, cleaned data and may include both original data and derived variables, which will be fully described in accompanying documentation. Note that final research data are not summary statistics or tables, but rather the factual information on which summary statistics and tables are based. For the purposes of this policy, final research data do not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, or communications with colleagues.
This policy applies to new data collection as well as to data obtained through transforming or linking extant datasets. There may be circumstances, such as when a state or district will not allow student data to be released, where investigators will not be able to share their complete dataset. However, IES expects primary data collected by the project or extant data obtained from a private source to be shared. In many cases, de-identified data shared through either open or restricted access will be sufficient to meet requirements for protecting the confidentiality of participants.
Requirement and Responsibility for the Data Management Plan
When the Principal Investigator (PI) and the authorized institutional official sign the cover page of the 84.305A, 84.324A, or 84.305N grant application, they will be assuring compliance with IES policy on data sharing as well as other policies and regulations governing research awards. Once IES approves the DMP, the PI and the institution are required to carry it out and to report progress and problems through the regular reporting channels. Compliance with IES data sharing requirements is expected even though the final dataset may not be completed and prepared for data sharing until after the grant has been completed. In cases where the PI/grantee is non-compliant with the requirements of the data sharing policy or DMP, subsequent awards to individuals or institutions may be affected.
Contents of the Data Management Plan
The DMP must provide a comprehensive overview of how the final research data will be shared, and should not exceed five pages. DMPs are expected to differ, depending on the nature of the project and the data collected. However, applicants should address the following in the DMP:
As part of the discussion of the roles and responsibilities, applicants should also consider changes that will occur should the Principal Investigator and/or Co-Principal Investigators leave the project or institution. In addition, since access to data is expected for 10 years, the DMP should describe how access to data will be supported after the grant ends.
The DMP will be included in an appendix in the grant application, and will not be counted towards the Project Narrative's limit of 25 pages. IES Program Officers will be responsible for reviewing the proposed DMP to ensure that it addresses all of the components listed under "Contents of the Data Management Plan." Once awards are made, IES Program Officers will be responsible for monitoring the DMP over the course of the grant period in regular monitoring activities (i.e., update calls and annual reports). The grantee will also describe plans for executing the DMP in the final report.
Timing of Providing Access to Data
Timely data sharing is important to the scientific process. IES thus expects that data will be shared no later than when the main findings from the final study dataset are published in a peer-reviewed scholarly publication. As noted above, if findings are published after the grant period has ended, grantees are still required to adhere to the DMP.
Human Subjects and Privacy Issues
Researchers funded by IES must be committed to protecting the rights and privacy of human subjects at all times. Data sharing must not compromise this commitment. IES recognizes that providing access to data may be complicated or limited by institutional policies, local Institutional Review Board (IRB) rules, and state and federal laws and regulations that address the rights and privacy of human subjects. It is the responsibility of the researchers to develop a data management plan that protects the rights of study participants and confidentiality of the data, as required by their IRB and state and federal laws and regulations.
Data that are to be shared should be free of identifiers that would allow linkages to individuals participating in the research as well as other elements that could lead to deductive disclosure of the individual study participants. Deductive disclosure is particularly challenging in research in which there might be "indirect" identifiers, such as data collection that involves study participants drawn from small geographic areas or rare populations (e.g., individuals with low-incidence disabilities), when there is a joint occurrence of several rare factors (e.g., ethnic minority students attending a rural school with unique characteristics), or when, as is often the case in education research, there is a hierarchical structure in the data (e.g., students nested within classrooms within schools). Disclosure risk might also be a concern when databases are linked or when digital photographs or videos include tags with identifying information. When data cannot be freed of identifiers, or when identifiers are needed to link datasets, investigators should consider restrictions on data sharing, as provided by the data archives or enclaves described under "Methods for Providing Access to Data" below.
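As an illustration of how a joint occurrence of rare factors can be screened for before release, the following minimal sketch counts how many records share each combination of indirect identifiers and flags combinations that occur only a few times. The variable names, the sample records, and the minimum cell size of 5 are assumptions made for the example; they are not IES requirements, and the appropriate indirect identifiers and threshold should be settled with the IRB or data archive.

```python
from collections import Counter

# Hypothetical indirect identifiers; the right set depends on the study.
QUASI_IDENTIFIERS = ("school_id", "grade", "disability_category")


def small_cells(records, keys=QUASI_IDENTIFIERS, min_cell_size=5):
    """Return each combination of indirect identifiers that is shared by
    fewer than min_cell_size records, with its count.

    min_cell_size=5 is an illustrative threshold, not an IES rule.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < min_cell_size}


# Invented example data: six similar students and one student with a
# low-incidence disability in the same school and grade.
records = (
    [{"school_id": "A", "grade": 3, "disability_category": "none"}] * 6
    + [{"school_id": "A", "grade": 3, "disability_category": "deaf-blindness"}]
)

flagged = small_cells(records)
for combo, n in flagged.items():
    print(combo, n)
```

Flagged combinations are candidates for coarsening, suppression, or release only through restricted access; the check by itself does not establish that a dataset is safe to share.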
To prepare for providing public access to data, investigators should plan their study design and procedures to enable data access. One important consideration will be the consent forms and agreements used in recruiting individuals and/or institutions (e.g., schools, early childhood programs) to participate in research studies. The content of the informed consent can limit how those data can subsequently be used, including data sharing. Investigators should seek to optimize the opportunity for data sharing while working with their IRB to protect the privacy rights of study participants and confidentiality of the data.
If researchers believe that full data sharing is not possible, they must provide a comprehensive written rationale in their DMP. IES approval of the DMP is required prior to the commitment of funds for the grant.
IES acknowledges that there may be issues associated with providing access to data when the data collected are proprietary (e.g., when a published curriculum is being evaluated). Any restrictions on data sharing, such as a delay of disclosing proprietary data, should be presented in the DMP and will be considered by IES Program Officers. If proprietary issues emerge during the course of the research, they should be brought to the attention of the IES Program Officer, and the DMP will be reviewed in light of these issues.
Methods for Providing Access to Data
There are alternative methods that researchers can use for providing access to data, and IES anticipates that investigators will decide to use a particular method based on a variety of factors, including size and complexity of the dataset, sensitivity of the data collected, and anticipated number of requests for data sharing. The available methods include the (1) investigator and institution taking on the responsibility for data sharing, (2) use of a data archive or data enclave, or (3) use of some combination of these methods.
Investigators sharing data under their own auspices may send or make data available to the requestor through a variety of means, including their institutional or personal website. In deciding on whether this method is appropriate, investigators should also plan how data sharing will continue if the Principal Investigator and/or Co-Principal Investigators leave the project or institution. Investigators sharing data under their own auspices should consider using a data-sharing agreement to impose appropriate limitations on users. Such an agreement usually specifies the criteria for data access and any conditions for research use, and incorporates privacy and confidentiality standards that ensure data security at the recipient site and prohibit manipulation of data for the purpose of identifying subjects.
A data archive is a place where data can be stored and distributed to the scientific community for further analysis. Data archives typically require extensive data documentation, and work to ensure privacy and confidentiality standards. Data archives can be particularly attractive for investigators concerned about a large volume of requests, vetting requests, or providing technical assistance for users seeking help with analyses.
A data enclave provides a controlled, secure environment in which eligible researchers can perform analyses using restricted data resources without downloading the data to their own computer. Researchers can use a data archive or enclave when datasets cannot be distributed to the general public, for example, because of participant confidentiality concerns, third-party licensing, or use agreements that prohibit redistribution.
Investigators may also wish to develop a mixed method for data sharing that allows for more than one version of the dataset and provides different levels of access depending on the version. For example, a redacted dataset could be made available for general use, but stricter controls through a data archive or enclave would be applied if access to more sensitive data were required.
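The idea of keeping more than one version of a dataset can be sketched as follows: a full record is retained for restricted access, and a redacted copy is derived from it for general use. The field names and the choice of which fields to restrict are hypothetical, and dropping fields in this way is only one step; it does not by itself rule out deductive disclosure.

```python
# Hypothetical fields held back from the general-use version.
RESTRICTED_FIELDS = frozenset({"student_name", "date_of_birth", "school_id"})


def public_version(record, restricted=RESTRICTED_FIELDS):
    """Return a copy of record without the restricted fields.

    The original record is left unchanged so that it can still be
    provided through a data archive or enclave.
    """
    return {key: value for key, value in record.items() if key not in restricted}


# Invented example record.
full_record = {
    "student_name": "Example Student",
    "date_of_birth": "2005-01-01",
    "school_id": "A",
    "grade": 3,
    "reading_score": 412,
}

public_record = public_version(full_record)
print(public_record)
```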
Documentation that provides all the information necessary for other researchers to use the data must be prepared. The documentation should include a summary of the purpose of the data collection, methodology and procedures used to collect the data, timing of the data collection, as well as details of the data codes, definition of variables, variable field locations, and frequencies. The data documentation should be a comprehensive and stand-alone document that includes all the information necessary to replicate the analysis performed by the original research team.
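To make the documentation elements concrete, here is a minimal sketch of how one codebook entry (variable name, definition, data codes, and frequencies) might be generated from the data. The variable, its definition, its codes, and the values are invented for the example, and the layout is an assumption rather than a format prescribed by IES or by any particular archive.

```python
from collections import Counter


def codebook_entry(name, definition, codes, values):
    """Format one codebook entry: variable name, definition, and the
    frequency of each documented code in values.

    Values that have no documented code are listed separately so that
    gaps in the documentation are visible.
    """
    freq = Counter(values)
    lines = [f"Variable: {name}", f"Definition: {definition}"]
    for code, label in codes.items():
        lines.append(f"  {code} = {label} (n = {freq.get(code, 0)})")
    undocumented = sorted(set(values) - set(codes))
    if undocumented:
        lines.append(f"  Undocumented values: {undocumented}")
    return "\n".join(lines)


# Invented example variable and data.
entry = codebook_entry(
    name="treat",
    definition="Assignment to the intervention condition",
    codes={0: "control", 1: "treatment"},
    values=[1, 0, 1, 1, 0],
)
print(entry)
```

Generating frequencies directly from the released file, rather than typing them by hand, helps keep the documentation consistent with the data that other researchers will actually receive.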
Funds for Providing Access to Data
The costs of data sharing can be included in grant application budgets. The costs can include those associated with preparing the dataset and documentation and storing the data. The rationale for each of the costs needs to be provided in the grant application Budget Justification section.
Assistance in Preparing a Data Management Plan
Applicants are encouraged to contact IES Program Officers for technical assistance in planning for data sharing and writing the DMP. In addition, a set of frequently asked questions with links to resources on data sharing is available at Frequently Asked Questions About Providing Public Access to Data.