Writing an NSF Data Management Plan
Beginning January 18, 2011, all NSF proposals are required to include a data management plan. The eScience Institute offers several resources to help you write and deliver a competitive plan, including platforms and services for data management, standard text to include in proposals that describes these services, and consulting expertise for unique challenges.
This post presents information about the new NSF requirements. We have also created a wiki page with this information (note: this page is internal to UW). In addition, the Data Services guide on the UW Libraries website provides valuable resources for writing proposals. Share your comments on this blog and add your own information to the wiki page. We'll post examples of successful data management plans, conversations with NSF Program Officers, and any other relevant information on the wiki page.
- Requirements Summary
- eScience Services
- Detailed Requirements
- Directorate-Specific Requirements
Requirements Summary
NSF does not prescribe specific content for the data management plan. The following summarizes important points:
- The data management plan is two pages maximum, and does not count against the 15-page limit
- The plan is uploaded to Fastlane as a supplementary document
- Broadly, the data management plan must specify "how the proposal will conform to the NSF policy on dissemination and sharing of research results." This policy includes the following tenets:
  - Publish promptly
  - Share your data
  - Share software and invention (however...)
  - You can retain IP
  NSF will enforce this policy through the review process.
- Despite the name, the requirements do not emphasize a description of how your data will be managed internally for project investigators. The focus is on how you will share and disseminate your data externally, long term. The plan itself should include:
  - Metadata. Descriptions of all materials produced, e.g., "types of data, samples, physical collections, software, curriculum materials"
  - Standards. The "standards to be used for data and metadata format and content" or appropriate documentation where such standards do not exist
  - Access. Policies for "access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements"
  - Archive. Plans for archiving and "preservation of access"
Specific Directorates may have additional recommendations or requirements.
eScience Services
The eScience Institute offers the following resources to help you write and deliver a competitive plan:
lolo: a service of UW-IT, developed in collaboration with the UW eScience Institute, providing scalable storage for the UW research community. lolo is intended to provide UW researchers with an alternative to building and operating their own storage systems. lolo offers two classes of storage:
- A general purpose file storage service, ideally suited to collaboration with peers on and off campus.
- An archive file storage service, intended for the safe long term storage of large files that change rarely and are recalled infrequently.
Both systems provide 10 Gb/s connections to campus, the Internet, and Hyak, the UW central HPC resource. Both appear as ordinary filesystems, and access is via standard protocols. Both systems can scale to several PB.
Users reserve capacity in 8 TB chunks for three-year periods with monthly billing. Initial pricing is $1.44/GB/year for collaboration filesystem capacity and $0.17/GB/year for archives.
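To get a feel for these rates, the arithmetic for a single reservation can be sketched as follows. This is a rough illustration only: it assumes decimal units (1 TB = 1,000 GB), that the initial rates stay fixed for the whole three-year term, and that the monthly bill is simply the annual cost divided by twelve. None of these details are stated above, and the function and constant names are made up for the example.

```python
# Rough cost estimate for one lolo capacity reservation.
# Assumptions (not confirmed by UW-IT): 1 TB = 1000 GB, rates stay constant
# over the term, and the monthly bill is the annual cost divided by 12.

RATE_PER_GB_YEAR = {
    "collaboration": 1.44,  # USD per GB per year (initial pricing quoted above)
    "archive": 0.17,        # USD per GB per year (initial pricing quoted above)
}
CHUNK_TB = 8       # capacity is reserved in 8 TB chunks
TERM_YEARS = 3     # reservations run for three-year periods
GB_PER_TB = 1000   # assumption: decimal units


def reservation_cost(storage_class: str, chunks: int = 1) -> dict:
    """Return annual, monthly, and full-term cost in USD for `chunks` chunks."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    capacity_gb = chunks * CHUNK_TB * GB_PER_TB
    annual = capacity_gb * RATE_PER_GB_YEAR[storage_class]
    return {
        "annual": round(annual, 2),
        "monthly": round(annual / 12, 2),
        "term": round(annual * TERM_YEARS, 2),
    }


if __name__ == "__main__":
    for name in RATE_PER_GB_YEAR:
        print(name, reservation_cost(name))
```

Under these assumptions, one 8 TB collaboration chunk comes to about $11,520 per year and one archive chunk to about $1,360 per year, which may be a useful starting point for a proposal budget line.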
SQLShare: A lightweight cloud-based database service for sharing tabular data (e.g., spreadsheets and ASCII files).
Web Services: eScience can help you find a host for your data that conforms to relevant data exchange standards, or deploy your own, on or off the cloud. Relevant standards include those of the Open Geospatial Consortium (Web Map Service, Web Feature Service, Sensor Observation Service, Sensor Web Enablement, and others), the Federal Geographic Data Committee (FGDC), and the Open-source Project for a Network Data Access Protocol (OPeNDAP).
Cloud hosting: In some circumstances, Microsoft or Amazon can offer free or reduced-cost hosting for scientific datasets through their Dallas and S3 platforms, respectively. The eScience Institute can facilitate such an agreement when appropriate. The cloud offers an ideal hosting option, as anyone can access your data resources on their dime, without consuming any of your computing resources. Further, both the Microsoft and Amazon cloud platforms offer a variety of flexible access methods for all data types and sizes. For example, Amazon's Elastic MapReduce service allows access via efficient distributed algorithms programmed in a high-level language.
Detailed Requirements
This requirement was first proposed by the National Science Board in 2005 as part of the report Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century:
The NSF should require that research proposals for activities that will generate digital data, especially long-lived data, should state such intentions in the proposal so that peer reviewers can evaluate a proposed data management plan.
Chapter 2 of that report offers the Protein Data Bank as an exemplar of a successful Digital Data Repository, and also introduces the concept of Digital Data Common Space:
...defined here as elements of infrastructure, much as a university library or a campus core facility for DNA sequencing would be considered as infrastructure. The data commons consists of the cyberinfrastructure for data preservation, retrieval and analysis, robust communications links for global access, and data scientists who direct the facility and can act as consultants and collaborators to the researchers served by the facility.
Here is relevant text from the Grant Proposal Guide. There is also a FAQ from NSF.
Plans for data management and sharing of the products of research. Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan.” This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:
- the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the projects;
- the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
- policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
- policies and provisions for re-use, re-distribution, and the production of derivatives;
- plans for archiving data, samples, and other research products, and for preservation of access to them.
Data management requirements and plans specific to the Directorate, Office, Division, Program, or other NSF unit, relevant to a proposal are available at: http://www.nsf.gov/bfa/dias/policy/dmp.jsp. If guidance specific to the program is not available, then the requirements established in this section apply.
Simultaneously submitted collaborative proposals and proposals that include subawards are a single unified project and should include only one supplemental combined Data Management Plan, regardless of the number of non-lead collaborative proposals or subawards included. Fastlane will not permit submission of a proposal that is missing a Data Management Plan. Proposals for supplementary support to an existing award are not required to include a Data Management Plan.
A valid Data Management Plan may include only the statement that no detailed plan is needed, as long as the statement is accompanied by a clear justification. Proposers who feel that the plan cannot fit within the supplement limit of two pages may use part of the 15-page Project Description for additional data management information. Proposers are advised that the Data Management Plan may not be used to circumvent the 15-page Project Description limitation. The Data Management Plan will be reviewed as an integral part of the proposal, coming under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance.
Directorate-Specific Requirements
Directorate-specific requirements are available on the NSF website under "Requirements by Directorate, Office, Division, Program, or other NSF Unit."
MPS (Mathematical and Physical Sciences):
Each division within the MPS Directorate (AST, CHE, DMR, DMS, PHY) has issued guidance on preparing a Data Management Plan. Guidance for all divisions is contained in a single PDF document on the NSF website.
The Engineering Directorate released additional material describing specific requirements. These requirements are paraphrased here (source: Blog on Scholarly Communication at Texas A&M).
- Expected data
  - Describe the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project.
  - Describe the expected types of data to be retained.
- Period of data retention
  - Minimum retention of research data is three years after the conclusion of the award or three years after public release, whichever is later.
  - Additional guidelines on data retention provisions with respect to publication, patents, student research, etc. are provided in the document.
- Data formats and dissemination
  - Describe specific data formats, media, and dissemination approaches used to share the data and metadata with others.
  - Describe policies for public access, including provisions for protecting privacy, confidentiality, security, intellectual property, and other rights, and for managing other restrictions.
  - Describe how data are to be shared and managed with partners, if applicable, or other major stakeholders or user communities.
  - Clearly indicate publication delay policies, if applicable.
- Data storage and preservation of access
  - Describe physical and cyber resources and facilities used for preservation and storage of research data.
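The minimum-retention rule above (three years after the conclusion of the award or three years after public release, whichever is later) is simple date arithmetic. The sketch below is only an illustration: the function names and the example dates are hypothetical, and the additional retention provisions in the Engineering guidance (publication, patents, student research) are not modeled.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 3  # minimum retention period from the paraphrased guidance


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 and the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(award_end: date,
                           public_release: Optional[date] = None) -> date:
    """Return the date on which the minimum retention period has elapsed.

    Takes the later of (award end + 3 years) and (public release + 3 years);
    if the data have no public release date, only the award end is used.
    """
    candidates = [add_years(award_end, RETENTION_YEARS)]
    if public_release is not None:
        candidates.append(add_years(public_release, RETENTION_YEARS))
    return max(candidates)


if __name__ == "__main__":
    # Hypothetical award ending 31 Aug 2014 with data released 1 Mar 2015.
    print(earliest_disposal_date(date(2014, 8, 31), date(2015, 3, 1)))
```

In this hypothetical case the public release is the later event, so the retention period runs until three years after the release date rather than three years after the award ends.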