Describe the data you will produce in the course of the project (see the NSF-ENG directorate guidelines for what types of data to include). State roughly how much data you will generate and at what rate, if possible. If you will be generating multiple data sets, answer the questions below for each data set. If you know you won’t be keeping all the data you generate, state what you will and won’t retain and why.
Please note that these DMP excerpts are copyrighted by their respective authors.
Preferred:
“This research project will generate data resulting from sensor recordings (i.e. earth pressures, accelerations, wall deformation and displacement and soil settlement) during the centrifuge experiments. In addition to the raw, uncorrected sensor data, converted and corrected data (in engineering units), as well as several other forms of derived data will be produced. Metadata that describes the experiments with their materials, loads, experimental environment and parameters will be produced. The experiments will also be recorded with still cameras and video cameras. Photos and videos will be part of the data collection.”
“A total storage demand of 50 GB is anticipated at the University of Michigan, and 50 GB at Auburn University.”
“Based on the previous viscoelastic turbulent channel flow simulations, the amount of resulting binary data is estimated around 40 TB per year. Some text format data files are also required for post-processing in the laboratory and are anticipated to be around 1 TB per year.”
These three examples all illustrate parts of a good answer to this question. The first lists the various types of data that will be generated, the second states how much data will be created in total, and the third estimates the volume of data to be created per year. Your plan should address all of these elements, if possible.
Less Developed:
“The main goal of this project is to conduct simulations to better understand the thermosphere and ionosphere. Therefore, the data that will be produced from this project are simulations. The model that we utilize produces 3D data covering from 100 km to 600 km altitude with roughly 50 grid spacings. In the latitude and longitudinal directions, the spacing is typically 2.5 x 2.5 degrees.”
A common error in this section is to lapse into a recap of the project summary, as illustrated by this example. Stick to describing the types of data to be generated, touching on methods only when necessary to explain what (or how much) data you will be creating.
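A description of a model grid becomes useful in this section once it is converted into a data volume. The sketch below shows one possible way to do that for the grid quoted above (about 50 altitude levels at 2.5 x 2.5 degree spacing). The number of output variables, the bytes per value, and the number of snapshots saved per year are hypothetical placeholders that do not come from the excerpt; substitute your own figures.

```python
# Rough data-volume estimate for gridded model output.
# Grid dimensions follow the excerpt above; everything marked
# HYPOTHETICAL is an illustrative assumption, not a value from the excerpt.

def grid_cells(n_alt, dlat_deg, dlon_deg):
    """Number of cells in a global grid with the given spacing in degrees."""
    n_lat = round(180 / dlat_deg)   # latitude spans 180 degrees
    n_lon = round(360 / dlon_deg)   # longitude spans 360 degrees
    return n_alt * n_lat * n_lon

def snapshot_bytes(cells, n_variables, bytes_per_value):
    """Uncompressed size of one saved model state, in bytes."""
    return cells * n_variables * bytes_per_value

def yearly_gigabytes(bytes_per_snapshot, snapshots_per_year):
    """Uncompressed volume per year, in GB (1 GB = 10**9 bytes)."""
    return bytes_per_snapshot * snapshots_per_year / 1e9

cells = grid_cells(n_alt=50, dlat_deg=2.5, dlon_deg=2.5)  # 50 * 72 * 144 cells

N_VARIABLES = 10            # HYPOTHETICAL: fields written per cell
BYTES_PER_VALUE = 8         # HYPOTHETICAL: double-precision output
SNAPSHOTS_PER_YEAR = 1000   # HYPOTHETICAL: saved states per year

per_snapshot = snapshot_bytes(cells, N_VARIABLES, BYTES_PER_VALUE)
per_year = yearly_gigabytes(per_snapshot, SNAPSHOTS_PER_YEAR)

print(f"{cells} cells, {per_snapshot / 1e6:.1f} MB per snapshot, "
      f"about {per_year:.0f} GB per year (uncompressed)")
```

An estimate of this kind, stated as a per-year figure alongside the list of file types, is what reviewers look for here; the grid details themselves can stay in the project description.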
Preferred:
“In one year, we will perform approximately 2 to 3 simulations. This means ~100 3D plots, 30 restart files, 1000 EUV, X-ray and LASCO-like images, 10 satellite files, 1000 2D plot files (total of about 150 GB of data per year).”
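This is a good level of detail to include: it lists each type of output with an approximate count and gives an estimated total volume per year.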
Less Developed:
“The nature of the data or other materials produced under this NSF-sponsored project will include data characteristics such as observational, experimental, reference, derived, simulated and/or other. The data types referenced could include data generated by computer, data collected from sensors or instruments, images, video files, reports, and/or other.”
This paragraph reads as if it were lifted from funder documentation, and it tells the reviewer nothing about the nature of the data to be generated under this proposal.
Preferred:
“The field data collection will augment existing data sets without creating issues of redundancy. The following existing data sets will be used:
Topographic data: data will be obtained through the geodata web portal (http://gos2/geodata.gov/); the data will be used to describe topography of all case study basins.
Soil texture data: available from the SSURGO database of the Natural Resources Conservation Service (http://soils.usda.gov/survey/geography/ssurgo/). These data will be used to infer soil water retention and hydraulic conductivity relationships using pedotransfer functions. . .”
This response clearly states the nature and provenance of the existing data sets to be used, and explains how they relate to the new data sets that are to be produced in the course of the proposed project.
Less Developed:
“Our proposed work does not collect any new observations; we only use existing observations and those in the process of being collected through previously funded NSF projects.”
“This is a workshop proposal, so no data will be generated. As such, there is no need to formulate a data management plan.”
Even if the project won’t produce new observations, a re-analysis of existing data will likely result in new datasets that fall under the ENG definition of data. Likewise, although it may be true that a workshop will generate no data, the plan should provide a more thorough justification for that claim.