Data Management Plans for the Social Sciences

Suggested resources for designing data management plans (DMPs) for your research project.

NSF Guidance

"The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others."

Data Management for NSF SBE Directorate Proposals and Awards, p. 3. https://www.nsf.gov/sbe/DMP/SBE_DataMgmtPlanPolicy_RevisedApril2018.pdf

Examples

Please note that these DMP excerpts are copyrighted by their respective authors.

Preferred:
“Verilog, SPICE, and MATLAB files generated will be processed and submitted to FTP servers as .mat files with TXT documentation. The data will be distributed in several widely used formats, including ASCII, tab-delimited (for use with Excel), and MAT format. Instructional material and relevant technical reports will be provided as PDF. Digital video data files generated will be processed and submitted to the FTP servers in MPEG-4 (.mp4) and .avi formats. Variables will use a standardized naming convention consisting of a prefix, root, suffix system.”

“Plasma image data will be RGB colored JPG or TIFF format with resolution determined by the camera. Video data will be RGB colored AVI format.”

These examples name specific file formats and illustrate a preference for non-proprietary formats based on open standards, which is the recommended approach.

Less Developed:
“The data format includes digital data recorded by computers and instruments and metadata recorded in lab notebooks and reports.”

This answer is too vague to be informative: it names no specific file formats or metadata standards.

Summary

Describe the formats (file types) your data will be in. Proprietary formats are more difficult to preserve, as the software and hardware that read them quickly become obsolete. Whenever possible, data should be stored in stable, non-proprietary formats, preferably those based on open and published standards. If your research will generate files in proprietary formats, consider converting those files into formats based on open standards for sharing and archival purposes.
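
As an illustration of the conversion step, the following is a minimal Python sketch that writes tabular data to tab-delimited plain text, one of the open formats mentioned in the examples above. It assumes the data have already been loaded into Python as a list of dictionaries (the reader for the original proprietary format is not shown), and the variable names and sample values are hypothetical.

```python
import csv
import io


def write_tab_delimited(records, fieldnames, out):
    """Write a list of dict records as tab-delimited text to a file-like object."""
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()        # first row holds the variable names
    writer.writerows(records)   # one row per record


# Hypothetical survey records, assumed to be already loaded from the source format.
records = [
    {"resp_id": 1, "age": 34, "region": "Northeast"},
    {"resp_id": 2, "age": 51, "region": "South"},
]

# An in-memory buffer stands in for a real file here; for a file on disk,
# open it with open(path, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_tab_delimited(records, ["resp_id", "age", "region"], buffer)
tab_text = buffer.getvalue()
print(tab_text)
```

Keeping a conversion script like this alongside the data is one way to document how the shared files were produced.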

Current DMP guidelines are not specific about metadata requirements. 

A metadata record is a file that captures all details about a data set that another researcher would need to make use of the data set in a separate or related line of inquiry. Metadata captures the who, what, when, where, why and how of the data you produce. When data curators talk about metadata, they are normally referring to a machine-readable description that comes in a standardized format, often defined by an XML schema.
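
To make the idea of a machine-readable record concrete, here is a minimal Python sketch that builds a small XML metadata record. It uses element names from the Dublin Core element set purely as an assumption for illustration; this guide does not prescribe a particular standard, and your discipline or repository may require a different one. The wrapper element name `metadata` and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (chosen here only as an example standard).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_metadata_record(fields):
    """Return an XML string with one Dublin Core element per entry in `fields`."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values covering the who, what, when, and how of a data set.
record_xml = build_metadata_record({
    "creator": "Example Research Team",
    "title": "Household Survey 2023 (hypothetical)",
    "date": "2023-06-30",
    "description": "Responses to a household survey on commuting habits.",
    "format": "text/tab-separated-values",
})
print(record_xml)
```

A record produced this way is not validated against any schema; a repository that accepts metadata deposits will usually specify the schema and required fields to use.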

Format Questions to Consider

  • Which file formats will you use for your data and why?
  • Which standards will you use and why have you chosen them? If your data are generated in a non-standard format, how will you convert them to a more accessible format? Where and how will you make the conversion code available, if you plan to write it?
  • If there are no applicable standard formats, how will you format your data so that other researchers can make use of them?
  • Who on your team will have the responsibility of ensuring that data standards are properly applied and data properly formatted? What procedures will be in place to ensure that this is done consistently throughout the duration of the project?

Metadata Questions to Consider

  • What contextual details (metadata) are needed to make your data meaningful?
  • What form or format will the metadata describing your data take? Which metadata standards will you use? If there is no applicable standard, how will you describe your data in a way that will make them accessible to others?
  • How will metadata files be generated for each of the data sets that you produce? Who will do the work of data description and how will the costs be borne?
  • Who on your team will be responsible for ensuring that metadata standards are followed and are correctly applied to the corresponding data sets?