
Data Management Plans for the Social Sciences

Suggested resources for designing data management plans (DMP) for your research project.

Resources - File Formats

Metadata Resources

NSF Guidance

Guidance from NSF's Social, Behavioral and Economic Sciences (SBE) Directorate: "The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others." https://new.nsf.gov/sbe/data-management 

Summary

Describe the formats (file types) your data will be in. Proprietary formats are more difficult to preserve because the software and hardware that read them can quickly become obsolete. Whenever possible, store data in stable, non-proprietary formats, preferably those based on open, published standards. If your research will generate files in proprietary formats, consider converting those files into formats based on open standards for sharing and archival purposes.

Current DMP guidelines are not specific about metadata requirements. 

A metadata record is a file that captures all details about a data set that another researcher would need to make use of the data set in a separate or related line of inquiry. Metadata captures the who, what, when, where, why and how of the data you produce. When data curators talk about metadata they are normally referring to a machine-readable description that comes in a standardized format, often defined by an XML schema.
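
To make the idea of a machine-readable record concrete, here is a minimal sketch in Python that writes the who, what, when, where, and how of a data set as a small XML document. The element names and all values are hypothetical and do not follow any particular schema; in practice your repository or discipline will usually specify one (in the social sciences, often a DDI-based schema).

```python
import xml.etree.ElementTree as ET


def build_metadata_record(fields):
    """Return an XML string with one child element per metadata field."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only (not taken from a real study).
record = build_metadata_record({
    "title": "Example interview study",
    "creator": "Example Researcher",
    "date": "2024-01-15",
    "location": "Example City",
    "method": "Semi-structured interviews",
    "format": "text/csv",
})
print(record)
```

A record like this can be saved alongside the data file it describes, so that the description travels with the data.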

Questions to Consider

Formats

  • Which file formats will you use for your data and why?
  • Which standards will you use and why have you chosen them? If your data are generated in a non-standard format, how will you convert them to a more accessible format? Where and how will you make the conversion code available, if you plan to write it?
  • If there are no applicable standard formats, how will you format your data so that other researchers can make use of them?
  • Who on your team will have the responsibility of ensuring that data standards are properly applied and data properly formatted? What procedures will be in place to ensure that this is done consistently throughout the duration of the project?
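
If you do plan to write conversion code, it can be quite short. The sketch below, in Python, assumes a hypothetical pipe-delimited text export and rewrites it as standard comma-separated values (CSV). The delimiter, column names, and sample rows are invented for illustration; a real conversion would depend on the format your instruments or software actually produce.

```python
import csv
import io


def convert_to_csv(source, target, delimiter="|"):
    """Copy rows from a delimited text stream into a standard CSV stream."""
    reader = csv.reader(source, delimiter=delimiter)
    writer = csv.writer(target, lineterminator="\n")
    rows_written = 0
    for row in reader:
        # Trim stray whitespace around each value before writing it out.
        writer.writerow([cell.strip() for cell in row])
        rows_written += 1
    return rows_written


# Hypothetical pipe-delimited export standing in for a non-standard format.
raw = io.StringIO("id|site|response\n1|Site A|Agree, strongly\n2|Site B|Disagree\n")
out = io.StringIO()
count = convert_to_csv(raw, out)
print(out.getvalue())
```

Because the function works on any text stream, the same code can be pointed at files opened with `open()`. Sharing a script like this with the data documents exactly how the archived files were derived from the originals.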

Metadata 

  • What contextual details (metadata) are needed to make your data meaningful?
  • What form or format will the metadata describing your data take? Which metadata standards will you use? If there is no applicable standard, how will you describe your data in a way that will make them accessible to others?
  • How will metadata files be generated for each of the data sets that you produce? Who will do the work of data description and how will the costs be borne?
  • Who on your team will be responsible for ensuring that metadata standards are followed and are correctly applied to the corresponding data sets?

Example 1

The example below is taken from Tina Nabatchi, PARCC, Syracuse University, and Rebecca McLain, Portland State University, "The Atlas of Collaboration: Building the World's First Large N Database on Collaborative Governance." The funding agency is NSF SBE. Note that "QDR" refers to the Qualitative Data Repository.

Data Formats. We will enter and store data in standard formats (e.g., .docx, .xlsx, .csv), which are used widely and easy to work with in a wide range of applications. For long-term storage, we will follow QDR’s preservation policy and convert data files and outputs to the format most appropriate for the dataset and the repository. The research team will decide on those formats in collaboration with QDR and in response to their expert advice.

Example 2

Example language from Dayna Cueva Alegría, University of Kentucky, "Water Pollution Governance in Lake Titicaca: Creating Political Spaces of Democratization." The funding agency is NSF SBE.

Standards for data format: Software used for the processing and analysis of data collected during fieldwork will include Microsoft Office Suite, ATLAS.ti qualitative coding software, and the Social Network Analysis software UCINET. Metadata generated from interviews and survey responses will include date, location, type of informant (e.g. rural civil society or state). For rural civil society environmental actors, metadata will be gathered on their rural environmental organization attributes; namely, information on its age, size, legal status, financial status, and geographic reach of the organization. For state actors, metadata will be gathered on the state institution they represent; namely, government level, name of state institution/department/organization, age, size, and endowment. Metadata will be stored in ATLAS.ti, Microsoft Excel spreadsheets, and UCINET data files in the password-protected external hard drive.