Guidance from NSF's Social, Behavioral and Economic Sciences (SBE) Directorate: "The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others." https://new.nsf.gov/sbe/data-management
Describe the formats (file types) your data will be in. Proprietary formats are more difficult to preserve, because the software and hardware needed to read them can quickly become obsolete. Whenever possible, store data in stable, non-proprietary formats, preferably those based on open and published standards. If your research will generate files in proprietary formats, consider converting those files into formats based on open standards for sharing and archival purposes.
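As a rough illustration of the conversion advice above, the following Python sketch flags files in proprietary formats and suggests an open-format filename for sharing and archiving. The extension mapping and the function name suggest_open_format are hypothetical and chosen only for illustration; the appropriate target formats depend on your data and on your repository's preservation policy. The sketch only proposes a filename and does not convert any file contents.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from proprietary extensions to open-format targets.
# These pairings are illustrative assumptions, not a repository requirement.
OPEN_ALTERNATIVES = {
    ".xls": ".csv",  # legacy binary Excel workbook -> comma-separated values
    ".sav": ".csv",  # SPSS data file -> comma-separated values
    ".dta": ".csv",  # Stata data file -> comma-separated values
    ".doc": ".txt",  # legacy binary Word document -> plain text
}


def suggest_open_format(filename: str) -> Optional[str]:
    """Return a suggested open-format filename, or None if no conversion is listed."""
    path = Path(filename)
    target = OPEN_ALTERNATIVES.get(path.suffix.lower())
    if target is None:
        return None
    return str(path.with_suffix(target))


if __name__ == "__main__":
    for name in ["survey_wave1.sav", "codebook.doc", "responses.csv"]:
        suggestion = suggest_open_format(name)
        if suggestion is None:
            print(f"{name}: no conversion listed")
        else:
            print(f"{name}: consider archiving as {suggestion}")
```

A list like this can also serve as a checklist when drafting the formats portion of a DMP: each file type the project will produce is either already open or has a named conversion target.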
Current DMP guidelines are not specific about metadata requirements.
A metadata record is a file that captures all details about a data set that another researcher would need to make use of the data set in a separate or related line of inquiry. Metadata captures the who, what, when, where, why and how of the data you produce. When data curators talk about metadata they are normally referring to a machine-readable description that comes in a standardized format, often defined by an XML schema.
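To make the idea of a machine-readable metadata record concrete, here is a minimal Python sketch that writes a small XML record using Dublin Core element names. Dublin Core is used here only as one familiar example of a standardized vocabulary; your repository may require a different schema. The wrapper element name "metadata", the function name build_metadata_record, and all field values are placeholders assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_metadata_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core element names and values as an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values covering the who, what, when, where, why and how.
record = build_metadata_record({
    "creator": "Example Researcher",                # who
    "title": "Example interview data set",          # what
    "date": "2024-01-15",                           # when
    "coverage": "Example field site",               # where
    "description": "Purpose and collection method", # why and how
    "format": "text/csv",
})

if __name__ == "__main__":
    print(record)
```

Because the record follows a published vocabulary, a repository or another researcher's software can read the fields without knowing anything about the project that produced them.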
Formats
Metadata
The example below is taken from Tina Nabatchi, PARCC, Syracuse University and Rebecca McLain, Portland State University, "The Atlas of Collaboration: Building the World's First Large N Database on Collaborative Governance." Funding agency is NSF SBE. Note that "QDR" refers to the Qualitative Data Repository.
Data Formats. We will enter and store data in standard formats (e.g., .docx, .xlsx, .csv), which are used widely and easy to work with in a wide range of applications. For long-term storage, we will follow QDR’s preservation policy and convert data files and outputs to the format most appropriate for the dataset and the repository. The research team will decide on those formats in collaboration with QDR and in response to their expert advice.
Example language from Dayna Cueva Alegría, University of Kentucky, NSF SBE, "Water Pollution Governance in Lake Titicaca: Creating Political Spaces of Democratization."
Standards for data format: Software used for the processing and analysis of data collected during fieldwork will include Microsoft Office Suite, ATLAS.ti qualitative coding software, and the Social Network Analysis software UCINET. Metadata generated from interviews and survey responses will include date, location, type of informant (e.g. rural civil society or state). For rural civil society environmental actors, metadata will be gathered on their rural environmental organization attributes; namely, information on its age, size, legal status, financial status, and geographic reach of the organization. For state actors, metadata will be gathered on the state institution they represent; namely, government level, name of state institution/department/organization, age, size, and endowment. Metadata will be stored in ATLAS.ti, Microsoft Excel spreadsheets, and UCINET data files in the password-protected external hard drive.