Data Management Plans for the Social Sciences

Suggested resources for designing data management plans (DMP) for your research project.

Research Data Repositories

NSF Guidance

"The DMP should describe physical and cyber resources and facilities that will be used for the effective preservation and storage of research data. These can include third party facilities and repositories." https://new.nsf.gov/sbe/data-management

Summary

Once data are ready to be archived and shared, they will most likely need to be transferred to a repository or data center with a commitment to long-term curation. Consider both backup and archival strategies as part of your data management planning process.

Your first choice for long-term data preservation should be a disciplinary repository serving a relevant area of research. If no such repository exists, consider our institutional repository, Deep Blue Data. Perpetual archiving in a curated disciplinary or institutional repository is the preferred solution for long-term data preservation. If there are no applicable repositories, describe how you will keep the data accessible for their expected useful lifespan.

Two ways to approach finding a repository (please note that services mentioned are options for you to investigate further, not endorsements):

[Figure: Flowchart with data deposit options]

Questions to Consider

  • Which of the data you plan to generate will have long-term value to others? 
  • What is your long-term strategy for maintaining, curating, and archiving your data? How will you ensure access beyond the life of the project?
  • Which archive, repository, or database have you identified as the best place to deposit your data? If there are no appropriate disciplinary repositories, what tools will you use to preserve and disseminate your data and what resources will you need to make use of those tools?
  • What procedures does the repository have in place for preservation and backup? Are there any security measures that need to be taken when storing and distributing the data (e.g. permissions management, restrictions on use)? Who will manage these security procedures and how?
  • What procedures does the repository have in place for forward migration of storage technologies, to avoid obsolescence?
  • What data preparation, description, or cleaning procedures will be necessary to prepare data for archiving and sharing (e.g. quality or consistency checks, de-identification, ensuring compliance with IRB requirements, obtaining consent from project members or other stakeholders)?
  • What metadata and other documentation will be submitted alongside the data (or created after deposit) to ensure the data can be found and used by others?
  • Will any other related information be deposited (e.g. publications, software, reports)?
  • How much will it cost to preserve and disseminate the data, and how will these costs be covered?
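
One of the preparation steps mentioned above, de-identification, can be illustrated with a short sketch. This is only one possible approach, not a prescribed method: the function name `pseudonymize`, the field name `name`, and the `P001`-style codes are all hypothetical, and a real project would follow its IRB-approved protocol. The idea is to replace a direct identifier with a code and keep the lookup key in a separate, secured location.

```python
def pseudonymize(rows, id_field):
    """Return (clean_rows, key) for a list of dict records.

    clean_rows: copies of the records with the identifier replaced by a code.
    key: mapping from original identifier to code. Store it separately
         from the research data (hypothetical workflow, adapt as needed).
    """
    key = {}
    clean_rows = []
    for row in rows:
        original = row[id_field]
        if original not in key:
            # Sequential codes are simple to read; note that they reveal
            # the order in which participants first appear.
            key[original] = f"P{len(key) + 1:03d}"
        clean = dict(row)  # copy so the input records are not modified
        clean[id_field] = key[original]
        clean_rows.append(clean)
    return clean_rows, key


if __name__ == "__main__":
    records = [
        {"name": "Alice", "answer": "yes"},
        {"name": "Bob", "answer": "no"},
        {"name": "Alice", "answer": "maybe"},
    ]
    clean, key = pseudonymize(records, "name")
    print(clean)  # records safe to analyze or share
    print(key)    # keep this mapping in a separate, secured file
```

Replacing a name column handles only direct identifiers. Free-text answers and combinations of indirect identifiers (job title, location, dates) usually need manual review as well.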

Practical considerations when choosing a repository

What is accepted

  • Are there limits on formats, file size, or subject area?
  • Does the repository support the access level you need? If you need to limit access, is that possible? Conversely, if you want your data to be openly accessible, will they be?

Logistics

  • Who can deposit work? How much cleanup or preparation is necessary beforehand? How much help is provided?
  • Is the deposit process straightforward? How long does it take?

Costs

  • Are there fees to deposit your work?

Accessibility

  • How long will data be kept? Will they be migrated to new formats?
  • What does "preservation" mean for this repository? Is it bit-level preservation only?

Example 1

Excerpt from Laura Garbes, Brown University, NSF SBE, with Andrew Creamer, Science Data Specialist, Brown University, “Analyzing Diversity Efforts in Public Radio Organizations – A comparative approach to performance standards in the workplace” (section 5, "Data storage and preservation of access")

The data produced in this proposed project includes analyses of publicly available quantitative data and qualitative data obtained from the analysis of semi-structured interviews. During the project, the Co-PI will store the project’s data in encrypted files in password-protected folders, secured on a password-protected computer. The quantitative IPEDS data will be backed up by saving a copy of the data files locally to the Co-PI’s hard drive as well as remotely by storing a copy of the data on the Co-PI’s institution’s departmental file network drive, files@brown, maintained by Computing & Information Services (CIS), requiring SSO and 2-factor authentication for access. The interview recordings and transcriptions will be stored on files@brown network drive, but once the transcriptions have been stripped of any identifying information from informants, they will be kept locally on the Co-PI’s hard drive for analyses in NVivo. Digital data files with identifiers will be encrypted and the participant key will be stored separately from the research data files. Paper files with identifiers will be secured in a locked cabinet in a locked closet in the PI’s office with access limited to the PI and Co-PI. At the end of the project, the Co-PI will preserve access to the project’s quantitative data by archiving a copy of the aggregate and de-identified data underlying published results and analyses as well as broader impacts materials in the Brown Digital Repository (BDR). Files in the BDR are stored redundantly in off-site storage. Audit trails are maintained for each file to document changes and deletions, and older versions of files are retrievable in the event of unintentional modification. Files also receive a checksum value to allow for periodic auditing of data integrity.
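
The checksum-based integrity auditing mentioned at the end of this excerpt can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib` module, not a description of how the Brown Digital Repository implements it; the function names `sha256_of`, `build_manifest`, and `audit`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def audit(folder, manifest):
    """Return the files whose checksum changed or that are now missing."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A researcher could run `build_manifest` once when the data are finalized, save the result alongside the deposit, and call `audit` periodically. An empty list means every recorded file still matches its original checksum.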

Example 2

Excerpt from Dayna Cueva Alegría, University of Kentucky, NSF SBE, “Water Pollution Governance in Lake Titicaca: Creating Political Spaces of Democratization”

For the purpose of making the data collected from qualitative and multimethod research publicly available and accessible for future use and reanalysis, the Qualitative Data Repository (QDR) (https://qdr.syr.edu/) located at Syracuse University will serve as a data repository and for data-sharing. QDR has been certified as a "trustworthy data repository" by CoreTrustSeal with the capacity to store, publish, and durably preserve social science data and documentation acquired from publicly-funded research for public use and access at no cost. Its trained staff will curate data to make them usable, discoverable, meaningful, citable, secure and durably preserved. The Co-PI has received confirmation from QDR staff that the data are suitable for archiving with QDR. Thus, all redacted interview transcripts (.docx), qualitative coding results (.qdpx), survey responses (.xlsx), SNA diagrams (.doc), results of descriptive statistical analysis from SNA surveys (.txt), audio (.wav), and metadata will be stored in a specific open source virtual archive at (https://data.qdr.syr.edu/). This data will be published with Data Documentation Initiative (DDI) metadata and issued a Digital Object Identifier (DOI) to facilitate findability and allow stable citations to the data. In case participants have given explicit consent to use their names, then identifiable records will be stored in this way.