Research Guides: Research Data: Finding, Managing, Sharing: Preservation

What is Preservation and Why Preserve Data?

Preservation ensures the longevity of, and continued access to, your data. It may seem like an extra step, but it is a crucial part of the data life cycle. You may want to preserve your data so that you can access it in the future, and your funding agency may also require it. When you create a Data Management Plan, you will probably be asked to address data preservation. Below you'll find more information and guidance on format types and preservation standards. Consider using several preservation methods and locations for your data. Redundancy is a good strategy!

Data Repositories

There are many discipline-specific repositories where researchers can deposit their data. These repositories often accommodate disciplinary needs better than general-purpose ones, but not every discipline has one, which is why the U-M Library created Deep Blue Data.

Deep Blue Data (UM)
Extensive, multi-disciplinary list of repositories: authored by the Open Access Directory, hosted by the School of Library and Information Science at Simmons College.
Registry of Research Data Repositories (re3data)
DataONE: a repository for environmental data.
Data PASS: Data Preservation Alliance for the Social Sciences
Dataverse Project: a repository for social science data (a collaboration between the Institute for Quantitative Social Science and Harvard University).
Dryad: a general-purpose repository in the Sciences.
Open Context: a repository for archaeological and related data.

Evaluating a Repository for Preservation

Before depositing your data in a repository, ask a few questions about its preservation policy.

What is the repository's commitment to preservation? Most repositories that are committed to long-term preservation will say so, probably in a preservation policy or other similar document. These policies could include important information like retention periods and deposit limits.
Is the preservation format-specific? Some repositories commit to long-term preservation, and in particular to format migration, only for specific formats.
Is the repository certified? Certification is not a requirement, but some well-established repositories are certified by external bodies such as the Data Seal of Approval. Certification shows that a repository has opened its doors to outside evaluators looking for specific preservation-based criteria.

Best Formats for Preservation

A file format suited to preservation is one that is unlikely to become obsolete soon and that, when it does, will have a successor format available for easy migration. A format that is open (i.e., well documented and unencumbered by restrictive patents) and in widespread use will likely be more suitable for preservation than one that is proprietary and not heavily used. A good place to start for specific guidance is the Library of Congress Recommended Formats Statement.

Versioning

Versioning is the practice of assigning version names or numbers to progressive states of data and datasets. Versioning supports standardization and repeatability in research, and it allows comparison over time. Recording versions is an important piece of your metadata. MBox will aid you in keeping track of multiple versions of files. There are also software solutions available for more complex versioning needs (e.g. Git, used with a hosting service such as GitHub).
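
As an illustration of recording versions as metadata, here is a minimal Python sketch. It is not part of any U-M service or repository tool. The function name record_version, the JSON log layout, and the field names are all hypothetical choices made for this example, and the SHA-256 checksum is an added assumption: it is one common way to tie a version label to the exact contents of a file.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def record_version(data_file, log_file, version, note):
    """Append one version entry for data_file to a JSON version log.

    Hypothetical helper for illustration; the log layout is an assumption.
    """
    data_path = Path(data_file)
    log_path = Path(log_file)

    # Load the existing log, or start a new one if none exists yet.
    entries = json.loads(log_path.read_text()) if log_path.exists() else []

    # Refuse to reuse a label so each version names exactly one state.
    if any(entry["version"] == version for entry in entries):
        raise ValueError(f"version {version!r} is already recorded")

    entries.append({
        "file": data_path.name,
        "version": version,
        "date": date.today().isoformat(),
        # The checksum identifies the exact bytes this version refers to.
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
        "note": note,
    })
    log_path.write_text(json.dumps(entries, indent=2))
    return entries[-1]
```

For example, calling record_version("survey.csv", "versions.json", "1.0", "initial deposit") would create versions.json with one entry, and a later call with "1.1" would append a second. Keeping the log next to the data means the version history travels with the dataset when it is copied or deposited.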