Research Data Management (Health Sciences)
Data Documentation
Investing time in creating data documentation will help to ensure that your research data can be found, understood, and used by others.
Ideally, you should begin to document your research data at the outset of your project, and continue to create and update the documentation throughout the course of your project. This will decrease the risk that your documentation will be incomplete, or that you will forget important details about your data.
Research data documentation falls into two categories: project documentation and dataset documentation.
Project documentation includes:
- Where and how the data was collected
- How the data files are structured and organized
- How the data was validated
- How the data was transformed
- Who can access the data, and for what purpose
- How the data can be used, and under what conditions
Dataset documentation includes:
- Variable names and descriptions
- Explanation of codes and classification schemes used
- Algorithms used to transform data
- File formats and software used
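As an illustration of dataset documentation, the variable names, descriptions, and codes listed above can be kept in a small data dictionary stored alongside the data. The following is a minimal sketch, not a prescribed format: the variables, code values, column headings, and function names are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical variables for illustration only; not taken from any real study.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Unique study identifier",
     "type": "string", "codes": ""},
    {"variable": "sex", "description": "Sex recorded at enrolment",
     "type": "integer", "codes": "1=female; 2=male; 9=not recorded"},
    {"variable": "sbp_mmhg", "description": "Systolic blood pressure",
     "type": "integer", "codes": "units: mmHg; -99=missing"},
]


def write_data_dictionary(rows, handle):
    """Write the data dictionary as CSV to an open text handle."""
    fieldnames = ["variable", "description", "type", "codes"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def undocumented_columns(data_columns, rows):
    """Return the data columns that have no entry in the data dictionary."""
    documented = {row["variable"] for row in rows}
    return [column for column in data_columns if column not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    # Flags any column in the data file that still needs documenting.
    print(undocumented_columns(["participant_id", "sex", "visit_date"], DATA_DICTIONARY))
```

Checking the data file's columns against the dictionary each time the data changes is one way to keep the documentation current throughout the project.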
An essential piece of research data documentation, the readme file provides basic information about a data file or dataset to help ensure that the data can be correctly interpreted, both by you at a later date and by others when you share or publish the data. For readme file best practices and recommended content, see Guide to writing "readme" style metadata from Cornell University Research Data Management Service Group.
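A readme file is plain text, so it can be generated from a few pieces of project information. The sketch below is one possible layout, assuming a handful of fields drawn from the project documentation items listed earlier; the field labels and the `build_readme` function are hypothetical and are not the Cornell template.

```python
from datetime import date


def build_readme(title, creator, contact, collected, files, licence, created=None):
    """Return readme text for a dataset.

    `files` maps each file name to a one-line description.
    `created` defaults to today's date when not supplied.
    """
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Contact: {contact}",
        f"Readme created: {created.isoformat()}",
        f"Data collected: {collected}",
        f"Licence / conditions of use: {licence}",
        "",
        "Files:",
    ]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    text = build_readme(
        title="Example blood pressure survey",
        creator="Example Research Group",
        contact="data-steward@example.org",
        collected="Clinic visits, 2023 (placeholder)",
        files={"survey.csv": "One row per participant visit",
               "data_dictionary.csv": "Variable names, descriptions, and codes"},
        licence="Placeholder; state the actual terms of use here",
    )
    print(text)
```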
For further guidance on collecting, tracking, and structuring your research data documentation:
- Basic Approach to Metadata, by Stanford University Library
- Best Practices: Metadata, by DataONE
- Research Data Management Training: module on metadata, documentation, and citation, MANTRA
Metadata Standards
Metadata means, simply, data about data. In the context of research data management, metadata refers both to data documentation and to structured information that conforms to a metadata standard.
Metadata structures are often referred to as schemas. A schema is a logical plan which shows the relationships between metadata elements. The completed metadata are often reported in a machine-readable language such as XML.
Most data repositories require that your project metadata follow a specific standard. A widely used, general-purpose metadata standard is the Dublin Core Metadata Element Set. This standard defines fifteen properties used to describe a resource such as a dataset:
- Contributor - An entity responsible for making contributions to the resource.
- Coverage - The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
- Creator - An entity primarily responsible for making the resource.
- Date - A point or period of time associated with an event in the life cycle of the resource.
- Description - An account of the resource.
- Format - The file format, physical medium, or dimensions of the resource.
- Identifier - An unambiguous reference to the resource within a given context.
- Language - A language of the resource.
- Publisher - An entity responsible for making the resource available.
- Relation - A related resource.
- Rights - Information about rights held in and over the resource.
- Source - A related resource from which the described resource is derived.
- Subject - The topic of the resource.
- Title - A name given to the resource.
- Type - The nature or genre of the resource.
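To show how these properties might be reported in XML, the following minimal sketch builds a record using Python's standard library. The namespace URI is the one published for the Dublin Core Metadata Element Set; the `metadata` wrapper element, the `dublin_core_xml` function, and the sample values are assumptions made for illustration. Repositories typically define their own wrapper and required fields, so check the target repository's schema before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names, in lower case as used in XML.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_xml(record):
    """Serialize a dict of Dublin Core elements to an XML string.

    Each value may be a single string or a list of strings, since
    every Dublin Core element is optional and repeatable.
    """
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")

    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(dublin_core_xml({
        "title": "Example blood pressure survey",
        "creator": ["Example Research Group"],
        "date": "2024-01-15",
        "format": "text/csv",
        "rights": "Placeholder; state the actual terms of use here",
    }))
```

Rejecting unknown element names early catches misspelled fields before a record is submitted.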
For domain-specific metadata standards and further guidance on metadata:
- Digital Curation Centre (DCC): maintains a catalog of domain-specific metadata standards, including for biology.
- FAIRsharing: a curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.
- Metadata Standards Catalog: the Research Data Alliance (RDA) Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.
- Minimum Information for Biological and Biomedical Investigations (MIBBI): a portal to a group of nearly 40 checklists of minimum information for various biological disciplines. The MIBBI Foundry is developing a cross-analysis of these guidelines to create an intercompatible, extensible community of standards.
- NIH Common Data Elements (CDE) Repository: National Institutes of Health (NIH) portal of data elements that are common to multiple data sets across different studies, in support of improving data quality and promoting data sharing.
- Understanding Metadata (PDF), by the National Information Standards Organization
- Metadata Basics, by the Dublin Core Metadata Initiative
- Documenting Your Data, by the UK Data Archive