Research Guides: Research Data Management (Health Sciences): Describe Data

Data Documentation

Investing time in creating data documentation will help to ensure that your research data can be found, understood, and used by others.

Ideally, you should begin to document your research data at the outset of your project, and continue to create and update the documentation throughout the course of your project. This will decrease the risk that your documentation will be incomplete, or that you will forget important details about your data.

Research data documentation falls into two categories: project documentation and dataset documentation.

Project documentation includes:

Where and how the data was collected
How the data files are structured and organized
How the data was validated
How the data was transformed
Who can access the data, and for what purpose
How the data can be used, and under what conditions

Dataset documentation includes:

Variable names and descriptions
Explanation of codes and classification schemes used
Algorithms used to transform data
File formats and software used
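
As an illustration of dataset documentation, the items listed above can be recorded in a simple data dictionary. The following is a minimal sketch in Python, not a required format: the variable names, codes, and column headings are hypothetical and should be replaced with those of your own dataset.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Unique study identifier",
     "type": "string", "units": "", "codes": ""},
    {"variable": "sex", "description": "Sex recorded at enrollment",
     "type": "integer", "units": "", "codes": "1=female; 2=male; 9=unknown"},
    {"variable": "sbp_mmhg", "description": "Systolic blood pressure",
     "type": "float", "units": "mmHg", "codes": "-99=missing"},
]

# Column headings of the data dictionary (an assumed layout, not a standard).
FIELDS = ["variable", "description", "type", "units", "codes"]


def data_dictionary_csv(entries):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(header, entries):
    """List columns from a data file header that the dictionary does not describe."""
    documented = {entry["variable"] for entry in entries}
    return [name for name in header if name not in documented]


if __name__ == "__main__":
    print(data_dictionary_csv(DATA_DICTIONARY))
    # Compare against the header row of a (hypothetical) data file.
    print(undocumented_columns(["participant_id", "sex", "visit_date"], DATA_DICTIONARY))
```

Saving the data dictionary alongside the data files, and checking file headers against it as the project evolves, is one way to keep the documentation current.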

An essential piece of research data documentation, the readme file provides basic information about a data file or dataset to help ensure that the data can be correctly interpreted, both by you at a later date and by others when you share or publish the data. For readme file best practices and recommended content, see the Guide to writing "readme" style metadata from the Cornell University Research Data Management Service Group.
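
A readme file is plain text, so it can be drafted by hand or generated from information you already track. The following Python sketch shows one possible layout. The headings are an assumption based on commonly recommended readme content, not a prescribed template, and the function name and example values are hypothetical.

```python
from datetime import date


def build_readme(title, contact, collected, files, variables, access, written=None):
    """Return plain-text readme content for a dataset.

    files and variables are dicts mapping a name to a short explanation.
    The section headings are illustrative, not a formal standard.
    """
    written = written or date.today()
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        f"Date of data collection: {collected}",
        f"Readme written: {written.isoformat()}",
        "",
        "Files:",
    ]
    lines += [f"  {name}: {purpose}" for name, purpose in files.items()]
    lines += ["", "Variables:"]
    lines += [f"  {name}: {meaning}" for name, meaning in variables.items()]
    lines += ["", f"Access and reuse conditions: {access}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(build_readme(
        title="Example blood pressure survey",
        contact="researcher@example.org",
        collected="2024-03 to 2024-06",
        files={"survey.csv": "cleaned survey responses"},
        variables={"sbp_mmhg": "systolic blood pressure in mmHg"},
        access="Available on request for research use",
    ))
```

The output can be saved as a plain-text file (for example, README.txt) in the same folder as the data it describes.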

For further guidance on collecting, tracking, and structuring your research data documentation:

Metadata and Data Documentation
by Digital Scholarship (last updated Nov 13, 2024)

Basic Approach to Metadata
by Stanford University Library
Best Practices: Metadata
by DataONE
Research Data Management Training
Module on metadata, documentation, and citation, MANTRA

Metadata Standards

Metadata means, simply, data about data. In the context of research data management, metadata refers both to data documentation and to structured information that conforms to a metadata standard.

Metadata structures are often referred to as schemas. A schema is a logical plan that shows the relationships between metadata elements. The completed metadata are often recorded in a machine-readable format such as XML.

Most data repositories require that your project metadata follow a specific standard. A widely used, general-purpose metadata standard is the Dublin Core Metadata Element Set. This standard defines fifteen properties used to describe a resource:

Contributor - An entity responsible for making contributions to the resource.
Coverage - The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
Creator - An entity primarily responsible for making the resource.
Date - A point or period of time associated with an event in the life cycle of the resource.
Description - An account of the resource.
Format - The file format, physical medium, or dimensions of the resource.
Identifier - An unambiguous reference to the resource within a given context.
Language - A language of the resource.
Publisher - An entity responsible for making the resource available.
Relation - A related resource.
Rights - Information about rights held in and over the resource.
Source - A related resource from which the described resource is derived.
Subject - The topic of the resource.
Title - A name given to the resource.
Type - The nature or genre of the resource.
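
To illustrate how these properties can be reported in XML, the following Python sketch builds a small Dublin Core record using only the standard library. It is an example under stated assumptions rather than a repository-ready record: the namespace URI is the published Dublin Core elements namespace, but the wrapper element name (metadata), the function name, and the field values are hypothetical, and individual repositories may require a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen Dublin Core properties listed above, in lowercase.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields):
    """Build an XML string from a dict of Dublin Core properties.

    A value may be a single string or a list of strings, since every
    property may be repeated. The 'metadata' wrapper is an assumption.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"Not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up values for illustration.
    print(dublin_core_record({
        "title": "Example blood pressure survey",
        "creator": ["Researcher, A.", "Researcher, B."],
        "date": "2024",
        "format": "text/csv",
        "rights": "Available on request for research use",
    }))
```

Rejecting names outside the fifteen properties catches misspelled or non-standard fields before the record is submitted.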

Digital Curation Centre (DCC)
The Digital Curation Centre maintains a catalog of domain-specific metadata standards, including standards for biology.
FAIRsharing
A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies.
Metadata Standards Catalog
The RDA Metadata Standards Catalog is a collaborative, open directory of metadata standards applicable to research data.
Minimum Information for Biological and Biomedical Investigations (MIBBI)
A portal to a group of nearly 40 checklists of minimum information for various biological disciplines. The MIBBI Foundry is developing a cross-analysis of these guidelines to create an intercompatible, extensible community of standards.
NIH Common Data Elements (CDE) Repository
An NIH portal of data elements that are common to multiple datasets across different studies, intended to improve data quality and promote data sharing.

Metadata and Data Documentation
by Digital Scholarship (last updated Nov 13, 2024)

Understanding Metadata (PDF)
by the National Information Standards Organization
Metadata Basics
by the Dublin Core Metadata Initiative
Documenting Your Data
by the UK Data Archive

Last Updated: Jan 20, 2025 12:47 PM
