Research Data Management (Health Sciences)

Tips for managing data and creating a data management/sharing plan.

Data Documentation

Investing time in creating data documentation will help to ensure that your research data can be found, understood, and used by others.

Ideally, begin documenting your research data at the outset of your project, and continue to create and update the documentation throughout. Doing so reduces the risk that your documentation will be incomplete or that you will forget important details about your data.

Research data documentation falls into two categories: project documentation and dataset documentation.

Project documentation includes:

  • Where and how the data was collected
  • How the data files are structured and organized
  • How the data was validated
  • How the data was transformed
  • Who can access the data, and for what purpose
  • How the data can be used, and under what conditions

Dataset documentation includes:

  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File formats and software used
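
As an illustration of dataset documentation, the sketch below records variable names, descriptions, and code explanations in a simple data dictionary, then checks a dataset's columns against it. It is a minimal example under stated assumptions: the variables, codes, and function names are hypothetical, and your discipline or repository may expect a different layout.

```python
import csv
import io

# Hypothetical variables for an illustrative study; replace with your own.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Unique study identifier",
     "type": "string", "codes": ""},
    {"variable": "sex", "description": "Sex recorded at enrollment",
     "type": "integer", "codes": "1=female; 2=male; 9=unknown"},
    {"variable": "sbp_mmhg", "description": "Systolic blood pressure in mmHg",
     "type": "integer", "codes": "-99=missing"},
]

FIELDS = ["variable", "description", "type", "codes"]


def data_dictionary_csv(rows):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(data_columns, rows):
    """List dataset columns that have no entry in the data dictionary."""
    documented = {row["variable"] for row in rows}
    return [name for name in data_columns if name not in documented]


print(data_dictionary_csv(DATA_DICTIONARY))
# "visit_date" is in the (hypothetical) dataset but not yet documented.
print(undocumented_columns(["participant_id", "sex", "visit_date"], DATA_DICTIONARY))
```

Running a check like `undocumented_columns` whenever the dataset changes is one way to keep the documentation current as the project progresses.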

An essential piece of research data documentation, the readme file provides basic information about a data file or dataset to help ensure that the data can be correctly interpreted, both by you at a later date and by others when the data is shared or published. For readme file best practices and recommended content, see the Guide to writing "readme" style metadata from the Cornell University Research Data Management Service Group.
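
A readme can be as simple as a plain-text file with a few labeled fields. The sketch below assembles one from basic project details. The field names and the `build_readme` function are hypothetical choices for illustration, not the Cornell template; consult the guide above for recommended content.

```python
from datetime import date


def build_readme(title, creator, collected, files, created=None):
    """Return plain-text readme content describing a dataset.

    `files` maps each file name to a one-line description.
    `created` defaults to today's date when not supplied.
    """
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Readme created: {created.isoformat()}",
        f"Data collected: {collected}",
        "",
        "Files:",
    ]
    for name, description in files.items():
        lines.append(f"  {name} - {description}")
    return "\n".join(lines) + "\n"


# Hypothetical project details, for illustration only.
readme_text = build_readme(
    title="Example blood pressure study",
    creator="Example Research Group",
    collected="2023-01 to 2023-06",
    files={"visits.csv": "One row per participant visit"},
)
print(readme_text)
```

Saving the result as README.txt alongside the data files keeps the description with the data it explains.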

For further guidance on collecting, tracking, and structuring your research data documentation:

Metadata Standards

Metadata means, simply, data about data. In the context of research data management, metadata refers both to data documentation and to structured information that conforms to a metadata standard.

Metadata structures are often referred to as schemas. A schema is a logical plan which shows the relationships between metadata elements. The completed metadata are often reported in a machine-readable language such as XML.

Most data repositories require that your project metadata follow a specific standard. A widely used, general-purpose metadata standard is the Dublin Core Metadata Element Set. This standard defines fifteen properties for describing a resource, such as a dataset:

  • Contributor - An entity responsible for making contributions to the resource.
  • Coverage - The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
  • Creator - An entity primarily responsible for making the resource.
  • Date - A point or period of time associated with an event in the life cycle of the resource.
  • Description - An account of the resource.
  • Format - The file format, physical medium, or dimensions of the resource.
  • Identifier - An unambiguous reference to the resource within a given context.
  • Language - A language of the resource.
  • Publisher - An entity responsible for making the resource available.
  • Relation - A related resource.
  • Rights - Information about rights held in and over the resource.
  • Source - A related resource from which the described resource is derived.
  • Subject - The topic of the resource.
  • Title - A name given to the resource.
  • Type - The nature or genre of the resource.
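
To show how these properties can be reported in a machine-readable form, the sketch below builds a small XML record using Python's standard library. The element names and namespace follow the Dublin Core Metadata Element Set; the `metadata` wrapper element, the function name, and the sample values are assumptions for illustration. Repositories generally define their own required wrapper and fields, so check their documentation before submitting.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen element names, in lowercase as used in XML.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields):
    """Return an XML string with one dc: element per supplied value.

    `fields` maps an element name to a string or a list of strings.
    The <metadata> wrapper is a hypothetical container element.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"Not a Dublin Core element: {name}")
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
print(dublin_core_record({
    "title": "Example blood pressure study dataset",
    "creator": ["Example Researcher A", "Example Researcher B"],
    "date": "2023-06-30",
    "format": "text/csv",
}))
```

Properties may be repeated, which is why a list of values produces several elements with the same name, as with the two creators here.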