
Research Data Management (Health Sciences)

Tips for managing data and creating a data management/sharing plan.

File and Directory Best Practices

Time spent at the beginning of the project to define folder hierarchy and file naming conventions will make it much easier to keep things organized and findable, both throughout the project and after project completion. Adhering to well-thought-out naming conventions:

  • helps prevent accidental overwrites or deletion
  • makes it easier to locate specific data files
  • makes collaborating on the same files less confusing

File naming best practices

Include a few pieces of descriptive information in the filename, in a standard order, to make it clear what the file contains. For example, filenames could include:

  • experiment name or acronym
  • researcher initials
  • date data collected
  • type of data
  • conditions
  • file version
  • file extension for application-specific files
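To apply a convention like this consistently, it can help to generate names rather than type them by hand. The Python sketch below is one possible approach, not a prescribed tool; the function name build_filename, the underscore separator, and the sample values (experiment acronym PLT, initials JD) are hypothetical.

```python
from datetime import date

def build_filename(experiment, initials, collected, version, ext, extras=()):
    # Fixed order: experiment, initials, collection date (YYYYMMDD),
    # optional extras (e.g. data type, condition), zero-padded version, extension.
    pieces = [experiment, initials, collected.strftime("%Y%m%d"), *extras, f"v{version:02d}"]
    return "_".join(pieces) + "." + ext

print(build_filename("PLT", "JD", date(2024, 3, 15), 1, "csv"))
# PLT_JD_20240315_v01.csv
```

Keeping the order fixed in one place means every file from the project follows the same pattern.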

Consider sort order:

  • If it is useful for files to stay in chronological order, a good convention is to start file names with YYYYMMDD or YYMMDD.
  • If you are using a sequential numbering system, use leading zeros to maintain sort order, e.g. 007 and 010 rather than 7 and 10, since 10 would otherwise sort before 7.
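Scripts and many command-line tools sort names as plain text, comparing one character at a time, so an unpadded run_10 lands before run_7. A quick Python check with hypothetical file names shows the effect of padding and of date prefixes:

```python
# Unpadded numbers sort as text, so "10" lands before "7".
unpadded = sorted(["run_7.csv", "run_10.csv", "run_700.csv"])
# Zero-padding to a fixed width (here 3 digits) keeps numeric order.
padded = sorted(f"run_{n:03d}.csv" for n in (7, 10, 700))
# YYYYMMDD prefixes sort chronologically as plain text.
dated = sorted(["20240301_a.csv", "20231215_b.csv", "20240115_c.csv"])

print(unpadded)  # ['run_10.csv', 'run_7.csv', 'run_700.csv']
print(padded)    # ['run_007.csv', 'run_010.csv', 'run_700.csv']
print(dated)     # ['20231215_b.csv', '20240115_c.csv', '20240301_a.csv']
```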

Apart from underscores and dashes, do not use special (non-alphanumeric) characters such as " / \ : * ? ' < > [ ] { } ( ) & $ ~ ! @ # % ^ , in names. These could be interpreted by programs or operating systems in unexpected ways.

Do not use spaces in file or folder names. Some software and operating systems handle them poorly, and you will need to enclose such names in quotation marks to reference them in scripts and programs. Alternatives to spaces in filenames:

  • Underscores, e.g. file_name.xxx
  • Dashes, e.g. file-name.xxx
  • No separation, e.g. filename.xxx
  • Camel case, where the first letter of each section of text is capitalized, e.g. FileName.xxx

Keep names short, no more than 25 characters.
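These rules can be checked automatically before files are saved or shared. A minimal Python sketch, assuming the allowed characters are letters, digits, underscores, and hyphens plus a single extension, and using the 25-character limit above; check_filename is a hypothetical helper name:

```python
import re

MAX_LEN = 25  # length limit from the naming guidance
# Letters, digits, underscores, and hyphens, with an optional single extension.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")

def check_filename(name):
    """Return a list of problems with a proposed file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if not ALLOWED.fullmatch(name):
        problems.append("contains characters other than letters, digits, _ or -")
    if len(name) > MAX_LEN:
        problems.append(f"longer than {MAX_LEN} characters")
    return problems

print(check_filename("PLT_JD_20240315_v01.csv"))  # []
print(check_filename("my data (final).csv"))
```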

File versioning ensures that you always know which version of a file you are working with, and which versions are working drafts and which are final. Recommended file versioning practices:

  • Include a version number at the end of the file name such as v01. Change this version number each time the file is saved.
  • For the final version, substitute the word FINAL for the version number.
  • Take advantage of the versioning capabilities available in collaborative workspaces such as GitHub, OSF, Google Drive, and Box.
  • Track versions of computer code with versioning software such as Git, Subversion, or CVS.
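If file names carry a version tag, renaming can be scripted rather than done by hand. This Python sketch assumes the tag is written as _v followed by digits directly before the extension (for example, _v01.csv); next_version and final_name are hypothetical helper names.

```python
import re

# Matches a trailing version tag such as "_v01" just before the extension.
VERSION = re.compile(r"_v(\d+)(\.[^.]+)$")

def next_version(name):
    """Return the name with its version number incremented, keeping the padding width."""
    m = VERSION.search(name)
    if m is None:
        raise ValueError(f"no _vNN version tag in {name!r}")
    digits, ext = m.group(1), m.group(2)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return name[:m.start()] + f"_v{bumped}{ext}"

def final_name(name):
    """Replace the version tag with FINAL, keeping the extension."""
    return VERSION.sub(r"_FINAL\2", name)

print(next_version("PLT_JD_20240315_v01.csv"))  # PLT_JD_20240315_v02.csv
print(final_name("PLT_JD_20240315_v03.csv"))    # PLT_JD_20240315_FINAL.csv
```

Using zfill keeps the two-digit padding, so successive versions continue to sort in order.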

Directories can be organized in many different ways. Consider what makes sense for your project and research team, and how people new to the project might look for files.

Once you determine how you want your directories to be organized, it is a good idea to stub out an empty directory structure to hold future data, and to document the contents of each directory in a readme file.
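A stub directory structure can be created with a short script so that every new project starts from the same layout. The folder names and README text below are only an example layout, not a recommended standard; adapt them to your project. A sketch in Python using the standard pathlib module (stub_project is a hypothetical name):

```python
from pathlib import Path

# Example layout: broad topics at the top, more specific ones below.
LAYOUT = {
    "data": "Raw and processed data for the project.",
    "data/raw": "Original, unedited data files. Do not modify.",
    "data/processed": "Cleaned or derived data produced by scripts in code/.",
    "code": "Analysis scripts.",
    "docs": "Protocols, codebooks, and other documentation.",
    "results": "Figures and tables generated from processed data.",
}

def stub_project(root):
    """Create empty directories, each with a README describing its contents."""
    root = Path(root)
    for rel, description in LAYOUT.items():
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.txt"
        if not readme.exists():  # never overwrite an existing README
            readme.write_text(description + "\n", encoding="utf-8")
    return root
```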

Directory Best Practices

  • Organize directories hierarchically, with broader topics at the top level of the hierarchy and more specific topics lower in the structure.
  • Group files of similar information together in a single directory.
  • Name directories after aspects of the project rather than after the names of individual researchers.
  • Once you have decided on a directory structure, follow it consistently and audit it periodically.
  • Separate ongoing and completed work.

Recommended File Formats

The file formats you use have a direct impact on your ability to open those files at a later date, and on the ability of other people to access those data.

Whenever possible, select file formats that are:

  • non-proprietary*
  • unencrypted
  • uncompressed
  • in common usage by the research community
  • adherent to an open, documented standard
  • interoperable among diverse platforms and applications
  • royalty-free and without intellectual property restrictions
  • developed and maintained by an established open standards organization

*If you must save files in a proprietary format, include a readme file in the same directory as the data file that documents the software and version used to generate the file, so that the files can be accessed in the future.

Best practices for handling original data:

  • Retain your original, unedited raw data in its native format; do not alter or edit it.
  • Document the tools, instruments, or software used to create the data.
  • Make a copy of the original data file before any analysis or data manipulation.
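One way to follow these practices in a script is to copy the raw file to a separate working folder, confirm the copy is identical, and then remove write permission from the original. A minimal Python sketch; the function names are hypothetical, and on Windows os.chmod only sets or clears the read-only flag:

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path

def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def make_working_copy(original, work_dir):
    """Copy a raw data file for analysis, verify it, and mark the original read-only."""
    original = Path(original)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    copy = work_dir / original.name
    shutil.copy2(original, copy)  # copy2 also preserves timestamps
    if sha256(copy) != sha256(original):
        raise OSError(f"copy of {original} does not match the original")
    # Leave only read permission on the original for owner, group, and others.
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return copy
```

Analysis then works on the copy, and the checksum can be recorded next to the original to detect later changes.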

Recommended formats by type of data:

Audio: WAVE, AIFF, MP3, MXF

Containers: TAR, GZIP, ZIP

Databases: XML, CSV

Statistics: ASCII, DTA, POR, SAS, SAV

Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

Tabular data: CSV

Text (documentation, scripts): XML, PDF/A, HTML, Plain Text (ASCII, UTF-8)

Video: MOV, MPEG, AVI, MXF

Web archive: WARC
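Where data start out in a spreadsheet or analysis program, exporting them to plain-text CSV keeps them readable without that software. A minimal sketch with Python's built-in csv module; write_csv and the column names and values in the usage example are hypothetical:

```python
import csv

def write_csv(path, header, rows):
    """Save tabular data as UTF-8 CSV, a plain-text, non-proprietary format."""
    # newline="" lets the csv module control line endings, as its documentation advises.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

write_csv("samples.csv", ["sample_id", "condition", "value"],
          [["S001", "ctrl", 2.5], ["S002", "treated", 3.1]])
```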

Storage and Backup

Storage refers to preserving your data files in a secure location you can access readily. This is not the same thing as backup (described below).

Storing your data properly ensures that it will be there when you need to use it for publications, theses, or grant proposals. A granting agency may require that you retain data for a given period and may ask you to explain in a data plan how you will store and back up your data.

Storage Best Practices:

  • Unencrypted storage is the easiest to work with because it is the easiest to access. However, if you are working with sensitive data, you may be required to encrypt your files.
  • Keep passwords and encryption keys on paper (two copies) and in a PGP (Pretty Good Privacy)-encrypted digital file.
  • Don’t rely on third-party encryption alone.
  • Uncompressed files are ideal for stored files you will access frequently. However, due to space constraints you may need to compress your files, or at least your backup files.
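When space is a concern, a folder can be bundled and compressed into a single TAR file with gzip compression (TAR and GZIP both appear among the recommended container formats). A minimal sketch using Python's standard tarfile module; archive_folder and the date-stamped naming scheme are assumptions for illustration:

```python
import tarfile
from datetime import date
from pathlib import Path

def archive_folder(src, dest_dir):
    """Write src as a gzip-compressed TAR file named <folder>_YYYYMMDD.tar.gz."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{src.name}_{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(str(target), "w:gz") as tar:
        # Store paths relative to the folder, not the full absolute path.
        tar.add(str(src), arcname=src.name)
    return target
```

Files you work with daily can stay uncompressed, while archives like this go to backup storage.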

Keeping reliable backups is an integral part of data management. 

Backup refers to preserving additional copies of your data in a physical location separate from your data files in storage. Backups preserve older copies so you can restore your data if it is damaged by accidental deletion or alteration, or by a disaster such as a fire, flood, or hardware malfunction.

Backup Best Practices:

  • Make 3 copies (e.g. original + external/local + external/remote).
  • Have them geographically distributed (local vs. remote depends on recovery time needed).
  • Test your backup system to ensure files can be recovered without corruption or data loss; do this both when you first set up your backup system and periodically throughout your project.
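Part of testing a backup is confirming that the recovered files are byte-for-byte identical to the originals. One common approach is to compare checksums; the Python sketch below uses SHA-256 from the standard hashlib module and reads each file fully into memory, which is fine for modest file sizes. The function names are hypothetical:

```python
import hashlib
from pathlib import Path

def checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result

def compare_backup(original, backup):
    """Return (missing, changed): files absent from the backup, and files whose contents differ."""
    src, dst = checksums(original), checksums(backup)
    missing = [name for name in src if name not in dst]
    changed = [name for name in src if name in dst and src[name] != dst[name]]
    return missing, changed
```

Running the comparison against a test restore, not just against the backup itself, checks the full recovery path.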

Backup Options:

  • Hard drive (for example, via Windows Backup, Mac Time Machine, or UNIX rsync)
  • Departmental or university servers
  • CDs or DVDs are not recommended, as they are unreliable backup media prone to frequent failure.

Off-Site Storage Options