Research Data Management (Health Sciences)

File Format Best Practices

The file formats you choose directly affect whether you can open your files at a later date and whether other people can access the data.

Whenever possible, select file formats that are

  • non-proprietary*
  • unencrypted
  • uncompressed
  • in common usage by the research community
  • adherent to an open, documented standard
  • interoperable among diverse platforms and applications
  • royalty-free and without intellectual property restrictions
  • developed and maintained by an established open standards organization

*If you must save files in a proprietary format, include a readme file in the same directory as the data file that documents the software and version used to generate the file, so that the files can be accessed in the future.
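A readme of this kind can be written by hand, but it can also be generated when the data file is saved. The following is a minimal sketch in Python, not a prescribed method: the function name write_readme, the README.txt filename, and the field labels are assumptions made for illustration, and the software name and version in the usage note are hypothetical.

```python
from datetime import date
from pathlib import Path


def write_readme(data_file, software, version, notes=""):
    """Write a plain-text readme in the same directory as data_file.

    Hypothetical helper: records the software and version used to
    generate the file. Returns the path of the readme it wrote.
    """
    data_path = Path(data_file)
    readme_path = data_path.parent / "README.txt"  # assumed filename
    lines = [
        f"File: {data_path.name}",
        f"Created with: {software}",
        f"Software version: {version}",
        f"Readme written: {date.today().isoformat()}",
    ]
    if notes:
        lines.append(f"Notes: {notes}")
    # Plain UTF-8 text keeps the readme itself in a non-proprietary format.
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path
```

For example, write_readme("survey/results.sav", "ExampleStats", "1.2.3") would place README.txt in the survey directory alongside results.sav. Note that this sketch overwrites any README.txt already in that directory, so a directory holding several proprietary files would need one readme listing all of them.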

Best practices for handling original data:

  • Retain your original, unedited raw data in its native format; do not alter or edit it.
  • Document the tools, instruments, or software used to create the data.
  • Make a copy of the original data file prior to any analysis or data manipulations.
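The copy step above can be scripted so that the working copy is checked against the original before any analysis begins. This is a minimal sketch using only the Python standard library; the function names sha256_of and make_working_copy and the idea of a separate working directory are assumptions for illustration, not part of the cited guidance.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original, working_dir):
    """Copy original into working_dir and confirm the copy is identical.

    The original file is only read, never modified.
    """
    original = Path(original)
    working_dir = Path(working_dir)
    working_dir.mkdir(parents=True, exist_ok=True)
    copy_path = working_dir / original.name
    if copy_path.resolve() == original.resolve():
        raise ValueError("working_dir must differ from the original's directory")
    shutil.copy2(original, copy_path)  # copy2 also preserves timestamps
    if sha256_of(original) != sha256_of(copy_path):
        raise OSError(f"Copy of {original} does not match the original")
    return copy_path
```

All analysis would then be run against the returned path, leaving the raw file untouched in its native format.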

Sources:
Best practices for file formats, Stanford Libraries
Data Types & File Formats, University of Virginia Library

Recommended File Formats

Audio: WAVE, AIFF, MP3, MXF

Containers: TAR, GZIP, ZIP

Databases: XML, CSV

Statistics: ASCII, DTA, POR, SAS, SAV

Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

Tabular data: CSV

Text (documentation, scripts): XML, PDF/A, HTML, Plain Text (ASCII, UTF-8)

Video: MOV, MPEG, AVI, MXF

Web archive: WAR

Sources:
File Formats Table, UK Data Archive
Recommended Formats Statement, Library of Congress
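
As one illustration of the recommended formats listed above, tabular data can be written as CSV with UTF-8 encoding using Python's standard csv module. This is a sketch under stated assumptions: the function name save_rows_as_csv and the column names in the usage note are hypothetical, and real data may need additional handling (for example, documenting units and missing-value codes in a readme).

```python
import csv


def save_rows_as_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, save_rows_as_csv("participants.csv", ["participant_id", "age"], [["P001", 34], ["P002", 41]]) produces a plain-text file that can be opened without any particular software package.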