Managing and organizing your data well can facilitate good science and help avoid data loss. On this page you'll find recommended practices for file naming conventions, organizing your directories, versioning, file formats, and storage.
Time spent at the beginning of the project to define folder hierarchy and file naming conventions will make it much easier to keep things organized and findable, both throughout the project and after project completion. Adhering to well-thought-out naming conventions:
File naming best practices
Include a few pieces of descriptive information in the filename, in a standard order, to make it clear what the file contains. For example, filenames could include
Consider sort order: for dates, use YYYYMMDD or YYMMDD, which sort chronologically.
Do not use special (i.e., non-alphanumeric) characters such as " / \ : * ? ' < > [ ] { } ( ) & $ ~ ! @ # % ^ , in names. These could be interpreted by programs or operating systems in unexpected ways.
Do not use spaces in file or folder names, as some operating systems will not recognize them and you will need to enclose such names in quotation marks to reference them in scripts and programs. Alternatives to spaces in filenames:
file_name.xxx
file-name.xxx
filename.xxx
FileName.xxx
Keep names short, no more than 25 characters.
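The naming rules above can be checked automatically. The following is a minimal Python sketch, not a definitive implementation. The function names `check_filename` and `build_filename` are hypothetical, and it assumes that underscores, hyphens, and the period before the extension are acceptable separators and that the 25-character limit applies to the whole name, including the extension.

```python
import re
from datetime import date

MAX_LENGTH = 25  # limit suggested in this guide (assumed to include the extension)
# Assumption: letters, digits, underscore, hyphen, and period are allowed.
DISALLOWED = re.compile(r"[^A-Za-z0-9_.\- ]")


def check_filename(name: str) -> list[str]:
    """Return a list of problems found in a filename (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if DISALLOWED.search(name):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


def build_filename(parts: list[str], day: date, extension: str) -> str:
    """Join a YYYYMMDD date and descriptive parts with underscores."""
    stem = "_".join([day.strftime("%Y%m%d"), *parts])
    return f"{stem}.{extension}"


name = build_filename(["site3", "temp"], date(2024, 3, 5), "csv")
print(name, check_filename(name))
print(check_filename("my data (final).csv"))
```

Putting the date first is one possible choice. It makes files sort chronologically, but a project may prefer to lead with another element.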
File versioning ensures that you always know which version of a file you are working with, and which files are working versions and which are final. Recommended file versioning practices:
Directories can be organized in many different ways. Consider what makes sense for your project and research team, and how people new to the project might look for files.
Once you determine how you want your directories to be organized, it is a good idea to stub out an empty directory structure to hold future data, and to document the contents of each directory in a readme file.
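As an illustration of stubbing out a directory structure, here is a short Python sketch. The folder names and descriptions in `LAYOUT` and the function name `stub_project` are hypothetical examples, not a recommended layout; adapt them to your project.

```python
from pathlib import Path

# Hypothetical layout: directory name -> one-line description for its readme.
LAYOUT = {
    "data_raw": "Original, unmodified data files.",
    "data_processed": "Cleaned or derived data.",
    "scripts": "Analysis and processing code.",
    "docs": "Documentation and notes.",
}


def stub_project(root: Path, layout: dict[str, str]) -> None:
    """Create each directory and a readme describing its intended contents."""
    for name, description in layout.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.txt"
        if not readme.exists():  # do not overwrite an existing readme
            readme.write_text(f"{name}\n{description}\n", encoding="utf-8")


if __name__ == "__main__":
    stub_project(Path("my_project"), LAYOUT)
```

Running the script again is safe: existing directories and readme files are left untouched.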
Directory Best Practices
The file formats you use have a direct impact on your ability to open those files at a later date, and on the ability of other people to access those data.
Whenever possible, select file formats that are
*If you must save files in a proprietary format, include a readme file in the same directory as the data file that documents the software and version used to generate the file, so that the files can be accessed in the future.
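A readme of this kind can also be written by a script when the file is created. The sketch below is only one possible approach. The function name `write_format_readme`, the readme filename, and the software name and version in the usage line are hypothetical.

```python
from pathlib import Path


def write_format_readme(data_file: Path, software: str, version: str) -> Path:
    """Append a line to README.txt next to data_file naming the software used."""
    readme = data_file.parent / "README.txt"
    entry = f"{data_file.name}: created with {software}, version {version}\n"
    with readme.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return readme


if __name__ == "__main__":
    # Hypothetical file, software name, and version, for illustration only.
    write_format_readme(Path("results.xyz"), "ExampleStats", "4.2")
```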
Best practices for handling original data:
Audio: WAVE, AIFF, MP3, MXF
Containers: TAR, GZIP, ZIP
Databases: XML, CSV
Statistics: ASCII, DTA, POR, SAS, SAV
Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
Tabular data: CSV
Text (documentation, scripts): XML, PDF/A, HTML, Plain Text (ASCII, UTF-8)
Video: MOV, MPEG, AVI, MXF
Web archive: WAR
Storage refers to preserving your data files in a secure location you can access readily. This is not the same thing as backup (described below).
Storing your data properly ensures that it will be there when you need to use it for publications, theses, or grant proposals. A granting agency may require that you retain data for a given period and may ask you to explain in a data plan how you will store and back up your data.
Storage Best Practices:
Keeping reliable backups is an integral part of data management.
Backup refers to preserving additional copies of your data in a separate physical location from data files in storage. Backup preserves older copies so you can restore your data if accidental deletion/alteration or a disaster such as fire, flood, or hardware malfunction damages your data in storage.
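To illustrate how a backup can preserve older copies, the following Python sketch copies a storage directory into a new timestamped folder each time it runs. It is a minimal sketch under stated assumptions, not a replacement for a managed backup service. The function names and paths are hypothetical, the backup location is assumed to be on a physically separate device, and the checksum comparison is an extra safeguard added here rather than something this guide prescribes.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_root: Path, stamp: Optional[str] = None) -> Path:
    """Copy source into backup_root/<timestamp> and verify every file."""
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_root / stamp
    shutil.copytree(source, target)  # raises an error if target already exists
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            if sha256(original) != sha256(copy):
                raise RuntimeError(f"checksum mismatch: {copy}")
    return target


if __name__ == "__main__":
    # Hypothetical paths: the backup root should be on a separate device.
    back_up(Path("my_project"), Path("/mnt/external_drive/backups"))
```

Because each run writes to a new folder, earlier copies remain available if a file is later deleted or altered by mistake.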
Backup Best Practices:
Backup Options: