Archiving Websites and Data

Tools that can help preserve access to websites and datasets needed for research and publication. This guide was created from the work of Shauna-Kay Harrison and Abby Sypniewski.

Save Page Now

Save Page Now Features
Creator: Internet Archive
Technology: Heritrix crawler, Brozzler crawler
Requirements: Browser access
Ease of Use: No training needed
Location of Archived Data: Archived webpages can be viewed in the Internet Archive's Wayback Machine
Access: No account is needed to use this feature. To see a collection of the pages you have archived, create a free account at Archive.org and be logged in when submitting links to be archived.
Advantages
  • Provides access to the Internet Archive's technology
  • Users can see the webpages they have submitted for archiving
Disadvantages/Other Considerations
  • Ease of access
  • Lack of customizability and quality assurance
  • Lack of ownership of the archived content
  • Archived content is automatically public
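Save Page Now can also be reached through plain links. The sketch below assumes two public Wayback Machine URL patterns that come from general knowledge of the service, not from this guide: `https://web.archive.org/save/<url>` to request a capture, and `https://web.archive.org/web/<timestamp>/<url>` to open the capture closest to a timestamp. The helper names are hypothetical, and the code only builds links; it makes no network requests.

```python
from urllib.parse import urlparse

# Assumed public Wayback Machine URL patterns (not documented in this guide).
SAVE_ENDPOINT = "https://web.archive.org/save/"
REPLAY_PREFIX = "https://web.archive.org/web/"


def _require_http_url(target: str) -> None:
    """Raise ValueError unless `target` is an absolute http(s) URL."""
    parsed = urlparse(target)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {target!r}")


def save_page_now_link(target: str) -> str:
    """Hypothetical helper: link that asks Save Page Now to capture `target` when opened."""
    _require_http_url(target)
    return SAVE_ENDPOINT + target


def replay_link(target: str, timestamp: str) -> str:
    """Hypothetical helper: link to the capture of `target` nearest `timestamp`.

    `timestamp` is YYYYMMDDhhmmss or a shorter prefix such as YYYY or YYYYMMDD.
    """
    _require_http_url(target)
    if not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be 1 to 14 digits, e.g. '20240115'")
    return f"{REPLAY_PREFIX}{timestamp}/{target}"


print(save_page_now_link("https://example.org/report.html"))
print(replay_link("https://example.org/report.html", "20240115"))
```

Pasting the first link into a browser is equivalent to submitting the URL on the Save Page Now form; the second is the kind of stable Wayback link that can be cited once a capture exists. Logged-in submissions, which the Access entry notes are needed to build a personal collection, still go through the website.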

 

Archive-It

Archive-It Features
Creator: Internet Archive
Technology: Heritrix crawler, Brozzler crawler
Requirements: Browser access
Ease of Use: Intuitive GUI; some training is necessary
Location of Archived Data: WARC files stored on Internet Archive servers, with the option to download them to local storage
Access: Through the account's Wayback Machine portal
Advantages
  • All-in-one web archiving service that is widely adopted across institutions
  • Good technical support
  • Batch loading
  • Scheduling options
  • Can be set to ignore robots.txt
Disadvantages/Other Considerations
  • Paid service with a limit on total data per year
  • Some technical crawler and playback limitations
  • Crawls cannot be evaluated until they finish
  • Manual QA of test crawls can be time-intensive
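Archive-It, like WARCreate and Web Curator Tool, delivers captures as WARC files. To show what a downloaded WARC contains, here is a minimal standard-library Python sketch that lists each record's header fields. It assumes well-formed WARC records (a `WARC/1.x` version line, named header fields, a blank line, then a block of `Content-Length` bytes, followed by a blank-line separator). `iter_warc_headers` and `open_warc` are hypothetical names; for real collections, a maintained library such as warcio is the safer choice.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the named header fields of each record in a WARC byte stream.

    Minimal sketch: record bodies are skipped, digests are not checked, and
    header names are kept as written (WARC treats them as case-insensitive).
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records (each record ends with CRLF CRLF)
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            field = stream.readline()
            if field in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = field.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record block (e.g. the captured HTTP response) without parsing it.
        stream.read(int(headers.get("Content-Length", "0")))
        yield headers


def open_warc(path: str) -> BinaryIO:
    """Open a .warc or .warc.gz file; gzip reads per-record (multi-member) compression."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


# Demo on a tiny in-memory WARC holding one response record.
body = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: https://example.org/\r\n"
    + b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    + b"\r\n" + body + b"\r\n\r\n"
)
for record in iter_warc_headers(io.BytesIO(sample)):
    print(record["WARC-Type"], record["WARC-Target-URI"])
```

For a file downloaded from Archive-It, the same loop over `iter_warc_headers(open_warc("collection.warc.gz"))` (a hypothetical filename) would list every captured URL, which can be a quick way to spot-check a crawl outside the Wayback portal.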

 

Browsertrix

Browsertrix Features
Creator: Webrecorder
Technology: Varies
Requirements: Browser access
Ease of Use: Intuitive; some training needed
Location of Archived Data: Captured webpages can be organized and viewed in ReplayWeb.page and downloaded locally as WACZ files
Advantages
  • Relatively easy to use
  • Uses real browsers, so it can capture pages behind paywalls and logins
  • Crawls can be watched as they happen
  • Replay technology is embedded in the same place as the crawling technology
Disadvantages/Other Considerations
  • Monthly cost
  • Instagram and Facebook require extra setup
  • Captures the comments under social media posts, along with commenters' usernames and profile photos
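The WACZ files that Browsertrix downloads are ZIP packages, so their contents can be checked with ordinary tools. The sketch below assumes the layout described in the WACZ specification: WARC files under `archive/`, a page list in `pages/pages.jsonl` whose first line is a header record, and a `datapackage.json` manifest at the top level. `summarize_wacz` is a hypothetical helper, not part of any Webrecorder tool.

```python
import io
import json
import zipfile
from typing import BinaryIO, Dict, List, Union


def summarize_wacz(source: Union[str, BinaryIO]) -> Dict[str, object]:
    """Return the WARC files and page URLs recorded inside a WACZ package.

    Assumes the WACZ layout: WARCs under archive/, an optional page list in
    pages/pages.jsonl (first line is a header record without a "url" key),
    and a datapackage.json manifest at the top level.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        warcs = [n for n in names if n.startswith("archive/") and not n.endswith("/")]
        pages: List[str] = []
        if "pages/pages.jsonl" in names:
            with zf.open("pages/pages.jsonl") as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if not line.strip():
                        continue
                    record = json.loads(line)
                    if "url" in record:  # skip the header record
                        pages.append(record["url"])
    return {
        "warcs": warcs,
        "pages": pages,
        "has_datapackage": "datapackage.json" in names,
    }


# Demo: build a tiny WACZ-shaped ZIP in memory and summarize it.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("datapackage.json", "{}")
    zf.writestr("archive/data.warc.gz", b"")
    zf.writestr(
        "pages/pages.jsonl",
        '{"format": "json-pages-1.0", "id": "pages", "title": "All Pages"}\n'
        '{"id": "p1", "url": "https://example.org/", "ts": "2024-01-15T00:00:00Z"}\n',
    )
print(summarize_wacz(demo))
```

For a real download, `summarize_wacz("my-crawl.wacz")` (a hypothetical filename) gives a quick inventory of captured pages before the file is deposited elsewhere; the WARCs it lists can then be read like any other WARC.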

 

Conifer

Conifer Features
Creator: Webrecorder/Rhizome
Technology: Varies
Requirements: Browser access
Ease of Use: Intuitive; some training needed
Location of Archived Data: Captured webpages are stored in the cloud and can be organized and viewed in a web browser
Access: Login required. Access and 5 GB of storage are free; additional storage and features are paid.
Disadvantages/Other Considerations: Limited amount of storage space

 

WARCreate

WARCreate Features
Creator: Mat Kelly
Technology: N/A
Requirements: Google Chrome or Microsoft Edge
Ease of Use: Intuitive; no training needed
Location of Archived Data: WARC file that can be downloaded locally
Access: No login required
Advantages: Easy and quick way of saving WARC files for later use
Disadvantages/Other Considerations: Does not have built-in replay capabilities

 

Zotero

Zotero Features
Technology: N/A
Requirements: Desktop app and browser extension
Ease of Use: Semi-intuitive; some training is needed
Location of Archived Data: Stored locally on the computer (go to Zotero Settings -> Advanced -> Files and Folders to see where Zotero data resides)
Access: Login and the desktop app are required. The software is free and includes a set amount of free storage; more storage is available at a cost.
Advantages
  • Easy way to save webpages and resources (attached resources can be saved alongside the webpage, but this uses more storage)
  • Users can add notes, tags, and folders, and can work collaboratively
  • Aids with bibliography generation and works with Google Docs and Microsoft Word to add citations
Disadvantages/Other Considerations
  • Some training needed

 

Perma.cc

Perma.cc Features
Creator: Harvard Law School Library
Technology: N/A
Requirements: Browser access
Ease of Use: Intuitive; some training needed
Location of Archived Data: A permanent link that can be shared anywhere
Access: Account needed. Free for academic use; individual accounts require a paid subscription.
Advantages: Easy way of creating permanent links that can be used safely in citations and when sharing information
Disadvantages/Other Considerations: Paid subscription required for individual use without an academic institution affiliation

 

Print-to-PDF

Print-to-PDF Features
Creator: Adobe
Technology: N/A
Requirements: Account and subscription
Ease of Use: Intuitive; no training needed
Location of Archived Data: Converted PDF file that can be downloaded locally
Access: Windows software; an Adobe account and subscription may be required
Advantages: Quick and easy way to save sites as files for personal use
Disadvantages/Other Considerations: Windows only. macOS has this functionality built in for Word and Pages, and Google Chrome offers print-to-PDF. Saving webpages with Adobe also requires a subscription.
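For the Google Chrome route, Chrome and Chromium can also print a page to PDF without opening a window, using the `--headless` and `--print-to-pdf=<file>` command-line switches. The minimal Python wrapper below assumes one of the listed executable names is on the PATH; names vary by platform, and on macOS or Windows you may need the browser's full path instead. All function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

# Common executable names; which one exists depends on the platform and install.
CHROME_CANDIDATES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome"]


def find_chrome() -> Optional[str]:
    """Return the first Chrome/Chromium executable found on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def print_to_pdf_command(chrome: str, url: str, out_pdf: str) -> List[str]:
    """Command line that renders `url` to `out_pdf` in headless Chrome."""
    return [chrome, "--headless", f"--print-to-pdf={out_pdf}", url]


def save_page_as_pdf(url: str, out_pdf: str, timeout_s: int = 120) -> None:
    """Run headless Chrome; raises if no browser is found or the command fails."""
    chrome = find_chrome()
    if chrome is None:
        raise FileNotFoundError("no Chrome/Chromium executable found on PATH")
    subprocess.run(print_to_pdf_command(chrome, url, out_pdf), check=True, timeout=timeout_s)
```

Calling `save_page_as_pdf("https://example.org/", "example.pdf")` produces the same kind of static snapshot as the print dialog, so it suits batches of pages for personal use but, like any PDF export, does not keep the page's interactive behavior.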

 

Auto Archiver

Auto Archiver Features
Creator: Bellingcat
Technology: Varies
Requirements: Command line, Python, or Docker; Google Sheets
Ease of Use: Intermediate; training with the command line or Python is needed, but Docker can make setup easier
Location of Archived Data: Results are recorded in Google Sheets (or another document output)
Access: Open source, free to use
Advantages: No webpage or desktop app needed
Disadvantages/Other Considerations: The service is more manual, and an understanding of the command line may be needed

 

Web Curator Tool

Web Curator Tool Features
Creator: National Library of New Zealand, British Library, International Internet Preservation Consortium
Technology: Webrecorder’s pywb, Heritrix crawler
Requirements: Installed app
Ease of Use: Intermediate GUI; some training needed
Location of Archived Data: WARC file that can be downloaded locally
Access: Free to use. Account needed.
Advantages
  • A complete tool for planning and executing web archiving
  • Users can demo the tool before committing
  • Developed specifically for non-technical users
Disadvantages/Other Considerations
  • Does not have built-in replay capabilities

 

HTTrack

HTTrack Features
Creator: Xavier Roche
Technology: Written in C
Requirements: Installed app
Ease of Use: Intermediate; the command line is needed to run it
Location of Archived Data: Copies of the site's HTML, CSS, and JavaScript files, downloaded locally
Access: Free of charge
Advantages
  • Only minimal use of the command line is needed
  • Allows the user to capture and navigate the website in its original context without an in-browser emulator
Disadvantages/Other Considerations
  • Output format (a plain file mirror rather than WARC) is less suitable for long-term preservation
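As an example of the minimal command-line use HTTrack needs, the sketch below wraps the basic `httrack <url> -O <output directory>` invocation in Python, where `-O` sets the output path. It assumes `httrack` is installed and on the PATH; `httrack_command` and `mirror_site` are hypothetical helper names, and further options such as depth limits or URL filters are left to HTTrack's own documentation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def httrack_command(url: str, out_dir: str) -> List[str]:
    """Basic HTTrack invocation: mirror `url` into `out_dir` (-O sets the output path)."""
    return ["httrack", url, "-O", out_dir]


def mirror_site(url: str, out_dir: str) -> None:
    """Create `out_dir` and run HTTrack; raises if httrack is missing or exits with an error."""
    if shutil.which("httrack") is None:
        raise FileNotFoundError("httrack is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(httrack_command(url, out_dir), check=True)
```

Running `mirror_site("https://example.org/", "example-mirror")` would leave the copied HTML, CSS, and JavaScript files under `example-mirror/`, which can be browsed offline but, as noted above, is not a WARC and is less suited to long-term preservation.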

 

Web Archiving Integration Layer (WAIL)

Web Archiving Integration Layer (WAIL) Features
Creator: Mat Kelly
Technology: Heritrix 3.2.0, OpenWayback 2.4.0
Requirements: Installed app
Ease of Use: Intuitive GUI; macOS users may need to adjust security settings
Location of Archived Data: Web view of archived pages, stored locally
Access: No account needed, free of charge
Advantages
  • Provides Archive-It technology free of charge
Disadvantages/Other Considerations
  • Same limitations as Archive-It technology
  • Independent, open-source project with fewer security measures

 

Last Updated: Oct 8, 2025 7:38 AM