Archiving Websites and Data
Save Page Now
Allows users to submit links to web pages that they would like the Internet Archive to save to the Wayback Machine.
| Save Page Now | Description |
|---|---|
| Creator | Internet Archive |
| Technology | Heritrix crawler, Brozzler crawler |
| Requirements | Browser access |
| Ease of Use | No training needed |
| Location of Archived Data | Archived webpages can be viewed in the Internet Archive's Wayback Machine |
| Access | No account is needed to use this feature. To see a collection of the pages you have archived, create a free account at Archive.org and stay logged in when submitting links to be archived |
| Advantages | Provides access to Internet Archive's technology; users can see webpages they've contributed to be archived |
| Disadvantages/Other Considerations | Ease of access; lack of customizability/quality assurance; lack of ownership of the archived content; archived content is automatically public |
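
Submitting a link can also be done by building a web address rather than using the form. The sketch below is illustrative only: it assumes the publicly documented `https://web.archive.org/save/` prefix and the `https://archive.org/wayback/available` lookup endpoint, and the function names are hypothetical. It only builds the addresses and does not contact the Internet Archive.

```python
from urllib.parse import quote

# Assumed public endpoints; check the Internet Archive's documentation before relying on them.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(page_url: str) -> str:
    """Return the address that asks Save Page Now to capture page_url."""
    return SAVE_PREFIX + page_url


def availability_query_url(page_url: str) -> str:
    """Return the address that asks whether a capture of page_url already exists."""
    return f"{AVAILABILITY_API}?url={quote(page_url, safe='')}"


if __name__ == "__main__":
    print(save_page_now_url("https://example.com/"))
    print(availability_query_url("https://example.com/page"))
```

Opening the first address in a browser has the same effect as pasting the link into the Save Page Now form.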
Archive-It
Browser-based service and tools to crawl, review, save, describe, store, and display archived web pages.
| Archive-It | Features |
|---|---|
| Creator | Internet Archive |
| Technology | Heritrix crawler, Brozzler crawler |
| Requirements | Browser access |
| Ease of Use | Intuitive GUI, some training is necessary |
| Location of Archived Data | WARC files stored on Internet Archive servers, option to download to local storage |
| Access | Wayback Machine portal for account |
| Advantages | All-in-one web archiving service that is widely adopted across institutions; good tech support; batch loading; scheduling options; can ignore robots.txt |
| Disadvantages/Other Considerations | Paid service limits total data per year; some technical crawler and playback limitations; can't evaluate crawls until they finish; manual QA of test crawls can be time-intensive |
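
For context on the robots.txt option listed above: a robots.txt file tells crawlers which paths a site owner would prefer they skip. The sketch below uses Python's standard `urllib.robotparser` to show how a crawler that honors the file decides whether to fetch a page. The sample rules, the `example-crawler` name, and the `allowed_by_robots` helper are made up for illustration and are not part of Archive-It.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(url, allowed_by_robots(EXAMPLE_ROBOTS_TXT, "example-crawler", url))
```

A crawl configured to ignore robots.txt would simply skip this check and capture both pages.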
Browsertrix
Browser-based tools to crawl, review, save, describe, store, and display archived web pages.
| Browsertrix | Features |
|---|---|
| Creator | Webrecorder |
| Technology | Varies |
| Requirements | Browser access |
| Ease of Use | Intuitive, some training needed |
| Location of Archived Data | Capture, organize, and view captured webpages in Replay/Web.page and download locally as WACZ |
| Advantages | Relatively easy to use; uses real browsers to get behind paywalls and logins; watch crawls as they happen; replay technology is embedded in the same place as crawling technology |
| Disadvantages/Other Considerations | Monthly cost; Instagram and Facebook require extra setup; will capture comments and commenters' usernames and profile photos under social media posts |
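
A downloaded WACZ file is a ZIP package, so its contents can be inspected with ordinary ZIP tools. The sketch below is a minimal illustration using Python's standard `zipfile` module. The function name is hypothetical, and the member names mentioned in the comment (such as `datapackage.json` and an `archive/` folder) reflect the WACZ specification as understood here, so treat them as an assumption.

```python
import zipfile


def list_wacz_members(wacz_path: str) -> list[str]:
    """Return the sorted names of the files stored inside a WACZ package."""
    # A WACZ is expected to hold entries such as datapackage.json and archive/*.warc.gz.
    with zipfile.ZipFile(wacz_path) as package:
        return sorted(package.namelist())
```

Calling `list_wacz_members("my-crawl.wacz")` on a local download (the file name is a placeholder) gives a quick check that the export contains the expected WARC data before it is stored elsewhere.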
Conifer
Allows users to create a collection of captured webpages and view them within the browser.
| Conifer | Description |
|---|---|
| Creator | Webrecorder/Rhizome |
| Technology | Varies |
| Requirements | Browser access |
| Ease of Use | Intuitive, some training needed |
| Location of Archived Data | Capture, organize, and view captured webpages in web browser (i.e., cloud storage) |
| Access | Login required. Access and 5GB of storage are free. Pay for additional storage and features |
| Disadvantages/Other Considerations | Limited amount of space |
WARCreate
Generates WARC files that can later be replayed in an emulator such as Web Archiving Integration Layer (WAIL) or uploaded to Archive-It.
| WARCreate | Description |
|---|---|
| Creator | Mat Kelly |
| Technology | N/A |
| Requirements | Google Chrome or Microsoft Edge |
| Ease of Use | Intuitive, no training needed |
| Location of Archived Data | WARC file that can be downloaded locally |
| Access | No login required |
| Advantages | Easy and quick way of saving WARC files for later use |
| Disadvantages/Other Considerations | Does not have built-in replay capabilities |
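
To give a sense of what a WARC file contains, the sketch below assembles a single simplified record by hand: a block of named header fields, a blank line, the captured content, and a closing blank line. It is an illustration of the record layout only, not how WARCreate works internally. The function name is hypothetical, and real tools also record the HTTP request and response details that are omitted here.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes) -> bytes:
    """Assemble one simplified WARC 'resource' record for the given content."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {timestamp}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: text/html",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from content,
    # and two CRLFs close the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_record("https://example.com/", b"<p>Hello</p>\n")
    print(record.decode("utf-8"))
```

Because each record carries its own address, capture date, and length, a replay tool can later locate and display a page without the original site being online.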
Zotero
Allows users to organize and save sources in one place and generate bibliographies and citations.
| Zotero | Features |
|---|---|
| Technology | N/A |
| Requirements | Desktop app and browser extension |
| Ease of Use | Semi-intuitive, some training is needed |
| Location of Archived Data | Stored locally on computer (Zotero Settings -> Advanced -> Files and Folders to find where Zotero data resides on your computer) |
| Access | Log in and desktop app required. Software is free and a set amount of storage is given for free. If users need to go over this amount, more storage is available at a cost. |
| Advantages | |
| Disadvantages/Other Considerations | Some training needed |
Perma.cc
Creates a shareable permanent link that prevents link rot.
| Perma.cc | Features |
|---|---|
| Creator | Harvard Law School Library |
| Technology | N/A |
| Requirements | Browser access |
| Ease of Use | Intuitive, some training needed |
| Location of Archived Data | A permanent link that can be shared anywhere |
| Access | Account needed. Free for academic use. Individual accounts will need a paid subscription. |
| Advantages | Easy way of creating permanent links that can be used safely in citations and when sharing information |
| Disadvantages/Other Considerations | Paid subscription required for individual use or for users without an academic institution affiliation |
Print-to-PDF
Allows non-PDF files to be saved as PDFs.
| Print-to-PDF | Features |
|---|---|
| Creator | Adobe |
| Technology | N/A |
| Requirements | Account and subscription |
| Ease of Use | Intuitive, no training needed |
| Location of Archived Data | Converted PDF file that can be downloaded locally |
| Access | Windows software, potential Adobe account and subscription |
| Advantages | Quick and easy way to save sites as files for personal use |
| Disadvantages/Other Considerations | Windows only. macOS has this functionality built in for Word and Pages. Google Chrome allows print-to-PDF. For webpages, an Adobe subscription is also needed |
Auto Archiver
Python tool to automatically archive social media posts, videos, and images from a Google Sheet, the console, and more. Uses different archivers depending on the platform, and can save content to local storage, an S3 bucket (Digital Ocean Spaces, AWS, ...), and Google Drive. If Google Sheets is the source for links, the sheet will be updated with information about the archived content. It can be run manually or on an automated basis.
| Auto Archiver | Features |
|---|---|
| Creator | Bellingcat |
| Technology | Varies |
| Requirements | Command line, Python, or Docker, Google Sheets |
| Ease of Use | Intermediate, training needed with command line or Python, but can be made easier using Docker |
| Location of Archived Data | Google Sheets (or other document output) with information about the archived content |
| Access | Open source, free to use |
| Advantages | No webpage or desktop app needed |
| Disadvantages/Other Considerations | Service is more manual and understanding of the command line may be needed |
Web Curator Tool
Workflow management tool for selecting and crawling websites, performing quality assurance, and preparing websites for ingest into a preservation system.
| Web Curator Tool | Features |
|---|---|
| Creator | National Library of New Zealand, British Library, International Internet Preservation Consortium |
| Technology | Webrecorder's pywb, Heritrix crawler |
| Requirements | Installed app |
| Ease of Use | Intermediate GUI, some training needed |
| Location of Archived Data | WARC file that can be downloaded locally |
| Access | Free to use. Account needed. |
| Advantages | A full tool for web archiving planning and execution; users can demo the tool before committing; developed specifically for non-technical users |
| Disadvantages/Other Considerations | Does not have built-in replay capabilities |
HTTrack
Captures, or rather copies, websites to a local repository. These websites can then be viewed locally without an internet connection.
| HTTrack | Features |
|---|---|
| Creator | Xavier Roche |
| Technology | Written in C |
| Requirements | Installed app |
| Ease of Use | Intermediate, command line needed to execute service |
| Location of Archived Data | Copied HTML, CSS, and JavaScript files downloaded locally |
| Access | Free of charge |
| Advantages | Command line is needed, but only minimally; allows the user to capture and navigate the website in its original context without an in-browser emulator |
| Disadvantages/Other Considerations | Output format not as suitable for long-term preservation |
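
The minimal command-line step mentioned above might look like the following sketch. It assumes the tool is installed and available as the `httrack` executable, and that its `-O` option sets the local folder for the copy. The helper name, example address, and folder are placeholders. The script only builds and prints the command; the line that would actually run it is left commented out.

```python
import shlex


def httrack_command(site_url: str, output_dir: str) -> list[str]:
    """Build the argument list for copying site_url into output_dir."""
    return ["httrack", site_url, "-O", output_dir]


if __name__ == "__main__":
    command = httrack_command("https://example.com/", "./mirror")
    print(shlex.join(command))
    # To run the copy for real (requires the tool to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Once the copy finishes, the site can be browsed offline by opening the saved HTML files in the chosen folder.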
Web Archiving Integration Layer (WAIL)
Crawls and saves websites from a GUI application on your desktop and allows you to view crawled sites in a web browser.
| Web Archiving Integration Layer (WAIL) | Features |
|---|---|
| Creator | Mat Kelly |
| Technology | |
| Requirements | Installed app |
| Ease of Use | Intuitive GUI; security settings may need to be adjusted for macOS users |
| Location of Archived Data | Web view of archived page stored locally |
| Access | No account needed, free of charge |
| Advantages | |
| Disadvantages/Other Considerations | |