
Archiving Websites and Data

Tools that can help preserve access to websites and datasets needed for research and publication. This guide was created from the work of Shauna-Kay Harrison and Abby Sypniewski.

Know Your Content

First, know what you are archiving. It may be a single news article, a handful of YouTube videos, or a personal blog hosted on WordPress. Below are some examples (and non-examples) of the types of content covered in this guide. If you’re still unsure, it’s a safe bet to treat the content as a dynamic website; at the very least, the tools geared toward capturing dynamic sites will be the most versatile.

Use cases for archiving different kinds of web content
Publication or Article

Is: An article or publication on the web that you want to reliably cite in the future:

  • Article from AP News
  • Scholarly publication

Link to an article or scholarly publication

Is NOT: An entire scholarly database or news outlet site. Non-examples include:

  • All of JSTOR
  • All of The New York Times

Static Website

Is: A “simple” site with fixed content. Look for simple URLs with .html or .htm extensions and no user-specific interactions or real-time updates. Examples include:

  • Personal blogs
  • Portfolios

Is NOT: A site with user-specific interactions or real-time updates (see Dynamic Website). Non-examples include:

  • Amazon
  • Wikipedia
  • An art gallery website that loads new content as you scroll down the page

Dynamic Website

Is: A site with user-specific interactions or real-time updates. Examples include:

  • An art gallery website that loads new content as you scroll down the page
  • Sites with embedded media content
  • Sites with “clickable” elements like drop-down menus or image carousels

(The distinction between static and dynamic sites matters because some tools use Brozzler, a technology meant to capture the more complex sites and elements described above.)

Is NOT: A very large dynamic site that is likely out of scope for many web crawling technologies. Non-examples include:

  • Amazon
  • Wikipedia
  • Facebook

Social Media Post/Video

Is: A single post or fixed group of posts on a social media site. Examples include:

  • A video on TikTok
  • A playlist of YouTube videos

Is NOT: The entire social media site. Non-examples include:

  • All of YouTube
  • All of TikTok

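The URL hint for static websites (simple URLs ending in .html or .htm) can be sketched as a rough first-pass check. This is only an illustration, not a tool recommended by this guide: the function name looks_static and the example URLs are hypothetical, and treating "no query string" as part of a "simple" URL is an assumption. A URL that passes can still belong to a dynamic site, so when in doubt, treat the content as dynamic.

```python
from urllib.parse import urlparse

# Extensions the guide names as a hint that a page may be static.
STATIC_EXTENSIONS = (".html", ".htm")


def looks_static(url: str) -> bool:
    """Rough hint only: True when the URL path ends in .html or .htm
    and the URL has no query string (the query check is an assumption)."""
    parts = urlparse(url)
    path = parts.path.lower()
    return path.endswith(STATIC_EXTENSIONS) and not parts.query


if __name__ == "__main__":
    # Hypothetical example URLs, for illustration only.
    for url in ("https://example.org/about.html",
                "https://example.org/search?q=art"):
        print(url, looks_static(url))
```

A result of False does not prove that a site is dynamic, and a result of True says nothing about scrolling, embedded media, or other interactive elements on the page. Those still need to be checked by looking at the site itself.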
Legal and Ethical Considerations

If you're planning to archive and share content, it's important to understand the legal and ethical considerations involved. The Library Copyright Office offers valuable guidance on copyright and fair use. For an overview, consult the Copyright Basics research guide.

Archiving social media can become complex quickly. If you're preserving a specific post, consider whether the original creator is able to give you, the archivist, informed consent. More broadly, ask yourself: Is the content you're archiving created by or for historically marginalized communities? Could redistributing this material potentially cause harm to individuals or communities?

Whatever you're archiving, we strongly encourage you to pause and read Documenting the Now’s Ethics White Paper before moving forward.

Last Updated: Oct 8, 2025 7:38 AM