Diving into Digital Ephemera: Identifying Defunct URLs in the Web Archives

Diving into Digital Ephemera: Identifying Defunct URLs in the Web Archives
Olivia Meehan, who worked on the web archiving team at the US Library of Congress, evaluates how well online archives of the Papal Transition 2005 Collection from 2005 have survived: Based on the results I have so far and conversations I’ve had with other web archivists, the lifecycle of websites is unpredictable to the extent that accurately tracking the status of a site inherently requires nuance, time, and attention — which is difficult to maintain at scale. This data is valuable, however, and is worth pursuing when possible├é. Using a sample selection of URLs from larger collections could make this more manageable than comprehensive reviews.

Of the content originally captured in the Papal Transition 2005 Collection, 41% is now offline. Without the archived pages, the information, perspectives, and experiences expressed on those websites would potentially be lost forever. They include blogs, personal websites, individually-maintained web portals, and annotated bibliographies. They frequently represent small voices and unique perspectives that may be overlooked or under-represented by large online publications with the resources to maintain legacy pages and articles.

The internet is impermanent in a way that is difficult to quantify. The constant creation of new information obscures what is routinely deleted, overwritten, and lost. While the scope of this project is small within the context of the wider internet, and even within the context of the Library’s Web Archive collections as a whole, I hope that it effectively demonstrates the value of web archives in preserving snapshots of the online world as it moves and changes at a record pace.

Read more of this story at Slashdot.