PANDORA Australia's Web Archive

Web archiving at the NLA
Archiving the music web
Music Council of Australia Annual Assembly, 28 September 2009
Paul Koerbin, Manager Digital Archiving, National Library of Australia

1. Background: the what, why and how
2. What makes a valuable resource for archiving?
3. What can you do to help?

What is web archiving about and why do it?
Archiving = long-term preservation and access
Building collections
Building documentary historical record
Creating artefacts from the web experience
Discovering what is produced online
An act of consciousness

What's involved in web archiving? At the NLA it's:
Identifying, selecting, scoping
Seeking permission to collect and make accessible
Creating and recording metadata: administrative, descriptive, preservation
Crawling/harvesting (including scheduling)
Processing for quality assurance (best effort)
Storing and maintaining the data
Planning and implementing preservation strategies
Preparing and rendering for public display
Providing access and discovery mechanisms

What is the NLA doing?
PANDORA Archive, 1996
PANDORA participants: NLA, state libraries (not Tas), NFSA, AWM, AIATSIS (and soon the NGA)
Highly selective, small scale, quality collection, open access
PANDAS workflow management system, 2001

Australian (.au) domain harvests
Annual since 2005
Internet Archive
No access (yet)

Comparative statistics of NLA web collections

          PANDORA (selective)    .au Domain Harvests
Files     73 million             2.3 billion
Size      3.26 TB                78.75 TB

Domain Harvest    2005           2006           2007           2008
Unique files      185 million    596 million    516 million    1 billion
Hosts crawled     811,523        1,046,038      1,247,614      3,038,658
Size              6.69 TB        19.04 TB       18.47 TB       34.55 TB

Music in the PANDORA Archive
500+ titles available from the PANDORA public listing of music
NFSA 33%, NLA 30%, Others 37%
Musicians, bands, orchestras, composers, organisations, festivals, blogs, instrument makers, magazines
Plus 280 considered but not available, 35% (no permission, rejected, yet to be selected)

What makes a valuable resource for archiving?
Content: substantial, original
Provenance
Long-term research value
Cultural or social significance and interest, including events
Curatorial/expert suggestion (e.g. Music Australia)
Different collecting approaches based on value
Priorities, but never say never

How can you help? 10 tips:
1. Think about the issue of long-term access: what is your intention?
2. Communicate interest and intentions with collecting institutions: let us know about your site; respond to requests for permission
3. Organise and structure sites simply: it's all about links
4. Comply with standards: limit use of proprietary technology if possible
5. Make it robot friendly: indexing, discovery, capture
6. Keep contributors informed and involved: make sure contributors understand and agree to long-term preservation and access from the beginning
7. Clear copyright, rights and contact information: it helps to know what and who (oh, and trust us too)
8. Maintain content online as much as possible: increases chance of it being collected
9. Learn to love and live with your past: archives are not the same as the live web; archived versions cannot be altered
10. Do your own backup, of course
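
One way a site owner might act on tip 5 (make it robot friendly) is to check which pages their own robots.txt hides from a harvesting robot. The sketch below is only an illustration using Python's standard urllib.robotparser module. The robots.txt content, the page paths and the crawler name "example-archive-crawler" are all hypothetical; the talk does not name the user agent any harvester actually uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example music site (an assumption for
# illustration, not taken from the presentation).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /images/
"""


def blocked_paths(robots_txt, user_agent, paths):
    """Return the paths that robots_txt forbids user_agent from fetching."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines,
    # so no network access is needed for this check.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical pages that make up the site.
    pages = ["/index.html", "/images/poster.jpg", "/admin/login", "/tour/2009.html"]
    print(blocked_paths(EXAMPLE_ROBOTS_TXT, "example-archive-crawler", pages))
```

In this made-up case the /images/ rule would keep a rule-abiding robot away from the poster image, so a captured copy of the site could end up without its pictures. Running a check like this against a site's real robots.txt is one way to spot that kind of gap before a harvest.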
