Preliminary Agenda items for the non-technical workshop
- Introduction
  - Presentation of participants
  - Expectations
  - Any special focus area
  - Review/update of agenda
  - How do we handle outstanding issues?
- Step by step experience from Netarchive.dk concerning broad crawls
  - Preparation
  - How to manage deduplication
  - Actual impact on computing and storage
  - Experience during the crawl
  - QA
  - Metrics from the past domain crawls: how much, how many, how fast, etc.
- Support of WARC
  - Status of support of WARC in NetarchiveSuite
  - Experience with ARC -> WARC tools
  - Status of transferring old webarchives into Netarkivet.dk
- Collection
  - What's a collection?