Preliminary Agenda items (proposals) for the non-technical workshop
- Introduction (CHH)
Overall presentation of Netarkivet.dk and NetarchiveSuite (BNH)
Presentation of participants (CHH)
Expectations (CLO)
Review/update of agenda (CLO)
Step-by-step experience from Netarchive.dk concerning broad crawls (KAH)
Preparation
- Scheduling
- Responsibilities, roles of participants in a broad crawl
Defining crawl target (number of URLs, scope, seed lists, politeness, budget...)
- Dealing with junk data
- Sorting and splitting seed lists into different jobs
Running test crawl
Selection of sites
How to manage deduplication
Actual impact on computing and storage
Experience during the crawl
- Using frontier reports
- Modifying settings, creating overrides
QA
- Visual QA
- Running a patch crawl
Metrics from the past domain crawls: how much, how many, how fast, etc.
Sum up outstanding issues (CLO)
Experiences from Netarchive.dk concerning selective crawls (SAS)
OPTIONAL further experiences
User management NetarchiveSuite (CLO)
Different sets of roles using the NetarchiveSuite
A simple user interface for people who are not very familiar with web archiving
Sum up outstanding issues (CLO)
Statistics module (CLO)
Base for all kinds of calculations and general information about the web archive
Comparing results of crawls for quality control
Access (CLO)
Comparison of legal basis regarding access
Access with Wayback
Sum up outstanding issues (CLO)
Collection (CLO)
What's a collection?
Sum up outstanding issues (CLO)