Preliminary Agenda items (proposals) for the non-technical workshop
- Introduction (CHH)
- Overall presentation of Netarkivet.dk and NetarchiveSuite (BNH)
- Presentation of participants
- Expectations (CLO)
- Review/update of agenda (CLO)
- Sum up outstanding issues (CHH)
- Step-by-step experience from Netarchive.dk concerning broad crawls (CLO)
  - Preparation
    - Scheduling
    - Responsibilities and roles of participants in a broad crawl
    - Defining the crawl target (number of URLs, scope, seed lists, politeness, budget...)
    - Dealing with junk data
    - Sorting and splitting seed lists into different jobs
    - Running a test crawl
  - Selection of sites
  - How to manage deduplication
  - Actual impact on computing and storage
  - Experience during the crawl
    - Using frontier reports
    - Modifying settings, creating overrides
  - QA
    - Visual QA
    - Running a patch crawl
  - Metrics from past domain crawls: how much, how many, how fast, etc.
- Sum up outstanding issues (CLO)
- User management in NetarchiveSuite (CLO)
  - Different sets of roles using NetarchiveSuite
  - A simple user interface for people who are not very familiar with web archiving
- Sum up outstanding issues (CLO)
- Statistics module (CLO)
  - Basis for all kinds of calculations and general information about the web archive
  - Comparing results of crawls for quality control
- Access (CLO)
  - Comparison of legal basis regarding access
  - Access with Wayback
- Sum up outstanding issues (CLO)
- Collection (CLO)
  - What's a collection?
- Sum up outstanding issues (CLO)