Preliminary Agenda items (proposals) for the non-technical workshop
Introduction
- Presentation of participants
- Expectations
- Review/update of agenda
Step by step experience from Netarchive.dk concerning broad crawls
Preparation
- Scheduling
- Responsibilities, roles of participants in a broad crawl
- Defining crawl target (number of URLs, scope, seed lists, politeness, budget...)
- Dealing with junk data
- Sorting and splitting seed lists into different jobs
- Running test crawl
- Selection of sites
- How to manage deduplication
- Actual impact on computing and storage
Experience during the crawl
- Using frontier reports
- Modifying settings, creating overrides
QA
- Visual QA
- Running a patch crawl
Metrics from the past domain crawls: how much, how many, how fast, etc.
User management NetarchiveSuite
- Different set of roles using the NetarchiveSuite
- A simple user interface for people who are not very familiar with web archiving
Statistics module
- Base for all kinds of calculations and general information about the web archive
- Comparing results of crawls for quality control
Access
- Comparison of legal basis regarding access
- Access with Wayback
Collection
- What's a collection?