Differences between revisions 18 and 19
Revision 18 as of 2009-04-23 08:24:26
Size: 1363
Revision 19 as of 2009-04-23 08:27:18
Size: 1474
Line 4: Line 4:
 Introduction::  Introduction (CHH)::
 * Overall presentation of Netarkivet.dk and !NetarchiveSuite
Line 6: Line 7:
 * Expectations
 * Review/update of agenda
 * Expectations (CLO)
 * Review/update of agenda (CLO)
Line 9: Line 10:
 Step by step experience from Netarchive.dk concerning broad crawls::  Step by step experience from Netarchive.dk concerning broad crawls (CLO)::
Line 26: Line 27:
 User management NetarchiveSuite::  User management NetarchiveSuite (CLO)::
Line 30: Line 31:
 Statistics module::  Statistics module (CLO)::
Line 34: Line 35:
 Access::  Access (CLO)::
Line 38: Line 39:
 Collection::  Collection (CLO)::

Preliminary Agenda items (proposals) for the non-technical workshop

Introduction (CHH)
  • Overall presentation of Netarkivet.dk and NetarchiveSuite

  • Presentation of participants
  • Expectations (CLO)
  • Review/update of agenda (CLO)

Step by step experience from Netarchive.dk concerning broad crawls (CLO)
  • Preparation
    • Scheduling
    • Responsibilities, roles of participants in a broad crawl
    • Defining crawl target (number of URLs, scope, seed lists, politeness, budget...)
    • Dealing with junk data
    • Sorting and splitting seed lists into different jobs
    • Running test crawl
  • Selection of sites
  • How to manage deduplication
  • Actual impact on computing and storage
  • Experience during the crawl
    • Using frontier reports
    • Modifying settings, creating overrides
  • QA
    • Visual QA
    • Running a patch crawl
  • Metrics from past domain crawls: how much, how many, how fast, etc.

User management NetarchiveSuite (CLO)
  • Different sets of roles for using NetarchiveSuite
  • A simple user interface for people who are not very familiar with web archiving

Statistics module (CLO)
  • Base for all kinds of calculations and general information about the webarchive
  • Comparing results of crawls for quality control

Access (CLO)
  • Comparison of legal bases for access
  • Access with Wayback

Collection (CLO)
  • What's a collection?

PreliminaryAgendaItemsNonTech (last edited 2010-08-16 10:24:58 by localhost)