Differences between revisions 2 and 3
Revision 2 as of 2009-03-26 11:25:17
Size: 411
Comment:
Revision 3 as of 2009-03-26 12:45:54
Size: 774
Comment:
Line 11: Line 11:
 Experience from Netarchive.dk concerning broad crawls::
 * Preparation procedure
 Step by step experience from Netarchive.dk concerning broad crawls::
 * Preparation
 * How to manage deduplication
 * Actual impact on computing and storage
Line 15: Line 17:
 * Metrics from the past domain crawls: how much, how many, how fast, etc.
Line 16: Line 19:
   Support of WARC::
 * Status of support of WARC in !NetarchiveSuite
 * Experience with ARC -> WARC tools
 * Status of transferring old webarchives into Netarkivet.dk
Line 18: Line 24:
 Collection::
 * What's a collection?

Preliminary Agenda items for the non-technical workshop

Introduction
  • Presentation of participants
  • Expectations
  • Any special focus area
  • Review/update of agenda
  • How do we handle outstanding issues?
Step by step experience from Netarchive.dk concerning broad crawls
  • Preparation
  • How to manage deduplication
  • Actual impact on computing and storage
  • Experience during the crawl
  • QA
  • Metrics from the past domain crawls: how much, how many, how fast, etc.
Support of WARC
  • Status of support of WARC in NetarchiveSuite
  • Experience with ARC -> WARC tools
  • Status of transferring old webarchives into Netarkivet.dk
Collection
  • What's a collection?
PreliminaryAgendaItemsNonTech (last edited 2010-08-16 10:24:58 by localhost)