Differences between revisions 1 and 32 (spanning 31 versions)
Revision 1 as of 2009-03-26 11:24:23
Size: 406
Comment:
Revision 32 as of 2010-08-16 10:24:58
Size: 1736
Editor: localhost
Comment: converted to 1.6 markup
Line 1: Line 1:
#acl DevelopmentGroup:read,write,delete,revert
== Preliminary Agenda items for the non-technical workshop ==
##acl DevelopmentGroup:read,write,delete,revert
== Preliminary Agenda items (proposals) for the non-technical workshop ==
Line 4: Line 4:
 Introduction::
 * Presentation of participants
 * Expectations
* Any special focus area
 * Review/update of agenda
* How do we handle outstanding issues.
 Introduction (CHH)::
 * Overall presentation of Netarkivet.dk and !NetarchiveSuite (BNH)

 * Presentation of participants (CHH)
 * Expectations (CLO)
 * Review/update of agenda (CLO)
Line 11: Line 10:
 Experience from Netarchive.dk concerning broad crawls
 * Preparation procedure
 Step by step experience from Netarchive.dk concerning broad crawls (KAH)::
 * Preparation
  * Scheduling
  * Responsibilities and roles of participants in a broad crawl; defining the crawl target (number of URLs, scope, seed lists, politeness, budget...)
  * Dealing with junk data
  * Sorting and splitting seed lists into different jobs; running a test crawl
 * Selection of sites
 * How to manage deduplication
 * Actual impact on computing and storage
Line 14: Line 20:
  * Using frontier reports
  * Modifying settings, creating overrides
Line 15: Line 23:
  * Visual QA
  * Running a patch crawl
 * Metrics from the past domain crawls : how much, how many, how fast, etc.
 * Sum up outstanding issues (CLO)
Line 16: Line 28:
    Experiences from Netarchive.dk concerning selective crawls (SAS) OPTIONAL ::
 * further experiences

 User management NetarchiveSuite (CLO)::
 * Different set of roles using the !NetarchiveSuite
 * A simple user interface for people who are not very familiar with webarchiving.
 * Sum up outstanding issues (CLO)

 Statistics module (CLO)::
 * Base for all kinds of calculations and general information about the webarchive
 * Comparing results of crawls for quality control.

 Access (CLO)::
 * Comparison of legal basis regarding access
 * Access with Wayback
 * Sum up outstanding issues (CLO)
 
 Collection (CLO)::
 * What's a collection?
 * Sum up outstanding issues (CLO)

Preliminary Agenda items (proposals) for the non-technical workshop

Introduction (CHH)
  • Overall presentation of Netarkivet.dk and NetarchiveSuite (BNH)

  • Presentation of participants (CHH)
  • Expectations (CLO)
  • Review/update of agenda (CLO)

Step by step experience from Netarchive.dk concerning broad crawls (KAH)
  • Preparation
    • Scheduling
    • Responsibilities and roles of participants in a broad crawl; defining the crawl target (number of URLs, scope, seed lists, politeness, budget...)
    • Dealing with junk data
    • Sorting and splitting seed lists into different jobs; running a test crawl
  • Selection of sites
  • How to manage deduplication
  • Actual impact on computing and storage
  • Experience during the crawl
    • Using frontier reports
    • Modifying settings, creating overrides
  • QA
    • Visual QA
    • Running a patch crawl
  • Metrics from the past domain crawls: how much, how many, how fast, etc.
  • Sum up outstanding issues (CLO)

Experiences from Netarchive.dk concerning selective crawls (SAS) OPTIONAL
  • Further experiences

User management NetarchiveSuite (CLO)
  • Different sets of roles using NetarchiveSuite
  • A simple user interface for people who are not very familiar with web archiving
  • Sum up outstanding issues (CLO)

Statistics module (CLO)
  • Base for all kinds of calculations and general information about the web archive
  • Comparing results of crawls for quality control

Access (CLO)
  • Comparison of legal basis regarding access
  • Access with Wayback
  • Sum up outstanding issues (CLO)

Collection (CLO)
  • What's a collection?
  • Sum up outstanding issues (CLO)

PreliminaryAgendaItemsNonTech (last edited 2010-08-16 10:24:58 by localhost)