Revision 32 as of 2010-08-16 10:24:58 (comment: converted to 1.6 markup)

Preliminary Agenda items (proposals) for the non-technical workshop

Introduction (CHH)
- Overall presentation of Netarkivet.dk and NetarchiveSuite (BNH)
- Presentation of participants (CHH)
- Expectations (CLO)
- Review/update of agenda (CLO)

Step-by-step experience from Netarchive.dk concerning broad crawls (KAH)
- Preparation
- Scheduling
- Responsibilities and roles of participants in a broad crawl; defining the crawl target (number of URLs, scope, seed lists, politeness, budget, ...)
- Dealing with junk data
- Sorting and splitting seed lists into different jobs; running a test crawl
- Selection of sites
- How to manage deduplication
- Actual impact on computing and storage

- Using frontier reports
- Modifying settings, creating overrides
|
- Visual QA
- Running a patch crawl
- Metrics from past domain crawls: how much, how many, how fast, etc.
- Sum up outstanding issues (CLO)
|
Experiences from Netarchive.dk concerning selective crawls (SAS) (optional)
- Further experiences

User management in NetarchiveSuite (CLO)
- Different sets of roles in NetarchiveSuite
- A simple user interface for people who are not very familiar with web archiving
- Sum up outstanding issues (CLO)

Statistics module (CLO)
- Basis for all kinds of calculations and general information about the web archive
- Comparing crawl results for quality control

Access (CLO)
- Comparison of the legal bases for access
- Access with Wayback
- Sum up outstanding issues (CLO)

Collection (CLO)
- What is a collection?
- Sum up outstanding issues (CLO)