Revision 5 as of 2007-09-11 06:46:56 (editor: EldZierau); previous revision 4 as of 2007-09-10 13:25:40

Yellow Notes for Potential Future Developments

Points noted as important by some participants are marked in bold.

Points that need further explanation are marked with ?

Roadmap

  • Java upgrade to Java 1.6
  • Encrypting password for bit archive
  • New Heritrix integration

Quality Assurance

  • Automatic filtering of collected URLs
  • Use scope in QA ?

Extra Metadata

  • Contractual Information
  • Naming of jobs
  • Look up of ???? data to add to metadata
  • Access to Heritrix logs via Harvest definition interface
  • A way to show count of stored bytes (after deduplication) and information on deduplication
  • Subdomains from logs

Harvest Tuning

  • Possibility to inactivate domains
  • Removing seeds from database
  • Automatically check seeds
  • Exclude domains from snapshot harvest
  • Alias detection tool integrated in system
  • Domain implementation
  • QueueAssignmentPolicy part of plugin ???
  • Global crawlertraps
  • Count inline images in the same domain

Heritrix

  • Pause/stop harvest from User Interface
  • Check email address using Heritrix method
  • Internationalisation of Heritrix (not seen as important)
  • Operator in metadata
  • Heritrix timeout challenge
  • Meta language for Heritrix templates
  • Sharing of crawlertraps (Heritrix)
  • Arcreader and Arcwriter separately
  • Heritrix split
  • Use Heritrix deduplication module
  • Arcfile naming

Infrastructure

  • Take down parts and start them again
  • Use of SPRING for deploy
  • Automatic start of application that is down

Database

  • Specify wanted backup directory
  • Elaborate on update procedure when database has changed
  • Oracle implementation as plugin
  • All SQL out in DBSpecific for plugin

Open Source Procedure

  • Control on formats for patches
  • What about new translations (in release information)
  • User training material
  • Documentation on listening to queues

User Interface

  • Neater user interface
  • Click pattern
  • Filter job lists by category
  • Password & user name to control settings
  • Calendar view of jobs
  • Accessibility for people with special needs
  • Provenance

End User access

WorkshopSeptember2007YellowNotes (last edited 2010-08-16 10:25:11 by localhost)