Differences between revisions 1 and 8 (spanning 7 versions)
Revision 1 as of 2007-09-10 13:23:42
Editor: EldZierau
Revision 8 as of 2007-09-11 07:33:03
= Yellow Notes for Potential Future Developments =
Points that some participants noted as important are marked in bold.

Points that need further explanation are marked with ?

== Roadmap ==
 * Upgrade to Java 1.6
 * Encrypting the password for the bit archive
 * '''New Heritrix integration'''
== Quality Assurance ==
 * Automatic filtering of collected URLs
 * Use scope in QA ?
== Extra Metadata ==
 * Contractual Information
 * Naming of jobs
 * Lookup of WHOIS data to add to metadata
 * Access to Heritrix logs via Harvest definition interface
 * A way to show the count of stored bytes (after deduplication) and information on deduplication
 * Subdomains from logs
== Harvest Tuning ==
 * Possibility to deactivate domains
 * '''Removing seeds from database'''
 * Automatically check seeds
 * '''Exclude domains from snapshot harvest'''
 * Alias detection tool integrated into the system
 * '''Domain implementation'''
 * QueueAssignmentPolicy part of plugin ???
 * '''Global crawlertraps'''
 * Count inline images in the same domain
== Heritrix ==
 * '''Pause/stop harvest from User Interface'''
 * Check email address using Heritrix method
 * Internationalisation of Heritrix (not seen as important)
 * Operator in metadata
 * Heritrix timeout challenge
 * Meta language for Heritrix templates
 * '''Sharing of crawlertraps (Heritrix)'''
 * '''Arcreader and Arcwriter separately'''
 * Heritrix split
 * Use Heritrix deduplication module
 * '''Arcfile naming'''
== Infrastructure ==
 * Take down parts of the system and start them again
 * Use of Spring for deployment
 * Automatic start of applications that are down
== Database ==
 * Specify wanted backup directory
 * Elaborate on the update procedure when the database has changed
 * Oracle implementation as plugin
 * '''Move all SQL out into DBSpecific for plugins'''
== Open Source Procedure ==
 * Control of formats for patches
 * What about new translations (in release information)
 * User training material
 * Documentation on listening to queues
== User Interface ==
 * '''Neater user interface'''
 * '''Click pattern'''
 * '''Filter job lists by category'''
 * '''Password & user name to control settings'''
 * '''Calendar view of jobs'''
 * Accessibility for people with special needs
 * Provenance
== End User access ==

WorkshopSeptember2007YellowNotes (last edited 2010-08-16 10:25:11 by localhost)