Differences between revisions 7 and 8
Revision 7 as of 2007-09-11 06:50:49
Size: 2364
Editor: EldZierau
Comment:
Revision 8 as of 2007-09-11 07:33:03
Size: 2365
Comment:

Yellow Notes for Potential Future Developments

Points noted as important to some participants are shown in bold

Points that need further explanation are marked with ?

Roadmap

  • Java upgrade to Java 1.6
  • Encrypting password for bit archive
  • New Heritrix integration

Quality Assurance

  • Automatic filtering of collected URLs
  • Use scope in QA ?

Extra Metadata

  • Contractual Information
  • Naming of jobs
  • Look up of WHOIS data to add to metadata
  • Access to Heritrix logs via Harvest definition interface
  • A way to show count of stored bytes (after deduplication) and information on deduplication
  • Subdomains from logs

Harvest Tuning

  • Possibility to inactivate domains
  • Removing seeds from database

  • Automatically check seeds
  • Exclude domains from snapshot harvest

  • Alias detection tool integrated in system
  • Domain implementation

  • QueueAssignmentPolicy part of plugin ???

  • Global crawlertraps

  • Count inline images in the same domain

Heritrix

  • Pause/stop harvest from User Interface

  • Check email address using Heritrix method
  • Internationalisation of Heritrix (not seen as important)
  • Operator in metadata
  • Heritrix timeout challenge
  • Meta language for Heritrix templates
  • Sharing of crawlertraps (Heritrix)

  • Arcreader and Arcwriter separately

  • Heritrix split
  • Use Heritrix deduplication module
  • Arcfile naming

Infrastructure

  • Take down parts and start them again
  • Use of SPRING for deploy
  • Automatic start of application that is down

Database

  • Specify wanted backup directory
  • Elaborate on update procedure when database has changed
  • Oracle implementation as plugin
  • All SQL out in DBSpecific for plugin

Open Source Procedure

  • Control on formats for patches
  • What about new translations (in release information)
  • User training material
  • Documentation on listening to queues

User Interface

  • Neater user interface

  • Click pattern

  • Filter job lists by category

  • Password & user name to control settings

  • Calendar view of jobs

  • For people with special needs
  • Provenance

End User access

WorkshopSeptember2007YellowNotes (last edited 2010-08-16 10:25:11 by localhost)