= Potential Future Developments =

== Yellow Notes for Potential Future Developments ==

These are the notes from the "potential future developments" board at the [[AgendaWorkshopSeptember2007|workshop 2007]]. The notes collected during the workshop were discussed, and some participants marked the issues most important to them. These issues are shown in bold in the listing below.

=== Roadmap ===
 * Upgrade to Java 1.6
 * Encrypting passwords for the bit archive
 * '''New Heritrix integration'''

=== Quality Assurance ===
 * Automatic filtering of collected URLs
 * Use scope in QA (improved integration between harvest info and QA)

=== Extra Metadata ===
 * Contractual information
 * Naming of jobs/groups of harvests
 * Lookup of WHOIS data to add to metadata
 * Access to Heritrix logs via the Harvest definition interface
 * A way to show the count of stored bytes (after deduplication) and information on deduplication
 * Subdomains from logs

=== Harvest Tuning ===
 * Possibility to deactivate domains
 * '''Removing seeds from database'''
 * Automatically check seeds when entered
 * '''Exclude domains from snapshot harvest'''
 * Alias detection tool integrated into the system
 * '''Domain implementation'''
 * !QueueAssignmentPolicy part of plugin ???
 * '''Global crawlertraps'''
 * Count inline images in the same domain

=== Heritrix ===
 * '''Pause/stop harvest from User Interface'''
 * Check email address using Heritrix method
 * Internationalisation of Heritrix (not seen as important)
 * Store operator in metadata
 * Heritrix timeout challenge
 * Meta language for Heritrix templates
 * '''Sharing of crawlertraps (Heritrix)'''
 * '''Arcreader and Arcwriter separately'''
 * Heritrix split of a single job
 * Use Heritrix deduplication module
 * '''Arcfile naming - should be configurable'''

=== Infrastructure ===
 * Take down applications/machines and start them again
 * Use of SPRING for deployment
 * Automatic start of an application that is down

=== Database ===
 * Specify wanted backup directory
 * Elaborate on the update procedure when the database has changed
 * Oracle implementation as plugin
 * '''All SQL moved out into DBSpecific/a separate file for plugins'''

=== Open Source Procedure ===
 * Control on formats for patches
 * What about new translations (in release information)
 * User training material translated to English
 * Documentation on listening to queues

=== User Interface ===
 * Neater user interface
 * '''Click pattern'''
 * '''Filter job lists by category'''
 * '''Password & user name to control settings'''
 * '''Calendar view of jobs'''
 * For people with special needs
 * Provenance

=== End User Access ===
 * '''Access to harvested data by public users'''

== "Round the table" on Potential Future Developments ==

Each participant named the issue they considered the highest priority for future development. Every participant chose one of the following two issues:
 * '''Harvest Tuning: Domain implementation'''
 * '''Roadmap: New Heritrix integration'''