Revision 16 as of 2010-08-16 10:25:11
Potential Future Developments
Yellow Notes for Potential Future Developments
These are the notes from the "potential future developments" board at the 2007 workshop. The notes collected during the workshop were discussed, and the issues that some participants considered most important were marked.
The issues marked as important are shown in bold in the listing below.
Roadmap
- Java upgrade to Java 1.6
- Encrypting password for bit archive
New Heritrix integration
Quality Assurance
- Automatic filtering of collected URLs
- Use scope in QA (improved integration between harvest info and QA)
Extra Metadata
- Contractual Information
- Naming of jobs/groups of harvests
- Look up of WHOIS data to add to metadata
- Access to Heritrix logs via Harvest definition interface
- A way to show count of stored bytes (after deduplication) and information on deduplication
- Subdomains from logs
Harvest Tuning
- Possibility to deactivate domains
Removing seeds from database
- Automatically check seeds when entered
Exclude domains from snapshot harvest
- Alias detection tool integrated in system
Domain implementation
QueueAssignmentPolicy part of plugin ???
Global crawlertraps
- Count inline images in the same domain
Heritrix
Pause/stop harvest from User Interface
- Check email address using Heritrix method
- Internationalisation of Heritrix (not seen as important)
- Store operator in metadata
- Heritrix timeout challenge
- Meta language for Heritrix templates
Sharing of crawlertraps (Heritrix)
Arcreader and Arcwriter separately
- Heritrix split of a single job
- Use Heritrix deduplication module
Arcfile naming - should be configurable
Infrastructure
- Take down applications/machines and start them again
- Use of SPRING for deploy
- Automatic start of application that is down
Database
- Specify wanted backup directory
- Elaborate on update procedure when database has changed
- Oracle implementation as plugin
All SQL out in DBSpecific/separate file for plugin
Open Source Procedure
- Control on formats for patches
- What about new translations (in release information)
- User training material translated to English
- Documentation on listening to queues
User Interface
- Neater user interface
Click pattern
Filter job lists by category
Password & user name to control settings
Calendar view of jobs
- For people with special needs
- Provenance
End User access
Access to harvested data by public users
"Round the table" on Potential Future Developments
Each participant named the issue they considered the highest priority for future development. All of them chose one of the following two issues:
Harvest Tuning: Domain implementation
Roadmap: New Heritrix integration