Yellow Notes for Potential Future Developments
These are the notes from the "potential future developments" board at the [:AgendaWorkshopSeptember2007:workshop 2007]. Points that some participants noted as important are marked in bold.
Points that need further explanation are marked with ?.
Roadmap
- Java upgrade to Java 1.6
- Encrypting password for bit archive
- '''New Heritrix integration'''
Quality Assurance
- Automatic filtering of collected URLs
- Use scope in QA (improved integration between harvest info and QA)
Extra Metadata
- Contractual Information
- Naming of jobs/groups of harvests
- Look up of WHOIS data to add to metadata
- Access to Heritrix logs via Harvest definition interface
- A way to show count of stored bytes (after deduplication) and information on deduplication
- Subdomains from logs
Harvest Tuning
- Possibility to inactivate domains
- '''Removing seeds from database'''
- Automatically check seeds when entered
- '''Exclude domains from snapshot harvest'''
- Alias detection tool integrated in system
- '''Domain implementation'''
- '''QueueAssignmentPolicy part of plugin''' ???
- '''Global crawlertraps'''
- Count inline images in the same domain
Heritrix
- '''Pause/stop harvest from User Interface'''
- Check email address using Heritrix method
- Internationalisation of Heritrix (not seen as important)
- Store operator in metadata
- Heritrix timeout challenge
- Meta language for Heritrix templates
- '''Sharing of crawlertraps (Heritrix)'''
- '''Arcreader and Arcwriter separately'''
- Heritrix split of a single job
- Use Heritrix deduplication module
- '''Arcfile naming'''
Infrastructure
- Take down applications/machines and start them again
- Use of SPRING for deploy
- Automatic start of application that is down
Database
- Specify wanted backup directory
- Elaborate on update procedure when database has changed
- Oracle implementation as plugin
- '''All SQL out in DBSpecific/separate file for plugin'''
Open Source Procedure
- Control on formats for patches
- What about new translations (in release information)
- User training material translated to English
- Documentation on listening to queues
User Interface
- Neater user interface
- '''Click pattern'''
- '''Filter job lists by category'''
- '''Password & user name to control settings'''
- '''Calendar view of jobs'''
- For people with special needs
- Provenance