Yellow Notes for Potential Future Developments
Points that some participants noted as important are marked in bold.
Points that need further explanation are marked with ?
Roadmap
- Java upgrade to Java 1.6
- Encrypting password for bit archive
- New Heritrix integration
Quality Assurance
- Automatic filtering of collected URLs
- Use scope in QA ?
Extra Metadata
- Contractual Information
- Naming of jobs
- Lookup of ???? data to add to metadata
- Access to Heritrix logs via the Harvest Definition interface
- A way to show count of stored bytes (after deduplication) and information on deduplication
- Subdomains from logs
Harvest Tuning
- Possibility to inactivate domains
- Removing seeds from database
- Automatically check seeds
- Exclude domains from snapshot harvest
- Alias detection tool integrated into the system
- Domain implementation
- QueueAssignmentPolicy as part of plugin ???
- Global crawlertraps
- Count inline images in the same domain
Heritrix
- Pause/stop harvest from User Interface
- Check email addresses using the Heritrix method
- Internationalisation of Heritrix (not seen as important)
- Operator in metadata
- Heritrix timeout challenge
- Meta language for Heritrix templates
- Sharing of crawlertraps (Heritrix)
- Arcreader and Arcwriter separately
- Heritrix split
- Use Heritrix deduplication module
- Arcfile naming
Infrastructure
- Take down parts and start them again
- Use of Spring for deployment
- Automatic restart of applications that are down
Database
- Specify the desired backup directory
- Elaborate on the update procedure when the database has changed
- Oracle implementation as plugin
- All SQL out in DBSpecific for plugin
Open Source Procedure
- Control of formats for patches
- What about new translations (in release information)
- User training material
- Documentation on listening to queues
User Interface
- Neater user interface
- Click pattern
- Filter job lists by category
- Password & user name to control settings
- Calendar view of jobs
- Accessibility for people with special needs
- Provenance