Differences between revisions 8 and 16 (spanning 8 versions)

Potential Future Developments

Yellow Notes for Potential Future Developments

These are the notes from the "potential future developments" board at the 2007 workshop. The notes collected during the workshop were discussed, and the issues that some participants considered most important were marked.

The issues marked as important appear in bold in the list below.

Roadmap

Java upgrade to Java 1.6
Encrypting password for bit archive
New Heritrix integration

Quality Assurance

Automatic filtering of collected URLs
Use scope in QA (improved integration between harvest info and QA)

Extra Metadata

Contractual Information
Naming of jobs/groups of harvests
Look up of WHOIS data to add to metadata
Access to Heritrix logs via Harvest definition interface
A way to show count of stored bytes (after deduplication) and information on deduplication
Subdomains from logs

Harvest Tuning

Possibility to deactivate domains
Removing seeds from database
Automatically check seeds when entered
Exclude domains from snapshot harvest
Alias detection tool integrated in system
Domain implementation
QueueAssignmentPolicy part of plugin ???
Global crawlertraps
Count inline images in the same domain

Heritrix

Pause/stop harvest from User Interface
Check email address using Heritrix method
Internationalisation of Heritrix (not seen as important)
Store operator in metadata
Heritrix timeout challenge
Meta language for Heritrix templates
Sharing of crawlertraps (Heritrix)
Arcreader and Arcwriter separately
Heritrix split of a single job
Use Heritrix deduplication module
Arcfile naming - should be configurable

Infrastructure

Take down applications/machines and start them again
Use of SPRING for deploy
Automatic start of application that is down

Database

Specify wanted backup directory
Elaborate on update procedure when database has changed
Oracle implementation as plugin
All SQL out in DBSpecific/separate file for plugin

Open Source Procedure

Control on formats for patches
What about new translations (in release information)
User training material translated to English
Documentation on listening to queues

User Interface

Neater user interface
Click pattern
Filter job lists by category
Password & user name to control settings
Calendar view of jobs
For people with special needs
Provenance

End User access

Access to harvested data by public users

"Round the table" on Potential Future Developments

Each participant named the issue they considered the highest priority for future development. All of them chose one of the following two issues:

Harvest Tuning: Domain implementation
Roadmap: New Heritrix integration

-  Revision 8 as of 2007-09-11 07:33:03
  Size: 2365
  Editor: KaareChristiansen
  Comment:
+   Revision 16 as of 2010-08-16 10:25:11
  Size: 3070
  Editor: localhost
  Comment: converted to 1.6 markup
 Line 1:
-#acl NetarkivetGroup:read,write,delete,revert,admin All:
= Yellow Notes for Potential Future Developments =
Points noted as being important for some participants are marked by being bolded
+= Potential Future Developments =
-Line 5:
+Line 3:
-Points that needs further explanation is marked with ?
+== Yellow Notes for Potential Future Developments ==
These are the notes from the "potential future developments" board at the [[AgendaWorkshopSeptember2007|workshop 2007]].
The notes collected during the workshop were discussed and most important issues for some participants were marked.
 Line 7:
-== Roadmap ==
+The issues marked as important are bolded in the below listing.

=== Roadmap ===
-Line 11:
+Line 13:
-== Quality Assurance ==
+=== Quality Assurance ===
-Line 13:
+Line 15:
- * Use scope in QA ?
== Extra Metadata ==
+ * Use scope in QA (improved integration between harvest info and QA)
=== Extra Metadata ===
-Line 16:
+Line 18:
- * Naming of jobs
+ * Naming of jobs/groups of harvests
-Line 21:
+Line 23:
-== Harvest Tuning ==
 * Possibility to inactivate domains
+=== Harvest Tuning ===
 * Possibility to deactivate domains
-Line 24:
+Line 26:
- * Automatically check seeds
+ * Automatically check seeds when entered
-Line 28:
+Line 30:
- * QueueAssignmentPolicy part of plugin ???
+ * !QueueAssignmentPolicy part of plugin ???
-Line 31:
+Line 33:
-== Heritrix ==
+=== Heritrix ===
-Line 35:
+Line 37:
- * Operator in metadata
+ * Store operator in metadata
-Line 37:
+Line 39:
- * Meta language for Heritix templates
+ * Meta language for Heritrix templates
-Line 40:
+Line 42:
- * Heritrix split
 * Use Heritrix deduplcation module
 * '''Arcfile naming'''
== Infrastructure ==
 * Take down parts and start them again
+ * Heritrix split of a single job
 * Use Heritrix deduplication module
 * '''Arcfile naming - should be configurable'''
=== Infrastructure ===
 * Take down applications/machines and start them again
-Line 47:
+Line 49:
-== Database ==
+=== Database ===
-Line 51:
+Line 53:
- * '''All SQL out in DBSpecific for plugin'''
== Open Source Procedure ==
+ * '''All SQL out in DBSpecific/separate file for plugin'''
=== Open Source Procedure ===
-Line 55:
+Line 57:
- * User traning material
 * Documentaiton on listen to queues
== User Interface ==
 * '''More neat user interface '''
+ * User training material translated to English
 * Documentation on listen to queues
=== User Interface ===
 * Neater user interface
-Line 61:
+Line 63:
- * '''Password & user nameto control settings '''
+ * '''Password & user name to control settings '''
-Line 64:
+Line 66:
- * Provinance
== End User access ==
+ * Provenance
=== End User access ===
 * '''Access to harvested data by public users'''

== "Round the table" on Potential Future Developments ==
All participants pointed out the most important issue for them to be highest priority in future development. All pointed at one of the following two issues:
 * '''Harvest Tuning: Domain implementation''''
 * '''Roadmap: New Heritrix integration'''