Release Notes for NetarchiveSuite 3.5.0
Please note that this is a development release, and not fit for production.
This version of NetarchiveSuite was released on 2008-03-04.
New features since NetarchiveSuite 3.4.*
General
Common Module
Improved stability with JMS connections
Previously, if an application lost its connection to the JMS server, the application had to be restarted. It will now attempt to reconnect in known recoverable scenarios.
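The reconnect behaviour described above can be pictured as a bounded retry loop. The following is only a minimal sketch of that general idea, not the NetarchiveSuite implementation: the class ReconnectSketch, the exception type RecoverableException and the parameters maxAttempts and delayMillis are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of NetarchiveSuite: retries a connect action. */
final class ReconnectSketch {

    /** Assumed marker for failures that are worth retrying. */
    static final class RecoverableException extends Exception {
        RecoverableException(String message) {
            super(message);
        }
    }

    /**
     * Calls the connect action until it succeeds or maxAttempts is reached.
     * Only RecoverableException triggers a retry; any other exception
     * propagates to the caller immediately.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (RecoverableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed with a recoverable error
    }
}
```

In this sketch a failure that is not marked as recoverable is not retried, which mirrors the restriction to known recoverable scenarios.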
Security manager supported
NetarchiveSuite now comes with security.policy files that limit what code outside the NetarchiveSuite jar files and libraries is allowed to do. This increases security, especially for bitarchives.
The file is distributed as conf/security.policy. To use it, start Java with -Djava.security.manager -Djava.security.policy=conf/security.policy
See the new batch job possibilities for a use case.
Harvester Module
Switch to DecidingScope
NetarchiveSuite now controls Heritrix using DecidingScope, rather than the older, deprecated HostScope and DomainScope. This is expected to improve harvesting performance. For the end user, there is no visible difference.
HarvestDefinitionApplication replaced by GUIApplication
The application HarvestDefinitionApplication has been removed. Instead, use dk.netarkivet.common.GUIApplication. It works exactly like the old application, provided you have the HarvestDefinition site section deployed in settings, that is:
<settings>
  ...
  <common>
    ...
    <webinterface>
      ...
      <siteSection>
        <!-- A subclass of SiteSection that defines this part of the web interface. -->
        <class>dk.netarkivet.harvester.webinterface.DefinitionsSiteSection</class>
        <!-- The directory or war-file containing the web application for this site section.-->
        <webapplication>webpages/HarvestDefinition</webapplication>
        <!-- The URL path for this section of the web interface. -->
        <deployPath>/HarvestDefinition</deployPath>
      </siteSection>
      ...
This is the default, and unchanged since previous versions.
Archive Module
Bit preservation restructuring
Bit preservation has undergone a major restructuring. This is partly preparation for further actions that will improve bit preservation, but it also has an immediate effect: reestablishing a missing file, or fixing a file with a bit error, now requires fewer mouse clicks and generally runs faster.
Support for submitting externally contributed batch jobs
The bit archive now supports launching a batch job on the archive that was written by an external source, without recompiling NetarchiveSuite.
This is a great tool for researchers wishing to do some analysis on the entire archive.
All you have to do is to subclass the abstract class dk.netarkivet.common.utils.arc.FileBatchJob and implement methods for initialisation, finishing and what to do on each file. The results must be written to an output stream. It will then be executed on all bitarchive machines. The results will be written to a file, or to the screen. To work on individual arc records, rather than entire files, subclass dk.netarkivet.common.utils.arc.ARCBatchJob instead.
The mechanism to do this is the command line tool dk.netarkivet.archive.tools.RunBatch. E.g.
java dk.netarkivet.archive.tools.RunBatch MyBatchJob.class
Optionally, you can run the job on a subset of all files, on a specific location, and save the output to a file. Try starting the command line client with no arguments for an example.
It is important to run your bitarchive with a security manager and a restrictive policy (see above) when using this option. Otherwise, external batch jobs might damage your bit archives.
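The three-part shape of a batch job described above (initialisation, per-file processing, finishing, with results written to an output stream) can be sketched as follows. This is a self-contained illustration only: SketchFileBatchJob is a hypothetical stand-in for dk.netarkivet.common.utils.arc.FileBatchJob, and its method names and signatures are assumptions, so check the real abstract class before writing an actual job. FileSizeJob and BatchRunnerSketch are likewise invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.List;

/** Hypothetical stand-in for the abstract batch job class; NOT the real API. */
abstract class SketchFileBatchJob {
    abstract void initialize(OutputStream os) throws IOException;
    abstract boolean processFile(File file, OutputStream os) throws IOException;
    abstract void finish(OutputStream os) throws IOException;
}

/** Example job: counts the files it sees and sums their sizes. */
final class FileSizeJob extends SketchFileBatchJob {
    private long files;
    private long bytes;

    @Override
    void initialize(OutputStream os) {
        files = 0;
        bytes = 0;
    }

    @Override
    boolean processFile(File file, OutputStream os) {
        if (!file.isFile()) {
            return false; // report failure for missing or non-regular files
        }
        files++;
        bytes += file.length();
        return true;
    }

    @Override
    void finish(OutputStream os) {
        // Write the aggregated result to the job's output stream.
        PrintStream out = new PrintStream(os, true);
        out.println("files=" + files + " bytes=" + bytes);
    }
}

/** Hypothetical local driver that mimics running a job over a set of files. */
final class BatchRunnerSketch {
    static void run(SketchFileBatchJob job, List<File> files, OutputStream os)
            throws IOException {
        job.initialize(os);
        for (File f : files) {
            job.processFile(f, os);
        }
        job.finish(os);
    }
}
```

A job shaped like this keeps all state inside the job object and reports only through the output stream, which is what allows the results to be collected into a file or printed to the screen.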
Monitor Module
Automatic registering of applications for monitoring
Applications now automatically register themselves for monitoring by the monitor GUI.
This has two effects:
- Positive: You do not need to keep a list of the machines to monitor in monitor_settings.xml, and you do not need to update it when adding new machines.
- Negative: You will not get an error message if an expected machine does not show up on the list of monitored machines.
Bugs fixed since NetarchiveSuite 3.4.*
Common Module
555 788 1063 1175 1185 1194
Harvester Module
1078 1161 1165 1182
Archive Module
1079
Monitor Module
900 1042 1050 1190
Upgrade instructions
Upgrading from 3.4.*
Please note that we only support upgrading from the previous stable release. Upgrading across several stable releases is not supported. You will need to upgrade step-by-step through all stable releases (for instance 3.0 -> 3.2 -> 3.4).
Automatic monitor registration settings
The automatic registering of clients is a new pluggable setting. The default implementation sends a JMS notification to the monitoring interface every minute, informing the monitor that this application is alive and can be monitored.
This is configured in settings as follows:
<settings>
  ...
  <common>
    ...
    <monitorregistryClient xsi:type="jmsmonitorregistryclient">
      <!-- The class instantiated to register JMX urls at a registry. -->
      <class>dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient</class>
    </monitorregistryClient>
The settings from monitor_settings.xml that describe running applications can be removed. The only thing left in that settings file is:
<settings>
  <monitor>
    <!-- the password used to connect to the all Mbeanservers started by the application. This password must be same as the one in jmxremote.password -->
    <jmxMonitorRolePassword>JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER</jmxMonitorRolePassword>
  </monitor>
</settings>
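The periodic registration performed by the default implementation can be pictured as a scheduled heartbeat. The sketch below is not the code of JMSMonitorRegistryClient: HeartbeatSketch is a hypothetical class, and the Consumer passed to it stands in for whatever actually sends the JMS notification. The period is a parameter here so the idea can be tried with short intervals; the release notes state one minute for the default implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical illustration of a periodic "I am alive" registration. */
final class HeartbeatSketch implements AutoCloseable {

    // Daemon thread so the heartbeat never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "monitor-heartbeat");
                t.setDaemon(true);
                return t;
            });

    /**
     * Starts announcing jmxUrl to the registry immediately and then once
     * per period. The registry callback is an assumed stand-in for the
     * JMS send.
     */
    HeartbeatSketch(String jmxUrl, Consumer<String> registry, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                registry.accept(jmxUrl);
            } catch (RuntimeException e) {
                // Swallow the error so one failed send does not cancel later heartbeats.
            }
        }, 0, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because registration is repeated rather than sent once, a monitor that starts after the application will still learn about it on the next heartbeat.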
Version History
Version 3.5.0 | 2008-03-04 | Improvement of archive component with regard to security, batch, and preservation; greater JMS stability; important bug fixes
Version 3.4.1 | 2008-01-16 | Bug fix release, fixing out of memory on very large indexes
Version 3.4.0 | 2008-01-03 | Separation of Heritrix, work on developing our open source platform, two-part TLDs like co.uk, and lots of bugfixes
Version 3.3.* | | Develop versions aiming for 3.4.0
Version 3.2.3 | 2007-09-27 | Bugfix of 3.2.2 with patched deduplicator, that fixes problem in parallel indexing
Version 3.2.2 | 2007-08-03 | Bugfix of 3.2.1 with patched Heritrix 1.12.1, that supports ARCRecords larger than 2GBs
Version 3.2.1 | 2007-07-04 | Bugfix of 3.2.0 fixing trouble using the quick start manual.
Version 3.2.0 | 2007-07-04 | Open source release
Version 3.1.* | | Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.
Version 3.0.0 | 2007-02-02 | Marked the naming of the NetarchiveSuite, the splitting of NetarchiveSuite into independent modules, and the licensing of NetarchiveSuite under LGPL
Version 2.* | | Various features and updates
Version 2.0 | 2006-08-30 | Marked a general restructuring of the code, where harvest definition data was backed by a database, the viewerproxy was trimmed and rewritten.
Version 1.* | | Various features and updates
Version 1.0 | 2005-07-01 | The first version of the netarchive software put in production for harvesting the entire Danish web
Version 0.* | | Various pre-production development versions