= Release Notes for NetarchiveSuite 3.3.2 =

This version of !NetarchiveSuite was released on 2007-11-08.

'''Note: This is a development release; it has not been tested to production quality.'''

== New features since NetarchiveSuite 3.2.* ==

=== General ===

We now have a publicly available SVN repository. See https://gforge.statsbiblioteket.dk/scm/?group_id=7.

We also have a public tracking system available at the GForge site. Here you can browse known bugs and feature requests, as well as report new bugs or request new features. You can also submit patches for the NetarchiveSuite code. See https://gforge.statsbiblioteket.dk/tracker/index.php?group_id=7

We also have three mailing lists available. The netarchivesuite-announce list is a low-traffic list for announcements of new releases. The netarchivesuite-users list is for general discussion. The developers are active on this list, and feedback and comments are very welcome. Finally, there is a mailing list where all commits to our repository are automatically reported. This third list is fairly high-volume and of interest to developers only. See https://gforge.statsbiblioteket.dk/mail/?group_id=7

=== Common Module ===

==== New secure remotefile implementation ====

A new remote file implementation using HTTPS has been added. To use it, set your remote file settings in {{{settings.xml}}} as follows:

{{{
... dk.netarkivet.common.distribute.HTTPSRemoteFile 8300 path/to/keystore testpass testpass2
}}}

The keystore file must contain a certificate with the given passwords. It can be generated with the {{{keytool}}} application distributed with Java 5. Run the following command:

{{{
keytool -alias NetarchiveSuite -keystore keystore -genkey
}}}

It should then respond with the following:

{{{
Enter keystore password:
}}}

Enter the password for the keystore. The keytool will now prompt you for the following information:

{{{
What is your first and last name?
  [Unknown]:
What is the name of your organizational unit?
  [Unknown]:
What is the name of your organization?
  [Unknown]:
What is the name of your City or Locality?
  [Unknown]:
What is the name of your State or Province?
  [Unknown]:
What is the two-letter country code for this unit?
  [Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
  [no]:
}}}

Answer all the questions, and end with "yes". Finally, you will be asked for the certificate password.

{{{
Enter key password for <NetarchiveSuite>
        (RETURN if same as keystore password):
}}}

Answer with a password for the certificate. You now have a file called {{{keystore}}} which contains a certificate.

To keep your environment secure, make sure that the keystore and the settings file are readable only by the application.

==== Settings split ====

The settings for the monitor module have been moved to a separate settings file. We expect to restructure our settings further in a later release to make them more extensible and modular.

==== Better pluggability of JMS implementation ====

If you wish to use different JMS queue software, the class handling all JMS communication is now pluggable and can be specified in the settings.

=== Harvester Module ===

==== Two-part TLDs possible ====

It is now possible to use e.g. co.uk as a top-level domain. See the upgrade section for an example.

==== Heritrix integration ====

Heritrix is now run as a separate process and controlled through JMX. Among other things, this makes it possible to use the Heritrix harvest interface while a harvest is running. It also opens new possibilities for how we can control Heritrix during a harvest in later releases.

==== Manual override of scheduling ====

For selective harvests it is now possible to override when the harvest should next run. This replaces the scheduled time with a date chosen by the operator.

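Returning to the two-part TLD support in this module: the sketch below illustrates one way a configured TLD list could decide which domain a host belongs to. It is an illustration only, not the NetarchiveSuite implementation. The class {{{TldSplitter}}} and the method {{{domainOf}}} are hypothetical names, and the rule that the longest matching TLD wins is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of NetarchiveSuite. */
final class TldSplitter {

    /**
     * Returns the domain the host belongs to: one label plus the longest
     * configured TLD that the host ends with. Returns null if no TLD matches.
     * Assumption: the longest matching TLD wins.
     */
    static String domainOf(String host, List<String> tlds) {
        String h = host.toLowerCase();
        String best = null;
        for (String tld : tlds) {
            // The host must end with ".<tld>" and have at least one label before it.
            boolean matches = h.endsWith("." + tld) && h.length() > tld.length() + 1;
            if (matches && (best == null || tld.length() > best.length())) {
                best = tld;
            }
        }
        if (best == null) {
            return null;
        }
        // Everything before ".<tld>"; keep only its last label.
        String prefix = h.substring(0, h.length() - best.length() - 1);
        String label = prefix.substring(prefix.lastIndexOf('.') + 1);
        return label + "." + best;
    }

    public static void main(String[] args) {
        List<String> tlds = Arrays.asList("dk", "com", "net", "co.uk", "uk");
        System.out.println(domainOf("news.bbc.co.uk", tlds)); // bbc.co.uk
        System.out.println(domainOf("www.bl.uk", tlds));      // bl.uk
    }
}
```

With only {{{uk}}} in the list, the same host {{{news.bbc.co.uk}}} would be assigned to the domain {{{co.uk}}}, which is the situation two-part TLDs are meant to avoid.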
=== Monitor Module ===

==== Dynamic reload of settings for deployed applications ====

With the new settings file for the monitor module, you will be able to update which applications are to be monitored. Simply change the settings file, and the file should be reloaded on the next load of the monitor application.

== Bugs fixed since NetarchiveSuite 3.2.* ==

=== Common Module ===

{{{
1016 ExtractCDX tool does not handle arc-files with large records
1023 [javadoc] Tag @see: can't find getInstance(File) in dk.netarkivet.common.distribute.RemoteFile
1034 English NetarchiveSuite thumbnail redirects to Danish site
1057 HTTPRemoteFile breaks ?
1061 Remove obsolete settings (monitor.htmlDir, viewerproxy.hostname)
1070 SUNMQ class must be fixed
}}}

=== Harvester Module ===

{{{
628 need a way to reset the nextdate of a HD
789 illegal regexp crashes entire job
861 No logmessage in HarvestDefinitionGUI, when submitting crawljob
915 TLDs having two parts i.e co.uk are disallowed
926 error writing crawl.log to metadata.arc
937 rescheduling of jobs is very slow and blocks normal scheduling
939 Webpages should handle the case, where no schedules or harvestdefinitions exist already properly
970 DB error on registering jobs with too many upload-errors
971 Seeds, passwords and configurations sorted without regard to locale
984 Missing headlines in "Selective Harvests" window
993 missing trim of domain strings or a normal error message
1010 long string in ordertemplate blows up formating of Job Details page
1033 Sensitive "Find Domain(s)"
1038 New title for 'Harvest status' - 'All Jobs per domain'
1039 SideKick sets it's application name to "dk.netarkivet.SideKick"
1040 sortNamedObjectList and Named should be moved to common.utils
1049 Danish translation of resubmitted is wrong
1051 Error on Definitions-create-domain.jsp when trying to create an invalid domain
1052 wrong description of setting in installation manual
1053 wrong parameter in configuration-link on job details page
1065 Show harvest run count
1074 more information from harvesters (JobID and Priority)
}}}

=== Archive Module ===

{{{
1022 [javadoc] Parameter "data" is documented more than once in RemoveAndGetFileMessage
1024 An error message is always shown on the bitarchive checksum page
1036 missing translation
1046 Bitpreservation-filestatus-checksum.jsp: Missing whitespace between filename and Info-link
1059 Mention of locations as institutions in comments, and variable names
1062 indexserver skips a lot of lines due to threading problem with SimpeDateFormat
1077 more logging when indexing large number of jobs
1080 failed upload does not clean up FTPRemoteFile
}}}

=== Viewerproxy Module ===

{{{
1030 new viewerproxy command URL gives strange behavior in some browsers
}}}

=== Monitor Module ===

{{{
936 no way to add a new bitarchive machine to the JMX-overview without restarting the GUI
}}}

=== Deploy module (unsupported) ===

{{{
1081 deploy uses largeIndexRequestTimeout for both LOW and HIGH priority harvester instances
}}}

== Upgrade instructions ==

Please note that we only support upgrading from the previous stable release. Upgrading across several stable releases is not supported. You will need to upgrade step-by-step through all stable releases (for instance 3.0 -> 3.2 -> 3.4).

=== Monitor settings ===

To upgrade from a 3.2.* version of !NetarchiveSuite, you will need to update your settings files on the application running the monitor GUI, usually !HarvestDefinitionApplication.

What needs to be done is move the section that looks like

{{{
JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER 3 hostname1.example.com 8100 8101 8102 hostname2.example.com 8100 8101 hostname3.example.com 8100 8101 8102 8103
}}}

to the file called {{{monitor_settings.xml}}}.

If you use {{{-Ddk.netarkivet.settings.file=path/to/your/settings.xml}}} you should now also use {{{-Ddk.netarkivet.monitorsettings.file=path/to/your/monitor_settings.xml}}}.

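As a rough illustration of the two flags above, the following sketch checks that a JVM started with {{{-Ddk.netarkivet.settings.file}}} also has {{{-Ddk.netarkivet.monitorsettings.file}}} pointing at an existing file. Only the two property names come from these notes. The class {{{SettingsFlagCheck}}} and its method are hypothetical and not part of !NetarchiveSuite.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for illustration; not part of NetarchiveSuite. */
final class SettingsFlagCheck {
    static final String SETTINGS = "dk.netarkivet.settings.file";
    static final String MONITOR_SETTINGS = "dk.netarkivet.monitorsettings.file";

    /** Returns human-readable problems; an empty list means no problem was found. */
    static List<String> problems(Properties props) {
        List<String> out = new ArrayList<String>();
        String settings = props.getProperty(SETTINGS);
        String monitor = props.getProperty(MONITOR_SETTINGS);
        // An explicit settings file without an explicit monitor settings file
        // is the case the upgrade note warns about.
        if (settings != null && monitor == null) {
            out.add("-D" + SETTINGS + " is set but -D" + MONITOR_SETTINGS + " is not");
        }
        // If the monitor flag is given, it should point at an existing file.
        if (monitor != null && !new File(monitor).isFile()) {
            out.add("monitor settings file not found: " + monitor);
        }
        return out;
    }

    public static void main(String[] args) {
        // Checks the flags the current JVM was started with.
        for (String problem : problems(System.getProperties())) {
            System.err.println(problem);
        }
    }
}
```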
=== Setting top-level domains ===

The old setting {{{settings.harvester.datamodel.domain.validDomainRegex}}} has been removed. Instead, a new repeatable setting {{{settings.harvester.datamodel.domain.tld}}} has been introduced, which declares the valid top-level-domain parts for domains in the system. This setting also defines how domains are split. Thus, if '{{{co.uk}}}' is a valid top-level domain, a legal domain would be "{{{bbc.co.uk}}}", and the host "{{{news.bbc.co.uk}}}" would be considered a host in that domain.

Example: Say your old regex looked like this:

{{{
^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[^\0000-,.-/:-@\[-`{-\0177]+\.(dk|com|net|uk))$
}}}

You could replace it with the following for the exact same meaning:

{{{
dk com net uk
}}}

Or you could introduce better handling of .uk domains as follows:

{{{
dk com net co.uk gov.uk ltd.uk me.uk mod.uk net.uk nic.uk nhs.uk org.uk plc.uk police.uk sch.uk uk
}}}

Note that {{{uk}}} is added at the end. This lets domains that do not belong to any of the two-part TLDs still be caught (like that of the British Library).

=== New settings for controlling Heritrix ===

You can control how long we wait for the external Heritrix process to start. That time is set with the setting {{{settings.common.processTimeout}}}. In addition, new settings specify the ports used for JMX communication with Heritrix in the external process. Your new settings should be as follows:

{{{
... ... 5000 ... ... ... admin adminPassword 8090 8091 1598M
}}}

=== Full class name needed for MQ implementation ===

Previously the setting {{{settings.common.jms.class}}} was just the prefix of a class name, e.g. {{{SunMQ}}}. Now the full class name is needed.

Example: If your previous settings.xml file contained:

{{{
... SunMQ ... ...
}}}

it should now become

{{{
... dk.netarkivet.common.distribute.JMSConnectionSunMQ ... ...
}}}

== Version History ==

=== Current development versions ===

||Version 3.3.2||2007-11-08||Separation of Heritrix, more bugfix work||
||Version 3.3.1||2007-09-24||Mostly bugfix work, including possibility to use two-part TLDs like co.uk||
||Version 3.3.0||2007-08-06||Mostly bugfix work, including upgradability of the monitored applications, and faster resubmitting of jobs||

=== Stable versions ===

||Version 3.2.3||2007-09-27||Bugfix of 3.2.2 with patched deduplicator, that fixes problem in parallel indexing||
||Version 3.2.2||2007-08-03||Bugfix of 3.2.1 with patched Heritrix 1.12.1, that supports ARCRecords larger than 2GBs||
||Version 3.2.1||2007-07-04||Bugfix of 3.2.0 fixing trouble using the quick start manual.||
||Version 3.2.0||2007-07-04||Open source release||
||Version 3.1.*|| ||Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.||
||Version 3.0.0||2007-02-02||Marked the naming of the !NetarchiveSuite, the splitting of !NetarchiveSuite into independent modules, and the licensing of !NetarchiveSuite under LGPL||
||Version 2.* || ||Various features and updates||
||Version 2.0 ||2006-08-30||Marked a general restructuring of the code, where harvest definition data was backed by a database, the viewerproxy was trimmed and rewritten.||
||Version 1.* || ||Various features and updates||
||Version 1.0 ||2005-07-01||The first version of the netarchive software put in production for harvesting the entire Danish web||
||Version 0.* || ||Various pre-production development versions||