= Release Notes for NetarchiveSuite 3.3.1 =
This version of the !NetarchiveSuite was released on 2007-09-24.

'''Note: This is a development release; it has not been tested to be of production quality.'''

== New features since NetarchiveSuite 3.2.* ==
=== General ===
We now have a publicly available SVN repository. See https://gforge.statsbiblioteket.dk/scm/?group_id=7. Please note that anonymous access is not available, although the page claims that it is. You will need to create a gforge account and follow the instructions under Developer Subversion Access.

=== Common Module ===
==== Settings split ====
The settings for the monitor module have been moved to a separate settings file. We expect to restructure our settings further in a later release to make them more extensible and modular.

=== Harvester Module ===
==== Two-part TLDs possible ====
It is now possible to use e.g. co.uk as a top-level domain. See the upgrade section for an example.

==== Heritrix integration ====
Preliminary work has been done on communicating with the harvesters through JMX and on getting the Heritrix UI up and running.

=== Monitor Module ===
==== Dynamic reload of settings for deployed applications ====
With the new settings file for the monitor module, you can update which applications are monitored. Simply change the settings file, and it should be reloaded on the next load of the monitor application.

== Bugs fixed since NetarchiveSuite 3.2.* ==
=== Common Module ===
{{{
1016 ExtractCDX tool does not handle arc-files with large records
1023 [javadoc] Tag @see: can't find getInstance(File) in dk.netarkivet.common.distribute.RemoteFile
1034 English NetarchiveSuite thumbnail redirects to Danish site
1057 HTTPRemoteFile breaks ?
}}}

=== Harvester Module ===
{{{
628 need a way to reset the nextdate of a HD
789 illegal regexp crashes entire job
861 No logmessage in HarvestDefinitionGUI, when submitting crawljob
915 TLDs having two parts i.e co.uk are disallowed
926 error writing crawl.log to metadata.arc
937 rescheduling of jobs is very slow and blocks normal scheduling
939 Webpages should handle the case, where no schedules or harvestdefinitions exist already properly
970 DB error on registering jobs with too many upload-errors
971 Seeds, passwords and configurations sorted without regard to locale
984 Missing headlines in "Selective Harvests" window
993 missing trim of domain strings or a normal error message
1033 Sensitive "Find Domain(s)"
1038 New title for 'Harvest status' - 'All Jobs per domain'
1039 SideKick sets it's application name to "dk.netarkivet.SideKick"
1040 sortNamedObjectList and Named should be moved to common.utils
1049 Danish translation of resubmitted is wrong
1051 Error on Definitions-create-domain.jsp when trying to create an invalid domain
1053 wrong parameter in configuration-link on job details page
}}}

=== Archive Module ===
{{{
1022 [javadoc] Parameter "data" is documented more than once in RemoveAndGetFileMessage
1024 An error message is always shown on the bitarchive checksum page
1036 missing translation
1046 Bitpreservation-filestatus-checksum.jsp: Missing whitespace between filename and Info-link
1059 Mention of locations as institutions in comments, and variable names
}}}

=== Viewerproxy Module ===
{{{
1030 new viewerproxy command URL gives strange behavior in some browsers
}}}

=== Monitor Module ===
{{{
936 no way to add a new bitarchive machine to the JMX-overview without restarting the GUI
}}}

== Upgrade instructions ==
=== Monitor settings ===
To upgrade from a previous version of NetarchiveSuite, you will need to update your settings files on the application running the monitor GUI, usually HarvestDefinitionApplication.
What needs to be done is to move the section that looks like
{{{
JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER 3
hostname1.example.com 8100 8101 8102
hostname2.example.com 8100 8101
hostname3.example.com 8100 8101 8102 8103
}}}
to the file called {{{monitor_settings.xml}}}.

If you use {{{-Ddk.netarkivet.settings.file=path/to/your/settings.xml}}} you should now also use {{{-Ddk.netarkivet.monitorsettings.file=path/to/your/monitor_settings.xml}}}.

=== Setting top-level domains ===
The old setting {{{settings.harvester.datamodel.domain.validDomainRegex}}} has been removed. Instead, a new repeatable setting {{{settings.harvester.datamodel.domain.tld}}} has been introduced, which declares the valid top-level-domain parts for domains in the system. This setting also defines how domains are split. Thus, if '{{{co.uk}}}' is a valid top-level domain, a legal domain would be "{{{bbc.co.uk}}}", and the host "{{{news.bbc.co.uk}}}" would be considered a host in that domain.

Example: Say your old regex looked like this:
{{{
^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[^\0000-,.-/:-@\[-`{-\0177]+\.(dk|com|net|uk))$
}}}
You could replace it with the following for the exact same meaning:
{{{
dk
com
net
uk
}}}
Or you could introduce better handling of {{{.uk}}} domains as follows:
{{{
dk
com
net
co.uk
gov.uk
ltd.uk
me.uk
mod.uk
net.uk
nic.uk
nhs.uk
org.uk
plc.uk
police.uk
sch.uk
uk
}}}
Note that {{{uk}}} is added at the end. This ensures that domains not belonging to any of the two-part TLDs are still caught (such as that of the British Library).
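To illustrate how a list of TLDs can determine where a host name is split into its domain, here is a minimal sketch in Python. It is not the NetarchiveSuite implementation: the function {{{split_host}}} and the list {{{TLDS}}} are hypothetical names, and the rule used here (pick the longest configured TLD that matches the end of the host, then keep one more label) is an assumption chosen to reproduce the behaviour described above.

```python
def split_host(host, tlds):
    """Return (domain, tld) for host, or None if no configured TLD matches.

    Hypothetical helper, not part of NetarchiveSuite. It assumes the
    longest configured TLD that is a dot-separated suffix of the host wins,
    and that the domain is that TLD plus exactly one more label.
    """
    labels = host.lower().rstrip(".").split(".")
    best = None
    for tld in tlds:
        parts = tld.lower().split(".")
        n = len(parts)
        # The host needs at least one label in front of the TLD.
        if len(labels) > n and labels[-n:] == parts:
            if best is None or n > len(best):
                best = parts
    if best is None:
        return None
    domain = ".".join(labels[-(len(best) + 1):])
    return domain, ".".join(best)


# Shortened version of the example TLD list, for illustration only.
TLDS = ["dk", "com", "net", "co.uk", "gov.uk", "uk"]

print(split_host("news.bbc.co.uk", TLDS))  # ('bbc.co.uk', 'co.uk')
print(split_host("www.bl.uk", TLDS))       # ('bl.uk', 'uk')
print(split_host("example.org", TLDS))     # None
```

With this rule, the plain {{{uk}}} entry acts as a fallback: a host such as {{{www.bl.uk}}} matches none of the two-part TLDs and is still assigned to the domain {{{bl.uk}}}, while a host under a TLD that is not listed at all is rejected.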
== Version History ==
=== Current development versions ===
|| Version 3.3.1 || 2007-09-24 || Mostly bugfix work, including possibility to use two-part TLDs like co.uk ||
|| Version 3.3.0 || 2007-08-06 || Mostly bugfix work, including upgradability of the monitored applications, and faster resubmitting of jobs ||

=== Stable versions ===
|| Version 3.2.3 || 2007-09-27 || Bugfix of 3.2.2 with patched deduplicator that fixes a problem in parallel indexing ||
|| Version 3.2.2 || 2007-08-03 || Bugfix of 3.2.1 with patched Heritrix 1.12.1 that supports ARCRecords larger than 2 GB ||
|| Version 3.2.1 || 2007-07-04 || Bugfix of 3.2.0 fixing trouble using the quick start manual. ||
|| Version 3.2.0 || 2007-07-04 || Open source release ||
|| Version 3.1.* || || Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library. ||
|| Version 3.0.0 || 2007-02-02 || Marked the naming of the !NetarchiveSuite, the splitting of !NetarchiveSuite into independent modules, and the licensing of !NetarchiveSuite under LGPL ||
|| Version 2.* || || Various features and updates ||
|| Version 2.0 || 2006-08-30 || Marked a general restructuring of the code, where harvest definition data was backed by a database and the viewerproxy was trimmed and rewritten. ||
|| Version 1.* || || Various features and updates ||
|| Version 1.0 || 2005-07-01 || The first version of the netarchive software put into production for harvesting the entire Danish web ||
|| Version 0.* || || Various pre-production development versions ||