Release Notes for NetarchiveSuite 3.13.0

This version of NetarchiveSuite was released on 2010-06-15


New features since NetarchiveSuite 3.12.*

This release has primarily focused on integrating code implemented at BNF into the main NetarchiveSuite branch. Additionally, work has been done in the deploy and wayback packages.

The following bugs have been fixed and feature requests (FR) implemented since 3.12:

Common Module

Bug 1835 The new examples folder is missing from build
FR 1580 The applications should be able to tell us the version of NetarchiveSuite
FR 1880 Change copyright string from "Copyright 2004-2009" to "Copyright 2004-2010"

Deploy Module

Bug 1705 Make jmxremote.access writable before overwriting it (install script)
Bug 1914 The script to start the archive database lacks max heap option
FR 1790 Print usage of RunNetarchiveSuite.sh
FR 1846 Deploy the bitpreservation database (fixed QA)
FR 1876 Automatic startup of archive database and database url generation for test instance

Harvester Module

Bug 1777 Add event seeds only accepts a very short list of seeds
FR 1116 Global crawlertraps
FR 872 More logging needed in method HarvestControllerServer.HarvesterThread.run()
Bug 1856 Schedule problem after first start on NAS 3.10.0. No schedule started

Access Module

Bug 1758 UrlCanonicalizerFactory falls back to default value silently

Archive Module

Bug 1934 SQLNonTransientConnectionException: Insufficient data while reading from the network
Bug 1920 Duplicates are currently ignored in DatabaseBasedBitpreservation.FindMissingFiles
Bug 1917 Request for all checksum timed out after 60 seconds
Bug 1911 java.lang.NullPointerException WARNING: Cannot retrieve the filenames to reply on the
Bug 1910 Update of Checksum replica takes more than 1 minute
Bug 1909 creation of adminDB with prod admin.data will take 6-7 days...
Bug 1905 DatabaseBased Bitpreservation can initiate multiple checksum and filelist requests at the same time.
Bug 1903 SEVERE: Cannot handle 2 files with the name '22583-MB100.arc'
Bug 1897 Wrong nulls in filelist status
Bug 1894 INFO: No replica name found in request.
Bug 1833 The method isAdminCheckSumOk() from the type FilePreservationState is not visible
FR 1734 Unittest of BitarchiveMonitor
FR 1736 Monitoring batchjobs through logging
Bug 754 NullPointerException in bitpreservation
Bug 842 Bitpreservation GUI fetches checksums twice
Bug 810 TEST7, step 8: "Send failed" error when updating filestatus for location SB

Documentation Module

FR 1818 Have all the example configuration files in one folder different from the "conf"

Monitor Module

FR 1757 Need a way to remove an application from lists of monitored applications
FR 1861 When clicking "More" in a status page, the link jumps to the top
FR 1578 The interval (REREGISTER_DELAY) between apps re-registering themselves should be a setting

Documentation

Upgrade instructions

New settings

settings.harvester.harvesting.heritrix.monitorResetInterval: The time interval in seconds after which the HarvestMonitorServer will reset the job state data. This is a simple way to detect the end of a job. The default is 300 seconds (5 minutes).

settings.harvester.harvesting.heritrix.crawlLoopWaitTime: Time interval in seconds to wait during a crawl loop in the harvest controller. The default is 20 seconds.

settings.harvester.harvesting.heritrix.abortIfConnectionLost: A boolean flag. If set to true, the harvest controller will abort the current crawl when the JMX connection is lost. If set to false, it will only log a warning, leaving the crawl operator to shut down the harvester manually. Used only by the BnfHeritrixController. The default is true.

settings.harvester.harvesting.heritrix.waitForReportGenerationTimeout: Maximum time in seconds to wait for Heritrix to generate report files once crawling is over. The default is 600 seconds (10 minutes).

settings.harvester.harvesting.heritrixLauncherClass: The implementation of the HeritrixLauncher abstract class to be used. The default is dk.netarkivet.harvester.harvesting.controller.DefaultHeritrixLauncher.
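
As a quick reference, the five new settings and their documented defaults can be summarised in a small Java sketch. This is only an illustration: the class NewHarvesterSettingsSketch and the use of java.util.Properties are assumptions made for this example and are not the NetarchiveSuite settings API. Only the setting names and default values are taken from the list above.

```java
import java.util.Properties;

/** Hypothetical holder for the new 3.13.0 settings; not part of NetarchiveSuite. */
class NewHarvesterSettingsSketch {
    static final String PREFIX = "settings.harvester.harvesting.";

    final int monitorResetIntervalSeconds;
    final int crawlLoopWaitTimeSeconds;
    final boolean abortIfConnectionLost;
    final int waitForReportGenerationTimeoutSeconds;
    final String heritrixLauncherClass;

    /** Reads each setting from 'overrides', falling back to the documented default. */
    NewHarvesterSettingsSketch(Properties overrides) {
        monitorResetIntervalSeconds = Integer.parseInt(
                overrides.getProperty(PREFIX + "heritrix.monitorResetInterval", "300"));
        crawlLoopWaitTimeSeconds = Integer.parseInt(
                overrides.getProperty(PREFIX + "heritrix.crawlLoopWaitTime", "20"));
        abortIfConnectionLost = Boolean.parseBoolean(
                overrides.getProperty(PREFIX + "heritrix.abortIfConnectionLost", "true"));
        waitForReportGenerationTimeoutSeconds = Integer.parseInt(
                overrides.getProperty(PREFIX + "heritrix.waitForReportGenerationTimeout", "600"));
        heritrixLauncherClass = overrides.getProperty(
                PREFIX + "heritrixLauncherClass",
                "dk.netarkivet.harvester.harvesting.controller.DefaultHeritrixLauncher");
    }

    public static void main(String[] args) {
        // Override a single setting; the other four keep their defaults.
        Properties overrides = new Properties();
        overrides.setProperty(PREFIX + "heritrix.abortIfConnectionLost", "false");

        NewHarvesterSettingsSketch s = new NewHarvesterSettingsSketch(overrides);
        System.out.println("monitorResetInterval = " + s.monitorResetIntervalSeconds);
        System.out.println("crawlLoopWaitTime = " + s.crawlLoopWaitTimeSeconds);
        System.out.println("abortIfConnectionLost = " + s.abortIfConnectionLost);
        System.out.println("waitForReportGenerationTimeout = "
                + s.waitForReportGenerationTimeoutSeconds);
        System.out.println("heritrixLauncherClass = " + s.heritrixLauncherClass);
    }
}
```

Installations that do not set any of these values should see the defaults listed above.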

New translation strings

harvester/Translations.properties

errormsg;template.upload.failed.with.exception.0=Harvest template upload failed with exception {0}

monitor/Translations.properties

tablefield;removeapplication=Remove Application
errormsg.error.when.unregistering.mbean.0=Error when unreqistering JMX MBean identified with query ''{0}''.

(Note: the key of the latter string needs to be changed to "errormsg;error.when.unregistering.mbean.0". See outstanding bug 1844, Wrong labelling of the translation key "errormsg.error.when.unregistering.mbean.0".)
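
The {0} placeholders and doubled single quotes in these strings appear to follow the conventions of java.text.MessageFormat, where a single quote starts a quoted section and two consecutive single quotes produce one literal quote. The sketch below illustrates that behaviour. The class name TranslationQuoteSketch and the sample arguments are hypothetical, the second pattern is a shortened excerpt of the monitor string, and it is an assumption that NetarchiveSuite formats its translations this way.

```java
import java.text.MessageFormat;

/** Hypothetical demo of MessageFormat quoting; not NetarchiveSuite code. */
class TranslationQuoteSketch {

    /** Formats a translation pattern with the given arguments. */
    static String render(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // Plain placeholder, as in the new harvester string.
        System.out.println(render(
                "Harvest template upload failed with exception {0}", "java.io.IOException"));

        // Doubled quotes yield literal quotes around the argument.
        System.out.println(render("identified with query ''{0}''.", "some-query"));

        // A single pair of quotes would instead suppress the substitution.
        System.out.println(render("identified with query '{0}'.", "some-query"));
    }
}
```

The last call shows why translators must keep the doubled quotes: with single quotes the placeholder is printed literally instead of being replaced.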

Deleted translation strings

harvester/Translations.properties

errormsg;template.upload.failed=Harvest template upload failed

archive/Translations.properties

pagetitle;filestatus.update=Update of filestatus information
errormsg;unknown.filestatus.update.type.0=Unknown filestatus update type ''{0}''.
initiating;update.of.0.for.replica.1=Initiating update of ''{0}'' for replica ''{1}''
be.patient.this.operation.can.take.hours=Please be patient. This operation can take hours

Version History

Version 3.12.0

2010-05-03

New Bitpreservation infrastructure, and upgrade of Apache Derby to version 10.5.3.0

Version 3.11.*

Development versions aiming for 3.12.0

Version 3.10.0

2009-11-16

New deploy application; JMX stability issues fixed; JMS stability issues also fixed

Version 3.9.*

Development versions aiming for 3.10.0

Version 3.8.2

2009-09-10

Fix an important index synchronization bug

Version 3.8.1

2009-07-15

Fix of important bug leading to unresponsive harvesters

Version 3.8.0

2009-05-23

Java 1.6, Heritrix 1.14.1, Derby 10.4.2.0, complete rewrite of settings, new supported deploy module, GUI access to harvest logs

Version 3.7.0

2008-11-04

Development version aiming for 3.8.0

Version 3.6.0

2008-07-03

Improvement of archive component with regard to security, batch, and preservation; greater JMS stability; important bug fixes

Version 3.5.*

Development versions aiming for 3.6.0

Version 3.4.2

2008-03-14

Bug fix release, fixing JMX timeout

Version 3.4.1

2008-01-16

Bug fix release, fixing out of memory on very large indexes

Version 3.4.0

2008-01-03

Separation of Heritrix, work on developing our open source platform, two-part TLDs like co.uk, and lots of bugfixes

Version 3.3.*

Development versions aiming for 3.4.0

Version 3.2.3

2007-09-27

Bugfix of 3.2.2 with a patched deduplicator that fixes a problem in parallel indexing

Version 3.2.2

2007-08-03

Bugfix of 3.2.1 with a patched Heritrix 1.12.1 that supports ARCRecords larger than 2 GB

Version 3.2.1

2007-07-04

Bugfix of 3.2.0 fixing trouble using the quick start manual.

Version 3.2.0

2007-07-04

Open source release

Version 3.1.*

Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.

Version 3.0.0

2007-02-02

Marked the naming of the NetarchiveSuite, the splitting of NetarchiveSuite into independent modules, and the licensing of NetarchiveSuite under LGPL

Version 2.*

Various features and updates

Version 2.0

2006-08-30

Marked a general restructuring of the code, where harvest definition data was backed by a database and the viewerproxy was trimmed and rewritten.

Version 1.*

Various features and updates

Version 1.0

2005-07-01

The first version of the netarchive software put in production for harvesting the entire Danish web

Version 0.*

Various pre-production development versions