Differences between revisions 8 and 9
Revision 8 as of 2009-03-22 00:07:43
Size: 11498
Comment:
Revision 9 as of 2009-03-23 16:59:00
Size: 11671
Comment:

Release Notes for NetarchiveSuite 3.8.0

This version of NetarchiveSuite was released on 2009-MM-DD.

New features since NetarchiveSuite 3.6.*

Apart from a general fixing of bugs (see below) the most important new features are:

General

Issue tracking cleanup

We have cleaned up the bug, feature request, and patch trackers, removing a lot of confusing fields. We are working on the final documentation, which will explain the precise definition of each remaining field and how bugs are handled.

Common Module

New settings structure

The way we read settings has been radically updated. It is no longer required to have a settings file containing all settings; default values are used for any settings that are not set. Furthermore, more than one settings file can be used, one overriding the other. Refer to the ["Installation Manual devel"] for more information on how they are read. We recommend not setting values where the defaults are acceptable, so that new defaults are deployed automatically with new versions of NetarchiveSuite.
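
The lookup order described above can be illustrated with a minimal sketch. It is not the NetarchiveSuite implementation: the class name LayeredSettingsSketch, the flat key/value maps, and the rule that earlier files take priority over later ones are all assumptions made for illustration. The Installation Manual describes how the files are actually read.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of layered settings lookup; not the real Settings class. */
class LayeredSettingsSketch {

    /**
     * Returns the value from the first settings file that defines the key,
     * falling back to the built-in defaults when no file sets it.
     * Assumption: files are ordered with the highest priority first.
     */
    static String resolve(String key,
                          List<Map<String, String>> files,
                          Map<String, String> defaults) {
        for (Map<String, String> file : files) {
            String value = file.get(key);
            if (value != null) {
                return value;
            }
        }
        String fallback = defaults.get(key);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Illustrative values only; the real defaults ship with NetarchiveSuite.
        Map<String, String> defaults = Map.of(
                "settings.monitor.jmxUsername", "monitorRole",
                "settings.common.environmentName", "EXAMPLE_DEFAULT");
        // A site-wide file that overrides a single value.
        Map<String, String> siteFile = Map.of(
                "settings.common.environmentName", "PROD");
        // A host-specific file that sets nothing.
        Map<String, String> hostFile = Map.of();

        List<Map<String, String>> files = List.of(hostFile, siteFile);
        System.out.println(resolve("settings.common.environmentName", files, defaults));
        System.out.println(resolve("settings.monitor.jmxUsername", files, defaults));
    }
}
```

In this sketch, a value left out of every file follows whatever the defaults say, which is why leaving acceptable defaults unset lets new defaults take effect after an upgrade.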

Deploy Module

New and improved software to deploy NetarchiveSuite

The existing deploy software, which was developed for the needs of the Netarkivet installation and was lacking in many areas, has been replaced by new deploy software that is more versatile and configurable, and thus hopefully more useful for a larger range of scenarios.

Harvester Module

Access to harvest logs

From the job page of any DONE or FAILED job, there is now a link to inspect the Heritrix logs, metadata files, and harvested files from that job. This makes it much easier to debug why a job behaved as it did. You can also get the subset of a crawl.log that refers to a specific domain in a harvest.

Archive Module

Better handling of third-party batch jobs with tool

The command-line tool for submitting third-party batch jobs now supports a much better syntax. Simply launch the tool (dk.netarkivet.archive.tools.RunBatch) with no arguments for a description.

Documentation

The documentation has been brought up to date, and some parts have been elaborated. A new manual, the Configuration Manual, has been added. Database documentation has been added to the ["Developer Manual"], and the scripts to generate a new harvest database now have elaborate documentation on the tables used by the harvest definition interface. These can be found in the distribution packages under scripts/sql.

Bugs fixed since NetarchiveSuite 3.6.*

Common Module

bug 1247: FileUtils.unzip does not unzip directories properly
FR 291: HarvestControllerServer uses http port to set unique THIS_HACO
FR 1101: Should upgrade to using Java 1.6
FR 1252: Upgrade to Apache Derby 10.4.1.3
FR 1276: QuickStart installation should not use DEV as its environmentName
FR 1277: Move derbytools library from tests/lib/db to lib/db

Patch 1493: German Translation

Harvester Module

bug 1157: jobs disappeared in a strange way
bug 1181: The templates in scripts/simple_harvest/data/originals/harvestdefinitionbasedir/order_templates are invalid
bug 1226: Discrepancy between how our database are defined in the dev, and prod environments respectively
bug 1240: PAUSED heritrix gets terminated by JMXHarvestController
bug 1292: Possible to store Jobs in database coming from unknown harvestdefinition
bug 1468: If harvest process ends with non-zero exit code for any reason, it will never be retried to start the harvester by the SideKick
bug 1469: Wrong translation label in Definitions-find-domains if no domains are in the database

FR 770: Resubmitted jobs do not provide information about which job they are resubmitted as
FR 1108: Heritrix logs should be accessible in harvest definition interface
FR 1146: submit date as new field on jobs
FR 1160: Status and order selection (771) on History/Harveststatus-perharvestrun.jsp
FR 1162: Multiselect of status on all jobs
FR 1485: File must be streamed instead on QA/QA-crawlloglines.jsp 
FR 1497: Button in NetarchiveSuite

Archive Module

bug 1191: NullPointerException in BitarchiveMonitorServer
bug 1193: Exceptions from FileBatchJob stop batch job processing
bug 1212: When a file with bit errors has been restored, it still appears on the list of files with checksum errors
bug 1261: Wrong table headings (mostly Danish) in info part on page "Missing Files"
bug 1278: Error from RunBatch unexpected
bug 1279: Missing toString method on FileBatchJob classes
bug 1294: ref. to argument before check of argument in Filelist batchjob

FR 1029: A tool in dk.netarkivet.archive.tools to retrieve a file from the archive would be nice
FR 1195: TrivialArcRepositoryClient does not offer setting for file directory
FR 1263: Confusing layout of "Files with checksum errors"
FR 1410: Missing files page could look nicer after update
FR 1498: batchJobs given in jar files

Monitor Module

bug 1223: System overview takes more than 20 secs and Show all never returns
bug 1388: Too much logging during automatic registering of applications

FR 1042: Automatic registering to monitor of running applications desirable
FR 1379: Misleading error in SystemState when harvester has not started
FR 1471: Monitor plugin had no alternative
FR 1483: Alternative monitor plugin could be print of jmx url

Deploy Module

bug 431: Settings.DIR_COMMONTEMPDIR directories should be emptied upon startup
bug 433: Starting the bit archives twice without killing inbetween make bitarchive immortal
bug 1271: Naming of GuiApplication scripts are misleading
FR 1520: Deploy of more than one Bitapplication per server
FR 1572: We need a new DeployApplication that is more usable, and more configurable

Documentation

FR 281: set FTP-directory to ~/ftp (default is ~)
FR 1287: Developer Manual should have better splits
FR 1389: Installation Manual should document the need to set the maximum number of producers on JMS broker

Upgrade instructions

Remember to stop the running installation before upgrading.

New settings structure

As mentioned above, settings now have defaults and can be read from multiple settings files. We recommend that you no longer set settings for which you wish to use the default values.

Default Monitor class moved

The plugin for distributed monitoring using StatusSiteSection has moved. The setting settings.monitorregistryClient.class that was previously dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient is now dk.netarkivet.monitor.distribute.JMSMonitorRegistryClient

New settings

Settings from monitor_settings.xml are now settings in the standard settings files. The new settings are:

settings.monitor.jmxUsername: Must match the JMX username in all monitored applications. Default: monitorRole

Note that if you change the jmxUsername, you must change the security.policy accordingly. For example, if you set jmxUsername to "anonymous", you need the following entry in the security.policy:

grant principal javax.management.remote.JMXPrincipal "anonymous" {
  permission java.security.AllPermission;
};

settings.monitor.jmxPassword: Must match the JMX password in all monitored applications. Default: JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER

Changed setting names

The setting settings.common.jms.environmentName is now settings.common.environmentName

The setting settings.common.harvester.datamodel.domain.tld is now settings.common.topLevelDomains.tld

The setting settings.common.database.specificsclass is now settings.common.database.class

The setting settings.archive.bitarchive.limitForRecordDatatransferInFile is now settings.common.repository.limitForRecordDatatransferInFile

The setting settings.archive.arcrepository.location is now settings.common.locations.location

The setting settings.archive.arcrepository.batchLocation is now settings.common.locations.batchLocation

The setting settings.archive.bitarchive.thisLocation is now settings.common.thisPhysicalLocation

The setting settings.monitor.applicationName is now settings.common.applicationName
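
When upgrading an existing configuration, the renames listed above can be applied mechanically. The following is a minimal sketch only: the class name SettingRenameSketch and the representation of settings as a flat map of dotted names are assumptions for illustration, since the real settings live in XML files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that applies the 3.8.0 setting renames to a flat key/value view. */
class SettingRenameSketch {

    /** Old setting name mapped to new setting name, copied from the list above. */
    static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("settings.common.jms.environmentName",
                "settings.common.environmentName");
        RENAMED.put("settings.common.harvester.datamodel.domain.tld",
                "settings.common.topLevelDomains.tld");
        RENAMED.put("settings.common.database.specificsclass",
                "settings.common.database.class");
        RENAMED.put("settings.archive.bitarchive.limitForRecordDatatransferInFile",
                "settings.common.repository.limitForRecordDatatransferInFile");
        RENAMED.put("settings.archive.arcrepository.location",
                "settings.common.locations.location");
        RENAMED.put("settings.archive.arcrepository.batchLocation",
                "settings.common.locations.batchLocation");
        RENAMED.put("settings.archive.bitarchive.thisLocation",
                "settings.common.thisPhysicalLocation");
        RENAMED.put("settings.monitor.applicationName",
                "settings.common.applicationName");
    }

    /** Returns a copy of the settings with old names replaced; other names are kept as-is. */
    static Map<String, String> migrate(Map<String, String> oldSettings) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : oldSettings.entrySet()) {
            String newName = RENAMED.getOrDefault(entry.getKey(), entry.getKey());
            result.put(newName, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("settings.common.jms.environmentName", "PROD"); // illustrative value
        System.out.println(migrate(old));
    }
}
```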

Removed settings

The setting settings.common.siteSection.deployPath is no longer used.

New translations

If you are maintaining a translation, please note that the following new keys have been added:

archive/Translations.properties

no.files.with.checksum.errors=No files with checksum errors were found
location=Location

harvester/Translations.properties

subtitle;reports.for.job=Harvest information for job
harvest.reports=Browse reports for jobs
harvest.files=Browse harvest files for job
crawl.log.lines.for.domain.0=Browse only relevant crawl-log lines for domain {0}

viewerproxy/Translations.properties

pagetitle;qa.get.files=Get harvested files
pagetitle;qa.get.reports=Get harvest reports
pagetitle;qa.crawllog.lines.for.domain=Lines from crawl.log about domain
pagetitle;files.for.job.0=Files for job {0}
pagetitle;reports.for.job.1=Reports for job {0}
pagetitle;qa.crawllog.lines.for.domain.0.in.1=Lines from crawl.log of job {1} concerning domain {0}
helptext;get.job.qa.information.with.viewerproxy=The links below will only work \
if your browser is set up to use the viewerproxy as web proxy.
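
Translation maintainers may find it useful to check how a key with placeholders renders. The sketch below assumes the files are standard Java properties files and that {0} and {1} are filled in using java.text.MessageFormat; the class name TranslationKeySketch and the sample arguments are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

/** Hypothetical check of translation keys; not part of NetarchiveSuite. */
class TranslationKeySketch {

    /** Parses properties-file text, such as the contents of a Translations.properties file. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Looks up a key and substitutes the arguments for {0}, {1}, and so on. */
    static String render(Properties translations, String key, Object... arguments) {
        String pattern = translations.getProperty(key);
        if (pattern == null) {
            throw new IllegalArgumentException("Missing translation key: " + key);
        }
        return MessageFormat.format(pattern, arguments);
    }

    public static void main(String[] args) {
        // Two of the new viewerproxy keys; the second uses a backslash line continuation.
        String text =
                "pagetitle;qa.crawllog.lines.for.domain.0.in.1="
                + "Lines from crawl.log of job {1} concerning domain {0}\n"
                + "helptext;get.job.qa.information.with.viewerproxy="
                + "The links below will only work \\\n"
                + "if your browser is set up to use the viewerproxy as web proxy.\n";
        Properties translations = parse(text);
        // Arguments are passed as strings to avoid locale-specific number formatting.
        System.out.println(render(translations,
                "pagetitle;qa.crawllog.lines.for.domain.0.in.1", "example.org", "42"));
        System.out.println(render(translations,
                "helptext;get.job.qa.information.with.viewerproxy"));
    }
}
```

Note that the argument order in the pattern need not match the order in the sentence: in the first key, {1} (the job) appears before {0} (the domain).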

Version History

Version 3.7.0

2008-11-04

Development version aiming for 3.8.0

Version 3.6.0

2008-07-03

Improvement of archive component with regard to security, batch, and preservation; greater JMS stability; important bug fixes

Version 3.5.*

Development versions aiming for 3.6.0

Version 3.4.2

2008-03-14

Bug fix release, fixing JMX timeout

Version 3.4.1

2008-01-16

Bug fix release, fixing out of memory on very large indexes

Version 3.4.0

2008-01-03

Separation of Heritrix, work on developing our open source platform, two-part TLDs like co.uk, and lots of bugfixes

Version 3.3.*

Development versions aiming for 3.4.0

Version 3.2.3

2007-09-27

Bugfix of 3.2.2 with a patched deduplicator that fixes a problem in parallel indexing

Version 3.2.2

2007-08-03

Bugfix of 3.2.1 with a patched Heritrix 1.12.1 that supports ARCRecords larger than 2 GB

Version 3.2.1

2007-07-04

Bugfix of 3.2.0 fixing trouble using the quick start manual.

Version 3.2.0

2007-07-04

Open source release

Version 3.1.*

Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.

Version 3.0.0

2007-02-02

Marked the naming of the NetarchiveSuite, the splitting of NetarchiveSuite into independent modules, and the licensing of NetarchiveSuite under LGPL

Version 2.*

Various features and updates

Version 2.0

2006-08-30

Marked a general restructuring of the code, in which harvest definition data became backed by a database and the viewerproxy was trimmed and rewritten.

Version 1.*

Various features and updates

Version 1.0

2005-07-01

The first version of the netarchive software put in production for harvesting the entire Danish web

Version 0.*

Various pre-production development versions

ReleaseNotes3_8_0 (last edited 2010-08-16 10:24:45 by localhost)