Release Notes for NetarchiveSuite 3.7.0
This version of NetarchiveSuite was released on 2008-11-03.
New features since NetarchiveSuite 3.6.*
Apart from general bug fixes (see below), the most important new features are:
General
Issue tracking cleanup
We have cleaned up the bug, feature request, and patch trackers, removing a lot of confusing fields. We are working on the final documentation that explains the precise definition of each remaining field and how bugs are handled.
Common Module
New settings structure
The way we read settings has been radically updated. A settings file containing every setting is no longer required; if a setting is not set, its default value is used. Furthermore, more than one settings file can be used, one overriding the other. Refer to the ["Installation Manual devel"] for more information on how they are read. We recommend not setting values where the default values are acceptable, so that new defaults are deployed automatically in new versions of NetarchiveSuite.
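The lookup described above can be illustrated with a minimal sketch. It is not NetarchiveSuite's actual implementation: the class name `LayeredSettingsSketch` is hypothetical, settings files are modelled as plain maps rather than XML, and the precedence order (files listed first win) is an assumption; the Installation Manual describes the real order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of layered settings lookup; not NetarchiveSuite code. */
public class LayeredSettingsSketch {
    /** Settings files in priority order (assumption: earlier entries win). */
    private final List<Map<String, String>> settingsFiles;
    /** Built-in defaults, consulted only when no settings file sets the key. */
    private final Map<String, String> defaults;

    public LayeredSettingsSketch(List<Map<String, String>> settingsFiles,
                                 Map<String, String> defaults) {
        this.settingsFiles = new ArrayList<>(settingsFiles);
        this.defaults = defaults;
    }

    /** Returns the first value found in the files, else the default. */
    public String get(String key) {
        for (Map<String, String> file : settingsFiles) {
            String value = file.get(key);
            if (value != null) {
                return value;
            }
        }
        String fallback = defaults.get(key);
        if (fallback == null) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The default for jmxUsername is taken from these release notes.
        Map<String, String> defaults =
                Map.of("settings.monitor.jmxUsername", "monitorRole");
        // A local file that overrides only what differs from the defaults.
        Map<String, String> localFile =
                Map.of("settings.monitor.jmxUsername", "anonymous");

        LayeredSettingsSketch withOverride =
                new LayeredSettingsSketch(List.of(localFile), defaults);
        LayeredSettingsSketch defaultsOnly =
                new LayeredSettingsSketch(List.of(), defaults);

        System.out.println(withOverride.get("settings.monitor.jmxUsername"));
        System.out.println(defaultsOnly.get("settings.monitor.jmxUsername"));
    }
}
```

Leaving a key out of every settings file means the lookup falls through to the default, which is why unset values pick up new defaults after an upgrade.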
Harvester Module
Access to harvest logs
From the job page of any DONE or FAILED job, there is now a link to inspect the Heritrix logs, metadata files, and harvested files from that job! This greatly enhances the possibilities of debugging why a job behaved as it did. You can also get the subset of a crawl.log that referred to a specific domain in a harvest.
Archive Module
Better handling of third-party batch jobs with tool
The command-line tool for submitting third-party batch jobs now supports a much better syntax. Simply launch the tool (dk.netarkivet.archive.tools.RunBatch) with no arguments for a description.
Documentation
The documentation has been brought up to date, and some parts have been elaborated. In particular, the database documentation in the ["Developer Manual"] has been updated, and the scripts that generate a new harvest database now include detailed documentation of the tables used by the harvest definition interface. The scripts can be found in the distribution packages under scripts/sql.
Bugs fixed since NetarchiveSuite 3.6.*
Common Module
bug 1247: FileUtils.unzip does not unzip directories properly
Harvester Module
bug 1157: jobs disappeared in a strange way
bug 1181: The templates in scripts/simple_harvest/data/originals/harvestdefinitionbasedir/order_templates are invalid
bug 1240: PAUSED heritrix gets terminated by JMXHarvestController
bug 1292: Possible to store Jobs in database coming from unknown harvestdefinition
bug 1468: If harvest process ends with non-zero exit code for any reason, it will never be retried to start the harvester by the SideKick
bug 1469: Wrong translation label in Definitions-find-domains if no domains are in the database
FR 1108: Heritrix logs should be accessible in harvest definition interface
Archive Module
bug 1191: NullPointerException in BitarchiveMonitorServer
bug 1193: Exceptions from FileBatchJob stop batch job processing (partially)
bug 1212: When a file with bit errors has been restored, it still appears on the list of files with checksum errors
bug 1261: Wrong table headings (mostly Danish) in info part on page "Missing Files"
bug 1278: Error from RunBatch unexpected
bug 1279: Missing toString method on FileBatchJob classes
bug 1294: ref. to argument before check of argument in Filelist batchjob
FR 1263: Confusing layout of "Files with checksum errors"
Monitor Module
bug 1223: System overview takes more than 20 secs and Show all never returns
bug 1388: Too much logging during automatic registering of applications
FR 1471: Monitor plugin had no alternative
Deploy Module
bug 1271: Naming of GuiApplication scripts are misleading
Upgrade instructions
New settings structure
As mentioned above, settings now have defaults and can be read from multiple settings files. We recommend that you remove settings from your settings files wherever the default value is what you want.
Default Monitor class moved
The plugin for distributed monitoring using StatusSiteSection has moved. The setting settings.monitorregistryClient.class, which was previously dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient, is now dk.netarkivet.monitor.distribute.JMSMonitorRegistryClient.
New settings
Settings from monitor_settings.xml are now settings in the standard settings files. The new settings are:
settings.monitor.jmxUsername: Must match the JMX username in all monitored applications. Default: monitorRole
Note that if you change the jmxUsername, you must change the security.policy accordingly. Assuming you set jmxUsername to "anonymous", the following lines are needed in the security.policy:
grant principal javax.management.remote.JMXPrincipal "anonymous" {
    permission java.security.AllPermission;
};
settings.monitor.jmxPassword: Must match the JMX password in all monitored applications. Default: JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER
Changed setting names
The setting settings.common.jms.environmentName is now settings.common.environmentName
The setting settings.common.harvester.datamodel.domain.tld is now settings.common.topLevelDomains.tld
The setting settings.common.database.specificsclass is now settings.common.database.class
The setting settings.archive.bitarchive.limitForRecordDatatransferInFile is now settings.common.repository.limitForRecordDatatransferInFile
The setting settings.archive.arcrepository.location is now settings.common.locations.location
The setting settings.archive.arcrepository.batchLocation is now settings.common.locations.batchLocation
The setting settings.archive.bitarchive.thisLocation is now settings.common.thisPhysicalLocation
The setting settings.monitor.applicationName is now settings.common.applicationName
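When upgrading many settings files, the renames listed above can be applied mechanically. The following is a minimal sketch, not a tool shipped with NetarchiveSuite: the class `SettingRenameSketch` and its `upgrade` method are hypothetical, and settings are modelled as a flat map from dotted setting name to value rather than as the XML the real files use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that applies the 3.7.0 setting renames to a flat map. */
public class SettingRenameSketch {
    /** Old setting name -> new setting name, copied from the list above. */
    private static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("settings.common.jms.environmentName",
                "settings.common.environmentName");
        RENAMED.put("settings.common.harvester.datamodel.domain.tld",
                "settings.common.topLevelDomains.tld");
        RENAMED.put("settings.common.database.specificsclass",
                "settings.common.database.class");
        RENAMED.put("settings.archive.bitarchive.limitForRecordDatatransferInFile",
                "settings.common.repository.limitForRecordDatatransferInFile");
        RENAMED.put("settings.archive.arcrepository.location",
                "settings.common.locations.location");
        RENAMED.put("settings.archive.arcrepository.batchLocation",
                "settings.common.locations.batchLocation");
        RENAMED.put("settings.archive.bitarchive.thisLocation",
                "settings.common.thisPhysicalLocation");
        RENAMED.put("settings.monitor.applicationName",
                "settings.common.applicationName");
    }

    /** Returns a copy of the settings with renamed keys; other keys are kept as-is. */
    public static Map<String, String> upgrade(Map<String, String> oldSettings) {
        Map<String, String> upgraded = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : oldSettings.entrySet()) {
            String newName = RENAMED.getOrDefault(entry.getKey(), entry.getKey());
            upgraded.put(newName, entry.getValue());
        }
        return upgraded;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        // "ExampleApp" is an illustrative value, not a NetarchiveSuite default.
        old.put("settings.monitor.applicationName", "ExampleApp");
        old.put("settings.monitor.jmxUsername", "monitorRole");
        upgrade(old).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Keys that are not in the rename table pass through unchanged, so the sketch is safe to run on settings that have already been upgraded.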
Removed settings
The setting settings.common.siteSection.deployPath is no longer used.
New translations
If you are maintaining a translation, please note that the following new keys have been added:
archive/Translations.properties
no.files.with.checksum.errors=No files with checksum errors were found
location=Location
harvester/Translations.properties
subtitle;reports.for.job=Harvest information for job
harvest.reports=Browse reports for jobs
harvest.files=Browse harvest files for job
crawl.log.lines.for.domain.0=Browse only relevant crawl-log lines for domain {0}
viewerproxy/Translations.properties
pagetitle;qa.get.files=Get harvested files
pagetitle;qa.get.reports=Get harvest reports
pagetitle;qa.crawllog.lines.for.domain=Lines from crawl.log about domain
pagetitle;files.for.job.0=Files for job {0}
pagetitle;reports.for.job.1=Reports for job {0}
pagetitle;qa.crawllog.lines.for.domain.0.in.1=Lines from crawl.log of job {1} concerning domain {0}
helptext;get.job.qa.information.with.viewerproxy=The links below will only work \
if your browser is set up to use the viewerproxy as web proxy.
Version History
Version 3.6.0 | 2008-07-03 | Improvement of archive component with regard to security, batch, and preservation; greater JMS stability; important bug fixes
Version 3.5.* | | Development versions aiming for 3.6.0
Version 3.4.2 | 2008-03-14 | Bug fix release, fixing JMX timeout
Version 3.4.1 | 2008-01-16 | Bug fix release, fixing out of memory on very large indexes
Version 3.4.0 | 2008-01-03 | Separation of Heritrix, work on developing our open source platform, two-part TLDs like co.uk, and lots of bug fixes
Version 3.3.* | | Development versions aiming for 3.4.0
Version 3.2.3 | 2007-09-27 | Bug fix of 3.2.2 with patched deduplicator that fixes a problem in parallel indexing
Version 3.2.2 | 2007-08-03 | Bug fix of 3.2.1 with patched Heritrix 1.12.1 that supports ARCRecords larger than 2 GB
Version 3.2.1 | 2007-07-04 | Bug fix of 3.2.0, fixing trouble using the quick start manual
Version 3.2.0 | 2007-07-04 | Open source release
Version 3.1.* | | Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.
Version 3.0.0 | 2007-02-02 | Marked the naming of the NetarchiveSuite, the splitting of NetarchiveSuite into independent modules, and the licensing of NetarchiveSuite under LGPL
Version 2.* | | Various features and updates
Version 2.0 | 2006-08-30 | Marked a general restructuring of the code, where harvest definition data was backed by a database and the viewerproxy was trimmed and rewritten
Version 1.* | | Various features and updates
Version 1.0 | 2005-07-01 | The first version of the netarchive software put in production for harvesting the entire Danish web
Version 0.* | | Various pre-production development versions