Release Notes for NetarchiveSuite 3.13.0

This version of NetarchiveSuite was released on 2010-06-15

New features since NetarchiveSuite 3.12.*

This release has primarily focused on integrating code implemented at BNF into the main NetarchiveSuite branch. Additionally, work has been done on the deploy and wayback packages.

The following bugs have been fixed and features implemented since 3.12:

Common Module

FR 1929 15 second level TLD related to the .fr and .re domains

Harvester Module

Bug 1856 Schedule problem after first start on NAS 3.10.0. No schedule started
Bug 1964 PWC6033: Unable to compile class for JSP
Bug 1971 PostgreSQL does not work with current NetarchiveSuite: forgot to copy BNF change to trunk
FR 1134 Filter job lists by category
FR 1668 Paginate the list of jobs and make it sortable and searchable
FR 1688 Monitoring broad crawls
FR 1924 Allow searching for a domain in active jobs (in case of webmaster complaints)
FR 1925 PostgreSQL connectivity (using the PostgreSQL driver version 8.4 - JDBC 4)
FR 1927 Delay job end to allow Heritrix report generation
FR 1928 Ability to easily resubmit a selection of failed jobs (but see Bug 1972)
FR 1930 Ability to implement a different crawl control loop via HeritrixLauncher / new Heritrix JMX controller
FR 1951 Upgrade to Heritrix 1.14.4

Access Module

Bug 1955 NetarchiveResourceStore fails to handle redirects
Bug 1976 Wayback fails to start with 'all_test.sh'

Archive Module

Bug 1949 Findbugs: use of known null pointer in BitpreserveFileState.processUpdateRequest

Documentation Module

Bug 1732 LocalArcRepositoryClient not documented

Upgrade instructions

Note that many new settings have been introduced in this release, and several new third-party libraries have been added.

New settings in the common module

settings.common.webinterface.harvestStatus.defaultPageSize: The default number of jobs to show in the harvest status section on one result page. The default number is 100.
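Pagination of the job list (FR 1668) can be pictured as simple arithmetic on 1-based page and result numbers. The Java sketch below is illustrative only: JobListPaging and its methods are hypothetical names, not NetarchiveSuite classes, and it merely assumes that page n shows results (n-1) * pageSize + 1 through min(n * pageSize, total).

```java
/** Hypothetical helper illustrating 1-based result paging with a configurable page size. */
final class JobListPaging {
    /** Mirrors the documented default of settings.common.webinterface.harvestStatus.defaultPageSize. */
    static final int DEFAULT_PAGE_SIZE = 100;

    private JobListPaging() {}

    /** Number of pages needed to show totalResults rows; at least one page, even when empty. */
    static int pageCount(int totalResults, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return Math.max(1, (totalResults + pageSize - 1) / pageSize);
    }

    /** 1-based index of the first result on the given 1-based page, or 0 if the page is empty. */
    static int firstResult(int page, int totalResults, int pageSize) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be at least 1");
        }
        int first = (page - 1) * pageSize + 1;
        return first > totalResults ? 0 : first;
    }

    /** 1-based index of the last result on the given 1-based page, or 0 if the page is empty. */
    static int lastResult(int page, int totalResults, int pageSize) {
        if (firstResult(page, totalResults, pageSize) == 0) {
            return 0;
        }
        return Math.min(page * pageSize, totalResults);
    }
}
```

With 250 jobs and the default page size of 100, pageCount returns 3 and page 3 covers results 201 to 250.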

settings.common.batch.batchjobs.batchjob.class: The list of batch jobs that can be run from the GUI. Each entry must give the fully qualified name of the batch job class (e.g. dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob), and the class must inherit from FileBatchJob. The default is the following:

<batchjobs>
    <batchjob>
        <class>dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob</class>
        <jarfile></jarfile>
    </batchjob>
    <batchjob>
        <class>dk.netarkivet.archive.arcrepository.bitpreservation.FileListJob</class>
        <jarfile></jarfile>
    </batchjob>
</batchjobs>

settings.common.batch.batchjobs.batchjob.jarfile: The list of jar files containing the corresponding batch jobs. These are used for LoadableJarBatchJobs. If no file is specified, the batch job is assumed to be available on the default classpath of the involved applications (BitarchiveMonitor, ArcRepository, GUIWebServer and BitArchive).
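As an illustration of how the batchjobs structure above might be consumed, the following Java sketch reads each batchjob element with the JDK's DOM parser and treats an empty jarfile element as "load from the default classpath". BatchJobSettingsSketch and its Entry type are hypothetical; NetarchiveSuite's own settings loader is not described in these notes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a <batchjobs> fragment; not NetarchiveSuite's settings API. */
final class BatchJobSettingsSketch {
    /** One configured batch job: class name plus optional jar file (null means default classpath). */
    static final class Entry {
        final String className;
        final String jarFile;

        Entry(String className, String jarFile) {
            this.className = className;
            this.jarFile = jarFile;
        }
    }

    private BatchJobSettingsSketch() {}

    static List<Entry> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList jobs = doc.getElementsByTagName("batchjob");
            List<Entry> result = new ArrayList<>();
            for (int i = 0; i < jobs.getLength(); i++) {
                Element job = (Element) jobs.item(i);
                String cls = childText(job, "class");
                String jar = childText(job, "jarfile");
                // An empty (or missing) <jarfile> means: use the application's default classpath.
                result.add(new Entry(cls, jar.isEmpty() ? null : jar));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse batchjobs XML", e);
        }
    }

    /** Trimmed text of the first descendant with the given tag, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```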

settings.common.batch.baseDir: The directory where the resulting files are placed when a batch job is run through the GUI. The default is the relative directory "batch".

New settings in the harvester module

settings.harvester.harvesting.heritrix.monitorResetInterval: The time interval in seconds after which the HarvestMonitorServer will reset the job state data. This is a simple way to detect the end of a job. The default is 300 seconds (5 minutes).
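One way to read "reset after an interval" is as a staleness check: if no status update for a job has arrived for monitorResetInterval seconds, the monitor drops its state for that job and treats it as ended. The Java sketch below encodes that assumed rule; JobStateResetSketch is a hypothetical name, not part of the HarvestMonitorServer.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of interval-based end-of-job detection. */
final class JobStateResetSketch {
    private JobStateResetSketch() {}

    /**
     * True when no update has been seen for at least resetIntervalSeconds, i.e. when a monitor
     * following this rule would reset the job's state data.
     */
    static boolean shouldReset(long lastUpdateMillis, long nowMillis, long resetIntervalSeconds) {
        return nowMillis - lastUpdateMillis >= TimeUnit.SECONDS.toMillis(resetIntervalSeconds);
    }
}
```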

settings.harvester.harvesting.heritrix.crawlLoopWaitTime: Time interval in seconds to wait during a crawl loop in the harvest controller. The default is 20 seconds.

settings.harvester.harvesting.heritrix.abortIfConnectionLost: A boolean flag. If set to true, the harvest controller will abort the current crawl when the JMX connection is lost. If set to false, it will only log a warning, leaving it to the crawl operator to shut down the harvester manually. Used only by the BnfHeritrixController. The default is true.

settings.harvester.harvesting.heritrix.waitForReportGenerationTimeout: Maximum time in seconds to wait for Heritrix to generate report files once crawling is over. The default is 600 seconds (10 minutes).
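To show how crawlLoopWaitTime, abortIfConnectionLost and waitForReportGenerationTimeout could fit together, here is a hypothetical Java crawl loop. The Crawler and Sleeper interfaces are invented for the sketch, and polling for reports at the crawl-loop interval is an assumption; it does not reproduce NetarchiveSuite's controller code.

```java
/** Hypothetical crawl control loop; not NetarchiveSuite's HeritrixController implementation. */
final class CrawlLoopSketch {
    enum Outcome { FINISHED_WITH_REPORTS, FINISHED_REPORT_TIMEOUT, ABORTED_CONNECTION_LOST }

    /** What the loop needs to know about the crawler (hypothetical). */
    interface Crawler {
        boolean connectionAlive();
        boolean crawlFinished();
        boolean reportsGenerated();
        void abortCrawl();
    }

    /** Injected so the loop can be exercised without real waiting. */
    interface Sleeper {
        void sleepSeconds(long seconds);
    }

    private CrawlLoopSketch() {}

    static Outcome run(Crawler crawler, Sleeper sleeper, long crawlLoopWaitTime,
                       boolean abortIfConnectionLost, long waitForReportGenerationTimeout) {
        while (!crawler.crawlFinished()) {
            if (!crawler.connectionAlive()) {
                if (abortIfConnectionLost) {
                    crawler.abortCrawl();
                    return Outcome.ABORTED_CONNECTION_LOST;
                }
                // Flag is false: only warn; stopping the harvester is left to the operator.
                System.err.println("WARNING: JMX connection lost; crawl not aborted");
            }
            sleeper.sleepSeconds(crawlLoopWaitTime);
        }
        // Crawling is over: poll for reports until waitForReportGenerationTimeout seconds have passed.
        long step = Math.max(1, crawlLoopWaitTime);
        long waited = 0;
        while (!crawler.reportsGenerated()) {
            if (waited >= waitForReportGenerationTimeout) {
                return Outcome.FINISHED_REPORT_TIMEOUT;
            }
            sleeper.sleepSeconds(step);
            waited += step;
        }
        return Outcome.FINISHED_WITH_REPORTS;
    }
}
```

A real implementation would back Sleeper with Thread.sleep and handle InterruptedException; injecting it here keeps the loop testable without waiting.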

settings.harvester.harvesting.heritrixLauncherClass: The implementation of the HeritrixLauncher abstract class to be used. The default is dk.netarkivet.harvester.harvesting.controller.DefaultHeritrixLauncher.
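Since heritrixLauncherClass names a class, the launcher is presumably instantiated by reflection. The Java sketch below shows that general pattern with a type check; LauncherBase and ExampleDefaultLauncher are hypothetical stand-ins for the abstract HeritrixLauncher and DefaultHeritrixLauncher, whose real constructors and methods are not covered here.

```java
/** Hypothetical stand-in for the abstract HeritrixLauncher. */
abstract class LauncherBase {
    abstract String describe();
}

/** Hypothetical implementation used only for this illustration. */
class ExampleDefaultLauncher extends LauncherBase {
    @Override
    String describe() {
        return "default";
    }
}

final class LauncherFactorySketch {
    private LauncherFactorySketch() {}

    /** Instantiates the configured class by name after checking that it extends LauncherBase. */
    static LauncherBase create(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!LauncherBase.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " does not extend LauncherBase");
            }
            return (LauncherBase) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```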

New settings in the wayback module

Many new settings have been added that configure Hibernate and the database connection pool it uses (c3p0). Documentation on configuring c3p0 can be found at http://www.mchange.com/projects/c3p0/index.html#configuration_properties, and documentation on c3p0 in general at http://www.mchange.com/projects/c3p0/index.html.

settings.wayback.hibernate.c3p0.acquire_increment: Determines how many connections at a time c3p0 will try to acquire when the pool is exhausted. Defines the value of hibernate configuration key "hibernate.c3p0.acquire_increment" which is the same as the c3p0-native property name "c3p0.acquireIncrement". The default is 1.

settings.wayback.hibernate.c3p0.idle_test_period: If this is a number greater than 0, c3p0 will test all idle, pooled but unchecked-out connections at intervals of this many seconds. Defines the value of hibernate configuration key "hibernate.c3p0.idle_test_period" which is the same as the c3p0-native property name "c3p0.idleConnectionTestPeriod". The default is 100.

settings.wayback.hibernate.c3p0.max_size: Maximum number of Connections a pool will maintain at any given time. Defines the value of hibernate configuration key "hibernate.c3p0.max_size" which is the same as the c3p0-native property name "c3p0.maxPoolSize". The default is 100.

settings.wayback.hibernate.c3p0.max_statements: The size of c3p0's global PreparedStatement cache. Defines the value of hibernate configuration key "hibernate.c3p0.max_statements" which is the same as the c3p0-native property name "c3p0.maxStatements". The default is 100.

settings.wayback.hibernate.c3p0.min_size: Minimum number of Connections a pool will maintain at any given time. Defines the value of hibernate configuration key "hibernate.c3p0.min_size" which is the same as the c3p0-native property name "c3p0.minPoolSize". The default is 10.

settings.wayback.hibernate.c3p0.timeout: The number of seconds a connection can remain pooled but unused before being discarded. Defines the value of hibernate configuration key "hibernate.c3p0.timeout" which is the same as the c3p0-native property name "c3p0.maxIdleTime". The default is 100.
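Each Hibernate key above is the setting name with the "settings.wayback." prefix removed, so the mapping can be expressed mechanically. The Java sketch below builds a java.util.Properties object from the documented defaults plus any configured overrides; C3p0SettingsSketch is a hypothetical helper, not NetarchiveSuite code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical mapping of the wayback c3p0 settings to Hibernate property names. */
final class C3p0SettingsSketch {
    static final String PREFIX = "settings.wayback.";

    /** Documented defaults; the comment gives the equivalent c3p0-native property. */
    static final Map<String, String> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put(PREFIX + "hibernate.c3p0.acquire_increment", "1"); // c3p0.acquireIncrement
        DEFAULTS.put(PREFIX + "hibernate.c3p0.idle_test_period", "100"); // c3p0.idleConnectionTestPeriod
        DEFAULTS.put(PREFIX + "hibernate.c3p0.max_size", "100");         // c3p0.maxPoolSize
        DEFAULTS.put(PREFIX + "hibernate.c3p0.max_statements", "100");   // c3p0.maxStatements
        DEFAULTS.put(PREFIX + "hibernate.c3p0.min_size", "10");          // c3p0.minPoolSize
        DEFAULTS.put(PREFIX + "hibernate.c3p0.timeout", "100");          // c3p0.maxIdleTime
    }

    private C3p0SettingsSketch() {}

    /** Hibernate key for a wayback c3p0 setting: the same name without the prefix. */
    static String hibernateKey(String settingName) {
        if (!settingName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a wayback setting: " + settingName);
        }
        return settingName.substring(PREFIX.length());
    }

    /** Builds Hibernate properties from the defaults, letting configured settings override them. */
    static Properties toHibernateProperties(Map<String, String> configured) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : DEFAULTS.entrySet()) {
            String value = configured.getOrDefault(e.getKey(), e.getValue());
            props.setProperty(hibernateKey(e.getKey()), value);
        }
        return props;
    }
}
```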

settings.wayback.hibernate.connection_url: The Hibernate connection URL. The default is "jdbc:derby:derbyDB/wayback_indexer_db;create=true".

settings.wayback.hibernate.db_driver_class: The JDBC driver class used by Hibernate. The default is "org.apache.derby.jdbc.ClientDriver".

settings.wayback.hibernate.use_reflection_optimizer: See the Hibernate documentation for its meaning. The default is "false".

settings.wayback.hibernate.transaction_factory: See the Hibernate documentation for its meaning. The default is "org.hibernate.transaction.JDBCTransactionFactory".

settings.wayback.hibernate.dialect: See the Hibernate documentation for its meaning. The default is "org.hibernate.dialect.DerbyDialect".

settings.wayback.hibernate.show_sql: See the Hibernate documentation for its meaning. The default is "true".

settings.wayback.hibernate.format_sql: See the Hibernate documentation for its meaning. The default is "true".

settings.wayback.hibernate.hbm2ddl_auto: See the Hibernate documentation for its meaning. The default is "update".

settings.wayback.hibernate.user: See the Hibernate documentation for its meaning. The default is "".

settings.wayback.hibernate.password: See the Hibernate documentation for its meaning. The default is "".
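For readers consulting the Hibernate documentation, the settings above plausibly correspond to the standard Hibernate 3 properties used in the following Java sketch. The correspondence (for example, user to hibernate.connection.username) is an assumption; the sketch only illustrates it with the defaults listed above, and WaybackHibernateSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical mapping from settings.wayback.hibernate.* suffixes to Hibernate 3 property names. */
final class WaybackHibernateSketch {
    /** Setting suffix -> { assumed Hibernate property name, documented default }. */
    static final Map<String, String[]> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("connection_url", new String[] {"hibernate.connection.url",
                "jdbc:derby:derbyDB/wayback_indexer_db;create=true"});
        MAPPING.put("db_driver_class", new String[] {"hibernate.connection.driver_class",
                "org.apache.derby.jdbc.ClientDriver"});
        MAPPING.put("use_reflection_optimizer",
                new String[] {"hibernate.cglib.use_reflection_optimizer", "false"});
        MAPPING.put("transaction_factory", new String[] {"hibernate.transaction.factory_class",
                "org.hibernate.transaction.JDBCTransactionFactory"});
        MAPPING.put("dialect", new String[] {"hibernate.dialect", "org.hibernate.dialect.DerbyDialect"});
        MAPPING.put("show_sql", new String[] {"hibernate.show_sql", "true"});
        MAPPING.put("format_sql", new String[] {"hibernate.format_sql", "true"});
        MAPPING.put("hbm2ddl_auto", new String[] {"hibernate.hbm2ddl.auto", "update"});
        MAPPING.put("user", new String[] {"hibernate.connection.username", ""});
        MAPPING.put("password", new String[] {"hibernate.connection.password", ""});
    }

    private WaybackHibernateSketch() {}

    /** Documented defaults, overridden by any values configured under the same suffix. */
    static Properties build(Map<String, String> configuredBySuffix) {
        Properties props = new Properties();
        for (Map.Entry<String, String[]> e : MAPPING.entrySet()) {
            String value = configuredBySuffix.getOrDefault(e.getKey(), e.getValue()[1]);
            props.setProperty(e.getValue()[0], value);
        }
        return props;
    }
}
```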

settings.wayback.indexer.replicaId: The replica to be used by the wayback indexer. Default value is "ONE".

settings.wayback.indexer.temp_batch_output_dir: The directory to which batch output is written during indexing. The default value is "tempdir".

settings.wayback.indexer.final_batch_output_dir: The directory to which batch output is moved after a batch indexing job is successfully completed. The default value is "batchOutputDir".
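Writing to temp_batch_output_dir and moving files to final_batch_output_dir only after a successful job means a reader of the final directory should never see partial output; note that the Aggregator's default input directory, "batchOutputDir", matches the indexer's default final directory. The Java sketch below illustrates the move step with java.nio. BatchOutputSketch is hypothetical, and the atomic move assumes both directories are on the same file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration of publishing completed batch output from a temp to a final directory. */
final class BatchOutputSketch {
    private BatchOutputSketch() {}

    /** Moves a completed output file into finalDir and returns its new location. */
    static Path publish(Path tempFile, Path finalDir) {
        try {
            Files.createDirectories(finalDir);
            Path target = finalDir.resolve(tempFile.getFileName());
            // ATOMIC_MOVE fails (rather than copying) if the directories are on different file systems.
            return Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```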

settings.wayback.indexer.maxFailedAttempts: The maximum number of times an archive file may cause a batch error during indexing before the indexer gives up on it. The default value is "3".

settings.wayback.indexer.producerDelay: The delay, in milliseconds, before the producer thread is started. The default value is "0".

settings.wayback.indexer.producerInterval: The interval, in milliseconds, between successive runs of the producer thread. The default value is "86400000" (24 hours).

settings.wayback.indexer.consumerThreads: The number of consumer threads to run. The default value is "5".

settings.wayback.indexer.initialFiles: A file containing a list of files which have already been archived and therefore do not need to be archived again. This key may be left unset. The default value is "".
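The following Java sketch suggests how the indexer's producer/consumer settings might map onto java.util.concurrent: a scheduled producer using producerDelay and producerInterval, a fixed pool of consumerThreads workers, and a per-file failure counter bounded by maxFailedAttempts. IndexerSchedulingSketch is hypothetical, and the choice of a fixed-rate schedule is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the wayback indexer's scheduling settings; not its actual code. */
final class IndexerSchedulingSketch {
    private final Map<String, Integer> failures = new HashMap<>();
    private final int maxFailedAttempts;

    IndexerSchedulingSketch(int maxFailedAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
    }

    /** Runs the producer after producerDelay ms, then every producerInterval ms. */
    static ScheduledFuture<?> scheduleProducer(ScheduledExecutorService scheduler, Runnable producer,
                                               long producerDelay, long producerInterval) {
        return scheduler.scheduleAtFixedRate(producer, producerDelay, producerInterval,
                TimeUnit.MILLISECONDS);
    }

    /** A fixed pool of consumerThreads workers for indexing the files the producer finds. */
    static ExecutorService consumerPool(int consumerThreads) {
        return Executors.newFixedThreadPool(consumerThreads);
    }

    /** Records one batch error for a file; returns true while the file should still be retried. */
    synchronized boolean recordFailure(String fileName) {
        int count = failures.merge(fileName, 1, Integer::sum);
        return count < maxFailedAttempts;
    }
}
```

With the default maxFailedAttempts of 3, the first two errors for a file allow a retry and the third makes recordFailure return false.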

settings.wayback.aggregator.index_file-input_dir: The directory from which the Aggregator consumes raw index files. The default value is "batchOutputDir".

settings.wayback.aggregator.index_file-output_dir: The directory into which the Aggregator places the aggregated and sorted files. The default value is "indexDir".

settings.wayback.aggregator.temp_aggregator_dir: The directory used by the Aggregator to store temporary files. The default value is "aggregator_tempdir".

settings.wayback.aggregator.aggregation_interval: The time between scheduled aggregation runs, in milliseconds. The default value is "86400000" (24 hours).

settings.wayback.aggregator.max_intermediate_index_file_size: The maximum size of the intermediate index file in MB. When this limit is reached, a new index file is created and new indexes are added to it. With a value of 0, the intermediate index file is always merged into the main index file. The default value is "102400".

settings.wayback.aggregator.max_main_index_file_size: The maximum size of the main wayback index file in MB. When this limit is reached, a new index file is created and new indexes are added to it. The old index file will be renamed to ${finalIndexFileSizeLimit}.1. The default value is "104857600".
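Read literally, both size limits are "reached or exceeded" checks, which also explains why a limit of 0 causes the intermediate file to be merged every time. The hypothetical Java sketch below encodes that reading; treating 1 MB as 1024 * 1024 bytes is an assumption, and the actual Aggregator logic may differ.

```java
/** Hypothetical decision rules for the Aggregator's size limits (sizes configured in MB). */
final class AggregatorLimitsSketch {
    /** Assumed size of one MB in bytes. */
    static final long BYTES_PER_MB = 1024L * 1024L;

    private AggregatorLimitsSketch() {}

    /** Merge the intermediate index into the main index once it reaches the limit; 0 always merges. */
    static boolean mergeIntermediate(long intermediateBytes, long maxIntermediateMb) {
        return intermediateBytes >= maxIntermediateMb * BYTES_PER_MB;
    }

    /** Start a new main index file (after renaming the old one) once the main file reaches its limit. */
    static boolean rollOverMain(long mainBytes, long maxMainMb) {
        return mainBytes >= maxMainMb * BYTES_PER_MB;
    }
}
```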

New translation strings

common/Translation.properties

batchpage;Name.of.batchjob=Name of batchjob
batchpage;Description=Description
batchpage;Last.run=Last run
batchpage;Batchjob.has.never.been.run=Batchjob has never been run
batchpage;Size.of.output.file=Size of output file
batchpage;Number.of.lines.in.output.file=Number of lines in output file
batchpage;Size.of.error.file=Size of error file
batchpage;Number.of.lines.in.error.file=Number of lines in error file
batchpage;Choose.replica=Choose replica
batchpage;Regular.expression.for.filenames.(all.files)=Regular expression for file names (\".*\" = all files)
batchpage;Execute.batchjob=Execute batchjob
batchpage;Arguments=Arguments
batchpage;Bad.argument.metadata.for.the.constructor=Bad argument metadata resources for the constructor ''{0}''
batchpage;Argument.i=Argument {0}
batchpage;Argument.i.missing.argument.metadata=Argument {0} (missing argument metadata)
batchpage;No.batchjobs.defined.in.settings=No batchjobs defined in settings
batchpage;Predefined.batchjobs=Predefined batchjobs
batchpage;Batchjob=Batchjob
batchpage;No.output.file=No output file
batchpage;No.error.file=No error file
batchpage;Warning.0=Warning: {0}
batchpage;Which.files=Which files
batchpage;Job.ID=Job ID
batchpage;Metadata=Metadata
batchpage;Content=Content
batchpage;Both=Both
batchpage;No.outputfile=No outputfile
batchpage;No.errorfile=No errorfile
batchpage;Download.outputfile=Download outputfile
batchpage;Download.errorfile=Download errorfile
batchpage;No.valid.timestamp=No valid timestamp
batchpage;bytes=bytes
batchpage;lines=lines
batchpage;Started.date=Stated date
batchpage;Ended.date=Ended date
batchpage;Output.file=Output file
batchpage;Error.file=Error file
batchpage;Number.of.runs.0=Number of runs: {0}
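Entries like these are ordinary Java properties with MessageFormat-style placeholders. The Java sketch below assumes, without confirming, that the web pages format them with java.text.MessageFormat, and shows two details that are easy to miss: a doubled single quote ('') renders as one literal quote, so ''{0}'' puts the argument in single quotes, and Properties.load drops the backslash in \", so \".*\" becomes ".*". TranslationSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Properties;

/** Hypothetical illustration of loading and formatting translation entries. */
final class TranslationSketch {
    private TranslationSketch() {}

    /** Parses properties-file text (key=value lines). */
    static Properties load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Looks up a key and substitutes {0}, {1}, ... with the given arguments. */
    static String format(Properties props, String key, Object... args) {
        return MessageFormat.format(props.getProperty(key), args);
    }
}
```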

harvester/Translation.properties

pagetitle;all.jobs.running=Running Jobs
status.job.filters.group1=Job status {0} Harvest name {1} Start date {2} End date {3}
status.job.filters.group2=Order {0} Display {1} rows per page.
status.job.filters.group3=Job status {0} Harvest name {1}
status.sort.order.job.reset=Reset
status.results.page=Displaying result page {0}.
status.results.displayed=Search results: {0}, displaying results {1} to {2}.
status.results.displayed.pagination={0} / {1}
status.results.displayed.nextPage=next
status.results.displayed.prevPage=previous
status.harvest.all=All
table.running.jobs.jobId=Job ID
table.running.jobs.harvestName=Harvest definition
table.running.jobs.host=Host
table.running.jobs.progress=Progress
table.running.jobs.queuedFiles=Queued files
table.running.jobs.totalQueues=Queues
table.running.jobs.activeQueues=Active
table.running.jobs.retiredQueues=Retired
table.running.jobs.exhaustedQueues=Exhausted
table.running.jobs.elapsedTime=Elapsed time
table.running.jobs.alerts=Alerts
table.running.jobs.downloadedCount=Downloaded files
table.running.jobs.currentProcessedKBPerSec=KB/s
table.running.jobs.currentProcessedDocsPerSec=URL/s
table.running.jobs.queues=Queues
table.running.jobs.performance=Performance
table.running.jobs.toeThreads=Threads
table.running.jobs.status.preCrawl=Crawl in preparation
table.running.jobs.status.crawlerRunning=Crawler is running
table.running.jobs.status.crawlerPaused=Crawler is paused
table.running.jobs.status.crawlFinished=Crawl finished
table.running.jobs.legend={0} - crawl in preparation, {1} crawler is running, {2} - crawler is paused, {3} - crawl finished
running.jobs.finder.inputGroup=Find job harvesting domain {0}
running.jobs.finder.submit=Search
resubmit.jobs.submit=Resubmit selected failed jobs
errormsg;resubmit.jobs.selectionEmpty=Please select at least one job!
running.jobs.finder.table.jobId=Job ID

monitor/Translation.properties

viewerproxy/Translation.properties

pagetitle;qa.batchjob.overview=Batchjob Overview
pagetitle;qa.batchjob.retrieve.resultfile=BatchJob resultfile
pagetitle;qa.batchjob=Batchjob
pagetitle;qa.batchjob.execute=Executing batchjob
pagetitle;qa.get.files=Get harvested files
pagetitle;qa.get.reports=Get harvest reports
pagetitle;qa.crawllog.lines.for.domain=Lines from crawl.log about domain
pagetitle;files.for.job.0=Files for job {0}
pagetitle;reports.for.job.1=Reports for job {0}
pagetitle;qa.crawllog.lines.for.domain.0.in.1=Lines from crawl.log of job {1} concerning domain {0}

Deleted translation strings

harvester/Translation.properties

archive/Translation.properties

Version History

Version 3.12.0

2010-05-03

New Bitpreservation infrastructure, and upgrade of Apache Derby to version 10.5.3.0

Version 3.11.*

Development versions aiming for 3.12.0

Version 3.10.0

2009-11-16

New deploy application; JMX stability issues fixed; JMS stability issues also fixed

Version 3.9.*

Development versions aiming for 3.10.0

Version 3.8.2

2009-09-10

Fix an important index synchronization bug

Version 3.8.1

2009-07-15

Fix of important bug leading to unresponsive harvesters

Version 3.8.0

2009-05-23

Java 1.6, Heritrix 1.14.1, Derby 10.4.2.0, complete rewrite of settings, new supported deploy module, gui access to harvest logs

Version 3.7.0

2008-11-04

Development version aiming for 3.8.0

Version 3.6.0

2008-07-03

Improvement of archive component with regard to security, batch, and preservation; greater JMS stability; important bug fixes

Version 3.5.*

Development versions aiming for 3.6.0

Version 3.4.2

2008-03-14

Bug fix release, fixing JMX timeout

Version 3.4.1

2008-01-16

Bug fix release, fixing out of memory on very large indexes

Version 3.4.0

2008-01-03

Separation of Heritrix, work on developing our open source platform, two-part TLDs like co.uk, and lots of bugfixes

Version 3.3.*

Development versions aiming for 3.4.0

Version 3.2.3

2007-09-27

Bugfix of 3.2.2 with a patched deduplicator that fixes a problem in parallel indexing

Version 3.2.2

2007-08-03

Bugfix of 3.2.1 with a patched Heritrix 1.12.1 that supports ARCRecords larger than 2 GB

Version 3.2.1

2007-07-04

Bugfix of 3.2.0 fixing problems with using the quick start manual.

Version 3.2.0

2007-07-04

Open source release

Version 3.1.*

Development versions. Version 3.1.7 was kindly reviewed by Internet Archive and the Norwegian national library.

Version 3.0.0

2007-02-02

Marked the naming of NetarchiveSuite, the splitting of NetarchiveSuite into independent modules, and the licensing of NetarchiveSuite under LGPL

Version 2.*

Various features and updates

Version 2.0

2006-08-30

Marked a general restructuring of the code: harvest definition data became backed by a database, and the viewerproxy was trimmed and rewritten.

Version 1.*

Various features and updates

Version 1.0

2005-07-01

The first version of the netarchive software put into production for harvesting the entire Danish web

Version 0.*

Various pre-production development versions

ReleaseNotes3_13_0 (last edited 2010-08-16 10:24:53 by localhost)