Tools in the Harvester Module

dk.netarkivet.harvester.tools.CreateCDXMetadataFile (deprecated)

Given a specific jobID (e.g. 42), this tool can be used to create a metadata-1.arc containing the CDX-entries for all arc-files belonging to that job.

Prerequisites and arguments

You need to specify the repository client used for accessing your archived data. If you use the default client, JMSArcRepositoryClient, you also need to specify the archive replica to use (defined by the setting "settings.common.useReplicaId"), the environment name, the applicationName, and the applicationInstanceId. These can all be defined on the command line as overrides to the default values, or in a local settings.xml file.

Jar files needed in the classpath: dk.netarkivet.harvester.jar, dk.netarkivet.archive.jar (if using the default repository client).

The tool takes a single argument, the jobID.

Sample usage of this tool

export INSTALLDIR=/home/test/netarchive
export CLASSPATH=$INSTALLDIR/lib/dk.netarkivet.harvester.jar:$INSTALLDIR/lib/dk.netarkivet.archive.jar
java -Ddk.netarkivet.settings.file=localsettings.xml dk.netarkivet.harvester.tools.CreateCDXMetadataFile 42
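
As an alternative to a local settings file, the settings listed above can be given as command-line overrides. The following is a minimal dry-run sketch that only prints the resulting command line. It assumes that overrides are passed as Java system properties of the form -D<setting>=<value>; the helper name cdx_command and the replica id ONE are placeholders for illustration and do not come from the tool itself.

```shell
#!/bin/sh
# Dry run: prints the java command instead of executing it.
TOOL=dk.netarkivet.harvester.tools.CreateCDXMetadataFile

# cdx_command <jobID> [<setting>=<value> ...]
# Hypothetical helper: turns each <setting>=<value> pair into a -D override
# (assumed override syntax) and appends the jobID as the only tool argument.
cdx_command() {
  job_id="$1"
  shift
  overrides=""
  for kv in "$@"; do
    overrides="$overrides -D$kv"
  done
  printf 'java%s %s %s\n' "$overrides" "$TOOL" "$job_id"
}

# ONE is a placeholder replica id; use the id of a replica in your archive.
cdx_command 42 "settings.common.useReplicaId=ONE"
```

Further overrides, such as the environment name, applicationName and applicationInstanceId, would be appended as additional <setting>=<value> arguments using their full setting names.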

dk.netarkivet.harvester.tools.CreateLogsMetadataFile (deprecated)

Originally, the metadata-1.arc files did not include the Heritrix logs. This tool was written to create a metadata-2.arc file containing the Heritrix logs associated with a given job.

Consider this tool deprecated. For further information, see the javadoc for this tool. Note that the settings file used in the sample below needs to contain proper values for the harvesting metadata settings:

            <metadata>
                <heritrixFilePattern>.*(\.xml|\.txt|\.log|\.out)</heritrixFilePattern>
                <reportFilePattern>.*-report.txt</reportFilePattern>
                <logFilePattern>.*(\.log|\.out)</logFilePattern>
            </metadata>
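
To get a feel for which files each of these patterns selects, the patterns can be tried against a few file names with grep. This is only a rough sketch: it assumes that the tool requires a pattern to match the whole file name (hence grep -x), and the file names and the helper matching_files are illustrative placeholders, not output from a real job.

```shell
#!/bin/sh
# matching_files <pattern>
# Hypothetical helper: prints those names from a fixed sample list that the
# given extended regular expression matches in full.
matching_files() {
  printf '%s\n' order.xml crawl.log heritrix.out crawl-report.txt 42-metadata-1.arc |
    grep -E -x -- "$1"
}

echo "heritrixFilePattern:"
matching_files '.*(\.xml|\.txt|\.log|\.out)'

echo "reportFilePattern:"
matching_files '.*-report.txt'

echo "logFilePattern:"
matching_files '.*(\.log|\.out)'
```

With this sample list, the first pattern selects every name except the .arc file, the second selects only crawl-report.txt, and the third selects crawl.log and heritrix.out.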

Sample usage of this tool

export INSTALLDIR=/home/test/netarchive
export CLASSPATH=$INSTALLDIR/lib/dk.netarkivet.harvester.jar
java -Ddk.netarkivet.settings.file=localsettings.xml dk.netarkivet.harvester.tools.CreateLogsMetadataFile jobid-harvestid.txt jobsdir

dk.netarkivet.harvester.tools.HarvestTemplateApplication

This tool enables you to create (create command), download (download command), update (update command) and show (showall command) the existing templates.

Prerequisites and arguments

You need to point to a settings file with connection information for your harvest database. In a standard NAS deployment, use INSTALLDIR/conf/settings_GUIApplication.xml.

Sample usage of this tool

export INSTALLDIR=/home/test/netarchive
export CLASSPATH=$INSTALLDIR/lib/dk.netarkivet.harvester.jar
java -Ddk.netarkivet.settings.file=$INSTALLDIR/conf/settings_GUIApplication.xml dk.netarkivet.harvester.tools.HarvestTemplateApplication <command> <args>

The possible <command> <args> combinations:

  1. create <template-name> <xml-file for this template>

  2. download [<template-name>]

  3. update <template-name> <xml-file to replace this template>

  4. showall
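
The commands listed above can be combined with the sample invocation as sketched below. This is a dry run that only prints the command lines; the helper template_command, the template name my_template and the file name my_template.xml are placeholders for illustration.

```shell
#!/bin/sh
# Dry run: prints the java command lines instead of executing them.
INSTALLDIR=/home/test/netarchive
SETTINGS="$INSTALLDIR/conf/settings_GUIApplication.xml"
TOOL=dk.netarkivet.harvester.tools.HarvestTemplateApplication

# template_command <command> [<args> ...]
# Hypothetical helper: prints the full command line for one tool command.
template_command() {
  printf 'java -Ddk.netarkivet.settings.file=%s %s' "$SETTINGS" "$TOOL"
  printf ' %s' "$@"
  printf '\n'
}

# my_template and my_template.xml are placeholder names.
template_command showall
template_command download my_template
template_command update my_template my_template.xml
```

To execute a command for real, set CLASSPATH as in the sample usage and run the printed line.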

Additional Tools Manual 3.16/Tools in Harvester Module (last edited 2011-05-04 14:06:03 by SoerenCarlsen)