Differences between revisions 3 and 4
Revision 3 as of 2010-08-16 10:24:33
Size: 4324
Editor: localhost
Comment: converted to 1.6 markup
Revision 4 as of 2010-08-24 08:31:16
Size: 1101
Comment:
Line 4: Line 4:
The Wayback installation under !NetarchiveSuite has only been tested on a PC running Linux and in !ProxyReplay mode. Other modes should work, but no guarantees are given.

This section describes the configuration of the applications responsible for the continuous indexing of files in a !NetarchiveSuite arcrepository. In addition, there is a plugin which enables an arcrepository to be accessed by an instance of wayback. This is described in the Additional Tools Manual, along with various batch jobs which may be of use to anyone wishing to index an arcrepository without using the applications described here.
Line 6: Line 6:
To be able to use the Wayback batch jobs, all !BitarchiveApplication instances must have the dk.netarkivet.wayback.jar file in their classpath. In the deploy configuration, the following should be added:

=== Basic Concepts for the Indexer/Aggregator ===
Line 8: Line 8:
''<deployClassPath>lib/dk.netarkivet.wayback.jar</deployClassPath>''

There are two applications responsible for indexing an arcrepository. The {{{WaybackIndexerApplication}}} checks a repository for any new files and issues batch jobs to index each new file individually. These unsorted index files are deposited in a local folder. The {{{AggregatorApplication}}} sorts and merges these index files and then merges the result into the existing index files being used by your wayback instance. These applications may be configured and deployed using the !NetarchiveSuite Deploy Tool.
Line 11: Line 11:
=== Requirements ===
The following applications should be running and reachable from the machine running Tomcat with the Wayback web application.

 1. JMS server.
 1. FTP server.
 1. Archive (e.g. the standalone archive given in ./conf/wayback/standalone_archive.xml). The applications needed from !NetarchiveSuite are !BitarchiveApplication, !BitarchiveMonitorApplication and !ArcRepositoryApplication. The !NetarchiveSuite version should be newer than 3.10.
This setup has been tested with Tomcat (6.0.20).

Configuring Wayback to work with !NetarchiveSuite requires the services above. It also requires a full source package of !NetarchiveSuite and an installation of ''ant''; the setup has been tested with ant 1.7.1.

=== Configuration ===
The two configuration files that should be modified are located in ''./conf/wayback/'' in the !NetarchiveSuite full source package. The files are named ''CDXCollection.xml'' and ''wayback.xml''.

==== wayback.xml ====
Several settings in this configuration file must be changed to fit your setup before the system will run correctly:

''wayback.basedir=/tmp/wayback'' - The web application should have read and write access to this directory.

The port should be specified in the following three lines, and it must be available (i.e. not already in use by another application).

 * <bean name="8080:wayback" class="org.archive.wayback.webapp.!AccessPoint">
 * <property name="replayURIPrefix" value="http://localhost.archive.org:8080/wayback/"/>
 * <bean name="8090" parent="8080:wayback">
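Before settling on port numbers, it can help to confirm that they are not already taken on the machine running Tomcat. The following is a minimal sketch in Python, not part of !NetarchiveSuite or Wayback; the function name {{{port_is_free}}} is hypothetical, and the check only covers the local interface passed as {{{host}}}.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port.

    Hypothetical helper for illustration only: it tries to bind a TCP
    socket to the port and reports whether the bind succeeded.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # The bind failed, typically because the port is in use.
            return False
    return True


if __name__ == "__main__":
    # 8080 and 8090 are the ports that appear in the three lines above.
    for candidate in (8080, 8090):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

A port reported as free can still be claimed by another process before Tomcat starts, so this is only a quick sanity check.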
==== CDXCollection.xml ====
This configuration file describes where Wayback finds its CDX files (i.e. indices of the ARC/WARC files).

In this file it should only be necessary to change the following path to point to a local CDX collection.

''<value>/wayback/file.sorted.cdx</value>''

=== Compiling Tomcat target ===
This can be done from the !NetarchiveSuite root directory. Running the command ''ant -file wayback.build.xml warfile'' produces a ROOT.war file in the !NetarchiveSuite root directory. Copy this ROOT.war file to ''$TOMCAT_HOME/webapps/''.

Tomcat must also have access to a settings.xml file (see below). This can be done by adding the following line to ''$TOMCAT_HOME/bin/catalina.sh'' just after the first line.

''CATALINA_OPTS="-Ddk.netarkivet.settings.file=$TOMCAT_HOME/webapps/ROOT/WEB-INF/settings.xml"''

This settings file is a !NetarchiveSuite settings.xml file that includes only the ''common'' and ''wayback'' sections.

The following settings should be modified to fit the local installation.

Change the following to match the FTP settings on the system.

{{{
        <remoteFile>
            <!-- TODO: See user documentation for NetarchiveSuite
            http://netarkivet.dk/suite/Documentation . -->
            <serverName>ftp.yourdomain.com</serverName>
            <userName>ftpuser</userName>
            <userPassword>ftppassword</userPassword>
        </remoteFile>
}}}
Update the following mail settings.

{{{
        <mail>
            <server>mail.yourdomain.com</server>
        </mail>
        <notifications>
            <class>dk.netarkivet.common.utils.EMailNotifications</class>
            <sender>example@yourdomain.com</sender>
            <receiver>example@yourdomain.com</receiver>
        </notifications>
}}}
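Since the sample values above all use the placeholder domain ''yourdomain.com'', a quick check can flag any entries that were left unchanged. The following Python sketch is an illustration only and is not shipped with !NetarchiveSuite; the function name {{{find_placeholders}}} is hypothetical, and the check simply scans every element's text without assuming a particular settings.xml layout.

```python
import xml.etree.ElementTree as ET

# Placeholder domain used in the sample FTP and mail settings above.
PLACEHOLDER = "yourdomain.com"


def find_placeholders(settings_xml: str) -> list:
    """Return (tag, text) pairs whose text still contains the placeholder.

    Hypothetical helper: walks all elements of the given XML string,
    whatever their nesting or namespace.
    """
    root = ET.fromstring(settings_xml)
    hits = []
    for element in root.iter():
        # Drop a namespace prefix of the form "{uri}tag", if present.
        tag = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if PLACEHOLDER in text:
            hits.append((tag, text))
    return hits
```

To use it, read the contents of the settings.xml file into a string and pass it to {{{find_placeholders}}}; an empty result means no element still mentions the placeholder domain.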
=== Described elsewhere ===
It is outside the scope of this configuration guide to describe how to harvest an ARC/WARC file. It is also outside the scope of this guide to describe how to import an ARC/WARC collection into Wayback by way of CDX entries for each object in the collection.

Setting up a !NetarchiveSuite archive is described elsewhere, and a sample setup file is given in the !NetarchiveSuite source package.
=== WaybackIndexerApplication ===

Wayback Configuration

This section describes the configuration of the applications responsible for the continuous indexing of files in a NetarchiveSuite arcrepository. In addition, there is a plugin which enables an arcrepository to be accessed by an instance of wayback. This is described in the Additional Tools Manual, along with various batch jobs which may be of use to anyone wishing to index an arcrepository without using the applications described here.

Basic Concepts for the Indexer/Aggregator

There are two applications responsible for indexing an arcrepository. The WaybackIndexerApplication checks a repository for any new files and issues batch jobs to index each new file individually. These unsorted index files are deposited in a local folder. The AggregatorApplication sorts and merges these index files and then merges the result into the existing index files being used by your wayback instance. These applications may be configured and deployed using the NetarchiveSuite Deploy Tool.
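The sort-and-merge step described above can be pictured with a small Python sketch. This is an illustration of the general technique only, not the AggregatorApplication's actual code; the function name aggregate and the in-memory lists standing in for index files are assumptions made for the example.

```python
import heapq
from typing import Iterable, List


def aggregate(existing_sorted: Iterable[str],
              new_unsorted_files: List[List[str]]) -> List[str]:
    """Sort each new index, then merge everything into one sorted list.

    Hypothetical stand-in for the two stages described above: each inner
    list plays the role of one unsorted index file, and existing_sorted
    plays the role of the sorted index already in use.
    """
    # Stage 1: sort every freshly produced (unsorted) index on its own.
    sorted_new = [sorted(lines) for lines in new_unsorted_files]
    # Stage 2: merge the sorted pieces with the existing sorted index.
    # heapq.merge relies on each input already being sorted.
    return list(heapq.merge(existing_sorted, *sorted_new))


if __name__ == "__main__":
    existing = ["a.com/ 2009", "c.com/ 2009"]
    new_files = [["d.com/ 2010", "b.com/ 2010"], ["a.com/ 2010"]]
    for line in aggregate(existing, new_files):
        print(line)
```

Merging already-sorted inputs avoids re-sorting the whole existing index each time new files arrive, which is the main reason for keeping sorting and merging as separate stages.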

WaybackIndexerApplication

Configuration Manual 3.14/Wayback Configurations (last edited 2010-08-31 07:54:22 by ColinRosenthal)