Differences between revisions 36 and 37
Revision 36 as of 2007-06-28 16:55:56
Size: 9944
Comment:
Revision 37 as of 2007-06-28 16:59:12
Size: 588
Comment:
[[Include(Installation Manual/Start and stop order)]]
[[Include(Installation Manual/Monitoring)]]

== Other configurations ==

=== Select a file transfer method ===
As mentioned in Appendix C, you can choose between FTP and HTTP as the file transfer method.
Both methods use simple filesystem copying whenever possible, to optimize the file transfer.
The FTP method requires one or more FTP servers to be installed (see Appendix A for further details).
The XML below is an extract of a settings.xml, in which you have to replace serverName, userName, and userPassword with proper values.
[TO BE CORRECTED] If you want to use more than one FTP server, you must use different settings files, or define the values
on the commandline when starting the applications.

{{{
<remoteFile xsi:type="ftpremotefile">
            <!-- The class to use for RemoteFile objects. -->
            <class>dk.netarkivet.common.distribute.FTPRemoteFile</class>
            <!-- The default FTP-server used -->
            <serverName>hostname</serverName>
            <!-- The default FTP-server port used -->
            <serverPort>21</serverPort>
            <!-- The default FTP username -->
            <userName>exampleusername</userName>
            <!-- The default FTP password -->
            <userPassword>examplepassword</userPassword>
            <!-- The number of times FTPRemoteFile should try before giving up
                 a copyTo operation. We augment FTP with checksum checks. -->
            <retries>3</retries>
        </remoteFile>
}}}
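The second option, defining the values on the commandline, could look roughly like the sketch below. It assumes that the FTP settings can be overridden with properties named after the XML elements above (settings.common.remoteFile.serverName and so on); these property names, the helper function ftp_options, and the host name ftp2.example.org are assumptions for illustration and should be checked against your version.

```shell
# Build the -D options that select an FTP server for one application.
# ASSUMPTION: the property names mirror the XML element names in the
# remoteFile section of settings.xml; verify this before relying on it.
ftp_options() {
    server=$1
    user=$2
    password=$3
    printf '%s %s %s\n' \
        "-Dsettings.common.remoteFile.serverName=$server" \
        "-Dsettings.common.remoteFile.userName=$user" \
        "-Dsettings.common.remoteFile.userPassword=$password"
}

# Example: point one application at a second (placeholder) FTP server.
APP_OPTIONS=$(ftp_options ftp2.example.org exampleusername examplepassword)
echo "$APP_OPTIONS"
```

Note that a password given on the commandline is visible in the process list, so a separate settings file may be preferable on shared machines.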
When using HTTP as the file transfer method, you need to reserve one HTTP port per application on each machine.
Note: The easiest way to set this port at the application level is to set it on the commandline:
{{{ -Dsettings.common.remoteFile.port=5442 }}}

See the XML below for the proper syntax:
{{{
 <remoteFile xsi:type="httpremotefile">
            <!-- The class to use for RemoteFile objects. -->
            <class>dk.netarkivet.common.distribute.HTTPRemoteFile</class>
            <!-- Port for embedded HTTP server -->
            <port>5442</port>
        </remoteFile>
}}}
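Since every application on a machine needs its own HTTP port, one possible way to hand out the ports is to count upwards from a base port. The sketch below only illustrates that idea; the function name remote_file_port_option and the numbering scheme are assumptions, not part of NetarchiveSuite.

```shell
# Print the commandline option that gives one application its own
# HTTP file transfer port. The base port plus index scheme is an
# assumption made for this example.
remote_file_port_option() {
    base_port=$1   # first reserved port on this machine, e.g. 5442
    app_index=$2   # 0 for the first application, 1 for the second, ...
    echo "-Dsettings.common.remoteFile.port=$((base_port + app_index))"
}

# Options for the first two applications on a machine.
remote_file_port_option 5442 0
remote_file_port_option 5442 1
```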

=== Configure scheduling (schedule interval) ===
By default, scheduling takes place every minute, unless the previous scheduling run has not finished yet.

=== Configure job-generation and harvesting ===


== The actual deployment of the software ==

We assume that all machines in the chosen setup are Unix servers. The procedure below may not work on other platforms.
After creating the new settings files to be used in the deployment of the software (possibly more than one, e.g. one for the HarvestDefinitionApplication, ..), copy the modified NetarchiveSuite.zip to all machines taking part in the deployment:
{{{
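export USER=test
# Space-separated list of all machines in the deployment (placeholder names; extend as needed).
export MACHINES="machine1.domain1 machine2.domain1 machine1.domain2 machine2.domain2"
# Copy the zip to each machine; the target (the user's home directory) is an assumption.
for machine in $MACHINES; do scp NetarchiveSuite.zip "$USER@$machine:"; done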
}}}

=== Standard commandline settings ===

==== The CLASSPATH ====
The CLASSPATH needed to start and run the Java applications in NetarchiveSuite consists of 4 jar files:
dk.netarkivet.harvester.jar, dk.netarkivet.archive.jar, dk.netarkivet.viewerproxy.jar, and dk.netarkivet.monitor.jar.
The dk.netarkivet.common.jar and all our 3rd party dependencies need not be added explicitly to the CLASSPATH, as they
are referenced indirectly by these jar files.
 
{{{
export NetarchiveSuiteDir=/path/to/netarchiveSuite
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.harvester.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.archive.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.viewerproxy.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.monitor.jar
}}}
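The same CLASSPATH can also be assembled in a loop that stops early if one of the four jar files is missing, which may help catch an incomplete unpacking of the zip file. This is only a sketch; the function name build_classpath is an assumption.

```shell
# Append the four NetarchiveSuite jar files to a classpath string.
# Fails (returns 1) if one of the jar files does not exist.
build_classpath() {
    suite_dir=$1   # e.g. $NetarchiveSuiteDir
    classpath=$2   # existing classpath, may be empty
    for name in harvester archive viewerproxy monitor; do
        jar="$suite_dir/lib/dk.netarkivet.$name.jar"
        if [ ! -f "$jar" ]; then
            echo "missing jar file: $jar" >&2
            return 1
        fi
        # Add a ':' separator only when the classpath is non-empty.
        classpath="${classpath:+$classpath:}$jar"
    done
    echo "$classpath"
}
```

It would be used as CLASSPATH=$(build_classpath "$NetarchiveSuiteDir" "$CLASSPATH") && export CLASSPATH.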


==== Logging ====
We use the Apache Commons Logging framework, so we need to point to both the wanted logger class (org.apache.commons.logging.impl.Jdk14Logger)
and the logging configuration file. You may want to use different properties for different applications, especially when more than one application logs to the same logging directory. E.g. you may want to change the line {{{ java.util.logging.FileHandler.pattern=./log/APPID%u.log }}}
to something different.

{{{
export LOG_SETTINGS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djava.util.logging.config.file=$NetarchiveSuiteDir/conf/log.prop"
}}}
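One way to give each application its own log file pattern is to generate a per-application copy of the logging configuration in which the APPID placeholder is replaced. The sketch below assumes that the template contains the literal text APPID, as in the pattern line quoted above; the function name make_log_config and the log_<name>.prop file naming are assumptions.

```shell
# Write a copy of a logging properties file in which the APPID
# placeholder is replaced by the given application name, and print
# the path of the new file. The application name must not contain
# characters that are special to sed, such as '/' or '&'.
make_log_config() {
    template=$1   # e.g. $NetarchiveSuiteDir/conf/log.prop
    app_id=$2     # e.g. bitarchiveapplication
    target="$(dirname "$template")/log_$app_id.prop"
    sed "s/APPID/$app_id/g" "$template" > "$target"
    echo "$target"
}
```

The printed path can then be passed to -Djava.util.logging.config.file in LOG_SETTINGS for that application.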

==== JMX settings ====
Each application has its own JMX and RMI port. Here the JMX port is 8100 and the associated RMI port is 8200.
JMX also uses a password file, which is the same throughout the installation.

{{{
export JMX_SETTINGS="-Dsettings.common.jmx.port=8100 -Dsettings.common.jmx.rmiPort=8200 -Dsettings.common.jmx.passwordFile=$NetarchiveSuiteDir/conf/jmxremote.password"
}}}
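When several applications run on the same machine, each needs a different JMX and RMI port. A simple scheme, sketched below, is to add an application index to the base ports 8100 and 8200 used above. The function name jmx_settings and the index scheme are assumptions for illustration.

```shell
# Print the JMX options for the application with the given index on
# this machine. Index 0 yields the ports 8100 and 8200 used above;
# counting upwards from there is an assumption made for this example.
jmx_settings() {
    suite_dir=$1   # e.g. $NetarchiveSuiteDir
    app_index=$2   # 0 for the first application, 1 for the second, ...
    echo "-Dsettings.common.jmx.port=$((8100 + app_index))" \
         "-Dsettings.common.jmx.rmiPort=$((8200 + app_index))" \
         "-Dsettings.common.jmx.passwordFile=$suite_dir/conf/jmxremote.password"
}

# Settings for the second application on a machine.
jmx_settings /path/to/netarchiveSuite 1
```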

==== Select the appropriate settings file for the application ====
The conf/settings.xml (the new one configured for your environment) is probably OK for most applications.
But you may need to use special-purpose settings files for some applications, e.g. BitarchiveApplications (you can't allocate more than one fileDir on the commandline).
{{{
export SETTING=-Ddk.netarkivet.settings.file=$NetarchiveSuiteDir/conf/settings.xml
}}}

==== JVM options ====
We need to set the maximum Java heap size to 1.5 Gbytes. You may want to change that or add other JVM options.
{{{
export JAVA_OPTS=-Xmx1536m
}}}

=== admin machine ===
On the admin machine, we have to deploy the following applications:

 * HarvestDefinitionApplication (starts the GUI and the scheduler):

      Running the following will start the HarvestDefinitionApplication:
      {{{
        cd $NetarchiveSuiteDir
        export APP_OPTIONS=
        export APP=dk.netarkivet.harvester.webinterface.HarvestDefinitionApplication
        java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
      }}}

 * BitarchiveMonitorApplication (two, one for each Location, here called "LocationA" and "LocationB").
   To distinguish the two monitors from each other, we need to define settings.common.http.port, which is used as an identifier.
   Each Location's bitarchive has an associated credentials code in plain text.

   Start the monitor for the bitarchive at 'LocationA' using "8081" as identifier thus:
   {{{
     cd $NetarchiveSuiteDir
     export APP_OPTIONS="-Dsettings.common.archive.bitarchive.thisLocation=LocationA \
                         -Dsettings.common.archive.bitarchive.thisCredentials=SELECTED_CREDENTIALS_CODE_FOR_LOCATION_A \
                         -Dsettings.common.http.port=8081"
     export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
     java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
   }}}

   Start the monitor for the bitarchive at 'LocationB' using "8082" as identifier thus:
   {{{
     cd $NetarchiveSuiteDir
     export APP_OPTIONS="-Dsettings.common.archive.bitarchive.thisLocation=LocationB \
                         -Dsettings.common.archive.bitarchive.thisCredentials=SELECTED_CREDENTIALS_CODE_FOR_LOCATION_B \
                         -Dsettings.common.http.port=8082"
     export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
     java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
   }}}

 * One ARCRepository (this takes care of all access to the bitarchives):
   {{{
     export CLASSPATH=..
     ex
     dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
   }}}

=== harvester machines ===

On each harvester machine, we have one or more HarvesterControllerApplications.
Each HarvesterControllerApplication has its own sidekick application, which checks whether
the application is alive. If not, it restarts the HarvesterControllerApplication.

Settings related to the HarvesterControllerApplication are:

 * settings.common.http.port (to distinguish between HarvesterControllerApplications running on the same server)
 * settings.harvester.harvesting.queuePriority (to select which of two queues to accept jobs from: HIGH_PRIORITY (jobs that are part of a selective harvest) or LOW_PRIORITY (jobs that are part of a snapshot harvest))
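The two settings above can be combined into the APP_OPTIONS for one HarvesterControllerApplication, as in the sketch below. The helper function harvester_app_options is an assumption for illustration; it only builds the option string and rejects any priority other than the two values named above.

```shell
# Print the application options for one HarvesterControllerApplication.
# Returns 1 if the priority is not one of the two known queue names.
harvester_app_options() {
    http_port=$1   # must differ between applications on the same server
    priority=$2    # HIGH_PRIORITY or LOW_PRIORITY
    case $priority in
        HIGH_PRIORITY|LOW_PRIORITY) ;;
        *) echo "unknown queue priority: $priority" >&2; return 1 ;;
    esac
    echo "-Dsettings.common.http.port=$http_port" \
         "-Dsettings.harvester.harvesting.queuePriority=$priority"
}

# Two applications on the same server, one for each queue.
harvester_app_options 8081 HIGH_PRIORITY
harvester_app_options 8082 LOW_PRIORITY
```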
     
=== bitarchive machines ===
X + Y machines, each located at one of the two 'locations', L1 or L2.
One bitarchiveServer for each machine (?).
Each bitarchiveServer can have storage on several filesystems:
{{{
<fileDir>/home/bitarchiveOne/</fileDir>
<fileDir>/home/bitarchiveTwo/</fileDir>
}}}

{{{
#!/bin/bash
export CLASSPATH=/home/dev/UNITTEST/lib/dk.netarkivet.archive.jar:/home/dev/UNITTEST/lib/dk.netarkivet.viewerproxy.jar:/home/dev/UNITTEST/lib/dk.netarkivet.monitor.jar:$CLASSPATH;
cd /home/dev/UNITTEST
java -Xmx1536m -Ddk.netarkivet.settings.file=/home/dev/UNITTEST/conf/settings.xml -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djava.util.logging.config.file=/home/dev/UNITTEST/conf/log_bitarchiveapplication.prop -Dsettings.common.jmx.port=8100 -Dsettings.common.jmx.rmiPort=8200 -Dsettings.common.jmx.passwordFile=/home/dev/UNITTEST/conf/jmxremote.password dk.netarkivet.archive.bitarchive.BitarchiveApplication < /dev/null > start_bitarchive.sh.log 2>&1 &
}}}



=== Access servers ===
(one or more access servers at both 'locations')

 * IndexServerApplication
 * ViewerProxyApplication

wiki:Appendices

Installation Manual (last edited 2010-08-16 10:24:51 by localhost)