Differences between revisions 12 and 13
Revision 12 as of 2007-06-28 13:26:38
Size: 11581
Comment:
Revision 13 as of 2007-06-28 13:52:33
Size: 17325
Comment:
Installing the NetarchiveSuite software

This manual describes how to install and configure the NetarchiveSuite web archive software package. It includes description of how to obtain and install required libraries, how to install the software on separate machines, what command line options and configuration file changes are necessary, and how to start the programs. It then goes on to explain the other parameters available for tuning the behaviour of NetarchiveSuite. It does not explain how to extend the functionality of the system (see the DeveloperManual for this) or how to use the running system (see the UserManual for this).

The intended audience of this manual is system administrators who will be responsible for the actual installation and setup of NetarchiveSuite as well as technical personnel responsible for proper operation of NetarchiveSuite. Knowledge of Unix system administration is expected, and some familiarity with XML and Java is an advantage.

Choose a setup

NetarchiveSuite can be installed in a number of different ways, with varying numbers of machines on different sites. To keep clear what is necessary for which setups, we will consider the following types of setup:

  • A. Single-machine setup. This corresponds to the setup used in the QuickstartManual, where all applications run on the same machine, and file transfer can be done simply by copying files locally. It is the simplest setup, but does not scale very well. Note that the scripts used in the QuickstartManual reset the system at every restart, including deleting all harvested material. Obviously, this is not the intent for a running installation, so those scripts cannot be used in production environments as they are.

  • B. Single-site setup. In this scenario, multiple machines are involved, necessitating file transfer between machines and multiple installations of the code. However, the machines are expected to be within the same firewall, so port setup should be no problem.
  • C. Single-site setup with duplicate archive. This expands on the single-site setup in that more than one copy of the archive is used, using the concept of separate "locations" to indicate the duplicates.
  • D. Multi-site setup. When more than one site is involved, separated by firewalls, extra issues of opening ports and specifying the correct site come into play. This is the most complex scenario, but also the most secure against systematic errors, hacking, and other disasters.

Setups C and D involve a distributed bitarchive. In these setups the bitarchive is distributed over two Locations, here called LocationA and LocationB. These Locations must be written to the general settings.xml before deployment:

<arcrepository>
      ...
      <!-- The names of all bit archive locations in the
                 environment, e.g., "LocationA" and "LocationB". -->
      <location>
        <name>LocationA</name>
      </location>
      <location>
        <name>LocationB</name>
      </location>
      <!-- Default bit archive to use for batch jobs (if none is specified) -->
      <batchLocation>LocationA</batchLocation>
</arcrepository>

Choose JMS broker

NetarchiveSuite requires a JMS broker. The installation and startup of a JMS broker is described in Appendix A. In the extract of conf/settings.xml below, the broker host is set to localhost (replace it with the host where the broker runs, e.g. machine1.domain), and the broker listens for messages on port 7676. You must also select a JMS environmentName. This allows you to have more than one running installation of NetarchiveSuite, each with its own environmentName, and makes it easy to clean up the JMS queues associated with a given environmentName. NetarchiveSuite currently supports only one kind of JMS broker, so only 'broker', 'port', and 'environmentName' can be changed.

<jms>
    <!-- Selects the broker vendor to be used. -->
    <class>SunMQ</class>
    <!-- The JMS broker host contacted by the JMS connection -->
    <broker>localhost</broker>
    <!-- The port the JMS connection should use -->
    <port>7676</port>
    <!-- The name of the environment in which this code is running, e.g.
         PROD, RELEASETEST, NHC,... Common prefix to all JMS channels -->
    <environmentName>PROD</environmentName>
</jms>

Choose the set of machines taking part in the installation/deployment

When you have chosen your setup, you must decide how many machines you want to use in the deployment of NetarchiveSuite. For setup A, the answer is of course one. For setups B-D, the answer is more complicated.

NetarchiveSuite operates with 4 kinds of machines: admin machines, harvester machines, bitarchive machines, and access servers.

In the standard setup used in our test-environment, we have 9 machines:

1 bitarchive server (on Location A)
2 bitarchive servers (on Location B)

1 admin machine (placed on Location A)
2 harvester-machines (placed on Location A)
2 harvester-machines (placed on Location B)

1 access server (placed on Location A)

Configure monitoring (allocating JMX and RMI ports)

Monitoring the deployed NetarchiveSuite relies on JMX (Java Management Extensions). Each application in NetarchiveSuite needs its own JMX port and associated RMI port, so that it can be monitored from the NetarchiveSuite GUI and with jconsole (see below). You need to select a range for the JMX ports. In the example below, the chosen JMX/RMI range begins at 8100.

Note: The RMI port for a given JMX port is assumed to be the JMX port number + 100.

Firewall note: This requires that the admin machine has access to each machine taking part in the deployment on ports 8100-8300.
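
The "+ 100" convention for RMI ports can be sketched in a few lines of shell. This is only an illustration: the helper name rmi_port_for and the three example ports are assumptions, not part of NetarchiveSuite.

```shell
#!/bin/bash
# Hypothetical helper (not part of NetarchiveSuite): derive the RMI port
# from a JMX port using the "JMX port + 100" convention.
rmi_port_for() {
    local jmx_port=$1
    echo $((jmx_port + 100))
}

# Print the JMX/RMI commandline options for three applications on one machine.
# The port numbers 8100-8102 are example values from the chosen range.
for jmx_port in 8100 8101 8102; do
    echo "-Dsettings.common.jmx.port=${jmx_port} -Dsettings.common.jmx.rmiPort=$(rmi_port_for "${jmx_port}")"
done
```

The printed options can be pasted into the java commandline of each application, so that no two applications on the same machine share a JMX or RMI port.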

You need to select a password for the JMX monitorRole, and replace the string "JMX_MONITOR_ROLE_PASSWORD_PLACEHOLDER" with the selected password in two files: conf/jmxremote.password and the settings file used. When starting an application, we define the path to the JMX password file on the commandline:

  • -Dsettings.common.jmx.passwordFile=INSTALLATION_DIR/conf/jmxremote.password

The JMX ports are registered in the deploy section of the settings.xml used by the HarvestDefinitionApplication (GUI/Scheduler):

<deploy>
<jmxMonitorRolePassword>SELECTED_PASSWORD</jmxMonitorRolePassword>
<numberOfHosts>NUMBER_OF_MACHINES_INVOLVED</numberOfHosts>
<host1>
<name>MACHINE_1</name>
<jmxport>8100</jmxport>
<jmxport>8101</jmxport>
</host1>
...
<hostX>
<name>MACHINE_X</name>
<jmxport>8100</jmxport>
<jmxport>8101</jmxport>
</hostX>
</deploy>

Other configurations

Select a file datatransfer method

As mentioned in Appendix C, you can choose between FTP and HTTP as the file transfer method. Both methods try to use simple filesystem copying whenever possible, to optimize the file transfer. The FTP method requires one or more installed FTP servers (see Appendix A for further details). The XML below is an extract of a settings.xml, in which you have to replace serverName, userName, and userPassword with proper values. [TO BE CORRECTED] If you want to use more than one FTP server, you must use different settings files, or define the values on the commandline when starting the applications.

<remoteFile xsi:type="ftpremotefile">
    <!-- The class to use for RemoteFile objects. -->
    <class>dk.netarkivet.common.distribute.FTPRemoteFile</class>
    <!-- The default FTP-server used -->
    <serverName>hostname</serverName>
    <!-- The default FTP-server port used -->
    <serverPort>21</serverPort>
    <!-- The default FTP username -->
    <userName>exampleusername</userName>
    <!-- The default FTP password -->
    <userPassword>examplepassword</userPassword>
    <!-- The number of times FTPRemoteFile should try before giving up
         a copyTo operation. We augment FTP with checksum checks. -->
    <retries>3</retries>
</remoteFile>

When using HTTP as the file transfer method, you need to reserve an HTTP port on each machine per application for this purpose. Note: The easiest way to set this port at application level is to set it on the commandline:  -Dsettings.common.remoteFile.port=5442

See the XML below for the proper syntax:

<remoteFile xsi:type="httpremotefile">
    <!-- The class to use for RemoteFile objects. -->
    <class>dk.netarkivet.common.distribute.HTTPRemoteFile</class>
    <!-- Port for embedded HTTP server -->
    <port>5442</port>
</remoteFile>
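
Because each application on a machine needs its own HTTP file transfer port, it can help to derive the ports from a single base value. The following is a minimal sketch under stated assumptions: the base port 5442 is taken from the example above, while the helper name remote_file_port_option and the numbering scheme (base port plus the application's index on the machine) are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: give every application on one machine its own
# HTTP file transfer port, counting upwards from an assumed base port.
base_port=5442   # example value; choose a port that is free on your machines

remote_file_port_option() {
    local index=$1   # 0 for the first application on the machine, 1 for the second, ...
    echo "-Dsettings.common.remoteFile.port=$((base_port + index))"
}

# Example: three applications deployed on the same machine.
index=0
for app in HarvestDefinitionApplication BitarchiveMonitorApplication ArcRepositoryApplication; do
    echo "${app}: $(remote_file_port_option "${index}")"
    index=$((index + 1))
done
```

Each printed option would be added to the java commandline of the corresponding application.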

Configure scheduling (schedule interval)

By default, scheduling takes place every minute, unless the previous scheduling run has not yet finished.

Configure job-generation and harvesting

The actual deployment of the software

We assume that all machines in the chosen setup are Unix servers. The procedure below may not work on other platforms. After creating the new settings files to be used in the deployment of the software (possibly more than one: one for the HarvestDefinitionApplication, ..), copy the modified NetarchiveSuite.zip to all machines taking part in the deployment:

export USER=test
# Space-separated list of every machine taking part in the deployment
export MACHINES="machine1.domain1 machine2.domain1 machine1.domain2 machine2.domain2"
for machine in $MACHINES; do
    scp NetarchiveSuite.zip "$USER@$machine:"
done

Standard commandline settings

The CLASSPATH needed to start and run the Java applications in NetarchiveSuite consists of 4 jar files: dk.netarkivet.harvester.jar, dk.netarkivet.archive.jar, dk.netarkivet.viewerproxy.jar, and dk.netarkivet.monitor.jar. The dk.netarkivet.common.jar and all our 3rd party dependencies need not be added explicitly to the CLASSPATH, as they are referenced indirectly in the jar files.

export NetarchiveSuiteDir=/path/to/netarchiveSuite
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.harvester.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.archive.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.viewerproxy.jar
export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.monitor.jar

admin machine

On the admin machine, we have to deploy the following applications:

  • HarvestDefinitionApplication (starts the GUI and the scheduler):

    • Running the following will start the HarvestDefinitionApplication:

              export NetarchiveSuiteDir=/path/to/netarchiveSuite
              export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.harvester.jar
              export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.archive.jar
              export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.viewerproxy.jar
              export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.monitor.jar

              cd $NetarchiveSuiteDir
              # The java command is one logical line; each backslash continues it.
              java -Xmx1536m -Ddk.netarkivet.settings.file=$NetarchiveSuiteDir/conf/settings.xml \
                      -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
                      -Djava.util.logging.config.file=$NetarchiveSuiteDir/conf/log_harvestdefinitionapplication.prop \
                      -Dsettings.common.jmx.port=8100 \
                      -Dsettings.common.jmx.rmiPort=8200 \
                      -Dsettings.common.jmx.passwordFile=$NetarchiveSuiteDir/conf/jmxremote.password \
                      dk.netarkivet.harvester.webinterface.HarvestDefinitionApplication
  • BitarchiveMonitorApplication (two, one for each Location, here called "LocationA" and "LocationB"):

    • To distinguish the two monitors from each other, we need to define settings.common.http.port, which is used as an identifier. Associated with the bitarchive at each Location there is a credentials code in plain text.
        • Start the monitor for bitarchive at 'LocationA' thus:
                  export NetarchiveSuiteDir=/path/to/netarchiveSuite
                  export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.archive.jar
                  export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.viewerproxy.jar
                  export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.monitor.jar
                  cd $NetarchiveSuiteDir
                  # The java command is one logical line; each backslash continues it.
                  java -Xmx1536m -Ddk.netarkivet.settings.file=$NetarchiveSuiteDir/conf/settings_bamonitor_kb.xml \
                          -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
                          -Djava.util.logging.config.file=$NetarchiveSuiteDir/conf/log_bitarchivemonitorapplication.prop \
                          -Dsettings.common.http.port=8081 \
                          -Dsettings.common.archive.bitarchive.thisLocation=LocationA \
                          -Dsettings.common.archive.bitarchive.thisCredentials=SELECTED_CREDENTIALS_CODE_FOR_LOCATION_A \
                          -Dsettings.common.jmx.port=8102 \
                          -Dsettings.common.jmx.rmiPort=8202 \
                          -Dsettings.common.jmx.passwordFile=$NetarchiveSuiteDir/conf/jmxremote.password \
                          dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
        • Start the monitor for bitarchive at 'LocationB' thus:
                    export NetarchiveSuiteDir=/path/to/netarchiveSuite
                    export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.archive.jar
                    export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.viewerproxy.jar
                    export CLASSPATH=$CLASSPATH:$NetarchiveSuiteDir/lib/dk.netarkivet.monitor.jar
                    cd $NetarchiveSuiteDir
                    # Options, in order: logging, special application settings, JMX settings.
                    java -Xmx1536m -Ddk.netarkivet.settings.file=$NetarchiveSuiteDir/conf/settings_bamonitor_kb.xml \
                            -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
                            -Djava.util.logging.config.file=$NetarchiveSuiteDir/conf/log_bitarchivemonitorapplication.prop \
                            -Dsettings.common.http.port=8082 \
                            -Dsettings.common.archive.bitarchive.thisLocation=LocationB \
                            -Dsettings.common.archive.bitarchive.thisCredentials=SELECTED_CREDENTIALS_CODE_FOR_LOCATION_B \
                            -Dsettings.common.jmx.port=8103 \
                            -Dsettings.common.jmx.rmiPort=8203 \
                            -Dsettings.common.jmx.passwordFile=$NetarchiveSuiteDir/conf/jmxremote.password \
                            dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
  • one ARCRepository (this takes care of all access to the bitarchives):
                 export CLASSPATH=..
                 ex
                 dk.netarkivet.archive.arcrepository.ArcRepositoryApplication

harvester machines

On each harvester machine, we have one or more HarvesterControllerApplications. Each HarvesterControllerApplication has its own sidekick application that checks whether the application is alive. If it is not, the sidekick restarts the HarvesterControllerApplication.

The settings related to the HarvesterControllerApplication are:

  • settings.common.http.port (distinguishes between HarvesterControllerApplications running on the same server)
  • settings.harvester.harvesting.queuePriority (selects which of two queues to accept jobs from: HIGH_PRIORITY (jobs that are part of a selective harvest) or LOW_PRIORITY (jobs that are part of a snapshot harvest))
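
As a rough illustration of how these two settings combine, the sketch below prints the options for two HarvesterControllerApplications on the same server, one per queue. The helper name harvester_options and the HTTP ports 8081 and 8082 are assumptions chosen for the example. The priority values are the ones named above and should be checked against your settings.xml.

```shell
#!/bin/bash
# Hypothetical sketch: build the per-instance commandline options for a
# HarvesterControllerApplication from its HTTP port and queue priority.
harvester_options() {
    local http_port=$1   # must be unique per application on the same server
    local priority=$2    # HIGH_PRIORITY or LOW_PRIORITY, as named in the text
    echo "-Dsettings.common.http.port=${http_port} -Dsettings.harvester.harvesting.queuePriority=${priority}"
}

# Two instances on one server: one for selective harvests, one for snapshot harvests.
harvester_options 8081 HIGH_PRIORITY
harvester_options 8082 LOW_PRIORITY
```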
             
bitarchive machines

X + Y machines, each located at either 'location' L1 or L2. One bitarchiveServer for each machine (?). Each bitarchiveServer can have storage on several filesystems:

<fileDir>/home/bitarchiveOne/</fileDir>
<fileDir>/home/bitarchiveTwo/</fileDir>

#!/bin/bash
export CLASSPATH=/home/dev/UNITTEST/lib/dk.netarkivet.archive.jar:/home/dev/UNITTEST/lib/dk.netarkivet.viewerproxy.jar:/home/dev/UNITTEST/lib/dk.netarkivet.monitor.jar:$CLASSPATH
cd /home/dev/UNITTEST
java -Xmx1536m -Ddk.netarkivet.settings.file=/home/dev/UNITTEST/conf/settings.xml \
    -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
    -Djava.util.logging.config.file=/home/dev/UNITTEST/conf/log_bitarchiveapplication.prop \
    -Dsettings.common.jmx.port=8100 \
    -Dsettings.common.jmx.rmiPort=8200 \
    -Dsettings.common.jmx.passwordFile=/home/dev/UNITTEST/conf/jmxremote.password \
    dk.netarkivet.archive.bitarchive.BitarchiveApplication < /dev/null > start_bitarchive.sh.log 2>&1 &

Access servers

(one or more access servers at both Locations)

See also InstallationManualAppendices.

Installation Manual (last edited 2010-08-16 10:24:51 by localhost)