The Deploy Configuration File

The deploy configuration file contains the definitions for the installation and distribution of NetarchiveSuite. This involves the scopes for the levels in the figure below, and their settings.

This figure also shows the pattern of inheritance of the settings (physicalLocation inherits settings and parameters from deployGlobal, deployMachine inherits from physicalLocation, etc.).

[Figure: layers.gif]

Each level can contain several instances of the level below it.

Settings scope

The settings scope is described in the Configuration Manual for NetarchiveSuite. It is no longer necessary to define every variable within the settings scope explicitly for an application, since undefined variables are replaced by the default settings when the application is run.

Each level (in the figure at the beginning of this section) inherits the settings from the level above it (up to deployGlobal), but only the variables which are not explicitly defined at the current level. The content of the settings scope at the application level (level 4) is written to an application-specific settings file, which is used for running the application.

Some parts of the settings scope are used by deploy; they are described in the following sections.

Deploy scope

The levels in the figure can have an instance of the settings scope defined. These settings are inherited through the hierarchy.

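The scope levels of Deploy, from the outermost to the innermost, are deployGlobal, thisPhysicalLocation, deployMachine and applicationName.

One level can have several instances of a lower level, but not vice versa (e.g. a deployMachine can have several applicationName scopes).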

This will look like the following:

<deployGlobal>
    <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachine name="myMachine" os="linux">
            <applicationName name="myApplication">
            </applicationName>
            <applicationName name="myOtherApplication">
            </applicationName>
        </deployMachine>
        <deployMachine name="myOtherMachine" os="windows">
            <applicationName name="myApplication">
            </applicationName>
        </deployMachine>
    </thisPhysicalLocation>
</deployGlobal>

This configuration has one physical location with two machines, one with Linux/Unix and one with Windows. The Linux/Unix machine has two applications, 'myApplication' and 'myOtherApplication', while the Windows machine has only one application, 'myApplication'.
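
As a rough sketch of how a settings scope can be attached to these levels, the fragment below defines a setting once at the deployGlobal level and overrides it for a single application. It assumes the settings.common.http.port setting mentioned later in this manual; the port values and the names are hypothetical and only serve to illustrate the inheritance.

```xml
<deployGlobal>
    <settings>
        <common>
            <http>
                <!-- Hypothetical value, inherited by every scope below. -->
                <port>8076</port>
            </http>
        </common>
    </settings>
    <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachine name="myMachine">
            <!-- No settings here: this application inherits port 8076. -->
            <applicationName name="myApplication">
            </applicationName>
            <applicationName name="myOtherApplication">
                <settings>
                    <common>
                        <http>
                            <!-- Hypothetical override, valid for this application only. -->
                            <port>8077</port>
                        </http>
                    </common>
                </settings>
            </applicationName>
        </deployMachine>
    </thisPhysicalLocation>
</deployGlobal>
```

Under the inheritance rules described above, the settings file generated for 'myApplication' would contain port 8076, while the one for 'myOtherApplication' would contain 8077.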

Parameters

Each of the above scopes can have several of the following parameters defined. They are inherited from the parent scope in the same way as settings.

The parameter scopes the levels can have:

An example of how this works is given below.

<deployGlobal>
    <deployClassPath>lib/dk.netarkivet.common.jar</deployClassPath>
    <deployClassPath>lib/dk.netarkivet.archive.jar</deployClassPath>
    <deployJavaOpt>-Xmx1536m</deployJavaOpt>
    <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachineUserName>myUserName</deployMachineUserName>
        <deployMachine name="myLinuxMachine">
            <deployInstallDir>/home/myUserName/myInstallationDirectory</deployInstallDir>
            <deployDatabaseDir>myDatabaseDir</deployDatabaseDir>
            <settings>
                <common>
                    <database>
                         <url>jdbc:derby:myDatabaseDir/fullhddb</url>
                    </database>
                </common>
            </settings>
            <applicationName name="myLinuxApplication">
            </applicationName>
        </deployMachine>
        <deployMachine name="myWindowsMachine" os="windows">
            <deployInstallDir>C:\myInstallationDirectory</deployInstallDir>
            <deployJavaOpt>-Xmx1150m</deployJavaOpt>
            <applicationName name="myWindowsApplication">
                <deployClassPath>lib/dk.netarkivet.common.jar</deployClassPath>
                <deployClassPath>lib/dk.netarkivet.harvester.jar</deployClassPath>
                <deployClassPath>lib/dk.netarkivet.viewerproxy.jar</deployClassPath>
            </applicationName>
        </deployMachine>
    </thisPhysicalLocation>
</deployGlobal>

This defines two different machines, each with a single application. The machines have different operating systems (one with Windows and one with Linux), and therefore they have different installation directories and Java options.

The Linux machine inherits the Java option -Xmx1536m from the physical location, which inherits it from deployGlobal. The Windows machine has its own Java option specified and therefore does not inherit the deployGlobal Java option.

The deployDatabaseDir is only specified on the Linux machine, and the database will therefore be unpacked only on this machine. The setting settings.common.database.url specifies the type of the database and where it is found after it has been unpacked. If a specific database is not given as a parameter when calling deploy, the default Derby database 'fullhddb.jar' is used.

The application myLinuxApplication on the Linux machine does not have any class paths specified, and therefore inherits lib/dk.netarkivet.common.jar and lib/dk.netarkivet.archive.jar all the way from deployGlobal (through thisPhysicalLocation and deployMachine).

myWindowsApplication on the Windows machine, on the other hand, does not inherit these libraries, since it has its own class paths specified. It has the libraries lib/dk.netarkivet.common.jar, lib/dk.netarkivet.harvester.jar and lib/dk.netarkivet.viewerproxy.jar in the class path, and therefore does not have lib/dk.netarkivet.archive.jar, since it is neither specified nor inherited.

The myLinuxApplication will be called with the following command:

java -Xmx1536m -cp lib/dk.netarkivet.common.jar:lib/dk.netarkivet.archive.jar myLinuxApplication

The myWindowsApplication will be called with the following command:

java -Xmx1150m -cp lib/dk.netarkivet.common.jar;lib/dk.netarkivet.harvester.jar;lib/dk.netarkivet.viewerproxy.jar myWindowsApplication

The class paths are separated with ':' on Linux/Unix and with ';' on Windows.

Application Instance Id

The setting settings.common.applicationInstanceId defines the identification of a single application instance (e.g. the suffix for application-specific scripts, the suffix for the directory in which files are placed, etc.). This is needed when several instances of the same application are placed on the same machine (e.g. BitarchiveMonitors).

An example of two identical applications with different application instance id on the same machine is given below:

<deployGlobal>
    <thisPhysicalLocation name="myPhysicalLocation">
        <deployMachine name="myMachine">
            <applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveApplication">
                <settings>
                    <common>
                        <applicationInstanceId>myFirstInstance</applicationInstanceId>
                    </common>
                </settings>
            </applicationName>
            <applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveApplication">
                <settings>
                    <common>
                        <applicationInstanceId>mySecondInstance</applicationInstanceId>
                    </common>
                </settings>
            </applicationName>
        </deployMachine>
    </thisPhysicalLocation>
</deployGlobal>

These applications will be called BitarchiveApplication_myFirstInstance and BitarchiveApplication_mySecondInstance respectively.

Limitations and Requirements

The limitations and requirements for the configuration of the applications can be found in Configuration Manual. Specific for deploy are the following:

The deploy configuration has the following limitations in comparison to the manual installation.

And deploy has the following requirements:

Configuration example

Here is an example of a configuration file for deploy.

deploy_distributed_example_database.xml

The rest of this section describes how to change this configuration file template to fit your specific system. It describes, scope by scope, how to make the changes for a system with the same structure, and how to expand the scopes with new machines and applications.

Deploy Global

The deployGlobal scope contains two parts: the parameters and the settings.

Leave the <deployClassPath> parameters as they are, since they will be overridden for the applications which need other libraries. The <deployJavaOpt>-Xmx1536m</deployJavaOpt> parameter sets the maximum heap size to 1.5 GB (1536 MB). This value should not be larger than the amount of memory available on a machine.

Within the settings scope of deployGlobal the following needs to be done.

The environment name does not have to be changed for the system to work, though it is usually a good idea to change it to a more appropriate name for the installation or system. This is the setting 'settings.common.environmentName'.

    <settings>
        <common>
            <environmentName>test</environmentName>
        </common>
    </settings>

The replicas should be changed to fit the system. A replica will generally be connected to a specific physical location, though a physical location can have several replicas. These settings can be found under 'settings.common.replicas'.

    <settings>
        <common>
            <replicas>
                <replica>
                    <replicaId>A</replicaId>
                    <replicaName>ReplicaA</replicaName>
                    <replicaType>bitArchive</replicaType>
                </replica>
                <replica>
                    <replicaId>B</replicaId>
                    <replicaName>ReplicaB</replicaName>
                    <replicaType>bitArchive</replicaType>
                </replica>
            </replicas>
        </common>
    </settings>

The JMS broker is defined at the global level, and it should be set to the administration machine, i.e. the machine on which the 'dk.netarkivet.common.webinterface.GUIApplication', the 'dk.netarkivet.archive.arcrepository.ArcRepositoryApplication' and the instances of 'dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication' are run. This is defined in the setting 'settings.common.jms.broker'.

        <settings>
            <common>
                <jms>
                    <broker>kb-test-adm-001.kb.dk</broker>
                </jms>
            </common>
        </settings>

If more replicas are wanted, they have to be defined in the settings at the deployGlobal level. Each replica needs a unique replicaId and replicaName, and it also needs the following applications: dk.netarkivet.archive.bitarchive.BitarchiveApplication and dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication.
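
As an illustration of adding a replica, the sketch below extends the replica list with a third entry. The replicaId 'C' and the replicaName 'ReplicaC' are hypothetical values chosen for this example; the replicaType is simply copied from the existing replicas and may need to differ in your system.

```xml
<settings>
    <common>
        <replicas>
            <replica>
                <replicaId>A</replicaId>
                <replicaName>ReplicaA</replicaName>
                <replicaType>bitArchive</replicaType>
            </replica>
            <replica>
                <replicaId>B</replicaId>
                <replicaName>ReplicaB</replicaName>
                <replicaType>bitArchive</replicaType>
            </replica>
            <!-- Hypothetical additional replica; id and name must be unique. -->
            <replica>
                <replicaId>C</replicaId>
                <replicaName>ReplicaC</replicaName>
                <replicaType>bitArchive</replicaType>
            </replica>
        </replicas>
    </common>
</settings>
```

The new replica only becomes usable once a BitarchiveApplication and a BitarchiveMonitorApplication referring to it have also been added, as stated above.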

Physical Locations

The configuration example file has two physical locations: EAST and WEST. Every physical location needs to have a unique name.

    <thisPhysicalLocation name="EAST">
        ...
    </thisPhysicalLocation>
    <thisPhysicalLocation name="WEST">
        ...
    </thisPhysicalLocation>

For the settings of a physical location the following needs to be done. A physical location needs to know which replica it uses. This replicaId has to be among the replicas defined in the deployGlobal scope. It has the path 'settings.common.useReplicaId'.

        <settings>
            <common>
                <useReplicaId>A</useReplicaId>
            </common>
        </settings>

If using FTPRemoteFile, it is necessary to specify a machine on which an FTP server is running, together with valid login credentials, for example:

                <remoteFile>
                    <serverName>kb-test-har-001.kb.dk</serverName>
                    <userName>ftptestuser</userName>
                    <userPassword>ftptestpasswd</userPassword>
                </remoteFile>

The notification settings should be set up to tell where mails should be sent. The receiver should be changed to the mail address of the administrator of the system.

                <notifications>
                    <sender>example@netarkivet.dk</sender>
                    <receiver>example@netarkivet.dk</receiver>
                </notifications>

It is currently not possible to have more than two physical locations. This limitation is expected to be removed in a future release.
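
Putting the fragments above together, the settings of one physical location could look roughly like the following sketch. It assumes that the remoteFile and notifications scopes both belong under settings.common, as the indentation of the fragments above suggests; the host name, credentials and mail addresses are the example values from above and must be replaced.

```xml
<thisPhysicalLocation name="EAST">
    <settings>
        <common>
            <!-- Must match a replicaId defined in the deployGlobal scope. -->
            <useReplicaId>A</useReplicaId>
            <!-- Only relevant when FTPRemoteFile is used. -->
            <remoteFile>
                <serverName>kb-test-har-001.kb.dk</serverName>
                <userName>ftptestuser</userName>
                <userPassword>ftptestpasswd</userPassword>
            </remoteFile>
            <notifications>
                <sender>example@netarkivet.dk</sender>
                <receiver>example@netarkivet.dk</receiver>
            </notifications>
        </common>
    </settings>
    <!-- The deployMachine scopes of this location follow here. -->
</thisPhysicalLocation>
```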

Machine

The name of a machine has to be changed to its network ID, i.e. either its network name or its IP address. The 'os' attribute should only be set for Windows machines, which can only run applications of the type dk.netarkivet.archive.bitarchive.BitarchiveApplication.

        <deployMachine os="windows" name="kb-dev-bar-011.bitarkiv.kb.dk">

A machine needs to have the following parameters defined (they can also be defined at the physicalLocation level and then just be inherited). Change them to fit the machine:

        <deployMachineUserName>test</deployMachineUserName>
        <deployInstallDir>/home/test</deployInstallDir>

No settings are required specifically at the machine level; everything needed is inherited from the outer scopes. There are therefore no machine-level settings to change to fit your system.

A new machine has to be created within a physical location scope. It requires the name attribute, and the parameters deployMachineUserName and deployInstallDir have to be defined or inherited. The parameter deployDatabaseDir is required if the machine runs an application which requires a database.
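
A new machine could be added along the following lines. This is only a sketch: the machine name 'my-new-machine.example.org' and the database directory are hypothetical, and the user name and installation directory are the example values used above.

```xml
<thisPhysicalLocation name="EAST">
    <!-- Defined once here and inherited by every machine at this location. -->
    <deployMachineUserName>test</deployMachineUserName>
    <!-- Hypothetical new machine; use its network name or IP address. -->
    <deployMachine name="my-new-machine.example.org">
        <deployInstallDir>/home/test</deployInstallDir>
        <!-- Only needed if an application on this machine requires a database. -->
        <deployDatabaseDir>myDatabaseDir</deployDatabaseDir>
        <!-- The applicationName scopes of this machine follow here. -->
    </deployMachine>
</thisPhysicalLocation>
```

Defining deployMachineUserName at the physical location level is a choice made for this example; it can equally well be given inside each deployMachine scope.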

Application

All applications need the following settings defined under settings.common.jmx:

                            <port>8100</port>
                            <rmiPort>8300</rmiPort>

These port values must be unique on the machine where the application runs.

A new application needs the name attribute set to the application's class name as it appears in the class path, e.g.:
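
In context, the two JMX ports sit inside the settings scope of each application. The sketch below shows two applications on one machine with non-overlapping ports; the machine and application names are placeholders, and the second pair of port numbers is a hypothetical choice.

```xml
<deployMachine name="myMachine">
    <applicationName name="myApplication">
        <settings>
            <common>
                <jmx>
                    <port>8100</port>
                    <rmiPort>8300</rmiPort>
                </jmx>
            </common>
        </settings>
    </applicationName>
    <applicationName name="myOtherApplication">
        <settings>
            <common>
                <jmx>
                    <!-- Different from the ports of myApplication on this machine. -->
                    <port>8101</port>
                    <rmiPort>8301</rmiPort>
                </jmx>
            </common>
        </settings>
    </applicationName>
</deployMachine>
```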

            <applicationName name="dk.netarkivet.common.webinterface.GUIApplication">

It is important to note that when a new application is added to a machine which already has an application of the same type, these applications must have settings.common.applicationInstanceId defined with different values.

Some of the applications require specific settings to be defined. These are described in the following subsections.

BitarchiveApplication

The dk.netarkivet.archive.bitarchive.BitarchiveApplication requires the setting settings.archive.bitarchive.baseFileDir to be defined. This path should be changed to fit your system, and it has to be changed if the drive/partition in the path does not exist on the machine.
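
Following the setting path given above, the definition could be sketched as below. The directory '/home/test/bitarchive' is a hypothetical value and should be replaced by a path on a drive or partition that exists on the machine.

```xml
<applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveApplication">
    <settings>
        <archive>
            <bitarchive>
                <!-- Hypothetical path; must exist on this machine. -->
                <baseFileDir>/home/test/bitarchive</baseFileDir>
            </bitarchive>
        </archive>
    </settings>
</applicationName>
```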

HarvestControllerApplication

For the dk.netarkivet.harvester.harvesting.HarvestControllerApplication the following settings defined under settings.harvester.harvesting.heritrix should be changed to fit your system: guiPort and jmxPort.

A new instance of the dk.netarkivet.harvester.harvesting.HarvestControllerApplication requires the setting settings.harvester.harvesting.queuePriority to be set to either LOWPRIORITY or HIGHPRIORITY. A system requires at least one HarvestControllerApplication with each priority.
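
The sketch below shows one way two harvest controllers, one per priority, might be defined on the same machine. The instance ids and the port numbers are hypothetical; since both applications are of the same type and share a machine, each also gets its own applicationInstanceId and its own pair of Heritrix ports.

```xml
<applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
    <settings>
        <common>
            <applicationInstanceId>low</applicationInstanceId>
        </common>
        <harvester>
            <harvesting>
                <queuePriority>LOWPRIORITY</queuePriority>
                <heritrix>
                    <!-- Hypothetical ports. -->
                    <guiPort>8190</guiPort>
                    <jmxPort>8191</jmxPort>
                </heritrix>
            </harvesting>
        </harvester>
    </settings>
</applicationName>
<applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
    <settings>
        <common>
            <applicationInstanceId>high</applicationInstanceId>
        </common>
        <harvester>
            <harvesting>
                <queuePriority>HIGHPRIORITY</queuePriority>
                <heritrix>
                    <!-- Hypothetical ports, distinct from the first instance. -->
                    <guiPort>8192</guiPort>
                    <jmxPort>8193</jmxPort>
                </heritrix>
            </harvesting>
        </harvester>
    </settings>
</applicationName>
```

The JMX settings under settings.common.jmx, which every application needs, are left out here for brevity.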

IndexServerApplication and ViewerProxyApplication

Both the dk.netarkivet.archive.indexserver.IndexServerApplication and dk.netarkivet.viewerproxy.ViewerProxyApplication should have the settings.common.http.port and the settings.viewerproxy.baseDir changed to fit your system.
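
For the ViewerProxyApplication, the two settings could be sketched as follows; the same structure would apply to the IndexServerApplication. The port number and the directory name are hypothetical values.

```xml
<applicationName name="dk.netarkivet.viewerproxy.ViewerProxyApplication">
    <settings>
        <common>
            <http>
                <!-- Hypothetical port; must be free on this machine. -->
                <port>8076</port>
            </http>
        </common>
        <viewerproxy>
            <!-- Hypothetical directory name. -->
            <baseDir>viewerproxy</baseDir>
        </viewerproxy>
    </settings>
</applicationName>
```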

BitarchiveMonitorApplication

All the instances of dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication should be placed on the same machine as the dk.netarkivet.common.webinterface.GUIApplication. Each of these applications monitors the BitarchiveApplications of a given replica, though it does not have to be at the same physical location as them. They should therefore have settings.common.useReplicaId defined.
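
As a sketch, two monitors for the replicas A and B from the example above could be defined on the administration machine as shown below. The applicationInstanceId values are hypothetical; they are included because both applications are of the same type and run on the same machine.

```xml
<applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication">
    <settings>
        <common>
            <!-- Hypothetical instance id. -->
            <applicationInstanceId>monitorA</applicationInstanceId>
            <!-- This monitor watches the bitarchives of replica A. -->
            <useReplicaId>A</useReplicaId>
        </common>
    </settings>
</applicationName>
<applicationName name="dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication">
    <settings>
        <common>
            <!-- Hypothetical instance id. -->
            <applicationInstanceId>monitorB</applicationInstanceId>
            <!-- This monitor watches the bitarchives of replica B. -->
            <useReplicaId>B</useReplicaId>
        </common>
    </settings>
</applicationName>
```

Setting useReplicaId at the application level overrides the value inherited from the physical location, which is what allows a monitor to watch a replica at another location.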

Installation Manual 3.12/Configuration File (last edited 2010-09-14 08:32:21 by TueLarsen)