Functionality of the Deploy Software


The main function of deploy is to install and configure NetarchiveSuite on a distributed system. This is done through scripts to install, start and stop the applications of NetarchiveSuite based on a configuration file for the system. A sample file is provided with NetarchiveSuite in the file conf/it_conf_example.xml.

The figure below shows the hierarchy of the instances in the deploy configuration file.

layers.gif

Terminology

Running deploy

The Deploy module has to be run from a Linux/Unix machine, since the scripts for handling the physical locations only work on this platform. Some of the applications are supported on Windows, so machines running Windows can be part of the distributed system. The machine where the deployment takes place cannot be one of them, however, since the deployment is done through Bash scripts, which only run on Linux/Unix.

The figure below shows what happens when the deploy application is run.

deploy_step1.gif

Deploy arguments

Deploy takes the following arguments:

Other dependencies

Deploy requires the following libraries in the classpath:

Deploy uses Java 1.6, which therefore has to be on the path before the Java application is called.

Example

The complete call for running deploy will therefore be the following (with lib/ being the directory for the libraries):

export JAVA_HOME=/usr/java/jdk1.6.0_07
export PATH=$JAVA_HOME/bin:$PATH
java -cp lib/dk.netarkivet.deploy.jar:lib/dk.netarkivet.archive.jar:lib/dk.netarkivet.common.jar:lib/dk.netarkivet.harvester.jar:lib/dk.netarkivet.monitor.jar:lib/dk.netarkivet.viewerproxy.jar:lib/dom4j-1.5.2.jar:lib/commons-logging-1.0.4.jar:lib/commons-cli-1.0.jar:lib/jaxen-1.1.jar dk.netarkivet.deploy.DeployApplication -Cdeploy_config.xml -ZNetarchiveSuite.zip -Ssecurity.policy -Llog.prop

where deploy_config.xml is the name and path to the configuration file, NetarchiveSuite.zip is the path of the NetarchiveSuite package, security.policy is the path of the security policy file and log.prop is the path of the property file for logging. Java version 1.6.0_07 is specifically called here, though any Java version above 1.6.0 is usable.

Files

When deploy is run a number of files are created in the output directory. This involves scripts to install, start and kill the applications on the distributed platform. Also the NetarchiveSuite package file is copied to this location (unless it already exists in the output directory).

In addition to a NetarchiveSuite settings file, the following configuration files are also created on a per-machine or per-application basis:

Jmxremote password file

This file is created from scratch for each machine. A large instructional header on the use of jmxremote.password is written to the file first, then the JMX username and password for the monitor and for Heritrix are appended. Only the JMX logins (username and password) are used by the applications.

The login variables for the monitor are found through the paths in the settings for any of the applications: settings.monitor.jmxUsername and settings.monitor.jmxPassword.

The login variables for heritrix are found through the paths in any of the application settings: settings.harvester.harvesting.heritrix.jmxUsername and settings.harvester.harvesting.heritrix.jmxPassword.

If any application has a monitor defined in the settings file, the monitor must have a JMX login defined. The monitor JMX login has to be the same for all applications on a machine. The same applies to the Heritrix JMX login, though the monitor JMX login and the Heritrix JMX login do not have to be identical to each other.
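The assembly of the file described above can be sketched as follows. This is only an illustration and not the code Deploy runs: the function name, the placeholder header text and the example logins are assumptions. It relies on the standard jmxremote.password layout of one "username password" pair per line.

```shell
# Hypothetical helper: writes a header followed by the two JMX logins.
# Arguments: output file, monitor user, monitor password,
#            Heritrix user, Heritrix password.
write_jmx_password_file() {
    out_file=$1
    {
        # Placeholder for the large instructional header.
        printf '%s\n' "# (instructional header for jmxremote.password)"
        # One login per line: username, whitespace, password.
        printf '%s %s\n' "$2" "$3"
        printf '%s %s\n' "$4" "$5"
    } > "$out_file"
}
```

A call such as write_jmx_password_file jmxremote.password monitorRole secret1 controlRole secret2 would produce a three-line file; the logins shown are made up.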

Log property file

A log property file for each application is created. This file is given as input and it is changed to fit the application.

The only change made to the log property file is that the tag APPID is replaced by the identification of the application: applicationName + ["_" + applicationInstanceId]. The part ["_" + applicationInstanceId] is only appended to the applicationName if the application has an applicationInstanceId defined.

The name of this application-specific log property file is "log_" + applicationIdentification + ".prop", where applicationIdentification is applicationName + ["_" + applicationInstanceId], as described above.
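The naming rule can be illustrated with a few small shell functions. This is a sketch under stated assumptions: the function names and the application names in the usage note are hypothetical, and the real substitution is performed inside Deploy rather than by a script like this.

```shell
# Hypothetical helper: applicationName, plus "_" + applicationInstanceId
# when an instance id is given as the second argument.
application_id() {
    name=$1
    instance_id=${2:-}
    if [ -n "$instance_id" ]; then
        printf '%s_%s\n' "$name" "$instance_id"
    else
        printf '%s\n' "$name"
    fi
}

# Name of the application-specific log property file.
log_property_name() {
    printf 'log_%s.prop\n' "$(application_id "$@")"
}

# Reads a log property template on stdin and replaces the APPID tag.
fill_log_template() {
    sed "s/APPID/$(application_id "$@")/g"
}
```

For example, log_property_name HarvestControllerApplication low yields log_HarvestControllerApplication_low.prop, while an application without an instance id keeps just its name.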

Security policy file

The security policy file for a machine is initially a copy of the security policy file given as argument. This machine-specific security policy file is then modified to suit the needs of the machine and its applications.

The tag ROLE is replaced by the monitor.jmxUsername for the machine. This has to be defined on the machine level in the deploy configuration file.

Permission to read the baseFileDir under bitarchive is granted for all applications. The paths to these directories are changed to fit the syntax of the security policy file: the directory separator ('/' for Linux and '\' for Windows) is replaced by '${/}'.
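The separator rewrite can be sketched with sed. The function name is hypothetical and the example paths are made up; the sketch only shows the text transformation, not how Deploy writes the permission entries.

```shell
# Hypothetical helper: replaces every '/' or '\' in a path with '${/}',
# the platform-independent separator used in security policy files.
to_policy_path() {
    printf '%s\n' "$1" | sed 's#[/\\]#${/}#g'
}
```

With this sketch, /home/test/bitarchive becomes ${/}home${/}test${/}bitarchive and C:\data\bitarchive becomes C:${/}data${/}bitarchive.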

Evaluate

It is possible to evaluate the content of the configuration file when deploying, by giving the '-E' parameter with the argument 'y' or 'yes'. This is a tool for finding mistakes in a configuration file (e.g. a misspelled name or a wrongly placed branch).

The evaluation checks whether all the branches in the configuration file can be found in the default settings, and issues a warning for each one it cannot find. It does not check whether the content of these branches is correct (e.g. http-port = -1); it only checks whether the branches also exist in the default settings.

Deploy does not terminate when unknown branches are found. It only generates warnings about each unknown branch and then continues with the deployment.

Some modules have plugins which use values in the settings that are not part of the default settings, and these will therefore be reported as unknown. Such plugin-specific branches should not be considered errors, even though warnings are issued about them.

Test instance

When test arguments are given, a new configuration file is created, with _test appended to the name (e.g. deploy_config.xml will have the test instance configuration file deploy_config_test.xml).

The following test arguments are given: test_HttpOffsetPort, test_HttpPort, test_EnvironmentName, and test_Mailreceivers. These arguments are given without spaces between them, in the above order. An Offset variable is calculated as the difference between test_HttpPort and test_HttpOffsetPort (i.e. Offset = test_HttpPort - test_HttpOffsetPort). The value of this Offset must be between 0 and 9.

The test arguments are applied to the it_config_test file, where the following changes are made:

Path                                              index
settings.common.jmx.port                          3
settings.common.jmx.rmiPort                       3
settings.harvester.harvesting.heritrix.guiPort    2
settings.harvester.harvesting.heritrix.jmxPort    2

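Judging from the example values, the digit at the given index of the port number (counting from the left, starting at 1) is replaced by the Offset. E.g. with Offset = 7, a settings.common.jmx.port = 1234 (index 3) will yield a new settings.common.jmx.port = 1274 for the test instance, whereas a settings.harvester.harvesting.heritrix.jmxPort = 1234 (index 2) will yield a new settings.harvester.harvesting.heritrix.jmxPort = 1734.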
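The arithmetic can be checked with a short shell sketch. Both function names are hypothetical, and the digit replacement is inferred from the example values above rather than taken from the Deploy source.

```shell
# Hypothetical helper: Offset = test_HttpPort - test_HttpOffsetPort.
# Fails unless the result lies between 0 and 9.
test_offset() {
    offset=$(( $1 - $2 ))
    if [ "$offset" -lt 0 ] || [ "$offset" -gt 9 ]; then
        echo "Offset must be between 0 and 9, got $offset" >&2
        return 1
    fi
    echo "$offset"
}

# Hypothetical helper: replaces the digit at position idx (1-based,
# counted from the left) of the port number with the offset.
apply_offset() {
    awk -v port="$1" -v idx="$2" -v off="$3" \
        'BEGIN { print substr(port, 1, idx - 1) off substr(port, idx + 1) }'
}
```

Here test_offset 8077 8070 gives 7, apply_offset 1234 3 7 gives 1274 and apply_offset 1234 2 7 gives 1734, matching the example; the port numbers 8077 and 8070 are made up.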

Install

An installation script is created for each physical location. This script contains the commands for performing the installation on all the machines of the physical location, as described in the pseudo code.

The figure below shows the pattern of installation.

deploy_step2.gif

Install script pseudo code

Install the NetarchiveSuite file

The NetarchiveSuite file is copied to the machine using scp (secure copy). The file is then unzipped in the installation directory, which is created as a subdirectory of the local user directory.

Install necessary directories

In the config file a number of directories are defined, and these directories have to be created during the installation on a machine. The following table shows which directories are created, based on the main branch where they are defined and their path from this branch. The branch level indicates the level at which the path has to be defined before it is applied. It can also be defined at a higher level and then be inherited by the given branch level.

Path                                       Directory    Branch level
settings.harvester.harvesting.serverDir    $/           applicationName

where $/ in Directory is the value of the path. All the directories along this path will be created if they do not already exist. A directory is only created if the path is defined under settings for the branch level (or inherited by the branch level) and it contains a proper value (not empty).

The creation of the directories is executed from the installDir. The directories are only created if they do not already exist, with the optional exception of the tempDir, which is removed before creation if the -R argument is set to 'yes'. Only the directory at the end of the path has its content removed, not all the directories along the path. E.g. a tempDir with the path myPath/myEndDir will only clean the directory 'myEndDir', and not the directory 'myPath'.
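The directory handling can be sketched as a shell function. This is an illustration rather than the generated install script: the function name and its second parameter are assumptions standing in for the effect of the -R argument on the tempDir.

```shell
# Hypothetical helper: creates a directory and all missing parents.
# When the second argument is 'yes', an existing directory at the end
# of the path is removed first, so only that last directory is cleaned.
create_dir() {
    dir_path=$1
    reset=${2:-no}
    # An empty path is not a proper value: nothing is created.
    [ -n "$dir_path" ] || return 0
    if [ "$reset" = "yes" ] && [ -d "$dir_path" ]; then
        rm -rf "$dir_path"
    fi
    mkdir -p "$dir_path"
}
```

Calling create_dir myPath/myEndDir yes empties myEndDir while leaving any other content of myPath untouched; without the second argument an existing directory is left as it is.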

On Linux/Unix machines directories are created directly through ssh, while Windows machines use a batch program, which is installed, run and then deleted.

This is because only a single command line can be run through ssh, and this command line is run as Bash on Linux/Unix and as batch on Windows. Since Bash can take many commands on a single command line, it is possible to create all the directories through ssh on Linux/Unix. Batch, on the other hand, can only handle a single command per command line, so the directories cannot be created through a single ssh call. The batch commands to create the directories are therefore combined in a batch program, which is installed on the Windows machine, then run and afterwards deleted.

Install scripts, settings and database

The jmxremote.password file has to be read-only while the applications are running, which means that this file cannot be reinstalled before it is made writable again. The file is therefore made writable first.

Then all the script and setting files are copied from the local directory with the machine name to the 'conf/' directory in the installation directory on the machine.

Then the optional database is handled, though only on the machines with a specified database directory. This database overrides the existing standard database in the NetarchiveSuite package. The database is then unzipped to the database directory, but only if that directory is empty.

Then the scripts are made executable and the jmxremote.password is made read-only.
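The order of the permission changes can be sketched as below. The sketch copies files locally, whereas the real installation works on a remote machine; the function name, the .sh extension for the scripts and the exact permission modes are assumptions.

```shell
# Hypothetical helper: copies configuration files into a conf directory.
# Arguments: source directory, target conf directory.
install_conf_files() {
    src_dir=$1
    conf_dir=$2
    mkdir -p "$conf_dir"
    # A previous installation left the password file read-only,
    # so it is made writable before being overwritten.
    if [ -e "$conf_dir/jmxremote.password" ]; then
        chmod u+w "$conf_dir/jmxremote.password"
    fi
    cp "$src_dir"/* "$conf_dir"/
    # Scripts become executable, the password file read-only.
    chmod +x "$conf_dir"/*.sh
    chmod 400 "$conf_dir/jmxremote.password"
}
```

Running the function twice in a row succeeds only because of the chmod u+w step; without it the second copy of jmxremote.password would be refused for a normal user.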

Start, Restart and Kill

The figure below shows how the applications are started. The same pattern is used for killing the applications again (replace start with kill in the figure).

deploy_step3.gif

An application cannot be started if it is already running. The way the applications are started and run is quite different on the Linux and Windows platforms.

The restart script can be used for restarting the running applications. It starts by calling the killall script, then waits 5 seconds for the applications to terminate completely, and finally runs the startall script. This script can be used for Windows Services (automatic execution during startup).

Linux

On the Linux platform an application is only started if no instance of this application can be found among the running processes. Likewise, an application is only killed if it can be found in the process list.

An instance of a specific application is found in the list of running processes by looking for any process which has the same name and uses the same settings file.

When an instance of dk.netarkivet.harvester.harvesting.HarvestControllerApplication is killed, the Heritrix application is also killed.
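The lookup in the process list can be sketched as a filter over the output of ps. The function name is hypothetical, and the assumption that the PID is in the second column holds for the full-format listing of ps (for example ps -wwfe); the generated scripts may do this differently.

```shell
# Hypothetical helper: reads a process listing on stdin and prints the
# PID (second column) of every process whose command line contains
# both the application name and the settings file name.
find_app_pids() {
    app_name=$1
    settings_file=$2
    grep -F -- "$app_name" | grep -F -- "$settings_file" | awk '{ print $2 }'
}

# Possible use: start only when nothing is found, kill only when
# something is found.
#   pids=$(ps -wwfe | find_app_pids "$APP" "$SETTINGS")
#   if [ -z "$pids" ]; then echo "not running"; fi
```

Matching on both the name and the settings file is what allows several instances of the same application, each with its own settings file, to run on one machine without being confused with each other.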

Windows

On Windows, several files are required to run an application and to make sure that at most one instance of the application is running: two scripts for killing it, two scripts for starting it, and one temporary file which tells whether an instance is running.

The application can only be started if the temporary run-file does not exist. It is started by calling a VBS script for running the application. This script starts the application as a process and saves a method for killing this process in a kill-process file.

The application can only be killed if the temporary run-file exists. The kill-process file is called to kill the process of the application. Then the temporary run-file is removed, which signals that the application is not running and can be started again.

The Heritrix application is not killed when an instance of dk.netarkivet.harvester.harvesting.HarvestControllerApplication is killed. This is because Heritrix has not been thoroughly tested on Windows, and might not be supported.

Installation Manual 3.10/Functionality (last edited 2010-08-16 10:24:10 by localhost)