== Manual installation of the NetarchiveSuite ==
If the deploy software is not adequate for the installation needed, this section gives some hints on how to distribute and install the !NetarchiveSuite software on a number of machines.

In the examples below, we assume that ~+{{{$deployInstallDir}}}+~ is set to the directory in which the !NetarchiveSuite code is to be installed. We assume that all machines in the chosen scenario are unix/linux servers. The procedure below may not work on other platforms.

After having created the new settings to be used in the deployment of the software, zip together the !NetarchiveSuite files including the new settings and copy the modified !NetarchiveSuite.zip to all machines taking part in the deployment:
{{{
export USER=test
# Space-separated list of hosts (no commas, since the loop splits on whitespace);
# list every machine of every domain taking part in the deployment
export MACHINES="machine1.domain1 machine2.domain1 machine1.domain2 machine2.domain2"
for MACHINE in $MACHINES; do
    scp NetarchiveSuite.zip $USER@$MACHINE:$deployInstallDir
    ssh $USER@$MACHINE "cd $deployInstallDir && unzip NetarchiveSuite.zip"
done
}}}

=== NetarchiveSuite settings ===
The !NetarchiveSuite settings can be set for applications in three different ways:
 * using the default settings
 * in a settings file
 * on the command line

==== Using NetarchiveSuite default settings ====
If no settings are set, the default settings are used. Please refer to the [[Configuration Manual 3.16#DefaultSettings|Configuration Manual - Default Settings]] for more information on these.

==== Setting NetarchiveSuite settings on the command line ====
To set the value of a setting on the command line, add "-Dkey=value" to your java command line, for instance:
{{{
java -Dsettings.common.http.port=8076 dk.netarkivet.common.webinterface.GUIApplication
}}}
will override the setting for the http port to be 8076.

==== Setting NetarchiveSuite settings with settings files ====
To set the values using a configuration file, save the settings in an XML file as described above.
By default, !NetarchiveSuite will look for the settings file in ~+{{{conf/settings.xml}}}+~, that is, the file ~+{{{settings.xml}}}+~ under the directory ~+{{{conf}}}+~ relative to the current working directory. You can override this by specifying ~+{{{-Ddk.netarkivet.settings.file=path/to/settings.file.xml}}}+~ on the command line, for instance:
{{{
java -Ddk.netarkivet.settings.file=/home/netarchive/guisettings.xml dk.netarkivet.common.webinterface.GUIApplication
}}}
will read settings from the file ~+{{{/home/netarchive/guisettings.xml}}}+~.

You can even specify multiple configuration files. You do this by separating the paths with ':' on unix/linux/MacOS or ';' on Windows. For instance:
{{{
java -Ddk.netarkivet.settings.file=guisettings.xml:basicsettings.xml dk.netarkivet.common.webinterface.GUIApplication
}}}
will read settings from both ~+{{{guisettings.xml}}}+~ and ~+{{{basicsettings.xml}}}+~ in the current directory.

==== The order of resolving NetarchiveSuite settings ====
If a setting is set both on the command line and in settings files, or if it is set in multiple settings files, the setting is resolved as follows:
 * If the setting is set with system properties (i.e. set on the command line), use that value
 * Else if the setting is specified in configuration files, use the '''first''' specified value
 * Else use the default value

As an example, consider the resulting value for http-port (knowing that the default value is empty) when using the following two configuration files:

settings1.xml
{{{
<settings>
  <common>
    <http>
      <port>8076</port>
    </http>
  </common>
</settings>
}}}
settings2.xml
{{{
<settings>
  <common>
    <http>
      <port>8077</port>
    </http>
  </common>
</settings>
}}}
The following command will use the empty string as http-port:
{{{
java dk.netarkivet.common.webinterface.GUIApplication
}}}
The following command will use the value 8078 as http-port:
{{{
java -Ddk.netarkivet.settings.file=settings1.xml:settings2.xml -Dsettings.common.http.port=8078 dk.netarkivet.common.webinterface.GUIApplication
}}}
The following command will use the value 8076 as http-port:
{{{
java -Ddk.netarkivet.settings.file=settings1.xml:settings2.xml dk.netarkivet.common.webinterface.GUIApplication
}}}
The following command will use the value 8077 as http-port:
{{{
java -Ddk.netarkivet.settings.file=settings2.xml:settings1.xml dk.netarkivet.common.webinterface.GUIApplication
}}}

=== Standard commandline settings ===
==== The CLASSPATH ====
The CLASSPATH needed to start and run the java applications in !NetarchiveSuite consists of 5 jar files: dk.netarkivet.harvester.jar, dk.netarkivet.archive.jar, dk.netarkivet.viewerproxy.jar, dk.netarkivet.wayback.jar, and dk.netarkivet.monitor.jar. The dk.netarkivet.common.jar and all our 3rd party dependencies need not be added explicitly to the CLASSPATH, as they are referenced indirectly in the jar files.
{{{
export deployInstallDir=/path/to/netarchiveSuite
export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.harvester.jar
export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.archive.jar
export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.viewerproxy.jar
export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.wayback.jar
export CLASSPATH=$CLASSPATH:$deployInstallDir/lib/dk.netarkivet.monitor.jar
}}}

==== Logging ====
We use the Apache Commons Logging framework, so we need to point to the wanted logger class (e.g. org.apache.commons.logging.impl.!Jdk14Logger) as well as to the logging configuration file. You may want to use different logging properties for different applications, especially when more than one application logs to the same logging directory. For example, you may want to change the line ~+{{{java.util.logging.FileHandler.pattern=./log/APPID%u.log}}}+~ in the ~+{{{conf/log.prop}}}+~ file to something different.
{{{
export LOG_SETTINGS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
    -Djava.util.logging.config.file=$deployInstallDir/conf/log.prop"
}}}
Note that if you use the !MonitorSiteSection, your logging properties file must contain the handler ~+{{{dk.netarkivet.monitor.logging.CachingLogHandler}}}+~:
{{{
handlers=java.util.logging.FileHandler,java.util.logging.ConsoleHandler, \
    dk.netarkivet.monitor.logging.CachingLogHandler
}}}

==== JMX settings ====
Each application instance has its own JMX port and RMI port. For example, the JMX port could be 8100 and the associated RMI port 8200 for the first application instance on the machine, as in the example below, then 8101/8201 for the second application instance, and so on.
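The port numbering scheme described above (8100/8200 for the first instance, 8101/8201 for the second, and so on) can be sketched as a small shell helper. The function name {{{jmx_settings}}} and the zero-based instance index are assumptions made for illustration; they are not part of !NetarchiveSuite.

```shell
# Hypothetical helper (not part of NetarchiveSuite): derive the JMX/RMI
# port pair for the N-th application instance on a machine.
# N is zero-based; the base ports 8100/8200 follow the example in the text.
jmx_settings() {
    instance_index=$1
    jmx_port=$((8100 + instance_index))
    rmi_port=$((8200 + instance_index))
    echo "-Dsettings.common.jmx.port=$jmx_port -Dsettings.common.jmx.rmiPort=$rmi_port"
}

# Settings for the second application instance on this machine
JMX_SETTINGS=$(jmx_settings 1)
echo "$JMX_SETTINGS"
```

Deriving both ports from one index keeps each JMX port paired with its RMI port, so two instances on the same machine are less likely to be given colliding ports by hand.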
JMX also uses a password file, which is the same throughout the installation ($deployInstallDir/conf/jmxremote.password).
{{{
export JMX_SETTINGS="-Dsettings.common.jmx.port=8100 \
    -Dsettings.common.jmx.rmiPort=8200"
}}}
Note: For the !StatusSiteSection to work, your logging must be configured to use java.util.logging with the ~+{{{dk.netarkivet.monitor.logging.CachingLogHandler}}}+~ enabled; see the [[Installation Manual 3.16#CommandLineLogging|Logging]] section. (This is done automatically if the !NetarchiveSuite deploy software is used to configure and install your !NetarchiveSuite installation.)

==== Select the appropriate settings.file for the application ====
The conf/settings.xml (the new one configured to your environment) is probably OK for most applications. But you may need to use special-purpose settings files for some applications, e.g. !BitarchiveApplications (since you can't allocate more than one ~+{{{baseFileDir}}}+~ on the command line). The settings file used by an application can be specified by:
{{{
export SETTING=-Ddk.netarkivet.settings.file=$deployInstallDir/conf/settings.xml
}}}

==== JVM options ====
We need to set the maximum Java heap size to 1.5 Gbytes. You may use this variable to change the heap size or to add other JVM options.
{{{
export JAVA_OPTS=-Xmx1536m
}}}

=== Admin machine ===
On the admin machine, we have to start the following 5 applications:
 * 1 GUIApplication.
 * 1 HarvestJobManagerApplication (handles the scheduling of jobs).
 * 2 instances of !BitarchiveMonitorApplication (each controlling the access to a single bitarchive replica), one for each bitarchive replica (e.g. EAST and WEST).
 * 1 ARCRepositoryApplication (this application handles access to the bitarchive replicas).

==== Starting the GUIApplication ====
Before we can start the GUIApplication, the external database needs to be started (the deploy software does this for you if the external database is a derby database). We also need to prepare the JSP pages.
You can unzip the war files in the webpages directory as below:
{{{
cd $deployInstallDir/webpages
rm -rf BitPreservation
unzip -o BitPreservation.war -d BitPreservation
rm -rf HarvestDefinition
unzip -o HarvestDefinition.war -d HarvestDefinition
rm -rf History
unzip -o History.war -d History
rm -rf QA
unzip -o QA.war -d QA
rm -rf Status
unzip -o Status.war -d Status
}}}
Or you can update your settings.xml file to refer to the war files instead of the unpacked directories, for instance
{{{
...
...
dk.netarkivet.harvester.webinterface.DefinitionsSiteSection
webpages/HarvestDefinition.war
...
...
}}}
and similarly for the other site sections.

Now we are ready to start the application:
{{{
cd $deployInstallDir
export APP=dk.netarkivet.common.webinterface.GUIApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP
}}}

==== Starting the BitarchiveMonitorApplication instances ====
In the general set-up with two distributed bitarchive replicas, we have a !BitarchiveMonitorApplication associated with each replica. Here the replicas are ~+{{{ReplicaOne}}}+~ (with replicaId ~+{{{ONE}}}+~) and ~+{{{ReplicaTwo}}}+~ (with replicaId ~+{{{TWO}}}+~). To distinguish the two instances from each other, we use the '''settings.common.applicationInstanceId''' setting, which serves as an identifier (here we use BMONE and BMTWO as the two identifiers).
Start the monitor for the bitarchive at ~+{{{ReplicaOne}}}+~ using ~+{{{BMONE}}}+~ as identifier thus:
{{{
cd $deployInstallDir
export APP_OPTIONS="-Dsettings.common.archive.bitarchive.useReplicaId=ONE \
    -Dsettings.common.applicationInstanceId=BMONE"
export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
}}}
Start the monitor for the bitarchive at ~+{{{ReplicaTwo}}}+~ using ~+{{{BMTWO}}}+~ as identifier thus:
{{{
cd $deployInstallDir
export APP_OPTIONS="-Dsettings.common.archive.bitarchive.useReplicaId=TWO \
    -Dsettings.common.applicationInstanceId=BMTWO"
export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
}}}
Finally, start the single ARCRepository (this application handles all access to the bitarchives):
{{{
cd $deployInstallDir
export APP=dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP
}}}

=== Harvester machines ===
On each harvester machine, we have one or more !HarvestControllerApplications. Settings related to the !HarvestControllerApplication are
 * settings.common.applicationInstanceId (to distinguish between !HarvestControllerApplications running on the same machine)
 * settings.harvester.harvesting.queuePriority (to select which of two queues to accept jobs from: HIGHPRIORITY (jobs part of a selective harvest) or LOWPRIORITY (jobs part of a snapshot harvest))
 * settings.harvester.harvesting.minSpaceLeft (how many bytes ''must'' be available in the serverdir to accept crawl jobs). The default is 400000000 (~400 Mbytes).
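To illustrate what the minSpaceLeft threshold means in practice, the following sketch checks by hand whether a directory has at least the default 400000000 bytes available before a harvester is started. The function {{{has_min_space}}} and the variable {{{SERVER_DIR}}} are hypothetical names introduced here; this is only a manual pre-flight check and not how !NetarchiveSuite itself evaluates the setting.

```shell
# Hypothetical pre-flight check, NOT part of NetarchiveSuite.
# Succeeds (exit status 0) if directory $1 has at least $2 bytes available.
has_min_space() {
    dir=$1
    min_bytes=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ $((avail_kb * 1024)) -ge "$min_bytes" ]
}

# SERVER_DIR is an assumed variable pointing at the harvester's serverdir;
# it falls back to the current directory if unset.
if has_min_space "${SERVER_DIR:-.}" 400000000; then
    echo "at least 400000000 bytes available"
else
    echo "less than 400000000 bytes available"
fi
```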
In the following, a low-priority !HarvestControllerApplication is started with application instance id SEL:
{{{
cd $deployInstallDir
export APP_OPTIONS="-Dsettings.harvester.harvesting.queuePriority=LOWPRIORITY \
    -Dsettings.common.applicationInstanceId=SEL"
export APP=dk.netarkivet.harvester.harvesting.HarvestControllerApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
}}}

=== Bitarchive machines ===
For each Replica, you can have !BitarchiveServers installed on one or more machines. We suggest using just one !BitarchiveServer for each machine, though it is possible to use more than one. Each !BitarchiveServer can have storage on several filesystems, so if archive storage is spread over more than one filesystem, you need to modify the settings file like this:
{{{
..
...
...
<baseFileDir>/home/fileSys1/</baseFileDir>
<baseFileDir>/home/fileSys2/</baseFileDir>
...
..
}}}
Starting a !BitarchiveServer requires knowing which Replica it resides on, and the credentials required for correcting the data stored in the bitarchive. For ~+{{{ReplicaOne}}}+~ with id ~+{{{ONE}}}+~ this would be:
{{{
cd $deployInstallDir
export APP_OPTIONS="-Dsettings.archive.bitarchive.useReplicaId=ONE \
    -Dsettings.archive.bitarchive.thisCredentials=CREDENTIALS"
export APP=dk.netarkivet.archive.bitarchive.BitarchiveApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
}}}

=== Access servers ===
On the access servers, we deploy any number of '''!ViewerProxyApplication''' instances, and maybe one '''!IndexServerApplication''' (only one in all) used to generate the indices needed by the harvesters and the !ViewerProxyApplication instances.
{{{
cd $deployInstallDir
export APP=dk.netarkivet.archive.indexserver.IndexServerApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP
}}}
Each !ViewerProxyApplication instance uses an application instance id (settings.common.applicationInstanceId) and its own distinct base directory (settings.viewerproxy.baseDir).
Each instance also belongs to a Replica (settings.archive.bitarchive.useReplicaId). In the start sample below, the instance uses application instance id "first" and 'viewerproxy_first' as base directory, and belongs to ~+{{{ReplicaOne}}}+~ with id ~+{{{ONE}}}+~:
{{{
cd $deployInstallDir
export APP_OPTIONS="-Dsettings.common.applicationInstanceId=first \
    -Dsettings.viewerproxy.baseDir=viewerproxy_first \
    -Dsettings.archive.bitarchive.useReplicaId=ONE"
export APP=dk.netarkivet.viewerproxy.ViewerProxyApplication
java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP
}}}
For the !NetarchiveSuite support for wayback, see [[Additional Tools Manual 3.16/Tools in Wayback Module]].
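When several !ViewerProxyApplication instances run on the same access server, the per-instance options shown in the start sample above can be generated rather than typed by hand. The following is a minimal sketch: the function name {{{viewerproxy_options}}} is hypothetical, and the {{{viewerproxy_<id>}}} base directory naming simply mirrors the example above; it is a convention assumed here, not a requirement of !NetarchiveSuite.

```shell
# Hypothetical helper (not part of NetarchiveSuite): build the APP_OPTIONS
# string for one ViewerProxyApplication instance.
# $1 = application instance id, $2 = replica id
viewerproxy_options() {
    instance_id=$1
    replica_id=$2
    echo "-Dsettings.common.applicationInstanceId=$instance_id" \
         "-Dsettings.viewerproxy.baseDir=viewerproxy_$instance_id" \
         "-Dsettings.archive.bitarchive.useReplicaId=$replica_id"
}

# Dry run: print the options for two instances instead of starting them.
for ID in first second; do
    echo "APP_OPTIONS for $ID: $(viewerproxy_options "$ID" ONE)"
done
```

Each instance would still need its own JMX and RMI ports in addition to these options.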