Differences between revisions 3 and 4
Revision 3 as of 2011-01-21 14:51:34
Size: 5757
Comment:
Revision 4 as of 2011-02-21 11:21:03
Size: 3289
Comment:
Line 4: Line 4:
 * Verify that you have all the needed software installed by installing the !QuickStart according to https://netarchive.dk/suite/Quick_Start_Manual_3.16 e.g. in /home/test/netarchive by starting the Quickstart.
 * Shut down the !QuickStart according to the !QuickStart Manual
 * Download the following attached files to e.g. /home/test/netarchive:

[[http://netarchive.dk/suite/Installation_Manual_3.16/AppendixC?action=AttachFile&do=get&target=RunNetarchiveSuite.sh RunNetarchiveSuite.sh]]

[[http://netarchive.dk/suite/Installation_Manual_3.16/AppendixC?action=AttachFile&do=get&target=deploy_standalone_example.xml deploy_standalone_example.xml ]]


The first file is a simple script that performs all the deployment steps. It takes a !NetarchiveSuite package ('.zip'), a configuration file (the second file), and a temporary installation directory as arguments (in the given order).

In the configuration file all the applications are placed on one machine (e.g. the current machine, ~+{{{localhost}}}+~). This gives the same kind of instance as the !QuickStart. If run directly it is installed and run from the directory ~+{{{/home/test/USER}}}+~.
 * Verify that you have all the needed software installed by installing the !QuickStart according to https://netarchive.dk/suite/Quick_Start_Manual_3.14 e.g. in /home/test/netarchive by starting the Quickstart.
Line 20: Line 8:
E.g.  * You can now create, run and browse according to the QuickStart - or User Manual
Line 22: Line 10:
{{{
cd /home/test/netarchive
bash RunNetarchiveSuite.sh NetarchiveSuite.zip deploy_standalone_example.xml USER/
# If your SSH keys are not set up correctly, you will need to log in several times before the installation finishes successfully
}}}
The script creates a "USER" folder in e.g. /home/test, which contains e.g. methods for starting and stopping NetarchiveSuite, and then starts the whole NetarchiveSuite.
== Examples of deploy configuration files ==
The following example configuration file requires adaptation to your own system before use.
Line 29: Line 13:
 * Set your browser to proxy according to the QuickStart Manual on port 8070
 * Choose the URL e.g. http://dia-test-int-01.kb.dk:8074/HarvestDefinition/
 * You can now create, run and browse according to the QuickStart - or User Manual
== Examples of deploy configuration files ==
Below are two examples of configuration files for deploy. Both require adaptation to your own system before use.
[[attachment:deploy_distributed_example_3_14.xml]]
Line 35: Line 15:

[http://netarchive.dk/suite/Installation_Manual_3.16/AppendixC?action=AttachFile&do=get&target=deploy_distributed_example.xml deploy_distributed_example.xml ]

This is the instance with two replicas divided over two physical locations. Each physical location contains several machines: bitarchive machines, a harvester machine and a viewerproxy machine. Only one physical location has an administrator machine, which contains the GUI application, the Bitarchive monitors and the arc repository.

----
 . [http://netarchive.dk/suite/Installation_Manual_3.16/AppendixC?action=AttachFile&do=get&target=deploy_distributed_example_single.xml deploy_distributed_example_single.xml ]

This is the instance with only one replica and one physical location. It is very close to the first example, just with one replica removed.

----
 . [http://netarchive.dk/suite/Installation_Manual_3.16/AppendixC?action=AttachFile&do=get&target=deploy_distributed_example_database.xml deploy_distributed_example_database.xml ]

This is an instance using the archive database for the !ArcRepository and the !DatabaseBasedActiveBitPreservation. It contains a checksum replica, and it does not use admin.data.

----
This is the instance with two replicas divided over two physical locations. Each physical location contains several machines: bitarchive machines, a harvester machine and a viewerproxy machine. Only one physical location has an administrator machine, which contains the GUI application, the Bitarchive monitors, the HarvestJobManager, the HarvestJobMonitor and the arc repository.
Line 54: Line 19:
 . http://netarchive.dk/suite/Installation_Manual_3.16?action=AttachFile&do=view&target=HW_SW_production_example.txt  . [[attachment:HW_SW_production_example.txt]]

Appendix C: Easy Installation of NetarchiveSuite

Below you will find other deploy examples. (They have to be modified to fit your environment.)

  • You can now create, run and browse according to the QuickStart Manual or the User Manual

Examples of deploy configuration files

The following example configuration file requires adaptation to your own system before use.

deploy_distributed_example_3_14.xml

This is the instance with two replicas divided over two physical locations. Each physical location contains several machines: bitarchive machines, a harvester machine and a viewerproxy machine. Only one physical location has an administrator machine, which contains the GUI application, the Bitarchive monitors, the HarvestJobManager, the HarvestJobMonitor and the arc repository.

A running HW/SW setup example from June 2009 for Netarkivet.dk



How to add one more harvester on the same machine and set all to HIGHPRIORITY selective harvesting

Using e.g. deploy_example.xml

  • Duplicate the existing harvester <applicationName> definition within <deployMachine>

In the duplicated harvester config, change each of the following values to a new value that is unique within <deployMachine>:

  • <applicationInstanceId>

  • <common><jmx><port> and <rmiPort>

  • <heritrix><guiPort> and <jmxPort>

  • <serverDir>harvester_high_2</serverDir>

and set

  • <queuePriority>HIGHPRIORITY</queuePriority>

e.g.:

<applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
  <settings>
    <common>
      <applicationInstanceId>high2</applicationInstanceId>
      <jmx>
        <port>8112</port>
        <rmiPort>8212</rmiPort>
      </jmx>
    </common>
    <harvester>
      <harvesting>
        <queuePriority>HIGHPRIORITY</queuePriority>
        <heritrix>
          <guiPort>8192</guiPort>
          <!-- T: jmxPort to be modified by test (was 8093) -->
          <jmxPort>8193</jmxPort>
          <jmxUsername>controlRole</jmxUsername>
          <jmxPassword>R_D</jmxPassword>
        </heritrix>
        <serverDir>harvester_high_2</serverDir>
      </harvesting>
    </harvester>
  </settings>
</applicationName>
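
The requirement that these values be unique within <deployMachine> can be checked before deploying. The following is a minimal Python sketch and not part of NetarchiveSuite. It assumes that the deploy file uses no XML namespace and that every harvester states these settings explicitly under its own <applicationName> (values inherited from a higher level are not seen). The function name find_duplicates and the values of the first harvester in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: two harvester definitions under one <deployMachine>.
# The second one uses the values from the example above; the first is made up.
SAMPLE = """
<deployMachine>
  <applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
    <settings>
      <common>
        <applicationInstanceId>high1</applicationInstanceId>
        <jmx><port>8111</port><rmiPort>8211</rmiPort></jmx>
      </common>
      <harvester><harvesting>
        <heritrix><guiPort>8191</guiPort><jmxPort>8093</jmxPort></heritrix>
        <serverDir>harvester_high_1</serverDir>
      </harvesting></harvester>
    </settings>
  </applicationName>
  <applicationName name="dk.netarkivet.harvester.harvesting.HarvestControllerApplication">
    <settings>
      <common>
        <applicationInstanceId>high2</applicationInstanceId>
        <jmx><port>8112</port><rmiPort>8212</rmiPort></jmx>
      </common>
      <harvester><harvesting>
        <heritrix><guiPort>8192</guiPort><jmxPort>8193</jmxPort></heritrix>
        <serverDir>harvester_high_2</serverDir>
      </harvesting></harvester>
    </settings>
  </applicationName>
</deployMachine>
"""

# Element paths, relative to <applicationName>, whose values must differ
# between the applications on one machine (taken from the list above).
UNIQUE_PATHS = [
    "settings/common/applicationInstanceId",
    "settings/common/jmx/port",
    "settings/common/jmx/rmiPort",
    "settings/harvester/harvesting/heritrix/guiPort",
    "settings/harvester/harvesting/heritrix/jmxPort",
    "settings/harvester/harvesting/serverDir",
]


def find_duplicates(machine_xml):
    """Return {path: [values used more than once]} for one <deployMachine>."""
    machine = ET.fromstring(machine_xml.strip())
    apps = machine.findall("applicationName")
    duplicates = {}
    for path in UNIQUE_PATHS:
        # Missing elements yield an empty string and are ignored below.
        values = [(app.findtext(path) or "").strip() for app in apps]
        repeated = sorted(v for v, n in Counter(values).items() if v and n > 1)
        if repeated:
            duplicates[path] = repeated
    return duplicates


print(find_duplicates(SAMPLE))
```

An empty result means no value is repeated for any of the listed elements. The check compares each element only with the same element in the other applications, so it would not catch, say, a JMX port of one harvester colliding with the GUI port of another.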

How to configure which Heritrix reports are uploaded to the metadata ARC file

Three settings properties control which Heritrix reports are added to the metadata ARC file:

- settings.harvester.harvesting.metadata.heritrixFilePattern is a Java pattern that lets you select which files in the crawl dir (not recursively) to include in the metadata ARC.

- settings.harvester.harvesting.metadata.reportFilePattern is also a Java pattern; it controls which subset of the files selected by heritrixFilePattern are considered report files. All the other selected files are considered setup files.

- settings.harvester.harvesting.metadata.logFilePattern is a third Java pattern; it controls which files in the logs subdirectory of the crawl dir are added as log files to the metadata ARC.
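
How the three patterns divide the files can be illustrated with a small Python sketch. The patterns and file names below are made up for illustration and are not NetarchiveSuite defaults. Python's re.fullmatch stands in for Java pattern matching, on the assumption that a pattern must match the whole file name; for simple patterns like these the two regex dialects agree.

```python
import re

# Illustrative patterns only; not the NetarchiveSuite default values.
HERITRIX_FILE_PATTERN = r".*\.(xml|txt)"
REPORT_FILE_PATTERN = r".*-report\.txt"
LOG_FILE_PATTERN = r".*\.log"


def classify(crawl_dir_files, log_dir_files):
    """Split file names into report, setup and log files."""
    # Step 1: heritrixFilePattern selects files directly in the crawl dir.
    selected = [f for f in crawl_dir_files
                if re.fullmatch(HERITRIX_FILE_PATTERN, f)]
    # Step 2: reportFilePattern marks a subset of those as report files;
    # every other selected file counts as a setup file.
    reports = [f for f in selected if re.fullmatch(REPORT_FILE_PATTERN, f)]
    setup = [f for f in selected if f not in reports]
    # Step 3: logFilePattern is applied to the logs subdirectory only.
    logs = [f for f in log_dir_files if re.fullmatch(LOG_FILE_PATTERN, f)]
    return {"report": reports, "setup": setup, "log": logs}


result = classify(
    ["order.xml", "seeds.txt", "crawl-report.txt", "state.job"],
    ["crawl.log", "heritrix.out"],
)
print(result)
```

With these hypothetical inputs, crawl-report.txt is treated as a report file, order.xml and seeds.txt as setup files, and crawl.log as a log file; state.job and heritrix.out match no pattern and are left out.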

Installation Manual 3.16/AppendixC (last edited 2011-02-21 12:21:38 by SoerenCarlsen)