Differences between revisions 3 and 4
Revision 3 as of 2007-06-27 14:36:06
Size: 3244
Comment:
Revision 4 as of 2007-06-27 15:40:26
Size: 4280
Comment:
Line 14: Line 14:
Scenario C and D involve a distributed bitarchive. In these scenarios the bitarchive is distributed across two Locations, here called LocationA and LocationB. These Locations must be written to the general settings.xml before deployment:
Setups C and D involve a distributed bitarchive. In these setups the bitarchive is distributed across two Locations, here called LocationA and LocationB. These Locations must be written to the general settings.xml before deployment:
Line 31: Line 31:
When you have chosen your setup, you must decide how many machines you want to use in the deployment of NetarchiveSuite.
For setup A, the answer is of course one.
For setups B-D, the answer is more complicated.

The NetarchiveSuite operates with 4 kinds of machines:

   - Admin machine (one server): Here we deploy one or more BitarchiveMonitorApplications (one for each Location),
          one ArcrepositoryApplication, and one HarvestDefinitionApplication (running the GUI and the scheduler). The latter is the only application that uses a database.

   - Harvester machines (one or more): Here we deploy the HarvesterControllerApplications and their associated SideKicks.

   - Bitarchive machines (one or more): Here we deploy the BitarchiveApplication.

   - Access servers (one or more): On these machines we have the ViewerproxyApplication, which lets us browse already stored webpages,
       and the IndexServerApplication. The latter must be installed on exactly one of the access servers.
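
The constraints in the list above can be checked mechanically before deployment. The following is a minimal Python sketch, not part of NetarchiveSuite: the plan format, the host names, and the `check_plan` helper are all hypothetical, and the rules it encodes are only the ones stated in the list (exactly one admin machine, at least one machine of each other kind, and the IndexServerApplication on exactly one access server).

```python
# Applications per machine kind, as described in the list above.
APPLICATIONS = {
    "admin": ["BitarchiveMonitorApplication", "ArcrepositoryApplication",
              "HarvestDefinitionApplication"],
    "harvester": ["HarvesterControllerApplication", "SideKick"],
    "bitarchive": ["BitarchiveApplication"],
    "access": ["ViewerproxyApplication", "IndexServerApplication"],
}


def check_plan(machines):
    """Return a list of problems found in a deployment plan.

    machines is a list of dicts with the keys 'host' and 'kind', plus an
    optional boolean 'index_server' (hypothetical format for this sketch).
    """
    problems = []
    kinds = [m["kind"] for m in machines]
    for m in machines:
        if m["kind"] not in APPLICATIONS:
            problems.append("%s: unknown machine kind %r" % (m["host"], m["kind"]))
    if kinds.count("admin") != 1:
        problems.append("expected exactly one admin machine, found %d"
                        % kinds.count("admin"))
    for kind in ("harvester", "bitarchive", "access"):
        if kinds.count(kind) == 0:
            problems.append("expected at least one %s machine" % kind)
    index_hosts = [m["host"] for m in machines
                   if m["kind"] == "access" and m.get("index_server")]
    if len(index_hosts) != 1:
        problems.append("IndexServerApplication must run on exactly one "
                        "access server, found %d" % len(index_hosts))
    return problems


# Hypothetical six-machine plan; the host names are placeholders.
plan = [
    {"host": "admin1", "kind": "admin"},
    {"host": "harvest1", "kind": "harvester"},
    {"host": "harvest2", "kind": "harvester"},
    {"host": "bit1", "kind": "bitarchive"},
    {"host": "access1", "kind": "access", "index_server": True},
    {"host": "access2", "kind": "access"},
]
plan_problems = check_plan(plan)
```

An empty result means the plan satisfies the rules listed above; it says nothing about ports, firewalls, or the contents of settings.xml.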








Line 33: Line 57:

Installing the NetarchiveSuite software

This manual describes how to install and configure the NetarchiveSuite web archive software package. It includes a description of how to obtain and install the required libraries, how to install the software on separate machines, what command line options and configuration file changes are necessary, and how to start the programs. It then goes on to explain the other parameters available for tuning the behaviour of NetarchiveSuite. It does not explain how to extend the functionality of the system (see the DeveloperManual for this) or how to use the running system (see the UserManual for this).

The intended audience of this manual is system administrators who will be responsible for the actual installation and setup of NetarchiveSuite as well as technical personnel responsible for proper operation of NetarchiveSuite. Knowledge of Unix system administration is expected, and some familiarity with XML and Java is an advantage.

NetarchiveSuite can be installed in a number of different ways, with varying numbers of machines on different sites. To keep clear what is necessary for which setups, we will consider the following types of setup:

  • A. Single-machine setup. This corresponds to the setup used in the QuickstartManual, where all applications run on the same machine and file transfer can be done simply by copying files locally. It is the simplest setup, but does not scale very well. Note that the scripts used in the QuickstartManual reset the system at every restart, including deleting all harvested material. This is not the intent for a running installation, so those scripts cannot be used in production environments as they are.

  • B. Single-site setup. In this scenario, multiple machines are involved, necessitating file transfer between machines and multiple installations of the code. However, the machines are expected to be behind the same firewall, so port setup should be no problem.
  • C. Single-site setup with duplicate archive. This expands on the single-site setup in that more than one copy of the archive is used, with the concept of separate "locations" indicating the duplicates.
  • D. Multi-site setup. When more than one site is involved, separated by firewalls, extra issues of opening ports and specifying the correct site come into play. This is the most complex scenario, but also the most secure against systematic errors, hacking, and other disasters.

Setups C and D involve a distributed bitarchive. In these setups the bitarchive is distributed across two Locations, here called LocationA and LocationB. These Locations must be written to the general settings.xml before deployment:

<arcrepository>
  ...
  <!-- The names of all bit archive locations in the
       environment, e.g., "LocationA" and "LocationB". -->
  <location>
    <name>LocationA</name>
  </location>
  <location>
    <name>LocationB</name>
  </location>
  <!-- Default bit archive to use for batch jobs (if none is specified) -->
  <batchLocation>LocationA</batchLocation>
</arcrepository>
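
Before deploying, it can be useful to confirm that the Locations are declared as intended. The sketch below is a hedged illustration in Python using only the standard library; it is not a NetarchiveSuite tool. It assumes the `<arcrepository>` element carries no XML namespace, and it assumes that `batchLocation` should name one of the declared Locations, which the example above suggests but the manual does not state. The embedded fragment and the helper names `read_locations` and `check_locations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the <arcrepository> element shown above;
# a real settings.xml contains more elements around and inside it.
SETTINGS_FRAGMENT = """
<arcrepository>
  <location><name>LocationA</name></location>
  <location><name>LocationB</name></location>
  <batchLocation>LocationA</batchLocation>
</arcrepository>
"""


def read_locations(xml_text):
    """Return (location names, batch location) from an <arcrepository> element."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <arcrepository> element or a document containing one.
    repo = root if root.tag == "arcrepository" else root.find(".//arcrepository")
    if repo is None:
        raise ValueError("no <arcrepository> element found")
    names = [n.text.strip() for n in repo.findall("location/name") if n.text]
    batch = (repo.findtext("batchLocation") or "").strip()
    return names, batch


def check_locations(names, batch):
    """Return a list of problems; an empty list means the fragment looks consistent."""
    problems = []
    if len(set(names)) != len(names):
        problems.append("duplicate location names")
    if batch not in names:
        # Assumption of this sketch: the batch location must be a declared Location.
        problems.append("batchLocation %r is not a declared location" % batch)
    return problems


names, batch = read_locations(SETTINGS_FRAGMENT)
problems = check_locations(names, batch)
```

To check a real file, pass the text of the general settings.xml to `read_locations` instead of the fragment, provided the no-namespace assumption holds for that file.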

When you have chosen your setup, you must decide how many machines you want to use in the deployment of NetarchiveSuite. For setup A, the answer is of course one. For setups B-D, the answer is more complicated.

The NetarchiveSuite operates with 4 kinds of machines:

InstallationManualAppendices

Installation Manual (last edited 2010-08-16 10:24:51 by localhost)