Overall Systems Design

This section gives overall descriptions of the modules. Additional information can be found in the Overview document.

There are seven modules in the NetarchiveSuite software. This section gives an overview of what each module contains and points out some of the most important packages. All sources are found in the src directory, and all packages start with dk.netarkivet. Unit tests are similarly arranged, but under tests instead of src. The web interface definitions are found in the webpages directory.
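The layout described above can be sketched in a few lines of Java. This is only an illustration, not part of NetarchiveSuite: the class ModuleLayout and its methods are hypothetical names, and the sketch assumes the standard Java convention that a package maps to a directory of the same name under the source root. The package names are the ones listed in the module descriptions in this section.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not a NetarchiveSuite class.
class ModuleLayout {
    // All NetarchiveSuite packages start with this prefix.
    static final String ROOT_PACKAGE = "dk.netarkivet";

    // The seven modules and their top-level packages, in the order they are described.
    static Map<String, String> modules() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Access (Viewerproxy)", ROOT_PACKAGE + ".viewerproxy");
        m.put("Archive", ROOT_PACKAGE + ".archive");
        m.put("Common", ROOT_PACKAGE + ".common");
        m.put("Deploy", ROOT_PACKAGE + ".deploy");
        m.put("Harvester", ROOT_PACKAGE + ".harvester");
        m.put("Monitor", ROOT_PACKAGE + ".monitor");
        m.put("Wayback", ROOT_PACKAGE + ".wayback");
        return m;
    }

    // Assumption: a package name maps to a directory path below baseDir
    // ("src" for sources, "tests" for unit tests).
    static Path directoryFor(String baseDir, String packageName) {
        return Paths.get(baseDir, packageName.split("\\."));
    }

    public static void main(String[] args) {
        // Print the source and test directory for each module.
        for (Map.Entry<String, String> e : modules().entrySet()) {
            System.out.println(e.getKey() + ": "
                    + directoryFor("src", e.getValue()) + " / "
                    + directoryFor("tests", e.getValue()));
        }
    }
}
```

Under these assumptions, the Archive module's sources would sit below src/dk/netarkivet/archive and its unit tests below tests/dk/netarkivet/archive.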

More detailed descriptions are given later in this document.

Access (Viewerproxy)

The dk.netarkivet.viewerproxy package implements a simple access client to the archived data, based on web-page proxying. For more details please refer to Detailed Access Design.

Archive

The dk.netarkivet.archive package and its subpackages provide redundant, distributed storage, primarily for ARC files, as well as Lucene indexing of those files. The arcrepository subpackage contains the logic for keeping multiple bit archives synchronized. The bitarchive subpackage contains the application that stores the actual files and manages access to them. The indexserver subpackage handles merging CDX files and crawl.log files into a Lucene index used for deduplication and for viewerproxy access. For more details please refer to Detailed Archive Design.

Common

The dk.netarkivet.common package and its subpackages provide module-neutral code, partly of a generic nature and partly specific to NetarchiveSuite, e.g. settings and channels. For more details please refer to Detailed Common Design.

Deploy

The dk.netarkivet.deploy module contains software for installing NetarchiveSuite onto multiple machines. This module is only used in the deployment phase. For more details please refer to Detailed Deploy Design.

Harvester

The dk.netarkivet.harvester package and its subpackages handle the definition and execution of harvests. Its main parts are the database containing the harvest definitions (the datamodel subpackage), the webinterface through which the user accesses the database, the scheduler subpackage, which handles scheduling and splitting into jobs, and the harvesting subpackage, which encapsulates running Heritrix and sending the results off to the archive. For more details please refer to Detailed Harvester Design.

Monitor

The dk.netarkivet.monitor package provides web access to JMX-packaged information from all NetarchiveSuite applications. For more details please refer to Detailed Monitor Design.

Wayback

The dk.netarkivet.wayback package provides tools for integrating NetarchiveSuite with the open-source wayback machine for browsing web archives. These tools are described in the Additional Tools Manual.

System Design 3.10/Overall Systems Description (last edited 2010-08-16 10:24:06 by localhost)