## page was renamed from Developer Manual 3.14/Overall Systems Description
= Overall Systems Design =
This section gives an overall description of the !NetarchiveSuite modules. Additional information can be found in the [[Overview 3.14|Overview]] document.

There are seven modules in the !NetarchiveSuite software. This section gives an overview of what each module contains. All Java source files are found in the ~+`src`+~ directory, and all packages start with ~+`dk.netarkivet`+~. Unit tests are arranged the same way, but under ~+`tests`+~ instead of ~+`src`+~. The web interface definitions are found in the ~+`webpages`+~ directory. The ~+`lib`+~ directory contains all the libraries necessary to compile and run the code. More detailed descriptions are given later in this document.

== Access (Viewerproxy) ==
The ~+`dk.netarkivet.viewerproxy`+~ package implements a simple access client to the archived data, based on web-page proxying. For more details please refer to [[#DetailedAccessDescription|Detailed Access Design]].

== Archive ==
The ~+`dk.netarkivet.archive`+~ package and its subpackages provide redundant, distributed storage, primarily for ARC files, as well as Lucene indexing of those files. The ~+`arcrepository`+~ subpackage contains the logic for keeping multiple bit archives synchronized. The ~+`bitarchive`+~ subpackage contains the application that stores the actual files and manages access to them. The ~+`indexserver`+~ subpackage handles merging CDX files and crawl.log files into a Lucene index used for deduplication and for viewerproxy access. The ~+`checksum`+~ subpackage contains the checksum replica code. For more details please refer to [[#DetailedArchiveDescription|Detailed Archive Design]].

== Common ==
The ~+`dk.netarkivet.common`+~ package and its subpackages provide module-neutral code, partly of a generic nature and partly specific to !NetarchiveSuite, e.g. settings and channels.
For more details please refer to [[#DetailedCommonDescription|Detailed Common Design]].

== Deploy ==
The ~+`dk.netarkivet.deploy`+~ module contains software for installing !NetarchiveSuite on multiple machines. This module is only used in the deployment phase. For more details please refer to [[#DetailedDeployDescription|Detailed Deploy Design]].

== Harvester ==
The ~+`dk.netarkivet.harvester`+~ package and its subpackages handle the definition and execution of harvests. Its main parts are the database containing the harvest definitions (the ~+`datamodel`+~ subpackage), the web interface through which the user accesses the database, the ~+`scheduler`+~ subpackage, which handles scheduling and splitting into jobs, and the ~+`harvesting`+~ subpackage, which encapsulates running Heritrix and sending the results off to the archive. For more details please refer to [[#DetailedHarvesterDescription|Detailed Harvester Design]].

== Monitor ==
The ~+`dk.netarkivet.monitor`+~ package provides web access to JMX-packaged information from all !NetarchiveSuite applications. For more details please refer to [[#DetailedMonitorDescription|Detailed Monitor Design]].

== Wayback ==
The ~+`dk.netarkivet.wayback`+~ package provides tools for integrating !NetarchiveSuite with the open-source wayback machine for browsing web archives. The tools we provide fall under three headings:

 * A plugin that enables wayback to access ARC records through a !NetarchiveSuite arcrepository. This is described in the Additional Tools Manual.
 * Batch jobs that enable indexing of ARC files and deduplication records via the batch interface to the !NetarchiveSuite arcrepository. These are also described in the Additional Tools Manual.
 * Two command-line applications that enable the continuous automatic indexing of a running !NetarchiveSuite installation. These applications are deployed along with other !NetarchiveSuite applications, and they are therefore described in the Configuration Manual.
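
As a recap of the source layout described at the start of this section, the following minimal sketch lists the seven modules and derives where their sources and unit tests would be expected. The class `ModuleLayout` and its methods are hypothetical and not part of !NetarchiveSuite, and the sketch assumes the usual Java convention that directories below ~+`src`+~ and ~+`tests`+~ mirror the package names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of NetarchiveSuite. */
class ModuleLayout {
    /** Common prefix of all packages, as stated in this section. */
    static final String ROOT_PACKAGE = "dk.netarkivet";

    /** Maps each module heading in this section to its package suffix. */
    static Map<String, String> modules() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Access", "viewerproxy");
        m.put("Archive", "archive");
        m.put("Common", "common");
        m.put("Deploy", "deploy");
        m.put("Harvester", "harvester");
        m.put("Monitor", "monitor");
        m.put("Wayback", "wayback");
        return m;
    }

    /** Full package name for a module suffix. */
    static String packageName(String suffix) {
        return ROOT_PACKAGE + "." + suffix;
    }

    /** Assumed source directory: package path below src. */
    static String sourceDir(String suffix) {
        return "src/" + packageName(suffix).replace('.', '/');
    }

    /** Assumed unit-test directory: same package path below tests. */
    static String testDir(String suffix) {
        return "tests/" + packageName(suffix).replace('.', '/');
    }

    public static void main(String[] args) {
        // Print one line per module: heading, package, source dir, test dir.
        for (Map.Entry<String, String> e : modules().entrySet()) {
            String suffix = e.getValue();
            System.out.println(e.getKey() + ": " + packageName(suffix)
                    + " (" + sourceDir(suffix) + ", " + testDir(suffix) + ")");
        }
    }
}
```

Running `main` prints one line per module. The exact directory names should be checked against the actual checkout, since only the ~+`src`+~ and ~+`tests`+~ roots and the ~+`dk.netarkivet`+~ prefix are given in this section.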