Differences between revisions 5 and 6
Revision 5 as of 2010-11-24 15:14:26
Size: 3555
Comment:
Revision 6 as of 2011-04-27 11:41:29
Size: 388
Comment:
Line 7: Line 7:
This section gives an overall description of the !NetarchiveSuite modules. For a general overview, see the [[Overview 3.16|Overview]] document.
Line 9: Line 9:
There are seven modules in the !NetarchiveSuite software. This section gives an overview of what each module contains. All Java source files are found in the ~+`src`+~ directory, and all packages start with ~+`dk.netarkivet`+~. Unit tests are arranged the same way, but under ~+`tests`+~ instead of ~+`src`+~. The web interface definitions are found in the ~+`webpages`+~ directory. The ~+`lib`+~ directory contains all the libraries necessary to compile and run the code.
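
As an illustration of the layout described above, the following minimal sketch resolves a package name to its directory under ~+`src`+~ or ~+`tests`+~. It assumes the standard Java convention of one directory per package segment; the class `SourceLayout` and its method are hypothetical helpers and not part of !NetarchiveSuite.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, for illustration only; not part of NetarchiveSuite. */
final class SourceLayout {
    /**
     * Returns the directory for a package below the given root ("src" or "tests").
     * Assumes the usual Java layout: one directory per package segment.
     */
    static Path packageDir(String root, String packageName) {
        return Paths.get(root, packageName.split("\\."));
    }

    public static void main(String[] args) {
        // Production code and unit tests for the same package sit in parallel trees.
        System.out.println(packageDir("src", "dk.netarkivet.common"));
        System.out.println(packageDir("tests", "dk.netarkivet.common"));
    }
}
```

The same package name is used for both trees, which is what "arranged the same way" amounts to under this assumption.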

More detailed descriptions are given later in this document.

== Access (Viewerproxy) ==
The ~+`dk.netarkivet.viewerproxy`+~ package implements a simple access client to the archived data, based on web-page proxying. For more details please refer to [[#DetailedAccessDescription|Detailed Access Design]].

== Archive ==
The ~+`dk.netarkivet.archive`+~ package and its subpackages provide redundant, distributed storage, primarily for ARC files, as well as Lucene indexing of those files. The ~+`arcrepository`+~ subpackage contains the logic for keeping multiple bit archives synchronized. The ~+`bitarchive`+~ subpackage contains the application that stores the actual files and manages access to them. The ~+`indexserver`+~ subpackage handles merging CDX files and crawl.log files into a Lucene index used for deduplication and for viewerproxy access. The ~+`checksum`+~ subpackage contains the checksum replica code. For more details please refer to [[#DetailedArchiveDescription|Detailed Archive Design]].

== Common ==
The ~+`dk.netarkivet.common`+~ package and its subpackages provide module-neutral code, partly of a generic nature and partly specific to !NetarchiveSuite, e.g. settings and channels. For more details please refer to [[#DetailedCommonDescription|Detailed Common Design]].

== Deploy ==
The ~+`dk.netarkivet.deploy`+~ module contains software for installing !NetarchiveSuite on multiple machines. This module is only used in the deployment phase. For more details please refer to [[#DetailedDeployDescription|Detailed Deploy Design]].

== Harvester ==
The ~+`dk.netarkivet.harvester`+~ package and its subpackages handle the definition and execution of harvests. Its main parts are the database containing the harvest definitions (the ~+`datamodel`+~ subpackage), the web interface through which the user accesses the database, the ~+`scheduler`+~ subpackage, which handles scheduling and splitting harvests into jobs, and the ~+`harvesting`+~ subpackage, which encapsulates running Heritrix and sending the results off to the archive. For more details please refer to [[#DetailedHarvesterDescription|Detailed Harvester Design]].

== Monitor ==
The ~+`dk.netarkivet.monitor`+~ package provides web access to JMX-packaged information from all !NetarchiveSuite applications. For more details please refer to [[#DetailedMonitorDescription|Detailed Monitor Design]].

== Wayback ==
The ~+`dk.netarkivet.wayback`+~ package provides tools for integrating !NetarchiveSuite with the open-source Wayback machine for browsing web archives. These are described in the Additional Tools Manual.
For more detailed information on the different modules, see the related [[https://sbforge.org/maven/netarchivesuite/apidocs/3.16.0|javadoc]].
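
The module-to-package mapping listed in the sections above can be summarized in a short sketch. The root package names are taken from this section; the class `ModuleLookup`, its method, and the example class name are hypothetical and do not exist in !NetarchiveSuite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table, for illustration only; not part of NetarchiveSuite. */
final class ModuleLookup {
    // Module name -> root package, as listed in this section.
    static final Map<String, String> MODULES = new LinkedHashMap<>();
    static {
        MODULES.put("Access", "dk.netarkivet.viewerproxy");
        MODULES.put("Archive", "dk.netarkivet.archive");
        MODULES.put("Common", "dk.netarkivet.common");
        MODULES.put("Deploy", "dk.netarkivet.deploy");
        MODULES.put("Harvester", "dk.netarkivet.harvester");
        MODULES.put("Monitor", "dk.netarkivet.monitor");
        MODULES.put("Wayback", "dk.netarkivet.wayback");
    }

    /** Returns the module owning a fully qualified class name, or null if none matches. */
    static String moduleFor(String className) {
        for (Map.Entry<String, String> entry : MODULES.entrySet()) {
            if (className.startsWith(entry.getValue() + ".")) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "ExampleClass" is a made-up name used only to exercise the lookup.
        System.out.println(moduleFor("dk.netarkivet.archive.bitarchive.ExampleClass"));
    }
}
```

A class in any subpackage, such as ~+`bitarchive`+~ or ~+`scheduler`+~, resolves to its parent module because the lookup matches on the root package prefix.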

Overall Systems Design

For a general overview, see the Overview document.

For more detailed information on the different modules, see the related javadoc.

System Design 3.16/Overall Systems Description (last edited 2011-04-28 06:32:33 by MikisSethSorensen)