title(NetarchiveSuite Overview)

Note that this document is for the development version of NetarchiveSuite. It will be updated continuously to reflect the code currently under development in the SVN repository. For documentation of the stable version, please refer to the stable ["Overview"].

The NetarchiveSuite is software to harvest, archive and browse large parts of the internet.

Introduction

The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. It uses Heritrix as its web crawler. The NetarchiveSuite can organize three different kinds of harvests:

The software has been designed with the following in mind:

The modules in the NetarchiveSuite

The NetarchiveSuite is split into four main modules: one module with common functionality and three modules corresponding to ingesting, archiving and accessing (see illustration).

dk.netarkivet.common module

This module contains the framework and utilities used by the whole suite, such as exceptions, settings, messaging, file transfer (RemoteFile), and logging. It also defines the interfaces used to communicate between the different modules, so that alternative implementations can be supported.

dk.netarkivet.harvester module

This module handles defining, scheduling, and performing harvests.

dk.netarkivet.archive module

This module allows running a repository with replication, active bit consistency checks for bit preservation, and support for distributed batch jobs on the archive.
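To make the idea of a bit consistency check across replicated copies more concrete, here is a minimal sketch in Python. It is not the NetarchiveSuite implementation: the function names, the replica names and the choice of SHA-256 are all assumptions made for illustration. The sketch computes a checksum for each replica's copy of one file and reports the replicas that disagree with the majority.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's content (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_inconsistent_replicas(copies: dict[str, bytes]) -> list[str]:
    """Return the names of replicas whose copy disagrees with the majority checksum.

    `copies` maps a hypothetical replica name to the bytes that replica holds
    for one archived file. If no checksum has a strict majority, every replica
    is reported, because the correct copy cannot be chosen automatically.
    """
    if not copies:
        return []
    sums = {name: checksum(data) for name, data in copies.items()}
    # Find the most common checksum and how many replicas agree on it.
    winner, votes = Counter(sums.values()).most_common(1)[0]
    if votes * 2 <= len(sums):
        # No strict majority: flag all replicas for manual inspection.
        return sorted(sums)
    return sorted(name for name, s in sums.items() if s != winner)


if __name__ == "__main__":
    copies = {
        "replica_a": b"archived file content",
        "replica_b": b"archived file content",
        "replica_c": b"archived file c0ntent",  # simulated bit error
    }
    print(find_inconsistent_replicas(copies))
```

With three replicas, a single damaged copy can be identified and restored from the other two. With only two replicas, a mismatch can be detected but not resolved without further information, which is why the sketch flags both in that case.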

dk.netarkivet.viewerproxy module

This module gives access to previously harvested material through a proxy solution.

For developers