=NetarchiveSuite Overview=

Software to harvest, archive and browse large parts of the internet.

==Introduction==

The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. We use Heritrix as our web crawler. The NetarchiveSuite can organize three different kinds of harvests:

 * Event harvesting: harvests of a set of domains related to a specific event (e.g. 9/11, royal weddings, elections and so on).

The software has been designed with the following in mind:

 * Friendly to non-developers - designed to be usable by librarians and curators with a minimum of technical supervision

==The modules in the NetarchiveSuite==

The NetarchiveSuite is split into four main modules: one module with common functionality and three modules corresponding to ingesting, archiving and accessing.

===dk.netarkivet.common module===

The framework and utilities used by the whole suite, such as exceptions, settings, messaging, file transfer (RemoteFile), and logging. It also defines the interfaces used to communicate between the different modules, to support alternative implementations.

===dk.netarkivet.harvester module===

This module handles defining, scheduling, and performing harvests.

 * Harvesting uses Heritrix from the Internet Archive as the crawler. The harvesting module allows flexible, automated definitions of harvests. The full power of Heritrix remains available to users who know the Heritrix crawler. NetarchiveSuite wraps the crawler in an easy-to-use interface that handles scheduling and configuring the crawl, and distributing it to several crawling servers.

===dk.netarkivet.archive module===

This module allows running a repository with replication, active bit consistency checks for bit preservation, and support for distributed batch jobs on the archive.

 * The archiving component offers a secure environment for storing your harvested material. It is designed to give strong bit preservation guarantees.
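The idea of a bit consistency check across replicas can be illustrated with a minimal sketch. It is not the NetarchiveSuite implementation: the class `ReplicaChecksumSketch`, the method names, the choice of SHA-256, and the majority-vote rule are all assumptions made for illustration. The sketch computes a checksum for each replica's copy of a file and reports the replicas that disagree with the majority.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these names do not come from NetarchiveSuite. */
class ReplicaChecksumSketch {

    /** Returns the SHA-256 checksum of the data as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /**
     * Returns the names of replicas whose copy differs from the checksum
     * held by a strict majority. If no checksum has a strict majority,
     * every replica is reported as suspect (an assumption of this sketch).
     */
    static List<String> findSuspectReplicas(Map<String, byte[]> copiesByReplica) {
        Map<String, String> checksumByReplica = new LinkedHashMap<>();
        Map<String, Integer> votes = new HashMap<>();
        for (Map.Entry<String, byte[]> entry : copiesByReplica.entrySet()) {
            String checksum = sha256Hex(entry.getValue());
            checksumByReplica.put(entry.getKey(), checksum);
            votes.merge(checksum, 1, Integer::sum);
        }

        // Find the checksum shared by more than half of the replicas, if any.
        String majority = null;
        for (Map.Entry<String, Integer> vote : votes.entrySet()) {
            if (vote.getValue() * 2 > copiesByReplica.size()) {
                majority = vote.getKey();
            }
        }

        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, String> entry : checksumByReplica.entrySet()) {
            if (majority == null || !entry.getValue().equals(majority)) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

With three replicas of which one holds a corrupted copy, `findSuspectReplicas` returns only the name of that replica. With two replicas that disagree there is no majority, so both are returned and the decision is left to an operator.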

===dk.netarkivet.viewerproxy module===

This module gives access to previously harvested material through a proxy solution.

 * The viewerproxy component supports transparent access to the harvested data, using a proxy solution and an archive with an index.

==For developers==

 * The modules are loosely coupled, communicating through interfaces, with the implementation replaceable without recompiling.
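One common way to make an implementation replaceable without recompiling is to name the implementing class in a setting and load it reflectively at run time. The sketch below illustrates that pattern only; it is not taken from the NetarchiveSuite code. The interface `ArchiveClient`, both implementations, and the setting key `archive.client.class` are hypothetical names.

```java
import java.util.Map;

/** Hypothetical sketch; these names do not come from NetarchiveSuite. */
class PluggableSketch {

    /** The interface that other modules program against. */
    public interface ArchiveClient {
        String store(String fileName);
    }

    /** One possible implementation. */
    public static class LocalArchiveClient implements ArchiveClient {
        public LocalArchiveClient() {}

        @Override
        public String store(String fileName) {
            return "local:" + fileName;
        }
    }

    /** An alternative implementation with the same interface. */
    public static class RemoteArchiveClient implements ArchiveClient {
        public RemoteArchiveClient() {}

        @Override
        public String store(String fileName) {
            return "remote:" + fileName;
        }
    }

    /**
     * Instantiates the class named by the (assumed) setting key
     * "archive.client.class" using its no-argument constructor.
     */
    static ArchiveClient load(Map<String, String> settings) {
        String className = settings.get("archive.client.class");
        try {
            return Class.forName(className)
                    .asSubclass(ArchiveClient.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

Callers depend only on `ArchiveClient`. Switching from `LocalArchiveClient` to `RemoteArchiveClient` then means changing the setting value, not the calling code.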