Welcome to the NetarchiveSuite

The NetarchiveSuite software is developed by the two national deposit libraries in Denmark, [http://www.kb.dk/ The Royal Library] and [http://www.statsbiblioteket.dk The State and University Library], and has been running in production for three years, harvesting the Danish part of the World Wide Web. The Danish netarchive currently contains over 70 TB of data, mirrored at two geographically separate locations.

The NetarchiveSuite is the complete web archiving software package developed within the netarchive.dk project from 2004 onwards. Its primary function is to plan, schedule and run web harvests of parts of the Internet. It scales to a wide range of tasks, from small, thematic harvests (e.g. related to special events or special domains) to harvesting and archiving the content of an entire national domain. The software has built-in bit preservation functionality. The system architecture allows the software to be distributed among several machines, possibly at more than one geographical location. The NetarchiveSuite is built around the Heritrix web crawler, which is fundamental to how the NetarchiveSuite behaves during web harvests. You can find more information in the [:Overview:overview].

Include(News)

To get started with NetarchiveSuite, [:Get NetarchiveSuite: download] it and try it out with our [:Quick Start Manual: Quick Start] installation setup, which only requires one standard Linux machine.

The software is released with full source code under the LGPL.