Differences between revisions 14 and 15
Revision 14 as of 2007-06-29 16:01:20
Size: 1326
Comment:
Revision 15 as of 2007-07-02 08:14:10
Size: 1551
Editor: EldZierau
Comment:
Line 1: Line 1:
attachment:transparent_logo.png = Welcome to the !NetarchiveSuite =
Line 3: Line 3:
== Introduction ==
Software to harvest and preserve websites.
The NetarchiveSuite is the complete web archiving software package developed within the netarchive.dk project from 2004 and onwards.
The primary function of the NetarchiveSuite is to plan, schedule and run web harvests of parts of the Internet. The NetarchiveSuite is built
around the Heritrix web crawler; it scales to national domain-level crawls and also supports small selective and thematic harvests.
The software has built-in bit preservation functionality, and its overall architecture is distributed across machines and geographical
locations. For more information, please refer to ["Overview"].
Line 6: Line 9:
The NetarchiveSuite software covers harvesting, preserving and making available parts of the World Wide Web. Latest News on the web is
[[Include(News)]]
To see later news, please refer to [[Include(News)]]
Line 8: Line 13:
Its harvesting capabilities are built around the Heritrix web crawler from the Internet Archive, but the focus of the NetarchiveSuite is to make the power of Heritrix available to the common librarian or curator.

The archiving module supports distributed storage with active bit integrity checking of large amounts of data, and it supports batch runs over the data.

The access module takes a proxy-based approach: setting a proxy in your browser gives you access to the web as it looked at the time of harvest.
How to see and evaluate the NetarchiveSuite
To get started trying out the software in a simple setup, please refer to ["Get NetarchiveSuite"] and use the [:QuickStart Manual:Quick Start Manual] for the installation. The manual should contain all the information needed to get a simple test system up and running on a standard Linux machine.
Line 16: Line 17:
== About the Netarchive/us == === About the Netarchive/us ===
Line 18: Line 19:

== News ==
[[Include(News)]]

== Getting Started ==
 * Click on the tab ["Get NetarchiveSuite"] and follow the instructions

Welcome to the !NetarchiveSuite

The NetarchiveSuite is the complete web archiving software package developed within the netarchive.dk project from 2004 onwards. The primary function of the NetarchiveSuite is to plan, schedule and run web harvests of parts of the Internet. The NetarchiveSuite is built around the Heritrix web crawler; it scales to national domain-level crawls and also supports small selective and thematic harvests. The software has built-in bit preservation functionality, and its overall architecture is distributed across machines and geographical locations. For more information, please refer to ["Overview"].

Latest News on the web is Include(News) To see later news, please refer to Include(News)

How to see and evaluate the NetarchiveSuite

To get started trying out the software in a simple setup, please refer to ["Get NetarchiveSuite"] and use the [:QuickStart Manual:Quick Start Manual] for the installation. The manual should contain all the information needed to get a simple test system up and running on a standard Linux machine. Everything is released with full source under the LGPL license.

About the Netarchive/us

The NetarchiveSuite software was developed by the two national deposit libraries in Denmark, [http://www.kb.dk/ The Royal Library] and [http://www.statsbiblioteket.dk The State and University Library]. It has been running in production for two years, harvesting the Danish part of the World Wide Web. The Danish netarchive currently contains over 30 TB of data.

Welcome (last edited 2012-04-10 10:43:03 by MikisSethSorensen)