Differences between revisions 13 and 14
Revision 13 as of 2007-06-28 11:08:14
Size: 389
Editor: TueLarsen
Comment:
Revision 14 as of 2007-06-29 16:01:20
Size: 1326
Comment:
Line 6: Line 6:
... todo ... The NetarchiveSuite software is a suite for harvesting, preserving, and providing access to parts of the World Wide Web.

Its harvesting capabilities are built around the Heritrix web crawler from the Internet Archive, but the focus of the NetarchiveSuite is to make the power of Heritrix available to the common librarian or curator.

The archiving module supports distributed storage of large amounts of data with active bit-integrity checking, as well as batch runs over the data.

The access module takes a proxy-based approach: setting a proxy in your browser gives you access to the web as it looked at the time of harvest.

Everything is released with full source code under the LGPL.
Line 9: Line 17:
... todo ... The NetarchiveSuite software was developed by the two national deposit libraries in Denmark, [http://www.kb.dk/ The Royal Library] and [http://www.statsbiblioteket.dk The State and University Library], and has been running in production for two years, harvesting the Danish part of the World Wide Web. The Danish netarchive currently contains over 30 TB of data.
Line 12: Line 20:
... todo ... [[Include(News)]]
Line 16: Line 24:

... todo ...

 * Release Note
 * Download
 * Source Code
 * Manuals/Javadoc
 * Release Test

attachment:transparent_logo.png

Introduction

Software to harvest and preserve websites.

The NetarchiveSuite software is a suite for harvesting, preserving, and providing access to parts of the World Wide Web.

Its harvesting capabilities are built around the Heritrix web crawler from the Internet Archive, but the focus of the NetarchiveSuite is to make the power of Heritrix available to the common librarian or curator.

The archiving module supports distributed storage of large amounts of data with active bit-integrity checking, as well as batch runs over the data.
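
To illustrate what bit-integrity checking across distributed copies can look like, here is a minimal sketch in Python. It is not the NetarchiveSuite implementation: the function names, the replica names, and the choice of SHA-256 with a majority vote are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of one stored copy (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose checksum differs from the majority.

    `replicas` maps a hypothetical storage-site name to the bytes stored there.
    Raises ValueError when no checksum is held by more than half of the copies,
    because then there is no basis for deciding which copy is correct.
    """
    sums = {name: checksum(data) for name, data in replicas.items()}
    majority, count = Counter(sums.values()).most_common(1)[0]
    if count * 2 <= len(sums):
        raise ValueError("no majority checksum; cannot decide which copy is correct")
    return sorted(name for name, value in sums.items() if value != majority)
```

Running such a check periodically over every stored file, rather than only when a file is read, is what makes the checking "active": a damaged copy can be found and restored from a healthy one before the remaining copies degrade as well.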

The access module takes a proxy-based approach: setting a proxy in your browser gives you access to the web as it looked at the time of harvest.
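
The same proxy setting can be used by a script instead of a browser. The sketch below shows how a Python client could be pointed at such a proxy; the host name and port are hypothetical placeholders, not values defined by NetarchiveSuite, and no request is sent until the opener is actually used.

```python
import urllib.request

# Hypothetical address of the access proxy; replace with your installation's values.
PROXY_HOST = "localhost"
PROXY_PORT = 8070


def proxy_url(host: str = PROXY_HOST, port: int = PROXY_PORT) -> str:
    """Build the proxy URL from a host name and port."""
    return f"http://{host}:{port}"


def make_archive_opener(
    host: str = PROXY_HOST, port: int = PROXY_PORT
) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain HTTP requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url(host, port)})
    return urllib.request.build_opener(handler)
```

With an opener like this, a call such as `make_archive_opener().open("http://www.kb.dk/")` would be answered by the proxy from the archive rather than by the live site, assuming a proxy is actually listening at that address.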

Everything is released with full source code under the LGPL.

About the Netarchive/us

The NetarchiveSuite software was developed by the two national deposit libraries in Denmark, [http://www.kb.dk/ The Royal Library] and [http://www.statsbiblioteket.dk The State and University Library], and has been running in production for two years, harvesting the Danish part of the World Wide Web. The Danish netarchive currently contains over 30 TB of data.

News

Include(News)

Getting Started

Welcome (last edited 2012-04-10 10:43:03 by MikisSethSorensen)