Differences between revisions 1 and 2
Revision 1 as of 2009-05-26 11:40:46
Size: 822
Editor: KaareChristiansen
Comment: Releasing version 3.8
Revision 2 as of 2010-08-16 10:24:08
Size: 827
Editor: localhost
Comment: converted to 1.6 markup

System overview

The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. Heritrix is used as the web crawler. The NetarchiveSuite can organize three different kinds of harvests:

  • Event harvesting (harvests of a set of domains related to a specific event, e.g. 9/11, royal weddings, elections and so on).
  • Selective harvesting (recurrent harvests of a set of domains).
  • Snapshot harvesting (a complete snapshot of all known domains).

The NetarchiveSuite is split into three main modules, corresponding to harvesting, archiving, and access via the viewerproxy.

  [Figure: Overview 3.8/Netarchive_structure_simplified2.png]

Please refer to the overview description (Overview 3.8) for more details.

Quick Start Manual 3.8/System overview (last edited 2010-08-16 10:24:08 by localhost)