Differences between revisions 1 and 2
Revision 1 as of 2010-05-04 13:16:38
Size: 814
Editor: SoerenCarlsen
Comment: Generated documentation branch for 3.14
Revision 2 as of 2010-08-16 10:24:35
Size: 819
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked with "-". Additions are marked with "+".
Line 2: Line 2:
- [[Action(edit)]]
+ <<Action(edit)>>
Line 11: Line 11:
-  . attachment:Overview_3.14/Netarchive_structure_simplified2.png
- Please refer to the [:Overview 3.14:overview] description for more details.
+  . {{attachment:Overview 3.14/Netarchive_structure_simplified2.png}}
+ Please refer to the [[Overview 3.14|overview]] description for more details.

System overview


The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. We use Heritrix as our web crawler. The NetarchiveSuite can organize three different kinds of harvests:

  • Event harvesting (harvests of a set of domains related to a specific event, e.g. 9/11, royal weddings, or elections).
  • Selective harvesting (recurrent harvests of a set of domains).
  • Snapshot harvesting (a complete snapshot of all known domains).
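
As a rough illustration of the three harvest kinds listed above, they could be modelled as a small enumeration. This is only a sketch: the names HarvestKinds, HarvestKind, scope() and allKinds() are hypothetical and are not taken from the NetarchiveSuite code base, whose actual classes may look quite different.

```java
import java.util.Arrays;
import java.util.List;

public class HarvestKinds {

    // Hypothetical enum for illustration only; NetarchiveSuite's real
    // classes and names may differ.
    enum HarvestKind {
        EVENT("a set of domains related to a specific event"),
        SELECTIVE("a set of domains, harvested recurrently"),
        SNAPSHOT("all known domains, harvested as one complete snapshot");

        // Short description of what this kind of harvest covers.
        private final String scope;

        HarvestKind(String scope) {
            this.scope = scope;
        }

        String scope() {
            return scope;
        }
    }

    // Returns the three harvest kinds in declaration order.
    static List<HarvestKind> allKinds() {
        return Arrays.asList(HarvestKind.values());
    }

    public static void main(String[] args) {
        // Print each kind together with its scope description.
        for (HarvestKind kind : allKinds()) {
            System.out.println(kind + ": " + kind.scope());
        }
    }
}
```

Running main prints one line per harvest kind; the enum only names the categories and says nothing about how harvests are actually scheduled or executed.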

The NetarchiveSuite is split into three main modules, corresponding to harvesting, archiving, and access via the viewerproxy.

  • Overview 3.14/Netarchive_structure_simplified2.png

Please refer to the overview description for more details.

Quick Start Manual 3.14/System overview (last edited 2010-08-16 10:24:35 by localhost)