Revision 1 as of 2010-05-04 13:16:38
Size: 814
Comment: Generated documentation branch for 3.14

Revision 2 as of 2010-08-16 10:24:35
Size: 819
Comment: converted to 1.6 markup
Line 2: | Line 2: |
[[Action(edit)]] | <<Action(edit)>> |
Line 11: | Line 11: |
. attachment:Overview_3.14/Netarchive_structure_simplified2.png Please refer to the [:Overview 3.14:overview] description for more details. |
. {{attachment:Overview 3.14/Netarchive_structure_simplified2.png}} Please refer to the [[Overview 3.14|overview]] description for more details. |
System overview
The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the Internet. It uses Heritrix as its web crawler. The NetarchiveSuite can organize three different kinds of harvests:
- Event harvesting (harvests of a set of domains related to a specific event, e.g. 9/11, royal weddings, or elections).
- Selective harvesting (recurrent harvests of a set of domains).
- Snapshot harvesting (a complete snapshot of all known domains).
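The difference between the three kinds of harvest is mainly one of scope, which can be sketched as follows. This is only an illustration: `HarvestKind`, `scope`, and the example domain names are hypothetical and are not taken from the NetarchiveSuite code base.

```java
import java.util.List;

// Hypothetical illustration only; these names are not NetarchiveSuite classes.
enum HarvestKind {
    EVENT,      // a set of domains related to a specific event
    SELECTIVE,  // recurrent harvests of a chosen set of domains
    SNAPSHOT;   // a complete snapshot of all known domains

    // Returns the domains a harvest of this kind would cover (assumed model).
    List<String> scope(List<String> selectedDomains, List<String> allKnownDomains) {
        return this == SNAPSHOT ? allKnownDomains : selectedDomains;
    }
}

class HarvestKindDemo {
    public static void main(String[] args) {
        // Placeholder domain names, chosen for the example.
        List<String> known = List.of("example.dk", "example.org", "example.com");
        List<String> selected = List.of("example.dk");
        for (HarvestKind kind : HarvestKind.values()) {
            System.out.println(kind + " -> " + kind.scope(selected, known));
        }
    }
}
```

In this sketch, event and selective harvests differ only in intent and scheduling (one-off event coverage versus recurrence), so both return the chosen set of domains, while a snapshot harvest ignores the selection and covers every known domain.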
The NetarchiveSuite is split into three main modules, corresponding to harvesting, archiving, and access through the viewerproxy.
Please refer to the overview description for more details.