Here is our running setup for the actual snapshot harvest at KB/SB in Denmark:

We expect to download about 22 TB in 12 weeks with a mix of old and new bitarchive and harvester servers at KB and SB (preparation of the harvest takes about 3 weeks).
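
As a rough sanity check (our own arithmetic, not from the original figures beyond what is stated above), 22 TB in 12 weeks corresponds to an average sustained download rate of about 3 MB/s:

```python
# Back-of-the-envelope check (illustrative only): what average download rate
# does "22 TB in 12 weeks" correspond to? Decimal TB (10^12 bytes) assumed.
TOTAL_TB = 22
WEEKS = 12

total_bytes = TOTAL_TB * 10**12
seconds = WEEKS * 7 * 24 * 3600

avg_bytes_per_sec = total_bytes / seconds
print(f"Average sustained rate: {avg_bytes_per_sec / 10**6:.1f} MB/s "
      f"(~{avg_bytes_per_sec * 8 / 10**6:.0f} Mbit/s)")
# -> about 3.0 MB/s, i.e. roughly 24 Mbit/s averaged over the whole period
```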

Today, 2 June 2009, we have 16 running harvester instances at KB (6 are actually downloading):

- 6 harvesters on each of 2 DL380 G4 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
- 2 harvesters on each of 2 DL380 G5 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

We have 30 running SW harvesters at SB (11 are actually downloading):

- 6 harvesters on each of 2 Dell 2850 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
- 6 harvesters on 1 DL380 G3 machine (1 snapshot harvest instance; the rest are ready for or running selective harvest jobs)
- 6 harvesters on each of 2 DL380 G5 machines (3 snapshot harvest instances on 1 machine; the rest are ready for selective harvest jobs)

In total we have 13 harvest instances running the snapshot harvest (4 at KB and 9 at SB, plus one extra DL380 G5 harvest server in reserve at KB). The rest are ready for or running selective/event harvest jobs (12 at KB and 21 at SB).

Total archive storage is ca. 183 TB (currently 90 TB used).

As an example of the new bitarchive servers: 1 x 360 server with 6 bitapps running 24 hours a day stores on average 240 GB per day (measured over 4 days: 90-360 GB). Our server stress test shows that the new bitarchive servers can store 16.3 TB within 24 hours when running 6 write processes in parallel against 6 RAIDs.
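
For context, a small back-of-the-envelope calculation (our own illustration, decimal TB assumed) of what the stress-test figure implies per write process:

```python
# Illustrative arithmetic: what 16.3 TB per 24 hours implies per write
# process / RAID, given 6 writers in parallel. Decimal TB assumed.
STRESS_TB = 16.3
HOURS = 24
WRITERS = 6

total_bytes = STRESS_TB * 10**12
seconds = HOURS * 3600

aggregate = total_bytes / seconds      # bytes/s across all 6 write processes
per_writer = aggregate / WRITERS       # bytes/s per RAID

print(f"Aggregate write rate: {aggregate / 10**6:.0f} MB/s")
print(f"Per write process   : {per_writer / 10**6:.0f} MB/s")
# -> roughly 189 MB/s aggregate, about 31 MB/s per RAID
```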

Each new harvester currently has an average capacity of 24 MB/s per connection, and each new machine can manage 5 snapshot harvest instances (the old harvest servers can only manage 1 snapshot instance per machine).

The download capacity also depends on how the Heritrix order.xml files are configured.
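
As an illustration (not part of our setup), the sketch below reads an order.xml and prints the settings that usually have the biggest impact on download capacity in Heritrix 1.x; the setting names used here (max-toe-threads, total-bandwidth-usage-KB-sec, max-per-host-bandwidth-usage-KB-sec) should be checked against the order templates actually deployed:

```python
# Sketch: list the capacity-related settings in a Heritrix 1.x order.xml.
# The setting names below are the usual Heritrix 1.x attribute names; verify
# them against the order templates deployed with your NetarchiveSuite setup.
import xml.etree.ElementTree as ET

CAPACITY_SETTINGS = {
    "max-toe-threads",                      # number of worker threads
    "total-bandwidth-usage-KB-sec",         # overall bandwidth cap (0 = unlimited)
    "max-per-host-bandwidth-usage-KB-sec",  # per-host bandwidth cap (0 = unlimited)
}

def report_capacity_settings(order_xml_path: str) -> dict:
    """Return the capacity-related settings found in an order.xml file."""
    found = {}
    # Heritrix settings look like <integer name="max-toe-threads">50</integer>.
    # Matching on the 'name' attribute keeps the code independent of the
    # XML namespace used by the template.
    for elem in ET.parse(order_xml_path).iter():
        if elem.get("name") in CAPACITY_SETTINGS:
            found[elem.get("name")] = (elem.text or "").strip()
    return found

if __name__ == "__main__":
    for name, value in report_capacity_settings("order.xml").items():
        print(f"{name} = {value}")
```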

There are 15 viewerproxy access instances for QA running at SB, plus 1 Tomcat and 1 Apache (for wayback). At KB there are 10 viewerproxy access instances for QA and 1 Lucene index server.

Your network should run at a minimum of 1 Gbit/s. You should have a firewall setup that can handle at least 30-90 Mbit/s in parallel. At SB/KB we have 3 firewalls; the 2 firewalls at KB are currently our main bottleneck. The central admin machine with the JMS broker, ADMGui, ArcRepository, BitarchiveMonitors, Derby database and Apache servers for secure login is also a bottleneck and a single point of failure, and should be mirrored or placed in a cluster failover setup.
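
To illustrate why the firewalls rather than the harvesters limit throughput, here is a simple unit comparison based on the figures above (our own illustration, not a measurement):

```python
# Compare harvester capacity with firewall throughput (illustrative only).
HARVESTER_MB_PER_SEC = 24              # avg. capacity per connection (see above)
FIREWALL_MBIT_RANGE = (30, 90)         # what a firewall should handle in parallel

harvester_mbit = HARVESTER_MB_PER_SEC * 8   # 24 MB/s = 192 Mbit/s
print(f"One harvester connection: {harvester_mbit} Mbit/s")
print(f"Firewall throughput     : {FIREWALL_MBIT_RANGE[0]}-{FIREWALL_MBIT_RANGE[1]} Mbit/s")
# A single connection at full speed already exceeds the 30-90 Mbit/s range,
# which is why the firewalls (at KB in our case) cap the overall download rate.
```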

Here is our HW setup:

Bitarchive storage servers at SB:

Harvester servers at SB:

Access machines at SB:

Bitarchive storage servers at KB (new architecture):

Harvester servers at KB:

Access machines at KB:

For a similar deploy installation, see the first deploy example in chapter 10.1 of the Installation Manual:

Installation Manual 3.12/AppendixC/deploy_example.xml
