Differences between revisions 1 and 2
Revision 1 as of 2009-11-17 08:16:59
Size: 7099
Editor: KaareChristiansen
Comment: Generated the documentation branch for 3.12
Revision 2 as of 2010-08-16 10:24:28
Size: 7103
Editor: localhost
Comment: converted to 1.6 markup

Here is our running setup for the current snapshot harvest at KB/SB in Denmark:

We expect to download about 22 TB in 12 weeks with a mix of old and new bitarchive and harvester servers at KB and SB (preparation of the harvest takes about 3 weeks).
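
As a rough sanity check on the figure above, the sustained average download rate implied by 22 TB in 12 weeks can be worked out as follows. This is only a sketch: it assumes decimal units (1 TB = 10**12 bytes) and a harvest that runs around the clock for the full 12 weeks, and the function and variable names are illustrative, not part of NetarchiveSuite.

```python
# Assumptions (not stated in the text): decimal units, 1 TB = 10**12 bytes,
# and a harvest that runs 24 hours a day for the full 12 weeks.
TOTAL_BYTES = 22 * 10**12
HARVEST_SECONDS = 12 * 7 * 24 * 3600


def average_rate_mbit_per_s(total_bytes, seconds):
    """Return the sustained average rate in Mbit/s (10**6 bits per second)."""
    return total_bytes * 8 / seconds / 10**6


rate = average_rate_mbit_per_s(TOTAL_BYTES, HARVEST_SECONDS)
print(f"Required average rate: {rate:.1f} Mbit/s")
```

Under these assumptions the average comes out at a little over 24 Mbit/s. Peak rates will be higher, since harvest jobs do not download evenly over the whole period.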

Today, 2 June 2009, we have 16 running harvester instances at KB (6 are actually downloading):

  • 6 harvesters on each of 2 DL380 G4 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

  • 2 harvesters on each of 2 DL380 G5 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

We have 30 running SW harvesters at SB (11 are actually downloading):

  • 6 harvesters on each of 2 Dell 2850 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

  • 6 harvesters on 1 DL380 G3 machine (1 snapshot harvest instance; the rest are ready for or running selective harvest jobs)

  • 6 harvesters on each of 2 DL380 G5 machines (3 snapshot harvest instances on 1 machine; the rest are ready for selective harvest jobs)

In total we have 13 harvest instances running snapshot harvests (4 at KB and 9 at SB), plus one extra DL380 G5 harvest server in reserve at KB. The rest are ready for or running selective/event harvest jobs (12 at KB and 21 at SB).
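
The totals quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the text (harvesters per machine, number of machines, and snapshot instances per site); the dictionary layout and names are illustrative assumptions.

```python
# Figures copied from the text; each tuple is
# (harvester instances per machine, number of machines).
sites = {
    "KB": {"per_machine": [(6, 2), (2, 2)], "snapshot": 4},
    "SB": {"per_machine": [(6, 2), (6, 1), (6, 2)], "snapshot": 9},
}


def total_instances(per_machine):
    """Sum harvester instances over all machine groups at one site."""
    return sum(instances * machines for instances, machines in per_machine)


totals = {name: total_instances(s["per_machine"]) for name, s in sites.items()}
# Instances not running a snapshot harvest are available for selective/event jobs.
selective = {name: totals[name] - s["snapshot"] for name, s in sites.items()}
snapshot_total = sum(s["snapshot"] for s in sites.values())

for name in sites:
    print(name, "total:", totals[name], "selective/event:", selective[name])
print("snapshot instances in total:", snapshot_total)
```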

Total archive storage: ca. 183 TB (90 TB currently used).

The new bitarchive servers, for example: one DL360 server with 6 bitapps running around the clock stores on average 240 GB per 24 hours (measured over 4 days: 90 GB to 360 GB). Our server stress test shows that the new bitarchive servers can store 16.3 TB within 24 hours when running 6 write processes in parallel to 6 RAIDs.
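
To put the two storage figures above on the same scale, they can be converted to sustained write rates. This is a sketch under stated assumptions: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes) and an even split of the stress-test load across the 6 write processes; the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 3600


def mb_per_s(bytes_per_day):
    """Convert a daily volume in bytes to a sustained rate in MB/s (10**6 bytes)."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


# Assumption: decimal units throughout.
observed = mb_per_s(240 * 10**9)      # average daily volume during the harvest
stress = mb_per_s(16.3 * 10**12)      # stress test, all 6 write processes together
stress_per_process = stress / 6       # assumes the load is spread evenly

print(f"observed average:       {observed:.2f} MB/s")
print(f"stress test, total:     {stress:.1f} MB/s")
print(f"stress test, per RAID:  {stress_per_process:.1f} MB/s")
```

Under these assumptions the harvest load on a bitarchive server is a small fraction of what the stress test showed the hardware can absorb.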

Each new harvester currently has an average capacity of 24 MB/s per connection and can manage 5 snapshot harvest instances per machine (the old harvest servers can only manage 1 snapshot instance per machine).

The download capacity also depends on how the Heritrix order.xml files are configured.

There are 15 viewerproxy access instances for QA running at SB, plus 1 Tomcat and 1 Apache (for wayback). At KB there are 10 viewerproxy access instances for QA and 1 Lucene index server.

Your network should run at 1 Gbit/s or more. You should have a firewall setup that can handle at least 30 to 90 Mbit/s in parallel. At SB/KB we have 3 firewalls. The 2 firewalls at KB are currently our main bottleneck. The central admin machine with the JMS broker, ADMGui, ArcRepository, BitarchiveMonitors, Derby database and Apache servers for secure login is also a bottleneck and a single point of failure, and should be mirrored or run in a cluster failover setup.

Here is our HW setup:

Bitarchive storage servers at SB:

  • number of machines: 2

    model: Dell PowerEdge 2850 and 2950
    processors: 2 * Intel Xeon 2.8 GHz and Intel Xeon 2.0 GHz, both hyperthreaded
    RAM: 4 GB
    local hard disk: 73 GB mirrored local + 32 TB in SAN (RAID 5 and RAID 6), and 73 GB mirrored local + 73 TB in SAN (RAID 5 and RAID 6)
    network interface:
    operating system: Linux Red Hat (RHEL)

Harvester servers at SB:

  • number of machines: 2

    model: Dell PowerEdge 2850
    processors: 2 * Intel Xeon 3.20 GHz, hyperthreaded
    disk: 600 GB (3 x 300 GB in RAID 5)
    RAM: 4 GB
    network interface: 1 Gbit/s
    OS: Linux CentOS

  • number of machines: 1

    model: HP ProLiant DL380 G4
    processors: 2 * Intel Xeon 2.8 GHz, hyperthreaded
    disk: 340 GB (6 x 73 GB in RAID 5)
    RAM: 2.5 GB
    network interface: 1 Gbit/s
    OS: Linux CentOS

  • number of machines: 2

    model: HP ProLiant DL380 G5
    processors: 2 * Intel Xeon 2.0 GHz, 4 cores
    disk: 956 GB (8 x 143 GB in RAID 5)
    RAM: 10 GB
    network interface: 1 Gbit/s
    OS: Linux CentOS

Access machines at SB:

  • number of machines: 1

    model: Dell PowerEdge 2850
    processors: 2 CPU x 3 GHz
    RAM: 2 GB
    local hard disk: 1.5 TB local + 4 TB SAN (for wayback)
    network interface: 1 Gbit/s
    OS: Linux

Bitarchive storage servers at KB new architecture:

  • number of machines: 12

    model: HP DL360 G5
    processors: 2 x QC CPU 2 GHz
    RAM: 3 GB
    controllers: internal P400, external P800
    storage: 2 x MSA60, one with 3 x RAID 5 (3 TB) and the other with 2 x RAID 5 (3 TB), 1 x RAID 5 (2 TB) and 1 TB without RAID for temp data
    local hard disk: 2 x 72 GB RAID 1 for OS/software
    network interface: Gigabit
    operating system: Windows Web 2008
    temp-storage for batch jobs: 5%

Harvester servers at KB:

  • number of machines: 2

    model: HP DL380 G4
    processors: 2 CPU x 3 GHz
    RAM: 4 GB
    local hard disk: 6 x 72 GB
    network interface:
    OS: Linux

  • number of machines: 2

    model: HP ProLiant DL380 G5
    processors: 2 * Intel Xeon 2.0 GHz, 4 cores
    disk: 956 GB (8 x 146 GB in RAID 5)
    RAM: 10 GB
    network interface: 1 Gbit/s
    OS: Linux CentOS

Access machines at KB:

  • number of machines: 1

    model: HP DL380 G4
    processors: 1 CPU x 3 GHz
    RAM: 2 GB
    local hard disk: 2 x 72 GB + 4 x 300 GB
    network interface: 1 Gbit/s
    OS: Linux

For a similar deploy installation, see the first deploy example in chapter 10.1 of the Installation Manual:

Installation Manual 3.12/AppendixC/deploy_example.xml

Installation Manual 3.12/AppendixC/HW SW KB SB prod june 2009 (last edited 2010-08-16 10:24:28 by localhost)