Differences between revisions 2 and 3
Revision 2 as of 2009-06-09 07:42:19
Size: 7066
Editor: TueLarsen
Comment:
Revision 3 as of 2009-06-09 07:43:14
Size: 3
Editor: TueLarsen
Comment:
Here is our running setup for the current snapshot harvest at KB/SB in Denmark:

We expect to download about 22 TB in 12 weeks with a mix of old and new bitarchive and harvester servers at KB and SB
(preparation of the harvest takes about 3 weeks).
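
As a rough sanity check, the planned volume can be turned into an average sustained download rate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and continuous downloading over the full 12 weeks; the function and variable names are illustrative only and not part of NetarchiveSuite.

```python
# Average sustained download rate implied by the harvest plan.
# Assumptions: 1 TB = 10**12 bytes, downloading runs around the clock.
TOTAL_BYTES = 22 * 10**12   # about 22 TB, from the text
WEEKS = 12                  # planned download period, from the text


def required_rate(total_bytes: int, weeks: int) -> tuple[float, float]:
    """Return the average rate as (MB/s, Mbit/s)."""
    seconds = weeks * 7 * 24 * 3600
    bytes_per_sec = total_bytes / seconds
    return bytes_per_sec / 10**6, bytes_per_sec * 8 / 10**6


mb_per_s, mbit_per_s = required_rate(TOTAL_BYTES, WEEKS)
print(f"{mb_per_s:.2f} MB/s, {mbit_per_s:.1f} Mbit/s")
```

Under these assumptions the average comes out at roughly 3 MB/s (about 24 Mbit/s); real traffic is burstier, so peak capacity needs to be higher than this average.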

As of 2 June 2009, we have
16 running harvester instances at KB (6 are actually downloading):
6 harvesters on each of 2 x DL380 G4 (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
2 harvesters on each of 2 x DL380 G5 (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

30 running SW harvesters at SB (11 are actually downloading):
6 harvesters on each of 2 x Dell 2850 (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
6 harvesters on 1 x DL380 G3 (1 snapshot harvest instance; the rest are ready for or running selective harvest jobs)
6 harvesters on each of 2 x DL380 G5 (3 snapshot harvest instances on 1 machine; the rest are ready for selective harvest jobs)

In total we have 13 harvest instances running the snapshot harvest (4 at KB and 9 at SB), plus one extra DL380 G5 harvest server in reserve at KB.
The rest are ready for or running selective/event harvest jobs (12 at KB and 21 at SB).
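
The instance counts above can be cross-checked with a small tally. This is only a sketch: the dictionary layout and names are made up for illustration, the per-machine figures are copied from the lists above, and the snapshot counts per site are taken from the summary line rather than derived.

```python
# Harvester layout per site: model -> (harvesters per machine, number of machines).
LAYOUT = {
    "KB": {"DL380 G4": (6, 2), "DL380 G5": (2, 2)},
    "SB": {"Dell 2850": (6, 2), "DL380 G3": (6, 1), "DL380 G5": (6, 2)},
}
# Snapshot instances per site, as stated in the summary line (not derived here).
SNAPSHOT = {"KB": 4, "SB": 9}


def totals(layout):
    """Sum harvesters-per-machine times machine count for every site."""
    return {
        site: sum(per_machine * machines for per_machine, machines in models.values())
        for site, models in layout.items()
    }


total = totals(LAYOUT)
selective = {site: total[site] - SNAPSHOT[site] for site in total}
print("total:", total)
print("selective/event:", selective)
```

The tally reproduces the stated totals (16 at KB, 30 at SB) and the selective/event split (12 at KB, 21 at SB).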

Total archive storage: approx. 183 TB (90 TB currently in use)
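
A quick headroom estimate shows how the expected 22 TB harvest relates to the free storage. This is a sketch under stated assumptions: decimal TB, no compression or deduplication, and a hypothetical `copies` parameter for the case where the harvest is stored more than once within the 183 TB (the text does not say whether that figure covers one or several copies).

```python
# Figures from the text (decimal TB assumed).
TOTAL_TB = 183
USED_TB = 90
HARVEST_TB = 22


def headroom_after_harvest(total_tb, used_tb, harvest_tb, copies=1):
    """Return (free TB, fraction used) once the harvest is stored `copies` times.

    `copies` is a hypothetical knob, not a documented setting.
    """
    used_after = used_tb + harvest_tb * copies
    return total_tb - used_after, used_after / total_tb


for copies in (1, 2):
    free_tb, used_fraction = headroom_after_harvest(TOTAL_TB, USED_TB, HARVEST_TB, copies)
    print(f"copies={copies}: {free_tb} TB free, {used_fraction:.0%} used")
```

Either way the harvest fits in the remaining space, with 71 TB free for a single copy and 49 TB free if two copies count against the total.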

Example figures for the new bitarchive servers:
1 x DL360 server with 6 bitapps running for 24 hours stores 240 GB on average (measured over 4 days: 90 GB - 360 GB)
Our server stress test shows that the new bitarchive servers can store 16.3 TB within 24 hours when running 6 write processes in parallel to 6 RAIDs.
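
To compare the stress-test figure with the observed daily average, both can be converted to MB/s. The sketch below assumes decimal units and that the stress-test load is spread evenly over the 6 write processes; the names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600


def mb_per_second(bytes_per_day: float) -> float:
    """Convert a daily volume in bytes to an average rate in MB/s (decimal)."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


# Stress test: 16.3 TB in 24 hours with 6 parallel write processes.
stress_total = mb_per_second(16.3 * 10**12)
stress_per_writer = stress_total / 6  # assumes an even split across writers

# Observed production average: 240 GB per 24 hours on one server.
observed_avg = mb_per_second(240 * 10**9)

print(f"stress test: {stress_total:.1f} MB/s total, {stress_per_writer:.1f} MB/s per writer")
print(f"observed average: {observed_avg:.2f} MB/s")
```

On these assumptions the stress test corresponds to about 189 MB/s in total (about 31 MB/s per write process), while the observed production average is under 3 MB/s, so bitarchive write capacity is far from being the limiting factor.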
Each new harvester currently has an average capacity of 24 MB/sec per connection and can manage 5 snapshot harvest instances per new machine (the old harvest servers can only manage 1 snapshot instance per machine).

The download capacity also depends on how the Heritrix order.xml files are configured.

There are 15 viewerproxy access instances for QA running at SB, plus 1 Tomcat and 1 Apache (for wayback).
At KB there are 10 viewerproxy access instances for QA and 1 Lucene index server.

Your network should run at 1 Gbit/s or more.
You should have a firewall setup that can handle at least 30 - 90 Mbit/s in parallel.
At SB/KB we have 3 firewalls. The 2 firewalls at KB are currently our main bottleneck.
The central admin machine with the JMS broker, ADMGui, ArcRepository, BitarchiveMonitors, Derby database and Apache servers for secure login is also
a bottleneck and a single point of failure, and should be mirrored or placed in a cluster failover setup.

Here is our HW setup:

Bitarchive storage servers at SB:

                             number of machines: 2
                             model: Dell PowerEdge 2850 and 2950
                             processors: 2 * Intel Xeon 2.8 GHz and Intel Xeon 2.0 GHz, both hyperthreaded
                             RAM: 4 GB
                             local hard disk: 73 GB mirrored local + 32 TB in SAN (RAID 5 and RAID 6), and 73 GB mirrored local + 73 TB in SAN (RAID 5 and RAID 6)
                             network interface:
                             operating system: Red Hat Enterprise Linux (RHEL)

Harvester servers at SB:

                             number of machines: 2
                             model: Dell PowerEdge 2850
                             processors: 2 * Intel Xeon CPU 3.20GHz hyperthreaded
                             disk: 600 GB (3 x 300 GB in RAID 5)
                             RAM: 4 GB
                             network interface: 1 Gbit/s
                             OS: Linux CentOS

                             number of machines: 1
                             model: HP ProLiant DL380 G4
                             processors: 2 * Intel Xeon 2.8 GHz hyperthreaded
                             disk: 340 GB (6 x 73 GB in RAID 5)
                             RAM: 2.5 GB
                             network interface: 1 Gbit/s
                             OS: Linux CentOS

                             number of machines: 2
                             model: HP ProLiant DL380 G5
                             processors: 2 * Intel Xeon 2.0 GHz, 4 cores
                             disk: 956 GB (8 x 143 GB in RAID 5)
                             RAM: 10 GB
                             network interface: 1 Gbit/s
                             OS: Linux CentOS

Access machines at SB:

                             number of machines: 1
                             model: Dell PowerEdge 2850
                             processors: 2 CPU x 3 GHz
                             RAM: 2 GB
                             local hard disk: 1.5 TB local + 4 TB SAN (for wayback)
                             network interface: 1 Gbit/s
                             OS: Linux

Bitarchive storage servers at KB new architecture:

                             number of machines: 12
                             model: HP DL360 G5
                             processors: 2 x quad-core CPU 2 GHz
                             RAM: 3 GB
                             Controllers: internal P400, external P800
                             Storage: 2 x MSA60, one with 3 x RAID 5 (3 TB) and the other with 2 x RAID 5 (3 TB), 1 x RAID 5 (2 TB) and 1 TB without RAID for temp data
                             local hard disk: 2 x 72 GB RAID 1 for OS/software
                             network interface: 1 Gbit/s
                             operating system: Windows Web Server 2008
                             temp storage for batch jobs: 5%

Harvester servers at KB:

                             number of machines: 2
                             model: HP DL380 G4
                             processors: 2 CPU x 3 GHz
                             RAM: 4 GB
                             local hard disk: 6 x 72 GB
                             network interface:
                             OS: Linux

                             number of machines: 2
                             model: HP ProLiant DL380 G5
                             processors: 2 * Intel Xeon 2.0 GHz, 4 cores
                             disk: 956 GB (8 x 146 GB in RAID 5)
                             RAM: 10 GB
                             network interface: 1 Gbit/s
                             OS: Linux CentOS

Access machines at KB:

                             number of machines: 1
                             model: HP DL380 G4
                             processors: 1 CPU x 3 GHz
                             RAM: 2 GB
                             local hard disk: 2 x 72 GB + 4 x 300 GB
                             network interface: 1 Gbit/s
                             OS: Linux

For a similar deployment, see the first deploy example in chapter 10.1 of the Installation Manual
( https://netarchive.dk/suite/Installation_Manual_devel/AppendixC?action=AttachFile&do=get&target=deploy_example.xml )

Installation Manual 3.10/AppendixC/HW SW KB SB prod june 2009 (last edited 2010-08-16 10:24:35 by localhost)