Attachment 'HW_SW_production_example.txt'
Here is our running setup for the actual snapshot harvest at KB/SB in Denmark:

We expect to download about 22 TB in 12 weeks with a mix of old and new bitarchive and harvester servers at KB and SB
(preparation of the harvest takes about 3 weeks).
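As a rough sanity check, the 22 TB / 12 weeks target can be turned into a required sustained download rate. This is illustrative arithmetic only (it assumes decimal units, 1 TB = 10^12 bytes, and round-the-clock downloading, neither of which the figures above guarantee):

```python
# Sustained rate needed to download ~22 TB in 12 weeks,
# assuming decimal units and continuous 24/7 downloading.
TOTAL_BYTES = 22 * 10**12        # ~22 TB snapshot harvest
SECONDS = 12 * 7 * 24 * 3600     # 12 weeks

bytes_per_sec = TOTAL_BYTES / SECONDS
mb_per_sec = bytes_per_sec / 10**6
mbit_per_sec = bytes_per_sec * 8 / 10**6

print(f"sustained rate: {mb_per_sec:.1f} MB/s ({mbit_per_sec:.0f} Mbit/s)")
```

This works out to roughly 3 MB/s (about 24 Mbit/s) sustained across all harvesters, before allowing for downtime and uneven crawl progress.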

Today, 2 June 2009, we have
16 running harvester instances at KB (6 are actually downloading):
6 harvesters on each of 2 HP DL380 G4 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
2 harvesters on each of 2 HP DL380 G5 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)

30 running SW harvesters at SB (11 are actually downloading):
6 harvesters on each of 2 Dell 2850 machines (1 snapshot harvest instance on each machine; the rest are ready for or running selective harvest jobs)
6 harvesters on 1 HP DL380 G3 machine (1 snapshot harvest instance; the rest are ready for or running selective harvest jobs)
6 harvesters on each of 2 HP DL380 G5 machines (3 snapshot harvest instances on each machine; the rest are ready for selective harvest jobs)

In total we have 13 harvest instances running snapshot harvests (4 at KB and 9 at SB), plus one extra DL380 G5 harvest server in reserve at KB.
The rest are ready for or running selective/event harvest jobs (12 at KB and 21 at SB).

Total archive storage: ca. 183 TB (August 2009: 112 TB used)

The new bitarchive servers, for example:
1 DL360 server with 6 bitapps running 24 hours stores on avg. 240 GB (measured over 4 days: 90-360 GB).
Our server stress test shows that the new bitarchive servers can store 16.3 TB within 24 hours running 6 write processes in parallel to 6 RAIDs!
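The stress-test figure can be unpacked into per-RAID write throughput. A back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and an even split across the 6 write processes:

```python
# What 16.3 TB stored in 24 hours across 6 parallel write
# processes implies per RAID volume (decimal units assumed).
TB_PER_DAY = 16.3
PROCESSES = 6

total_mb_per_sec = TB_PER_DAY * 10**12 / (24 * 3600) / 10**6
per_process = total_mb_per_sec / PROCESSES

print(f"aggregate: {total_mb_per_sec:.0f} MB/s, per RAID: {per_process:.0f} MB/s")
```

That is roughly 190 MB/s aggregate, or about 31 MB/s sustained per RAID volume.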

Each new harvester currently has an avg. capacity of 24 MB/sec per connection and can manage 5 snapshot harvest instances per new machine (the old harvest servers can only manage 1 snapshot instance per machine).
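Combining those two figures gives a theoretical per-machine ceiling. This is illustrative arithmetic only; real throughput depends heavily on crawl politeness settings and the sites being harvested:

```python
# Theoretical ceiling if each connection averages 24 MB/s and a new
# machine runs up to 5 snapshot instances (illustrative only).
MB_PER_CONN = 24
INSTANCES = 5

peak_mb = MB_PER_CONN * INSTANCES   # MB/s per machine
peak_mbit = peak_mb * 8             # Mbit/s per machine

print(f"peak: {peak_mb} MB/s = {peak_mbit} Mbit/s vs. a 1000 Mbit/s NIC")
```

At 120 MB/s (960 Mbit/s), a fully loaded new machine would approach saturation of its 1 Gbit/s network interface, so in practice the NIC, not the instance count, becomes the limit.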

The download capacity also depends on how the Heritrix order.xml files are configured!

There are 15 viewerproxy access instances for QA running at SB, plus 1 Tomcat and 1 Apache (for Wayback).
At KB there are 10 viewerproxy access instances for QA and 1 Lucene index server.

Your network should run at min. 1 Gbit/s or more.

You should have a firewall setup which can handle min. 30-90 Mbit/sec in parallel.
At SB/KB we have 3 firewalls! The 2 firewalls at KB are currently our main bottleneck.
The central admin machine with the JMS broker, ADMGui, ArcRepository, BitarchiveMonitors, Derby database and Apache servers for secure login is also a bottleneck and a single point of failure, and should be mirrored or run in a cluster failover setup.

Here is our HW setup:

Bitarchive storage servers at SB:

number of machines: 2
model: Dell PowerEdge 2850 and 2950
processors: 2 x Intel Xeon 2.8 GHz and 2 x Intel Xeon 2.0 GHz, both hyperthreaded
RAM: 4 GB
local hard disk: 73 GB mirrored local + 32 TB in SAN (RAID 5 and RAID 6) on one machine; 73 GB mirrored local + 73 TB in SAN (RAID 5 and RAID 6) on the other
network interface:
operating system: Linux Red Hat (RHEL)

Harvester servers at SB:

number of machines: 2
model: Dell PowerEdge 2850
processors: 2 x Intel Xeon 3.20 GHz, hyperthreaded
disk: 600 GB (3 x 300 GB in RAID 5)
RAM: 4 GB
network interface: 1 Gbit/s
OS: Linux CentOS

number of machines: 1
model: HP ProLiant DL380 G4
processors: 2 x Intel Xeon 2.8 GHz, hyperthreaded
disk: 340 GB (6 x 73 GB in RAID 5)
RAM: 2.5 GB
network interface: 1 Gbit/s
OS: Linux CentOS

number of machines: 2
model: HP ProLiant DL380 G5
processors: 2 x Intel Xeon 2.0 GHz, 4 cores
disk: 956 GB (8 x 143 GB in RAID 5)
RAM: 10 GB
network interface: 1 Gbit/s
OS: Linux CentOS

Access machines at SB:

number of machines: 1
model: Dell PowerEdge 2850
processors: 2 CPUs x 3 GHz
RAM: 2 GB
local hard disk: 1.5 TB local + 4 TB SAN (for Wayback)
network interface: 1 Gbit/s
OS: Linux

Bitarchive storage servers at KB, new architecture:

number of machines: 12
model: HP DL360 G5
processors: 2 x quad-core CPU, 2 GHz
RAM: 3 GB
controllers: internal P400, external P800
storage: 2 x MSA60, one with 3 x RAID 5 (3 TB) and the other with 2 x RAID 5 (3 TB), 1 x RAID 5 (2 TB) and 1 TB without RAID for temp data
local hard disk: 2 x 72 GB RAID 1 for OS/software
network interface: 1 Gbit/s
operating system: Windows Web Server 2008
temp storage for batch jobs: 5%

Harvester servers at KB:

number of machines: 2
model: HP DL380 G4
processors: 2 CPUs x 3 GHz
RAM: 4 GB
local hard disk: 6 x 72 GB
network interface:
OS: Linux

number of machines: 2
model: HP ProLiant DL380 G5
processors: 2 x Intel Xeon 2.0 GHz, 4 cores
disk: 956 GB (8 x 146 GB in RAID 5)
RAM: 10 GB
network interface: 1 Gbit/s
OS: Linux CentOS

Access machines at KB:

number of machines: 1
model: HP DL380 G4
processors: 1 CPU x 3 GHz
RAM: 2 GB
local hard disk: 2 x 72 GB + 4 x 300 GB
network interface: 1 Gbit/s
OS: Linux

For a similar deploy installation, see the first deploy example in chapter 10.1 of the Installation Manual
( https://netarchive.dk/suite/Installation_Manual_devel/AppendixC?action=AttachFile&do=get&target=deploy_example.xml )