TEST2: Std-2 (snapshot harvesting, configurations, byte limits, aliases and domain lists)

Test goals: Test snapshot harvesting in detail, and subsequent follow-up harvesting

To test writers: This test should not contain non-standard snapshot harvesting behavior. After installation it should not be necessary to use shell scripts or command-line statements.

Follow the instructions in section 10, Easy Installation of NetarchiveSuite, in the Installation manual. Before starting, set <deduplication><enabled>false</enabled></deduplication> in deploy_example_one_machine.xml
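The deduplication setting above can also be changed with a small script before the installation is run. The following is a minimal sketch, not part of the NetarchiveSuite tooling: the function name `disable_deduplication` and the sample XML are hypothetical, and it assumes the deploy file uses no XML namespace and contains a `deduplication` element somewhere in its tree.

```python
import xml.etree.ElementTree as ET


def disable_deduplication(xml_text: str) -> str:
    """Return xml_text with every <deduplication><enabled> set to 'false'.

    Hypothetical helper: assumes element names without an XML namespace.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for dedup in root.iter("deduplication"):
        enabled = dedup.find("enabled")
        if enabled is None:
            # Create the <enabled> child if the element exists without one.
            enabled = ET.SubElement(dedup, "enabled")
        enabled.text = "false"
        changed += 1
    if changed == 0:
        raise ValueError("no <deduplication> element found")
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; the real deploy_example_one_machine.xml is larger
# and its exact nesting is not reproduced here.
SAMPLE = (
    "<deployGlobal><settings><deduplication>"
    "<enabled>true</enabled>"
    "</deduplication></settings></deployGlobal>"
)

if __name__ == "__main__":
    print(disable_deduplication(SAMPLE))
```

To apply it to the real file, read deploy_example_one_machine.xml into a string, pass it through the function, and write the result back. Check the output by hand, since ElementTree drops XML comments and may change whitespace.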

If you are a netarkiv.dk tester, follow these instructions: Netarkiv Installation setup

Columns: Items | Status 1 | Status 2 | Status 3 | Notes | Known open bugs | Bugs tested | New bugs found | Previous bugs

1. Check single domain creation, harvest config and domain statistics

OK

1060 is still not fixed

873, 892, 973, 1051, 1033, 1060

1.a Check global crawlertraps

OK

1889

2. Update bytelimits for 6 domains

OK

3. Search and add alias in ADM GUI

OK

894,895,896

4. Check that chains of aliases are prevented

OK

954

6. Start a snapshot harvest with max 100000 bytes

OK

7. Verify that alias domain is not harvested

OK

8. Check that the first snapshot harvest has reached the expected byte limits

OK

998

9. Add sulnudu-alias via ADM GUI

OK

10. Change byte limit on a domain

OK

11. Start a snapshot harvest with a max byte limit of 5 MB (takes at least 1 hour)

OK

12. Go to Heritrix GUI, verify the job is running and "pause" the job

OK

Bug 1791

1741

13. Go to the System overview in ADM GUI and check the job is paused and there are no error messages

OK

14. Go to Heritrix GUI, do some overrides and resume the job

OK

FR 1765

15. Go to the System overview in ADM GUI and check the job is running again and there are no error messages

OK

15.a Check that overrides are in the QA reports

OK

FR 1765

15.b Restart the system

OK

15.c Resubmit the failed job and verify it is DONE

N/A

Job was finished

16. Verify that no alias domains are harvested

OK

17. Check that the second snapshot harvest has reached the expected byte limits

OK

(18. Check that object limits are respected)

OK

Object limits on domain and harvest do work simultaneously (e.g. domain max 10 and harvest max 25 gives a harvest with 25 objects).

removed from test (https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-222)

19. Check that objects are not deduplicated

OK

If you are a netarkiv.dk tester, here are the shutdown instructions: Shutdown the system

Additional comments on test results:

Step 8: here are the actual numbers; these seem inconsistent.

Domain              | Domain-config byte limit | Bytes harvested | Stop reason
lecanardenchaine.fr | 150,000                  | 116,234         | Max Bytes limit reached
gameblog.fr         | 100,000                  | 148,863         | Domain-config byte limit reached
gamekult.com        | 150,000                  | 154,239         | Domain-config byte limit reached
lexpansion.com      | 100,000                  | 139,100         | Domain-config byte limit reached
lemonde.fr          | 150,000                  | 369,745         | Domain-config byte limit reached
allocine.fr         | 50,000                   | 157,480         | Domain-config byte limit reached

Maybe harvested byte count includes bytes harvested on other domains reached from the domain's seeds?
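To make the mismatch in the Step 8 numbers concrete, the sketch below recomputes how far each domain's harvested byte count lies from its domain-config limit and from the 100,000-byte harvest limit set in step 6. The figures are copied from the Step 8 table; the function and field names are illustrative only, and the script does not test the hypothesis about bytes from other domains.

```python
# Rows copied from the Step 8 table:
# (domain, domain-config byte limit, bytes harvested)
ROWS = [
    ("lecanardenchaine.fr", 150_000, 116_234),
    ("gameblog.fr", 100_000, 148_863),
    ("gamekult.com", 150_000, 154_239),
    ("lexpansion.com", 100_000, 139_100),
    ("lemonde.fr", 150_000, 369_745),
    ("allocine.fr", 50_000, 157_480),
]

# Harvest-wide limit from step 6 ("max 100000 bytes").
HARVEST_MAX_BYTES = 100_000


def compare(rows, harvest_max=HARVEST_MAX_BYTES):
    """Return one dict per domain with the excess over each limit."""
    report = []
    for domain, domain_limit, harvested in rows:
        report.append({
            "domain": domain,
            # Positive value: harvested more than the limit allows.
            "over_domain_limit": harvested - domain_limit,
            "over_harvest_limit": harvested - harvest_max,
            "ratio_to_domain_limit": round(harvested / domain_limit, 2),
        })
    return report


if __name__ == "__main__":
    for entry in compare(ROWS):
        print(
            f"{entry['domain']:<20} "
            f"over domain limit: {entry['over_domain_limit']:>8} "
            f"over harvest limit: {entry['over_harvest_limit']:>8} "
            f"ratio: {entry['ratio_to_domain_limit']}"
        )
```

With these figures, five of the six domains exceed their domain-config limit (lecanardenchaine.fr is the exception), and all six exceed 100,000 bytes.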

TEST2 (last edited 2012-03-29 12:58:11 by ColinRosenthal)