TEST2: Std-2 (snapshot harvesting, configurations, byte limits, aliases and domain lists)
Test goals: Test snapshot harvesting in detail, and subsequent follow-up harvesting.
To test writers: This test should not contain non-standard snapshot harvesting behavior. After installation it should not be necessary to use shell scripts or command-line statements.
Follow the instructions in section 10, Easy Installation of NetarchiveSuite, in the Installation manual.
Before starting, set <deduplication><enabled>false</enabled></deduplication> in deploy_example_one_machine.xml.
If you are a netarkiv.dk tester, follow these instructions: Netarkiv Installation setup
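The deduplication setting mentioned above can also be flipped programmatically before installation. The following is only a minimal sketch: the sample XML is a hypothetical, heavily trimmed stand-in for deploy_example_one_machine.xml (the nesting under harvester/harvesting is an assumption), and the helper names `local_name` and `disable_deduplication` are invented for illustration. If the real file has no <deduplication> element yet, it would have to be added by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for a deploy file. The real
# deploy_example_one_machine.xml contains many more elements, and the
# exact nesting shown here is an assumption.
SAMPLE = """<deployGlobal>
  <settings>
    <harvester>
      <harvesting>
        <deduplication>
          <enabled>true</enabled>
        </deduplication>
      </harvesting>
    </harvester>
  </settings>
</deployGlobal>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def disable_deduplication(root):
    """Set every <deduplication><enabled> below root to 'false'.

    Returns the number of <enabled> elements that were set.
    """
    changed = 0
    for element in root.iter():
        if local_name(element.tag) != "deduplication":
            continue
        for child in element:
            if local_name(child.tag) == "enabled":
                child.text = "false"
                changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = disable_deduplication(root)
print(f"updated {count} <enabled> element(s)")
print(ET.tostring(root, encoding="unicode"))
```

For a real file, `ET.parse(path)` followed by `tree.write(path)` would replace the in-memory string; checking that the returned count is non-zero guards against silently leaving deduplication on.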
Items | Status 1 | Status 2 | Status 3 | Notes | Known open bugs | Bugs tested | New bugs found | Previous bugs |
1. Check single domain creation, harvest config and domain statistics | OK | | | | 1060 is still not fixed | | | 873, 892, 973, 1051, 1033, 1060 |
1.a Check global crawler traps | OK | | | | | | | 1889 |
2. Update byte limits for 6 domains | OK | | | | | | | |
3. Search and add alias in ADM GUI | OK | | | | | | | 894, 895, 896 |
4. Check that chains of aliases are prevented | OK | | | | | | | 954 |
6. Start a snapshot harvest with max 100000 bytes | OK | | | | | | | |
7. Verify that alias domain is not harvested | OK | | | | | | | |
8. Check that the 1st snapshot harvest has reached the expected byte limits | OK | | | | | | | 998 |
9. Add sulnudu-alias via ADM GUI | OK | | | | | | | |
10. Change byte limit on a domain | OK | | | | | | | |
11. Start a snapshot harvest with a max byte limit of 5 MB (takes at least 1 hour) | OK | | | | | | | |
12. Go to Heritrix GUI, verify the job is running and "pause" the job | OK | | | | Bug 1791 | | | 1741 |
13. Go to the System overview in ADM GUI and check the job is paused and there are no error messages | OK | | | | | | | |
14. Go to Heritrix GUI, do some overrides and resume the job | OK | | | | | FR 1765 | | |
15. Go to the System overview in ADM GUI and check the job is running again and there are no error messages | OK | | | | | | | |
15.a Check that overrides are in the QA reports | OK | | | | | FR 1765 | | |
15.b Restart the system | OK | | | | | | | |
15.c Resubmit the failed job and verify it is DONE | N/A | | | Job was finished | | | | |
16. Verify that no alias domains are harvested | OK | | | | | | | |
17. Check that the 2nd snapshot harvest has reached the expected byte limits | OK | | | | | | | |
(18. Check that object limits are respected) | OK | | | Object limits on domain and harvest work simultaneously (e.g. domain max 10 and harvest max 25 gives a harvest with 25 objects) | | | | removed from test (https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-222) |
19. Check that objects are not deduplicated | OK | | | | | | | |
If you are a netarkiv.dk tester, here are the shutdown instructions: Shutdown the system
Additional comments on test results:
Step 8: Here are the actual numbers, which seem inconsistent.
Domain | Domain-config byte limit | Bytes Harvested | Stop Reason |
lecanardenchaine.fr | 150,000 | 116,234 | Max Bytes limit reached |
gameblog.fr | 100,000 | 148,863 | Domain-config byte limit reached |
gamekult.com | 150,000 | 154,239 | Domain-config byte limit reached |
lexpansion.com | 100,000 | 139,100 | Domain-config byte limit reached |
lemonde.fr | 150,000 | 369,745 | Domain-config byte limit reached |
allocine.fr | 50,000 | 157,480 | Domain-config byte limit reached |
Maybe the harvested byte count includes bytes harvested from other domains reached from the domain's seeds?
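To make the inconsistency in the step 8 numbers easier to see, the table can be checked with a few lines of arithmetic. This is only a sketch: it assumes the harvest in question is the one from step 6 (max 100000 bytes), and it assumes that the smaller of the harvest limit and the domain-config limit is the one that should govern, which the test results do not confirm. The function and field names are invented for illustration.

```python
# Harvest-wide limit from step 6; assumed to be the harvest checked in step 8.
HARVEST_MAX_BYTES = 100_000

# (domain, domain-config byte limit, bytes harvested, stop reason),
# copied from the step 8 table.
ROWS = [
    ("lecanardenchaine.fr", 150_000, 116_234, "Max Bytes limit reached"),
    ("gameblog.fr", 100_000, 148_863, "Domain-config byte limit reached"),
    ("gamekult.com", 150_000, 154_239, "Domain-config byte limit reached"),
    ("lexpansion.com", 100_000, 139_100, "Domain-config byte limit reached"),
    ("lemonde.fr", 150_000, 369_745, "Domain-config byte limit reached"),
    ("allocine.fr", 50_000, 157_480, "Domain-config byte limit reached"),
]


def summarize(rows, harvest_max=HARVEST_MAX_BYTES):
    """Compute how far each domain ended up from its limits."""
    summary = []
    for domain, limit, harvested, reason in rows:
        # Assumption: the smaller of the two limits is the effective one.
        effective = min(limit, harvest_max)
        summary.append({
            "domain": domain,
            "reason": reason,
            "over_domain_limit": harvested - limit,
            "over_effective_limit": harvested - effective,
            "ratio": harvested / limit,
        })
    return summary


summary = summarize(ROWS)
for row in summary:
    print(f"{row['domain']:<22}"
          f"{row['over_domain_limit']:>+10,}"
          f"{row['over_effective_limit']:>+10,}"
          f"{row['ratio']:>7.2f}")

# Domains that stopped before reaching their own domain-config limit.
below_domain_limit = [r["domain"] for r in summary if r["over_domain_limit"] < 0]
# Domains that harvested more than twice their domain-config limit.
far_over = [r["domain"] for r in summary if r["ratio"] > 2]
print("below domain limit:", below_domain_limit)
print("more than 2x domain limit:", far_over)
```

Under these assumptions, only lecanardenchaine.fr stays below its domain-config limit, while lemonde.fr and allocine.fr exceed theirs by more than a factor of two. Overshoots of that size fit the guess that bytes from other domains are being counted, but they do not prove it.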