Sanity test 1: snapshot harvest

- Add a set of domains.
- Configure the object limit in each domain's default configuration. On my dev setup I added 8 domains in two groups, one with a limit of 100 objects (e.g. kb.dk, statsbiblioteket.dk, netarkivet.dk) and one with a limit of 200 objects (e.g. dbc.dk, kum.dk), plus an "outsider" with a limit of 100 objects (e.g. bs.dk).
- Define a first snapshot harvest with no byte limit and an object limit that is lower than the smallest domain config limit you set up (I started at 50).
- Activate this harvest and let it finish.
- Verify that, once the harvest is complete, the stop reason for each domain is one of:
  - "Domain completed", with a number of harvested documents that is lower than the snapshot limit (in my test < 50)
  - "Max object limit reached", with a number of harvested documents that is equal to the snapshot limit (in my test 50)

- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value (e.g. 50) in the harvest template (Job Details -> Harvest order template -> Show harvest template for job 1).
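
The template check above can also be scripted. The following is a minimal sketch, assuming the harvest template is an XML document in which the QuotaEnforcer is an element with name="QuotaEnforcer" and each setting is a child element carrying a name attribute. The helper quota_enforcer_value and the SAMPLE_TEMPLATE fragment are hypothetical and written by hand for illustration; check the parameter names against a real template from your installation.

```python
import xml.etree.ElementTree as ET


def quota_enforcer_value(template_xml, parameter):
    """Return the integer value of a QuotaEnforcer setting, or None if absent."""
    root = ET.fromstring(template_xml)
    for element in root.iter():
        # Assumption: the enforcer is identified by name="QuotaEnforcer".
        if element.get("name") != "QuotaEnforcer":
            continue
        for setting in element.iter():
            if setting.get("name") == parameter and setting.text:
                return int(setting.text.strip())
    return None


# Hand-written sample, not copied from a real job.
SAMPLE_TEMPLATE = """
<crawl-order>
  <controller>
    <map name="pre-fetch-processors">
      <newObject name="QuotaEnforcer" class="org.archive.crawler.prefetch.QuotaEnforcer">
        <long name="group-max-fetch-success">50</long>
        <long name="group-max-all-kb">-1</long>
      </newObject>
    </map>
  </controller>
</crawl-order>
"""

# Compare the value found in the template with the snapshot limit (50 here).
assert quota_enforcer_value(SAMPLE_TEMPLATE, "group-max-fetch-success") == 50
```

Passing the parameter name as an argument keeps the helper usable for the byte-limit parameter as well.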

Sanity test 2: incremental snapshot harvest

- Define a new snapshot harvest with no byte limit and an object limit that is higher than the highest domain config limit you set (e.g. 100). Make this harvest incremental by having it harvest only the domains not completed in your initial harvest.
- Verify that, once the harvest is complete, the stop reason for each domain is one of:
  - "Domain completed", with a number of harvested documents that is lower than the snapshot limit (e.g. < 100)
  - "Max object limit reached", with a number of harvested documents that is equal to the snapshot limit (e.g. 100)
  - "Domain-config object limit reached", with a number of harvested documents that is equal to the default domain configuration limit. This stop reason might be tricky to observe, because deduplication yields "Domain completed" more often on consecutive crawls.

- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value (e.g. 100) in the harvest template (Job Details -> Harvest order template -> Show harvest template for job 1).
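
The stop-reason rules for the snapshot harvests can be written down as a small checker. This is only a sketch of the rules as stated on this page, not of how the harvester decides internally. The function name check_stop_reason and the sample figures are hypothetical; the stop-reason strings are the ones quoted above.

```python
def check_stop_reason(stop_reason, harvested, snapshot_limit, domain_limit):
    """Return True if a domain's result is consistent with the rules on this page."""
    if stop_reason == "Domain completed":
        # The domain ran out of documents before the snapshot limit was hit.
        return harvested < snapshot_limit
    if stop_reason == "Max object limit reached":
        # The snapshot-wide limit was the binding one.
        return harvested == snapshot_limit
    if stop_reason == "Domain-config object limit reached":
        # The domain's own default configuration limit was the binding one.
        return harvested == domain_limit
    # Any other stop reason is unexpected in these tests.
    return False


# Invented figures for illustration: (stop reason, harvested, domain limit).
results = {
    "kb.dk": ("Max object limit reached", 50, 100),
    "bs.dk": ("Domain completed", 12, 100),
}
for domain, (reason, harvested, domain_limit) in results.items():
    assert check_stop_reason(reason, harvested, 50, domain_limit), domain
```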

Sanity test 3: selective harvest

- Pick a domain (e.g. netarkivet.dk) and create a new configuration for it, with an object limit.
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config object limit reached", with a number of harvested documents that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value in the order.xml file from the metadata ARC file.

Sanity test 4a: combination of object and byte limit

- Pick a domain (e.g. netarkivet.dk) and create a new configuration for it, with an object limit and a low byte limit (for instance 100 kB and 1000 objects).
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config byte limit reached", with a byte size that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer "group-max-fetch-success" and "group-max-all-kb" parameters are set to the proper limit values in the order.xml file from the metadata ARC file.

Sanity test 4b: combination of object and byte limit

- Pick a domain and create a new configuration for it, with a small object limit and a high byte limit (for instance 10 objects and 10 MB).
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config object limit reached", with a number of harvested documents that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer "group-max-fetch-success" and "group-max-all-kb" parameters are set to the proper limit values in the order.xml file from the metadata ARC file.
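
The expectations of the selective-harvest tests (3, 4a and 4b) can be captured the same way. Again this is a sketch of the rules as stated on this page. The function check_selective_result and the sample figures are hypothetical, and the conversion 100 kB = 100,000 bytes is an assumption; adjust it if your installation counts 1024 bytes per kB.

```python
OBJECT_REASON = "Domain-config object limit reached"
BYTE_REASON = "Domain-config byte limit reached"


def check_selective_result(stop_reason, objects, size_bytes, object_limit, byte_limit):
    """Return True if the result matches the limit named by the stop reason."""
    if stop_reason == OBJECT_REASON:
        return objects == object_limit
    if stop_reason == BYTE_REASON:
        return size_bytes == byte_limit
    # Any other stop reason is unexpected in these tests.
    return False


# Test 4a: low byte limit (assumed 100 kB = 100,000 bytes), 1000 objects.
assert check_selective_result(BYTE_REASON, 12, 100_000, 1000, 100_000)

# Test 4b: 10 objects, high byte limit (assumed 10 MB = 10,000,000 bytes).
assert check_selective_result(OBJECT_REASON, 10, 350_000, 10, 10_000_000)
```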

It38CheckObjectLimits (last edited 2011-08-30 08:08:36 by ColinRosenthal)