Differences between revisions 4 and 5
Revision 4 as of 2009-10-02 12:56:30
Size: 4002
Editor: TueLarsen
Comment:
Revision 5 as of 2009-10-02 13:08:58
Size: 3982
Editor: TueLarsen
Comment:


Sanity test 1: snapshot harvest

- Add a set of domains.
- Configure the domains' default configuration object limit. On my dev setup I added 8 domains and made two groups, one with a limit of 100 objects (e.g. kb.dk, statsbiblioteket.dk, netarkivet.dk) and one with a 200 object limit (e.g. dbc.dk, kum.dk), plus an "outsider" with a 100 object limit (e.g. bs.dk).
- Define a first snapshot harvest, with no byte limit and an object limit that is lower than the smallest domain config limit you set up (I started at 50).
- Activate this harvest and let it finish.
- Verify that the stop reason for each domain, once the harvest is complete, is one of:
  - "Domain completed" with a number of harvested documents that is lower than the snapshot limit (in my test < 50)
  - "Max object limit reached" with a number of harvested documents that is equal to the snapshot limit (in my test 50)

- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value in the order.xml files from the metadata ARC file.
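
The order.xml check can be scripted once the file has been extracted from the metadata ARC file. The following is a minimal sketch, not part of the original test procedure. It assumes the QuotaEnforcer is declared as a `newObject` element whose children carry the parameter name in a `name` attribute. The sample XML and the function name `quota_enforcer_value` are hypothetical, and a real order.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt written for this example. A real order.xml is much
# larger and its exact layout is an assumption here.
SAMPLE_ORDER_XML = """\
<crawl-order>
  <controller>
    <map name="pre-fetch-processors">
      <newObject name="QuotaEnforcer" class="org.archive.crawler.prefetch.QuotaEnforcer">
        <long name="group-max-fetch-success">50</long>
        <long name="group-max-all-kb">-1</long>
      </newObject>
    </map>
  </controller>
</crawl-order>
"""


def quota_enforcer_value(order_xml: str, parameter: str) -> int:
    """Return the integer value of a QuotaEnforcer parameter.

    Raises KeyError if no QuotaEnforcer element carries the parameter.
    """
    root = ET.fromstring(order_xml)
    for obj in root.iter("newObject"):
        if obj.get("name") != "QuotaEnforcer":
            continue
        for child in obj:
            if child.get("name") == parameter:
                return int(child.text.strip())
    raise KeyError(parameter)


if __name__ == "__main__":
    # Compare the value found in the file with the limit set on the harvest.
    expected_object_limit = 50
    found = quota_enforcer_value(SAMPLE_ORDER_XML, "group-max-fetch-success")
    print("match" if found == expected_object_limit else "mismatch", found)
```

The parameter name is passed in as an argument so the same helper can be pointed at "group-max-all-kb" when a byte limit is involved.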

Sanity test 2: incremental snapshot harvest

- Define a new snapshot harvest, with no byte limit and an object limit that is higher than the highest domain config limit you set (e.g. 100). Make this harvest incremental by having it harvest only domains not completed in your initial harvest.
- Verify that the stop reason for each domain, once the harvest is complete, is one of:
  - "Domain completed" with a number of harvested documents that is lower than the snapshot limit (e.g. < 100)
  - "Max object limit reached" with a number of harvested documents that is equal to the snapshot limit (e.g. 100)
  - "Domain-config object limit reached" with a number of harvested documents that is equal to the default domain configuration limit. This stop reason might be tricky to observe, because deduplication yields "Domain completed" more often on consecutive crawls.

- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value in the order.xml files from the metadata ARC file.
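
The stop-reason rules for the snapshot tests can be written down as a small consistency check. This is only a sketch of the rules as stated on this page. The `DomainResult` type, the function name, the domain name and all numbers are hypothetical and would have to be filled in by hand from the harvest status pages.

```python
from typing import NamedTuple


class DomainResult(NamedTuple):
    """One row of harvest results, copied by hand (hypothetical structure)."""
    domain: str
    stop_reason: str
    objects: int


def is_consistent(result: DomainResult, snapshot_limit: int, domain_limit: int) -> bool:
    """Check a stop reason against the object counts expected on this page."""
    if result.stop_reason == "Domain completed":
        # The domain ran out of documents before reaching the snapshot limit.
        return result.objects < snapshot_limit
    if result.stop_reason == "Max object limit reached":
        # The snapshot harvest limit was hit.
        return result.objects == snapshot_limit
    if result.stop_reason == "Domain-config object limit reached":
        # The domain's default configuration limit was hit.
        return result.objects == domain_limit
    # Any other stop reason is unexpected in these tests.
    return False


if __name__ == "__main__":
    # Made-up values for illustration only.
    sample = DomainResult("example.org", "Domain-config object limit reached", 200)
    print(sample.domain, is_consistent(sample, snapshot_limit=300, domain_limit=200))
```

For the first sanity test only the first two branches apply, since the snapshot limit is set below every domain configuration limit.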

Sanity test 3: selective harvest

- Pick a domain and create a new configuration for it, with an object limit.
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config object limit reached" with a number of harvested documents that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer parameter "group-max-fetch-success" is set to the proper limit value in the order.xml file from the metadata ARC file.

Sanity test 4: combination of object and byte limit

- Pick a domain and create a new configuration for it, with an object limit and a low byte limit (for instance 100 kB and 1000 objects).
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config byte limit reached" with a byte size that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer "group-max-fetch-success" and "group-max-all-kb" parameters are set to the proper limit values in the order.xml file from the metadata ARC file.
- Pick a domain and create a new configuration for it, with a small object limit and a high byte limit (for instance 10 MB and 10 objects).
- Create a new selective harvest, add the domain and select the newly created config.
- Activate the harvest and let it complete.
- Verify that the stop reason for the selected domain is "Domain-config object limit reached" with a number of harvested documents that is equal to the selected domain configuration limit.
- Verify that the QuotaEnforcer "group-max-fetch-success" and "group-max-all-kb" parameters are set to the proper limit values in the order.xml file from the metadata ARC file.

It38CheckObjectLimits (last edited 2011-08-30 08:08:36 by ColinRosenthal)