Revision 1 as of 2009-09-18 19:03:14
Review (NS-92): [FROM NICOLAS @ BNF]
|| Author || BNF user ||
|| Moderator || Søren ||
|| State || Closed ||
Objectives
1012: [FROM NICOLAS @ BNF]

FR 1689: Managing crawls using object number

Below are the major modifications brought by this patch.

UI modifications:
- Added inputs for the object limit in the domain configuration editing screen and in the snapshot harvest creation/editing screen.

Configuration settings modifications:
New properties:
- settings.harvester.datamodel.domain.defaultMaxobjects is the default object limit for domain configurations and snapshot harvests.
- settings.harvester.scheduler.splitByObjectLimit is a boolean parameter that controls whether the initial criterion for splitting a harvest into jobs is the byte limit (default) or the object limit (see Job#canAccept).

Datamodel modifications:
- DomainConfiguration#getExpectedNumberOfObjects(long objectLimit, long byteLimit): for the initial best guess (i.e. when the domain is harvested for the first time), the expectation is now the minimum of settings.harvester.scheduler.maxDomainSize and the domain object limit.
- Job:
  - The object limit is now set in order.xml through the group-max-fetch-successes parameter of the QuotaEnforcer element. This generates a special annotation in the crawl log, which makes it possible to determine the proper stop reason for the job.
  - Constructor: symmetrically to the way the byte limit is set, the object limit is the minimum of the snapshot limit and the domain configuration limit. A boolean flag indicates whether the limit was capped to the domain configuration limit.
  - canAccept method: as explained in the previous section, setting settings.harvester.scheduler.splitByObjectLimit to true forces the use of the object limit instead of the byte limit as the base splitting criterion.
- Added a new StopReason, CONFIG_OBJECT_LIMIT, symmetric to CONFIG_SIZE_LIMIT. The stop reason is computed in HarvestSchedulerMonitorServer.
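The initial best guess described above reduces to taking the smaller of two values. A minimal Java sketch, assuming both limits are non-negative; the class and method names are hypothetical, and this is a standalone helper rather than the actual DomainConfiguration#getExpectedNumberOfObjects implementation.

```java
// Hypothetical helper illustrating the first-harvest guess described for
// DomainConfiguration#getExpectedNumberOfObjects; not the NetarchiveSuite code.
final class ExpectedObjectsSketch {
    private ExpectedObjectsSketch() {}

    /**
     * Expected number of objects for a domain that has not been harvested before.
     *
     * @param maxDomainSize     value of settings.harvester.scheduler.maxDomainSize
     * @param domainObjectLimit object limit of the domain configuration
     * @return the smaller of the two values
     */
    static long initialExpectedNumberOfObjects(long maxDomainSize, long domainObjectLimit) {
        // This sketch assumes both limits are non-negative; how "unlimited"
        // values are encoded in the real code is not covered here.
        if (maxDomainSize < 0 || domainObjectLimit < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        return Math.min(maxDomainSize, domainObjectLimit);
    }
}
```

For example, with maxDomainSize = 5000 and a domain object limit of 2000, the first-harvest guess is 2000.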
Summary
Follow-up by Nicolas.
Total Time Used (Coding, Documentation, Review):
Nicolas: 10 MD
SVC: 0.5 MD
General comments:
|| Description || Classification || Status ||
|| Fix the unit tests that are failing now. See the stack traces added to FR 1689 in GForge. || Cosmetic || NOTOK ||
Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_da.properties', revision 1012
|| Lines || Description || Classification || Status ||
|| 114, 166 || In Danish, "objects" translates as "objekter" || Cosmetic || OK ||
Comments on file 'trunk/src/dk/netarkivet/harvester/HarvesterSettings.java', revision 1012
|| Lines || Description || Classification || Status ||
|| 160-163 || Change to: By default the byte limit is used as the base criterion for how many domain configurations are put into one harvest job. However if this parameter is set to "true", then the object limit is used instead as the base criterion. || Cosmetic || OK ||
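The behaviour of the setting described in this comment can be illustrated with a minimal Java sketch of how the base criterion might be chosen. It assumes only that each configuration carries a byte limit and an object limit; the class, enum, and method names are hypothetical, and the real Job#canAccept applies further checks that are not shown here.

```java
// Hypothetical sketch: choosing the base splitting criterion from
// settings.harvester.scheduler.splitByObjectLimit. Not the NetarchiveSuite code.
final class SplitCriterionSketch {
    /** The two possible base criteria for grouping domain configurations into jobs. */
    enum Criterion { BYTE_LIMIT, OBJECT_LIMIT }

    private SplitCriterionSketch() {}

    /** Byte limit by default; object limit when the setting is "true". */
    static Criterion criterion(boolean splitByObjectLimit) {
        return splitByObjectLimit ? Criterion.OBJECT_LIMIT : Criterion.BYTE_LIMIT;
    }

    /** Returns the limit of a configuration that the chosen criterion looks at. */
    static long baseLimit(boolean splitByObjectLimit, long byteLimit, long objectLimit) {
        return criterion(splitByObjectLimit) == Criterion.OBJECT_LIMIT ? objectLimit : byteLimit;
    }
}
```

Keeping the choice in one place means the rest of the job-splitting logic can compare configurations on a single number without knowing which setting is active.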
Comments on file 'trunk/src/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java', revision 1012
|| Lines || Description || Classification || Status ||
|| 72-73 || Is this check necessary? || Cosmetic || OK ||
|| 78-83 || Do these two operations require that the parameters checked in lines 72-73 are set? || Cosmetic || OK ||
Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/Job.java', revision 1012
|| Lines || Description || Classification || Status ||
|| 171 || , false means it is defined by the .. || Cosmetic || OK ||
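The boolean flag discussed here comes from the object-limit rule of the Job constructor (the smaller of the snapshot limit and the domain configuration limit applies). A minimal Java sketch of that rule, assuming non-negative limits; the class and field names are hypothetical, and treating a tie as "not capped by the configuration" is an assumption of this sketch.

```java
// Hypothetical sketch of the object-limit rule described for the Job constructor;
// class and field names are illustrative, not NetarchiveSuite's.
final class JobObjectLimitSketch {
    /** Effective object limit per domain for the job. */
    final long maxObjectsPerDomain;
    /** true if the limit was capped to the domain configuration limit. */
    final boolean cappedByConfiguration;

    JobObjectLimitSketch(long snapshotObjectLimit, long configObjectLimit) {
        // Assumes non-negative limits (no special "unlimited" value is handled).
        if (snapshotObjectLimit < 0 || configObjectLimit < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        // Same rule as for the byte limit: the smaller of the two limits applies.
        this.maxObjectsPerDomain = Math.min(snapshotObjectLimit, configObjectLimit);
        // Only a strictly smaller configuration limit counts as capping; a tie is
        // treated as defined by the snapshot limit (an assumption of this sketch).
        this.cappedByConfiguration = configObjectLimit < snapshotObjectLimit;
    }
}
```

Presumably this flag is what lets a job that reached its object limit be reported with the configuration-level stop reason rather than the harvest-level one.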
Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/StopReason.java', revision 1013
|| Lines || Description || Classification || Status ||
|| General || Haven't we now introduced a new order in the StopReason enum type? If so, data read from old databases will be misrepresented. The correct new order should be: OBJECT_LIMIT, SIZE_LIMIT, CONFIG_SIZE_LIMIT, DOWNLOAD_UNFINISHED, CONFIG_OBJECT_LIMIT || Cosmetic || OK ||
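This comment implies that stop reasons are stored in the database by their position in the enum, so inserting a constant in the middle shifts the meaning of existing rows. One common way to avoid this, sketched below in Java, is to persist an explicit code per constant and always append new constants at the end. The enum name, the numeric codes, and fromCode are illustrative assumptions; the codes simply follow the order listed in the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: store an explicit code per constant instead of relying on
// Enum.ordinal(), so that reordering or inserting constants cannot change how rows
// written by an older version are read back. Codes below are illustrative only.
enum StopReasonSketch {
    OBJECT_LIMIT(0),
    SIZE_LIMIT(1),
    CONFIG_SIZE_LIMIT(2),
    DOWNLOAD_UNFINISHED(3),
    // The new constant goes last, so every existing code keeps its meaning.
    CONFIG_OBJECT_LIMIT(4);

    private static final Map<Integer, StopReasonSketch> BY_CODE = new HashMap<>();

    static {
        for (StopReasonSketch reason : values()) {
            BY_CODE.put(reason.code, reason);
        }
    }

    private final int code;

    StopReasonSketch(int code) {
        this.code = code;
    }

    /** Value to write to the database. */
    int getCode() {
        return code;
    }

    /** Maps a stored code back to its constant. */
    static StopReasonSketch fromCode(int code) {
        StopReasonSketch reason = BY_CODE.get(code);
        if (reason == null) {
            throw new IllegalArgumentException("Unknown stop reason code: " + code);
        }
        return reason;
    }
}
```

If storage stays ordinal-based, the same protection comes from only ever appending constants, which is exactly the order the comment asks for.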
Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_de.properties', revision 1012
|| Lines || Description || Classification || Status ||
|| 191 || "Max objects" => Max. Gegenstände || Cosmetic || OK ||