Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2009-09-18 19:03:14
Size: 4932
Comment:
Revision 4 as of 2010-08-16 10:24:45
Size: 4854
Editor: localhost
Comment: converted to 1.6 markup

Review (NS-92): [FROM NICOLAS @ BNF]

|| Author || BNF user ||
|| Moderator || Søren ||
|| State || Closed ||

Objectives

1012:
[FROM NICOLAS @ BNF]
------------------------------------------------------------------------
FR 1689: Managing crawls using object number
------------------------------------------------------------------------
I have listed below the major modifications brought by this patch.
UI modifications:
------------------------------------------------------------------------
Added inputs for the object limit to the domain configuration edit
screen and the snapshot harvest creation/edit screen
Configuration settings modifications:
------------------------------------------------------------------------
New properties:
- settings.harvester.datamodel.domain.defaultMaxobjects is the default
  object limit for domain configs and snapshot harvests
- settings.harvester.scheduler.splitByObjectLimit is a boolean parameter
  that controls whether the initial criterion for splitting a harvest
  into jobs is the byte limit (default) or the object limit
  (see Job#canAccept)
Datamodel modifications:
------------------------------------------------------------------------
DomainConfiguration#getExpectedNumberOfObjects(long objectLimit,
long byteLimit): for the initial best guess (i.e. when the domain is
harvested for the first time) the expectation is now the minimum of
settings.harvester.scheduler.maxDomainSize and the domain object
limit
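
The initial best guess described above can be sketched as follows. This is only an illustration, not the NetarchiveSuite code: the class name, the method name and the convention that a negative value means "no limit" are assumptions made for the example.

```java
/** Illustrative sketch only; names and the NO_LIMIT convention are assumptions. */
final class ExpectedObjectsSketch {

    /** Assumed convention: a negative object limit means "no limit". */
    static final long NO_LIMIT = -1L;

    /**
     * First-harvest estimate: the smaller of the configured maximum domain
     * size (settings.harvester.scheduler.maxDomainSize) and the domain
     * object limit. An unlimited domain falls back to the maximum size.
     */
    static long initialExpectedObjects(long maxDomainSize, long domainObjectLimit) {
        if (domainObjectLimit <= NO_LIMIT) {
            return maxDomainSize;
        }
        return Math.min(maxDomainSize, domainObjectLimit);
    }

    public static void main(String[] args) {
        System.out.println(initialExpectedObjects(5000L, 200L));   // prints 200
        System.out.println(initialExpectedObjects(5000L, 20000L)); // prints 5000
    }
}
```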
Job:
- the object limit is now set in order.xml by the value of the
  group-max-fetch-successes parameter of the QuotaEnforcer element.
  This generates a special annotation in the crawl log, which makes it
  possible to determine the proper stop reason for the job.
- constructor: symmetrically to the way the byte limit is set, the
  object limit is the minimum of the snapshot limit and the domain
  configuration limit. A boolean flag indicates whether the limit was
  capped to the domain config limit.
- canAccept method: as explained in the previous section, setting the
  settings.harvester.scheduler.splitByObjectLimit config parameter to
  true forces the use of the object limit instead of the byte limit as
  the base splitting criterion.
- added a new StopReason, symmetric to CONFIG_BYTE_LIMIT. The
  computation of the stop reason is performed in
  HarvestSchedulerMonitorServer
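
A minimal sketch of the constructor and canAccept behaviour listed above, assuming both limits are non-negative. The class, field and method names are hypothetical and do not come from the patch; only the min-plus-flag rule and the boolean switch are taken from the description.

```java
/** Illustrative sketch only; all names here are hypothetical. */
final class JobLimitSketch {

    /** Effective object limit for the job. */
    final long objectLimit;

    /** True when the domain configuration limit, not the snapshot limit, applies. */
    final boolean cappedByConfigLimit;

    /** Assumes both limits are non-negative (no "unlimited" marker). */
    JobLimitSketch(long snapshotObjectLimit, long configObjectLimit) {
        this.cappedByConfigLimit = configObjectLimit < snapshotObjectLimit;
        this.objectLimit = Math.min(snapshotObjectLimit, configObjectLimit);
    }

    /**
     * Picks the value used as the base splitting criterion:
     * the object limit when splitByObjectLimit is true, else the byte limit.
     */
    static long splitCriterion(boolean splitByObjectLimit, long byteLimit, long objectLimit) {
        return splitByObjectLimit ? objectLimit : byteLimit;
    }

    public static void main(String[] args) {
        JobLimitSketch job = new JobLimitSketch(1000L, 400L);
        System.out.println(job.objectLimit + " " + job.cappedByConfigLimit); // prints "400 true"
    }
}
```

The flag matters later: it lets the stop-reason computation tell a job that stopped at the domain configuration limit apart from one that stopped at the snapshot-wide limit.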

|| Summary || Follow-up by Nicolas. ||

|| Total time used (coding, documentation, review) || Nicolas: 10 MD, SVC: 0.5 MD ||

General comments:

|| Description || Classification || Status ||
|| Fix the unit tests that are failing now. See the stack traces added to FR 1689 in gforge. || Cosmetic || OK ||

Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_da.properties', revision 1012

|| Lines || Description || Classification || Status ||
|| 114, 166 || In Danish, "objects" translates as "objekter" || Cosmetic || OK ||

Comments on file 'trunk/src/dk/netarkivet/harvester/HarvesterSettings.java', revision 1012

|| Lines || Description || Classification || Status ||
|| 160-163 || Change to: By default the byte limit is used as the base criterion for how many domain configurations are put into one harvest job. However, if this parameter is set to "true", then the object limit is used instead as the base criterion. || Cosmetic || OK ||

Comments on file 'trunk/src/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java', revision 1012

|| Lines || Description || Classification || Status ||
|| 72-73 || Is this check necessary? || Cosmetic || OK ||
|| 78-83 || Do these two operations require that the parameters checked in lines 72-73 are set? || Cosmetic || OK ||

Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/Job.java', revision 1012

|| Lines || Description || Classification || Status ||
|| 171 || , false means it is defined by the .. || Cosmetic || OK ||

Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/StopReason.java', revision 1013

|| Lines || Description || Classification || Status ||
|| General || Haven't we now introduced a new order in the StopReason enum type? When we read the data from old databases, the data will be misrepresented. The correct new order should be: OBJECT_LIMIT, SIZE_LIMIT, CONFIG_SIZE_LIMIT, DOWNLOAD_UNFINISHED, CONFIG_OBJECT_LIMIT || Cosmetic || OK ||
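
The concern in the comment above can be illustrated with a small sketch. It assumes the stop reason is persisted by its enum position (ordinal), which the comment implies but this page does not state outright; the enum name and the helper method are hypothetical. Appending the new constant at the end keeps the position of every older constant unchanged, so previously stored values still decode to the same reason.

```java
/** Illustrative sketch only; assumes stop reasons are stored as ordinals. */
enum StopReasonSketch {
    // Order proposed in the review comment: the new constant goes last,
    // so the positions of the older constants do not shift.
    OBJECT_LIMIT,
    SIZE_LIMIT,
    CONFIG_SIZE_LIMIT,
    DOWNLOAD_UNFINISHED,
    CONFIG_OBJECT_LIMIT;

    /** Hypothetical helper: decode a value that was stored as an ordinal. */
    static StopReasonSketch fromStoredOrdinal(int stored) {
        StopReasonSketch[] all = values();
        if (stored < 0 || stored >= all.length) {
            throw new IllegalArgumentException("Unknown stop reason: " + stored);
        }
        return all[stored];
    }

    public static void main(String[] args) {
        System.out.println(fromStoredOrdinal(3)); // prints DOWNLOAD_UNFINISHED
        System.out.println(fromStoredOrdinal(4)); // prints CONFIG_OBJECT_LIMIT
    }
}
```

Inserting the new constant anywhere before the end would shift the ordinals of the constants after it, which is exactly the misreading of old rows that the comment warns about.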

Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_de.properties', revision 1012

|| Lines || Description || Classification || Status ||
|| 191 || "Max objects" => Max. Gegenstände || Cosmetic || OK ||

IssuesFromNs92 (last edited 2010-08-16 10:24:45 by localhost)