Review (NS-92): [FROM NICOLAS @ BNF]

Author: BNF user
Moderator: Søren
State: Closed

Objectives

Revision 1012:
[FROM NICOLAS @ BNF]
------------------------------------------------------------------------
FR 1689: Managing crawls using object number
------------------------------------------------------------------------
The major modifications introduced by this patch are listed below.
UI modifications:
------------------------------------------------------------------------
Added object limit input fields to the domain configuration edit screen and
to the snapshot harvest create/edit screen.
Configuration settings modifications:
------------------------------------------------------------------------
New properties:
- settings.harvester.datamodel.domain.defaultMaxobjects is the default
  object limit for domain configurations and snapshot harvests.
- settings.harvester.scheduler.splitByObjectLimit is a boolean parameter
  that controls whether the initial criterion for splitting a harvest into
  jobs is the byte limit (default) or the object limit (see Job#canAccept).
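A minimal sketch of how the two new properties might be read. It assumes a plain `java.util.Properties` object as a stand-in for the project's own settings mechanism, which is not modelled here. The class and method names are hypothetical. The fallback for `defaultMaxobjects` is chosen by the caller because the source does not state a default value. The byte limit being the default splitting criterion does come from the text above.

```java
import java.util.Properties;

// Hypothetical helper: java.util.Properties stands in for the project's own
// settings mechanism, which this sketch does not model.
public class ObjectLimitSettingsSketch {
    public static final String DEFAULT_MAX_OBJECTS =
            "settings.harvester.datamodel.domain.defaultMaxobjects";
    public static final String SPLIT_BY_OBJECT_LIMIT =
            "settings.harvester.scheduler.splitByObjectLimit";

    /** Default object limit for domain configs and snapshot harvests; the fallback is caller-chosen. */
    public static long defaultMaxObjects(Properties settings, long fallback) {
        String value = settings.getProperty(DEFAULT_MAX_OBJECTS);
        return value == null ? fallback : Long.parseLong(value.trim());
    }

    /** "true" selects the object limit as splitting criterion; absent or "false" keeps the byte limit. */
    public static boolean splitByObjectLimit(Properties settings) {
        return Boolean.parseBoolean(settings.getProperty(SPLIT_BY_OBJECT_LIMIT, "false"));
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        System.out.println(splitByObjectLimit(settings)); // false
        settings.setProperty(SPLIT_BY_OBJECT_LIMIT, "true");
        settings.setProperty(DEFAULT_MAX_OBJECTS, "5000"); // illustrative value only
        System.out.println(splitByObjectLimit(settings));     // true
        System.out.println(defaultMaxObjects(settings, -1L)); // 5000
    }
}
```

Treating a missing flag as `false` keeps existing installations on byte-based splitting unless they opt in.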
Datamodel modifications:
------------------------------------------------------------------------
DomainConfiguration#getExpectedNumberOfObjects(long objectLimit,
long byteLimit): for the initial best guess (i.e. when the domain is
harvested for the first time), the expected number of objects is now the
minimum of settings.harvester.scheduler.maxDomainSize and the domain
object limit.
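The initial best guess described above can be sketched as follows. This is not the real `getExpectedNumberOfObjects` signature. The method name is hypothetical, and the sketch assumes that a negative object limit means "no limit", in which case only `maxDomainSize` applies.

```java
// Hypothetical sketch of the first-harvest estimate; not the project's actual method.
public final class ExpectedObjectsSketch {
    /** Assumed sentinel for "no object limit" (not taken from the source). */
    public static final long NO_LIMIT = -1L;

    /** Expected object count for a domain that has never been harvested before. */
    public static long initialExpectedObjects(long maxDomainSize, long objectLimit) {
        if (objectLimit < 0) {
            // Unlimited configuration: fall back to the scheduler-wide estimate.
            return maxDomainSize;
        }
        // Otherwise take the smaller of the scheduler estimate and the config limit.
        return Math.min(maxDomainSize, objectLimit);
    }

    public static void main(String[] args) {
        System.out.println(initialExpectedObjects(5000L, 2000L));    // 2000
        System.out.println(initialExpectedObjects(5000L, NO_LIMIT)); // 5000
    }
}
```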
Job:
- the object limit is now set in order.xml as the value of the
  group-max-fetch-successes parameter of the QuotaEnforcer element.
  This produces a dedicated annotation in the crawl log, which makes it
  possible to determine the correct stop reason for the job.
- constructor: symmetrically to the way the byte limit is set, the object
  limit is the minimum of the snapshot limit and the domain
  configuration limit. A boolean flag indicates whether the limit was
  capped to the domain configuration limit.
- canAccept method: as explained in the previous section, setting the
  settings.harvester.scheduler.splitByObjectLimit configuration parameter
  to true makes the object limit, instead of the byte limit, the base
  splitting criterion.
- added a new StopReason, symmetric to CONFIG_BYTE_LIMIT. The stop reason
  is computed in HarvestSchedulerMonitorServer.
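A simplified, hypothetical `Job` that illustrates two of the points above. First, the constructor takes the smaller of the snapshot limit and the domain configuration limit, and records whether the configuration limit won. Second, `canAccept` switches between the byte limit and the object limit. The sketch assumes that negative limits mean "unlimited". It also assumes one plausible reading of "base splitting criterion": a job only accepts configurations whose effective limit, in the chosen dimension, equals the job's own. The real `Job#canAccept` may apply further checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Job; names and rules are assumptions.
public class JobSketch {
    /** Minimal stand-in for a domain configuration. */
    public static final class Config {
        final String name;
        final long maxBytes;
        final long maxObjects;

        public Config(String name, long maxBytes, long maxObjects) {
            this.name = name;
            this.maxBytes = maxBytes;
            this.maxObjects = maxObjects;
        }
    }

    /** Smaller of two limits; a negative value is assumed to mean "unlimited". */
    public static long effectiveLimit(long harvestLimit, long configLimit) {
        if (harvestLimit < 0) return configLimit;
        if (configLimit < 0) return harvestLimit;
        return Math.min(harvestLimit, configLimit);
    }

    private final long harvestMaxBytes;
    private final long harvestMaxObjects;
    private final boolean splitByObjectLimit;
    public final long maxBytes;
    public final long maxObjects;
    /** True when the domain configuration limit is stricter than the snapshot limit. */
    public final boolean maxObjectsIsSetByConfig;
    private final List<Config> configs = new ArrayList<>();

    public JobSketch(Config first, long harvestMaxBytes, long harvestMaxObjects,
                     boolean splitByObjectLimit) {
        this.harvestMaxBytes = harvestMaxBytes;
        this.harvestMaxObjects = harvestMaxObjects;
        this.splitByObjectLimit = splitByObjectLimit;
        this.maxBytes = effectiveLimit(harvestMaxBytes, first.maxBytes);
        this.maxObjects = effectiveLimit(harvestMaxObjects, first.maxObjects);
        this.maxObjectsIsSetByConfig = first.maxObjects >= 0
                && (harvestMaxObjects < 0 || first.maxObjects < harvestMaxObjects);
        configs.add(first);
    }

    /** A config fits only if its effective limit in the chosen dimension equals the job's. */
    public boolean canAccept(Config cfg) {
        if (splitByObjectLimit) {
            return effectiveLimit(harvestMaxObjects, cfg.maxObjects) == maxObjects;
        }
        return effectiveLimit(harvestMaxBytes, cfg.maxBytes) == maxBytes;
    }

    /** Adds the config if it is acceptable; returns whether it was added. */
    public boolean add(Config cfg) {
        if (!canAccept(cfg)) return false;
        configs.add(cfg);
        return true;
    }

    public static void main(String[] args) {
        Config a = new Config("a.example", 1_000_000L, 2_000L);
        Config b = new Config("b.example", 1_000_000L, 10_000L);
        // Snapshot harvest limits: 5,000,000 bytes and 5,000 objects.
        JobSketch byBytes = new JobSketch(a, 5_000_000L, 5_000L, false);
        JobSketch byObjects = new JobSketch(a, 5_000_000L, 5_000L, true);
        System.out.println(byBytes.maxObjects + " " + byBytes.maxObjectsIsSetByConfig); // 2000 true
        System.out.println(byBytes.canAccept(b));   // true: same effective byte limit
        System.out.println(byObjects.canAccept(b)); // false: 5000 objects vs 2000
    }
}
```

With the flag off, `a` and `b` share a job because their byte limits match. With the flag on, they are split because their effective object limits differ.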
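A sketch of setting the object limit in order.xml through the QuotaEnforcer's group-max-fetch-successes parameter, using only the JDK's DOM and XPath APIs. The XML fragment is a simplified, illustrative order.xml. The XPath expression assumes the QuotaEnforcer appears as a `newObject` element named `QuotaEnforcer` with a `long` child named `group-max-fetch-successes`. Real templates may be structured differently. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; the element layout targeted by QUOTA_XPATH is an assumption.
public class OrderXmlObjectLimitSketch {
    static final String QUOTA_XPATH =
        "//newObject[@name='QuotaEnforcer']/long[@name='group-max-fetch-successes']";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Writes the object limit into the QuotaEnforcer parameter, failing if it is absent. */
    public static void setObjectLimit(Document order, long maxObjects) throws Exception {
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(QUOTA_XPATH, order, XPathConstants.NODE);
        if (node == null) {
            throw new IllegalArgumentException("No group-max-fetch-successes element found");
        }
        node.setTextContent(Long.toString(maxObjects));
    }

    /** Reads the parameter back as text (empty string if absent). */
    public static String getObjectLimit(Document order) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(QUOTA_XPATH, order);
    }

    public static void main(String[] args) throws Exception {
        String template =
            "<crawl-order><controller><map name='pre-fetch-processors'>"
            + "<newObject name='QuotaEnforcer'>"
            + "<long name='group-max-fetch-successes'>-1</long>"
            + "</newObject></map></controller></crawl-order>";
        Document order = parse(template);
        setObjectLimit(order, 2000L);
        System.out.println(getObjectLimit(order)); // 2000
    }
}
```

Failing loudly when the element is missing avoids silently producing a job whose object limit Heritrix never enforces.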

Summary

Followup by Nicolas.

Total time used (coding, documentation, review):
Nicolas: 10 MD
SVC: 0.5 MD

General comments:

Description: Fix the unit tests that are now failing. See the stack traces added to FR 1689 in gforge.
Classification: Cosmetic
Status: OK

Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_da.properties', revision 1012

Lines: 114, 166
Description: In Danish, "objects" translates as "objekter".
Classification: Cosmetic
Status: OK

Comments on file 'trunk/src/dk/netarkivet/harvester/HarvesterSettings.java', revision 1012

Lines: 160-163
Description: Change to: By default the byte limit is used as the base criterion for how many domain configurations are put into one harvest job. However, if this parameter is set to "true", the object limit is used as the base criterion instead.
Classification: Cosmetic
Status: OK

Comments on file 'trunk/src/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java', revision 1012

Lines: 72-73
Description: Is this check necessary?
Classification: Cosmetic
Status: OK

Lines: 78-83
Description: Do these two operations require that the parameters checked in lines 72-73 are set?
Classification: Cosmetic
Status: OK

Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/Job.java', revision 1012

Lines: 171
Description: , false means it is defined by the ..
Classification: Cosmetic
Status: OK

Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/StopReason.java', revision 1013

Lines: General
Description: Haven't we now changed the order of the constants in the StopReason enum type? If so, data read from old databases will be misinterpreted. The correct new order should be: OBJECT_LIMIT, SIZE_LIMIT, CONFIG_SIZE_LIMIT, DOWNLOAD_UNFINISHED, CONFIG_OBJECT_LIMIT, i.e. with the new constant appended at the end.
Classification: Cosmetic
Status: OK
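The ordering concern above matters when the enum's ordinal is what gets stored in the database. This sketch assumes that persistence scheme, and assumes that the pre-patch order was the first four constants listed. Appending the new constant leaves every old ordinal unchanged, so old rows still decode correctly. The `toDbValue`/`fromDbValue` helpers are hypothetical.

```java
// Hypothetical sketch of ordinal-based persistence for StopReason.
public class StopReasonOrderSketch {
    /** Order proposed in the review: the new constant is appended last. */
    public enum StopReason {
        OBJECT_LIMIT,
        SIZE_LIMIT,
        CONFIG_SIZE_LIMIT,
        DOWNLOAD_UNFINISHED,
        CONFIG_OBJECT_LIMIT;

        /** Value written to the database (assumed to be the ordinal). */
        public int toDbValue() {
            return ordinal();
        }

        /** Decodes a stored value; pre-patch rows keep their meaning since old ordinals are unchanged. */
        public static StopReason fromDbValue(int value) {
            StopReason[] all = values();
            if (value < 0 || value >= all.length) {
                throw new IllegalArgumentException("Unknown stop reason: " + value);
            }
            return all[value];
        }
    }

    public static void main(String[] args) {
        // A row stored before the patch with value 2 still decodes as CONFIG_SIZE_LIMIT.
        System.out.println(StopReason.fromDbValue(2));                  // CONFIG_SIZE_LIMIT
        System.out.println(StopReason.CONFIG_OBJECT_LIMIT.toDbValue()); // 4
    }
}
```

Inserting CONFIG_OBJECT_LIMIT anywhere but the end would shift the ordinals of later constants and silently relabel existing rows.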

Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_de.properties', revision 1012

Lines: 191
Description: "Max objects" => "Max. Gegenstände"
Classification: Cosmetic
Status: OK

IssuesFromNs92 (last edited 2010-08-16 10:24:45 by localhost)