= Review (NS-92): [FROM NICOLAS @ BNF] =

|| Author || BNF user ||
|| Moderator || Søren ||
|| State || Closed ||

== Objectives ==
{{{
1012: [FROM NICOLAS @ BNF]
------------------------------------------------------------------------
FR 1689: Managing crawls using object number
------------------------------------------------------------------------

I have listed below the major modifications brought by this patch.

UI modifications:
------------------------------------------------------------------------
Added inputs for the object limit in the domain configuration edit screen
and the snapshot harvest creation/edit screen.

Configuration settings modifications:
------------------------------------------------------------------------
New properties:
- settings.harvester.datamodel.domain.defaultMaxobjects is the default
  object limit for domain configs and snapshot harvests
- settings.harvester.scheduler.splitByObjectLimit is a boolean parameter
  that controls whether the initial criterion for splitting a harvest into
  jobs is the byte limit (default) or the object limit (see Job#canAccept)

Datamodel modifications:
------------------------------------------------------------------------
DomainConfiguration#getExpectedNumberOfObjects(long objectLimit, long byteLimit):
for the initial best guess (i.e. when the domain is harvested for the
first time) the expectation is now the minimum of
settings.harvester.scheduler.maxDomainSize and the domain object limit.

Job:
- the object limit is now set in order.xml by the value of the
  group-max-fetch-successes parameter of the QuotaEnforcer element. This
  generates a special annotation in the crawl log and makes it possible to
  determine the proper stop reason for the job.
- constructor: symmetrically to the way the byte limit is set, the object
  limit is the minimum of the snapshot limit and the domain configuration
  limit. A boolean flag indicates whether the limit was capped to the
  domain config limit.
- canAccept method: as explained in the previous section, setting the
  settings.harvester.scheduler.splitByObjectLimit config parameter to true
  forces the use of the object limit instead of the byte limit as the base
  splitting criterion.
- added a new StopReason, symmetric to CONFIG_BYTE_LIMIT. The computation
  of the stop reason is performed in HarvestSchedulerMonitorServer.
}}}

== Summary ==
{{{
Followup by Nicolas.
}}}

'''Total Time Used (Coding,Documentation,Review)''':
{{{
Time use (Coding,Documentation,Review)
Nicolas: 10 MD
SVC: 0.5 MD
}}}

'''General comments''':
|| '''Description''' || '''Classification''' || '''Status''' ||
|| Fix the unit tests that are currently failing. See the stack traces added to FR 1689 in gforge. || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_da.properties', revision 1012 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 114, 166 || In Danish, "objects" translates as "objekter". || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/HarvesterSettings.java', revision 1012 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 160-163 || Change to: By default the byte limit is used as the base criterion for how many domain configurations are put into one harvest job. However, if this parameter is set to "true", then the object limit is used instead as the base criterion. || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/webinterface/SnapshotHarvestDefinition.java', revision 1012 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 72-73 || Is this check necessary? || Cosmetic || OK ||
|| 78-83 || Do these two operations require that the parameters checked in lines 72-73 are set? || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/Job.java', revision 1012 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 171 || , false means it is defined by the .. || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/datamodel/StopReason.java', revision 1013 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| General || Haven't we now introduced a new order in the StopReason enum type? When we read data from old databases, it will be misrepresented. The correct new order should be: OBJECT_LIMIT, SIZE_LIMIT, CONFIG_SIZE_LIMIT, DOWNLOAD_UNFINISHED, CONFIG_OBJECT_LIMIT || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/harvester/Translations_de.properties', revision 1012 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 191 || "Max objects" => Max. Gegenstände || Cosmetic || OK ||
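
The ordering concern raised in the comment on StopReason.java can be illustrated with a minimal sketch. It assumes that stop reasons are persisted as the enum's ordinal value and that the order before the patch was the first four constants listed in that comment; neither assumption is confirmed by this review. The class name `StopReasonOrderSketch` and the helper `fromStoredOrdinal` are hypothetical and are not part of the patch.

```java
// Sketch only: assumes stop reasons are stored in the database as enum ordinals.
final class StopReasonOrderSketch {

    /** Constant order as proposed in the review comment; the new value is appended last. */
    enum StopReason {
        OBJECT_LIMIT,
        SIZE_LIMIT,
        CONFIG_SIZE_LIMIT,
        DOWNLOAD_UNFINISHED,
        CONFIG_OBJECT_LIMIT
    }

    /** Hypothetical helper: maps an integer read from the database back to a StopReason. */
    static StopReason fromStoredOrdinal(int stored) {
        StopReason[] values = StopReason.values();
        if (stored < 0 || stored >= values.length) {
            throw new IllegalArgumentException("Unknown stop reason ordinal: " + stored);
        }
        return values[stored];
    }

    public static void main(String[] args) {
        // Under the assumed pre-patch order (the first four constants), ordinals
        // 0..3 written by older versions still decode to the same constants,
        // because the new constant takes the next free ordinal, 4.
        for (int stored = 0; stored < StopReason.values().length; stored++) {
            System.out.println(stored + " -> " + fromStoredOrdinal(stored));
        }
    }
}
```

Inserting a new constant anywhere but at the end would shift the ordinals of the constants that follow it, which is the misrepresentation of old data that the comment warns about.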