Revision 1 as of 2009-09-30 14:14:38
= Review (NS-97): FR 1691 (supersedes NS-95) =
|| Author || BNF user ||
|| Moderator || Søren ||
|| State || Review ||
== Objectives ==
1017: [FROM NICOLAS @ BNF]

FR 1691: Configure which Heritrix reports to include in the metadata ARC file

Three new settings properties have been added:
 * settings.harvester.harvesting.metadata.heritrixFilePattern is a Java regular expression that selects which files in the crawl directory (non-recursively) are included in the metadata ARC file.
 * settings.harvester.harvesting.metadata.reportFilePattern is a Java regular expression that selects which of the files chosen by heritrixFilePattern are treated as report files. All other selected files are treated as setup files.
 * settings.harvester.harvesting.metadata.logFilePattern is a Java regular expression that selects which files in the logs subdirectory of the crawl directory are added to the metadata ARC file as log files.

NOTE: FR 1691 also addresses Bug 808 (Should store crawl-manifest.txt). That bug also suggests storing conf/heritrix_properties, but why not take all files in conf and modules?
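A minimal Java sketch of how the three settings above could combine when files are selected for the metadata ARC file. The class MetadataFileClassifier, its Role enum and the example regular expressions are hypothetical, not NetarchiveSuite code. The sketch also assumes that a pattern must match the whole file name (Matcher.matches()) and works on file names only, leaving out directory listing and ARC writing.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of how the FR 1691 settings could be applied.
 * Names and example patterns are illustrative assumptions, not
 * NetarchiveSuite code; whole-name matching (Matcher.matches()) is assumed.
 */
public class MetadataFileClassifier {

    /** Role a file plays in the metadata ARC file. */
    public enum Role { REPORT, SETUP, LOG, EXCLUDED }

    private final Pattern heritrixFilePattern; // files directly in the crawl dir
    private final Pattern reportFilePattern;   // subset of the selected files
    private final Pattern logFilePattern;      // files in the logs subdir

    public MetadataFileClassifier(String heritrixFileRegex,
                                  String reportFileRegex,
                                  String logFileRegex) {
        this.heritrixFilePattern = Pattern.compile(heritrixFileRegex);
        this.reportFilePattern = Pattern.compile(reportFileRegex);
        this.logFilePattern = Pattern.compile(logFileRegex);
    }

    /** Classifies a file found directly (non-recursively) in the crawl dir. */
    public Role classifyCrawlDirFile(String fileName) {
        if (!heritrixFilePattern.matcher(fileName).matches()) {
            return Role.EXCLUDED; // not selected for the metadata ARC file
        }
        // A selected file is a report if it also matches reportFilePattern;
        // every other selected file is treated as a setup file.
        return reportFilePattern.matcher(fileName).matches()
                ? Role.REPORT : Role.SETUP;
    }

    /** Classifies a file found in the logs subdirectory of the crawl dir. */
    public Role classifyLogsDirFile(String fileName) {
        return logFilePattern.matcher(fileName).matches()
                ? Role.LOG : Role.EXCLUDED;
    }

    public static void main(String[] args) {
        // Example patterns only; not the real default setting values.
        MetadataFileClassifier c = new MetadataFileClassifier(
                ".*\\.(txt|xml)", ".*-report\\.txt", ".*\\.log");
        for (String name : Arrays.asList("order.xml", "crawl-report.txt",
                "crawl-manifest.txt", "heritrix.out")) {
            System.out.println(name + " -> " + c.classifyCrawlDirFile(name));
        }
        System.out.println("crawl.log -> " + c.classifyLogsDirFile("crawl.log"));
    }
}
```

With these example patterns, crawl-report.txt is classified as REPORT, order.xml and crawl-manifest.txt as SETUP, heritrix.out as EXCLUDED, and crawl.log from the logs subdirectory as LOG. Note that reportFilePattern is only consulted for files already selected by heritrixFilePattern.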
'''Total Time Used (Coding, Documentation, Review)''':
 * Nicolas: 2 MD
 * SVC: 0.5 MD
'''General comments''':
|| '''Description''' || '''Classification''' || '''Status''' ||
=== Comments on file 'trunk/src/dk/netarkivet/harvester/HarvesterSettings.java', revision 1017 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
=== Comments on file 'trunk/src/dk/netarkivet/harvester/harvesting/HarvestDocumentation.java', revision 1017 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 439 || Consider logging here which log files were found. || Cosmetic || NOTOK ||
=== Comments on file 'trunk/src/dk/netarkivet/harvester/harvesting/MetadataFile.java', revision 1019 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| General || There are two patterns for entries in the metadata ARC file: one for the CDX entry in dk.netarkivet.archive.indexserver.CDXDataCache.java and one for the crawl-log entry in CrawlLogDataCache.java. I suggest moving both to the MetadataFile class. || Cosmetic || NOTOK ||
|| 36, 43 || Delete the @author tags; they are not used in the NetarchiveSuite code style. || Cosmetic || NOTOK ||
|| 45 || A new MetadataType called 'index' is probably needed to cover the crawl/index/cdx entry. || Cosmetic || NOTOK ||
|| 72-80 || Should (or could) it be verified that these patterns are mutually exclusive? Alternatively, state in the javadoc that the name of a Heritrix file is first tested against the report file pattern and then against the log file pattern; if it matches neither, it is considered a setup file. || Cosmetic || NOTOK ||
|| 126 || Add javadoc. || Cosmetic || NOTOK ||
|| 130 || Add javadoc. || Cosmetic || NOTOK ||
|| 134 || Replace "URLS" with "URLs". || Cosmetic || NOTOK ||
|| 146 || Add javadoc. || Cosmetic || NOTOK ||
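Regarding the comment on lines 72-80 of MetadataFile.java: java.util.regex offers no way to prove up front that two regular expressions are disjoint, so one practical check would be to test the configured patterns against the file names actually found and warn about any name matched by more than one of them. The sketch below is a hypothetical helper (PatternOverlapCheck and findOverlaps are illustrative names, and the example patterns are made up); documenting the precedence in the javadoc, as the comment suggests, remains the simpler option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not NetarchiveSuite code: reports every file name
 * that is matched (whole-name, Matcher.matches()) by more than one pattern.
 */
public class PatternOverlapCheck {

    /** Maps each ambiguous file name to the keys of all patterns it matches. */
    public static Map<String, List<String>> findOverlaps(
            List<String> fileNames, Map<String, Pattern> patterns) {
        Map<String, List<String>> overlaps = new LinkedHashMap<>();
        for (String name : fileNames) {
            List<String> matching = new ArrayList<>();
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                if (e.getValue().matcher(name).matches()) {
                    matching.add(e.getKey());
                }
            }
            if (matching.size() > 1) {
                overlaps.put(name, matching); // ambiguous classification
            }
        }
        return overlaps;
    }

    public static void main(String[] args) {
        // Example patterns only; LinkedHashMap keeps the test order stable.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("reportFilePattern", Pattern.compile(".*report.*"));
        patterns.put("logFilePattern", Pattern.compile(".*\\.log"));
        List<String> names = Arrays.asList(
                "crawl-report.txt", "crawl.log", "report.log");
        System.out.println(findOverlaps(names, patterns));
    }
}
```

Here only report.log matches both example patterns, so the program prints {report.log=[reportFilePattern, logFilePattern]}.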