= Review (NS-87): FR1678: Improved indexing for wayback =
|| Author || Colin ||
|| Moderator || Colin ||
|| State || Closed ||

== Objectives ==
{{{
See https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1678 and http://netarchive.dk/suite/ImprovedIndexing

The implemented code includes:
A batch job to extract wayback cdx indexes from archive arc files
A batch job to extract wayback cdx indexes from deduplication metadata records
An application to extract wayback cdx indexes from deduplication crawl logs + associated helper methods
}}}

== Summary ==
{{{
follow-up: csr
}}}

'''Total Time Used (Coding,Documentation,Review)''':
{{{
CSR:4 MD
SVC:0.5 MD
}}}

'''General comments''':
|| '''Description''' || '''Classification''' || '''Status''' ||
|| On several of the files you need to set the SVN "svn:keywords" property with value=URL Revision Author Date Id || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/DeduplicateToCDXApplication.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 28 || Blank line || NA || NOTOK ||
|| 50 || Missing argument validation || Cosmetic || NOTOK ||
|| 55 || Explicitly close stream || Minor || NOTOK ||
|| 70 || [spelling] wyaback => wayback || Cosmetic || NOTOK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/batch/ExtractWaybackCDXBatchJob.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 43-44 || We don't include that kind of information in the NetarchiveSuite javadoc || Cosmetic || OK ||
|| 52-53 || Missing javadoc || Cosmetic || OK ||
|| 55 || Missing javadoc || Cosmetic || OK ||
|| 64 || Missing javadoc || Cosmetic || OK ||
|| 84 || javadoc || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/batch/ExtractDeduplicateCDXBatchJob.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| General || Remove underscores in variable names. They violate our coding style. || Cosmetic || OK ||
|| 30 || Unnecessary blank lines in import block || NA || OK ||
|| 42 || Remove unused line || Cosmetic || OK ||
|| 48 || Missing javadoc || Cosmetic || OK ||
|| 53 || Missing javadoc || NA || OK ||
|| 63 || Missing javadoc || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/batch/UrlCanonicalizerFactory.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 31 || Missing period in first sentence of javadoc || Cosmetic || OK ||
|| 47 || Use try/catch on SecurityException || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/batch/DeduplicateToCDXAdapter.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| General || Remove underscores in variable names. They violate our coding style. || Cosmetic || OK ||
|| General || Divide lines longer than 80 characters into two lines. || Cosmetic || OK ||
|| 41 || Missing class javadoc || NA || OK ||
|| 48-58 || Missing javadoc || Cosmetic || OK ||
|| 60 || Missing javadoc || Cosmetic || OK ||
|| 62 || Missing javadoc || Cosmetic || OK ||
|| 66 || Missing javadoc, and missing validation of argument 'line' || Cosmetic || OK ||
|| 67 || Make a constant for the "duplicate:" string || Cosmetic || OK ||
|| 108 || Missing javadoc and argument validation || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/batch/DeduplicateToCDXAdapterInterface.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| 30 || Missing period in first sentence of javadoc || Cosmetic || OK ||
|| 37 || What type of canonicalization is done on the target URL? || Cosmetic || OK ||
|| 46 || Replace "dedup lines" with "lines containing deduplication information" or similar || Cosmetic || TOK ||
|| 47 || Missing period in first sentence of javadoc || Cosmetic || OK ||

=== Comments on file 'trunk/src/dk/netarkivet/wayback/WaybackSettings.java', revision 995 ===
|| '''Lines''' || '''Description''' || '''Classification''' || '''Status''' ||
|| General || Missing svn:keywords property with value=URL Revision Author Date Id || Cosmetic || OK ||
|| 1 || File headers/copyright missing || Cosmetic || OK ||
|| 25 || Missing javadoc || Cosmetic || OK ||
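
Three of the comments above recur across files: missing argument validation, a stream that is not explicitly closed, and the "duplicate:" string that should be a constant. The following is a minimal Java sketch of what those fixes could look like. It is not the reviewed code: the class name `DedupLineSketch`, both method names, and the sample log line are hypothetical, and the way the real adapter parses text after the marker is not shown in this review.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration of the review comments; not the reviewed class. */
public class DedupLineSketch {
    /** The marker string named in the review, kept in one constant. */
    public static final String DUPLICATE_MARKER = "duplicate:";

    /**
     * Returns the text following the marker, or null if the line has none.
     * The argument is validated before use.
     */
    public static String textAfterMarker(String line) {
        if (line == null) {
            throw new IllegalArgumentException("line must not be null");
        }
        int index = line.indexOf(DUPLICATE_MARKER);
        if (index < 0) {
            return null;
        }
        return line.substring(index + DUPLICATE_MARKER.length());
    }

    /**
     * Counts the lines that carry the marker.
     * try-with-resources closes the reader on every exit path.
     */
    public static int countMarkedLines(Reader source) {
        if (source == null) {
            throw new IllegalArgumentException("source must not be null");
        }
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (textAfterMarker(line) != null) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read lines", e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample input: one line with the marker, one without.
        String log = "a duplicate:\"x.arc,10\"\nplain line\n";
        System.out.println(countMarkedLines(new StringReader(log)));
    }
}
```

In the NetarchiveSuite code base the null checks would presumably go through the project's own validation helpers rather than a bare `IllegalArgumentException`; plain JDK calls are used here only to keep the sketch self-contained.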