Review (NS-87): FR1678: Improved indexing for wayback
Author | Colin
Moderator | Colin
State | Closed
Objectives
See https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1678 and http://netarchive.dk/suite/ImprovedIndexing. The implemented code includes: a batch job to extract wayback CDX indexes from archive ARC files; a batch job to extract wayback CDX indexes from deduplication metadata records; and an application to extract wayback CDX indexes from deduplication crawl logs, plus associated helper methods.
Summary
Follow-up: CSR
Total time used (coding, documentation, review):
CSR: 4 MD, SVC: 0.5 MD
General comments:
Description | Classification | Status
On several of the files you need to set the SVN "svn:keywords" property with value=URL Revision Author Date Id | Cosmetic | OK
Comments on file 'trunk/src/dk/netarkivet/wayback/DeduplicateToCDXApplication.java', revision 995
Lines | Description | Classification | Status
28 | Blank line | NA | NOTOK
50 | Missing argument validation | Cosmetic | NOTOK
55 | Explicitly close stream | Minor | NOTOK
70 | [spelling] wyaback => wayback | Cosmetic | NOTOK
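The comments on lines 50 and 55 (argument validation and explicit stream closing) can be illustrated with a minimal sketch. The class `CrawlLogReaderSketch` and its methods are hypothetical and are not taken from DeduplicateToCDXApplication.java. The sketch assumes the application receives crawl log file names as command-line arguments and reads each log line by line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of NetarchiveSuite. */
class CrawlLogReaderSketch {

    /**
     * Validates the command-line arguments before any file is opened.
     *
     * @param args the arguments given to the application (assumed to be file names).
     * @throws IllegalArgumentException if no arguments are given or one is blank.
     */
    static void validateArgs(String[] args) {
        if (args == null || args.length == 0) {
            throw new IllegalArgumentException(
                    "Expected at least one crawl log file name");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "File names must not be null or empty");
            }
        }
    }

    /**
     * Reads all lines from the given source. The try-with-resources statement
     * closes the reader, and thereby the underlying source, even if reading fails.
     *
     * @param source the character stream to read from.
     * @return the lines read, in order.
     * @throws IOException if reading fails.
     */
    static List<String> readLines(Reader source) throws IOException {
        if (source == null) {
            throw new IllegalArgumentException("source must not be null");
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

A caller would run `validateArgs(args)` first and then pass a `FileReader` for each argument to `readLines`. On a Java version without try-with-resources, the same effect is obtained by closing the reader in a `finally` block.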
Comments on file 'trunk/src/dk/netarkivet/wayback/batch/ExtractWaybackCDXBatchJob.java', revision 995
Lines | Description | Classification | Status
43-44 | We don't include that kind of information in the NetarchiveSuite javadoc | Cosmetic | OK
52-53 | Missing javadoc | Cosmetic | OK
55 | Missing javadoc | Cosmetic | OK
64 | Missing javadoc | Cosmetic | OK
84 | javadoc | Cosmetic | OK
Comments on file 'trunk/src/dk/netarkivet/wayback/batch/ExtractDeduplicateCDXBatchJob.java', revision 995
Lines | Description | Classification | Status
General | Remove underscores in variable names; they violate our coding style | Cosmetic | OK
30 | Unnecessary blank lines in import block | NA | OK
42 | Remove unused line | Cosmetic | OK
48 | Missing javadoc | Cosmetic | OK
53 | Missing javadoc | NA | OK
63 | Missing javadoc | Cosmetic | OK
Comments on file 'trunk/src/dk/netarkivet/wayback/batch/UrlCanonicalizerFactory.java', revision 995
Lines | Description | Classification | Status
31 | Missing period in first sentence of javadoc | Cosmetic | OK
47 | Use try/catch on SecurityException | Cosmetic | OK
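The comment on line 47 can be illustrated with a hedged sketch. This review does not show what line 47 of UrlCanonicalizerFactory.java does; the sketch assumes the factory instantiates a class by name through reflection, which is one place where a `SecurityException` can occur. The class `ReflectiveFactorySketch` and its method are hypothetical.

```java
/** Hypothetical factory for illustration only; not part of NetarchiveSuite. */
class ReflectiveFactorySketch {

    /**
     * Creates an instance of the named class using its no-argument constructor.
     *
     * @param className the fully qualified name of the class to instantiate.
     * @return a new instance of the class.
     * @throws IllegalArgumentException if className is null or empty.
     * @throws IllegalStateException if the class cannot be instantiated.
     */
    static Object newInstance(String className) {
        if (className == null || className.isEmpty()) {
            throw new IllegalArgumentException("className must not be null or empty");
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (SecurityException e) {
            // Thrown when a security manager denies access to the constructor.
            throw new IllegalStateException(
                    "Not permitted to instantiate '" + className + "'", e);
        } catch (ReflectiveOperationException e) {
            // Covers missing classes, missing constructors and failed construction.
            throw new IllegalStateException(
                    "Could not instantiate '" + className + "'", e);
        }
    }
}
```

Catching `SecurityException` separately lets the error message distinguish a permission problem from a misconfigured class name.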
Comments on file 'trunk/src/dk/netarkivet/wayback/batch/DeduplicateToCDXAdapter.java', revision 995
Lines | Description | Classification | Status
General | Remove underscores in variable names; they violate our coding style | Cosmetic | OK
General | Divide lines longer than 80 characters into two lines | Cosmetic | OK
41 | Missing class javadoc | NA | OK
48-58 | Missing javadoc | Cosmetic | OK
60 | Missing javadoc | Cosmetic | OK
62 | Missing javadoc | Cosmetic | OK
66 | Missing javadoc, and missing validation of argument 'line' | Cosmetic | OK
67 | Make a constant for the "duplicate:" string | Cosmetic | OK
108 | Missing javadoc and argument validation | Cosmetic | OK
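The comments on lines 66 and 67 (validate the 'line' argument, and use a constant for the "duplicate:" string) can be sketched as follows. The class `DuplicateMarkerSketch`, the constant name and the method are hypothetical. The sketch only assumes that the adapter looks for the literal "duplicate:" in a crawl log line, and it does not reproduce the parsing in DeduplicateToCDXAdapter.java.

```java
/** Hypothetical helper for illustration only; not part of NetarchiveSuite. */
class DuplicateMarkerSketch {

    /** Marker that introduces deduplication information in a crawl log line. */
    static final String DUPLICATE_MARKER = "duplicate:";

    /**
     * Returns the text following the duplicate marker in the given line.
     *
     * @param line a crawl log line.
     * @return the text after the marker, or null if the line has no marker.
     * @throws IllegalArgumentException if line is null or empty.
     */
    static String extractDuplicateInfo(String line) {
        if (line == null || line.isEmpty()) {
            throw new IllegalArgumentException("line must not be null or empty");
        }
        int start = line.indexOf(DUPLICATE_MARKER);
        if (start < 0) {
            return null;
        }
        return line.substring(start + DUPLICATE_MARKER.length());
    }
}
```

With the constant in place, both the search and the offset calculation refer to the same definition, so the marker cannot drift between the two uses.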
Comments on file 'trunk/src/dk/netarkivet/wayback/batch/DeduplicateToCDXAdapterInterface.java', revision 995
Lines | Description | Classification | Status
30 | Missing period in first sentence of javadoc | Cosmetic | OK
37 | What type of canonicalization is done on the target URL? | Cosmetic | OK
46 | Replace "dedup lines" with "lines containing deduplication information" or similar | Cosmetic | TOK
47 | Missing period in first sentence of javadoc | Cosmetic | OK
Comments on file 'trunk/src/dk/netarkivet/wayback/WaybackSettings.java', revision 995
Lines | Description | Classification | Status
General | Missing SVN "svn:keywords" property with value=URL Revision Author Date Id | Cosmetic | OK
1 | File headers/copyright missing | Cosmetic | OK
25 | Missing javadoc | Cosmetic | OK