Revision 21 as of 2010-08-16 10:24:26. Comment: converted to 1.6 markup
Line 1: | Line 1: |
We use a patched version of the deduplicator. The patch fixes the following issues: | We use a patched version of the 0.3.0-20061218 beta version of the deduplicator. The patches fixes the following issues in the !NetarchiveSuite: |
Line 3: | Line 3: |
* ARC records of >2GB caused arithmetic overflow and could not be read. * ...something about skipping long records...need to check it... |
* [[https://gforge.statsbiblioteket.dk/tracker/?aid=1062|Bug 1062]] Indexserver skips a lot of lines due to threading problem with !SimpleDateFormat * [[https://gforge.statsbiblioteket.dk/tracker/?aid=1078|Bug 1078]] !DeDuplikator index too large |
Line 6: | Line 6: |
[attachment:deduplicator-0.3.0-20061218a.diff Patch against Deduplicator 0.3.0-20061218] | * [[https://gforge.statsbiblioteket.dk/tracker/?aid=1248|Bug 1248]] NPE in deduplicator-0.3.0-20061218b.jar |
Line 8: | Line 8: |
[attachment:Deduplicator-0.3.0-20061218-src.zip Patched sourcecode Deduplicator 0.3.0-20061218-src.zip] | ==== Downloads ==== |
Line 10: | Line 10: |
[attachment:Deduplicator-0.3.0-20061218a.zip Patched binary Deduplicator 0.3.0-20061218a.zip] | <<AttachList>> Note that the deduplicator must be compiled with the same version of heritrix as the !NetarchiveSuite uses, or the deduplicator will fail to work during runtime. |
We use a patched version of the 0.3.0-20061218 beta release of the deduplicator. The patches fix the following issues in the NetarchiveSuite:
* Bug 1062: Indexserver skips a lot of lines due to threading problem with SimpleDateFormat
* Bug 1078: DeDuplikator index too large
* Bug 1248: NPE in deduplicator-0.3.0-20061218b.jar
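For background on Bug 1062: java.text.SimpleDateFormat is documented as not synchronized, so sharing one instance between threads can yield wrong or failed parses. The sketch below shows one common remedy, giving each thread its own formatter. It is not taken from the patch itself; the class name ThreadSafeDateParser, the 14-digit timestamp pattern, and the UTC time zone are assumptions chosen for illustration, and ThreadLocal.withInitial requires Java 8 or later.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical helper, not part of the deduplicator or NetarchiveSuite. */
public class ThreadSafeDateParser {
    // Assumed pattern for illustration: 14-digit timestamps such as 20061218120000.
    private static final String PATTERN = "yyyyMMddHHmmss";

    // Each thread lazily gets its own SimpleDateFormat, so no instance is ever shared.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat(PATTERN);
            f.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: timestamps are UTC
            f.setLenient(false);                        // reject out-of-range fields
            return f;
        });

    /** Parses a timestamp using the calling thread's private formatter. */
    public static Date parse(String timestamp) throws ParseException {
        return FORMAT.get().parse(timestamp);
    }

    /** Formats a date using the calling thread's private formatter. */
    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

Any number of threads can then call ThreadSafeDateParser.parse(...) concurrently without external locking. Synchronizing on a single shared formatter would also be correct, but it serializes all parsing.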
Downloads
Note that the deduplicator must be compiled against the same version of Heritrix that the NetarchiveSuite uses; otherwise the deduplicator will fail at runtime.