Differences between revisions 1 and 21 (spanning 20 versions)
Revision 1 as of 2007-11-21 17:03:01
Size: 525
Comment:
Revision 21 as of 2010-08-16 10:24:26
Size: 718
Editor: localhost
Comment: converted to 1.6 markup
Line 1:
- We use a patched version of the deduplicator.  The patch fixes the following issues:
+ We use a patched version of the 0.3.0-20061218 beta version of the deduplicator. The patches fixes the following issues in the !NetarchiveSuite:
Line 3:
-  * ARC records of >2GB caused arithmetic overflow and could not be read.
-  * ...something about skipping long records...need to check it...
+  * [[https://gforge.statsbiblioteket.dk/tracker/?aid=1062|Bug 1062]] Indexserver skips a lot of lines due to threading problem with !SimpleDateFormat
+  * [[https://gforge.statsbiblioteket.dk/tracker/?aid=1078|Bug 1078]] !DeDuplikator index too large
Line 6:
- [attachment:deduplicator-0.3.0-20061218a.diff Patch against Deduplicator 0.3.0-20061218]
+  * [[https://gforge.statsbiblioteket.dk/tracker/?aid=1248|Bug 1248]] NPE in deduplicator-0.3.0-20061218b.jar
Line 8:
- [attachment:Deduplicator-0.3.0-20061218-src.zip Patched sourcecode Deduplicator 0.3.0-20061218-src.zip]
+ ==== Downloads ====
Line 10:
- [attachment:Deduplicator-0.3.0-20061218a.zip Patched binary Deduplicator 0.3.0-20061218a.zip]
+ <<AttachList>>
+
+ Note that the deduplicator must be compiled with the same version of heritrix as the !NetarchiveSuite uses,
+ or the deduplicator will fail to work during runtime.

We use a patched version of the 0.3.0-20061218 beta release of the deduplicator. The patches fix the following issues in NetarchiveSuite:

  • Bug 1062 Indexserver skips a lot of lines due to threading problem with SimpleDateFormat

  • Bug 1078 DeDuplikator index too large

  • Bug 1248 NPE in deduplicator-0.3.0-20061218b.jar
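
Background on Bug 1062: java.text.SimpleDateFormat is not thread-safe, so a single instance shared between threads can produce parse errors or wrong dates, which would be consistent with the index server skipping lines. The sketch below shows one common way to avoid the problem by giving each thread its own formatter. It is only an illustration, not the content of the actual patch: the class name, method names and the date pattern are assumptions made for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not taken from the deduplicator sources.
final class DateFormatSketch {

    // Assumed timestamp pattern, chosen only for illustration.
    private static final String PATTERN = "yyyyMMddHHmmss";

    // Unsafe variant (do not do this when several threads parse dates):
    //   private static final SimpleDateFormat SHARED = new SimpleDateFormat(PATTERN);

    // Safe variant: every thread lazily creates and reuses its own instance.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat(PATTERN);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            f.setLenient(false); // reject malformed timestamps instead of guessing
            return f;
        });

    private DateFormatSketch() {
    }

    // Parses a timestamp using the calling thread's own formatter.
    static Date parseTimestamp(String text) {
        try {
            return FORMAT.get().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    // Formats a date using the calling thread's own formatter.
    static String formatTimestamp(Date date) {
        return FORMAT.get().format(date);
    }
}
```

Creating a new SimpleDateFormat as a local variable on every call is an equally valid fix. The ThreadLocal variant only avoids the repeated construction cost.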

Downloads

  • (2008-05-26 11:44:45, 2621.4 KB) [[attachment:deduplicator-0.3.0-20061218-patch-heritrix-1.12.1b.patch]]
  • (2008-05-26 11:44:23, 0.7 KB) [[attachment:deduplicator-0.3.0-20061218-patch-index-NPE.patch]]
  • (2008-05-26 11:44:30, 2.5 KB) [[attachment:deduplicator-0.3.0-20061218-patch-local-dateformat.patch]]
  • (2008-05-27 08:54:28, 235.4 KB) [[attachment:deduplicator-0.3.0-20061218-patch-lucene-OutOfMemory-2.patch]]
  • (2008-05-26 11:44:14, 24.0 KB) [[attachment:deduplicator-0.3.0-20061218-patch-lucene-OutOfMemory.patch]]
  • (2008-05-26 11:44:38, 2648.3 KB) [[attachment:deduplicator-0.3.0-20061218-patched-20080522-cumulative.patch]]
  • (2008-05-26 11:45:11, 0.9 KB) [[attachment:deduplicator-0.3.0-20061218-patched-20080522.patch]]
  • (2008-05-27 08:54:35, 2859.6 KB) [[attachment:deduplicator-0.3.0-20061218-patched-20080527-cumulative.patch]]
  • (2008-05-27 08:55:25, 0.9 KB) [[attachment:deduplicator-0.3.0-20061218-patched-20080527.patch]]
  • (2008-05-26 11:43:51, 1929.8 KB) [[attachment:deduplicator-0.3.0-20061218-src.zip]]

Note that the deduplicator must be compiled against the same version of Heritrix that NetarchiveSuite uses; otherwise the deduplicator will fail at runtime.

DeduplicatorPatches (last edited 2010-08-16 10:24:26 by localhost)