Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2007-11-21 17:03:01
Size: 525
Revision 14 as of 2008-05-19 12:02:49
Size: 1069
We use a patched version of the 0.3.0-20061218 beta release of the deduplicator. The patches fix the following issues in the !NetarchiveSuite:
 * ARC records of >2GB caused arithmetic overflow and could not be read.
 * ...something about skipping long records...need to check it...
 * [https://gforge.statsbiblioteket.dk/tracker/?aid=1062 Bug 1062] Indexserver skips a lot of lines due to threading problem with !SimpleDateFormat
 * [https://gforge.statsbiblioteket.dk/tracker/?aid=1078 Bug 1078] !DeDuplikator index too large
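
The 2 GB overflow and the !SimpleDateFormat threading problem listed above are both common Java pitfalls. The following is a minimal sketch of each, not the code from the patch itself: the class name, the helper names and the `yyyyMMddHHmmss` timestamp pattern are assumptions made for illustration, and `ThreadLocal.withInitial` requires Java 8 or later.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PatchPitfalls {

    // Hypothetical helper: narrowing a record length above Integer.MAX_VALUE
    // to int wraps around, which is one way a >2GB record can become unreadable.
    static int truncatedLength(long recordLength) {
        return (int) recordLength;
    }

    // SimpleDateFormat is not thread-safe. Giving each thread its own
    // instance avoids sharing mutable parser state between threads.
    // The pattern is an assumed 14-digit timestamp format.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyyMMddHHmmss"));

    static Date parseTimestamp(String timestamp) throws ParseException {
        return FORMAT.get().parse(timestamp);
    }

    static String formatTimestamp(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws ParseException {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        // The length no longer fits in an int and turns negative.
        System.out.println(truncatedLength(threeGiB));

        // Safe to call from several threads, since no formatter is shared.
        Date parsed = parseTimestamp("20080519120249");
        System.out.println(formatTimestamp(parsed));
    }
}
```

Keeping lengths and offsets as `long` throughout avoids the first problem; a per-thread formatter (or synchronizing access to a shared one) avoids the second.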


[attachment:deduplicator-0.3.0-20061218a.diff Patch against Deduplicator 0.3.0-20061218]

[attachment:Deduplicator-0.3.0-20061218b.diff Patch against Deduplicator 0.3.0-20061218a]

[attachment:Deduplicator-0.3.0-20061218b-src.zip Patched source code Deduplicator 0.3.0-20061218b-src.zip]

[attachment:Deduplicator-0.3.0-20061218a-bin.zip Patched binary Deduplicator 0.3.0-20061218a-bin.zip]

[attachment:Deduplicator-0.3.0-20061218b-bin.zip Patched binary Deduplicator 0.3.0-20061218b-bin.zip]

Note that the deduplicator must be compiled against the same version of Heritrix that the NetarchiveSuite uses; otherwise the deduplicator will fail at runtime.

DeduplicatorPatches (last edited 2010-08-16 10:24:26 by localhost)