We use a patched version of the Deduplicator beta release 0.3.0-20061218. The patches fix the following issues in NetarchiveSuite:

Known bugs in 0.3.0-20061218b:

Downloads

[attachment:deduplicator-0.3.0-20061218a.diff Patch against Deduplicator 0.3.0-20061218]

[attachment:Deduplicator-0.3.0-20061218b.diff Patch against Deduplicator 0.3.0-20061218a]

[attachment:Deduplicator-0.3.0-20061218b-src.zip Patched source code Deduplicator 0.3.0-20061218b-src.zip]

[attachment:Deduplicator-0.3.0-20061218a-bin.zip Patched binary Deduplicator 0.3.0-20061218a-bin.zip]

[attachment:Deduplicator-0.3.0-20061218b-bin.zip Patched binary Deduplicator 0.3.0-20061218b-bin.zip]

[attachment:Deduplicator-0.3.0-20080502-src.zip Patched source code Deduplicator 0.3.0-20080502-src.zip]

[attachment:Deduplicator-0.3.0-20080502-bin.zip Patched binary Deduplicator 0.3.0-20080502-bin.zip]

[attachment:Deduplicator-0.3.0-20080502.diff Patch against Deduplicator 0.3.0-20061218b]

Note that the Deduplicator must be compiled against the same Heritrix version that NetarchiveSuite uses. Otherwise, it will fail at runtime.