We use a patched version of the 0.3.0-20061218 beta release of the deduplicator. The patches fix the following issues in the NetarchiveSuite:
Bug 1062 Indexserver skips a lot of lines due to threading problem with SimpleDateFormat
Bug 1078 DeDuplikator index too large
Bug 1248 NPE in deduplicator-0.3.0-20061218b.jar
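Bug 1062 refers to a threading problem with SimpleDateFormat, which is not safe to share between threads. This page does not say how the patch resolves it. The following is a minimal Java sketch of one common remedy: each thread gets its own formatter through a ThreadLocal. The class and method names and the 14-digit timestamp pattern are hypothetical and are not taken from the deduplicator or NetarchiveSuite code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class, not part of the deduplicator.
public class ThreadSafeDateParsing {
    // SimpleDateFormat keeps mutable state while parsing, so one instance
    // shared between threads can return wrong dates or throw.
    // A ThreadLocal gives each thread its own private instance instead.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                // Hypothetical 14-digit timestamp pattern, e.g. "20061218120000".
                SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss");
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                f.setLenient(false);
                return f;
            });

    public static Date parse(String timestamp) throws ParseException {
        return FORMAT.get().parse(timestamp);
    }

    // Parses the same timestamp from many concurrent tasks and counts the
    // results that differ from a single-threaded parse.
    public static int countMismatches(String timestamp, int tasks, int threads)
            throws Exception {
        long expected = parse(timestamp).getTime();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> parse(timestamp).getTime()));
            }
            int mismatches = 0;
            for (Future<Long> r : results) {
                if (r.get() != expected) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("mismatches: " + countMismatches("20061218120000", 1000, 8));
    }
}
```

Because no formatter is ever shared between threads, this prints `mismatches: 0`. On Java 8 or later, the immutable `java.time.format.DateTimeFormatter` is an alternative that can be shared directly.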
Downloads
Note that the deduplicator must be compiled against the same version of Heritrix as the one the NetarchiveSuite uses; otherwise the deduplicator will fail at runtime.