
This page describes how to verify that harvested data has been deduplicated.

Click on the JobID for your finished snapshot harvest in the Job status overview

Click on "Browse reports for jobs"

Click on the "processors-report" entry, e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.3&harvestid=1&jobid=1"

Check that the processors-report contains a deduplicator section like this one:

Total handled: 88 
Duplicates found: 20 20.0% 
Bytes total: 6391852 (6.1 MB) 
Bytes discarded: 0 (0 B) 0.0%
New (no hits): 88 
Exact hits: 0 
Equivalent hits: 0 
......
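
The manual check above could also be scripted once the processors-report has been saved as text. The following is only a minimal sketch: the function names `parse_dedup_report` and `has_dedup_section` are hypothetical, the sample text is a subset of the lines shown above, and treating the presence of the "Total handled" and "Duplicates found" lines as proof that the deduplicator ran is an assumption, not a documented NetarchiveSuite rule.

```python
import re

# Subset of the deduplicator lines shown above, used as sample input.
SAMPLE_REPORT = """\
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
"""


def parse_dedup_report(text):
    """Map each 'Label: <integer> ...' line to its leading integer value."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def has_dedup_section(stats):
    """Assumption: these two labels indicate that the deduplicator reported."""
    return "Total handled" in stats and "Duplicates found" in stats


stats = parse_dedup_report(SAMPLE_REPORT)
print(stats["Total handled"], stats["Duplicates found"])
print(has_dedup_section(stats))
```

Lines that do not follow the "Label: number" pattern are ignored, so other parts of a full processors-report should not disturb the parse, although labels shared with other processors could.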

It38CheckHarvestDeduplicated (last edited 2012-07-03 12:36:55 by SoerenCarlsen)