
This page describes how to verify that harvested data has been deduplicated.

Click on the JobID for your finished snapshot harvest in the Job status overview

Click on "Browse reports for jobs"

Click on the "processors-report", e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1"
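
If you want to confirm that the report you opened belongs to the job you clicked on, the query part of the metadata URI can be read with a few lines of Python. This is only a sketch: the function name report_job_info is made up for illustration, and it assumes the URI has the same shape as the example above.

```python
from urllib.parse import urlsplit, parse_qs


def report_job_info(metadata_uri):
    """Return the report file name and the ids carried in a metadata URI.

    Hypothetical helper; assumes the URI looks like the example on this page.
    """
    parts = urlsplit(metadata_uri)
    query = parse_qs(parts.query)
    return {
        # Last path segment, e.g. "processors-report.txt"
        "report": parts.path.rsplit("/", 1)[-1],
        # parse_qs returns lists; take the first value or None if absent
        "heritrixVersion": query.get("heritrixVersion", [None])[0],
        "harvestid": query.get("harvestid", [None])[0],
        "jobid": query.get("jobid", [None])[0],
    }
```

The values come back as strings, so compare "jobid" against the JobID shown in the Job status overview as text.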

Check that the processors-report contains a deduplicator section like this one:

Total handled: 88 
Duplicates found: 20 20.0% 
Bytes total: 6391852 (6.1 MB) 
Bytes discarded: 0 (0 B) 0.0%
New (no hits): 88 
Exact hits: 0 
Equivalent hits: 0 
......
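
The same check can be done on a saved copy of the report text. The following is a minimal sketch, not part of NetarchiveSuite or Heritrix: the helper names parse_dedup_counters and deduplicator_ran are hypothetical, and the code assumes the counter lines have the "Label: number" form shown in the sample above. SAMPLE_REPORT repeats a subset of those sample lines.

```python
import re

# A subset of the counter lines from the sample report on this page.
SAMPLE_REPORT = """\
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
"""


def parse_dedup_counters(report_text):
    """Map each 'Label: <integer> ...' line to its leading integer.

    Lines without a colon followed by a number are ignored.
    """
    counters = {}
    for line in report_text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def deduplicator_ran(report_text):
    """True if both deduplicator counters named on this page are present."""
    counters = parse_dedup_counters(report_text)
    return "Total handled" in counters and "Duplicates found" in counters


if __name__ == "__main__":
    print(parse_dedup_counters(SAMPLE_REPORT))
    print("deduplicator section present:", deduplicator_ran(SAMPLE_REPORT))
```

The check only tests that the counters are present, which matches the instruction above. It does not require "Duplicates found" to be greater than zero, since a first harvest may legitimately find no duplicates.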

It38CheckHarvestDeduplicated (last edited 2012-07-03 12:36:55 by SoerenCarlsen)