Differences between revisions 6 and 7
Revision 6 as of 2011-11-01 11:23:27
Revision 7 as of 2012-07-03 12:33:41

This page describes how to verify that harvested data has been deduplicated.

Click on the JobID for your finished snapshot harvest (or repeated selective harvest) in the Job status overview

Click on "Browse reports for jobs"

Click on the "processors-report" link, e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1"

Check that the processors-report contains a deduplicator section like this one:

Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
Bytes discarded: 0 (0  0.0%
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
......
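
If the report text has been saved locally, the check above could also be scripted. The following is only a minimal sketch, not part of NetarchiveSuite: the function names and the pass criterion (the section is present and at least one URI was handled) are assumptions made for illustration, and the sample text is copied from the report shown above.

```python
import re

def parse_dedup_report(text):
    """Collect 'Label: value' lines into a dict, keeping only the leading integer of each value."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats

def looks_deduplicated(stats):
    # Hypothetical criterion: the deduplicator section exists and handled at least one URI.
    return stats.get("Total handled", 0) > 0 and "Duplicates found" in stats

# Sample copied verbatim from the report above.
SAMPLE = """\
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
Bytes discarded: 0 (0  0.0%
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
"""

if __name__ == "__main__":
    stats = parse_dedup_report(SAMPLE)
    print(stats)
    print("deduplicator section found:", looks_deduplicated(stats))
```

A report without these lines yields an empty dict, so looks_deduplicated returns False, which corresponds to the manual check failing.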

It38CheckHarvestDeduplicated (last edited 2012-07-03 12:36:55 by SoerenCarlsen)