Differences between revisions 2 and 3
Revision 2 as of 2009-10-28 12:03:30
Size: 923
Editor: TueLarsen
Comment:
Revision 3 as of 2009-10-28 12:03:48
Size: 928
Editor: TueLarsen
Comment:

This page describes how to verify that harvested data has been deduplicated.

Click on the JobID for your finished snapshot harvest in the Job status overview

Click on "Browse reports for jobs"

Click on the "processors-report" link, e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.3&harvestid=1&jobid=1"

Check that the processors-report contains a deduplicator section like this one:

Total handled: 88 
Duplicates found: 20 20.0% 
Bytes total: 6391852 (6.1 MB) 
Bytes discarded: 0 (0 B) 0.0%
New (no hits): 88 
Exact hits: 0 
Equivalent hits: 0 
......
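
If many jobs have to be checked, the manual inspection above could be scripted. The following is only a minimal sketch, assuming the text of the processors-report has already been saved or copied into a string. The function names parse_dedup_stats and has_dedup_section are hypothetical helpers, not part of NetarchiveSuite or Heritrix, and the sample text simply reuses counters from the report shown above.

```python
import re


def parse_dedup_stats(report_text):
    """Collect the integer counters from "Label: <number> ..." lines."""
    stats = {}
    for line in report_text.splitlines():
        # Keep only the first integer after the label; trailing text such as
        # percentages or "(6.1 MB)" is ignored.
        match = re.match(r"\s*([^:]+):\s+(\d+)", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def has_dedup_section(stats):
    """Assumed check: the deduplicator counters are present in the report."""
    return "Total handled" in stats and "Duplicates found" in stats


# Sample input copied from the report excerpt on this page.
sample = """\
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
"""

stats = parse_dedup_stats(sample)
print(has_dedup_section(stats), stats["Duplicates found"])
```

The sketch only confirms that the deduplicator counters are present and makes them available for inspection. Whether a given number of duplicates is plausible for a harvest still has to be judged by a person.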

It38CheckHarvestDeduplicated (last edited 2012-07-03 12:36:55 by SoerenCarlsen)