'''This page describes how to verify that data is deduplicated.'''

 * Go to http://$GUIadminserver:$http-port/HarvestDefinition/ where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication. In the one-machine setup (deploy_example_one_machine.xml) the link will be: http://localhost:8074
 * Click on the JobID for your finished snapshot harvest (or repeated selective harvest) in the Job status overview.
 * Click on "Browse reports for jobs".
 * Click on the "processors-report", e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=1"
 * Check that the processors-report contains a deduplicator section similar to the one below. The numbers will be different, but "Duplicates found" should be non-zero:

{{{
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
Bytes discarded: 0 (0 0.0%
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
......
}}}
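
If you save the text of the processors-report to a file or string, the final check can also be scripted. The following is only a minimal sketch, not part of NetarchiveSuite: the helper names `read_counter` and `is_deduplicated` are hypothetical, and it assumes the report uses the "Label: number" layout shown in the example on this page.

```python
import re

# Sample text copied from the report excerpt on this page (assumed layout).
SAMPLE_REPORT = """\
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
"""


def read_counter(report_text, label):
    """Return the first integer that follows 'label:' or None if absent.

    Hypothetical helper; it relies only on the 'Label: number' layout.
    """
    match = re.search(re.escape(label) + r":\s*(\d+)", report_text)
    return int(match.group(1)) if match else None


def is_deduplicated(report_text):
    """True when the report has a non-zero 'Duplicates found' counter."""
    duplicates = read_counter(report_text, "Duplicates found")
    return duplicates is not None and duplicates > 0


if __name__ == "__main__":
    print("Duplicates found:", read_counter(SAMPLE_REPORT, "Duplicates found"))
    print("Deduplicated:", is_deduplicated(SAMPLE_REPORT))
```

A report without a deduplicator section, or with a count of zero, makes `is_deduplicated` return False, which corresponds to the failure case of the manual check.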