Revision 1 as of 2009-10-28 10:18:54
This page describes how to verify that data is deduplicated.
Go to http://$GUIadminserver:$http-port/HarvestDefinition/
- where GUIadminserver and http-port are the values specified in the deploy configuration file for the application named dk.netarkivet.common.webinterface.GUIApplication
In the one-machine setup (deploy_example_one_machine.xml), the link will be: http://localhost:8074
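As a minimal sketch, the address can be assembled from the two deploy settings as follows. The function name `harvest_definition_url` is hypothetical and is not part of NetarchiveSuite; the host and port values are the ones from the one-machine example.

```python
def harvest_definition_url(gui_admin_server: str, http_port: int) -> str:
    """Build the HarvestDefinition URL from the two deploy settings.

    Hypothetical helper for illustration only; the host and port must be
    read from your own deploy configuration file.
    """
    return f"http://{gui_admin_server}:{http_port}/HarvestDefinition/"


# Values for the one-machine example setup.
url = harvest_definition_url("localhost", 8074)
print(url)
```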
Click on the JobID for your finished snapshot harvest in the Job status overview
Click on "Browse reports for jobs"
Click on the "processors-report" link, e.g. "metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.3&harvestid=1&jobid=1"
Check that the processors-report contains a deduplicator section like this one:
Total handled: 88
Duplicates found: 20 20.0%
Bytes total: 6391852 (6.1 MB)
Bytes discarded: 0 (0 0.0%
New (no hits): 88
Exact hits: 0
Equivalent hits: 0
......
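If you save the report text to a file or copy it into a script, the check can be automated. The sketch below assumes the counter labels appear exactly as in the excerpt above; the function names `parse_dedup_report` and `has_dedup_section` are hypothetical, and the sample string is simply the excerpt from this page.

```python
import re

# Counter labels as they appear in the report excerpt on this page.
FIELDS = [
    "Total handled",
    "Duplicates found",
    "Bytes total",
    "Bytes discarded",
    "New (no hits)",
    "Exact hits",
    "Equivalent hits",
]


def parse_dedup_report(text: str) -> dict:
    """Return the first integer following each known label, if present."""
    counters = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r":\s*(\d+)", text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def has_dedup_section(text: str) -> bool:
    """Treat the report as containing deduplicator output if both
    the handled and duplicate counters are present."""
    counters = parse_dedup_report(text)
    return "Total handled" in counters and "Duplicates found" in counters


# Sample text copied from the excerpt above.
sample = (
    "Total handled: 88 Duplicates found: 20 20.0% "
    "Bytes total: 6391852 (6.1 MB) Bytes discarded: 0 (0 0.0% "
    "New (no hits): 88 Exact hits: 0 Equivalent hits: 0"
)
counters = parse_dedup_report(sample)
print(has_dedup_section(sample), counters["Duplicates found"])
```

The parser searches for each label independently, so it works whether the counters are on one line or on separate lines.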