Verify that the harvest is activated and done
This page describes how to verify that a harvest is carried out correctly.
Do the following in a browser:
Start Program
Go to http://$GUIadminserver:$http-port/HarvestDefinition/
- where $GUIadminserver and $http-port are the machine and HTTP port specified in the deploy configuration file for the application dk.netarkivet.common.webinterface.GUIApplication
In the one-machine setup (deploy_example_one_machine.xml), the link is: http://localhost:8074
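If you script any part of this check, the start URL can be assembled from the two deploy-configuration values. The Python sketch below only illustrates that substitution; the function name `harvest_definition_url` is hypothetical, and you still have to read the host and port for dk.netarkivet.common.webinterface.GUIApplication from your own deploy configuration file.

```python
def harvest_definition_url(gui_admin_server: str, http_port: int) -> str:
    """Build the HarvestDefinition start page URL (hypothetical helper).

    gui_admin_server and http_port are the values configured for
    dk.netarkivet.common.webinterface.GUIApplication in the deploy configuration file.
    """
    if not 0 < http_port < 65536:
        raise ValueError(f"invalid HTTP port: {http_port}")
    return f"http://{gui_admin_server}:{http_port}/HarvestDefinition/"


print(harvest_definition_url("localhost", 8074))  # → http://localhost:8074/HarvestDefinition/
```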
Verify job status
Click 'Harvest status'->'All Jobs' in the left menu
- Select "All" in "Only display job status" to the right of the menu
- Click the "Show" button repeatedly until the jobs have stepped through the statuses "NEW", "SUBMITTED", "STARTED" and "DONE"
- Wait until all jobs have the status "DONE"
- Check that you can search by Harvest name, start date and end date
- Check that you can change the number of rows displayed per page, e.g. to 1
- Check that you can use the next and previous page buttons
- Check that the reset button resets all changes to their defaults (note that the display value, i.e. rows per page, is also blanked, although it is 100 by default)
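The status progression above can be expressed as a small check over the statuses you note down each time you click "Show". This is a hedged Python sketch: the names `progresses_in_order` and `all_jobs_done` are hypothetical, and it assumes that, because the page is only sampled when you click, a job may appear to skip a status but should never move backwards. Any status other than the four listed is treated as a failure.

```python
# Statuses a job is expected to pass through on the 'All Jobs' page, in order.
STATUS_ORDER = ["NEW", "SUBMITTED", "STARTED", "DONE"]


def progresses_in_order(observed: list[str]) -> bool:
    """Return True if the statuses observed for one job (one entry per click on
    "Show") never move backwards in STATUS_ORDER and end in "DONE".

    Polling may miss a status, so skipped steps are accepted; any status not in
    STATUS_ORDER is treated as a failure.
    """
    if not observed or observed[-1] != "DONE":
        return False
    if any(status not in STATUS_ORDER for status in observed):
        return False
    ranks = [STATUS_ORDER.index(status) for status in observed]
    return all(earlier <= later for earlier, later in zip(ranks, ranks[1:]))


def all_jobs_done(latest: dict[int, str]) -> bool:
    """latest maps job ID -> the status most recently shown for that job."""
    return bool(latest) and all(status == "DONE" for status in latest.values())


print(progresses_in_order(["NEW", "SUBMITTED", "STARTED", "STARTED", "DONE"]))  # → True
print(progresses_in_order(["NEW", "STARTED", "SUBMITTED", "DONE"]))  # → False
print(all_jobs_done({1: "DONE", 2: "STARTED"}))  # → False
```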
Check the following for the domains raeder.dk and kb.dk (using the page Harvest Status -> All jobs per domain):
Check that the domain has been harvested by one job of the harvest <eh. name>
Check that this job has the configuration <eh. name>_frontpages
- Check that the 'Run number' and 'Job ID' columns contain numbers
Check that the 'Start time' and 'End time' columns approximately correspond to the time the <eh. name> harvest was run in the test
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
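To keep the per-row checks consistent between domains, you could transcribe each row of the 'All jobs per domain' table and run it through a function like the one below. The `DomainJobRow` type, `row_problems`, the 10-minute slack used for "approximately" and the sample values are all assumptions made for illustration. The run number is allowed to be 0 because the job details check for netarkivet.dk refers to "Run Number 0".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DomainJobRow:
    """One row transcribed from Harvest Status -> All jobs per domain (hypothetical type)."""
    harvest_name: str
    configuration: str
    run_number: int
    job_id: int
    start_time: datetime
    end_time: datetime
    bytes_harvested: int
    documents_harvested: int
    stopped_due_to: str


def row_problems(row: DomainJobRow, eh_name: str, expected_config: str,
                 test_start: datetime, test_end: datetime,
                 slack: timedelta = timedelta(minutes=10)) -> list[str]:
    """Return the checks this row fails; an empty list means it passes.

    `slack` is an assumed tolerance for "approximately corresponds to the time".
    """
    problems = []
    if row.harvest_name != eh_name:
        problems.append(f"harvest name {row.harvest_name!r}, expected {eh_name!r}")
    if row.configuration != expected_config:
        problems.append(f"configuration {row.configuration!r}, expected {expected_config!r}")
    # Run numbers may be 0 (the first run); job IDs are assumed to start at 1.
    if row.run_number < 0 or row.job_id < 1:
        problems.append("run number or job ID missing")
    if not (test_start - slack <= row.start_time <= row.end_time <= test_end + slack):
        problems.append("start/end time outside the test window")
    if row.bytes_harvested <= 0 or row.documents_harvested <= 0:
        problems.append("bytes or documents harvested not positive")
    if row.stopped_due_to != "Domain Completed":
        problems.append(f"stopped due to {row.stopped_due_to!r}")
    return problems


# Made-up example values for a test run that started at 12:00.
t0 = datetime(2024, 1, 1, 12, 0)
row = DomainJobRow("myharvest", "myharvest_frontpages", 0, 1,
                   t0 + timedelta(minutes=1), t0 + timedelta(minutes=5),
                   12345, 42, "Domain Completed")
print(row_problems(row, "myharvest", "myharvest_frontpages", t0, t0 + timedelta(minutes=30)))  # → []
```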
Check the following job details for the domain netarkivet.dk (using the page SelectiveHarvests -> History -> Run Number 0 -> JobID 1):
Check that the 'Submit time', 'Start time' and 'End time' columns approximately correspond to the time the <eh. name> harvest was run in the test
- Click on "Browse reports for jobs"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse harvest files for job"
- Check that you don't get any errors when you click on some of the links
- Click on "Browse only relevant crawl-log lines for domain netarkivet.dk"
- Check that you don't get any errors when you click on some of the links
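Clicking through the links can be partly automated. The Python sketch below uses only the standard library to collect the links on one report page and request each of them, returning those that fail at the HTTP or connection level. It assumes the GUI pages can be fetched without logging in and that the page URL is whatever your browser shows for the report; an error message rendered inside an otherwise normal page will not be caught, so the pages still need a visual check. The names `extract_links` and `broken_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> list[str]:
    """Absolute http(s) links on the page, in page order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    links: list[str] = []
    for href in parser.hrefs:
        url = urljoin(page_url, href)  # resolve relative links against the page
        if url.startswith(("http://", "https://")) and url not in links:
            links.append(url)
    return links


def broken_links(page_url: str, limit: int = 20, timeout: float = 30.0) -> list[tuple[str, str]]:
    """Request up to `limit` links from page_url; return (url, error) for each failure.

    urlopen raises HTTPError for 4xx/5xx responses; it and connection errors
    are both subclasses of OSError.
    """
    with urlopen(page_url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    failures = []
    for url in extract_links(page_url, html)[:limit]:
        try:
            # Opening is enough to get the status; bodies (e.g. harvest files) are not downloaded.
            with urlopen(url, timeout=timeout):
                pass
        except OSError as exc:
            failures.append((url, str(exc)))
    return failures
```

Usage: `broken_links("<URL of the report page as shown in your browser>")` returns an empty list when every checked link answered without an HTTP error.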
Check the following for the domain netarkivet.dk (using the page Harvest Status -> All jobs per domain):
Check that the domain has been harvested by 2 jobs of the harvest <eh. name>
Check that one of the jobs has the configuration <eh. name>_frontpages
Check that its 'Start time' and 'End time' columns approximately correspond to the time the <eh. name> harvest was run in the test
Check that the other job has the configuration <eh. name>_frontpages_plus_2levels
Check that its 'Start time' and 'End time' columns approximately correspond to the time the <eh. name> harvest was run in the test
- Check that the 'Run number' column contains a number (0 for the first run) and the 'Job ID' column a positive number
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
Check the following for the domain kaarefc.dk (using the page Harvest Status -> All jobs per domain):
Check that the domain has been harvested by 1 job of the harvest <eh. name>
Check that the job has the configuration <eh. name>_frontpages_plus_2levels
Check that the 'Start time' and 'End time' columns approximately correspond to the time the <eh. name> harvest was run in the test
- Check that the 'Run number' column contains a number (0 for the first run) and the 'Job ID' column a positive number
- Check that the 'Bytes Harvested' and 'Documents Harvested' columns contain positive numbers
- Check that the 'Stopped due to' column contains "Domain Completed"
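Taken together, the per-domain checks amount to an expected set of <eh. name> job configurations for each domain, with netarkivet.dk expected to appear in two jobs. The sketch below encodes that expectation with `collections.Counter`, so the number of jobs and their configurations are compared at once. `expected_configs` and `config_mismatches` are hypothetical names, and the observed configurations are assumed to be transcribed by hand from the 'All jobs per domain' page.

```python
from collections import Counter


def expected_configs(eh_name: str) -> dict[str, Counter]:
    """Expected configurations of the <eh. name> jobs for each checked domain."""
    frontpages = f"{eh_name}_frontpages"
    plus_2levels = f"{eh_name}_frontpages_plus_2levels"
    return {
        "raeder.dk": Counter([frontpages]),
        "kb.dk": Counter([frontpages]),
        "netarkivet.dk": Counter([frontpages, plus_2levels]),
        "kaarefc.dk": Counter([plus_2levels]),
    }


def config_mismatches(eh_name: str,
                      observed: dict[str, list[str]]) -> dict[str, tuple[list[str], list[str]]]:
    """observed maps domain -> configurations of the <eh. name> jobs listed for it.

    Returns domain -> (expected, observed) for every domain whose jobs differ.
    """
    mismatches = {}
    for domain, expected in expected_configs(eh_name).items():
        got = Counter(observed.get(domain, []))
        if got != expected:  # differs in configuration or in number of jobs
            mismatches[domain] = (sorted(expected.elements()), sorted(got.elements()))
    return mismatches


observed = {
    "raeder.dk": ["myharvest_frontpages"],
    "kb.dk": ["myharvest_frontpages"],
    "netarkivet.dk": ["myharvest_frontpages_plus_2levels", "myharvest_frontpages"],
    "kaarefc.dk": [],
}
print(config_mismatches("myharvest", observed))
# → {'kaarefc.dk': (['myharvest_frontpages_plus_2levels'], [])}
```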