Check domain statistics and domain configuration
This page describes how to check the domain statistics and the naming of harvest configurations.
Do the following in a browser.
Start Program
Go to http://$GUIadminserver:$http-port/HarvestDefinition/
- where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication
In the one-machine setup (deploy_example_one_machine.xml) the link will be: http://localhost:8074
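As a small illustration of how the start URL is put together from the two deploy settings, here is a minimal sketch. The function name `harvest_definition_url` is hypothetical, and the host and port values are those of the one-machine example; substitute the values from your own deploy configuration file.

```python
def harvest_definition_url(host: str, port: int) -> str:
    """Build the HarvestDefinition start URL from the GUI host and HTTP port."""
    # host and port correspond to GUIadminserver and http-port in the deploy file
    return f"http://{host}:{port}/HarvestDefinition/"


# Values assumed from the one-machine example setup
print(harvest_definition_url("localhost", 8074))
```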
Check domain statistics
Note: If you are a non-netarchive tester, you need to add your own test domains first using the HarvestDefinition/Definitions-create-domain.jsp page.
Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)
Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter
- Check that the title bar contains search criteria "*" and the number of hits.
Check single domain creation
Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. olsen2.dk, and press enter
- Check that you get the answer:
No matching domains found for query 'olsen2.dk' when searching by 'NAME'
- Go to HarvestDefinition/Definitions-create-domain.jsp, type "olsen2.dk", and press "create".
Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)
Check harvest configuration
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk
- Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
- Click 'New configuration'
- Enter a configuration 'Name' of your own choice
- You may change the harvest template; the default is the first template in the list of available templates, which is sorted in alphanumerical order (currently 3levels_orderxml).
- Enter a "maximum number of bytes" of your own choice
- Click 'Save configuration'
- Check the configuration list now includes the new configuration name
Check crawler traps
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk
- Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
- Click 'Show crawler traps'
- Paste the following list of known crawler traps into the input box
- For now, remember to first update the list of crawler trap examples from our production system.
The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain
- Click 'Save'
- Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
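Before pasting a long list, it can help to pre-check each line for obviously malformed expressions. The following is only a rough sketch: it uses Python's `re` module, which is not necessarily the same regular expression dialect as the one the application uses, so a pattern that passes here may still be rejected on 'Save' (and vice versa). The function name `find_invalid_patterns` and the `example.dk` pattern are hypothetical; the second sample is the malformed expression quoted in the error message above.

```python
import re


def find_invalid_patterns(lines):
    """Return (pattern, error message) pairs for lines that fail to compile."""
    invalid = []
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue  # skip blank lines
        try:
            re.compile(pattern)
        except re.error as exc:
            invalid.append((pattern, str(exc)))
    return invalid


sample = [
    r".*www\.example\.dk\/calendarix.*",     # hypothetical well-formed trap
    r".*www.\jettebrian\.dk\/calendarix.*",  # malformed: stray backslash before 'j'
]

for pattern, message in find_invalid_patterns(sample):
    print(f"invalid: {pattern} ({message})")
```

To check the real list, read the crawler trap file line by line and pass the lines to the function instead of `sample`.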