---+ Check domain statistics and domain configuration
This page describes how to check domain statistics, single domain creation, harvest configurations and crawler traps.
---++ Do the following in a browser
Start program
Go to =http://kb-test-adm-001.kb.dk:807?/HarvestDefinition/= (where '807?' is the port number)
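The start URL can be built from the port number of the installation under test. The following is a minimal sketch: the function name =harvest_definition_url= is hypothetical, and it assumes that '807?' stands for a port in the range 8070 to 8079. The actual last digit is not given on this page and must be supplied by the tester.

```python
def harvest_definition_url(port: int, host: str = "kb-test-adm-001.kb.dk") -> str:
    """Build the HarvestDefinition start URL for a given port.

    Assumption: the pattern '807?' means a port between 8070 and 8079.
    """
    if not 8070 <= port <= 8079:
        raise ValueError(f"port {port} does not match the pattern 807?")
    return f"http://{host}:{port}/HarvestDefinition/"
```

Call it with the port assigned to your own test installation and open the returned URL in the browser.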
Check domain statistics
Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)
Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter
- Check that the title bar contains search criteria and number of hits.
Check single domain creation
Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain e.g. <verbatim>olsen.dk</verbatim> and press enter
- Check that you get the answer:
- "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
- Press "Yes"
Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)
Check harvest configuration
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
- Click 'New configuration'
- Enter a configuration 'Name' of your own choice
- You may change the harvest template; the default is the first template in the alphanumerically sorted list of available templates (currently 3levels_orderxml).
- Enter a "maximum number of bytes" of your own choice
- Click 'Save configuration'
- Check the configuration list now includes the new configuration name
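To predict which template should be preselected, the default rule described above can be sketched as follows. This assumes that "alphanumerical order" means plain character-by-character string ordering, which may differ from what the application actually does. The function name and all template names except 3levels_orderxml are hypothetical.

```python
def default_template(template_names):
    """Return the first name in plain string order.

    Assumption: 'alphanumerical order' is ordinary string sorting,
    where digits sort before letters.
    """
    if not template_names:
        raise ValueError("no templates available")
    return sorted(template_names)[0]


# Hypothetical list; only 3levels_orderxml is named on this page.
available = ["default_orderxml", "3levels_orderxml", "full_orderxml"]
print(default_template(available))  # prints 3levels_orderxml
```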
Check Crawler traps
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
- Click 'Show crawler traps'
- Paste the following list of known crawler traps into the input box
- Currently it is only possible to ingest the first 24 entries, according to bug 1060. Remember to update the list of crawler trap examples from our production system.
The file is now located in CVS at [[http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain][kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt]]
- Click 'Save'
- Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
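Before pasting, the crawler trap list can be pre-checked locally for expressions that do not compile. The sketch below uses Python's =re= module as a stand-in; the application's own regular expression engine may accept or reject slightly different syntax, so this is only a rough filter. It assumes one expression per line. The function name and the second example pattern are hypothetical.

```python
import re


def check_crawler_traps(text):
    """Split pasted text into one expression per line and try to compile each.

    Returns (valid, invalid); invalid holds (pattern, error message) pairs.
    Assumption: Python's re syntax is close enough to the application's.
    """
    valid, invalid = [], []
    for line in text.splitlines():
        pattern = line.strip()
        if not pattern:
            continue  # skip blank lines
        try:
            re.compile(pattern)
        except re.error as exc:
            invalid.append((pattern, str(exc)))
        else:
            valid.append(pattern)
    return valid, invalid


# First line: the invalid expression quoted in the error message above.
# Second line: a hypothetical well-formed expression.
traps = r""".*www.\jettebrian\.dk\/calendarix.*
.*www\.example\.dk\/calendar.*"""

valid, invalid = check_crawler_traps(traps)
for pattern, message in invalid:
    print(f"invalid: {pattern} ({message})")
```

The first expression is rejected because =\j= is not a recognised escape sequence; the intended expression probably escapes the dot after =www= instead.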