Revision 1 as of 2009-09-22 13:34:11
---+ Check domain statistics and domain configuration
This page describes how to check domain statistics, single domain creation, harvest configurations and crawler traps in the GUI.
---++ Do the following in a browser
Start program
Go to http://$GUIadminserver:$http-port/HarvestDefinition/
- where GUIadminserver and http-port are the values specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication
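As a small illustration of how the address above is put together, the following Python sketch builds the URL from the two deploy-configuration values. The function name, the host `gui.example.org` and the port `8080` are placeholders invented for this example; they are not values from the deploy configuration file.

```python
def harvest_definition_url(gui_admin_server: str, http_port: int) -> str:
    """Build the HarvestDefinition URL from the GUIadminserver and http-port values."""
    if not gui_admin_server:
        raise ValueError("GUIadminserver must not be empty")
    if not 0 < http_port < 65536:
        raise ValueError(f"http-port out of range: {http_port}")
    return f"http://{gui_admin_server}:{http_port}/HarvestDefinition/"


if __name__ == "__main__":
    # Placeholder values; read the real ones from the deploy configuration file.
    print(harvest_definition_url("gui.example.org", 8080))
```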
Check domain statistics
Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)
Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter
- Check that the title bar contains search criteria and number of hits.
Check single domain creation
Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain e.g. <verbatim>olsen.dk</verbatim> and press enter
- Check that you get the answer:
- "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
- Press "Yes"
Choose 'Definitions' -> 'Domain Statistics' and check that the number of domains has increased by one (now 18)
Check harvest configuration
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
- Click 'New configuration'
- Enter a configuration 'Name' of your own choice
- You may change the harvest template; the default is the first template in the list of available templates, sorted in alphanumerical order (currently 3levels_orderxml).
- Enter a "maximum number of bytes" of your own choice
- Click 'Save configuration'
- Check the configuration list now includes the new configuration name
Check Crawler traps
Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
- Click 'Show crawler traps'
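- Paste the following list of known crawler traps into the input box
- Currently it is only possible to ingest the first 24 entries (see bug 1060). Remember to update the list of crawler trap examples from our production system. The file is now located in CVS at [[http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain][kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt]]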
- Click 'Save'
- Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
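Before pasting, the crawler trap list can be given a rough pre-check with a short script. The sketch below is only an assumption-laden aid: it uses Python's `re` module, which is not necessarily the same regular expression dialect as the one used by the application, so a pattern that passes here may still be rejected in the GUI (and vice versa). The function name `check_crawler_traps` and the sample pattern with `example.dk` are hypothetical; the 24-entry limit is the one mentioned above for bug 1060.

```python
import re

MAX_ENTRIES = 24  # limit mentioned in the text (bug 1060)


def check_crawler_traps(text, max_entries=MAX_ENTRIES):
    """Split pasted text into patterns and sort them into three groups.

    Returns (accepted, invalid, overflow):
      accepted - patterns among the first max_entries that compile
      invalid  - (pattern, error message) pairs that do not compile
      overflow - patterns beyond the first max_entries
    """
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    accepted, invalid = [], []
    for entry in entries[:max_entries]:
        try:
            re.compile(entry)
        except re.error as exc:
            invalid.append((entry, str(exc)))
        else:
            accepted.append(entry)
    return accepted, invalid, entries[max_entries:]


if __name__ == "__main__":
    # First line is a made-up pattern; second is the one from the error message above.
    sample = "\n".join([
        r".*www\.example\.dk\/calendar.*",
        r".*www.\jettebrian\.dk\/calendarix.*",
    ])
    accepted, invalid, overflow = check_crawler_traps(sample)
    print("accepted:", accepted)
    print("invalid:", invalid)
    print("overflow:", overflow)
```

Patterns reported as invalid or as overflow are the ones to review before running the 'Save' step.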