Differences between revisions 1 and 2
Revision 1 as of 2009-09-22 13:34:11
Size: 2654
Editor: TueLarsen
Comment:
Revision 2 as of 2009-10-01 14:52:57
Size: 2871
Editor: TueLarsen
Comment:
Line 5: Line 5:
---++ Do following in a browser
Start program
   * Go to =http://kb-test-adm-001.kb.dk:807?/HarvestDefinition/= (where '807?' is the port number)
---++ Do following in a browser Start program

 * Go to http://$GUIadminserver:$http-port/HarvestDefinition/
  where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication
Line 9: Line 11:
   * Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)
   * Choose  'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter
   * Check that the title bar contains search criteria and number of hits.

* Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)
 * Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter
 * Check that the title bar contains search criteria and number of hits.
Line 13: Line 16:
   * Choose  'Definitions' -> ’Find Domain(s)', type a nonexistent domain e.g. <verbatim>olsen.dk</verbatim> and pres enter
   * Check that you get the answer:
     "The domain 'olsen.dk' does not exist in the database. Create it? Yes"     * Press "Yes"     * Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)

* Choose 'Definitions' -> Find Domain(s)', type a nonexistent domain e.g. <verbatim>olsen.dk</verbatim> and pres enter
 * Check that you get the answer:
  . "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
* Press "Yes"
* Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)
Line 19: Line 23:
   * Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim> netarkivet.dk </verbatim>
   * Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
   * Click 'New configuration'
   * Enter a configuration 'Name' of your own choice
   * You may change the harvesttemplate; the default template is the first template in the list of available templates sorted in alphanumerical order (currently 3levels_orderxml).
   * Enter a "maximum number of bytes" of your own choice
   * Click 'Save configuration'
   * Check the configuration list now includes the new configuration name
Line 28: Line 24:
 * Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim> netarkivet.dk </verbatim>
 * Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
 * Click 'New configuration'
 * Enter a configuration 'Name' of your own choice
 * You may change the harvesttemplate; the default template is the first template in the list of available templates sorted in alphanumerical order (currently 3levels_orderxml).
 * Enter a "maximum number of bytes" of your own choice
 * Click 'Save configuration'
 * Check the configuration list now includes the new configuration name
Line 29: Line 33:
   * Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
   * Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
   * Click 'Show crawler traps'
   * Paste into the input-box following list of known crawler traps
     Currently is it only possible to ingest the first 24 entries - according to bug 1060
     
     !Remember currently to update the list of crawler trap examples from our production system.
The file is now located in CVS at [[http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain][kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt]]
Line 38: Line 34:
   * Click 'Save'
   * Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
   
 * Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>
 * Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain
 * Click 'Show crawler traps'
 * Paste into the input-box following list of known crawler traps
  . Currently is it only possible to ingest the first 24 entries - according to bug 1060 !Remember currently to update the list of crawler trap examples from our production system.
The file is now located in CVS at [[http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain][kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt]]

 * Click 'Save'
 * Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."

---+ Check domain statistics and domain configuration

This page describes how to check naming of harvest configurations.

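---++ Do the following in a browser

Start program

  • Go to http://$GUIadminserver:$http-port/HarvestDefinition/ (GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication)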

Check domain statistics

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar contains search criteria and number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. <verbatim>olsen.dk</verbatim>, and press Enter

  • Check that you get the answer:
    • "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain, e.g. <verbatim>netarkivet.dk</verbatim>

  • Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain

  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default is the first template in the list of available templates, sorted in alphanumerical order (currently 3levels_orderxml).
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check the configuration list now includes the new configuration name

Check Crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>

  • Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain

  • Click 'Show crawler traps'
  • Paste the following list of known crawler traps into the input box
    • Currently it is only possible to ingest the first 24 entries, according to bug 1060.
    • Remember to update the list of crawler trap examples from our production system.

The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain (kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt)

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
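
Before pasting, the crawler trap list can be pre-checked locally. The following is only a minimal sketch, not part of the test procedure: the names =check_crawler_traps= and =MAX_ENTRIES= are hypothetical, the limit of 24 is taken from the note about bug 1060, and the check uses Python's =re= module, which may accept or reject some patterns differently from the regular-expression engine used by the application itself. It flags entries that fail to compile (such as the example with a stray backslash before 'j') and reports how many entries exceed the limit.

```python
import re

# Assumed limit, taken from the note about bug 1060 in the procedure above.
MAX_ENTRIES = 24


def check_crawler_traps(lines, max_entries=MAX_ENTRIES):
    """Split crawler trap entries into valid and invalid regular expressions.

    Returns (valid, invalid, overflow), where invalid is a list of
    (pattern, error message) pairs and overflow is the number of
    non-empty entries beyond max_entries.
    """
    entries = [line.strip() for line in lines if line.strip()]
    valid, invalid = [], []
    for pattern in entries:
        try:
            re.compile(pattern)
            valid.append(pattern)
        except re.error as exc:
            invalid.append((pattern, str(exc)))
    overflow = max(0, len(entries) - max_entries)
    return valid, invalid, overflow


# Illustrative input: one well-formed entry and the malformed one
# quoted in the error message above.
sample = [
    r".*www\.jettebrian\.dk\/calendarix.*",
    r".*www.\jettebrian\.dk\/calendarix.*",
]
valid, invalid, overflow = check_crawler_traps(sample)
for pattern, message in invalid:
    print(f"invalid: {pattern} ({message})")
if overflow:
    print(f"{overflow} entries exceed the limit of {MAX_ENTRIES}")
```

To check the real list, the lines of a local copy of crawlertrapsCollection.txt could be passed in instead of =sample=. An entry that passes this check can still be rejected by the application, so the final check in the browser remains necessary.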

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)