Check domain statistics and domain configuration

This page describes how to check domain statistics, domain creation, harvest configurations and crawler traps.

Do the following in a browser. Start the program by opening the GUI at host GUIadminserver on port http-port, where GUIadminserver and http-port are specified in the deploy configuration file under the application named dk.netarkivet.common.webinterface.GUIApplication. In the one-machine setup (deploy_example_one_machine.xml) the link is http://localhost:8074.
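The GUI address is built from the GUIadminserver host and the http-port configured for dk.netarkivet.common.webinterface.GUIApplication in the deploy configuration file. In the one-machine setup this gives http://localhost:8074. A minimal sketch of that assembly follows. The helper name gui_url is hypothetical, the two values are passed in by hand rather than parsed from the XML, and the sketch assumes the GUI is served at the root of that host and port, as in the one-machine link.

```python
def gui_url(gui_admin_server: str, http_port: int) -> str:
    """Build the GUI start address from the deploy configuration values.

    Hypothetical helper: gui_admin_server and http_port are the values set
    for the GUIApplication in the deploy configuration file, typed in by
    hand here (the XML is not parsed).
    """
    if not 0 < http_port < 65536:
        raise ValueError(f"invalid TCP port: {http_port}")
    return f"http://{gui_admin_server}:{http_port}"


# One-machine setup (deploy_example_one_machine.xml)
print(gui_url("localhost", 8074))  # http://localhost:8074
```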

Check domain statistics

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar shows the search criteria and the number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. olsen.dk, and press enter

  • Check that you get the answer:
    • "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)
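The domain-creation check compares two readings of 'Domain Statistics': after creating exactly one new domain, the count must be one higher than before (17 before, 18 after in the current data). A minimal sketch of that comparison, together with the expected confirmation text quoted above, is shown below. Both function names are hypothetical; the counts are read off the GUI by hand.

```python
def check_domain_created(count_before: int, count_after: int) -> None:
    """Fail unless creating one domain raised the domain count by exactly one."""
    expected = count_before + 1
    if count_after != expected:
        raise AssertionError(
            f"expected {expected} domains after creation, got {count_after}"
        )


def expected_create_prompt(domain: str) -> str:
    """Confirmation text the GUI shows when searching for an unknown domain."""
    return f"The domain '{domain}' does not exist in the database. Create it? Yes"


check_domain_created(17, 18)  # passes silently
print(expected_create_prompt("olsen.dk"))
```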

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)', type an existing domain, e.g. netarkivet.dk, and press enter

  • Check that the 'Edit Domain' screen opens with the data for the netarkivet.dk domain

  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default is the first of the available templates in alphanumerical order (currently 3levels_orderxml).
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check that the configuration list now includes the new configuration name
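The default harvest template is the first available template in alphanumerical order. A minimal sketch of that rule follows, assuming plain code-point ordering of the template names; whether the GUI sorts the same way (for example case-insensitively) is an assumption. The function name and every template name except 3levels_orderxml are hypothetical.

```python
def default_template(templates: list[str]) -> str:
    """Return the first template name in sorted order.

    Python compares strings by Unicode code point, so digits sort before
    letters and uppercase before lowercase. The GUI's exact ordering is
    assumed, not confirmed.
    """
    if not templates:
        raise ValueError("no harvest templates available")
    return sorted(templates)[0]


# Hypothetical list; only 3levels_orderxml is named on this page.
available = ["default_orderxml", "frontpages", "3levels_orderxml"]
print(default_template(available))  # 3levels_orderxml
```

Because '3' sorts before any letter, a template whose name starts with a digit ends up as the default, which is consistent with 3levels_orderxml being the current default.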

Check crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)', type an existing domain, e.g. netarkivet.dk, and press enter

  • Check that the 'Edit Domain' screen opens with the data for the netarkivet.dk domain

  • Click 'Show crawler traps'
  • Paste the list of known crawler traps into the input box
    • Currently it is only possible to ingest the first 24 entries (see bug 1060). Remember to update the list of crawler trap examples from our production system.
    • The list is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain (kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt)

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
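Before pasting, the crawler trap list can be pre-checked offline. The sketch below uses a hypothetical helper, prepare_crawler_traps, and assumes the list holds one regular expression per line. It keeps only the first 24 non-blank entries, matching the current limit from bug 1060, and compiles each one with Python's re module to flag expressions such as the '\j' example quoted in the error message above. NetarchiveSuite is a Java application, so the GUI's own regex check may differ from Python's in edge cases; treat a clean result as a hint, not a guarantee.

```python
import re

# Current ingest limit for crawler traps (bug 1060).
MAX_TRAPS = 24


def prepare_crawler_traps(text: str, limit: int = MAX_TRAPS):
    """Split a crawler trap list into (accepted, invalid) regular expressions.

    Blank lines are skipped and only the first `limit` non-blank entries are
    checked. Each entry is compiled with Python's re module, which only
    approximates the regex syntax accepted by the Java-based GUI.
    """
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    accepted, invalid = [], []
    for entry in entries[:limit]:
        try:
            re.compile(entry)
        except re.error as exc:
            invalid.append((entry, str(exc)))
        else:
            accepted.append(entry)
    return accepted, invalid


# First entry: the invalid expression from the error message above.
# Second entry: a hypothetical valid trap.
pasted = r""".*www.\jettebrian\.dk\/calendarix.*
.*/calendar/.*"""

ok, bad = prepare_crawler_traps(pasted)
print(f"{len(ok)} valid, {len(bad)} invalid")
for entry, reason in bad:
    print(f"invalid: {entry} ({reason})")
```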

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)