Differences between revisions 10 and 11
Revision 10 as of 2010-03-09 14:02:46 (size 2765, editor TueLarsen)
Revision 11 as of 2010-04-23 12:57:10 (size 2771)

Check domain statistics and domain configuration

This page describes how to check domain statistics, the creation of domains and harvest configurations, and the handling of crawler traps.

Do the following in a browser:

Start Program

Check domain statistics

If you are not a Netarchive tester, you need to add your own test domains first.

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar contains the search criteria and the number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. olsen2.dk, and press enter

  • Check that you get the answer:
    • "The domain 'olsen2.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and check that the number of domains has increased by one (now 18)
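The two 'Domain Statistics' readings above amount to one comparison. The count after creating one domain should be exactly one higher than the count before. Below is a minimal sketch of that check, assuming you read both numbers off the page by hand. The function name `domain_created` is hypothetical and is not part of the application. The values 17 and 18 only hold for the Netarchive test setup; other testers should use their own counts.

```python
def domain_created(count_before: int, count_after: int) -> bool:
    """Return True if exactly one domain was added between the two readings.

    Hypothetical helper: both counts are read manually from
    'Definitions' -> 'Domain Statistics' before and after pressing "Yes".
    """
    return count_after == count_before + 1


# Counts from the Netarchive test setup (17 before, 18 after).
print(domain_created(17, 18))
```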

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)', type an existing domain, e.g. netarkivet.dk, and press enter

  • Check that you are taken to the 'Edit Domain' screen showing the data for the netarkivet.dk domain
  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default template is the first template in the list of available templates, sorted in alphanumeric order (currently 3levels_orderxml)
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check that the configuration list now includes the new configuration name
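The rule for the preselected harvest template (the first template in alphanumeric order) can be sketched as follows. This sketch only illustrates the documented rule and is not the application's own code. All template names except 3levels_orderxml are made up. The exact ordering is also an assumption; in particular, it is not known whether the sort is case-sensitive. The sketch uses plain Python string ordering, which compares by Unicode code point.

```python
def default_template(template_names: list[str]) -> str:
    """Return the template expected to be preselected under 'New configuration'.

    Assumes plain code-point ordering of the names, which is case-sensitive:
    digits sort before upper-case letters, which sort before lower-case letters.
    """
    if not template_names:
        raise ValueError("no harvest templates available")
    return min(template_names)


# Hypothetical template list; only 3levels_orderxml comes from the test description.
templates = ["default_orderxml", "frontpages", "3levels_orderxml", "Frontpages_and_links"]
print(default_template(templates))  # 3levels_orderxml
```

If the template shown on the 'New configuration' screen differs from what this ordering predicts, the real sort may be case-insensitive or locale-aware.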

Check Crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)', type an existing domain, e.g. netarkivet.dk, and press enter

  • Check that you are taken to the 'Edit Domain' screen showing the data for the netarkivet.dk domain
  • Click 'Show crawler traps'
  • Paste the list of known crawler traps into the input box
    • Remember to update the list of crawler trap examples regularly from our production system.

The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
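An invalid crawler trap only shows up as an error after you click 'Save', so it can help to check the list before pasting it. The rough sketch below compiles each non-empty line as a regular expression and reports the lines that fail. The web application is Java-based and presumably validates patterns with Java's regex engine. Python's `re` module is similar but not identical, so treat this as a pre-check only. In both dialects, an unknown letter escape such as the `\j` in the example error message above is rejected. The function name `invalid_patterns` is hypothetical. The second sample pattern is only a guess at what the invalid one was meant to be.

```python
import re


def invalid_patterns(lines):
    """Return (line_number, pattern, error) for every line that is not a valid regex.

    Blank lines are skipped. Python's `re` dialect is used here only as an
    approximation of the Java dialect used by the web application.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        pattern = line.strip()
        if not pattern:
            continue
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append((number, pattern, str(exc)))
    return problems


if __name__ == "__main__":
    sample = [
        r".*www.\jettebrian\.dk\/calendarix.*",  # from the error message; '\j' is an invalid escape
        r".*www\.jettebrian\.dk\/calendarix.*",  # assumed intended pattern
    ]
    for number, pattern, error in invalid_patterns(sample):
        print(f"line {number}: {pattern!r}: {error}")
```

To check a local copy of the collection file, pass its lines directly, e.g. `invalid_patterns(open(path, encoding="utf-8"))`. An empty result means every pattern compiled in Python, which does not guarantee that Java accepts it.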

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)