Differences between revisions 9 and 10
Revision 9 as of 2010-01-27 13:55:19
Size: 2853
Comment:
Revision 10 as of 2010-03-09 14:02:46
Size: 2765
Editor: TueLarsen
Comment:

Check domain statistics and domain configuration

This page describes how to check domain statistics, domain creation, harvest configurations and crawler traps.

Do the following in a browser:

Start Program

Check domain statistics

If you are not a Netarchive tester, you need to add your own test domains first.

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)', type '*' in the search field and press Enter

  • Check that the title bar contains search criteria and number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. olsen2.dk, and press Enter

  • Check that you get the answer:
    • "The domain 'olsen2.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and check that the number of domains has increased by one (now 18)

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain, e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default is the first template in the list of available templates sorted in alphanumerical order (currently 3levels_orderxml).
  • Enter a 'Maximum number of bytes' of your own choice
  • Click 'Save configuration'
  • Check that the configuration list now includes the new configuration name
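To know in advance which template should be preselected, the sketch below picks the first name in sorted order. It assumes a plain, case-sensitive code-point sort, which may not be exactly the ordering the application uses; the function name and every template name except 3levels_orderxml are hypothetical.

```python
def default_template(template_names):
    """Return the template expected to be preselected.

    Assumption: names are sorted by plain, case-sensitive code-point
    order; adjust if your installation sorts differently.
    """
    if not template_names:
        raise ValueError("no harvest templates available")
    # min() gives the same result as sorted(...)[0] without a full sort.
    return min(template_names)


# Hypothetical template list; only 3levels_orderxml comes from this page.
templates = ["default_orderxml", "frontpages", "3levels_orderxml"]
print(default_template(templates))  # 3levels_orderxml
```

Digits sort before letters in this ordering, which is why a name starting with "3" comes first.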

Check Crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain, e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'Show crawler traps'
  • Paste the following list of known crawler traps into the input box:
    • Remember to keep the list of crawler trap examples up to date with traps from our production system.

The file is now located in CVS at [http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain]

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
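Before pasting, you can pre-check the list for entries that will not compile as regular expressions. This is a sketch under two assumptions: that the application validates each line as a regular expression (as the error message above suggests), and that Python's re module is a reasonable stand-in for the application's regex engine. The dialects are close but not identical, so an entry passing here is not a guarantee. The function name invalid_traps is hypothetical.

```python
import re


def invalid_traps(lines):
    """Return (pattern, error) pairs for crawler traps that fail to compile.

    Python's re module stands in for the application's regex engine;
    the two dialects are not identical.
    """
    problems = []
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue  # ignore blank lines in the collection file
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append((pattern, str(exc)))
    return problems


traps = [
    r".*www\.jettebrian\.dk\/calendarix.*",  # valid
    r".*www.\jettebrian\.dk\/calendarix.*",  # '\j' is not a valid escape
]
for pattern, error in invalid_traps(traps):
    print(f"{pattern}: {error}")
```

Only the second entry is reported, since "\j" is an unknown escape (Python 3.7 and later reject it, as does Java's regex engine). To check a saved copy of crawlertrapsCollection.txt, pass the open file object to invalid_traps, since iterating over it yields one line at a time.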

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)