
Check domain statistics and domain configuration

This page describes how to check the domain statistics and the naming of harvest configurations.

Do the following in a browser

Start Program

Check domain statistics

If you are a non-netarchive tester, you need to add your own test domains first.

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar contains search criteria and number of hits.
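As a rough illustration of what the '*' search is expected to do, the sketch below assumes glob-style matching, where '*' matches every domain name. That is an assumption; the actual search implementation is not described on this page. The function `find_domains` and the domain list are hypothetical and only serve the example.

```python
from fnmatch import fnmatchcase


def find_domains(pattern, domains):
    """Return the domains matching a glob-style pattern, sorted by name.

    Hypothetical helper: glob matching is assumed here, not taken from
    the application itself.
    """
    return sorted(d for d in domains if fnmatchcase(d, pattern))


# Illustrative domain list; a real installation has its own domains.
domains = ["netarkivet.dk", "kb.dk", "example.dk"]

hits = find_domains("*", domains)
# Mimics a title bar that shows the search criteria and the number of hits.
print(f"Search '*': {len(hits)} hits")
```

Under this assumption, the number of hits for '*' is the same as the number of domains noted in the previous step.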

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain e.g. olsen2.dk and press enter

  • Check that you get the answer:
    • "The domain 'olsen2.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default is the first template in the list of available templates, sorted in alphanumerical order (currently 3levels_orderxml).
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check the configuration list now includes the new configuration name
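The default-template rule in the steps above can be sketched as follows. This is a minimal illustration that assumes "alphanumerical order" means a plain character-by-character string sort, in which digits come before letters. Apart from 3levels_orderxml, the template names and the function `default_template` are hypothetical.

```python
def default_template(names):
    """Return the first template name in plain string sort order.

    Assumption: the application's "alphanumerical order" behaves like
    Python's default string comparison (digits sort before letters).
    """
    return sorted(names)[0]


# Only "3levels_orderxml" comes from this page; the others are made up.
templates = ["default_orderxml", "3levels_orderxml", "full_orderxml"]

print(default_template(templates))
```

Under this assumption, a name starting with a digit is picked ahead of any name starting with a letter, which would explain why 3levels_orderxml is the current default.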

Check Crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'Show crawler traps'
  • Paste the list of known crawler traps into the input box
    • Remember to update the list of crawler trap examples from our production system first.

The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
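Before pasting, the list can be pre-checked for expressions that do not compile. The sketch below uses Python's `re` module as a stand-in; the application presumably validates with its own regular expression engine, whose syntax may differ in details, so treat this only as a first filter. The function `invalid_traps` is hypothetical, and the first expression in the sample list is an assumed corrected form of the one quoted in the error message above.

```python
import re


def invalid_traps(lines):
    """Return (expression, error message) pairs for lines that fail to compile.

    Blank lines are skipped. Python's re syntax is used as an approximation
    of whatever engine the application uses.
    """
    bad = []
    for line in lines:
        expr = line.strip()
        if not expr:
            continue
        try:
            re.compile(expr)
        except re.error as exc:
            bad.append((expr, str(exc)))
    return bad


traps = [
    r".*www\.jettebrian\.dk\/calendarix.*",  # assumed corrected form
    r".*www.\jettebrian\.dk\/calendarix.*",  # expression from the error message
]

for expr, message in invalid_traps(traps):
    print(f"Invalid: {expr} ({message})")
```

In the second expression the backslash precedes the letter j instead of the dot, and Python rejects `\j` as an unknown escape.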

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)