Differences between revisions 4 and 5
Revision 4 as of 2009-10-01 14:57:23
Size: 2968
Editor: TueLarsen
Comment:
Revision 5 as of 2009-10-01 15:00:07
Size: 3049
Editor: TueLarsen
Comment:

Check domain statistics and domain configuration

This page describes how to check naming of harvest configurations.

Do the following in a browser:

Start Program

Check domain statistics

If you are a non-netarchive tester, you need to add your own test domains first.

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar contains search criteria and number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. <verbatim>olsen.dk</verbatim>, and press enter

  • Check that you get the answer:
    • "The domain 'olsen.dk' does not exist in the database. Create it? Yes"
  • Press "Yes"
  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain, e.g. <verbatim>netarkivet.dk</verbatim>

  • Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain

  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default is the first template in the list of available templates, sorted in alphanumerical order (currently 3levels_orderxml).
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check the configuration list now includes the new configuration name
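The default-template rule mentioned in the steps above (first name in alphanumerical order) can be illustrated with a small sketch. It assumes that "alphanumerical order" behaves like plain string sorting, where digits come before letters; the function name `default_template` and every template name except 3levels_orderxml are hypothetical.

```python
def default_template(names):
    """Return the name that comes first in plain string sort order."""
    return sorted(names)[0]


# Hypothetical template list; only 3levels_orderxml is taken from this page.
templates = ["example_b_orderxml", "3levels_orderxml", "example_a_orderxml"]
print(default_template(templates))
```

If the preselected template in the web interface differs from what this ordering suggests, the interface may sort differently from the assumption made here.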

Check crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. <verbatim>netarkivet.dk</verbatim>

  • Check that you switch to the 'Edit Domain' screen with data on the <verbatim>netarkivet.dk</verbatim> domain

  • Click 'Show crawler traps'
  • Paste the following list of known crawler traps into the input box
    • Currently it is only possible to ingest the first 24 entries, according to bug 1060.
    • Remember to update the list of crawler trap examples from our production system.

The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text/plain (kb-prod-udv-001.kb.dk:/projects/webarkivering/documents/internal/crawlertrapsCollection.txt)
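Because only the first 24 entries can currently be ingested, it may help to cut the list down before pasting it. The following is a minimal sketch, assuming the list has one crawler trap expression per line; that layout, the function name `first_entries` and the sample entries are assumptions made for illustration.

```python
from itertools import islice

MAX_ENTRIES = 24  # limit reported in bug 1060


def first_entries(lines, limit=MAX_ENTRIES):
    """Return the first `limit` non-blank entries, stripped of whitespace."""
    non_blank = (line.strip() for line in lines if line.strip())
    return list(islice(non_blank, limit))


# Hypothetical sample: 30 placeholder expressions, one per line.
sample = [rf".*example{i}\.dk.*" for i in range(30)]
to_paste = first_entries(sample)
print(len(to_paste))
print("\n".join(to_paste))
```

To use it on the real list, read the saved file with `open(path).readlines()` and pass the result to `first_entries`.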

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
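Before pasting, the list can also be pre-checked locally for entries that fail to compile as regular expressions. The sketch below is an illustration only: the function name `find_invalid_traps` and the first sample entry are hypothetical, and it uses Python's `re` module, whose dialect is not identical to the regular-expression engine used by the application. A clean result here therefore does not guarantee that the web interface accepts every entry.

```python
import re


def find_invalid_traps(lines):
    """Return (line_number, expression, error) for entries that do not compile."""
    invalid = []
    for number, line in enumerate(lines, start=1):
        expression = line.strip()
        if not expression:
            continue  # skip blank lines
        try:
            re.compile(expression)
        except re.error as exc:
            invalid.append((number, expression, str(exc)))
    return invalid


sample = [
    r".*www\.example\.dk\/calendar.*",       # hypothetical valid entry
    r".*www.\jettebrian\.dk\/calendarix.*",  # the invalid example quoted above
]
for number, expression, error in find_invalid_traps(sample):
    print(f"line {number}: {expression!r} is invalid ({error})")
```

The second sample entry is rejected because "\j" is not a recognised escape sequence; the intended expression was presumably "www\.jettebrian" with the backslash before the dot.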

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)