Differences between revisions 16 and 17
Revision 16 as of 2010-08-16 10:24:06
Size: 2889
Editor: localhost
Comment: converted to 1.6 markup
Revision 17 as of 2012-07-02 11:38:45
Size: 2969

Check domain statistics and domain configuration

This page describes how to check the domain statistics and the naming of harvest configurations.

Do the following in a browser

Start Program

Check domain statistics

Note: If you are a non-netarchive tester, you need to add your own test domains first using the HarvestDefinition/Definitions-create-domain.jsp page.

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (currently 17)

  • Choose 'Definitions' -> 'Find Domain(s)' and type * in the search field and press enter

  • Check that the title bar contains the search criterion "*" and the number of hits.

Check single domain creation

  • Choose 'Definitions' -> 'Find Domain(s)', type a nonexistent domain, e.g. olsen2.dk, and press enter

  • Check that you get the answer:

    No matching domains found for query 'olsen2.dk' when searching by 'NAME'

  • Go to HarvestDefinition/Definitions-create-domain.jsp, type "olsen2.dk", and press "create".

  • Choose 'Definitions' -> 'Domain Statistics' and note the number of domains (now 18)

Check harvest configuration

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'New configuration'
  • Enter a configuration 'Name' of your own choice
  • You may change the harvest template; the default template is the first template in the list of available templates sorted in alphanumerical order (currently 3levels_orderxml).
  • Enter a "maximum number of bytes" of your own choice
  • Click 'Save configuration'
  • Check the configuration list now includes the new configuration name
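
The rule for the default harvest template (first entry of the template list in alphanumerical order) can be illustrated with a small sketch. The template names other than 3levels_orderxml are hypothetical, and plain character-code ordering is an assumption; the application may sort differently.

```python
# Hypothetical list of available templates; only "3levels_orderxml"
# is taken from this page, the other names are made up for illustration.
available_templates = [
    "default_orderxml",
    "3levels_orderxml",
    "frontpage_orderxml",
]


def default_template(templates):
    """Return the first template name in sorted order.

    Assumption: "alphanumerical order" is modelled here as Python's
    default string ordering, in which digits sort before letters.
    """
    return sorted(templates)[0]


expected_default = default_template(available_templates)
print(expected_default)
```

With this list, the name starting with a digit comes first, which matches the default noted in the step above.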

Check Crawler traps

  • Choose 'Definitions' -> 'Find Domain(s)' and type an existing domain e.g. netarkivet.dk

  • Check that you switch to the 'Edit Domain' screen with data on the netarkivet.dk domain
  • Click 'Show crawler traps'
  • Paste the following list of known crawler traps into the input box
    • For now, remember to first update the list of crawler trap examples from our production system.

The file is now located in CVS at http://kb-prod-udv-001.kb.dk/cvsweb/cvsweb.cgi/~checkout~/projects/webarkivering/documents/internal/crawlertrapsCollection.txt?rev=1.1;content-type=text%2Fplain

  • Click 'Save'
  • Check that you do not get an error message like "The regular expression '.*www.\jettebrian\.dk\/calendarix.*' is invalid..."
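
Before pasting the list, it may help to pre-check that every crawler trap compiles as a regular expression. The following is a minimal sketch, not part of the application: the function name find_invalid_traps is hypothetical, and it uses Python's re module, whereas the web application presumably validates with its own regular expression engine, whose syntax can differ in places. A pattern that passes here may therefore still be rejected on the page, and vice versa.

```python
import re


def find_invalid_traps(lines):
    """Return (pattern, error message) pairs for entries that do not compile.

    Blank lines are skipped. Validation uses Python's `re` module, which is
    only an approximation of whatever engine the application uses.
    """
    invalid = []
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue
        try:
            re.compile(pattern)
        except re.error as exc:
            invalid.append((pattern, str(exc)))
    return invalid


sample = [
    r".*www\.jettebrian\.dk\/calendarix.*",  # well-formed variant
    r".*www.\jettebrian\.dk\/calendarix.*",  # stray backslash before 'j'
]

bad = find_invalid_traps(sample)
for pattern, message in bad:
    print(f"invalid: {pattern} ({message})")
```

To check the real list, read the lines of a local copy of crawlertrapsCollection.txt and pass them to the function instead of the sample.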

It24CheckHarvConfig (last edited 2012-07-02 11:41:33 by SoerenCarlsen)