Wait for Job to be Indexed

Use the GUI (http://kb-test-adm-001.kb.dk:807?/History/Harveststatus-alljobs.jsp?jobstatusname=ALL) to monitor your harvest job. When it reaches status Done, log in to kb-test-way-001. Monitor the content of the batch output directory with, for example,

[test@KB-TEST-WAY-001 ~]$ watch ls -l $TESTX/batchOutputDir/

In the test system, the indexer checks for new files every five minutes, so at least one file should show up in the directory within about five minutes of your harvest job finishing. You can check when the indexer last ran by looking in $TESTX/log/WaybackIndexerApplication0.log.0 for entries like

Jun 4, 2010 10:52:13 AM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient batch
FINE: Starting batchjob '
FileList job:
Files Processed = 0
Files  failed = 0' running on replica 'KB'
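
Rather than scrolling through the log by hand, a small helper can pull out the timestamp of the most recent run. This is only a sketch: the function name last_indexer_run is made up for illustration, and it assumes the two-line log layout shown above, where the timestamp line comes directly before the "Starting batchjob" line.

```shell
# Hypothetical helper, not part of NetarchiveSuite.
# Prints the line that immediately precedes the last "Starting batchjob"
# entry in the given log file. Assumption: that preceding line is the
# timestamp line, as in the excerpt above.
last_indexer_run() {
    awk '
        /Starting batchjob/ { last = prev }   # remember the line before each match
        { prev = $0 }                         # track the previous line
        END { if (last != "") print last }    # print nothing if no batch job was logged
    ' "$1"
}
```

It could be called as last_indexer_run $TESTX/log/WaybackIndexerApplication0.log.0, and the printed timestamp compared with the time your harvest job finished.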

When an index file appears in the output directory $TESTX/indexDir, look at its contents. There should be lines like

rosenthal-sodemann.dk/file_to_harvest.html 20100603121108 http://www.rosenthal-sodemann.dk/file_to_harvest.html text/html 200 MSFGU3777WVAXLNSHOF27EFWC2G5C2Z2 - 11022 1-1-20100603121107-00000-kb-test-har-002.kb.dk.arc

reflecting the domains you harvested.
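
To compare the index against the domains you harvested without reading every line, the host part of each entry can be listed once. The following is a rough sketch under some assumptions: the function name indexed_hosts is hypothetical, the index lines are taken to be space-separated as in the example above, and the second field is taken to be a 14-digit timestamp (any other line, such as a header, is skipped).

```shell
# Hypothetical helper, not part of NetarchiveSuite.
# Lists the distinct hosts found in the first field of each index line.
# Assumption: field 1 looks like "host/path" and field 2 is a 14-digit
# timestamp, as in the example line above.
indexed_hosts() {
    awk '
        length($2) == 14 && $2 ~ /^[0-9]+$/ {
            split($1, parts, "/")   # host is everything before the first slash
            print parts[1]
        }
    ' "$@" | sort -u
}
```

For example, indexed_hosts $TESTX/indexDir/* should print one line per harvested host. Hosts appear in the canonicalized form used in the first field (for instance without a leading www., as in the example line).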

It43Wait for Indexing (last edited 2010-09-17 11:32:34 by TueLarsen)