Harvest History

All jobs

By default, [Harvest Status] in the left menu shows a chronological list, in ascending order, of all jobs ever harvested that have the status Started. [All jobs] shows the same list.

all_jobs.png

If you want to see jobs with other statuses (or All statuses), or in a different sort order, select this in the combo boxes at the top of the page and click the [Show] button.

For each job, the page shows information about the job and its status, as well as information about errors (harvest errors or upload errors) and the number of configurations in the job.

Choose [Run number] if you want to check details on a specific run of that harvest definition. Note that a run can consist of multiple jobs.

Choose [Harvest name] if you want to check details on the history of a specific harvest definition.

Choose [JobID] if you want to check details on a specific job.

If a job has harvest errors, a [Restart] button appears, and the operator can use it to resubmit that specific job to be harvested again.

failed_desc.png

After a failed job has been resubmitted, its status says 'Resubmitted' and refers to the new, resubmitted job.

resubmit_job.png

History of a harvest definition

history_harvestdefinition.png

The history page for a harvest definition is the same page that you reach from the front page with the [History] buttons.

This history page gives you further information for each run of the harvest definition: start time, end time, number of bytes harvested, and number of documents harvested.

The page also shows how many jobs each run consists of, and how many of these failed and were resubmitted.

History of a domain

history_domain.png

If you want to see all the jobs connected to a specific domain, click on [All jobs per domain] and search for the domain name.

You will get a chronological list of the harvest definitions that include the chosen domain.

This page gives the same history information as the other two history pages and, in addition, has a “Stopped due to” column. This column tells the operator whether a harvest was stopped unexpectedly, hit the max-bytes limit for the chosen domain, or was stopped because of an error on the harvester machine.

Details on a job

job_details.png

Clicking a [JobID] on any of the harvest history pages will give you a very detailed report on the job.

This page gives all the information available about the job itself (e.g. max-bytes limit) and about the individual domains included in the job.

Furthermore, the page shows the complete seedlist used with the job and the complete “Harvest order template”, as well as detailed error information in case of errors. The latter two are mainly for advanced users debugging specific crawls where things didn't go as expected.

User Manual 3.8/Harvest History (last edited 2010-08-16 10:24:39 by localhost)