Selective Harvests

selective_harvests.png

The front page by default shows the list of selective harvests.

You can [Activate] an inactive harvest definition and [Deactivate] an active harvest definition. If you deactivate a running harvest, the system will finish the running jobs.

Click on [Edit] to change an existing harvest definition, or click on [Create new harvest definition] to create a new one.

Click on [History] to trace all the jobs from previously finished harvests.

Creating/editing a selective harvest

selective_harvest_edit.png

Create a new selective harvest definition by pressing [Create new selective harvestdefinition] on the front page.

Give the harvest definition a recognizable harvest name; it cannot be changed later. If necessary, add a comment.

Choose a schedule from the dropdown list.

Now you can add domains to the harvest definition.

Write the names of the domains you want to add in the box “Enter domain(s) to add to the harvest here” and click on [Add domains].

The added domains will appear in the column “Domain”.

For each added domain, choose the configuration you want from the dropdown list. Press [Save] to save the harvest definition.

The scheduling of a selective harvest definition can be overridden by filling out the input field “Override with new date”. Set the date to the time at which you want the harvest definition to run next. The scheduling of the harvest definition then continues from that point in time.
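The effect of an override can be illustrated with a small Python sketch. It is not taken from the system itself: the function name `upcoming_runs`, the weekly interval and the dates are made up for illustration, and the real schedules may follow additional rules (for example a fixed time of day). The sketch only shows the idea that later runs are counted from the overridden date.

```python
from datetime import datetime, timedelta

def upcoming_runs(next_run, interval, count):
    """Return `count` run times, starting at `next_run` and spaced by `interval`.

    Hypothetical helper: assumes a schedule that simply repeats at a fixed interval.
    """
    return [next_run + i * interval for i in range(count)]

weekly = timedelta(days=7)

# Runs as originally scheduled (illustrative dates only).
regular = upcoming_runs(datetime(2010, 8, 16, 12, 0), weekly, 3)

# After entering a new date in the override field, the following runs
# are counted from that date instead.
overridden = upcoming_runs(datetime(2010, 8, 18, 9, 0), weekly, 3)

for run in overridden:
    print(run.isoformat())
```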

Easy creation of non-existing domains

selective_harvest_non_existing_domain.png

When you add a domain that does not exist in the database, you are warned with the message “The following domains are unknown and were not added”. You can add the unknown domains to the database and to your harvest definition by clicking [Create and add to the harvestdefinition].

Event harvest

Event harvests are treated almost the same as selective harvests in the system. The only difference is a function for power-adding domains. It can be used for selective harvests as well, but it was developed for event harvest definitions, where the operator must enter a large number of URLs without having to edit configurations and seedlists on all the affected domains.

Adding seeds to an event harvest

event_harvest_seeds.png

Use [Add seeds]. Enter the start URLs you have identified as covering the event in the “Enter seeds:” box. In “Max number of bytes per domain”, enter the preferred maximum, e.g. 1000000000. Select a harvest template in the “Harvest template” dropdown list.

All seeds added in one operation use the same template. To harvest different seeds with different templates, add them in batches, one batch for each template you need for your event harvest.

Pressing [Insert] starts the power-adding function. This function runs through the entered seeds one by one and does the following with each seed:

  1. Finds the domain from which the seed derives.
  2. Creates a seedlist with the name of the harvest definition and the template as seedlist name.
  3. Creates a configuration with the name of the harvest definition and the template as configuration name, and selects the seedlist from (2) for use with the new configuration.

If the seedlist from (2) or the configuration from (3) already exists (because the power-adding function has been used before with other seeds from the same domain in the same event harvest), the system only adds the new URLs to the existing seedlist.
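The steps of the power-adding function can be sketched in Python as follows. This is only an illustration under stated assumptions, not the system's actual code: the names `power_add` and `domain_of`, the dictionary layout, the underscore used to join the harvest name and the template name, and the simple "last two labels" rule for finding the domain are all hypothetical.

```python
from urllib.parse import urlparse

def domain_of(seed):
    """Step 1: find the domain a seed belongs to.

    Assumption: the domain is the last two labels of the host name.
    The real system may use a more careful rule.
    """
    url = seed if "://" in seed else "http://" + seed
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def power_add(domains, harvest_name, template, seeds, max_bytes):
    """Add seeds to per-domain seedlists and configurations (hypothetical model)."""
    # Assumed naming scheme: harvest definition name plus template name.
    name = f"{harvest_name}_{template}"
    for seed in seeds:
        entry = domains.setdefault(
            domain_of(seed), {"seedlists": {}, "configurations": {}}
        )
        # Step 2: create the seedlist, or reuse it if it already exists.
        seedlist = entry["seedlists"].setdefault(name, [])
        if seed not in seedlist:
            seedlist.append(seed)
        # Step 3: create the configuration only if it does not exist yet.
        entry["configurations"].setdefault(
            name, {"template": template, "seedlist": name, "max_bytes": max_bytes}
        )
    return domains

# Illustrative call with made-up names and URLs.
domains = power_add(
    {}, "election2010", "default_orderxml",
    ["http://www.example.com/a", "http://news.example.com/b", "http://other.org/"],
    1000000000,
)
```

Calling `power_add` again with further seeds from the same domain, harvest definition and template only extends the existing seedlist, which mirrors the behaviour described above.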


User Manual 3.8/Selective Harvests (last edited 2010-08-16 10:24:37 by localhost)