DiscussionHarvester1 - NetarchiveSuite

Migration of Harvest Database to Hibernate-based Object-Relational Framework

Background

The current NAS architecture maintains a persistent store of harvest definition information in a database, which is accessed through a Data Access Object (DAO) layer. The DAO layer is written and maintained by NAS developers and accesses the database directly via SQL in the form of JDBC prepared statements.

Hibernate is an object-relational framework which provides components that allow Java objects (such as HarvestDefinition or Job instances) to be persisted directly. The mapping of the objects to a database layer is carried out by the Hibernate framework itself, based on the structure of the objects to be persisted and additional information supplied in Java annotations. Objects are retrieved from storage via an object-based query language (HQL) or via a query API. In principle, therefore, a well-written Hibernate application is database-neutral, because it contains no hand-written, database-specific SQL.
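To make the idea of annotation-driven mapping concrete, here is a toy sketch in plain Java. It is not Hibernate and does not use Hibernate's API. The annotations Table and Column are defined locally, and the Job class, its fields, and the table and column names are invented for illustration. It only shows the principle: a framework can derive SQL from the structure of a class plus its annotations, so the application never writes that SQL itself.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.TreeSet;

class MappingSketch {
    // Locally defined stand-ins for mapping annotations (hypothetical, not Hibernate's).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Table { String name(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column { String name(); }

    // Hypothetical persistent class; the real NAS Job class looks different.
    @Table(name = "jobs")
    static class Job {
        @Column(name = "job_id") long id;
        @Column(name = "status") String status;
        int scratch; // not annotated, so it is not mapped
    }

    // Derives an INSERT statement from the annotations on the given class.
    static String insertStatementFor(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        if (table == null) {
            throw new IllegalArgumentException(type.getName() + " is not mapped to a table");
        }
        // Sort column names so the result does not depend on reflection order.
        TreeSet<String> columns = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                columns.add(column.name());
            }
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table.name()
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertStatementFor(Job.class));
    }
}
```

A real object-relational framework does far more than this (type conversion, relationships, caching, dialect-specific SQL generation), but the division of labour is the same: the developer describes the mapping, and the framework produces the statements.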

Hibernate is used in NAS for the wayback-indexer component, but with a very simple data model. This document discusses the idea of moving our entire Harvest Definition Database to Hibernate.

Problems with the Current Setup

Multiple Database Support

In the current code we support DerbyDB, MySQL and Postgres, thus trebling the workload on every database change. Moreover, the relevant expertise in the different databases is spread out over the different partners. This makes significant change to the persistence layer very challenging.
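As an illustration of why each supported database adds work, consider a paged query. The sketch below is hypothetical: the jobs table, the job_id column, and the DialectSketch class are invented and are not taken from the NAS code base. It shows only that Derby on one side and MySQL and PostgreSQL on the other commonly express result paging with different syntax, and that the order of the bound parameters differs as well.

```java
class DialectSketch {
    enum Dialect { DERBY, MYSQL, POSTGRES }

    // Returns a paged query over a hypothetical "jobs" table for the given database.
    static String pagedJobIdsQuery(Dialect dialect) {
        String base = "SELECT job_id FROM jobs ORDER BY job_id";
        switch (dialect) {
            case DERBY:
                // Parameters are bound as (offset, pageSize).
                return base + " OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
            case MYSQL:
            case POSTGRES:
                // Parameters are bound as (pageSize, offset).
                return base + " LIMIT ? OFFSET ?";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        for (Dialect dialect : Dialect.values()) {
            System.out.println(dialect + ": " + pagedJobIdsQuery(dialect));
        }
    }
}
```

In a hand-written DAO layer, every such difference has to be coded, tested, and maintained for each database separately. An object-relational framework moves this branching into its own dialect handling.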

Complexity of Code

The current Harvester Database persistence layer has been reworked so many times that the DAO layer has become encrusted with code that is hard to understand and hard to maintain.

Schema Management System