= Assignment group B.4 - Improve batch architecture =

== References ==

=== Reference documents ===
 * [[attachment:AssignmentGroupB2/bitarchive.pdf|Slides from original design discussion about bitarchive improvements]]
 * [[attachment:AssignmentGroupB2/bitarchive-ass.pdf|Slides from discussion about assignments with bitarchive improvements]]

=== Dependencies ===
 * B.4.1 needs to be done before B.4.2 to avoid a huge security flaw.
 * B.4.3 is independent, but is most useful after B.4.2.

=== Terminology ===
Nothing yet.

=== Bugs ===
(maybe) addressed by these assignments (important ones in bold)
 * [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1193&group_id=7&atid=108|Feature Request 1193: Exceptions from FileBatchJob stop batch job processing]]

=== Feature requests ===
(maybe) addressed by these assignments (important ones in bold)
 * [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1285&group_id=7&atid=108|Feature Request 1285: Storage of processed batch classes]]
 * [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1498&group_id=7&atid=108|Feature Request 1498: batchJobs given in jar files]] (implemented and released)
 * [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1509&group_id=7&atid=108|Feature Request 1509: Possibilities to check if batchjob has been executed on all files]]
 * [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1409&group_id=7&atid=108|Feature Request 1409: Batchjob information in line with processed files]]

== Assignment B.4.1 - use security manager for batch jobs ==
Read relevant parts of the [[http://java.sun.com/docs/books/tutorial/security1.2/index.html|Java security tutorial trail]]. Brief rundown of JavaSecurityCommands.

Set up our bitarchives to run with a security policy that grants '''!AllPermission''' to code signed by the '''!NetarchiveSuite''', and only limited permissions to other code.
The limited permissions should be just enough for third-party class files to write the results of batch jobs to the result files.

Update our build scripts to sign the jar files. Update our deploy applications to start the bitarchives with a security policy.

Note: This may not be the correct or ideal solution; a better one may present itself while reading the Java tutorial trail, which I had not done when writing this assignment.

Note: I am unsure whether this will still allow batch jobs to interfere with each other.

|| '''Estimated time''' || '''Estimator''' ||
|| 3 md || KFC ||

== Assignment B.4.2 - allow third-party batch jobs to be submitted ==
Basically, we need a batch job that takes a serialised class file, loads it at the bitarchives, and then runs it. Such a class would look like this (but should of course be tested and documented properly, and have error handling):

{{{
import java.io.File;
import java.io.OutputStream;

// FileBatchJob and IOFailure are the existing NetarchiveSuite classes.
public class ClassBatchJob extends FileBatchJob {
    /** Bytes of the compiled third-party class; serialised along with this job. */
    private final byte[] fileBatchJobClass;
    /** The loaded job; created at the bitarchive in initialize(), never serialised. */
    private transient FileBatchJob job;

    public ClassBatchJob(byte[] fileBatchJobClass) {
        this.fileBatchJobClass = fileBatchJobClass;
    }

    public void initialize(OutputStream os) {
        // Define the class from the raw bytes with a throwaway class loader.
        Class<?> c = new ClassLoader() {
            Class<?> initialize() {
                return defineClass(null, fileBatchJobClass, 0, fileBatchJobClass.length);
            }
        }.initialize();
        try {
            job = (FileBatchJob) c.newInstance();
        } catch (InstantiationException e) {
            throw new IOFailure("Unable to initialise class", e);
        } catch (IllegalAccessException e) {
            throw new IOFailure("Illegal access for class", e);
        }
        job.initialize(os);
    }

    // The remaining methods simply delegate to the loaded job.
    public boolean processFile(File file, OutputStream os) {
        return job.processFile(file, os);
    }

    public void finish(OutputStream os) {
        job.finish(os);
    }
}
}}}

Once that job has been written, it is merely a question of making a web page where you can upload a class file, submit the job, and print the results. The web page should be asynchronous, so that a web timeout does not prevent you from seeing the result of the batch job.
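
One way the asynchronous behaviour could be structured is sketched below: the page hands the job to a background thread, gets a ticket back at once, and polls for the result on later requests. This is only a sketch under assumptions. The class `AsyncBatchSubmitter` and its methods are hypothetical and not part of NetarchiveSuite, and the actual batch run is represented by a plain `Callable<String>` that returns the result text.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a NetarchiveSuite class): runs jobs in the background. */
class AsyncBatchSubmitter {
    // Daemon thread, so a forgotten shutdown() does not keep the JVM alive.
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "batch-submitter");
        t.setDaemon(true);
        return t;
    });
    private final Map<Long, Future<String>> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextTicket = new AtomicLong(1);

    /** Queues the run and returns immediately with a ticket the page can display. */
    long submit(Callable<String> batchRun) {
        long ticket = nextTicket.getAndIncrement();
        pending.put(ticket, pool.submit(batchRun));
        return ticket;
    }

    /** True once the run behind this ticket has finished, successfully or not. */
    boolean isDone(long ticket) {
        Future<String> f = pending.get(ticket);
        return f != null && f.isDone();
    }

    /**
     * Waits at most timeoutMillis for the result. Returns null for an unknown
     * ticket or a run that is not finished in time, and a "FAILED: ..." text
     * if the run threw an exception.
     */
    String awaitResult(long ticket, long timeoutMillis) {
        Future<String> f = pending.get(ticket);
        if (f == null) {
            return null;
        }
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A page handler would call `submit(...)` when the class file is uploaded and show the ticket, and a second page would call `isDone(ticket)` or `awaitResult(ticket, 0)` on each refresh. Results are kept in memory here; a real implementation would probably have to persist them.
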
Don't forget to update the user manual, the developer manual, and the installation manual (about starting with a security manager) afterwards.

|| '''Estimated time''' || '''Estimator''' ||
|| 3 md || KFC ||

== Assignment B.4.3 - better infrastructure ==
Update FileBatchJob with a merge method which, given two output files from this batch job, defines how they should be merged into one. The default implementation should simply concatenate them.

Update BitarchiveMonitor to call this method when merging.

Update the developer documentation with information on the extended batch job definition.

|| '''Estimated time''' || '''Estimator''' ||
|| 2 md || KFC ||

== Assignment B.4.4 - Yet more better infrastructure ==
Collect exceptions during the execution of a FileBatchJob, and send these back as part of the BatchStatus ([[https://gforge.statsbiblioteket.dk/tracker/?group_id=7&atid=105&func=detail&aid=1193|Bug 1193]]).

At the same time, we should improve the information logged about the executed batch jobs, since from now on we may not know the identity of these batch jobs: [[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1279&group_id=7&atid=105|Bug #1279 Missing toString method on FileBatchJob classes]].

|| '''Estimated time''' || '''Estimator''' ||
|| 3.5 md || SVC ||
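
The exception collection asked for in B.4.4 could be approached roughly as in the sketch below: an exception on one file is recorded together with the file name, and processing continues with the next file. All names here (`PerFileJob`, `FileFailure`, `ExceptionCollectingRunner`) are hypothetical stand-ins chosen for illustration. They are not the real FileBatchJob or BatchStatus API, and how the collected list would be attached to BatchStatus is left open.

```java
import java.io.File;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the per-file part of a batch job. */
interface PerFileJob {
    boolean processFile(File file, OutputStream os) throws Exception;
}

/** Hypothetical record of one failure; stores text only, so it is easy to serialise. */
final class FileFailure {
    final String fileName;
    final String message;

    FileFailure(String fileName, String message) {
        this.fileName = fileName;
        this.message = message;
    }

    @Override
    public String toString() {
        return fileName + ": " + message;
    }
}

/** Runs a job over a list of files and collects failures instead of aborting. */
final class ExceptionCollectingRunner {
    private final List<FileFailure> failures = new ArrayList<>();
    private int processed = 0;

    void run(PerFileJob job, List<File> files, OutputStream os) {
        for (File f : files) {
            try {
                if (job.processFile(f, os)) {
                    processed++;
                } else {
                    failures.add(new FileFailure(f.getName(), "processFile returned false"));
                }
            } catch (Exception e) {
                // Record the problem and carry on with the remaining files.
                failures.add(new FileFailure(f.getName(), e.toString()));
            }
        }
    }

    /** Number of files for which processFile returned true. */
    int getProcessed() {
        return processed;
    }

    /** Read-only view of the failures collected so far, in processing order. */
    List<FileFailure> getFailures() {
        return Collections.unmodifiableList(failures);
    }
}
```

Storing `e.toString()` rather than the exception object avoids shipping third-party exception classes back to the caller, which may matter once jobs are loaded from uploaded class files.
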