== Overview of the NetarchiveSuite Wayback module ==
The wayback module in !NetarchiveSuite consists of three components:
 i. The wayback web application. This is a Tomcat web application configured with a !NetarchiveSuite plugin that allows it to communicate with the archive.
 i. The !NetarchiveSuite batch-indexer. This is a standalone !NetarchiveSuite application which indexes newly harvested material.
 i. The !NetarchiveSuite index-aggregator. This is a standalone !NetarchiveSuite application which sorts the indexes and merges them into the large index files used by wayback.

== Status of Wayback ==
The first two components are complete and the third is nearing completion. The batch-indexer is not useful without the index-aggregator, so it should not be run at present. However, in the setup used for the release test, the settings file generated for the batch-indexer is also used by wayback itself. It is therefore necessary to configure and deploy the indexer, although it should not be started.

== Configuration of Tomcat ==
Tomcat should be configured with two Connectors in server.xml:
{{{ }}}
There are start and stop scripts for Tomcat in the webarkivering CVS under conf/wayback/test/scripts. The start script should be modified to set the desired location of the !NetarchiveSuite settings file, the name of the archive environment (PROD or ACCEPT) and a unique applicationInstanceId (e.g. WAYBACK). Note that the stop script includes a cleanup command which removes an unwanted directory from Tomcat's temp area.

== Configuration of Apache ==
As discussed at the February teleconference, PROD wayback must only be visible externally via a secured Apache proxy (connected to port 8080).
== Configuration of Wayback ==
The script used to deploy and start wayback in the release test is as follows:
{{{
MACHINE=test@kb-test-way-001
rm -r wayback_scripts

## Get the start and stop scripts for tomcat
CVS_RSH=ssh cvs -Q -d:ext:test@kb-prod-udv-001.kb.dk:/home/cvsroot checkout -P -d wayback_scripts projects/webarkivering/conf/wayback/test/scripts/start.sh
CVS_RSH=ssh cvs -Q -d:ext:test@kb-prod-udv-001.kb.dk:/home/cvsroot checkout -P -d wayback_scripts projects/webarkivering/conf/wayback/test/scripts/stop.sh

## Set the applicationInstanceId and Environment for wayback
INST_ID=WAYBACKWEBAPP$TESTX
sed "s/##ENV##/$TESTX/" wayback_scripts/start.sh > wayback_scripts/temp.sh; mv wayback_scripts/temp.sh wayback_scripts/start.sh
sed "s/##INST_ID##/$INST_ID/" wayback_scripts/start.sh > wayback_scripts/temp.sh; mv wayback_scripts/temp.sh wayback_scripts/start.sh
scp wayback_scripts/*.sh $MACHINE:/home/test
ssh $MACHINE chmod 755 start.sh
ssh $MACHINE chmod 755 stop.sh
ssh $MACHINE ./start.sh
scp $HOME/wayback/wayback.war $MACHINE:tomcat/webapps/ROOT.war

## Copy the wayback settings file to the target machine ## **
scp /home/test/release_software_dist/$TESTX/kb-test-way-001.kb.dk/settings_WaybackIndexerApplication.xml $MACHINE:/home/test/settings_wayback.xml

## Check out all spring configuration files
CVS_RSH=ssh cvs -Q -d:ext:test@kb-prod-udv-001.kb.dk:/home/cvsroot checkout -P -d spring projects/webarkivering/conf/wayback/test/spring

## Copy Spring files over to wayback
scp spring/* $MACHINE:/home/test/tomcat/webapps/ROOT/WEB-INF

## Restart tomcat
sleep 5
ssh $MACHINE ./start.sh
}}}
It should be straightforward to adapt this for use in production. Simply copy all the referenced configuration files in CVS to a new directory conf/wayback/prod and edit the connection information to suit the production system (see below for more details).
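The copy-and-edit step for production could be sketched roughly as follows. This is only an illustration, not part of the release-test tooling: {{{make_prod_conf}}} is a hypothetical helper, and the production host name passed to it is a placeholder that must be supplied by whoever runs it. It assumes the old host name contains no characters that are special to sed.

```shell
#!/bin/sh
# Sketch only: derive a "prod" configuration tree from the "test" tree and
# rewrite the host name in the copy. make_prod_conf is a hypothetical helper.
make_prod_conf() {
    conf_root=$1   # directory that contains the "test" tree (e.g. conf/wayback)
    old_host=$2    # host name used in the test files (e.g. kb-test-way-001)
    new_host=$3    # production host name (placeholder, supply your own)

    # Refuse to overwrite an existing prod tree.
    if [ -e "$conf_root/prod" ]; then
        echo "refusing to overwrite $conf_root/prod" >&2
        return 1
    fi
    cp -r "$conf_root/test" "$conf_root/prod" || return 1

    # Rewrite the host name in every regular file of the copy. The edited
    # text goes through a scratch file outside the tree and is written back
    # in place, so no extra files appear while find is still walking it.
    tmp_file=$(mktemp) || return 1
    find "$conf_root/prod" -type f | while IFS= read -r f; do
        sed "s/$old_host/$new_host/g" "$f" > "$tmp_file" && cat "$tmp_file" > "$f"
    done
    rm -f "$tmp_file"
}

# Example call (the second host name is a placeholder):
#   make_prod_conf conf/wayback kb-test-way-001 my-prod-host
```

In a real CVS working copy the CVS/ metadata directories are copied along with everything else, so they would have to be removed from the new tree before it is added to the repository. The remaining edits (index locations, {{{maxRecords}}}) still need to be made by hand.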
The line marked ** in the deployment script copies the !NetarchiveSuite settings file for the batch-indexer over to the location of the wayback settings file (as defined in the Tomcat start script). This settings file is generated from {{{deploy_config_multi_bitapps.xml}}} in CVS. It is therefore necessary to modify the production deploy.xml to include the wayback batch-indexer (see below for more details).

The file {{{wayback.war}}} is the unmodified web application released by the wayback developers. It may be copied from kb-prod-udv-001:/home/test/wayback/wayback.war.

== Wayback Configuration Files ==
These lie in conf/wayback/test/spring in CVS. The following files require modification:
 i. {{{CDXCollection.xml}}}: modify the element {{{ /home/test/wayback_cdx/index.cdx }}} to point to the actual locations of the index files in the PROD or ACCEPT environments.
 i. {{{wayback.xml}}}: change the name of the host machine (kb-test-way-001) to the name of the actual host at all occurrences (five places). Change the value of the property {{{maxRecords}}} to 20000.

== Deploy Configuration ==
The section
{{{
lib/dk.netarkivet.archive.jar lib/dk.netarkivet.common.jar lib/dk.netarkivet.monitor.jar lib/dk.netarkivet.wayback.jar 1 100 100 100 10 100 jdbc:derby://localhost:1527/wayback_indexer_db;create=true org.apache.derby.jdbc.ClientDriver false org.hibernate.transaction.JDBCTransactionFactory org.hibernate.dialect.DerbyDialect true true update KB batchOutputDir tempdir 3 0 300000 5
}}}
needs to be modified to reflect the actual production environment. However, as the batch-indexer is not to be started, the elements {{{hibernate}}} and {{{indexer}}} can be removed at this stage. The only purpose of running !DeployApplication on this file is to generate a {{{settings_WaybackIndexerApplication.xml}}} file which can be used by the wayback web application.
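Because wayback depends on this generated file, it may be worth checking that the file exists and is non-empty before copying it to the target machine. A minimal sketch, assuming nothing about the file beyond its path ({{{require_settings_file}}} is a hypothetical helper, not part of !NetarchiveSuite):

```shell
#!/bin/sh
# Sketch only: fail early if the generated settings file is missing or empty.
# require_settings_file is a hypothetical helper.
require_settings_file() {
    settings=$1
    # -s is true only for an existing file with a size greater than zero.
    if [ ! -s "$settings" ]; then
        echo "missing or empty settings file: $settings" >&2
        return 1
    fi
    return 0
}

# Example use with the paths from the release-test script:
#   SETTINGS=/home/test/release_software_dist/$TESTX/kb-test-way-001.kb.dk/settings_WaybackIndexerApplication.xml
#   require_settings_file "$SETTINGS" && scp "$SETTINGS" $MACHINE:/home/test/settings_wayback.xml
```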
For future reference, note also that the classpath entry lib/dk.netarkivet.wayback.jar has been added to all archive applications in this deploy file. This should be carried over to the production deployment configuration.

== Versioning ==
The accept test must run against an archive running the same version as the current PROD system. Wayback must use !NetarchiveSuite files from the current release.

For reference, minutes of the February teleconference are at http://kb-prod-udv-001.kb.dk/twiki/bin/view/Netarkiv/BriefMeeting22Feb2010