Revision 1 as of 2009-05-26 11:40:22 (comment: Releasing version 3.8), compared with Revision 2 as of 2010-08-16 10:25:16 (comment: converted to 1.6 markup)
Appendix A - How To Do Examples
Install the QuickStart according to https://netarchive.dk/suite/Quick_Start_Manual , e.g. in /home/test/netarchive.
- Add some domains to harvest using the ADMGUI, e.g. netarkivet.dk, kb.dk, statsbiblioteket.dk
- Create and run a snapshot harvest with a byte limit of 100,000
- Wait until the job is done
- Set up your browser for browsing and index your harvest job
{{{
cd /home/test/netarchive/scripts/simple_harvest/bitarchive1/filedir
export CLASSPATH=/home/test/netarchive/lib/dk.netarkivet.common.jar
ls
}}}
For example:
Arc Merge:
{{{
java dk.netarkivet.common.tools.ArcMerge 1-1-20090519073602-00000-dia-test-int-01.kb.dk.arc 1-1-20090519073602-00001-dia-test-int-01.kb.dk.arc > resulting.arc
}}}
Extract CDX:
{{{
java dk.netarkivet.common.tools.ExtractCDX 1-1-20090519083732-00002-dia-test-int-01.kb.dk.arc > output.cdx
}}}
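As a quick sanity check on the extraction, the number of index lines can be counted. The sketch below assumes that ExtractCDX writes one record per line to output.cdx, which is how CDX files are usually laid out; `count_cdx_records` is a hypothetical helper name, not part of NetarchiveSuite.

```shell
# Hypothetical helper (not part of NetarchiveSuite): count the non-empty
# lines of a CDX file, assuming one line per indexed ARC record.
count_cdx_records() {
    # grep -c prints the number of matching lines; '.' matches any non-empty line
    grep -c . "$1"
}

# Only runs when the file from the example above exists in the current directory.
if [ -f output.cdx ]; then
    echo "output.cdx holds $(count_cdx_records output.cdx) record line(s)"
fi
```

A count of 0 would suggest that the ARC file name was mistyped or that the file holds no records.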
Get Record using Lucene:
{{{
#e.g. a URI from the harvest found in your "viewerproxy"
export URI=http://netarkivet.dk/index-da.php
#settings file of the QuickStart installation, as used in the Upload and Batch examples
export SETTINGSFILE=/home/test/netarchive/scripts/simple_harvest/settings.xml
cd /home/test/netarchive/scripts/simple_harvest/cache/fullcrawllogindex
cp -r 1-cache 1-cache.unzip
cd 1-cache.unzip/
ls
gunzip *
export LUCENE_INDEX=/home/test/netarchive/scripts/simple_harvest/cache/fullcrawllogindex/1-cache.unzip
java -Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000 dk.netarkivet.archive.tools.GetRecord $LUCENE_INDEX $URI
}}}
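Before calling GetRecord it can help to confirm that the copied index really was unpacked. The following is a minimal sketch under the assumption that the only preparation needed is the `gunzip *` step above; `index_is_unzipped` is a hypothetical helper name and does not inspect the Lucene files themselves.

```shell
# Hypothetical helper (not part of NetarchiveSuite): succeed only if the
# directory exists, is not empty, and holds no files still ending in .gz.
index_is_unzipped() {
    dir=$1
    # the directory must exist
    [ -d "$dir" ] || return 1
    # it must contain at least one entry
    [ -n "$(ls -A "$dir")" ] || return 1
    # no compressed files may be left over
    [ -z "$(find "$dir" -name '*.gz' -print)" ]
}

# LUCENE_INDEX is the variable exported in the example above.
if [ -n "${LUCENE_INDEX:-}" ]; then
    if index_is_unzipped "$LUCENE_INDEX"; then
        echo "index looks ready: $LUCENE_INDEX"
    else
        echo "index missing, empty, or still gzipped: $LUCENE_INDEX" >&2
    fi
fi
```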
Upload:
{{{
cd /home/test/netarchive/scripts/simple_harvest/
cp /home/test/netarchive/scripts/simple_harvest/bitarchive1/filedir/resulting.arc new_resulting.arc
java -Ddk.netarkivet.settings.file=/home/test/netarchive/scripts/simple_harvest/settings.xml -Dsettings.common.remoteFile.port=5000 -cp /home/test/netarchive/lib/dk.netarkivet.archive.jar dk.netarkivet.archive.tools.Upload new_resulting.arc
#just press <CTRL-C> to stop the job
}}}
Batch, e.g. with checksum:
{{{
cd /home/test/netarchive
mkdir batchprogs
#copy attached example batchprog ChecksumJob.java to batchprogs/.
cd batchprogs
javac ChecksumJob.java
#return to /home/test/netarchive so that the relative paths below resolve
cd ..
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.remoteFile.port=5000 -Ddk.netarkivet.settings.file=/home/test/netarchive/scripts/simple_harvest/settings.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/ChecksumJob.class -Ooutput.checksum
}}}