Review (NS-88): B.2.2 changes

Author

Jonas

Moderator

Jonas

State

Review

Objectives

The checksum archive is not properly handled by the FilebasedActiveBitPreservation.
This does not handle the new files for B.2.2b (The new files in dk.netarkivet.archive.bitpreservation).
Please ensure that the FilebasedActiveBitPreservation is not called directly any more. Instead the ActiveBitPreservationFactory should be used for initializing any instance of ActiveBitPreservation.

Total Time Used (Coding,Documentation,Review):

Time use (Coding,Documentation,Review)
KFC: 2.0
JOLF: 20.0
dk.netarkivet.common.utils.DBUtils + dk.netarkivet.common.distribute.arcrepository.TrivialArcRepositoryClient -> Follow up by KFC
The rest -> Follow up by JOLF

General comments:

Description

Classification

Status

dk.netarkivet.archive.webinterface.BitpreserveFileState uses FilebasedActiveBitPreservation directly 4 times.

Minor

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/ChecksumFileServer.java', revision 1006

Lines

Description

Classification

Status

39

Java-doc the class

Cosmetic

OK

45

No reason for "...getName()"

Cosmetic

OK

82

This has been decided! A ChecksumReplica use a THE_CR channel.

Cosmetic

OK

88

I think not. It is only used to know which archives to wait for batch replies from

Cosmetic

OK

94

The checksum archive is defined in the ChecksumServerAPI interface. This should at least be described somewhere. Put 'cs' in this class instead.

Minor

OK

117

remove

Cosmetic

OK

127

X (and the others)

Cosmetic

OK

194

Not a record, an arc-file entry. In this case a line in the file of the FileChecksumArchive.

Cosmetic

OK

247

Unnecessary emtpy line

Cosmetic

OK

250

A very poor warning message. Could be other errors/exceptions. Put 'msg' into the log.

Cosmetic

OK

272

Put 'msg' into the log.

Cosmetic

OK

289

Spelling error: 'recieving', not 'receving'. Also make 'get all checksum message' into the name of the message. NO: receiving!

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/settings.xml', revision 940

Lines

Description

Classification

Status

59

This class needs to be moved to common, now

Minor

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/ChecksumServerAPI.java', revision 937

Lines

Description

Classification

Status

13

Perhaps the name should be ChecksumArchiveServerAPI (if API is retained)?

Cosmetic

OK

13

Could this share a common superinterface with BitArchiveServer?

Cosmetic

Postpone

13

API is usually considered bad practice in interface names

Cosmetic

OK

16-19

REMOVE

Minor

OK

30-41

java-doc

Cosmetic

OK

36-37

It is unclear to me that these need to be part of the public API

Minor

OK

43-47

remove!

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/GetAllChecksumMessage.java', revision 992

Lines

Description

Classification

Status

39-40

The output is a file, which corresponds to the reply file from a ChecksumJob.

Cosmetic

OK

39

Should it say "all the files?"

Cosmetic

OK

44

Shouldn't it be "GetAllChecksumsMessage

Cosmetic

OK

46

As above: Don't use

Cosmetic

OK

118

Why implement it if only calls super.toString()?

Cosmetic

OK

121

Write java-doc, or add @Override. Or both? Just java-doc.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/distribute/JMSArcRepositoryClient.java', revision 1006

Lines

Description

Classification

Status

505

I strongly dislike that we have different methods for doing this on bitreplicas and checksumreplicas. It seems to break the abstraction of bitarchives badly.

Minor

Postpone

522-523, 537, 539-543

We should generify the Synchronizer to be Synchronizer<T extends NetarkivetMessage> and handle the classcasting in there

Cosmetic

Postpone

532-533

That's not an empty list or indeed file. That's null. Update javadoc, if null is possible

Cosmetic

OK

547

This should be placed in the tempDir from the settings: 'settings.common.tempDir'.

Cosmetic

OK

552-553

Remove double space (end of first line and first of second line).

Cosmetic

OK

563

As above

Minor

Postpone

571

Would it be possible to extract a supermethod for these two? It seems like a lot of duplicate code. Possibly for batch jobs too

Cosmetic

Postpone

605

This should be placed in the tempDir from the settings: 'settings.common.tempDir'.

Cosmetic

OK

618-619

Comment this function! checksum must be the 'wrong checksum' of the bad file.

Cosmetic

OK

626-629

What's the relationship between "remove and get one file" and "correct"? Should "remove and get one file" be deprecated (I would like that, that was the original design)?

Minor

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/bitarchive/distribute/BitarchiveClient.java', revision 959

Lines

Description

Classification

Status

General

Please do not destroy our file header formatting.

Cosmetic

OK

43

Needs a '.' at the end.

Cosmetic

OK

48

No need for ".getName()"

Cosmetic

OK

185-186

Why is there no end paranthesis?

Cosmetic

OK

214-257

Handle these checksum replica messages. The current handling is not appropriate. Change the handling of these methods

Minor

Postpone

260

No reason to write "Method for..." in javadoc, that's implicit. Javadoc should describe how the method can be used.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/common/distribute/arcrepository/TrivialArcRepositoryClient.java', revision 937

Lines

Description

Classification

Status

General

I think this class needs to be deprecated or moved to test. LocalArcRepositoryClient does a much more sensible job, and they seem to have the same aim

Minor

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/FileChecksumArchive.java', revision 937

Lines

Description

Classification

Status

General

Make a 'removedChecksum' file, which contains the wrong checksum entries from the correct method.

Minor

Postpone

51

Is this really a good idea? I know the checksumjobs already do this, but I feel unfomfortable with the fact that we can probably break the archive by uploading a file with a "##" in it's name. Make a FR to do something better

Cosmetic

OK

65

.getName is unneccessary

Cosmetic

OK

91

Perhaps use the word "singleton"?

Cosmetic

OK

105

What does it do? Especially: It actually seems to throw exceptions

Cosmetic

OK

110-119

The FileChecksumArchive checks for minimum space left, just like the Bitarchive. Is this even necessary, when each entry has a size smaller than 100 bytes? At least do not use the minimum space left from the Bitarchives.

Minor

Postpone

124

This file should be placed in a archive directory. Not the installation folder. Make setting for folder name, and make checksum file name contain the replica id.

Minor

Postpone

174-182

'after all' ... What should that mean in this context? And change to 'file.isFile()'.

Cosmetic

OK

217

seconf? change to 'second'.

Cosmetic

OK

219

Eeeeek! Possible IndexOutOfBounds exception

Minor

OK

229

WRONG. IOFailure only if problems with the checksum file.

Cosmetic

OK

231

This method seems horribly, horribly inefficient. We have room for the current archive in memory, so use that method for now, and remember to use a journal to persist the checksum archive

Minor

Postpone

290

Eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeek! This method is unsynchronized and writes to a file, _and_ it is called from a threaded context (by JMS)

Major

Postpone

300

What to do if the archive entry already exists? Should the upload be considered a success if they have the same checksum? And a failure if that is not the case? It is a success!

Cosmetic

OK

307

Describe why the 'true' is given as an argument. It means 'append' instead of override.

Cosmetic

OK

317-320

Why not add a newline to the entry we write instead?

Cosmetic

OK

325-326

indentation

Cosmetic

OK

377, 396

Something tells me we have a discepancy here - the MD5 utility method should probably both throw IOFailure, in which case these methods seem unneccessary?

Cosmetic

Postpone

415-474

This function has not a good way of handling the file. If the file is too large, then there will occur memory problems. Perhaps creating a new file with the same content, but where the bad entry has been removed. And then removing the old file and moving the 'new and corrected' file to the correct location in the file system. Change to a 'correct' method. And synchronize.

Major

Postpone

415-416

This method seems to irrevocably remove stuff from the checksum archive, with no way to even manually restore it. That is very scary. We should probably store the removed checksum somewhere as a record (the log is not a proper place, it is recycled too often)

Minor

Postpone

416

Eeek! Again a method that writes to the file without synchronizing

Major

Postpone

449

If this write fails we loose the entire archive. We need to write to a new file, and move on top of the old one

Major

Postpone

464

I'm more than a little scared that the file is not closed in a finally block

Minor

Postpone

486

Place the file in the tempDir according to the settings: settings.common.tempDir;

Cosmetic

OK

488

Why copy the file? There is a method to make a remote file that doesn't erase the original. Think of synchronization while copying

Cosmetic

Postpone

492, 525

No message?

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/common/distribute/arcrepository/LocalArcRepositoryClient.java', revision 937

Lines

Description

Classification

Status

342-359

Implement!

Minor

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/GetAllFilenamesMessage.java', revision 1006

Lines

Description

Classification

Status

39-40

Write that this is the Checksum replica equivelent to the batchjob FilelistJob.

Cosmetic

OK

43-44

remove. We do not use uid

Cosmetic

OK

85

No message ?

Cosmetic

OK

110-113

That doc doesn't really make sense in the implementation

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/ChecksumClient.java', revision 992

Lines

Description

Classification

Status

52-66

This should be java-doc, not just comment!

Cosmetic

OK

64

Please, no _ in field names

Minor

OK

66

Consider removing this field

Cosmetic

OK

71, 73, 87, 89

These aren't the parameter actually in the method

Cosmetic

OK

71, 73, 78

No _ in param names, please

Cosmetic

OK

87, 89, 95

As above

Cosmetic

OK

113

What about checksum param

Cosmetic

OK

115

Not used

Cosmetic

OK

116

A previous design decision, which I prefer, is that you cannot correct an old file, without knowing the previous checksum, which you are replacing.

Minor

OK

136

X

Cosmetic

OK

149

It should be 'through' not 'though'.

Cosmetic

OK

168

X

Cosmetic

OK

186

X

Cosmetic

OK

187-188

Java-doc what is returned

Cosmetic

OK

208

X

Cosmetic

OK

221

X

Cosmetic

OK

234

X

Cosmetic

OK

235, 245, 254, 263, 272

DOC!

Cosmetic

OK

239

remove auto comment.

Cosmetic

OK

240-241

NotImplemented seems to indicate that it will be implemented later. Is IllegalState better?

Cosmetic

OK

248

remove auto comment.

Cosmetic

OK

257

remove auto comment.

Cosmetic

OK

266

remove auto comment.

Cosmetic

OK

272

It seems to me this should be removed in favour of correct(...)

Cosmetic

Postpone

275

remove auto comment.

Cosmetic

OK

282

remove auto comment.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/bitpreservation/ActiveBitPreservationFactory.java', revision 940

Lines

Description

Classification

Status

9-12

Wouldn't it be better just to name the interface that must be used, and let people find implementing classes from that? This is bound not to be kept updated

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/common/utils/DBUtils.java', revision 937

Lines

Description

Classification

Status

90

Perhaps it should be updated to take database as parameter?

Minor

OK

313

Again, that must be deprecated, or this method cannot be in the common module

Minor

OK

935

-do-

Minor

OK

Comments on file 'trunk/src/dk/netarkivet/common/distribute/Channels.java', revision 1006

Lines

Description

Classification

Status

100

This one looks a little scary - it seems to indicate that order of the list is significant, and somehow relates to order of other lists. That seems to indicate that we need a better modelling.

Minor

Postpone

126, 134

We do not want any nulls in our system. Change to the Checksum replica identification channel! Remodel!

Minor

Postpone

139, 147

We do not want any nulls in our system. Change to the Checksum replica identification channel! Remodel!

Minor

Postpone

152, 160

We do not want any nulls in our system. Change to the Checksum replica identification channel! Remodel!

Minor

Postpone

165, 173

We do not want any nulls in our system. Change to the Bitarchive replica identification channel! Remodel!

Minor

Postpone

277, 282-289

Handle this differently. The ChannelID should not be able to be 'null', but if the current replica in UseReplicaId is a checksum replica, then the channel in THE_BAMON will be the identification channel for the checksum replica, e.g. THE_CR. If this method is called and the channel refers to the checksum replica identification channel, then a warning should be issued.

Minor

Postpone

319, 324-327

Handle this differently. The ChannelID should not be able to be 'null', but if the current replica in UseReplicaId is a bitarchive replica, then the channel in THE_CR will be the identification channel for the bitarchive replica, e.g. THE_BAMON. If this method is called and the channel refers to the bitarchive replica identification channel, then a warning should be issued. Remodel!

Minor

Postpone

358, 363-366

Handle this differently. The ChannelID should not be able to be 'null', but if the current replica in UseReplicaId is a checksum replica, then the channel in ALL_BA will be the identification channel for the checksum replica, e.g. THE_CR. If this method is called and the channel refers to the checksum replica identification channel, then a warning should be issued. Remodel!

Minor

Postpone

399, 404-407

Handle this differently. The ChannelID should not be able to be 'null', but if the current replica in UseReplicaId is a checksum replica, then the channel in ANY_BA will be the identification channel for the checksum replica, e.g. THE_CR. If this method is called and the channel refers to the checksum replica identification channel, then a warning should be issued. Remodel!

Minor

Postpone

445

bamon should not be able to be 'null'. Remodel!

Minor

Postpone

461-462

cr should be able to be 'null'. Remodel!

Minor

Postpone

473

Remove this commented comment

Cosmetic

OK

485

Make ArgumentNotValid check on 'channelName'. And describe in Java-doc.

Cosmetic

OK

496-497

Change this error message: "The current channel name, '" + channelName + "' does not refer to an identification channel". Also log the error before issuing it.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/bitpreservation/FileListJob.java', revision 937

Lines

Description

Classification

Status

121

What's the purpose of this method? I think it confuses more than it helps, especially when it takes the outputFile as input, or change to be non-static and use field

Cosmetic

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/distribute/ReplicaClientFactory.java', revision 1006

Lines

Description

Classification

Status

45-48

Please remove _ in methodnames Also, this seems to emphsize the problem about needing a datastructure for this.

Cosmetic

Postpone

57

This needs to be changed, since the channels are not to contain nulls. Use instead the ReplicaType to determine the type of replica. The list of ReplicaIds can be extracted by using the function Channels.getUsedReplicaIds, and through the ReplicaIds the Replica can be found and thus the ReplicaType can be extracted. A faster way to determine the replica type is to test whether ALL_BA, ANY_BA or THE_BAMON refers to the same channel, which is only the case for the ChecksumReplica. Though this way is a ugly hack. Remodel!

Minor

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/ArchiveSettings.java', revision 940

Lines

Description

Classification

Status

143-150

Java-doc!

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/distribute/ReplicaClient.java', revision 959

Lines

Description

Classification

Status

53

java-doc

Cosmetic

OK

140

Doc

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/bitpreservation/FileBasedActiveBitPreservation.java', revision 939

Lines

Description

Classification

Status

40

I'm a little scared that this import is added - it seems some separation of responsibility has been broken. Ah. It is unused

Cosmetic

OK

295, 304

A replica should generally not be called 'ba'. Though it might be all right in this function, just describe why.

Cosmetic

OK

303

Change the name to indicate that the checksum are retrieved from a Bitarchive and not a checksum archive.

Cosmetic

OK

346-376

Indentation

Cosmetic

OK

388, 394

Does this only handle bitarchives? change to replica or rep.

Cosmetic

OK

502

Check whether Bitarchive. Don't assume anything. Change to enum switch

Cosmetic

Postpone

554

Again refering to one replica type only. Change to replicaChecksumSet.

Cosmetic

OK

653

Check whether Bitarchive. Don't assume anything. Change to enum switch

Cosmetic

Postpone

815, 821

Bitarchive? A checksum archive can also have damaged entries. Change to damagedReplica.

Cosmetic

OK

831

It needs to be ensured that the reference replica is a Bitarchive and not a checksum archive. Otherwise it can cause ugly problems. Change to enum switch

Cosmetic

Postpone

874, 881

again: damagedReplica.

Cosmetic

OK

912, 916

ba means BitArchive. This should also handle Checksum archives. Change to 'rep' or 'replica'.

Cosmetic

OK

957, 961

Describe why this function only handles bitarchives. (cannot get a file from a checksum archive)

Cosmetic

OK

981-984

Why is there a function which has not been implemented?

Minor

Postpone

993-996

Why is there another function which has not been implemented?

Minor

Postpone

1005-1008

Why has this function also not been implemented?

Minor

Postpone

1023

Do not use '.equals("")', change to '.isEmpty()'.

Cosmetic

OK

1031

'l' is a bad name for a replica. change to rep.

Cosmetic

OK

1040

Handle this todo!

Cosmetic

Postpone

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/distribute/ArcRepositoryServer.java', revision 962

Lines

Description

Classification

Status

176-177

The ReplicaClient should not be named 'bc', since it stands for BitarchiveClient. It should be remaned to 'rc'.

Cosmetic

OK

199

The ReplicaClient should not be named 'bc', since it stands for BitarchiveClient. It should be remaned to 'rc'.

Cosmetic

OK

221-222

The ReplicaClient should not be named 'bc', since it stands for BitarchiveClient. It should be remaned to 'rc'.

Cosmetic

OK

234, 255

Would it be a good idea to turn this into a batch job in the case of it not being a checksum replica? What happens now if I use this method on a bitarchive replica? There seems to be no logic for handling it?

Minor

Postpone

243

The ReplicaClient should be rename 'rc', since 'cc' refers to a ChecksumClient.

Cosmetic

OK

253-257

The message 'msg' is not mentioned in the java-doc

Cosmetic

OK

261

The ReplicaClient should be rename 'rc', since 'cc' refers to a ChecksumClient.

Cosmetic

OK

278

What is the '@Override' policy? Do we use override or do we not. Send mail on mailing list.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/GetChecksumMessage.java', revision 1006

Lines

Description

Classification

Status

38

Of course this should be sent to a specific replica! This means that the message should have the replica id as a variable and a constructor argument.

Minor

Postpone

41-42

Are we using serialVersionUID? If not, then this should be deleted.

Cosmetic

OK

81-82

HANDLE! Document that a null means that the file was not found.

Cosmetic

OK

92

ArgumentNotValid check

Cosmetic

OK

111

Add checksum and replica id

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/ChecksumArchiveAPI.java', revision 937

Lines

Description

Classification

Status

31-39

Add a @see dk.netarkivet.archive.checksum.FileChecksumArchive, and eventually also the DatabaseChecksumArchive, when it has been implemented.

Cosmetic

OK

40

Shouldn't this have a common superclass with an API for BitArchives? Also, it is often considered bad practice to use words like "API" and "Interface" in interface names

Cosmetic

OK

56

Why would I need the checksum for that? I can understand that you need the wrong checksum, to make sure you only remove what you expect Make method "correct" not "remove"

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/checksum/distribute/CorrectMessage.java', revision 944

Lines

Description

Classification

Status

10

This message should correct a bad entry in the archive, whether bad line in ChecksumArchive or a bad file in BitArchive. Not just 'Something'!

Minor

OK

14-19

Need java-doc.

Cosmetic

OK

36-43

Shouldn't this function be replaced by the same kind of 'getData()' function as in GetAllChecksumMessage or GetAllFilenamesMessage?

Cosmetic

OK

54-62

Doesn't really make sense on impl.

Cosmetic

OK

Comments on file 'trunk/scripts/sql/createBitpreservationDB.sql', revision 941

Lines

Description

Classification

Status

Comments on file 'trunk/src/dk/netarkivet/common/distribute/arcrepository/Replica.java', revision 1006

Lines

Description

Classification

Status

247

This should not be part of the 'return' statement in the java-doc.

Cosmetic

OK

249

Rename of function. It should be called getIdentificationChannel. The current name could easily be understood as the id of the channel, e.g. the channel-name.

Cosmetic

OK

Comments on file 'trunk/src/dk/netarkivet/archive/arcrepository/ArcRepository.java', revision 1006

Lines

Description

Classification

Status

General

There is a lot of references to BitArchiveStoreState. This class should be renamed to ReplicaStoreState, since it also deals with ChecksumArchives.

Cosmetic

OK

74

Describe that batch jobs are only executed on bitarchive replicas.

Cosmetic

OK

169-173

Again a strong indication that we really need a datastructure here.

Minor

Postpone

182-184

Make check for checksum replica, and if that is not the case, then throw an exception. Actually, use a switch on the enum, so we get compile time checking of all cases handled.

Minor

Postpone

222, 225-227, 229-245

These methods seem unnecessary, since they only have one statement, that has a rather obvious effect

Cosmetic

OK

248-253, 255-257

Make new javadoc

Minor

OK

260

throws also ArgumentNotValid

Cosmetic

OK

265

This should not be a warning! Change back to 'log.info'.

Cosmetic

OK

346-379

REMODEL!

Minor

Postpone

362, 368, 371

Use enum switch

Minor

Postpone

436

NOT BITARCHIVES!!! It is called replicas when it also handle the checksum replicas.

Cosmetic

OK

465, 470, 472

This handles both kinds of replica, and should therefore not only refer to Bitarchives.

Cosmetic

OK

482

Stranger things happen at sea :) I vote we just remove the TODO, returning true is really the only sensible thing we can do

Cosmetic

OK

492, 497, 499

Why does this only refer to BitArchive replicas, when the function also handles Checksum replica?

Cosmetic

OK

519, 522

REPLICA CLIENT!!! it handles both kinds of replicas now!

Cosmetic

OK

534-536

Why doesn't Replica.getReplicaFromId throw UnknownID rather than return null? Then we could kill this method.

Cosmetic

Reject

542-559

Perhaps this method should be moved to Channels? I don't really like that we parse the channel names.

Cosmetic

Postpone

583-586

Change to HTML-tags. The current numeration looks bad in java-doc.

Cosmetic

OK

602

Describe why this function is called.

Cosmetic

OK

719

Consider: Is there any way we can avoid talking about messages in the ArcRepository class, and move that abstraction to the ArcRepositoryServer class? In that case, we could probably handle responses from batch jobs and checksum replies in the same message, rather than individually for checksum- and bitreplicas

Cosmetic

Postpone

728-730

Incorrect indentation.

Cosmetic

OK

842

Where does this end up? Will it be a reply with store failed? It certainly should be!

Minor

Postpone

896-897

The String is not called replicaId, but replicaChannelName!

Cosmetic

OK

982, 987

It is not the replica id but the identification channel of the replica.

Minor

OK

1073

The method in the message should be changed to getReplicaId().

Cosmetic

OK

Comments on file 'trunk/webpages/BitPreservation/Bitpreservation-filestatus-checksum.jsp', revision 975

Lines

Description

Classification

Status

IssuesFromNs88 (last edited 2010-08-16 10:24:42 by localhost)