Task list and timetable for iteration 43

Status

OK/Not Ok

1. Highlights approved

OK

2. Assignment of tasks

OK

3. Task list and timetable approved

OK

4. Implementation phase started

OK

5. Release test phase started

6. Assignment phase for next iteration started

7. Iteration 43 completed

Highlights for Iteration

Development procedure

Table of tasks

Tasks for iteration 43. Updated 12. May 2010

Estimate md

Main responsible

Reviewer

Remaining md at 12. May 2010

Comments

Status

Implementation phase (task x-n)

Open Source release + bugs and feature requests

Total 3

-

-

Total 3

-

Support of Open Source Release

1. [http://kb-prod-udv-001.kb.dk/twiki/bin/view/Netarkiv/SupportNetarchiveSuite Support] of released NetarchiveSuite

2

All (Google calendar)

2

Ongoing

2. Implement translation process. Adjustment to Open Source partners.

1

CSR

SVC

-

3. Maintain French Translation files.

1

Nicolas/Sara

SVC

See also Task 22

-

4. Maintain Italian and German Translation files.

1

Andreas/Eleonora

SVC

See also Task 22

-

Bugs and Feature requests

Prioritized bugs according to [https://gforge.statsbiblioteket.dk/tracker/index.php?group_id=7&atid=105 list] of priority 4 and priority 3 tasks.

Total 5

-

-

SubTotal 0

..

-

Priority 5 bug

5. Module harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1856 Bug 1856] Schedule problem after first start on NAS 3.10.0. No schedule started

1

CSR

SVC

Bug seemingly fixed with patch release 3.10.2

Fixed

6. Module monitor?: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1895 Bug 1895] Running checksum gives Garbage Collector OutOfMemoryError and schedule stops

1

SVC

CSR

Max heap for Bitarchive monitors raised to 1936MB in prod. Awaiting upgrade of Java in production environment

Priority 4 bugs

7..

.

Priority 3 bugs

8. Module harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=688 Feature request 688] hosts-report should be IDNA decoded when writing harvestInfo to the DB

-

9. Module Access: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=823 Bug 823] No index = Internal server error

10. Module Monitor: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1756 Bug 1756] JMX status page does not update when a new application is started on previously used JMX port

11. Module Archive: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1782 Bug 1782] Same datetime repeated many times, while logging batch checksum of files

12. Module Documentation: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1779 Bug 1779] Improve documentation of the additional tools

13. Module Archive: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1708 Bug 1708] bitpreservation logic offers "add to archive" for file that is not in either location

14. Module Documentation: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1732 Bug 1732] LocalArcRepositoryClient not documented

Fixed in System Design 3.12 manual

Fixed

15. Module Archive: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1260 Bug 1260] Too much and wrong feedback information on "Missing pages"

16. Module Monitor: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1205 Bug 1205] Security policy for unit tests contains hardcoded path to development environment

17. Module Archive: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1193 Bug 1193] Exceptions from FileBatchJob stop batch job processing

..

Prioritized Feature Requests according to [:TaskTableFromMay2009Workshop:list] of priority 4 and priority 3 tasks

Total 21

-

-

SubTotal 21

-

Priority 4 Feature request

18. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1696 Feature request 1696] Ingest domain seed URLs

?

Nicolas

SVC

19. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1688 Feature request 1688] Monitoring broad crawls.

?

Nicolas/Sara

SVC

http://kb-prod-udv-001.kb.dk:8060/cru/NS-152

Review and followup done

20. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1134 Feature request 1134] Filter job lists by category

?

Nicolas/Sara

CSR

http://kb-prod-udv-001.kb.dk:8060/cru/NS-151

Review and followup done

21. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1668 Feature request 1668] Paginate and make sortable and searchable the list of jobs

?

Nicolas/Sara

CSR

http://kb-prod-udv-001.kb.dk:8060/cru/NS-151

Review and followup done

21a. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1924 Feature request 1924] Allow searching for a domain in active jobs (in case of webmaster complaint)

?

Nicolas/Sara

http://kb-prod-udv-001.kb.dk:8060/cru/NS-151

Review and followup done

21b. Common: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1925 Feature request 1925] PostgreSQL connectivity (using the PostgreSQL driver version 8.4 - JDBC 4)

?

Nicolas

http://kb-prod-udv-001.kb.dk:8060/cru/NS-151

Review and followup done

21c. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1926 Feature request 1926] Ability to disable the inactivity check

?

Nicolas

http://kb-prod-udv-001.kb.dk:8060/cru/NS-152

Waiting for review

21d. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1927 Feature request 1927] Delay job end to allow Heritrix report generation

?

Nicolas/Sara

http://kb-prod-udv-001.kb.dk:8060/cru/NS-152

Review and followup done

21e. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1928 Feature request 1928] Ability to easily resubmit a selection of failed jobs

?

Nicolas/Sara

?

?

21f. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1929 Feature request 1929] 15 second level TLD related to the .fr and .re domains

?

Nicolas/Sara

Fixed

21g. Module Harvester: [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1930 Feature request 1930] Ability to implement a different crawl control loop via HeritrixLauncher / new Heritrix JMX controller

?

Nicolas

http://kb-prod-udv-001.kb.dk:8060/cru/NS-152

Review and followup done

22. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1813 Feature request 1813] An extra resubmit button to make it visible which jobs have already been handled

?

SVC

CSR

.

23. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=929 Feature request 929] Documentation needed for how we split jobs (incl. maybe additional splitting modularity)

?

SVC

CSR

24. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1774 Feature request 1774] Stop using the JMS queues for queuing snapshot harvests

?

SVC

CSR

25. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1896 Feature request 1896] Crawl of password protected FTP-sites

2

SVC

CSR

High Priority

-

25a. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1951 Feature request 1951] Upgrade to Heritrix 1.14.4

2

SVC

CSR

High Priority

-

Priority 3 Feature request

26. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1774 Feature request 1774] Stop using the JMS queues for queuing snapshot harvests

-

27. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1681 Feature request 1681] Add seed to DB via webservice (via Browser Extension/Rich Client)

Andreas

-

28. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1682 Feature request 1682] Statistics (DB access, scripts, batch jobs ....)

Andreas

-

29. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1683 Feature request 1683] Util for regenerate admin.data file

Andreas

-

30. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1684 Feature request 1684] Activity when domain is to be crawled. One table for seed

Andreas

-

31. Module Archive:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1743 Feature request 1743] When accessing Bitpreservation this takes really long time

Andreas

-

32. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1120 Feature request 1120] Crawlertrap info should be shareable between institutions

Andreas

SVC will add comments to this FR. An easy solution might be to share crawler traps by emailing files with crawler trap information.

Redundant (Copy of ??)

33. Module Harvester:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1066 Feature request 1066] Show whether seed URL existed

Andreas

-

34. Module Archive:[https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1809 Feature request 1809] Write assignment for improving batchjob interface

JOLF

High priority

In progress

Roadmap tasks

Total 52?

-

-

Total 8,5

-

Tasks from ...

35. QA: Assignment for enhanced QA tools

2

SVC

CSR

High priority

In progress

36. WARC: Finalize [:AssignmentHarvester2:Assignment] for Harvester for support of WARC format

?

37. Archive: Implement [:AssignmentGroupB2:Assignment B.2.3] - Use segments in bitarchives

6

38. Archive: Implement [:AssignmentGroupB2:Assignment B.2.4] - Write BitPreservation scheduler

5

39. Archive: Implement [:AssignmentGroupB2:Assignment B.2.5] - Write BitPreservation webinterface

6

-

..

..

[http://netarkivet.dk/netarkivet/index.php?title=Kendte_problemer Crawl-problems] (Netarchive.dk) .

Total x

-

-

Total x

-

Focus on following crawl-problems

40. [http://netarkivet.dk/netarkivet/index.php?title=Dinby.dk dinby.dk] 2009-02-17

1

CSR

SVC

1

High priority

..

41. [http://netarkivet.dk/netarkivet/index.php?title=Kino.dk Kino.dk] 2009-03-25

1

CSR

SVC

1

High priority

Awaiting review

42. [http://netarkivet.dk/netarkivet/index.php?title=Webmuseum.re-cph.com Webmuseum.re-cph.com] 2009-08-04

1

CSR

SVC

1

High priority

Awaiting review

43. [http://netarkivet.dk/netarkivet/index.php?title=Epn.dk Epn.dk] 2009-08-30

1

CSR

SVC

1

High priority

..

44. [http://netarkivet.dk/netarkivet/index.php?title=statstidende.dk Statstidende.dk]

0.5

CSR

SVC

High priority

Awaiting review

45. [http://netarkivet.dk/netarkivet/index.php?title=seoghoer.dk seoghoer.dk]

0.5

CSR

SVC

High priority

Awaiting review

45.b http://netarkivet.dk/netarkivet/index.php?title=Berlingske.dk

2

CSR

High priority

Awaiting review

46. [https://gforge.statsbiblioteket.dk/tracker/index.php?func=detail&aid=1881 Feature Request 1881] Quality assurance through a batchjob interface

0.5

JOLF

CSR

High priority

In progress

Wayback/Nutchwax tasks independent of NetarchiveSuite code-freeze.

Total x

-

-

Total x

-

Tasks from ...

47. Wayback: Implement Indexer

11

CSR

SVC

High priority

Awaiting Review

48. Wayback: Implement Index Aggregator

4

MKS

JOLF

High Priority

58. Wayback: Documentation of Indexer/Aggregator

2

CSR

JOLF

1

High priority

-

59. Wayback deploy: If not possible with Jetty then test of Wayback deploy using Tomcat

2

JOLF

CSR

2

High priority

-

..

Converting old Web collections to Netarchive.dk. See [http://udvikling.kb.dk/cvsshadow/digiliv/ProjektDokumenter/omkostninger%20ved%20indsamling%20af%20gammelt%20materiale-3.doc proposal]. These tasks will be independent of NetarchiveSuite code-freeze.

Total x

-

-

Total x

-

Tasks from ...

60. Old Web collection: Old KB Webarchive

SVC

JOLF

High priority

In progress

61. Old Web collection: Old Webarchive from Niels Brugger collected by HTTrack

HBK

SVC

High priority

In progress

62. Old Web collection: Prepare ingest of extracted data from Internet Archive into Netarkivet.dk

SVC

HBK

Wait for IA correction

63. Old Web collection: Ingest received data from Internet Archive into Netarkivet.dk

CLO

SVC

-.

Common tasks calculated as implementation tasks

Total x

-

-

Total x

-

Others

Total x

-

-

SubTotal 2

-

64. Upgrade: New KB-PROD-UDV

5

SVC

TLR

-

65. Batch: Create/execute a batch test script specified by 1 or 2 researchers

2

JOLF

TLR

-

..

..

-

..

..

Prepare release test

Total x

-

-

SubTotal 12

-

66. Prepare [http://netarchive.dk/suite/Iteration43Releasetest release test]

6

6

OK

Available man-days for implementation phase

Total x

-

-

Total x

-

Release test phase (task ...)

Release test

Total x

-

-

Total 12

-

67. Execute [http://netarchive.dk/suite/Iteration43Releasetest release test].

12

TLR

All

12

Started


..

Release notes

Total x

-

-

Total 0,5

-

68. Write release note

0,5

SVC

Awaiting end of code freeze

Available man-days for release test phase

Total x

-

-

Total 10

-

Assignment phase for next iteration (task ...)

69. Component bug/feature fix/management

QA

..

70. Define goals for [http://netarchive.dk/suite/Iteration44TaskList Iteration 44 task list]

CHH

..

71. Presentation of goals and tasks for Iteration 43. Achieve a common understanding of the purpose of the iteration and of each task at the status meeting

SVC

..

72. Assignment of tasks, bugs and feature requests

QA

..

73. Update release test procedure

TLR

..

Available man-days for assignment phase

Total x

-

-

Total 22

-

Timetable

Timetable iteration 43. Updated 4. May 2010

Start time

End time

Responsible

Baseline 4. May 2010. Start time

Baseline 4. May 2010. End time

1. Implementation of decided tasks

4. May 2010

31. May 2010

4. May 2010

31. May 2010

2. Code freeze. Create the build for release test and notify when build is ready

1. June 2010

SVC

1. June 2010

3. Release test

1. June 2010

3. June 2010

TLR

1. June 2010

3. June 2010

4. Code unfreeze

4. June 2010

SVC

4. June 2010

5. Assignments, bug components and bug fixes

2. June 2010

3. June 2010

2. June 2010

3. June 2010