Revision 1 as of 2008-03-03 10:35:13
Articles about web archiving in general
[http://library.wellcome.ac.uk/assets/WTL039229.pdf Collecting and preserving the World Wide Web: A feasibility study undertaken for the JISC and Wellcome Trust], Michael Day, UKOLN, University of Bath, Version 1.0, 25 February 2003
Articles about NetarchiveSuite
[http://netarchive.dk/publikationer/nhc-kb-dk-msst2006.pdf A formal analysis of recovery in a preservational data grid], Niels H. Christensen, Royal Library of Denmark, Dept. of Digital Preservation & Netarkivet.dk, presented at MSST2006, the 14th NASA Goddard - 23rd IEEE Conference on Mass Storage Systems and Technologies, May 15-18, 2006, College Park, Maryland, USA
Articles about harvesting
[http://www.dia.uniroma3.it/~vldbproc/017_129.pdf Crawling the Hidden Web], Sriram Raghavan and Hector Garcia-Molina, Computer Science Department, Stanford University, Stanford, CA 94305, USA, Proceedings of the 27th VLDB Conference, Roma, Italy, 2001
[http://nicomedia.math.upatras.gr/courses/mnets/mat/Lawrence&Giles_AccessibilityOfInformationOnTheWeb.pdf Accessibility of information on the web], Steve Lawrence and C. Lee Giles, Nature, Vol 400, 8 July 1999
[http://www10.org/cdrom/papers/102/index.html Effective Web Data Extraction with Standard XML Technologies], Jussi Myllymaki, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA, WWW10, May 2-5, 2001, Hong Kong
[http://kushmerick.org/nick/research/download/kushmerick-odbase2003.pdf Learning to invoke Web forms], Nicholas Kushmerick, Computer Science Department, University College Dublin, Ireland, Proc. Int. Conf. Ontologies, Databases & Applications of Semantics (Catania, 2003).
[http://dx.doi.org/10.1016/j.ijar.2003.07.002 Information retrieval in the Web: beyond current search engines], Ricardo Baeza-Yates, Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada, 2120, Santiago, Chile, International Journal of Approximate Reasoning, Volume 34, Issues 2-3, November 2003, Pages 97-104
[http://dx.doi.org/10.1016/S1389-1286(02)00213-X Search engines and Web dynamics], Knut Magne Risvik, Rolf Michelsen, Fast Search and Transfer ASA, Stoperigata 2, P.O. Box 1677, Vika NO-0120 Oslo, Norway, Computer Networks, Volume 39, Issue 3, 21 June 2002, Pages 289-302
[http://dx.doi.org/10.1016/S0098-7913(02)00184-3 Invisible Web: Uncovering Information Sources Search Engines Can't See], Chris Sherman and Gary Price, NJ: Information Today, Inc., 2001. 439 p. $29.95. ISBN 0-910965-51-X
[http://portal.acm.org/citation.cfm?id=779235 High-performance web crawling], Marc Najork, Compaq Computer Corporation Systems Research Center, Palo Alto, CA and Allan Heydon, Model N, Inc., South San Francisco, CA, Handbook of massive data sets, pages 25 - 45, 2002, 1-4020-0489-3
Alias detection
[http://lis.sagepub.com/cgi/reprint/31/1/21 Classifying Web sites and Web pages: the use of metrics and URL characteristics as markers], Wallace C. Koehler, Jr., University of Oklahoma, Journal of Librarianship and Information Science, Vol. 31, No. 1, 21-31 (1999)
[http://portal.acm.org/ft_gateway.cfm?id=511484&type=pdf&coll=GUIDE&dl=GUIDE&CFID=57533127&CFTOKEN=37490285 Aliasing on the world wide web: prevalence and performance implications], Terence Kelly, University of Michigan, Ann Arbor, MI and Jeffrey Mogul, Compaq Western Research Lab, Palo Alto, CA, Proceedings of the 11th international conference on World Wide Web, Honolulu, Hawaii, USA, pages 281 - 292, 2002, 1-58113-449-5
[http://dbpubs.stanford.edu/pub/showDoc.Fulltext?lang=en&doc=1999-39&format=pdf&compression=&name=1999-39.pdf Finding replicated web collections], J. Cho, N. Shivakumar, and H. Garcia-Molina, Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD), May 2000
[http://portal.acm.org/ft_gateway.cfm?id=1083676&type=pdf&coll=GUIDE&dl=GUIDE&CFID=18586673&CFTOKEN=47728477 Discovering large dense subgraphs in massive graphs], David Gibson, Ravi Kumar, and Andrew Tomkins, IBM Almaden Research Center, San Jose, CA, Proceedings of the 31st international conference on Very large data bases, Trondheim, Norway, pages 721 - 732, 1-59593-154-6
[http://dx.doi.org/10.1016/S1389-1286(99)00021-3 Mirror, mirror on the Web: a study of host pairs with replicated content], Krishna Bharat and Andrei Broder, Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA 94301, USA, Computer Networks, Volume 31, Issues 11-16, 17 May 1999, Pages 1579-1590
[http://dx.doi.org/10.1002/1097-4571(2000)9999:9999<::AID-ASI1025>3.0.CO;2-0 A comparison of techniques to find mirrored hosts on the WWW], Krishna Bharat, Andrei Broder, and Jeffrey Dean, Google Inc., 2400 Bayshore Ave., Mountain View, CA 94043, and Monika R. Henzinger, AltaVista Company, 1825 S. Grant St., San Mateo, CA 94402, Journal of the American Society for Information Science, Volume 51, Issue 12, Pages 1114 - 1122
[http://doi.acm.org/10.1145/336296.336345 Defining Logical Domains in a Web Site], Wen-Syan Li, Okan Kolak, Quoc Vu, and Hajime Takano, Proceedings of the eleventh ACM Conference on Hypertext and hypermedia, San Antonio, Texas, United States, pages 123 - 132, 1-58113-227-1
Etags, datestamps, and adaptive revisiting
[http://dx.doi.org/10.1016/S1389-1286(99)00037-7 Towards a better understanding of Web resources and server responses for improved caching], Craig E. Wills and Mikhail Mikhailov, Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Computer Networks, Volume 31, Issues 11-16, 17 May 1999, Pages 1231-1243
[http://workshop99.ircache.net/Papers/wills-final.ps.gz Examining the cacheability of user-requested Web resources], C. E. Wills and M. Mikhailov, Proc. of the Fourth Int. Workshop on Web Content Caching and Distribution, Apr. 1999.
[http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-99-3.pdf Errors in timestamp-based HTTP header values], J. Mogul, Technical Report 99/3, Compaq Computer Corporation, Western Research Laboratory, December 1999
[http://dx.doi.org/10.1016/S0169-7552(98)00108-1 Efficient crawling through URL ordering], Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Proceedings of the Seventh International Conference on World Wide Web, Brisbane, Australia, pages 161 - 172, ISSN 0169-7552
[http://sites.computer.org/debull/A02mar/short-a.ps Computing web page importance without storing the graph of the web (extended abstract)], S. Abiteboul, M. Preda, and G. Cobena, IEEE-CS Data Engineering Bulletin, Volume 25, 2002.
[http://www.springerlink.com/content/jdyw0e7kqlf6u4aw/fulltext.pdf Issues in Monitoring Web Data], Serge Abiteboul, INRIA and Xyleme, France, R. Cicchetti et al. (Eds.): DEXA 2002, LNCS 2453, pp. 1-8, 2002.
[http://dx.doi.org/10.1045/december2002-masanes Towards Continuous Web Archiving: First Results and an Agenda for the Future], Julien Masanès, Bibliothèque Nationale de France, D-Lib Magazine, December 2002, Volume 8 Number 12, ISSN 1082-9873
Articles about archiving and preservation
An Architectural Framework for the IIPC Toolset, Svein Arne Solbakk and Sverre Bang, National Library of Norway, IIPC working draft, October 2005
[http://www.niso.org/international/SC4/N595.pdf Information and documentation - The WARC File Format], ISO TC 46/SC 4 N 595, Working Draft.