Tuesday, 29 June 2010

Archiving online citations: are we all Americans now?

As an occasional academic himself, the IPKat takes great interest in the issue raised in the following request for information from his friend Susan Hall (Cobbetts), who writes to him as follows:

"As citing online sources in academic papers becomes more acceptable, one issue which is becoming more important is the fragile nature of online materials and the obvious concern when reviewing or answering papers which cite online sources, that such sources remain accessible for verification and follow-up. A tool called WebCite, offered here, purports to offer an answer, allowing people who cite online sources to cache them on the WebCite servers, on the terms set out on the site in question.

However, this seems to produce a number of interesting IP implications on both sides of the Atlantic (incidentally, the examples of archiving of site uses are drawn from the Guardian).

The advertised service appears intended to protect people using online sources in academic research from dead links and future take-downs, but there seem to be difficulties fitting the WebCite model into a UK copyright framework. Further, although the WebCite FAQs put forward a justification based on fair use under US law, this is itself not devoid of problems:

"Caching and archiving webpages is widely done (e.g. by Google, Internet Archive etc.), and is not considered a copyright infringement, as long as the copyright owner has the ability to remove the archived material and to opt out. WebCite® honors robot exclusion standards, as well as no-cache and no-archive tags. Please contact us if you are the copyright owner of an archived webpage which you want to have removed.
A U.S. court has recently (Jan 19th, 2006) ruled that caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL, see also news article on Government Technology). Implied license refers to the industry standards mentioned above: If the copyright holder does not use any no-archive tags and robot exclusion standards to prevent caching, WebCite® can (as Google does) assume that a license to archive has been granted. Fair use is even more obvious in the case of WebCite® than for Google, as Google uses a “shotgun” approach, whereas WebCite® archives selectively only material that is relevant for scholarly work. Fair use is therefore justifiable based on the fair-use principles of purpose (caching constitutes transformative and socially valuable use for the purposes of archiving, in the case of WebCite® also specifically for academic research), the nature of the cached material (previously made available for free on the Internet, in the case of WebCite® also mainly scholarly material), amount and substantiality (in the case of WebCite® only cited webpages, rarely entire websites), and effect of the use on the potential market for or value of the copyrighted work (in the case of Google it was ruled that there is no economic effect, the same is true for WebCite®)." (FAQs)
Asks Susan, "Is anyone doing any work on archiving of online sources and the legal issues entailed?" If so, she -- and, out of sheer curiosity, the IPKat -- would love to know.


Anonymous said...

"Please contact us if you are the copyright owner of an archived webpage which you want to have removed."

This introduces the same lack of control that renders reliance on services such as the Internet Archive legally questionable: material can disappear at the copyright owner's whim.

Hugo said...

Try these guys:

Francis Davey said...

An alternative would be to encourage academics from other fields to adopt the highly successful model of arXiv. Quality appears to be very high (peer-reviewed journals are not free from error) and the physics community in particular have been working this way for a long time.

Consensual sharing strikes me as being a more workable solution, given varying copyright laws around the world, than something like this. Of course that doesn't deal with older material, but that is a problem anyway.

I'd be interested to know what US lawyers think of the fair use argument.

Lakanal said...

A case can be made that the ambiguity of the US fair use doctrine allows entrepreneurs of all stripes to try out new business models for at least as long as it takes for a copyright owner to work out what the law is and bring an action to judgment. Even from a US perspective, WebCite's justification is magnificent in its preposterousness. No caching is taking place. No "transformative use" is taking place. And Berne's three-step test? A joke to those who believe in a society based on law; meat and drink to Google and co.

Javi said...

There was a similar case in Barcelona regarding Google's cache system.

Charles Oppenheim said...

The only person, I think, who is researching these issues is Adrienne Muir from Loughborough University; she knows all there is to know about legal issues of archiving web pages. Incidentally, it may be acceptable under US copyright law, but it is definitely illegal under UK law, so the WebCite service is not recommended for UK folk!

Maximilian Schubert said...

Art 43b (1) of the Austrian "Medien Gesetz" explicitly allows the Austrian National Library to archive websites under the Austrian top-level domain (.at) or websites which contain content that is related to Austria.
Web@rchive Austria
The data can be accessed only from inside the National Library through a single terminal. You may look at the website; copying is not allowed, but print-outs are possible.

Although the service is seriously limited (one user at a time), I have to say that it works surprisingly well, and I think it's just a matter of time until this service is extended to more frequent "harvests" (only two so far) and more depth (only a few MB per site so far). The fact, however, that users cannot copy but only "print" websites ... should serve as a reminder that Austrian copyright obviously & desperately needs an overhaul.

