Tuesday 6 April 2010

Digital preservation and the GFC

As content increasingly moves online we face a problem - what happens when the content owner ceases to exist (or ceases to maintain content)?

In the old days it was simple - content was mostly marks on paper collected together in things called journals and books and we stored the content on shelves in buildings called libraries, and they could stay there for a very long time without much maintenance at all.

Of course, content has now gone digital and resides on the servers of the content providers. These servers need patching and upgrading, and the content needs to be periodically migrated to new hardware - all of which costs money. Unlike dead tree material, where the high costs are in production and distribution, digital content has substantial ongoing service continuity costs.

Obviously there is a risk that if the content providers go out of business the content disappears as no one is going to maintain the servers and pay the power bill. Initiatives such as CLOCKSS provide an escrow service for the content - the assumption being that universities will pay to provide an escrow service.

Now they probably will, out of self-interest if nothing else. But of course not all journals are covered, nor is all the scholarly output of a university stored on corporate servers - some of it is on Flickr, YouTube and the rest - creating preservation headaches, and some of it resides on random servers under people's desks.

So, let us assume for the moment that a university has managed to capture its scholarly output and has managed to put it all on a server or servers. It then has to pay to maintain these servers, and maintain the content.

And one thing that the global financial crisis has certainly demonstrated is that universities are not immune from outside funding pressures. So what happens in a few years' time when the content is no longer required for reporting on the success or otherwise of research funding, or when Dr X has left to go elsewhere, and Dr Y has retired, and even worse when a department has closed and the staff dispersed to the four winds? There will be a natural tendency to ask why should we keep all this old stuff spinning, and then start to cull material.

Some of it may be valuable, some of it may not be. However, no one can go through Dr Y's notes ten years on to check on a possible anomaly in his data once it's deleted, and it will be too late to start pointing fingers - once it's gone it's gone.

So, before we say we will start keeping content we need an explicit statement of what content will be kept and for how long, and more importantly we also need to say what happens when data expires. Do we delete it, do we burn it to DVD and post it to the owner, do we offer it to other institutions? After all, one might imagine that if Dr Y worked on Slavonic philology, say, somewhere else that still maintained an interest in that field might want to host the body of his work.
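To make the idea of an explicit statement a little more concrete, here is a rough sketch of what one might look like if it were written down as data rather than left as folklore. Everything in it is made up for illustration: the names `RetentionRule`, `ExpiryAction` and `due_for_review`, and the five and ten year periods, are assumptions of mine and not anyone's actual policy. The only point is that each body of content carries a stated retention period and a stated fate once that period ends.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ExpiryAction(Enum):
    """Hypothetical options for what happens once content expires."""
    DELETE = "delete"
    RETURN_TO_OWNER = "burn to DVD and post to the owner"
    OFFER_ELSEWHERE = "offer to other institutions"


@dataclass(frozen=True)
class RetentionRule:
    """One entry in the statement: what is kept, for how long, and then what."""
    description: str         # what content the rule covers
    deposited: date          # when the content was taken in
    keep_years: int          # how long we commit to keeping it
    on_expiry: ExpiryAction  # what happens after that

    def expires_on(self) -> date:
        target_year = self.deposited.year + self.keep_years
        try:
            return self.deposited.replace(year=target_year)
        except ValueError:
            # Deposited on 29 February and the target year is not a leap year.
            return self.deposited.replace(year=target_year, day=28)

    def is_expired(self, today: date) -> bool:
        return today >= self.expires_on()


def due_for_review(rules, today):
    """Return (rule, action) pairs for content whose retention period has ended."""
    return [(rule, rule.on_expiry) for rule in rules if rule.is_expired(today)]


# Illustrative entries only; the periods and actions are invented.
rules = [
    RetentionRule("Dr Y's research notes", date(2010, 4, 6), 10,
                  ExpiryAction.OFFER_ELSEWHERE),
    RetentionRule("Funding report data", date(2010, 4, 6), 5,
                  ExpiryAction.DELETE),
]

for rule, action in due_for_review(rules, date(2016, 1, 1)):
    print(f"{rule.description}: {action.value}")
```

Nothing here deletes or moves anything. It only reports what the stated policy says should happen, which leaves the actual decision with a person.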

It is, as they say, a problem ...

1 comment:

Anonymous said...

It is a problem that really needed a solution some years ago when funding for the UK's Arts and Humanities Data Service was cut, thus completely undermining its purpose (which was to provide a permanent archive for the digital contents generated by Arts and Humanities Research Council projects). As far as I know, beyond contributing to the Internet Archive and hoping, we haven't got anywhere.

I also get the old-school version of this, as our department hosts several scholarly archives of erstwhile numismatists, which get used by approximately one person a decade. They don't take much conservation, but they take space, and one secretary has been working on indexing one set for as long as I've been employed here, so more than five years. It's exactly the same dilemma; there's only one copy and we don't know why we're keeping it and whether it's worthwhile. We only know that when it's gone, it's gone...