Saturday 27 March 2010

Archival filesystems

Over the past few years one of the recurring threads of this blog has been filestore design, in particular student filestores, which have the annoying attributes of being unstructured, chatty, and full of small files.

In contrast, filestores designed for archival purposes have some distinctive properties:

  • content is not subject to rapid change
  • objects are deleted only according to business rules
  • ~90% of the content is accessed only once

However, they are still unstructured and do consist of lots of small files, be they PDFs of research papers, invoices, recordings of Malian ritual chants, or whatever.

As these files need to be kept for a long time – and we'll say that a long time is more than one technology refresh – we need to guard against corruption, and we also need an easy means of migrating content faithfully from one storage platform to another. Equally, because most of the content is unchanging, conventional backup doesn't really work: since the files are rarely if ever accessed, one never knows whether the version being backed up is already corrupt.

So how to do it – it’s really quite simple:

Most archival file systems sit under a document or content management system of some sort, which implies a workflow and ingest process. Most of these document or content management systems will have a database that holds information – metadata – about objects in the filestore.
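As a concrete illustration – purely a sketch, with hypothetical table and column names rather than anything from a particular content management system – the metadata store need hold little more than an object identifier, its path on disk, and the checksum recorded at ingest:

```python
import sqlite3

# Minimal, hypothetical metadata schema: one row per archived object,
# recording where it lives and the MD5 checksum computed at ingest.
SCHEMA = """
CREATE TABLE IF NOT EXISTS objects (
    object_id   TEXT PRIMARY KEY,   -- identifier assigned by the CMS
    path        TEXT NOT NULL,      -- location of the object on the filestore
    md5         TEXT NOT NULL,      -- checksum recorded at ingest
    ingested_at TEXT NOT NULL       -- ISO 8601 timestamp
);
"""

def open_metadata_db(db_path="metadata.db"):
    """Open (and if necessary create) the metadata database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn
```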

  • Modify the workflow so that you compute the MD5 checksum on ingest, and record that in the metadata held about the object (see the sketch after this list).
  • Write the object to the filestore.
  • Copy the object to a second filestore, preferably in a different city, but failing that on the other side of the city.
  • Copy the database transaction to a copy of the database in this other location. Remember that without a copy of the database you can't tell what's in your filestore if you lose your primary copy.
  • Create a cron job that periodically checks the MD5 checksum of the objects on disk against the value stored in the database (again, see the sketch below).
  • If an object fails the checksum check, run the same check against the remote copy. If that copy passes, copy it back over the corrupt one.
  • Run database consistency checks.
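A minimal sketch of the ingest and checking steps, assuming the hypothetical metadata table above, a primary and a mirror filestore mounted at two illustrative root paths, and a database connection from the earlier helper (all names here are my own, not from any particular product; database replication to the second site is left out):

```python
import hashlib
import os
import shutil
from datetime import datetime, timezone

def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ingest(conn, object_id, source, primary_root, mirror_root):
    """Checksum an incoming object, store it, mirror it, and record the metadata."""
    checksum = md5_of(source)
    primary = os.path.join(primary_root, object_id)
    mirror = os.path.join(mirror_root, object_id)
    shutil.copy2(source, primary)          # write to the primary filestore
    shutil.copy2(source, mirror)           # copy to the second filestore
    conn.execute(
        "INSERT INTO objects (object_id, path, md5, ingested_at) VALUES (?, ?, ?, ?)",
        (object_id, primary, checksum, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def verify_and_repair(conn, mirror_root):
    """Cron-style pass: recompute checksums and repair from the mirror on mismatch."""
    for object_id, path, recorded in conn.execute(
        "SELECT object_id, path, md5 FROM objects"
    ):
        if md5_of(path) == recorded:
            continue                       # primary copy is intact
        mirror = os.path.join(mirror_root, object_id)
        if md5_of(mirror) == recorded:
            shutil.copy2(mirror, path)     # the mirror copy is good: copy it back
        else:
            print(f"both copies of {object_id} fail the checksum check")
```

Run ingest() from the content management system's workflow and verify_and_repair() from cron; the same verification loop, pointed at a new storage platform, doubles as a migration check.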

Now this is very simple. No fancy clustered fault-tolerant filestores. But it ticks the boxes. It's essentially an updated version of the consistency checking found in SAMFS/QFS for long-term storage of data to tape to guard against bit rot.

The UK Mirror Service, as implemented at the end of the nineties, worked essentially like this: two geographically separate nodes with a consistency check and replication between them.

And the joy of it is that it could potentially be built with cheap off-the-shelf technology, say two separate Apple Xsan implementations. It's also extensible, as nothing in the design precludes adding a third node for increased resilience, or indeed for migrating content.

Equally, abstracting the automated data curation from the properties of the filesystem both allows easy migration to new filesystems and avoids expensive vendor lock-in, two things that are important in any long-term, curation-centric design.
