Saturday, 17 March 2018

And now, an old Thinkpad

My old Dell Inspiron, which I've used since 2010, is finally running out of puff - not disastrously, but enough that one would start to think about replacing it.

Of course I could just put up with its huffing and puffing and use my MacBook Air as a day to day machine, but I'd reached the stage where a new Windows machine seemed like a necessity.

The only problem was that we'd just paid out for a trip to Borneo, and I really couldn't justify the extra money right now.

Well, I'd always half planned to buy myself an ex-lease Thinkpad for Linux work, so after a little bit of agonising I bought myself an X230 with a Windows 7 Professional license. My reasoning was that I could use it as a Windows machine, and perhaps even convert it to dual boot Windows 7 and Linux. The simple reason for keeping Windows is that the documentation project I've volunteered for is built around Windows, meaning I need OneNote, and that most of the rest of my personal notes and documentation is in Evernote, which again is not available for Linux.

So I paid my two hundred and thirty bucks to one of these companies that refurbish ex-lease machines and a few days later it arrived, beautifully packed in shock absorbing packaging and with a refurbisher's test report, and nicely imaged with a clean copy of Windows 7 with an install of Open Office 3 thrown in.

Out of the box, battery life was better than my Air, which realistically manages about two hours work these days between charges. The Thinkpad claimed a realistic four hours thirty out of the box and, what's more, the battery is easy to replace down the track if needs be.

So, installing things.

First off were the standard utilities that I use:

  • Focuswriter - for distraction free writing
  • Kate - for when only a text editor would do
  • Open Live Writer - an open source clone of Windows Live Writer for blogging
  • Tweeten - a desktop Twitter client
  • Gnumeric - spreadsheet for data manipulation
  • Libre Office - when I need to write something and format it nicely
  • Thunderbird - for email and calendaring
  • Texts - for wysiwyg markdown editing

And then it was the data intensive things

  • Dropbox - for data sharing
  • OneDrive - cloudy filestore
  • OneNote - Microsoft's note management tool which has all my project documentation
  • Evernote - which basically contains my entire life, invoices, bills, research notes and so on

All in all, close to 40GB of data to download, which took around a day with a few timeouts when we wanted to watch the morning news on iView, or actually use the internet. Given that our internet and phone plan has a stupidly large cap (a terabyte of data per 28 days - effectively it's unlimited, we usually only use around 10% of it) I wasn't worried by the download size.

Also given my general interest in note taking applications I installed Standard Notes for fun, and I should probably also install the Windows version of CherryTree given that I waxed lyrical about it a few months ago.

It might have been quicker, but Windows also wanted to download a zillion patches (actually a little over 200) and apply them, all of which took time out of the process.

At the end of it I've a machine with reasonable battery life and a decent form factor for working on the train and on those silly little tables.

I haven't installed VirtualBox yet, but I'm planning to do so to build a virtual machine to put together a prototype Omeka site to showcase the project so far.

Sooner or later I'll probably add a couple of extra applications - such as the Gramps family history tool.

The only Linux software I really need is tesseract and cuneiform for OCR work on pdfs from old printed documents and they'll run equally well in a Linux VM.

So, next steps.

Basically use it, and keep the old Inspiron for backing up data from the documentation project.

I do face a decision down the track as to whether I keep the machine as it is, or migrate it to Linux as originally intended. If I keep it on Windows I probably need to think about an upgrade to Windows 10, but for the moment seven is good enough. After all it's the software base that's important, not the operating system...

[update 18/03/2018]

... and this morning I was looking up some references and discovered I'd totally forgotten to install my preferred reference manager - Zotero, doh!

Wednesday, 7 March 2018

Lecture recordings and intellectual property

There's a strike over pensions in the UK university system at the moment and it's brought to light an interesting little argument over intellectual property.

Obviously, if a lecturer is on strike, a scheduled lecture is not going to be delivered. Some universities have tried to force lecturers to deliver cancelled lectures once they return to work, or to persuade non-striking colleagues to deliver them, with varying degrees of success.

But some have tried a different tack, giving students access to the recording of the previous year's lecture.

Lecture recordings vary. Some simply record voice or else voice and video somewhat in the style of nineteen seventies Open University recordings. Others record voice and the accompanying powerpoint slides.

Lecture recording systems are usually touted as a revision aid for students, or else as allowing students at multi site institutions access to material delivered at another location. Cynically, it allows students who discover Statistics 1B is timetabled for 0830 on a Monday an extra hour in bed.

Individual universities' rules on intellectual property and lecture content all differ slightly.

In many cases they were drawn up some years ago, before lecture capture systems were in widespread use and before the world went digital.

In some cases the university owns the teaching material, in some cases the individual owns it, and in some cases the lecture is owned by the university, but handouts, including the PowerPoint slides, are owned by the individual.

And some lecture recording products have terms of use that require consent by the lecturer before the material can be reused. And of course there's the case where a teaching assistant delivers a lecture using existing notes and material when the lecturer whose course it is is on sabbatical. We won't talk about MOOCs here, but that's another problem, especially if material from other lecture courses is reused.

Basically it's a very grey area. In fact, once you start to poke into it, it's a complete nightmare ...

Friday, 2 March 2018

When an ISBN isn't really an ISBN...

Now we all know that ISBNs are persistent identifiers par excellence, but I recently came across a case where they weren't.

I'd bought a version of Valentine Baker's Clouds in the East, the book he wrote while imprisoned for his assault on Miss Dickinson, as part of my reading about the Great Game.

I'd bought the reprint from one of these Indian print on demand companies that reprint out of print out of copyright nineteenth century books.

Unlike some of these reprints this one came nicely bound with a card, as opposed to paper, cover and had a barcode, an ISBN, and a suggested price in both Indian Rupees and US dollars. In other words, rather than print on demand it looked like one of a batch produced for retail sale.

So I entered it into LibraryThing - no such ISBN. Now I know from past experience of having bought books from India that they are usually in Amazon's database, so I was a little surprised.

I tried Amazon India directly - no such luck. Neither was it in the other catalogues I tried, so I'm guessing it's an invalid ISBN generated by the publisher when packaging the book up to make it look like a 'proper' retail copy.
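One quick sanity check on a suspect number is the check digit. What follows is only a minimal sketch in Python, using the standard ISBN-10 (modulus 11) and ISBN-13 (modulus 10, alternating weights 1 and 3) rules. The function names are my own, and the numbers in the usage lines are well known textbook examples, not the number printed on the book. A pass only shows the number is well formed, it says nothing about whether it was ever registered or catalogued.

```python
def _clean(isbn):
    """Strip the hyphens and spaces that usually decorate a printed ISBN."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """Check an ISBN-10: weights 10 down to 1, total divisible by 11."""
    s = _clean(isbn)
    if len(s) != 10:
        return False
    # First nine characters must be digits; the last may be a digit or X (= 10).
    if not all(ch in "0123456789" for ch in s[:9]):
        return False
    if s[9] not in "0123456789X":
        return False
    total = 0
    for position, ch in enumerate(s):
        value = 10 if ch == "X" else int(ch)
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(isbn):
    """Check an ISBN-13: alternating weights 1 and 3, total divisible by 10."""
    s = _clean(isbn)
    if len(s) != 13 or not all(ch in "0123456789" for ch in s):
        return False
    total = 0
    for position, ch in enumerate(s):
        weight = 1 if position % 2 == 0 else 3
        total += weight * int(ch)
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))  # a commonly quoted valid example
    print(is_valid_isbn13("978-0-306-40615-8"))  # same number with the last digit altered
    print(is_valid_isbn10("0-306-40615-2"))
```

If the number fails this test it was never a real ISBN at all; if it passes, it may still simply be missing from the catalogues.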

Strange, hadn't come across that before ...

Wednesday, 28 February 2018

Working in a really small public library ...

Today on the documentation project I was chased out for an hour or so when we had a large tour group come through.

It was too early for a coffee, and I had some notes to write up, so what to do?

It wasn't worth going home, but then I had a brainwave. I googled local library, and discovered there was a local branch library in the town, and it was (a) open and (b) had wifi.

I've written before about working in larger public libraries, but this wasn't the case here - this library was basically a largeish room like a conference room in an old local government building and didn't really have much in the way of workspace provision - just a couple of comfy chairs and an old table with a couple of desktop computers.

But they did have wifi - which claimed to be 5G, and had been moved from some other library as it still had the old name as the SSID, but it worked, or at least it did once I asked the library staff to reset the router for me - apparently a known problem.

I don't actually know what they were using. From what I could see it was a standard looking wifi router - how it was connected I don't know, but I guess over whatever infrastructure the local library corporation provides - I'd guess ADSL.

As a working experience, it was really good - quiet, I could get my work done, no background clamour of coffee making and enough space to sit comfortably if unergonomically with my laptop on my knees, and notebook on the chair adjoining.

So next time you need a place to sit and use wifi, think about (and support) your local public library!

Tuesday, 20 February 2018

Provenance - it's all about provenance

Six months ago, on a plane between Singapore and Melbourne, I watched a remarkable documentary about the attempt by the city of Detroit to sell off the contents of its art museum to defray the city's debts.

The scheme ultimately foundered - because of provenance.

One of the original founders of the museum apparently used to go on collecting tours of Europe, buying paintings from cash strapped aristocrats who had lost everything in the first world war - so you would think it would be easy to work out provenance.

But, no. The person in question was an art collector in his own right, and while he would sometimes use the museum's money to pay for items, sometimes he would use his own, and sometimes he would 'sell' a piece to the museum at below cost and claim it as a tax loss.

And the records were a complete mess.

It wasn't clear which had been bought on behalf of the museum, which were on loan, and which were donations - and of course the more saleable paintings' records were as confused as the less valuable.

In this case having an unclear provenance worked for the museum - they couldn't sell what wasn't theirs, and they didn't know what wasn't theirs.

And I suspect that this is the case with a lot of museums who developed their collections in the late nineteenth and early twentieth century - the documentation is quite unclear.

Not for everything of course, for example the Elgin Marbles have a clear provenance and the case really depends on the legality or otherwise of Elgin's actions and whether his firman from the Ottoman governor really gave him permission.

But then we have cases like the Nizam of Hyderabad's mummy, which I blogged about back in 2015, where provenance is unclear: we know he bought it, but not if it was illegally acquired. Likewise in Amelia Edwards' account of her trip up the Nile in the 1870s, she recounts the story of the tourists who bought a mummy at vast expense, and after a week or so found that they could not stand the sweet odour emanating from it, and (literally) jettisoned their losses by throwing it in the Nile.

And all this feeds into the problem of artefacts acquired during the period of European colonialism.

Were the items acquired legally, were they acquired under duress, or what?

And of course rules change. Egypt, for example, started to license archaeological digs quite early and had clear rules about both documentation and ownership - basically that the more significant items were automatically property of the Egyptian Department of Antiquities, which is why so much of the material is in the Egyptian Museum in Cairo. But as we know, there are significant collections elsewhere, and as the Nizam of Hyderabad case shows us, the system was not perfect.

Other countries, especially those under colonial rule, were not so strict.

And for this reason, probably one thing that should be done is to digitise the museum records and correspondence, as well as that of individual archaeologists and collectors, both to settle the question of provenance and to provide an unrivalled insight into the history of archaeology, and its relationship to the antiquities trade in the nineteenth century ...

Sunday, 21 January 2018

Lenovo Ideapad K1 six years on ...

Yesterday was ferociously hot, so I did what I usually do when it’s too hot for gardening, and played with some old hardware, this time J’s old Lenovo IdeaPad K1, an Android tablet dating from late 2011.

In its day it was pretty slick, slicker than the zPad, and a pretty nice bit of kit with an excellent screen - being an artist J spends a lot of time looking at pictures and illustrations - but it was a bit heavy to hold, and even though we'd invested in a stand cum charging station for it, it could be a pain to use for extended periods. Not only that, it would occasionally lose its network connection, or more accurately not recover gracefully when our router flipped from ADSL to the backup 3G connection, so eventually it was replaced by a Samsung Galaxy.

By the time it was replaced, Lenovo had more or less abandoned the K1, but had, unusually, provided an option to upgrade it to an unsupported version of Android 4 - the K1 having originally shipped with 3.2.

We never followed that up at the time, as the only thing I used it for was downloading podcasts, and gPodder was happy with things as they were.

In retrospect, this was probably not such a good idea, as the links to the generic version have now (understandably) disappeared from Lenovo’s website.

So, what can you do with 3.2?

Well, no modern browser, but Opera Mini installs and runs quite nicely.

The previously installed Wikipedia, Gmail and Twitter apps still work, as does Inoreader - an RSS feed reader. You can’t, of course, install anything recent, which means no decent text editor or anything like that.

But, given that most of what I use my current tablet for is Wikipedia, email and Twitter, plus a bit of RSS feed reading, it isn’t a disaster. Not having access to OneNote or Evernote is a bit of a pain, but were my existing tablet to unexpectedly come to a bad end it would be good enough for a stopgap, which isn’t too bad for a device over six years old running an old operating system ...

Friday, 5 January 2018

Transcribing a blot

One of the tasks in documenting artifacts as part of the project is transcribing labels on the bottles of materia medica in the pharmacy.

Mostly this is fairly straightforward - the labels are on the whole beautifully stencilled in india ink on good quality paper, and so while they may be a little yellowed they're perfectly legible. It's the early twentieth century ones that are more of a problem - cheaper paper and sloppily written in faded fountain pen ink.

To be sure they have their peculiarities - the extensive use of Æ in nineteenth century pharmaceutical Latin and outdated abbreviations like TṚ for tincture, but it's all fairly straightforward.

Until a couple of days ago, when I came across the following

where the label had been corrected - if you look carefully you can see what appears to be an extra L which has been blotted out in a different, thinner ink, presumably at a later date.

This of course raises a number of questions about transcribing the label - should I transcribe the label as it was meant to be read, or include the blot, or transcribe it as the original text and note that the first L had been blotted out at (presumably) a later date?

I decided to go for the middle route and transcribe the label as you would read it today, blot and all.

While I knew about the Text Encoding Initiative and the Leiden Epigraphy conventions, which I'm using to indicate missing or illegible characters, I didn't know about blots.

My first thought was to simply insert a Unicode blot symbol, except there isn't one - as a stopgap until I could spend more time with Google I decided to use the Cyrillic Zhe (Ж) as

  • there was no Cyrillic text involved in the pharmacy anywhere
  • it sort of looked like the H^HZ^HN sequence we used to use in WordStar days to generate a cursor symbol on daisywheel printers when doing documentation
  • having learned to read and write Russian I could write it with a degree of fluidity

I guess I could have used the Unicode block character ( █ ) but as I also keep a longhand paper workbook in parallel with the transcription spreadsheet Ж seemed a better choice.

I started off by searching for things like 'epigraphy blot' without much success - well I guess stone inscriptions don't have blots, although they do have erasures, so I don't think it was that silly a search. 

Changing the search terms to something like 'TEI transcription blot' was more useful and produced a lot of information on how to represent blots in XML as well as important questions such as whether it was a correction by the author or a correction at a later date and differentiating between the two, as well as what to do if you weren't sure.

The only problem was all this information was for creating XML markup, and I was transcribing the labels to an Excel spreadsheet using Unicode, and I needed a standard pre-XML way of doing this that was going to be intelligible to someone else.

In the end I found the answer in the EpiDoc documentation. Under 'erased and lost' it not only documented the TEI XML but also referenced the earlier pre-XML paper conventions, in this case [[[...]]], which was ideal.
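Swapping the stopgap Ж for the [[[...]]] notation in cells already transcribed is easy to script. This is only a minimal sketch in Python: the function name and the sample label text are made up for illustration, and collapsing a run of several Ж into a single marker is my own assumption about how adjacent blots should be treated, not something the convention dictates.

```python
import re

# Stopgap blot marker used in the spreadsheet: Cyrillic capital Zhe.
STOPGAP_BLOT = "\u0416"

# Pre-XML notation for text that has been erased and cannot be read.
ERASED_AND_LOST = "[[[...]]]"


def normalise_blots(text):
    """Replace each run of stopgap blot markers with one erased-and-lost marker.

    Assumption: several blots in a row are treated as a single erasure
    of unknown extent, so they collapse into one marker.
    """
    return re.sub(STOPGAP_BLOT + "+", ERASED_AND_LOST, text)


if __name__ == "__main__":
    # Hypothetical label text, not one of the real pharmacy labels.
    print(normalise_blots("SYR. SCIL\u0416LÆ"))
```

Because there is no other Cyrillic anywhere in the transcriptions, a blanket replacement like this should not touch any genuine label text.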

This little journey has raised a whole lot of questions, including whether we should be using TEI XML encoding for the labels.

The short answer is probably not. Unicode in Excel plus some standard notation is more than adequate in 99.9% of cases, and the whole majestic edifice that is TEI seems like complete overkill, but certainly this little diversion shows the importance of discussing and agreeing on transcription standards before starting on something as seemingly straightforward as a sequence of nineteenth century materia medica labels ...