Friday, 15 December 2017

Laptops for data collection

Over the years, a number of people have asked me about what I would suggest in the way of a computer for fieldwork, or research work in dusty libraries without internet or convenient power sockets.

Fieldwork computers tend to have a hard life, carried about repeatedly, bounced about in trucks, and always at risk of the wet, either as rain or spillages, or from dust and dirt.

My advice has always been to aim for the longest battery life for the lowest cost to keep the replacement cost down. Also these devices don’t need to do a lot - run a spreadsheet to record data, some sort of note management program and a text editor.

I’ve tried the cheap Android tablet and keyboard combo, and that’s pretty good for straight note taking or even creating structured text (eg markdown) but tends not to shine for creating tabular data. Which is a pity as they are cheap enough to be treated as a consumable.

So recently I’ve swung back to the refurbished netbook or laptop with linux, and a combination of basic tools. The software base of linux is so large that you can find just about anything, but I tend to favour CherryTree for notes management, Gnumeric for recording tabular data, gedit or kate for basic text, and perhaps something more specialist such as ReText for structured text, although kate’s syntax checker is pretty good.

If you want something for writing up draft reports, Focuswriter is fast and lightweight.

The downside is that battery life is poor. Two hours, three hours at most. Not enough for a decent session.

However, there are a number of these cheap eMMC memory based Windows laptops available. Mostly I’ve avoided these as the amount of storage, typically 32GB, is too small, given that Windows will take around 20GB, depending on exactly how it’s configured.

Add a few extra programs and a bit of data, and there’s not a lot of headroom there. However devices with 64GB storage are beginning to appear at a price that’s reasonable, for example the Lenovo Yoga 310-6K can be picked up from the usual suspects at around $400 - 450, which is about the midway price for a refurbished laptop.

But there are two downsides to the refurbished laptop route - firstly if you want to keep Windows, you’ll probably end up having to pay for a Windows 10 upgrade, and secondly battery life won’t be great. And if you go for an older or cheaper machine it’ll probably have a 5400 rpm SATA drive, so you won’t be getting lightning disk performance anyway.

These cheaper eMMC laptops come with Windows 10. Versions of CherryTree, Gnumeric, and Focuswriter are available for Windows. There’s always notepad or windows Codewriter as an editor, and if you need something a little more flashy for structured text there’s Typora, or Texts.io which will cost you around US$15 for a licence key.

What of course you’re getting is the longer battery life. You also get the bonus of being able to use the device in tablet mode, which makes showing people images - be it of plants, finds, sites, or handwritten text - much easier than on a laptop. The other bonus is OneNote, Microsoft’s note management tool.

I didn’t use to like OneNote - it seemed clumsy and slow compared to Evernote, but since working on the Dow’s Pharmacy project I’ve warmed to it.

Evernote remains the best ragbag management tool ever for categorising snippets garnered from everywhere. OneNote really isn’t good at imposing structure on chaos. What it is good for is building up a collection or collections of related notes - a subtle difference but an important one.

And of course you can have the best of both worlds and have both Evernote and OneNote on your machine.

So, what would I choose?

A few months ago I would have gone down the refurbished laptop with linux route, and if we’re talking about clever stuff like using R or iPython notebooks for on site data management and analysis I still would. For pure data collection, I’m not so sure. The increased storage and longer battery life certainly make these eMMC based devices an interesting option ...

Friday, 1 December 2017

More on spreadsheet preservation and normalisation


Yesterday, inspired by a post about preserving Google sheets, I blogged about spreadsheet preservation in general.

As is the way of these things the question has been rumbling round my brain ever since.

A long time ago, the National Archives of Australia released Xena, a normalisation tool that converts files into open xml based formats - essentially the open office formats used by Libre Office and others, on the basis that the xml produced is both documented and readily parsable and that it would be possible to recover the data and the calculations from any preservation file.

And in fact when we built the original ANU data archive, we silently implemented this normalisation process as part of the workflow. We didn't use Xena. Instead we used Pronom to work out whether we could recognise the file type, and if we had a normalisation engine for it - essentially an xml export tool - we would use that to produce a long term preservation copy, which we would store, along with the original, in a bagit archive.

The idea of storing both, of course, is that as we didn't test the normalisation processes, and tended to trust the tools, it is just possible we could have produced garbage as part of the normalisation process.
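As a rough illustration only, and not the code the ANU archive actually ran, the "store both copies in a bagit archive" step might look something like the sketch below. It uses only the Python standard library and writes the minimal BagIt layout (a `bagit.txt` declaration, a `data/` directory and a `manifest-sha256.txt`). The function names `sha256_of` and `make_bag` are made up for the example, and it assumes the original and the normalised copy have different file names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, original: Path, normalised: Path) -> Path:
    """Copy both files into bag_dir/data and write minimal BagIt tag files.

    Hypothetical helper: assumes the two files have different names.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for source in (original, normalised):
        target = data_dir / source.name
        shutil.copy2(source, target)
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{sha256_of(target)}  data/{source.name}")

    # The bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

Because the manifest carries a checksum for each payload file, a later audit can at least confirm that neither the original nor the normalised copy has changed since ingest, even if it can't say whether the normalisation itself was faithful.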

In fact we deliberately ignored the year 1900 problem, as we reckoned that only a small number of spreadsheets would be affected.

So what does this mean for Google sheets?

Exporting to an xml format such as ods would seem to be the way to go, but given that it's not possible to preserve the original document, the sensible thing would be to download the spreadsheet in two formats, both ods and xlsx, given that both are in xml and that parsers exist for both formats.

The reconstituted spreadsheets should of course give identical results when imported into the appropriate utilities.

Exporting a single sheet spreadsheet as csv, or whatever, is only appropriate where there are no calculations involved, an example being where the spreadsheet was used to record species abundances in a number of quadrats.

The decision about whether to use an ascii format such as csv is best left to the researcher: they know their data, and whether it's appropriate.
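To give the researcher something to go on, a quick check for calculations in an xlsx download could be sketched as below. It relies on the fact that an .xlsx file is a zip container in which each formula is stored as an `<f>` element in the worksheet xml. The assumptions are that the worksheets sit under the conventional `xl/worksheets/` path, and the names `count_formulas` and `csv_is_reasonable` are invented for the example. It is a rule of thumb, not a substitute for the researcher's judgement.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts inside an .xlsx file.
SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_formulas(xlsx_file) -> int:
    """Count formula cells across all worksheets in an .xlsx file.

    xlsx_file may be a path or a binary file-like object.
    Assumes worksheets live under the conventional xl/worksheets/ path.
    """
    total = 0
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                # Each formula is stored as an <f> element inside its <c> cell.
                total += sum(1 for _ in root.iter(f"{SHEET_NS}f"))
    return total


def csv_is_reasonable(xlsx_file) -> bool:
    """Hypothetical rule of thumb: no formulas means csv drops no calculations."""
    return count_formulas(xlsx_file) == 0
```

A sheet of quadrat counts typed in by hand would come back with zero formulas, while anything with totals or derived columns would not.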

The standard procedure should be to use a richer xml based format, and preferably two of them.

Ideally there should be some sanity checking before ingest ...
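One possible form such a sanity check could take, sketched here under stated assumptions rather than as a tested ingest routine: both ods and xlsx are zip containers, so at a minimum we can confirm that the download opens as a zip, that the parts we expect are present, and that they are well-formed xml. The function name `sanity_check` and the particular list of required parts are choices made for the example. Passing these checks says nothing about whether the calculations survived the export.

```python
import zipfile
import xml.etree.ElementTree as ET

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"

# Parts we expect to find in each container (an illustrative minimum).
REQUIRED_PARTS = {
    "ods": ["content.xml"],
    "xlsx": ["[Content_Types].xml", "xl/workbook.xml"],
}


def sanity_check(spreadsheet, kind: str) -> list:
    """Return a list of problems found; an empty list means the checks passed.

    spreadsheet may be a path or a binary file-like object;
    kind is "ods" or "xlsx".
    """
    problems = []
    try:
        archive = zipfile.ZipFile(spreadsheet)
    except zipfile.BadZipFile:
        return ["not a zip container"]
    with archive:
        names = set(archive.namelist())
        if kind == "ods":
            # An OpenDocument file declares its type in a 'mimetype' member.
            if "mimetype" not in names:
                problems.append("missing mimetype")
            elif archive.read("mimetype").strip() != ODS_MIMETYPE:
                problems.append("unexpected mimetype")
        for part in REQUIRED_PARTS[kind]:
            if part not in names:
                problems.append(f"missing {part}")
                continue
            try:
                ET.fromstring(archive.read(part))
            except ET.ParseError:
                problems.append(f"{part} is not well-formed XML")
    return problems
```

Running it over both downloads before ingest would at least catch truncated or mislabelled files.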

Thursday, 30 November 2017

Preserving spreadsheets

Spreadsheets are used in lots of ways in research, and that means that we need to think about their preservation as part of the long term preservation of data.


And this is actually more complicated than it sounds - as demonstrated by a recent post on preserving Google Spreadsheets.


The best preservation practice really comes down to how the spreadsheet was used.


If we are using it passively, ie as a way of recording data in the way that I’m doing on the Dow’s pharmacy project, export as comma separated, tab separated etc, is the way to go, and also circumvents the Year 1900 problem in excel. Basically you just get the characters and that’s all you want.


And this is great for survey data, botanical field data, archaeological data and the rest - a true lowest common denominator format.


And that’s a very good thing, because if you have any pre-1900 dates in your spreadsheet, exporting from Excel to Libre Office calc, on the basis that calc’s .ods format is open and non proprietary, can cause problems.
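To make the date issue concrete, here is a small sketch, assuming that the "Year 1900 problem" refers to Excel's default 1900 date system. In that system dates are stored as day counts starting at 1 for 1 January 1900, a non-existent 29 February 1900 is counted as day 60, and anything earlier than 1900 can't be held as a date number at all. The function names are invented for the example.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Convert a whole-day serial from Excel's 1900 date system to a date."""
    if serial < 1:
        raise ValueError("the 1900 date system has no serials before 1")
    if serial == 60:
        raise ValueError("serial 60 is 29 February 1900, which never existed")
    # Excel counts a phantom leap day in 1900, so later serials are one too high.
    offset = serial if serial < 60 else serial - 1
    return date(1899, 12, 31) + timedelta(days=offset)


def storable_as_excel_date(day: date) -> bool:
    """True if Excel can hold this date as a number rather than as text."""
    return day >= date(1900, 1, 1)
```

A date such as 12 March 1887 in a pharmacy ledger fails `storable_as_excel_date`, so in Excel it is only ever a string of characters, which is exactly what a csv export preserves.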


And that’s the problem with spreadsheets: if there’s any calculation you need to ensure that the exported version correctly reproduces both the calculations and the results, which is a complicated problem.


It would probably be simpler to start with a product that uses an open format - such as Gnumeric or Libre Office calc - and then export the document to Google Drive, Dropbox or OneDrive for sharing rather than start with an online spreadsheet. And if you need to start with an online spreadsheet, Microsoft’s online version of Excel might be a better departure point due to its compatibility with the stand alone version of Excel giving a better chance of conversion to an archival format ...

Wednesday, 29 November 2017

Gnumeric ...

At various times I've said in my hand waving way that you could as easily use Gnumeric as a spreadsheet for recording data as use excel or a more heavy weight alternative product such as Libre Office Calc.

However I've got to confess I've never actually used it for real work.

That may be about to change as my corporate supplied copy of excel has started whingeing about product activation failures. I'm sure it's just an expired licence key, and not being on the corporate network it can't see the licence server to update.

The only problem is that as a volunteer archivist I can't contact corporate IT support directly, my boss has to log the job, and just by chance she's overseas at the moment.

It's not a drama - for the moment everything seems to work, I can still create and save data, but just in case there's a grace period that's about to expire on me I installed the latest commonly available windows version of Gnumeric this morning so I can keep working if I get locked out.

If I end up using it in anger I'll post on my experiences ...

Fun with a legacy wireless bridge ...

A long time ago, more or less ten years ago, I bought myself a Linksys WET54G wireless bridge which let me connect an old mac (running linux) and a couple of home made linux servers cobbled out of scrap machines that I had in the garage to our home network.

The setup was fairly simple: linux boxes, a $20 white box unmanaged switch and the wireless bridge. Performance was fairly impressive given that the garage was built of corrugated iron and did a nice imitation of a faraday cage. Putting the bridge next to the sole window gave me a reasonable signal.

Fast forward to 2017.

I no longer have any homemade servers - they died horribly in a flood, and I now live in a wooden house nicely lined with metallized sarking - hello Faraday cage.

We also have a studio, which is a separate building, and is in fact a converted garage, and is lined with, guess what? Metallized sarking.

The net result is that getting a network signal in the studio is a big ask. I bought one of these no name $15 repeaters, which managed to get a decent signal onto the back deck and a weak but stable one into the studio.

Machines are usable with the current signal but I wanted to move my old imac into the studio and set up a second desk in there for a book scanning project I have in mind which would involve shoving some large files about.

Now the linksys is quite good with weak signals so I thought I could use it to get a better signal and then use an old wireless router to drive a local network, or indeed a local wired network.

I still had the bridge, but of course no configuration manual, but about twenty minutes with google told me all I needed to know. A little bit of network jiggery pokery and I could both see the home network and the wireless repeater and get a better signal than by relying on my old imac’s hardware alone.

I could connect, but not really. The linksys doesn’t support WPA2 even though you can run WPA with AES encryption, which means that to authenticate I’d have to lower security on my home network. The linksys apparently lets you reauthenticate but actually fails silently. I had the same problem with my old Asus Internet Radio, which is why it’s now plugged into the wired network at home.

Wireless bridges of course need a wireless connection.

During testing I even managed to fool myself into thinking that I’d got it to work - I hadn’t. After changing the encryption from TKIP to AES I’d forgotten to turn off wireless networking on my laptop after rebooting it for testing, and that wasted an hour while I worked out I’d been an idiot, rather than having broken something.

So, basically the Linksys is useless, or more or less useless. A hunt for firmware updates that support wpa2 drew a blank. Still I had fun playing with hardware for the first time in years, so the time wasn't wasted, even if I did spend almost a day playing with it.

I’ve now admitted defeat and ordered myself a second no name whitebox wifi repeater. The studio has a decently large glass door and the home repeater for the back deck is next to the door so hopefully I can daisy chain the two ...

Monday, 20 November 2017

In praise of Linux (again)

A few days ago there was an article in the Irish Times praising linux on the desktop for its utility and ability to extend the life of old and otherwise perfectly usable hardware.

I am in fact writing this on my five, nearly six, year old Linux netbook.

Why?

Windows updates. Ever since I had the Windows 10 creators update installed I've had a storm of minor fixes and updates, all of which seem to leave my machine in an odd state requiring not only a reboot but a fifteen minute session of placatory messages while Windows plays with itself.

That said I actually quite like Windows 10 as an environment and am quite happy with the fact that when I eventually replace my elderly Dell Inspiron it'll be with a Windows machine.

However, I can't help but contrast the pain I'm going through with Windows at the moment with the ease with which I ran my latest set of Linux updates - it was a fairly painless exercise.

What's more I even installed a suite of optical character recognition software. Think about it - running OCR software on a six year old Intel Atom powered machine.

That said my first attempt, with OCRfeeder, which I'd successfully used with Debian to OCR a collection of Vietnam war era newspaper cuttings from North Vietnam, didn't quite work - basically OCRfeeder and Xfce seem to have an incompatibility. Changing to Yagf - which uses the same underlying recognition engines, tesseract and cuneiform - seemed to work.

Preliminary, and fairly basic tests, seem to show that it works, if a little slowly, but good enough for some of J's family history stuff where we have some good jpegs of documents.

And that of course is the other great virtue of Linux - there's always more than one way of solving a problem or carrying out a particular task.

Now I'm not going to tell you that Linux is a panacea. It's not. Sometimes its flexibility is a curse more than a blessing - for example I have never ever been able to get bluetooth to work with Xfce despite having it work successfully with other Linux front ends.

I am not going to tell you to throw out your Macs and your windows machines. My MacBook Air for example remains one of the best machines I have ever owned for travelling and note taking in the field - the only machine that ever came close was the Linux EeePc 701SD. But what I will say is that if you need a low cost and effective solution, try Linux.

Monday, 13 November 2017

Zpad six and a bit years on

Six and a bit years ago I bought myself a zPad, a no name Chinese android 2.2 tablet skinned to look like an iPad.

It was bought as an experiment at a time when iPads seemed to be taking over the world to see if cheap whitebox Android devices could mount a challenge, and provide an alternative tablet based solution.

Ipads are of course still dominant but Samsung, Lenovo and the others have turned Android into a viable alternative platform for tablet computing. What hasn't happened is that cheap whitebox devices have taken over the world - most Android tablet sales are for brand name devices, most of which are both cheap and offer reasonable performance.

Enough history - back to the zPad.

Amazingly I'm still using it (occasionally) six and a bit years on.

The operating system is hopelessly out of date, upgrades just don't happen anymore but gMail and twitter still work, as does a weather app, and for that reason it continues to live on a shelf in my shed so that I can check the weather and my email when I'm covered in dirt after a serious gardening session.

Surprised (a) that I still use it, and (b) that it's still proving useful ...