Friday, 18 December 2009

The answer's always in the question

I'm probably the only person in the known universe who's going to be doing this but it's worth my noting it down as a reminder of the general principle.

We've bought into smartsm, which is an evidence-based stock management tool. We're working on Dynix, which provides a ton and a half of really useful data but which provides far too much detail for everyday working. The idea is that we can use smartsm to pull together the data into reports which can be run by front-line staff who can then take whatever action is appropriate. (The idea's a bit more and a bit bigger than that, but that'll do for the purposes of this narrative.)

Dynix is an old library management system and interoperability isn't it's strong suit, by a very long chalk. The good news, though, is that it's very easy to do snapshots of any combination of data and then export that out as text. Being an old Pick system, the data's not held in tables like you'd see in an SQL database; the best way I can describe it is that for all practical purposes the data's held suspended in mid air connected together by bits of string. These bits of string ("dicts") are one of the most powerful data manipulation tools I've had the pleasure of playing with and I'll miss them greatly when we eventually move onto a new LMS. They do three jobs:

  • They define which piece of data you're looking at;
  • They define how you're going to see the data; and
  • They can link the data in one file to that in another file.

So, for instance, the dict called BARCODES in the bibliographic record file called BIB reports the data found in the first line of a BIB record in columnar format 16 characters wide. The dict called L-COLLECTION reads the data in the first line of a BIB record, uses those barcodes to look for the appropriate records in the HOLDINGS file, reads the code in the fifteenth line of each record, translates the code(s) into the appropriate collection labels and reports these in Title Format in a column 40 characters wide. And once a dict is set up and shown to work OK you can forget about all the intermediate steps and just get the data.

So what's this got to do with smartsm?

We need to do a monthly extract of our catalogue data providing the bibliographical data for each item plus its location, collection, current status and its use. And smartsm need this as comma-delimited data in a set format so that they can map it against their reporting processes. Most of which is easy enough to do: if you tell me that you need the title of each book up to a maximum of 100 characters it's literally less than a minute's work to do. You want it comma-delimited, it'll take a minute or two to remember how to impose a constant comma in front of the reporting data. And for the most part it really has been that easy. In some cases I've wanted to aggregate the data into something more useful, for instance we have umpteen item statuses providing different reasons why these items are not available for loan (being repaired; being boxed up for transfer to another library; audio items bought then held back under the terms of the performance licence, etc.), which are usually useful when you're looking for a particular item but drag in a bit too much confusing detail for reporting purposes, so I set up a dict that just reports these as "not available." A few bits were more complicated but I got there in the end.

And then there was the date format.

Date formats are a pain in the arse, no two ways about it. The data extract had to provide the dates in a particular format, which isn't any of the date formats available on Dynix. I spent six weeks trying, and failing, to write a dict that jiggled the components around a bit to give the right format. I was beside myself with frustration.

On the bus home one night I realised I'd been a prat.

We need comma-delimited data, right? If I downloaded the entire catalogue database and stuck a comma at the beginning and a comma at the end this would be treated as one piece of data, one column wide and one row deep. I didn't need a dict that juggled the day, month and year data. I needed a dict for day, a dict for year and a dict for month. With a comma before the day data. Which was five minutes' work the next day.

Note to self: in future, pay attention to all the question. The answers come easier that way.

No comments:

Post a Comment