Tuesday 18 October 2016

Library data part two: what do we know about the stock?

In principle stock data is much the least problematic data set held by libraries when it comes to trying to map it and potentially share it across local authority boundaries or make the data openly available. There are good reasons for this:
  • Every English public library service has a catalogue of resources
  • There has been decades' worth of data-sharing for the purposes of interlibrary loans including, but not limited to, the UnityUK database
  • There are long-established standards for title-level bibliographic data
  • The outsourcing of most bibliographic metadata limits the number of original sources of data and so imposes some consistency
Added to this can be the data mapping work involved in setting up an interface with the evidence-based stock management system CollectionHQ and the increased use of library management systems in consortium settings. Both of these get library systems people thinking about the way their data maps against external frameworks.

Technically, data about virtual stock holdings can be treated the same way as physical stock holdings. Culturally, there is some variation in approach between library services.

For the purposes of this post we'll assume that all stock has been catalogued and the records held in the library management system. In reality this will be true of most, if not all, lending library stock and a high proportion of whatever reference library stock there is these days. Many local studies collections and special collections are still playing catch-up.

Title-level bibliographic data

All the bibliographic records come from the same place so this is standard data and would be easy to share and compare, right? Well… up to a point, Lord Copper.
  • Not all library authorities are buying in MARC records.
  • Of those that do, not all of them are retrospectively updating their old records so they'll have a mix of bought-in MARC records and locally-sourced records which may or may not be good MARC records in the first place and which certainly have variations in the mapping details.
  • Those that did do a retrospective update may have hit a few glitches. Like the library authority that had an LMS that had ISBN as a required field and so had to put dummy data in this field which turned out to be the valid ISBNs of extremely different titles to the ones they actually had. (This wasn't Rochdale, though it did cause us some collateral damage.)
  • There may be local additions to commercial MARC records, for instance local context-specific subject headings and notes.
  • Commercial MARC records may not be available for some very local or special collection materials so these will need to be locally-sourced.
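
The dummy-ISBN glitch is worth a quick illustration. The sketch below, in Python, runs the standard ISBN-10 and ISBN-13 check-digit tests and then shows why passing them isn't enough: a number can be perfectly valid and still belong to a different title. The function names, the record layout and the "reference_titles" lookup are all hypothetical, made up for this example.

```python
def normalise(isbn: str) -> str:
    """Strip hyphens and spaces so '978-0-306-40615-7' becomes '9780306406157'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(isbn: str) -> bool:
    """True if the thirteen digits satisfy the ISBN-13 check-digit rule."""
    digits = normalise(isbn)
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... and the weighted sum must divide by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(isbn: str) -> bool:
    """True if the ten characters satisfy the ISBN-10 check-digit rule."""
    chars = normalise(isbn)
    if len(chars) != 10 or not (chars[:9].isascii() and chars[:9].isdigit()):
        return False
    if not (chars[9] == "X" or (chars[9].isascii() and chars[9].isdigit())):
        return False
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    # Weights run 10 down to 1 and the weighted sum must divide by 11.
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def title_mismatch(record: dict, reference_titles: dict) -> bool:
    """True when the ISBN is known to the reference source under a different title.

    'record' is assumed to be a dict with 'isbn' and 'title' keys, and
    'reference_titles' a dict of normalised ISBN -> title (both hypothetical).
    """
    reference = reference_titles.get(normalise(record["isbn"]))
    return reference is not None and reference.casefold() != record["title"].casefold()
```

A record can pass is_valid_isbn13 and still fail title_mismatch, which is the situation described above. The checksum only catches typing errors, not a well-formed number attached to the wrong book.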
Taking these factors into consideration, this would still be much the most reliably uniform component of a national core data set for libraries, if any such were ever developed. The data available would be either:
  • A full MARC record + the unique identifier for this bib record in this LMS (this is required to act as a link between the title-level data and the item-level data); or
  • A non-MARC record including:
    • Title
    • Author
    • Publisher
    • Publication date
    • ISBN/ISSN or other appropriate control number, if available
    • Class number
    • Unique identifier for this bib record
    (I think there's a limit to the amount of non-MARC data that should be admissible.)
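
As a rough sketch of what that minimal non-MARC record might look like in practice, here it is as a Python data class. The class name, field names and the missing_fields helper are my own assumptions for illustration; nothing here is an agreed schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class MinimalBibRecord:
    """Hypothetical minimal title-level record for a library without MARC data."""

    bib_id: str                              # unique identifier for this bib record in this LMS
    title: str
    author: Optional[str] = None
    publisher: Optional[str] = None
    publication_date: Optional[str] = None   # kept as text, since catalogue dates are not always clean years
    control_number: Optional[str] = None     # ISBN/ISSN or other control number, if available
    class_number: Optional[str] = None


def missing_fields(record: MinimalBibRecord) -> List[str]:
    """List the fields left empty, in the order they are declared."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]
```

A check like missing_fields gives a quick measure of how thin a locally-sourced record is before it goes anywhere near a shared data set.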
For the purposes of this game RDA-compliant records can be assumed to be ordinary MARC21 records (there's a heap of potential MARC mapping issues involved in any national sharing exercise which we won't go into here). I can see the need for the use of FRBR by public libraries but I don't see it happening any time soon so it's not considered here.

Item-level holdings data

The library catalogue includes holdings data as well as bibliographic data so that, too, could be part of a national data set. The detail and format of this data can vary between LMSs and from one library authority to another:
  • Some, but not all, item records may have at least some of their data held in MARC 876–878 tag format
  • The traditional concept of a "collection" may be described in different fields according to the LMS or the local policy. Usually it would be labelled as one or other of item type, item category or collection.
Which data to include? Or rather, which would be most likely to be consistently-recorded? My guess:
  • Unique identifier (usually a barcode)
  • Location
  • Key linking to the appropriate bibliographic record
  • Item type/item category/collection label best approximating to the traditional concept of "collection"
  • Cost/value
  • Use, which would generally mean the number of issues
  • Current status of the item
After that the variations start to kick in big time.
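
To make that guess a bit more concrete, here's a rough Python sketch of an item record built from the list above, with the linking key used to tie items back to titles. The ItemRecord class, its field names and the two helper functions are illustrative assumptions rather than anything taken from a particular LMS.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class ItemRecord:
    """Hypothetical item-level record holding the most commonly recorded fields."""

    item_id: str               # unique identifier, usually a barcode
    bib_id: str                # key linking to the bibliographic record
    location: str
    item_type: str             # local item type/item category/collection label
    cost: Optional[Decimal]    # may be absent; its meaning varies between services
    issues: Optional[int]      # use count, generally the number of issues
    status: str                # current status, as labelled locally


def group_items_by_bib(items: Iterable[ItemRecord]) -> Dict[str, List[ItemRecord]]:
    """Collect item records under the bib record they link to."""
    grouped: Dict[str, List[ItemRecord]] = {}
    for item in items:
        grouped.setdefault(item.bib_id, []).append(item)
    return grouped


def orphaned_items(items: Iterable[ItemRecord], known_bib_ids: Set[str]) -> List[ItemRecord]:
    """Items whose linking key does not match any known bib record."""
    return [item for item in items if item.bib_id not in known_bib_ids]
```

The orphan check matters because the link between title-level and item-level data is only as good as the identifiers on both sides.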

There are a few devils in the detail, for instance:
  • There is no standard set of "collections," though there is a de facto standard set of higher-level item types:
    • Adult Fiction
    • Adult Non-Fiction
    • Children's Fiction
    • Children's Non-Fiction
    • Reference
    • Audiovisual
    • Everything else
    The item type/item category/collection for each library authority would need to be mapped against a standard schedule of "item types." For instance, when I used to pull out stock data for CIPFA returns I didn't have the appropriate categories available in fields in the item records, so in Dynix I had a dictionary item set up to do the necessary in Recall and with Spydus I set up a formula field in a Crystal Report. In both cases it involved a formula including sixty-odd "If… Then… Else…" statements.
    • Are those already used for CIPFA adequate or would a new suite need to be developed and agreed?
    • Would this translation be done at the library output stage or the data aggregation stage?
      For CIPFA our translation was done at output; for CollectionHQ it was done at the data aggregation stage according to a previously-defined mapping.
  • Cost could be the actual acquired cost including discount; the supplier's list price at time of purchase, without discount; or the default replacement cost for that type of item applied by the LMS.
  • Use count data may be tricky:
    • It could be for the lifetime of the item or just from the time that data was added to this particular LMS if the legacy data was lost during the migration from one system to another. 
    • Some LMSs record both "current use" (e.g. reset at the beginning of the financial year) and total use. You need to be able to identify one from the other.
    • The use of loanable e-books/e-audiobooks may not be available as this depends on the integration of the LMS with the supplier’s management system.
    • Curated web pages would be treated as reference stock and not have a use count.
    • Some LMSs allow the recording of reference use as in-house use.
  • Item status is always interesting:
    • Does this status mean the item is actually in stock?
    • Is the item available?
    • Has the item gone walkies/been withdrawn?
    • Again, this would have to be a mapping exercise, similar to the one we did for CollectionHQ
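
The two mapping exercises mentioned above, item types and item statuses, could be sketched as lookup tables rather than a long chain of "If… Then… Else…" statements. This is a minimal Python illustration only. The local codes, the status labels and the StatusMeaning class are invented for the example; the standard item types are the de facto set listed above.

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK_ITEM_TYPE = "Everything else"

# Hypothetical local codes mapped to the de facto higher-level item types.
LOCAL_TO_STANDARD_ITEM_TYPE = {
    "AF": "Adult Fiction",
    "ALP": "Adult Fiction",          # e.g. a large print collection
    "ANF": "Adult Non-Fiction",
    "JF": "Children's Fiction",
    "JNF": "Children's Non-Fiction",
    "REF": "Reference",
    "DVD": "Audiovisual",
}


@dataclass(frozen=True)
class StatusMeaning:
    """Answers to 'is it actually in stock?' and 'is it available?'."""

    in_stock: bool
    available: bool


# Hypothetical local status labels; each service would supply its own.
LOCAL_STATUS_MEANING = {
    "ON SHELF": StatusMeaning(in_stock=True, available=True),
    "ON LOAN": StatusMeaning(in_stock=True, available=False),
    "MISSING": StatusMeaning(in_stock=False, available=False),
    "WITHDRAWN": StatusMeaning(in_stock=False, available=False),
}


def map_item_type(local_label: str) -> str:
    """Translate a local label, falling back to 'Everything else' when unmapped."""
    return LOCAL_TO_STANDARD_ITEM_TYPE.get(local_label.strip().upper(), FALLBACK_ITEM_TYPE)


def map_status(local_status: str) -> Optional[StatusMeaning]:
    """Translate a local status; None flags a label nobody has mapped yet."""
    return LOCAL_STATUS_MEANING.get(local_status.strip().upper())
```

Whether tables like these live with each library service (translation at output) or with whoever aggregates the data is exactly the question raised above. The code is the same either way; only the ownership of the mapping changes.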

So what have we got?

Overall, then, we could say that every public library could put their hand to a fair bit of title-level data that's reasonably consistent in both structure and content, and some item-level data that wouldn't be difficult to make structurally consistent but would need a bit of work to map the content to a consistent level.



Monday 17 October 2016

Library data part one: variations on a theme

Over the summer I've been doing a bit of work for the Public Libraries Taskforce and that set me thinking about the data that public library services hold. Each one holds a shedload of data about its resources, its customers and its performance, but each one holds a slightly different shedload to its neighbours. Why would that be?

Technical reasons

  • There are surprisingly few standard data structures in play in public libraries
  • Different management systems hold data in different ways
  • Even if the data has the same structure a different suite of descriptive labels may be in use

Human reasons

  • An organisation might not feel the need to record the data at all
  • The quality — or not — of the data may not be a priority so elements may be missing
  • Naming conventions, etc. may change over time without retroactive conversion, leading to internal inconsistency
  • The data may still be on bits of paper
Having said that, there are some key data that are generally common to all, though variable in detail. I'll have a look at those over the next few posts.

Disheartening the visitor

For a long time — nearly twenty years — I had a very clear candidate for Worst Entrance To A Public Library Ever, though thankfully that particular entrance barely survived the millennium. I've now found one that's worse. No names, no pack drill, it wouldn't be fair to the staff who I know are trying their best in very trying circumstances.

The other day I popped into this library. I've been meaning to go and have a nosy for a while. Up to a few years ago this town had a reasonably busy little library, nothing special, in a simple brick two-storey box of a building. The shopping area of the town got redeveloped quite extensively, one of the casualties being the old library. It was the replacement I'd been meaning to visit.

The good news is that the building's well-signed in the shopping area and made easy to find because there are a lot of colourful and useful library posters in the window. The first bit of bad news is that it's on the first floor above a supermarket so you can't idly walk past, see the library in use and be tempted in. But the posters and notices in the window try to draw you in.

[Image: library lobby with escalator]

Sadly, once you are drawn in you're in a small lobby with just enough room to wheel a buggy round to a lift or else take the escalator directly in front of you. Everything is grey: pale grey walls, mid grey ceiling, dark grey carpet, steel grey lift doors and escalator. It's all a bit soulless. Nothing much invites you to go up the escalator: it rises up into a dark grey shadow with no knowing that anything's up there, least of all a library. All in all pretty nasty.

Once you get upstairs it's slightly better, though that's despite the design of the library not because of it. The colour scheme is followed again, relentlessly, with grey metal shelving and an extensive network of exposed pipes in the ceiling space also painted mid-grey, the building designer obviously being a big fan of warehouse shopping chic. Or else one of the Borg. The overall effect was softened as far as possible by posters and displays but there wasn't physically a lot of scope for making it a much more human environment. Which was a shame as there were good things going on in there including a very enthusiastic rhythm and rhyme session going on in the enclosure that was the children's library. Lots of colourful books on the shelves may have helped a bit but this is a local authority that was closing libraries and cutting book funds back when the rest of us were refurbishing and replenishing so the staff didn't have many resources to play with there.

Generally speaking this is just the worst of a trend I've seen over the past few years: new library builds by architects and designers who see the library space as being like an office or else just a room with a few shelves of books in it. For all the consultations that go on it's evident that the designers haven't made any effort to understand how the business of the library is run:
  • The need to invite the visitor in, ease them back out again and leave them wanting to come back soon; 
  • The different lines of flow for different kinds of use and different kinds of customer;
  • The ease of navigation so that somebody standing in the entrance knows immediately where they need to go;
  • The essential requirement of lines of sight for staff so that they can provide unobtrusive supervision and support;
  • The capability for change in response to early experience of use (like some landscape designers who only hard pave paths after a few months so that paths follow the "cow lines" established by the people using the space) and to allow for development of delivery of the services on offer;
  • Most of all, the acknowledgement that the library space is a human space so people have to feel comfortable in it.
None of this costs anything except a bit of effort and a willingness to understand the desired outcomes that are being designed for. Sadly…