Tuesday, 18 October 2016

Library data part two: what do we know about the stock?

In principle, stock data is much the least problematic data set held by libraries when it comes to mapping it and potentially sharing it across local authority boundaries or making it openly available. There are good reasons for this:
  • Every English public library service has a catalogue of resources
  • There has been decades' worth of data-sharing for the purposes of interlibrary loans including, but not limited to, the UnityUK database
  • There are long-established standards for title-level bibliographic data
  • The outsourcing of most bibliographic metadata limits the number of original sources of data and so imposes some consistency
Added to this can be the data mapping work involved in setting up an interface with the evidence-based stock management system CollectionHQ and the increased use of library management systems in consortium settings. Both of these get library systems people thinking about the way their data maps against external frameworks.

Technically, data about virtual stock holdings can be treated the same way as physical stock holdings. Culturally, there is some variation in approach between library services.

For the purposes of this post we'll assume that all stock has been catalogued and the records held in the library management system. In reality this will be true of most, if not all, lending library stock and a high proportion of whatever reference library stock there is these days. Many local studies collections and special collections are still playing catch-up.

Title-level bibliographic data

All the bibliographic records come from the same place so this is standard data and would be easy to share and compare, right? Well… up to a point, Lord Copper.
  • Not all library authorities are buying in MARC records.
  • Of those that do, not all are retrospectively updating their old records, so they'll have a mix of bought-in MARC records and locally-sourced records, which may or may not be good MARC records in the first place and which certainly have variations in the mapping details.
  • Those that did do a retrospective update may have hit a few glitches, like the library authority whose LMS had ISBN as a required field and so had to put dummy data in it, which turned out to be the valid ISBNs of extremely different titles to the ones they actually had. (This wasn't Rochdale, though it did cause us some collateral damage.)
  • There may be local additions to commercial MARC records, for instance local context-specific subject headings and notes.
  • Commercial MARC records may not be available for some very local or special collection materials so these will need to be locally-sourced.
Taking these factors into consideration, this would still be much the most reliably uniform component of a national core data set for libraries, if any such were ever developed. The data available would be either:
  • A full MARC record + the unique identifier for this bib record in this LMS (this is required to act as a link between the title-level data and the item-level data); or
  • A non-MARC record including:
    • Title
    • Author
    • Publisher
    • Publication date
    • ISBN/ISSN or other appropriate control number, if available
    • Class number
    • Unique identifier for this bib record
    (I think there's a limit to the amount of non-MARC data that should be admissible.)
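As a rough sketch of what the non-MARC option might look like as a data structure, here is a minimal record type in Python. The class name, field names and sample values are my own illustrative assumptions, not any agreed schema; the one thing that matters is that the bib record identifier is always present, because it is the link to the item-level data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BibRecord:
    """Minimal non-MARC title-level record (hypothetical field names)."""
    bib_id: str                               # unique identifier for this bib record in this LMS
    title: str
    author: Optional[str] = None
    publisher: Optional[str] = None
    publication_date: Optional[str] = None    # kept as text, since catalogue dates can be partial
    control_number: Optional[str] = None      # ISBN/ISSN or other control number, if available
    class_number: Optional[str] = None


# Illustrative values only: a locally-sourced record with no control number.
record = BibRecord(bib_id="b1000001", title="Scoop", author="Waugh, Evelyn")
```

Only the identifier and title are required here; everything else is optional so that sparse locally-sourced records can still be represented.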
For the purposes of this game RDA-compliant records can be assumed to be ordinary MARC21 records (there's a heap of potential MARC mapping issues involved in any national sharing exercise which we won't go into here). I can see the need for the use of FRBR by public libraries but I don't see it happening any time soon so it's not considered here.

Item-level holdings data

The library catalogue includes holdings data as well as bibliographic data so that, too, could be part of a national data set. The detail and format of this data can vary between LMSs and from one library authority to another:
  • Some, but not all, item records may have at least some of their data held in MARC 876 to 878 tag format
  • The traditional concept of a "collection" may be described in different fields according to the LMS or the local policy. Usually it would be labelled as one or other of item type, item category or collection.
Which data to include? Or rather, which would be most likely to be consistently-recorded? My guess:
  • Unique identifier (usually a barcode)
  • Location
  • Key linking to the appropriate bibliographic record
  • Item type/item category/collection label best approximating to the traditional concept of "collection"
  • Cost/value
  • Use, which would generally mean the number of issues
  • Current status of the item
After that the variations start to kick in big time.
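To make that list concrete, here is a minimal sketch of an item-level record and of how the linking key ties items back to their title. Everything named here (ItemRecord, items_by_bib, the field names and sample values) is a hypothetical illustration rather than any LMS's actual export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ItemRecord:
    """Core item-level fields (hypothetical names)."""
    barcode: str            # unique identifier for the item
    bib_id: str             # key linking to the bibliographic record
    location: str
    item_type: str          # local item type/item category/collection label
    cost: Optional[float]   # meaning varies between authorities
    issues: int             # use count
    status: str             # local status label, unmapped


def items_by_bib(items: Iterable[ItemRecord]) -> Dict[str, List[ItemRecord]]:
    """Group item records under the bib record they belong to."""
    grouped: Dict[str, List[ItemRecord]] = defaultdict(list)
    for item in items:
        grouped[item.bib_id].append(item)
    return dict(grouped)


# Illustrative data: two copies of one title, one copy of another.
sample = [
    ItemRecord("30001", "b1000001", "Central", "AF", 7.99, 12, "ON SHELF"),
    ItemRecord("30002", "b1000001", "Branch A", "AF", 7.99, 3, "ON LOAN"),
    ItemRecord("30003", "b1000002", "Central", "REF", None, 0, "ON SHELF"),
]
holdings = items_by_bib(sample)
```

Cost is left optional and the status is held as the raw local label, because both need interpreting before they can be compared across authorities.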

There are a few devils in the detail, for instance:
  • There is no standard set of "collections," though there is a de facto standard set of higher-level item types:
    • Adult Fiction
    • Adult Non-Fiction
    • Children's Fiction
    • Children's Non-Fiction
    • Reference
    • Audiovisual
    • Everything else
    The item type/item category/collection for each library authority would need to be mapped against a standard schedule of “Item types.” For instance, when I used to pull out stock data for CIPFA returns I didn't have the appropriate categories available in fields in the item records, so in Dynix I had a dictionary item set up to do the necessary in Recall and with Spydus I set up a formula field in a Crystal Report. In both cases it involved a formula including sixty-odd "If… Then… Else…" statements.
    • Are those already used for CIPFA adequate or would a new suite need to be developed and agreed?
    • Would this translation be done at the library output stage or the data aggregation stage?
      For CIPFA our translation was done at output, for CollectionHQ it was done at data aggregation stage according to previously-defined mapping.
  • Cost could be the actual acquired cost including discount; the supplier's list price at time of purchase, without discount; or the default replacement cost for that type of item applied by the LMS.
  • Use count data may be tricky:
    • It could be for the lifetime of the item or just from the time that data was added to this particular LMS if the legacy data was lost during the migration from one system to another. 
    • Some LMSs record both "current use" (e.g. reset at the beginning of the financial year) and total use. You need to be able to identify one from the other.
    • The use of loanable e-books/e-audiobooks may not be available as this depends on the integration of the LMS with the supplier’s management system.
    • Curated web pages would be treated as reference stock and not have a use count.
    • Some LMSs allow the recording of reference use as in-house use.
  • Item status is always interesting:
    • Does this status mean the item is actually in stock?
    • Is the item available?
    • Has the item gone walkies/been withdrawn?
    • Again, this would have to be a mapping exercise, similar to the one we did for CollectionHQ.
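Both the item type and the item status problems come down to the same mapping exercise, and a lookup table is easier to maintain and agree than a long chain of "If… Then… Else…" statements. The sketch below is one way it might be done in Python. The local codes and status labels are invented for illustration; each authority would supply its own tables, and the standard item types are the de facto list given above.

```python
from typing import Optional, Tuple

# Hypothetical local codes mapped to the higher-level item types.
ITEM_TYPE_MAP = {
    "AF": "Adult Fiction",
    "ANF": "Adult Non-Fiction",
    "JF": "Children's Fiction",
    "JNF": "Children's Non-Fiction",
    "REF": "Reference",
    "DVD": "Audiovisual",
    "CD": "Audiovisual",
}
FALLBACK_ITEM_TYPE = "Everything else"

# Hypothetical local statuses mapped to (in stock?, available?).
STATUS_MAP = {
    "ON SHELF": (True, True),
    "ON LOAN": (True, False),
    "MISSING": (False, False),
    "WITHDRAWN": (False, False),
}


def normalise(code: str) -> str:
    """Tidy up a local code so that lookups ignore case and stray spaces."""
    return code.strip().upper()


def map_item_type(local_code: str) -> str:
    """Translate a local item type into the standard schedule."""
    return ITEM_TYPE_MAP.get(normalise(local_code), FALLBACK_ITEM_TYPE)


def map_status(local_status: str) -> Optional[Tuple[bool, bool]]:
    """Return (in_stock, available), or None if the status is unmapped."""
    return STATUS_MAP.get(normalise(local_status))
```

Unmapped item types fall into "Everything else", whereas unmapped statuses return None so that somebody has to look at them: guessing whether an item is in stock seems riskier than guessing its collection. The same tables could be applied either at the library output stage or at the data aggregation stage.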

So what have we got?

Overall, then, we could say that every public library could put their hand to a fair bit of title-level data that's reasonably consistent in both structure and content, and some item-level data that wouldn't be difficult to make structurally consistent but would need a bit of work to map the content to a consistent level.
