Showing posts with label data sharing. Show all posts

Wednesday, 2 November 2016

Library data part three-and-a-bit: sharing customer data

Having had a quick scamper through the worry list, what customer data could be shared openly?

Let's start with what can't be shared:
  • Name
  • Full address
  • Unique identifier for the data record
  • Nearly all combinations of data elements within the record
The first two are obvious Data Protection precautions; the last two are less obvious precautions for the same reason: they make it possible to identify the individual data subject.

Any data extraction for release as open data must specify the required data elements. The required fields need to be selected at extraction, rather than extracting everything and filtering out the unwanted fields afterwards. This prevents any accidents. Once data's openly out in the wild it's out in the wild.
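
As a rough sketch of the difference between selecting fields and filtering them out, here is a minimal Python example. The record layout and the field names are made up for illustration; they don't come from any real library management system.

```python
# Hypothetical field names, for illustration only.
ALLOWED_FIELDS = ("registration_location", "category")


def extract_open_fields(record, allowed=ALLOWED_FIELDS):
    """Build a new record containing only the explicitly selected fields.

    Anything not named in `allowed` is never copied, so a field added to
    the source system later cannot slip into the extract by accident.
    """
    return {field: record[field] for field in allowed if field in record}


# A made-up customer record.
customer = {
    "id": 10432,
    "name": "A. Borrower",
    "address": "1 Example Street",
    "registration_location": "Bedlam Library",
    "category": "Adult",
}

extract = extract_open_fields(customer)
print(extract)
```

The point of the design is that the default is exclusion: a "remove these fields" list has to be kept up to date for ever, whereas a "take only these fields" list fails safe.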

"Registration location" and "Library/libraries used" (if available) are both safe in themselves: they aren't personal data and the groups they describe are broad enough that individual data subjects can't be identified. They could be combined with each other and any one of the following:
  • Category (e.g. type of borrower)
  • Ethnicity
  • Disability
  • Gender
  • Year of birth/age in years (if only date of birth can be extracted then this data shouldn't be used)
The data extract could be:
       Bedlam Library     Child
       Bedlam Library     Child
       Bedlam Library     Adult
       Bedlam Library     Adult

But not:
       Bedlam Library     Child     Male
       Bedlam Library     Child     Female
       Bedlam Library     Adult     Female
       Bedlam Library     Adult     Male

Any two of these could be combined:
  • Category (e.g. type of borrower)
  • Ethnicity
  • Disability
  • Gender
  • Year of birth/age in years (if only date of birth can be extracted then this data shouldn't be used)
A postcode dump for the whole library authority could be made available, but it must not be combined with any other data because postcodes are specific enough to identify individuals very quickly.
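
The combination rules above could be checked before any extract is run. The sketch below encodes one reading of them, and that reading is an assumption: location fields plus at most one demographic field; or, with no location fields, at most two demographic fields; or postcode entirely on its own. The field names are hypothetical.

```python
# Hypothetical field names; the rules encode one reading of the lists above.
LOCATION_FIELDS = {"registration_location", "libraries_used"}
DEMOGRAPHIC_FIELDS = {"category", "ethnicity", "disability", "gender", "year_of_birth"}
POSTCODE = "postcode"


def is_shareable_combination(fields):
    """Return True if this set of fields may be released together."""
    fields = set(fields)
    if not fields:
        return False
    # Postcode may only ever be released on its own.
    if POSTCODE in fields:
        return fields == {POSTCODE}
    # Anything not on an explicit list (name, address, record id...) is refused.
    if not fields <= LOCATION_FIELDS | DEMOGRAPHIC_FIELDS:
        return False
    demographics = fields & DEMOGRAPHIC_FIELDS
    has_location = bool(fields & LOCATION_FIELDS)
    # Assumption: one demographic field alongside locations, two without.
    limit = 1 if has_location else 2
    return len(demographics) <= limit


print(is_shareable_combination({"registration_location", "category"}))
print(is_shareable_combination({"registration_location", "category", "gender"}))
```

The first call corresponds to the "Bedlam Library / Child" extract and is allowed; the second corresponds to the "Bedlam Library / Child / Male" extract and is refused. A check like this is no substitute for testing the real data or for the Information Governance review.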

I think that's pretty much it. And I'd still want to run it by an Information Governance expert before going ahead (and for them to check my Privacy Impact Assessment).

Tuesday, 1 November 2016

Library data part three: dangerous demographics

The data about the people registered with a library is at one and the same time the most potentially useful and the most potentially dangerous. So dangerous, in fact, that when it comes to making this data openly-available the default position must be: Don't. Do. It.

That position will be strongly challenged by many so I'll devote the rest of this post to explaining the dangers and the next one will have a look at the data that might be openly-shareable so long as all the necessary precautions are taken.

Demographic information is immensely useful to a library service. Operationally it's important to see that the service is meeting the needs of all its communities and not just providing a service "for people like us by people like us." It's important to be able to make sure that particular services are reaching their target audiences and that you're not doing anything to put sections of the community off using your services. And it's essential that you have this data for Equality Impact Assessment of policy decisions. So why would you not want to share the data to get a bigger picture?

Generally speaking there are three main concerns:
  • Privacy. The library is one of the few safe public places left for the individual. Removing the right to privacy is an information governance issue just as much as an ethical one and both need to be taken very seriously (both are generally given too much lip service and too little analysis and action).

    It also compromises the quality of the service being provided: if library customers know that how they as individuals use the library will be made public a good many of them will modify their behaviour and not use the library the way they want or need to. If they don't know this the library will have committed a significant breach of trust. 
  • Legality. Does the library have the legal right to share the data? If it is possible to identify individual data subjects then the answer is categorically: No, unless the data subject has explicitly said that their data may be shared.

    Anonymising the data so that it is no longer personal data is easier said than done. It isn't a matter of just removing all the names. We'll have a look at this later on.

    The agreement has to be an opt-in and the purpose of this data sharing has to be clearly stated. "We want to make your data open so that other, as yet unknown, people can manipulate it to get as yet unknown information and outcomes" would be an open invitation to the Information Commissioner's Office to come and investigate your organisation.
  • Safeguarding. This is the most problematic and under-appreciated concern. Anybody knowing whether or not a person even visits a library, let alone uses it, may put that person in actual physical danger. In some controlling relationships a partner may only be allowed out to go to the shops and heaven help them if they do anything else. They may be allowed to take a child to library activities such as story times but not to use the library for themselves. An abusive partner discovering that somebody was somewhere they shouldn't be (the wrong end of town or even the wrong town) could be a trigger for violence. The test here isn't: "What is reasonable?" because this isn't about safeguarding people against reasonable action. It's about safeguarding them from action that may be anything but reasonable.

    In my head I can hear somebody saying: "If they let us know that they're in an abusive relationship we could put a flag in their record to say their data's not to be shared."

    • This requires the data subject to actively opt out of data sharing.
    • Identifying yourself as a person in an abusive relationship is a brave thing to do and not something that should be required to be done at a public service point in a library.
    • The library suddenly becomes a less safe place.
    • Someone's got to remember to filter out the flagged records before sharing the data.

    I don't think any of that is acceptable. (And I said so when it was said to me that time.)
Anonymising the data requires more than stripping out all the names. The Information Commissioner's Office has a useful checklist (pdf).

In public libraries the combination of nearly any two data elements may be enough to make that data subject identifiable, or at least narrow the number of possibilities down enough to make it statistically probable they could be identified. The combination of "library where registered" and "library used" plus one other datum is usually OK but this needs to be tested with the particular data set, in case of nasty surprises. Other combinations very quickly narrow down to the individual.

A lot depends on the data itself: if the categories used are very general it might be safe to combine them, though they may be so general as to be pretty useless. I really did once work with a library service that thought it was OK to have two ethnic identifiers in the system: blank for "people like us" and "ethnic" for anyone who looked or sounded a bit foreign. I put a block on that the first chance I got. Even so, it wasn't until we got all the libraries onto the library management system that we finally got rid of the last of the old Browne Issue tickets with a red E on them (disturbing symbologies like that make me wonder what librarians were thinking about in the eighties).
  • I'd imagine my local library authority will have thousands of white adult males in their database. How many — or few — teenage Bangladeshi females would there be?
  • Postcode data very quickly narrows down. There are perhaps ninety people in my postcode area. Twenty-odd adult white males. About four males in their fifties. One white male in his fifties.
  • Age data gets very specific very quickly. "Adult" and "Child" is pretty safe but as soon as you start refining that down it becomes problematic. Full date of birth is so specific it's a red flag.
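
One way of testing a particular data set for this kind of narrowing-down is to count how many records share each combination of values and flag the small groups. The sketch below uses made-up records, and the threshold of 10 is an arbitrary illustration, not a recommended figure; the right figure is a matter for information governance advice.

```python
from collections import Counter


def small_groups(records, fields, minimum=10):
    """Count records per combination of `fields` and return the
    combinations shared by fewer than `minimum` records."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    return {combo: n for combo, n in counts.items() if n < minimum}


# Made-up data: two large groups and one very small one.
records = (
    [{"category": "Adult", "gender": "Male"}] * 25
    + [{"category": "Child", "gender": "Female"}] * 12
    + [{"category": "Teenager", "gender": "Female"}] * 2
)

risky = small_groups(records, ("category", "gender"), minimum=10)
print(risky)
```

Here only the two-person group is flagged. Any combination that turns up in the result is a candidate for being dropped, or merged into a broader category, before anything is released.
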
So we would need to be very careful about what data, and what combination of data, is made available. In a library consortium setting this should be governed by formal data sharing protocols that have been passed by each authority's information governance experts and given the OK by whoever is responsible for the authority's information risk. The data of all the people who have actively agreed to their data's being shared can then be made available to the appropriate staff for the appropriate purpose within the consortium. That's a very specific remit for a very specific purpose for the use of a very specific group of people, with checks and balances and sanctions for abuse.

Which is exactly not the case with the open release of data, so different rules need to apply and need to be applied proactively (the genie doesn't go back into the bottle if you find you've made a mistake). Hence the greater need for precaution.

Wednesday, 17 August 2016

How many?

I've started doing some work with the Libraries Taskforce. I'd been to one of their workshops and it was pretty apparent that there was potentially a lot of work to be done by the less than a handful of people involved, and it wasn't easy to see how they'd be able to do it on their own. I've got some time now that I've retired from Rochdale Council so I asked them if they needed a hand with anything and they said yes please. So I'm lending a hand with the work strand that's hoping to develop a core data set for English public libraries that can be openly-available for both public use and operational analysis. It's a voluntary effort on my part; it's something I'm interested in and have been impatient about and it's a piece of work that should have some very useful outcomes.

Whenever you start talking about English public libraries data the elephant in the room very quickly makes its presence known. Before we can talk credibly about anything very much there is one inescapable question desperately needing an answer:
Just how many English public libraries are there anyway?
There is no definitive answer. There is no definitive list. There are at least half a dozen well-founded, properly researched lists. They each give a different answer and when you start comparing them you find differences in the detail. There are perfectly valid reasons for this:
  • Each had been devised and researched for its own purposes without reference to what had gone before. Each started from scratch and each had a differently-patchy response from library authorities when questionnaires were posted.
  • This data's not easy to keep up to date at a national level — especially these days! So some libraries will have closed, a few will have opened, some will have moved and some will have been renamed. 
  • It wasn't always clear just how old the lists were. Some had been compiled as part of some wider project and there wouldn't necessarily have been the resource available to do any updating anyway.
So the decision was made to tackle this head on and settle it once and for all so that the world could move on. They were a few weeks into this work when I signed on. Very broadly, here's the process:
  • Julia from the Taskforce, who has infinitely more patience than me, trawled every English local authority's web site for the details of their public libraries.
  • Between us we scoured the other lists and added any libraries we found in there that we couldn't find in Julia's list.
  • We then went through this amended list to see if we could identify any points of confusion, for instance where "Trumpton Central Library" has moved from one place to another or where "Greendale Library" has become "The Mrs Goggins Memorial Information and Learning Hub."
  • The Taskforce has sent each library authority a list of what we think are their libraries asking them to check to see whether or not these details are correct.
  • The results will be collated and the data published by the Taskforce.
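
The second step of that process, finding libraries that appear in another list but not in the master list, can be sketched as below. This is an illustration rather than what was actually used: the matching key is deliberately crude and "Camberwick Green Library" is a made-up entry.

```python
def normalise(name):
    """Crude matching key: lower-case the name and collapse whitespace."""
    return " ".join(name.lower().split())


def missing_from_master(master, other):
    """Return the names in `other` with no match in `master`, in order."""
    known = {normalise(name) for name in master}
    return [name for name in other if normalise(name) not in known]


master_list = ["Trumpton Central Library", "Greendale Library"]
other_list = ["trumpton  central library", "Camberwick Green Library"]

to_check = missing_from_master(master_list, other_list)
print(to_check)
```

Name matching only catches differences in spelling and spacing. A library that has moved or been renamed will show up as "missing" and still needs a person to work out what has happened, which is why the third step in the process is a manual one.
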
Ten years ago this would have been pretty straightforward. These days the picture is complicated by the various forms of "community library" that have sprung up over the past few years. These run the gamut from "this library is part of the statutory provision though it is staffed by volunteers some of the time" all the way to "we wish them well on their venture but they're nothing to do with us." So where a public library has become a "community library" of one sort or another that needs to be indicated in the data.

Will this list be 100% correct? Probably not at first, this is a human venture after all. But even if it's only 98% correct in the first instance it should be treated as the definite article. It will then need to be corrected and updated as a matter of course; if that's devolved to the individual library authorities the work becomes manageable and the data becomes authoritative.

Why should anyone bother?

What's in it for anyone to keep their bit of this list up to date and details correct? In my opinion:
  • It's basic information that should as a matter of principle be available to the public.
  • In the past year alone, this question has tied up time and effort that could have been more usefully-occupied. All those enquiries, and FoI requests, and debates about data that could just be openly-available and signposted whenever the question arose.
  • It is essential to the credibility of any English public library statistics. If the number of libraries is suspect then how trustworthy are any of the statistics being bandied around? If the simplest quantitative evidence — the number of libraries — is iffy then how much faith can be placed in quantitative or qualitative evidence that's more exacting to collect?

    For instance, counting the number of libraries within a local authority boundary if you're responsible for supporting or managing them is a piece of piss. Reliably counting the number of visitors to any one of those libraries most definitely isn't — I have 80% confidence in the numbers coming out of any automated system (not necessarily due to technical issues) and to my mind if you're relying on manual counts you may as well be burning chicken feathers. So when I hear that visits to English public libraries have dropped by a significant percentage over a given number of years I may be prepared to accept this in the light of a wider narrative, personal observation and anecdotal evidence but I have no empirical reason to know that this is the case. 
That's why.

Sunday, 2 August 2015

Figure skating

One of the things that has become horribly apparent over the past couple of years is the abject lack of any evidence-based government data that would lend themselves to a statistical analysis of the decline of the national public library service.

There are no official figures in the public domain for anything that's happening out there: not for visits, or use of libraries or even — God help us! — for the number of publicly-funded public libraries run by local authorities in this country.

This leads to nonsense like the recent claim that there's been an increase in the number of libraries despite all the cuts over the past few years. Anyone wanting to know the number of libraries is better off going to Ian Anstice's Public Libraries News blog than any official government site or press release. All kudos and good karma to Ian for doing the work but this isn't a good state of affairs for a democracy or open government.

One reason often cited for this lack is that the figures are available but only from CIPFA, which charges a hefty fee for their use. And that fee buys just one library authority's figures for one year, so pulling together a national picture becomes an expensive business.

Which it would.

If that was the way you were doing it.

But it shouldn't be:

  • The presentation and analysis of those statistics are CIPFA's property to do as they will with. Which is fair enough as they've done that work.
  • The data that inform CIPFA's statistics are available within each and every library authority in the land and are collected each year (at no small expense to you the taxpayer) by local council staff, then copied into a spreadsheet that's parcelled up and sent to CIPFA.
  • There is absolutely no good reason why that data — not CIPFA's subsequent work with that data — can't be put into the public domain to be worked on by decision-makers, lobbyists, "Armchair Auditors" or just people who like playing with numbers. 
The easiest way to do this would be for each local authority to submit a copy of each year's data (perhaps as a CSV file) to a dataset on data.gov.uk or similar. This would then be in the public domain and available for proper analysis of services and trends. It wouldn't cost anything very much to actually do: the data's available, it just needs somewhere to go. And it would be a damned sight cheaper than having each local authority go through the administrative processes required to deal with a Freedom of Information request asking the same questions as those on the CIPFA spreadsheet. Or even multiple requests for that data. Once it's in the public domain FoI doesn't apply.
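
To give a feel for how little work a national picture would then take, here is a minimal Python sketch that adds up per-authority CSV submissions. The authorities, the figures and the column names are all made up; they are not CIPFA's column headings or anybody's real numbers.

```python
import csv
import io

# Two made-up submissions, keyed by a made-up authority name.
SUBMISSIONS = {
    "Authority A": "year,libraries,visits\n2014,12,450000\n",
    "Authority B": "year,libraries,visits\n2014,7,210000\n",
}


def national_totals(submissions, year):
    """Sum the library and visit counts for one year across all submissions."""
    totals = {"libraries": 0, "visits": 0}
    for text in submissions.values():
        for row in csv.DictReader(io.StringIO(text)):
            if row["year"] == str(year):
                totals["libraries"] += int(row["libraries"])
                totals["visits"] += int(row["visits"])
    return totals


totals = national_totals(SUBMISSIONS, 2014)
print(totals)
```

In practice the files would be read from wherever the dataset lives rather than from strings in the script, and somebody would need to agree the column headings first.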

So it would be possible to have an official, verifiable benchmark figure for the number of public libraries in this country at the beginning of the financial year and the net loss/gain at the beginning of the following year.

Which could be why it isn't happening.

Thursday, 31 July 2014

Data sharing between libraries

We're at the stage in the evolution of the AGMA library consortium where we're starting to work through the practical — and legal — implications of shared services.

  • Sharing our catalogue data is relatively easy: the data standards are well-established and most of the data itself is published in the public domain on library OPACs, etc. Which doesn't mean that it was all plain sailing and we've not got some more work to do.
  • Sharing borrower data is obviously fraught with all sorts of information governance and data protection issues on top of the problem that there isn't any data standard save that imposed by the structure of our shared LMS and the commonalities we've discussed and agreed on a case-by-case basis.
  • Virtually every circulation dataset is a back door into the borrower data.
I've been thinking through some of the questions we need to be asking ourselves on this journey. It's still early days so this isn't exhaustive; at this stage I'm trying to work out what we need to worry about at a general level prior to starting work on a risk analysis.

Purpose: Membership information including contact details (voluntary service; customers will be asked if they want to opt in)
Type of information:
  • Customer name, address and contact information, DOB
  • Disability, ethnicity and other demographic details
  • Family relationship details
  • Lending history
Recipients: Library staff (including all other authorised Spydus users) of approved Authorities within the scheme
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • Which data is to be shared? Is it all or nothing?
    • If partial, which parts and how managed?
  • Same question applies to who the data is being shared with
    • What would be the position of volunteer-managed community libraries?
  • How do we switch sharing on/off?
    • What happens if a customer changes their mind? How are they “quarantined?”
  • What happens to the data held in loans, charges and reservations?
  • What happens to any outstanding loans, fines and charges?
  • Who owns (and is responsible for) the data?

Purpose: Loans information
Type of information:
  • Details of the loan including borrower, item, location and status of loan
  • Loans history
Recipients:
  • Library staff
  • Specific customers can see all details of their loan(s)
  • All customers can see some details of the loan(s)
Data Controller: Local Authority (which?)
Notes/queries:
  • This is the crucial element to be managed:
    • It is the purpose of the data-sharing agreement
    • It is the bridging element between the personal customer data and nearly all the other data sets
  • There is a hierarchy of viewing permissions
  • If a customer has said “no” to data-sharing, how is the borrower data in the loan, charges and reservation records expressed?
    • If the customer changes their mind about sharing their data, is it automatically redacted from these records?
  • Who owns (and is responsible for) this data?
  • Whose loan policies?
    • Applied from the lending library?
    • Including fines and charges?
    • How do exceptions apply?
    • “Non-default” borrower types and collections

Purpose: Overdue/pre-overdue notices
Type of information: Contact details including borrower name, address, telephone and email; loan due dates and items involved
Recipients:
  • Library staff
  • Specific customer
Data Controller: Local Authority (which?)
Notes/queries:
  • Derived from loans data and subject to same questions
  • It would make sense to aggregate these to improve efficiency and save costs (see notes on charges, etc.)

Purpose: Reservations
Type of information: Contact details including borrower name, address, telephone and email and items requested
Recipients:
  • Library staff
  • Specific customer
Data Controller: Local Authority (which?)
Notes/queries:
  • All the questions for loans apply for reservations (which are effectively loans-in-waiting)
  • Whose charge régime applies?
  • Would the Data Controller be the “owner” of the customer record, the library that placed the reservation or the library it will be picked up from (if a different library authority)?

Purpose: Requests
Type of information: Contact details including borrower name, address, telephone and email and items/articles requested
Recipients:
  • Library staff
  • Specific customer
  • ILL system (bibliographic and/or article data only)
Data Controller: Local Authority (which?)
Notes/queries:
  • In nearly all respects as reservations, just more complicated charges
  • [The operating procedures would probably need modifying in the light of the shared lending environment.]
  • This will need to be revised in the event of a fuller integration with UnityWeb or equivalent third-party systems

Purpose: Notifications for any reserved items
Type of information: Contact details including borrower name, address, telephone and email and items requested
Recipients:
  • Library staff
  • Specific customer
Data Controller: Local Authority (which?)
Notes/queries:
  • Derived from reservations/requests data and subject to the same questions
  • It would make sense to aggregate these to improve efficiency and save costs (see notes on charges, etc.)

Purpose: Charges/fines/fees
Type of information: Contact details including borrower name, address, telephone and email; details of the transaction that generated the charge
Recipients:
  • Library staff
  • Specific customer
Data Controller: Local Authority (which?)
Notes/queries:
  • Derived from loans and reservations/requests data and subject to the same questions
  • How will these be managed:
    • Payable only where incurred?
    • Payable globally?
    • Impact on traps/alerts (whose parameters apply?)
  • In the event of recovery, who legally owns the charge?
  • In the light of the above, what would be the effect (if any) of aggregated notices?

Purpose: Catalogue/discovery records (bibliographic data)
Type of information: Title-level catalogue data
Recipients:
  • Library staff
  • Library customers and general public
Data Controller: Local Authority (which?)
Notes/queries:
  • Bibliographic data (already shared data)
  • Don’t forget that there is a link to the borrower record from the review/rating in the bib data in Staff Enquiry
    • Potentially links to more than one Data Subject, so which would be the Data Controller for this catalogue data?
    • Shared responsibility? How?
    • Similar questions are required of other customer-created content such as tags (these are lost in the current versions of Spydus 9)
  • (Not all data are published for the public)

Purpose: Catalogue/discovery records (holdings/item-level data)
Type of information: Catalogue data, including electronic holdings
Recipients:
  • Library staff
  • Library customers and general public
Data Controller: Local Authority (which?)
Notes/queries:
  • Holdings data
  • Links to personal data via loans/loan history and status/status history
    • Potentially these link to more than one Data Subject, so which would be the Data Controller for the catalogue data?
    • Logically should be the owner of the holding item
  • (Not all data are published for the public)

Purpose: Management Information/Business Intelligence
Type of information: Reports detailing usage of service, per location
Recipients: Library Managers
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • Essentially should be summary data, though we’d need to have safeguards against breaches caused by very small sample data
  • Proper safeguards and risk analyses are required before making this data available to third parties

Purpose: Demographic breakdowns
Recipients:
  • Library Managers
  • Designated authorised analysts
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • Most would be summary data, though we’d need to have safeguards against breaches caused by very small sample data
  • Some data (e.g. lists of postcodes) are granular enough to easily identify Data Subjects, so safeguards on the use and presentation of this data need to be in place before it is made available to third parties

Purpose: Marketing databases
Recipients:
  • Library Managers
  • Designated authorised marketing staff
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • Is the “I agree to receive marketing” (or equivalent) field global or local?
  • The selection of data must be explicitly limited to those customers who have agreed to contact so as to comply with Privacy and Electronic Communications Regulations.
  • Proper safeguards and risk analyses are required before making this data available to third parties

Purpose: Stock management data
Recipients:
  • Library staff
  • Designated authorised third-party service providers
Data Controller: Local Authority (which?)
Notes/queries:
  • Nothing pertaining to Data Subjects should be included in this data.
  • Stock ownership should be straightforward.
  • Stock usage more problematic:
    • Global usage figures recorded against bibliographic/holdings data?
    • Local usage only?
    • How (if at all) would third-party stock analysis systems like CollectionHQ differentiate between local and extralimital use?
  • In the early days at least there will be pressure to be able to provide evidence that stock is being used “fairly” with local library customers having first dibs for local stock

Purpose: Ad hoc data requests
Recipients:
  • Library Managers
  • Designated authorised third parties
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • Most would be summary data, though we’d need to have safeguards against breaches caused by very small sample data
  • Some data (e.g. lists of postcodes) are granular enough to easily identify Data Subjects so safeguards need to be in place on the use and presentation of this data
  • Proper safeguards and risk analyses are required before making this data available to third parties
  • FoI requests would be subject to the proper exclusions

Purpose: SIP2 data
Type of information: Data used for interfacing between Spydus and third-party systems
Recipients:
  • Library staff
  • Specific customer
Data Controller: Local Authority (Data Subject’s Local Authority will be the data controller)
Notes/queries:
  • The particular case at the moment would be where data held in the customer record determines the access or not to third-party systems and services.
    • Would the data be determined globally or locally?
    • Standard use of data fields?
    • Standard coding sets?

I'd be interested to know if/how this analysis sits with the experience of established consortium libraries, especially if I've missed something that could cause us problems.