Steven Heywood's Blog o'Library Stuff: November 2016

Wednesday, 2 November 2016

Library data part three-and-a-bit: sharing customer data

Having had a quick scamper through the worry list, what customer data could be shared openly?

Let's start with what can't be shared:

Name
Full address
Unique identifier for the data record
Nearly all combinations of data elements within the record

The first two are obvious Data Protection precautions; the last two are less obvious precautions for the same reason: they make it possible to identify the individual data subject.

Any data extraction for release as open data must specify the required data elements. Required fields need to be selected for extraction rather than having fields not required filtered out post-extraction. This prevents any accidents. Once data's openly out in the wild it's out in the wild.
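
As a rough sketch of what "select, don't filter" might look like, here's a small Python example. The field names, the sample records and the `extract_allowed` helper are all made up for illustration; a real library management system will have its own export tools and its own field names.

```python
# Hypothetical allow-list: the only fields that ever leave the system.
ALLOWED_FIELDS = ("registration_location", "category")

def extract_allowed(records, allowed=ALLOWED_FIELDS):
    """Build new rows from the allow-listed fields only.

    Anything not named in `allowed` is never copied, so a field added to
    the source records later can't slip into the extract by accident.
    """
    return [{field: record[field] for field in allowed} for record in records]

# Made-up records, for illustration only.
records = [
    {"name": "A. Borrower", "address": "1 Example Street", "record_id": 101,
     "registration_location": "Bedlam Library", "category": "Child"},
    {"name": "B. Reader", "address": "2 Example Street", "record_id": 102,
     "registration_location": "Bedlam Library", "category": "Adult"},
]

extract = extract_allowed(records)
```

The point of doing it this way round is that forgetting a field leaves data out of the release; with a filter-out approach, forgetting a field lets data in.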

"Registration location" and "Library/libraries used" (if available) are both safe in themselves as they aren't personal data and will have data sets broad enough not to be able to identify individual data subjects. They could be combined with each other and any one of the following:

Category (e.g. type of borrower)
Ethnicity
Disability
Gender
Year of birth/age in years (if only date of birth can be extracted then this data shouldn't be used)

The data extract could be:

       Bedlam Library     Child
       Bedlam Library     Child
       Bedlam Library     Adult
       Bedlam Library     Adult

But not:

       Bedlam Library     Child     Male
       Bedlam Library     Child     Female
       Bedlam Library     Adult     Female
       Bedlam Library     Adult     Male

Any two of these could be combined:

Category (e.g. type of borrower)
Ethnicity
Disability
Gender
Year of birth/age in years (if only date of birth can be extracted then this data shouldn't be used)

A postcode dump for the whole library authority could be made available but not combined with any other data because of its very specific nature for identification purposes.
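
The combination rules above could be written down as a simple check to run before any extract is built. This is only my reading of those rules turned into a sketch; the field names and the `is_permitted` function are assumptions for illustration, not anything a library system provides.

```python
# Hypothetical field names standing in for the data elements listed above.
LOCATION_FIELDS = {"registration_location", "libraries_used"}
DEMOGRAPHIC_FIELDS = {"category", "ethnicity", "disability", "gender",
                      "year_of_birth"}
POSTCODE_FIELD = "postcode"

def is_permitted(fields):
    """Return True if this set of fields may appear together in one extract."""
    fields = set(fields)
    if not fields:
        return False
    if POSTCODE_FIELD in fields:
        # Postcode may only be released on its own.
        return fields == {POSTCODE_FIELD}
    if not fields <= LOCATION_FIELDS | DEMOGRAPHIC_FIELDS:
        # Anything not on the lists (name, address, record ID...) is refused.
        return False
    demographics = fields & DEMOGRAPHIC_FIELDS
    if fields & LOCATION_FIELDS:
        # Location fields plus at most one demographic field.
        return len(demographics) <= 1
    # No location fields: at most two demographic fields together.
    return len(demographics) <= 2
```

So `is_permitted(["registration_location", "category"])` is true, matching the first Bedlam Library example, and adding `"gender"` to that list makes it false, matching the second.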

I think that's pretty much it. And I'd still want to run it by an Information Governance expert before going ahead (and for them to check my Privacy Impact Assessment).

Tuesday, 1 November 2016

Library data part three: dangerous demographics

The data about the people registered with a library is at one and the same time the most potentially useful and the most potentially dangerous. So dangerous, in fact, that when it comes to making this data openly-available the default position must be: Don't. Do. It.

That position will be strongly challenged by many so I'll devote the rest of this post to explaining the dangers and the next one will have a look at the data that might be openly-shareable so long as all the necessary precautions are taken.

Demographic information is immensely useful to a library service. Operationally it's important to see that the service is meeting the needs of all its communities and not just providing a service "for people like us by people like us." It's important to be able to make sure that particular services are reaching their target audiences and that you're not doing anything to put sections of the community off using your services. And it's essential that you have this data for Equality Impact Assessment of policy decisions. So why would you not want to share the data to get a bigger picture?

Generally speaking there are three main concerns:

Privacy. The library is one of the few safe public places left for the individual. Removing the right to privacy is an information governance issue just as much as an ethical one and both need to be taken very seriously (both are generally given too much lip service and too little analysis and action).

It also compromises the quality of the service being provided: if library customers know that how they as individuals use the library will be made public a good many of them will modify their behaviour and not use the library the way they want or need to. If they don't know this the library will have committed a significant breach of trust.

Legality. Does the library have the legal right to share the data? If it is possible to identify individual data subjects then the answer is categorically: No, unless the data subject has explicitly said that their data may be shared.

Anonymising the data so that it is no longer personal data is easier said than done. It isn't a matter of just removing all the names. We'll have a look at this later on.

The agreement has to be an opt-in and the purpose of this data sharing has to be clearly stated. "We want to make your data open so that other, as yet unknown, people can manipulate it to get as yet unknown information and outcomes" would be an open invitation to the Information Commissioner's Office to come and investigate your organisation.

Safeguarding. This is the most problematic and under-appreciated concern. Anybody knowing whether or not a person even visits a library, let alone uses it, may put that person in actual physical danger. In some controlling relationships a partner may only be allowed out to go to the shops and heaven help them if they do anything else. They may be allowed to take a child to library activities such as story times but not to use the library for themselves. An abusive partner discovering that somebody was somewhere they shouldn't be, the wrong end of town or even the wrong town, could be a trigger for violence. The test here isn't: "What is reasonable?" because this isn't about safeguarding people against reasonable action. It's about safeguarding them from action that may be anything but reasonable.

In my head I can hear somebody saying: "If they let us know that they're in an abusive relationship we could put a flag in their record to say their data's not to be shared."

This requires the data subject to actively opt out of data sharing.

Identifying yourself as a person in an abusive relationship is a brave thing to do and not something that should be required to be done at a public service point in a library.

The library suddenly becomes a less safe place.

Someone's got to remember to filter out the flagged records before sharing the data.

Anonymising the data requires more than stripping out all the names. The Information Commissioner's Office has a useful checklist (pdf).

In public libraries the combination of nearly any two data elements may be enough to make that data subject identifiable, or at least narrow the number of possibilities down enough to make it statistically probable they could be identified. The combination of "library where registered" and "library used" plus one other datum is usually OK but this needs to be tested with the particular data set, in case of nasty surprises. Other combinations very quickly narrow down to the individual.
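
One way of testing a combination against a particular data set is to count how many records share each combination of values and look for the small groups. A minimal sketch follows; the threshold of 5 is an arbitrary number picked for the example, not an official figure, and the `small_groups` helper and the sample rows are made up.

```python
from collections import Counter

def small_groups(rows, fields, threshold=5):
    """Return each combination of values shared by fewer than `threshold` rows.

    `threshold` is an assumed cut-off for illustration; the right number is
    a question for an information governance expert.
    """
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}

# Made-up extract: six adults and one child registered at the same library.
rows = (
    [{"registration_location": "Bedlam Library", "category": "Adult"}] * 6
    + [{"registration_location": "Bedlam Library", "category": "Child"}]
)

flagged = small_groups(rows, ["registration_location", "category"])
```

Here `flagged` contains the single child at Bedlam Library, which is exactly the sort of nasty surprise you'd want to find before the data goes out rather than after.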

A lot depends on the data itself: if the categories used are very general it might be safe to combine them, though they may be so general as to be pretty useless. I really did once work with a library service that thought it was OK to have two ethnic identifiers in the system: blank for "people like us" and "ethnic" for anyone who looked or sounded a bit foreign. I put a block on that the first chance I got; even so it wasn't until we got all the libraries onto the library management system that we finally got rid of the last of the old Browne Issue tickets with a red E on them (disturbing symbologies like that make me wonder what librarians were thinking about in the eighties).

I'd imagine my local library authority will have thousands of white adult males in their database. How many — or few — teenage Bangladeshi females would there be?

Postcode data very quickly narrows down. There are perhaps ninety people in my postcode area. Twenty-odd adult white males. About four males in their fifties. One white male in his fifties.

Age data gets very specific very quickly. "Adult" and "Child" is pretty safe but as soon as you start refining that down it becomes problematic. Full date of birth is so specific it's a red flag.

So we would need to be very careful about what data (and what combination of data) is made available. In a library consortium setting this should be governed by formal data sharing protocols that have been passed by each authority's information governance experts and given the OK by whoever is responsible for the authority's information risk. Then all the data of all the people who have actively agreed to their data's being shared can be made available to the appropriate staff for the appropriate purpose within the consortium. That's a very specific remit for a very specific purpose for the use of a very specific group of people, with checks and balances and sanctions for abuse.

Which is exactly not the case with the open release of data, so different rules need to apply and need to be applied proactively (the genie doesn't go back into the bottle if you find you've made a mistake). Hence the greater need for precaution.
