Tuesday 1 November 2016

Library data part three: dangerous demographics

The data about the people registered with a library is at one and the same time the most potentially useful and the most potentially dangerous. So dangerous, in fact, that when it comes to making this data openly available the default position must be: Don't. Do. It.

That position will be strongly challenged by many, so I'll devote the rest of this post to explaining the dangers. The next one will look at the data that might be openly shareable, so long as all the necessary precautions are taken.

Demographic information is immensely useful to a library service. Operationally, it's important to see that the service is meeting the needs of all its communities and not just providing a service "for people like us by people like us." It's important to be able to make sure that particular services are reaching their target audiences and that you're not doing anything to put sections of the community off using your services. And it's essential that you have this data for Equality Impact Assessments of policy decisions. So why would you not want to share the data to get a bigger picture?

Generally speaking there are three main concerns:
  • Privacy. The library is one of the few safe public places left for the individual. Removing the right to privacy is as much an information governance issue as an ethical one, and both aspects need to be taken very seriously (both generally get too much lip service and too little analysis and action).

    It also compromises the quality of the service being provided: if library customers know that how they, as individuals, use the library will be made public, a good many of them will modify their behaviour and not use the library the way they want or need to. If they don't know this, the library will have committed a significant breach of trust.
  • Legality. Does the library have the legal right to share the data? If it is possible to identify individual data subjects then the answer is categorically: No, unless the data subject has explicitly said that their data may be shared.

    Anonymising the data so that it is no longer personal data is easier said than done. It isn't a matter of just removing all the names. We'll have a look at this later on.

    The agreement has to be an opt-in, and the purpose of the data sharing has to be clearly stated. "We want to make your data open so that other, as yet unknown, people can manipulate it to get as yet unknown information and outcomes" would be an open invitation to the Information Commissioner's Office to come and investigate your organisation.
  • Safeguarding. This is the most problematic and under-appreciated concern. Simply letting anybody know whether or not a person even visits a library, let alone uses it, may put that person in actual physical danger. In some controlling relationships a partner may only be allowed out to go to the shops, and heaven help them if they do anything else. They may be allowed to take a child to library activities such as story times, but not to go for themselves. An abusive partner discovering that somebody was somewhere they shouldn't be (the wrong end of town, or even the wrong town) could be a trigger for violence. The test here isn't "What is reasonable?", because this isn't about safeguarding people against reasonable action. It's about safeguarding them from action that may be anything but reasonable.

    In my head I can hear somebody saying: "If they let us know that they're in an abusive relationship we could put a flag in their record to say their data's not to be shared."

    • This requires the data subject to actively opt out of data sharing.
    • Identifying yourself as a person in an abusive relationship is a brave thing to do, and not something anyone should have to do at a public service point in a library.
    • The library suddenly becomes a less safe place.
    • Someone's got to remember to filter out the flagged records before sharing the data.

    I don't think any of that is acceptable, and I said so the time it was put to me.

Anonymising the data requires more than stripping out all the names. The Information Commissioner's Office has a useful checklist (pdf).

In public libraries, the combination of nearly any two data elements may be enough to make the data subject identifiable, or at least to narrow the possibilities down enough to make it statistically probable that they could be identified. The combination of "library where registered" and "library used" plus one other datum is usually OK, but this needs to be tested against the particular data set in case of nasty surprises. Other combinations very quickly narrow down to the individual.
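One rough way to look for those nasty surprises is to count, for each combination of fields, how many people share each set of values. The smallest such group is the "k" of k-anonymity, and a group of one means somebody is unique on that combination. The Python sketch below assumes records are plain dictionaries; the field names, the sample data and the threshold `k` are made up for illustration and don't come from any real library management system. Passing a check like this doesn't make a data set safe to publish; it only catches the most obvious problems.

```python
from collections import Counter
from itertools import combinations

# Toy, made-up records for illustration only; the field names are hypothetical.
SAMPLE_RECORDS = [
    {"registered_at": "Central", "used_at": "Central", "age_band": "Adult"},
    {"registered_at": "Central", "used_at": "Central", "age_band": "Adult"},
    {"registered_at": "Central", "used_at": "Northside", "age_band": "Child"},
]


def smallest_group(records, fields):
    """Size of the smallest group of records sharing identical values for `fields`.

    A result of 1 means at least one person is unique on that combination.
    Returns 0 for an empty data set.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values(), default=0)


def risky_combinations(records, candidate_fields, k=5, max_size=3):
    """Return (fields, smallest group size) for every combination of up to
    `max_size` fields whose smallest group has fewer than `k` records.

    k=5 is an arbitrary illustrative threshold, not an official standard.
    """
    risky = []
    for size in range(1, max_size + 1):
        for fields in combinations(candidate_fields, size):
            n = smallest_group(records, fields)
            if n < k:
                risky.append((fields, n))
    return risky


if __name__ == "__main__":
    for fields, n in risky_combinations(
        SAMPLE_RECORDS, ["registered_at", "used_at", "age_band"], k=2
    ):
        print(f"{' + '.join(fields)}: smallest group has {n} record(s)")
```

Checking every combination grows quickly with the number of fields, which is why `max_size` caps it; with only a handful of demographic fields that cost is small.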

A lot depends on the data itself: if the categories used are very general it might be safe to combine them, though they may then be so general as to be pretty useless. I really did once work with a library service that thought it was OK to have two ethnic identifiers in the system: blank for "people like us" and "ethnic" for anyone who looked or sounded a bit foreign. I put a block on that the first chance I got. Even so, it wasn't until we got all the libraries onto the library management system that we finally got rid of the last of the old Browne Issue tickets with a red E on them (disturbing symbology like that makes me wonder what librarians were thinking in the eighties).
  • I'd imagine my local library authority will have thousands of white adult males in their database. How many — or few — teenage Bangladeshi females would there be?
  • Postcode data very quickly narrows down. There are perhaps ninety people in my postcode area. Twenty-odd adult white males. About four males in their fifties. One white male in his fifties.
  • Age data gets very specific very quickly. A simple "Adult"/"Child" split is pretty safe, but as soon as you start refining that down it becomes problematic. Full date of birth is so specific it's a red flag.
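Coarsening the data before it is combined is one way to widen those groups, at the cost of making it less useful. A minimal Python sketch, assuming UK-style postcodes and a date of birth held as a `date`: it reduces date of birth to "Adult"/"Child" and the postcode to its outward code (the postcode district, e.g. "AB12"). The adult threshold of 18, the field names and the example postcode are all assumptions for illustration; a postcode district may still be too fine for small groups, so the output still needs testing against the real data set.

```python
from datetime import date

ADULT_AGE = 18  # assumption: the age at which a borrower counts as "Adult"; services differ


def age_on(dob: date, today: date) -> int:
    """Whole years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def age_band(dob: date, today: date) -> str:
    """Collapse a full date of birth into a two-value band."""
    return "Adult" if age_on(dob, today) >= ADULT_AGE else "Child"


def postcode_district(postcode: str) -> str:
    """Keep only the outward code of a UK postcode, e.g. 'AB12 3CD' -> 'AB12'."""
    cleaned = postcode.strip().upper()
    if " " in cleaned:
        return cleaned.split()[0]
    # Without a space, the inward code is always the last three characters.
    return cleaned[:-3]


def generalise(record: dict, today: date) -> dict:
    """Build a new record with coarse fields only; precise originals are not copied."""
    return {
        "age_band": age_band(record["date_of_birth"], today),
        "postcode_district": postcode_district(record["postcode"]),
        "registered_at": record["registered_at"],
    }
```

Building a fresh dictionary, rather than editing the original record, means any field not explicitly listed (a name, say) is dropped by default.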
So we would need to be very careful about what data, and what combination of data, is made available. In a library consortium setting this should be governed by formal data sharing protocols that have been approved by each authority's information governance experts and signed off by whoever is responsible for the authority's information risk. Under such protocols, the data of all the people who have actively agreed to its being shared can be made available to the appropriate staff, for the appropriate purpose, within the consortium. That's a very specific remit, for a very specific purpose, for the use of a very specific group of people, with checks and balances and sanctions for abuse.
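The practical difference between this and the opt-out flag rejected earlier is the default. In a sketch like the one below (Python; the `Borrower` fields and the `shareable` function are hypothetical), anyone who has not actively said yes is left out, so nobody has to remember to filter out flagged records and nobody has to disclose anything at a service point to be protected. It says nothing about whether the remaining data is safe to combine; that still has to be tested.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Borrower:
    # Hypothetical record layout for illustration only.
    borrower_id: str
    name: str
    age_band: str
    registered_at: str
    # True only if the borrower has actively agreed to sharing.
    # None means they were never asked, which must be treated as "no".
    consent_to_share: Optional[bool] = None


def shareable(borrowers: Iterable[Borrower]) -> List[dict]:
    """Return records for borrowers who have explicitly opted in, with the
    direct identifiers (borrower_id, name) left out."""
    return [
        {"age_band": b.age_band, "registered_at": b.registered_at}
        for b in borrowers
        if b.consent_to_share is True  # opt-in: None and False are both excluded
    ]
```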

None of which is the case with the open release of data, so different rules are needed, and they need to be applied proactively (the genie doesn't go back into the bottle if you find you've made a mistake). Hence the greater need for precaution.
