Nobody knows what the National Data Library is (not even the government)
A policy wrapped in a mystery inside an enigma
There was an intriguing sentence in the “driving innovation” section of Labour’s 2024 election manifesto.
“[W]e will create a National Data Library to bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit.”
This policy – to create a “National Data Library” (NDL) – is one that has been referred to regularly by Labour figures when they have been talking tech, and it is often cited as the answer to various knotty data problems.
For example, in September, Science and Technology minister Peter Kyle referred to the policy in a House of Commons debate about technology in public services. He said that:
“We must also manage public sector data as a national strategic resource. For far too long, public sector data has been undervalued and underused. We must replace chaos with co-ordination, and confusion with coherence. That is what the National Data Library will do. With a coherent data access policy and a library and exchange service, it will transform the way we manage our public sector data. It will have a relentless focus on maximising the value of that data for public good, on growing the economy and creating new jobs, and on delivering the data-driven AI-powered public services that they deserve.”
Lord Patrick Vallance, the Science Minister, has mentioned it in the House of Lords, too.
“We have committed to creating a National Data Library that will make it easier to access data, deliver data-driven public services, support research and create opportunities for economic growth, while maintaining strong safeguards.”
But here’s my question: What… actually… is… it? How would a “national data library” work? What would it do? How would it be structured? Who would be involved?
The answer, it turns out, is that nobody knows.
For the past couple of months I’ve been asking around, trying to figure out what this thing is. I’ve spoken to a bunch of people who you would expect to know, at least in broad terms, what the NDL will be or how it will work. But to my surprise, the overwhelming reaction has been a shrug of the shoulders. Nobody knows anything about it.1
And this seems a little weird for a policy that made it into the manifesto and has been referenced by government figures in Parliament. So what on earth is going on?
Unravelling the mystery
If you’re anything like me, seeing words like “data” and “public benefit” in a manifesto sounds pretty exciting.
Perhaps the NDL could mean more open data? Or a new platform that helps developers build cool apps and services that use government data? Or – dare we dream – could it relate to liberating the Postcode Address File from the shackles of Royal Mail, and releasing this critical national dataset for free, for the public good?
As I’ve explained above, it really isn’t clear. But there are plenty of suggestions about what the NDL could be. And all of them are pretty significantly different from each other.2
One early suggestion came from the centre-right think-tank Onward, which pitched that the NDL could become part of the UK’s AI Research Resource, which is a government super-computer programme that provides computing power for training AI models.
If Onward’s idea were to be taken up, the focus of the NDL would be on acquiring datasets from across the public sector and ensuring that they are “AI legible”. As an organisation, the NDL would be responsible for making sure that data is cleaned up and meets certain quality standards.
For example, Onward suggests that the NDL could store anonymised medical scans from the NHS, all formatted consistently, so that AI tools can be trained to spot specific ailments.
A separate proposal came from Dr Emma Gordon. She’s the director of Administrative Data Research UK, a public sector body that was created in 2018, which works with government departments to wrestle datasets out of government and into the hands of academic researchers.
She sees one of the roles of the NDL as creating “trusted research environments” that researchers can use to interrogate government data securely. If I’m understanding her correctly, this purpose goes beyond simply publishing datasets, and instead involves creating and maintaining some of the technical tools that people can use to interrogate the data – perhaps a bit like how OpenSAFELY has opened up some NHS data to scientists.
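To make the idea a little more concrete, here is a minimal sketch of one pattern often associated with secure research environments: the researcher never receives the raw records, only aggregate counts, and small counts are suppressed. Everything here is an assumption for illustration, including the class name, the threshold of 5 and the made-up records. It is not a description of how ADR UK, OpenSAFELY or any eventual NDL actually works.

```python
from collections import Counter

# Assumed disclosure threshold, chosen purely for illustration.
MIN_CELL_COUNT = 5


class TrustedResearchEnvironment:
    """Hypothetical wrapper: holds records, releases only safe aggregates."""

    def __init__(self, records):
        # Raw records stay inside the environment and are never returned.
        self._records = list(records)

    def count_by(self, field):
        """Count records per value of `field`, suppressing small groups."""
        counts = Counter(record[field] for record in self._records)
        # Groups below the threshold are released as None, not as a number.
        return {
            value: (n if n >= MIN_CELL_COUNT else None)
            for value, n in counts.items()
        }


# Made-up records, for illustration only.
records = [{"region": "North"}] * 7 + [{"region": "South"}] * 2
tre = TrustedResearchEnvironment(records)
released = tre.count_by("region")
print(released)
```

The design choice worth noticing is that the safeguard lives in the environment rather than in the researcher's good behaviour: the only way out for the data is through a method that applies the suppression rule.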
Finally, there have also been more hands-off proposals, or at least potential sources of inspiration.
For example, the Mayor of London’s Chief Digital Officer Theo Blackwell is already working on a London Data Library, which he hopes to launch later this year. It takes a rather different approach to the above: Instead of acting as a body that collects and manages datasets, it takes a “federated” approach.
So under this model, each contributing organisation would continue to manage and publish their own datasets, but the ‘data library’ would serve as a central portal for discovery.
In London’s case, this could mean that if, say, you wanted to find out how many rough sleepers there are on Oxford Street, you wouldn’t need to know whether you need to talk to TfL, Westminster Council or a homelessness charity – instead all of the datasets would be organised into one central directory.
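For the more code-minded, the federated model described above can be sketched as a central index that stores only metadata, with each dataset staying wherever its publisher hosts it. This is a minimal sketch under stated assumptions: the class names, dataset titles, keywords and example.org URLs are all hypothetical, and nothing here reflects how the London Data Library is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata only: the data itself stays with the publisher."""
    title: str
    publisher: str
    keywords: frozenset
    location: str  # where the publisher hosts the dataset (hypothetical URL)


class FederatedCatalogue:
    """A central directory for discovery; it never stores the datasets."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        """A publisher adds a pointer to a dataset it continues to manage."""
        self._entries.append(entry)

    def search(self, term):
        """Return entries whose title or keywords contain the search term."""
        term = term.lower()
        return [
            entry for entry in self._entries
            if term in entry.title.lower()
            or any(term in keyword for keyword in entry.keywords)
        ]


# Illustrative entries only; the titles and URLs are invented.
catalogue = FederatedCatalogue()
catalogue.register(DatasetEntry(
    "Rough sleeping counts by street", "Westminster Council",
    frozenset({"rough sleeping", "homelessness"}),
    "https://example.org/westminster/rough-sleeping"))
catalogue.register(DatasetEntry(
    "Station footfall", "TfL",
    frozenset({"transport", "footfall"}),
    "https://example.org/tfl/footfall"))
catalogue.register(DatasetEntry(
    "Outreach contacts", "Example Homelessness Charity",
    frozenset({"homelessness", "outreach"}),
    "https://example.org/charity/outreach"))

for entry in catalogue.search("homelessness"):
    print(entry.publisher, "->", entry.location)
```

The point of the design is that a single search surfaces datasets from several organisations, while each organisation keeps control of publishing and updating its own data.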
And you can imagine how a National Data Library could do something similar. Data is being produced all over the place – by every government department, by arm’s-length bodies like the Office for National Statistics, by local government and so on – so a portal that simply indexes where data can be found would be pretty useful in and of itself.
Ultimately then, there are numerous ways that Britain could build something called the “National Data Library”. And I’ve hardly scratched the surface here, as there are many more downstream questions. For example, who would have access to the data within? Would it be for the general public, or just for government insiders, or perhaps trusted researchers too?3
This brings me back to the fundamental problem with the National Data Library policy, which I think data policy guru Gavin Freeguard summed up rather well. He wrote recently:
“What purposes should be prioritised? What services will it provide? In whose interests will it operate? How will it demonstrate trustworthiness and build trust? What ‘data’ are we talking about? What problems is it trying to solve?”
In other words, he’s putting it in far politer terms than I am, as Gavin is a professional and a gentleman4 – but it all boils down to the same question: wtf is this thing even supposed to be?
An answer to every question
In essence then, despite being referenced in the manifesto and at the dispatch box, as best as I can tell at the moment, the National Data Library is little more than a cool-sounding phrase.
One person I spoke to characterised the ‘Library’ as being in the “ideation” phase – which is the same excuse I use when my partner asks me why I haven’t got off the sofa and gone for the run that I claimed that I would.
So I think we can say at the moment, that functionally the National Data Library is less a fully worked out policy, and more of a bucket into which the government can throw any difficult questions it receives that contain the word “data”. It buys time until the government has figured out what it actually wants to do with the constellation of ideas and issues described above.5
That said, despite this striking lack of clarity, I am going to cut the government some slack. It has, after all, only been in office for a little over five months, and it took power much earlier than expected. So we can’t expect it to have worked out all of its plans ahead of time, without an army of civil servants to do the heavy lifting.
And even talking up the cool name alone, without having a fully worked out plan behind it, isn’t necessarily bad. I think it’s a lot like the “Levelling Up” policy was for the Boris Johnson government: A maddeningly vague idea, but one that is indicative of the desired direction of travel.
So just as there was a sense that Johnson wanted to do something to uplift the north,6 it’s good that the current government is telling us that it wants to do something better with government data than the status quo.
Let’s just hope that it eventually figures out what that thing actually is.
If you enjoy ultra-nerdy politics and policy takes, then you will enjoy my newsletter. Sign up (for free!) to get more of this sort of thing in your inbox. And if you happen to know what the National Data Library actually is… please get in touch!
1. I even emailed DSIT last month and asked if someone could explain it to me, but sadly I did not receive a reply.
2. Apologies in advance for how wonky and technical this is about to get.
3. One approach could be to try and correlate the different proposals to the words from the Ministers and manifesto above, but the problem is that it’s all just a haze of data-adjacent words. Though they all sound like nice things, there’s no clear direction or intent when you add it all together.
4. By contrast “being a bit rude and trying to draw attention to myself on the internet” is basically my business model.
5. And to be clear, despite the lack of clarity, I do think that there’s definitely something here. It’s a bit like the film Prometheus (2012) – in that there’s a good film buried in there somewhere, but it’s hidden behind an enormous mess.
6. Whether or not he actually did anything to deliver on these good intentions is a different question.
I think the problem with the NDL (and lots of other policies/projects we see proposed to and by the gov) is that it can only be launched with something sufficiently ambitious.
There’s a real lack of appetite for throwing out small, less-than-perfect but better-than-what-we-have-right-now bits and bobs. Stuff like a website that gives accessible guidance on how to figure out which datasets you might need, how to find out if they already exist, and how to request them.
Things like this chip away at the barriers that big ambitious projects like the NDL - whatever it is - are trying to tear down, and you might as well chip away for a while to make the tearing down easier.