We finally (sort of) know what the National Data Library is
They even let me speak to the Digital Minister about it
Today is a big day. One of the core mysteries at the heart of British politics might have finally been solved.
No, sadly I’m not talking about the final resting place of the infamous ‘Ed Stone’, nor have we discovered why the Postcode Address File is still trapped in the clutches of Royal Mail, in defiance of all logic.
Instead, I’m talking about the mystery of the National Data Library (NDL).
I’ve written about the NDL before. The phrase first emerged out of various think-tanks and think-pieces spitballing in the run-up to the last General Election about how the government could do better things with data. Then it was hastily adopted by the Labour Party, making its way into the manifesto, which pledged that:
“[W]e will create a National Data Library to bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit.”
In truth though, it was never exactly clear what the Labour Party, and now Labour government, thought the NDL was supposed to be. If I were to be both accurate and unkind, until recently it was mostly an intriguing name, backed by little of substance.
And this was driving me pretty crazy. For the past couple of years I’ve made a habit of asking anyone I encounter even tangentially connected with the Department for Science, Innovation and Technology (DSIT) if they know what the NDL is supposed to be. Not a single person was able to give a clear answer.
But now, everything changes, at least a little bit.
Today the story the government wants to talk about is a new pilot project to better connect health, education and childcare data in three local authorities – Hammersmith, Leeds and Liverpool. The idea is that by making it easier for GPs and educators to share data, kids will not fall through the cracks, and will receive the support they need.
This is undoubtedly a worthy and good project in and of itself. But the real nerds watching closely will also spot something else about today’s announcement: that the government is explicitly drawing a connection between this pilot[1] and the plans for the NDL. That’s why, in tandem with this, today the long-neglected government data repository, data.gov.uk, has been officially rebranded as the National Data Library.
And perhaps even more notable, an actual government minister has kindly taken the time to answer some of my questions about it.
So what actually is the NDL? What will it do? How will it work? And what problems will it solve?
To explain, I’m going to split this post into two parts – first, the short chat I had with Digital Minister Ian Murray, and then below that I’ll offer my own analysis and explanation of what it all means.
An interview with Digital Minister Ian Murray
James: Today marks the relaunch of data.gov.uk as the National Data Library. In a nutshell, what do you think are the biggest upgrades that are now coming in because of the NDL?
Murray: The NDL is essentially the top skin for what we’ve already got. So we’ve got data.gov.uk. It’s a bit of a mess. So the first stage of NDL is to get all of those datasets maintained, get them not necessarily standardised, but interoperable, get all the broken links sorted.
The second stage is then to add more datasets and have the NDL as an engine that can search them and bring those datasets together [thematically].
The third thing would be to use the NDL as a source the private sector can use to innovate etc, so the data would have a real value as well not just for innovation but also in terms of a monetisation value too.
James: In the slightly longer term, what sort of problems do you think the NDL will be specifically solving?
Murray: We have a data rich environment in the UK, and data has a huge economic value. But also there’s the value of bringing datasets together for better public policy – being able to understand what’s happening better.
But it’s also about the individuals. I’ve just spoken to parents who try to navigate an impenetrable system, with government departments that don’t talk to each other, because they can’t.
I think the best phrase that somebody gave me today was that ‘parents are experts on their children but not the system, and the [government] are experts on the system but not children’ – and that’s essentially what we’re trying to resolve with the National Data Library.
James: My audience are very much the data-nerds who want to connect things together and go digging through datasets and so on…
Murray: If you think about this from an “I love data” data-nerds perspective, the National Data Library is essentially going to be bringing together all the datasets that government has, and new datasets, giving you the ability to innovate.

James: I know the early years launch today is a project which is technically separate to the NDL, but it’s thematically related. In the longer term, do you see the role of the NDL also going in the direction of handling personal data or more specialist data, with the NDL as the mediating layer? Or is it going to stay focused on high-level, aggregated datasets?
Murray: We’ll always have the aggregated and federated datasets because government departments will still hold [their own] data. But the whole point of the NDL is to try and bring it together.
So from a data perspective, there’s probably four levels of data [that] the NDL, when it’s all operational, and in a utopian world, [will use].
There’s anonymised data that it can bring together for academics and innovators to see what’s going on, and to innovate with new products and services on that. TfL data, for example, is a really good example of open data.
The second level is that personalised data that the public [sector] holds. So the public have all of this data already. It’s just all over the place. So bringing that together for the benefit of the individual in a personalised way is really important. I suppose the One Patient Record for the NHS is a way of thinking about that.
The third thing is to bring data together to allow practitioners to develop services that are much more targeted. And then the fourth thing is for data to come together for governments to put public policy in place that actually responds to problems.
So from a data perspective, you can see how the National Data Library is the building block for which you can try to do all of these things and that’s what we’re trying to achieve.
James: Is the NDL team, or are you as the minister, going to have a stick to beat other government departments, or other bits of the public sector, who hold useful data, to tell them to publish it, or improve the way it’s published, or basically wrangle their data into a better shape?
Murray: It’s actually more carrot than stick, because for government departments the carrot is that they end up with better, more efficient services. It saves them money, and they end up with a citizen interaction and engagement that is much more positive and efficient.
It also frees up an awful lot of professional work. If you think about a social worker who deals with the most complex and emotive child protection cases, if they’re able to use data to make sure they can do more [casework] rather than being in the weeds of [being] not quite sure where to target, then you can see how a service at that level can become much better.
A practitioner in children’s health said to me today, if we can find a system that allows data to reach them, and then do something [with] it, then that’s what the data should be used for.
James: Do you have a rule about how you decide what datasets are going to be published freely and openly versus which are going to be made commercially available for a price? I’m very keen on opening up geospatial data. So things like the Postcode Address File….
Murray: Yeah…
James: Is that an endorsement of opening up the Postcode Address File? Can I put you on the spot about that?
Murray: Well, it’s my endorsement of saying that data is very valuable and very useful!
James: So my question there is basically, do you have a rule or a way of differentiating between what data should be put out for free and what should be made commercially available, and where you draw the line?
Murray: No, not yet, but that’s what the NDL is developing. But I suppose to answer your question on that basis, there’s probably three or four [types] of data that government has, isn’t there?
There’s internal government data for government to use. There’s open access data for everyone to use to innovate. There’s then personal data that is held by government that only specific parts of government and individuals should be able to use. And then there’s government data that can be commercialised.
What falls into those four buckets, a lot of it is fairly obvious, but there’ll be other stuff that’s not. So the structure of NDL, and the way I’d like to see it operate, is for as much of it to be open data as possible for innovation purposes.
James: Finally, can you paint me a picture? Imagine the NDL has been going for five years. What role will the NDL be playing?
Murray: We’ve talked a lot about the technicalities of data and the NDL. But if we turn the telescope around and look at it from the other side, it’s got to be outcomes based. And those outcomes, for me, would be much more innovation because we have open data, much better interoperability between government departments, much better personal control of the data that people hold on themselves.
But ultimately, if we look at what we’ve launched today in terms of the [early years pilot], we’re trying to cut the gap of the third of children who go to school without the basic needs met. And that’s a half of the children who are on free school meals. So the [goal] is to lower those figures considerably. I think we want to get them down to 25 percent, with the trend lower. So if you turn the whole NDL telescope around, it’s about good outcomes for the public, and I think the NDL will help us deliver that.
This transcript has been edited for clarity, as human speech is inherently messy. I have focused on preserving the substance of the Minister’s answers.
London Gateway Services
So that’s what Minister Ian Murray had to say. Now let’s get into the weeds, and disentangle some of the details of what the NDL actually is.
The reality is that since the government came to power, the question of exactly what the NDL is for has been a subject of pretty intense debate. A year or so ago, I remember being told about how inside the department, there was a fairly long-running plan to figure out how to translate the pledge into policy. At one point, someone described to me how the NDL team was bringing in some hugely talented and competent people – but there was, er, a ‘lack of strategic direction’.
And what’s clear from today’s announcement is that, perhaps reflecting this uncertainty, DSIT has decided to take an incremental approach to figuring out the full scope of what the NDL’s role should be – and it is starting rather modestly.
So what is the NDL today? At its core, it is a relaunched data.gov.uk. This was an open-data gateway launched way back in 2010 by Tim Berners-Lee, the idea being that the government would host and link to public sector datasets, all from one place.
And if you head there today, you’ll see pretty much the same thing, with various datasets and APIs from across government grouped into different thematic areas, for users to poke around in.
This perhaps doesn’t match some of the loftier claims made about the NDL over the past couple of years, but the renewed focus is a useful one. By the Minister’s own admission above, the old data.gov.uk was a mess, with various broken links, out-of-date datasets and so on.[2] So the current NDL team has focused on basically bringing the gateway up to date, including improving the metadata that describes each dataset, to make the datasets more discoverable, so that more people will make use of them.
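For the data nerds, here is a rough sketch of what that kind of clean-up might look like in practice: walk the catalogue, flag entries whose link no longer resolves, and flag entries that have not been updated in a long time. Everything in it is an assumption for illustration. The `DatasetRecord` fields, the one-year staleness threshold and the sample entries are invented, and none of it reflects how the NDL team actually does the job.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; not the NDL's real schema."""
    title: str
    url: str
    last_updated: date


def audit_catalogue(
    records: Iterable[DatasetRecord],
    link_is_alive: Callable[[str], bool],
    today: date,
    max_age_days: int = 365,  # assumed threshold for "out of date"
) -> dict[str, list[str]]:
    """Group dataset titles by the problem found: broken link or stale data."""
    problems: dict[str, list[str]] = {"broken_link": [], "stale": []}
    for record in records:
        if not link_is_alive(record.url):
            problems["broken_link"].append(record.title)
        if (today - record.last_updated).days > max_age_days:
            problems["stale"].append(record.title)
    return problems


# Illustrative data only: these entries and URLs are invented.
catalogue = [
    DatasetRecord("Coastal erosion", "https://example.org/coast.csv", date(2025, 6, 1)),
    DatasetRecord("Bus timetables", "https://example.org/gone.csv", date(2016, 3, 14)),
]


def fake_link_check(url: str) -> bool:
    """Stand-in for a real HTTP check, so the sketch runs offline."""
    return not url.endswith("gone.csv")


report = audit_catalogue(catalogue, fake_link_check, today=date(2026, 1, 1))
print(report)
```

The link check is passed in as a function so that a real HTTP request can be swapped in later without touching the audit logic.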
What’s also notable following today’s launch is that we now have some clarity about what the NDL is not – or at least not yet.
For example, today’s announcement has been coupled with the early years data-sharing pilot described above. But other than a broad thematic connection (both the pilot and the NDL are something about ‘data’), there isn’t really any connection between the two things.[3]
Under the current NDL roadmap, there are no plans for the NDL to be the place to go to, say, securely share personal data across public services. It won’t be where GPs and teachers log in to connect the datasets. That will presumably happen on separate systems, built by other people, in other places.
Instead, the current intention for the NDL is that it will remain focused on aggregating high-level, open, public datasets of the sort you can see on data.gov.uk now – everything from coastal erosion data to immigration statistics.
In fact, this is where the ‘library’ metaphor perhaps works best – the idea is that the NDL will ‘curate’ the most important datasets, much like how a library has to choose which books it stocks.
Mystery solved?
In the title of this piece I say that we ‘sort of’ know what the National Data Library is now. I say this because I think what’s also clear is that the future of the NDL is largely still up for grabs.
That’s why everyone from the Minister downwards is talking about what the platform ‘could’ do in the future – and, I assume, why the NDL programme was given £100m last year. That implies something more ambitious than an updated website.
And it’s possible to imagine plenty of futures for the NDL. Could it become the front door to Trusted Research Environments that do let researchers poke around inside sensitive public sector data, a bit like the excellent OpenSAFELY?
Or could it become the gateway to government data not just for the human data nerds, but for the AI bots? One idea I particularly like is this proposal by the Open Data Institute, which pitches that the NDL’s job should be wrangling government data into a format so that AI agents, acting autonomously, can access public sector data. This will be particularly important in the future, as government data can provide valuable ground-truth information that will make AI chatbots and agents more reliable.
And this brings me to what I think is perhaps the biggest missed opportunity with the NDL launch so far – and that’s the government not giving DSIT some form of enforcement power to oblige other government departments and public sector bodies to open up their data where they can, and set the standards for how it should be released.
This is because if you look back at the early 2010s, the reason the Government Digital Service (GDS) was able to transform how the government does digital with the GOV.UK website wasn’t just that they had the smartest coders or the boldest vision – it was that they had the power. At the time, GDS was handed the authority from David Cameron to approve or deny any digital spending across the entire government, which meant that every other department had to follow new, user-centric GDS rules.
And my fear is that if the NDL team doesn’t have a baseball bat covered in spikes with which to enforce its will, then it will be harder to persuade the rest of government to get with the programme and open up their datasets and build APIs.
But these are questions for the future. What’s notable today is that we – finally – know what the National Data Library is. Or at least what it is right now. And though the government is starting small, it’s clearly coupled with some pretty bold aspirations in the longer run.
Now, about that postcode data…
[1] The project is actually being described as a ‘kickstarter’, but I’ve used the word pilot here, so nobody mistakenly thinks Keir Starmer is trying to crowdfund support for thousands of kids.
[2] I understand that, by one count, there were as many as 8,000 broken links!
[3] From a communications perspective, this makes total sense. Governments of all stripes bundle announcements of largely unrelated things, as that’s how you get the press to cover the boring, technocratic stuff.





