Here's a pitch for what the National Data Library should actually be
Do you want to be a National Data Librarian?
You might remember that last year I wrote about the so-called “National Data Library” (NDL) – an idea from the Labour manifesto that is regularly referenced whenever Keir Starmer and technology secretary Peter Kyle are asked tech questions. However – as I identified previously – there is just one problem… nobody knows what it is actually supposed to be. It’s not clear what it would be for, how it would be structured, or even what data would be involved.
That’s why I’m very excited to share with you today a guest post from my friend Alexander Iosad, Director of Government Innovation Policy at the Tony Blair Institute. Today, with his colleagues at TBI and Anastasia Bektimirova at The Entrepreneurs Network, he is launching a new paper setting out their ideas for what the NDL should be.
And he’s very kindly written the below, to give us a sense of what their big idea is. It gets a bit technical, but I like sharing ideas for the real nerds – so let’s dig into the policy weeds!
But first, a quick, last-minute plug…
TONIGHT! In a massive yet delightful coincidence, I’ll actually be speaking to Alexander tonight (25th February) as he’s the guest at my latest event, on How AI can (maybe) fix the government. There are a few tickets left, so come along after work!
Now here’s Alexander talking NDL!
Has it become any clearer what the NDL actually is?
The short answer is sadly “no”. Before you stop reading, though, let me tell you the good news: the research, data and policy community have done a lot of thinking on what the answer should be.
This has its pros and cons. Back in October, James (foot)noted one of the challenges here:
One approach [to figuring out the answer] could be to try and correlate the different proposals to the words from the Ministers and manifesto above, but the problem is that it’s all just a haze of data-adjacent words. Though they all sound like nice things, there’s no clear direction or intent when you add it all together.
This government clearly believes that better uses of data can help it accomplish big things. The NDL has proven a convenient short-hand for how this could play out. So, by our latest count, the NDL has been mentioned in Parliament 29 times and has featured in at least five policy papers.1
These include the AI Opportunities Action Plan, which points to the NDL’s role in helping the UK become “a maker, not a taker” of AI, and the Blueprint for Modern Digital Government, which says it will make it “easier to find and reuse data across public sector organisations”. Add to this continued commentary from the data community and a technical white paper challenge run by the ESRC and the Wellcome Trust, and there are plenty of ideas to work through.
A kind of consensus is emerging, though, on what the NDL should not be:
A giant data lake or single database that puts lots of data into one place – the NDL should instead help to “federate” access (that is, use data from different places without moving or copying records).
A marketplace for selling data or commercial data broker. As a government initiative, the power of the NDL is in the public value it should create. There is certainly an argument for covering the costs of the NDL by capturing upfront some of the commercial value it generates as a by-product, but we can’t lose citizen trust or levy another “innovation tax”.2
A superficial branding exercise. There may be a temptation to cut corners by simply applying the NDL label to anything data-related and calling it a day. Don’t.
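To make the “federate” idea in the first point above concrete, here is a minimal sketch (all names and data invented for illustration): a query is evaluated against each department-held source in place, and only the minimised, matched answer travels – no central copy of the records is ever built.

```python
# Hypothetical sketch of federated access: each "department" keeps its own
# records; a query runs against each source where it lives, and only the
# minimised answer leaves -- no giant central database is created.

# Two stand-ins for separately held, separately governed data stores.
nhs_records = {"p1": {"condition": "asthma"}, "p2": {"condition": "copd"}}
dwp_records = {"p1": {"benefit": "uc"}, "p3": {"benefit": "pip"}}

def federated_query(person_id):
    """Ask each data holder for just the fields needed; never copy whole datasets."""
    result = {}
    if person_id in nhs_records:
        result["condition"] = nhs_records[person_id]["condition"]
    if person_id in dwp_records:
        result["benefit"] = dwp_records[person_id]["benefit"]
    return result

print(federated_query("p1"))  # {'condition': 'asthma', 'benefit': 'uc'}
```

The point of the design is what the function *doesn’t* do: neither source is copied or moved, and only the linked answer for one query leaves the data holders.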
There is also an inherent tension in the level of ambition the government has for the NDL. On the one hand, there is always a risk that the NDL runs into the same delivery challenges that large data and IT projects in government can be prone to, so we need to start small and prove value as we go. On the other hand, without a big vision, it could just end up a collection of loose data initiatives that don’t add up to much collectively. And so in an effort to help square that particular circle, we have come up with a plan and roadmap that I would describe as “fairly detailed”.3
Here is the key takeaway: for the NDL to succeed it must be vision-led, not technology-led. This means prioritising the ends (quickly delivering tangible benefits to citizens, researchers and businesses) over the means (any one particular architecture or platform). If we agree on that vision, and hold to it, it becomes much easier to align all the small, easy-to-correct steps we need to take on the way to getting there.
A vision for the NDL, you say… what is it?
To answer this, we have to ask: where can the NDL add the most value? What can the NDL do that other initiatives can’t? And what outcomes will this create?
The NDL needs to be more than a data platform (there are plenty of those). It should focus on the untapped value of public-sector data, something no-one else can do, by making it quick, simple and safe to use. And it should remove existing data-related barriers to better policy-making, more effective public services and faster economic growth. It will be successful if it makes data easier and faster to use, reduces bureaucracy while preserving privacy, empowers those who hold data to make it useful, and thereby delivers impact in the real world.
Now, there is a lot that needs to happen on the tech side of things – but while the Department for Science, Innovation and Technology (DSIT) are yet to announce any firm plans for what the NDL will do, much of the wider picture for digital transformation has been filled in. We now have commitments to make government data interoperable and reusable by default4 and to build a form of digital ID via the soon-to-come GOV.UK wallet app.5 This helps reduce some of the uncertainty of how, exactly, the NDL could be built because it takes care of many, though not all, of its fundamental building blocks. And this leaves it free to focus on the key purpose of making public-sector data easier and quicker to access and use safely.
How would it work – and what could stop it from working?
There are actually lots of things underway to help connect data across different silos in the public sector. To name just a few, we have the Office for National Statistics’ Integrated Data Service, the ESRC’s Administrative Data Research UK initiative, the Geospatial Commission’s National Underground Asset Register (my favourite dark horse) and all kinds of data-sharing agreements between individual departments. This offers the NDL a wonderful starting point: making existing systems work better, not from a technical point of view but by streamlining the process of discovery and access. That means focusing in the first instance on the governance and access controls needed for responsible, fast and scalable data sharing.
I keep coming back to this point about speed because the current process for doing perfectly legitimate and useful things with data in government is a bureaucratic nightmare. It’s very common for requests to share data to take six months or more to review. The best thing the NDL could do here is streamline this process by housing a single Data Access Committee (DAC) – a group of data and ethics experts who consider requests for using data that is included within the NDL’s remit based on a shared set of principles, eventually automating low-risk requests.
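The triage logic a single DAC could apply might look something like this toy sketch (the risk categories and rules here are invented for illustration, not drawn from the paper): requests that match a low-risk profile are approved automatically against the shared principles, and everything else is queued for the committee.

```python
# Toy Data Access Committee triage (categories and rules invented):
# low-risk requests from accredited users are approved automatically;
# everything else goes to the human committee for review.

LOW_RISK_DATA = {"aggregate_stats", "anonymised_extract"}

def triage(request):
    """Return the routing decision for a data-access request."""
    if request["data_type"] in LOW_RISK_DATA and request["accredited"]:
        return "auto-approved"
    return "committee review"

print(triage({"data_type": "aggregate_stats", "accredited": True}))
# auto-approved
print(triage({"data_type": "identifiable_records", "accredited": True}))
# committee review
```

The value is in the shape of the process: one shared set of principles, applied consistently, with the routine cases taken off the committee’s desk entirely.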
Government departments today often lack the capacity and expertise to do data sharing well. So the NDL should do what any library would do and hire some National Data Librarians – one for each department – all reporting to a Chief National Data Librarian. They can work together to agree best practices and quickly resolve disagreements while becoming deep experts in their host departments’ data structures and practices. The best of both worlds!
But what about the readers? Here, too, there is a major roadblock for making the most of the data we already have: user access controls. These are obviously necessary – lots of the data government holds is pretty sensitive and you wouldn’t want just anyone getting their hands on it. But today, even approved researchers often have to get permission for each data set from scratch, completing the same compliance training and filling out endless forms over and over again. And at the same time, there is no incentive for organisations to share data with you, because while you get (eventually, if you persevere) the benefits, they carry all the risk.
We should set up a Reader Pass system for the NDL, where users from the public or private sector or from academia can get accredited once to use different data within a certain category of risk or sensitivity – and if they abuse this trust, they should be barred from the library. This can also, incidentally, become the backbone of a financially sustainable model for corporate users and even a competitive advantage for the UK internationally.6
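A minimal sketch of how that accreditation model could behave (tier names and structure are my own invention for illustration): one accreditation covers every data set at or below the pass’s sensitivity tier, and barring a reader revokes access across the whole library at once.

```python
# Sketch of a Reader Pass (tiers and names invented for illustration):
# accredit once per sensitivity tier; abuse the trust and access is
# revoked everywhere, not data set by data set.

TIERS = {"open": 0, "controlled": 1, "sensitive": 2}

class ReaderPass:
    def __init__(self, holder, tier):
        self.holder = holder
        self.tier = TIERS[tier]
        self.barred = False

    def can_access(self, dataset_tier):
        """One accreditation covers all data sets at or below the pass's tier."""
        return not self.barred and TIERS[dataset_tier] <= self.tier

reader = ReaderPass("Dr Example", "controlled")
print(reader.can_access("open"))       # True: no fresh application needed
print(reader.can_access("sensitive"))  # False: a higher tier needs re-accreditation
reader.barred = True                   # abuse the trust...
print(reader.can_access("open"))       # False: ...and you're out of the library
```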
You might have noticed that I am yet to mention any of the technical infrastructure that might be part of the NDL. Eventually, we will have to build some. But this is a good example of how the most immediate barriers are in the “service layer”. Couple tackling them with some quick wins on the tech side,7 and you build some real momentum behind this.
What could the National Data Library actually do?
So why go to all this trouble? Well, if you tackle the underlying barriers to better uses of data – especially linked data from different sources – you open up a wide field of opportunities in the private sector, academic research, policy-making and even service delivery. For example:
Medical scientists would be able to determine treatment effectiveness faster by linking NHS and social care records, ensuring more precise, data-driven clinical decisions.
Climate scientists could integrate air pollution and health data, pinpointing the direct impact on respiratory illnesses and shaping stronger public health response.
AI developers could access unique, anonymised datasets to train more accurate, less biased models that would improve outcomes in education or speed up planning permission decisions.
Government departments could quickly link data to answer questions they might have about how effective existing policies or new proposals would be.
Local councils would be able to use social care, health and welfare data to identify at-risk households, intervene sooner and prevent crises before they require costly intervention.
Better uses of data are already creating significant benefits for all of us – the ADR UK programme I mentioned earlier has a 1:5 cost-benefit ratio. So investing in making, let’s say, ten high-priority data sets easily and securely available (which might cost, based on existing projects, perhaps up to £200 million) could lead to returns of over £1 billion a year. More generally, better data access has been estimated to potentially generate as much as £319 billion in societal benefits by 2050. And ultimately, it is only by getting the data infrastructure right that the big prize on offer from AI in the public sector – a new operating model and as much as £40 billion a year in productivity gains – could be realised.
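For the sceptics, the back-of-the-envelope sum in that paragraph checks out (figures taken from the text, with the ADR UK ratio applied to the assumed cost):

```python
# Back-of-the-envelope check of the figures above (all taken from the text).
cost = 200_000_000     # assumed investment: up to £200 million
benefit_ratio = 5      # ADR UK's reported 1:5 cost-benefit ratio
annual_return = cost * benefit_ratio
print(f"£{annual_return / 1e9:.0f} billion a year")  # £1 billion a year
```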
Which brings us to the elephant in the room: the Postcode Address File.8 In the long run, some of the greatest potential benefits of the NDL would be as a centre of expertise not only for making the most of existing data sets but for deciding what new data should be made available, where to source it and under what conditions to make it available. One of the big question marks around the plan to free the PAF is where to house it, and the version of the NDL that we are proposing – vision-led, not tech-led – is a natural home. Our proposed structure would also help to resolve the coordination problem that seems to be the main barrier to PAF liberation today. It would also create a place where researchers, policy-makers and developers could come together to build and use novel data sets for new and exciting tools, with or without AI, that tackle pressing societal challenges.
And this is the last point I’ll make: the NDL is a manifesto commitment, so there is political will to make it happen. But that also makes it vulnerable, because parties in power change. So within this term, Parliament should legislate to establish the NDL as an independent arm’s-length body, closely connected to both Downing Street and DSIT but ultimately charged with a really exciting mission: making Britain the first place in the world to build the infrastructure needed to fully harness data and AI for public good.
And it really is exciting. To give credit where it’s due, we would have none of the vibrant debate about the NDL if the government hadn’t made this very ambitious commitment in the first place. That it did, and has kept coming back to the idea as an important element of its vision for public services and economic growth, is a refreshing change of pace. If we get it right, the NDL could be a genuine breakthrough for how the UK uses data to fix existing services and develop new products, driving better outcomes for all and spawning novel industries. This is what makes it worthwhile – and if there’s still detail of the “how” to work out, well, that’s where all of us can help.
Alexander Iosad is Director of Government Innovation Policy at the Tony Blair Institute. You can find him on Twitter here, or Bluesky here. And you can find the full paper here.
And if you enjoy nerdy politics, policy and tech chat, then make sure you subscribe (for free!) to get more from me (James) direct to your inbox.
Fun fact: we originally had this at 26 but a last-minute fact-check showed the mentions just keep coming. By the time this is published, the number may be higher still.
Yes, a PAF reference alert. Read on, I’ll get to it.
This turned out to be one of those papers that started out as “let’s capture some high-level ideas” and ended up diving deep on section 65(4) of the Digital Economy Act. But not everyone has time to read close to fifteen thousand words about the NDL. Hence, this guest post!
Specifically, a) to have a documented API, or machine-readable access point, for all public sector data sets – something also referred to as a “Bezos mandate” (relevant tech lore here); and b) to implement a “once-only principle” of never asking citizens to provide information the government already has.
I was going to reference James’ post on digital ID here but it’s actually quite badly out of date now. Consider this a pitch for my next guest post.
Let’s staple a NDL Reader Pass to every Exceptional Talent visa.
Incredibly, even something as simple as publishing a list of existing data sets on Github can qualify as progress in this space.
I know my audience.