On a philosophical level, I'm not sure that an LLM "scraping" copyrighted material and using it as training data is all that different from human artists being "inspired" by other artists.
This is not Judi Dench clutching her pearls. Creatives are asking to be paid for their work. In a tight, restrictive copyright regime, the AI megafauna could have all the training materials they wanted. They'd just have to pay. I think it is telling that the megafauna choose to pay vast amounts to lobbyists, not set up creative arts funds with their trillions.
But isn't James's point that, unless global rules are drawn up, creatives get the same material ripped off in other more lax regimes, to the same level of detriment, but with the extra downside that the UK doesn't see the benefit?
I should say, it's not something I've thought about much and I don't particularly have a 'side' (if anything, I'm suspicious of the 'megafauna') - but my instant reaction, partly spurred by the WWI point in the piece, is that tight protection would maybe hold AI to a higher standard than human creators. Music is what I know best, and there it works like this: sample a piece and you pay licensing; 'interpolate' a melody or lyrics and you'll likely have to concede a songwriting credit; but be 'influenced by' other artists - that's just how most music is made. e.g. Oasis 'sounding like' the Beatles and/or Slade is beyond cliche, but they won't get paid (other than for the cover versions - and the New Seekers and Stevie Wonder do get credits). What's supposed to be the difference? That at least Noel Gallagher is open about the source material? That AI will be ruthlessly efficient at it?
Exactly this - and great analogy! I might, er, steal it (or be inspired by it...).
Thank you - you'll be hearing from my solicitors.
Though I was halfway through writing that when I remembered that Oasis are actually a terrible example, because they *have* had to give credit on at least three songs. But maybe it works to illustrate the point - I'm pretty sure that for both Shakermaker and Step Out they didn't voluntarily identify the writers, and the credits arose out of disputes (not sure about Hello, but who would voluntarily give money to Gary Glitter...?), whereas no-one seriously thinks all the various influences should get paid, even if in the case of an act like Oasis they are quite narrow and obvious.
So surely the usual rules still apply? If AI rips something off to the extent that Step Out rips off Uptight, or Men at Work rip off the flute riff in 'Kookaburra Sits in the Old Gum Tree', Stevie Wonder or ... the current copyright owners of a long-deceased nursery rhyme author ... can go to court for royalties in the same way? I guess maybe the issue is that it will happen a lot more, given AI is probably a better copyist but (currently at least) a worse thinker than a human?
It is much more like a feisty startup, ABC Records, being able to produce records 'like' XYZ Studio's. (Although nothing produced by AI can be copyrighted itself.) There's a strong, if not knock-down, argument that certain UK and US copyright laws have been breached. That's why they're lobbying the government. (I'd be happy to publish my books only in countries which prohibit AI scraping.) (This needs to be a long discussion - about 15-20 minutes in my experience.)
I would have thought (in music) that ship sailed long before AI though. There have long been tools to replicate someone else's sound exactly, and some studios set out to do exactly that. e.g. Daptone make brilliant records that sound like they were made at Motown, Stax or Treasure Isle - albeit using completely analogue techniques rather than technological wizardry. They don't have copyright infringement issues unless they take melodies, lyrics, samples etc. So as I say, is the issue that this becomes far easier with AI?
There's a bigger issue, of course, in what the overall point of blindly copying an existing style is. For me, in the Daptone case it deserves huge respect as they do it so well - but more generally, why listen to an AI record that *sounds* like it could have been made in the 1960s? I'm interested in the real thing. Innovative copies at a push, but not something completely lazy and counterfeit.
James, thank you for the post. I think you're largely right that Britain has managed to find the worst possible position — but I'd push back on some of the framing.
On the Brexit analogy — the Irish border was never the strict binary you present. The EU maintains complex arrangements with Switzerland, Norway, Turkey and others. The problem wasn't that only two options existed — it was that neither side was willing to be honest about the trade-offs. The Windsor Framework "works" because everyone agreed to stop examining it too closely.
On the core copyright argument — I agree with you that unilateral UK restrictiveness achieves nothing when permissive regimes exist elsewhere. But I think you slightly understate the options. Just because an AI model can train anywhere doesn't mean it can operate everywhere. If the UK and EU agreed a common copyright framework together, they could simply refuse to permit models that don't comply with it to operate in their markets. UK plus EU plus near-neighbours is a big enough market to move the needle globally. We've seen this before — GDPR reshaped global privacy practices, the EU's interventions on browser bundling changed Microsoft's behaviour worldwide. The "Brussels Effect" is real, and a London-Brussels joint position on AI copyright would carry similar weight.
There's also a precedent closer to home. The music licensing system — PRS, PPL — was created to solve exactly this kind of problem: you can't pay every individual contributor for every use of their work, so you pay a collective licence fee that gets distributed. Obviously scaling that from music to every copyrighted work is a different order of complexity. But the principle — collective licensing as a pragmatic middle ground between "free for all" and "permission from every rights holder" — is proven and could be adapted.
That said, I think the copyright debate risks becoming a distraction from the more fundamental question of Britain's AI competitiveness. We could get copyright perfectly right in either direction and still lose the race if we don't have the data centres, the energy supply, the skills, and the capital to actually run the models.
A lot of the discourse around LLM AI seems to centre on the idea of inevitability. This thing has been invented, loads of money has been sunk into it: it's obviously superior to all the previous developments in this direction. In what world would this not be the future? Well, Concorde, anyone?
It seems to me that the use of LLMs is currently copyright infringement in the UK. It's clear that LLMs store a representation of much of their training data and that a copy of the training data may be elicited from that representation - clear, because there is good evidence of this actually happening.
Perhaps a useful analogy is with, say, digital photographs or digital recordings. The creative input (picture, music, text) is processed into numeric data in a way that captures much if not all of the signal, and that numeric data is stored and copied. From any such copy, a user with the appropriate hardware (player, computer, phone) and software (browser, GPT) can obtain a more-or-less faithful copy of a specified creative input.
It has previously been generally understood that to do this without the permission of the rights owner is an infringement, and that the file of numeric data is an infringing copy, even if it is physically very different from the creative original.
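To make the analogy concrete, here is a toy sketch - a deliberately trivial illustration of digital encoding in general, not a claim about how any particular model stores anything. A 'creative input' is turned into numeric data that looks nothing like the original, and yet a near-faithful copy can be recovered from it:

```python
# Toy illustration of the encoding analogy: an audio-like waveform is
# encoded into bytes that are physically very different from the
# original, yet a more-or-less faithful copy can be recovered.
import math

# The "creative input": one cycle of a waveform, sampled 100 times.
original = [math.sin(2 * math.pi * t / 100) for t in range(100)]

# Lossy numeric encoding: quantise each sample to an 8-bit integer.
encoded = bytes(round((s + 1) / 2 * 255) for s in original)

# Decoding: anyone with the (trivial) "player" recovers a close copy.
decoded = [b / 255 * 2 - 1 for b in encoded]

# The worst-case reconstruction error is tiny.
worst = max(abs(a - b) for a, b in zip(original, decoded))
print(f"worst-case sample error: {worst:.4f}")  # about 0.004
```

The stored bytes bear no obvious resemblance to the waveform, but on the understanding described above they would still count as a copy, because the original can be elicited from them.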
So what would an appropriate remedy be? Well, we have two main collective licensing systems already. One is Public Lending Right, which compensates authors for the use made of their intellectual property by the public library system. The other is the Authors' Licensing and Collecting Society. Either of these could serve as a model, or indeed as a mechanism, for compensating authors. Currently PLR distributes about £6 million a year and ALCS about £50 million.
An opening position might be that LLM owners should contribute to ALCS, say £100 million for each training run and then £5 million per year during the lifetime of the model. Currently OpenAI is spending about $3 billion on training, and is supposedly valued at nearly $1 trillion, suggesting that people believe there's a good chance it will return profits of the order of $100 million a year for decades to come. Asking for 5% royalties doesn't seem onerous.
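As a back-of-envelope check, taking the figures above at face value (treating £ and $ as roughly comparable for rough arithmetic - all of these numbers are assumptions from the paragraph above, not established facts):

```python
# Rough check of the proposed licence fees against the stated figures.
training_cost = 3_000_000_000        # ~$3bn per training run (stated above)
assumed_annual_profit = 100_000_000  # ~$100m/year (assumed above)

one_off_fee = 100_000_000            # proposed £100m per training run
annual_fee = 5_000_000               # proposed £5m/year over the model's lifetime

# One-off fee as a share of training cost; annual fee as a share of profit.
print(f"one-off fee / training cost: {one_off_fee / training_cost:.1%}")        # 3.3%
print(f"annual fee / assumed profit: {annual_fee / assumed_annual_profit:.1%}")  # 5.0%
```

On those assumptions, the ongoing fee is the 5% royalty mentioned, and the one-off fee adds roughly 3% to the cost of a training run.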
Disclaimer: my wife is one of the claimants in the Anthropic settlement, which has been agreed at $3000 per work.
The USA has previously had a pattern of failure to accept other nations' intellectual property. I recall that J. R. R. Tolkien had at least one of his works published without authorisation or payment. Even the case law on Anthropic seems to apply only to claimants who have registered US copyright - something which seems not to cohere with the international agreements.
Slightly linking your two items, there is the legendary Irish copyright precedent. See: https://en.wikipedia.org/wiki/Battle_of_C%C3%BAl_Dreimhne.
It's only because of Brexit that you can advocate your option: the UK is not bound by the EU AI Act, though that may be something that we have to adopt as the Government pursues closer alignment with the EU.
Gah you’re right and it makes me so mad!!
(But I also use AI for boring drafting tasks, so…)