Fantastic piece. Your experience of LLMs is very similar to mine.
Also, this is based on a pretty small sample size, but within my field (statistics / data science) I have found that my colleagues in industry are generally far heavier and more enthusiastic users than those in academia.
There's definitely a huge enthusiasm variance in different industries / crowds. Suspect journalism and academia are particularly hesitant – I'd probably be an AI sceptic by the standards of Silicon Valley. (I think AI is _most_ likely just important on the scale of the internet, not the wheel.)
Have you ever asked ChatGPT how your use compares to the average user? I did that the other day and it was very illuminating in showing that most people are still using it as a toy, rather than for serious work.
This was a fantastic prompt! It flattered my ego and told me that "you’re much heavier and more sophisticated a user than average."
Yes, I got: "You’re definitely on the more interesting end of the spectrum" which I choose to see as a compliment.
Is that response, from ChatGPT, one you can check or do you have to take it on trust? I see no reason for its reply to be reliable - just like the coding assistance described in James O’s article.
Really interesting. I only occasionally use ChatGPT - I've found it a bit hit and miss - but I like the sound of these use cases.
I think a lot of people who are very anti-AI feel that way primarily due to negative polarisation. They don't like the people who are really in favour of it, for vibes reasons.
100%. Another example of the phenomenon of people pretending Elon Musk's rockets are bad because he's a bad person.
https://takes.jamesomalley.co.uk/p/stop-letting-elon-musk-break-your?utm_source=publication-search
Interesting. I'm a professional coder, and I've found coding models of very limited use so far. I sometimes use them as a first pass "why isn't this working?" query before bothering my colleagues, and I've generally found that for the extremely specialised work I do they're very unlikely to give me the correct answer, though they sometimes knock my thinking in the right direction ("That can't possibly be correct, because - oh, *right*, that's where I've been going wrong..."). I've also used them a couple of times for knocking up quick scripts in languages where I'm rusty, in situations where I can easily scan the code and verify it's not going to do anything dangerous so it's safe to run it and check whether it produced the output I wanted. For instance, recently I couldn't open a deeply-nested directory, and wanted to know at what point I was blocked, so asked Copilot to write a Bash script to list out the permissions on every prefix of the directory - /foo, then /foo/bar, then /foo/bar/quux, then .... Quicker than looking up Bash array syntax, but that sort of thing is only a small part of my job :-)
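For the curious, the script was along these lines (reconstructed from memory rather than Copilot's exact output, so treat it as a sketch - the path and usage string are placeholders):

    #!/usr/bin/env bash
    # Print the permissions on every prefix of a path, to find where access is denied:
    # /foo, then /foo/bar, then /foo/bar/quux, and so on.
    target="${1:?usage: $0 /path/to/deeply/nested/dir}"
    IFS='/' read -r -a parts <<< "$target"   # split the path on slashes
    prefix=""
    for part in "${parts[@]}"; do
        [ -z "$part" ] && continue           # skip the empty field before the leading /
        prefix="$prefix/$part"
        ls -ld "$prefix"                     # owner, group and mode for this prefix
    done

The nice thing is that every line is trivially auditable: it only ever reads, never writes.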
One coding task they're very useful for, as you correctly note, is writing code that uses well-documented but gnarly APIs. I too have used them for writing ffmpeg command lines, and they're good for data-munging with NumPy/Pandas or plotting with matplotlib. I've had less success getting them to query OpenStreetMap geodata with the Overpass API, though at least I got a starter query I could manually bash into shape.
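To give a flavour of the ffmpeg case, the kind of one-liner they reliably produce looks like this (an illustrative example, not one pulled from my shell history):

    # extract 30 seconds starting at 1:02, re-encode the video to H.264, drop the audio
    ffmpeg -ss 00:01:02 -t 30 -i input.mp4 -c:v libx264 -an output.mp4

Exactly the sort of thing that's quicker to sanity-check than to assemble from the man page.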
Which model were you using to write code, out of interest? Think this is where a reasoning model and longer context window definitely help, as I remember earlier GPTs forgetting crucial stuff as the code iterated.
For non-coding use: the other day I was confused by Nancy Mitford's description of the court of Louis XIV as a "noblesse d'emir" (in an essay linked from The Bluestocking, natch), so I asked Claude. It gave me a good explanation that I'm pretty sure was correct ("the French nobility had become like decorative courtiers in an Eastern potentate's palace, stripped of real power and practical function"), which led into an interesting discussion of how autocratic courts have worked across continents and centuries, then veered into the life and works of Laclos (apparently after writing Dangerous Liaisons, he became a revolutionary).
I also asked it to recommend me Breeders songs, at which it was a total failure. But as far as I can tell there's only one really good Breeders song and I'd already heard it.
Well, you should listen to Last Splash, as there's loads of them on there.
*checks*
Claude Sonnet 4, apparently, but I just tried my last query with GPT-5 and it gave essentially the same wrong answer. There doesn't seem to be a "reasoning" option built into GitHub Copilot (which has the nice feature of being built into my IDE, so my current open file gets automatically included in the context window and it can suggest edits to my code which I can then accept or reject with a click). I'm fairly restricted in which AI tools I can use on my work machine (for the usual enterprise IP paranoia reasons), but I'll have a look to see what else we have available.
Fascinating stuff. After initial scepticism, my experience is much the same: I use it all the time (not least because I do prompt design at work).
Are all your prompts at the level you talked about above, or do you find yourself getting better results using a template? (e.g. persona, task, good examples, constraints, etc)
Also: if I never have to hear the words stochastic parrot again, it will be too soon.
They're pretty much always written just in "conversational" style. Sometimes I'll list things if it is a multi-step thing I want it to do. And I will tell it what NOT to do, especially when precision is required (eg, writing code), but generally pretty casual. I'm actually pretty astonished just how good it is at interpreting what I'm really trying to figure out.
I mostly use LLMs for coding. But one thing I learnt is that when planning a big thing, you can ask it to ask you questions.
"Here is my vague idea. I need to flesh it out so that I can (do something). Ask me questions to learn what you need to know and then write it up as a proposal/ plan for writing code / ..."
It then asks me questions. Sometimes I know the answer and just spell it out. Sometimes it really makes me think. And at the end I get a well structured plan/summary/whatever which contains a lot of my thinking. I've got a lot of value from that.
I originally came across the idea in this post about coding, but the broad idea is much more widely applicable.
https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
This is brilliant. I'm going to try this!
I use it a lot for summarising long complex documents. I have to read a lot of very corporate, jargony pieces and I find them easier to read once I know the key points. I regularly have arguments with it about grammar (early iterations were surprisingly bad at knowing if it should be “James and me” or “James and I”). And if I’m struggling to start a project I use it to do a bad job just to give me a starting point – or even an insight into what not to do. It will give me a pretty average example that will kickstart far better ideas.
NB this is for day job work, not my creative writing. I have never found it useful for creative work beyond copy editing and fixing transcripts. Apart from the time I asked it for ideas on what could make a “whirring sound” on a robot. (I realised I was talking about its disks spinning which made me sound about 100 years old.)
Really enjoyed this!
Cheers!
Interesting! I’m curious about the inaccuracy aspect (I accept that this is related to my being Gen X, and generally crotchety). I feel like combing through an unfamiliar response and trying to spot hidden inaccuracies must be as time-consuming as doing research from scratch? At least when I do my own research, I ^know^ when I’m being sketchy.
Plus, when you do research from scratch you find things out serendipitously (things that are not directly related to your query, but are interesting to know/enrich your life/make you reassess your query). It’s a bit like how the disappearance of paper-based news sources has stopped me reading absolutely random coverage of international business and volleyball, and I kind of miss that random factor.
But yeah, to be fair, I also just don’t like new stuff.
The default ChatGPT model that most users probably use is the Instant one. I think you have to pay for the Thinking model.
I think so - or the Thinking one is at least rate limited, so you definitely get an inferior experience for most queries.