I am extremely sceptical of claims that "Gender Critical" tweeters are bots

So I crunched the data to check

Feb 07, 2022

A story in the New Statesman grabbed my attention last week. “Gender-critical feminism is not as popular as its supporters may want you to believe,” read the headline.

The piece, by an author whose work I usually really enjoy[1], was premised on a report published by “fact checking” firm Logically AI, and it argued that many of the retweets on the hashtag “#KeepPrisonsSingleSex” were the work of bots.

The hashtag itself is a reference to arguably the most contentious aspect of the already spectacularly acrimonious civil war currently raging within feminism: which type of prison male-born offenders who now identify as women should be housed in. And though it has been used by so-called “gender critical” feminists for some time, it notably trended in January after Baroness Claire Fox mentioned it in the House of Lords, during a debate on an amendment to the Police, Crime, Sentencing and Courts Bill that specifically addressed the controversy.

It is a debate whose substance I have no desire to wade into, because I’m not an absolute maniac, but what is of interest to me is the claim that a botnet was behind much of the activity on the hashtag, and was the reason it ended up trending[2].

“A Logically investigation has uncovered evidence of a Twitter botnet promoting the hashtag #KeepPrisonsSingleSex,” reads the piece’s subheading.

However, having read both the NS story and Logically’s write-up of their findings, I’m extremely sceptical that bots are really the reason the hashtag made it to trending, and I think the report’s conclusions could be overstating the “evidence” that a botnet was involved at all.

So after spending some time contemplating whether I really want to dip a toe into The Worst Debate On Twitter, I decided to risk it because of my unwise dedication to being the sort of arsehole who pipes up and ruins everyone’s fun.

And more sincerely, though this is just a minor hashtag controversy in the grand scheme of things, I do think this is important. I can easily imagine a world where a shaky claim about bots could embed itself as oft-repeated received wisdom in a debate that already generates far more heat than light. And if any common ground is ever going to be found, it will have to be achieved by both sides first agreeing a common set of facts.

So let’s try and figure out just how seriously we should take the botnet claims.

Oh god, what am I doing wading into this? Please don’t make it futile by signing up for my Substack - it’s free! - for more takes on a wide range of topics, and if this is successful, maybe I’ll do some more data mashing too.

Getting into the weeds

To gather the Twitter data, Logically says that it used the Twitter API[3] to periodically grab tweets - up to a thousand per day on several days last month.

The company then used some sort of algorithm to sort tweets that used the hashtag by different criteria, such as identifying the “communities” of Twitter users that retweeted the hashtag, and analysing how tweets were clustered. Using this data, they were able to generate some very slick looking network diagrams showing these connections.

As for identifying the bots, Logically says that it used a service called Truthnest, which claims to score individual accounts for bot-like characteristics, and the authors also point to a number of other facts about the data they collected that make them think there is evidence of a botnet at work. To quote from the report:

“[R]ather than tens of thousands of individual accounts tweeting support clustered around prominent gender critical influencers, we found a small number of anonymous accounts spamming retweets back and forth to each other.”

“[T]he accounts that exercised the highest degree of influence on the network were ‘name+number’ accounts, which often indicates a bot.”

“These accounts never tweeted original content, instead rapidly retweeting and cross-retweeting content.”

And narratively, the report tells a story of how the hashtag evolved in popularity over the four days studied, before virtually disappearing by the end.

After reading the explanation of how they arrived at these conclusions, I remained sceptical for one major reason: getting something to trend isn’t actually that hard. So I was curious to learn more about the data they say points to a botnet.

Now, I can’t claim to have the same sort of sophisticated data analysis tools or skills that the authors have, but I have crunched a lot of Twitter data in the past on various projects[4], so I emailed the company asking if they’d be willing to share their dataset to let me have a poke around - sadly, they declined.

But then something extremely useful happened.

Apples and Oranges

I first tried to download the Twitter data for the same days in January myself, but unfortunately, the Twitter API limits access for cheapskates like me to only the last seven days of search data.

But I was able to use a third-party scraper to grab the tweets that used the hashtag. By my count this was 5,059 tweets over the course of the five days. Adding up the numeric count of retweets on all of these gets us to 38,176 total retweets. So this is broadly in line with the 37,000 uses of the hashtag that Logically report[5].

Unfortunately, however, the lack of API access and the limitations of the third-party scraper meant that I couldn’t access the data that would solve the most crucial part of the puzzle: who had actually been retweeting the hashtag.

So it was some fortuitous timing then that on Thursday, February 3rd, #KeepPrisonsSingleSex once again made it into the trending topics. Amusingly, this was partially in response to the NS piece and the report, and many of the tweets also contained the hashtag #NotABot in response.

This time around, I was ready. I was able to use some old code I’d written to download all of the tweets that had used the hashtag during the past week - including the retweeters - and perform a basic analysis of my own.

My thinking was that if the hashtag is trending again, it must surely have achieved a similar amount of attention to the first time around in January[6], when Logically performed their analysis. If a botnet had been at work in January, maybe whichever unknown, shadowy figures were operating it would also be responsible for it trending second time around? After all, gender critical opinions are apparently “not as popular as its supporters may want you to believe” - so it would have to be bots, right?

To be absolutely clear then, this is technically an apples to oranges comparison as the two datasets cover different time periods, when the same hashtag trended for different reasons. But perhaps looking at the data on how it trended second time around could help us understand why it trended first time.

What does it take to trend?

So exactly how much engagement was required to make #KeepPrisonsSingleSex trend second time around in early February? According to the data I scraped, in the seven days leading up to about 3pm on the 4th Feb (so covering the five or six days before, and most of the day after), there were 4,442 original tweets that used #KeepPrisonsSingleSex, and 11,932[7] retweets of tweets containing the tag. On February 3rd specifically, the day it began trending again, there were 3,351 original tweets and 8,483 retweets.

Over the course of the entire week in question, tweets and retweets were posted by 3,614 unique Twitter users. The most prolific tweeter was @Siaroncatrin, who tweeted the hashtag 243 times, and the most prolific retweeter was @xx4MMF, who retweeted it an impressive 258 times. And you thought your Twitter usage was excessive.
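For anyone who wants to repeat this sort of tally, the counting itself is nothing fancy. Below is a minimal Python sketch of the idea, not the code used for this piece: the `(screen_name, is_retweet)` record shape and the `summarise` helper are assumptions made up for illustration, and the sample rows are invented rather than taken from the scrape.

```python
from collections import Counter

# Hypothetical records: (screen_name, is_retweet).
# These rows are invented sample data, NOT the real scrape.
sample = [
    ("alice", False),
    ("bob1234", True),
    ("bob1234", True),
    ("carol", True),
    ("alice", True),
]


def summarise(records):
    """Count original tweets, retweets, unique users and the top posters."""
    originals = Counter(user for user, is_rt in records if not is_rt)
    retweets = Counter(user for user, is_rt in records if is_rt)
    return {
        "original_tweets": sum(originals.values()),
        "retweets": sum(retweets.values()),
        # A user counts once whether they tweeted, retweeted, or both.
        "unique_users": len(set(originals) | set(retweets)),
        "top_tweeter": originals.most_common(1),
        "top_retweeter": retweets.most_common(1),
    }


summary = summarise(sample)
print(summary)
```

Run over a full week of scraped tweets, a tally like this would yield the kind of headline figures quoted above: original tweets, retweets, unique users, and the most prolific accounts.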

My initial reaction on seeing these numbers is… that’s not actually that many people. When you think about how many Twitter users there are, mobilising fewer than 4,000 people to post or retweet a hashtag does not feel that insurmountable for any suitably controversial political cause – especially one as obviously contentious and relentlessly talked about as the trans prisoners debate.

But one part of the puzzle does remain, which is the differing ratio of retweets. In the February sample, there are 2.69 times as many retweets as original tweets, and in the January sample there are 7.55 times as many. Could this be an indication that Logically are right, and that many of those retweets are inauthentic, the work of bots spamming the retweet button?

As luck would have it, we can do a bit of a sanity check on this, and put the numbers in a little bit more context, because happily, this isn’t my first rodeo.

Back in 2018, when I was editor of the now defunct tech website Gizmodo UK, I performed a similar analysis of the #ResignWatson hashtag, which trended one Sunday evening as supporters of Jeremy Corbyn tried to persuade then deputy Labour leader Tom Watson to, er, resign[8].

In that case, in the two days of Twitter activity I analysed, 15,001 original tweets and 74,745 retweets were made by 12,195 unique Twitter users, with the most enthusiastic poster retweeting the hashtag a staggering 573 times. This gives #ResignWatson a retweets-to-original-tweets multiplier of 4.98x.

Similarly, a few months after this I used the same code to scrape data on other interesting hashtags I saw. One example is #BringBackGalloway, which trended a few months after #ResignWatson, in support of George Galloway.

In this case, the hashtag trended with the help of just 2,868 original tweets and 14,704 retweets, posted by just 2,881 users. This gives us a multiplier of 5.13x.

So the January sample’s multiplier is still larger, but I don’t think it is unimaginably larger. What’s clear from these numbers is that to get a topic trending, you don’t actually need much activity, and that the numbers involved in #KeepPrisonsSingleSex are not that out of whack with comparable trending political hashtags.
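The multipliers quoted here are simply retweets divided by original tweets. As a quick sanity check, this short Python sketch recomputes them from the figures given in this piece. The dictionary layout and the `retweet_multiplier` helper are just illustrative names, and bear in mind that the January retweet total comes from adding up displayed retweet counts, so it is not a perfectly like-for-like comparison.

```python
# (original tweets, retweets) as reported in this piece.
# The January retweet figure is the sum of displayed retweet counts.
samples = {
    "#KeepPrisonsSingleSex (January)": (5_059, 38_176),
    "#KeepPrisonsSingleSex (February)": (4_442, 11_932),
    "#ResignWatson": (15_001, 74_745),
    "#BringBackGalloway": (2_868, 14_704),
}


def retweet_multiplier(originals, retweets):
    """Retweets per original tweet."""
    return retweets / originals


multipliers = {
    tag: round(retweet_multiplier(originals, retweets), 2)
    for tag, (originals, retweets) in samples.items()
}

for tag, multiplier in multipliers.items():
    print(f"{tag}: {multiplier}x")
```

On these figures January is the outlier at the top and February the outlier at the bottom, with the two older hashtags sitting in between.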

Names and Numbers

We can also look at more than just the numbers. My much more subjective impression, based on skimming through the February data and manually looking up some of the users, is that the “name+number” accounts that Logically says are indicative of bots… really don’t look like bots to me[9].

Take @jeanniegene1, for example, the fourth most prolific retweeter during the February re-trend, who retweeted the hashtag 209 times during the week we’re looking at.

In terms of bot potential, Jeannie has no profile image or any obviously identifiable information - in fact, their Twitter bio literally says “KeepPrisonsSingleSex”. But look at their timeline and yes it is mostly (entirely?) retweets, and nearly all of them contain a gender critical message but… some do not. And not all of them are actually using the crucial hashtag the “bots” were trying to trend. Tweets containing the hashtag are just interwoven with a more general array of feminist/gender critical stuff.

For another example, what about @PHughes74470229, another name and number retweeter, who was responsible for 192 retweets of the hashtag? Again, loads of retweets but also engagement on other topics too, including the campaign to make misogyny a hate crime, and even #FBPE, which has nothing to do with feminism.

Skimming much further down the list to, say, @perkybadboy5000 and @workingout123, who retweeted tweets containing the hashtag six times each, on manual inspection they also appear to be normal gender critical people, retweeting many of the names and stories that are relevant to that community, as well as other things.

I admit, I haven’t been through all of the other accounts systematically[10]. There are upwards of 800 accounts that contain numbers in the screen name. But picking accounts at random to check, I’m yet to see one where I’d conclude it is more likely a bot than an actual living, breathing, Twitter-addled gender critical person.
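Picking out screen names with numbers in them is easy to automate, though what counts as a “name+number” account is a judgment call. Logically’s report doesn’t spell out its definition, so the two rules in this Python sketch are assumptions: any digit anywhere in the handle, and a trailing run of at least eight digits (an arbitrary threshold chosen purely for illustration). The handles are the ones mentioned in this piece.

```python
import re

# Handles mentioned in this piece (without the leading @).
handles = [
    "Siaroncatrin",
    "xx4MMF",
    "jeanniegene1",
    "PHughes74470229",
    "perkybadboy5000",
    "workingout123",
]

# Assumption: a "long" numeric suffix means at least this many digits.
LONG_SUFFIX = 8


def contains_digit(handle: str) -> bool:
    """Loose rule: the handle has a digit anywhere in it."""
    return any(ch.isdigit() for ch in handle)


def trailing_digit_count(handle: str) -> int:
    """Length of the run of digits at the very end of the handle."""
    match = re.search(r"\d+$", handle)
    return len(match.group()) if match else 0


with_digits = [h for h in handles if contains_digit(h)]
long_suffix = [h for h in handles if trailing_digit_count(h) >= LONG_SUFFIX]

print(with_digits)
print(long_suffix)
```

Neither rule says anything about whether a human is behind the account. The loose rule catches five of these six handles, which is a reminder that a number in a screen name is a weak signal on its own.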

I suppose conceivably the bots could be just really good at tricking me. Or that the botnet was deployed in January, but not in February. But I have to wonder what is more plausible? Maybe one of the lads at the Moscow bot farm got really into feminism, and is method acting a bunch of very convincing characters who each have a plausible constellation of interests… or could they just be real people who for better or worse are extremely interested in this one issue?

Occam’s Razor

In addition to the numbers being broadly in line with other trends, and my more subjective impression, there are other factors I would ask more questions about first before concluding that “botnet” is the most likely answer. For example, the fact that engagement on the hashtag peaked in line with the House of Lords actually debating whether prisons should be kept single sex, and then tapered off after - that seems to me to be just… completely normal? If you look at the hashtag for a football match several days after the match, the number of people tweeting about it will have fallen.

At another point in the report, Logically’s authors point to research by David Allsop as evidence the hashtag could be manipulated, but if you actually read the analysis he performed on a similar gender critical hashtag, he concludes there’s no evidence that bots were at work. Instead, he argues that the hashtag was ‘manipulated’ by actual, real gender critical feminists hammering the retweet button. So essentially, the same thing that I am proposing as a more likely explanation here.

Ultimately what I think is pretty clear looking at the comparative data is that you don’t need bots to make the story work. The numbers are small enough that a high-tech explanation is not needed.

So I am yet to be convinced that the reason #KeepPrisonsSingleSex trended is because of bots[11]. All you need is a small but dedicated group of supporters, and you can probably make your pet cause trend, and on balance, I think without further evidence, this rather boring explanation is probably more likely the case than a mysterious botnet.

I’m open to the idea that I could be completely wrong. Maybe Logically could share some more detailed information that would change how I weigh the evidence in my mind. But as things currently stand, I think that attributing the hashtag’s activity to a botnet, and making that claim in a headline, risks overstating what the data actually shows. Such an extraordinary claim should rightly require some significantly more compelling evidence.

Congratulations! You made it to the end! If you enjoyed reading this then you might also enjoy my more regular fare of smart-arse analysis, blazing hot takes and mildly contrarian opinion on a wide range of topics.


[1] I recommend reading this bizarre story about how a bunch of architecture Twitter accounts have some weird alt-right links.

[2] The crucial thing to know about the trending algorithm is that it does not just take volume into account - it also considers other factors, such as uniqueness or growth - otherwise the top trending topics all day, every day would probably be K-Pop bands and football teams. In fact, I’ve seen this myself, as I’ve been at multiple events in the past where enough of us in the relatively meagre audience of a few hundred people were tweeting along to send the event to the trending topics page.

[3] An API is an “application programming interface” - basically a way for different apps to send data to each other. Twitter has a public API that anyone with a little technical knowledge can use to interact with Twitter - such as to build their own Twitter clients, have their own apps post tweets - or, indeed, analyse tweets and build bots.

[4] I literally have a cat called Hashtag.

[5] The numbers are never going to be exactly the same because the corpus of tweets is ever changing - tweets get deleted, accounts get suspended and unsuspended, and so on. Plus the number of retweets listed next to any given tweet is not necessarily entirely accurate, as Twitter caches these numbers to save itself having to count afresh every time someone opens a tweet.

[6] I’m sure it has trended on a number of occasions in the past too, but I’m going to stick to referring to the January and February trends as “first” and “second” to make this already complicated piece comprehensible.

[7] Curiously, if you add up the actual number of retweets listed on each tweet, rather than counting them as they came up in the search results, you end up with 13,208 as the number of retweets - almost 1,300 more. Though I suspect this is mostly variation for reasons similar to those in footnote 5.

[8] I wish I could link to my original piece but annoyingly when the site was shuttered the domain was redirected to its US parent site, so all of the UK-originated content including loads of my work disappeared from the internet. Yes, this is very annoying.

[9] The Twitter API doesn’t reveal information like IP addresses, so neither Logically nor I can determine for sure whether they are posting from Moscow or whatever. Instead, we have to rely on other signals - such as, as Logically suggests, “name and number” accounts perhaps being indicative of someone creating multiple accounts and not caring what their actual username is, or the connections they have in terms of who they choose to follow.

[10] Amusingly, I did notice some of the gender critical accounts also making questionable tech claims about, for example, other hashtags being suppressed by Twitter.

[11] I’m not the only one. This computer science professor, who posts in German, is similarly sceptical.