
29/05/2026
Why the origin of a spread matters more than the spread itself

When a piece of disinformation goes viral, our instinct is to look at the wreckage. The misleading headline that reached two million people. The deepfake video shared by a politician. The hateful meme that triggered a real-world incident. We count the shares, screenshot the worst replies, and ask how something so obviously false could travel so fast. But by the time we are counting, we are already too late. The interesting question, and the one that actually supports action, is not how far something spread, but where it started, who pushed it, and what pattern of behaviour carried it there.
This is what we call origin of spread analysis, and it sits at the heart of how investigators are starting to think about disinformation.
The wrong way to look at a viral post
For years, the default response to a viral piece of disinformation was to play whack-a-mole. A platform flags the post, a fact-checker writes a rebuttal, a journalist publishes a thread debunking it, and everyone moves on. Sometimes the post is removed. Sometimes it isn’t. Either way, the same kind of content shows up again a week later, dressed in slightly different clothes.
The problem with this approach is that it treats every viral piece of content as if it appeared out of nowhere, as if some random user just happened to type it out, and it just happened to catch fire. Sometimes that is what happens, but more often than not, a piece of content that ends up shaping public opinion has been pushed there deliberately, methodically, by a relatively small group of accounts working in a coordinated way.
When doing origin-of-spread analysis, this is the assumption we start from. If you only look at the content, you miss the campaign. If you only look at the viral peak, you miss the launch.
What “origin of spread” really means
Origin of spread analysis is the practice of working backwards from a viral piece of content to reconstruct the path it travelled. Who shared it first? Which accounts amplified it in the critical early hours? Which of those accounts seem to act in unison? Which ones reappear, again and again, in unrelated campaigns?
You can think of it like epidemiology. When public health officials investigate a disease outbreak, they don’t just count patients. They trace cases back to identify “patient zero” and the network of contacts that allowed the disease to spread. They look for super-spreaders and they look for clusters of infection that point to a common source, a restaurant, a conference, a contaminated water supply.
Disinformation works in a strikingly similar way, and this is the perspective AVALANCHE brings to the problem. There is almost always a patient zero: the first account to post a particular framing of a story, or the first website to publish a fabricated quote. There are almost always super-spreaders: accounts with disproportionate reach that amplify the content within minutes of it appearing. And there are almost always clusters: networks of accounts that act together with a regularity that looks nothing like organic behaviour.
The skill of origin of spread analysis (one of the core capabilities we are building in AVALANCHE) is being able to see these patterns clearly enough to act on them.
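To make the idea of working backwards concrete, here is a minimal sketch in Python. It is an illustration only, not the AVALANCHE implementation: the `Share` record, the `trace_origin` function, the two-hour window, and the follower threshold are all assumptions chosen for the example. Given a list of shares of one piece of content, it picks the earliest sharer as a candidate patient zero and lists the high-reach accounts that amplified it in the early window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Share:
    """One share of a single piece of content (hypothetical record layout)."""
    account: str
    followers: int
    shared_at: datetime


def trace_origin(shares, early_window=timedelta(hours=2), reach_threshold=10_000):
    """Return (candidate patient zero, early high-reach amplifiers).

    The window and threshold are illustrative defaults, not tuned values.
    """
    if not shares:
        return None, []
    ordered = sorted(shares, key=lambda s: s.shared_at)
    first = ordered[0]
    cutoff = first.shared_at + early_window
    # Keep accounts that shared early AND have large reach, in time order.
    early = [
        s.account
        for s in ordered[1:]
        if s.shared_at <= cutoff and s.followers >= reach_threshold
    ]
    # dict.fromkeys removes repeat shares while preserving order.
    return first.account, list(dict.fromkeys(early))


t0 = datetime(2026, 1, 1, 8, 0)
shares = [
    Share("big_page", 250_000, t0 + timedelta(minutes=20)),
    Share("seed_account", 120, t0),
    Share("small_friend", 300, t0 + timedelta(minutes=30)),
    Share("late_celebrity", 2_000_000, t0 + timedelta(hours=9)),
    Share("news_aggregator", 80_000, t0 + timedelta(minutes=90)),
]

patient_zero, amplifiers = trace_origin(shares)
print(patient_zero)  # seed_account
print(amplifiers)    # ['big_page', 'news_aggregator']
```

In practice the earliest visible share is only a candidate: the true origin may sit on another platform or in deleted posts, so a result like this is a starting point for investigation rather than a conclusion.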
Behaviour, not content, gives the campaign away
This is where things get interesting, because the most reliable signal in an influence campaign is rarely the content itself. Anyone can write a misleading post. Anyone can share a deepfake. What is hard to fake at scale is behaviour.
Real users behave irregularly. They post at odd hours. They take weekends off. They write about their dog, then their job, then something they read in the news. Their interests overlap with their friends’, but not perfectly. They go quiet for months and then come back.
Conversely, coordinated networks rarely look like that. They post in synchronised bursts. They share the same links within minutes of each other. They use the same phrasing and sometimes the same typos. They appear out of nowhere with fully formed opinions on every trending topic. Some of them have profile pictures that don’t quite match their names, or follower counts that don’t quite match their activity.
Of course, none of these signals is enough on its own. Plenty of real people post at strange hours. Plenty of friend groups share the same articles. But put twenty signals together, across thousands of accounts, and the difference between organic engagement and a coordinated campaign becomes statistically obvious. Surfacing those signals and combining them into something investigators can actually act on is precisely the kind of behavioural analysis the AVALANCHE pipeline is designed to do.
This is why the most powerful question we can ask about a viral post is not “is this true?” but “is this organic?”, because organic falsehoods, painful as they are, behave very differently from manufactured ones.
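One of the behavioural signals described above, accounts sharing the same links within minutes of each other, can be sketched in a few lines of Python. This is a hedged illustration of a single signal, not the AVALANCHE pipeline: the function name `synchronized_pairs`, the `(account, url, timestamp)` event layout, the five-minute window, and the minimum of two shared links are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def synchronized_pairs(events, window=timedelta(minutes=5), min_links=2):
    """Find account pairs that repeatedly share the same URL close in time.

    events: iterable of (account, url, timestamp) tuples (hypothetical layout).
    Returns {(account_a, account_b): number_of_urls_shared_in_sync}.
    """
    by_url = defaultdict(list)
    for account, url, ts in events:
        by_url[url].append((ts, account))

    pair_counts = Counter()
    for posts in by_url.values():
        posts.sort()  # chronological order, so t2 >= t1 below
        pairs_for_url = set()
        for (t1, a1), (t2, a2) in combinations(posts, 2):
            if a1 != a2 and t2 - t1 <= window:
                pairs_for_url.add(tuple(sorted((a1, a2))))
        # Count each pair at most once per URL.
        pair_counts.update(pairs_for_url)

    return {pair: n for pair, n in pair_counts.items() if n >= min_links}


day1 = datetime(2026, 1, 1, 12, 0)
day2 = datetime(2026, 1, 2, 9, 0)
events = [
    ("a", "u1", day1),
    ("b", "u1", day1 + timedelta(minutes=2)),
    ("c", "u1", day1 + timedelta(hours=3)),
    ("a", "u2", day2),
    ("b", "u2", day2 + timedelta(minutes=3)),
    ("c", "u2", day2 + timedelta(minutes=40)),
]

suspicious = synchronized_pairs(events)
print(suspicious)  # {('a', 'b'): 2}
```

A single synchronised pair proves nothing; two friends can easily share the same article twice. The point made above still applies: a signal like this only becomes meaningful when it is combined with others across many accounts.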

From individual posts to network reconstruction
Once investigators take this seriously, they stop thinking about posts and start thinking about graphs. Every account is a node; every share, retweet, mention, and reply is an edge. Out of millions of these connections, certain shapes start to emerge: tightly clustered communities that interact almost exclusively with each other, hubs that broadcast to many but follow few, and bridges that carry content from a fringe community into the mainstream.
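The graph view can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the method used in AVALANCHE: the function `hubs_and_bridges` is hypothetical, each edge is a `(source, resharer)` pair, a hub is approximated as an account reshared by many distinct accounts (follow data is not modelled), and the `community` labels are assumed to come from a separate community detection step.

```python
from collections import defaultdict


def hubs_and_bridges(flows, community, hub_min_audience=3):
    """Identify hubs and bridges in a content-flow graph.

    flows: iterable of (source, resharer) edges, meaning content moved
           from source to resharer.
    community: dict mapping every account to a community label (assumed given).
    """
    audience = defaultdict(set)  # account -> accounts that reshared it
    sources = defaultdict(set)   # account -> accounts it reshared from
    for src, dst in flows:
        audience[src].add(dst)
        sources[dst].add(src)

    # Hub: reshared by at least hub_min_audience distinct accounts.
    hubs = sorted(a for a, aud in audience.items() if len(aud) >= hub_min_audience)

    # Bridge: takes content from another community and is then reshared
    # by at least one account inside its own community.
    bridges = sorted(
        a
        for a in sources
        if any(community[s] != community[a] for s in sources[a])
        and any(community[d] == community[a] for d in audience.get(a, ()))
    )
    return hubs, bridges


flows = [
    ("seed", "fringe_friend"),
    ("seed", "bridge1"),
    ("bridge1", "hub1"),
    ("hub1", "user1"),
    ("hub1", "user2"),
    ("hub1", "user3"),
]
community = {
    "seed": "fringe",
    "fringe_friend": "fringe",
    "bridge1": "mainstream",
    "hub1": "mainstream",
    "user1": "mainstream",
    "user2": "mainstream",
    "user3": "mainstream",
}

hubs, bridges = hubs_and_bridges(flows, community)
print(hubs)     # ['hub1']
print(bridges)  # ['bridge1']
```

Note that in this toy example the originating account, `seed`, is neither a hub nor a bridge. It only shows up when the flow is followed backwards through the bridge.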
The accounts that originate disinformation are rarely the ones that make it go viral. More often, they sit quietly in an outlying community, planting content that is then picked up by the bridges and pushed to the hubs. By the time the wider public sees it, the trail back to the origin is hidden under a thousand legitimate-looking shares.
Reconstructing that trail is painstaking work, and one of the central problems AVALANCHE tries to solve. The methods we develop turn scattered interactions into a structured picture of who is actually running the campaign, and this is the information that matters for investigators. Taking down a thousand individual posts does not stop a campaign; identifying and disrupting the network behind them does.
Why this matters now
We have entered a moment when the gap between what is real and what looks real is shrinking by the month. Deepfakes are cheap and generative text is convincing. Coordinated influence is not science fiction; it is a documented feature of almost every major election or geopolitical crisis.
In this environment, debunking individual pieces of content is no longer sufficient. We need investigators who can look at a wave of seemingly unrelated posts and see the shape of the campaign behind them. We need tools to surface behavioural anomalies, not just content violations.
Beyond the technology, we also need the public to understand that the most important question about a viral post is who put it in front of you, and why. This shift, from content to behaviour, from posts to networks, from reaction to reconstruction, is the direction AVALANCHE is pushing.
The origin of a spread is rarely the loudest voice; it is the quiet one where the spark was planted.
