
AVALANCHE at FIC 2026: showcasing and building connections
22/04/2026

Information is everywhere; trust is not. Every day, content is produced and shared at a massive scale. News spreads instantly, opinions amplify rapidly, and information systems contribute to both the creation and the distribution of information.
The challenge is no longer access; it’s understanding what is actually reliable.
What makes this problem difficult is not volume, but structure. Information is rarely entirely true or entirely false. Most content contains a mix of factual statements, interpretations, and sometimes misleading claims. When we evaluate everything at once, we may lose this distinction. That’s why a more useful way to approach disinformation is to stop treating content as a single block, and focus on the individual statements inside.
This is the core idea behind the approach we developed in AVALANCHE. Instead of asking “Is this article true?”, we ask: “What is the main claim here?” and “Is it supported by reliable evidence?”
To answer this, our system is built around two tightly connected flows (see figure).
The first flow is about building a ‘trusted corpus’. The system continuously collects information only from selected, high-credibility and ‘authoritative’ sources. These can be established news outlets, institutional websites, or other domains that are explicitly and manually identified by the user.
This matters a lot. Most systems implicitly trust whatever they retrieve. Here, trust is explicitly defined upfront. Every piece of evidence comes from a known source, which makes the process auditable. If a result is questioned, you can always trace it back to where it came from.
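To make this idea of upfront, explicit trust more concrete, here is a minimal sketch in Python; the names (TRUSTED_SOURCES, EvidenceDocument, ingest) are invented for illustration and do not describe AVALANCHE's actual implementation. The point it shows is that only explicitly listed domains are ingested, and every stored document keeps a pointer back to its source.

```python
from dataclasses import dataclass

# Illustrative only: AVALANCHE's real data model is not described in this post.
# Trust is declared upfront, and every stored document records where it came from.
TRUSTED_SOURCES = {
    "news.example.org": "established news outlet",
    "ministry.example.gov": "institutional website",
}

@dataclass
class EvidenceDocument:
    url: str
    source_domain: str  # must be one of the explicitly trusted domains
    text: str

def ingest(url: str, text: str) -> EvidenceDocument:
    """Accept a document only if its domain was explicitly marked as trusted."""
    domain = url.split("/")[2]
    if domain not in TRUSTED_SOURCES:
        raise ValueError(f"{domain} is not part of the trusted corpus")
    return EvidenceDocument(url=url, source_domain=domain, text=text)
```

Because the document keeps its source domain and URL, any later result can be traced back to the exact place the evidence came from.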
The second flow is how the verification happens. When a new article enters the system, it is reduced to a single, clear, factual statement that represents its core meaning. This step removes ambiguity and forces the system to focus on something that can actually be checked. From there, the system searches for evidence inside the trusted sources in two complementary ways: on the one hand, it looks for exact matches (e.g. names, dates, specific terms); on the other, it looks for meaning-level similarity, so it can find relevant information even when the wording is different. This combination allows it to capture both precision and context.
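As a rough illustration of what such hybrid retrieval can look like, the sketch below blends a verbatim token-overlap score with an embedding-based cosine similarity. The function names, the 50/50 weighting, and the assumption that sentence embeddings are computed elsewhere are all choices made for this example, not a description of AVALANCHE's actual retrieval code.

```python
import math
import re

def keyword_score(claim: str, passage: str) -> float:
    """Exact-match signal: share of claim tokens (names, dates, specific terms)
    that appear verbatim in the passage."""
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    passage_tokens = set(re.findall(r"\w+", passage.lower()))
    return len(claim_tokens & passage_tokens) / max(len(claim_tokens), 1)

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def hybrid_score(claim: str, passage: str,
                 claim_vec: list[float], passage_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Blend the exact-match signal with a meaning-level signal
    (cosine similarity between sentence embeddings produced elsewhere).
    The weighting is purely illustrative."""
    return alpha * keyword_score(claim, passage) + (1 - alpha) * cosine(claim_vec, passage_vec)
```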
Once relevant documents are retrieved, they are broken down further into smaller pieces, typically individual sentences, and each sentence is then evaluated against the claim to determine whether it supports it, contradicts it, or is unrelated.
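A minimal sketch of this sentence-level step is shown below. The stance classifier here is a deliberately naive placeholder (in practice this kind of judgment would come from something like a natural-language-inference model), and the function names are invented for illustration.

```python
import re

def classify_stance(evidence_sentence: str, claim: str) -> str:
    """Placeholder stance classifier. A real system would use an NLI-style model
    (premise = evidence sentence, hypothesis = claim); the crude word-overlap
    rule below exists only so the sketch runs end to end."""
    shared = set(re.findall(r"\w+", evidence_sentence.lower())) & set(re.findall(r"\w+", claim.lower()))
    if not shared:
        return "unrelated"
    negations = {"not", "no", "never", "only"}
    return "contradicts" if negations & set(evidence_sentence.lower().split()) else "supports"

def evaluate_document(document_text: str, claim: str) -> list[tuple[str, str]]:
    """Split a retrieved document into sentences and judge each one against the claim."""
    sentences = re.split(r"(?<=[.!?])\s+", document_text.strip())
    return [(s, classify_stance(s, claim)) for s in sentences if s]
```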
This is where the process becomes very concrete. Instead of producing a single opaque score, the system builds its conclusion step by step:
- here is the claim
- here are the relevant sources
- here is the exact sentence that supports or contradicts it
And the final outcome is a traceable reasoning path.
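One simple way to picture such a traceable result is as a small data structure that keeps the claim, the source of each piece of evidence, and the exact sentence together. The field names below are illustrative assumptions, not AVALANCHE's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceItem:
    source_url: str   # which trusted source the sentence came from
    sentence: str     # the exact sentence that was checked
    stance: str       # "supports", "contradicts" or "unrelated"

@dataclass
class VerificationResult:
    claim: str                                               # here is the claim
    evidence: list[EvidenceItem] = field(default_factory=list)  # the relevant sources and sentences

    def summary(self) -> str:
        supporting = sum(e.stance == "supports" for e in self.evidence)
        contradicting = sum(e.stance == "contradicts" for e in self.evidence)
        return f"{supporting} supporting / {contradicting} contradicting sentences"
```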
To make this more tangible, consider a simple example.
Imagine an article claims that:
“Country X has officially banned a specific technology across all public services.”
The system extracts this as the main claim and searches within trusted sources.
It might find:
- an official announcement stating that the technology is restricted only in certain sectors → partial contradiction
- a news report confirming limitations but not a full ban → weak support
- other sources that do not mention such a policy → neutral
Instead of collapsing everything into a binary decision, the system evaluates these signals together. The result might indicate that the claim is not fully supported, while also showing exactly which parts of it are inaccurate. This level of granularity reflects how information actually works in the real world.
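As a toy illustration of this kind of graded aggregation (the labels and the decision rule are assumptions made for the example, not AVALANCHE's actual scoring):

```python
def aggregate(stances: list[str]) -> str:
    """Turn per-sentence stances into a graded verdict instead of a binary true/false."""
    supporting = stances.count("supports")
    contradicting = stances.count("contradicts")
    if supporting == 0 and contradicting == 0:
        return "not enough evidence"       # stay neutral rather than guess
    if supporting and contradicting:
        return "partially supported"       # mixed signals are shown as such
    return "supported" if supporting else "contradicted"

# For the example above (partial contradiction, weak support, neutral sources):
print(aggregate(["contradicts", "supports", "unrelated"]))  # -> partially supported
```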
Another key aspect of this approach is that it avoids speculative reasoning. The system does not generate explanations or fill gaps with assumptions. If there is no clear evidence, it does not try to “complete the picture”. It simply acknowledges uncertainty. This behavior is intentional. In disinformation detection, caution is critical. A system that confidently validates something without strong evidence can be more harmful than one that remains neutral.
This leads to a different philosophy compared to many AI-driven solutions. Instead of optimizing for prediction, the focus shifts to verification. The goal is to justify every output and make it explainable in terms of actual evidence. This also makes the system more robust because it relies on external, verifiable sources and is less prone to generating unsupported conclusions. And because the sources are controlled, it reduces the risk of amplifying low-quality information.
Moreover, the modular nature of this disinformation detection pipeline allows each part to evolve independently. The trusted corpus (our set of trusted sources) can be updated at will, retrieval methods can be improved, and verification flows refined, all without redesigning the entire system. This modularity matters because both content and threats evolve rapidly.
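A small sketch of what that modularity can look like in practice: if each stage sits behind a narrow interface, one retrieval method can be swapped for another without touching the rest. The interfaces below are illustrative, not AVALANCHE's actual ones.

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, claim: str) -> list[str]: ...

class Verifier(Protocol):
    def verify(self, claim: str, passages: list[str]) -> str: ...

def check_claim(claim: str, retriever: Retriever, verifier: Verifier) -> str:
    """The pipeline depends only on the interfaces, so the corpus, the retrieval
    method or the verification flow can each be replaced on its own."""
    passages = retriever.retrieve(claim)
    return verifier.verify(claim, passages)
```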
The implications also reach well beyond the technical side: disinformation affects how people form opinions, how decisions are made, and how trust is built or lost. A system that simply outputs “true” or “false” does not provide enough context to understand these dynamics. On the other hand, a system that shows (a) what is being claimed, (b) what evidence exists, and (c) how that evidence relates to the claim can support a much more informed way of engaging with information. This is useful not only for researchers or analysts, but also for journalists, organizations, and even everyday users who want to better understand what they are reading. It is also important to note that this approach is not meant to replace existing methods. High-level classification has value, especially for large-scale filtering. But our AVALANCHE approach adds depth and explainability as an additional layer of analysis.
As the volume of online content continues to grow, the need for this kind of structured verification becomes clearer. The challenge is not just identifying what is false, but understanding what is supported, what is questionable, and what cannot yet be verified. We believe our claim-level, evidence-based verification is one step in that direction.
