Something shifted when I was looking at Portal Kombat.

I had been tracking the Pravda network for months: hundreds of websites publishing Kremlin-aligned content, updated hourly, across 28 countries. The scale was already remarkable. More than 7 million articles, coordinated, multilingual, designed to flood the information environment with a particular reading of the world.

Then I started noticing where the content was landing. Not just on obscure pro-Russian blogs. On Wikipedia. And Wikipedia, as anyone paying attention knows, is one of the primary training sources for large language models.

Information manipulation used to work through reach. You flooded social media, you bought ads, you built networks of inauthentic accounts. The goal was to get your content in front of enough people that some of them would believe it, share it, or at least be confused by it.

Researchers learned to study this. We built tools to track it. Platforms (eventually, reluctantly) built systems to detect it. The cat-and-mouse game was brutal and expensive, but at least the terrain was familiar.

The terrain has changed.

The new attack surface

Large language models are trained on web content. The quality and composition of that training data determines, in no small part, what the model will say when asked about a given topic (cf. DeepSeek). If you can systematically influence what is written on the web, at scale, over time, you can influence the outputs of AI systems that have not even been built yet.

This is not a hypothetical. We documented it.

The Pravda network is not just targeting human readers. It is seeding content into the broader information ecosystem: Wikipedia articles, forums, news aggregators. Platforms that carry real epistemic weight and that feed the datasets used to train and fine-tune AI systems.

When someone asks a chatbot about the war in Ukraine, about NATO, about a specific politician, about migration, the answer draws on a corpus that has been, to some degree, deliberately contaminated. The implications differ from traditional information manipulation in three important ways.

First, the effect is cumulative and slow. A single wave of coordinated content does not rewrite an LLM overnight. But persistent, large-scale operations conducted over months and years do shift the composition of training data in measurable ways (a back-of-envelope sketch follows these three points). By the time the effect is visible, the cause is already historical.

Second, the feedback loop is invisible to most users. When someone reads a suspicious article, there is at least the possibility of noticing it feels off, checking the source, looking for corroboration. When someone receives a confident, fluent, apparently well-sourced answer from a chatbot, the provenance of that answer is completely opaque. The model does not tell you it was trained on a network of Kremlin-aligned websites.

Third, the attack is hard to attribute. Classic influence operations leave traces: accounts, IP addresses, ad spend, posting timestamps. Poisoning a training corpus leaves almost nothing traceable at the point of harm, which is when the model is queried, potentially years after the poisoning occurred.
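To put a number on the first point, the cumulative one, here is the promised back-of-envelope sketch. Every figure in it is an assumption chosen for illustration, not a measurement from the Pravda network:

```python
# Back-of-envelope: how a persistent operation shifts corpus composition.
# Every number here is an illustrative assumption, not a measured value.

articles_per_day = 10_000      # assumed output of a coordinated network
days = 3 * 365                 # a three-year operation
crawl_uptake = 0.5             # assumed fraction that survives into a web crawl
topic_corpus = 20_000_000      # assumed topic-relevant documents in the crawl

poisoned = articles_per_day * days * crawl_uptake
share = poisoned / (topic_corpus + poisoned)
print(f"Poisoned documents: {poisoned:,.0f}")        # 5,475,000
print(f"Share of topic-relevant text: {share:.1%}")  # 21.5%
```

Under these assumptions, roughly a fifth of the topic-relevant text is adversarial: invisible on any single day, decisive in aggregate. The point is not the specific numbers but the shape of the curve. Slow injection compounds.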

We have built reasonable tools for tracking influence operations. We are significantly less equipped to track the longer-term epistemic effects of those operations on AI systems. But there are some things worth working on.

Dataset auditing, specifically for influence operation content, should be standard practice before any model training. This is technically hard but not impossible. The fingerprints of coordinated inauthentic content are not always obvious at the article level, but they become visible at the network level: timing patterns, cross-site duplication, infrastructure overlap. We can detect this.
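To make those network-level fingerprints concrete, here is a minimal sketch of two of the signals: cross-site duplication and synchronized publication timing. The function names, thresholds, and input format are my illustrative assumptions, not an existing tool:

```python
import hashlib
from collections import defaultdict
from datetime import timedelta

def content_fingerprint(text: str) -> str:
    """Hash whitespace/case-normalized text so verbatim republications collide.
    (A real pipeline would add shingling or MinHash for near-duplicates.)"""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def cross_site_duplicates(articles, min_domains=3):
    """The same article body appearing on many domains is a classic
    signature of a coordinated network."""
    by_hash = defaultdict(set)
    for a in articles:  # each a: {"domain": str, "text": str, "published": datetime}
        by_hash[content_fingerprint(a["text"])].add(a["domain"])
    return {h: d for h, d in by_hash.items() if len(d) >= min_domains}

def synchronized_bursts(articles, window=timedelta(minutes=10), min_domains=5):
    """Flag moments when many distinct domains publish near-simultaneously,
    a timing pattern that rarely occurs organically."""
    events = sorted((a["published"], a["domain"]) for a in articles)
    bursts = []
    for i, (t0, _) in enumerate(events):
        domains = {d for t, d in events[i:] if t - t0 <= window}
        if len(domains) >= min_domains:
            bursts.append((t0, domains))
    return bursts
```

A real audit would layer in infrastructure overlap (shared hosting, analytics identifiers) and fuzzy text matching, but even these two checks surface a great deal of the coordination that is invisible at the article level.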

Wikipedia, specifically, deserves more attention as an attack surface. It is open, which is both its great strength and its vulnerability. Open means anyone can contribute. It also means coordinated actors can contribute at scale. The Wikimedia community does serious work on this, but they are under-resourced relative to the threat.
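As one concrete illustration of what that attention might look like, here is a sketch that pulls revision histories from the public MediaWiki API and looks for accounts heavily active across several sensitive articles at once. The heuristic and the article choices are mine, illustrative only; overlap is a reason to look closer, never proof of coordination:

```python
import requests
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "edit-audit-sketch/0.1 (research illustration)"}

def revision_authors(title: str, limit: int = 500) -> Counter:
    """Count recent revision authors for one article via the MediaWiki API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",
        "rvlimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, headers=HEADERS).json()
    page = next(iter(data["query"]["pages"].values()))
    return Counter(rev.get("user", "<hidden>") for rev in page.get("revisions", []))

# Illustrative article choices; any set of contested topics works the same way.
titles = ["NATO", "Disinformation"]
authors = {t: revision_authors(t) for t in titles}
shared = set.intersection(*(set(c) for c in authors.values()))
print(f"Accounts editing all {len(titles)} articles recently: {sorted(shared)[:10]}")
```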

And the AI companies themselves need to be more transparent about their training data. Not in a bureaucratic, compliance-box-ticking way. In a genuinely verifiable way. If their model was trained on web content, what web content? What was the date range? What sources were included or excluded? These are not unreasonable questions. They are the minimum you would expect from any research publication.
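What would genuinely verifiable look like in practice? At minimum, a machine-readable datasheet shipped alongside the weights. The schema below is hypothetical, my sketch of the minimum viable fields rather than any existing standard:

```python
from dataclasses import dataclass, field

@dataclass
class TrainingDataDisclosure:
    """Hypothetical datasheet for a model's training data.
    Field names are illustrative; no vendor ships this today."""
    crawl_sources: list[str]     # e.g. ["CommonCrawl 2023-14", "Wikipedia dump 2023-06-01"]
    date_range: tuple[str, str]  # earliest and latest document dates
    excluded_domains: list[str]  # blocklisted sources, with the list itself published
    dedup_method: str            # how near-duplicates were removed
    io_screening: str = "none"   # influence-operation filtering applied, if any
    shard_checksums: dict[str, str] = field(default_factory=dict)
    # shard -> hash, so third parties can verify which data was actually used

disclosure = TrainingDataDisclosure(
    crawl_sources=["CommonCrawl 2023-14"],
    date_range=("2015-01-01", "2023-04-30"),
    excluded_domains=["example-network-clone.example"],
    dedup_method="MinHash, Jaccard > 0.8",
)
```

None of this is exotic; datasheets for datasets have been proposed in the machine learning literature for years. The gap is adoption and verifiability, not design.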

The approach I believe in

I have spent years doing adversarial research in the open.
Everything published.
Datasets available.
Methodology documented so that anyone can verify or challenge the findings.

That approach matters even more when the subject is AI.

The opacity of AI systems is a structural advantage for those who want to use them for information manipulation. Every layer of inscrutability, every closed dataset, every model trained on data no one can inspect, is a gift to the actors we are trying to track.

Open methods are not just an academic preference. In this field, they are a necessary condition for doing the work at all.

The next few years will determine whether researchers and institutions can keep up with how AI is being weaponised. I am genuinely uncertain about the answer. But I am fairly confident that we will not manage it by doing research behind closed doors.