By the time you find a fake news outlet, it has usually already done its job.

The Doppelganger network cloned the homepages of Le Monde, Der Spiegel, and Bild. Storm-1516 seeded fabricated videos and cloned media sites across France, Germany, Italy, and Ukraine. What these operations share is infrastructure: domains registered in batches, given SSL certificates, dressed up to look like local news outlets, and then used to flood the information environment with content designed to be indistinguishable from the real thing.

Researchers typically find these domains after the fact. Someone notices a suspicious link. A journalist spots a cloned article. A platform removes a page and the removal itself attracts attention. By then the content has circulated, the narrative has spread, and the domain has either gone dark or already been replaced by a new one.

There is a better moment to look. Several of them, actually. The trick is knowing what signals to watch.

What a heuristic is

A heuristic, in this context, is a testable hypothesis about how influence operation infrastructure behaves.

Operations need domains. Domains need to be registered. They need DNS hosting. They need SSL certificates. They need web hosting. Often they are built on the same CMS, with the same plugins, configured the same way by the same operators. Each of these steps leaves a trace, and each trace is a potential detection signal.

A heuristic is a structured way of asking: given what I know about how this specific operation or class of operations works, what observable patterns should I look for, and how confident should I be when I find them?

The useful thing about this framing is that it separates detection logic from detection infrastructure. You can run the same underlying hypothesis against different data sources: DNS records, certificate logs, WHOIS history, live web pages. And you can chain heuristics together, using one signal to surface candidates and another to confirm attribution.
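To make that separation concrete, here is a minimal Python sketch. It assumes a record is just a dictionary of fields about one domain. The field names `domain`, `age_days` and `ns_watched` are hypothetical, as are the `Heuristic`, `run` and `chain` names. A heuristic is a named predicate plus the confidence it earns, and it does not care which data source produced the record.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

# A record is whatever one data source yields about one domain (DNS answers,
# certificate log entries, WHOIS history, fetched HTML), keyed by field name.
Record = Mapping[str, Any]


@dataclass(frozen=True)
class Heuristic:
    """A testable hypothesis: a named predicate plus the confidence it earns."""

    name: str
    check: Callable[[Record], bool]
    confidence: str  # e.g. "low", "medium", "high"


def run(heuristic: Heuristic, records: Iterable[Record]) -> list:
    """Apply one heuristic to records from any source; return matching domains."""
    return [r["domain"] for r in records if heuristic.check(r)]


def chain(surface: Heuristic, confirm: Heuristic, records: Iterable[Record]) -> list:
    """Surface candidates with one heuristic, then keep those a second confirms."""
    candidates = [r for r in records if surface.check(r)]
    return [r["domain"] for r in candidates if confirm.check(r)]


# Hypothetical example: flag domains registered within the last 30 days.
# Any source that supplies "domain" and "age_days" fields can feed it.
recent_registration = Heuristic(
    name="registered-recently",
    check=lambda r: r.get("age_days", float("inf")) <= 30,
    confidence="low",
)
```

Because the predicate only reads fields, swapping WHOIS data for certificate-log data means changing how records are built, not rewriting the heuristic.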

Operations do not build their own internet infrastructure from scratch. They use hosting providers, CDN services, domain parking platforms. Those choices are observable.

One heuristic I have used watches for domains hosted on Megafon CDN nameservers. Megafon is one of Russia's largest telecom operators. A freshly registered domain with news-adjacent naming on Megafon CDN infrastructure is a meaningful signal: not conclusive on its own, but meaningful.

DNS infrastructure heuristics are fast and cheap to run. They require no HTTP requests, no page loading, no JavaScript execution. They work from the moment a domain is registered and delegated to its nameservers. Their weakness is specificity: infrastructure gets recycled, reused, and shared. Megafon hosts plenty of legitimate Russian traffic. Cloudflare hosts millions of completely unremarkable domains. The DNS signal alone rarely justifies a high-confidence attribution, but it surfaces candidates for further investigation.
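A minimal sketch of the nameserver check follows. It assumes the NS records have already been collected, for example from a resolver library, zone files or passive DNS. The watched suffix `cdn-provider.example` is a placeholder, not a real Megafon hostname. `NEWS_WORDS` is a hypothetical seed list, not a vetted vocabulary.

```python
# Placeholder values: substitute the nameserver domains you actually watch.
WATCHED_NS_SUFFIXES = ("cdn-provider.example",)
# Hypothetical seed list of news-adjacent words.
NEWS_WORDS = ("news", "info", "actu", "journal", "direct")


def ns_matches(nameservers, watched_suffixes=WATCHED_NS_SUFFIXES) -> bool:
    """True if any NS hostname equals, or is a subdomain of, a watched suffix."""
    for ns in nameservers:
        host = ns.lower().rstrip(".")  # NS answers often carry a trailing dot
        for suffix in watched_suffixes:
            if host == suffix or host.endswith("." + suffix):
                return True
    return False


def dns_candidate(domain, nameservers, age_days, max_age_days=30,
                  watched_suffixes=WATCHED_NS_SUFFIXES) -> bool:
    """Low-confidence candidate: fresh, news-adjacent name, on watched NS.

    Needs no HTTP request: every input comes from DNS or registration data.
    """
    label = domain.lower().rsplit(".", 1)[0]  # everything except the TLD
    news_like = any(word in label for word in NEWS_WORDS)
    return (age_days <= max_age_days and news_like
            and ns_matches(nameservers, watched_suffixes))
```

Matching on a dot-delimited suffix rather than a raw substring keeps a hostname such as `ns1.evilcdn-provider.example` from matching `cdn-provider.example`.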

Once you have a set of candidate domains, naming analysis adds a layer. This is where country-specific knowledge becomes essential.

Take France, which is currently the single largest CopyCop target in Europe: [141 confirmed [.]fr domains identified by Recorded Future](https://www.recordedfuture.com/research/copycop-deepens-its-playbook-with-new-websites-and-targets) in the first half of 2025 alone, plus ongoing Storm-1516 operations running into 2026. The confirmed IOCs make the naming patterns concrete.

Consider l-actualite-provencale[.]fr and sud-ouest-direct[.]fr. These are not random combinations. They follow a recognisable French template built from a news word, a geographic identifier and, optionally, a leading article. L'actualité provençale sounds exactly like the kind of regional daily that exists in Provence. Sud-Ouest Direct sounds like a real-time news feed from the paper that covers Bordeaux and the south-west. Neither outlet exists. Both names are plausible enough that a reader catching a glimpse of the URL would not immediately question it.

The political modifier pattern is equally consistent. Franceencolere[.]fr, Veritecachee[.]fr, La-france-souveraine[.]fr, Partiroyaliste[.]fr. These are not media impersonations; they are synthetic political outlets constructed to look like grassroots movements. The vocabulary is predictable once you have seen enough of it: colere, souveraine, verite, libre, patriote, debout. The truth-framing words (verite, vrai, authentique) appear across operations in multiple countries because they serve the same function everywhere: performing credibility in the domain name itself before anyone has read a word of content.
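Both templates can be approximated with substring matching on a normalised domain label. This is an illustrative sketch. The vocabulary sets are seeded only from the examples in this section, plus `info` and `journal`, which are hypothetical additions. The function names are likewise hypothetical. A real bank would be much larger.

```python
import unicodedata

# Seed vocabularies taken from the examples above; "info" and "journal" are
# hypothetical additions. A real bank would come from the full IOC list.
NEWS_WORDS = {"actualite", "direct", "info", "journal"}
GEO_TERMS = {"provencale", "marseillaise", "bretonne", "lyonnaise", "sudouest"}
POLITICAL_MODIFIERS = {"colere", "souveraine", "verite", "vrai", "authentique",
                       "libre", "patriote", "debout"}


def normalise_label(domain: str) -> str:
    """Undo [.] defanging, drop the TLD, strip accents, remove hyphens and dots."""
    domain = domain.lower().replace("[.]", ".")
    label = domain.rsplit(".", 1)[0]
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return ascii_only.replace("-", "").replace(".", "")


def naming_templates(domain: str) -> list:
    """Return the names of the naming templates a domain appears to match."""
    label = normalise_label(domain)
    hits = []
    # Regional outlet: a news word plus a geographic identifier, in any order.
    if any(w in label for w in NEWS_WORDS) and any(g in label for g in GEO_TERMS):
        hits.append("regional-outlet")
    # Synthetic movement: a political or truth-framing modifier.
    if any(m in label for m in POLITICAL_MODIFIERS):
        hits.append("synthetic-movement")
    return hits
```

Substring matching is deliberately loose. Short terms such as `info` or `vrai` will also hit unrelated names. That is acceptable for a candidate-surfacing signal, but not for attribution on its own.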

The candidate impersonation pattern is newer and more targeted. Ensemble-24[.]fr impersonated Macron's coalition during the 2024 snap elections. MacronAvecBournazel[.]fr was registered in February 2026 and used to spread a fabricated claim linking Macron to a Paris mayoral candidate. Storm-1516 appears to have generalised this into a template: MacronAvec[candidate][.]fr, timed to coincide with the March 2026 municipal elections.
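One way to turn the MacronAvec[candidate][.]fr pattern into a rule is a regular expression. The principal and connector words are kept in lists so they can be extended. The `PRINCIPALS` and `CONNECTORS` values are assumptions generalised from the single confirmed example, not a known Storm-1516 list. The function names are hypothetical.

```python
import re

# Generalised from the single confirmed example (MacronAvecBournazel[.]fr).
# Both tuples are hypothetical seeds to extend, e.g. for 2027 candidates.
PRINCIPALS = ("macron",)
CONNECTORS = ("avec",)


def _alternation(words):
    """Build a regex alternation from literal words."""
    return "|".join(re.escape(w) for w in words)


# principal, optional hyphen, connector, optional hyphen, candidate name, .fr
TEMPLATE = re.compile(
    rf"^(?P<principal>{_alternation(PRINCIPALS)})-?"
    rf"(?P<connector>{_alternation(CONNECTORS)})-?"
    r"(?P<candidate>[a-z][a-z-]*)\.fr$"
)


def impersonation_candidate(domain: str):
    """Return the impersonated candidate's name if the domain fits, else None."""
    match = TEMPLATE.match(domain.lower().replace("[.]", "."))
    return match.group("candidate") if match else None
```

Returning the candidate name rather than a boolean lets each new registration be checked against the list of people actually standing in the election.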

For Doppelganger operations, the naming logic is different: clone the outlet name exactly, change the TLD. Leparisien[.]pm, Leparisien[.]cc, Lepoint[.]info. The domains look identical to the real ones until you notice the extension. The TLD list here is instructive: .ltd, .pm, .cc, .cam, .lol, .beauty, .fun, .ink. These extensions were chosen precisely because they are unfamiliar enough to slip past casual reading but technically valid. Lemonde[.]ltd is a functioning website. It is not Le Monde.
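The Doppelganger template reduces to a lookup. Is the second-level label an exact outlet name, and is the TLD one the real outlet does not use? In this sketch the outlet bank is a small sample and the suspicious-TLD set is the one named above. The `"high"`/`"medium"` levels and the `clone_check` name are illustrative assumptions.

```python
# Outlet bank: second-level label -> TLDs the genuine outlet uses (sample only).
OUTLETS = {
    "lemonde": {"fr"},
    "leparisien": {"fr"},
    "lepoint": {"fr"},
    "spiegel": {"de"},
    "bild": {"de"},
}
# TLDs named above in connection with Doppelganger and CopyCop operations.
SUSPICIOUS_TLDS = {"ltd", "pm", "cc", "cam", "lol", "beauty", "fun", "ink", "info"}


def clone_check(domain: str):
    """Classify a domain against the outlet-name + foreign-TLD template.

    Returns None when the label is not a known outlet or the TLD is genuine,
    otherwise "high" for a TLD on the suspicious list and "medium" for any
    other unexpected TLD.
    """
    domain = domain.lower().replace("[.]", ".").rstrip(".")
    label, _, tld = domain.rpartition(".")
    label = label.rsplit(".", 1)[-1]  # ignore subdomains such as www.
    genuine = OUTLETS.get(label)
    if genuine is None or tld in genuine:
        return None
    return "high" if tld in SUSPICIOUS_TLDS else "medium"
```

Exact matching only covers the clone-the-name, change-the-TLD pattern described here. Typo variants of an outlet name would need their own rule.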

The heuristic task is making this machine-detectable. For France in 2026, that means: a vocabulary bank of confirmed outlet names (lemonde, leparisien, sudouest, laprovence, francetv, ...); a set of geographic adjective forms (provençale, marseillaise, bretonne, lyonnaise, ...); a set of political modifier terms drawn from confirmed IOCs; a set of suspicious TLDs correlated with Doppelganger and CopyCop operations; and the structural patterns that appear consistently: prefix-article + topic + geo-qualifier for the regional outlet template, political-noun + modifier for the synthetic movement template, outlet-name + suspicious-TLD for the Doppelganger template.

None of this vocabulary is stable. The 2027 French presidential election is the next major target window. The infrastructure from the 2026 municipal campaign is expected to be repurposed. New candidate names will be added to the impersonation target list. The heuristics have to be maintained as the threat evolves, not configured once and left to run.

Content fingerprinting

This is where it gets genuinely interesting.

Once a candidate domain has been identified through DNS infrastructure or naming signals, you can fetch the page and look at the HTML. And influence operations, it turns out, leave fingerprints.

Some Storm-1516 sites share a recognisable WordPress stack: a combination of components specific enough that, found together, they form a high-confidence attribution signal. This is not a named theme or a branded plugin. It is a set of tools the operators chose, probably for mundane reasons such as performance, SEO rankings or a shared setup template. But that choice became a fingerprint.
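The specific components are not named here. The sketch below therefore uses hypothetical marker paths (`example-cache-plugin`, `example-seo-plugin`, `example-theme`) to show the shape of the check. It looks for each component in the page source and treats only the full set appearing together as the fingerprint. The function names are also hypothetical. The WordPress `generator` meta tag is a real default, though site owners can remove it.

```python
import re

# Hypothetical stand-ins for a real stack: each entry maps a component name
# to a pattern that betrays its presence in the page source.
FINGERPRINT = {
    "wordpress": re.compile(r'<meta name="generator" content="WordPress', re.I),
    "cache-plugin": re.compile(r"/wp-content/plugins/example-cache-plugin/", re.I),
    "seo-plugin": re.compile(r"/wp-content/plugins/example-seo-plugin/", re.I),
    "theme": re.compile(r"/wp-content/themes/example-theme/", re.I),
}


def components_present(html: str) -> set:
    """Return the names of fingerprint components found in the page source."""
    return {name for name, pattern in FINGERPRINT.items() if pattern.search(html)}


def full_stack_match(html: str) -> bool:
    """True only when every component appears together on the same page."""
    return components_present(html) == set(FINGERPRINT)
```

Keeping `components_present` separate from the all-or-nothing test means partial matches can still feed a lower score rather than being discarded.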

The confidence model matters here. A single weak signal gets a low score. Finding the full combination of those components on a recently registered domain yields a high-confidence attribution, and the domain is locked: the system will not re-check it in future runs, because the evidence is strong enough to treat it as settled.

Fingerprinting can also catch sites before they carry any content. If a domain with election-adjacent naming, registered within the last month, on non-mainstream nameservers, is serving a bare "Hello world!" page (the default first post of a fresh WordPress install), something is being prepared. The infrastructure is standing by. The content has not been loaded yet. The operation has not started.
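The confidence model can be sketched as a small state object that only moves upward and locks once it reaches high. Several details are assumptions for illustration: the 30-day threshold for "recent", the signal names, the rule that confidence never decreases, and the function names. The bare-install check relies on WordPress's default first post being titled "Hello world!".

```python
from dataclasses import dataclass

LEVELS = ("none", "low", "medium", "high")
RECENT_DAYS = 30  # assumption: "recent" means registered within a month


@dataclass
class DomainState:
    domain: str
    confidence: str = "none"
    locked: bool = False


def is_bare_install(html: str) -> bool:
    """Fresh WordPress installs publish a default post titled "Hello world!".

    Also requiring a wp-content asset path is a rough guard against
    unrelated pages that happen to contain the phrase.
    """
    return "Hello world!" in html and "wp-content" in html


def update(state: DomainState, level: str) -> DomainState:
    """Raise confidence (never lower it) and lock the domain once it is high."""
    if state.locked:
        return state  # settled: later runs do not re-check it
    if LEVELS.index(level) > LEVELS.index(state.confidence):
        state.confidence = level
    state.locked = state.confidence == "high"
    return state


def assess(state, *, html, full_stack, election_name, age_days, mainstream_ns):
    """Score one observation of a domain and fold it into its state."""
    recent = age_days <= RECENT_DAYS
    if full_stack and recent:
        level = "high"
    elif election_name and recent and not mainstream_ns and is_bare_install(html):
        level = "medium"  # infrastructure standing by, content not yet loaded
    elif full_stack or election_name or not mainstream_ns:
        level = "low"  # a single weak signal
    else:
        level = "none"
    return update(state, level)
```

The lock is what keeps re-scanning cheap: once a domain is settled, later runs can skip it entirely.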

Finding these sites means finding operations before they have done anything. That is as far upstream as detection can go.

Chaining heuristics

None of these signals are definitive in isolation. What makes the methodology robust is chaining them.

A domain surfaces in a DNS parking query because it contains "election" and "Melenchon" and is parked on a suspicious nameserver. That is a weak signal: low confidence, candidate status. You fetch the page. It is a bare install. That strengthens the signal: medium confidence, flagged for monitoring. Three weeks later, the domain moves to active hosting. You fetch it again. It is now serving content on the operation's fingerprinted combination of CMS, plugins and theme. High confidence.

This also means the methodology generates a record. Not just a list of attributed domains, but a sequence of signals over time that documents how the attribution was reached. That record matters when you are making claims that will be scrutinised.
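The chain and its record can be modelled together. Each observation is logged with its date, and the stage only escalates when a signal warrants it. The signal names, stage names and dates below are hypothetical, chosen to mirror the Melenchon example above. The `Investigation` class is a sketch under those assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

# Stage each signal is sufficient to reach, mirroring the sequence above.
SIGNAL_STAGE = {
    "parked_on_suspicious_ns": "candidate",
    "bare_install": "monitoring",
    "fingerprint_match": "attributed",
}
STAGE_ORDER = ("unseen", "candidate", "monitoring", "attributed")


@dataclass
class Investigation:
    domain: str
    stage: str = "unseen"
    record: list = field(default_factory=list)  # (ISO date, signal, stage) tuples

    def observe(self, when: date, signal: str) -> None:
        """Log a signal and escalate the stage if the signal warrants it."""
        target = SIGNAL_STAGE[signal]
        if STAGE_ORDER.index(target) > STAGE_ORDER.index(self.stage):
            self.stage = target
        self.record.append((when.isoformat(), signal, self.stage))


if __name__ == "__main__":
    inv = Investigation("example-election.test")
    inv.observe(date(2026, 1, 5), "parked_on_suspicious_ns")
    inv.observe(date(2026, 1, 6), "bare_install")
    inv.observe(date(2026, 1, 27), "fingerprint_match")
    for entry in inv.record:
        print(entry)
```

Every observation is appended, including ones that do not change the stage. The record therefore shows both how the attribution was reached and when each step happened.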

Influence operations are infrastructure problems before they are content problems. The content is the visible part. The infrastructure is the part that takes time to build, and that leaves traces in public data whether the operators intend it to or not.