Citation is not traffic. Don’t confuse the two.
AI citation counts are about to become the new keyword rankings — a real number that mostly misleads. The substitution gradient is what actually matters.
There's a study sitting in the arXiv feed this week that almost nobody is going to read properly. It's from researchers looking at what happens to Reddit communities when Google AI Overviews start citing them, and it found something genuinely interesting: Safe-for-Work communities that get surfaced in AI summaries see daily comments rise by 12% and active commenters by 12.3% versus communities that don't get cited. Then Google rolled out AI Mode — the conversational layer — and those gains largely evaporated.
That's two findings sitting on top of each other, and both matter. Citation creates engagement when the user still has a reason to come back to the source. Add a conversational layer that satisfies the question without the visit, and the lift disappears.
I've watched this industry mistake correlation for strategy for eighteen years. We are about to do it again, at scale, with AI citations.
The piece everyone is currently writing — and it's a fine piece, I've written variants of it myself — is "how to get cited in AI search." Schema this, brand-anchor that, earn mentions in places the models trust, structure your content for retrieval. The Search Engine Journal piece on non-commodity content this week is one version. The Microsoft Clarity AI citation reporting features are another. The Mike King Bing-versus-Google framing is a third. They're all arguing about the same question: how do you get cited?
That's the wrong question to anchor on. The right question is what citation is actually worth, and the honest answer is *it depends on what the interface does next.*
Two scenarios that look identical on a citation dashboard
Imagine two pages. Both get cited 1,000 times this month in AI assistant answers.
Page A is cited inside a long ChatGPT response where the user asks a follow-up, then another, then exports the answer to a Google Doc. The user never visits Page A. They got what they needed from the synthesis. Page A gets the citation impression and zero referral traffic, zero session events, zero direct attribution downstream.
Page B is cited inside a Perplexity answer that includes a hard-to-summarise specific — a calculator, a dataset, an interactive comparison. The user clicks through to use it. Page B gets the citation impression *and* the visit *and* the chance to convert the user.
On every GEO dashboard being built right now, those two outcomes look the same. One citation each. Same score. Same celebratory Slack message.
They are not the same. They are barely the same kind of event.
The Reddit study is telling us something the industry isn't reading
Let me stay on the arXiv paper a second longer, because it deserves more attention than it's getting.
The researchers used a clean natural experiment. Google indexes both SFW and NSFW Reddit communities in organic search, but only references SFW communities in AI Overviews; NSFW communities are excluded from AI Overviews by policy. That gives you a difference-in-differences design in which the only meaningful difference between the two cohorts is whether AI Overviews can cite them.
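For anyone who hasn't met difference-in-differences before, the arithmetic is simple enough to sketch. What follows is a minimal illustration on group means, not the paper's estimation method (which I'd assume is regression-based with controls). Every number in it is invented for the example, and the function and variable names are mine.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on group means:
    (change in treated group) minus (change in control group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical daily comment counts, invented for illustration only.
sfw_pre = [100, 110, 90]     # communities eligible for citation, before launch
sfw_post = [118, 125, 105]   # same communities, after launch
nsfw_pre = [80, 85, 75]      # communities never cited, before launch
nsfw_post = [84, 88, 80]     # same communities, after launch

# Treated group rose by 16, control group rose by 4, so the estimate is 12.
effect = did_estimate(sfw_pre, sfw_post, nsfw_pre, nsfw_post)
print(effect)
```

The point of subtracting the control group's change is that anything hitting all of Reddit at once (seasonality, a site redesign) cancels out, leaving only the part of the change that tracks citation eligibility.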
When AI Overviews launched, the cited communities grew. Comments up 12%. Active commenters up 12.3%. The effect concentrated in experience-based threads (opinions, advice, personal accounts) rather than fact-based information. That's the bit that should make you sit up. AI Overviews surfaced Reddit as a place to *go for human texture*, and people went.
Then AI Mode launched. Conversational AI. Multi-turn. The user could ask follow-ups inside the AI surface instead of bouncing out to read the source. The engagement lift to Reddit largely vanished.
Citation generated traffic only when the interface required the user to leave to get more.
That's the finding. That's the whole finding. And it has nothing to do with how good Reddit's structured data is, or whether its content is "AI-optimised," or anything else the GEO industry is currently selling.
It has to do with whether the interface lets you finish the job inside the chat window.
The substitution gradient nobody is mapping
If we're being serious about this, every AI citation sits somewhere on a gradient I'd call the substitution gradient. On one end, citations that the AI fully substitutes for — answers where the model's synthesis is good enough that the user has zero remaining reason to visit the source. On the other end, citations that the AI *can't* substitute for, where the cited page does something the chat interface fundamentally can't reproduce.

Most published content sits at the bad end of that gradient. Most "best X for Y" pages, most definitional content, most encyclopaedic explainers, most product comparison posts. These were designed to answer questions that AI systems can now answer in the chat window without needing the user to click.
Earning a citation for that kind of page is not a win. It's the announcement that you've successfully donated your content to someone else's interface.
The pages that sit at the other end, the ones where citation correlates with actual visits, share specific traits. They have specifics the AI can't reproduce without inventing things: real data, real interactivity, real opinion, real personality, real services. They reward depth. They have something the user needs to *use*, not just *read about.*
Lily Ray's piece this week — and her tracking of 220+ AI-content-vendor case study sites — gestures at this from a different angle. The sites in her dataset that built AI-assisted content factories almost universally produced content that sat at the bad end of the substitution gradient. Templated, evergreen, summarisable. Even when those pages ranked, they were ranking for queries that AI Overviews could now answer in-line. The collapse pattern she's documenting is partly an algorithmic story and partly a substitution story, and you can't fully separate the two.
What this does to the GEO measurement industry
There are, by my count, somewhere north of fifty tools now offering "AI citation tracking" in some form. Microsoft just shipped one inside Clarity. Bing's added AI Performance to Webmaster Tools. Profound, Otterly, Goodie, half a dozen others are racing to be the Ahrefs of AI search.
These tools are useful. They are also dangerous, because they are normalising a metric that doesn't carry the weight people are going to assume it carries.
A citation count without an interface-substitution adjustment is the AI-era equivalent of impression count in organic search. Technically a real number. Roughly directional. Wildly misleading if you treat it as a substitute for traffic, leads, or revenue.
Most "GEO performance dashboards" being sold right now are about to mislead a generation of marketers in exactly the way SEO rank-tracking misled the last one.
The honest version of that dashboard needs three things the current crop doesn't have:
First, substitution scoring per citation. Was the user's question answered fully in the AI response, or was the citation surfacing something the AI couldn't reproduce? Models can do this analysis. Nobody's exposing it as a metric yet.
Second, downstream behaviour modelling: what proportion of citations in this query class historically generate visits, branded searches, or downstream actions? You can approximate this from log data. Almost nobody does.
Third, query-class context. A citation for a navigational query is worth nothing — the user was always going to find you anyway. A citation for a research-phase query in your buyer's journey is worth a lot. A citation for a transactional query is worth almost everything. Citation counts aggregate all three into a single number and obscure exactly the thing you need to know.
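To make those three adjustments concrete, here is a rough sketch of how they might combine into a single per-citation value. It is one possible weighting, not an established formula. The weights, field names, and the choice to multiply the factors are all my assumptions; only the ordering of the query classes comes from the argument above.

```python
from dataclasses import dataclass

# Hypothetical weights. The ordering follows the text
# (navigational < research < transactional); the magnitudes are invented.
QUERY_CLASS_WEIGHT = {"navigational": 0.0, "research": 0.6, "transactional": 1.0}

@dataclass
class Citation:
    query_class: str      # one of the keys in QUERY_CLASS_WEIGHT
    substitution: float   # 0.0 = AI cannot substitute for the page, 1.0 = fully substitutes
    visit_rate: float     # assumed historical share of citations in this class that led to a visit

def adjusted_value(c: Citation) -> float:
    """Discount one citation by query class, substitution, and historical visit rate."""
    return QUERY_CLASS_WEIGHT[c.query_class] * (1.0 - c.substitution) * c.visit_rate

def dashboard_score(citations) -> float:
    """Sum of adjusted values, as opposed to a raw citation count."""
    return sum(adjusted_value(c) for c in citations)

# Page A: answer fully satisfied in chat. Page B: the page has something to use.
page_a = [Citation("research", substitution=1.0, visit_rate=0.5)]
page_b = [Citation("transactional", substitution=0.0, visit_rate=0.5)]

print(len(page_a), len(page_b))                          # raw counts are identical
print(dashboard_score(page_a), dashboard_score(page_b))  # adjusted scores are not
```

Run against Page A and Page B from earlier, the raw count ties at one citation each while the adjusted score separates them completely. The real work is in estimating `substitution` and `visit_rate` honestly, which this sketch simply takes as given.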
The strategic shift this actually implies
If citation is interface-dependent rather than universally valuable, the content strategy that follows is the opposite of what most GEO playbooks recommend.
Most playbooks tell you to make your content more retrievable. More chunkable. More structured. More semantic. Easier for a model to extract a clean answer from.
That advice is sound for the bad end of the substitution gradient: for content that's going to get cannibalised anyway, you might as well be cited rather than ignored. But it's actively counterproductive for content where a citation should lead to a visit rather than replace one.
For pages that ought to drive visits, the strategy is the inverse: make the page genuinely impossible to summarise without loss. Embed specifics. Build calculators and tools. Include data the model can't pre-train on because you generated it yourself. Argue with personality. Make claims that need your context to land. Don't write to be chunkable — write to be *necessary.*
This is roughly what the better operators have been doing for years and calling it something else. Demand generation. Brand. Editorial voice. Owned IP. The new wrinkle is that it's now defensible against a specific failure mode — citation-without-traffic — that the industry hasn't named yet.
The honest limits
I should be straight about what this argument doesn't cover.
The substitution gradient is real, but it's not the only thing happening. Brand exposure inside AI responses has independent value even without click-through — being mentioned matters for memory, salience, downstream branded search, and trust signals at the moment of decision. The Reddit study doesn't capture any of that, because Reddit doesn't have a sales pipeline.
I'm also probably underweighting agentic behaviour. As AI agents start performing tasks rather than just answering questions — booking, buying, comparing on behalf of users — the calculus changes again. An agent visiting Page B to use the calculator doesn't generate a human session, but it might generate a transaction. The citation-to-visit-to-conversion chain is going to get weirder, not simpler.
And the Reddit finding might not generalise cleanly to commercial content. Reddit threads benefit from a "go see the actual humans" pull that a B2B SaaS product page doesn't have. The mechanism by which AI Mode killed the engagement lift might work differently when the cited source isn't community-driven.
What I'd defend hard is the underlying principle: a citation's value depends on what the user does next, and what the user does next depends on what the interface lets them do without leaving. Any GEO strategy that doesn't model that variable is going to be roughly as useful as keyword rank tracking was after personalised search arrived.
Where to actually look
If you only have time to track one thing as AI search matures, it's not your citation count. It's your *citation-to-visit ratio* in the query classes that matter to your business, segmented by which AI surface is doing the citing.
That number doesn't exist as a clean export from any tool I know of yet. You have to construct it from server logs, referrer data, and citation tracking stitched together. It's awkward. It will be awkward for another twelve to eighteen months at least.
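As a starting point, the stitching can be sketched in a few lines. This assumes you have already reduced your citation tracking and your server logs to rows of counts keyed by AI surface and query class; the row shape and field names are hypothetical, and assigning a visit to a query class is itself an approximation, since the referrer rarely tells you the query.

```python
from collections import defaultdict

def citation_to_visit_ratio(citations, visits):
    """Return {(surface, query_class): visits / citations}
    for every segment that has at least one citation."""
    cited = defaultdict(int)
    visited = defaultdict(int)
    for row in citations:
        cited[(row["surface"], row["query_class"])] += row["count"]
    for row in visits:
        visited[(row["surface"], row["query_class"])] += row["count"]
    # Segments with citations but no logged visits get a ratio of 0.0.
    return {key: visited.get(key, 0) / n for key, n in cited.items() if n > 0}

# Hypothetical monthly rows, invented for illustration only.
citations = [
    {"surface": "perplexity", "query_class": "research", "count": 1000},
    {"surface": "chatgpt", "query_class": "research", "count": 1000},
]
visits = [
    {"surface": "perplexity", "query_class": "research", "count": 50},
]

ratios = citation_to_visit_ratio(citations, visits)
for segment, ratio in sorted(ratios.items()):
    print(segment, ratio)
```

Two segments with identical citation counts come out at 0.05 and 0.0. That gap is the thing a citation dashboard on its own will never show you.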
But it's the number that tells you whether being cited is doing anything for you, or whether you're just being mined for free synthesis. And those are very different businesses to be in.
The industry is about to spend two years optimising for citation counts the way it spent ten years optimising for keyword rankings — and find out, again, that the metric on the dashboard wasn't quite the thing that paid the bills. We could skip that detour this time. We probably won't.