The ROI conversation about AI traffic uses the wrong vocabulary

AI systems were never built to route traffic. Measuring AI visibility with SEO-style ROI metrics is asking the wrong architectural question.

The piece that landed today on Search Engine Journal — Duane Forrester's argument that AI systems were never built to send traffic in the first place — is the one I'd been waiting for somebody to write. Not because it says anything most working consultants don't already suspect, but because it puts the architectural argument on the page in a form you can hand to a client.

The short version: search engines were designed to route. They surface a field of options, the user picks one, the user owns the choice. That entire design exists partly because routing is useful and partly because routing is a liability shield. LLMs were not built that way. They were built to answer the question in place. Citations, when they appear, aren't routing instruments; they're grounding artifacts, or confidence hedges, or both. Whichever reading you prefer, neither describes a system designed to send traffic somewhere else.

Sit with that for a moment, because almost every "GEO measurement" conversation currently happening in the industry assumes the opposite.

The whole vocabulary we've imported from SEO — referral rate, attributed traffic, click-through from an AI answer, citation share as a leading indicator of visits — is built on the assumption that there's a routing mechanism on the other end of the visibility. There isn't. Whatever traffic does come through is a byproduct of an architecture that was designed to resolve the query, not redistribute it.

This isn't a small framing problem. It changes what the metric should even be measuring.

Two systems, two jobs, two completely different ROI questions

A search engine offers options. The user selects. The user's click is the routing event, and you can measure the routing event because it's the literal mechanic of the system. SEO ROI conversations work because the architecture cooperates with the measurement. Impressions, clicks, sessions, conversions — these things sit on top of a system that was built to produce them.

LLMs produce answers. The mechanic isn't a click — it's a synthesis. The system has done its job whether or not you ever appear in a clickstream. Visibility inside an AI answer can deliver real commercial value (brand recall, decision influence, downstream branded search, the thing actually shaping the purchase) without ever generating an event your analytics stack can see.

So when a vendor sells you a "GEO ROI dashboard" measuring AI referral traffic, you should ask what they think they're measuring. Because the architecture they're measuring against was never designed to produce the signal they're selling you a tool to track. They're measuring the trickle of byproducts and presenting it as the main event.

The metric you can measure isn't the metric that matters, and the metric that matters isn't a metric the system is built to produce.

That's the actual ROI problem. Not "we don't have the right tools yet." The shape of the question is wrong.

The grounding-vs-search-indexing split nobody is sitting with

There's a related architectural point that hasn't been properly absorbed. Search indexing and grounding indexing are diverging into different systems with different signals and different goals. Search indexing exists to rank a candidate set against a query for human routing. Grounding indexing exists to retrieve passages a model can use to produce a confident, factually-anchored answer.

Bing's engineering team has been transparent about this — I wrote about their grounding framework a few weeks ago — and once you internalise that grounding is a separate retrieval system optimised for *passage usefulness to a generation model*, a lot of GEO advice starts looking suspicious.

The signals that matter for grounding aren't necessarily the signals that matter for ranking. They overlap heavily (a well-structured, authoritative, technically clean page is good for both), but they aren't the same thing. A page that ranks well might never get retrieved as a grounding passage if its content density per chunk is wrong, if its claims aren't atomic and self-contained, or if its semantic boundaries don't survive being sliced into 500-token windows. A page that gets retrieved heavily for grounding might never rank well as a destination because grounding doesn't care whether the user wants to land there.
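To make the chunking point concrete, here is a minimal sketch, assuming a grounding pipeline that slices text into fixed-size windows. Real retrieval systems use their own tokenisers and chunking rules, which I can't see into, so the whitespace "tokens", the function names (`chunk_by_tokens`, `sentences_split_by_chunking`), the sample copy and the tiny window size are all illustrative assumptions. The sketch flags sentences that straddle a window boundary, which is one rough proxy for claims that don't stay self-contained once sliced.

```python
import re

def chunk_by_tokens(text, window=500):
    """Split text into consecutive windows of `window` whitespace tokens.
    Whitespace tokens stand in for a real tokeniser (assumption)."""
    tokens = text.split()
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

def sentences_split_by_chunking(text, window=500):
    """Return the sentences that straddle a window boundary."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    split_sentences = []
    position = 0  # index of the sentence's first token within the whole text
    for sentence in sentences:
        length = len(sentence.split())
        first_window = position // window
        last_window = (position + length - 1) // window
        if first_window != last_window:
            split_sentences.append(sentence)
        position += length
    return split_sentences

# Hypothetical page copy, with a deliberately small window so the effect is visible.
sample = (
    "Our plan costs 40 pounds a month. "
    "It includes unlimited users and priority support. "
    "Cancel any time."
)
print(sentences_split_by_chunking(sample, window=10))
```

With a window of 10, the middle sentence is cut in two, so neither resulting chunk carries the full claim. A check like this won't tell you what any specific AI system does, but it is a cheap way to spot pages whose key claims depend on surrounding context.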

You can be highly visible inside AI answers and barely visible in search. You can be visible in search and invisible inside AI answers. These outcomes don't track each other and the strategy implications are different in each direction.

Most of the dashboards being sold right now flatten this into a single "AI visibility" number, which tells you almost nothing about which system you're winning in or why.

Where the liability surface moved, and why that matters for ROI too

Forrester's piece spends considerable time on the liability angle, and at first read it seems like a tangent from the measurement question. It isn't. It's structurally the same problem viewed from a different side.

The reason search engines could afford to be neutral routers is that the user owned the choice. The platform surfaced ten options, the user picked one, and whatever happened next wasn't the platform's editorial output. That architecture protected the platform legally — Section 230 was structured around exactly this kind of arrangement — and it also created the metric chain we've all been working with for two decades. Impressions, clicks, sessions, conversions are all measurable because the user's decision is observable in the system.

LLMs collapse the decision. The model produces the answer in its own voice. There's no field of options for the user to choose from, so there's no observable decision point and no event to measure. There's also no user choice left for the platform to use as a liability shield. The same architectural choice that creates the legal exposure is what removes the measurement chain. You can't have one without the other.

Walters v. OpenAI was dismissed in May 2025 partly on the basis that disclaimers and a sophisticated reader covered the platform's exposure for a general-purpose chatbot. Air Canada was held liable for its support chatbot's false statements about its own bereavement fare policy because customers reasonably relied on it. The line being drawn is around reasonable reliance, and the more authoritative and specialised the AI surface, the harder the disclaimer defence is to run.

That matters for ROI because the same authoritativeness that creates legal exposure is what makes the AI answer commercially valuable to brands cited within it. Brands appearing inside trusted, specialised AI answers are getting decision-influence value precisely because users are relying on the answer. The reliance is the value. It's also the liability surface. They're the same surface seen from different angles, and you cannot measure one without engaging with the other.

What this means for the consulting conversation

The honest answer to a client asking "what's the ROI on AI search visibility?" isn't a number. It's a reframing.

The first thing to say is that the system delivering the visibility wasn't built to send traffic, so traffic-based ROI is the wrong question. The second thing is that the value being delivered — brand recall, decision influence, presence at the moment of consideration — is real and economically significant, but it sits in the same category as TV advertising, sponsorship, or PR. You can measure it directionally through brand search lift, branded direct traffic, share of voice in tracked queries, and assisted conversion patterns. You cannot measure it as a clean attribution chain, and pretending you can is dishonest.
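As a rough illustration of what "directional" means in practice, here is a sketch of a brand search lift calculation. It assumes you have weekly branded search volumes, from whatever tool you trust, for a baseline period and a comparison period. The function name `brand_search_lift` and all the numbers are invented for the example. It reports a direction and a size, not attribution.

```python
from statistics import mean

def brand_search_lift(baseline_weeks, comparison_weeks):
    """Relative change in average weekly branded search volume.

    Returns e.g. 0.25 for a 25% lift. This is a directional signal only:
    it does not separate AI visibility from seasonality, PR or paid activity.
    """
    if not baseline_weeks or not comparison_weeks:
        raise ValueError("both periods need at least one week of data")
    baseline = mean(baseline_weeks)
    if baseline == 0:
        raise ValueError("baseline average is zero, so lift is undefined")
    return (mean(comparison_weeks) - baseline) / baseline

# Hypothetical weekly branded search volumes
before = [400, 420, 380, 400]  # averages 400
after = [480, 500, 520, 500]   # averages 500
lift = brand_search_lift(before, after)
print(f"Branded search lift: {lift:.0%}")  # prints "Branded search lift: 25%"
```

The useful habit is tracking this over time alongside the other directional signals, and reading them together the way you would read brand tracking for a TV campaign.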

The third thing is that this isn't temporary. The vendors selling "AI attribution will mature in 18 months" are wrong about the shape of the problem. The architecture isn't producing the events your attribution stack needs. Better dashboards on top of a system that doesn't emit click decisions won't produce click decisions. The shape of the data is downstream of the shape of the architecture.

Which means the ROI conversation has to move up a level. Not "what did this content earn in attributable revenue this quarter" but "what is our presence inside the systems that increasingly mediate consideration, and is it improving or eroding." That's a brand health question, not a performance marketing question, and most marketing teams don't have the framework or the budget structure to answer it that way.

This is the awkward part of the conversation. Marketing leaders who came up through performance channels have spent fifteen years building organisations that optimise around clean attribution. The thing increasingly shaping their customers' decisions doesn't produce clean attribution and won't, ever. The skill set, the org chart, the budget approval process, and the dashboard suite are all calibrated to a measurement model the new surface doesn't support.

The honest limits of this argument

Three places where I'd want to be careful before pushing the argument harder.

First, AI surfaces do produce some traffic, and that traffic appears to convert disproportionately well when it does arrive. Several practitioners have shared logs showing AI-referred sessions with high engagement and conversion rates relative to organic baselines. The byproduct isn't worthless. It's just not the main thing, and treating it as the main thing distorts strategy. The right framing is probably: measure the byproduct because you can, but don't let the measurable thing crowd out the unmeasurable thing that matters more.
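Measuring the byproduct can be as simple as the sketch below, which compares conversion rates for AI-referred sessions against everything else. The session shape (`referrer`, `converted`), the hostname list in `AI_REFERRER_HOSTS` and the sample data are assumptions for illustration, not a standard. A real comparison would segment organic search separately and need far more sessions than this.

```python
from urllib.parse import urlparse

# Assumption: hostnames treated as AI surfaces. Adjust to what your own logs show.
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "perplexity.ai",
    "copilot.microsoft.com",
    "gemini.google.com",
}

def is_ai_referral(referrer):
    """True if the referrer URL's hostname is in the assumed AI host list."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in AI_REFERRER_HOSTS

def conversion_rates(sessions):
    """Return conversion rate per group, or None for a group with no sessions."""
    counts = {"ai": [0, 0], "other": [0, 0]}  # [sessions, conversions]
    for session in sessions:
        group = "ai" if is_ai_referral(session["referrer"]) else "other"
        counts[group][0] += 1
        counts[group][1] += int(session["converted"])
    return {g: (c / n if n else None) for g, (n, c) in counts.items()}

# Hypothetical sessions
sessions = [
    {"referrer": "https://chatgpt.com/", "converted": True},
    {"referrer": "https://www.perplexity.ai/search", "converted": False},
    {"referrer": "https://www.google.com/", "converted": False},
    {"referrer": "https://www.google.com/", "converted": True},
    {"referrer": "", "converted": False},
    {"referrer": "https://www.bing.com/", "converted": False},
]
rates = conversion_rates(sessions)
print(rates)
```

Keep in mind that many AI-influenced visits arrive with no referrer at all, so this undercounts by design. That is the point of the paragraph above: it measures the trickle, not the influence.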

Second, the architecture isn't fixed. OpenAI's self-serve ads manager introduced clickable sponsored placements with conversion tracking inside ChatGPT, which is a deliberate move to bolt routing infrastructure onto a system that didn't natively have it. If sponsored AI placements become a significant share of AI surface real estate, some of the routing-and-measurement architecture comes back, but it comes back as paid media, not organic visibility. The implication is that the only AI visibility with clean attribution will be the visibility you pay for, which is a different commercial conversation than the one most GEO vendors are currently having.

Third, the measurement gap is a real opportunity for the consultants and tools that get this right. Log-file analysis at the edge, server-side citation monitoring, branded search lift modelling — these are genuinely useful and genuinely underdeveloped. I'm sceptical of most current GEO tooling but the underlying work of building proper measurement for this surface is real consulting work that needs doing.
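For a sense of what the log-file side of that work looks like, here is a minimal sketch that counts requests from AI crawlers and fetchers per URL. It assumes access logs in the common "combined" format and matches on user-agent substrings. The token list in `AI_AGENT_TOKENS` is my assumption of what to look for and will need maintaining, and the sample log lines are invented.

```python
import re
from collections import Counter

# Assumption: user-agent substrings to treat as AI crawlers or fetchers.
AI_AGENT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def ai_fetches_by_path(lines):
    """Count requests per (agent token, path) for the assumed AI user agents."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that aren't in the expected format
        for token in AI_AGENT_TOKENS:
            if token in match.group("agent"):
                hits[(token, match.group("path"))] += 1
                break
    return hits

# Hypothetical log lines (IPs are from a documentation-only range)
log_lines = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:10:05:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:10:06:00 +0000] "GET /blog/geo HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:10:07:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"',
]
hits = ai_fetches_by_path(log_lines)
print(hits.most_common())
```

Two caveats belong with any report built on this: user-agent strings can be spoofed, and a fetch is not a citation. It tells you which pages these systems are pulling, which is a useful input, not proof of presence in an answer.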

Where this leaves the strategy conversation

The piece of advice I'm giving clients right now is this: stop asking what your AI search ROI is and start asking what your presence inside AI-mediated consideration looks like. Those are different questions with different answers, and only one of them is answerable.

The ROI question assumes a routing architecture. The presence question assumes a synthesis architecture. We're operating in the second, but the conversation in the industry is still being conducted in the vocabulary of the first. The vendors who win the next two years aren't the ones with the best AI traffic dashboards. They're the ones who can help marketing leaders move their organisations off a measurement model the new surface won't support.

That migration is going to be painful, slow, and politically expensive inside large organisations, because it asks performance marketing leaders to admit that the measurement infrastructure they've built their careers on doesn't work for the channel that's eating consideration. Most of them won't admit it. The ones who do are the ones whose brands will still be cited inside AI answers in 2028.

The architecture has already moved. The vocabulary hasn't caught up. That's the gap the next phase of the consulting work lives in.
