GEO and AEO

Mt. Stupid has a pricing page. And it’s been busy.

AI builders publicly admit they don't know how their models work. AI optimisation consultants sell guaranteed outcomes. The gap is the story.

Pedro Dias published a piece this week with one of the cleanest framings I've seen in a year of GEO discourse. He calls it Mt. Stupid with a pricing page. The argument runs like this: the people who actually built these AI systems — Anthropic, DeepMind's interpretability team, Ilya Sutskever — describe their own models with public humility about what they don't understand. The people selling optimisation services for those same systems describe them with the language of certainty. Guarantees. Decimal-point percentages. Frameworks named, ensured, dictated.

The confidence gradient runs the wrong way. That's the whole problem. And it ran in three more directions this week, all worth tracking.

Mike King published a long piece arguing that single-shot RAG is dead and every major AI platform has moved to agentic, multi-hop retrieval. Search Engine Journal flagged that Google Search and Google Chrome, two products of the same company, now publish directly contradictory guidance on whether you need an `llms.txt` file. Lily Ray published an analysis of 220+ websites named in AI content platform case studies, showing a pattern of traffic spikes followed by collapse.

These aren't separate stories. They're the same story told from three angles. The systems are getting more complex. The official guidance is getting less coherent. And a consulting layer has emerged in the gap that talks about all of it with the confidence of a man explaining roulette to his mates.

The thing the builders are saying

Worth quoting Dias's piece directly because the contrast does the work. Anthropic's interpretability post from 2024: *"We mostly treat AI models as a black box: something goes in and a response comes out, and it's not clear why the model gave that particular response instead of another."* That's the company that makes Claude, writing about Claude.

Neel Nanda, who runs mechanistic interpretability at Google DeepMind, told 80,000 Hours in September 2025 that the most ambitious version of his discipline is probably dead. He doesn't see a realistic path to the robust guarantees people want from interpretability research. The person whose literal job is reading AI minds is publicly conceding the project, as originally scoped, won't get there.

Ilya Sutskever at NeurIPS 2024: *"The more it reasons, the more unpredictable it becomes."* Coming from the man whose career is the scaling hypothesis with a face on it, that's not a throwaway line. That's a thesis statement.

Now scroll LinkedIn. A consultant guarantees inclusion in AI Overviews. An agency advertises a 13% citation lift derived from data the agency itself generated about the agency's own prescriptions. A widely-shared post claims that maintaining a 300-character paragraph limit "dictates" how a vector database chunks your content. Someone is selling "78% share of model" as a measurable thing.
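
To see why "dictates" is too strong a word, here is a minimal sketch of one common chunking strategy: a fixed-size sliding window with overlap. The function name and the window sizes are my own assumptions for illustration, not any particular vector database's behaviour. The point is only that chunk boundaries come from the ingestion pipeline's settings, which the author of the page doesn't control and usually can't see.

```python
def chunk(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows (hypothetical pipeline setting)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


# Two paragraphs, each kept to exactly 300 characters by a diligent author.
page = ("A" * 300) + "\n\n" + ("B" * 300)

# One pipeline uses 512-character windows with 64 characters of overlap...
wide_chunks = chunk(page, size=512, overlap=64)
# ...another uses 200-character windows with no overlap.
narrow_chunks = chunk(page, size=200)

# The first wide chunk straddles the paragraph break despite the 300-character rule.
straddles_break = "A" in wide_chunks[0] and "B" in wide_chunks[0]
```

Same page, two pipelines, two different sets of chunks. The paragraph length is an input to the chunker, not an instruction to it.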

The model that Anthropic publicly cannot fully account for is being optimised against by people who claim to know exactly what they're doing.

Either Anthropic is being suspiciously modest, or the consultants are full of it. Pick one.

The official guidance can't agree with itself

Last week Google Search published an updated optimisation guide that essentially said: don't bother with `llms.txt`, don't rewrite content for AI, don't chunk for retrieval, AEO and GEO are still just SEO. The "it's all still SEO" camp on LinkedIn took a victory lap. I wrote about that guidance favourably myself, because most of the dismissals were correct — the GEO tooling industry has been selling cargo-cult tactics on top of standard SEO for two years.

This week, Google Chrome's Lighthouse tool shipped version 13.3, which added an "Agentic Browsing" audit category. That category includes — wait for it — an `llms.txt` audit. It flags sites that don't have one. The Lighthouse docs describe `llms.txt` as helping agents understand site structure faster.

So Google Search says don't bother. Google Chrome says you should. Same company. Consecutive weeks.

You can construct a defence of this — Search and Chrome are optimising for different things, search visibility is one job, agentic browser readiness is another. Fine. But that defence is exactly the point: there is no single "what Google wants" anymore, because Google itself doesn't have a single answer. The company that defined what "good optimisation" meant for twenty years is now publishing internal contradictions on a two-day cycle.

This isn't a Google failure. It's a sign of where the discipline actually is. Different AI surfaces want different things. The unified Google guidance era is over. Anyone selling you a unified "do these five things to win at AI search" framework is selling you a fiction that doesn't even map to the issuing company's own product surfaces.

The architecture under it has changed too

Mike King's agentic RAG piece is the most important technical write-up of the month, and I say that as someone who pushes back on King's framing routinely. The argument: every major AI search platform — Google AI Mode, ChatGPT Search, Perplexity Pro, Claude with computer use, Gemini Deep Research — has moved past the single-shot retrieval pipeline that defined the first wave of RAG.

In the old model, a query came in, an embedding model encoded it, a vector index returned the top-k passages, and the model generated an answer. If your content was in the top-k, you had a chance. If it wasn't, you didn't. Citation tracking worked because the retrieval set was the citation set.
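
The old pipeline is simple enough to sketch in a few lines. This is a toy version under stated assumptions: the "embedding" is a bag-of-words count standing in for a real embedding model, and the site names and passages are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, passages: dict[str, str], k: int = 2) -> list[str]:
    """Single-shot retrieval: encode the query once, return the k closest sources."""
    q = embed(query)
    ranked = sorted(passages, key=lambda src: cosine(q, embed(passages[src])), reverse=True)
    return ranked[:k]


# Hypothetical index of three passages.
passages = {
    "site-a": "boiler repair cost guide for local homes",
    "site-b": "history of the steam engine",
    "site-c": "what a typical boiler repair cost looks like with call-out fees",
}

retrieved = top_k("boiler repair cost", passages, k=2)
# In the single-shot model, the retrieval set doubles as the candidate citation set.
cited = list(retrieved)
```

One query, one ranking, one cut-off. Whatever lands above the cut-off is everything the model gets to read, which is why counting citations used to tell you something about retrieval.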

In the new model, a single user query triggers somewhere between five and twenty internal sub-retrievals. The agent plans, routes between tools, retrieves, reads, retrieves again, drafts, grades its own draft, and decides whether to go back for more before synthesising the final answer. Retrieval isn't an event anymore. It's a multi-stage pipeline you can only see the end of.

Which has a brutal implication for measurement. You can see whether you ended up cited in the final answer. You cannot see the agent rejecting you at sub-retrieval seven of twelve, or upweighting a different source at the reflection stage. The traditional reverse-engineering playbook — rank checking, citation counting, prompt-by-prompt sampling — only sees the last frame of a film that was being assembled for thirty seconds before it.
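
A toy simulation makes the measurement gap concrete. Everything here is an assumption for illustration: the sub-queries, the scores, the threshold, and the `run_agent` function itself are invented, and no real platform exposes a trace like this. That is rather the point.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Internal record of an agent run. An outside observer never sees this."""
    considered: list[tuple[int, str]] = field(default_factory=list)  # (step, source)
    rejected: list[tuple[int, str]] = field(default_factory=list)    # (step, source)
    cited: list[str] = field(default_factory=list)


def run_agent(sub_queries: list[str],
              index: dict[str, list[tuple[str, float]]],
              min_score: float) -> Trace:
    """Hypothetical multi-hop loop: retrieve per sub-query, keep sources above a threshold."""
    trace = Trace()
    for step, sub_query in enumerate(sub_queries, start=1):
        for source, score in index.get(sub_query, []):
            trace.considered.append((step, source))
            if score >= min_score:
                if source not in trace.cited:
                    trace.cited.append(source)
            else:
                trace.rejected.append((step, source))
    return trace


def external_view(trace: Trace) -> list[str]:
    """All a citation tracker can observe: the sources in the final answer."""
    return list(trace.cited)


# Invented retrieval results for three internal sub-queries.
index = {
    "pricing": [("site-a", 0.9), ("site-b", 0.4)],
    "reviews": [("site-b", 0.5), ("site-c", 0.8)],
    "warranty": [("site-a", 0.7)],
}

trace = run_agent(["pricing", "reviews", "warranty"], index, min_score=0.6)
visible = external_view(trace)
```

In this run `site-b` was retrieved twice and rejected twice. From the outside, it looks identical to a site that was never retrieved at all.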

If your GEO tool is showing you "citation share" as a clean number on a dashboard, what it's actually showing you is the surface of an iceberg whose shape it cannot describe. And it's charging you a monthly fee to look at the tip.

And the content programs are breaking in public

Lily Ray's piece "It Works Until It Doesn't" is the third leg of this. She tracked 220+ websites that were publicly cited as customers of AI content platforms: companies that the vendors' own case study pages name as success stories. The pattern she found, across both Ahrefs and Sistrix data, was consistent enough that she felt obliged to write about it: traffic spikes during the AI content rollout, then collapses, usually aligned with a subsequent Google update.

The vendors stop publishing the case studies. The case study pages get quietly removed. The "success" framing was true in month three and false in month eighteen.

Ray is careful about correlation versus causation — the declines could come from many sources, and she's not saying any specific tool caused any specific outcome. But the pattern across 220 sites is the kind of signal you don't get to ignore. The "we 10x'd content output and traffic went up 400%" pitch deck was the snapshot. The deletion of the case study eighteen months later is the truth.

The certainty you were sold was a snapshot taken during the spike.

This is where Dias's Mt. Stupid framing gets teeth. The 13% citation lift, the 2.8x conversion improvement, the guaranteed AI Overview inclusion — these aren't claims about durable performance. They're claims about a window. Optimise the window, harvest the case study, move on before the algorithm catches up.

What I think is actually going on

The GEO consulting market is in an awkward adolescence. The work that genuinely matters (building real authority, earning real citations, producing content with an editorial floor, getting your technical hygiene right) is slow, unsexy, and indistinguishable from good SEO. The work that's easy to package and sell (schema markup checklists, paragraph-length rules, AI citation dashboards, `llms.txt` files) is fast, sexy, brandable, and largely cosmetic.

Guess which side of that line has the LinkedIn presence.

The honest version of this work sounds like: *"We don't fully know how these systems weight what they retrieve. We have strong hypotheses about what correlates with citation. We can measure your visibility imperfectly across a few surfaces. We can build content that has a defensible chance of being selected. We can audit your technical foundations. We cannot guarantee anything because the people who built the systems don't guarantee anything."*

That paragraph doesn't fit on a sales deck. So the industry largely doesn't write it.

Instead it writes the four-pillar Technical GEO Framework™ and the 78% share-of-model metric, even though Anthropic's own interpretability team is publicly saying they don't know why their model picks one source over another. The gap between what the model's builders publicly admit and what the model's optimisers publicly claim is the whole story. Everything else is decoration on top of that gap.

The honest limits

There's a version of this argument that goes too far and I don't want to write it. A few things I'd concede:

Some of the GEO-specific tactics being sold do work, at the margin, sometimes. Structured data helps. Clear semantic HTML helps. Being cited in the right places helps. Speed helps. None of these are surprises and all of them are also normal SEO, but it would be wrong to say nothing in the GEO playbook moves anything.

Some agencies and consultants are doing genuinely careful work — running controlled experiments, being honest about variance, not promising outcomes. The Lily Rays and Mike Kings of the world have substantive disagreements with each other but neither is selling cargo-cult certainty. The problem isn't the discipline. It's the marketing layer that's grown on top of it.

And reasonable people can disagree about whether the consulting layer's overconfidence is a temporary feature of an immature market or a permanent feature of selling to small businesses. I think it's temporary — the case studies will keep disappearing, the contradictions in official guidance will keep widening, and the dashboards will keep failing to predict outcomes, until clients learn to ask harder questions. But that's a guess.

What to do with this if you're hiring someone

The single most useful filter, if you're considering hiring an agency or consultant for AI search work in 2026: ask them what they don't know.

Ask them how confident they are that any specific intervention will improve your citation rate, and how they'd measure it. Ask them what they think the failure rate of their interventions is. Ask them what proportion of their tactical recommendations are GEO-specific versus standard SEO that they're rebranding. Ask them what Anthropic and Google DeepMind have publicly said about how their own models work, and how that affects what's possible to predict.
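
On the "how they'd measure it" question, one honest starting point is to put an interval around any citation rate instead of quoting it as a bare number. The sketch below uses the Wilson score interval for a binomial proportion. The sample counts are invented, and treating each sampled prompt as an independent trial is a simplifying assumption that real prompt sets won't fully satisfy.

```python
import math


def wilson_interval(cited: int, samples: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if samples <= 0 or not 0 <= cited <= samples:
        raise ValueError("need samples > 0 and 0 <= cited <= samples")
    p = cited / samples
    denom = 1 + z * z / samples
    centre = (p + z * z / (2 * samples)) / denom
    half_width = z * math.sqrt(p * (1 - p) / samples + z * z / (4 * samples * samples)) / denom
    return centre - half_width, centre + half_width


# Hypothetical before/after measurement: cited in 10 of 100 sampled prompts,
# then in 13 of 100 after an intervention.
before_low, before_high = wilson_interval(10, 100)
after_low, after_high = wilson_interval(13, 100)

# If the two intervals overlap, the "lift" is not distinguishable from sampling noise.
intervals_overlap = before_high > after_low
```

With 100 sampled prompts, the interval around a 13% citation rate is more than ten percentage points wide, and it overlaps the interval around 10%. A consultant who can explain that to you without flinching is probably worth talking to.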

If the answer comes back as confident, decimal-pointed, framework-named certainty — they're selling you a pricing page on Mt. Stupid. If the answer comes back hedged, honest about variance, clear about what's measurable and what isn't, willing to say "we don't know but here's what we'd test" — that's the calibrated valley where the actual work happens.

The work is real. The discipline is real. The opportunity is real. But the only people worth paying for it right now are the ones who'll tell you, in writing, where their confidence runs out.

The builders of these systems are telling you exactly where the limits are. They've been telling you for two years. The only question is whether the consulting layer between you and the model catches up before your case study gets quietly removed from a vendor's website.
