The AI search measurement layer is being built by outsiders
Cloudflare, Microsoft and independent researchers are building the AI search measurement layer. Google isn't. The gap tells you everything.
Three things shipped in the last two weeks that, taken together, describe the actual state of AI search measurement better than anything the SEO industry has produced this year. None of them came from an SEO company.
Cloudflare shipped an Agent Readiness Score — a public scanner at isitagentready.com that grades any URL on sixteen checks across five categories, exposed as an API, exposed as an MCP endpoint that agents themselves can call before deciding how to interact with you. Microsoft shipped AI Performance reporting in Bing Webmaster Tools, with page-level citation activity across Copilot and Bing's AI summaries. iPullRank ran a controlled experiment on Google Personal Intelligence and found that Gmail signals moved brand visibility in AI Mode by 46 percentage points — measurement built by a consultancy because the platform isn't offering it.
Meanwhile, Google's official guidance for the same period told everyone to skip llms.txt, dismiss chunking, treat GEO as SEO, and trust the existing system. The contrast is not subtle.
I think we're watching the measurement layer of AI search get built in public, by infrastructure companies and independent researchers, while the dominant player insists no new measurement is needed. That asymmetry is the story. It tells you who thinks they have something to lose, who thinks they have something to prove, and where the next two years of useful tooling is going to come from. It also has direct implications for anyone making spending decisions on SEO, GEO, or whatever we're calling it this quarter.
This piece is about that gap — what's being measured, who's measuring it, who isn't, and what to do about it before the people selling tools catch up.
The thing that actually shipped this fortnight
Let me be specific, because the headlines have been doing a lot of work and the substance has been doing less.
Cloudflare's Agent Readiness Score is a real piece of infrastructure. Sixteen checks. Five categories — discoverability, navigability, retrievability, actionability, and trust. Public scanner, API, MCP endpoint. The MCP endpoint matters more than people are saying. It means an agent visiting your site can, before deciding what to do, call a third-party tool that audits how legible your site is to agents. The auditor and the audited are sharing a surface. That's new.
Microsoft's AI Performance report in Bing Webmaster Tools shows you which of your pages are being cited inside Copilot and Bing's AI summaries, with frequency data. It's framed explicitly as "an early step toward Generative Engine Optimization tooling." Microsoft is naming the discipline and shipping the dashboards for it.
iPullRank's Personal Intelligence study tested whether Gmail signals could shift AI Mode recommendations. Across 1,922 responses, seeded brands appeared 46 points more often in the Personal Intelligence-connected account than in the control. Brands moved into the top three 23 points more often, into the top ten 42 points more often. Gmail signals beat Photos. The implication is that part of AI Mode's ranking surface is now inside the user's inbox — a surface no publisher can see, audit, or scrape.
And on the other side of the contrast, Google. Four days before I/O, Google's optimisation guide for generative AI in Search told publishers there is no new optimisation discipline, no chunking strategy worth pursuing, no llms.txt to bother with. AI Mode crossed a billion monthly users. The new Search box accepts images, files, videos, and Chrome tabs. Information agents launched in beta. None of it came with publisher-facing measurement tools. The closest thing was an updated optimisation guide that, as Mike King has argued bluntly, reads like a document designed to keep SEOs busy on the wrong work.
Two trillion-dollar companies. Same fortnight. One is shipping measurement infrastructure and naming the new discipline. The other is shipping product surface area and telling everyone the discipline doesn't exist.
That's not coincidence. That's posture.
Who measures what, and why that's the whole story
There's a useful rule for understanding any new technology market: the people building measurement tools tell you what the market actually values.
In the early days of paid search, Google invested heavily in AdWords reporting because it needed advertisers to trust that the auction worked. When social ads exploded, Facebook built attribution windows and conversion APIs because it needed brands to believe that spend on social produced outcomes. Measurement is never neutral. It's built by whoever benefits from the buyer being able to see the result.
So look at who is building AI search measurement right now and ask what they're each trying to make legible.
Cloudflare is building agent-legibility scoring. They sell infrastructure that sits between websites and clients. If agents become a meaningful source of traffic — and Cloudflare has the network data to know that they're rising — then Cloudflare wants publishers thinking about agent-readiness the way they think about page speed, because Cloudflare sells the products that fix it. Bot Management. Cache rules. Workers. The scanner is a sales funnel. That's fine. It's also genuinely useful, which is what makes the sales funnel work.
Microsoft is building citation reporting in Webmaster Tools. Bing has roughly four percent of the search market. Copilot is gaining share but is nowhere near ChatGPT or AI Mode in usage. Microsoft has every incentive to make its AI surfaces measurable, because publishers optimise for what they can measure, and every publisher optimising for Bing widens the range of sources Bing can cite relative to Google. The same logic that has made Bing Webmaster Tools more publisher-friendly than Search Console for the last decade is why AI Performance shipped there first.
iPullRank is doing primary research because the GEO consulting market needs studies they can cite to clients. King's bet is that consultancies who do real testing build a moat over consultancies who recycle Twitter takes. The Personal Intelligence study is a sales asset. It's also genuinely good research.
Google is building product surface area and asking publishers to trust the system. Google has the most to lose from measurement, because measurement tends to reveal that AI Overviews compress click-through rates and that AI Mode citations correlate with traditional SEO signals — both of which undercut the premium Google can charge for performance media and the urgency Google can manufacture around new optimisation playbooks. Google's incentive is to keep the surface area expanding while keeping the measurement layer thin.
This is not a moral judgement. Google is doing what trillion-dollar incumbents do. But if you're a buyer of SEO or GEO services, this is the part you need to read clearly: the company with the most search market share has the least incentive to make AI search measurable, and the companies with less to lose are building the dashboards.
That's why the tooling is going to come from Cloudflare, Microsoft, and a handful of independent labs and consultancies for at least the next eighteen months. Not from Google. And it's why anyone telling you the existing Search Console reports are sufficient to plan AI search strategy is either not paying attention or selling you the wrong product.
What the new measurement layer actually measures
Strip away the marketing and there are four distinct things now being measured across these tools. They don't overlap as much as you'd think.

Agent-legibility (how a machine reads your site)
This is what the Cloudflare scanner is testing. Robots.txt parses cleanly, sitemap exists and resolves, Link headers point to canonicals, well-known files are in place, structured data validates, content is reachable without JavaScript execution, redirects don't loop, authentication doesn't block crawlers that should be allowed.
It's a technical SEO audit with an agent-shaped wrapper. The checks themselves are not new, and some are decades old: the robots exclusion convention dates from 1994, and RFC 8288, which specifies the Link header, was published in 2017 as a revision of RFC 5988 from 2010. What's new is the framing: not "is this site Google-friendly" but "is this site machine-friendly." It's the same underlying property measured against a broader set of consumers.
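To make the "technical SEO audit with an agent-shaped wrapper" point concrete, here is a minimal Python sketch of two such checks, run offline against text you have already fetched: whether robots.txt lets a given crawler token reach a path (and which sitemaps it declares), and whether a Link header carries a `rel="canonical"` target. This is an illustration under stated assumptions, not Cloudflare's scanner logic. The function names are hypothetical, and the Link parser is deliberately simplified: it does not handle commas inside quoted parameter values or resolve relative URIs as RFC 8288 requires.

```python
import re
from urllib.robotparser import RobotFileParser

# Simplified RFC 8288 matching: a <target> followed by ;name=value parameters.
# Hypothetical helpers for illustration; quoted commas and relative URIs are not handled.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')
_PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*"?([^";]*)"?')


def audit_robots(robots_txt: str, agents: list[str], path: str = "/") -> dict:
    """Parse a robots.txt body offline; report per-agent access to `path` and declared sitemaps."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": {agent: parser.can_fetch(agent, path) for agent in agents},
        "sitemaps": parser.site_maps() or [],  # site_maps() returns None when none are declared
    }


def links_by_rel(header: str) -> dict[str, list[str]]:
    """Group Link header targets by relation type (compared case-insensitively)."""
    out: dict[str, list[str]] = {}
    for target, params in _LINK_VALUE.findall(header):
        for name, value in _PARAM.findall(params):
            if name.lower() == "rel":
                # One rel parameter may carry several space-separated relation types.
                for rel in value.lower().split():
                    out.setdefault(rel, []).append(target)
    return out


def has_canonical(header: str, expected_url: str) -> bool:
    """True if the Link header declares `expected_url` as rel="canonical"."""
    return expected_url in links_by_rel(header).get("canonical", [])
```

With a robots.txt that disallows `GPTBot` but allows every other agent, `audit_robots(text, ["GPTBot", "Bingbot"])` reports the first as blocked and the second as allowed. The agent tokens are only examples; substitute whichever crawlers you actually care about.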
Citation activity (who actually shows up in answers)
This is what Microsoft's AI Performance dashboard is doing. Page-level data on which of your URLs are being cited in Copilot and Bing AI summaries, with frequency. It's the analogue of impression data for the AI surface — for the first time, a publisher can see which of their pages a generative system has actually pulled into an answer, rather than guessing based on prompt sampling.
Google has no equivalent. AI Mode does not expose citation reporting to publishers. AI Overviews data inside Search Console is aggregated in a way that makes per-page analysis difficult to impossible. If you want to know whether your content is being cited inside Google's AI surfaces, you are either running synthetic prompt audits, paying a vendor to do it for you, or guessing.
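Where page-level citation counts can be exported, as with Bing's AI Performance data, the first useful analysis is simple. The sketch below assumes a hypothetical CSV export with `url` and `citations` columns (the real export may be shaped differently, so check what Bing Webmaster Tools actually gives you) and computes each page's share of all citations alongside the sitemap URLs that never appear at all.

```python
import csv
import io


def citation_share(export_csv: str, sitemap_urls: list[str]) -> tuple[list[tuple[str, float]], list[str]]:
    """Return (pages ranked by share of all citations, sitemap URLs with zero citations).

    Assumes a hypothetical export with `url` and `citations` columns.
    """
    counts: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(export_csv)):
        counts[row["url"]] = counts.get(row["url"], 0) + int(row["citations"])
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty export
    ranked = sorted(((url, n / total) for url, n in counts.items()), key=lambda pair: pair[1], reverse=True)
    never_cited = [url for url in sitemap_urls if counts.get(url, 0) == 0]
    return ranked, never_cited
```

The "never cited" list is often the more actionable half: pages you consider important that no generative answer has pulled in are the ones worth auditing first.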
Personal-context influence (signals you cannot see)
This is what iPullRank measured. Gmail content, Photos content, and presumably eventually Drive, Calendar, and the other Google surfaces are now inputs into AI Mode's recommendation set. Forty-six points of lift is not a rounding error. It's the largest measurable visibility shift any independent study has surfaced this year.
And it's structurally unobservable from outside. You cannot scrape someone's inbox. You cannot reverse-engineer which retailers are sending which transactional emails to which users. You cannot audit a recommendation surface whose primary input is private user data. The best you can do is what King's team did: build controlled test accounts, seed signals, run prompts, measure lift. That's expensive, slow, and impractical to repeat at the scale of an in-house marketing team.
This is the surface I wrote about last week, and it's the part of the measurement stack that I think is genuinely unsolvable with current tooling. You can't audit what you can't see.
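At its simplest, the controlled-account approach comes down to comparing two proportions: how often a seeded brand appears in responses from the signal-connected account versus the control. Here is a minimal sketch, assuming you have already counted appearances in each arm. It uses a pooled two-proportion z-test, which is one reasonable choice rather than iPullRank's documented method, and it treats every response as independent, which repeated prompts from the same account may not be, so read the p-value as optimistic.

```python
from math import erfc, sqrt


def visibility_lift(hits_treated: int, n_treated: int, hits_control: int, n_control: int) -> tuple[float, float, float]:
    """Return (lift in percentage points, z statistic, two-sided p-value).

    Pooled two-proportion z-test; assumes independent responses.
    """
    p_t = hits_treated / n_treated
    p_c = hits_control / n_control
    lift_pp = (p_t - p_c) * 100  # percentage points, not percent
    pooled = (hits_treated + hits_control) / (n_treated + n_control)
    se = sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    z = (p_t - p_c) / se if se > 0 else 0.0
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return lift_pp, z, p_value


if __name__ == "__main__":
    # Invented counts for illustration only, not the study's data:
    # brand appears in 60 of 100 treated responses and 20 of 100 control responses.
    lift, z, p = visibility_lift(60, 100, 20, 100)
    print(f"lift={lift:.1f}pp z={z:.2f} p={p:.2g}")
```

Reporting in percentage points avoids the ambiguity of relative lift: moving from 20% to 60% is a 40-point lift but a 200% relative increase.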
Retrieval architecture (what the system is actually doing)
This is what King's piece on agentic RAG is wrestling with. The pipeline is no longer "query → retrieve → generate." It's "query → plan → tool-route → retrieve → read → grade → retrieve again → synthesise." Between five and twenty internal sub-retrievals per user query. The citation set is the output of a multi-stage process, and everything upstream of the final synthesis is a black box.
You cannot measure this from outside the system. You can run prompts, observe which sources got cited, and infer. You can't see the rejected drafts. You can't see the rerankers' scoring. You can't see which sub-queries the planner generated. The only honest position is that the measurement that matters most — why your content was or wasn't selected at each stage — is information the platforms have and publishers don't.
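A toy version of that loop shows why outside measurement stops at the citation list. The sketch below assumes a deliberately crude planner, retriever and grader, all hypothetical and all using word overlap in place of real embeddings and rerankers. It plans sub-queries, retrieves, grades, widens retrieval and tries again when nothing passes, then returns citations. The `Trace` object holds what the platform knows and the publisher doesn't: the sub-queries, how many rounds each took, and which pages were pulled in and then rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Internal state that never leaves the system."""
    sub_queries: list[str] = field(default_factory=list)
    rounds: list[int] = field(default_factory=list)
    rejected: set[str] = field(default_factory=set)


def plan(query: str) -> list[str]:
    # Hypothetical planner: split a compound query into sub-queries.
    return [part.strip() for part in query.split(" and ")]


def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(sub_query: str, corpus: dict[str, str], k: int) -> list[str]:
    # Hypothetical retriever: rank documents by shared terms and keep the top k.
    ranked = sorted(corpus, key=lambda doc_id: overlap(sub_query, corpus[doc_id]), reverse=True)
    return ranked[:k]


def grade(sub_query: str, text: str) -> bool:
    # Hypothetical grader: relevant enough (2+ shared terms) and not "thin" (5+ words).
    return overlap(sub_query, text) >= 2 and len(text.split()) >= 5


def answer(query: str, corpus: dict[str, str], max_rounds: int = 2) -> tuple[list[str], Trace]:
    trace = Trace()
    citations: list[str] = []
    for sub_query in plan(query):
        trace.sub_queries.append(sub_query)
        for round_no in range(1, max_rounds + 1):
            candidates = retrieve(sub_query, corpus, k=2 * round_no)  # widen on each retry
            kept = [d for d in candidates if grade(sub_query, corpus[d])]
            trace.rejected.update(d for d in candidates if d not in kept)
            if kept or round_no == max_rounds:
                trace.rounds.append(round_no)
                citations.extend(d for d in kept if d not in citations)
                break
    # Only `citations` is observable from outside; `trace` stays inside the platform.
    return citations, trace


if __name__ == "__main__":
    corpus = {
        "p1": "running shoes flat feet",
        "p2": "running shoes sale",
        "p3": "a practical guide to choosing footwear with flat feet",
        "p4": "waterproof jackets for hiking in heavy rain",
    }
    cites, trace = answer("running shoes for flat feet and waterproof hiking jackets", corpus)
    print(cites, trace)
```

In this toy run the publisher of `p4` would see only that it was cited. It would not see that the same page was retrieved and rejected for the other sub-query, or that `p3` was found only on a second, wider retrieval round.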
Four pillars. Two are measurable from publisher-side tools (Cloudflare and Bing). Two are partially or fully unmeasurable from outside (personal context and retrieval internals). That's the shape of the actual measurement problem, and it's a useful corrective to the "GEO is just SEO, you don't need new tooling" line.
The agency problem this creates
"You cannot solve a measurement problem with a strategy deck."
Most strategy decks being shown to UK businesses right now are operating on assumptions the data has just contradicted.
Here's how the typical agency conversation goes in mid-2026. The agency tells the client they need a "GEO strategy." The strategy involves content briefs optimised for AI consumption, schema implementation, llms.txt files (despite Google saying don't bother), and a monthly report showing prompt-sampling results across ChatGPT, Perplexity, and AI Mode. The report shows that the client's brand appears in X percent of seeded prompts. Number goes up over time. Invoice gets paid.
That entire model is built on the assumption that prompt sampling is a reasonable proxy for AI search visibility. Six months ago it was the best available proxy. Today, with what we now know about agentic retrieval and Personal Intelligence, it's a proxy that misses most of what matters.
Prompt sampling cannot see:
- The rejected sub-retrievals upstream of the final answer
- The Personal Intelligence layer that's shifting recommendations by tens of percentage points based on private user data
- The agent-legibility failures that prevent your content from being retrieved in the first place
- The citation patterns inside platforms like Bing Copilot where actual publisher-facing data is now available but most agencies aren't pulling it
- The retrieval bias introduced by layout-oriented chunking, which a recent arXiv paper suggests now operates at block level rather than page level in state-of-the-art systems
What prompt sampling can see is whether your brand name appears in a single-shot response to a single phrasing of a query in a logged-out browser session. That's a real data point. It's also a tiny slice of the actual surface.
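Even the one thing prompt sampling does measure carries more uncertainty than most monthly reports admit. A minimal sketch, assuming you already have a list of sampled response texts and a brand name (both hypothetical inputs here): count the responses that mention the brand as a whole word, then put a 95% Wilson score interval around that rate.

```python
import re
from math import sqrt


def mention_rate(responses: list[str], brand: str, z: float = 1.96) -> tuple[float, float, float]:
    """Return (observed mention rate, Wilson lower bound, Wilson upper bound).

    z=1.96 gives an approximate 95% interval.
    """
    if not responses:
        raise ValueError("need at least one sampled response")
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)  # whole word, any case
    n = len(responses)
    hits = sum(1 for text in responses if pattern.search(text))
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For 4 mentions in 10 samples, the interval runs from roughly 17% to 69%. At that sample size, a move from 40% to 50% between reports says almost nothing on its own.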
If you're paying an agency that's reporting on AI search performance and the report consists entirely of prompt-sampling output, you're paying for a thermometer that measures one square inch of a room. The room might be on fire elsewhere. You can't tell.
What the contrast between Google and Microsoft is signalling
I want to come back to the Google–Bing contrast because I think it's the cleanest signal in the market right now. I wrote about it last week, and it has sharpened considerably since.
Google's posture: GEO is SEO. Don't bother with llms.txt. Don't chunk your content. Don't restructure for AI. Trust the ranking systems. Here's a billion-MAU AI Mode and an information agent product: optimise for them with the same playbook you've been using since 2015.
Bing's posture, in Jordi Ribas's words: "a new optimisation discipline called Generative Engine Optimization is emerging in response." Grounding requirements are explicit. Citation activity is reported. Krishna Madhavan previewed unannounced AI Search reporting features for Bing Webmaster Tools at SEO Week. Microsoft is publishing posts explaining how its index is changing.
The platform with more to lose is denying the discipline exists. The platform with more to gain is naming it and shipping the dashboards.
If you've been in this industry long enough you'll recognise the pattern. It's the same shape as Google's posture on link-building in 2012, when they publicly insisted link-building wasn't really a thing while internally weighting links as a primary ranking factor. It's the same shape as Google's posture on click data in 2019, when they publicly insisted click data wasn't used for rankings, until the antitrust trial revealed Navboost. The public guidance is downstream of the strategic interest, not the engineering reality.
I don't think Google is lying in any actionable sense. I think Google is doing what dominant platforms do, which is shape the public discourse in a direction that protects their margin and their leverage. The honest reading of the same fortnight's news is that AI search is sufficiently new and sufficiently different that the old measurement is incomplete, and that Microsoft has decided the way to gain ground is to be the platform that admits this and gives publishers tools to navigate it.
Where reasonable people might disagree
Let me steelman the other position because it deserves more than dismissal.
The strongest version of "GEO is just SEO" goes like this: every empirical study of AI citations to date — including the recent Princeton work, the SEMrush analyses, and several agency-published studies — has found that traditional SEO signals correlate strongly with AI citation rates. Domain authority matters. Quality backlinks matter. Useful content matters. Technical hygiene matters. The signals AI systems use to select citations are heavily overlapping with the signals Google uses to rank pages, because the AI systems are largely built on top of search indices and use related quality models.
If that's true — and the evidence is reasonably strong that it largely is — then a publisher doing solid SEO is doing most of what they need to do to be cited in AI surfaces. The marginal value of GEO-specific tactics is small. Most of the work is the same work.
I think that's directionally correct, and I've made the argument myself more than once. The fundamentals haven't changed. The honest GEO playbook is the SEO playbook with structured data and earned media on top.
But the steelman has a gap, and the gap is exactly what this fortnight's news has surfaced. Even if the *inputs* to citation selection overlap heavily with traditional SEO signals, the *measurement* of citation outcomes does not overlap with traditional SEO measurement at all. You cannot use Google Search Console to understand which of your pages are being cited in Bing Copilot. You cannot use rank tracking to understand whether Gmail signals are shifting your visibility in AI Mode. You cannot use prompt sampling to understand the agentic retrieval pipeline that's deciding whether to include you.
So the position I'd actually hold is: the optimisation work is mostly the same, but the measurement work is genuinely new, and the new measurement is being built by people who aren't traditional SEO vendors. Confusing those two things is the mistake. If your agency is selling you "new optimisation", it probably isn't that new; if it's telling you "the existing reporting is fine", it is almost certainly wrong.
What this means for anyone spending money on this
I don't want to write a tactics list because the situation is moving too quickly and lists go stale within a quarter. But there are three positions that are durable.
Treat the measurement gap as the budget question, not the strategy question. The interesting decision isn't whether to do GEO. The interesting decision is how much of your reporting budget should go to tools and audits that can see into surfaces your existing stack can't. If your agency report consists of Search Console data, GA4 data, and prompt-sampling data, you have visibility into roughly half of what you need to make decisions about. The other half is being built right now by Cloudflare, Microsoft, and a handful of independent researchers. Some of it you can buy. Some of it you can run yourself. Some of it doesn't exist yet and won't for a while.
Use the tools that are publisher-friendly while they're publisher-friendly. Bing Webmaster Tools now gives you per-page citation data for Copilot and Bing's AI summaries. That data has no equivalent inside Google's products. Use it. Run your Cloudflare scan, fix the legitimate failures, ignore the categories that don't apply to your site. Build the habit now of looking at multiple platforms' measurement views, because the days of treating Google reporting as the universal reporting layer are ending.
Be sceptical of anyone selling certainty in a market where the platforms themselves are publishing contradictory guidance. I wrote a few weeks back about the confidence gradient between AI builders and AI consultants — the builders are saying "we don't fully know how this works" and the consultants are selling guaranteed outcomes. That gap has widened, not narrowed, this fortnight. If someone is offering you a GEO package with guaranteed citation lift, ask them which of the four measurement pillars above they can actually observe. If the answer is fewer than two, the guarantee isn't worth what they're charging for it.
The honest limits of this argument
I should be clear about what this piece isn't claiming.
I'm not saying Google is wrong about everything. The technical guidance Google publishes is largely accurate at the level of "what makes content rank." I'm saying the strategic framing — that no new measurement is needed — is the part to be sceptical of, because it's the part where Google has the strongest commercial incentive to shape the discourse.
I'm not saying Microsoft is acting from pure motives. Microsoft is shipping publisher-friendly tooling because Microsoft has four percent of the search market and a real interest in publishers paying attention to Bing. The tooling is genuinely useful and the motives are genuinely commercial, and both of those things are true at once.
I'm not saying GEO is a separate discipline from SEO. The optimisation work overlaps heavily. The measurement work doesn't, and that's the distinction that matters operationally.
And I'm not claiming Personal Intelligence is going to remain a primary ranking signal at the scale King's study suggested. It's an opt-in feature. Most users won't connect Gmail. The signal might be diluted across the broader user base. What I am claiming is that the surface exists, the influence is measurable, and the publisher visibility into it is zero — which is itself a structural fact regardless of how widely the feature is adopted.
The thing I'm holding firm on is that the measurement layer of AI search is being built right now, by people who are not the dominant search platform, and that publishers who don't notice this are going to spend the next two years optimising for what they can see while the meaningful action happens in places they can't.
The shape of the next eighteen months
The trajectory from here is reasonably predictable, even if the timing isn't.
Cloudflare will extend the Agent Readiness Score into a paid product. The free scanner is a wedge. The follow-on products (workers that auto-fix the failing checks, dashboards that monitor scores over time, alerts when your score drops) will sit inside Cloudflare's existing customer base. That's not a bad outcome for publishers. It's the same model as Cloudflare's free Universal SSL, which helped make HTTPS the default across much of the web.
Microsoft will keep shipping publisher-facing AI tooling because it's the cheapest way to gain ground against Google. Expect Copilot citation data to get richer, expect more granularity in AI Performance reports, expect Microsoft to talk publicly about GEO in ways that frame Bing as the platform that respects publishers.
Google will continue doing what Google does. The product surface area will keep expanding. The measurement layer for publishers will lag, because giving publishers more visibility into AI surfaces would surface uncomfortable truths about click-through rates and citation patterns. Eventually — probably eighteen to twenty-four months out — antitrust pressure or competitive pressure will force more disclosure, and at that point Google will ship the dashboards. By then the independent measurement market will already exist.
The agencies and consultancies that thrive in this period are going to be the ones that stop selling "strategy decks" and start selling "we will look at the things your current reporting can't show you." That's a different business model. It requires real research, real testing, real engineering. It does not scale by hiring junior account managers. It scales by being right about what to measure, which is a slower and less profitable growth curve than the one most agencies are on.
The agencies that don't thrive will keep selling prompt-sampling reports, telling clients their brand mentions are trending up, and quietly losing budget to in-house teams that figure out the rest themselves.
I know which side I want to be on. The interesting work, the work that's going to matter in three years, is the measurement work — not the optimisation work. And the measurement work is being defined right now, this month, by people who are not currently in the room when most UK businesses are choosing their next SEO agency.
That's the gap. That's the opportunity. And that's why this fortnight, more than any other in 2026 so far, has clarified what the next two years are actually going to look like.
Ready to improve your visibility in AI search?
If you're an SME in Surrey or London and you want more qualified leads from search — including the growing AI answer layer — let's talk.