The Agent Readiness Score is useful. The number is a trap.

Cloudflare's Agent Readiness Score measures real things. The composite number is structurally misleading — and the industry is about to misuse it.

Cloudflare shipped something genuinely interesting last week. They built a public scanner at isitagentready.com that tests any website against sixteen checks across five categories and returns a composite score out of 100 with a level designation — Bot-Unaware, Bot-Aware, Agent-Ready, Agent-Native. They exposed it via API. They wired it into Cloudflare Radar. They turned it into an MCP endpoint so any agent can call the scanner as a tool and decide how to interact with a site based on the result.

That last part deserves more attention than it's getting. The measurement layer and the thing being measured now sit on the same surface. Agents can audit you before they visit you. That is a structural change in how website legibility works, and Cloudflare has shipped the first public infrastructure that makes it real.

So I want to be clear about something before I start picking holes: the scanner is a useful tool. It checks real things. The five categories — discoverability, machine-readability, agent access, trust, performance — are coherent. The individual checks are mostly sensible. If you've never thought about agent legibility before, running the scanner on your site will surface things worth fixing.

The problem is the number. The composite score is a single integer between 0 and 100 sitting at the top of a report, and that integer is going to do enormous damage to how this conversation evolves over the next year. Because the number is the thing people will screenshot. The number is what they'll put in pitch decks. The number is what agencies will quote when selling work. And the number, by itself, is structurally misleading in ways that matter.

Let me explain what I mean.

The score collapses across categories that don't apply to you

Mike King's iPullRank piece on Cloudflare's tool and Slobodan Manic's writeup for Search Engine Journal both flag the same issue, though Manic puts it most plainly. He ran the scanner on his own site and scored 33 out of 100. The score collapsed because content-only websites genuinely don't need most of what the scanner checks for.

This is the core problem. The scanner has three modes — All Checks, Content Site, API/Application — but the composite number is still presented as if it's a universal measure of agent-readiness. It isn't. It's a measure of how much of Cloudflare's checklist your site implements, and the checklist is biased toward sites that should expose APIs, machine-readable endpoints, and agent-accessible action surfaces.

If you run a service business with twenty pages of content, you do not need an MCP endpoint. You do not need an Agents.json manifest. You do not need to expose your booking system as a machine-callable API. None of that would make you more discoverable to ChatGPT, Perplexity, or Google's AI Mode tomorrow. Your content needs to be crawlable, structured, fast, and citation-worthy. That's the actual agent-readiness story for the vast majority of UK businesses I work with.

But the scanner will score you 30 or 40 out of 100, and an agency will use that score to sell you twelve months of work you don't need.

The number is a sales tool wrapped in a measurement tool.

I don't think Cloudflare designed it cynically. I think they built it for the world they want to exist — a world where every commercial website is an agent-callable surface, exposing actions through standard manifests, ready to transact with autonomous buyers. That world might arrive. It might even arrive faster than I expect. But it's not the world your dentist, your accountant, or your regional B2B services business lives in this quarter, and selling them on a 30/100 score as if it represents an urgent infrastructure deficit is the same playbook the GEO tools industry has been running for eighteen months.

What the scanner actually gets right

I want to be fair. The scanner checks things that do matter, even for content-only sites.

The discoverability category is straightforward and useful. Robots.txt presence, sitemap.xml availability, Link headers pointing to canonical and alternate resources. None of this is new. All of it still matters. If your site fails these checks, agents — and traditional crawlers — will struggle to enumerate your pages. Fix it.
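
If you want to sanity-check the robots.txt part of that floor yourself, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs, and the ExampleBot name are all made up for illustration, and GPTBot appears only as an example of a crawler token. This is a local check of your own rules under those assumptions, not a description of how Cloudflare's scanner runs its tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether the given robots.txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


permissions = crawl_permissions(
    ROBOTS_TXT, ["GPTBot", "ExampleBot"], "https://example.com/services/"
)
sitemaps = declared_sitemaps(ROBOTS_TXT)
```

In this made-up file, the blanket Disallow for one crawler is the kind of thing worth catching: it blocks that agent from every page, while everything outside /admin/ stays open to the rest.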

The machine-readability checks lean into structured data and semantic HTML, which I've been writing about as ground-floor agent legibility for months. Schema markup, clear heading hierarchy, sensible information architecture. These are SEO fundamentals with a new label, which is fine. The label doesn't change the work.

The performance category is the one most underdiscussed in the wider GEO conversation and the one Cloudflare quietly gets most right. Agents have time budgets. Slow sites get abandoned mid-retrieval. If a multi-hop agentic system is doing five to twenty sub-retrievals to answer a query — which it is, per Mike King's recent piece on agentic RAG — your Time to First Byte matters more than it ever did for human users, because the agent doesn't wait politely.
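
The arithmetic behind that is worth seeing once. The sketch below is a back-of-envelope calculation in Python, assuming the worst case where every sub-retrieval hits your site and runs one after another. The 10-second budget and the TTFB figures are hypothetical numbers I picked for illustration; real agents may fetch in parallel or spread requests across many sites.

```python
def sequential_wait_ms(ttfb_ms: int, sub_retrievals: int) -> int:
    """Total time spent waiting for first bytes if fetches run one at a time."""
    return ttfb_ms * sub_retrievals


def fits_budget(ttfb_ms: int, sub_retrievals: int, budget_ms: int) -> bool:
    """True if the cumulative first-byte wait stays within the agent's time budget."""
    return sequential_wait_ms(ttfb_ms, sub_retrievals) <= budget_ms


# Hypothetical time budget: 10 seconds for the whole retrieval phase.
BUDGET_MS = 10_000

slow_site_wait = sequential_wait_ms(ttfb_ms=800, sub_retrievals=20)
fast_site_wait = sequential_wait_ms(ttfb_ms=200, sub_retrievals=20)
slow_site_ok = fits_budget(800, 20, BUDGET_MS)
fast_site_ok = fits_budget(200, 20, BUDGET_MS)
```

Under these assumptions, an 800 ms TTFB across twenty hops costs 16,000 ms of waiting and blows the budget, while a 200 ms TTFB costs 4,000 ms and fits. A delay a human would barely notice on one page load compounds across hops.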

These are real things. The scanner surfaces them. That's useful.

Where it goes wrong

The structural misleadingness comes from three places, and they compound.

First, the composite score weights agent-access infrastructure as if it's universally relevant. Agents.json, MCP endpoints, machine-callable action surfaces — these are real specifications, but they apply to a narrow class of sites that genuinely need to expose programmatic actions. A content publisher doesn't. A consultancy doesn't. A local services business doesn't. Yet failing those checks drags your score down by 20 or 30 points and lands you in "Bot-Aware" purgatory.

Second, the level designations — Bot-Unaware, Bot-Aware, Agent-Ready, Agent-Native — imply a maturity ladder where the obvious goal is to climb it. But the ladder is only meaningful if "Agent-Native" is actually the right state for your business. For most sites, it isn't. The ladder framing creates an artificial urgency that doesn't map to commercial reality.

Third — and this is the one that'll cause the most damage — the scanner gives agencies a single number to point at. I've watched this movie before. Core Web Vitals had the same problem when Google launched them. The metric was useful. The composite score was usable. And then the SEO industry spent eighteen months selling Core Web Vitals audits to businesses that needed almost nothing else fixed, because the score was concrete and quotable and easy to put on an invoice.

The score isn't a strategy. It's an artefact.

The same pattern is coming for Agent Readiness Scores. I'd bet on agency proposals citing them within sixty days.

The deeper problem: agent-readiness is a moving target

There's a more fundamental issue here, and Mike King's recent work on agentic RAG points at it directly. The retrieve-once-then-generate model that defined the first wave of AI search is already obsolete. Modern agentic systems plan, route, retrieve, evaluate, re-retrieve, and reflect. They do five to twenty sub-retrievals per query. They grade their own intermediate results. They decide what to fetch next based on what they've already read.

A static scanner that runs sixteen checks against a homepage and a handful of well-known paths cannot meaningfully measure how your site performs inside that kind of multi-hop pipeline. The scanner is a snapshot of legibility against a checklist. Agent-readiness in practice is dynamic — it's about whether your content gets selected at the chunk level, whether it survives the agent's evaluation step, whether it gets cited in the synthesis stage, whether it gets re-fetched on the next hop.

None of those things are measurable from a single static scan. Cloudflare can't measure them. Nobody can yet — this is the measurement problem I keep coming back to. Citation counts aren't traffic. Static scores aren't selection probability. We're operating partially blind on AI search performance, and a confident-looking integer doesn't change that.

What the scanner can tell you is whether you've cleared the floor. Robots.txt parses. Sitemap exists. Schema validates. Site loads quickly. That floor is necessary. It is nowhere near sufficient.

What the score doesn't capture

The things that actually drive AI citation are mostly invisible to a sixteen-check scan.

Brand recognition. AI systems cite brands they've heard of, and they've heard of brands that get mentioned across the open web — in Reddit threads, YouTube transcripts, news articles, podcast notes, industry analyses. No scanner measures whether ChatGPT has read about you in 4,000 places. But that's the signal that determines whether you get cited when a user asks "who's a good [X] in [Y]?"

Editorial floor. The Graphite study I've seen referenced this week — roughly half of all new web content is now AI-generated, and the share has plateaued for over a year — points at a quality-dilution problem the scanner can't see. You can have a perfect 100/100 readiness score and still produce content that AI systems quietly deprioritise because it's slop. Agent-readiness as a structural property is a separate dimension from content quality, and you need both.

Citation-worthiness. The thing that makes content get pulled into an AI answer isn't just that it's machine-readable. It's that it contains a defensible, attributable, specific claim worth quoting. A scanner cannot evaluate that. Only editorial judgement can.

Personal context. The recent iPullRank study on Google Personal Intelligence found that Gmail signals materially shift which brands appear in AI Mode recommendations — a 46-point lift in brand appearance for seeded signals. That's a ranking factor nobody can audit, optimise, or scan for. The scanner doesn't see it. No scanner can.

The Agent Readiness Score measures a real but narrow slice of what matters. The rest of the surface — brand, citations, editorial quality, personal context — is structurally invisible to it.

The honest limits of my position

I want to be careful here, because I'm pushing back against a tool I genuinely think is useful.

I'm not saying ignore the scanner. Run it. Look at the checks. Fix the obvious things. If your robots.txt is broken, fix it. If your sitemap is missing, add one. If your Time to First Byte is awful, address it. These are good things to do regardless of the agentic future.

I'm also not saying that agent-callable infrastructure will never matter. For some sites — e-commerce, booking platforms, SaaS with public APIs, marketplaces — exposing machine-readable action surfaces probably will become commercially important in the next two to three years. Cloudflare is building for that future and they're not wrong that it's coming.

What I am saying is this: the composite score is going to be misused, the level designations will create artificial urgency, and agencies will quote 30/100 readiness scores at clients who genuinely don't need 80% of the infrastructure the scanner checks for. The number will outrun the nuance, the way every composite metric in this industry does.

If you're running an SEO programme right now, treat the scanner as a diagnostic for specific technical issues, not as a strategy. Look at which checks failed and ask whether they're relevant to your business model. Most of them probably are. Some of them probably aren't. The composite score is the least useful number on the page — what matters is the individual signals underneath it.
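
One way to put that into practice is to skip the composite entirely and triage the individual checks. The Python sketch below is a hypothetical illustration: the check names, the pass/fail results, and the relevant flag are all invented, and the relevant flag stands for your own judgement about your business model rather than anything the scanner reports. It does not reproduce Cloudflare's scoring or weighting, which I am not assuming anything about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool
    relevant: bool  # your own call: does this check apply to your business model?


def triage(checks: list[Check]) -> tuple[list[str], list[str]]:
    """Split failed checks into ones worth fixing and ones to set aside for now."""
    fix_now = [c.name for c in checks if not c.passed and c.relevant]
    set_aside = [c.name for c in checks if not c.passed and not c.relevant]
    return fix_now, set_aside


# Hypothetical results for a twenty-page service business.
results = [
    Check("robots.txt", passed=True, relevant=True),
    Check("sitemap.xml", passed=False, relevant=True),
    Check("structured data", passed=True, relevant=True),
    Check("MCP endpoint", passed=False, relevant=False),
    Check("agents.json manifest", passed=False, relevant=False),
]

fix_now, set_aside = triage(results)
```

The output is a to-do list, not a number. In this invented example the only item on it is the missing sitemap, and the two agent-access failures stay parked until the business actually needs to expose actions to agents.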

The deeper story this week isn't Cloudflare's scanner. It's that the agent-readiness conversation has crossed from concept to measurable infrastructure, and the moment something becomes measurable, the industry's instinct is to optimise for the measurement rather than the underlying property. We did it with PageRank. We did it with Core Web Vitals. We're about to do it with Agent Readiness Scores.

The underlying property — being legible, useful, and citation-worthy to a new class of multi-hop agentic retrieval systems — is genuinely important. Most UK businesses I work with should be thinking about it. Almost none of them should be measuring it with a single integer.

The number is a trap. The work isn't.
