The tool registry is the new search index
LLM tool registries are an unregulated ad market. Puffery wins, disclosures fail, and almost nobody in SEO is talking about it yet.
A paper went up on arXiv this week that should have made more noise than it did. It's called *Agent-Facing Information Design in LLM Tool Registries*, and the finding is short enough to fit on a Post-it: in the registries that LLM agents use to choose which tools to call, subjective superlatives in the free-text description capture 100% of the optimisation effect. Fabricated claims add nothing on top. And system-prompt warnings — the "disclosure" mechanism the industry has been quietly assuming would handle this — produce zero measurable effect in four out of five models tested.
Translation: the layer that decides which tools an AI agent reaches for, on your behalf, is currently an unregulated advertising market. The puffery works. The disclosures don't. And there is no equivalent of a viewability standard, no quality score, no outcome audit. Nothing.
If you've been paying attention to where search is actually going — not the AI Overviews argument that's been chewed to bone, but the agentic layer underneath it — this is the more important story. The registry is becoming the index. And we're rebuilding 1998 with none of the lessons.
What the registry actually is
When an LLM agent decides to do something — book a flight, query a database, fetch a product list, summarise a document — it doesn't have those capabilities baked in. It selects from a registry of available tools. Each tool has a name, a description written in natural language by the provider, and a schema describing inputs and outputs. The agent reads the descriptions, picks one, calls it.
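To make that concrete, here is a minimal sketch of what a registry entry might look like from the agent's side. The `ToolEntry` class, the `render_registry` helper, the tool names and the schemas are all hypothetical, made up for illustration; real registries differ in field names and format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolEntry:
    """One hypothetical registry entry: name, free-text description, input schema."""
    name: str
    description: str  # free text, written by the tool provider
    input_schema: dict = field(default_factory=dict)


def render_registry(tools):
    """Serialise the entries roughly as they might be presented to a model."""
    return json.dumps([asdict(t) for t in tools], indent=2)


# Assumed example schema, in JSON Schema style.
flight_schema = {
    "type": "object",
    "properties": {
        "origin": {"type": "string"},
        "destination": {"type": "string"},
        "date": {"type": "string"},
    },
    "required": ["origin", "destination", "date"],
}

registry = [
    ToolEntry("flights_a", "Searches flights by origin, destination and date.", flight_schema),
    ToolEntry("flights_b", "The fastest, most reliable flight search available anywhere.", flight_schema),
]

print(render_registry(registry))
```

Both entries accept identical inputs. The only thing that distinguishes them to the agent is the wording of `description`, and nothing in the structure constrains what a provider writes there.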
That description field is the SERP of the agentic web. It's the only surface the agent has to evaluate which tool is most relevant to the task it's trying to complete. And — this is the part that should be making people sit up — it's written entirely by the tool provider, with no normalisation, no review, and no audit trail tying the description to actual tool behaviour.
The arXiv authors ran 17,700 trials across five LLMs and ten domains. They found that "legal puffery" (subjective superlatives, benefit framing, the kind of unfalsifiable boasting that advertising rules tolerate) moved selection by +0.35 on their scoring metric. Fabricated claims (statements that were factually untrue) added zero incremental bias on top of the puffery. The model wasn't being fooled by lies. It was being seduced by adjectives.
And the proposed fix, system-prompt warnings telling the model to ignore marketing language, did nothing for four of the five LLMs tested. For those models, the bias appears to sit somewhere a prompt-level instruction cannot reach.
Why this is the SEO conversation
The SEO industry has spent two years arguing about AI Overviews, citation share, and whether the click is dead. That argument is real, but it's an argument about the *retrieval layer* — what AI systems pull from the open web to ground an answer. The registry conversation is about the *action layer* — what AI systems do once a user wants more than an answer. Book the table. File the expense. Pull the report. Buy the shoes.
The action layer is where commercial value is going. Mike King has been writing about agentic RAG for months — the shift from single-shot retrieval to multi-stage pipelines where an agent plans, routes between tools, retrieves, reads, retrieves again, and reflects. I wrote about why Mike King's framing is the one the SEO industry should be paying attention to, and this is the missing piece of that argument. The retrieval layer is being scrutinised. The action layer is the unsupervised second half.
If your business depends on being reachable by an AI agent — and increasingly that includes anyone with a booking system, a product catalogue, a quote form, a checkout — your visibility is not just a function of whether your site is indexed and machine-readable. It's a function of whether your tool description outranks the other tool descriptions inside whatever registry the agent is consulting. And right now, the way you win that race is by writing better marketing copy in your tool description than your competitors wrote in theirs.
The registry is the index, the description is the title tag, and the model is the algorithm. Except nobody is even pretending there's a quality team behind it.
The Google parallel is exact
The thing that makes this worth writing about is the historical rhyme. In 1998 there was no quality signal in search. The first search engines ranked on keyword density and meta tags because that's what they had. Within five years the industry that had grown up around gaming those signals had become a black-hat economy of doorway pages, link farms, and exact-match domains. Google's whole pitch was that PageRank introduced an external quality signal, the link graph, that couldn't be gamed by editing your own page.
Twenty-eight years on, tool registries are sitting in the 1998 position. There is no PageRank for tools. There is no external quality signal. There is no behavioural audit that says *"this tool's description claims it can do X, but in production it succeeds at X only 60% of the time."* The selection mechanism is the description, and the description is provider-controlled.
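A behavioural audit of that kind would not need to be complicated. The sketch below is a rough illustration, assuming a registry could see a log of calls and their outcomes; the `observed_success_rates` function, the log format and the sample numbers are all invented for the example, not taken from the paper or from any real registry.

```python
from collections import defaultdict


def observed_success_rates(call_log):
    """Compute the per-tool success rate from (tool_name, succeeded) pairs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for tool_name, succeeded in call_log:
        totals[tool_name] += 1
        if succeeded:
            successes[tool_name] += 1
    return {name: successes[name] / totals[name] for name in totals}


# Invented sample data: flights_b succeeds 6 times in 10, flights_a 9 times in 10.
call_log = (
    [("flights_b", True)] * 6
    + [("flights_b", False)] * 4
    + [("flights_a", True)] * 9
    + [("flights_a", False)] * 1
)

rates = observed_success_rates(call_log)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.0%} of calls succeeded")
```

A number like this is an external signal in the PageRank sense: the provider cannot change it by editing the description.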
The arXiv authors propose a concrete fix: separate *selection-facing descriptions* (structured, registry-controlled, normalised) from *marketing-facing descriptions* (provider-authored, shown only after the agent has selected the tool). They also propose an Agent Attention Quality Score that distinguishes capability from copywriting. This is a sensible structural intervention. It is also exactly the kind of intervention that nobody currently has the authority to impose, because there is no central registry authority. OpenAI has one. Anthropic has one. Google will have one. Microsoft has one. Each is making different design choices, and none of them are normalising descriptions in the way the paper recommends.
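The separation the paper recommends can be sketched as two views over one record. This is only an illustration of the idea as summarised above: the `RegistryRecord` class, its field names and the capability labels are assumptions of mine, not the paper's schema or any platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryRecord:
    """Hypothetical record holding both kinds of description."""
    name: str
    capabilities: tuple  # structured labels, assumed to be assigned by the registry
    marketing_copy: str  # free text, written by the provider


def selection_view(record):
    """What the agent sees while choosing: structured fields only."""
    return {"name": record.name, "capabilities": sorted(record.capabilities)}


def post_selection_view(record):
    """What is shown only after the tool has been selected."""
    return {"name": record.name, "marketing_copy": record.marketing_copy}


record = RegistryRecord(
    name="flights_b",
    capabilities=("flight_search", "fare_lookup"),
    marketing_copy="The fastest, most reliable flight search available anywhere.",
)

print(selection_view(record))
print(post_selection_view(record))
```

The design choice is that the adjectives never reach the model at the moment of choice. Whoever controls `capabilities` controls selection, which is why the missing central authority matters.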
So the most likely outcome — given how every previous version of this story has played out — is that the registries become economic battlegrounds first and get regulated later, after the damage compounds. The window where you can win by writing punchier tool descriptions than your competitors is open right now. And it will close, but it'll close after a lot of value has been redistributed in ways that have nothing to do with which tool is actually better.
What "machine-first" misses
Slobodan Manic published a piece in Search Engine Journal this week about Machine-First Architecture — a four-pillar framework covering Identity, Structure, Content, and Interaction. It's a careful piece and the framework is sound for what it covers. But it stops at the edge of what I'd call the *passive machine-readable web*: making sure agents can identify your brand, parse your structure, evaluate your content, and complete a transaction on your site.

What it doesn't cover — and what almost nothing in the GEO conversation covers — is the *active* layer where agents select tools to call before they ever reach your site. The agent doesn't navigate to your domain and read your schema. It looks at a registry, picks a tool, and the tool returns structured data that the agent uses to construct an answer. By the time your website's machine-readability is being evaluated, the selection decision has already been made upstream.
This is the part of the agentic journey that most consultants don't have language for yet. I've written before about machine-readability being the underrated structural problem, and that argument still stands. But the registry layer is one step upstream of even that. You can have flawless schema, perfect entity resolution, a clean Knowledge Graph presence, and still lose every agentic transaction because the agent never selected the tool that would have pointed at your data.
This is not a content problem. It's not a schema problem. It's a *distribution* problem — and it's playing out in a layer that almost no business currently has visibility into.
What this means for actual businesses
The honest answer is: not much, yet. For most small and mid-sized businesses in the UK, the agentic layer is still mostly theoretical. Your customers are not yet sending agents to book your services, fetch your prices, or initiate refunds on their behalf. The volume isn't there.
But the trajectory is unambiguous. ChatGPT's agentic features, Claude's Computer Use, Google's Gemini agents, Microsoft's Copilot agents — all of them are converging on the same model, where users issue intent and agents resolve it across tools. When that volume crosses some threshold (and the threshold will be different for different sectors — travel, finance, and e-commerce will get there first), the businesses that haven't thought about tool-registry presence will discover they're missing from a search surface they didn't know existed.
A few things that are worth thinking about now, even if you can't act on most of them yet:
Your API documentation is becoming customer-facing. Not in the sense that humans will read it, but in the sense that LLMs will. The phrasing of your endpoint descriptions, the clarity of your parameter naming, the quality of your example responses — these will increasingly determine whether agents reach for your tool over a competitor's. The technical writing function inside businesses is about to matter in ways it hasn't since the early 2010s.
Schema is necessary but no longer sufficient. You still need it. It's still ground-floor machine-readability. But the registry layer sits above schema, and a beautifully marked-up site that has no presence in the major agentic registries is a tree falling in an empty forest for any query that's resolved agentically.
The measurement problem just got harder. I wrote a few weeks ago about how the AI search measurement layer is being built by outsiders — Cloudflare, Microsoft, independent researchers — because Google has no incentive to provide one. The registry layer is worse. There is no Cloudflare for tool-registry visibility. There is no log-file analogue for "your tool was considered but rejected by the agent." The selection happens inside the model, and the model doesn't emit a SERP.
The limits of this argument
A few honest caveats, because the temptation with a piece like this is to overweight the trajectory.
First: the arXiv paper is one study. The result is striking and the methodology is solid (17,700 trials is not nothing), but it's testing a specific class of registry interactions on five LLMs, and the field will move fast. The "puffery works, lies add nothing" finding might not generalise as registries become more structured. It probably will, but it's one paper.
Second: the agentic volume is still mostly future tense. If you're a service business with 30 enquiries a month from organic search, the registry conversation is a 2027 problem, not a 2026 one. Don't restructure your priorities around it yet.
Third: the structural fixes the paper proposes — separating selection-facing from marketing-facing descriptions, introducing an Agent Attention Quality Score — are sensible and will probably be implemented in some form by the major platforms eventually. The window for "puffery wins" is real but it's also finite. If you're tempted to build a business model around it, don't. The history of SEO is the history of arbitrage windows closing on people who treated them as permanent.
What I'm confident about is the structural direction: the action layer of AI is going to look more and more like an unregulated advertising market until something forces it not to. The retrieval-layer conversation about citations, AI Overviews, and zero-click is the *visible* part of the story. The action-layer conversation about tool registries, agent selection, and structured capability description is the part that's going to redistribute commercial value once agentic volume picks up.
Closing
There's a pattern in how new layers of the web get built. First somebody builds the layer. Then operators write whatever they want into it because there's no enforcement. Then the optimisers arrive and start gaming whatever signal the layer happens to use. Then the platforms scramble to introduce quality signals after the gaming has already distorted the layer. Then regulators show up, late and underpowered, and try to retrofit accountability onto a market that's already been shaped.
We are at step two of that cycle for LLM tool registries. The arXiv paper is the first credible attempt to describe what step three is going to look like. Step four is coming whether the platforms invite it or not.
The businesses that will do well in the agentic layer are not the ones with the punchiest tool descriptions. They're the ones who understand that the registry is a distribution surface, not just a developer-facing convenience, and who start thinking about presence in it the way they thought about presence in Google in 2004. Quietly. Early. Before everyone else figures out that's where the traffic was the whole time.