Rand Fishkin tested AI rank tracking. The answer matters.
Fishkin's new research with Gumshoe shows AI rank tracking is mostly noise — but visibility share is real. The distinction is where the money goes.
Rand Fishkin published research yesterday with Gumshoe testing whether AI visibility tracking is actually possible, and the answer is more useful than either camp wanted it to be. Rankings — the prompt-by-prompt, position-one-or-position-three flavour that vendors are quietly pricing like rank tracking circa 2014 — are essentially nonsense. Visibility, measured across many prompts and many runs, is surprisingly useful.
That distinction is the whole story. And almost nobody pitching AI tracking software to UK businesses right now is making it.
I've been watching this corner of the industry get loud, expensive, and structurally dishonest for about eighteen months. Fishkin's work is the first serious public attempt to ask whether the thing being sold is the thing that exists. The short version: it isn't, but something adjacent to it is real, and the difference matters enormously for how you spend money.
Rankings are the wrong unit
Ask ChatGPT the same question twice and you'll often get different brands, different ordering, and sometimes a different list entirely. That's not a bug in the tracking tools. That's how the systems work. Sampling temperature, retrieval variance, and session memory all make the output non-deterministic, which means "position three for [query]" is a sentence that doesn't really refer to anything stable.
Vendors selling you a dashboard that says you rank fourth in ChatGPT for a given prompt are selling you a screenshot. Run it again tomorrow and the screenshot changes. Run it five times in a row and you'll get five answers. The number on the dashboard is a sample, not a measurement, and most of these tools aren't being honest about how big the sampling problem is.
This is the part of Fishkin's research that confirms what anyone who's used these models for ten minutes already suspected. The tracking layer is being priced like rank tracking and sold like rank tracking, and the underlying reality doesn't support that framing.
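To make the sampling problem concrete, here is a minimal sketch that tallies where one brand lands across repeated runs of the same prompt. The brand names, the run data, and the `position_counts` helper are all invented for illustration; nothing here comes from Fishkin's dataset or any vendor's tool.

```python
from collections import Counter

# Hypothetical data: the ordered brand lists returned by five runs of the
# same prompt. Brand names and orderings are invented for illustration.
runs = [
    ["Acme", "Birch", "Corvid"],
    ["Birch", "Acme", "Dune"],
    ["Corvid", "Acme", "Birch"],
    ["Acme", "Dune", "Birch"],
    ["Birch", "Corvid", "Elm"],
]

def position_counts(runs, brand):
    """Count how often `brand` lands in each 1-based position; None means absent."""
    counts = Counter()
    for ordered in runs:
        pos = ordered.index(brand) + 1 if brand in ordered else None
        counts[pos] += 1
    return dict(counts)

acme = position_counts(runs, "Acme")
print(acme)
```

With this toy data the brand appears first twice, second twice, and not at all once, so any single "position" pulled from one run would be an arbitrary pick from that spread.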
Visibility, however, is real
Here's where it gets more interesting. If you stop trying to track rank for individual prompts and instead measure share of mention across hundreds or thousands of prompts, repeated multiple times, the noise averages out. You get something that looks a lot like a brand visibility metric — what percentage of relevant queries mention you at all, how often you're mentioned alongside specific competitors, how the surface area of mentions changes over time.
That's a useful thing to know. It's not a ranking. It's a market share signal, and it behaves like one. It moves slowly, it correlates with the things you'd expect it to correlate with (brand strength, citation volume, editorial coverage), and it gives you a directional read on whether the work you're doing is actually changing your presence in AI surfaces.
This is a measurement, not a screenshot. The catch is that it requires running large prompt sets repeatedly, which costs real money in API calls, and the methodology has to be transparent or you're back in screenshot territory.
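As a rough sketch of what that measurement could look like in practice, the functions below compute share of mention and co-mention share over a set of prompts, each run several times. The data layout (a dict of prompt to a list of per-run brand sets), the function names, and the example prompts and brands are assumptions made for illustration, not a description of Gumshoe's or anyone else's pipeline.

```python
def mention_share(results, brand):
    """Fraction of all prompt-runs whose response mentions `brand`.

    `results` maps each prompt to a list of responses, one per run;
    each response is the set of brands mentioned in it.
    """
    total = sum(len(responses) for responses in results.values())
    hits = sum(brand in r for responses in results.values() for r in responses)
    return hits / total if total else 0.0

def co_mention_share(results, brand, competitor):
    """Of the responses mentioning `brand`, the fraction that also mention `competitor`."""
    with_brand = [r for responses in results.values() for r in responses if brand in r]
    if not with_brand:
        return 0.0
    return sum(competitor in r for r in with_brand) / len(with_brand)

# Hypothetical results: two prompts, three runs each, invented brands.
results = {
    "best accounting software for small firms": [{"Acme", "Birch"}, {"Birch"}, {"Acme"}],
    "accounting tools with payroll": [{"Birch", "Corvid"}, {"Acme", "Birch"}, {"Corvid"}],
}

share = mention_share(results, "Acme")
overlap = co_mention_share(results, "Acme", "Birch")
print(f"mention share: {share:.0%}, co-mentioned with Birch: {overlap:.0%}")
```

Six prompt-runs is far too few to mean anything; the point is only that the unit being counted is "mentioned or not" across many runs, rather than a position within any one of them.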
What the industry is actually selling
Most "AI visibility platforms" being pitched to UK businesses right now are doing the cheap version of this and presenting it as the expensive version. They run a small set of prompts once, hand you a dashboard, and charge enterprise prices for what is essentially a manual ChatGPT session with extra UI.

The honest version of this product is straightforward. You'd want:
- A large, defensible prompt set generated from real intent data, not the marketing team's guesses
- Multiple runs per prompt, with the variance shown to you as variance, not hidden behind an averaged number
- Clear methodology on which models, which versions, which retrieval modes
- Outputs framed as visibility share, not rank
- Cost transparency, because the API calls are the actual product
Almost nobody is selling this. The market is full of tools that look like SEO rank trackers because that's what buyers know how to budget for, even though the underlying signal doesn't support that framing. It's procurement-driven product design, and it's going to embarrass a lot of CMOs in eighteen months when the dashboards turn out to have been measuring noise.
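One way to show variance as variance, rather than hiding it behind an averaged number, is to report an interval alongside every share figure. The sketch below uses the Wilson score interval for a proportion. That is a standard statistical choice on my part, not something taken from Fishkin's methodology, and it treats every prompt-run as an independent sample, which real runs only approximate.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% coverage)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Same observed share (30%), very different sample sizes. Figures are illustrative.
small = wilson_interval(3, 10)       # 10 prompt-runs, 3 mentions
large = wilson_interval(300, 1000)   # 1,000 prompt-runs, 300 mentions
print(f"n=10:   {small[0]:.1%} to {small[1]:.1%}")
print(f"n=1000: {large[0]:.1%} to {large[1]:.1%}")
```

Under those assumptions, a 30% share from ten runs is compatible with anything from roughly 11% to 60%, while the same share from a thousand runs narrows to roughly 27% to 33%. A dashboard that shows the first kind of number without its interval is showing you a screenshot.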
The traffic numbers are also worth a sober look
Fishkin's piece includes a useful reality check on the volume question. Across all AI tools combined, share of visits sits around 2.9%. Search engines sit around 34%. The "ChatGPT will overtake Google in four years" projections being passed around LinkedIn are, to use his phrase, motivated thinking with extrapolated dots.
That doesn't mean AI search doesn't matter. It means the share of attention is much smaller than the share of conversation, and the per-visit value is higher because AI traffic skews toward people much further down the decision funnel. Both things are true at once, and the strategic implication isn't "ignore AI" or "AI is everything" — it's "AI is a small but high-intent surface that's growing, and you should treat it accordingly."
That's a much less exciting framing than what's being sold. It's also the correct one.
What I'd actually do with this
If you're a UK business currently being pitched a five-figure annual contract for AI visibility tracking, the question to ask is methodological, not commercial. How many prompts? How often are they run? How is variance reported? What's the cost per prompt-run, and does that cost make sense given what you'd pay yourself if you ran the API calls directly?
Most vendors won't have good answers, because the product was built to look like rank tracking rather than to do the underlying work properly. That's diagnostic.
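The last of those questions, what you'd pay running the API calls yourself, is simple arithmetic. Here is a hedged sketch: the token counts, the per-million-token prices, and the schedule below are placeholders, not real pricing from any provider, so substitute your own figures before drawing conclusions.

```python
def cost_per_prompt_run(input_tokens, output_tokens, price_per_m_input, price_per_m_output):
    """API cost of a single prompt-run, given prices per million tokens."""
    return (input_tokens * price_per_m_input + output_tokens * price_per_m_output) / 1_000_000

def monthly_api_cost(prompts, runs_per_prompt, models, sweeps_per_month, per_run_cost):
    """Total API spend for a tracking schedule: every prompt, run, model, and sweep."""
    return prompts * runs_per_prompt * models * sweeps_per_month * per_run_cost

# Placeholder figures, NOT real vendor pricing: swap in your provider's rates.
per_run = cost_per_prompt_run(
    input_tokens=200, output_tokens=800,
    price_per_m_input=1.0, price_per_m_output=4.0,
)
total = monthly_api_cost(
    prompts=500, runs_per_prompt=5, models=3, sweeps_per_month=4,
    per_run_cost=per_run,
)
print(f"per prompt-run: {per_run:.4f}, per month: {total:.2f}")
```

With these invented inputs, 30,000 prompt-runs a month come to about 102 in whatever currency the prices are quoted in. The real figure will differ, and it excludes engineering and analysis time, but it gives you a baseline to hold a quoted contract against.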
For the work itself, the thing that actually moves AI visibility, the answer remains what it's been all year. Brand strength, citation volume, and editorial coverage are the levers. Schema and structured data help machines parse what you've already earned, but they can't guarantee anything in a probabilistic system. The measurement question and the doing-the-work question are separate, and conflating them is how budgets get wasted.
Track visibility, not rank. Pay for methodology, not dashboards. And when someone shows you a number that says you're "position three in ChatGPT," ask them to run it again in front of you.
The number that comes back the second time is the actual product.