Lily Ray just put a number on the AI content collapse
Lily Ray tracked 220+ sites named in AI content vendor case studies. The pattern: climb, plateau, collapse. The press release is never the whole story.
Lily Ray published an analysis yesterday tracking more than 220 sites that were publicly named as customers of AI content platforms — the case studies the vendors themselves put on their websites. She watched what happened after the press release moment. The pattern, in her words: it works, until it doesn't.
This is the piece the industry has been quietly waiting for someone to write. Not a hot take, not a vibes-based warning — a methodical sweep across the customer rosters of more than a dozen AI content tools, cross-referenced against Ahrefs and Sistrix visibility data. The vendors gave her the list. She just kept watching after the case study was filed.
The result is the same shape, repeated across hundreds of sites. A sharp climb. A plateau. Then a fall, usually aligned with a Google update. In some cases the fall takes the site below where it started.
I want to be careful here, because Ray is careful in her piece. She isn't claiming the tools caused the declines. There are always confounding variables — algorithmic updates, on-site changes, brand events, seasonality. What she's describing is a correlation observed across a large, consistent dataset. But the dataset is the vendors' own customer lists. That's the part I keep coming back to.
The case study is the trap
The mechanic is almost too neat to be accidental.
A site signs up for the tool. The tool ships volume — sometimes hundreds of pages a month into a single subfolder. Rankings climb because Google still rewards coverage in the short term, especially in thin competitive niches. The vendor writes up the win. The site is now a logo on a customer-stories page.
Then the next core update lands, or the next helpful-content tightening, or the next quality refresh — Google has shipped several of these in the last 18 months specifically targeting overly-optimised, pattern-generated content — and the subfolder collapses. Ray says the declines often start in the exact area where the AI-assisted content was published, which is what makes the correlation hard to wave away.
The vendor's case study, of course, doesn't get a follow-up post.
Why RAG makes this worse, not better
The instinct among GEO vendors right now is to argue that even if AI-generated content struggles in classic search, it'll be fine in AI search. The volume itself is the moat. More pages, more surface area, more chances to be retrieved.
This argument falls apart on contact with how retrieval-augmented generation actually works. AI systems aren't crawling the open web in real time and judging content fresh. They're pulling from indices — Google's, Bing's, their own — that are already filtered by the same quality signals that classic search uses. If your subfolder has been demoted by a core update, it's not magically discoverable to ChatGPT either. Mike King's recent work on 499 timeouts and AI eligibility makes a similar point from the infrastructure side: AI systems gate aggressively on signals the SEO industry has historically undervalued. Quality is one of those gates.
What's dangerous for SEO is dangerous for AI search. The two are not the separate disciplines they are made out to be.
This is the part that should worry the AI content vendors most. Their pitch was always "even if Google deprioritises this, the AI tools will save you." The data is starting to suggest the opposite — that AI tools inherit Google's quality judgements, and then layer their own retrieval gating on top.
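To make the retrieval argument concrete, here is a toy sketch in Python. It is not how any real AI search product is implemented: the `Page` fields, the `quality_floor` threshold and the scores are all hypothetical, and real systems use far richer signals than a single cut-off. It only illustrates the shape of the claim, which is that if eligibility is gated on quality before relevance is ranked, page count on its own buys nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    relevance: float  # hypothetical query-match score, 0..1
    quality: float    # hypothetical quality signal inherited from the search index, 0..1


def retrieve(pages: List[Page], quality_floor: float = 0.5, top_k: int = 3) -> List[Page]:
    """Toy retrieval step: gate on quality first, then rank survivors by relevance."""
    # Pages under the floor are never candidates, however many of them exist.
    eligible = [p for p in pages if p.quality >= quality_floor]
    # Rank what is left by relevance and keep the best few.
    return sorted(eligible, key=lambda p: p.relevance, reverse=True)[:top_k]


# Illustrative data only: 200 demoted pages in one subfolder, plus two edited guides.
demoted = [Page(f"/ai-content/{i}", relevance=0.9, quality=0.2) for i in range(200)]
edited = [
    Page("/guides/a", relevance=0.6, quality=0.8),
    Page("/guides/b", relevance=0.5, quality=0.9),
]

for page in retrieve(demoted + edited):
    print(page.url)
```

In this toy model the 200 high-relevance pages never reach the ranking step, so only the two edited guides can be retrieved. Volume only helps if the pages clear the gate first.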
The vendor incentive problem
There's a structural issue here that the industry doesn't talk about, and Ray's piece quietly surfaces it: the AI content vendors are the only people who control the data on whether their tools work long-term, and they have no commercial incentive to publish it.

Their case studies cover the launch period — the climb. By the time the fall happens, the case study is months old, the customer has churned or gone quiet, and the vendor has already moved on to the next logo. The customer-stories page is a snapshot of the best moment, indefinitely preserved.
What Ray has done is the closest thing the industry has to an audit trail. She took the snapshots and added time. That's all. And the picture changes.
I'd argue this is what every category of marketing software should be subjected to — third-party longitudinal review of vendor-claimed wins — but for some reason we only ever get the press releases.
What this means if you're commissioning content
The honest read on Ray's analysis, if you're a business owner or marketing lead trying to make a decision this week:
The question isn't *AI or human?* That framing is a dead end. Plenty of high-quality content involves AI in the workflow. The question is whether there's an editorial floor: whether someone with domain knowledge, judgement, and accountability is standing behind every page before it ships. Ray's data can't prove cause, but the pattern across the 220-plus sites doesn't point to AI use as such. It points to AI used without an editorial floor, at volume, into a single subfolder, on a clock.
That's a different problem. And it's the same problem the industry had in 2011 with content farms, and in 2016 with thin affiliate sites, and in 2019 with programmatic SEO. The technology changes. The failure mode does not.
If you're being pitched a tool right now that promises hundreds of pages a month and case studies of dramatic traffic climbs, ask the vendor for a list of customers whose case studies are at least 18 months old. Then look those domains up in Ahrefs yourself. It's not a hard test. It's the test the case study page is designed to prevent you from running.
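If you'd rather not eyeball the chart, the same check can be roughed out in a few lines of Python. This is a minimal sketch, not Ray's methodology. It assumes you have copied a monthly visibility or traffic series for a domain out of your SEO tool by hand, oldest month first, starting before the vendor's tool was in use. The numbers below are made up, and the function names and 50% thresholds are my own illustrative choices.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trajectory:
    baseline: float            # value in the first month of the series
    peak: float                # highest monthly value in the series
    latest: float              # most recent monthly value
    climb_pct: float           # baseline to peak, as a percentage of baseline
    drop_from_peak_pct: float  # peak to latest, as a percentage of peak
    below_baseline: bool       # True if the site now sits under where it started


def summarise(monthly_visibility: Sequence[float]) -> Trajectory:
    """Summarise a monthly series (oldest first) into climb and fall figures."""
    if len(monthly_visibility) < 2:
        raise ValueError("need at least two months of data")
    baseline = monthly_visibility[0]
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a percentage climb")
    peak = max(monthly_visibility)
    latest = monthly_visibility[-1]
    return Trajectory(
        baseline=baseline,
        peak=peak,
        latest=latest,
        climb_pct=(peak - baseline) / baseline * 100,
        drop_from_peak_pct=(peak - latest) / peak * 100,
        below_baseline=latest < baseline,
    )


def looks_like_climb_then_collapse(
    t: Trajectory, min_climb_pct: float = 50.0, min_drop_pct: float = 50.0
) -> bool:
    """Flag a series that rose sharply and has since lost most of its peak.

    The thresholds are arbitrary assumptions; tune them to your niche.
    """
    return t.climb_pct >= min_climb_pct and t.drop_from_peak_pct >= min_drop_pct


# Made-up example: a climb, a plateau, then a fall below the starting point.
series = [100, 140, 220, 300, 310, 305, 180, 90]
summary = summarise(series)
print(summary)
print("climb then collapse:", looks_like_climb_then_collapse(summary))
```

A flag from this sketch is a prompt to look closer, not a verdict. It says nothing about why a site fell, so check the dates against known Google updates and against anything else that changed on the domain.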
The honest limit
Ray flags this herself, and I'll flag it too: correlation across more than 220 sites is suggestive, not conclusive. Some of those declines will have other causes, such as bad migrations, brand collapses, acquisitions, or niches in structural decline. A handful might even be coincidental.
But the pattern is repeating across more than a dozen different tools, across hundreds of sites, with the timing roughly aligned to known Google update windows. At that scale, the burden of proof shifts. It's no longer reasonable for vendors to wave away the pattern as "individual context." If the pattern is this consistent, the question is what the tools have in common — and the answer, mostly, is volume without an editorial floor.
The case studies told us the climb. Ray's work is the first serious attempt to tell us about the fall. The industry should be grateful, take the data seriously, and stop pretending the press release is the whole story.