Cloudflare’s compliant crawler highlights tension – and opportunity – in the emerging AI content market
Cloudflare is pushing to create a market for licensed AI content, offering a compliant crawler aimed at giving publishers more control and reducing inefficient site crawls — even as some question the irony of the company that once blocked scrapers now building one itself.
Last week, the company released a crawl API — a crawl endpoint within its browser rendering API — that can scrape an entire website with one request. (You give it a URL; it crawls the whole site and returns the content as HTML, Markdown, or structured JSON.) The announcement raised eyebrows and left some publishers confused and unsettled that Cloudflare would offer its own version of the very scraping it has so far protected them from.
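To make the mechanics concrete, here is a minimal sketch of how a client might assemble a call to a site-wide crawl endpoint like the one described above. The exact endpoint path, payload fields, and format names are assumptions modeled on typical Cloudflare API conventions, not confirmed documentation — treat this as an illustration of the one-request, whole-site pattern, not the actual interface.

```python
import json

# Hypothetical base URL following Cloudflare's usual v4 API layout (assumption).
API_BASE = "https://api.cloudflare.com/client/v4"


def build_crawl_request(account_id: str, url: str, fmt: str = "markdown") -> dict:
    """Assemble the endpoint and JSON body for a single crawl call.

    Per the article, one request targets an entire site, and `fmt` selects
    the return format: HTML, Markdown, or structured JSON.
    """
    assert fmt in {"html", "markdown", "json"}, "unsupported output format"
    return {
        "endpoint": f"{API_BASE}/accounts/{account_id}/browser-rendering/crawl",
        "body": {"url": url, "format": fmt},
    }


req = build_crawl_request("ACCOUNT_ID", "https://example.com", fmt="markdown")
print(json.dumps(req, indent=2))
# Actually sending it would be one authenticated HTTPS POST, e.g.
# requests.post(req["endpoint"], json=req["body"], headers={"Authorization": "Bearer ..."})
```

The design point the sketch captures is the shift in burden: the AI developer makes one request and the intermediary handles the crawl, rather than each developer running (and possibly misbehaving with) its own crawler.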
That sentiment was exacerbated when some publishers adjusted their settings and found they couldn’t block Cloudflare’s own scraper. One independent publisher, Thomas Baekdal, was particularly vocal, posting on LinkedIn that Cloudflare had “betrayed every single publisher” with this tool.
Cloudflare’s senior director of product, James Smith, acknowledged there were teething issues, with certain parts of the tool not quite ready in time, but said these have now been rectified.
“We probably didn’t get this launch right last week and should apologize for that,” Smith told Digiday. “I think our messaging wasn’t very good on it. We should have led with the message that it respects the existing controls, and the intention is to provide a crawler that respects publishers’ wishes.”
He added that Cloudflare operates like a fast-moving startup, and with different teams owning different products, a few launches didn’t line up perfectly. One ripple effect of that speed of development is that clients have six different surfaces where they can express their bot-blocking preferences, resulting in a user experience that “isn’t where we want it to be,” added Smith. “We’re in the process of fixing that right now and will get that to a place where we want it to be in the coming days.”
Digiday reached out to Baekdal, who retested the tool on Monday, March 16, and was satisfied that the original issue, of not being able to block the Cloudflare crawler, had been resolved.
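Outside of Cloudflare's dashboard settings, the conventional place a publisher expresses this kind of preference is robots.txt. A minimal sketch of what blocking one crawler while allowing everything else looks like; the user-agent token `ExampleCrawler` is a placeholder, not the name Cloudflare's crawler actually announces.

```
# Block one named crawler entirely (token is a placeholder, not Cloudflare's real one)
User-agent: ExampleCrawler
Disallow: /

# All other crawlers may access the site
User-agent: *
Allow: /
```

The dispute above illustrates the limit of this mechanism: robots.txt is advisory, so a crawler that sits inside the CDN itself must also honor it (and any dashboard-level blocks) for the preference to mean anything.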
From protector to intermediary
While early skepticism grabbed attention, the bigger question is what this latest launch reveals about the tension Cloudflare faces in its new role: sitting in the middle between publishers and AI companies while balancing trust, control and monetization.
Cloudflare has built a reputation over the last few years as a guardian of publisher interests, giving them tools to more effectively block the illicit scraping that occurred for years as AI companies used their data for free to train their models.
It has always maintained that its business model (not having an LLM of its own) makes it an ideal neutral intermediary to build tools for both the supply side (aka publishers and creators) and the buy side (AI companies building models).
But this latest backlash is a reminder that being in the middle is tricky when you’re navigating competing interests. “We have a duty to earn the trust of publishers,” said Smith. “We want to make sure that we’re continuing to live up to that promise of trust – that trust the publishers are placing in us with all the work we’ve been doing to protect them against AI crawling.”
That said, not all publishers’ and AI companies’ interests are out of sync. For starters, publishers don’t like getting scraped. And not just for the obvious copyright reasons. It causes some serious strain on sites and servers, which affects page load speeds.
Several publishing execs, who requested anonymity to speak candidly, told Digiday that their sites have been under great strain from mass crawling, even when they block bots. One said that page load speed — always a publisher bugbear because slow pages lead to higher bounce rates, hurting user experience and ad revenue — has suffered as a direct result of crawling. For that reason, they welcome tools like Cloudflare’s Markdown for Agents and the new crawl endpoint, along with wider industry efforts from the IAB Tech Lab.
“We’re finding some crawlers are really taking serious resources – because they’re querying them [the servers] so often [for RAG purposes], they’re just really overpowering our servers. And they do slow the sites down and slow down our products,” said the exec.
So far, Cloudflare’s middle-ground position has differentiated it from other marketplace products like Microsoft’s and Amazon’s (both of which have their own supply and demand). Publishers want control and monetization, AI companies want easy access, and Cloudflare is trying to satisfy both.
It’s also key for publishers to remember that the bulk of sites on the internet do actually want AI systems to crawl them, noted Paul Bannister, chief strategy officer at Raptive. “If you sell shoes and someone types ‘what are the best shoes to choose from’ into ChatGPT, you want your name to come up – those companies want to be scraped,” he said. To a certain extent, that means publishers have to get real about just how much they can be prioritized when they represent a small sliver of the wider internet, he added.
The tension between access and trust
While Cloudflare emphasizes responsible access and compliance, some industry observers note that the tool’s launch underscores a fundamental tension: intermediaries can create efficiency and monetization opportunities, but they also concentrate control, which may make publishers wary.
For Cloudflare, its intention with the crawler is to attempt to establish best practices for crawlers generally, in the absence of market standards still in development, stressed Smith. “If I’m an AI builder, I may not be thinking about the politics or the economics of media, and the ‘do not crawl’ and all the content signals,” he said. “And if I had to go and build a crawler that could go out and consume information from the internet, I might not get all those things. I might pull a library for a crawler that isn’t compliant with all those types of new protocols that are emerging, and release it out into the internet to consume permission for me.”
“Publishers say they don’t want their content to be involved. Most AI companies don’t want a lawsuit,” he stressed. “They don’t want the headache of dealing with that. But there isn’t really an easy way, especially if you’re an early-stage company just getting started, to think about this en masse, and now there is an option for them to do that,” he added.
The idea with the new crawl API is that AI developers can now tap the Cloudflare crawler, which complies with publishers’ preferences and with the emerging best practices. Plenty of badly behaved crawlers mine sites while ignoring publishers’ preferences on whether they can be scraped. Smith believes more legitimate crawlers will emerge over time because compliance leads to higher-quality data and, therefore, more robust products.
He added that Cloudflare wants to double down on monetization options for publishers, following its pay-per-crawl tool last year. “I don’t think we’ve taken monetization far enough. I think there are a lot of opportunities to improve that product and other ways to consume information, especially as we move into agents that present monetization options. That’s what my team is generally focused on … we are excited about building a new economy for the internet. And these are parts of the building blocks that get us to that.”