Media Briefing: ‘Cloudflare is locking the door’: Publishers celebrate victory against AI bot crawlers 

This Media Briefing covers the latest in media trends for Digiday+ members and is distributed over email every Thursday at 10 a.m. ET.

This week’s Media Briefing looks into Cloudflare’s new tool that lets publishers block all AI crawlers – at the click of a button – and why publishers are celebrating.

  • An end to publishers’ AI crawler Whack-a-Mole
  • Google ends tests of a feature that previewed recipes, Forbes CEO shares AI strategy, and more.

Cloudflare’s red-button blocker

Publishers everywhere have had reason to celebrate this week as a single Cloudflare toggle gave them a rare, decisive victory in the battle to keep AI bot crawlers off their content.

After years of watching their content get ransacked for free by millions of unidentified AI bot crawlers, publishers were finally thrown a viable lifeline.

Dotdash Meredith was among the throng of publishers celebrating at a Cloudflare party in NYC on Monday night, where the publisher flipped what has quickly become known in the media industry as “the red button” at midnight on July 1.

DDM’s chief innovation officer, Jon Roberts, told Digiday the publisher has been working with global infrastructure company Cloudflare on the project since March, and that preventing unauthorized AI crawlers from stealing content is the first step toward (re)building a viable internet economy. The tool effectively lets publishers block all AI crawlers at the click of a single button, but it also offers a pay-per-crawl option to help publishers monetize where they see fit.

For years, tech platforms have more or less understood that information on the internet isn’t free (even as they’ve relied on monetizing publisher content). Google has compensated publishers for their content by driving referral traffic, while Apple has given publishers a share of subscription fees from Apple News+. Facebook, too, has dipped into the content game by partnering with publishers to deliver news, albeit in a highly volatile way. And while it’s been far from a perfect arrangement, the total wild west of unauthorized bot scraping by LLM crawlers has upended this economic model at an alarming and unprecedented rate.

And let’s face it, the honor system of robots.txt — a text file in which publishers state which crawlers are authorized on their site — was never much of a real protection.

“I treat robots.txt a bit like a list of names behind the bar of people you won’t serve,” said Roberts. “Whereas Cloudflare is locking the door.” 
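
For the uninitiated, robots.txt is nothing more than a plain-text file served from a site’s root, listing which crawlers are asked to stay out. A minimal, illustrative example is below; GPTBot and CCBot are real AI crawler user agents, but nothing technically stops a bot from ignoring the file entirely, which is Roberts’s point.

    # robots.txt: a request, not an enforcement mechanism
    User-agent: GPTBot      # OpenAI's crawler
    Disallow: /

    User-agent: CCBot       # Common Crawl's crawler
    Disallow: /

    User-agent: *           # all other crawlers
    Allow: /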

Now, publishers have renewed hope. 

“Up until now, people have been undervaluing the information that AI consumes,” said Roberts. “But the AI companies have effectively believed that the internet is free, and the internet’s never been free. This puts a wall between us and them.”

It’s also no bad thing for the AI companies that have attempted to honor copyright (albeit, some may argue, a little late in the game) and made licensing deals with publishers, as they too will now have exclusive rights to those publishers’ content, without the army of crawlers going in through the back door. For the rest, it’s game over, for now at least.

The results have been instantaneous. “We’ve blocked millions of AI requests from companies with whom we do not have a license, do [not] have a partnership, who were taking our content for free, since yesterday [Monday, June 30] afternoon,” said Roberts.

An end to publishers’ AI crawler Whack-a-Mole

For months, publishers have battled an onslaught of AI bot crawlers, which have mined their entire archives for free. The scorched earth these crawlers leave behind has shown up for publishers in different guises: spiking server costs, programmatic revenue losses, and a real drain on resources.

Martin Alderson, co-founder of a web performance consultancy, likened the phenomenon this year to denial-of-service attacks. A knock-on effect is surging server costs, he said. “Most crawlers from LLM companies do not load analytics trackers (Meta being an exception), so they do not appear in Google Analytics or similar, and publishers tend to use GA or other analytics stacks as their single source of truth. To get the real impact, you have to look through their CDN [content delivery network] logs, which usually requires a developer to audit,” he said.
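
As a rough illustration of what such an audit involves, the sketch below tallies requests from known AI crawler user agents in a CDN access log. The log file name and bot list here are assumptions for illustration; a real audit would use the CDN’s own log export and a far longer signature list.

    # Illustrative sketch of a CDN log audit for AI crawler traffic.
    # The log file name and bot list are assumptions, not a complete set.
    from collections import Counter

    AI_BOT_SIGNATURES = ["GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot"]

    def tally_ai_crawlers(log_path: str) -> Counter:
        """Count requests per AI bot by scanning each log line's user-agent string."""
        hits = Counter()
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                for bot in AI_BOT_SIGNATURES:
                    if bot in line:
                        hits[bot] += 1
                        break
        return hits

    if __name__ == "__main__":
        for bot, count in tally_ai_crawlers("cdn_access.log").most_common():
            print(f"{bot}: {count:,} requests")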

Two ad tech sources who spoke to Digiday cited Alibaba’s LLM crawler as one causing such havoc.

Justin Wohl, former chief revenue officer at Salon, has been among the few who regularly check its CDN logs, and he has been alarmed at the results, particularly in the last few months (he remains an advisor at Salon). “The crawlers show up by user agent and by IP address with blatantly non-human signatures at the CDN level. That isn’t showing up on the books right away, but it’s coming up in their [publishers’] IT departments’ periphery — server costs,” said Wohl.

Over the last few months, partners have notified Wohl, currently vp of strategy at Aditude, that they’ve seen big spikes in invalid traffic on certain sites, and despite it being caused by a “known crawler,” they’ve had to shut off programmatic monetization. He has been locked in conversations with both the SSPs and the ad tech vendor that monitors fraudulent traffic, so they don’t penalize the site for the bot traffic, which doesn’t get monetized because the crawlers are in and out of the site within a couple of seconds.

“Those crawlers’ behavior is resulting in [publishers’] IVT [invalid traffic] flags going up … not because they bought shitty traffic from a bot farm, not because they did anything at all, but because they started getting visited aggressively by these crawlers,” said Wohl.

Wohl said when a major SSP blocks a domain due to an IVT report, that represents a “double-digit percentage loss in demand, pricing pressure, and programmatic potential for a publisher.” But Wohl is encouraged by the Cloudflare move, and a fan of the payment model, which looks more robust than existing options. 

Cloudflare has introduced a pay-per-crawl model, in which publishers can get paid for allowing bots to scrape their content — which gives publishers control over blocking the crawlers and, ultimately, the ability to monetize that access.
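
Cloudflare has said the model leans on the long-dormant HTTP 402 “Payment Required” status code. As a conceptual sketch only (not Cloudflare’s implementation; the paid-crawler list is hypothetical), the gating logic looks roughly like this:

    # Conceptual sketch of a pay-per-crawl gate. Not Cloudflare's actual
    # implementation; PAID_CRAWLERS is a hypothetical list of bots with deals.
    PAID_CRAWLERS = {"LicensedBot"}          # hypothetical: bots with a payment deal
    KNOWN_AI_CRAWLERS = {"GPTBot", "CCBot"}  # real AI crawler user agents

    def gate_request(user_agent: str) -> tuple[int, str]:
        """Return an (HTTP status, message) pair for an incoming crawl request."""
        if any(bot in user_agent for bot in PAID_CRAWLERS):
            return 200, "Content served; crawl billed per agreement."
        if any(bot in user_agent for bot in KNOWN_AI_CRAWLERS):
            # 402 Payment Required: unpaid AI crawlers get no content
            return 402, "Payment required to crawl this content."
        return 200, "Content served to an ordinary visitor."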

Prior to Cloudflare’s red-button blocker, publishers were forced to play an endless game of Whack-a-Mole with AI crawlers — including obvious AI crawlers and those that mask themselves to get in through a side door, stressed Roberts. “Cloudflare sits on top of 20% to 25% of the entire web traffic on Earth — they get to play the game of Whack-a-Mole once for everyone,” he said.

The road ahead is far from clear, but for the first time, publishers have real backup in the battle against AI crawlers.

As one news organization CEO, who asked to stay anonymous as they’re in negotiations with Cloudflare, put it: “What’s not to like? Action at an infrastructure level, a limitation of supply – which is necessary for a functioning marketplace – and granular control.” — Jessica Davies

What we’ve heard

“Vogue needs an experienced leader. I know there’s the urge to elevate a young staffer to give them a chance and see what they do but the Vogue brand is too valuable to play with.”

A former Condé Nast editor on who will become Vogue’s head of editorial content.

Inside BBC’s dynamic paywall strategy

The BBC launched its first paywall last week, charging U.S. readers $8.99 a month or $49.99 a year for unlimited access to its content. The dynamic paywall doesn’t set a fixed number of free articles; instead, the allowance varies based on how likely the reader is to subscribe.
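
The BBC hasn’t shared how its model works under the hood, but the core logic of a propensity-based dynamic paywall can be sketched simply: score each reader’s likelihood to subscribe, and let that score set their free-article allowance. The scores and thresholds below are invented purely for illustration.

    # Illustrative sketch of a propensity-based dynamic paywall.
    # Not the BBC's actual model; scores and thresholds are invented.
    def free_article_allowance(propensity: float) -> int:
        """Map a subscribe-propensity score (0.0 to 1.0) to a free-article allowance."""
        if propensity >= 0.8:    # likely converts: meter them sooner
            return 1
        if propensity >= 0.4:    # on the fence: leave room to build a habit
            return 5
        return 10                # unlikely to convert: preserve reach and ad revenue

    def should_show_paywall(articles_read: int, propensity: float) -> bool:
        """Gate the reader once they exhaust their personalized allowance."""
        return articles_read >= free_article_allowance(propensity)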

Ben Goldberger, BBC Studios gm and executive director of editorial content, said BBC’s reach in the U.S. made it clear there was an audience willing to pay to support the BBC – even in a competitive market where publishers are aggressively trying to grow their subscription businesses. “What we offer is clearly additive in this market,” Goldberger told Digiday. “A rising tide can lift all boats.”

Goldberger said the dynamic paywall will also help mitigate the risk of dampening the scale the BBC has achieved in the U.S., as well as the potential increases in churn or bounce rates that can come with blocking readers’ access to online content. The BBC doubled its U.S. newsroom last year and relaunched its site and app for readers outside the U.K., resulting in a significant traffic boost.

“We’re very attuned to all of the potential risks that come with the launch of a dynamic pay model,” Goldberger said. “Our goal is to balance all the different inputs, because it is a dual revenue stream. And we want to ensure that we remain a large, growing site that can really serve our hopefully-growing base of subscribers while continuing to be a premium destination for international advertisers.”

Digiday has heard from a handful of publishing execs that they are experimenting with integrating AI technology into their existing paywalls to make them smarter. Goldberger said this is something the BBC is testing, but he declined to share more details. — Sara Guaglione

Numbers to know

83%: The percentage of Americans who have not paid for news in the past year, according to a Pew Research Center study.

49%: The percentage of Americans who said they believed at least one of the top false news claims online in June, according to a NewsGuard study.

3: The number of weeks Google says its second core update of the year – the changes it makes to its search algorithms and systems – will take to complete.

10%: The year-over-year revenue growth LBG Media – which owns properties like LADbible and Betches – expects this year.

What we’ve covered

Future is using its own AI engine to turn deeper engagement into ad dollars 

  • Future is betting on AI to boost recirculation – and make that stickier audience more appealing to advertisers.
  • The publisher’s new proprietary AI-powered content categorization engine, called Advisor, acts like a “brain” trained on Future’s internal data and helped triple onsite click-through rates.

Read more about the AI engine here.

The biggest misconceptions about AI content scraping

  • AI bots scraping publishers’ sites for real-time information are now more prevalent than the bots used to train large language models. And they’re harder to detect.
  • But scraping for AI training and scraping for real-time outputs present different challenges — and some opportunities — for publishers. And not all of them are fully understood. 

Read more about some of the misconceptions here.

In the age of AI Overviews, Tripadvisor wants to be the destination

  • Tripadvisor is adjusting its strategy as the foundation of search starts to shift. 
  • Delivering on that vision comes down to three things: improving the app experience, refreshing the membership program and moving Tripadvisor higher up the funnel – turning the service into a starting point, not just a step along the way.

Read more about Tripadvisor’s strategy here.

WTF is behind the explosion of faceless creators? 

  • Faceless creators, also known as user-generated content (UGC) creators, are people who make content and share it on social media without showing their faces.
  • Faceless creators are taking over social media, and now are coming for brands’ influencer marketing dollars.

Read more about why advertisers are working with them here.

What CMA’s crackdown on Google really means for publishers

  • The U.K.’s Competition and Markets Authority is taking up publishers’ fight against Google’s search grip.
  • To regulate Google, the CMA has proposed designating it as having “strategic market status” under the new Digital Markets Competition Regime.

Read more about what this means for publishers here.

What we’re reading

Google ends tests of a feature that previewed recipes

Google has stopped testing a feature that previewed recipe content in its search results, which creators and food bloggers were concerned would decimate their site traffic, Bloomberg reported.

Forbes CEO shares AI strategy

In a memo to staff shared by Talking Biz News, Forbes CEO Sherry Phillips outlined her strategy for using AI to create revenue opportunities and improve audience experiences – and addressed how AI is affecting traffic and business models.

Substack co-founder on partnering with legacy media

Substack co-founder Hamish McKenzie said Substack is increasingly working with legacy media – which once saw the platform as a threat to their businesses – amid the newsletter boom, Mediaite reported. 

BBC is testing two AI news production tools

The BBC is testing two AI tools for news production, to reformat reports and create bullet-point summaries of longer articles.

The Washington Post tests feature that allows sources to annotate articles

The Washington Post is testing a feature that allows sources to annotate articles they’re mentioned in by name, with the goal of keeping readers on its site longer, The New York Times reported.

