
In Graphic Detail: New data shows publishers face growing AI bot, third-party scraper activity

A third-party scraper economy is emerging beneath the big AI companies, making it harder than ever for publishers to know who is taking their content, let alone stop them.

“What’s changed is the separation of roles. The entity extracting the data is often not the one using or monetizing it. That abstraction layer is what’s enabling third-party resale markets to scale,” Brent Maynard, senior director of security technology and strategy at content delivery network company Akamai, said. “We’re hearing this directly from publishers. One large publisher described it as: We’re not being scraped by one company anymore, we’re being harvested by an ecosystem.”

Digiday has compiled four graphs revealing how AI web scraping is evolving, and what it means for publishers:

The rise of third-party scrapers selling publisher content

Source: MSG

Publishers have spent years fighting the big AI companies over scraping. The smaller operators — dozens of third-party vendors crawling the web and selling content to enterprise clients — have largely escaped scrutiny. Media analyst Matthew Scott Goldstein’s report on the “scraper economy,” presented at a recent meeting he regularly hosts between publishers, tech companies and media analysts, aims to change that.

Publishers make no money from these relationships, according to Goldstein. He said this is a $1 billion industry, citing Mordor Intelligence data.

“Organized publisher supply into a licensed content marketplace doesn’t need to create new demand. Every enterprise in this report has demonstrated willingness to pay. The infrastructure exists. The buyers are named, funded, and growing. The only missing piece is organized publisher supply at scale — and the urgency to move before the compliance window closes,” Goldstein wrote in his report, which he shared with Digiday. 

The report identified 21 vendors doing this, including Firecrawl, Exa, Tavily, Brave, You.com, Perplexity Sonar and Bright Data. (TollBit also has a running index on third-party scrapers, identifying nearly 40 vendors.)

Over 70 companies were found to be paying for publisher content from these vendors, including BCG, IBM, Cohere, AWS, Salesforce, Apple, Latham & Watkins, Zoom, PwC, Shopify and Alibaba. 

Publishers are getting hit hard by AI bot activity surge

Source: Akamai

What’s exacerbating the problem is the rise in AI bot activity.

Akamai, which handles more than one-third of all global internet traffic, saw a 300 percent surge in AI bot activity in 2025, according to its recent “Protecting Publishing: Navigating the AI Bot Era” report. 

While most of this targeted the commerce industry (48 percent), media was in second place (13 percent), the report said. Specifically, publishers represented 40 percent of all media-related AI bot activity.

OpenAI, Meta, and ByteDance were the top three culprits between July and December last year. Of those, OpenAI generated the highest volume of AI bot traffic targeting media companies. Publishers accounted for 40 percent of that traffic, per the report. 

Worse still, the methods are getting more sophisticated. Akamai found that “AI fetchers” — bots that grab specific web pages in real time to answer user requests on AI assistants like chatbots — accounted for 24 percent of AI bot types in the media industry in the second half of last year. Publishers represented 43 percent of that. Meanwhile, AI training crawlers, which scan and collect large amounts of data from websites to train LLMs, made up 63 percent of the AI bot activity Akamai tracked.

“We’re seeing growth in real-time fetchers that pull content dynamically to answer queries,” Maynard said. “The risk isn’t just that content is taken, it’s that the visit never happens. We’re already seeing publishers connect this to declining referral traffic and changing user behavior… AI bot traffic is persistent and growing, but it doesn’t create value for publishers. It consumes infrastructure and content without contributing to revenue.”

Cybersecurity company Human Security found automated traffic is now growing eight times faster than human traffic, according to its recently published “2026 State of AI Traffic & Cyberthreat Benchmarks” report. AI scraper traffic grew 597 percent from January to December 2025, and AI-driven traffic overall grew 187 percent in 2025, nearly tripling year over year, per the report.

Plenty of bot activity, but little to show for it

Source: Raptive

Despite the volume of bot activity, publishers are seeing little in return. ChatGPT is driving less than 0.2 percent of traffic to Raptive’s network of 6,000 independent publishers, according to a recent Raptive guide on AI bot blocking — an issue Digiday has previously reported on. In short, the more AI companies take, the less they send back.

“As a publisher, or content creator, you don’t get anything from this,” Paul Bannister, chief strategy officer at Raptive, said. “It’s minuscule. It grew fast last year and then it’s flatlined for a while. They’re not sending more traffic out. And until there’s a better quid pro quo here, whether it’s traffic or money or something else, what’s in it for you? What’s in it for any of us?”

Bot blocking still a challenge

The solution to all of this, according to Goldstein, is easy: “Block the shit out of every single one of these companies. Block, block, block, and then block some more.”

The problem is that blocking is harder than it sounds. Most publishers rely on a single cybersecurity tool or bot detection system, while AI companies have dozens of scraping tools to choose from, many designed specifically to circumvent bot-blocking mechanisms.

TollBit’s latest “State of the Bots” report underscored the point. It found that about 30 percent of AI bot scrapes violate explicit robots.txt instructions that “disallow” web crawlers from accessing a site’s content.

“Robots.txt… is as useful as a chocolate teapot right now,” one publishing exec told Digiday, under the condition of anonymity.
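The reason robots.txt is so easy to ignore is that it is purely advisory: a site publishes its disallow rules, and a well-behaved crawler is expected to check them before fetching — but nothing enforces the answer. A minimal sketch using Python’s standard-library parser (the bot names and rules here are illustrative, not from the report):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules: disallow a hypothetical AI crawler
# site-wide, allow everyone else. Real publisher files name specific bots.
rules = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks before fetching; a non-compliant one simply doesn't.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

The 30 percent violation rate TollBit measured is exactly this gap: the `can_fetch` check happens on the crawler’s side, so a scraper that skips it faces no technical barrier at all.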

There is some hope. Content delivery networks like Akamai, Fastly and Cloudflare have more sophisticated ways to identify and block AI bots. That’s helped publishers like People Inc. recently take a block-by-default approach, allowing only the bots they’ve deemed “permissible” to access their content, Jon Roberts, chief innovation officer at People Inc., said onstage at the Digiday Publishing Summit in Vail, Colorado last month. Mark Howard, Time’s chief operating officer, said at the same event that his company was planning a more aggressive bot-blocking approach.
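At its core, the block-by-default approach publishers describe reduces to an allow-list check on a crawler’s declared identity. A toy sketch (the partner bot names are invented for illustration, and real CDN-level enforcement also leans on IP ranges, TLS fingerprints and behavioral signals, since user-agent strings are trivially spoofed):

```python
# Toy block-by-default filter: deny every bot except named partners.
# The allowed names below are hypothetical, not any publisher's actual list.
ALLOWED_BOTS = {"partnerbot", "licensed-fetcher"}

def is_permitted(user_agent: str) -> bool:
    """Allow ordinary browser traffic; allow only approved bots; block the rest."""
    ua = user_agent.lower()
    looks_like_bot = any(t in ua for t in ("bot", "crawler", "spider", "fetcher"))
    if not looks_like_bot:
        return True  # assume human browser traffic
    return any(name in ua for name in ALLOWED_BOTS)

print(is_permitted("Mozilla/5.0 (Windows NT 10.0)"))  # True  (browser)
print(is_permitted("PartnerBot/2.1"))                 # True  (on the allow-list)
print(is_permitted("RandomScraperBot/0.9"))           # False (unknown bot)
```

The design choice matters: an allow-list fails closed, so a newly launched scraper is blocked until the publisher explicitly opts it in — the inverse of robots.txt, where every new bot is effectively permitted until someone writes a rule against it.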

“The organizations that get ahead of this will treat AI access as both a security problem and a business model decision. If your content is fueling AI, you should have a say in how it’s accessed, used and monetized,” Maynard said.

The Atlantic has taken a similar approach, allowing only bots from companies it has a commercial or strategic deal with to access its content.
