Media Briefing: Another AI threat emerges for publishers: the third-party scraper
This Media Briefing covers the latest in media trends for Digiday+ members and is distributed over email every Thursday at 10 a.m. ET.
This week’s Media Briefing will dig into the growing alarm among publishers, as execs warn behind closed doors that third-party web scrapers are quietly fueling an AI marketplace they’re largely shut out of.
- The AI licensing black market
- TikTok’s head of publisher operations exits
- Tech billionaires are buying media, AP and Lee Enterprises clash over licensing contract, and more
The AI licensing black market
Twice a year, a group of publishing execs meets to discuss the biggest issues they’re facing, brought together by media analyst and consultant Matthew Scott Goldstein. At the most recent meeting, the topic dominating the agenda was third-party web scrapers.
Granted, the topic isn’t new. Publishers have spent years blaming, and in some cases suing, AI and tech companies for scraping their content without appropriate attribution or compensation. What’s changed is who’s doing it. Smaller tech companies, many of which have recently pivoted to building web scrapers, are hoovering up content from across the web, including from publishers, and selling it to much larger firms. Publishers see none of that money. None of them have licensing agreements with those vendors, according to Goldstein. It’s a black market for AI content licensing.
And it’s one that will keep growing unless better guardrails are built around AI content licensing marketplaces. Because right now, those marketplaces are where the smaller scrapers go to quietly monetize what they’ve taken, selling it on to larger AI and enterprise companies with no questions asked. For the roughly 40 publishing execs and 20 tech companies at the closed-door meeting, that was the conclusion no one found particularly comfortable.
The biannual event is hosted in collaboration with Rich Caccappolo, vice chairman of media at Daily Mail parent company DMGT, Mark Howard, chief operating officer at Time, and Jon Roberts, chief innovation officer at People Inc.
One publishing exec, who was present at Goldstein’s recent event, echoed that this issue was a growing concern for publishers — at least for the ones who were aware of it.
“Right now I don’t know if most people even know this is happening,” they said, agreeing to speak with Digiday on condition of anonymity because the meeting was off the record.
Another publishing exec who was at the event noted how challenging it was to block these web scrapers, due to the many tactics they use to circumvent bot blocking mechanisms. But the exec also said there was a positive side to all of this.
“This is just proof that there is an exploding need for quality inputs for AI. These companies would not be blossoming so quickly unless there was a massive demand-side need for this,” they said. “But until it exists, people are going to use these services, of course they will… But it’s still theft.”
The point, that there’s money going into getting content but the majority of it is not going to the rights holders, was brought into sharper focus by a recent Goldstein report. It identified 21 vendors doing this, including Firecrawl, Exa, Tavily, Brave, You.com, Perplexity Sonar and Bright Data. Over 70 companies were found to be paying for publisher content from these vendors, including BCG, IBM, Cohere, AWS, Salesforce, Apple, Latham & Watkins, Zoom, PwC, Shopify and Alibaba.
What these companies are doing with the content varies based on what kind of company they are, according to Goldstein. LLM developers are using the data for their AI models, finance companies are using it for news and information data, and retailers are using it for pricing, for example, he said.
Goldstein told Digiday these customers were identified on the vendors’ own websites. Some of them explicitly advertised their ability to evade cybersecurity detection. Goldstein said he was sharing the report widely with publishing execs this week.
“There’s a huge amount of demand for IP from anyone developing or deploying an AI system, and most of that demand, at the moment, is not satisfied by securing a license… It’s fulfilled predominantly by infringement. And some of that infringement is done on a first-party [or third-party] basis,” said David Buttle, founder of media consultancy DJB Strategies, and former platform strategy chief at the Financial Times. While Buttle was not present at the recent event, he is familiar with people who were and the issues discussed.
Left unchecked, the problem threatens to make an already costly scraping issue considerably worse for publishers. And it’s made all the harder to keep in check because many of these scrapers do not comply with robots.txt, the mechanism that communicates to web crawlers which URLs they can and can’t access, a TollBit report found. The report, which documented nearly 40 web scrapers, also noted that advanced scraping services were charging up to $22 for 1,000 pages.
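For readers less familiar with robots.txt, here is a minimal sketch of how a crawler that chooses to comply would consult it, using Python’s standard-library parser. The robots.txt contents, the bot names and the publisher.example URLs are all hypothetical and are not taken from the TollBit report. Compliance is voluntary: nothing in this check stops a scraper that simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher; bot names are made up.
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a queryable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named bot is disallowed everywhere on the site.
scraper_allowed = parser.can_fetch(
    "ExampleScraperBot", "https://publisher.example/news/story"
)

# Any other bot falls under the wildcard rules: open pages yes, /subscribers/ no.
other_allowed = parser.can_fetch(
    "OtherBot", "https://publisher.example/news/story"
)
other_paywalled = parser.can_fetch(
    "OtherBot", "https://publisher.example/subscribers/archive"
)

print(scraper_allowed, other_allowed, other_paywalled)
```

The file only tells a crawler what the publisher wants; whether the crawler runs a check like this before fetching is up to the crawler’s operator.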
“This is a big issue,” said Paul Bannister, chief strategy officer at Raptive. “We don’t currently have great data on how prevalent it is, partly because bots from those companies are pretty nefarious and avoid easy detection. As the frontier model companies get blocked more and more, they are going to resort to using these third parties more often which just increases the cat and mouse game we’re all playing.”
Worse still, TollBit found that some scrapers were able to retrieve full versions of paywalled articles. From Q2 2025 through the end of last year, the rate of AI scraping grew by an average of 24.4 percent per quarter.
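To put the quarterly figure in context, here is a rough sketch of how an average quarterly growth rate compounds. It assumes the 24.4 percent average is applied quarter over quarter, and the choice of two quarterly steps is an illustrative assumption, not a figure from the TollBit report.

```python
def compound_growth(quarterly_rate: float, quarters: int) -> float:
    """Return the overall multiplier after compounding for `quarters` steps."""
    return (1 + quarterly_rate) ** quarters


# 0.244 is the cited average; the two-step horizon is an assumption.
multiplier = compound_growth(0.244, 2)
print(f"{multiplier:.2f}x the starting rate")  # prints "1.55x the starting rate"
```

Under those assumptions, two quarters of growth at that pace would leave scraping activity about 55 percent above where it started.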
“The irony is all these AI companies, and all these Fortune 500 companies, are paying for the content and data. They’re just not paying the publishers for the content and data,” said Toshit Panigrahi, CEO and co-founder of TollBit.
Editor’s note: This story was updated to reflect that Dow Jones is not a customer of AI web scraping company Diffbot.
What we’ve heard
“Block the shit out of every single one of these companies. Block, block, block, and then block some more… This is the most evidence available that you should be blocking.”
– Matthew Scott Goldstein, a media analyst and consultant, on what publishers should do amid the growing market of third-party web scrapers selling their content.
TikTok’s head of publisher operations exits
Jonathon Oake, the global head of publisher operations and solutions at TikTok owner ByteDance, is leaving for another role after nearly five years at the company. Oake made the announcement in a LinkedIn post on Tuesday.
Oake led global publisher operations for TikTok, managing teams across Los Angeles, New York City and London from Sydney, Australia, where he is based. According to his LinkedIn, he was responsible for designing and expanding internal tools and systems that support publishers and help them grow their audiences and revenue on the platform.
Unlike Facebook, which began courting publishers with direct payments, TikTok pulled publishers to the platform with partnerships teams that helped them adapt to short-form, personality-driven video and onboarded them onto ad products like Pulse Premiere.
Heads of social and audience development praised TikTok for building out a publisher partnerships and monetization team at a time when they needed a helping hand, as Meta and X stepped away from them. Publishers could also make money from revenue-sharing tests and grants on the platform, and use its tools to drive traffic or conversions.
But ultimately, TikTok remains a platform for a top-of-funnel audience strategy – the draw for publishers is reach and cultural relevance, rather than scalable revenue, referral traffic or conversions.
At TikTok, Oake was hired by Harish Sarma, who was global head of publisher monetization at TikTok before he left the company in February 2025. Sarma is now vp of business development and partnerships at Yahoo.
Oake joined TikTok’s business development team in 2021, before moving into the publisher operations team in 2024.
Numbers to know
120: The number of Associated Press newsroom union members who received buyout offers, as the AP moves away from newspapers.
4.3 million: The number of Wall Street Journal subscribers, up 30% since 2023.
$400: The amount given to the winner of Business Insider’s new quarterly award for best use of AI within the company.
$2.6 million: The amount The Salt Lake Tribune made in digital subscription revenue last year. The publisher will drop its paywall and move to a donation-based membership program next month.
What we’ve covered
Publishers apply AI to streamline tasks and improve audience experience
- AI has moved from the margins to the mainstream of editorial workflows. Publishers have increasingly embedded AI tools into daily functions, especially when it comes to streamlining tasks and improving the audience experience.
- According to a recent Digiday+ Research survey, publishing execs said they are using AI for internal chatbots and AI assistants, voice-to-text translation and multimedia generation.
Ozone’s platform tries to simulate how publisher content appears in AI answers
- Ozone is developing a platform that lets publishers simulate how their content would surface in AI answer engines like ChatGPT, showing how it’s processed and cited so they can optimize structure, improve visibility and set guardrails for future agreements.
- The simulation platform has been running for three months, and eight publishers have opted into the research, according to Bryan Scott, CMO at Ozone.
CNN builds in-house agent infrastructure as it prepares for AI-driven media trading
- CNN is developing an agentic infrastructure as part of a broader roadmap that will see it begin transacting media by the first quarter of 2027.
- CNN expects to have finished scoping agentic protocols for its international commercial arm by the end of Q2, will test to see how they’re interpreted by LLMs in Q3, and in Q4 will look at buyer behavior to see if budgets are being allocated toward agent-to-agent trading experiments internationally.
How a ‘TikTok doctorate’ made 26-year-old Griffin Johnson a venture capitalist
- In February 2019, Griffin Johnson was a junior in college, studying nursing and working in a steel factory. Six years later, he’s co-founder of venture fund Animal Capital, a journey he credits almost entirely to TikTok.
- Johnson says brands would seek his advice on how to translate his audience into product sales in ways that felt organic to creators. “Then we realized, we are not going to answer for free, so we started Animal Capital,” he said.
What we’re reading
Tech billionaires are buying media
A new wave of billionaires, from OpenAI’s Sam Altman to Oracle’s Larry Ellison, is scooping up media assets and platforms, a sign of a renewed focus among powerful people on controlling distribution and shaping narratives in the AI era, according to The Hollywood Reporter.
AP sends warning to Lee Enterprises over potential contract breach
The Associated Press is warning Lee Enterprises that its attempt to exit their news licensing deal violates contract terms, while Lee argues it’s responding to the AP’s move away from newspapers, Axios reported.
Sports Illustrated plans for a comeback
After scandals, layoffs and declining relevance, Sports Illustrated is trying to reinvent itself with a diversified business, spanning events, streaming TV and licensing, the Los Angeles Times reported.
McClatchy journalists revolt over AI tool
Journalists at the McClatchy-owned Sacramento Bee are pushing back against a new AI tool that repurposes their reporting into automated content, with some saying they would withhold their bylines from stories created by the product, according to TheWrap.
InStyle’s revamp turns it into a flagship for People Inc.
InStyle, once a struggling legacy title, has turned into a flagship brand at People Inc. by ditching print and embracing social-first video and influencer-driven content to draw in younger audiences, A Media Operator reported.