Jargon buster: The key terms to know on AI bot traffic and monetization

As generative AI reshapes how content is surfaced, scraped, summarized and monetized, new vocabulary to describe those processes is cropping up daily in the media industry. 

In strategy decks, licensing talks, research reports and closed-door meetings, publishers are navigating a flood of new terms, including “zero-click traffic,” “RAG agents,” and “Model Context Protocol.”

Here’s a jargon buster to help make sense of the emerging vocabulary of AI-media economics, what these terms mean, and why they matter now. 

Pay-per-crawl

What it sounds like: A toddler fitness class

What it means: An emerging idea (and negotiating tactic) where AI companies would pay publishers every time a bot crawls their site to ingest content. Unlike the lump sum of a licensing deal, it would charge for every digital bite: any time an LLM crawler crawls or recrawls a site, the AI company would pay the publisher a pre-agreed amount. It remains to be seen whether it scales.

Pay-per-query

What it sounds like: A very expensive Google habit

What it means: So far, this seems to be the AI payment concept that some favor. It’s a proposed model where publishers get paid every time their content powers an AI-generated answer. So not when it’s crawled, but when it’s used. (The argument is that one crawl could feed multiple query results, and under pay-per-crawl publishers wouldn’t get paid for each of them.)
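
To make the difference concrete, here is a minimal sketch of how the two payment models could diverge. Every number in it is hypothetical: the one-cent rate, the crawl count and the assumption that each crawl feeds 50 answers are made up for illustration.

```python
# Hypothetical figures only: none of these rates or volumes come from a real deal.
RATE_CENTS = 1             # assumed payment per billable event, in cents
CRAWLS = 1_000             # assumed number of times a bot crawls the site
ANSWERS_PER_CRAWL = 50     # assumed AI answers powered by each crawl


def pay_per_crawl_cents(crawls: int, rate_cents: int) -> int:
    """Publisher is paid once for each crawl, however often the content is reused."""
    return crawls * rate_cents


def pay_per_query_cents(answers: int, rate_cents: int) -> int:
    """Publisher is paid once for each AI-generated answer its content powers."""
    return answers * rate_cents


crawl_revenue_cents = pay_per_crawl_cents(CRAWLS, RATE_CENTS)
query_revenue_cents = pay_per_query_cents(CRAWLS * ANSWERS_PER_CRAWL, RATE_CENTS)

print(f"Pay-per-crawl: ${crawl_revenue_cents / 100:.2f}")
print(f"Pay-per-query: ${query_revenue_cents / 100:.2f}")
```

At the same per-event rate, the gap between the two totals is simply the number of answers each crawl ends up feeding. In practice the two rates would likely be negotiated separately.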

AI agent

What it sounds like: A robot assistant with its own agenda 

What it means: One of the hot terms of the year, AI agents use generative AI technology to complete tasks on behalf of a user, acting like virtual assistants. The term is sometimes used interchangeably with “bot,” but bots typically perform simpler tasks in a conversational format, often with scripted responses to prompts, while AI agents are more sophisticated and can act independently. They can buy groceries and book restaurant reservations at the behest of a user. According to Google’s definition, bots are “reactive” while AI agents are “proactive.”

AI user agent

What it sounds like: An avatar in a virtual world

What it means: This term is often used interchangeably with “AI agent,” but AI user agents specifically interact with online content on behalf of a user. For example, a user could ask an AI system, “What is Google Zero?” The system would tell the AI user agent to retrieve that information. The agent would crawl sites like Digiday’s to get it, and the AI system would then generate a response to the user based on what it found. Examples of the top AI user agents include GPTBot, Google-Extended, ClaudeBot, Anthropic-AI, PerplexityBot, Bytespider and ChatGPT-User.
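
As a rough illustration of how a publisher might spot these names in its server logs, the sketch below checks a request’s User-Agent header against the names listed above. The function name and the sample header strings are hypothetical, and it is an assumption that each name appears verbatim in the headers a given bot sends.

```python
from typing import Optional

# Names taken from the list above. Whether each one shows up verbatim in
# real request headers is an assumption of this sketch.
AI_USER_AGENT_TOKENS = (
    "GPTBot",
    "Google-Extended",
    "ClaudeBot",
    "Anthropic-AI",
    "PerplexityBot",
    "Bytespider",
    "ChatGPT-User",
)


def match_ai_user_agent(user_agent_header: str) -> Optional[str]:
    """Return the first known AI token found in the header, or None."""
    lowered = user_agent_header.lower()
    for token in AI_USER_AGENT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative header strings, not copied from real traffic.
bot_hit = match_ai_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)")
human_hit = match_ai_user_agent("Mozilla/5.0 (Windows NT 10.0) Firefox/128.0")

print(bot_hit)
print(human_hit)
```

A substring match like this only catches bots that identify themselves honestly, so it is a starting point rather than a defense.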

RAG agent

What it sounds like: A spy who moonlights as a fact checker 

What it means: There are many different types of bots crawling the web for information, and this is one of them. Short for Retrieval Augmented Generation — a technical process for extracting information from content to give to an LLM and improve the quality of its output — these AI agents search the web and retrieve up-to-date, recent information to respond to users’ prompts in AI chatbots like ChatGPT. Generated responses often include citations to original sources of information. The process is sometimes called “inference,” which refers to the time when an AI model answers a question. 

Training data crawler

What it sounds like: A content-hungry robot that never sleeps

What it means: Unlike a RAG agent, this type of AI bot collects data to train LLMs, such as Meta’s Llama or OpenAI’s GPT. These bots scour the web and download content to build large datasets to teach an LLM how to generate responses to questions.

Some tech companies separate the web crawlers they use to train their AI models from the bots that scrape the web for real-time information, so that publishers can block those bots. But some, like Google, don’t.

For example, Google-Extended is Google’s bot that crawls the internet to train its AI models. This can be blocked with robots.txt (see below). Googlebot, meanwhile, crawls sites for search indexing and to power real-time results in its AI chatbot Gemini and AI Overviews, Google’s AI-generated summaries in its search results. It’s become a real sticking point for publishers, who can’t block the bot and protect their content from being used for training Google’s AI products, without removing themselves from search results.

RAG bot scraping overtook the scraping activity of training bots this year, according to a report by TollBit, a data marketplace for publishers and AI companies. 

“RAG scrapes are going to be far more numerous as AI agents continue to increase, while scrapes for training will be far fewer and potentially more valuable. It is also harder to price content for training, whereas RAG usage is easier to price and can be dynamic because it is constant, recurring access,” said Toshit Panigrahi, TollBit co-founder and CEO.

Model Context Protocol

What it sounds like: A sci-fi rulebook

What it means: This framework, created by AI company Anthropic, is a way to index content and plug it into an LLM so that it can directly pull information from a dataset. If RAG is like a bridge from an AI system to online content, MCP is like an AI system plugged directly into that content’s outlet.

For example, a user could take all of their news subscriptions and plug them into a chatbot so that they can access all of their news content in that interface, rather than going to a publisher’s site to get that information, according to Nicholas Diakopoulos, computational journalism professor at Northwestern University. 

That may sound terrifying to a publisher already seeing onsite traffic decline – but this could be another platform for content distribution, and even monetization, Diakopoulos said.

“As an end user… I can get all of the value of the content that I subscribe to through whatever interface I want. And I think that kind of user-oriented thinking may have some value,” he said. “Is there currently a market for that? Not that I can see, but there could be one, and I would think that news organizations might be at the forefront of inventing that market.”

Robots.txt

What it sounds like: Instructions for well-behaved machines

What it means: This humble text file tells bots which parts of a site they’re allowed to crawl. In theory, it’s like a “keep out” sign for AI scrapers. But in practice, most LLM crawlers either ignore it, misread it, or completely flout it. Some AI companies will even create a new crawler under a new name if their original one has been told to stop crawling. Publishers’ bane.
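
For illustration, the Python sketch below embeds a small robots.txt file that asks one AI crawler to stay away, then uses the standard library’s urllib.robotparser to show how a compliant bot would read it. The example.com URL and the SomeOtherBot name are placeholders. The check is voluntary: nothing in the file forces a crawler to run it.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: block GPTBot everywhere, allow every other bot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; a well-behaved crawler would ask this before fetching.
url = "https://example.com/article"
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Because the rules are keyed to the name a bot announces, a crawler that reappears under a new name falls through to the catch-all rule, which is the loophole described above.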

Google Zero

What it sounds like: A new dystopian movie where no one wins

What it means: The moment when Google is no longer the gateway to the internet and becomes an AI-powered answer engine to users’ queries. This new era is being driven by AI Overviews. It would be the end of a key traffic channel to publishers.

Crawl-to-referral ratio

What it sounds like: A weird math class

What it means: This refers to the ratio of webpage crawls to webpage referrals, by platform. Cloudflare tracks this, with data showing that AI companies’ crawl-to-referral ratio is massive compared to Google’s. (For example, in June 2025, Google crawled websites about 14 times for every referral, compared to OpenAI’s crawl-to-referral ratio of 1,700:1.) In other words, bots are scraping publishers’ sites way, way more than they’re referring real people back to those sites.
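
The arithmetic behind the metric is simple division, sketched below. The raw crawl and referral counts are hypothetical, chosen only so the results match the 14:1 and 1,700:1 ratios quoted above. They are not Cloudflare’s underlying data.

```python
def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Crawls made by a platform's bots per visitor it refers back to the site."""
    if referrals <= 0:
        raise ValueError("referrals must be positive to compute a ratio")
    return crawls / referrals


# Hypothetical counts picked to reproduce the quoted June 2025 ratios.
google_ratio = crawl_to_referral_ratio(crawls=14_000, referrals=1_000)
openai_ratio = crawl_to_referral_ratio(crawls=1_700_000, referrals=1_000)

print(f"Google: {google_ratio:,.0f}:1")
print(f"OpenAI: {openai_ratio:,.0f}:1")
```

A higher ratio means a platform takes more from a site, in crawls, for each reader it sends back.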

AI browser

What it sounds like: The browser that knows you better than your best friend 

What it means: Browsers have come a long way since the OG Internet Explorer. These new platforms powered by AI technology don’t just show links to sites and provide a gateway to the web. They can perform tasks, understand natural language, send emails, generate briefs and book meetings. Notable AI browsers include Dia (from Browser Company of New York), Comet (from Perplexity) and OpenAI’s browser, reportedly coming soon.

LLM Content Ingest API

What it sounds like: A smoothie for robots

What it means: The IAB Tech Lab’s initiative to create a market standard for publisher-consented bot scraping, which attributes and compensates publishers when their content is scraped. It supports both pay-per-crawl and pay-per-query.

Red button

What it sounds like: Something from a spy movie

What it means: Cloudflare’s tool for blocking AI crawlers (launched on July 1) is basically a panic switch for publishers. It’s become affectionately known among publishers as the “red button.”

NLWeb

What it sounds like: A futuristic web built by aliens

What it means: Microsoft’s natural language web protocol that debuted in May. The goal behind the NLWeb project is to make it easier for website owners to add “conversational interfaces” (like chatbots) to their sites. It can also help make websites easier to access by AI agents that support MCP. Microsoft thinks this will be the new version of HTML. It could be the framework for more bot traffic monetization models.

A2A

What it sounds like: A type of battery

What it means: A Google protocol called Agent2Agent, announced in April. It’s aimed at creating a common language to help AI agents built on different frameworks “talk” to each other, and at creating a more interoperable system. Like NLWeb, it also supports MCP. It was donated to the non-profit Linux Foundation in June.
