WTF is AI citation tracking?

This article is a WTF explainer, in which we break down media and marketing’s most confusing terms.

It’s increasingly important for publishers to understand when, how and why their content is being used as a source inside AI tools and platforms (often without attribution or traffic).

The search landscape has changed dramatically this year, as more people turn to AI tools and platforms like ChatGPT to get information. And though referral traffic from AI platforms is still minuscule, many publishers are starting to track where they are cited in AI-generated responses to users’ queries. From there, they can monitor their brand visibility and get a clearer picture of what’s driving the traffic, small as it is, to their sites from AI platforms.

Publishers, analytics firms and SEO consultants are calling it citation tracking. Here’s how it works.

WTF is citation tracking?

It’s monitoring where, how and why a brand’s site is mentioned as a source in an AI-generated response to a user’s prompt in tools like ChatGPT. 

In ChatGPT and Perplexity, these citations appear as gray bubbles after the information is given. Users can click on those bubbles to go to the source website.

An example of sites like Digiday and WARC cited as sources in an AI-generated response in Perplexity.

Citation tracking is a familiar term for PR professionals, who are well accustomed to monitoring how their clients’ businesses are referenced in SEO rankings. But AI citation tracking has taken hold among publishers as they try to get a clearer picture of how they’re being cited in AI search engines.

The term has gone through a “huge rebirth” this year because of the growth of AI platforms, said Prasanna Dhungel, managing partner at marketing intelligence firm GrowByData.

Sounds pretty straightforward. Why is it necessary for publishers?

It’s not straightforward at all. And that’s all down to how AI search engines gather and deliver information when people search for it, compared with how traditional search engines have done so for decades. 

AI systems use a process called “query fan-out,” conducting multiple search queries around subtopics related to a user’s prompt. Those systems then combine information from the articles they find to give the best and most informed answer. So instead of a page of blue links, sorted (usually by Google) in order of quality, the end user gets a more in-depth and comprehensive AI-generated answer, which reduces their need to click through to a publisher’s site. That is the root of the problem of dropping site referrals.

Also, it makes it far more complex to trace how the AI systems have picked up a publisher’s content, and which sites they are providing links to, in case the reader wants to see where the information came from.

How is this different from AI visibility?

It’s not different, per se. Visibility in AI tools and platforms is more general, usually referring to mentions in AI-generated responses.

For example, the ChatGPT response below includes the names of several publications – which can be monitored through AI visibility tools. But it doesn’t include links to sources, meaning these publications wouldn’t receive any referral traffic. That (lack of) attribution is what publishers want to monitor, too. If they understand what gets cited and what doesn’t, they can reshape how they prioritize content that does get cited. At least, that’s the theory. 

An example of sites like Digiday and Nieman Lab appearing in an AI-generated response in ChatGPT.

“The superset is brand visibility, and then the subset would be, how are you showing up? Then the citation is the why you’re showing up,” said Dhungel. You also want to make sure your brand is being mentioned positively by measuring brand sentiment, he noted. 

Many tools on the market automate citation tracking, such as those provided by analytics and marketing intelligence firms like Semrush and Similarweb. Some retrieve AI-generated responses through APIs, others through scraping, according to Jes Scholz, a content distribution and SEO marketing consultant. Some use synthetic data, while others use raw prompt data, she added.

“Due to this, you can receive vastly different responses from different tools with the exact same query,” Scholz noted.

That sounds like a problem.

It’s certainly more complex than traditional SEO, which involves monitoring your site’s ranking position on search engine results pages for specific words or phrases (keywords). Citation tracking, by contrast, measures how AI systems generate answers and which sources they choose to link, meaning publishers are competing for influence rather than slots in search rankings, Scholz said.

There’s plenty of data on popular search keywords, but far less visibility into the prompts that trigger a publisher’s citation in an AI-generated answer, she added.

AI tools respond to open-ended prompts, not fixed keywords, so user inputs can vary dramatically. That makes it much harder for publishers to understand why or when they’re being cited. Citation tracking therefore involves aggregating users’ prompts into broader prompts or topics to see what types of queries triggered a citation.

Publishers don’t have access to the exact prompt a person typed into an AI search tool.

For example, in Google someone might search “winter running shoes.” But in ChatGPT, they might write “What are some good running shoes for snowy weather in Boston?” They generally are looking for the same thing, which is why aggregation is important in citation tracking.
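To make the aggregation idea concrete, here is a minimal Python sketch that groups differently worded prompts under one topic by keyword overlap. The topic names, keyword sets and the `min_overlap` threshold are illustrative assumptions, not a description of how any particular tracking tool works; real tools may well use more sophisticated matching.

```python
from collections import Counter

# Hypothetical topics, each defined by a set of keywords (an assumption
# made for illustration only).
TOPICS = {
    "winter running shoes": {"running", "shoes", "winter", "snowy", "snow"},
    "marathon training": {"marathon", "training", "plan"},
}


def normalize(prompt):
    """Lowercase the prompt and strip basic punctuation from each word."""
    return {word.strip("?.,!\"'").lower() for word in prompt.split()}


def assign_topic(prompt, topics=TOPICS, min_overlap=2):
    """Return the topic sharing the most keywords with the prompt,
    or None if no topic shares at least `min_overlap` keywords."""
    words = normalize(prompt)
    best_topic, best_score = None, 0
    for topic, keywords in topics.items():
        score = len(words & keywords)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic if best_score >= min_overlap else None


def citation_counts_by_topic(records):
    """Count citations per topic.

    `records` is an iterable of (prompt, cited) pairs, where `cited`
    is True if the publisher's site was cited in the AI response.
    """
    counts = Counter()
    for prompt, cited in records:
        topic = assign_topic(prompt)
        if topic is not None and cited:
            counts[topic] += 1
    return counts
```

Under these assumptions, the short Google-style query and the longer ChatGPT-style question from the example above both land in the same "winter running shoes" bucket, so their citations are counted together.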

Why does it matter?

There are three main reasons why this matters for publishers, said Adelle Kehoe, director of product marketing at Similarweb. The first is that a citation is the only way to get traffic from AI platforms.

The second is that it shows a publisher’s influence in a topic area, which can help lift brand awareness and marketing. 

Lastly, it could help drive future monetization opportunities. Publishers could pitch their AI visibility – and being used as a trusted source – as leverage for partnerships or other deals with AI companies, according to Kehoe. 

While no tool can link a user’s AI prompt directly to referral traffic, publishers can now see two things separately: which prompts generate AI responses that cite their sites, and how much traffic those platforms send back. They can piece the data together to get some understanding of how they are linked.
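As a rough sketch of what piecing the data together might look like, the Python below lines up two separate per-platform tallies, citations and referral visits, and computes visits per citation. The function name and the idea of a per-platform ratio are assumptions for illustration; the result is only a coarse signal, not a direct prompt-to-visit link.

```python
def visits_per_citation(citations, referrals):
    """Combine two separately collected datasets by platform.

    citations: dict mapping platform name -> number of tracked citations
    referrals: dict mapping platform name -> referral visits from analytics
    Returns a dict mapping platform -> visits per citation, or None when
    there are no citations to divide by.
    """
    combined = {}
    for platform, citation_count in citations.items():
        # A platform missing from the referral data counts as zero visits.
        visits = referrals.get(platform, 0)
        combined[platform] = visits / citation_count if citation_count else None
    return combined
```

A publisher could feed in counts exported from a citation tracking tool and from its own web analytics, then compare the ratio across platforms or over time.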

Citation tracking can also help determine what content publishers might want to keep open to AI crawlers to scrape and use in AI-generated summaries – and what they want to block, Kehoe added. Some publishers are trying to get more referral traffic from AI platforms (for more ad impressions), while others are actively trying to block AI crawlers from accessing their content without getting paid, even if it means losing out on the traffic those platforms may drive.
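One common mechanism for the open-or-block decision is a site’s robots.txt file, though the article’s sources do not specify which method publishers use. The sketch below uses Python’s standard `urllib.robotparser` to check which URLs a given crawler may fetch under a hypothetical set of rules. The rules, paths and `example.com` URLs are invented for illustration; `GPTBot` is the user agent of OpenAI’s crawler, while `ExampleBot` is a placeholder. Robots.txt only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler from a premium section
# while leaving the rest of the site open to every crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, user_agents, urls):
    """Return {(user_agent, url): allowed} under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, url): parser.can_fetch(agent, url)
        for agent in user_agents
        for url in urls
    }
```

Calling `crawler_access(ROBOTS_TXT, ["GPTBot", "ExampleBot"], [...])` with a premium URL and a news URL shows which combinations the rules allow, which a publisher could compare against the pages that actually earn citations.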

However, most AI queries cite many sources but drive little traffic, noted Semrush president Eugene Levin. He believes publishers should monetize their AI visibility, rather than wait for AI referral traffic to materialize. Publishers can prove their topical influence with citation tracking data to strengthen pitches to advertisers, Scholz said. Publishers can also tout the webpages that rank well in different LLMs and sell placements on those pages, added Levin.

Many citation tracking tools also let publishers see how their AI visibility is stacking up to other top publishers in their competitive set – and where there may be opportunities around any white spaces.

One publishing exec told Digiday, on condition of anonymity, that they weren’t concerned about tracking AI citations yet, and wouldn’t put resources into monitoring this until they saw more traffic coming from AI tools.

“We’d love to be visible in those [AI platforms] but it’s not mapping on to something that – right now – is business critical,” the exec said. “The evidence that we see right now is just that the traffic back from a citation vs. the traffic back from a blue link… is just so fractional. It’s hard to know how to value it.”

Ultimately, it depends on how much a publisher believes that AI referral traffic will grow – especially as referral traffic from search is on the decline – and the audience development and business implications this will have in the future.
