Publishers are hunting for AI prompt data — now they’re starting to get it from third-party companies
Next year, tracking AI search visibility may get a little easier for publishers.
Thanks to the rise of AI search, publishers’ usual tracking tools won’t cut it. They need a full arsenal of analytics and monitoring tools to know how, when and why they’re being mentioned across platforms and AI products.
Tech companies aren’t willing to share this data with publishers or other brands. So new third-party tools are cropping up to gather and report that data. Those tools are also expanding as they gather and model more prompt data, so that companies can see their AI visibility in Google’s AI Overviews and AI Mode. That’s a big deal. Until now, publishers and brands have largely been flying blind when it comes to AI-driven discovery, but they are now getting rare visibility into a part of AI search that has mostly operated as a black box.
New ways to track AI visibility
A number of tools are available to publishers and brands for tracking when AI systems mention and cite their content. The best known are those provided by analytics and marketing intelligence firms like Profound, Semrush and Similarweb. These tools also track which prompts triggered those mentions and which AI-searched topics align with a brand.
Those firms are expanding their capabilities to close the growing data gap publishers face with AI search. For example, Forbes is creating audience cohorts of the people coming to its site from AI platforms, by tracking the referral traffic and which user prompts led to a Forbes article being cited in an AI answer.
Publishers using Similarweb’s tools could previously track citations (when a brand is mentioned as a source in an AI-generated response to a user’s prompt) only on ChatGPT. The company has since added Perplexity, along with Google’s AI Overviews and AI Mode, to its AI visibility dashboard.
Semrush is developing improved attribution tools, which it plans to release next year, to show how many direct visits were influenced by AI platforms.
Publishers can then look at their own first-party data to get a clearer picture of what they’re seeing in AI visibility dashboards. If they check their log files and see traffic coming from an AI platform to certain webpages, they know that platform is citing them as a source. They can pair that data with the AI visibility tools to check which prompts led to an AI-generated response citing their site. That gives them a better understanding of what people are asking before they arrive at the site, and of how frequently the publisher is being mentioned in the AI platform. That’s the best way to get more accurate information, according to Lily Ray, vp of SEO strategy and research at performance marketing agency Amsive.
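As a rough illustration of the log-file check described above, here is a minimal Python sketch that counts visits per page where the referrer is an AI platform. It assumes the server writes the common Apache/Nginx "combined" log format, and the list of referrer hosts is an illustrative assumption, not a complete or authoritative one. The function names and sample log lines are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: an illustrative, incomplete map of AI platform referrer hosts.
# Adjust it to whatever actually shows up in your own logs.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
}

# Assumption: logs use the "combined" format, where the quoted request is
# followed by status, response size and a quoted referrer.
LOG_PATTERN = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def ai_platform_for(referrer):
    """Return the AI platform name for a referrer URL, or None."""
    host = (urlparse(referrer).hostname or "").lower()
    for known_host, platform in AI_REFERRER_HOSTS.items():
        if host == known_host or host.endswith("." + known_host):
            return platform
    return None


def count_ai_referrals(log_lines):
    """Count hits per (platform, page path) for AI-platform referrers."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        platform = ai_platform_for(match.group("referrer"))
        if platform:
            counts[(platform, match.group("path"))] += 1
    return counts


# Hypothetical sample lines, made up for this example.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /ai-search-guide HTTP/1.1" 200 5120 "https://chatgpt.com/" "Mozilla/5.0"',
    '203.0.113.6 - - [01/Jan/2025:10:01:00 +0000] "GET /ai-search-guide HTTP/1.1" 200 5120 "https://www.perplexity.ai/search" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jan/2025:10:02:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "https://www.google.com/" "Mozilla/5.0"',
    '203.0.113.8 - - [01/Jan/2025:10:03:00 +0000] "GET /ai-search-guide HTTP/1.1" 200 5120 "https://chatgpt.com/" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (platform, path), hits in count_ai_referrals(SAMPLE_LOG).most_common():
        print(platform, path, hits)
```

The per-page counts can then be set beside the prompts an AI visibility tool reports for the same pages. Referrer headers are often missing, so counts like these are best read as a floor, not a full tally.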
Where does the data come from?
That’s the tricky part. AI visibility data isn’t shared directly by AI platforms and tech companies like Google.
“Every data provider is in the same boat. We just don’t have perfect information right now. I think that’s one of the biggest challenges. Even if you invest a ton of money into AI search, your reporting is always going to be questionable right now. The methodology is always going to be questionable,” Ray said.
Ultimately, how these analytics firms collect and model AI visibility is where the secret sauce lies.
Many of them use search APIs that run millions of synthetic prompts every day, according to Ray. Other firms buy clickstream data collected from a Chrome extension installed in the browser, she added.
Semrush and Similarweb aggregate users’ prompts to create broad prompts or topic categories, rather than tracking individual prompts. Technically, the result is synthetic data, because the process clusters user prompt data, removes personally identifying information and matches the prompts with specific topics.
AI tools respond to open-ended prompts, not fixed keywords as in traditional SEO, so user inputs can vary dramatically. That makes it much harder for publishers to understand why or when they’re being cited.
“We don’t really go and check any one specific prompt, because there’s no value in that. Everyone asks different questions differently,” said Semrush president Eugene Levin.
Semrush gathers consumer behavior data and matches the questions that would lead people to a topic relevant to a publisher or brand with a topic or prompt in its database.
“Then we say, now this person is in this audience. We calculate volume essentially by normalizing the panel, and say that many people within our panel seem to be interested in this topic,” Levin said.
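Levin does not spell out the calculation, so the following is only a sketch of what normalizing a panel could look like, not Semrush’s actual method. It assumes the simplest version: take the share of panelists interested in a topic and scale it to a larger population. The function name, topic labels and all figures are hypothetical.

```python
def estimate_topic_volume(panel_topic_counts, panel_size, population_size):
    """Scale per-topic panel counts to an estimated population volume.

    Assumption: the panel is treated as representative, so a topic's share
    of the panel is applied directly to the population.
    """
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    return {
        topic: round(count * population_size / panel_size)
        for topic, count in panel_topic_counts.items()
    }


# Hypothetical figures, made up for this example.
panel_counts = {"ai search tools": 250, "vertical video": 50}
estimates = estimate_topic_volume(
    panel_counts, panel_size=10_000, population_size=1_000_000
)

if __name__ == "__main__":
    for topic, volume in sorted(estimates.items(), key=lambda item: -item[1]):
        print(topic, volume)
```

In this made-up example the first topic comes out at five times the volume of the second. The ratio between topics is more dependable than either absolute number, since the estimate is only as good as the panel behind it.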
So, how reliable is the data?
It’s as reliable as it can be right now.
“[AI] answers are probabilistic, so this is very directional at the moment. We’re not saying for every person in the US, they’re going to receive the exact same answer, but it is how businesses can get the best idea of how they’re ranking or if they’re in the conversation or not,” said Adelle Kehoe, director of product marketing at Similarweb.
Because different analytics firms have varying methodologies for gathering data on AI visibility, tracking prompt data should be considered a “comparison or directional metric,” said Michael King, founder and CEO of content marketing and SEO agency iPullRank. The value lies in comparing the volume of different prompts to determine which to prioritize, King added.
How can publishers use this data?
Publishers can use this data to learn more about what’s driving people to click a link shown in an AI-generated response to get to their site.
But referral traffic from AI platforms is small, making up just 1% of all web traffic. So even if a company has a very high AI visibility ranking in one of these tools, that doesn’t mean it is getting lots of clickthroughs to its site.
There are some other ways AI visibility can be valuable to publishers’ businesses, but it’s all a bit theoretical at the moment. Publishers can point to their topical influence and how often their content surfaces in LLMs to strengthen pitches to advertisers, and potentially to gain leverage in content licensing negotiations with AI and tech companies.
“Because it’s so nascent, businesses are exploring what it really means to them, what value they get back from it,” Kehoe said. “We see with a lot of publishers that they block these crawlers because their content is their product, and therefore they don’t want to give away their whole product without getting a guarantee of something back. Right now, the best guarantee is money from OpenAI or someone like that — to be able to say, ‘This is how good my site is. This is how reliable my site is. I’ll let you crawl it but what’s our contract going to be?’”