New Adalytics report raises questions about the use of AI systems for brand safety
A new report from Adalytics has advertisers and brand-safety experts questioning the effectiveness of AI systems.
The latest report from the digital watchdog, released today, claims to have found ads for hundreds of brands appearing next to unsafe content on user-generated content (UGC) websites such as Fandom wiki pages. Many of the ads carried code from brand-safety tech providers such as DoubleVerify and Integral Ad Science. According to a draft of the report reviewed by Digiday, the idea for the research came after a multi-national brand’s global head of media asked Adalytics to review its ad placements for brand safety.
As major brand-safety companies rely more on AI to root out or prevent brand-safety violations, advertisers, agencies and other brand-safety experts say the report raises new questions about whether the tech is working as it should, and whether more transparency is needed into how these AI systems operate.
According to more than a dozen sources, including current customers of IAS and DoubleVerify who reviewed the report, the firms appear to be over-promising and under-delivering. Several sources said examples in the report don’t align with their brands’ standards for brand safety. Even if the flagged pages are niche or draw little traffic, sources suggested they are a symptom of broader problems.
Advertisers had already suspected AI systems weren’t as effective as portrayed, but they were still surprised to see even easily identifiable targets, such as racial slurs and sexual references in headlines and URLs, slip through the cracks.
Brand and agency sources said the pre-bid tools cited in the report were pitched as capable of real-time, page-level content analysis. One source, who described the findings as “pretty damning,” said they were under the impression that wrap tags for post-bid blocking were meant to keep brands safe.
Sources were also puzzled by inconsistencies in how brand-safety firms’ AI tools categorized web pages for risk. The Adalytics report showed wiki pages with racist, violent or sexual content labeled as low risk, while pages from The Washington Post and Reuters were marked as medium or high risk despite lacking such content.
Most of the sources reached for this story had the same question: Is the tech not working as it should, or has it been pitched as something better than the systems are currently capable of offering?
Sources told Digiday the report suggests AI tools don’t support brand safety as well as the companies claim. One noted the tech works just well enough to “give people just enough ick reduction.” The findings now have them wondering whether the expense, which can run into the millions per brand, is worth it.
“Brand safety is a joke, and the only people not in on the joke are the brands paying for it,” said one source. “We are a delusional industry because we think we can build tools fast enough to fix our problems.”
Jay Friedman, CEO of the Goodway Group, said the volume and severity of the examples suggest brand-safety tech isn’t protecting brands enough to be worth the time and money it costs. Like other agency, brand and tech sources, he said more transparency is needed to help everyone understand the problem and find a better solution, including more comprehensive reporting on every facet of a campaign so everyone can make decisions from the same data.
“The old argument of, ‘We can’t tell you how it works because then the bad guys would know, too,’ is likely no longer a valid argument,” Friedman said. “These vendors charge advertisers billions of dollars per year and owe it to those paying customers to provide technology that works with transparency into how it works.”
IAS declined to comment until after it has seen the full report.
DoubleVerify issued a statement accusing Adalytics of engineering its results by selectively searching for problematic terms without proper context. It also said the report omits crucial information about pre-bid and post-bid avoidance settings and incorrectly correlates code with advertiser actions. DoubleVerify added that the report doesn’t distinguish between DV’s publisher and advertiser services, which it claims contributed to the inaccuracies. However, outside sources told Digiday that the report accurately reflects both publisher and advertiser tags.
DoubleVerify also said advertisers can choose to run publisher campaigns based on exception lists, which would override content-avoidance categories. However, one brand-safety expert wondered whether advertisers are aware of how these settings work.
“This speaks to a broader issue: the results in this report are entirely manufactured, from the omission of client campaign setup information to the methodology itself, where the researcher arbitrarily searched for racist terms,” according to DoubleVerify’s statement. “The outcomes are manipulated, and lack any organic validity or scale. To be clear, no customer has expressed concerns about the accuracy of our content categories. DV’s own preliminary analysis confirms that all content shared with us was classified accurately for customers and partners.”
When asked for a statement, a Fandom spokesperson said the company hadn’t yet seen the full Adalytics report. However, the spokesperson said the few screenshot examples they had seen “identify an industry-wide problem,” but noted the examples included content from “old, extremely low trafficked wikis, which is why it was not flagged via our current moderation systems or Google’s Ad Server, both of which monitor our active wikis.”
Fandom, one of the largest internet sites with 50 million pages of user-generated content, said less than 0.08% of its content has been flagged as inappropriate, most of which it said were false flags. It also employs multiple safety measures, including three industry vendors and a “robust internal trust & safety team which manually verifies and removes all additional flagged content as quickly as possible.”
“We do not condone the posting of inappropriate and racially insensitive material anywhere on our UGC platform — this content is not allowed per our guidelines and it won’t be tolerated,” according to Fandom’s statement. “Ensuring brand safety on our platform is of the utmost importance to us and we take matters like this very seriously. … Despite the fact that these narrow examples were not a problem at scale, we have added additional safety measures to proactively turn off ads on low trafficked wikis that don’t trigger flag thresholds.”
Advertisers say the issues with UGC pages warrant more granular reporting, so they can check URL-level pages rather than merely receive domain-level reports. One brand-side source said it’s worth considering changes to the economic model of brand safety: instead of paying to use pre-bid tech, they suggested paying each time something is blocked, though they acknowledged that might set up other bad precedents.
Sources said they haven’t received adequate answers about the accuracy of brand-safety AI systems. They’ve asked about the sample sizes of page-level analysis to understand whether accuracy claims are based on a small slice of the web or the full internet.
“Is it 99% of [a small number] of the websites it’s picking up and looking at, or 99% of the 4 billion other websites they’re looking at?” said one source at a large brand advertiser. “If it’s the former, then 99% isn’t the true number.”
Constantly crawling the web is a complicated process, noted Joseph Turow, a professor of Media Systems & Industries at the University of Pennsylvania’s Annenberg School for Communication. Turow and other academics wondered how often bots crawl pages to surface issues, and whether pages are examined before ad bids. If companies can do contextual targeting with AI, he said, they should also be able to do better contextual brand safety in real time. While blocking curse words isn’t hard on its own, the speed required to analyze each page in the milliseconds before an ad bid is another level of challenge.
“I wonder whether it’s a fool’s errand,” Turow said. “It’s horribly obnoxious, but I wonder if it’s really possible to do this in a way that’s 100% working.”