
WTF is multimodal AI for advertisers? | How AI models are enabling a new level of flexibility and precision in targeting

This WTF guide, sponsored by Dstillery, dives into what multimodal AI is, the many ways it can be used and how it can solve both current and future challenges for advertisers.

As AI adoption accelerates in advertising, the technology itself is evolving beyond what most advertisers initially encountered. Early applications of generative AI and large language models in programmatic advertising and media were powerful but limited, as they typically processed one type of data at a time.

Multimodal AI represents the next frontier in AI, enabling machines to understand and evaluate multiple data types at once, offering greater insight and flexibility than any single data type could.

In this WTF explainer guide, Digiday and Dstillery explore what multimodal AI is, how to apply it in real-world settings, its benefits to advertisers and how it’s well-positioned to solve current and future challenges.

01
How is multimodal AI different from other generative AI methods currently available?

Instead of viewing multimodal AI as different from generative AI, it can be seen as an evolution of it. Tools like the recently released Sora 2 from OpenAI are multimodal as they can understand text prompts and generate a completely different modality of data, such as video. Self-driving car technology is also multimodal because it merges different modes of data, such as radar, vision and GPS, to create a comprehensive view of the road. 

For advertising and targeting, multimodal AI can learn from any form of data and apply it to any data format, offering brands and agencies flexibility and precision previously unavailable.

What makes multimodal AI offerings like Dstillery’s distinct is their application to behavioral prediction in advertising. The technology understands information from across the digital ecosystem — website visitation patterns, search behaviors, purchase intent signals, CTV viewership and even podcast listening — to create a fully connected view of audience behavior and uniquely inform targeting.


“What you need is prediction based on behavioral data, and that’s why we wanted to use multimodal AI for prediction in digital ad targeting,” said Melinda Han Williams, Chief Data Scientist at Dstillery. “The next wave of performance won’t come from bigger datasets or faster models, but from systems that can learn across modalities — unifying behavioral and contextual signals to deliver both performance and flexibility.” 

This mirrors how humans process information — like naturally combining visual cues, sounds, text and context to understand a situation. When someone sees a waiter slip on a wet floor near a caution sign, their brain instantly processes multiple inputs to grasp the full story. Multimodal AI aims for that same fusion of understanding.

02
OK, so what data can I use, and how do I create these models?

Instead of building separate models for search, display, CTV and everything else, multimodal AI creates one unified understanding that works everywhere. 

“What sets multimodal AI apart is its flexibility,” Taejin In, Chief Product Officer at Dstillery, said. “Whether you have rich first-party data or just a campaign brief, multimodal AI can transform virtually any starting point into precise, actionable audiences.”

Advertisers can start with almost anything — tags on a website, CRM data, search keywords, website URLs, past campaign logs or even just a paragraph describing their target audience. The key is quality over quantity; a small but accurate seed of the best customers often outperforms a large, unfocused dataset. 

03
How does multimodal AI work exactly?

At its core, multimodal AI learns about digital behaviors in the same way large language models learn about words. 

The system creates mathematical representations — embeddings — of different data types (search keywords, website URLs, CTV viewing patterns, browsing sequences) and maps them into a shared space where the AI can identify behavioral connections across formats. For example, it might learn that people searching for “best running shoes for marathons” often visit athletic nutrition sites and watch fitness content on CTV, even when there’s no identifier connecting those actions. 
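A minimal sketch can make the shared-space idea concrete. The toy vectors below are illustrative stand-ins (a production system would learn them from behavioral data, and nothing here reflects Dstillery’s actual model): signals from different modalities live in one vector space, so a simple cosine similarity can reveal that a search query sits closer to a related website and CTV title than to an unrelated page, even with no shared identifier.

```python
# Hypothetical sketch of a shared embedding space across modalities.
# The 4-dimensional vectors below are toy values chosen for illustration;
# a real system would learn high-dimensional embeddings from behavior.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Signals from three different modalities, mapped into one space.
embeddings = {
    ("search", "best running shoes for marathons"): np.array([0.9, 0.1, 0.8, 0.0]),
    ("url",    "athletic-nutrition-site.example"):  np.array([0.8, 0.2, 0.7, 0.1]),
    ("ctv",    "fitness workout series"):           np.array([0.7, 0.0, 0.9, 0.2]),
    ("url",    "mortgage-rates.example"):           np.array([0.0, 0.9, 0.1, 0.8]),
}

query = embeddings[("search", "best running shoes for marathons")]
scores = {key: cosine(query, vec) for key, vec in embeddings.items()}
for (modality, name), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {modality:6s} {name}")
```

In this toy space, the running-shoes query scores high against the nutrition site and the fitness series but low against the mortgage page — the geometric version of spotting a behavioral connection across formats.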

This approach mirrors how consumers actually behave. Someone might research a product on a search engine, check review sites, watch video reviews, see ads while streaming and finally purchase days later. 

“Traditional targeting treats search, browsing and streaming as separate silos,” said In. “But consumers don’t live in silos. Multimodal AI connects these signals the way people actually move through their digital lives — and that’s what reveals true intent.”

04
Got it. Once my team has our models, how can we activate multimodal AI for targeting?

This is where multimodal AI really shines. Once advertisers have a model built, it’s like having a universal translator for their audience across all of programmatic. 

After creating a model that captures their audience’s core behavioral profile, the technology translates that intelligence across any digital advertising channel, like display, video, mobile in-app, CTV and audio, across any targeting tactic — user segments, contextual targeting, curated deals and custom bidding algorithms. 

“For example, if our multimodal AI model identifies that your audience shows strong interest in financial content and sustainability topics, it can build both a custom behavioral user segment and identify all the relevant content — whether they’re reading articles on desktop, listening to podcasts or streaming shows on CTV,” said In. “All from that single model.” 

And because one model can produce both user segments and contextual categories — unique to this modeling process — advertisers get more out of their strategy and maximize the advertising opportunities available to them. A model that can be activated either as a set of users or as a contextual category lets advertisers bid on impressions whether or not an ID is attached.
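At bid time, that dual activation amounts to a simple branch. The sketch below is purely illustrative (the field names, scores and threshold are assumptions, not any vendor’s API): when an impression carries a user ID, score the user’s behavioral profile; when it doesn’t, fall back to a contextual score for the page or content itself.

```python
# Hypothetical bid-time logic for a single model exposed two ways.
# All names, scores and the threshold are illustrative assumptions.

USER_SCORES = {"user-123": 0.91}  # precomputed behavioral scores per user ID
CONTEXT_SCORES = {                # precomputed contextual scores per property
    "finance-news.example": 0.84,
    "celebrity-gossip.example": 0.12,
}

def should_bid(impression: dict, threshold: float = 0.5) -> bool:
    """Decide whether to bid, with or without an ID on the impression."""
    user_id = impression.get("user_id")
    if user_id is not None:                               # ID present: user-segment path
        score = USER_SCORES.get(user_id, 0.0)
    else:                                                 # no ID: contextual path
        score = CONTEXT_SCORES.get(impression.get("page"), 0.0)
    return score >= threshold

print(should_bid({"user_id": "user-123", "page": "any.example"}))        # True
print(should_bid({"user_id": None, "page": "finance-news.example"}))     # True
print(should_bid({"user_id": None, "page": "celebrity-gossip.example"})) # False
```

The point of the branch is that both paths draw on the same underlying audience model, so ID-less inventory is scored rather than skipped.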

Advertisers also have several activation options. Teams that want to move quickly can choose from thousands of pre-built segments with different scale options and get started immediately. Teams with more time and effort to invest can bring their own seeds for a custom model that is finely tuned to their opportunity and more likely to meet their KPIs. 

Additionally, with the growth in sell-side decisioning, multimodal AI enables these models to activate seamlessly as deal IDs across major SSPs, while also being available natively in all leading DSPs. 

05
What are some examples of multimodal AI in action?

One stand-out example is an auto insurance provider trying to lower its cost per quote. Dstillery used the company’s first-party data — visitors to the website — and multimodal AI translated that into both user segments and contextual categories. 

The surprise? Contextual targeting, powered by behavioral signals, outperformed everything else tested by the insurance provider, including traditional ID-based lookalike segments. This reinforces the importance of testing multiple tactics, because advertisers can’t always predict which approach will drive the best outcomes.

In another campaign for a kitchen appliance brand using multimodal AI, the brand achieved a 94% average video completion rate on CTV, surpassing the client benchmark of 70% and delivering a 0.14% CTR — beating a benchmark of 0.10%.

The top-performing CTV networks, including Discovery Channel and Food Network, were all aligned with high behavioral intent signals captured in Dstillery’s multimodal embeddings.

“It really captures the intent of every one of those digital events, so we take that behavioral understanding and bring CTV signals that are just about specific channels, titles or series into that same space,” Williams said. “Predictions we make based on behavior to find what digital behaviors are predictive of a certain outcome for each of these brands can now be directly applied to any one of these CTV channels, titles or series.” 

06
How can this solve challenges I haven’t yet encountered?

Multimodal AI liberates advertisers, unlocking precise, high-performing targeting from diverse data signals, while eliminating the fragmentation, latency and guesswork that have long restricted programmatic targeting. 

“Multimodal AI empowers advertisers to harness any data signal, whatever form it takes, and activate it across any programmatic tactic and channel,” said In. “This doesn’t just solve today’s targeting challenges, it provides the connective tissue AI agents will need to understand, evaluate and operate across modalities.” 

This technology is valuable for solving current targeting challenges, but a unified foundation will become even more crucial as the industry moves toward autonomous agentic systems: AI agents will need to plug into tools and applications that work seamlessly across data modalities. 

“For agentic AI and advanced AI agents to truly deliver superior value, they need to understand all modes of data; you can’t do that without multimodality,” In said. “Imagine if an AI agent can only reason from contextual signals. It would be missing behavioral signals, which we all know are a much better predictor of intent.” 

07
How can I find the right multimodal AI technology or partner for my team?

When evaluating prospective partners and vendors, there are many features, capabilities and questions to consider to ensure a good fit. Not all vendors can take diverse signals and fully compare them in the same shared space. Other differentiating qualities include transparent methodology and unified activation, as well as the speed and scalability of solutions.

Beyond evaluating technical capabilities, it’s crucial to ensure that both parties enter the relationship with the same goals in mind and treat it as a true partnership. A set-it-and-forget-it mentality will beget set-it-and-forget-it performance; successful partnerships occur when there’s a willingness to collaborate. 

“Don’t expect everything to work perfectly from day one,” In said. “The brands that win are the ones willing to test, share learnings and iterate. Once that flywheel gets going and you understand what drives performance for your brand specifically, that’s when you outperform everybody else.”


About Dstillery

Dstillery is the leading AI ad targeting company. We empower brands and agencies to target their best prospects for high-performing programmatic advertising campaigns. Our audience targeting solutions are powered by multimodal AI — a breakthrough in AI that learns from any form of data and applies it to any form of data, making it the most flexible targeting engine in the market. Backed by our award-winning Data Science, Dstillery has earned 24 patents (and counting) for the AI technology that powers our precise, scalable audiences. To learn more, visit us at www.dstillery.com or follow us on LinkedIn.
