WTF are synthetic audiences?

What if you could test a new product idea and get your audience’s reaction before launching, without bothering real people?

Well, now you can, with synthetic audiences. 

The process involves taking a publisher or brand’s audience dataset and using AI to create a copy of those audience behavior patterns to use for market research. 

It’s a term that has emerged this year in the media and advertising industries. And though not everyone is comfortable with the way “synthetic audiences” sounds (perhaps too reminiscent of “fake” at a time when AI slop is rampant), publishers like The Times are finding the practice useful for testing ideas behind product launches and editorial initiatives, and ad agencies like Dentsu are using it to test media planning and audience targeting, faster and more cheaply than with traditional research.

I’ve heard the terms synthetic data and synthetic research used – are these the same as synthetic audiences? 

Pretty much. Like many new-fangled terms born out of new technological capabilities, there is more than one way to refer to it. Synthetic data/research is the umbrella term to describe using AI to create data mimicking the real data it’s given. 

“Synthetic audiences” refers more specifically to a panel trained on human behavior data, usually from CRMs or human-conducted surveys, that simulates an audience group and its different cohorts. The practice is also called “digital twinning.”

Typically, when a publisher or brand wants to conduct market research, they have to use surveys, focus groups and other data collection processes. With synthetic research, they can copy real audience data to create a virtual dataset to get real-time insights from. The AI systems use inference patterns to provide insights from these “synthetic panelists,” according to Cynthia Vega, who leads the integration of AI and analytics across Kantar, a data and consulting company.

However, there are other forms of synthetic research, such as “cohort boosting,” Vega said. That process involves expanding a dataset. For example, if a company has a focus group of 50 (real) people but wants to extrapolate that data to look at a larger audience segment, they can use AI to expand that dataset to then market test across 500 (synthetic) people. 

How does the process work?

Publishers share their audience data with an audience simulation platform to create the synthetic audience. 

In the case of The Times, the team behind the platform Electric Twin was embedded with the publisher’s data and insights teams for months to analyze its datasets.

Electric Twin’s platform is not too different from ChatGPT’s user experience: you can ask prompts either to the entire synthetic audience database, or to certain audience cohorts, and compare responses from different groups. You can also chat with a specific “persona” who is modeled after a specific type of person, such as an older, print newspaper reader living in Wales, for example. Alex Cooper, CEO and co-founder of Electric Twin, said their clients include print and broadcast media companies in the UK. 

Synthetic research is usually a combination of audience data and AI models. Additional behavioral data – such as publicly available government data – can be layered in. The synthetic audience “personas” don’t rely on real people’s personal information but are built on probabilistic data.

There are three main use cases for publishers, according to Cooper: test products before they launch, test concepts and campaigns before they go live, and get deeper audience insights into their readers or viewers. This can mean testing subscription offers, or different campaign messaging to see what resonates the most, he said.  

Once these AI twins are created, publishers can talk to them to get insights on something in development, such as why they would want to buy a product or if they would listen to a specific podcast show idea.

Wait, what?

Think of the basic idea of a focus group: one person asks questions of a group of people in a room. Questions like, how often do you read the news in a week, or, which news sources do you trust the most?

Now, you can do that in a chatbot experience. You can ask the whole group, or ask questions one-on-one to a persona representing a Gen Z reader who reads the news every day and trusts left-leaning publications, for example. 

“Anything you would ask to a human, you could ask to a twin,” Vega said.

But it does require clean, high-quality data to work. 

What are the benefits of using synthetic audiences over real ones?

It can be faster and cheaper. If a publisher or brand is testing a new product idea, they need to test it with a group of people and gather feedback. All of that takes time and money. But testing an idea with a synthetic focus group can give much “faster access” to market research results, Vega said.

Using synthetic audience data can be “tens of thousands of times faster… It’s almost an instantaneous response,” Cooper said. A human survey could take weeks to complete. While it depends on how often a publisher is turning to the synthetic audience for insights, it’s arguably “one-thousand times cheaper” than doing the same audience research on humans, he added.

Publishers and brands can also keep asking questions to synthetic audiences without worrying about “survey fatigue” or the “one shot” you get to ask those questions when sending out a survey to human readers, Vega said. Electric Twin, for example, charges publishers a subscription fee to use its platform, but they can ask an unlimited number of questions.

Synthetic data can be used as the first step in market research, but shouldn’t be used for any final decision-making. It can help pick out the best idea from five, but then that idea should be tested with a human audience segment.

What can publishers do once they have created a synthetic audience?

Publishers can use synthetic audiences to test products and content ideas, get them to market faster and refine marketing strategies. They can also carve specific audience segments out of the larger synthetic audience dataset to test products and ideas on a particular cohort they want to target.

For example, if certain areas of coverage are declining in readership, a publisher could ask synthetic audiences why they think this may be, or why they may unsubscribe, Vega said. 

The Times has used synthetic audiences to guide its editorial product plans this year, such as naming a new business podcast, Tracy Yaverbaun, gm of The Times and Sunday Times, said onstage at the Digiday Publishing Summit Europe last month. It also asked synthetic panelists what would drive deeper engagement, she said.

How can you trust that these synthetic audiences’ answers are correct?

Bingo. Nothing “synthetic” is going to replace what’s real. None of this is a stand-in for human market research.

“Human validation remains vital for accurate, bias-free results,” said Paul Hood, an independent digital strategist.

There are also ways in which responses from an AI-generated dataset may differ from what a person would say – and they’re not always reliable. When Vega asked a digital twin, “Do you drink alcohol in the morning?” it said yes, which seemed strange. So she asked why, and the digital twin responded that if there was a wedding or work event that took place in the morning, it might have a drink in that setting. 

“The fast response of a human would not include that,” she said. “It doesn’t mean it’s wrong. But it’s giving you [a different perspective] than a human would.”

In another example, a synthetic panelist refused to provide information that Kantar already knew it had. Vega learned it was simply mimicking human behavior – skipping the question the way a person might on a survey.

The upshot: guardrails or rules are needed to ensure synthetic panelists behave in the right manner to get the best data outcomes, she said. Synthetic audiences also can’t be used to question the performance of content or advertising, Vega said. Instead, they are best for providing ideas or solutions.

Lastly, Vega warned that synthetic audience data needs to be refreshed every once in a while and can’t be continuously used for market research, as human perceptions and sentiments change over time.

Overall, the accuracy of synthetic audience data insights is quite high – 94% for one use case, Vega said. When analyzing seven different surveys, Electric Twin found the synthetic audience data insights were just as accurate as the ones based on the real audience data, according to Cooper. It’s also important to build a synthetic audience that reflects a publisher’s needs, Cooper said. You probably shouldn’t use The Times’ synthetic audience data for insights on developing childcare products, for example.

“We see this as supplementing, augmenting and making traditional research processes even better,” Cooper said. “It essentially provides another probe onto your audience if you’re trying to understand [them].”
