“Deterministic” and “probabilistic” identity data have become the new buzzwords in digital ad circles.
These terms have been familiar to digital advertisers, publishers and ad tech executives for years. But now that the entire industry is on the hunt for alternatives to the third-party cookie, they seem to be tossed around more frequently, especially in descriptions of how the new crop of so-called cookieless identifiers work.
Ad tech, of course, is riddled with made-up terminology. Not this time. Deterministic and probabilistic methods for linking identifiable data have been around for years, in fields that have absolutely nothing to do with digital advertising, from public health to education to risk analysis.
Better yet: the words actually reflect their meaning. (Even better yet — no acronyms!)
What is deterministic data?
Deterministic data is information that is known to be true and accurate because it is supplied by people directly or is personally identifiable, such as names or email addresses. It’s often referred to as authenticated data.
What is probabilistic data?
Probabilistic data is based on probabilities. It consists of individual pieces of information, such as a device’s operating system or IP address, that are compiled to piece together a conclusion. In the case of ad tech, probabilistic data can be used to create an identifier.
How is deterministic data used for advertising identity?
Deterministic identifiers use deterministic data to assign identity to a person who is online or using a mobile device, so that the identified person can be tracked across websites or apps for ad targeting or measurement. The key ingredient in deterministic identity is typically information someone supplied herself, usually by logging in with a name, email address or phone number.
So, is deterministic data the same as first-party data?
Well, sometimes. First-party data gathered directly from people by a brand or publisher includes deterministic data such as names, emails or phone numbers. But first-party data also includes a variety of other information reflecting actions taken on a website, articles read, purchase transactions or other behavioral data.
So how is deterministic data used to assign identity?
Deterministic identity is achieved when an email address supplied by a publisher or advertiser is matched to the same email address in an identity graph or database of logged-in users. A deterministic ID match can also happen if two entities both recognize an ID and can accurately match it. Sometimes three pieces of deterministic information can be used to connect the dots. For example, if it’s known that ID1234 is email@example.com and that email@example.com is also ID6789, then ID1234 is a deterministic match to ID6789. Ultimately, to achieve a deterministic match, data fields must agree exactly.
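To make the email example concrete, here is a minimal sketch of a deterministic match in Python. It is an illustration only, not any vendor’s actual method: the function names, the two sample ID tables and the choice to normalize emails by trimming spaces and lowercasing are all assumptions.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings agree."""
    return email.strip().lower()


def deterministic_matches(ids_a: dict[str, str], ids_b: dict[str, str]) -> dict[str, str]:
    """Link each ID in ids_a to the ID in ids_b that carries the same email.

    Both inputs map an ID to an email address (hypothetical sample data).
    An ID is linked only when the normalized emails agree exactly.
    """
    email_to_b = {normalize_email(email): id_b for id_b, email in ids_b.items()}
    links = {}
    for id_a, email in ids_a.items():
        key = normalize_email(email)
        if key in email_to_b:
            links[id_a] = email_to_b[key]
    return links


# Hypothetical publisher-side IDs and identity-graph IDs.
publisher_ids = {"ID1234": "email@example.com", "ID5555": "other@example.com"}
graph_ids = {"ID6789": "Email@Example.com "}

matches = deterministic_matches(publisher_ids, graph_ids)
print(matches)  # {'ID1234': 'ID6789'}
```

ID5555 gets no link because no record on the other side carries its email. That is the low-match-rate problem in miniature: without a shared deterministic field, there is nothing to join on.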
So what’s probabilistic data, and how is it used for advertising?
First, a bit on why probabilistic data is used. Deterministic data is hard to come by. Very often ad tech systems can’t match identities because someone is not logged in or an email address or other piece of deterministic data is not available. When advertisers complain about low match rates, it’s because there is a lack of deterministic data links.
Systems using probabilistic methods employ a variety of data points to decipher who a user might be. The easiest way to think about these methods is that they assign identity that is probably accurate. Basically, they’re taking their best guess to infer identity.
When publishers want to assign identity to someone who is not logged in, or a demand-side platform or identity graph provider wants to figure out if there’s a match between a site visitor and another existing ID, they employ probabilistic methods to assign identity based on a variety of probabilistic data points.
Do companies communicate whether an identity has been assigned based on deterministic or probabilistic data?
While identity tech firms provide information about how they create or link IDs in technical documentation and materials provided to clients, their IDs themselves don’t reveal whether deterministic or probabilistic methods are used. In fact, some firms take a hybrid approach to creating or matching identifiers.
What types of information are used to assign probabilistic identity?
Some identity tech firms call the information used to piece together probabilistic identity “soft signals” or “non-unique device characteristics.” Typical data points used include IP address, timestamps, browser version or screen resolution.
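A rough sketch of how such soft signals might be combined into a “probably the same user” decision follows. Everything specific here is an assumption made for illustration: the weights, the one-hour time window and the threshold are invented, and real identity vendors are likely to use statistical models rather than a hand-set score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    ip_address: str
    browser_version: str
    screen_resolution: str
    timestamp: int  # seconds since the Unix epoch


# Hypothetical weights (points out of 100) and cutoffs, chosen only for illustration.
WEIGHTS = {"ip_address": 40, "browser_version": 20, "screen_resolution": 20, "timestamp": 20}
MAX_GAP_SECONDS = 3600
THRESHOLD = 70


def match_score(a: Signals, b: Signals) -> int:
    """Add up points for every soft signal on which the two observations agree."""
    score = 0
    if a.ip_address == b.ip_address:
        score += WEIGHTS["ip_address"]
    if a.browser_version == b.browser_version:
        score += WEIGHTS["browser_version"]
    if a.screen_resolution == b.screen_resolution:
        score += WEIGHTS["screen_resolution"]
    if abs(a.timestamp - b.timestamp) <= MAX_GAP_SECONDS:
        score += WEIGHTS["timestamp"]
    return score


def is_probable_match(a: Signals, b: Signals) -> bool:
    """A best guess, not a certainty: True when the score clears the threshold."""
    return match_score(a, b) >= THRESHOLD


known = Signals("203.0.113.7", "Chrome 104", "1920x1080", 1_660_000_000)
visit = Signals("203.0.113.7", "Chrome 104", "1366x768", 1_660_000_900)
stranger = Signals("198.51.100.2", "Chrome 104", "1920x1080", 1_660_090_000)

print(match_score(known, visit), is_probable_match(known, visit))        # 80 True
print(match_score(known, stranger), is_probable_match(known, stranger))  # 40 False
```

Unlike a deterministic match, none of these fields has to agree exactly for the system to call it a match, and two different people behind the same IP address and browser could clear the threshold.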
Um, isn’t this just fingerprinting?
Fingerprinting also triangulates a variety of data points to establish identity, but ad and identity tech execs often stress that there are distinctions between the two. They’re particularly compelled to draw distinctions because fingerprinting has fallen out of favor, especially since 2019, when Google said its Chrome browser would restrict the practice. Google also prohibits its ad tech vendor partners from using fingerprinting for identification, and other browsers such as Safari and Firefox restrict it as well.
Companies employing probabilistic identification methods give varying reasons for why their techniques are distinct from fingerprinting. But the distinctions can seem convoluted or semantic.
For example, some identity tech firms argue that fingerprinting happens mainly on the advertiser side, when advertisers or ad tech firms want to create persistent identifiers without the knowledge or approval of people or publishers. Others, however, say fingerprinting happens on the publisher side, when publishers want to create IDs. Still others suggest the distinction is that fingerprinting happens only at the device level.
“It’s just in the language and that makes me furious,” one ad tech exec who spoke anonymously told Digiday. “Most ad tech companies, most identity solutions, the probabilistic IDs, these are based on fingerprinting technology — but they’re not calling it fingerprinting.”