WTF is the difference between deterministic and probabilistic identity data?
“Deterministic” and “probabilistic” identity data have become the new buzzwords in digital ad circles.
These terms have been familiar to digital advertisers, publishers and ad tech executives for years. But now that the entire industry is on the hunt for alternatives to the third-party cookie, they seem to be tossed around more frequently, especially in descriptions of how the new crop of so-called cookieless identifiers work.
Ad tech, of course, is riddled with made-up terminology. Not this time. Deterministic and probabilistic methods for linking identifiable data have been around for years, in fields that have absolutely nothing to do with digital advertising, from public health to education to risk analysis.
Better yet: the words actually reflect their meaning. (Even better yet — no acronyms!)
What is deterministic data?
Deterministic data is information that is known to be true and accurate because it is supplied by people directly or is personally identifiable, such as names or email addresses. It’s often referred to as authenticated data.
What is probabilistic data?
Probabilistic data is based on probabilities. It is composed of individual pieces of information, such as a device’s operating system or IP address, that are compiled to piece together a conclusion. In the case of ad tech, probabilistic data can be used to create an identifier.
How is deterministic data used for advertising identity?
Deterministic identifiers use deterministic data to assign identity to a person who is online or using a mobile device, so that the identified person can be tracked across websites or apps for ad targeting or measurement. The key ingredient in deterministic identity is typically information someone supplied herself, usually by logging in with a name, email address or phone number.
So, is deterministic data the same as first-party data?
Well, sometimes. First-party data gathered directly from people by a brand or publisher includes deterministic data such as names, emails or phone numbers. But first-party data also includes a variety of other information reflecting actions taken on a website, articles read, purchase transactions or other behavioral data.
So how is deterministic data used to assign identity?
Deterministic identity is achieved when an email address supplied by a publisher or advertiser is matched to the same email address in an identity graph or database of logged-in users. A deterministic ID match can also happen when two entities both recognize an ID and can accurately match it. Sometimes three pieces of deterministic information can be used to connect the dots. For example, if it’s known that ID1234 is email@example.com and email@example.com is ID6789, then ID1234 is a deterministic match to ID6789. Ultimately, to achieve a deterministic match, data fields must agree.
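To make the "data fields must agree" idea concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's actual matching logic: the function names, the sample records and the choice to normalize emails by trimming and lowercasing are all assumptions made for the example.

```python
def normalize_email(email: str) -> str:
    """Trim and lowercase so trivially different spellings compare equal (an assumed rule)."""
    return email.strip().lower()


def deterministic_matches(records_a: dict, records_b: dict) -> list:
    """Pair IDs from two systems whose email fields agree exactly after normalization."""
    # Index the second system by email so each lookup is an exact-key match.
    ids_by_email = {normalize_email(email): id_b for id_b, email in records_b.items()}
    matches = []
    for id_a, email in records_a.items():
        id_b = ids_by_email.get(normalize_email(email))
        if id_b is not None:
            matches.append((id_a, id_b))
    return matches


# Hypothetical records: a publisher's logged-in user and an identity graph.
publisher = {"ID1234": "Email@Example.com "}
graph = {"ID6789": "email@example.com", "ID0001": "other@example.com"}

print(deterministic_matches(publisher, graph))  # [('ID1234', 'ID6789')]
```

There is no scoring and no threshold here. Either the shared field agrees and the two IDs are linked, or it does not and no match is made.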
So what’s probabilistic data, and how is it used for advertising?
First, a bit on why probabilistic data is used. Deterministic data is hard to come by. Very often ad tech systems can’t match identities because someone is not logged in or an email address or other piece of deterministic data is not available. When advertisers complain about low match rates, it’s because there is a lack of deterministic data links.
Systems using probabilistic methods employ a variety of data points to decipher who a user might be. The easiest way to think about these methods is that they assign identity that is probably accurate. Basically, they’re taking their best guess to infer identity.
When publishers want to assign identity to someone who is not logged in, or a demand-side platform or identity graph provider wants to figure out if there’s a match between a site visitor and another existing ID, they employ probabilistic methods to assign identity based on a variety of probabilistic data points.
Do companies communicate whether an identity has been assigned based on deterministic or probabilistic data?
While identity tech firms provide information about how they create or link IDs in technical documentation and materials provided to clients, their IDs themselves don’t reveal whether deterministic or probabilistic methods are used. In fact, some firms take a hybrid approach to creating or matching identifiers.
What types of information are used to assign probabilistic identity?
Some identity tech firms call the information used to piece together probabilistic identity “soft signals” or “non-unique device characteristics.” Typical data points used include IP address, timestamps, browser version or screen resolution.
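One way to picture a "best guess" built from soft signals is a weighted score compared against a threshold. The Python sketch below is a toy illustration under stated assumptions: the signal names come from the examples in this article, but the weights, the threshold of 70 and the function names are invented for the example and do not describe how any particular identity firm works.

```python
# Hypothetical weights (points out of 100) for each soft signal that agrees.
WEIGHTS = {
    "ip_address": 40,
    "browser_version": 20,
    "screen_resolution": 20,
    "operating_system": 20,
}


def match_score(visitor: dict, candidate: dict) -> int:
    """Add up the weights of signals present on the visitor that agree with the candidate."""
    return sum(
        weight
        for signal, weight in WEIGHTS.items()
        if visitor.get(signal) is not None and visitor.get(signal) == candidate.get(signal)
    )


def probable_match(visitor: dict, candidate: dict, threshold: int = 70) -> bool:
    """Treat the two profiles as probably the same user once the score clears the threshold."""
    return match_score(visitor, candidate) >= threshold


# Hypothetical profiles: an anonymous site visitor and two existing IDs.
visitor = {"ip_address": "203.0.113.7", "browser_version": "Chrome 111",
           "screen_resolution": "1920x1080", "operating_system": "macOS"}
known_a = {"ip_address": "203.0.113.7", "browser_version": "Chrome 111",
           "screen_resolution": "1440x900", "operating_system": "macOS"}
known_b = {"ip_address": "198.51.100.2", "browser_version": "Firefox 110",
           "screen_resolution": "1366x768", "operating_system": "macOS"}

print(match_score(visitor, known_a), probable_match(visitor, known_a))  # 80 True
print(match_score(visitor, known_b), probable_match(visitor, known_b))  # 20 False
```

Unlike a deterministic match, the result depends on where the threshold is set. Lowering it links more profiles but raises the odds that two different people are treated as one.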
Um, isn’t this just fingerprinting?
Fingerprinting also triangulates a variety of data points to establish identity, but ad and identity tech execs often stress that there are distinctions between the two. They’re particularly compelled to draw distinctions because fingerprinting has fallen out of favor, especially since 2019, when Google said its Chrome browser would restrict the practice. Google also prohibits its ad tech vendor partners from using fingerprinting for identification. Other browsers, including Safari and Firefox, restrict fingerprinting as well.
Companies employing probabilistic identification methods give varying reasons for why their techniques are distinct from fingerprinting. But the distinctions can seem convoluted or semantic.
For example, some identity tech firms argue that fingerprinting happens mainly on the advertiser side, when advertisers or ad tech firms want to create persistent identifiers without the knowledge or approval of people or publishers. Others, however, say fingerprinting happens on the publisher side, when publishers want to create IDs. Still others suggest the distinction is that fingerprinting happens only at the device level.
“It’s just in the language and that makes me furious,” one ad tech exec who spoke anonymously told Digiday. “Most ad tech companies, most identity solutions, the probabilistic IDs, these are based on fingerprinting technology — but they’re not calling it fingerprinting.”