As ad tech firms test ways to connect Google’s FLoC to other data, privacy watchers see fears coming true

Google’s automated cookieless ad targeting method, known as FLoC, or Federated Learning of Cohorts, is supposed to protect privacy by providing people with a greater degree of anonymity than the third-party cookie offered. Instead, it may make it quicker and easier for advertising companies to identify and access information about people online.

As privacy and data ethics advocates warned, companies are starting to combine FLoC IDs with existing identifiable profile information, linking unique insights about people’s digital travels to what they already know about them, even before third-party cookie tracking could have revealed it. And identity tech firms say the IDs will help improve the accuracy of systems that detect people’s identities and could even serve as persistent identifiers.

“The more signals we have, the more accurate we are, and FLoC IDs will be among signals we use,” said Mathieu Roche, CEO of identity tech firm ID5. 

Google points to FLoC as a beacon of privacy-safe ad targeting because the method does not track people individually. Instead, it uses machine learning to group people based on the web pages they have viewed. Additionally, the FLoC ID assigned to people is updated weekly, which is meant to filter them into gradually evolving collectives and seemingly limit a FLoC ID’s use as a persistent identifier. Furthermore, because the system works automatically inside web browsers like Google’s Chrome, Google does not precisely define how it assembles cohorts, and the company does not supply labels to reveal what their supposedly opaque codes represent. 

Nonetheless, the ad industry — which co-opted foundational internet technologies like the cookie and the IP address into means of identifying people online — sees an opportunity to do the same with FLoC IDs in the hopes of circumventing the impending demise of cookies.

Over time FLoC IDs might work as persistent identifiers in the way IP addresses do, said Nishant Desai, group director of technology and operations at GroupM’s ad tech arm, Xaxis. Like IP addresses, FLoC IDs will not be entirely static. However, it is likely that the same FLoC IDs or range of IDs will be associated with someone.

“If your behavior doesn’t change, the algorithm will keep assigning you in that same cohort, so some users will have a persistent FLoC ID associated with them, or could,” Desai said. He and other people interviewed for this article stressed that because Google is testing FLoC on a very small portion of web traffic during the trial period, there are limits to how they can use the data today and to the statistical significance of their analysis. Most companies also described various regimes in place to protect people’s privacy when using FLoC data.

FLoC IDs can lower the barrier to companies gathering information about a person, which is what has alarmed privacy advocates. Until now, someone would have had to visit a website once before the site could drop a cookie on their machine to track their movements across the web. A FLoC ID and the signals it emits, by contrast, will be known from the get-go, according to Bennett Cyphers, staff technologist at the digital privacy advocacy organization Electronic Frontier Foundation. He added, “This is kind of unprecedented even in ad tech.”

In a blog post published in March, Cyphers criticized FLoC as “a terrible idea,” in part for exactly the reason that advertising companies are now indicating could come to fruition: exploiting its potential as an identifier. “If a tracker starts with your FLoC cohort, it only has to distinguish your browser from a few thousand others (rather than a few hundred million),” he wrote.
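
The arithmetic behind Cyphers’ point can be sketched in a few lines of Python. This is a rough illustration, not a calculation from his post: the population and cohort figures below are assumed stand-ins for “a few hundred million” and “a few thousand,” and the helper name `bits_to_identify` is hypothetical.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one browser among `population`."""
    return math.log2(population)


# Assumed, illustrative figures only; the article gives no exact numbers.
ALL_BROWSERS = 300_000_000  # stand-in for "a few hundred million"
COHORT_SIZE = 3_000         # stand-in for "a few thousand"

bits_without_floc = bits_to_identify(ALL_BROWSERS)
bits_with_floc = bits_to_identify(COHORT_SIZE)

# The difference is the identifying information the cohort ID supplies up front.
bits_supplied_by_floc = bits_without_floc - bits_with_floc

print(f"Without a cohort ID: {bits_without_floc:.1f} bits needed")
print(f"Starting from a cohort: {bits_with_floc:.1f} bits needed")
print(f"Supplied by the cohort ID: {bits_supplied_by_floc:.1f} bits")
```

Under these assumed figures, a tracker that starts from the cohort needs roughly 12 bits of additional information from other signals rather than roughly 28, which is why a cohort ID makes fingerprinting easier even though it does not identify anyone by itself.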

During the current trial period, Google has no rules for how FLoC or the data it generates via its Chrome browser can be used. The company did not provide a comment in time before publication.

Connecting FLoC IDs to profile data 

Advertising companies are already strategically gathering FLoC IDs and linking them to identifiable data or analyzing them in an attempt to uncover information about people that may not have been known before, mimicking how they have parsed what third-party cookies told them about people’s behaviors.

Neustar is gathering FLoC data from its own site activity and testing what’s coming in from FLoC trials on website traffic from some clients’ sites, said Devon DeBlasio, product marketing director for the firm, which helps brands manage and use data and offers identity technology. The firm will also connect that data to existing identifiable data it has about people. “Our advertiser clients will be able to associate a FLoC ID to their first-party identity like an email when a user authenticates on their web properties,” he said.

The idea that “FLoC IDs are an additional dimension of how you resolve identity is definitely true,” said Desai. He said he expects identity tech and ad tech providers to incorporate FLoC IDs as another signal they might use to help resolve identity or to inform what companies know about people they already have identifiable profiles on. “As FLoC IDs start to come into the ID graph, you can associate those with those profiles,” he said.

When Desai spoke to Digiday regarding the issue, he was speaking in general terms about how identity tech providers might attempt to manipulate FLoC IDs. This story originally stated that GroupM’s new data division Choreograph is conducting an analysis of how FLoC IDs can be coordinated into its identity technology, which the company said is inaccurate. After this story was published, GroupM said Desai misspoke and that Choreograph is not incorporating, and does not currently plan to incorporate, FLoC IDs into its own ID graph. GroupM, however, is currently conducting due diligence on the best ways to implement FLoC for its clients in the future.

ID5’s probabilistic identity tech uses a variety of signals including IP address, page URL, and a timestamp to detect people’s identity. While the company has yet to incorporate FLoC IDs as a consistent data source, Roche said a FLoC ID could give the company’s system an accuracy boost. Because a FLoC ID represents a relatively small number of people — a thousand or so rather than hundreds of thousands or millions — FLoC IDs would essentially reduce the number of vectors necessary for the firm’s identity method to work because the system could determine that someone had been assigned that particular ID before. If publishers working with ID5 want to pass along FLoC IDs among the data signals the company uses to help them identify people for ad targeting and measurement, Roche said, “We can use that as another signal to create a stable identifier for them.”

MightyHive, another data consultancy that helps brands manage and use data, is gathering FLoC IDs for analysis, said Michael Neveu, director of data for the firm. “We’re categorizing users in these buckets,” he said. Advertisers will be able to determine whether people associated with particular FLoC IDs take certain actions more often than others, by buying certain products, for instance. “I should be able to say a user is in cohort 1000; I’d like to reach out to other users in similar cohorts to see if they’re interested as well,” said Neveu. 

Desai and others say demand-side platforms and other ad tech firms are expected to collect the cohort IDs, analyze them, and build out rich taxonomies for ad targeting that detect people’s interests based on patterns indicated by those IDs.

That work has begun. Ad tech firm Criteo told Digiday it is collecting FLoC IDs, though the firm would not elaborate on how it is using the data or plans to use it. Iponweb’s DSP could use data from FLoC IDs in ad targeting if Google rolls it out beyond the current pilot, according to Michael Beschastnov, product architect for Iponweb’s supply-side tech firm, The MediaGrid. He said the parent company is conducting internal testing on how to use FLoC data, which he expects may become “part of the product.”

Reverse-engineering FLoCs

In addition to connecting FLoC IDs with other types of data, Google’s cookieless targeting method can be used on its own to create audience profiles.

Don Marti, vp of ecosystem innovation at ad services firm CafeMedia, started reverse-engineering FLoC IDs in April to back into which topics are associated with specific anonymized groups or cohorts of people. “We can use FLoC data not just for placing ads to the subset of the audience that has FLoC, but also to inform contextual placements for users who don’t have FLoC — whether because their browser doesn’t support it, they have it turned off, or their cohort is blocked,” said Marti. There’s no indication that reverse-engineering FLoC IDs could, without other data, identify someone.

Marti posted about what he learned when analyzing the millions of data points from FLoC-enabled browsers he’d captured, spotting where certain FLoC IDs showed up in relation to content keywords more often than others. In one example, he evaluated FLoC IDs associated with financial and tech content. While a “long valley” of FLoC IDs indicated people who probably weren’t especially interested in investment-related content, those in a narrow range in the 14000s were. “So a brokerage firm, like Charles Schwab, wanting to reach likely investors might start by testing advertising success with FLoCs in the 14000 range,” he wrote.
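
The kind of analysis described above, spotting cohorts that appear alongside a keyword more often than average, can be approximated with a simple “lift” calculation. The sketch below is not Marti’s actual method, which the article does not detail. The function name `cohort_lift`, the record layout and the sample page views are all hypothetical and exist only for illustration.

```python
from collections import Counter


def cohort_lift(page_views, keyword):
    """Return {cohort_id: lift} for one keyword.

    `page_views` is an iterable of (cohort_id, set_of_keywords) records.
    Lift is the cohort's keyword rate divided by the overall keyword rate,
    so values above 1.0 mean the cohort over-indexes on that keyword.
    """
    total = Counter()  # page views seen per cohort
    hits = Counter()   # page views per cohort whose content carries the keyword
    for cohort_id, keywords in page_views:
        total[cohort_id] += 1
        if keyword in keywords:
            hits[cohort_id] += 1

    all_views = sum(total.values())
    if all_views == 0 or sum(hits.values()) == 0:
        return {}  # nothing to compare against
    overall_rate = sum(hits.values()) / all_views
    return {c: (hits[c] / total[c]) / overall_rate for c in total}


# Hypothetical sample data; real inputs would be logged page views.
views = [
    (14020, {"investing", "stocks"}),
    (14020, {"investing"}),
    (14020, {"recipes"}),
    (8123, {"recipes"}),
    (8123, {"gadgets"}),
    (8123, {"investing"}),
    (8123, {"travel"}),
    (8123, {"sports"}),
]

lift = cohort_lift(views, "investing")
for cohort_id, value in sorted(lift.items()):
    print(cohort_id, round(value, 2))
```

In this made-up sample, cohort 14020 over-indexes on investing content and cohort 8123 under-indexes. With the small trial volumes that the companies interviewed describe, a real analysis would also need a minimum sample size per cohort before trusting any such ratio.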

Not all advertising companies are convinced of FLoC’s identity potential, though. Mediavine, which manages ads for small publishers, has been experimenting with how FLoC IDs can inform content recommendations. But the firm’s co-founder and CEO Eric Hochberger said, “Mediavine has not, and has no plans to, link FLoC to our first-party data.” The company, along with many other ad tech firms that work directly with publishers, has been banking on the potential of identifiable first-party data connections to bring ad and subscription revenue to publishers operating on the open web after the third-party cookie fallout.

“First-party relationships are on the cusp of becoming the most valuable source of data. Ultimately, FLoC cohorts are merely anonymous IDs that are at best 95% as effective in targeting as third-party cookies per Google’s own findings,” said Hochberger. “We don’t see value in attempting to link the less valuable data to our more valuable data.”

‘This little package of information’

Other companies, however, clearly do see FLoC IDs as potentially valuable identity data, which is why privacy researchers like Cyphers see them as a not-so-theoretical privacy problem.

Google’s Chrome browser will assign a FLoC ID to every Chrome user who has not opted out by turning off the browser’s Privacy Sandbox setting or blocking it with a browser extension. So, even if someone has never visited a site previously, the FLoC ID can reveal information about that person that the site or ad system may not otherwise have. For example, in aggregate, these data signals might uncover someone’s gender, whether they are likely to be in a higher or lower income bracket, or whether they live in a certain region. Despite the fact that the process is intended as a privacy-preserving replacement for behavioral tracking enabled by third-party cookies, said Cyphers, “It’s a new ability to make a judgment.”

“Chrome is going to be presenting this FLoC ID to every site that I visit the first time that I visit them,” said Cyphers. “Without doing anything else, the first time you’re seen by [a website or ad system] they’re going to have this little package of information about you.”

https://digiday.com/?p=417492