‘Very pleasantly surprised’ Google shares results of Privacy Sandbox experiments

By Lara O'Reilly • October 22, 2020

Ivy Liu

Google on Wednesday published to GitHub the results of experiments it has been running to test how Chrome’s Federated Learning of Cohorts (FLoC) proposal could work in practice, allowing interest-based advertising to function in a more privacy-conscious way and without the need for third-party cookies. Early signs are promising, Google suggests. The company said it now hopes other companies will begin experimenting with the algorithms and vet its findings.

The FLoC proposal was first published last year to Chrome’s Privacy Sandbox, where the separate Chrome team has published a number of suggestions for APIs (application programming interfaces) aimed at curtailing improper tracking and enabling advertising to continue to work on the web after third-party cookies are killed off. FLoC uses machine learning algorithms that run on the device to group people into audience segments based on behavior such as browser history.

The idea is to improve privacy by letting advertisers target groups (or “FLoCs”) of users based on common interests, rather than using a pseudonymous identifier for each individual user. Unlike some of the other Privacy Sandbox entries, the FLoC proposal has caught particular attention in the ad industry because it’s suggesting an entirely new mechanism, rather than just trying to replace an existing one, as the similarly avian-themed TURTLEDOVE does for retargeting, for example.

Google’s experiments used a series of different methods to cluster users into cohorts. The company’s tests drew on two publicly available data sets: a database of 1 million songs tagged by categories and user IDs, and the MovieLens dataset of 25 million user-entered movie ratings, which associates each movie with a genre.

Google found that creating cohorts based on common user interests led to a 350% improvement in recall (the percentage of time a user was correctly placed into a cohort by the algorithm) and a 70% improvement in “precision,” or how often the algorithm’s data was correct. The company recommended using a random projection method called “SimHash,” an algorithm that ensures a cohort ID can be computed by the browser without needing to know another user’s information and that doesn’t require a central server to store a user’s browsing history.
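To make the random projection idea concrete, here is a minimal Python sketch of a SimHash-style cohort ID. It is an illustration under stated assumptions, not Google’s implementation: the function names, the domain-to-weight input format and the 8-bit ID length are all hypothetical. Each bit of the ID is the sign of the user’s browsing vector projected onto a pseudo-random direction derived from a hash, which is why a browser could compute the ID locally without seeing anyone else’s data.

```python
import hashlib
import random
from typing import Mapping


def _direction_weight(feature: str, bit: int) -> float:
    """Deterministic Gaussian weight for a (feature, bit) pair.

    Hashing the pair means every browser derives the same random
    direction without coordinating with a server (an assumption of
    this sketch, not a detail confirmed by the article).
    """
    digest = hashlib.sha256(f"{feature}|{bit}".encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).gauss(0.0, 1.0)


def simhash_cohort_id(history: Mapping[str, float], num_bits: int = 8) -> int:
    """Return a num_bits-wide cohort ID for a weighted browsing history.

    `history` maps a feature (here, a domain name) to a weight such as
    a visit count. Bit i is set when the projection of the history
    vector onto the i-th pseudo-random direction is positive.
    """
    cohort_id = 0
    for bit in range(num_bits):
        projection = sum(
            weight * _direction_weight(feature, bit)
            for feature, weight in history.items()
        )
        if projection > 0:
            cohort_id |= 1 << bit
    return cohort_id


# Hypothetical browsing histories, for illustration only.
alice = {"news.example": 5.0, "recipes.example": 2.0}
bob = {"news.example": 4.0, "recipes.example": 3.0}

alice_id = simhash_cohort_id(alice)
bob_id = simhash_cohort_id(bob)

# Number of ID bits on which the two users differ.
differing_bits = bin(alice_id ^ bob_id).count("1")
```

Because each bit depends only on the direction of the history vector, users with similar browsing patterns tend to agree on most bits and so land in the same or a nearby cohort. A real deployment would also need a way to guarantee that every cohort contains enough users, which this sketch does not attempt.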

Chetna Bindra, senior product manager for user trust, privacy and transparency at Google, emphasized that while the early results were encouraging, the publication of this paper is just a “first step.”

“We were very pleasantly surprised,” said Bindra. “This particular kind of algorithm could be a very useful substitute for third-party cookies while still providing for increased privacy.”

Google is now hoping that other companies in the ad tech space will begin running the algorithms on their own data to see whether this type of anonymization for users can still deliver results for advertisers.

“It will also be valuable in parallel — or potentially subsequently — to have browsers begin to look at how this can be [utilized],” said Bindra.

FLoC is still a long way off becoming an everyday reality for online advertisers. A number of GitHub users from browser and ad tech companies have raised issues about the proposal since it was published last year, ranging from whether the tech is too “black-box-y,” to how frequently users in a FLoC can change, to whether the “underlying meaning” of a cohort would be made public beyond its numerical ID.

Last year, the Electronic Frontier Foundation published a blog post expressing privacy concerns, equating a FLoC name to “a tattoo on your digital forehead that gives a succinct summary of who you are, what you like, where you go, what you buy and with whom you associate.”

Furthermore, FLoC has yet to pass the minimum bar for incubation within the voluntary consensus web standards organization W3C, said James Rosewell, CEO at tech company 51Degrees.

“Not a single other W3C participant has supported it yet,” said Rosewell. “There seems to be, in all things Privacy Sandbox, no kind of pre-filter.”

He continued, “Thousands of people [are] talking about FLoC — senior people — spending hours and hours reading explainers, keeping up with what’s coming out on something that if any other organization came out with as an idea wouldn’t even feature in a tweet.”

The Google FLoC paper is likely to be raised at the W3C’s “TPAC” virtual event this week. The Improving Web Advertising business group sessions kicked off on Wednesday and will run through Thursday.

Google has a number of advantages in putting forward proposals for what might replace third-party cookies and keep advertising working on the web. It has a large bench of talented engineers to throw at the issue; it has the proprietary advertising and data systems to conduct thorough testing; and it owns and controls the world’s most popular web browser, Chrome.

“FLoC isn’t fit for purpose yet because it’s incredibly technical, complicated and existing information [published about it] out in the world is pretty vague,” said Paul Bannister, chief strategy officer at Cafe Media. “Possibly with new information coming out there will be more meat on the bones.”

But, he added, “The proof is in the pudding — we have to see exactly how it works.”