Third-party data: Is the juice worth the squeeze?

Sponsored by Carbon AI • December 8, 2020

First-party data is the current north star in ad tech, but can it single-handedly solve publishers’ need for good targeting, relevant audiences and rich custom profiles?

Not yet. The truth is, low-quality first-party data is still worse than decent-quality third-party data. And so, even if first-party data is the north star, it isn’t necessarily a guarantee of quality. Its weaknesses can be seen in a number of areas:

First-party demographics: For many types of content viewed in isolation, it’s impossible to infer a user’s demographics with strong confidence.

First-party context: Many publishers want to sell beyond their immediate focus areas, and for some topics, like news, the context is especially hard to sell at high CPMs.

First-party logged-in data gaps: Few publishers have enough logged-in data to make accurate inferences. Indeed, according to data published by ID5 at their recent Identity 2021 conference, nearly half of all publishers have less than 10 percent of customers logged in. Relying on those sample sizes to infer across the remaining 90-plus percent is a questionable practice.

Ultimately, any platform proposing first-party data alone as a viable solution in 2021 leaves revenue on the table for publishers. Meanwhile, technically accomplished uses of high-quality third-party data can fill in first-party data gaps. After all, third-party data is just someone else’s first-party data. That’s hardly a cutting-edge insight, but it’s easy to forget. So, if first-party data is taking a lead role, third-party data should still be a solid supporting act, at least with the right data operation.

How publishers can solve their third-party data struggles

Third-party data struggles most when it comes to opacity in the source, recency, accuracy and relevance. But these areas provide publishers with excellent opportunities to extract some of that first-party goodness from third-party assets. They can be unpacked in the following ways.

Opacity in the source: Typically, the further downstream publishers are from a data source, the greater the chance of a drop in quality. For example, data that originated in a bidstream from unknown publishers and then passed through aggregators before hitting the DSP is poised to underperform. As such, concerns about the origin of data are legitimate. When Carbon’s data science team analyzed consented URL strings from large players in the space, 10–50 percent proved to be from sites with sensitive data, including child-oriented content, health and well-being content and more.

Recency: It’s hard for publishers to see a timestamp on the data they are purchasing, which is astonishing given that most practitioners know data and cookies have a very specific shelf life. Marketers are forced to pay the same for a 29-day-old cookie as for a 29-hour-old cookie. Why is that? Because recency is difficult to guarantee if it isn’t baked into the data operation from the outset, and few data companies have accurate recency as a result. When data is scored for intent, recency should be front and center in the algorithm.
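To make the recency point concrete, here is a minimal sketch of an intent score that discounts older signals. It is an illustration only, not Carbon’s algorithm: the exponential half-life decay, the 72-hour default and the names `recency_weight` and `intent_score` are all assumptions made for this example.

```python
def recency_weight(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Return a weight in (0, 1] that halves every `half_life_hours`.

    The 72-hour half-life is an arbitrary, assumed default.
    """
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    return 0.5 ** (age_hours / half_life_hours)


def intent_score(signals, half_life_hours: float = 72.0) -> float:
    """Sum signal strengths, each discounted by its age.

    `signals` is an iterable of (strength, age_hours) pairs (hypothetical format).
    """
    return sum(
        strength * recency_weight(age_hours, half_life_hours)
        for strength, age_hours in signals
    )


# A 29-hour-old signal counts for far more than a 29-day-old one.
fresh = recency_weight(29)
stale = recency_weight(29 * 24)
```

Under these assumptions, two otherwise identical cookies are no longer worth the same: the weight attached to each depends on its timestamp, which is only possible if the timestamp is captured in the first place.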

Accuracy: There is some achingly low-quality third-party data out there. A recent example is this gender profile of a single user: Male | female | male, female | female | female | male. Clearly, there is a need for greater accuracy in this case.
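A simple consistency check can surface profiles like this one before they are used for targeting. The sketch below is a hypothetical illustration: it assumes the profile arrives as a string of labels separated by "|" or ",", and the 0.9 agreement threshold is an arbitrary choice for the example.

```python
from collections import Counter


def label_agreement(profile: str):
    """Return (most_common_label, share_of_observations) for a profile string."""
    labels = [
        value.strip().lower()
        for segment in profile.split("|")
        for value in segment.split(",")
        if value.strip()
    ]
    if not labels:
        return None, 0.0
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / len(labels)


def is_reliable(profile: str, min_agreement: float = 0.9) -> bool:
    """Flag a profile as usable only if its labels mostly agree (assumed threshold)."""
    _, share = label_agreement(profile)
    return share >= min_agreement


profile = "Male | female | male, female | female | female | male"
label, share = label_agreement(profile)  # "female" wins with 4 of 7 observations
```

With four "female" and three "male" observations, the winning label has barely more than half the votes, so under this threshold the profile would be discarded rather than sold as a confident gender segment.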

As an industry, we must do better.

Revenue data is critical to doing better

So what does better look like? Firstly, there’s another data source most platforms ignore: revenue data, as revealed by the bidders. Adding this data into the mix in real time allows for a transformational understanding of both audience and context, enriching both first-party and third-party data at every stage. It’s an area overlooked by too many data companies. Data science teams must focus on extracting that unique data source and joining it to first- and third-party data to produce a weighted view that maximizes the value of all the available data resources.

ID consortiums and the future of data

If first-party data was the buzzword of 2020, then ID consortiums look likely to be the 2021 equivalent. In theory, an open consortium of compliant data is wonderful, but it seems as though some companies are merely racing to become the next walled garden, and the jury is still out on quality. Case in point: a recent study from BDEX found that a full 25 percent of device identifiers are invalid.

Now, as the data and identity market matures and privacy regulations become more complex, third-party data work is becoming more complicated as well. But when did ad tech stop trying to solve the hard problems?

For as long as there is still upside in third-party data, publishers should be extracting value from it. The juice is worth the squeeze, it turns out, but the squeeze won’t be easy. Still, these challenges can be solved, and when the industry gets to the solutions, publishers get a better shot at survival, stability and a chance to thrive.
