Over the past decade, connected mobile devices have changed everything for consumers and marketers. Consumers now have connectivity and access to thousands of services wherever they go, which means marketers now have access to data on not just where they go, but what they do when they are there — giving them real-life insight into consumers’ intents, interests and habits. However, this location data is only as insightful as it is accurate.
As data collection becomes more sophisticated, so do the ways bad actors can generate faulty data and commit mobile location ad fraud. This comes at a huge cost: inaccurate data directly impacts the bottom line for 88 percent of companies, with the average company losing 12 percent of its revenue.
At Gravy Analytics, we see fraudulent and low-accuracy information in both bidstream and SDK-sourced location data; up to 60 percent of bidstream location data can be problematic. The challenge for location data processors is not only to spot and throw out bad data that would misinform our customers’ marketing and business decisions, but also to use these points to improve detection in the future.
Where does fraudulent location data come from?
Understanding how fraudsters work and identifying their handiwork starts with understanding their motives. It shouldn’t be any surprise that it comes down to money. The strongest financial incentive is the higher CPMs paid for ad inventory that comes with location information matching target audiences. The size and complexity of the advertising ecosystem, along with the variety of forms that mobile location ad fraud takes, make fraud challenging to detect and relatively easy to conceal.
We see fraudulent data points make their way into our data sets in many different ways, but the most frequent are:
- Apps (or end users) engaged in fraud that send large numbers of signals
- Mobile websites disguised as apps that use calculated latitude/longitude data
- Spoofed locations and devices
- Manually adjusted time stamps
- SDKs embedded in apps that generate false user activity
Finding the fraudulent data
Knowing how fraudulent data points are created helps with identifying and removing them, which avoids misleading results and the misinformed decisions that follow, like allocating marketing funds to target the wrong audience. In this process, patterns are a data scientist’s best friend, and these patterns are spotted much more accurately when more sources are aggregated.
The most vigilant location data processors work with several data collection partners so that data for a device is rarely confined to a single source. When there’s data for one device from multiple apps or ad exchanges to compare and contrast, we can confirm that a device was truly in a certain location at a certain time and that the device is real. Disparities between data from different sources for the same device send up a red flag that some or all of its data may be fraudulent.
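As a rough illustration of that cross-source comparison, the Python sketch below groups signals by device and time window, then flags any device whose sources place it far apart within the same window. It is a minimal sketch under stated assumptions, not a description of any production system: the tuple layout, the function names, the five-minute window and the five-kilometer tolerance are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# Assumed thresholds, chosen only for illustration.
WINDOW_SECONDS = 300       # signals in the same 5-minute bucket count as simultaneous
MAX_DISAGREEMENT_KM = 5.0  # how far apart two sources may place the same device


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def devices_with_source_conflicts(signals):
    """Return the set of device IDs whose sources disagree on location.

    `signals` is an iterable of (device_id, source, timestamp, lat, lon)
    tuples, with timestamp in Unix seconds. This layout is hypothetical.
    """
    # Group every signal by device and time bucket.
    buckets = defaultdict(list)
    for device_id, source, ts, lat, lon in signals:
        buckets[(device_id, int(ts // WINDOW_SECONDS))].append((source, lat, lon))

    flagged = set()
    for (device_id, _bucket), points in buckets.items():
        # Compare each pair of points that came from different sources.
        for (src_a, lat_a, lon_a), (src_b, lat_b, lon_b) in combinations(points, 2):
            if src_a != src_b and distance_km(lat_a, lon_a, lat_b, lon_b) > MAX_DISAGREEMENT_KM:
                flagged.add(device_id)
                break
    return flagged
```

Fixed buckets keep the example short, but two signals that fall on either side of a bucket boundary are never compared; a sliding window would close that gap at the cost of more code.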
Once multiple data sources are merged and the data points that are obviously “bad” based on anomalies in expected patterns are thrown out, we take an even closer look at the remaining data for signs of suspicious information, such as:
- Latitude/longitude data that is in the exact center point of a country or at 0,0
- Latitude/longitude data that is in an uninhabitable region, like the North Pole
- Devices appearing to move at unrealistic speeds between two places
- Devices appearing to be in two places at once
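Several of the signs listed above can be expressed as simple rules. The Python sketch below is one hedged way to do it: it flags coordinates at 0,0, coordinates at extreme latitudes, and pairs of consecutive signals that imply impossible travel. The `Signal` class, the function names and the thresholds (a 1,000 km/h speed ceiling, an 85-degree latitude cutoff, a one-kilometer tolerance for simultaneous signals) are assumptions made for illustration. The country-center check is left out because it would need a lookup table of centroids.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0  # assumed ceiling, roughly a commercial jet
POLAR_LATITUDE = 85.0   # assumed cutoff for "uninhabitable" latitudes


@dataclass(frozen=True)
class Signal:
    """One location data point; the field names are hypothetical."""
    device_id: str
    timestamp: float  # Unix seconds
    lat: float
    lon: float


def haversine_km(a: Signal, b: Signal) -> float:
    """Great-circle distance between two signals, in kilometers."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def static_flags(s: Signal) -> List[str]:
    """Checks that need only a single data point."""
    flags = []
    if s.lat == 0.0 and s.lon == 0.0:
        flags.append("null_island")
    if abs(s.lat) >= POLAR_LATITUDE:
        flags.append("polar_region")
    return flags


def movement_flag(prev: Signal, curr: Signal) -> Optional[str]:
    """Check two consecutive signals from the same device for impossible travel."""
    distance = haversine_km(prev, curr)
    elapsed_hours = (curr.timestamp - prev.timestamp) / 3600.0
    if elapsed_hours <= 0:
        # Same (or out-of-order) timestamp: any real separation is suspicious.
        return "two_places_at_once" if distance > 1.0 else None
    if distance / elapsed_hours > MAX_SPEED_KMH:
        return "unrealistic_speed"
    return None
```

For example, a device reported in New York and then in Los Angeles ten minutes later would come back as `"unrealistic_speed"`, while the same trip spread over six hours would pass.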
Some signs of fraudulent data are a little stealthier than others, but as analysts identify problematic trends, they can then build and continuously update algorithms that catch them based on factors like dwell time, distance between signals received over time, and data across multiple sources for a device. With fraudulent data revealed and removed, the remaining data becomes much more reliable.
Fraudulent data can still be useful to business
Although fraudulent data should be removed from data sets used for most analysis, it shouldn’t just be tossed out. Knowing a data point is fraudulent doesn’t mean it’s worthless: it can be helpful in detecting future fraudulent activity. For example, we are increasingly using fraudulent location signals to tune and develop our fraud detection algorithms. Using bad data points to improve fraud detection means there is no such thing as useless data.
Taking the time to identify problematic data is a painstaking process, but it ensures that our customers can trust our data and their results. Ironically, the sheer volume of fraudulent and suspicious data is the very thing making detection easier in the future. For the most reliable information, marketers and advertisers should ensure the data providers they work with are taking the necessary steps to spot fraudulent data. Only with more reliable location data can marketers gain a truer picture of consumers’ real-world behavior.