WTF are data lakes?
Bewildering jargon is as much a part of big data as endless reams of spreadsheets. A new term businesses have to wrap their heads around is data lakes.
Data lakes are for businesses looking for an efficient way to store massive amounts of information. But there are costs to building these platforms, and it can become difficult for analysts to find what they are looking for in these big pools of information. Here’s a cheat sheet:
So WTF are data lakes?
Data lakes are repositories that store large volumes of many types of data in their native format.
Why do they matter?
Most other data platforms will only store data once it has been formatted to fit a particular structure. For example, data warehouses will only accept data once it has been structured into rows and columns, so unstructured data such as phone call transcripts typically can’t be uploaded.
But data lakes ingest data in its raw form, regardless of how the data is structured, which means that a data lake can take in unstructured data like call transcripts, said Trevor Paulsen, a product manager at Adobe. By taking in all formats, data lakes have more data at their disposal.
This allows data-lake users to “keep all of your data, save it and then figure out what to use,” said Michael Hiskey, CMO of data-management firm Semarchy.
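The "store it raw, decide later" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "lake" here is a temporary local folder, and the `ingest` and `read_rows` helpers and the sample file names are hypothetical, not part of any product mentioned above.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake: Path, name: str, payload: bytes) -> Path:
    """Write the payload byte-for-byte; nothing is validated or reshaped."""
    target = lake / name
    target.write_bytes(payload)
    return target


def read_rows(path: Path) -> list:
    """Impose a structure only when the data is read back."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    if path.suffix == ".json":
        return json.loads(text)
    # Anything else is treated as unstructured text, one record per line.
    return [{"line": line} for line in text.splitlines()]


# Hypothetical sample data: one structured file, one call transcript.
lake = Path(tempfile.mkdtemp(prefix="lake_"))
sales = ingest(lake, "sales.csv", b"store,units\nA,10\nB,7\n")
call = ingest(lake, "call_0001.txt", b"Agent: Hello\nCaller: Hi, my order is late\n")

print(read_rows(sales))
print(read_rows(call))
```

The point of the sketch is that `ingest` never rejects a file for having the wrong shape. All interpretation happens in `read_rows`, after the data is already saved.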
Who uses these data lakes?
Theoretically, anyone who wants to combine various types of data: media and marketing companies, but also universities, hospitals and businesses in general.
OK. So what’s an example of how they are used?
Imagine you are a CPG company trying to analyze data from many different sources. To determine if certain ads drove more sales at brick-and-mortar stores, you could combine data in different formats, such as user IDs and browsing data from ad servers, with your shipment data.
“If an analyst says, ‘I want to create a data set out of these things,’ I can more easily do that because the data lake can hold all of it,” said Max Knight, vp of marketing science services at demand-side platform Turn. “Before, that information existed in disparate systems. The data lake gives you a more universal access.”
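As a rough sketch of the CPG example, the Python below lines up ad exposure with shipments. Everything in it is assumed for illustration: the sample rows, the field names, and above all the shared `region` key that lets the two sources be matched.

```python
from collections import defaultdict

# Hypothetical ad-server rows: one per user, tagged with a region.
ad_log = [
    {"user_id": "u1", "region": "north", "saw_ad": True},
    {"user_id": "u2", "region": "north", "saw_ad": True},
    {"user_id": "u3", "region": "south", "saw_ad": False},
    {"user_id": "u4", "region": "south", "saw_ad": True},
]

# Hypothetical shipment rows from a separate logistics system.
shipments = [
    {"region": "north", "units_shipped": 120},
    {"region": "south", "units_shipped": 80},
    {"region": "north", "units_shipped": 30},
]


def exposed_users_by_region(rows):
    """Count users who saw the ad in each region."""
    counts = defaultdict(int)
    for row in rows:
        if row["saw_ad"]:
            counts[row["region"]] += 1
    return dict(counts)


def units_by_region(rows):
    """Total the units shipped to each region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["units_shipped"]
    return dict(totals)


def combine(ad_rows, shipment_rows):
    """Build one data set keyed on the shared region field."""
    exposed = exposed_users_by_region(ad_rows)
    units = units_by_region(shipment_rows)
    return {
        region: {
            "exposed_users": exposed.get(region, 0),
            "units_shipped": units.get(region, 0),
        }
        for region in sorted(set(exposed) | set(units))
    }


combined = combine(ad_log, shipments)
print(combined)
```

A table like this only puts the two sources side by side. Deciding whether the ads actually caused the extra shipments would still take proper analysis.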
Isn’t that what a data-management platform is for?
Sort of. But DMPs are structured around user and media-related data such as cookie IDs and audience segments. Data lakes are more encompassing and can include offline data related to topics like logistics and production, Knight said.
What makes this different from having a huge Dropbox folder?
From a pure storage standpoint, not a whole lot. But unlike consumer-facing file-hosting services, data lakes support analytics tools and languages such as SQL, R, BigML and Python, so analysts can quickly run through millions of data points, Paulsen said.
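To give a feel for what running SQL over stored data looks like, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a data lake's query engine. The `events` table and its rows are made up for illustration, and a real data lake would use its own engine at far larger scale.

```python
import sqlite3

# In-memory database standing in for a query engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, page TEXT, seconds INTEGER)")

# Hypothetical browsing events.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "home", 30),
        ("u2", "home", 45),
        ("u1", "pricing", 120),
        ("u3", "pricing", 60),
        ("u3", "home", 15),
    ],
)

# One query summarizes every row: visits and total time per page.
summary = conn.execute(
    "SELECT page, COUNT(*), SUM(seconds) FROM events GROUP BY page ORDER BY page"
).fetchall()

for page, visits, total_seconds in summary:
    print(page, visits, total_seconds)
```

The same query would work unchanged whether the table held five rows or five million, which is the practical difference from a folder of files.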
Sounds good. Are there any drawbacks?
Cost. If you are storing 100 terabytes with a cloud service like Microsoft Azure, it will cost about $3,000 per month. Data lakes can also be built using free, open-source software like Apache Hadoop, but you still have to pay developers to build and maintain the data lake.
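The storage figure is easy to sanity-check. The sketch below works backward from the $3,000-per-month estimate for 100 terabytes to an implied per-gigabyte rate, assuming 1 terabyte equals 1,000 gigabytes. The resulting rate is derived from the figure quoted here, not taken from any provider's current price list, and the function names are hypothetical.

```python
from decimal import Decimal

GB_PER_TB = 1000  # assumption: decimal terabytes


def implied_rate_per_gb(monthly_cost, terabytes):
    """Per-gigabyte monthly rate implied by a total monthly bill."""
    return Decimal(monthly_cost) / (Decimal(terabytes) * GB_PER_TB)


def monthly_cost(terabytes, rate_per_gb):
    """Monthly storage bill for a given volume and per-gigabyte rate."""
    return Decimal(terabytes) * GB_PER_TB * Decimal(rate_per_gb)


rate = implied_rate_per_gb(3000, 100)
yearly = monthly_cost(100, rate) * 12

print("Implied rate per GB per month:", rate)
print("Yearly cost for 100 TB:", yearly)
```

At that implied rate, storage alone comes to $36,000 a year for 100 terabytes, before any charges for moving or querying the data.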
Another drawback is data overload. Big data doesn’t do any good if the people gathering the data don’t have an idea of what to do with it. And with so much data in one location, it can become difficult for analysts to quickly find the exact data set they need.
“If everything is stored, it’s easy for the true insights that can solve problems to get lost, or even worse, never discovered,” said Michael Collins, an independent ad tech consultant.