Building a Marketing Data Warehouse – Why, How, When and What?

Building a marketing data warehouse is not a simple project to do but if it’s done properly, there is a numerous business outcome. 

We will go through 3 things.

  1. Why we need marketing data warehouse and what’s so important?
  2. How should we build marketing data warehouse and different approaches ?
  3. What are the tools/skills needed to build data warehouse?

Why we need marketing data warehouse and what’s so important?

1. If you have a complex marketing technology stack

if you are a pioneer, you would have faced a lot of challenges before few years when assembling your marketing technology stack. It was not easy initially as I have experienced myself that many tools features were overlapping each other and end up being a dozens of multiple tools. And the data is scattered and silo. How can be bring together to improve the performance of different layers? 

2.If you have matured marketing technology stack

Some latecomers have enjoyed the maturity of the industry where they escaped from all the challenges faced by the pioneers. And 90% clear on what stack would be perfect for them. but more than the data sync, there are multiple needs for a data warehouse. We had a vision of enriching our marketing Datawarehouse and then supply this data to data scientists for further analysis, deep learning and modelling. 

Companies are moving from the insights provided from Analytical tools to supply this database to the marketing systems. 

3. Building the warehouse to help the business Intelligence. 

Instead of trying to connect business intelligence system with every other marketing tools/channels. Marketers can get the data into Datewarehouse solutions and bring the data into business intelligence tool via warehouse. Why? Because getting the data into BI from the marketing platforms directly takes up data Load work, Latency, etc. Also bringing the data into warehouse always allow you to store the newly cleansed data into new dataset.

Case1: Company like Uber

Here is an example happening on Uber Technologies. or basically you want to store all the data safely for other use cases.  Uber current big data process 100+ Petabytes data with Minute Latency. It has achieved this by building their own solutions

Uber data engineering

It receives data from multiple data points from 

  1. Mobile Application like Rider app, Driver App
  2. API data like weather data, marketing data, user behaviour analytical data, etc
  3. Dispatch (Which is the transport vehicle movement log data)
  4. Map services ( which is the location data)

Through the invention of big data softwares companies could stream, process and analyse petabytes of data in fraction of minutes

Step1: Data Streaming and ingestion from data producers

In uber case, it currently uses open source data streaming software like Kafka which ingest data from any type of databases to one pipeline.

Step2: Data Storage system

Systems  like Hadoop(HDFC – Hadoop distributed File System,) which receives structured and unstructured data , then they provide the functionality of processing those data for multiple use cases using tools like spark

Step3: Multiple Use cases like Data processing

  • Processing of datasets using spark for any big data works
  • Also could used to store those data in different database 

Step4: End use cases of data

Once those data are stored after the above steps, the data are used for 

  • Further reporting, analysing and visualisation in tools like Tableau. 
  • For Machine Learning use cases which connects with environment like python, R, etc
  • or sent to end user for further end customer features. 
Uber ETL - Bigdata-processing
Uber Custom Built Data Processing. source: their medium engineering blog

Once this data are processed and produced in the required format, it would be used for many data driven decisions in realtime. 

  • Finding the top demanding areas where the vehicles are needed.  for food and other services
  • Finding the best performing marketing campaigns data in realtime which allows to increase/decrease the budget
  • Calculating the ROI of the marketing channel
  • Avoiding the manual tedious work of collecting and matching granular level of data
  • Predicting the growth of a campaign or user groups increase in their spending using the combination of cohort analysis and other things.

However all this things are done in a big scale for a company like Uber. Thos could be a long term vision for you but Let’s see some realistic example what a growing startup would need for its need. 

Case2: You are on omni channel company

You could be an omni channel company where you have offline, Online (web), Online(mobile app) , etc channel where people engage and transact with you. You have a situation to integrate your legacy or even the new systems with one another in terms of data sync or transfer or merge, how can we do it? 

Case3: You are an Junior Data Analyst/Data Guy

You are just starting as data engineer or advanced marketer/analyst who basically want to apply basic deep learning to your dataset to learn some new insights. However you don’t have IT help, or IT team is busy with lot of other priorities. And if there Is a way to do by yourself, you would do it. How would you achieve it? 

How should we build marketing data warehouse (different approaches) ?

Now things are getting clear on how we should build marketing technology stack to avoid some of the extra work we need to do. Also, thankfully a lot of data engineering tools has come into the picture to help with joining all this data together. Except the case of cost, this tools solve big problems. If not, there is lot of open source tools to help out this as well. 

Let’s get the approaches from marketers perspective and not from developer perspective. We will go through what’s the difference. The importance of going through is we are using a app

Approach of a Marketer vs Data Engineer 

When our IT team and engineers look at it, this is how they imagine their solution 🙂 🙂 Most of us would have heard the names but not sure how exactly all this works or fit together. I would say this is not going to be our approach. This is basically like a separate domain to learn and achieve this. If you are an IT person with a solid experience, of course you can go in this way 🙂 

Open Source Apache Hadoop Ecosystem

Using a warehouse system which already has lot of native connection to it

This is not going to be feasible for everyone but for some who are heavily using Google Ecosystem of Marketing tools. Also This is not going to give you a 100% solution but can be a stepping stone to achieve. The balance things can be achieved by using other ETL connectors.

The Google GCP Ecosystem of using Bigquery give you the option of using native connectors to bring Google analytics, Google Ads, youtube, etc. Yes this is not enough. Thats where we use an additional connectors like Supermetrics for big query where it basically allow most of marketing tools data to be transferred to big query. I have wrote about this connection here when it’s released > Supermetrics for bigquery

Forgetting about what system we use or what warehouse solution but relying on advanced ETL connectors

Tools like Alooma, Google acquired this ETL automation tool to enrich it’s cloud data warehouse solution and also targeting enterprise companies. GCP, AWZ and Azure have their own data warehouse solutions for people to store the data. Almost all of the marketing tools provide API access to consume their data. The connectors which does the connection was the biggest piece which was making it difficult to do the job. Solution providers popped in to solve this. Alternative tools like alooma came into the market. There are more than dozens of tools provide this solution. 

Best datawahouse automation tools 

Most of them provide a option to transfer data to Bigquery or Snowflake, Redshift, Azure, etc. 

Going on a lean path and produce a MVP based on smaller solutions. 

Using ETL Connectors like Stitchdata or Supermetrics for big query can provide a good learning experience as they allow you free version or trial version to see how they all work together.

See here a overview of how to use this different tools when we try to automate facebook ads data

And this is ecosystem image from Google

Best Marketing Data Warehouse Systems/databases

I want to be honest that am not expert at choosing the best database for it. Or say I have not done extensive research on every step in this whole building exercise. I straightaway gone to Google bigquery because it is easy and one step under the GCP solution with most of the Google tools/native connections. However there are companies which use

  • Amazone Redshift
  • Snowflake