top of page

Building your first Data Warehouse - lessons learned



After building 3 data warehouses from scratch that achieved their goals and objectives, saw many things along the way that could be improved , changed so many others through the process of building - It's about time to reflect the past and talk about the common pitfalls and lessons learned!


** This post aim for : Data managers , architects , data engineers ,executives and data enthusiastic


Step 1 - Define the problem - it is important (very important)


DWH is a project that consume resources - there it is , I've said it! When understanding that - the problem definition needs to be very detailed as it will define the phases and progress of the project.

You will need this part for later on in the project lifecycle when it will seem as it is taking too long and the scope has been changed a bit , it will help you to clear the noises and explain the executive why you are doing that.


Finding the company "pains" will be helpful , I have created a survey and passed it by the exec team and mid-level management (results was spot on)

Some questions this survey asked:

  • How you define success today at your domain?

  • How you measure success today at your domain?

  • In order for you and your team to do your job by a magnitude better - what do you need to have more, in terms of data & data services?


Step 2 - Set clear and measurable goals


The goals should be crystal clear as they suppose to justify the resources allocation and overall spend.

Usually DWH projects aim for several key areas in the organization and should improve some others such as:

  1. Consolidation of the organization information in one place

  2. Unified and align the business terminology

  3. Save analysts time and improve time to insight

* there are few more but let's focus on the main ones...

Here you need to quantify several metrics that effected by the pains and derive out of them your goals , for example : reduce 30% of analysts time by establish pre-build data models ( side note - I did measured that and this is the result ) .


Step 3 - Plan for the future and deliver what is necessary - phases approach


We all would want to shut the doors and come out when the project is completed but we need to maintain a balance between the company needs to the project progress...

In order to keep that balance - you need to use the phases approach , plan the entire plan that will be build for scale and include the majority of the company needs , then set the critical path ( usually the highest ROI data models) and finally - start develop the lean version of it (expend in phases).

This approach will keep the SLA with the company stakeholders while making progress with the project.


Step 4 - Use subject matter experts and do mock-ups


In the end , you can have a SOTA data warehouse but if it will not get adopted , it's not worth the effort , right?

You need partners and experts at your side , this way you will be able to simulate the future product and get feedback when it is easy to fix and implement.

Mock-up is like an un-written agreement for the future delivery - it always emphasis your intention to the other side and create alignment ( even excel file is great here).


Step 5 - Assign metrics owners and store the metrics changes


In order to create an alignment within the company - all metrics should be transparent and clear , and in addition - every metric should have the business owner that takes care to validate it every once and a while.

Once those metrics are defined - you need to make sure that there is only one version of it that being used , and better have it in the DWH or manage it in a way that will reflect the same numbers across all views- in that way , when a metric get changed , it will effect all the viz it is being used at once.


Yes... there is a lot of work here , but the results are great and have high impact on the company , it drives the employees to take a data based decision as the information become much more accessible!



There are few other things that I didn't covered in this post but I will cover in the next ones such as:

  • Choosing the right DWH / DB vendor for your needs

  • Lakehouse - why and when do you need it

  • How to model the data correctly with minimal maintenance

Commentaires


bottom of page