What is the difference between the Data Warehouse and CluedIn?

Overview

It is a relevant question. Very relevant. The Data Warehouse is where you put all your data, so that it can serve as a universal, catch-all foundation for any data project. Or so we thought. That is the thing with technology: it is hard to predict. It turned out to be great for reporting, and it still is. But it also turns out that many of today’s data use cases can’t use the Data Warehouse. Why? Because by the time the data lands there it is too mature, already shaped to answer specific, known questions. It sounds silly, but it makes complete sense.

The Data Warehouse is designed to run well-known queries on extremely large amounts of data. Guess what? That is not what the business needs today! We need flexible, easy access to data from across the business, and the less time we spend cleaning, blending and moulding it, the better.

But isn’t this what the Data Lake promises? Easy access to data for the Data Science team. Easy access, yes. Ready to use? No. You could argue that it is a step in the right direction, but we are still not at the point where we can start to get value from the data. We also can’t forget that you still have to get data into the Data Lake! Much easier said than done. So, as you may have already gathered, we might need something that sits in between the Data Lake and the Data Warehouse. The good thing is that our industry has already given this a name, or rather two: the Data Hub and the Data Fabric.