A Data Warehouse is fantastic at answering known questions with known data. We believe it is, and will remain, a critical part of your data technology stack.
The typical problem with a Data Warehouse appears when you want to ask a spontaneous question or see the data from another angle: you are often bound to the way the data has been modelled in the fact tables.
Although you can add more fact tables, there is obviously a point where this becomes too much to maintain, and the queue of requests keeps growing as more parts of the business start working with data.
CluedIn is typically placed before the Data Warehouse in the overall journey of data. CluedIn is fantastic at replacing the typical ETL layer of Data Warehouse projects, because we are simply better at integrating, preparing and governing data and at providing flexible access to it to fuel your dimension tables.
With CluedIn in place, if someone requests data to answer a different question, you won't have to go back to the raw data to fuel your dimension tables. Instead, you create new columns in your dimension tables and then ask CluedIn to fill in the data.
For more details, you can read the full white-paper.
CluedIn is a platform composed of many different components. These components all run as Docker containers, and the application is composed with Docker Compose.
We use Kubernetes as the orchestration framework that helps you manage and deploy CluedIn into your production environments.
If we dive into the application itself, the CluedIn server is a .NET Core application. Amongst other advantages, this allows you to host CluedIn on all the popular operating systems. We do recommend that you host CluedIn in Unix/Linux environments.
The CluedIn server interacts with many different databases through an enterprise service bus abstraction. This allows CluedIn to be truly horizontally scalable. The adage "just add more machines" is very much a reality with CluedIn.
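To make the "just add more machines" idea concrete, here is a minimal sketch of the competing-consumers pattern that a service bus makes possible. It uses Python's standard-library `queue` and threads as stand-ins for a real message bus and for separate machines, so it illustrates only the general pattern. `run_workers` and the message format are hypothetical and are not CluedIn APIs.

```python
import queue
import threading


def run_workers(messages, worker_count):
    """Hypothetical sketch: several consumers drain one shared queue.

    Each consumer stands in for a separate machine. Adding consumers adds
    throughput without changing the producer side.
    """
    bus = queue.Queue()  # stand-in for a real service bus
    for msg in messages:
        bus.put(msg)

    processed = []
    lock = threading.Lock()

    def consumer(worker_id):
        while True:
            try:
                msg = bus.get_nowait()
            except queue.Empty:
                return  # all messages were enqueued up front, so empty means done
            # Placeholder for real work, e.g. writing a record to a datastore.
            with lock:
                processed.append((worker_id, msg))
            bus.task_done()

    threads = [threading.Thread(target=consumer, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    results = run_workers([f"record-{n}" for n in range(10)], worker_count=3)
    print(len(results))
```

Each message is taken by exactly one consumer, which is why scaling out is a matter of starting more consumers against the same bus rather than redesigning the pipeline.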
The data layer of CluedIn is made up of 5 different databases including:
These components can all run in a High Availability mode and are all enterprise grade. We stand on the shoulders of giants here at CluedIn, and hence all of these databases are industry leaders in their respective spaces.
A Data Lake, simply put, is a place to migrate all your data so that it is more readily available than it is in the source systems. It will typically give you SQL as the ubiquitous language for querying across files as if they were all in a database. The storage costs are relatively low, but the data is still very raw.
If you already have a Data Lake, CluedIn is typically used to sit over the lake and mature the data to a point where it is consumable and usable. With CluedIn in place, we really don't recommend that anyone goes directly to the Data Lake anymore.
If you don't have a Data Lake already, the honest truth is that we would strongly recommend addressing other parts of your data landscape first. Although a Data Lake does have value and eventually it makes sense to get one, we often see it as a premature optimisation: the value of low-cost storage is outweighed by the fact that the Data Lake often can't show value, because a lot still needs to be done to mature the data before anyone can use it.
CluedIn exists because most projects fail at stitching different products together into a coherent fabric. CluedIn was designed with the stitching first, and we built out the different pillars afterwards.
Let's be transparent: there are lots of great open source and commercial tools out there. We believe that you could build something like CluedIn yourself, but it is important to remember that these projects take a very long time to mature, are fraught with risk and typically cost many times more than you intended.
CluedIn has the benefit of accelerating you past all that turmoil to the point where you can deliver ready-to-use data to the forefront of your business.
If you look at most Cloud providers today, they offer all the building blocks of data management so that you can compose a data fabric yourself. This is great, and in fact you can plug many of these products into CluedIn. The problem is not with the individual products; it is that stitching these different products together is REALLY hard, and it is no surprise that analyst firms are reporting that 85% of these projects fail.
CluedIn is hosted in your tenant and environment. We make it very easy to install and host CluedIn in whatever cloud or on-premises environment you need.
We believe that the foundation of your data should never be managed outside your control. This doesn't mean that CluedIn isn't fully managed. In fact, it really is a PaaS-type environment: it runs in your tenant, but everything is managed for you.
At CluedIn, we like to think we have thousands of competitors, in that there are many platforms that will help you produce value out of data.
That said, because our focus is on stitching together the different pillars of data management, we consider vendors like Informatica, Talend and SAS our main competitors. CluedIn prides itself on being the modern Data Fabric, utilising modern techniques and approaches to solve challenges in the data space.
The Data Fabric is simply an amalgamation of the different pillars of the data management category. It turns out that many of the goals you want to achieve require you to go through some common pillars. These common pillars are the "Data Fabric". Think of it as stitching together different products into one platform. The value of CluedIn comes from the fact that it was born with this stitching; it was not an afterthought. This means that while other vendors are starting to consolidate their different products into a Data Fabric, CluedIn was designed this way from the start.
Eventual Connectivity is the core Data Integration pattern that CluedIn uses to automate the process of unifying data from different data sources. Quite simply, CluedIn uses a graph-based, schemaless pattern so that companies don't have to manually determine how different data sets join, or whether they can join at all.
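As a rough illustration of the idea behind a graph-based, schemaless integration pattern, the sketch below assumes each incoming record carries a set of identifiers (for example an email address or a VAT number) that act as graph edges. Records connected directly or transitively through shared identifiers are grouped with a union-find structure, without anyone declaring join keys between sources. This is not CluedIn's implementation; `unify`, the record shape and the identifier format are all hypothetical.

```python
from collections import defaultdict


def unify(records):
    """Group records that share identifiers, directly or transitively.

    Hypothetical sketch: each record is (record_id, set_of_identifiers).
    Records and identifiers are nodes in one graph; union-find finds
    the connected components.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Connect each record node to every identifier it carries.
    for record_id, identifiers in records:
        find(("record", record_id))  # register records that have no identifiers
        for ident in identifiers:
            union(("record", record_id), ("id", ident))

    # Collect record ids by the root of their component.
    groups = defaultdict(list)
    for record_id, _ in records:
        groups[find(("record", record_id))].append(record_id)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    records = [
        ("crm-1", {"email:jane@example.com"}),
        ("erp-7", {"email:jane@example.com", "vat:DK123"}),
        ("billing-3", {"vat:DK123"}),
        ("crm-2", {"email:bob@example.com"}),
    ]
    print(unify(records))
    # → [['billing-3', 'crm-1', 'erp-7'], ['crm-2']]
```

No join between the CRM and billing data is declared anywhere: `crm-1` and `billing-3` end up together only because both connect to `erp-7`. A record whose identifiers match nothing simply stays on its own until a connecting record arrives.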
Without a doubt, there are a plethora of tools and technology that can help companies wield their data. CluedIn will always be part of a bigger technology stack and is designed to work in environments where other systems will either already exist or will in the future.
If we conceptualise your company's overall data pipeline as being made up of 100 steps, CluedIn is designed to play a role in many of these steps, but not all.
CluedIn is extremely good at integrating and unifying data and preparing it for universal use by any downstream system.
The answer to this is "Yes", "No" and "If you want". At CluedIn, we are 100% focused on large, complex enterprise data landscapes, so we know that change is hard and takes time. For this reason, CluedIn has been designed to fit into existing technology stacks. CluedIn is an extensible and pluggable framework designed to integrate well with other data solutions. Whether you already have Collibra for your Data Governance or Informatica for your Data Catalog, CluedIn can and will integrate well with those platforms.
Data management becomes cheaper, more reliable, more robust and of higher quality. You will have a foundation of data to accelerate any data-driven use case. The value of CluedIn really comes from how quickly it can prove its value and plug the holes in your data story.
Better: For integrating and managing data, our Eventual Connectivity pattern is simply better. It is easier to work with and scales better.
Cheaper: According to analyst firms, CluedIn is very fair on pricing and is typically a more cost effective option than the market leaders.
Faster: Not faster at processing, but CluedIn does more to the data automatically, so you get to usable data faster.
The first point we would like to address here is organisational support. CluedIn has an impressive partner network that can provide the backing of size, resources and experience. We strongly recommend that you utilise our partner network to implement your CluedIn solution.
From a technology standpoint, CluedIn is very much built on the shoulders of giants. Where we believe we could not build a better part of the stack ourselves, we lean on industry leaders for databases, message bus, streaming, security and more.