

CluedIn articles


What is Augmented Data Management?

There comes a time in every industry when the status quo is challenged. Often this comes in the form of a new market entrant with a vision to improve or fix what was broken about the old way of doing things. Sometimes, they don’t just make things better, they eradicate the problem altogether. Almost always, they advance the industry in the customer’s favour and force traditional providers to evolve in order to keep up.

This is what is happening right now in the world of Master Data Management (MDM). For 25 years organisations have had to accept that attempting to put their master data to use in a meaningful way would be expensive, slow, and almost wholly reliant on IT. It’s no wonder then that so many of them either tried and failed to implement MDM, or avoided it altogether. Meanwhile, advances in technology such as Cloud computing, Machine Learning (ML), Artificial Intelligence (AI) and Natural Language Processing (NLP) have continued apace, leaving traditional MDM solutions in the metaphorical dust.

All of that is changing with the advent of modern MDM systems. The new breed of MDM is built for the Cloud, enhanced by AI, Graph and/or NLP, and democratizes data for use across the business. In fact, modern MDM systems are so different from their legacy counterparts that in many ways they aren’t MDM systems at all.

Enter Augmented Data Management (ADM). Augmented Data Management utilizes advanced technologies such as Graph, AI and NLP to enhance and automate data management tasks related to quality, governance, preparation and enrichment. The automation piece is crucial as it takes the burden of manual, repetitive tasks away from the data engineering and IT teams, allowing them to focus on creating value. It also means that business users are empowered to use data to drive their own analytics and insight-driven initiatives.

In addition to the obvious simplification and automation benefits, true ADM is Cloud-native and delivers maximum value as part of a Cloud-based, integrated data management ecosystem. Data often lives in multiple complex and siloed systems within an organisation, from ERP platforms, Data Lakes and Data Warehouses to spreadsheets, presentations and PDFs. Many organisations have invested heavily in Business Intelligence (BI) and Analytics tools which rely on a consistent flow of high quality data, but the challenge has been getting the data out of its various repositories and into a state which is usable by these tools. ADM bridges this gap, delivering data that is ready for insight more quickly than ever before.

ADM realises the unfulfilled promise of what MDM should have been, and in the not too distant future will supersede MDM entirely. Organisations that embrace this advanced approach will finally be able to use their data to shorten product development cycles, accelerate go-to-market plans and maximise revenue-generating opportunities. In the quest to become data-driven, ADM is well on its way to becoming a non-negotiable requirement.

Why your Data Management Platform should be cloud-native


The phrase cloud-native gets bandied around a lot. If we go by the standard Wikipedia definition, cloud-native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds".

Does this definition really capture what it means to be cloud-native? We don’t think so. We believe there is a fundamental difference between this definition and what it is to be truly cloud-native.

Cloud-native goes so much deeper than just scale. If we take CluedIn, a cloud-native Master Data Management platform, we can identify ten characteristics that mark it out as cloud-native. We would argue that if your data management platform cannot meet these requirements, then not only is it not cloud-native, it is also falling short of delivering against modern MDM requirements. Let’s examine each in turn.

1. Getting your data-initiatives started is faster and easier

Your first experience of a cloud-native platform should be how easy it is to purchase, install, set up, and get started. A cloud-native platform installs in your own cloud tenant (e.g. Azure) with one click, a few details on a form, and that is it. You can apply custom settings such as your own HTTPS certificate, or custom domains, but the experience is straightforward and simple. Long gone are the days of tedious installation processes where you need to worry about which operating system to choose or wait for your hardware to arrive.

Cloud-native doesn't simply mean that you are running on Docker containers and Kubernetes. You can take the majority of platforms, even the old dinosaurs, and have them containerised. You can then quite easily have them orchestrated with Kubernetes. This is not cloud-native.

Installing CluedIn sets up the environment in Azure Kubernetes Service. This makes sure it uses an Azure Key Vault, a firewall and all the best practices that you would expect - all rolled up into a single price. CluedIn can then be scaled in many different ways, including automatically, manually through the Azure Portal, or by redeploying the application with new Helm Chart updates (replicas).

Specifying your environment is all done using box sizes with a known number of boxes and associated cost. You want things to happen more quickly? You pay more – it’s as simple as that.

Cloud-native essentially means that the entire backbone of operations in your platform can run off a cloud provider’s stack. For example, it means being able to utilize cloud Platform-as-a-Service (PaaS) services to scale out processing or increase storage capacity. Cloud-“nativeness” should feel like using a Software-as-a-Service (SaaS) platform, just in your own tenant.

At the technical level, a cloud-native platform doesn't require containers, but this clearly is the direction that cloud is taking and investments will be made that expect you to be on containers to reap the full benefits of the cloud. Having Kubernetes as the orchestration framework is only the start. To get the best out of Kubernetes, your application needs to be designed in a way that can take advantage of it.

Cloud-native means that costs should be transparent and predictable. Like most companies, you’re probably more than happy to spend money when the costs are clear and the ROI is obvious. What you probably don’t want are hidden or unpredictable costs.

Cloud-native does not necessarily have to mean SaaS. We believe that your most important data should stay in your environment, but to take full advantage of the cloud cost model, multi-tenancy is needed. The future of data sovereignty is having cloud providers offer smart ways of slicing the same digital assets while guaranteeing tenancy and isolation. There is no easy technical way to solve this; it is more about trust - i.e. do you trust a particular vendor (say, CluedIn) to host your data on your behalf? This is not the same as asking you to trust Microsoft with your data. Microsoft have thousands of full-time Infrastructure and DevOps people; CluedIn does not (yet!).

When asked what it means to be cloud-native, an industry veteran replied that "it needs to be multitenant." This is interesting, and not necessarily wrong. Rather it is a clear indication that cloud is about an architecture where data is stored once and consumed separately, even by different companies.

At CluedIn we endeavour to deliver a SaaS experience, with the benefits of PaaS “under the hood”. The critical difference is that your master data always stays in your environment - even if it is used in a public cloud.

2. Access to elastic scale is at your fingertips (if your technology stack allows it)

Cloud quotas aside, companies theoretically have access to endless cloud resources on demand. This is immensely powerful in delivering accelerated time to value. But just because you installed an application in the cloud, it does not mean that the technology will take advantage of it (quite the opposite, in fact). With easy access to elastic scale comes a warning, as we have seen some customers’ budgets spiral out of control because they took advantage of the scaling possibilities. There are tools to help with this, but the temptation can be to ignore these rules because of a desire to just get the job done.

3. Other cloud-native services will play nicely with you

Much like the architectural wins we saw pre-cloud, the principle of realising greater efficiencies from services that are co-located also applies in the cloud.

Although companies embark on many different data-driven initiatives, these tend to fall into a handful of buckets, including Reporting/Business Intelligence, Data Science and Process Automation. The majority of these are fuelled by data. Having the sovereignty of your data on the same backbone means that the insight-generating systems can get to the data more easily.

4. Cloud-native systems work with the economics of the cloud

Moving to the cloud can be a double-edged sword, with scalability, flexibility and interoperability gains being counterbalanced by escalating costs. The cloud only makes economic sense if it is being used in the way it was intended to be. For example, elastic compute is only more efficient when you have the ability to scale down as well as up. You should be wary of tools that have made their way to the cloud rather than having been designed for it in the first instance. This is because for an existing on-premises platform to truly become cloud-native, it will inevitably need a major system overhaul.
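
The point about scaling down as well as up can be made concrete with a little arithmetic. The sketch below compares capacity provisioned for the peak all day against capacity that follows demand hour by hour. The hourly price and the demand profile are invented for illustration; they are not real cloud prices or CluedIn sizing figures.

```python
# Hypothetical figures for illustration only: not real cloud prices.
HOURLY_PRICE_PER_INSTANCE = 0.50  # assumed price per instance-hour

# Assumed number of instances needed in each hour of one day:
# a quiet night, a busy working day, a short peak, then a quiet evening.
hourly_demand = [1] * 8 + [4] * 8 + [10] * 2 + [2] * 6


def fixed_cost(demand, price):
    """Cost when capacity is provisioned for the peak for every hour."""
    return max(demand) * len(demand) * price


def elastic_cost(demand, price):
    """Cost when capacity follows demand up and down each hour."""
    return sum(demand) * price


fixed = fixed_cost(hourly_demand, HOURLY_PRICE_PER_INSTANCE)
elastic = elastic_cost(hourly_demand, HOURLY_PRICE_PER_INSTANCE)
print(f"fixed: {fixed:.2f}, elastic: {elastic:.2f}")  # fixed: 120.00, elastic: 36.00
```

With these assumed numbers the saving comes entirely from the hours in which the platform is allowed to shrink. A platform that can only scale up would pay the peak price all day.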

One of the benefits of choosing a cloud-native platform is that there is a better chance that the architecture of the platform is sound and suited to that environment. That is a huge generalisation, and it does not mean that architectures that were built pre-cloud are less robust. Rather it means that architectures built for the cloud should inherently be more robust as a result of the economic model imposed by cloud platforms. The first fundamental element of designing for the cloud is to properly separate the stateful parts of your application from the stateless parts. This is critical to allowing parts of your application to scale up and down. The stateless part of your application is the part that can work with 100 instances or none at all. As for the stateful part, it is very cloud-like to have your data persisted in cheap storage. The benefit of this is that when your platform is not being used at all, you pay either little or no money. A truly cloud-native solution will cater for:

  • Unexpected network latency (the time taken for a service request to travel to the receiver and back).
  • Transient faults (short-lived network connectivity errors).
  • Blockages by long-running synchronous operations.
  • A host process that has crashed and is being restarted or moved.
  • An overloaded microservice that can’t respond for a short time.
  • An in-flight orchestrator operation such as a rolling upgrade or moving a service from one node to another.
  • Hardware failures.
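
Several of the items above (transient faults, a briefly overloaded service, a host that is restarting) are commonly handled by retrying with exponential backoff. The following is a minimal sketch of that general pattern, not a description of how CluedIn handles it. The names `TransientError`, `call_with_retry` and `flaky_service` are hypothetical and exist only for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a short-lived, retryable failure."""


def call_with_retry(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the fault to the caller
            # Double the wait on each attempt, cap it, and add jitter so that
            # many clients do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a fake service that fails twice before succeeding.
calls = {"count": 0}


def flaky_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporarily unavailable")
    return "ok"


# The sleep function is swapped out here so the example runs instantly.
result = call_with_retry(flaky_service, sleep=lambda seconds: None)
print(result, calls["count"])  # ok 3
```

Only errors that are known to be transient are retried. Anything else propagates immediately, which keeps genuine bugs visible.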

5. The majority, if not all, of the R&D investment is going into the cloud

The majority of technology vendors today are investing in making sure that their offering is suited to and available via the popular cloud providers. In fact, many providers are dropping support for their on-premises and non-cloud-native solutions. This is because the cloud offers the best possible benefits for all players involved, including the customer, the vendor and the market. A properly cloud-native platform is working towards providing customers with a solution that is easier, cheaper and faster.

6. Cloud-native embraces and welcomes the idea of huge data

For years companies and vendors alike have grappled with the challenge of managing huge amounts of data. One of the benefits of major technological advances is that they don’t just make something easier or better, they eradicate the need to even think about it in the first place. The properly architected cloud solution acknowledges and takes advantage of the fact that data storage is cheap. If you look at the breakdown of the costs associated with CluedIn, 2% is storage. Why? Because storage size and the amount of data should not hinder your ambitions.

7. Cloud-native lets you focus on the things that are important

Since going cloud-native, we have not spent a single day worrying about the security of our virtual machines in the cloud. We have not spent a single day hardening the boxes that are used in processing of our data. And we have seen the same thing happening with our customers. We realise at CluedIn that our speciality is data and hence we rely on cloud services to manage our firewalls, networks, and more. We focus 100% on providing the best Master Data Management platform on the market, allowing our customers to focus on doing whatever it is that they do best too.

8. Cloud-native delivers the required services, as part of the service

Because CluedIn is cloud-native, our customers inherit several cloud services at no extra cost. Something as simple as cost management is natively available in all services, if they have been designed to take advantage of the tooling. Industry analysts Gartner have developed a full FinOps category, but in our opinion a cloud-native service provides this inherently. In addition, there are numerous services required for security, policy management, performance efficiency, alerts and more. All of these are essential services that a cloud-native platform should deliver as standard, not as an afterthought.

9. Security, authentication and authorization form part of the ecosystem

Whether it has come as a result of advances in the technology itself, or it is cloud-native specific, security, authentication and authorization should be ingrained inside the services. Genuinely cloud-native solutions have embraced the idea that access, permissions, authentication and authorization are often provided by third party services. This "context" provides more free flowing access to services, connectivity and discovery once properly authenticated. For CluedIn, being cloud-native means being cloud-aware. This means that the CluedIn platform is aware that it is being hosted in the cloud and sits inside a specific context of security - providing access to the services that sit underneath the platform.

10. Dependencies can become services, allowing products to move faster

A cloud-native solution supports the idea that users should be able to take full advantage of the native services provided by cloud providers. When CluedIn is deployed into Microsoft Azure it uses the underlying managed disks, the underlying logging, the underlying security and has the option of using the respective database services. This allows our product team to walk in on a daily basis feeling like a team of 400+ developers, because we know that many of the wins we are achieving in our platform come as a knock-on effect of us using the services which huge teams at Microsoft are working on, on a daily basis.


There’s a lot more to being cloud-native than simply being able to scale and run scalable applications. Scalability is a big part of it, but true cloud-nativeness is a mindset and an attribute that runs through the very core of a platform. Most companies are already well on their way to realising the advantages the cloud offers, and there is no turning back the tide. Those who will see the greatest scalability, efficiency and flexibility gains will embrace truly cloud-native platforms and reap the benefits of a cloud-first approach.


Data Mesh: The Next Frontier in Master Data Management?


Data is difficult to manage, especially master data. Master Data Management is essentially the process of controlling the way you define, store, organize and track information related to customers, products and partners across your organization. Traditional data modeling requires that all business entities are defined in advance and cannot change throughout the life of the enterprise system; this limits flexibility and prevents you from easily integrating new systems with existing ones. What if you could handle master data management in a more flexible and agile way? Could it make managing master data much easier?

Data Mesh is a relatively new approach to mastering data and, if its advocates are to be believed, could represent the future of master data management.

What is Data Mesh?

Data Mesh is a platform architecture that takes a decentralized approach to managing data. Fundamentally, it’s about treating data as something that is separate from any single platform or application. Under the Data Mesh philosophy, data is treated as a democratized product, owned by the entire organisation and managed by domain-specific teams. Each team is responsible for the quality, governance and sovereignty of its own domain, and this data is then provided for use by the rest of the business.

What problems is Data Mesh trying to solve?

The theory behind Data Mesh is a noble one. In essence, it is seeking to solve the fundamental and pervasive issues that have dogged traditional Master Data Management systems for years. In order to realise any value from these initiatives, organisations have been forced to wrestle their data into rigid structures and compositions before they can even get started. Cleaning, preparing and integrating data is hard, time-consuming and expensive, which is why so many Master Data Management projects fail before they've even begun. It's not uncommon in a traditional MDM project for it to take six months to get just one domain operational. Six months! Considering the speed at which customers, markets and competitors change, this timeline simply isn't acceptable. By rejecting the heavily centralized and monolithic models of the past, Data Mesh is trying to do a good thing. The question is whether Data Mesh is a practical alternative.

The challenges of Data Mesh

Despite the buzz around Data Mesh, there are some fundamental questions that need to be answered before you can decide if it's right for you. Firstly, you'll need to consider whether your organisation is operating at a scale at which Data Mesh makes sense. Complete decentralization of data brings its own challenges and risks, as it depends on each of those domain-based teams having the necessary skills and experience to manage the data they are responsible for. Without some level of centralised management, data silos, duplication and governance issues will inevitably arise.

Data Mesh is also a more expensive option. If we accept that every domain-based team will need people with some level of data management experience, and that there needs to be another team to oversee them, suddenly this looks very expensive. For all but the largest of organisations, Data Mesh is most likely cost-prohibitive.

It's not just about cost, it's also about value and building a compelling business case. In theory, federated ownership should lead to quicker learning cycles and accelerated ROI. In practice, if each domain-based team needs to invest in some level of data analytics and engineering expertise before they've even procured any technology, it's going to be very hard for some departments - let’s say the HR team which is responsible for Employee Data - to justify and build a compelling business case.

Which brings us to another point. You cannot buy a Data Mesh. There is no off-the-shelf product that will enable a federated approach to data ownership. It is about so much more than technology and tools. It is a topology, a guiding principle, and in order to realise its value it requires a mindset change and cultural shift that many organisations simply are not ready for.

A means to an end

We've established that Data Mesh is not right for everyone. But that doesn't mean what it seeks to achieve is wrong. Quite the opposite, in fact. There is only one way to allow organisations to use their data in ways that will actually help them to adapt to market shifts and serve their customers and stakeholders better. And that is to rip up the current master data management "rulebook" and start again.

It is possible to share data ownership between the IT team and business users without requiring the latter to be data engineers or scientists. It is possible to automate manual tasks like data cleaning, enrichment and integration and save hours of time and significant sums of money. Most importantly, it is possible for sets of data to be treated as products which are universally useful across the business. Truly democratized data can only come from a platform that benefits the many, not only the few - as is the case with Data Mesh. The future therefore lies in a modern approach to Master Data Management that has the same ambition as Data Mesh, but which makes the means of achieving it accessible to all.


The Future of Master Data Management: Traditional vs. Modern approaches


Master Data Management (MDM) has been around since the mid-1980s, but has really come to the fore in the last decade, with many of today’s data governance efforts built on top of existing MDM strategies. This has been driven by the advent of Big Data, an increased focus on Business Analytics and Intelligence, and growing adoption of Machine Learning and Artificial Intelligence.

For the past 25 years or so there have been no major leaps in how providers have built or provisioned their MDM offerings. Traditional MDM solutions still require you to implement strict controls over every aspect of your master data management process—from data acquisition to data storage, and from maintenance and modification to security and access control. These systems were built for the on-premises, siloed institutions of the past where data ownership lay almost exclusively with the IT department.

Modern approaches are more aligned to how most enterprises operate today - in a hybrid, highly distributed and fluid fashion. Data is a valuable business asset, which means that technology and business users are equally responsible for its maintenance and use. This does not mean that everyone in the business needs to be a data engineer or architect. What it does mean is that everyone is, to some extent or another, a data steward and a data citizen. It is the job of technology to enable these roles and ensure that everyone with a stake in an organisation's data benefits from its potential. Which is where Modern MDM comes in.

What is Master Data Management?

At a fundamental level, Master Data Management (MDM) is the process of creating and maintaining a single, consistent view of your organization's critical data. MDM is closely related to data governance, which can be thought of as rules for how data is collected, processed, stored and accessed. It also includes policies on how data should be handled, such as how long it should be retained and what access permissions are granted to different groups of people or individual data owners.
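
The "single, consistent view" described above is often called a golden record. As a rough illustration of the idea, the sketch below merges customer records from different source systems into one record per customer. Matching on a normalised email address and letting the first non-empty value win are deliberate oversimplifications assumed for this example; real matching and survivorship rules are far richer, and all names and sample records are hypothetical.

```python
def normalise_email(email):
    """Lower-case and trim an email so trivially different spellings match."""
    return email.strip().lower()


def build_golden_records(records):
    """Group records by normalised email and merge each group into one view.

    Survivorship rule (an assumption for this sketch): for each field, keep
    the first non-empty value seen, so earlier sources win.
    """
    golden = {}
    for record in records:
        key = normalise_email(record["email"])
        merged = golden.setdefault(key, {"email": key, "sources": []})
        merged["sources"].append(record["source"])
        for field, value in record.items():
            if field in ("email", "source"):
                continue
            if value and not merged.get(field):
                merged[field] = value
    return list(golden.values())


# Invented sample data from two hypothetical source systems.
records = [
    {"source": "crm", "email": "Ada@Example.com ", "name": "Ada Lovelace", "phone": ""},
    {"source": "billing", "email": "ada@example.com", "name": "A. Lovelace", "phone": "555-0100"},
    {"source": "crm", "email": "grace@example.com", "name": "Grace Hopper", "phone": ""},
]

golden_records = build_golden_records(records)
for golden_record in golden_records:
    print(golden_record)
```

Here the two records for the same person collapse into one, which takes its name from the first source and its phone number from the second, while keeping a list of the systems that contributed.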

Master data is the set of identifiers that provides context about business data. The most common categories of master data are customers, employees, products, financial structures and locational concepts. Master data is different to reference data, which is data that refers to other objects, but does not contain identifiers that represent different types of master data entities. Whether there is still a need for reference data in the context of what can be achieved with modern MDM is debatable, but that's a discussion for another time.

What's the problem with traditional Master Data Management solutions?

It has been estimated by Gartner that up to 85% of MDM projects fail. That's a big number. Little wonder then that so many organisations have been burnt in the past and aren't exactly falling over themselves to start another MDM initiative.

There are a number of reasons why this number is so high:

  1. The upfront planning process (data profiling, analysis and modelling) is time-consuming and expensive. Many traditional MDM projects take over a year to deliver any ROI at all.
  2. A domain-by-domain approach, such as that used by traditional MDM systems, causes complexity and creates new silos, restricting how the data can be used.
  3. Traditional MDM demands high manual and technical intervention, which is both costly and time-consuming.
  4. Because traditional MDM systems are built on relational databases with only direct relationships, connections are manual and add to the maintenance overhead.
  5. Due to the upfront profiling and modeling requirements, you're always playing catch-up with your data as it changes. This adds to the complexity and need for manual intervention, further delaying projects.

In spite of all of the above, the fact remains that businesses need to be able to use their data to fuel the projects that will move them forward. Whether these are customer, product, supplier or employee focused initiatives, they all rely on data to provide insights to inform them. At the moment, many organisations are using their data in this way, but the data is neither consistent nor reliable. Which means that the results and recommendations aren't trusted either.

The modern approach to Master Data Management

Modern MDM seeks to solve the above issues in a number of ways.

  • By managing all of your data - master, meta, reference, structured and unstructured. Suddenly, the potential use cases for your data have multiplied exponentially.
  • By eradicating the need to model your data upfront. Modern MDM embraces data in its "raw" form from hundreds, if not thousands, of data sources. The potential cost and time savings are huge.
  • By removing repetitive and manual tasks from the outset. Automating manual tasks like data cleaning reduces the burden on the client and frees time and resources to work on value-orientated tasks instead.
  • By being truly Cloud-native. Most traditional MDM platforms were not born in the Cloud, they were built for an on-premises, highly structured environment and then tweaked for the Cloud. Modern MDM platforms were built for the Cloud - which means that getting up and running is quicker and easier, you can scale up or down at pace, and you benefit from the Cloud economic model.
  • By providing proactive data governance. Establishing trust in data means having full visibility of its lineage and controlling what happens to your sensitive data in a transparent way. Meeting compliance requirements and demonstrating how data is protected won’t slow you down anymore.

You may be wondering what is so different about modern MDM systems that makes all of the above possible. One major difference is that modern MDM systems like CluedIn are built on a NoSQL, schema-less type of database known as a graph database, referred to here simply as Graph. In the world of Graph, the relationships between the data are as important as the data itself.

A simple way to think of it is as the difference between organising your data into neat rows and columns in Excel and jotting it down on a whiteboard. With the whiteboard you can visualise the relationships between the data and add the connections as they emerge. This is exactly what Graph does - as the data is ingested, it allows the patterns and relationships to surface, and is then able to organise it into a natural data model. LinkedIn, Facebook and Google are all built on Graph, and the same principles of schema-less, scalable modeling now apply to MDM.
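
To make the whiteboard analogy a little more tangible, here is a toy in-memory graph in which relationships are stored as data in their own right and nodes carry whatever properties they arrive with. It is a teaching sketch only: the `Graph` class, its methods and the sample identifiers are hypothetical and do not represent CluedIn's implementation or the API of any graph database.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: nodes hold free-form properties and
    edges are stored as (relationship, target) pairs per source node."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.edges = defaultdict(list)  # node id -> [(relationship, target id)]

    def add_node(self, node_id, **properties):
        # No schema: each node keeps whatever properties it is given.
        self.nodes.setdefault(node_id, {}).update(properties)

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbours(self, node_id, relationship=None):
        """Targets linked from `node_id`, optionally filtered by relationship."""
        return [target for rel, target in self.edges[node_id]
                if relationship in (None, rel)]


g = Graph()
g.add_node("person:ada", name="Ada", email="ada@example.com")
g.add_node("company:acme", name="Acme Ltd")
g.add_node("country:dk", name="Denmark", iso_code="DK")
g.add_edge("person:ada", "WORKS_FOR", "company:acme")
g.add_edge("company:acme", "LOCATED_IN", "country:dk")

# A connection discovered later is simply added; no schema has to change.
g.add_edge("person:ada", "LIVES_IN", "country:dk")

print(g.neighbours("person:ada"))              # ['company:acme', 'country:dk']
print(g.neighbours("person:ada", "LIVES_IN"))  # ['country:dk']
```

In a rows-and-columns model, the late-arriving LIVES_IN connection would typically need a new column or join table. Here it is one more edge.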

What does the future of Master Data Management look like?

In many ways, the future of Master Data Management doesn't look like Master Data Management at all. Where traditional MDM systems were siloed and slow, modern platforms are integrated and quick. Where the old way of approaching MDM dictated set rules and structures, the new way embraces freedom and flexibility. And if we accept that these concepts shouldn't only apply to Master data, but all data, then the concept of Master Data Management becomes almost entirely redundant.

At this point in time, CluedIn is the only MDM platform that uses Graph. This will change as established vendors and new market entrants recognise how powerful Graph can be when applied to the management of business data. And that's a good thing. Right now, forward-thinking businesses that want to use their data to react to market forces, competitive advancements and customer preferences have a very limited choice: traditional MDM or CluedIn. As the market continues in this direction, a new category will emerge and we will no longer talk about traditional or modern approaches to MDM. In fact, there's a very good chance that by that stage, we won't be talking about MDM at all.


Why the time has come to retire your reference data


In the world of traditional data management, reference data and master data are treated as two different categories of data. Reference data is used to classify or categorize other data, and master data is business critical data which is shared by multiple systems, applications, and processes. Conventionally speaking, examples of master data include customer data, product records and vendor data. Reference data includes code lists, taxonomies, and hierarchies of data, amongst other things.

But times have changed, and the advent of modern Master Data Management (MDM) – an approach which does away with old-fashioned classifications and hierarchies – essentially means that there is little to no difference between reference data and master data any more. In many ways, reference data is a relic of technology that forced us to denormalise models and treat data in an unnatural way. Everything is considered a lookup today, including what was traditionally thought of as reference data, such as colours, countries and currencies. In reality, reference data is just master data, and master data is just…data – you get where this is going?

In the same way that we use countries across lots of data, we do the same with Domains in general. In the world of Graph (which is pivotal to modern MDM), Entities connect to Entities, rather than Entities to Properties as is the case with reference data.

In fact, all arguments to maintain reference data can easily be quashed by the more modern approach. Reference data muddies the water and overcomplicates the MDM discussion. It could even be argued that master data does the same.

If master data is slow moving then reference data is even slower. Historically, reference data is managed differently because it is very static and rarely changes. Why does that even matter? In classic database design, you don't call tables different things just because of the data they contain, you call them tables.

Metadata that refers to reference data sets may document:

  • The meaning and purpose of each reference data value domain
  • The reference tables and databases where the reference data appears
  • The source of the data in each table
  • The version of the reference data that is currently available
  • When the reference data was last updated
  • Maintenance description for the reference data
  • Business data stewardship information for the reference data

In the world of ontologies, this is no longer needed, and it always makes sense to remove unnecessary steps in a process. Wikipedia is the best example of this. Wikipedia is a web of objects that talk to each other. There is no differentiation between reference and master data; data is data and objects are objects. A country is a country. A currency is its own thing that has relationships to other objects.

Master data is data that relates to the business entities that provide context for business transactions. Unlike reference data, master data values are not usually limited to predefined domain values. Business rules typically dictate the format and permitted ranges of master data values. Common organizational master data includes data concerning:

  • Categories such as individuals, organizations, roles, customers, citizens, patients, vendors, suppliers, business partners, competitors, employees and students.
  • Products, internal and external, inventory, and related concepts.
  • Financial structures, including general ledger accounts, cost centres, profit centres, etc.
  • Location concepts, for the organizations and individuals and other entities that concern the enterprise.

In the context of a classic Relational Database, the idea of having a Countries table denormalised in order for it to be used to reference other tables sounds like a good idea. However, the future of MDM is widely acknowledged to be based on the Graph. In the Graph world, you do not denormalise to tables, you denormalise to records. With this flexibility, each record can evolve in its own way and provide its own schema; it is not tied to an expected schema that matches all other Countries, for example.

In many ways, the sooner we stop talking about master data, the better. What should we be saying instead?
We should be speaking in Domains, and that is it. Domains are consumable and understandable by all. As soon as you talk about master data, the first question that usually crops up is "What is considered master data?" Why add that extra layer of complexity? Domains are a key part of MDM, but they are in ALL data projects; MDM does not have the monopoly on Domains.

What was traditionally considered MDM has completely confused something that should be explained very differently.
Whether you call it Data Mesh, Data Fabric or modern MDM, there is definitely a need for SOMETHING to translate all of the data that sits across your business in an easy, scalable and agile manner. Unfortunately, MDM has traditionally involved extremely tight and rigid demands on data, inherently taking the approach of "nobody change a thing!" Guess what, everyone changed everything - and your upfront, schema-driven, top-down approach didn't work!

The traditional Data Warehouse also promised this, but similar to traditional MDM it leans more towards having rigid domain tables to rule them all.

Managing reference data properly is important to any organization since reference data carries the context of data transactions through its semantic content (code value descriptions, location data, and other contextual information). Reference data can be used to drive business logic that helps execute a business process, designate an application to perform specific actions, or provide meaningful segmentation to analyse transaction data. Also, mapping reference data often requires human judgement, so the need for intervention by business data stewards in the reference data management process cannot be overlooked.

Reference data management was traditionally thought of as important for several reasons. Reference data:

  • Describes the structures used in the organization (internal department codes, internal product codes, employee organization codes, internal location codes, etc.)
  • Describes the common data used in organizations that are external but connected to the organization (e.g., geographical, currency, country, diagnosis coding structures)
  • Provides assistance and support to analytics and business intelligence (e.g., classification codes).

Organizations with a high demand for data entry, including healthcare, insurance, and government entities, experience significant data quality challenges due to improper coding of reference data values. These errors can be quite costly, in several ways. Additionally, many organizations rely on hundreds of individually developed reference files or tables, and each instance requires updating and periodic quality review. In fact, it is a big reason that we see companies still working and managing reference data in Excel! Since most organizations do not have sufficient staff to perform the reference data tasks, these activities may not be performed; therefore the reference data is outdated, causing errors in application performance and data integration.

So where do we go from here? If we look five years into the future, the modern MDM movement will make it clear that reference data is a relic of the past. Reference data is just data, master data is just data. However, just talking in data is still too abstract. The sooner we steer the data discussion towards speaking about Domains, the easier it will be to generate insights with our data. Having initiatives to move the needle concerning Domains such as Customers, Products, Issues and Vendors will move companies closer to insight and further away from unnecessary complexity.


The six questions you need to ask to become a data-driven business


The term “data-driven business” refers to an organisation that uses data to inform or enhance decision-making, streamline operational processes and ultimately to fuel revenue and growth. Whether or not it is possible for any business to be solely data-driven is another debate, but there is no doubt that those who get close to it are adept at turning data into insight, and at using that insight to propel the business forward. While most companies today would probably cite becoming data-driven as a crucial enabler of their wider goals, there aren’t many that have achieved it. Google, Facebook, McDonald’s and Uber definitely fall into this group, but these are industry heavyweights and represent the exception rather than the rule.

What does that mean for everyone else vying to achieve data-driven status? Like many things in life, it starts with the basics and builds from there. Even the big boys had to start somewhere!
All truly data-driven businesses have something in common, aside from the obvious operational and competitive advantages. They can all answer six vital questions.

  • What data do we have?
  • Where is the data?
  • What is the quality of the data?
  • Who owns the data?
  • Who is responsible for each step of the data journey from start to finish?
  • What happened to the data as it transitioned from raw to insightful?

Why is it even important for you to be able to answer these questions in the first place? There are the obvious compliance and regulatory reasons why you should, but for now let’s focus on what your business could achieve if you had the answers to these questions.

What data do we have?

Once you have experienced one win as a result of seeing data really work for you, you’re hooked. This could be using data to optimise processes, lower operational costs, find more customers, attract great talent, monitor trends in the market and much more besides. Knowing what data you as a company have in your arsenal is the first trigger to inspiring these types of insights. Insights can come from manual discovery, or can come from using technology to find patterns in the data and bring them to your attention. We believe in being able to walk before you can run and it is not necessarily a bad thing to start gaining insights through manual discovery.

For example, if you have a list of customers and a list of support tickets, you might want to know which geography causes the most support tickets. With a pattern-driven approach, it is not so much about asking questions of the data, but rather about allowing the data to reveal interesting trends. The likelihood is that there will be patterns hidden in the data that you would not proactively ask for – e.g. churned customers took over 54 hours to have their support tickets resolved. This insight may then lead you to hire more customer support representatives to bring down the average resolution time, or to set an internal SLA that no ticket takes more than 24 hours to answer.
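As a rough illustration of the manual-discovery end of this example, the sketch below joins a small customer list to a ticket list and answers both questions. The field names and figures are made up for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data; field names and values are illustrative only.
customers = [
    {"id": 1, "country": "Denmark", "churned": True},
    {"id": 2, "country": "Denmark", "churned": False},
    {"id": 3, "country": "Australia", "churned": False},
]
tickets = [
    {"customer_id": 1, "hours_to_resolve": 60},
    {"customer_id": 1, "hours_to_resolve": 50},
    {"customer_id": 2, "hours_to_resolve": 10},
    {"customer_id": 3, "hours_to_resolve": 20},
]

# Index customers by id so each ticket can be joined to its customer.
customer_by_id = {c["id"]: c for c in customers}


def tickets_per_country(tickets, customer_by_id):
    """Count support tickets by the country of the customer who raised them."""
    return Counter(customer_by_id[t["customer_id"]]["country"] for t in tickets)


def mean_resolution_hours(tickets, customer_by_id, churned):
    """Average resolution time for tickets from churned (or retained) customers."""
    hours = [
        t["hours_to_resolve"]
        for t in tickets
        if customer_by_id[t["customer_id"]]["churned"] is churned
    ]
    return mean(hours)
```

A pattern-driven tool would surface the gap between churned and retained customers without being asked; here you have to know to ask for it.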

Where is the data?

Knowing where the data is and where it has come from is an important regulatory requirement, but in the context of achieving some type of insight, knowing the answer to these questions is vital to establishing trust in the data from across the organization. If someone on the street handed you a credit card and said "Feel free to use this!" the first thing you’d probably ask is where it came from. Without this lineage, there is no trust. And most notably, in this analogy, you would want to know if the source of this credit card is reputable.

Also, although duplicate data is not necessarily a huge storage cost issue anymore, it is a big operational issue. Of course, this also depends on exactly how much duplicated data you have – petabytes of it can be quite costly! This means that knowing where your data is can help you to reduce operational costs too.

What is the quality of the data?

In the era of fake news and AI bots that are indistinguishable from humans, it is more important than ever to establish integrity in the data you are using to make decisions. There is a plethora of shades of data quality, and every shade will correlate with a different level of confidence in the "usability" of the data. It should also be pointed out that there is no such thing as right or wrong when it comes to data, and no matter how high quality the data is deemed to be, it will bring with it an inherent level of risk.
In the spirit of keeping things technology-agnostic and high-level, think about the times you have made a decision with confidence. What gave you that confidence? Was it that your research came from a reputable source? Was it because the voice of the crowd all agreed with one approach? Was it your gut feeling?

Just like everyone else, you probably make decisions on a daily basis using a combination of these techniques to make your final judgement. It’s much the same with data - determining quality is about building up your confidence in making a decision. The challenge with data is that it doesn't have to adhere to any laws of physics, hence any judgement made on data quality is a heuristic attempt to provide metrics on which a decision can be made with an acceptable level of confidence and risk. You can read more about how CluedIn interprets and measures the shades of data quality here.

Why does data need ownership?

In many ways, it doesn’t. In fact, it needs much more than ownership. This is why we have frameworks in Data Governance like the RACI model, in which the four dimensions of "ownership" are defined as the minimum requirements for an ownership matrix relating to data and the journey that data takes. Like any process you have within a business, if no-one is responsible for it, it often grinds to a halt. As you have probably experienced in other parts of your business, sometimes a task can be blocked by the most minuscule reason, but the bottom line is - it was blocked. This is often down to a lack of ownership for that part of the process.

Who is responsible for each step of the data journey from start to finish?

The data journey from source to insight has some very distinguishable steps, and each of these steps requires you to attack the data from a different angle. Irrespective of the technology you use to get from source to insight, the generic journey includes pulling data from a number of sources, integration, normalisation, standardisation, deduplication, linking, enrichment, profiling, mapping and transformation. (Honestly speaking, we could easily add another 10 or 15 stages, but let's stick with this list for now!). In many cases, each of these steps is a comprehensive task and responsibility in its own right. For example, the normalisation and standardisation of data is easily a full-time job for many data stewards. Hence, if a full supply chain of ownership of the steps in the process is not established, then it should not be a surprise that the flow of usable data can break down – often for the most mundane of reasons.
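One way to picture a "supply chain of ownership" is a RACI matrix (Responsible, Accountable, Consulted, Informed) keyed by journey step. The sketch below is a minimal, hypothetical check that flags steps with no clear owner. The step name "ingestion" is shorthand for pulling data from sources, the team names are invented, and the rule of exactly one Accountable party per step is a common RACI convention assumed here.

```python
# The journey steps listed above; "ingestion" stands in for pulling data
# from a number of sources.
JOURNEY_STEPS = [
    "ingestion", "integration", "normalisation", "standardisation",
    "deduplication", "linking", "enrichment", "profiling", "mapping",
    "transformation",
]

# Hypothetical, partially filled ownership matrix.
raci = {
    "ingestion": {"R": ["data engineering"], "A": ["head of data"]},
    "normalisation": {"R": ["data stewards"], "A": ["head of data"]},
    "deduplication": {"R": ["data stewards"], "A": []},
}


def unowned_steps(steps, raci):
    """Return steps that lack a Responsible party or exactly one Accountable."""
    gaps = []
    for step in steps:
        roles = raci.get(step, {})
        if len(roles.get("A", [])) != 1 or not roles.get("R"):
            gaps.append(step)
    return gaps
```

Running `unowned_steps(JOURNEY_STEPS, raci)` against this matrix lists every step where the flow of data could stall for want of an owner.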

What happened to the data as it transitioned from raw to insightful?

Let’s consider for a moment why it is that data needs lineage, and different parties to take responsibility for the entire data journey, yet other processes we run within the company don't face the same stringent demands. Could it be that this lineage would actually be very useful in all parts of the business, but because of the digital nature of data it is inherently easier to build a digital footprint? The same cannot be easily said for passing around Excel sheets from department to department, for example. An explanation of how this Excel sheet "came to be" isn’t something that can be achieved through the use of Excel alone. The audit trail of the transformation of data from source to insight is often just as useful for “explainability” as it is for highlighting parts of the process that can be improved or are error-prone.
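To show what such an audit trail might look like at its simplest, here is a hypothetical sketch in which every transformation applied to a value is recorded alongside the value itself. The class and step names are assumptions for illustration, not a description of how any specific product stores lineage.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedValue:
    """A value that remembers every transformation applied to it."""
    value: str
    history: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Apply func to the value and record a before/after entry."""
        before = self.value
        self.value = func(self.value)
        self.history.append(
            {"step": step_name, "before": before, "after": self.value}
        )
        return self


# A raw company name is cleaned in two steps, each leaving a footprint.
name = TrackedValue("  ACME ltd ")
name.apply("trim whitespace", str.strip).apply("title case", str.title)
```

The `history` list answers "what happened to the data?" and also shows which step to revisit if the final value looks wrong.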


Now that we have established the questions you need to answer in order to start your journey to being truly data-driven, we should look at how technology can help you to both answer the questions and use those answers to best effect. The best way to do this is to approach it from both the asset and the record level – which in effect means getting both the bird’s-eye and granular view, and bringing them together in a way that makes sense. One powerful and increasingly popular combination is to use Microsoft Purview and CluedIn. To some degree, both Purview and CluedIn answer all of the questions above, but at different levels. The bottom line is, you need both and in some ways, you can't have one without the other, particularly if your data technology stacks are all housed within Microsoft Azure.


Driving data science with a data quality pipeline


High quality, trusted data is the foundation of Machine Learning (ML) and Artificial Intelligence (AI). It is essential to the accuracy and success of ML models. In this article, we’ll discover how CluedIn contributes to driving your Data Science efforts by delivering the high quality data you need.

CluedIn not only provides your teams with tooling that improves the quality of the data that is fed to your ML models, it also simplifies the iterations by which you can evaluate their effectiveness.

The five Vs of Data Quality

The term “data quality” is overused and can mean many things. As Machine Learning and Big Data are both still evolving fields, with developments in each complementing the other, we’ll approach it from an angle you may already be familiar with – the five Vs of Big Data (Volume, Variety, Velocity, Value and Veracity).


Breaking down the barriers to entry for MDM


There are always hurdles (many are necessary) to starting any data initiative. Do you have the right team, the right technology, the right budget, and more? In fact, we are really underplaying it here: there are literally hundreds of decisions that need to be made before kicking off an internal data project. One of our big focuses at CluedIn is to help you limit the number of hurdles in a positive and constructive manner.

Here’s a good example. Imagine trying to buy technology with a budget of $10,000, and the technology you want is $9,000. This is a hurdle that doesn't exist. If it was $11,000, then suddenly, many months of effort have potentially been added while you figure out how to secure a bigger budget or how to negotiate with the vendor to discount the price. This isn't always a quick process.

At CluedIn, we considered the entire sales process from the point of view of the customer, and put in a strategy to make sure that WE, as the vendor, are removing as many hurdles as possible. Let's dive in and look at some of the options we’ve put in place to make this possible.

Self-install, start when you’re comfortable.

I am an engineer at heart. I have been a software engineer for the last 15 years and I can speak only for myself when I say that I need to use software before deciding whether it’s a good fit. I also realise that when I do this, I really only get a 25% view of what that software is actually capable of. At CluedIn, we have recently added support to deploy CluedIn directly through the Microsoft Azure Marketplace in a Managed Application Offering. This makes CluedIn dead easy to get started with by offering the data sovereignty of PaaS, with the ease and scalability of SaaS. I can get a rough idea of the platform, but am quite happy to accept that some things might not work as I expect or it may not reflect the final nature of what I will be getting.

Start sending data to CluedIn straightaway - no need for upfront modelling.

Let's be clear, having a plan makes so much sense - you should plan. But you do not need your data model to be perfect and future-proofed before you implement MDM. If you wait until it is, you will literally never start, because the perfect data model is a flawed and impossible concept. CluedIn's data model was designed around the idea of source control. This might get a bit technical, but it is literally the best analogy for the way CluedIn stores, changes and processes data. You don't build code with perfection in mind. You evolve it and sometimes you might even fundamentally rewrite something. Without a doubt, source control systems like Git have proven that they can manage huge repositories through any type of change, whether you expect it now or cannot yet foresee it. We provide the same with data. At CluedIn, you do not need to perfect your data upfront; you can delay it until a point when it makes sense. This cannot be said about the majority of MDM systems. At CluedIn, although you will benefit from mapping the data on entry, and mapping the primary and foreign keys, it is not enforced. What is the value of this? Should this not be the time to actually map this data? The answer is, categorically, no. There are so many benefits one can get from simply placing data into CluedIn, such as Data Quality Metrics, Data Lineage, Sensitive Data scanning and Data Sharing. Once you are ready, going back to the mapping of data and updating it will simply require the data to be reprocessed, and CluedIn will handle the change on your behalf.
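The idea of loading first and mapping later can be sketched in a few lines. To be clear, this is a conceptual illustration only: the `process` function, the record shape and the vocabulary keys are all hypothetical and are not CluedIn's API. The point is that the raw records are kept untouched, so a mapping agreed later can be applied by reprocessing them.

```python
# Raw records are stored exactly as they arrived from the source system.
raw_records = [
    {"source": "crm", "payload": {"Cust_ID": "17", "Nm": "Acme Ltd"}},
    {"source": "crm", "payload": {"Cust_ID": "18", "Nm": "Globex"}},
]


def process(records, mapping):
    """Project each raw payload through mapping; unmapped keys pass through."""
    return [
        {mapping.get(key, key): value for key, value in record["payload"].items()}
        for record in records
    ]


# Day one: no mapping yet, but the data is loaded and usable as-is.
first_pass = process(raw_records, mapping={})

# Later: a mapping is agreed and the same raw records are reprocessed.
second_pass = process(
    raw_records,
    mapping={"Cust_ID": "customer.id", "Nm": "customer.name"},
)
```

Because `process` never mutates `raw_records`, the mapping can be revised and rerun as many times as needed, much like rebuilding from a repository's history.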

No need to complete your Data Governance program before you start.

CluedIn is different – think of it like the Agile alternative to Project Management. You can build and discover and modify and adapt along the way.

Let me reiterate, a plan is a good thing. But at some point, overplanning leads to a lack of agility in the face of necessary change. CluedIn is an Agile MDM platform, in that it expects change, it expects things to go wrong and is prepared for that. This means that bumps in the road will not fundamentally kill a project with CluedIn.

Let business rules evolve over time.

We have mentioned this in previous posts, but CluedIn removes a huge hurdle from common MDM initiatives, which is to develop business rules to either detect and identify possible data quality issues or to set up rules to invoke an action once a certain condition is met. Instead of asking you upfront to manually develop these rules, it turns out that most of the rules that you actually want to build will come from working with the data, allowing the issues to surface and then putting the proper fixes in place. In the majority of cases, you won't be able to develop these proactively, but reactively. CluedIn embraces this idea by onboarding data into the platform, and then using surfacing tools to help detect issues and automatically put business rules in place to fix them and prevent the same problem from making its way through the system ever again.

Zero downtime upgrades.

Let’s face it, upgrading software is a massive headache in an enterprise environment. That complexity is escalated when you have software that is more of a "platform" that allows you to extend. Now that CluedIn offers generic, REST-based extension points in the latest version, it makes the process of upgrading painless. CluedIn can be set up to auto-update or you can opt in and manage it yourself, giving you the choice. Considering the core of CluedIn is based on a schemaless data model, with support for reprocessing, any actions that need to be triggered on new updates can be automated as well.

Auto-scaling across the entire cluster i.e. scales disks, CPU, RAM, network.

In the true spirit of CluedIn, we are not interested in providing a solution that is faster and better. We attempt to remove the need to actually do something in the first place. CluedIn is designed and set up to auto-scale according to your business drivers. Typically these factors will either be working towards a particular time and date, or a particular budget, or even "spend as much money as possible to get the job done as fast as possible." In saying that, although all of the above is possible, it still needs to make economic sense in most cases.

Native integration to 27 Azure services in just a few clicks.

Even with a multi-cloud strategy, native integration with the cloud provider you are hosting your platform in is, without a doubt, hugely valuable. CluedIn is focused on being the most native MDM solution on Microsoft Azure. Sure, we work and have many customers on the other cloud platforms, but on Azure we are easily the most native and obvious choice due to the number of native integrations we support. Want to use Azure Active Directory for authorization, SSO and authentication? One click away. Want to enable Azure Defender, Azure Sentinel? One click away. Want to share mastered and cleaned data from CluedIn to Azure Synapse, Azure Databricks, Azure Machine Learning Studio? One click away. Want Azure Purview to register and govern all the data movement in CluedIn? One click away.

We want to provide a “think it, done!” type of experience for Microsoft Azure customers. If you have an idea, you should be able to make it a reality within moments, not weeks.

Kubernetes backbone means support for all environments.

It is well known that in the MDM space, many leading vendors take months just to install and set up. Without a doubt, the future of infrastructure is containers and Kubernetes. Kubernetes brings an abstraction that isolates environments, operating systems, and more. This essentially lowers the entry barrier, due to the abstraction, but also due to the native support for all cloud providers. In addition to this, Kubernetes brings some of the pieces expected for modern, enterprise applications such as auto-scaling, zero-downtime upgrades, and more.

Endorsed by Microsoft – already!

Just as we are investing in our Microsoft relationship, Microsoft is also heavily investing in CluedIn. CluedIn was one of the first applications to provide the Managed Application Offering for MDM on the Azure Marketplace. This provides a great combination of security and data sovereignty, combined with the beauty of a managed service.

Built for the enterprise - i.e. Logging, SSO, Telemetry, SSL, DNS, Inbuilt backups, Budget Allocation (scale to budget), Azure Defender, Azure Sentinel.

Just like knowledge of a particular industry will accelerate implementation, knowing what is expected by enterprise customers is also crucial to generating and sustaining momentum. At CluedIn, we know our customers intimately and have our finger on the pulse of what they will expect in the future from enterprise applications. CluedIn has native support to provide logging, SSO, Telemetry, SSL, DNS, Backup/Restore, and more. We have developed this not only from the team's experience, but also by monitoring what the cloud providers are enabling, as well as what customers are asking for during the purchase cycle - e.g. "What does CluedIn provide in terms of threat-detection?".

Accelerators for all industries, and partners that know your space.

Different sectors, industries and verticals require specific domain knowledge. Having been part of more than 40 MDM implementations myself, I can say that each industry has specific identifiers that are known only within that industry, e.g. NPI in Health. At CluedIn, we have vertically aligned partners that specialise in implementing MDM for particular sectors. These partners come with their own additions and pre-built packages for CluedIn in the shape of existing Domains, Vocabularies, Connectors to Systems, Enrichers (public datasets) - and that is just from the technology side.

A clear comparison between other MDM vendors and CluedIn.

All MDM vendors are different. Considering that MDM is a well-established industry, it is important to help our customers understand the revolution that Modern MDM has brought. For this, CluedIn provides many layers of research into what the main differences are between Modern MDM providers like CluedIn and traditional MDM providers. I would like to use another example of a shift in technology that has also drawn a clear line in the sand around modern and traditional approaches, and that is the Data Warehouse space.

The modern Data Warehouse makes a fundamental shift at a very low level that essentially optimises files for read access and then distributes jobs across multiple machines to answer a question. In addition to this, it has taken full economic advantage of the scalability of the cloud in that you can spin up a huge number of machines to run distributed computing at relatively low cost.

The same revolution has happened in the MDM space, driven predominantly by the shift to the cloud. I know this sounds like a cliche, but building for the cloud is a fundamental difference. The other big revolutions in MDM have come through either the automation or augmentation of processes that were complex with traditional MDM software.

Migrate from other MDM systems with ease.

CluedIn offers specific services for customers wanting to easily migrate from a traditional MDM solution to CluedIn. We can provide this through our partner network, in conjunction with our own team that has a plethora of experience and expertise in many MDM solutions. CluedIn has a generic framework for translating data, models, business rules, workflows, hierarchies and more.

Get started with a free trial.

Although common in most software categories, free trials are a real rarity in the MDM space. Why is this? Well, I can only speculate, but my opinion is that MDM is hard to implement, and although I would like to stand here and say that has been solved, I don't think it has; it is still hard to implement ANY MDM system. It’s just not quite as hard with CluedIn!

Pricing that works for you.

Essentially there are two ways of purchasing software. It is either a Capital Expense (cap-ex) or an Operational Expense (op-ex). Each of these has its own advantages and disadvantages. This is why CluedIn offers both cap-ex pricing (upfront payment, yearly recurring) and now per-hour pricing (consumption-based).

Hourly pricing means that you only pay when you are using the platform. This increasingly suits companies that want to avoid hefty upfront investments. On the flipside, with a capital footprint, you can often get quite heavy discounts. This is simply because vendors like CluedIn need some level of predictability. The operational footprint is fantastic for removing hurdles to getting started, but the caveat to this is that customers won’t be offered the same discounts. Why? Because there are literally 100 reasons why a project might not start or could be delayed, and the technology might only be one of them. In saying that, a combination of both the cap-ex and op-ex model can be very powerful, particularly when you start with consumption and move to a commitment once trust has been established.

Buy CluedIn under your MACC agreement with Microsoft.

Finding budget is hard! It may be that your organization already has an agreement with Microsoft regarding Azure spend. This commitment means that you have been given a discount on Azure services in return for committing to an agreed level of spend over a number of years. You can use this to purchase CluedIn.

Buy CluedIn under the standard Microsoft Ts&Cs.

When buying enterprise software, you can't just choose any software; it needs to meet the legal requirements of the business. Chances are you have already signed the Microsoft Standard Ts&Cs if you have bought off the Azure platform before. Hence, CluedIn can be bought under the same Ts&Cs.


Master Data Services to Modern Data Management!


MDS is still a credible and reliable data management solution with many loyal customers. And if all you’re looking for is on-premises data management functionality such as model versioning, business rules, data quality services, workflows, hierarchies, and a neat Excel plugin then MDS will probably meet your requirements. But master data management (MDM) has SO MUCH more to offer than that, and it’d be crazy not to consider what you could achieve by migrating to a modern, Azure-native MDM solution specifically developed to eradicate many of the challenges associated with traditional MDM.
