October 10th, 2023 | 12 min read
Relational & Graph Databases in Master Data Management – which approach is right for you?
In this article, we focus on comparing relational databases and graph databases in the context of Master Data Management (MDM). We examine how each approach impacts MDM processes, highlighting key differences and considerations for organizations.
Traditional vs. Modern Approach to Data Warehousing
Ask ChatGPT what the difference is between a relational database and a Graph database and you will get this response: https://chat.openai.com/share/404f3235-7ab8-448d-8707-f1c6fd5c2513
Perhaps a more interesting and parallel question to ask would be "What is the difference between the traditional and modern approach to building a Data Warehouse?" Why? Because this gives us a better insight into the difference between a relational database and a Graph database in the context of Master Data Management (MDM).
Try it for yourself and you will see that the traditional approach to building a Data Warehouse emphasizes a design philosophy that touts normalization - a combination of dimensional and fact-based schemas and structures that prioritize upfront structure and design - but as a side-effect of this, deprioritizes agility and flexibility. In addition, the conventional method tends to favour integration pipelines to get data to its new target, instead of simple pipelines to simply move data into a generic store for integration later.
Relational Databases in MDM
This is strikingly similar to what happens when you select an MDM platform that sits on top of a relational database, rather than a Graph database.
Choosing the relational route means that your team will spend more time in ETL pipelines to onboard data, which naturally means that IT will need to be involved. Although it is possible for a relational database to store unstructured data, it is not designed for this purpose. That alone should be a red flag. Relational databases in MDM need the data to be modelled upfront, before data is onboarded. They will inevitably struggle with the practice of onboarding data and progressively evolving the model over time. This doesn't mean that relational database models can’t be refined and evolved - but it is much more painful due to the integrity checks that would need to be addressed and fixed before migration or evolution of the model can occur.
The Power of Graph Databases in MDM
We think in "objects" which makes it easy to understand why we work in Tables. You have your Customer Table, your Product Table - makes sense. The relational approach to MDM would ask us to create these tables upfront (due to the nature of the technology underneath) and then carefully craft input data into this new world that we are creating.
In many situations, you would also have to make sure that the data coming in is "strongly typed" e.g. you would want a Person's age to be a number, right? But what happens if it is a number range, does that mean we should change this to a Text value to accommodate the fact that this could happen? While there may be a desired type, it is surely preferable to have a system that identifies when something doesn’t look right, but allows the data in anyway, with a view to cleaning it and coming back at a later point to fix it. The alternative is to pass it back to IT to fix, which just shouldn't be necessary.
Agility and Data Modeling in Graph
The other challenge (and something that can also be a positive) is that a Relational database approach to MDM requires you to determine the relationships between Domains upfront and choose which Keys can be used to link Domains. If you work with data, you will know that one truism shines through - you will never have data fit together in the same way, ever. It won't happen with your existing data, it won't happen with the data you onboard in the next two years.
By forcing upfront modeling you are imposing a rigidness on the data too early in its lifecycle - a rigidness that often misses the fact that there are three or four other completely legitimate ways for records to link or merge that you are bypassing because you imposed that data must fit together in a predefined way.
Uncovering Relationships and Insights
This brings us to one of the most powerful values of taking the Graph approach to MDM. Graph enables records to link and merge at the record level, not just the domain level, at scale. Let’s break this down with an example.
- You have three files that contain a list of customers. Each list is maintained by a different department, which means that the sales team has the D&B number, VAT number, and a system-unique Identifier called AccountId. Not all rows have a D&B or VAT Number, but they all have an AccountId.
- File 2 is owned by the Customer Success Department and does not have a VAT, D&B, or AccountId number as this is data that is not important to them.
- File 3 is from accounting and has the AccountId, VAT, and D&B number.
Already, you are using quite a bit of cognitive load as to figure out what the ETL pipeline or SQL query would look like. This is one of those cases (that happens quite often) where if you tried to model the Customer table, you would need to study all tables upfront. You would need multiple unique keys that can be used interchangeably (almost like in a mesh) where you would say "If you can use the VAT Number then use that, otherwise use the D&B, but only if you are trying to link file 3 to 1". It very quickly becomes messy, because you are trying to create an overarching table that will handle all use cases.
Embracing Change with an Agile MDM Strategy
The hidden stinger here is that we don't know when we are going to get File 4 and what that will look like, hence it won't be long before this Customer Table becomes very hard to maintain, change, and use. Graph does not suffer from this same complexity as it readily accepts that every record could have a different structure for a customer, it could have its own unique keys and it could relate/link to other records – e.g. products – in its own unique way.
Sounds like an absolute mess, right? Yes, you are 100% correct and this is 100% by design. The goal of modeling in Graph in MDM is not to conform your incoming datasets into a predefined model - it is to have a Graph formed from the actual data, giving it the flexibility to be agile in the way it merges and links to records.
Ask yourself this question: "Are two records the same if just ONE of their unique identifiers matches, or if a specific identifier matches"? If your answer to this is "just one", then you start to realize the agility of the Graph, but more so the rigidity of taking a relational approach to modeling in MDM
Conclusion: Choosing the Right Database Approach
There are also lots of knock-on positive effects such as the ability to augment how you identify possible duplicate records once you have built up this very powerful and connected Graph of records. In our experience, it is not only the values of a record that can help determine if two or more records are the same, but rather the relationships that exist between, or even in the paths that exist between two records.
Let's imagine the case where you have one record called "SpaceTex" and another called "AeroSpace Engineering Systems". If you were to look at these two records, you would think that these are different companies - yet, if we happened to have contacts at both of these companies, enough that we could see that all employees for “SpaceText” worked for "Aerospace Engineering Systems", then we could consider that maybe these are either subsidiaries of another company, or that there was a name change somewhere along the way. Only Graph can bring this insight out in such a fluid and dynamic manner due to the relationships forming naturally, rather than being predefined.
Software engineers can draw analogies between the Agile / DevOps movement that highlighted the importance of delivering software on a daily or weekly basis, which meant that delivering a new version of your product almost felt like something you did every day instead of this big scary thing that happened after two years of development.
Entropy is guaranteed within a business and having an MDM strategy that gets your users comfortable with change on a daily basis allows an MDM system to survive the test of time.
In summary, this isn't an argument that Graph databases are “better” than relational databases. It is simply stating that if you are tasked with bringing together heterogeneous data in an effort to eventually serve it out in a combined form, then Graph offers an agile, almost iterative approach to doing it. By allowing the model to evolve as each dataset is introduced, rather than asking an engineer to almost solidify an all-encompassing model that should cater for when we onboard data in the future, we are making our lives easier rather than storing up trouble
In terms of Master Data Management implementation styles, it must be said that the centralized approach to MDM is better suited to a conformed and rigid model - because it gives you a chance to heavily control the input of data and not let a bad data record get into the system in the first place. This is fair, this makes sense. However, if your challenge is that you already have data out there in the business and your task is to bring it all together, then Graph is absolutely key for that consolidated approach to MDM.
See all of the powerful, Azure-integrated CluedIn MDM features and capabilities
Get ready for AI and advanced analytics with native Master Data Management.
Solving your challenges
See what the business value CluedIn Master Data Management can deliver.