How does the CluedIn merging engine work?

Blending data is hard. Let’s start with that realisation. Without wanting to sound facetious, too many companies take an overly simplistic approach to solving duplicate data. I want to explain how our merging engine works, to show how seriously we take merging and how big a role data blending plays in processes such as merging and de-duplication.

The process of merging data starts long before the actual merge. Data doesn’t blend naturally, not within a single system and certainly not from one system to another. If, from day one, a perfect universal ID had been propagated through every system, in both structured and unstructured data, we would not be in this predicament. But we are. Given that, we need to prepare the data so that it can blend as confidently as possible. Even then, your data will never blend perfectly; there will always be outliers. Your job is to have a system in place that identifies what needs to be done to make it blend properly.

So, the first stage of blending data is realising that you have a data quality issue. There will be many cases where datasets join perfectly. There will be many where you think they join perfectly but, in fact, the data was entered incorrectly. And finally, there will be plenty of cases where you need data from 3, 5, or 25 different systems to be available before you can properly merge data from other systems.
