Accelerate your Snowflake solution with good-quality data, so it can work its magic.
So you have bought Snowflake. Great choice. You are now cloud native, you have endless possibilities for scale, and you want to become more data driven. Stop. You have a problem. The level of data readiness that Snowflake expects doesn't exist in your organization. Sure, it works well on Wikipedia data and perfectly prepared datasets, but your data doesn't look anything like that.
Snowflake themselves say on their website, "If you want clean data, use Spark." But don't worry, there is a solution. CluedIn's main focus is preparing data so that platforms like Snowflake can work their magic.
So why can't you just use any cleaning application to do this? You can, but it will fail when you try to operationalise it. CluedIn is a platform that connects to source systems and facilitates the full flow of data to Snowflake, and it gives data stewards the tools to clean data and easily send it on to Snowflake. Most importantly, cleaning is not a one-off task. Data flows through your business continuously, and you want to systematise how it is cleaned.