April 8th, 2022 | 3 min read
Solving the CSV/Excel driven organization
You have all heard the anecdote "our company runs off Excel". In full transparency, many parts of CluedIn also "run" off Excel. So is this necessarily a bad thing?
You can pretty much guarantee that most people have built something in Excel before, whether it is a list, a sales forecast or a marketing plan. We use it because Excel is fantastic for ad hoc data storage and processing, requiring little to no setup, and capable of storing structured data without the need for installing databases and involving IT in elaborate setups.
It is also a well-established file format that can be opened by a plethora of different tools, making it much more flexible and transportable between environments. So why, then, does Excel sometimes get such a bad rap? Excel was never designed to be an operational data storage tool. In fact, it is quite limited when it comes to storing larger amounts of data.
The big mistake that many of us make is trying to use Excel in exactly this way, to store operational data and/or as a source repository, when it is neither of those things. Rather, it is a front-facing proxy to data and a tool for performing ad hoc data analysis. One of my colleagues said it best: "We will manage this data in Excel until we have a system in place". Excel is a useful provisional tool, but it must be backed up by a longer-term plan for managing that same data.
We do have newer data formats such as Parquet and Avro. However, these file formats were never meant to be seen by human eyes; they are optimised for storing data efficiently for machines to read. In practice, to open a Parquet file in Excel you will need to install a plugin that takes the unreadable and turns it into what you and I can easily deal with - tabular data.
At CluedIn, we recently introduced our native Excel plugin, available directly from the Microsoft Office 365 store. In this we made sure that we offered the ability to proxy data in from CluedIn, and that Excel was used as an interface to edit, add or delete records from CluedIn. Crucially, Excel is not the master source of any data.
As CluedIn is a recommended replacement for Microsoft Master Data Services (MDS), it was necessary for us to offer the same functionality (and more) as that provided with the Excel plugin for the original SQL Server MDS product. For our customers, this opens up a new and familiar interface for cleaning and mastering their data, in turn delivering high quality data to the business - without the need for upskilling on new tooling.
What makes CluedIn unique is that the immense flexibility you love about Excel is accentuated within CluedIn by its "zero modelling" approach. Let me explain. The beauty and simplicity of Excel lies in how easy it is to add new columns and essentially remodel the data. The same cannot be said for traditional MDM, where a model needs to be defined upfront and, as the model evolves, a tedious remodelling process often ensues. CluedIn's ability to model on your behalf perfectly complements the agile nature of Excel, where remodelling is comparably simple.
Can Excel be used as a generic transport format of data to share between organisations?
Absolutely. In fact, many businesses find that Excel is a great way to distribute data because it can be read by most tools on the planet. The distinction is that these Excel files should not be thought of as "operational" until they have been loaded into a system, which then becomes the place where that data is activated. Although formats like Parquet and Avro are gaining in popularity, and are arguably the better formats for sharing data across companies, the tooling is not there yet. In time, it would make sense for them to become the standard for cross-company data sharing.
CluedIn can be thought of as a ledger for your data, in that all changes to the platform are logged in an audit trail, so that you know what the state of a piece of data was at any point in time. The same cannot be said for Excel. Excel does not maintain a lasting history; it only has a session-level history of the changes that have been made to the data. With CluedIn's Excel plugin, business users can load data from the CluedIn ledger, make changes in Excel and then store the full history of changes across sessions, computers and people.
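To make the ledger idea concrete, here is a minimal sketch in Python of an append-only change log that can answer "what did this record look like on a given date?". It is purely illustrative and is not how CluedIn is implemented: the names Change, Ledger, record and state_at, as well as the sample customer data, are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Change:
    """One audit-trail entry: who set which field to what, and when."""
    record_id: str
    field: str
    value: Any
    author: str
    at: datetime


class Ledger:
    """Hypothetical append-only change log; entries are never edited or removed."""

    def __init__(self) -> None:
        self._changes: list[Change] = []

    def record(self, record_id: str, field: str, value: Any,
               author: str, at: datetime) -> None:
        """Append a change instead of overwriting the previous value."""
        self._changes.append(Change(record_id, field, value, author, at))

    def state_at(self, record_id: str, at: datetime) -> dict[str, Any]:
        """Replay every change up to `at` to rebuild the record as it was then."""
        state: dict[str, Any] = {}
        for change in sorted(self._changes, key=lambda c: c.at):
            if change.record_id == record_id and change.at <= at:
                state[change.field] = change.value
        return state


def day(d: int) -> datetime:
    """Helper for the example: a date in April 2022 (UTC)."""
    return datetime(2022, 4, d, tzinfo=timezone.utc)


# Two people edit the same (made-up) customer record on different days.
ledger = Ledger()
ledger.record("cust-1", "email", "old@example.com", "alice", day(1))
ledger.record("cust-1", "email", "new@example.com", "bob", day(5))

print(ledger.state_at("cust-1", day(3)))  # {'email': 'old@example.com'}
```

Because nothing is ever overwritten, the earlier value is still recoverable after the second edit. That is the property a plain spreadsheet file lacks once it has been saved and closed.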
Can I create new records in CluedIn through the Excel plugin?
Yes, but you must be aware that this then makes CluedIn the operational store of that data. Although this is sometimes unavoidable, the preferred approach is that data is always created in an operational store and then fed to CluedIn as an interpretation of that same data. This approach brings many advantages, including the ability to rebuild CluedIn from the source systems at any point in time.
In summary, we believe that Excel should be embraced as a critical tool for data analysis. However, it is not the right place (long-term) to establish the source of truth. Operational data belongs in operational systems and for all of its many advantages, Excel is not an operational system.