April 25th, 2022 | 12 min read
The six questions you need to ask to become a data-driven business
The term “data-driven business” refers to an organisation that uses data to inform or enhance decision-making, streamline operational processes and ultimately fuel revenue and growth. Whether or not it is possible for any business to be solely data-driven is another debate, but there is no doubt that those that get close to it are adept at turning data into insight, and at using that insight to propel the business forward. While most companies today would probably cite becoming data-driven as a crucial enabler of their wider goals, not many have achieved it. Google, Facebook, McDonald’s and Uber definitely fall into this group, but these are industry heavyweights and represent the exception rather than the rule.
What does that mean for everyone else vying to achieve data-driven status? Like many things in life, it starts with the basics and builds from there. Even the big boys had to start somewhere!
All truly data-driven businesses have something in common, aside from the obvious operational and competitive advantages. They can all answer six vital questions.
- What data do we have?
- Where is the data?
- What is the quality of the data?
- Who owns the data?
- Who is responsible for each step of the data journey from start to finish?
- What happened to the data as it transitioned from raw to insightful?
Why is it even important for you to be able to answer these questions in the first place? There are the obvious compliance and regulatory reasons why you should, but for now let’s focus on what your business could achieve if you had the answers to these questions.
What data do we have?
Once you have experienced one win as a result of seeing data really work for you, you’re hooked. That win could come from using data to optimise processes, lower operational costs, find more customers, attract great talent, monitor trends in the market and much more besides. Knowing what data your company has in its arsenal is the first step towards these types of insights. Insights can come from manual discovery, or from using technology to find patterns in the data and bring them to your attention. We believe in being able to walk before you can run, and it is not necessarily a bad thing to start gaining insights through manual discovery. For example, if you have a list of customers and a list of support tickets, you might want to know which geography generates the most support tickets.
With a pattern-driven approach, it is not so much about asking questions of the data, but rather about allowing the data to reveal interesting trends. The likelihood is that there will be patterns hidden in the data that you would not proactively ask for, e.g. churned customers waited over 54 hours to have their support tickets resolved. This insight may then lead you to hire more customer support representatives to bring down the average resolution time, or to introduce an internal SLA under which no ticket takes more than 24 hours to resolve.
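To make the customers-and-tickets example concrete, here is a minimal sketch in Python. The records, field names and numbers are invented purely for illustration; they are not real data and not a CluedIn feature. The first function answers the manual question (tickets per geography), and the second produces the kind of summary a pattern-driven tool might surface (resolution time for churned versus retained customers).

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data; field names and values are illustrative only.
customers = [
    {"id": 1, "country": "DK", "churned": True},
    {"id": 2, "country": "DK", "churned": False},
    {"id": 3, "country": "UK", "churned": False},
]
tickets = [
    {"customer_id": 1, "hours_to_resolve": 60},
    {"customer_id": 1, "hours_to_resolve": 52},
    {"customer_id": 2, "hours_to_resolve": 10},
    {"customer_id": 3, "hours_to_resolve": 20},
]

# Index customers by id so each ticket can be joined to its customer.
customer_by_id = {c["id"]: c for c in customers}


def tickets_per_country(customers_by_id, tickets):
    """Manual discovery: count support tickets per customer geography."""
    return Counter(customers_by_id[t["customer_id"]]["country"] for t in tickets)


def mean_resolution_by_churn(customers_by_id, tickets):
    """Pattern-style summary: mean resolution hours, churned vs retained."""
    groups = {True: [], False: []}
    for t in tickets:
        churned = customers_by_id[t["customer_id"]]["churned"]
        groups[churned].append(t["hours_to_resolve"])
    return {
        ("churned" if is_churned else "retained"): mean(hours)
        for is_churned, hours in groups.items()
        if hours
    }


print(tickets_per_country(customer_by_id, tickets).most_common())
print(mean_resolution_by_churn(customer_by_id, tickets))
```

Neither function is possible unless you first know that both lists exist and that they share a customer identifier, which is exactly what the question "What data do we have?" is meant to establish.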
Where is the data?
Knowing where the data is and where it has come from is an important regulatory requirement, but in the context of achieving some type of insight, it is also vital to establishing trust in the data across the organisation. If someone on the street handed you a credit card and said “Feel free to use this!” the first thing you’d probably ask is where it came from. Without this lineage, there is no trust. And most notably, in this analogy, you would want to know whether the source of the credit card is reputable.
Also, although duplicate data is not necessarily a huge storage cost issue anymore, it is a big operational issue. Of course, this also depends on exactly how much duplicated data you have; petabytes of it can be quite costly! Knowing where your data is can therefore help you to reduce operational costs too.
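As a rough illustration of how knowing where data lives helps you spot duplication, the sketch below fingerprints records by their normalised content and reports which locations hold the same record. The system names ("crm", "billing") and the records are hypothetical, and exact-match hashing is only the simplest possible approach; real deduplication usually needs fuzzier matching.

```python
import hashlib
from collections import defaultdict


def fingerprint(record):
    """Hash a record's normalised fields so identical content matches."""
    canonical = "|".join(
        f"{key}={str(value).strip().lower()}"
        for key, value in sorted(record.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(sources):
    """Return fingerprints held in more than one place, with their locations."""
    seen = defaultdict(list)
    for location, records in sources.items():
        for record in records:
            seen[fingerprint(record)].append(location)
    return {fp: locations for fp, locations in seen.items() if len(locations) > 1}


# Hypothetical systems and records, for illustration only.
sources = {
    "crm": [{"email": "ann@example.com", "name": "Ann"}],
    "billing": [
        {"name": "ann ", "email": "Ann@Example.com"},
        {"email": "bob@example.com", "name": "Bob"},
    ],
}

duplicates = find_duplicates(sources)
for locations in duplicates.values():
    print("Same record held in:", ", ".join(locations))
```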
What is the quality of the data?
In the era of fake news and AI bots that are indistinguishable from humans, it is more important than ever to establish integrity in the data you are using to make decisions. There is a plethora of shades of data quality, and every shade will correlate with a different level of confidence in the "usability" of the data. It should also be pointed out that there is no such thing as right or wrong when it comes to data, and no matter how high quality the data is deemed to be, it will bring with it an inherent level of risk.
In the spirit of keeping things technology-agnostic and high-level, think about the times you have made a decision with confidence. What gave you that confidence? Was it that your research came from a reputable source? Was it because the voice of the crowd all agreed with one approach? Was it your gut feeling? Just like everyone else, you probably make decisions on a daily basis using a combination of these techniques to reach your final judgement. It’s much the same with data: determining quality is about building up your confidence in making a decision. The challenge with data is that it doesn't have to adhere to any laws of physics, so any judgement made on data quality is a heuristic attempt to provide metrics on which a decision can be made with an acceptable level of confidence and risk. You can read more about how CluedIn interprets and measures the shades of data quality here.
Why does data need ownership?
In many ways, it doesn’t. In fact, it needs much more than ownership. This is why we have frameworks in Data Governance like the RACI model, in which four dimensions of "ownership" (Responsible, Accountable, Consulted and Informed) are defined as the minimum requirements for an ownership matrix relating to data and the journey that data takes. Like any process you have within a business, if no one is responsible for it, it often grinds to a halt. As you have probably experienced in other parts of your business, sometimes a task can be blocked for the most minuscule of reasons, but the bottom line is that it was blocked. This is often down to a lack of ownership for that part of the process.
Who is responsible for each step of the data journey from start to finish?
The data journey from source to insight has some very distinguishable steps, and each of these steps requires you to attack the data from a different angle. Irrespective of the technology you use to get from source to insight, the generic journey includes pulling data from a number of sources, integration, normalisation, standardisation, deduplication, linking, enrichment, profiling, mapping and transformation. (Honestly speaking, we could easily add another 10 or 15 stages, but let's stick with this list for now!) In many cases, each of these steps is a comprehensive task and responsibility in its own right. For example, the normalisation and standardisation of data is easily a full-time job for many data stewards. Hence, if a full supply chain of ownership for the steps in the process is not established, it should not be a surprise that the flow of usable data can break down, often for the most mundane of reasons.
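One way to picture a "supply chain of ownership" is a small RACI-style matrix over the steps listed above, with a check that flags any step nobody owns. This is only a sketch under stated assumptions: the team names are placeholders, "ingestion" is our shorthand for pulling data from a number of sources, and the rule of exactly one accountable party per step is a common RACI convention rather than something this article prescribes.

```python
# Steps named in the article; "ingestion" stands in for
# "pulling data from a number of sources".
STEPS = [
    "ingestion", "integration", "normalisation", "standardisation",
    "deduplication", "linking", "enrichment", "profiling",
    "mapping", "transformation",
]


def ownership_gaps(steps, matrix):
    """Return the steps whose RACI entry is missing or incomplete."""
    gaps = {}
    for step in steps:
        entry = matrix.get(step, {})
        problems = []
        if not entry.get("responsible"):
            problems.append("no one responsible")
        # Assumed convention: exactly one accountable party per step.
        if len(entry.get("accountable", [])) != 1:
            problems.append("needs exactly one accountable owner")
        if problems:
            gaps[step] = problems
    return gaps


# Hypothetical assignments; team names are placeholders.
raci = {
    "ingestion": {
        "responsible": ["data engineering"],
        "accountable": ["head of data"],
    },
    "normalisation": {
        "responsible": ["data stewards"],
        "accountable": ["head of data"],
        "consulted": ["sales ops"],
    },
    "deduplication": {"responsible": [], "accountable": ["head of data"]},
}

gaps = ownership_gaps(STEPS, raci)
for step, problems in gaps.items():
    print(f"{step}: {', '.join(problems)}")
```

With these sample assignments, only ingestion and normalisation pass; every other step is reported as a gap, which is the situation in which the flow of usable data tends to stall.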
What happened to the data as it transitioned from raw to insightful?
Let’s consider for a moment why data needs lineage, and different parties to take responsibility for the entire data journey, when other processes we run within the company don't demand the same level of rigour. Could it be that this lineage would actually be very useful in all parts of the business, but that the digital nature of data makes it inherently easier to build a digital footprint? The same cannot easily be said for passing around Excel sheets from department to department, for example. An explanation of how an Excel sheet "came to be" isn’t something that can be achieved through the use of Excel alone. The audit trail of the transformation of data from source to insight is often just as useful for “explainability” as it is for highlighting parts of the process that can be improved or are error-prone.
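To show what such an audit trail might look like at the record level, here is a minimal sketch that wraps a record and logs what each step changed. The `TrackedRecord` class and the two sample steps are hypothetical names invented for this illustration; this is not how CluedIn or any particular product implements lineage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrackedRecord:
    """A record plus a history of every step applied to it."""
    data: dict
    history: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Run one step on a copy of the data and log what changed."""
        before = dict(self.data)
        self.data = func(dict(self.data))
        changed = {
            key: (before.get(key), self.data.get(key))
            for key in set(before) | set(self.data)
            if before.get(key) != self.data.get(key)
        }
        self.history.append({
            "step": step_name,
            "at": datetime.now(timezone.utc).isoformat(),
            "changed": changed,  # field -> (old value, new value)
        })
        return self


def normalise(rec):
    """Sample step: trim and lowercase the email address."""
    rec["email"] = rec["email"].strip().lower()
    return rec


def enrich(rec):
    """Sample step: derive the email domain as a new field."""
    rec["domain"] = rec["email"].split("@")[1]
    return rec


record = TrackedRecord({"email": " Ann@Example.com "})
record.apply("normalisation", normalise).apply("enrichment", enrich)

for entry in record.history:
    print(entry["step"], entry["changed"])
```

The history answers "what happened to this value, and at which step?", which is the question a spreadsheet passed between departments cannot answer on its own.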
Now that we have established the questions you need to answer in order to start your journey to being truly data-driven, we should look at how technology can help you both to answer the questions and to use those answers to best effect. The best way to do this is to approach it from both the asset and the record level, which in effect means getting both the bird's-eye and the granular view, and bringing them together in a way that makes sense. One powerful and increasingly popular combination is to use Microsoft Purview and CluedIn. To some degree, both Purview and CluedIn answer all of the questions above, but at different levels. The bottom line is that you need both and, in some ways, you can't have one without the other, particularly if your data technology stacks are all housed within Microsoft Azure.