I’m not sure people always know what they mean when they talk about Big Data – and even when they do, I’m not sure they can contrast this new Big Data thing with data’s previous incarnation.
So let’s see if we can clear it up.
Prior to Big Data, the amount and content of the data you had access to was limited – in technical terms, you had to deal with a limited information domain. Why? Because obtaining and storing data was expensive and, more importantly, most data was locked up in the real world and never entered the digital realm (binary data living in computational systems). That has obviously changed.
This flip – from generating and storing only the data directly relevant to operating a business, to collecting and storing massive amounts of data that may or may not be relevant to it – is the state change.
The first big problem was tooling. The systems and technologies for collecting and storing data were designed for the relatively small amounts of strictly modeled data relevant to running a business. Moreover, they were designed to tightly control what was added, because adding data was expensive. This was the problem we needed to address first – which is why, when we talk about Big Data, we invariably talk about technologies: Hadoop, MongoDB, Spark, Kafka, Storm, Cassandra…
But for business leaders this is misleading, because implementing any (or all) of those technologies will not make the business effective in a Big Data context. These technologies will not provide you with magical data that supercharges your business. You will not suddenly have insights your competitors lack; you will not – overnight – find the clarity required to dominate your market.
The key is to combine those tools and capabilities with data-driven practices and culture.
Let’s start by avoiding the mistake made with Big Data – let’s talk clearly about what has changed and why being data-driven is different from what came before.
I’ve worked with organizations – from startups to enterprises – that have robust reporting and systems of operational metrics they use to run the business. They review reports and dashboards regularly, hold operational reviews focused on those metrics, and direct resources and budget toward the areas that are underperforming. Invariably, they say they are already data-driven – because they use data to run their business.
They are not. They are operating optimally in the pre-Big Data model, where the universe of data was fixed, metrics were long-lived and stable, and information outside that realm was unobtainable, leaving those insights beyond reach.
A data-driven organization still does those things: metrics, operational reviews, targeted investments based on underperforming metrics. But it also uses the larger universe of data to openly question the validity of those metrics; it develops processes to search that universe for new metrics and insights; it allows the data to lead it to opportunities and to help it identify threats.
This practice almost always feels like a radical shift – and it is. Organizations must move from focusing only on the known knowns to embracing this new ability to examine, and gain insight from, the known unknowns and the unknown unknowns.
“Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.” – Donald Rumsfeld, 2002
Rumsfeld’s observation applies equally to businesses.
When these data-driven processes and practices, extending and augmenting your metrics-driven operations, become part of your culture, the real value of all that data and all those tools can be realized.