Aug 13, 2013

The Dark Side of Data Abundance

If there is something that defines our digital era, it is the abundance of data. It has grown out of cheap computing power and storage, Internet ubiquity and ease of sharing, the open source and open data movements, as well as new requirements from governments. It is all great, of course, but many people and companies end up on the dark side of this phenomenon. I hope it does not come as a surprise that the "more data" phenomenon has a dark side - almost everything does. I can judge only a narrow aspect of it, from the perspective of data analysis and decision making. What I see is sometimes scary and sometimes funny.

Have you ever been navigating to a location with a handheld device, circling around and around, only to find that you had been right in front of it all the time? Or found that the device did not select the best route to a destination? I bet it has happened to you or someone you know. And what would have prevented or solved the situation? Simply taking your eyes off the screen and looking around. In the case of the bad route choice - a quick glance at the general map. These two advanced techniques are getting less and less popular. There is a direct analogy with data abundance. We are immersed in data - statistics of every sort are pouring in from everywhere: TV, articles, government, even some ads employ statistical tricks. There are so many providers of all sorts of data. In the field of analytics it is like manna sent from heaven. But it could also be a Fata Morgana that lures you into a crash.

A lot of data combined with cheap computing power is like a guided missile with a malfunctioning guidance module. The analyst can be enticed into more and more data exploration and into applying every method at hand to it. The difference between losing oneself in the data and moving toward actionable insights comes from understanding the context, knowing the driving forces and interconnections in the domain under study, and having a set of objectives and criteria for reaching them. An analyst can get excited about digging into different aspects of the data, searching for new sources and data sets and exploring some new method (analysts like all these things) and eventually lose focus on what is important. What is important is defined by the decision to be made or the policy to be changed. Some things do come out in the process of data exploration, of course, but always within the context of a more general decision situation with its boundaries and limitations. Ignoring that context is like not checking the map of Greece before driving there, or not looking around for landmarks when trying to find a pub in an unknown city.

The abundance of data comes with a great chance of false insights or false signals. There are many publications on this - see Nassim Taleb's Wired article Beware the Big Errors of ‘Big Data’, for example, or Silver's book The Signal and the Noise. The gist is this: the more data we employ in our analyses, the greater the number of correlations that occur by pure chance. Also, the more data there is, the better the opportunity for selecting the favorable outcomes and thus supporting every possible bias. I have seen quite a lot of examples of that. These spurious correlations, combined with a lack of criteria for selecting the findings, produce some "amazing" analytical outcomes. The explanation usually is "The method produced these results so they must be true". Sometimes people seem to be so mesmerized by data and complicated calculation methods (especially statistical methods - everyone is hypnotized by them) that they would take anything as pure gold. Of course, this has to come with a nice story to make good sense, but stories are easy to come up with.
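To make the point about chance correlations concrete, here is a small Python sketch. It is only an illustration under assumptions of my own choosing, not anything taken from the publications above: the series are pure Gaussian noise, each has 20 observations, and "strong" means |r| > 0.5. The function names are made up for this example. The idea is simply that n unrelated series give n(n-1)/2 pairs to compare, so the number of pairs that look correlated by accident has more and more room to grow.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_chance_correlations(n_series, n_obs=20, threshold=0.5, seed=0):
    """Count pairs of unrelated random series whose |r| exceeds threshold.

    Returns (number of "strong" pairs, total number of pairs).
    The defaults are illustrative assumptions, not recommended values.
    """
    rng = random.Random(seed)
    # Every series is independent Gaussian noise: no real relationship exists.
    series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_series)]
    pairs = list(itertools.combinations(series, 2))
    strong = sum(1 for x, y in pairs if abs(pearson(x, y)) > threshold)
    return strong, len(pairs)


if __name__ == "__main__":
    for n_series in (5, 10, 20, 40):
        strong, total = count_chance_correlations(n_series)
        print(f"{n_series:>3} series, {total:>4} pairs: {strong} with |r| > 0.5")
```

Every "strong" pair this script reports is a false signal by construction, since nothing in the data is related to anything else. With real data there is no such guarantee either way, which is exactly why criteria for selecting the findings matter.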


Obsession with data is like a fashion - everyone has to have it. If you do not do data analysis then you must be doing something wrong, and for sure you are not cool. This often results in tons and tons of statistics, numbers and endless slides packed with charts and tables. Most of it is totally useless, as it does not contribute to better decisions or insights. It is just clutter born from the pressure of "more data". This relates closely to the decision making process, the skills of the people involved in it, the methods for communicating insights and information and, to some extent, technical savvy. If anything, "There is a lot of data" is assumed to mean the job is done. As if the data itself were enough. Most of these voluminous decks do not come with anything more than a retelling of the story already clearly told by the charts. Sometimes too many figures and too much data hide the bigger picture and can lead to conclusions in unfavorable directions. "More data" seems to be a mantra that will bring us to the perfect state of getting things right.


These three features of data's dark side are far from exhausting the topic. Add to them all the biases, management pressure and some other details, and you can see that the greater availability of data and computing power calls for much greater attention and effort toward deeper study of the domain, clarification of objectives and criteria, and better analytical processes. Fortunately, the problems mentioned above, and others related to them, rarely appear in their pure and extreme form. Then again, they usually come as a sort of linear combination of themselves and a set of virtues, which does not make them easier to detect. The good news is that more people and organizations are becoming aware of them and are using the amazing potential of data and analytics in the best way.
