Feb 12, 2014

Ban GDP as a Driver in Regressions!

If there is one thing in analytics that grinds my gears, it is the overuse of GDP as a driver in regression. If it were up to me, I would ask the government and the academy of science to issue a book of all approved regression drivers and distribute it as the standard repository of all knowledge and wisdom related to regression. One copy of this book, and one copy only, would include GDP. That copy would be in Klingon and, as a precautionary measure, would be booby-trapped with a potent explosive and buried 100 m in the ground. On the Moon. On the dark side of it.

Is it that dangerous?
Well, it is if you spend a moment thinking about it. GDP seems to be the first choice whenever some economic analysis has to be produced and, intentionally or not, no mental effort goes into the process. The results are usually so boring that you could end up drinking or seeking entertainment in high-risk activities, both quite dangerous for a normal person. The danger lurks in another way as well: repeatedly explaining things with GDP encourages further, unchecked use of it, and we will lose knowledge and drift into extensive BS-ing.

So what makes GDP a bad driver for regression? Let's see a few reasons:
It is not informative - it is a measure of the overall economic output of a country and it includes all industries, products and services. By using it on its own, one could miss a surge or decline in a sector or service that would be much more informative and beneficial for the purposes of the analysis. It also provides no information about the distribution of money in society or about any non-economic measure that could matter a lot.
It is not reliable or accurate - despite the allegedly rigorous procedures for calculating GDP, its estimates change far too often. As a rule, figures coming from different providers do not match. Of course, we are not talking about huge differences, but they can be significant. GDP outlooks are subject to frequent updates, and so are some values for recent historical periods. My recent study revealed differences of up to 10% for the next and previous 2 to 3 years in the data of a major provider.
Nobody cares about it - I am not aware of a business that really cares about GDP figures per se. The real interest is always in the market, demand, competition, legislation and stability. Some bigger companies use GDP as a proxy for economic development, but its usefulness there is usually limited or negligible. Of course, governments use GDP, but that is for their short-term partisan policy. The other characters who use GDP in their models are the macroeconomists, but as we know they are very good at explaining what happened yesterday, and their successes owe more to pure luck than to anything else. Do you remember when a macroeconomic model was right? Let me help you - it is close to "never".
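
If you want to check the reliability point on your own data, a rough sketch is to take two releases ("vintages") of the same GDP series and compute the revision per year. The figures below are hypothetical and invented purely for illustration, they are not real GDP data, and the names `old_vintage`, `new_vintage` and `revision_pct` are my own placeholders.

```python
# Hypothetical index values for illustration only; NOT real GDP data.
# Each dict maps a year to the figure published in one release of the series.
old_vintage = {2011: 100.0, 2012: 102.0, 2013: 104.0, 2014: 107.0}
new_vintage = {2011: 100.0, 2012: 101.0, 2013: 101.5, 2014: 103.0}


def revision_pct(old, new):
    """Percentage change of each year's figure between two releases."""
    common_years = sorted(old.keys() & new.keys())
    return {year: 100.0 * (new[year] - old[year]) / old[year]
            for year in common_years}


revisions = revision_pct(old_vintage, new_vintage)
for year, pct in revisions.items():
    print(f"{year}: {pct:+.2f}%")

# The single largest revision in absolute terms
largest = max(revisions.values(), key=abs)
print(f"Largest revision: {largest:+.2f}%")
```

Run against a provider's actual archive of releases, the same calculation shows how much the "history" your regression was fitted on has moved since you fitted it.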

The alternative to the mindless inclusion of GDP is simple - stick to the scientific approach and focus on the things that matter for the specific problem: demographics, appeal, specific measures that describe in detail the industry/sector/service/product, etc. Being more specific about the things that drive a measure will also keep you from being boring and will bring much more value to your analyses. For further thoughts on linear regression in general, please visit The Almighty Linear Regression.
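
To make the argument concrete, here is a toy sketch of what happens when the thing you model depends on one sector while "GDP" lumps that sector together with everything else. The data are synthetic and deliberately constructed so that the sector matters, so this illustrates the mechanism and is not evidence about any real market. The names `sector`, `gdp`, `demand` and `fit_r2` are assumptions of mine.

```python
import random


def fit_r2(x, y):
    """Fit y = a + b*x by ordinary least squares and return R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200

# Synthetic data, invented for illustration: one sector that drives demand
# and nine unrelated sectors. "GDP" here is simply the sum of all ten.
sector = [rng.gauss(0, 1) for _ in range(n)]
gdp = [s + sum(rng.gauss(0, 1) for _ in range(9)) for s in sector]
demand = [2.0 * s + rng.gauss(0, 0.5) for s in sector]

r2_gdp = fit_r2(gdp, demand)
r2_sector = fit_r2(sector, demand)
print(f"R^2 with GDP as driver:    {r2_gdp:.2f}")
print(f"R^2 with sector as driver: {r2_sector:.2f}")
```

In this setup the aggregate explains only a small part of the variation in demand, because nine tenths of its movement is noise from sectors that have nothing to do with the problem, while the specific driver explains most of it.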

