A Short Story
Recently I have been developing a model, and despite the sound logic and impeccable calculations, the outcome kept being close to terrible with no hope of improvement. After some time spent chasing my tail, I realized that something else was going on, and I started revising the basic, undisputed assumptions I had taken for granted at the beginning. No surprise: it turned out that one of the key parameters in my calculations was behaving very differently than assumed. Common sense and the supposedly undoubted properties of the data had misled me. I listed all the assumptions the model needed in order to be valid and checked them one by one. Most of them held true, but a couple revealed gaps in my understanding of the data and the business process behind it. As a result, I took a different approach and the model went through a significant redesign. The match in the test period improved dramatically, to the point of acceptance by the customer. That made me think about how often we neglect to check the assumptions behind the models we build, and I remembered some otherwise brilliant models that failed exactly because of that.
Every analytical approach is based on a set of requirements and assumptions about its data and context. Statistical models assume that a specific set of conditions holds for their application. These could concern the distribution of variables, the properties of the variance, and so on. Other types of models rely on these as well as on other properties, such as low variance in parameters, constants that really stay constant, and so on. What are the points to take home?
- Spend some time reviewing the assumptions behind the methods before implementing them in the model. Think about distributions, variance, heteroscedasticity, relationships and dependencies between variables, covariation, etc. There is no shame in refreshing the requirements even for well-known methods such as linear regression. Be honest and address your doubts.
- Review the model and list the conditions that must hold true for it to work. What do you need for the model to work properly?
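
To make the first point a bit more concrete, here is a minimal sketch of one such check: a rough look at whether the residuals of a simple linear regression have constant variance. It is only in the spirit of a Goldfeld-Quandt style comparison, not a formal test, and the function names (`fit_line`, `residual_variance_ratio`) and the toy data are my own illustrative assumptions, not part of the model described above.

```python
from statistics import mean, variance


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residual_variance_ratio(x, y):
    """Residual variance for the upper half of x divided by the lower half.

    A ratio far from 1 hints that the constant-variance assumption
    may not hold. This is a rough heuristic, not a formal test.
    """
    if len(x) < 4:
        raise ValueError("need at least 4 observations")
    a, b = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    residuals = [yi - (a + b * xi) for xi, yi in pairs]
    half = len(residuals) // 2
    low, high = residuals[:half], residuals[-half:]
    return variance(high) / variance(low)


# Hypothetical toy data: same trend, two different noise patterns.
xs = list(range(1, 41))
signs = [1 if i % 2 == 0 else -1 for i in xs]
y_constant = [2 + 3 * xi + s for xi, s in zip(xs, signs)]             # noise of fixed size
y_growing = [2 + 3 * xi + 0.5 * xi * s for xi, s in zip(xs, signs)]   # noise grows with x

print(residual_variance_ratio(xs, y_constant))
print(residual_variance_ratio(xs, y_growing))
```

A ratio close to 1 is what the constant-variance assumption would lead you to expect; a much larger one is a prompt to go back and ask what in the data or the business process makes the spread change.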
Such a review may seem like a waste of time, but it is definitely not. This is particularly true when the model is based on unfamiliar data and/or the analyst does not have experience with the domain. Of course, there is always the other approach: build the model and, if it does not work, go back to basics, as in my case above. That seems to be the common case given our busy schedules and tight delivery dates, but can we do better?