Dan Gardev on Analytics, Data and Everything: Are We Forgetting Something When Estimating Model Accuracy?

Expectations for model accuracy are usually high regardless of the model's purpose. We fall into the trap of thinking that the availability of huge data sets, powerful computing and advanced mathematical methods will produce very accurate results about virtually everything. Hollywood also plays its role. But what is the reality? The accuracy of a model is limited by the nature of the entity being modeled, the methods employed and the accuracy of the input data. Very often we fail to realize the full extent of the influence of the last one, even more so in models with many calculation steps.

The accuracy of forecasting models, with confidence intervals that widen over the forecast horizon, is taught in most management and engineering courses, and people are generally aware of it. Sometimes they need a gentle nudge, but overall I have noticed that there is no need for extensive explanations. Of course, it is the analyst's job to remind the customer about the forecasting error and how it grows the further the forecast moves from the historical interval. There are many publications and books explaining forecasting errors, and I am not going into details about them here.

I would like to turn attention to something we usually forget: most of the parameters in our models are estimates, not constants. Every estimate comes with a natural error, or confidence interval. Confidence intervals are easy to understand with physical measurements. Experimental sciences commonly take the reading error as half of the smallest unit on the scale. For example, a temperature reading on a thermometer with one degree as the smallest unit would have an error of +/-0.5 degrees. The temperature would be reported as 20 +/-0.5 degrees Celsius, and the confidence interval is 19.5 to 20.5. Non-physical parameters also have an error attached to them. It is not always obvious, but if you dig deeper into the way a value is produced you will see it. In the general case this error is more difficult to calculate, at least compared to reading a thermometer.

An unpleasant fact is that model accuracy is not the same as the accuracy of the input parameters, or even of the least accurate one. It decreases with every calculation step through the mechanism of propagation of errors. A simple example helps to explain: imagine you have to multiply a parameter with a value of 10 and an error of +/-10% by another with a value of 100, also with a 10% error. Simple multiplication gives 1,000 (10 x 100). Now consider the error ranges. The highest possible outcome is 1,210 (11 x 110) and the lowest is 810 (9 x 90), that is, +21% and -19% around 1,000. In effect, a confidence interval of +/-10% in the input parameters turns into about +/-20% in the result. This is how errors in the input parameters affect the error range of the result. This is an oversimplified case to convey the idea, of course, and proper calculations are more complex. Please check this article for specific and detailed definitions of error calculation for each operation. The accuracy erodes with every calculation step and the confidence intervals get wider.
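The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inputs are assumed to be positive. The first function takes every input at the edge of its range, as in the example. The second applies the standard rule from experimental error analysis for independent random errors, where the relative errors of a product add in quadrature. That rule is an assumption on my side about how "proper" calculations are usually done, and it gives a narrower range (about 14%) than the worst case.

```python
import math


def worst_case_product(values, rel_errors):
    """Return (nominal, low, high) for a product of positive values,
    taking every input at the edge of its +/- relative error range."""
    nominal = math.prod(values)
    low = math.prod(v * (1 - e) for v, e in zip(values, rel_errors))
    high = math.prod(v * (1 + e) for v, e in zip(values, rel_errors))
    return nominal, low, high


def quadrature_rel_error(rel_errors):
    """Relative error of a product when the input errors are independent
    and random: the relative errors add in quadrature (small-error rule)."""
    return math.sqrt(sum(e * e for e in rel_errors))


nominal, low, high = worst_case_product([10, 100], [0.10, 0.10])
print(nominal, low, high)  # 1000, 810 and 1210, up to float rounding
print((low - nominal) / nominal, (high - nominal) / nominal)  # about -0.19 and +0.21
print(quadrature_rel_error([0.10, 0.10]))  # about 0.14

# Erosion with each extra step: multiply n factors, each known to +/-10%.
for n in range(1, 6):
    _, lo, hi = worst_case_product([1.0] * n, [0.10] * n)
    print(n, round(lo - 1, 3), round(hi - 1, 3))
```

The loop at the end prints how the worst-case range keeps widening as more uncertain factors are multiplied together.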

Error propagation analysis is a must in the experimental sciences but is somehow neglected in other calculations. Part of the reason is that it is difficult to perform: the effort would be considerable for a spreadsheet with a substantial number of calculation steps. Another reason could be that it is very difficult to estimate the errors of some estimates, e.g. the GDP figures provided by the EIU. Or maybe the main reason for neglecting this analysis is that nobody seems to care that much about estimating expected model accuracy. One way to deal with this is to assume all the parameters involved are constant, perform the calculations and compare the performance of the model with actual data. This is a good approach, of course, provided there is the luxury of time and data for it.
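Comparing the model with actual data can be as simple as the sketch below. It uses the mean absolute percentage error as the yardstick, which is my choice for illustration and not the only possible one. The function name and all the numbers are made up, and the actual values are assumed to be non-zero.

```python
def mean_absolute_percentage_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over paired observations.
    Assumes no actual value is zero."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return total / len(actual)


# Made-up numbers purely for illustration; not data from any real model.
predicted = [1000.0, 1100.0, 950.0, 1200.0]
actual = [900.0, 1150.0, 1000.0, 1000.0]

mape = mean_absolute_percentage_error(predicted, actual)
print(f"Observed model error (MAPE): {mape:.1%}")
```

The observed error can then be compared with the accuracy the customer expects, without tracing the error of every single input.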

I think that too much effort in this direction is not well justified. However, some rough estimates should be made. This is part of setting the right expectations and acceptance criteria for the model, which define its success.
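One cheap way to get such a rough estimate, when tracing errors step by step is too much work, is simulation: re-run the calculation many times with randomly perturbed inputs and look at the spread of the results. This is a different technique from the analytical error formulas mentioned above, and the sketch below rests on assumptions that may not hold for a given model: the errors are independent, they are roughly Gaussian, and the quoted +/-10% is treated as one standard deviation. The `simulate` helper is hypothetical.

```python
import random
import statistics


def simulate(model, nominal, rel_sigma, n_samples=20_000, seed=0):
    """Re-run `model` with every input perturbed by independent Gaussian
    noise and return the mean and standard deviation of the outputs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outputs = []
    for _ in range(n_samples):
        inputs = [rng.gauss(v, abs(v) * s) for v, s in zip(nominal, rel_sigma)]
        outputs.append(model(*inputs))
    return statistics.fmean(outputs), statistics.stdev(outputs)


# The multiplication example from earlier in the post: 10 x 100, each +/-10%.
mean, spread = simulate(lambda a, b: a * b, [10.0, 100.0], [0.10, 0.10])
print(f"result: {mean:.0f} +/- {spread / mean:.0%}")
```

Any function of the inputs can be passed in place of the multiplication, so the same few lines work for a calculation with many steps.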

May 9, 2013
