Jun 11, 2013

Highlights On Model Types And Their Application

There are two types of models: statistical ones and what I call "mechanical" ones. Statistical models build on data sets to extract properties, interconnections, and current and future behavior. Examples are models for customer categorization in the financial industry, stock pricing, and monthly sales. The "mechanical" model embeds predetermined laws of nature or society, cause-and-effect dependencies, legislation, and any other "hard" causative connections between entities. An example of this sort is a nuclear reactor model covering optimal working parameters, safety measures, energy output, etc. The general question of which type of model to build is usually not asked explicitly, as each situation pushes toward one or the other. However, a general discussion of applicability is worthwhile, as it can give guidance about model quality and efficacy.


Statistical Models

Let's start with the statistical models. The first and most important condition for building a good statistical model is the availability of data. The data set has to be big enough to provide the number of data points needed for proper application of statistical methods. It also has to be of good quality and cover all the aspects of the entity being modeled. In other words, the data has to be sufficiently diverse, large, and clean to answer all the questions that spurred the creation of the model; what counts as "enough" in each case is defined by those specific questions. If the available data does not seem to meet some of these criteria, then a statistical model is not viable. Some shortcomings of the data can be overcome with additional assumptions that fill the gaps, but we should be careful with that, as we could introduce significant systematic error or simply make wrong assumptions (even if they seem logical or obvious) that would produce wrong outputs.
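As a minimal sketch of such a pre-modeling check (the thresholds and column names are hypothetical, chosen only for illustration), one could screen a data set for size and completeness before committing to a statistical model:

```python
import pandas as pd

MIN_ROWS = 1_000          # assumed minimum sample size for the intended methods
MAX_MISSING_SHARE = 0.05  # assumed tolerance for missing values per column

def screen_dataset(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """Return reasons why the data set may not support a statistical model."""
    problems = []
    if len(df) < MIN_ROWS:
        problems.append(f"only {len(df)} rows, need at least {MIN_ROWS}")
    for col in required_columns:
        if col not in df.columns:
            problems.append(f"required column '{col}' is missing")
        elif df[col].isna().mean() > MAX_MISSING_SHARE:
            problems.append(f"column '{col}' has too many missing values")
    return problems

# Hypothetical usage: the required columns reflect the questions the model must answer.
# issues = screen_dataset(customers, ["age", "income", "purchase_history"])
```

The list of required columns is where the model's questions enter: if a question cannot be answered from columns we can actually demand, the data does not cover it.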
A general caution about statistical models is that the available data has two major limitations. The first is that it may not include the piece of data that really matters and influences the outputs the most; that piece may never have been collected at all, or it may have been dismissed as unimportant during data selection. Of course, a proper model-building process should make this danger very small or non-existent. The second limitation comes from the fact that historical data is, well, historical, and some new or hypothetical developments may not be represented in it at all. For example, no data set could be used to estimate the change in the car market if women were allowed to drive in an Arab country: this has never happened, so no data on the effect is available. Another example is the introduction of a lamp with a longer lifetime; a purely statistical approach would not give good answers about its effect on the lighting market.
Another danger for statistical models comes with the use of huge data sets: the bigger the data set, the larger the number of spurious correlations, i.e. relations found by pure coincidence. We can create a nice story around every detected relation, but that does not make it any more real.
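A quick simulation illustrates the point (a sketch only; the sizes and threshold are arbitrary): with enough unrelated variables, some pairs will look strongly correlated by pure chance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_points, n_series = 50, 200

# 200 completely independent random series of 50 observations each;
# every true correlation between them is zero by construction.
data = rng.normal(size=(n_series, n_points))
corr = np.corrcoef(data)

# Count pairs that nevertheless look "related".
upper = corr[np.triu_indices(n_series, k=1)]
print(f"pairs with |r| > 0.4: {(np.abs(upper) > 0.4).sum()} of {upper.size}")
```

With these sizes, on the order of a hundred of the nearly twenty thousand pairs exceed the threshold, and every one of them invites a plausible-sounding story.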
Sometimes I get the feeling that statistical models strip their developers of responsibility: "this is what the data shows".


Mechanical or Classic Models
The first imperative for this type of model is, obviously, to have a clear and mathematically formalized relation between the entities in the model. If we know that something influences something else but the functional dependence is not clear, then we cannot build a model. Simple as that, but not always the case. Some dependencies are available in theory only, and applying them in a real model is impeded by the unavailability or impossibility of obtaining some of the required parameters. Functional parameters are also a weak point in that they can be expensive and difficult to estimate. For example, the penetration of a new product in the market is usually modeled by an adoption curve, and this depends on good estimates of the parameters around the inflection point of the curve. These parameters could be obtained by research, but that can be expensive. In some cases the missing parameters can be set through calibration of the model, but that is not always an available option.
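As a sketch of such a functional dependence, here is a simple logistic adoption curve; the market size, growth rate, and inflection-point timing are hypothetical values of exactly the kind that would have to come from research or calibration:

```python
import numpy as np

def adoption(t, market_size=100_000, growth_rate=0.8, t_inflection=5.0):
    """Cumulative adopters at time t on a logistic adoption curve.

    market_size  - saturation level (total addressable market)
    growth_rate  - steepness of the S-curve
    t_inflection - time of fastest adoption (the inflection point)
    All three defaults are hypothetical and would need to be
    estimated by research or model calibration.
    """
    return market_size / (1.0 + np.exp(-growth_rate * (t - t_inflection)))

years = np.arange(0, 11)
print(np.round(adoption(years)))
```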
The functional dependencies used in this type of model can also be too complex to calculate, or too rough due to insufficient knowledge about them. This situation is remedied by using approximating functions, which can backfire, as some important insights could be missed.
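A tiny illustration of that risk, with a purely invented "true" dependence: a cheap low-degree fit smooths away a narrow local dip, which might be exactly the insight that matters.

```python
import numpy as np

x = np.linspace(0, 10, 200)
# Hypothetical "true" dependence with a narrow dip around x = 6.
y_true = np.log1p(x) - 0.5 * np.exp(-((x - 6.0) ** 2) / 0.1)

# Cheap quadratic approximation of the same dependence.
coeffs = np.polyfit(x, y_true, deg=2)
y_approx = np.polyval(coeffs, x)

# The approximation error peaks right at the feature that was lost.
worst = int(np.argmax(np.abs(y_true - y_approx)))
print(f"largest error at x = {x[worst]:.2f}")
```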
An advantage of these models is that they provide a "what if" option: it is easy to explore the outcomes under different sets of inputs and assumptions. This also includes understanding the sensitivity to the initial parameters; the value ranges that produce the same outcome are a crucial piece of information as well.
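Such a "what if" exploration is then just a sweep over the input parameters. Reusing the hypothetical logistic curve from the sketch above, this shows how strongly the early outcome depends on the assumed growth rate:

```python
import numpy as np

def adoption(t, market_size=100_000, growth_rate=0.8, t_inflection=5.0):
    # The same hypothetical logistic adoption curve as in the sketch above.
    return market_size / (1.0 + np.exp(-growth_rate * (t - t_inflection)))

# Sweep the growth rate and compare adoption three years in.
for growth_rate in (0.2, 0.4, 0.6, 0.8, 1.0):
    adopters = adoption(3.0, growth_rate=growth_rate)
    print(f"growth_rate={growth_rate:.1f}: {adopters:,.0f} adopters at year 3")
```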
Maybe the most important feature of classical models is that they are transparent: there are logical and calculation flows that can be traced and checked. This is very helpful for model acceptance, in contrast to statistical models, where the statistical methods look like a black box to most managers, even though they understand the logic of the market well. This points to another important feature: this type of model requires a deeper understanding of the specifics of the modeled object, while statistical models seem to be more independent of the subject.
Classical models also require a more precise definition of the questions to answer: the model is built to specific requirements, and changing them could mean that it has to be totally reworked and more input parameters obtained.

This post was inspired by a discussion with a co-worker about the application and advantages of each model type. I hope it is a good starting point for further discussions and offers some highlights that help in understanding and selecting models.
