May 21, 2014

Great Online Course for Data Mining!

The appeal of data mining for companies and analytics practitioners is growing by the day. So where should you start? Recently I have been evaluating data mining software and courses, and I came across a very good one that I can recommend without reservation: the MOOC organized by the University of Waikato. MOOC stands for "massive open online course", but do not be fooled by the name; it's a serious course that delivers right on target.

May 14, 2014

The Rise of the Data Scientist - Have We Seen That Before?

In a recent conversation, somebody was very excited about the marketability of skills in R and similar tools, and about the growing demand for people who have them. The story was all about bright career prospects: money, a good position in the management hierarchy, fame, and Aston Martins with Victoria's Secret models in them. This person is not alone, and his opinion is clearly backed by the growing number of job ads requiring R or similar skills. However, I beg to differ, because we have all seen something very similar before, and things developed differently.

May 12, 2014

Age of Miss America Correlated to Murders by Steam?

Chart: Age of Miss America vs. murders by steam, hot vapors and hot objects

One of the dangers of too much data and too many "scientists" is spurious correlation: correlation that happens purely by chance. If you have a time series to explain and a massive number of data sets to test against it, sooner or later you will find a correlation that looks meaningful. I posted about this some time ago, but today a co-worker sent me a link with some excellent illustrations of the point, and they are too good not to share. Go to Spurious Correlations to see them all: some are very funny, others are puzzling. I did not know that somebody could die by becoming tangled in their bedsheets! Scary bedsheets!
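To see why testing enough data sets almost guarantees an impressive-looking match, here is a small simulation sketch. It is not taken from the Spurious Correlations site; the series length (about a dozen annual points), the number of candidates and the use of random walks are all assumptions chosen for illustration. Every series is pure noise, yet the best correlation found keeps climbing as more candidates are tested.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def random_walk(n, rng):
    """A trending series built from cumulative random steps (no real signal)."""
    value, series = 0.0, []
    for _ in range(n):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def best_spurious_match(n_points=12, n_candidates=1000, seed=42):
    """Correlate one random 'target' series with many unrelated random series
    and return the strongest absolute correlation found purely by chance.
    n_points, n_candidates and seed are illustrative assumptions."""
    rng = random.Random(seed)
    target = random_walk(n_points, rng)
    best = 0.0
    for _ in range(n_candidates):
        candidate = random_walk(n_points, rng)
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Same seed, so each run re-tests the earlier candidates plus new ones.
    for k in (10, 100, 1000):
        print(k, "candidates -> best |r| =", round(best_spurious_match(n_candidates=k), 2))
```

Because the seed is fixed, each larger run contains the smaller run's candidates, so the best |r| can only stay the same or grow; with short trending series it typically ends up uncomfortably close to 1.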

May 9, 2014

Big Data Deja Vu?

The other day I was running a series of machine learning algorithms on a few huge files in search of some meaningful information. I was enjoying all the fun that comes with large volumes of data: painfully long processing times, slow responses to every data operation and a loud laptop fan, to mention a few. As I was optimizing memory usage and calculation time, I had a flash of deja vu: a long time ago, in a lab far, far away. Back then I was calculating a big set of parameters from hundreds of physics experiments. The PCs had less computing and storage power than an entry-level smartphone, so all data operations and calculations had to be performed in a clever way in order to get something meaningful within your lifetime. Back in those days nobody talked about Big Data, probably because the data was quite often big and nobody found that remarkable. Of course, our ability to collect large volumes of data was galaxies away from what we have today, but there were still many domains that accumulated large volumes of data. It got me thinking. Going even further back made me realize that large data sets have been with us since the beginning of the computer era. Big data is defined in many ways (see Defining Big Data), but if we adopt the simplest definition, we see that it has always been with us. It seems our ability to generate data is always one step ahead of our ability to process all of it.
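The post does not say which tricks were used back then, but one classic example of doing calculations "in a clever way" on limited hardware is to compute statistics in a single streaming pass instead of loading a whole file into memory. Below is a minimal Python sketch using Welford's online algorithm for the mean and standard deviation. The file layout (comma-separated, one header row, a numeric column) and the helper names are hypothetical assumptions for illustration.

```python
import math
from typing import Iterable, Iterator, Tuple


def streaming_mean_std(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass (Welford) mean and sample standard deviation.
    Memory use stays constant no matter how many values are streamed."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


def column_values(path: str, column: int, sep: str = ",") -> Iterator[float]:
    """Hypothetical helper: yield one numeric column of a large delimited file,
    line by line, without reading the whole file into memory.
    Assumes the first line is a header."""
    with open(path) as f:
        next(f, None)  # skip the assumed header line
        for line in f:
            fields = line.rstrip("\n").split(sep)
            yield float(fields[column])


# Usage with a hypothetical file name:
# count, mean, std = streaming_mean_std(column_values("huge_file.csv", 2))
```

Only three numbers are kept in memory at any time, so the same code works on a file of a thousand lines or a billion; the cost is a single sequential read, which was exactly the kind of trade-off that mattered on old lab PCs.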