May 14, 2014

The Rise of the Data Scientist - Have We Seen That Before?

In a recent conversation, somebody was very excited about the marketability of skills in R and similar tools, and about the growing demand for people who have them. The story was all about bright career prospects - money, a good position in the management hierarchy, fame, and Aston Martins with Victoria's Secret models in them. This person is not alone, and his opinion is clearly backed by the growing number of job ads requiring R or similar skills. However, I beg to disagree, because we have all seen something very similar before, and things developed differently.

An IT story
If you have lived long enough to remember the Moscow Olympics or Milli Vanilli, then you witnessed the IT boom of the late 90s and early 00s. The parallels with the current data science/big data/analytics/etc. wave are striking and hint that we could expect the same trends to repeat. Back then, everyone who had some exposure to or skills in programming or networking (or the boldness to claim they did) won big by landing one of the best-paid jobs. The development tools were relatively basic, typing a lot of code was the norm, and Scandinavian metal-rock bands were on the rise. Applications were built from scratch with a compiler, and demand for skills in hard-core languages like C or C++ was very high. Salaries were well above average and job openings were through the roof. Many believed that a good career was on track. And they were right. For some time, at least.
The industry has been changing rapidly, and now things are a bit different. The demand seems to be for systems integrators, administrators, and people who know a specific system like Navision or Oracle BI. Salaries are still higher than average, but lower relative to their earlier levels. It is as if there is no longer a need for developers to create things. Even within development, things are different - 20 years ago, if you needed to connect to a database, you had to write your own code. Now there are ready-made components and services for everything, so one does not need to write that much code. Also, development environments are so rich in features and aids for the developer that the experience is very different. I might be wrong, as I have been out of this industry for some time, but I would not be very wrong.
What happened to the people who got on the early wave? Fortunately, most of them still have good salaries, but, as you might expect, the promises of fantastic management careers did not materialize for most of them. Some of their skills have been made redundant by new technologies, and they are under pressure from a younger and cheaper generation of IT experts.

Development of the tools
R and similar tools like Python are powerful, flexible, cheap and everything else that makes them popular for meeting the growing demand. I suspect that "cheap" is the one that stands out. These tools also have a steep learning curve, and I believe they will develop toward faster and easier use and deployment in production. That means a move toward commercial software where complex analyses are performed with a few clicks instead of writing down a digital version of War and Peace. Of course, open-source and free-to-use versions will always remain at the disposal of academics, students, researchers, small companies, etc. Free/open-source tools will meet specific requirements for a low price.

So what are the lessons for the aspiring data scientist? 
First, there is a natural trend toward reducing costs: more and more operations that now require a human will be automated or otherwise simplified, which means less demand for human operators.
Second, a huge effort spent acquiring specific technical skills will bring only a relatively short-term advantage.
Third, do not forget the established players in the analytics market. They were here long before the iPhone or Google Analytics. Their advantage and experience will keep pushing the industry toward less demand for hard-core skills. One of the greatest advantages of languages like R or Python is their low price, and the big names will attack that.
Fourth, there are too many aspiring data scientists, and the saturation level will soon be reached, with all the consequences that follow.
Fifth, having some technical skills is no guarantee of a good career in this industry or of moving up the hierarchy. There is much more to it than that. Also, the growing number of experts in the field means a smaller and smaller chance of career advancement.

Skills in R or similar tools are often confused with data science. A data scientist is not made just out of the ability to use a programming language or a tool. That ability is beneficial, but it is a very small part of the whole. Time would be much better spent on a deeper understanding of the techniques and methods, their applicability to problems, analyzing results, and communicating the findings than on going deeper and deeper into a specific tool. Details can always be looked up and learned, and they should not cloud the more general understanding and knowledge.

In conclusion
There is one side of analytics that makes it different from the IT business: there is always a place for a human in it. No matter how a report is created, it is just a compilation of charts and numbers. It is the human who analyzes it and makes decisions based on it, and there is no machine that could take that away. Well, at least until Skynet comes online, but that is another story. However, anyone who considers going into analytics simply for the promises it holds now should go another way. These things will change. As with software development, it is hard and demanding work, and if you are not passionate about it, you will fail or feel unhappy. I believe that a person will always have a good and happy life if he does what he loves.
