tag:blogger.com,1999:blog-55197617285750733322024-03-05T18:53:50.276+00:00Dan Gardev on Analytics, Data and EverythingDanail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.comBlogger97125tag:blogger.com,1999:blog-5519761728575073332.post-24777076639907322862023-04-05T11:15:00.000+01:002023-04-05T11:15:08.946+01:00Optimizing an ETL<p>The ETL process is an integral part of the analytical and reporting efforts in an organization, and keeping it in shape is essential for delivering reliable, timely and quality data. The Internet has a plethora of articles on building a good ETL, but very few turn an eye to maintaining and optimizing a real-world corporate one. In this text I offer my inflation-adjusted two cents on the topic, filling a small part of this void. The subject of optimization is the duration of the process, and I would like to share some points on making ETLs run faster, based on experience and thoughts on the subject. It is a long read, but everyone involved in this topic has the patience and stamina to endure lengthy and boring texts. My text is about ETL, but a lot here applies to ELT as well.<br /><br />What does a real-world corporate ETL look like? The first and most obvious feature is the high number of steps in the process, easily in the hundreds or thousands. A typical corporate ETL processes data for all activities in the organization as well as data from various external sources. The large data volume is another distinct feature. The last but not least characteristic is the complexity: the process steps execute in parallel, are highly interdependent and feed all sorts of targets with a variety of data transformations. Complexity and data volume grow over time with business development, new and evolving rules, and the growing appetite for data. Sub-optimal development management and a lack of regular audits are also factors. Increased complexity and volumes directly translate into longer execution times. 
Other factors are contextual. For example, a US company expanding to Asia sets new requirements for a shorter load duration because of the time difference.<br /><br /><b><i>Setting a target</i></b><br />The very first step in the optimization effort is establishing the target duration. Keep in mind the difference between the total duration of the load and the downtime of the target system, DWH or other. All data consumers should be interviewed to establish their limits - the musts and the nice-to-haves. The point in time at which they pick up the data is also something to consider. Agreeing on a target duration is essential for all aspects of the project - managing expectations, budgeting, measuring success, etc. <br /><br /><b><i>Setting up the team</i></b><br />Apart from the data engineers, a successful team should also include good database administrators and BI experts. The last two may not be involved full-time but their input is indispensable.<br /><br /><b><i>Before any optimization starts</i></b><br />Along with creating a test environment, prepare scripts for resetting the target. The ETL will be executed many times during the project and you have to make sure it does so under the same conditions. For example, not resetting the target would skip insert operations because there will not be any new data to insert after the first run. A script for automated execution may also come in handy because for some tests the ETL may need to be run multiple times. Another thing to consider is that the test instance differs from the production one: typically there is no other activity in it, so improvements measured in test tend to be smaller in production. The optimization project also needs a fast, accurate and reliable way to measure the execution time of the process and its steps. Most platforms and software packages have this feature, but some manual processes do not have it built in.<br /><br /><b><i>Is your service good enough</i></b><br />This is the lowest-hanging fruit. Review the "hardware" and its settings. 
In the case of a cloud environment, analyze the adequacy of compute resources, cores, parallelism, etc. The data/compute services could be outdated, and if the organization has a budget for an upgrade, the target duration could be easily achieved at this very first step. Testing environments in the cloud is relatively easy and cheap, so you can play with a few setups to balance parameters and cost. Options for testing are more restricted in the case of on-premises servers. This step also includes analyzing the settings of the environment and the tools. Every piece of software and every service has multiple settings, and there is a combination providing the best performance within the system parameters. The software vendors and numerous gurus on the Internet have good advice to follow on this one. However, the best setup may not be clear for your specific case because the setup parameters could have a compound effect that is difficult to figure out. This case could be tackled by a grid search across the ranges of parameter values. The automated execution script helps a lot in this scenario, as it can search for the best configuration while you are doing your tai chi.<br /><br /><b><i>General approach - Theory of Constraints</i></b><br />The optimization approach follows the classical Theory of Constraints. In essence, find the bottleneck, improve it until it is not a bottleneck anymore, then move to the next one. Repeat until you hit the duration target or your options for improvement are depleted. In the case of ETL, the bottleneck is the step with the longest execution time. <br /><br /><b><i>The Worst Offenders list</i></b><br />Some bottlenecks are too difficult to improve and the return on invested time is not good enough. Instead of jumping to fixing the longest-running step, a better option is to start by compiling a list of the worst-performing steps. For the top 10, analyze the effort needed to improve each step and its weight in the overall duration, and set your priorities. 
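<br /><br />As a rough illustration of such a list, here is a minimal Python sketch. It assumes you already have a measured duration per step and some estimate of the effort needed to improve it; the names <code>StepStats</code>, <code>worst_offenders</code> and <code>effort_days</code>, as well as the sample figures, are made up for this example and do not come from any particular ETL tool.

```python
from dataclasses import dataclass


@dataclass
class StepStats:
    name: str
    duration_min: float  # measured run time of the step, in minutes
    effort_days: float   # hypothetical estimate of the work needed to improve it


def worst_offenders(steps, top_n=10):
    """Return the top_n longest-running steps, longest first.

    'share' is the step's part of the summed step durations. With steps
    running in parallel this sum is larger than the wall-clock duration,
    so treat the share as a relative weight only.
    Assumes positive durations and effort estimates.
    """
    total = sum(s.duration_min for s in steps)
    ranked = sorted(steps, key=lambda s: s.duration_min, reverse=True)[:top_n]
    return [
        {
            "name": s.name,
            "duration_min": s.duration_min,
            "share": s.duration_min / total,
            # crude priority score: minutes at stake per day of effort
            "minutes_per_effort_day": s.duration_min / s.effort_days,
        }
        for s in ranked
    ]


if __name__ == "__main__":
    # Made-up sample figures, for illustration only.
    sample = [
        StepStats("load_sales_fact", 95.0, 5.0),
        StepStats("load_customer_dim", 40.0, 1.0),
        StepStats("refresh_inventory", 15.0, 0.5),
    ]
    for row in worst_offenders(sample, top_n=3):
        print(row)
```

The score is deliberately crude; it only makes the trade-off between a step's weight and the effort to fix it visible, and the final priorities remain a judgment call.<br /><br />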
Setting priorities helps in making steady progress, keeping motivation high and showing progress to the project owner. The "worst offenders" list is also useful in assigning tasks to the team members. <br /><br />The steps above set up the project, and the team can dive into the gory details of each step on the way to the desired target. Below are some of the common methods for improvement.<br /><br /><b><i>Turn off steps loading unnecessary or redundant data </i></b><br />For an ETL that has been around for some time, it is very likely that some of the processed data is not used any more. A case like this may also come from the out-of-the-box ETL of some pre-packaged solution. These usually do not consider the company specifics, resulting in processing unnecessary data. Data lineage, tracking data from source to report/analysis, is the right tool for identifying the redundant steps. Make sure to turn off the step execution rather than deleting it, because if the step is required in the future all you have to do is turn it back on. Analyzing the data lineage is not easy but it is worth it - you can avoid a massive amount of hard optimization work.<br /><br /><b><i>Process only the data you need</i></b><br />It sounds obvious and simple enough, but nevertheless make sure only the necessary data is extracted. For example, a source table may be small in the ETL development phase and then grow over the years to cause longer load times. As another example, a step could use some hard-coded values or obsolete parameters that cause extraction of a much greater data volume. A step could also be set for full instead of incremental load. The rule of bringing in only the new/updated data is basic for ETL development, but it is the overlooking of basics that causes the trouble.<br /><br /><b><i>Combine steps sharing same source</i></b><br />A sub-optimal development process and poor documentation are likely to introduce several steps loading data from the same source. 
Usually there are no great differences between these steps - one loads most of the data and the others get a few additional data elements. Find these steps and combine them into one. If that is not possible for some reason, you can pre-load the data into a temp data structure and use it in both steps. <br /><br />The next steps could be defined by one word - simplify.<br /><br /><b><i>Simplify</i></b><br />ETL steps involving many tables, views, procedures and functions along with complex transformations are very likely to take a heavy toll on performance. They offer the additional "benefits" of being hard to read, debug and analyze. Simpler and smaller steps provide faster execution, better readability, optimal use of resources and easier performance analysis. One way to simplify is to split the step into smaller ones. Another is to replace sub-selects and views with temp tables loaded before the actual step. Use of functions is also an option to explore. <br /><br /><b><i>Replace views and sub-selects as source with temp tables</i></b><br />The candidates are the complex steps involving many sources and/or producing a large data volume. Apart from the benefits of simplification mentioned above, temp tables allow indexing that could further speed up the process. The temp tables are loaded before the step using the data. Explore the option to do that in parallel with some other task, saving even more time.<br /><br /><b><i>Optimize indexes and their usage</i></b><br />This goes in two directions: one is making sure the tables are properly indexed, and the other is making sure the proper indexes are used. Review the indexes of all tables to remove redundant ones and create new ones if necessary. The impact shows not only when the tables are used but also in the rebuild/refresh of the indexes. The ETL code has to be reviewed for the use of hints, because hinting an obsolete or sub-optimal index can hurt performance much more than not using an index at all. 
If a hint is used, try the code without it - database engines often do a very good optimization job and the improvements are immediate. Another check point is whether indexes are deactivated during the ETL operations. Some take the approach of completely dropping the indexes before record operations and re-creating them after. Disabling/dropping indexes is a fast and easy way to save execution time, but balance that against the time lost on refresh/rebuild.<br /><br /><b><i>Parallel vs sequential execution</i></b><br />Executing steps in parallel is a great way to reduce the execution time. However, handle with caution. A proper setup should take into account the parallelism capabilities and settings of the database/engine and the ETL tool as well as the required resources. Make sure processes running in parallel do not have the same target or source table.<br /><br /><b><i>Optimize the target tables size</i></b><br />Target tables could be enormous. Sometimes it is for a reason, sometimes it is because their size was neglected over the years. Why would you keep data going 12 years back when all that is ever needed is 5 years back? Reducing the size of target tables brings three benefits - faster data insert/update, faster index rebuild/refresh and faster querying. One option is moving unused data to a historical table. The historical data has to remain available to BI and other analytical tools, and this requires additional development work. Another way to deal with the table size is to use partitioning.<br /><br /><b><i>Optimize custom code</i></b><br />Custom code is not unheard-of in data processing and is very likely to be a weak link in the process. Review this code for opportunities for improvement; usually there are plenty. For example, a SQL UPDATE could be replaced by a MERGE operation. Also, consider removing the custom code and including its operations in standard ETL steps.<br /><br /><b><i>Take a step back and use common sense</i></b><br />Common sense is a lost art. 
We tend to rush into Googling and get drawn into the millions of pages with advice, magic command lines and hacks. The answers are sometimes right before our eyes. This applies to the optimization of ETL. Always keep in mind the project circumstances, the general business context and other specifics. Let me illustrate with an example from a project I did some time ago. It was back in the times when BI access had to be shut down while refreshing data from the sources. After weeks of effort, the total execution time was significantly lowered and there were no obvious candidates for further optimization. I was out of moves. However, at some point I realized there was no reason to kick out users before the initial staging phase (loading data from source into staging tables). Simply moving the shut-down step after the staging phase saved a third of the total downtime. It is not a process optimization step per se, but it served the greater goal of reducing total BI downtime. The temptation to dive into improving steps has to be balanced by a sober look at the project as a whole.<br /><br /><b><i>Instead of Conclusion</i></b><br />A slow ETL frustrates the business users and damages the image of the IT department. The costs of that and the cost of an optimization project can be avoided by introducing better practices in ETL maintenance and development. These practices include monitoring step duration trends and promptly acting on undesirable ones, keeping proper documentation, better development and change management, regular audits, and avoiding custom code. 
ETL optimization is a challenge but now you are equipped with some ideas and a plan for how to tackle it.<br /><br /></p>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-4306823155647036702016-08-09T13:21:00.001+01:002016-08-09T13:21:04.133+01:00Motor Vehicle Market in Bulgaria - Some Data RevealedA couple of years ago I got interested in the second-hand car market in Bulgaria and was not surprised that there was no data available for it. I tried to proxy it with the data from <a href="http://www.mobile.bg/index_koli.html" target="_blank">mobile.bg</a> - the largest car sale site in the country - <a href="http://danailgardev.blogspot.bg/2014/09/some-insights-on-used-car-market-in.html" target="_blank">see the post here</a>. The approach had some downsides but still delivered some data. However, it was a partial effort, as a proper estimation would require regular data extraction and comparison and I did not have the time to do that. The good news is that, thanks to the EU Open Data effort, the Bulgarian government has recently published data for registered motor vehicles. Please find it <a href="https://opendata.government.bg/dataset/http-opendata-government-bg-organization-ministerstvo-na-vatreshnite-raboti" target="_blank">here</a>, in Bulgarian only. It is a good step forward and I welcome that. The data is missing some important details such as mileage and engine type (diesel, petrol, electric) and could have been better organized, but it is way better than no data at all. I was not happy to confirm some of my conclusions about the age of the car park, as it is a sign the economy is not in the state it should be after more than 20 years of free market.<br />
<br />
I hope the Open Data initiative will not die and the government will keep publishing data on a regular basis to satisfy data-curious minds.<br />
<br />Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-70806125377299522332016-04-22T09:07:00.003+01:002016-04-22T09:07:38.143+01:00Is There Anything Wrong with Year 2016The pop king Prince passed away yesterday at the age of 57. He joined a seemingly long list of celebrities we lost this year. Posts on social media go along the lines of "2016 did it again", "First Bowie now Prince" and "Isn't it the worst year ever?". One might really think there is something going on this year. But is there really?<br>
<br>
<a href="http://danailgardev.blogspot.com/2016/04/is-there-anything-wrong-with-year-2016.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-24344104540842826932016-04-12T12:44:00.003+01:002016-04-12T12:44:50.544+01:00Sunken Costs and SausagesLast week I went to a client conveniently located in a fantastic tourist destination and I had a great opportunity to add some weight while having a good time. I was recommended a restaurant serving fantastic one-of-a-kind sausages and one evening I took the walk to get there. It was a bit off the tourist and business tracks, and soon I got to a part of the city where nothing much was happening and I was expecting a quiet and tasty dinner. However, to my surprise the restaurant was as crowded as an Apple store at the release of a new iPhone! People were standing with a beverage and a plate of sausage so close to each other that even basic activities such as checking Facebook seemed impossible. My companions were all "See how good this place is!". But was it really the food that kept people in?<br>
<br>
<a href="http://danailgardev.blogspot.com/2016/04/sunken-costs-and-sausages.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-35603025670249753972016-03-29T09:33:00.001+01:002016-03-29T10:33:57.667+01:00The Problem with Excel Spreadsheets in OrganizationsThe death of Excel has been announced many times but it seems to be tougher than Officer McClane and still saves the day for many organizations. However, its implementation is riddled with problems and dangers that could have a detrimental effect on careers or businesses. I have long developed and fought for standards and procedures to minimize the risks and pains coming from badly placed, designed and developed spreadsheets, and I am not alone in that, as there are numerous evangelists of good Excel practices. However, their impact is limited and there are larger opposing forces at play. They come from the organizations and from the accessibility of Excel itself.<br>
<br>
<a href="http://danailgardev.blogspot.com/2016/03/the-problem-with-excel-spreadsheets-in.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-57535928716643663272015-12-09T15:01:00.002+00:002015-12-10T08:20:05.392+00:00Quantitative Models in Business War GamesMany things look very cool but turn out to be not that exciting when one tries using them. Like <a href="https://www.youtube.com/watch?v=5FFRoYhTJQQ" target="_blank">voice-recognition technology</a> or Communism or electric cars. Quantitative simulation in business war games proudly goes into this list as well. <a href="https://en.wikipedia.org/wiki/Business_war_games" target="_blank">War games in business</a> are, according to Wikipedia, simulations of moves and counter-moves by opponents in a commercial setting, and they are part of the large family of exercises that aim at leveling up the strategies and management methods in modern companies. In a nutshell, a war game has two or more teams competing for some resources - usually market share, sales or other tangible outcomes. These exercises could be very beneficial for a company. In the general case there is no measure for the impact of a strategy, and some have thought of introducing a suitable quantitative model to deal with that. And it is a good idea. Except it is not, as it usually does not work.<br>
<br>
<a href="http://danailgardev.blogspot.com/2015/12/quantitative-models-in-business-war.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-56458716591150173422015-11-18T11:01:00.000+00:002015-12-08T09:09:20.815+00:00From Start up to F**k up - What I Have LearnedI wonder when and why startup businesses became such a big thing. Maybe, with easy access to high tech and available funding, more people got convinced they could develop their own product to kill the market and buy the Hollywood life. However, despite all the books and gurus on the matter, starting a company from scratch is a bit more difficult than installing an app on your phone. I am not a doctor but you can trust me on that - a year and a half ago I co-founded a company and went through almost the full spectrum of problems and emotions that come with it. The company is focused on retail analytical services and providing a BI solution that addresses the specific challenges in that industry. I have recently terminated my tenure there and have the time to reflect and share some of the lessons I learned. After all, isn't it a crime if you are in a startup and the world does not know about that?<br>
<br>
<a href="http://danailgardev.blogspot.com/2015/11/from-start-up-to-fuck-up-what-i-have.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-30998992333003865512014-10-01T08:08:00.001+01:002014-10-01T08:11:18.489+01:00Some Great Data Science and Big Data Links<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN8dQDYiQFcoxXnriDLe-wzzGqy928oXoaaUzInagxgRtIWKOK-JkVh6LslV9mMAjeWVqks4MlK4I7SuNvZsgDmuRB2mvtop9AxQk5xuGb1fTFoToyPKvBxy2KlapJgzM1DIo0OTEYFkw/s1600/dsc_logo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN8dQDYiQFcoxXnriDLe-wzzGqy928oXoaaUzInagxgRtIWKOK-JkVh6LslV9mMAjeWVqks4MlK4I7SuNvZsgDmuRB2mvtop9AxQk5xuGb1fTFoToyPKvBxy2KlapJgzM1DIo0OTEYFkw/s1600/dsc_logo.png" height="50" width="320" /></a></div>
<br />
The analytical hub of Data Science Central did extensive research on the sites and blogs most liked or mentioned among its member base. The result is a comprehensive list of the best data science sources. Please find the list <a href="http://www.datasciencecentral.com/profiles/blogs/top-2-500-data-science-big-data-and-analytics-websites" target="_blank">here</a>.<br />
<br />
If you are looking to expand your list of regularly visited blogs check out the similar list of <a href="http://www.datasciencecentral.com/profiles/blogs/50-blogs-worth-reading" target="_blank">50 Data Science and Statistics Blogs Worth Reading</a>. There are some true gems.<br />
<br />
In case you need some large data to sharpen your skills or for any other purpose go to <a href="http://www.datasciencecentral.com/profiles/blogs/20-free-big-data-sources-everyone-should-check-out" target="_blank">20 Big Data Repositories You Should Check Out</a>. <br />
<br />
Of course, you can always find some original content and random analytical-related thoughts <a href="http://danailgardev.blogspot.com/" target="_blank">here</a>.Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-87808441174845487662014-09-24T13:14:00.001+01:002014-09-24T17:27:45.198+01:00Used Car Market in Bulgaria - Where is The Data?<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAFrUfobRTwIPTw8XU6iOsnMpcU2c4O4BC8a_4xWh01r8_gQ2TVe6R4drq6ZginaLIPwuOulSWtRNAEToBM3ixHKmUzKJZ58iI4rrhSYAXXm4vteSoI9sGmIaXA7loWfEyNpOYxhAmpzo/s1600/cars+logo.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAFrUfobRTwIPTw8XU6iOsnMpcU2c4O4BC8a_4xWh01r8_gQ2TVe6R4drq6ZginaLIPwuOulSWtRNAEToBM3ixHKmUzKJZ58iI4rrhSYAXXm4vteSoI9sGmIaXA7loWfEyNpOYxhAmpzo/s1600/cars+logo.jpeg" height="224" width="400"></a></div>
A couple of weeks ago I shared some difficulties that come with <a href="http://danailgardev.blogspot.com/2014/09/the-winding-roads-of-sales-forecasting.html" target="_blank">forecasting the new car market</a>. This market is interesting for marketers for obvious reasons but it forms the smaller portion of the total car market. The second-hand car market caught my interest and I looked around for some data. I focused my curiosity on my home country, as I wanted to test some popular myths against hard data. It turned out that there is virtually no data at all and I had to do some digging for details.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/09/some-insights-on-used-car-market-in.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com2tag:blogger.com,1999:blog-5519761728575073332.post-61258296521072399662014-09-09T14:38:00.001+01:002014-09-25T08:13:02.761+01:00The Winding Roads of Car Sales Forecasting<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX7vg-pSyu1YV4L5aqj44IXjydLkiRQKGgnxVnUeCDBe93GZ2GhVVPXU6ouw-7AB-2kqlf9sl_WtiZffC9HfPtcBCL8VFUVualS8YjUyBBLE22UyGoEs5Ys_DuZ0WDa3xFMTfSqHJBKqU/s1600/2fast2furiousrace.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX7vg-pSyu1YV4L5aqj44IXjydLkiRQKGgnxVnUeCDBe93GZ2GhVVPXU6ouw-7AB-2kqlf9sl_WtiZffC9HfPtcBCL8VFUVualS8YjUyBBLE22UyGoEs5Ys_DuZ0WDa3xFMTfSqHJBKqU/s1600/2fast2furiousrace.jpg" height="160" width="400"></a></div>
Forecasting the sales of a car dealer is a tough business, much tougher than predicting the total market. Winning a car race is a matter of the right combination of engine tuning, tire type and pressure, quality of petrol and the other fluids, the race track parameters, weather conditions, the driver's mental and physical state and many others, as well as all of these for the competitors. Similarly, sales results depend on a plethora of interconnected factors, which makes forecasting them a Herculean labour. However, the challenges in predicting car sales are not unique and they are a good illustration of some common problems. <br>
<a href="http://danailgardev.blogspot.com/2014/09/the-winding-roads-of-sales-forecasting.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-3285371288273259072014-09-05T07:33:00.000+01:002014-09-05T07:33:06.396+01:00Nice Article About Application of Analytics in Restaurant BusinessRestaurants are probably among the most ancient businesses. Analytics builds on the vast experience in the field, adds new insights and creates a lot more opportunities. The article <a href="http://techcrunch.com/2014/09/01/tables-tablets-and-data/?ncid=fb&utm_source=feedburner&utm_medium=feed&utm_campaign=fb" target="_blank">Tables, Tablets, Data And Eating</a> published on techcrunch.com has some nice examples of that. There are some good points about retail analytics as well. Enjoy!Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-65787572423099648652014-08-29T08:51:00.000+01:002014-08-29T08:51:23.211+01:00ModelOff 2014 is Now Opened<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlagcALhPfLlIBf94bo7O3lwqYMzb9ImSJMkaT5HqPowuJcILp2NZFbluYDTwK16FDxUmeXMHnT0urpgEECg4RdCAA8zYSfAnQc8MF5KixioZIds60JoXzBolCFH0b6_C51laXUyeYldI/s1600/ModelOff-Herder-TM.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlagcALhPfLlIBf94bo7O3lwqYMzb9ImSJMkaT5HqPowuJcILp2NZFbluYDTwK16FDxUmeXMHnT0urpgEECg4RdCAA8zYSfAnQc8MF5KixioZIds60JoXzBolCFH0b6_C51laXUyeYldI/s1600/ModelOff-Herder-TM.jpg" height="70" width="400" /></a></div>
The biggest and best Excel challenge is now open! Professionals and students in Finance, Banking, Accounting, Investments and Quantitative industries who love and use Microsoft Excel can accept the challenge and compete with the best from 100+ countries. The tasks are very challenging but the fun is guaranteed - I have done most of the tasks from the previous competitions, so I know for sure. There are three rounds - the first two are on-line case studies and the third is at a live event in New York City. <br />
<br />
Please visit <a href="http://www.modeloff.com/">http://www.modeloff.com/ </a> for further details. If you are not going to participate, make sure you review the <a href="http://www.modeloff.com/questions/" target="_blank">past questions</a> and learn some new tricks.<br />
<br />
<br />
<br />Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-23331322069368179782014-08-28T14:49:00.003+01:002014-08-28T14:50:46.755+01:00VBA or Formulas?<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmVarT4_ioA-_Mk_PPmE4TkLkbv8hJ7XpxcIb-mjzn7AUyQkbvRCeFLXHJsqR0zAoT4_DVypWaXJlEW04aB7Wuhkk2Mj5jTtAdD2bcEvwGdalKnoYhqN8mxLmTqXUtE0e1NfEEp9E40qw/s1600/question+small.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmVarT4_ioA-_Mk_PPmE4TkLkbv8hJ7XpxcIb-mjzn7AUyQkbvRCeFLXHJsqR0zAoT4_DVypWaXJlEW04aB7Wuhkk2Mj5jTtAdD2bcEvwGdalKnoYhqN8mxLmTqXUtE0e1NfEEp9E40qw/s1600/question+small.jpg"></a></div>
In the process of developing complex spreadsheets in Excel there is always the question of the balance between using VBA macros and implementing the logic with formulas. There is no single answer, and the choice depends on the target users, the complexity of operations and logic, the size of the sheet and other factors. I would like to share some thoughts on this matter. <br>
<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/08/what-is-better-in-excel-desigh-using.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com1tag:blogger.com,1999:blog-5519761728575073332.post-68342821901112029352014-08-20T08:34:00.001+01:002014-08-20T08:38:32.756+01:00Transition from Excel to R<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4MS1N0Z_OoQCJ-5ZwyBoHX8IOAMHwBOZBCgHoGLon1AqbuRP0Z7Y7KhSpBi_iuKFsegKYYWuSvSbxFM1jxpj8gyfi0imYCVDdEiGUa81rgxK1Prx-BbxdovcR3Rl0G_YaNm8UdbBkHxc/s1600/logo_R_large.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4MS1N0Z_OoQCJ-5ZwyBoHX8IOAMHwBOZBCgHoGLon1AqbuRP0Z7Y7KhSpBi_iuKFsegKYYWuSvSbxFM1jxpj8gyfi0imYCVDdEiGUa81rgxK1Prx-BbxdovcR3Rl0G_YaNm8UdbBkHxc/s1600/logo_R_large.png"></a></div>
The R language is a powerful tool but the learning curve could be very steep and intimidating. However, as a business user you might be looking into it in search of speed, flexibility and better handling of larger data. An Excel user would be thinking in terms of the most common tasks in Excel - summaries, pivots, look-ups, filtering and charting - and the first questions about R would naturally be how to perform these tasks. The answers are not always easy to find. Fortunately, there are some good people on the Internet who come to help. I have put together a short list of blogs and sites that could be very useful for the initial part of the Excel to R transition.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/08/transition-excel-to-r.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-43247262329836309302014-08-13T12:21:00.000+01:002014-08-13T12:21:00.796+01:00How to Deal With Somebody Else's Excel Workbook<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4Iru602xR_4LfiwXvvcHVv25qgvDmD7NuOqFhNqHHNrAJInVQRj1-gI4lpk_ezrcFIYw_Vtd_1G_UmGxEr6YApp-VMlqZGpU99oacL8s5ORVjDcSgZNpN-00zRL_3M6JCw4wsnEZx9HY/s1600/cartoon5815.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4Iru602xR_4LfiwXvvcHVv25qgvDmD7NuOqFhNqHHNrAJInVQRj1-gI4lpk_ezrcFIYw_Vtd_1G_UmGxEr6YApp-VMlqZGpU99oacL8s5ORVjDcSgZNpN-00zRL_3M6JCw4wsnEZx9HY/s1600/cartoon5815.png" height="240" width="320"></a></div>
No matter how closely you follow your religion's prescriptions for a righteous life, sooner or later life will serve you the task of dealing with an Excel model made by somebody else. The clash with a different design style, approach to problem solving and sheet organization is definitely not a pleasant experience. However, over time I have developed a sort of methodology for this process that delivers fast results with minimum effort and keeps my sanity intact.<br>
<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/08/how-to-deal-with-somebody-elses-excel.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-70941135143988633042014-07-23T15:09:00.002+01:002014-07-23T15:19:43.120+01:00Perception vs Data: Is This The Rainiest Summer Ever?<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4GGSxRG2tmOSR6nLploC89wpwPqlYaFPdvyN0SGLV0M-CPq__JuDeft5RN2MvzjIb4zax9gu3CuPeJVeHYMf0jRnxUeLc4SGa318RgELcykZFJzPd8GBvIBepIRSjuePdBtBctULfO-M/s1600/WeatherMap.PNG" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4GGSxRG2tmOSR6nLploC89wpwPqlYaFPdvyN0SGLV0M-CPq__JuDeft5RN2MvzjIb4zax9gu3CuPeJVeHYMf0jRnxUeLc4SGa318RgELcykZFJzPd8GBvIBepIRSjuePdBtBctULfO-M/s1600/WeatherMap.PNG"></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Weather report on bTV</td></tr>
</tbody></table>
"Oh, not again!" This is the thought that flashed through my sleepy head this morning when the weather woman stood in front of a map densely covered by small pictures of rain and <span class="st">lightning</span>. It has been a very rainy summer. The never-ending rain and the missing sunshine have been a major topic of conversation for the last few months. Everyone around is sort of angry about the lost opportunities for good times outdoors. The phrases of the day are "This is the rainiest summer ever!", "There has never been such a summer before" and "I don't remember a summer like this one". It got me thinking: is this really the case, or is our perception playing tricks? I pondered a similar question in my <a href="http://danailgardev.blogspot.com/2014/07/is-this-world-cup-better-than-previous.html" target="_blank">World Cup post</a>, where I tried to find out why this tournament is considered to be a fantastic one. In contrast to the emotionally charged football tournaments, the perception of weather should be relatively simple to analyze: the weather has been well recorded for a long time, there is hard data on it, and there are far fewer factors to consider.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/07/perception-vs-data-is-this-rainiest.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-42908929918219359212014-07-09T05:43:00.000+01:002014-07-09T05:43:59.295+01:00Is This World Cup Really Better Than Previous Two: Bra 1-7 Ger Update <div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLFJ4zMEMrOn6rKoh9vE8vDC1bU1n3TUxqwYvL7CezWVu9JXG_SYYwSF1IHjsfGnZMQV4YR5EfVdDpTUdiD5oZvoyOidCULXs3fLV2DHjHFnsrWC0vsdoNBdcLOB-7fvt84Ch7yIZMyw/s1600/ger+bra+3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrLFJ4zMEMrOn6rKoh9vE8vDC1bU1n3TUxqwYvL7CezWVu9JXG_SYYwSF1IHjsfGnZMQV4YR5EfVdDpTUdiD5oZvoyOidCULXs3fLV2DHjHFnsrWC0vsdoNBdcLOB-7fvt84Ch7yIZMyw/s1600/ger+bra+3.jpg" height="255" width="400"></a></div>
<br>
<a href="http://danailgardev.blogspot.com/2014/07/is-this-world-cup-really-better-than.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-18384870965156880422014-07-03T13:28:00.002+01:002014-07-04T11:28:42.105+01:00Is This World Cup Really Better Than Previous Two?<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ89xb54nf18nVrV4SLuPZ4woMBTvTLaFD3RDBRgFnxYP-ZyxMxG_QUU2OzJ7Y_KOr5O7SZp8_C4mlTfrOGdJyGUwuxKivodhlPIgillOzSLeD2l7hnbPJBv0f6cSpmeB0H2hqalx9608/s1600/fifa.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJ89xb54nf18nVrV4SLuPZ4woMBTvTLaFD3RDBRgFnxYP-ZyxMxG_QUU2OzJ7Y_KOr5O7SZp8_C4mlTfrOGdJyGUwuxKivodhlPIgillOzSLeD2l7hnbPJBv0f6cSpmeB0H2hqalx9608/s1600/fifa.jpg" height="200" width="200"></a></div>
The World Cup in Brazil is perceived to be the best World Cup tournament among the last few. This is according to my circle of friends and acquaintances, and to me, as much as I can be considered a reliable source. It made me think about what makes a football tournament a good one and why exactly this one is better than the ones in 2006 and 2010. I asked around to gather some opinions, and I also decided to look at some data to find out whether it could tell a bit more about that. <br>
<br>
<a href="http://danailgardev.blogspot.com/2014/07/is-this-world-cup-better-than-previous.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-5747402000765482812014-06-19T12:46:00.001+01:002014-06-19T14:45:15.928+01:00Friday Funny: The Fastest Way Ever to Becoming a Data Scientist<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqYie4BnScOvi4h9w5aGPh6Ep7B4TeIhP9kwYDOQZmHM9zpPocrVT0BAXt9ssO0kHfelAK32ckUJGJBDtA2AdLlwVXVNWn6rt2M1e3-VeBvm5pYnFwU8E2pkOYLy_Me1Txp6fYWySpvio/s1600/speed.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqYie4BnScOvi4h9w5aGPh6Ep7B4TeIhP9kwYDOQZmHM9zpPocrVT0BAXt9ssO0kHfelAK32ckUJGJBDtA2AdLlwVXVNWn6rt2M1e3-VeBvm5pYnFwU8E2pkOYLy_Me1Txp6fYWySpvio/s1600/speed.jpg"></a></div>
Data scientist and data analyst are among the most wanted positions. It also sounds very cool to be a scientist, despite Sheldon Cooper. There is one little thing that makes it hard to get on board, and it is the amount of work one needs to put into developing the required skills. I have good news for you! Recent research has proved that this is an absolute misconception and that data-science-related skills can be developed much faster than previously thought. There is a recipe for opening the door to this lucrative field, and it is based on numerous observations of the evolution of experts in the field, so it can be trusted.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/06/friday-funny-fastest-way-ever-to.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-66755064027727970242014-06-13T09:07:00.003+01:002014-06-13T09:07:59.252+01:00Can You Do Better? Some Great Examples of Excel Dashboards<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwJraTb4gObA6BrojALZJ44baguLZY1DQ99V41PQzHJ1pmFJDWVNoH8yi8H3zUEIQZqGn972ePdNiScdhLxw5xbMfAr4j1X5dZtPe8ZBPeApZjhrhsIqEiuWnt5AuqVH1wpOPbcWLHaU/s1600/state-migration-dashboard-snapshot-38+70.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpwJraTb4gObA6BrojALZJ44baguLZY1DQ99V41PQzHJ1pmFJDWVNoH8yi8H3zUEIQZqGn972ePdNiScdhLxw5xbMfAr4j1X5dZtPe8ZBPeApZjhrhsIqEiuWnt5AuqVH1wpOPbcWLHaU/s1600/state-migration-dashboard-snapshot-38+70.png" /></a></div>
<br />
Every year the awesome site of <a href="http://chandoo.org/wp/" target="_blank">Chandoo</a> organizes an Excel dashboard contest. The site has just released the best entries of the latest competition and there is some great stuff. The participants took the time to produce fine examples of dashboards. There are concise comments on what is good and what is bad. Also, every dashboard is available for download if you would like to get into the details of how it is made. Go to <a href="http://chandoo.org/wp/2014/06/12/state-migration-dashboard-contest-entries/" target="_blank">the post</a> and get some ideas, learn and enjoy. <br />
<br />
<br />
<br />
<br />Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-10172172933433674152014-06-11T14:58:00.000+01:002014-06-12T13:05:15.260+01:00Analytics for The Small Business: Mission Possible<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJnJPvqVnGNXY8p8rhMSQ5itQZJMuxhI81XTbEkt_3ETq3636LfXjfTmZuDNo-VN9ylcf8KN_5Rk_tiYqVA3aVHiqiDfLsFDkWRC1pLmzcA7zmp5ARPuLM3NT-7kFQFXixVOjaQ1Iu23A/s1600/small_business.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJnJPvqVnGNXY8p8rhMSQ5itQZJMuxhI81XTbEkt_3ETq3636LfXjfTmZuDNo-VN9ylcf8KN_5Rk_tiYqVA3aVHiqiDfLsFDkWRC1pLmzcA7zmp5ARPuLM3NT-7kFQFXixVOjaQ1Iu23A/s1600/small_business.jpg"></a></div>
A small business usually has to be very smart to compete in the market. "Smart" includes not only the personal qualities of the people running it but also, with growing importance, the ability to analyze business data. The sad truth is that these businesses are left behind by the majority of analytics vendors, and the owners struggle to find solutions that meet their needs. The question is which solutions could meet this demand.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/06/analytics-for-small-business.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-55913819272693908362014-06-03T20:46:00.000+01:002014-06-05T13:56:19.019+01:00My Favourite Job-related Joke<br>
Once there was a company and it had a computer system for its intensive operations. One day, out of the blue, the system broke down. The company froze - it could not do anything with it - no sales, no orders processed, no purchases. All the gurus from the IT department sweated over restoring the system, but nothing worked; even Google could not give any advice.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/06/my-favourite-job-related-joke.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-31350311704771271062014-05-21T14:02:00.003+01:002014-05-21T14:08:49.462+01:00Great Online Course for Data Mining!<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTbkTg13EsTh31t16SiRhZYRVz3guCdIN8fH43fQqjIdkpk7QSvBEz1f3O4fTHM3LRE9SjtZjWNgRD7fSqEpP-Zoz9xEyllhqqeBur7x6VbqT7CLRoK-hQT2mdTTvnRHPqpydLtbWZ2xA/s1600/Title-Bird-Header.gif" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTbkTg13EsTh31t16SiRhZYRVz3guCdIN8fH43fQqjIdkpk7QSvBEz1f3O4fTHM3LRE9SjtZjWNgRD7fSqEpP-Zoz9xEyllhqqeBur7x6VbqT7CLRoK-hQT2mdTTvnRHPqpydLtbWZ2xA/s1600/Title-Bird-Header.gif"></a></div>
The appeal of data mining for companies and analytics practitioners is growing by the day. So where should you start with it? Recently I have been evaluating data mining software and courses, and I came across a very good one that I can recommend without reservation. This is the MOOC organized by the University of Waikato. MOOC stands for "massive open online course", but do not be fooled by the name - it's a serious course that delivers right on target.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/05/great-online-course-for-data-mining.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-3853722787468097772014-05-14T12:47:00.001+01:002014-05-15T15:14:21.733+01:00The Raise of Data Scientist - Have We Seen That Before?<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3KVyZhNV7LGPSaaMpg1OdFeSiW53OKC0hhVUo8hn7DYX3LIA3VX6n2U9PP9J6kbqJ6srrD2d37u2fVUxctPICe2nANp3fQ59n9W2wzTRCqM09X8RmhY_Xq0BPjXQcEObS3IhHAH6pu9U/s1600/rstudio-windows.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3KVyZhNV7LGPSaaMpg1OdFeSiW53OKC0hhVUo8hn7DYX3LIA3VX6n2U9PP9J6kbqJ6srrD2d37u2fVUxctPICe2nANp3fQ59n9W2wzTRCqM09X8RmhY_Xq0BPjXQcEObS3IhHAH6pu9U/s1600/rstudio-windows.jpg"></a></div>
In a recent conversation somebody was very excited about the marketability of skills in R and similar tools, as well as about the growing demand for people who have them. The story was about the bright career prospects - money, a good position in the management hierarchy, fame and Aston Martins with Victoria's Secret models in them. This person is not alone, and his opinion is obviously backed by the growing number of job ads requiring R or similar skills. However, I beg to disagree, because we have all seen something very similar before and things developed differently.<br>
<br>
<a href="http://danailgardev.blogspot.com/2014/05/the-raise-of-data-scientist-have-we.html#more">Read more »</a>Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0tag:blogger.com,1999:blog-5519761728575073332.post-76297007340265855842014-05-12T12:11:00.001+01:002014-05-12T18:55:05.535+01:00Age of Miss America Correlated to Murders by Steam?<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRyt1p7Ydgya4stDyoE1yfRhp6PkkP9_MwFaQ3gUwhGwOGnc_UivBgdGOxkfxF1XbzgoTLSeDwNkpPRe48-O0xtBpDYrK-2_w3l6oZOfLi-ZLsXRk3OjtlGPb-OMLdGZ365zCse8wmvZs/s1600/MissAmerica.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRyt1p7Ydgya4stDyoE1yfRhp6PkkP9_MwFaQ3gUwhGwOGnc_UivBgdGOxkfxF1XbzgoTLSeDwNkpPRe48-O0xtBpDYrK-2_w3l6oZOfLi-ZLsXRk3OjtlGPb-OMLdGZ365zCse8wmvZs/s1600/MissAmerica.png" height="137" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Age of Miss America And Murders by steam, hot vapors and hot objects</td></tr>
</tbody></table>
<br />
One of the dangers of too much data and too many "scientists" is spurious correlations. These are correlations that happen purely by chance. If you have a time series to explain and a massive number of data sets to test against it, sooner or later you will find a seemingly meaningful correlation. I <a href="http://danailgardev.blogspot.com/2013/08/the-dark-side-of-data-abundancy.html" target="_blank">posted about that</a> some time ago, but today a co-worker sent me a link with some excellent illustrations of the point, and it is too good not to share. Go to <a href="http://www.tylervigen.com/" target="_blank">Spurious Correlations</a> to see them all - some are very funny, others are puzzling. I did not know that somebody could die by becoming tangled in their bedsheets! Scary bedsheets!Danail Gardevhttp://www.blogger.com/profile/15454895459667019618noreply@blogger.com0