- Make list of all reference data from the different sources.
- Process each list to remove duplicates
- Check the level of granularity and create a reference master data list. The master list has the data as you want it to appear in the analysis. For example, a source could report by each European country while in other there is simply total Europe figure. The report could be at the highest existing level of granularity - in this case - Europe. Of course, master list could be the reference data as used in one of the sources.
- Create a data dictionary. This dictionary is the relation between the master list and the ones in the sources.
- Enrich the reference data in sources with the corresponding values from the dictionary (if possible). I find it is a good practice as it makes very easy to perform any operations on the source data - sum, average, look-up, etc.
- Use the master list to create the analysis and formulas to refer to the sources
Once the master list is set up it is flexible and could easily be updated to include any new piece of data or requirement. As in the example with the countries, it would be easy to prepare the analysis per group of countries in Europe if the total Europe figure is replaced by figures for DACH, IIG, etc. The dictionary has to be updated with the relations between a country name and the group it belongs to. All the formulas that extract or aggregate the source data as well as the ones enriching the sources would remain the same and the analysis has be extended to include the new reference data easily.
I have used this approach for some time now for reports and models that need regular update of data and requirements and with saved some frequently used dictionaries it significantly reduces the data processing and setup time.
No comments:
Post a Comment