Data comes to the spreadsheet in a number of ways. It could be a short list of quarterly sales per department or a long list of customers and their balances copied from another workbook. It could be imported from a website or fed from a database query. If you are lucky, your spreadsheet will require one-off data loading. Unfortunately in most cases we need to review and update our spreadsheets. We then have to go back to the data and remember where and when it came from as well as many other details depending on the specific data source - it could be a query, report name, web-address and date extracted. The problem is even harder when you work with many spreadsheets serving different customers and purposes. This is why I think that proper management of data sources is an important step towards better models and better management of spreadsheets. Here’s how I do that.
What To Do?
I know that "data source management" could sound scary and is yet another thing to do in our busy daily routine. However, little efforts today could save (and do save) lots tomorrow. My simple rule of thumb for organizing data sources: make it absolutely clear what the data is, where it came from and how. On each spreadsheet I put a small header with description of the sheet and the data source as well as some other relevant notes. This is enough and takes just a minute.
Each data source should be accompanied with :
- Description of the data (what the data is for - sales, units, value, what currency and exchange rate, etc.)
- Data origin
• Provider – include link to source file, website URL, name of provider, database name, etc.
• Details for extraction - include info on the filters applied to the original data, report name if it comes from an information system, SQL query, etc.
• Date - date when downloaded or produced
• Specifics - any additional notes that are important, e.g. exchange rate, relevant notes on reference data, etc.
- Columns (fields) description - some sources come with column names that are not informative enough and it is a good practice to have a clearer description.
Depending on the specific situation,the list could be extended. Of course, most of the sources do not need all the details - sometimes they are well known and simply a name or title would be enough. The emphasis should be on practicality. The timing is of essence - the best moment to put the description is when the data is being populated - otherwise you could end up with wrong or missing details which could be much more dangerous than no details at all.
In my experience the clear identification of data sources in Excel spreadsheets should become a practice that becomes second nature. Having the discipline to spend a couple of minutes to punch in some short details will bring good returns over time.