Apr 8, 2013

Generate All Combinations Of Reference Data

Data preparation and analyses in Excel often require a block with formulas based on combinations of reference data items. Populating large ranges with a formula is not a problem and the pesky problems is the generation of reference data combinations. For example, you need all the combination of region, segment and technology to generate the a sales report. This is not a problem in the case of few items in each reference data type - a quick manual copy and paste does the job. However, the case of  are tens of items in each list is time consuming and unpleasant to approach manually. You could avoid that with simple logic and formulas. Read further to see what I propose you can do.

Do You Have It Available From Existing Report Or Other Source?
Before anything else, make sure you cannot take the combinations from existing report, generate them with a pivot table or use a database query. If you are on the manual report track, a pivot table for data is not suitable but consider using it just to get the all the combinations of the reference data. Same with the database query.

Use Formulas
For the sake of the example, let assume the there are three types of reference data - region with 7 element, segment with 3 and technology with 2 elements. The idea is to put together formulas to generate consecutive item numbers for each data type and then to use a look-up to get the names. The steps to follow are:

  • Copy and paste the three data lists on a spreadsheet and make sure that all the duplicates are removed and names are set as you would like to see them in the report. Put a number on the left of each of the items in the lists.
  • Setup 3 columns - one for each of the data pieces. These columns will hold the number of the item. I will use the easiest case when the data types are ordered in columns from left to right by the number of items in each one of them starting from the least one. In the example,  the technology is in the leftmost column as it has 2 items, then segment with its 3 items and the last is the region with 7 items.
  • Put 1 as value in first row for each column
  • Set the formula for region counter. The idea is to generate a consecutive region IDs and restart the counter each time the total region count is reached.The formula is simple - if the region ID on the row above equals the total regions, then set it the current ID to be 1 else it is the ID from the row above plus one. If the fist row is row 3 and region is in column K, the formula would be 
  • Setup counter for the segment. We would like the segment ID to stay the same as in the row above until the region ID has not reached it max count. Then set it plus one if it has. In case of both region and segment counts have reached their max counts, the segment ID is set to 1 to restart counter. The formula again is not complex 
  • Setup counter for the technology. The logic is to keep the same ID as on the row above until both region and segment reached their max counts when the current row gets the value in the previous row plus one
  • Extend the formulas to cover all the cases. You could do that by dragging the formulas till cover all the combinations. Another approach is the calculate how many rows there should be, go in the address bar and navigate to a cell in column I, set a mark there, then go back to your formulas, copy the three cells in columns I to K, then the down arrow (without releasing the Ctrl and Shift keys) and paste. In this example there are 42 (7*3*2) combinations so set a marker in cell I44 (which is 42 combinations plus 3 which is the number of first row minus 1 as we have first combination set).
  • Check if formulas are covering all the combinations and no ID  is missing. 
  • For each data type, write a simple VLOOKUP formula to extract the name for that ID. After populating it for all rows, the list of all combinations is set.
  • Move columns to match the order to be in place in the report.
  • Copy and paste as values in the sheets where the list would be used.

This is a very simple example and you could face more complicated situations, but the principle would be the same.
Tip: keep the generated list for further use. In case data items change, simply change the values in the lists and expand the ranges if necessary.

No comments:

Post a Comment