Nov 21, 2013

Faster Large Data Spredsheets: Alternative to the IF() Function

Spreadsheets with large data and large number of calculations are  part of the business landscape no matter we like it or not. These spreadsheets require greater focus on the design and the way calculations are performed. One of the culprits for poor performance in this type of spreadsheets is the overuse of the IF() function.This is probably the first of Excel function we learn and one we use the most, often neglecting its negative impact on performance. Recently I have done a project that required a calculation intensive spreadsheet model over relatively large data and good portion of these calculations were conditional. The effort for squeezing out the best performance led me to rethinking the way I handle the conditional formulas and I found an alternative of the IF() function that reaped great benefits. 


The idea
The alternative is based on the way Excel treats TRUE and FALSE in formulas and some basic Boolean algebra. Excel automatically converts TRUE to 1 and FALSE to zero in formulas, , i.e. expressions that return TRUE or FALSE are treated as 1 and 0 values. I will skip theory and jump to explanation through practical examples as a faster way to convey my idea. Let see a very simple case:
=IF(A1=20,100,0)
where we like to see the result of 100 when the condition of cell A1 having value of 20 is met.
This formula could be replaced by the Boolean expression:
=(A1=20)*100
The expression (A1=20) returns TRUE when cell A1 has value of 20 and FALSE otherwise. Excel treats these as 1 and 0 and returns 100 and 0 respectively. 
Any type of comparison could be used in the brackets: greater (>), smaller(<), greater or equal to(>=), smaller or equal to(<=). The condition could also be on text data, e.g.
=(Country="UK")*100

Any expression that returns TRUE of FALSE could be used. 

More conditions - AND, OR, BETWEEN
The technique allows using more than one condition at a time. These are usually tackled with AND(), OR() and lookup functions.

- Logical AND is performed by multiplication of two conditions. It returns TRUE only when both conditions are TRUE and FALSE in any other case.
Example for AND:
=IF(AND(A1=40,B1="L"),C1,0)
The formula results in the value of cell C1 when two conditions are met - cell B1 to be "L" and cell A1 to have value of 40. The alternative for faster calculations is
=(A1=40)*(B1="L")*C1



- Logical OR is performed by adding the two conditions. It returns TRUE when one at least one conditions is TRUE and FALSE only when both are FALSE .
Example for OR:
=IF(OR(A1=40,B1="L"),C1,0)
The formula results in the value of cell C1 when two conditions are met - cell B1 to be "L" and cell A1 to have value of 40. The alternative for faster calculations is:
=((A1=40)+(B1="L"))*C1
In contrast to the example for AND, the multiplication of the conditions is replaced by an addition - this is the Boolean expression for OR.

- BETWEEN: This technique allows more complex comparison as BETWEEN to be implemented easier than by the IF() function:
=(A1>10)*(A1<100)*C1
This formula returns the value of C1 when the value of A1 is between 10 and 100

The FALSE part
The examples so far were focused on getting the TRUE part of the IF statement. Getting the FALSE is a no-brainer as the technique is the same but the conditions are in reverse. For example, the reverse of (A1=100) is (A1<>100). 
If both outcomes are required in the formula, include both conditions, e.g
=IF(A1=20,100,1) 
converts to 
=(A1=20)*100+(A1<>20)*1

How to handle the text
Multiplication of a comparison result by a cell value or a constant works OK for numerical values. Text requires a different approach and help comes from the REPT() function that repeats a text by a parameter number. Set the number of repetitions to be a comparison as described above and you will get an IF() statement for text:
=IF(A1=100,"OK","") converts to
=REPT("OK",(A1=100))
The formula returns "OK" only when cell A1 has value of 100 and sets the number of repetitions to 1 and empty string otherwise when the number of repetitions is 0. Same logic as the one described for numbers applies as well.




There is no limitation of comparison that could be put together. A nice side effect is the simplification of more complex conditions and increasing readability as some nested IFs could be reworked by this technique.The performance gains depend on the file size and number of affected calculation cells. of course. The spreadsheet that spurred this technique had about 1000 rows and hundred columns with about 3000 cells with IF() and considerably benefited from its application. It is not a good practice to work with such large spreadsheets but everyone would agree that sometimes we just have to. I hope we have another trick in the sleeve to create better product for these cases.

No comments:

Post a Comment