MDataMonkeyeBookNoIconsForEPub-24

Chapter 22 Advanced Conditional Logic

In Chapter 18, you learned how to replicate the functionality of Excel’s IF() and IFERROR() functions. Of course, the IF() and IFERROR() functions are not the only functions in Excel that you use to implement conditional outputs or matches.

Unlike the formulas examined previously, replicating other formulas in Excel’s conditional logic library involves a bit more complicated techniques, including using lists and custom functions.

Multicolumn Logic

Occasionally, as you are trying to filter down raw data, you may need to filter based on multiple criteria. Consider the following data set, which is included in the Ch22 Examples\Multi-Column Logic.xlsx file:

Figure 288 A raw data set.

Replicating Excel’s OR() Function

What if your boss asked you to list only data where you sold a Talkative Parrot or the item was sold by Fred? If this were an and scenario, it would be easy: Just filter the Inventory Item column to Talkative Parrot and then filter the Sold By column to Fred. But that won’t work in this case because you’d lose any Talkative Parrots sold by John or Jane, and you’d also lose any other items sold by Fred.

If the data is stored in an Excel table, you could solve this by using Advanced Filter or by adding a column using the following formula and filtering to only results that are listed as true:

=OR([@[Inventory Item]]="Talkative Parrot",[@[Sold By]]="Fred")

But what if the data doesn’t originate in an Excel table? What if it is sourced from a database, a web page, or a text file? Plainly, that won’t work.

Pull the data into Power Query to see how you should approach this:

Select any cell in the table on the May worksheet
Create a new query → From Table
Right-click the Date column → Change Type → Date

Since you can’t filter your data without losing required records, you’re going to need to add a custom column and apply a formula to examine each row:

Go to Add Column → Add Custom Column
Name the column Match?

You know that you’re going to need to test whether a certain criterion is true or false, using a basic formula framework along the lines of the following:

= if logic_test then "Meets Criteria!" else "No Match"

The challenge here is coming up with that logic test. Power Query doesn’t have an OR() function, so what is the equivalent?

When you’re looking for text comparisons, it is helpful to quickly scan the list of list functions contained in the Power Query formula categories.

Note: Remember that you can access the list of Power Query formulas by clicking the Learn About Power Query formulas link at the bottom of the Custom Column dialog and then clicking the Power Query formula categories link partway down the page. You end up at the https://msdn.microsoft.com/en-us/library/mt296612.aspx web page.

Within the list functions, you’ll find that there is a List.AnyTrue function, which sounds somewhat promising. Selecting that item in the documentation reveals the following example:

List.AnyTrue({2=0, false, 1 < 0 }) equals false

Based on this, you can see that the function contains a list of values, as indicated by the curly braces within the parentheses. It also shows that the function will return false if none of the items are true.

Try to nest this formula in place of the logic test:

=if List.AnyTrue(

{[Inventory Item]="Talkative Parrot",[Sold By]="Fred"}

)

then "Meets Criteria!"

else "No Match"

Note: Remember that you need to separate the criteria with commas and then surround the entire list of criteria with curly braces because the List.AnyTrue() function requires a list as a parameter.

Upon clicking OK, you can see that the formula returns your message in any case where the sales item equals Talkative Parrot or the Sold By field holds Fred:

Figure 289 Replicating Excel’s OR() function using List.AnyTrue().

Since the function returns true if any of the criteria are true, any instances where Fred sold a Talkative Parrot would also display the result Meets Criteria! in the Match? column.

You can now finalize this query by taking the following steps:

Filter the Match? column to only include only Meets Criteria! values
Change the query name to pqOR
Go to Home → Close and Load

Power Query returns a table containing a total of 88 rows out of the original 332.

Replicating Excel’s AND() Function

While it’s true that you can reduce records by filtering a table column-by-column, what if you only want to tag records where a condition exists in multiple columns? For this you need to replicate Excel’s AND() function.

The trick to this is essentially the same as the trick with replicating the OR() function with List.AnyTrue(), except that for AND() you need to use the List.AllTrue() function. This function returns a true value only if every logical test provided returns a true value—just like Excel’s AND() function.

Take a look at how it differs from the previous function:

Go to the Workbook Queries pane → right-click pqOR → Duplicate
Rename the query name pqAND
Select the Added Custom step → click the gear icon (to edit the formula)
Replace List.AnyTrue with List.AllTrue
Select the Filtered Rows step

As you can see, the results are quite different than the results you get by using the original List.AnyTrue() function:

Figure 290 Using the List.AllTrue() function to tag records based on multiple-column criteria.

While this example obviously filtered the data based on the results from the formula output, the great thing about this function is that you can easily tag records without filtering first. This allows you more flexibility in building more complicated logic while still preserving your original data—something that can’t be done if you filter columns to drill into the end values.

Replicating Excel’s VLOOKUP() Function

As much as some people try to avoid VLOOKUP(), it is an incredibly useful function for Excel pros. Those who love it will certainly want to replicate its functionality in Power Query at some point. However, depending on which version of VLOOKUP() you need, it can be quite tricky to implement.

VLOOKUP() Using an Exact Match

In truth, you don’t need to do anything special to emulate VLOOKUP’s exact match, as this functionality can be replicated by simply merging two tables together, as described in Chapter 9.

VLOOKUP() Using an Approximate Match

Replicating VLOOKUP()’s approximate match is a totally different case than the exact match scenario. It requires some logic to emulate those steps because you’re not trying to match records against each other but actually trying to find the closest record to our request without going over. While you won’t create the function from scratch here, you will see the function and how it works.

To get started, open Ch22 Examples\Emulating VLOOKUP.xlsx. In this file you’ll find two tables: BandingLevels and DataTable.

Figure 291 The BandingLevels table.

Figure 292 The DataTable table.

If you review the functions in columns B:D of the DataTable table, you’ll see that they contain VLOOKUP() functions, as shown in the column headers. Each column is looking up the value shown in column A for that row against the BandingLevels table. Columns B and D are returning the value from column 2 of the BandingLevels table, and column C is returning the value from the Alt Band column of the BandingLevels table.

In addition, notice that columns B and C are returning approximate matches because the fourth parameter has either been set to True or omitted. Column D, however, is asking for an exact match (as the fourth parameter has been set to False), resulting in all records returning #N/A except for the very last one.

You should set up the Power Query function we need now, and then you’ll see how it replicates Excel’s version of VLOOKUP():

Using Windows Explorer go to Ch22 Examples\pqVLOOKUP.txt
Open the file and copy the entire contents of the file
Return to Excel
Create a new query → From Other Sources → Blank Query
Open the Advanced Editor → select all the code in the window
Paste the contents of the text file (replacing all of the existing code)
Click Done
Rename the function pqVLOOKUP
Go to Home → Close & Load

With the function created, you need a pointer to the BandingLevels table:

Select any cell in the BandingLevels table → create a new query → From Table
Go to Home → Close & Load To… → Only Create Connection

We are now ready to see how it works. Pull in the DataTable, and remove all of the Excel versions of the functions.

Select any cell in the DataTable table → create a new query → From Table
Right-click the Values column → Remove Other Columns

You are now down to a single column of data:

Figure 293 Ready to try the pqVLOOKUP function.

To see if the pqVLOOKUP function works for you, you can try to replicate the following formula:

=VLOOKUP([Values],BandingLevels,2,true)

To do that, you can take the following steps:

Go to Add Column → Add Custom Column

Name the column 2,True

Use the following formula:
=pqVLOOKUP([Values],BandingLevels,2,true)

The results are identical to what Excel would show:

Figure 294 Replicating VLOOKUP() with an explicitly defined approximate match.

This looks good. Now try leaving the ,true off the end and returning the third column from the lookup table instead of the second:

Go to Add Column → Add Custom Column
Name the column 3,default
Use the following formula:
=pqVLOOKUP([Values],BandingLevels,3)

The results are again identical to what Excel would show:

Figure 295 Replicating VLOOKUP() with an implicit approximate match.

Try one more. What if you wanted to define an exact match against the second column of the lookup table? To do this:

Go to Add Column → Add Custom Column
Name the column 2,false
Use the following formula:
=pqVLOOKUP([Values],BandingLevels,2,false)

Once again, the results are spot on with what Excel delivers:

Figure 296 Replicating VLOOKUP() with an exact match.

Warning: Even though you can use this function to emulate VLOOKUP()’s exact match, you shouldn’t. The reason is that you can accomplish an exact match effect by merging tables together—a method that will be much faster. If you need the approximate match functionality, however, this is a viable method.

Finalize the quer y:

Go to Home → Close & Load

At this point, you should be aware of one minor difference between Excel’s VLOOKUP() and the pqVLOOKUP function: the #N/A value returned by pqVLOOKUP is actually text, not a true error, as you can see below.

Figure 297 pqVLOOKUP’s #N/A “errors” are actually text.

Returning text is as close as you could get when returning error, as there is no way to output a true #N/A error in Power Query.

Understanding the pqVLOOKUP Function

So how does the pqVLOOKUP function work? Take a look at the code:

(lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical ) as any =>

let

/*Provide optional match if user didn’t */

matchtype =

if approximate_match = null

then true

else approximate_match,

/*Get name of return column */

Cols = Table.ColumnNames(table_array),

ColTable = Table.FromList(Cols, Splitter.SplitByNothing(), null,

null, ExtraValues.Error),

ColName_match = Record.Field(ColTable{0},"Column1"),

ColName_return = Record.Field(ColTable{col_index_number - 1},

"Column1"),

/*Find closest match */

SortData = Table.Sort(table_array,

{{ColName_match, Order.Descending}}),

RenameLookupCol =

Table.RenameColumns(SortData,{{ColName_match, "Lookup"}}),

RemoveExcess = Table.SelectRows(

RenameLookupCol, each [Lookup] <= lookup_value),

ClosestMatch=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},"Lookup"),

/*What should be returned in case of approximate match? */

ClosestReturn=

if Table.IsEmpty(RemoveExcess)=true

then "#N/A"

else Record.Field(RemoveExcess{0},ColName_return),

/*Modify result if we need an exact match */

Return =

if matchtype=true

then ClosestReturn

else

if lookup_value = ClosestMatch

then ClosestReturn

else "#N/A"

in Return

The code is fairly long and complex, and it uses a variety of tricks, but the basic methodology is this:

1. Pull in the data table.

2. Sort it descending by the first column.

3. Remove all records greater than the value being searched for.

4. Return the value in the requested column for the first remaining record unless an exact match was specified.

5. If an exact match was specified, test to see if the return is a match. If it is, return the value. If it is not, return #N/A.

Note the following in the code:

Each of the variables in the parameters has an explicit data type declared. This is to prevent the user from accidentally providing a table where a numeric column number is required.
The approximate_match variable is defined as optional, meaning the user can ignore it.
The matchtype variable tests to see if a match type was specified. If a match type was specified, it is assigned to the matchtype variable, but if not (approximate_match is null), a value of true will be assigned.
The name of the column to be returned is pulled by reviewing the column headers of the table, splitting them into a list of records, and pulling out the record whose index matches the requested column (less 1 to adjust to base 0).
The data is sorted in descending order, based on the column to be searched. All records greater than the requested value are removed (by selecting all rows where the value is less than or equal to the value being searched for).
If no rows remain, a #N/A result is then stored, but if there are rows, the first record in the lookup column is stored. This result can later be checked to see if it matches the record being searched for (which is important for the exact match scenario).
The approximate match value is then (always) calculated, even if an exact match was requested. If no rows are present in the data set, a #N/A result is stored; otherwise, the closest value is pulled from the return column.
The final test checks the type of match requested. If it is an approximate match, then the closest match is always returned (which may be #N/A). If, however, the match type was an exact match, the code will return #N/A instead of the closest match unless the lookup column’s value matched the value being sought exactly.

It should go without saying that this is not a function you’ll knock out in a few minutes. It is long and complicated, and it took several hours of development and debugging in order to get it correct. It is, however, a fairly robust function in the way it works, and it showcases how to build complex functions using Power Query.

Replicating Power Pivot’s SWITCH() Function

Power Pivot has a function called SWITCH() that allows you to perform multi-condition logic by declaring a table of index values and results and then passing in a variable index value. The function then looks up the provided index value and returns the matching result. This is easier to maintain than several levels of nested IF() statements, so it sometimes makes sense to replicate the SWITCH() function in Power Query.

The syntax for this function in Power Pivot is as follows:

=SWITCH(expression,value_1,result_1,[value_2,result_2],…,[Else])

One example of where this can be really useful is when breaking down encoded patterns like customer billing codes, where each character represents something specific. Take a code like the MP010450SP, where the ninth character could be one of the following:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted,

I = Inactive, L = Social, M = Medical, U = Regular

To break this apart in Excel, you could build a function with many nested IF statements and build a VLOOKUP() based on the letter options. In Power Pivot, though, it’s much easier with the SWITCH() function, as follows:

=SWITCH([Column],"E","Employee","S","Yacht Club",

"N","Non-Taxable","R","Restricted","I","Inactive",

"L","Social","M","Medical","U","Regular","Undefined")

Note: There are several ways to accomplish this goal. You could, for example, extract just the ninth letter and merge the results against a table. The purpose of this section is to give you yet another alternative.

Building a Power Query SWITCH() Function

Building the function isn’t overly difficult once you know the basic structure. Here’s how you get started:

Open Ch22 Examples\Emulating SWITCH.xlsx
Create a new query → From Other Sources → Blank Query
Name the query fnSWITCH
Go to Home → Advanced Editor
Enter the M code shown below:
(input) =>
let
values = {
{result_1, return_value_1},
{input, "Undefined"}
},
Result = List.First(List.Select(values, each _{0}=input)){1}
in
Result

This code is the basic framework for any SWITCH() function. These are the key parts to recognize here:

result_1 is the first of the possibilities that you may pass to the function
return_value_1 is the value that you’d like to return if the first value is result_1
If you need more values, you just insert another comma after the {result_1, return_value_1} section and put in a {result_2, return_value_2} section
You can keep adding as many values as you need
The input value in the list will return the text Undefined if the value you pass isn’t in your provided list of options (it’s the Else portion of the SWITCH() statement)

Using this structure, you can modify the fnSWITCH function for your scenario as follows:

(input) =>

let

values = {

{"E", "Employee"},

{"S", "SCYC"},

{"N", "Non-Taxable"},

{"R", "Restricted"},

{"I", "Inactive"},

{"L", "Social"},

{"M", "Medical"},

{"U", "Regular"},

{input, "Undefined"}

Result = List.First(List.Select(values, each _{0}=input)){1}

Result

The changes you made here were simply to replace value_1 with “E” and return_value_1 with “Employee” and then to add more list pairs of potential inputs and desired values to return. Note that you’re not restricted to looking up single characters. You can look up values or longer text strings just as easily; just make sure that your options are always entered in pairs between curly braces and have a comma at the end of the line.

When you’re done making the modifications:

Click Done
Go to Home –> Close & Load

Using the fnSWITCH Function

Now that the fnSWITCH() function has been created, you can use it to extract the billing type from each customer record in this file.

Select any cell in the table on the Customers worksheet → create a new query → From Table

The data loads into Power Query as follows:

Figure 298 The raw data tab le.

The fnSWITCH() function is designed to convert the ninth character from the BillingCode into the corresponding customer type. In order to use it, you need to extract that character:

Go to Add Column → Add Custom Column
Name the column Customer Type
Enter the following formula:
=fnSWITCH(Text.Range([BillingCode],8,1))

Note: Remember that you need to start at character 8 in order to get the ninth character because Power Query uses base 0 indexing for this parameter of the Text.Range function, as shown in Chapter 17.

The results are perfect:

Figure 299 Results of the fnSWITCH function.

You’ll find that all the sample codes in the data work just fine and that nothing comes back as undefined. Follow these steps to see how the query reacts to different items:

Change the query name to Billing
Go to Home → Close & Load
Change the second-to-last letter of any billing code to something else

When you refresh the table, it evaluates the new character and returns the appropriate result. Here you can see what happens when the first record’s billing code is updated to MP010450XP:

Figure 300 X is not a valid character for the billing code.

Note that the function as written above is case sensitive, meaning that the code MP010450uP would also return Undefined, even though MP010450UP is a valid code. In the case of the original source of the data, this was entirely expected because valid codes are made up of uppercase letters.

If you wanted to accept either uppercase or lowercase, you would need to modify the Billing query (not the function) and force the results of the Text.Range function to uppercase:

Right-click the Billing query in the Workboook Queries pane → Edit
Select the Added Custom step → click the gear icon
Update the formula to read:
=fnSWITCH(Text.Upper(Text.Range([BillingCode],8,1)))
Click OK
Go to Home → Close & Load

As you can see below, this adjustment allows you to pass lowercase values into the function and still get a positive result:

Figure 301 The customer type is calculated consistently for uppercase and lowercase letters.