13- Introducing the FILTER() Function, and Disconnected Tables Continued
When to Use FILTER()
Simple rule: use FILTER() when, in a <filter> argument to CALCULATE(), you need to perform a more complex test than “<column> equals <fixed value>” or “<column> greater than <fixed value>,” etc.
Examples of <filter> tests that require you to use FILTER():
• <column> = <measure>
• <column> = <formula>
• <column> = <column>
• <measure> = <measure>
• <measure> = <formula>
• <measure> = <fixed value>
I used “=” in all of the above, but the other comparison operators (<, >, <=, >=, <>) are all implied.
You can also use FILTER() as the <table> argument to functions like COUNTROWS() and SUMX() in order to have those functions operate on a subset of the table rather than all rows in the current filter context. This chapter will focus on its primary usage however, which is as a <filter> argument to CALCULATE().
FILTER() Syntax
FILTER(<table>, <single “rich” filter>)
Why is FILTER() Necessary?
I mean, why can’t we just slap any old complex test expression into the <filter> argument of CALCULATE()? Why the extra hassle?
I have made my peace with having to use FILTER(). I quite like it. Here’s why.
It’s All About Performance (Speed of Formula Evaluation)
Short answer:
1. Formulas written using just CALCULATE() are always going to be fast, because CALCULATE() has built-in “safeties” that prevent you from writing a slow formula. “Raw” CALCULATE() refuses richer <filter> tests because those can be slow if used carelessly.
2. FILTER() removes those safeties and therefore gives you a mental trigger to be more careful – you can still write fast formulas using FILTER(), but if you are careless you can write something that is slow.
I’d like to introduce three terms that I often use when I talk about formula speed:
Performance: the practice of keeping your reports fast for your users. For instance, if someone clicks a slicer and it takes 30 seconds for the pivot to update, I would refer to that as “poor performance.” If it responds instantly, I might call that “excellent performance,” or I might say that the pivot “performs well.”
Response time: the amount of time it takes a report to respond to a user action and display the updated results. In the example above, I described a “response time” of 30 seconds as poor. Generally we try to keep response times to 3 seconds or less.
Expensive: an operation is said to be “expensive” if it consumes a lot of time and therefore impacts performance/response time. For instance, above I could have described <column> = <static value> tests as “inexpensive” for the DAX engine, and richer comparisons like <column> = <measure> as “potentially expensive.”
I will say more about these concepts in a subsequent chapter dedicated to Performance. For now this is enough.
Anyway, the important thing to understand is that FILTER() removes the safeties and lets you perform an incredible variety of filter tests, but you have to be careful when you use it.
How to Use FILTER() Carefully
You are going to love this, because the vast majority of “being careful” comes down to two simple rules:
1. When you use FILTER(), use it against Lookup tables, never against Data tables.
2. Never use FILTER() when a “raw” CALCULATE() will get the job done.
Pretty simple. For those of you who want to know more about the “why” behind that first rule, I am saving that for the chapter on Performance.
Applying FILTER() in the “Thresholds” Example
Revisiting the Successful Formula
Let’s return to our “thresholds” example from the previous chapter, where we wanted to only include products whose Products[ListPrice] column was >= our [MinListThreshold] measure:
Figure 158 Back to the “threshold” example: only including products whose ListPrice is >= the selection on the slicer.
The formula I ended up using for the measure on the right was:
[Product Sales Above Selected List Price] =
CALCULATE([Total Sales],
FILTER(Products,
Products[ListPrice]>=[MinListThreshold]
)
)
Am I following the rules for using FILTER() carefully? Let’s check.
1. Products is a Lookup table, not a Data table (like Sales). YES on rule #1.
2. I am comparing Products[ListPrice] to a measure, which cannot be done in raw CALCULATE(). YES on rule #2.
OK, so now the [Products Above Selected List Price] measure – that gives me a count of products that pass the [MinListThreshold] test, and it’s executed the same way as the measure above.
First though, I need a base measure that just counts products:
[Product Count] =
COUNTROWS(Products)
Note how I assigned that measure to the Products table, since it counts rows in that table:
Figure 159 [Product Count] measure is assigned to the Products table since its arithmetic operates on the Product table (best practice)
Now I can create [Products Above Selected List Price] using that new base measure:
[Products Above Selected List Price] =
CALCULATE([Product Count],
FILTER(Products,
Products[ListPrice]>=[MinListThreshold]
)
)
I could have skipped the separate step of defining the [Product Count] measure, and just specified COUNTROWS(Products) as the first argument to CALCULATE(). But [Product Count] is likely to be a useful measure elsewhere too, and remember, it’s a best practice to build measures on top of other measures, so that future changes to your model can be made in a single place.
Verifying That the Measures Work
Well the measures are returning some numbers, but are they the right numbers? Let’s investigate a little bit (I won’t do this for every measure but I think it’s good to show a few validation approaches).
Figure 160 How do we know the measures are correct?
The first thing to do is just change slicer selection and make sure that it has an impact. Let’s try $20 as our minimum list price:
Figure 161 We would expect both measures to return larger numbers with $20 as the selected threshold, and they both do
A good sign. But let’s make sure that the measures are truly counting the right products. Let’s put Products[ProductKey] on Rows, and set the slicer to $3,000 since that should only show us a small number of products:
Figure 162 Only 13 products show up – another good sign
But we really need to see the ListPrice. Let’s put that on Rows too:
Figure 163 OK, all of the products showing up are indeed priced over $3k
Lastly, over in the PowerPivot window, let’s filter the Products table to ListPrice>=3000:
Figure 164 This should result in 13 rows, matching the grand total from the pivot…
Figure 165 …and it does
OK, this last step probably would be the first thing I would check. But I wanted to show that both the PowerPivot window and the pivot itself are important tools for validating/debugging. I use both.
Since both measures use the same FILTER() logic, once we validate this one, we can be pretty confident that the other is working too.
So there you have it: a simple threshold example driven by slicer, and it works.
This Could Not Be Done with Relationships
Just to reinforce: the disconnected table approach was absolutely necessary for this threshold example. A given product, like a $75 shirt, belongs to many different price ranges – it is included in the $0, $5, $10$, $15, $20, and $50 price ranges. (In other words, the price ranges overlap with each other).
To see what I mean, imagine creating a column, in the Products table, to form the basis of the relationship. What would that column look like? If you committed to going down this road, you’d ultimately end up with multiple rows for each product (one for each price range that product “belongs to”). That would therefore require a “many to many” relationship with the slicer table (and with the Sales table), which PowerPivot does not support.
Tip: Measures Based on a Shared Pattern – Create via Copy/Paste
Notice how the two FILTER() measures above are identical except for their base measure? One uses [Total Sales] as the first argument to CALCULATE() and the other uses [Product Count], but otherwise the formulas are the same.
You will do this all the time. And there’s a quick way to do it:
1. You write the first measure. In this case, the [Total Sales] version.
2. Then you right click that measure in the field list (or in the Values dropzone) and choose edit:
Figure 166 Edit your first measure
3. Copy the existing formula:
Figure 167 Copy the existing formula, which is conveniently selected already when you edit an existing measure
4. Cancel out of the editor, create a new measure, and then paste the formula:
Figure 168 Paste the original measure’s formula
5. Lastly, just replace the base measure reference ([Total Sales]) with the different desired measure ([Product Count]):
Figure 169 The whole process takes just a few seconds
You would discover this “trick” on your own pretty quickly (if you haven’t already), but I do it so often that I wanted to make absolutely sure you are aware of it.
More Variations on Disconnected Tables
Upper and Lower Bound Thresholds
Let’s take that Product[ListPrice] threshold example and extend it. Here’s a new table:
Figure 170 A new disconnected table, but this time with min and max price columns
Note again that the price tiers overlap, meaning a given product can belong to more than one, thus making a relationship impossible.
Now I’m going to define two “harvester” measures on that table:
[PriceTierMin] =
MIN(PriceTiers[MinPrice])
and
[PriceTierMax] =
MAX(PriceTiers[MaxPrice])
Now I’m going to use RangeName column as my slicer:
Figure 171 You can use a label column from a disconnected table on your pivot. Both “harvester” measures again capture the user’s selection, but this time based on columns that the user does not see.
Fixing the Sort Order on the Slicer: The “Sort By Column” Feature
In our first threshold example, we used a numerical field on the slicer, which naturally sorted from smallest to largest. In this label example however, “Budget” alphabetically precedes “Counter,” and out sort order is misleading as a result.
In PowerPivot v1, we had to “fix” this by prepending strings for correct sorting, yielding slicers with values like “1 – Counter” and “2 – Budget” on them. Yuck.
In PowerPivot v2 however, we have a much better fix: the Sort By Column feature.
First we need a single numerical (or text) column that sorts the table in the proper order.
Doesn’t matter how you go about creating this column – as long as you create one (or already have one), it works.
In this case I will use a new calculated column:
Figure 172 Creating a column that will sort properly (in this case, my column will be the midpoint of each price tier)
Now I select the RangeName column and click the Sort by Column button on the ribbon:
Figure 173 Select label column, click Sort by Column
In the dialog, set it to sort by the new MidPt column:
Figure 174 Set the “sort by” column to the MidPt column
Flip back over to Excel:
Figure 175 Changing the Sort By Column triggers the “refresh” prompt
Click the refresh button and the slicer sort order is fixed:
Figure 176 Refresh, and the Sort By Column feature “kicks in” – the slicer is now properly sorted
Completing the Min/Max Threshold
Now, just like in the simple threshold example, we need versions of [Product Count] and [Total Sales] measures that respect the user’s selection on the slicer:
[ProductCount MinMaxTier] =
CALCULATE([Product Count],
FILTER(
Products, Products[ListPrice]>=[PriceTierMin] &&
Products[ListPrice]<=[PriceTierMax]
)
)
Since FILTER() only supports a single <rich filter> expression, I use the && operator – a row of Products table needs to meet both of those comparison tests in order to be included.
But since CALCULATE() itself does support multiple <filter> arguments, I could have done this without the && operator by using two FILTER() functions:
CALCULATE(<measure>, FILTER(…), FILTER(…) )
That would yield the same results. I use the && approach whenever I can though, because it is less expensive (in terms of performance) to do so. More on this later.
And then the [Total Sales] version, again employing the “copy/paste/change base measure” trick:
[Total Sales MinMaxTier] =
CALCULATE([Total Sales],
FILTER(
Products, Products[ListPrice]>=[PriceTierMin] &&
Products[ListPrice]<=[PriceTierMax]
)
)
Now I’ll put both measures on the pivot, and remove the harvester measures:
Figure 177 It’s alive! :-)
A Way to Visualize Disconnected Tables
Disconnected tables, by definition, have no relationships to other tables in the model. If we look at diagram view, we see that the PriceTiers table, for instance, is an island like we expect:
Figure 178 PriceTiers Disconnected Table has no Relationship Arrows (as expected)
But when we use the “MinMaxTier” measures that we wrote above, the PriceTiers table does act a lot like a Lookup table, since the PriceTiers filter context (such as user selections on the slicer) very much impacts the measure calculations and results.
So I often like to say that disconnected tables have a “dotted line” relationship with the tables that contain the corresponding FILTER() measures. In your head, you might think of it like this:
Figure 179 In your head, you can imagine “dotted line” relationships
Disconnected tables only impact the measures that are specifically written to “pay attention” to them – so the PriceTiers table impacts [ProductCount MinMaxTier] and [Total Sales MinMaxTier], but no other measures in the Products and Sales tables
Putting This Chapter in Perspective
A couple things I want to emphasize before moving on:
• We are not done with FILTER(). There’s more to learn about FILTER() than what I have covered here, but I want to come back to those points later. It is not essential to learn the rest yet, and I am sticking to my philosophy of introducing things to you in the most learnable/useful order.
• You will not use disconnected tables most of the time. 90% of the pivots I create do not use disconnected tables. The other 10% of the time, they are very, very useful. I introduced disconnected tables in these last two chapters in large part because they are a great introduction to the FILTER() function (and also because they are a useful technique).