Chapter 3: Conquering Common Calculated Column Conundrums
For years, we’ve been saying that calculated columns aren’t a “strength” of Power Pivot as compared to normal Excel. Sure, the ability to write a calculated column in a table of more than 1 million rows is certainly a distinction, and we’ve even written calculated columns on tables with more than
100 million
rows. Hey, that’s pretty hot.
But if you set that one new capability aside, calc columns in Power Pivot do not enable you to
create
anything that you could not create in normal Excel. In the end, a calc column in Power Pivot produces a column full of values—just like a calc column in Excel.
By contrast, measures (called calculated fields in Excel 2013) absolutely
do
empower you to create dramatically more flexible, insightful, and useful results than are possible in regular Excel. When it comes to Power Pivot formulas, measures are the main attraction, and calc columns are the warm-up band.
In that light, Power Pivot calc columns often require more of an adjustment for long-time Excel users than do measures. The lack of “A1”-style references in the DAX formula language is much more noticeable in calc columns than in measures, and that sometimes leaves newcomers exasperated (“This was so easy in Excel, but I can’t figure it out in Power Pivot”).
Note
This confusion strikes at the moment when you first need to reference a value (or values) that
isn’t
in the current row. Up until that point, Power Pivot calc columns are in fact easier to write than normal Excel calc columns, thanks to the named reference syntax. So “current row” calc columns are a dream, but what we call “cross-row” calc columns require some adjustment.
|
Hey, I (Rob) have been to “cross-row confusion land.” But I’ve grown to love Power Pivot calc columns over time, even for cross-row cases, and the intent of this chapter is to help speed you to that same happy place.
An Excel Pro’s Primer on Calculated Columns
If you’re an Excel pro trying to learn to use calculated columns in Power Pivot, you may face a few hurdles that could trip you up. There are 400 different functions in Excel, and only one-fifth of them work in Power Pivot—but it is definitely the best one-fifth of the functions. If you want to use
MID()
,
LEFT()
,
RIGHT()
,
IF()
, and other popular functions, go right ahead…they will work as you expect.
The TEXT()/FORMAT() Oddity
Whereas some functions work the same in Power Pivot as in Excel, there are some oddities. The
TEXT()
function in Excel is useful for converting dates to month names or weekday names, and you might eventually want to use this function. But for some reason, the Power Pivot team chose to use the VBA function
FORMAT()
instead of the Excel function
TEXT()
. We like to joke that is it simply a typo—they misspelled T-E-X-T as F-O-R-M-A-T—but there are some important differences between these two functions.
If you usually use
=TEXT(A2,"MMMM")
in Excel, this correctly translates to
=FORMAT([Date],
"MMMM")
in Power Pivot. Both of these functions take a date field and return a full month name, like January.
The
FORMAT()
function accepts a few format codes that
TEXT()
does not accept. For example,
Q
returns a quarter number (from
1
to
4
), and
W
returns a week number (from
1
to
51
). Such differences make
FORMAT()
superior to
TEXT()
, but the uninitiated may not understand this.
Where is VLOOKUP()?
The number-one function for many Excel pros is VLOOKUP(). This function is not available in Power Pivot, and for good reason: in most cases, you no longer need it! Relationships allow us to use fields from table A on the same pivot as fields from table B, without having to “merge” the two tables. But in cases when you truly do need to reference another table in a calc column, Power Pivot offers you several great methods like the RELATED() function (covered in DFPP) and LOOKUPVALUE() (covered in this chapter).
You Can’t Point to a Column by Using the Arrow Keys
There is no workaround for this. Excel pros should be thankful that Rob Collie was on the Power Pivot team. But Rob is too young. Younger Excel pros might build their formulas by clicking on columns with the mouse. Others might actually type to enter formulas. But older Excel pros will often point to a column using the arrow keys. This technique dates back to Lotus 1-2-3. For example, if you start in E2 and want to build the formula =B2*C2/D2, you can use the arrow keys technique as follows: =, left arrow, left arrow, left arrow, *, left arrow, left arrow, /, left arrow, Enter. It sounds horrible to read, but it is fast to do, once you get used to the technique. In our seminars, 70% of the room uses the mouse, 10% type the formulas, and 20% use the arrow keys to build their formulas. The Power Pivot grid allows you to use the mouse or to type a formula, but it does not support the arrow keys method.
No One Ever Named Their Kid “CalculatedColumn1”
There is no workaround for this. I’ve tried every possible syntax to specify a name and the formula at the same time (ie Month:=RELATED…), and there simply is no way to prevent CalculatedColumn1. In regular Excel, you might type a heading in row 1 and then a formula in row 2. In the Power Pivot grid, you cannot enter a column heading until you’ve added the formula. Once you add the formula, Power Pivot uses the default names CalculatedColumn1, CalculatedColumn2, CalculatedColumn3, and so on. We don’t really care for these names, but they’re easy to change: After you enter the first formula, you can right-click the heading of CalculatedColumn1, choose Rename, then type the field name you want.
Subtotaling Calc Columns and the EARLIER() Function
Say that you have a very simple table like this:
Figure 3.1
And you want to add a third column that is the total for each customer:
Figure 3.2
This is the calculated column formula for that third column:
=CALCULATE(SUM(Table[Amt]),
FILTER(Table,
Table[Customer]=EARLIER(Table[Customer])
)
)
If you don’t really understand row context (described in just a bit), you can simply follow the pattern laid out here to add that third column. We suggest, for starters, that you just use the formula above as a pattern, substituting your own table and column names. Then you can move on without worrying about understanding what you’ve done. It’s fine. You have our permission and encouragement to do so, especially in the early days.
Seriously, you can just move on to the next trick in this chapter. You only need to read the rest of this section if and when you are ready for the next level of enlightenment.
Oh, you’re still here? Okay then, here is an explanation of what’s going on in that subtotal calc column: Since there are four rows in the table, the calc column formula needs to “run” four times—once to calculate each row’s resulting value (in this case,
3
,
3
,
7
, and
7
, as pictured above). Within each one of those four steps, a reference to a column will evaluate to the value of the specified column from that “current” row. So, for instance, in the second row of the table, a reference to
Table[Amt]
will return
2
. Just like you want it to.
On the off chance that this seems complex already, we want to illustrate that there is nothing complex here
yet
because this is precisely the way you
already
understand things. Now say that you write another calc column, this time with the following formula:
=CONCATENATE('Table'[Customer], 'Table'[Amt])
You get, of course, precisely what you expect:
Figure 3.3
This is because, at each one of the four “steps” in the calc column (one step per row), a reference to a column is essentially limited to the current row. It would be quite bizarre if, in the first row, a reference to the
[Amt]
column evaluated to
2
! Of
course
you want it to evaluate to
1
.
As a calc column evaluates, once for each row in the table, there’s a fancy DAX term for the concept of “current row at each step.” It’s called
row context
. If we use this fancy new term to describe the simple concepts conveyed earlier, it goes something like this: “When evaluating a calc column in a four-row table, the DAX engine steps through four different row contexts, evaluating the formula once for each row context.” So far so good, right? Take a deep breath and slowly let it out…
The problem is that the formula for the TotalForCustomer column contains a
FILTER()
function, and
FILTER()
is quite sneaky. Just as the calc column formula itself is evaluated once per each row context,
FILTER('Table',…)
also steps through each row in
Table
, one at a time, much like the calc column formula itself!
One of the key characteristics of
FILTER()
is that it ignores the row context of the calc column and operates on the
entire
table. This makes it simultaneously confusing (at first) and awesome (once you grasp it).
Since
Table
has four rows, every time
FILTER('Table',…)
is evaluated,
FILTER()
has its own four steps to process. At each step, it looks at a row and says, “Hey, should I keep this row or filter it out?”
All in all, then, there are 16 steps involved in evaluating the calc column:
Figure 3.4
In this illustration, CC1 denotes “the first row context evaluated by the calculated column,” and F1 means “the first row context evaluated by
FILTER()
.”
You might want to imagine the “names” of each of the 16 row contexts examined by this calc column formula. They would be something like CC1/F1, CC1/F2, etc., all the way up through CC4/F3 and CC4/F4. (These names are completely unofficial. There are no names for individual row contexts in DAX. We are just using this as an explanatory trick.)
So there are four “calc column row contexts,” and tucked inside
each
of those four, there are
also
four “
FILTER()
row contexts”—an
outer
loop of four row contexts and an
inner
loop of four row contexts.
Now, within a
FILTER()
function, a “naked” reference to a column will evaluate according to the row context of the
FILTER()
function itself, not the calc column row context. Get it? Inside
FILTER()
, the reference to
'Table'[Customer]
will be evaluated according to that “inner” row context of F1, F2, etc.—completely ignoring the “outer” row context of CC1, etc.! (Remember that
FILTER()
has to look at every row in the table to see if it should be kept or discarded.) So even when the formula is evaluating in the CC2 “outer” row context (where
[Customer]="a"
), the “naked” reference to
'Table'[Customer]
inside
FILTER()
will sometimes evaluate to
"b"
—in contexts CC2/F3 and CC2/F4.
Within
FILTER()
, how do you “get back” to the outer loop and see what the value of
[Customer]
is in CC2? You use the
EARLIER()
function!
EARLIER()
basically says “pop out of the current row context and look at the outer (earlier) row context instead, when evaluating the value of the specified column.” You will therefore often hear us refer to
EARLIER()
as the
CURRENTROW()
function. This allows you, within
FILTER()
, to examine every row in the table and compare it to the current row of the calc column. That is, you can to focus on a given row of a table and then go look at
all
the rows in the table, saying to each row “Hey, are you similar to me? If so, I want to count you.”
And yes, it is possible (but
very
rare) to end up in situations where you have
more
than two nested loops of row contexts. That’s why
EARLIER()
has the optional second argument
Number
—because you can specify how many loops “outward” you want to step. When you omit that argument, it defaults to
1
. This is also why there is an
EARLIEST()
function: Its purpose is to get back to the
absolute
outermost row context, no matter how deeply nested you might find yourself.
Note
For perspective: At this writing, your dear authors have never encountered a need for more than one outward step and, accordingly, have never used either the
Number
argument to
EARLIER()
or
the
EARLIEST()
function. For now, we merely aspire to
encountering
such a need.
A long time ago, when I (Bill) was working on
Power Pivot for the Data Analyst
, I desperately tried to get a working example of
EARLIER()
or
EARLIEST()
. Now, several years later, we are using
EARLIER()
to essentially do a
SUMIF()
, and it looks so simple and powerful.
|
Referencing the Previous Row and Similar Calcs
This is a very common question in the world of calc columns: “How do I reference the row immediately ‘above’ this one?”
For example, how do you create a calc column like the one pictured here?
Figure 3.5
As it happens, the answer is
very
similar to the subtotal example we just looked at:
=CALCULATE(SUM(LookingBack[Value]),
FILTER(LookingBack,
LookingBack[Date]=
EARLIER(LookingBack[Date])-1
)
)
In fact, the only difference is the
-1
at the end of the third line!
Note the following here:
-
• Sort order of the table has
no
impact on this formula. This formula finds the row(s) whose date column is one less than the current row’s date and sums them up.
-
• If you want to grab the following row(s) rather than the previous row(s), change the
-1
to
+1
.
-
• Similarly, if you want to go back a week, change the
-1
to
-7
.
-
• This technique works with columns other than Date. If you have a column like MonthNumber, subtracting one from that will work, too.
-
• Many times when you are writing a calc column like this, you’d be better off writing a measure instead, but a full discussion of that is beyond the scope of this book. (You will see examples of such measures later in this chapter and in Chapter 4, however.)
-
• There
are
definitely cases where such a calc column is required, hence this technique’s inclusion here.
Referencing Rows “Within Range” of the Current Row
As you read the last trick, you might have wondered why you used a
SUM()
when you just wanted to grab the previous row. Well, Power Pivot does not trust you that it’s only going to find a single row when you send
FILTER()
off to do its work. It always assumes it’s going to find more than one, which is why it always requires an aggregate function like
SUM()
. Of course, if it
does
find only a single row, the aggregation function you choose is often unimportant;
SUM()
,
AVERAGE()
,
MAX()
, etc. are all going to return the same result.
Let’s look at a few examples. In this starting table, note that there are
multiple
rows for each date:
Figure 3.6
Here you see a sum of yesterday’s values:
Figure 3.7
The formula for this table is
precisely
the same as in the previous trick, just updated to the new table name:
=CALCULATE(SUM(MultiValue[Value]),
FILTER(MultiValue,
MultiValue[Date]=
EARLIER(MultiValue[Date])-1
)
)
And here is an example of a running total that includes the current row’s date:
Figure 3.8
Here’s its formula:
=CALCULATE(SUM(MultiValue[Value]),
FILTER(MultiValue,
MultiValue[Date]<=
EARLIER(MultiValue[Date])
)
)
Note
To exclude the current row’s date, just change
<=
to
<
.
|
Continuing in this vein, here’s a three-day moving sum of the current date, the previous date, and the following date:
Figure 3.9
And here’s its formula:
=CALCULATE(SUM(MultiValue[Value]),
FILTER(MultiValue,
MultiValue[Date]>=
EARLIER(MultiValue[Date])-1 &&
MultiValue[Date]<=
EARLIER(MultiValue[Date])+1
)
)
Note
Of course, you can change
SUM()
to
MAX()
, for instance, and find the amount of the single “best” day in this three-day window. And again,
many
times when you are writing a calc column like this, you’d be better off writing a measure instead—but not
always
.
|
Cross-Table Lookups: LOOKUPVALUE() and Other Techniques
All the examples so far in this chapter involve looking across rows in the same table. But what if you want to grab values from other tables? Let’s look at an example of such a problem. First, you have a Sales table:
Figure 3.10
Note
In the Sales table, for each pair of Date and Product, you have multiple rows.
|
You also have a Prices table:
Figure 3.11
Note
In the Prices table, for each pair of Date and Product, you have a
single
row.
|
Now, using a calc column in the Sales table, you want to grab the appropriate Unit Price amount from the Prices table.
The “simplest” answer would be to use the
RELATED()
function. And if you had a relationship established between these two tables, that would work quite well. But alas, you do not have a relationship here. Furthermore, you don’t have a column on which you could establish a relationship. In order to create a relationship, you’d have to create a new column in each table (Sales and Prices) and concatenate the Date and Product columns.
Then
you could relate the two tables via those “Frankencolumns.” That’s a lot of busy work. Plus, calc columns can add a lot of weight to a file (bloating file size), especially when the tables involved are large.
So here’s a formula that allows you to grab the value from the Unit Price column without first establishing a relationship:
=LOOKUPVALUE(Prices[Unit Price], Prices[Date], Sales[Date], Prices[Product], Sales[Product])
It’s very simple.
LOOKUPVALUE()
’s first argument asks what column’s value you want to fetch. Then you specify pairs of columns—first the column in the “fetch
from
” table and the matching column in the “fetch
into
” table. You can have as many pairs as you like, practically speaking. And it works:
Figure 3.12
What Happens if LOOKUPVALUE() Finds More Than One Match?
Hey, great question. To see this in action, introduce another row into the Prices table, for Product A on January 1. This time, give it the same price ($1.50) as in the original January 1/Product A row:
Figure 3.13
Next, check in on the
LOOKUPVALUE()
calc column:
Figure 3.14
Hey, it still works! Cool, so it detected that, even though it found multiple rows, they all returned the same value ($1.50), and it didn’t raise an error. We salute such robustness!
If you then go back and change that “new” row to be $1.60, which creates a conflict, you do indeed get an error:
Figure 3.15
What if LOOKUPVALUE() Doesn’t Address Your Need?
What if that prior example—with two Prices table rows for the same Day/Product pair, with different price values—is completely legitimate? What do you do then? Well, it’s back to our old friends
CALCULATE()
and
FILTER()
:
=CALCULATE(AVERAGE(Prices[Unit Price]),
FILTER(Prices,
Prices[Date]=Sales[Date] &&
Prices[Product]=Sales[Product]
)
)
This works like a charm:
Figure 3.16
Note
You are
not
required to use the
AVERAGE()
function as the first argument to
CALCULATE()
. You can use
MIN()
,
MAX()
,
SUM()
, or pretty much any other aggregation function you want. And in “multi-match” cases, your choice of aggregation function is, of course,
very
important.
|
Now let’s take a look at some similar but more advanced examples that arise from time to time.
Lookups Based on Start and End Dates
Common conundrums is the theme of this chapter, and this problem definitely qualifies. Here’s a Schedule table with lots of dates in it:
Figure 3.17
And then here’s a different kind of pricing table. This Prices2 table, rather than specifying the effective price on every single day, uses Start and End columns to indicate a time frame in which a given price is active:
Figure 3.18
To add an effective price column to the Schedule table, use this formula:
=CALCULATE(MAX(Prices2[Price]),
FILTER(Prices2,
Prices2[Start]<= Schedule[Date] &&
Prices2[End]>=Schedule[Date]
)
)
Here’s the result:
Figure 3.19
The price changing on February 8 is precisely correct. Your work here is done.
Lookups Based on Start Date Only
Now say that you have this same problem, but now you have only a Start column, and no End column. For example, say that you have the Prices3 table pictured here, which is really just the Prices2 table with the End column removed:
Figure 3.20
Now what? Basically, you want to find the “most recent” row from this table that has happened on or before the current date. This is tricky. One approach would be to re-create the End column with a calc column, like this, in Prices3:
=CALCULATE(MIN(Prices3[Start]),
FILTER(Prices3,
Prices3[Start]>
EARLIER(Prices3[Start]))
)-1
Here’s the result:
Figure 3.21
Now, re-equipped with the End column, you can apply the previous section’s technique. But for completeness, here you’ll solve the problem without re-creating the End column. Back in Schedule, you add the following calc column:
=CALCULATE(MAX(Prices3[Price]),
TOPN(1,FILTER(Prices3,
Prices3[Start]<=Schedule[Date]),
Prices3[Start]
)
)
This calc column does not reference the End column at all!
FILTER()
removes all Prices3 rows that happened after the current
Schedule[Date]
, and then
TOPN()
grabs the latest of all those rows! And yes, it returns the same results as the previous examples:
Figure 3.22
Note
This use of
FILTER()
nested inside
TOPN()
might be blowing your mind—and we get that! “Normally” we use the
FILTER()
function as a direct argument to
CALCULATE()
, but here we are using it in a place where
TOPN()
asks for a table. There’s a moment when you realize that functions like
FILTER()
(and
TOPN()
and
DATESYTD()
and many others) serve two different yet related purposes in DAX: They have a filter purpose (when used as a direct argument to
CALCULATE()
) and a table purpose (when used as the table argument to another function).
The moment you realize this, we challenge you to resist giggling with glee (or cackling maniacally). It’s a moment when you feel nearly unlimited power—and not only regarding calc columns. In fact, it’s even more useful with measures! You will know this moment when you experience it; trust us. It is unmistakable. Everyone in the office turns and looks at you, and you don’t care.
|
Basically, when you want to fetch cross-table values,
LOOKUPVALUE()
is a great place to start (assuming that using
RELATED()
isn’t possible or practical). Failing that, you can move on to
CALCULATE()
and
FILTER()
, and maybe even things like
CALCULATE(.., TOPN(…, FILTER(…)))
.
Note
This is a topic where “moving pictures” are worth a thousand pictures, and thus a million words. We’re going to skip those million words here and move on. If you want more coverage of this topic, go to PowerPivotPro University, where you’ll find this and many other things illustrated and explained using animated visuals.
|
Totaling Data Table Values in Lookup Tables
Here’s another common conundrum. Say that you have the same Sales table from earlier in this chapter, and it’s not related to a Products table. Sales is a data table, and Products is a lookup table.
Note
For further explanation of data versus lookup tables, see
DAX Formulas for Power Pivot
or PowerPivotPro University.
|
As per our normal convention, the lookup table is arranged vertically above the data table in diagram view:
Figure 3.23
And here are those two tables in data view. First you see Products, which is very simple, with just two rows:
Figure 3.24
And here again for reference is the Sales table:
Figure 3.25
The desired result is a Total Quantity Sold column in the Products table, like this:
Figure 3.26
There are a number of ways to fail at this and a number of ways to succeed. Here’s one successful formula:
=CALCULATE(SUM(Sales[Quantity]))
What? A
CALCULATE()
with only one argument? What the heck? We could burn a bunch of pages on that. Or we could just tell you that the
CALCULATE
function takes a row context and promotes it to become a filter context. That is the “right” explanation, but again, this is one of those places where we must choose between a million words and animations. Or, pragmatically speaking, you can just take this as a pattern and not worry about the inner workings.
It turns out, by the way, that in this case, a
SUM()
without a
CALCULATE()
may also be useful:
=SUM(Sales[Quantity])
Here’s the result:
Figure 3.27
This is pretty nifty. Maybe you want to use this as a denominator sometimes. Also, this “raw sum” formula does
not
rely on a relationship being in place. It
always
sums up
all
the rows in the other table. (In fact, we often find it useful to take
MAX()
of the date column in a data table; see the section “Common Calendar Conundrums,” later in this chapter.)
For completeness, here is another way to write the first (successful) formula:
=SUMX(RELATEDTABLE(Sales), Sales[Quantity])
RELATEDTABLE()
is essentially the “opposite” of
RELATED()
, since it traverses the relationship in the opposite direction. Whereas
RELATED()
always fetches a single lookup table value into a data table,
RELATEDTABLE()
fetches all related rows from a data table to make them available for an aggregation in a lookup table.
Notice that the formula above returns the same results as the first formula:
Figure 3.28
It’s important to understand that subtotal calculations like this in a lookup table are
very
often better implemented as measures. The most common “valid” reason to do this in a calc column is to create a “grouping” column—like High/Low/Medium—that groups products (or other entities) into categories. You can then place the grouping column in the Rows, Columns, Slicers, or Report Filters drop zone of your Pivot and view results accordingly.
CONTAINSX: Finding Matching Values in Two Tables
Rather than look up or aggregate values across tables, you may at some point want to find whether a value in one table has a matching value in another table. For example, here you want to flag rows in the Companies table on the left when they contain a keyword from the MatchList table on the right:
Figure 3.29
Finding matching values in two tables is a pretty common need. Sometimes you can do this very quickly in Power Pivot by relating the two tables and then writing a
=RELATED
calculated column in one table to see if it has a matching value in another table. But there are times when this doesn’t work—for instance, when you’re not looking for an exact match but a “contains” match.
Rolling Your Own X Functions
Perhaps the only thing that makes me (Rob) happier than a new
X
function is “inventing” a new one (such as
PRODUCTX()
; see
http://ppvt.pro/PRODUCTX
).
Note
I (Bill) still badly want a
CONCATENATEX()
or perhaps
ROMANX()
function!
|
The other day, I was looking at a Power Pivot model and thinking “Gee, it sure would be nice to have a
CONTAINSX()
.” Turns out you can “make” your own
CONTAINSX()
by using
SUMX()
.
The Formula for Matching Table Values
Here’s one formula you can use to determine whether the same value appears in two tables:
[Is this company a metals company] =
=IF(
SUMX(MatchList,
FIND(
UPPER(MatchList[Keyword]),
UPPER(Companies[Company])
,,0
)
) > 0,
"YES!",
"Probably Not"
)
Note
There are definitely other ways to do this.
|
Here’s what the formula does:
-
•
SUMX()
steps through every row in MatchList. For each row in MatchList, it evaluates the
FIND()
function.
FIND()
returns a number.
SUMX()
then sums up all the values from
FIND()
. (There will be five values to sum up because there are five rows in MatchList.)
-
• For the
FIND()
row of MatchList, it looks to see if that substring also appears in the current row of Companies. If it does,
FIND()
returns the number of the position where that substring is found, as it always does. If it doesn’t find a substring match, it returns
0
.
-
• The
UPPER()
functions make the
FIND()
case-insensitive. If you want it to be case-sensitive, remove the
UPPER()
functions.
-
• If you get
0
back from
SUMX()
, no matches were found, so
IF()
returns
Probably Not
. If anything other than
0
comes back, there was at least one match (maybe more!), and
IF()
returns
YES!
.
No Relationships Needed
No relationships are required for this technique:
Figure 3.30
A Million Variations
You could twist this example in many practical ways: You could return the number of matches rather than
Yes
/
No
. You could make the search case-sensitive rather than case-insensitive. You could look for “begins with,” “ends with,” or an exact match.
CONTAINSX Revisited: What Is the Match?
You just learned how to find whether there’s a match between two tables. You might also want to find out what the match
is
:
Figure 3.31
You’ve seen that it’s pretty easy to write a formula that reports whether there is a match. But what if you want to know what the matching keyword actually is? It’s pretty easy to write a formula for this, too, especially if you’re using Power Pivot.
Note
Determining what keywords match is an example of something that is easier in Power Pivot calculated columns than in regular Excel. Once you get used to calculated columns in Power Pivot, you’ll actually start to realize that handling those same problems in normal Excel is clumsier. At the beginning of this chapter, we mentioned that using
RELATED()
is easier than using
VLOOKUP()
. Here you are essentially doing a
VLOOKUP()
for a wildcard. Sure, this is possible in regular Excel with
=VLOOKUP(“*”&A1&”*”…)
, but 99.9% of people using Excel have never seen this use of
VLOOKUP()
.
|
To determine what the matching keyword is, you can create a WhatWasTheMatch column:
Figure 3.32
Here’s the formula for that column:
[WhatWasTheMatch] =
=FIRSTNONBLANK(
FILTER(
VALUES(MatchList[Keyword]),
SEARCH(
MatchList[Keyword],
Companies[Company],
1,
0
)
)
,1
)
Look at the
FIRSTNONBLANK()
part of this formula (boldfaced). This is a nonstandard use of
FIRSTNONBLANK()
. You can often “misuse”
FIRSTNONBLANK()
in useful ways like this.
See the boldfaced
,1
in the formula? That is the second input to
FIRSTNONBLANK()
, and in “standard” uses, it is typically a measure. For example, you might be trying to find the first time a customer
ever
bought something, which happens when the sales measure first returns a nonblank value. In this case, using
1
as the last input short-circuits the “Is it blank?” test (the
NONBLANK()
part) and grabs the first value found. In other words,
FIRSTNONBLANK()
with a
1
as the last input is a makeshift “first text value we find” function.
In the formula above, the
FILTER(VALUES(SEARCH(…)))
part (between the boldfaced portions) has only one purpose: to return a single column of text values from the
MatchList[Keyword]
column. But it returns only values from that column that actually were found in the current row of
Companies[Company]
. In other words, it only returns keywords that were found.
It all starts with
VALUES(MatchList[Keyword])
.
And because there is no relationship or other dependency between the Companies table and the MatchList table, this will
always
start out with
every
distinct value from the
MatchList[Keyword]
column:
Figure 3.33
In this particular calculated column formula,
VALUES(MatchList[Keyword])
always
starts with the unfiltered list of all values in that column. Then
FILTER()
kicks in
and only keeps rows/values where
<filter expr>
evaluates to
TRUE
.
The formula then uses the
SEARCH()
function as the
<filter expr>
test to
FILTER()
. Since
SEARCH()
returns a nonzero number whenever it finds something, and
FILTER()
treats nonzero numbers as
TRUE
, well,
FILTER()
only keeps values that it finds.
Where Can You Use This Trick?
What would you
do
with a column like the calculated column shown in this trick? Well, how about slapping the new column in the Rows drop zone of a Pivot, with a couple measures? This new column works pretty well in Rows:
Figure 3.34
Note here that we typed over the Row Labels cell in the Pivot and called it Industry, and we typed over the blank cell and called it <unknown>. You could do a bunch of other things, too:
-
• Use the new column as a slicer instead of in the Rows drop zone.
-
• Use the new column as a
<filter>
input to a
CALCULATE()
. You could use a new measure called
[Average Value of Copper Companies]
, for instance, with the formula
CALCULATE([Average Value], Companies[WhatWasTheMatch]="Copper")
.
-
• Write another calculated column that groups “Mine” and “Mining” into a single value, like, um, “Mining.” And then you could maybe use
that
new column for any of the purposes just mentioned.
Common Calendar Conundrums
A tour of common calc column scenarios would not be complete without some time in calendar land, so that’s our last stop in this chapter.
Using an IsFutureDate Column
Here you see that the most recent date in the ServiceCalls table is 6/1/2004:
Figure 3.35
But the Calendar table extends beyond that, into August 2004:
Figure 3.36
It is often very helpful to have a calculated column in a Calendar table that flags dates as being in the future, from the perspective of your data tables. Here is one such formula:
=IF(Calendar[Date]>MAX(ServiceCalls[CallDate]),"Yes","No")
When you add this formula to the Calendar table, it starts returning
"Yes"
on 6/2/2004:
Figure 3.37
Using a DayOfWeekName Column
You already have a column, DayNumberOfWeek, that is a number from 1 to 7. But now you want the names of those days. Here’s an incredibly simple formula for that:
=FORMAT([DayNumberOfWeek], "dddd")
The result is crazy cool:
Figure 3.38
Using a YearMonth Column
Now that you’ve seen the basic pattern involved in handling calendar conundrums, we’ll start moving in rapid fashion, providing just a formula and then an image of the result. Here are the YearMonth formula and result:
=[CalendarYear]&FORMAT([MonthNumberOfYear], "00")
Figure 3.39
Using an IsWknd Column
Here are the IsWknd formula and result:
=SWITCH([DayNumberOfWeek],1,"Weekend",7,"Weekend","Weekday")
Figure 3.40
Note
Obviously, this formula uses a U.S.-centric definition of “weekend” being Saturday and Sunday.
|
Using a Quarter Year Column
Here are the Quarter Year formula and result:
=[CalendarYear]*10 + [CalendarQuarter]
Figure 3.41
An exceedingly common problem is that periods of time bigger than a single day (week, month, quarter, etc.) that also “bake in” a particular year (like Q2 2004) cannot be sorted by any column you typically have in your Calendar table by default.
We think we should be allowed to sort by date, but Power Pivot doesn’t like that. And you certainly can’t sort by Quarter Number of Year, since that fails to account for different years.
So we often find ourselves implementing a workaround: We multiply the year number by 10 to “make room” for the single-digit quarter number and then add that in. Who cares that “20042” is relatively useless and will never be used on a Pivot? The point is that it
sorts
properly. We then hide that column from client tools and forget about it.
Note
You can use the same technique for Year Month, except in that case, you multiply Year by 100 because you need two digits to accommodate Month Number.
|
Custom Calendar Conundrums
All of the previous examples deal with what we call “standard calendars”: calendars in which a month is defined strictly by the calendar on the wall.
In many businesses, including retail, a standard calendar is nearly worthless. All time period comparisons are performed on the basis of custom-defined calendars. Very often, these are of the form “445,” in which months consist of 4 weeks, 4 weeks, and then 5 weeks (and then the pattern repeats).
Some of the columns usually required in those tables can get quite sticky, so this trick provides a few examples.
Finding the Day of the Period
Let’s say you have a 445-style calendar table named HybridCal. Among other columns, it already contains DateID and 445MonthID:
Figure 3.42
445MonthID is integral to all month-level calculations; in order to go back and look at last month, you perform simple
-1
arithmetic, based on that column.
But you quickly find, when writing your 445-based time comparison measures, that you need to know what day of the period you are currently “in.” This number should go from 1 to 28, then 1 to 28 again, then 1 to 35 (the 5-week month/period), and then repeat. That results in some head scratching.
To save your scalp, here is an example of a formula that works:
=INT([DateID] -
CALCULATE(FIRSTDATE(HybridCal[DateID]),
FILTER(HybridCal,
HybridCal[445MonthID]=
EARLIER(
HybridCal[445MonthID])
)
)
)
+ 1
)
And here’s the result:
Figure 3.43
Finding the Week of the Period
Another common question is “What week are we in within the current period?” This should go 1 through 4, 1 through 4, then 1 through 5.
This problem is simpler than finding the day of the period but still can cause an embarrassing amount of head-scratching when someone is watching you. Here’s the formula:
=CEILING([DayOfPeriod]/7,1)
The Ever-Increasing and “Smooth” WeekID
“Prior week”–style measures quickly inform you that you need a column like the one pictured below, which “goes up” by 1 for every week that goes by but never “resets” (back to 1) when you roll over to a new year. It also never has gaps. (In other words, you need the first week of year 2 to be exactly 1 number higher than the last week of year 1.)
Figure 3.44
This is a surprisingly difficult formula to write—not because it’s Power Pivot, but because the math itself tends to tie our brains in knots. But here’s what you do:
=ROUNDDOWN(([445MonthID]-1)/3,0)*13 + MOD([445MonthID]-1,3)*4 + [WeekOfPeriod]
If we went any further down this road, we would exhaust authors and readers alike. So instead we’ll change gears and have some fun with portable formulas in Chapter 4.