17- Multiple Data Tables
Example1: Budget versus Actuals
Here’s a common problem: you have a Sales table, where each row represents an individual transaction. In my case that’s about 60 thousand rows. But then you also have a Budget table, where each row is typically captured at a coarser granularity, and is much smaller in terms of row count.
For instance, here’s a sample Budget table that I’ve imported into PowerPivot:
Figure 256 Budget table: 1,877 rows at Year/Month/Territory/SuCategory granularity
And now the common question: how are my products selling compared to budget?
Difficult in Normal Excel
Solving that problem in normal Excel is tedious. The normal VLOOKUP() routine that we used in Excel for combining a Data table (like Sales) with a Lookup table (like Products) does not work in this case.
The problem is essentially that Sales and Budget are both Data tables. Which one would you VLOOKUP() “into” the other? Plus, each table has multiple rows that correspond to multiple rows in the other, so even if you decided which way VLOOKUP() should “flow,” you wouldn’t be able to successfully construct a single VLOOKUP() formula.
So a common solution involves creating two pivots – one to measure Sales, the other to measure Budget, and then writing formulas that index into each pivot to form one unified “Sales vs. Budget” report. Takes awhile to get it right, and then when someone inevitably wants to see a slightly different report format or rollup level, it’s almost as much work to modify as it was to create the first time!
Much Faster and More Flexible in PowerPivot
Hey, I wouldn’t be bringing it up if I didn’t have a solution for you :-)
The short version is that with PowerPivot, Sales and Budget can co-exist in the same pivot. And you still don’t need to combine them into one table.
Creating Relationships – We Need Some New Lookup Tables
The next piece of good news is that we can achieve everything we need with relationships. No fancy disconnected tables or “dotted line” relationships through measures.
But we do have a problem: the Budget table refuses to relate to any of our Lookup tables.
For instance, let’s try relating Budget to Products using the only Product-related column in Budget: the SubCategory column.
Figure 257 Attempting to relate Budget to Products…
Figure 258 …results in the dreaded “many to many” error.
Well that makes sense: each SubCategory value (like “Mountain Bikes”) does appear many times in each table.
We have a mismatched granularity problem between Budget and the rest of our model that’s existed so far. Which is why it’s such a tough problem in normal Excel actually. So how do we solve it? We need a SubCategories Lookup table!
Figure 259 A single-column SubCategories table:
Then we relate it to Products and Budget tables, matching the SubCategory column in each.
So now our Diagram View looks like:
Figure 260 Updated Diagram View – SubCategories table now acts as Lookup table to both Products and Budget tables
Remember, filter context “flows” in the opposite direction of the relationship arrows. Let’s visualize that:
Figure 261 Filter context flow represented by orange arrows. Note that SubCategories table filter context DOES flow through to Sales, even though it’s a “multi-step” flow.
Filter context flows from SubCategories to Products, and then from Products to Sales. In other words, the SubCategories still influences Sales (in terms of filter context) as if SubCategories were directly related to Sales. Stated more generally, filter context is transitive: if table A is a Lookup table for table B, and table B is a Lookup table for table C, a filter on table A will impact table C.
Where Do We Get This New Lookup Table? Consider a Database.
It’s mighty tempting to create this SubCategories table via copy/paste. But this is another one of those places where a database really shines. A table or query in a database that always returns the unique list of SubCategory values in your company is an absolute lifesaver, since it saves you from manual update work in the future when you gain new SubCategories (or retire old ones).
In absolute terms, it isn’t a ton of manual effort to update the SubCategories table that you created via copy/paste. So compared to the normal Excel way, it’s not a big deal.
But little manual stuff like that starts to stand out a lot more once the other 95% of your spreadsheet life now lacks manual drudgery.
When you get to the point where an entire family of sophisticated Excel reports would just be running themselves every day if it weren’t for this one manual step, suddenly that one manual step becomes a big win to eliminate, whereas that same task would have been hardly noticeable in the old world of constant tedious effort.
SalesTerritory is at Same Granularity Already
For SalesTerritory, we do not need to create a new Lookup table. SalesTerritory is the one place where Budget does match our existing granularity. So we just create the relationship for that one, no new table required.
Repeating the “New Table” Process for Calendar
Budget’s granularity in terms of time only goes down to Year/Month pairs. So again, we need a new lookup table at that same granularity.
Here is the newly-create YearMonths table:
Figure 262 The new YearMonths table. Note the rowcount of 37, and the calculated column I will use to create relationships.
That YearMonth calculated column is a pattern I find myself repeating a lot. The FORMAT() function is used to add the extra zero in front of single-digit month numbers. That isn’t strictly necessary here – I use it just to make Year/Month combos sort properly – but it’s become such force of habit for me that I figured I would share it.
I add that same sort of YearMonth calculated column to my Budget table, and my Calendar table, then create both relationships, yielding the following Diagram View:
Figure 263 Diagram View updated to show new tables, relationships, and filter context flow (orange arrows). Note that with this many tables, I have turned off the details on each table so that more can fit on a single screen.
Integrated Pivot
I can now construct a single pivot using measures from both Sales and Budget, as long as I only use fields from shared Lookup tables on Rows/Column/Filters/Slicers.
What’s a shared Lookup table? It’s a table that filters both of my Data tables.
In this case, there are three shared Lookup tables: YearMonths, SalesTerritory, and SubCategories, all marked with asterisks in this diagram:
Figure 264 When constructing a pivot that displays measures from both Budget and Sales, only the three tables marked with asterisks should be used on Rows/Columns/Filters/Slicers, because only those three filter both Sales and Budget.
I have created a single, simple measure on the Budget table:
[Total Budgeted Sales] =
=SUM(Budget[Budgeted Sales])
Let’s put that on a new pivot, along with [Total Sales] from the Sales table:
Figure 265 These measures come from different Data tables: Budget and Sales
But the real test comes when I start adding fields to Rows, for instance. Here I have Year on rows:
Figure 266 Year on Rows, and both measures still work! But note that in this case, Year comes from the YearMonths table and NOT from the Calendar table!
Once you have the same sort of field (like Year) in more than one table in your model, you need to make sure you are using the right one for the measures in your pivot. The Calendar[Year] column will not work properly with your Budget measures, for instance.
Integrated Measures
This next part is either going to make you yawn and say “yeah, that’s obvious” or make you scream “that is the most awesome thing I have EVER seen!” Or somewhere in between perhaps. For me, I still get a smile on my face every single time I do this.
I can write new measures that reference (and compare) measures from these separate tables, Budget and Sales.
For instance:
[Sales vs. Budget] =
([Total Sales] – [Total Budgeted Sales]) / [Total Budgeted Sales]
Results:
Figure 267 [Sales vs. Budget] in action. (I added the conditional formatting, that was not automatic).
Now I can remove the original two measures, then pile some more fields onto Rows and Columns:
Figure 268 Sales vs. Budget, made criminally simple. Under- and Over- Performers just jump out at you. And this pivot can be rearranged/restructured at will – the formulas will just keep working, as long as you only use Lookup tables that filter both Data tables.
Example 2: Making Use of that Mysterious RANKX() Third Argument
All right, this has become a matter of honor. The third argument will be put to good use. But I had to invent new data in order to put together a credible example.
First, here is the new data. Pretend I have acquired sales figures for my chief competitor, and how well their bikes have been selling over the past few years.
That is here in the CompetitorSales table:
Figure 269 CompeitorSales is just three columns: ModelName, Year, and SalesAmt
The Problem: Ranking MY Products Against Theirs!
So… what if I want to see how MY products rank against my competitors in terms of sales?
For example, if one of my models sold $3M worth of product, and their top three Models sold $4M, $3.5M, and $2.5M, that means my model would rank 3rd against their models.
(Credit goes to Scott Senkeresty for breaking the logjam and suggesting a scenario in which the third param could be used.)
Year Granularity Mismatch Means a New Lookup Table
Just like in Sales vs. Budget, since we have a granularity mismatch, we need a new Lookup table. This time is the simplest one yet: Years.
Figure 270 The new Lookup table, Years
Now I relate that to CompetitorSales, and also to Calendar (so I can filter Sales), yielding the following table diagram:
Figure 271 Table diagram (other tables moved aside to highlight just this “corner” of the model)
Simple Measure
Now I add a very simple measure on the CompetitorSales table:
[Compete Sales] =
SUM(CompetitorSales[SalesAmt])
And here it is on a simple pivot:
Figure 272 [Compete Sales] with CompetitorSales[ModelName] on Rows
Now the Absolutely Amazing “Cross-Rank” Measure
Back on the Sales table (or the Products table if I prefer):
[Model Sales Rank vs Competition] =
RANKX(VALUES(CompetitorSales[ModelName]),
[Compete Sales],
[Total Sales]
)
What does that formula mean? It starts out like it’s going to just rank competitive products against each other and then takes a twist:
• VALUES(CompetitorSales[ModelName]) – this means that the “entities” being ranked are in fact the unique ModelNames from the CompetitorSales table.
• [Compete Sales] – this means that the measure by which those competitive models will be ranked will be their own [Compete Sales] measure. So far, this is just normal, totally understandable usage of RANKX().
• [Total Sales] – but whoa! This means we’re going to take the value of [Total Sales] in our current filter context (which on the left-side pivot, is a ModelName from my company), and insert it into the pecking order established by the first two arguments! Essentially, treat the value of this measure, in the current filter context, as if it were a participant in the normal evaluation of RANKX() as controlled by the first two arguments.
And results:
Figure 273 New Cross-Rank measure compared to pivot displaying competitive sales. My top product would indeed be behind their 13 best products, earning a rank of 14.
And Since Both Are Filtered by the Years Table…
I can add Years[Year] as a slicer to both pivots!
Let’s see if it still works when I slice to a different year:
Figure 274 Shared Year slicer: measure still works
Wow.