Chapter 9 Merging Tables and Queries

One of the classic issues that has presented itself to Excel pros for years is aggregating two separate data tables into one in order to serve a PivotTable. The route was to use a VLOOKUP() or INDEX(MATCH()) combination in order to read data from one table into the other.

The challenge with this is that many users were terrified of VLOOKUP() and didn’t understand the INDEX() and MATCH() functions either. And while PowerPivot now allows you to aggregate two tables without using any VLOOKUP() functions, it carries its own complexities. Yet still the issue remains: Sometimes you just need an easy way to merge two tables’ records together.

When Power Query hit the scene, it introduced yet another method to combine two tables together—one that didn’t involve learning Excel formulas or building a relational database structure.

Merging Tables via Power Query

For this example, say that you have two separate tables that exist in an Excel worksheet. The first is the Sales table, which holds the Sale Date, Product SKU (number), Brand, and Sales Quantity of the sales transactions. But it doesn’t hold any information about the price or cost of the products. That information (and more) is, however, held in the Inventory table. You’d like to merge the two tables together in order to get a comprehensive list of products with their finer details.

Creating “Pointer” Queries

In order to merge or append queries together, the queries must exist. Having a table in Excel isn’t good enough; Power Query actually needs to recognize the data as a query. Yet it seems somewhat crazy to have to take an existing table, pull it into a Power Query, and then load it to an Excel table in order to reference it. And, indeed, you don’t actually need to do that.

You do still need to pull the original table into Power Query, but the output is where you make a change.

Inside the Ch09 Examples\Merge.xlsx file, you’ll find two tables set up: an Inventory table and a Sales table. In order to let Power Query read the contents of the Inventory table you need to:

Power Query opens its editor so you can make any transformations you need. In this case, however, you actually don’t need to make any. All you want to do is create something that you can connect to later.

Figure 115 Choosing to create a connection-only query.

You see your query show up in the Workbook Queries pane, but Power Query doesn’t create a new table for you.

Figure 116 A new query created as a connection-only query.

Note: If you make a mistake and create a query with a table, you can fix it by right-clicking the query in the Workbook Queries pane and choosing Load To…. Power Query warns you that you’re going to delete data when you commit the change, as you will be removing the newly created table. You can also add a table if you created a connection-only query by editing the Load To… behavior as above and changing the selected load behavior from Only Create Connection to Table.

With the Inventory table now available in Power Query, you need to repeat this process for the Sales table:

You’re all set up and ready to merge the tables together.

Performing the Merge

You create a merged table in the same place where you create an append query from the Excel user interface. Here’s what you do:

The Merge dialog appears. Its two drop-downs allow you to pick the two tables that you’d like to merge together.

Oddly, after you take these actions, the OK button is still not enabled:

Figure 117 You’ve chosen tables, but why can’t you proceed?

The issue at hand is that Power Query doesn’t know which fields you want to use to perform the merge.

In order to perform a merge, you ideally want to have a column that contains unique values in one table but has repeating records in the other table. This is called a one-to-many structure, and using it is the best way to ensure that you end up with results that match what you’d expect.

Note: Power Query supports one-to-one and many-to-many joins, as you’ll see shortly.

In this case, the SKU Number column contains unique products in the Inventory table, and it repeats many times in the Sales table, so you can use those:

Power Query opens the editor, where you see a nice new column of tables on the right side of the Sales table:

Figure 118 A new column of tables, containing the matching Inventory table records.

You know what to do with a column of tables: Expand them! The only question here is which columns you need. Because the SKU Number and Brand columns already exist in the sales table, you don’t need those, so make sure to exclude them during the expansion. Follow these steps:

As you can see, you now have the product details merged into the Sales table:

Figure 119 Details from the Inventory table merged into the Sales table.

Now you can finalize this query by using the following steps:

You’ll find that you have 20 records, one for each transaction in the original Sales table, exactly replicating VLOOKUP()’s exact match scenario.

Many-to-Many Merges

When building a merge, you need to be careful to merge based on the correct columns. If you try setting up a merge as follows, you’ll find that your output differs slightly:

As you can see, the only two changes here are the column used to merge the tables and the final query name (which won’t affect the output). Yet this time there are 22 records in the output table—2 more than the original count of transactions.

To understand why this is happening, you need to edit the ManyToMany query and step back to the Source step. If you go down the table to record 19 and click in the whitespace beside the word Table, you get a preview of the data in the table that will be merged into your Sales table.

Figure 120 A many-to-many merge in action.

In the previous merge, you created the merge based on the SKU number, meaning that records would only be matched up if their item numbers matched. In this case, however, you matched based on the brand. Because the OK Springs brand shows twice in the Inventory table, when the Sales table has the brand OK Springs, it creates two matches. The fact that they have different SKUs is irrelevant to Power Query, as you asked for a match based on brand, not SKU.

You can see from this example that you need to be careful when creating matches. This feature can be very useful, but it can also be dangerous if you are not paying attention and are expecting a one-to-many match.

Dynamic Table Headers

Excel tables are fantastic, they truly are. But one issue with them is that they lock down the header row into hard-coded values. That prohibits you from having dynamic headers on your tables, driven by formulas.

The file Ch09 Examples\Dynamic Headers.xlsx contains a spreadsheet used for budgeting green fee rounds for a golf course. The author has set up a tabular setup and is able to change the year in B1 and update all headers in rows 3 and 32 of the document:

Figure 121 A budget document with dynamic headers.

The Issue

The challenge here is that you need to load and unpivot the Recap table (rows 32:36). Since using Power Query is the easiest way to unpivot data, you obviously want to use it, but there are some challenges in getting the data into Power Query in the first place:

The workaround in this case is to create a translation table and merge that back into the query.

Creating a Translation Table

The translation table for this example needs to be able to come up with a logical way to translate a static column header to the current dates. Using the notation CYMx (Current Year, Month x), you can build a table to hold these dates. Create the table as follows:

Your table should now be set up as follows:

Figure 122 The DateTranslation table.

Naturally, in order to merge this into another table, you need Power Query to also know that it exists, so follow these steps:

With a connection set up to this table, you’re now ready to go.

Retrofitting the Source Table

Next, you need to pull in the Summary table. But in order to merge the data together, it will obviously need to have headers consistent with the CYM format. Yet at the same time, you’d like to keep the current dates showing. This is no big deal—you can fake it:

The data range is now set up and ready for to use, and you just need to add a table to it:

Figure 123 The summary table now has the alternate header.

The trick here is that you need to make sure the table only covers rows 33:37, as you don’t want to lock down row 32’s formulas. Follow these steps:

Depending on the style you choose, you can even hide all evidence that you’ve been tinkering with this and setting it up as a table:

Figure 124 Table? What table?

Merging the Tables

And now for the magic moment—combining these tables:

At this point the query is looking fairly decent. It’s unpivoted, and it’s ready to be merged. And guess what? You don’t even have to leave Power Query to do it!

The output is a very nice unpivoted set of data, with the correct dates associated with each transaction:

Figure 125 The unpivoted data set.

But how well does it update? Try this to see:

It looks like the data entry worksheet is working:

Figure 126 The data entry worksheet is still reacting dynamically.

Now check the Power Query:

It looks like the table is working nicely as well!

Figure 127 The output table updates to reflect the current dates.

With the table headers now essentially dynamic and your ability to quickly unpivot the data into a table, you’ve now opened up great possibilities for the data. From merging this to other data sets to feeding it into PivotTables or Power Pivot, the possibilities are truly endless—and you still have a user-friendly front end.