Chapter 3 Basic Append Operations
One of the jobs that Excel pros do on a regular basis is appending one file to another one. Particularly in organizations where data is shared by CSV, TXT, or Excel files, the process usually boils down to the following:
The user then builds a business intelligence solution in the Consolidated file. When the next month comes along, the data from the next monthly file is copied into the Consolidated file, the solution is updated, and all is good.
But the process is plainly not all sunshine and rainbows, and there are some very obvious issues here. This chapter does not solve the issue of a user making a mistake in the transformations (although future chapters do), but it does show you how Power Query can consolidate two or more data sets without ever worrying about a user pasting over the last few rows of data.
Importing Data Files
The Ch03 Examples folder contains three CSV files: Jan 2008.csv, Feb 2008.csv, and Mar 2008.csv. This section walks through the process of importing and appending each file.
Importing the files is fairly straightforward:
Power Query opens the file and executes the following steps automatically for this data source:
To be safe, you should remove the Changed Type step and re-create it to force the dates to import based on the U.S. standard that they came from:
Note: Remember, if your number format is set to display negative numbers with a – sign instead of using parentheses, you may have to also have to remove the $ signs from the Sum of Amount column before changing that column to a decimal number in order to avoid errors. To do this, select the Changed Type with Locale step → right-click the Sum of Amount column → Replace Values → Replace $ with nothing. After you’ve performed these steps, select the Changed Type step again to make sure you’re at the end of the query before continuing.
In addition, you should also make the following transformations:
Note: Errors are explored in detail in Chapter 7.
At this point, the query should look as shown below.
Figure 36 The Jan2008 query before loading to the worksheet.
Go to Home and click Close & Load to load the Jan2008 query to the worksheet.
You now need to replicate the process with both the Feb 2008.csv and Mar 2008.csv files. The import process uses exactly the same steps, and when you’re done, you should have three new tables in your Excel workbook in all:
Appending One Table to Another
Next, you need to append the Jan_2008 and Feb_2008 tables. To do this you need to create a new query, but this time you want an append query. To create one, follow these steps:
The Append dialog appears, and in it you can choose the queries you’d like to append:
Figure 37 The Append dialog.
You need to understand a couple of tricks here:
Clicking OK opens the Power Query editor with a new query called Append1, which has a single Source step.
At this point you may be tempted to scroll down the query to see if all of your records are actually there. Unfortunately, this won’t really work. Power Query doesn’t actually load all your data in the initial window; rather, it shows a preview of your data. The number of rows it shows you varies with the number of columns you add, but you can see this in the bottom-left corner of the Power Query editor:
Figure 38 Power Query shows you how many preview rows it can handle right now.
The reason for this behavior is that Power Query can be used to handle large data sets. Imagine for a second that you want to connect to a data set that has 5 million rows, but you only want to pull in the records for department 150. The Power Query team describes the preview as “looking at the first inch of water in the fire hose,” under the assumption that the preview should give you enough information to determine the key structure of the data. You then make your transformations on the preview data and create a pattern. At load time, Power Query processes this pattern against the data, pulling in only the records it needs to give you your output. This is much more efficient than loading all the data to the workbook and then processing every row and column. But if you can’t see all the data, how do you know it worked?
The answer is to finalize the query. Here’s how:
A new table is created in the worksheet, and you see some key information in the Workbook Queries pane:
Figure 39 The Workbook Queries pane show that the record counts match.
Still, you shouldn’t just believe that everything has worked. You can create a PivotTable to make sure Excel isn’t lying to you:
If you’re using Excel 2010 or 2013, you also need to take these steps:
You end up with a PivotTable that shows that both tables were indeed consolidated into one:
Figure 40 January and February transactions are now in one PivotTable.
Appending Additional Tables
Say that you want to add the March records to the query as well. You might be tempted to head back to the same spot to append the March records:
Figure 41 Setting up to merge March’s records.
But wait! Will this actually work? Won’t this create a new query? Yes, that’s absolutely the case. Doing this will kick off a new append query, which won’t make any sense at all. The PivotTable is already built against the Transactions table, so you really need to go back and modify that table to append March’s records as well. But how do you do that?
The answer is to cancel the Append dialog shown above and go back to the Workbook Queries pane, where you right-click the Transactions query → Edit.
In the Power Query editor, you can modify your existing query. And, as it happens, there is a really inviting button on the Home tab that looks promising: the Append Queries button in the Combine group (the second group from the right).
Unlike the Append button in the Excel user interface, the button in the Power Query user interface asks for only a single table, as it already knows which query to append it to (the one you’re currently in).
Interestingly, the options include not only the other Power Queries you have set up but also the query you are currently building:
Figure 42 Options to append include the table you are working with.
Note: Selecting the table listed as current would append a copy of the query to that point, essentially duplicating the data set. While this is certainly not something that most users would use on a frequent basis, it’s nice to know that the capability exists.
At this point, you should choose the Mar 2008 query, and it will be appended to the Transactions query that already holds Jan and Feb records:
Now you see an unfortunate issue with Power Query queries. When you look at the worksheet that holds your PivotTable, you can see that the Transactions query (and, therefore, the Excel table) does hold all 6,084 rows—the combined totals of the three previous data sets. Yet the PivotTable has not changed:
Figure 43 The Transactions actions table has updated, yet the PivotTable has not.
This is a minor inconvenience, but you’ll need to refresh the PivotTable as well in order to have the updated values flow through. To do that, right-click the PivotTable → Refresh. And it does, indeed, update:
Figure 44 The January through March records are now showing in a single PivotTable.
Combining Queries with Different Headers
When you’re appending queries, as long as the headers of the queries being combined are identical, the second query will just be appended to the first one, as you’d expect. But what if the columns don’t have the same column headers?
In the case of the image below, the user forgot to rename the TranDate column in the Mar 2008 query. Everything was fine as the user merged the Jan 2008 and Feb 2008 records together. But when the user appended the Mar 2008 records to the table, things broke down:
Figure 45 The TranDate column full of null values in January and the Date column full of null values in March.
When you append one table to another, Power Query loads the data from the first query. It then scans the header row of the second query and all subsequent queries. If any of the headers are not present in the results retrieved to date, Power Query adds the new column(s). It then fills the appropriate record into each column for each data set, filling any gaps with null values.
In the scenario above, this means that the TranDate column is filled with null values in January, since the Jan 2008 query doesn’t have a TranDate column.
On the flip side, because the user forgot to rename the TranDate column, the Mar 2008 query has no Date column. For this reason, the Date column is filled with null values for each March record, while the TranDate column holds the values that were intended to be in the Date column.
The fix for this is to do the following:
As soon as you open the Transactions query, you see that it has already fixed itself.
Implications of Append Queries
The implications of the ability to append queries are numerous. Consider for a second that you have just reached out to three separate files, imported them, combined them into a single table, and built a PivotTable from them. That is one PivotTable from three separate files.
And when you want to refresh the solution, you simply need to go to Data → Refresh All to refresh it. Power Query kicks off the refresh of the Transactions table, which kicks off the refresh of the three individual data tables to feed it.
Assume now that this solution was built on files that weren’t date specific but were instead Product 1, Product 2, and Product 3. Say that you’ve built a solution by loading in the CSV files that hold the pertinent data, and you’ve built a PivotTable against them. And then the next month comes along . . . and the IT department sends you replacement files with new transactions for each product.
You save the new Product 1 file over the old one, and you do the same for Product 2 and Product 3. Then you click Refresh All, and you’re done.
Seriously, let that sink in for a moment: You’re done.
You’ve cut your work time to a fraction of what was required in the previous process, and there is no risk of having users accidentally paste over existing data, as Power Query doesn’t work using cut and paste. It simply appends one set to the other and removes the duplicated headers. You get the best of both speed and consistency in a single solution.
Note, however, that the process of appending queries is not specific to working with external CSV or TXT files. Say that you have tables of transactions such as the promotional gift certificates your company has issued this year. The author of the workbook set up 12 tables, one for each month of the year, stored on separate worksheets. You can see that in this case, you could easily consolidate those individual tables into one master list for analysis.