Chapter 25 Query Organization

With all the work you’ve done learning Power Query, you wouldn’t expect to suddenly get hit by an error when trying to merge two tables. Unfortunately, this is a real possibility if the tables come from two different data sources.

The Formula.Firewall Error

One of the things that you haven’t seen so far in this book is the dreaded Formula.Firewall error, but it can be quite a shock when it hits you. And while there are strategies for avoiding it, it is best if you know up front what causes this nasty error message to appear.

Triggering the Formula.Firewall Error

In the timesheet example that was used through several chapters, recall that you built a fairly robust solution. The solution file is saved in a specific location, and the data files are stored in the Source Files subfolder, which hangs off the solution directory. You’ve repurposed code to import a single timesheet, turning it into a function and leveraging it to preprocess all the files in the Source Files directory so you can combine them afterward. The data lands nicely in an Excel worksheet (or the Power Pivot Data Model if you were inclined to push it there). What could possibly go wrong?

A scope change. That’s what. Suppose your manager tells you that he wants the department for each employee merged in with the data set.

You’ve got the employee records stored in another Excel file with each employee’s name and department. You know you can import from Excel files, and you know how to merge data sets together. How hard can it be? Try it to see:

Figure 317 Importing the EmployeeDepts table.

This table looks pretty good. In fact, it looks so good, you try just loading it to a connection right away, as it really doesn’t need any modification at all:

You get a nice new EmployeeDepts query in your Workbook Queries pane, and all is looking good so far.

Figure 318 Things are looking positive so far.

Now merge these tables together:

  • Right-click the Timesheets query → Edit
  • Go to Home → Merge Queries
  • Choose to merge the EmployeeDepts table
  • Select the Employee column on each table

Oddly, Power Query is unable to determine how many rows will match:

Figure 319 Why is Power Query having matching issues?

Never mind this. Click OK and hope for the best. And boom, you get the dreaded Formula.Firewall message:

Figure 320 The dreaded Formula.Firewall rears its ugly head.

This message is a bit intimidating. In English, what it is trying to say is that you have connected to two completely different data sets using the same query. It doesn’t matter if you are working with two tables from the same workbook; they are seen as two data sets, and Power Query doesn’t like this.

Avoiding the Formula.Firewall Error

So how do you fix this problem? You avoid the issue, and here’s how:

You now have a connection-only pointer set up to both the Timesheets data and the EmployeeDepts data. Create a new query to merge them:

Notice that this time Power Query identifies the matches:

Figure 321 This is much more promising.

With the merges apparently going to happen, you can continue on:

This time, it works:

Figure 322 The departments are merged into the data set.

Creating Staging/Loading Queries

The secret to avoiding the Formula.Firewall error is to separate the job of connecting to the database and loading to the end destination. We call these staging queries and loading queries.

You use staging queries to ensure that you create separate queries to extract the initial data from each data source. You then do as many transformations as possible in the staging query in order to get the data into a shape where it can be used by as many other queries as possible. What you never do, however, is try to merge to another data source or load the end result into a worksheet of the Data Model. Both of those tasks are reserved for a loading query.

A loading query is designed to pull data from one or more staging queries and merge or append as required before making any final transformations that are needed. You then set up these queries to load the output to the final destination, whether it’s the worksheet or the Data Model.

Visually, the process looks as follows:

Figure 323 Organizing staging and loading queries.

Technically, combining any two different data sources can cause a Formula.Firewall error to appear; even trying to inject a parameter from an Excel parameter table can trigger the error. If you get a Formula.Firewall error when trying to nest a dynamic parameter call in the middle of a query, try to declare your variable up front rather than avoid the variable declaration and nest the function later in your code. This often resolves that issue.

Caveats

When working with staging and loading queries, you need to be careful in certain cases, especially when the data source you connect to can take advantage of query folding to improve performance. The reason for this is that query folding can be executed only from the primary connection to the data source (the staging query).

When using a staging/loading approach against a database that supports query folding, you should always try to accomplish as many filtering and grouping operations as you can in the staging query. This will allow Power Query to pass the processing job back to the database. As soon as you move out of your staging query into a secondary staging query or the loading query, no more commands can be passed back to the database to be folded, which means Power Query has to do the processing work. This can significantly slow down the process. It may also make sense to make several connections to the database to pull individual data sets, allowing the database to do the heavy lifting for you since databases are optimized for the sorting, filtering, and grouping operations.

By contrast, if you are targeting your solution against sources that don’t support query folding (such as text or Excel files), it’s a good idea to create one staging query to pull in the data and cut it down as much as possible. Then you can create as many secondary staging/loading queries to reshape the data as needed in order to get it into your model. Since there is no ability to take advantage of a more efficient processing engine, you want to bring the data in once and try to make sure each step works with as few records as possible to improve processing time.

You should also be conscious of not trying to set up too many levels of staging queries between your data source and your loading queries. Whereas using one staging query feeding one loading query can be quite easy to follow, things get much more complicated when you start feeding data through five or more intermediate staging queries on the way to the end result. It’s a longer trail to process, which could have performance impacts, and most certainly there will be a longer trail to audit should something go wrong. Keeping the query flow tighter will help you when you’re trying to debug and maintain your queries.

You should also be aware that Power BI doesn’t generally appreciate long trails of queries using the staging and loading query approach. If you are intending to push your solution to Power BI, it is a good idea to try to keep the approach to a single query if possible and limit the number of steps if not.

Keeping Queries Organized

As you build more and more Power Query queries, you’ll find that things start to get a bit unorganized in the Workbook Queries pane. As it happens, there are a couple ways you can keep a handle on this issue.

Changing Query Display Order

As you start building queries, Power Query will just add them to the bottom of the list, in sequential order. You can, however, change the order in which these queries are presented in the Workbook Queries pane (and Power Query Navigator inside the Power Query editor). To change where a query shows in relation to the others, follow these steps:

Figure 324 Moving query positions.

Grouping Queries

Ordering is great, but it doesn’t help you categorize queries very easily. Especially when using the staging and loading approach, it is very helpful to create groups in which to store each query. Here’s how:

Figure 325 Classifying functions.

You can now move the fnGetParameter into the Functions group as well:

It’s then just a simple matter of creating new groups for the staging and loading queries as well:

Figure 326 All queries, nicely organized.

This approach works very nicely for keeping things organized, and it also gives you the ability to quickly scan the queries to see if they are set up as expected. If you see the text “x rows loaded” under a staging query, you know it’s not actually taking a specific staging role, and you can either fix it or move it to the Load group.

In addition to being able to perform the reordering and grouping in the Excel interface, these features are also supported in the Power Query editor, via the Navigator pane:

Figure 327 Organizing Power Query queries in the Navigator pane.