Chapter 2 Overriding Power Query Defaults

Your job is tough when everyone follows the rules. But every now and then when you’re trying to import a data file into Excel, you’re likely to discover that the report programmer didn’t follow the proper report standards. The file may have a particular file extension that implies it should follow a certain standard, but it is actually just masquerading as that type of file. Yet when you open it up, you find something that is not compliant with the standard, and it doesn’t work correctly.

Malformed File Example

Say that you’re working with a CSV data set that looks like this when viewed in Notepad:

Figure 22 A raw look at a malformed CSV file, which has an extra header row.

So what’s wrong with this data set? Starting on row 2, it has a nice consistent list of data showing in a proper comma-separated format, with one a header row and four columns of data, separated by commas. The issue here is that there is an extra row at the top of the page, with no commas in it. In a normal CSV setup, this should not occur, as most programs work out the column count based on the first row.

Interestingly, when Excel opens this file, it handles it without issue:

Figure 23 Malformed CSV file loaded into Excel.

Power Query, unfortunately, doesn’t handle this quite as gracefully, as you can see if you follow these steps:

When Power Query launches, the data is imported as follows:

Figure 24 Malformed CSV file loaded into Power Query.

This is obviously not good. You’re looking at a single column of data. Worse, it appears that Power Query recognized the column as delimited, and it failed to import the last three columns!

The challenge here is that Power Query relies on the CSV file being a proper CSV file because it wants to split the data based on the commas. It then reads the first row to determine how many columns exist. Since there are no commas in that line, Power Query figures that it must be a single column. It then starts to read each row and stops at the first comma.

Normally this process works just fine. But in the case of this file, Power Query ignores a large part of the data you need. The good news is that you can fix this by telling Power Query how the file needs to be treated.

Overriding Power Query’s Default Choices

Power Query loves to be helpful by trying to make assumptions about data. While it’s often right, it can occasionally be wrong and needs some manipulation.

There are two main ways these issues manifest in the user interface: Power Query sometimes inserts extra steps that are incorrect/irrelevant, and sometimes it misconfigures steps. You deal with these two problems differently, as discussed in the following sections.

Removing Power Query Steps

Of the two Power Query problems just mentioned, removing extra added steps is by far the easier.

In the case of the import shown earlier in this chapter, Power Query did two things. First, it imported the file, and then it set the data type on Column1. The challenge here is that you need to change the import format, and you may not be sure if you’ll even have a Column1 after this is done. For that reason, you need to delete the Changed Type step. This is very easy to do—simply mouse over the step in the Applied Steps box and click the little x on the left side of the step to delete it from the Power Query sequence:

Figure 25 Removing a step in Power Query.

Warning: Be aware that if you delete steps in the middle of a query that are important to the end result, you may cause the query to throw an error by the end. In addition, there is no undo functionality in this area of the user interface!

In this case, deleting the Changed Type step doesn’t appear to change anything in the query view at this point. (After all, Power Query just forced the text to be treated as text.) It does, however, help you avoid potential errors as you fix the true problem, described next.

Reconfiguring Steps

Now you need to change the way Power Query handles the file at import. To do that, you need to click the little gear icon on the right side of the Source step. When you do that, you’re launched into a configuration window:

Figure 26 Changing the details of the file to be imported.

This dialog allows you to change a few important things about the file, including the file path and what type of file Power Query should assume this is when trying to import it. Because the file in this case carries the CSV file type extension, Power Query chose the Csv Document type.

Note: Each of the example files for this book has a “Completed” version. In order to get them to refresh, you need to click the gear icon on the Source step and change the folder location to the location on your PC.

At this point you need to tell Power Query that CSV is not the correct format for this document, even though it is masquerading under a CSV extension. To do this, click the dropdown next to Open File As → select Text File → OK.

Figure 27 The different options available for data imports.

The results are much more comforting, although there is definitely a bit more work to do now:

Figure 28 All of the data is showing again!

It looks like you’ll need to manually split this data up into columns, but that’s not a huge deal. At least it is comma delimited, and that will make things a bit easier. Now you need to go through the steps that you probably wish Power Query had done for you:

The import now looks much better:

Figure 29 The data the way you wanted it imported.

Note: You might think that you could skip this process by choosing to import From Text instead of From CSV. Unfortunately, because Power Query is so helpful, it will override your choice and still import the file as a CSV anyway.

Reordering Steps

When you review the steps in the Applied Steps box, you can see that after you split the columns by delimiter, Power Query automatically inserts a step to change the column types. Unfortunately, the data is lined up to the left, so you know that Power Query configured everything as text.

You need to fix that, but if you step back to the Changed Type step and start converting the data types to dates and values, you’ll cause errors across row 1. (They are all text values and haven’t been promoted to headers yet.) So it really makes sense to change the column types after you’ve promoted headers, not before. Try moving that Changed Type step down by right-clicking the Changed Type step → Move Down.

The step moves, but it causes an error when doing so:

Figure 30 Power Query tells you it can’t find a specific column anymore.

The reason this error shows up is that the Changed Type step was setting the data types on Column1.1. But since you promoted the column headers, there is no Column1.1 anymore; it’s now TranDate.

Moving steps in Power Query can be a very handy feature, but you have to watch out for errors like the one just shown.

As you become more comfortable with Power Query, you may decide to just fix a problem like this by editing the formula step directly, but for now the easiest way to fix this specific issue is simply to delete the step and re-create it:

Note: If you get an error in the column when you do this, you can fix it by doing the following: Select the Promoted Headers step → right-click the Amount column → Replace Values. Replace the $ sign with nothing. At this point, things will work when you select the Changed Type step. The root cause of the issue is that your regional settings use a – sign for negative numbers as opposed to putting negative numbers in parentheses.

You’re almost done. You just have the date column left to go. For now, go to Home → Close & Load to finalize the query.

Forcing Date Types

A huge issue for Excel pros, particularly those outside the United States, is importing dates and getting them correct.

Once data is in Excel, dates are no big issue. No matter where the workbook goes, the dates will render correctly and can easily be switched to display in the format the reader needs, wherever in the world they are or whichever standard they wish to follow. The difficulty is in importing data representing dates in the first place.

The problem is that Excel and Power Query need to interpret the date from the external data source, determine what date it is, and return the correct date serial number. What makes the job difficult is the fact that not all of the software we use in our daily jobs is written using the same date standards that our country uses as a standard. In fact, if there is a mismatch between the data source and the regional settings on your PC, you’ll have issues.

Warning: The examples and illustrations in this chapter have been constructed using a PC with a U.S. date standard format. If your system settings are in a different format, the results may be different from what is shown here.

Demonstrating Date Issues

The see this problem, follow these steps:

The view you get depends on the Windows regional settings in your PC. If you have your settings configured to use the U.S. date standard MM/DD/YYYY, your import will appear as shown below. If your system uses a DD/MM/YYYY setting, however, you’ll find that the first column will be left aligned, and the second column will be right aligned.

Figure 31 Dates imported and formatted using U.S. standards.

Change the format of the left-aligned column to Date by right-clicking it and choosing Change Type → Date. The result should look similar to that shown below (although the column you selected may vary):

Figure 32 Applying a Date format to data sourced from a different region.

Notice that a couple of issues present themselves immediately. First, the days and months appear flipped, and second, you get errors in the column.

With the Windows regional settings set to MM/DD/YYYY, Power Query tries to interpret all dates in that format. And when it hits a date like 17/6/2015, it throws an error because there are only 12 months in the year.

Fixing Date Issues

The secret to avoiding the problems just discussed is explicitly telling Power Query the format of the data being imported. And while it’s easy to do this once you know how, the terminology used to label this feature wasn’t put in terms that Excel pros recognize.

To force Power Query to read the dates using the correct setup for each column, follow these steps:

Figure 33 Setting the region your data comes from.

As shown below, now the dates all render consistently according to the format declared in the regional settings of your Windows Control Panel. This is because the dates—no matter their original format—have been converted to the date settings on your system.

Figure 34 A consistent view of dates originally contained in two different formats.

Note: If you don’t know which date standard your country uses, experiment and choose one that returns the correct results.

If there is any chance that your data may be shipped to someone using different regional settings, it is highly advisable that you force the date input formats. This will eliminate the chance of surprising errors occurring and is a practice that we follow throughout this book.

You can now finalize this query by going to Home → Close & Load, and you can edit the MalformedCSV query to force the dates to load correctly:

The query now looks as follows and is ready to be loaded to the worksheet.

Figure 35 The query is ready to be loaded to the worksheet.

Warning: Changed Type with Locale steps will not merge with Changed Type steps. If a Changed Type step includes a date conversion, you must remove that step. The reason the query above worked is that the Changed Type step did not set the data type for the TranDate column.