Chapter 11 Defining Data Types

As Power Query matures, it is getting better and better at automatically setting appropriate data types on columns of data. Even so, there are still times when it pops up with an odd choice (or no choice at all). This can leave hidden traps in data sets that can rear their ugly heads when you least expect it and can cause all kinds of confusion.

What Data Types Are Recognized?

Power Query recognizes several data types—more, in fact, than Excel does. The complete list of Power Query’s data types is:

The true killer is in the last data type. Any is a variant data type that Power Query will use to indicate that it isn’t sure of the data type. The challenge here is that data defined as Any could take a variety of forms when being loaded or referenced in the future.

Why Data Types Are Important?

To demonstrate the importance of defining data types, in this chapter you will create a new query and remove the data types that Power Query defines for you. This will give you a good understanding of why declaring data types is important and not something you should leave to chance.

Tables and the Any Data Type

Start by loading a query with no defined data types into an Excel table:

At this point, the data has no data types defined at all, which you can see by selecting a cell in the column and then looking on the Transform tab:

Figure 141 The Date column with a date type of Any.

Here’s the Excel table that shows up:

Figure 142 What happened to that Date column?

As you can see, Excel didn’t recognize the Date column as dates but instead placed the date serial numbers. While this is easily rectified (by formatting the column as dates), it is indicative of a more serious problem.

Power Pivot and the Any Data Type

At this point, make a change to the query and load it to the Data Model:

Once you commit this change, the Excel table in the worksheet also changes!

Figure 143 Strange changes that occur when you add the data to the Data Model.

This is probably one of the oddest changes you’ll see in Power Query. The mere act of adding this connection to the Power Pivot Data Model changes the field back to a date in the Excel table—or does it? To see if it does:

There are no changes! In fact, you can apply any numeric style to the data in column A that you like, and it will not change it. The data is now being treated as text, not as a date at all.

What about inside Power Pivot? If you open Power Pivot, you see that the data is there, and it looks like a valid Date/Time value. But when you select the column, you can plainly see that the data type has been set to Text. That certainly won’t help when creating a relationship to your calendar table!

Figure 144 Dates that aren’t really dates inside Power Pivot.

Dealing with Any Data Types

The fix for these issues is very simple. You edit the May query and set the data types, like this:

If you now check the table and Power Pivot model, you’ll find that everything is being treated as the data types you’d expect.

Note: Remember that Power Pivot formats valid dates in a Date/Time format, with 00:00:00 being 0 minutes after midnight of the provided date.

Combining Queries with Different Data Types

One of the tasks you’re likely to perform often is appending two tables. But what happens when the columns in those tables have different data types?

Since you’ve already got the May query corrected, you can now create a new query for the June data without data types and see what happens when you append them:

Appending Defined to Any Data Types

Now you can append the two queries:

At this point, if you check the data type in the Date column, you’ll see that it is Any:

Figure 145 Combined tables with differing data types.

This probably isn’t very surprising. After all, you started with a query where the Date column was defined as Any, so appending something else to an Any should probably yield a column of the same data type.

You can finalize this query and see what happens if you attack it from the other side:

Appending Any to Defined Data Types

Now try this from the other direction, starting with a column that did have the data types defined:

The data certainly looks better, but when you inspect the Date column, it still shows as an Any data type:

Figure 146 The Date column still shows as an Any data type.

By now it’s fairly clear that it doesn’t matter in which order you append the queries: If the data types are mismatched between the sets, they will be converted to Any data types.

You can now finalize this query and see what happens if you correct the issue in the underlying table:

Appending Consistent Data Types

Since you already know the data types are defined in the May query, you can make the data types consistent in the June query and see how that affects the append queries:

And now you can return to one of the two append queries to see the effects. No matter which you choose, you’ll find that the values in the Date column show up properly defined as Date data types.

Figure 147 The Date data type is now applied to the append query’s column.

Data Type Best Practices

While it can be tempting to just rely on Power Query to assign data types and continue on, we strongly recommend that you review the final step of every output query and define your data types there. Even though Microsoft is constantly improving the interface, there are still commands in the Power Query interface that will return an Any data type, even if it was defined before.

Until recently, an example of this was creating a month end from a date. Even though the column with the date was properly formatted as a date, the Month End function returned a value that looked like a Date but was, in fact, an Any. Although this specific issue is now fixed, there are bound to still be issues like this lurking in the program.

In addition, there are also certain commands that will not run on columns where the data type is defined as Any. One such command is the Replace Values command, which won’t always find certain values. If the column is converted to Text, however, the command functions properly again.

As you’ve seen, the risk of having your data defined as an Any data type can cause issues in both Excel and Power Pivot, and these problems can manifest in different ways. Remember also that you fixed the two append queries in this chapter simply by changing the underlying data type. The flip side of this is that it would be just as easy to break things when modifying an underlying query and cause the final query to render Any data types. For the amount of effort compared to the peace of mind, it is highly recommended that you create a habit of defining data types for each and every column as the final step of a production query, even if the data types have been defined earlier. While it might seem tedious, remember that it is a one-time task for a query that could be rerun every single day of the year.