Chapter 20 Understanding the M Language

Now that you’ve explored the different objects you can use in Power Query, it is time to take a deeper look at the M language that is used to perform Power Query’s magic. While mastering M isn’t truly necessary, it will certainly add some incredible power to your arsenal and allow you to work with situations that others can’t.

M Query Structure

To get started, you’ll pull a table into Power Query and then examine the code that is working behind the scenes:

Your query should now look like this:

Figure 255 The initial query is ready to examine.

Query Form

So far, everything you’ve seen has been driven through the user interface. You’ve seen Power Query act as a macro recorder, and you have been able to interact with it via the Applied Steps box. You’ve also had some limited interaction via the formula bar. What you haven’t seen yet is the programming language that resides underneath the covers of this incredible tool. It’s time to change that.

The Advanced Editor window appears, and it contains the code for the entire query created to date:

Figure 256 Power Query’s Advanced Editor.

You can see that the feature set at this time is very limited. The Advanced Editor is essentially just a text editor with a syntax checker in the bottom-left corner. This area will no doubt see some investment in future, but right now it is essentially just a text box.

Note: Before you get frustrated trying to resize the editor to make it as small as the image shown above, be aware that you can’t. (We shrunk it with photo editing to save page space.) The Advanced Editor can be expanded to make it bigger, but it can’t collapse much more than the default size when you first open it.

Take a look at the code inside the window shown above:

let

Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content]

in

Source

There are some key pieces here that are required to build a successful query, so we’ll break this down line by line.

The first line of every query must begin with the word let. (This changes for functions, which are covered in Chapter 21.) Nothing else goes on this line at all.

The second line begins with the word Source. This is important, as this is the step name from the Applied Steps box. It’s via this step name that you can work out which line of code relates to the step you want to examine. In this case, you can see that when the new query From Table was created, Power Query invoked the Excel.CurrentWorkbook() function. It then appended a list of the records (the Excel table) that it imported. Finally, it drilled into the content records for that table object.

The second-to-last line of every query is the sole word in.

The last line is a single word that refers to the step containing the output you’d like to return at the end of the query. This is usually the previous step but doesn’t have to be.

Now you can add another step to this query:

You can see that a new step has been added to the window:

Figure 257 A new step has been added to the query.

There are a few things worth noticing here. First, a comma has been added to the end of the Source line. (If you check back to the prior code, you’ll see that the comma was not there.) This is incredibly important: Every line between the let and in lines must have a comma at the end except the last line before the in line, which can never have a comma at the end. In this case, the Source line ends with a comma, but the #”Changed Type” line does not, as it is the last line before the in statement.

Note: To put this in Power Query terms, you are providing a list of steps between the let and in lines. As you know, all items in a list need to be separated from each other via a comma, but you don’t put a comma at the end.

The Meaning of #”syntax”

The second thing you should notice in the previous screenshot is that the step reads Changed Type in the Applied Steps box but comes into the code as #”Changed Type”.

The challenge here is that the space between Changed and Type causes Power Query to treat the two words as separate terms. To deal with this and treat Changed Type as a single term, Power Query prefixes the term with the # character and then wraps the term in quotes. Sadly, you’ll see this kind of thing happening frequently in Power Query, as the default names given to steps usually have spaces in them. This results in a whole lot of hash marks and quotes in longer code blocks.

If you wanted to, you could change this to avoid those characters. You can do this in two different ways:

Regardless of which method you choose, your code will end up looking like this:

Figure 258 Updating code to remove the #”awkwardness”

Warning: If you make your modifications manually, remember that your new term must be identical in all instances (no case changes), and don’t forget to also rename the last step!

Tying the Steps Together

The next thing to understand is how the individual lines link to each other. Notice that in the NewType line, there is a reference to Source:

Figure 259 The Source step referenced in the NewType line.

It’s this referencing that allows Power Query to chain all the commands together. You can basically read the NewType line as follows:

Using Line Breaks

Another piece of this equation that is interesting is that the code as shown above is equivalent to this block:

let

Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],

NewType = Table.TransformColumnTypes

(Source,{{"Price", type number}})

in

NewType

But this violates the rule about having a comma at the end of all lines except the line preceding the in statement, doesn’t it? Not exactly.

When Power Query reads your code, it looks for keywords, including the comma and the in keyword. As it reads a line, it ignores any commas that are enclosed within matching parentheses, curly braces, square brackets, or quotes. Once it encounters a lone comma, it recognizes that as the end of the current code line and starts reading the next line of code as an independent step. Alternatively, if it finds the in keyword, it now knows that the query is going to end and looks to see which step to return.

Why is this important? It means that you can actually place a line break in the middle. Because Power Query hasn’t found either a comma or the in keyword at the end of the initial NewType line, it keeps moving on, assuming that the next line is still part of the first. In essence, this means that this code:

NewType = Table.TransformColumnTypes(Source,{{"Price", type number}})

Is equivalent to this code:

NewType = Table.TransformColumnTypes

(Source,{{"Price", type number}})

Or this code:

NewType = Table.TransformColumnTypes(

Source,

{

{"Price", type number}

}

)

The key to understand here is that you can’t place a hard return in the middle of a function name or word, but breaking at any punctuation mark is okay. Looked at on a grander scale, this query will run perfectly well:

let

Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],

NewType = Table.TransformColumnTypes

(

Source,

{

{"Price", type number}

}

)

in

NewType

Why would you even want to do this?

This technique can be very useful when you’re trying to debug things like lists of lists, which have numerous pairs of list items. When making manual tweaks to code, it is very easy to get lost in the braces and not realize that you missed closing one. When you do that, you get unbalanced code that will not compile and that can be very difficult to correct, especially with the current lack of good debugging tools in Power Query. By separating the opening and closing curly braces and separating the list item pairs onto individual lines, you have a much better chance of keeping your opening and closing braces in balance. Another bonus of this approach is that it’s also very easy to add new columns to the TransformColumnTypes step (at least when you know what data types to assign).

Add the data types for all the columns as follows:

Column(s)

Data Type

Date

Date

Inventory Item

Text

EmployeeID, Quantity

Number

You can recognize from the curly braces after the Source keyword in the NewType step that you need to provide a list of columns and data types in a list format. Since you already know from Chapter 19 that all the list items need to be separated by commas, you know that this should work:

let

Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],

NewType = Table.TransformColumnTypes

(

Source,

{

{"Price", type number},

{"Date", type date},

{"Inventory Item", type text},

{"EmployeeID", type number},

{"Quantity", type number}

}

)

in

NewType

If you click Done to commit this code (exactly as written), you see that the query returns your desired results, with each data type converted as you requested:

Figure 260 Not only are the dates time-free, but check out the indentation in the formula bar!

Code Comments

The last sets of characters you need to know how to create are those that let you leave comments in code. These can be very useful for leaving notes in code or temporarily disabling a line of code.

To mark a single line of code as a comment, you place two slashes at the beginning of the line, as shown here:

let

// Retrieve the Sales table from Excel

Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],

Power Query will not read the // Retrieve . . . line, and this line also will not appear in the Applied Steps box, but it is there to remind you of the purpose of the following row.

Sometimes you need longer comments that don’t fit on one line. In this case, you place the characters /* prior to the code you don’t want executed and the characters */ at the end of the section, like this:

/* I broke this code comment across multiple lines

(by reading M is for Data Monkey) */

Summary of Special Characters

The following table provides a list of the special characters that you will encounter in your Power Query coding journey:

Character

Purpose

(Parameters)

Surrounding function parameters

{List}

Surrounding list items

[Record]

Surrounding records

"Text"

Surrounding text

#"Step Name"

Referring to a step name that contains spaces or other reserved characters

//comment

Commenting a single line of code

/* comment */

Commenting multiple lines of code

Operating on Each Row in a Column

There is one more very important construct to understand in M: how to read and modify code that operates on each row in a column. To get this code, you can take your existing query and add a column to determine the total sales by row:

The query will now looks like this:

Figure 261 The query with a Gross Sales column.

Viewing the code in the Advanced Editor yields two new lines of code at the end of the query (wrapped lines have been indented here for ease of identification):

CalcSales = Table.AddColumn(NewType, "Multiply", each

List.Product({[Quantity], [Price]}), type number),

Rename = Table.RenameColumns(CalcSales,{{"Multiply",

"Gross Sales"}})

While you may not necessarily recognize the function used in the Rename step, you can fairly easily recognize that the Table.RenameColumns function refers to the previous step and then provides a list of the previous column names and the new name you’d like them to take. In addition, based on the fact that you see the list open with two {{ characters, you can identify that you could provide a list of lists here and rename multiple columns at once.

The CalcSales line, however, has a new keyword in it that you haven’t seen before. The Table.AddColumn function first refers to the NewType step and then provides the value “Multiply”. Since Multiply was the name given to the new column, this means that you could probably just change this line to use “Gross Sales” instead of “Multiply”, avoiding the step of renaming the column later.

Following the column’s name is a new keyword you haven’t encountered yet: each. The each keyword indicates to Power Query that you want this action performed for each row in the query. Following this, you see the List.Product() function, which indicates which columns to multiply against each other and the data type the output should take.

You can now make some modifications to make the code shorter:

Your query should now look similar to this:

Figure 262 Modifications made to the query.

Note: Your line breaks may not appear in the same places—or they may not appear at all. The key pieces to remember are (1) the CalcSales line is being treated as one complete line of code since it was split onto two lines in the middle of the function, and (2) the line does not end with a comma since this line precedes the in keyword.

Notice that the Rename step is gone, but the Gross Sales column still exists:

Figure 263 The Gross Sales column, generated without needing to be renamed.

The end effect of this code modification is that it now sets the column name up front, avoiding the need to rename the column later. Saving a step makes the code more efficient by preventing processing steps later.

In addition, you now recognize another keyword in the Power Query programming language. When you see the each keyword, you now know that the formula after this keyword will be applied to each row in the table.

You can now finalize the query:

Referring to Steps or Rows

Consider a scenario where you have a text file that contains data like that shown below:

Figure 264 Tab-delimited data with a specific requirement.

This data throws a few interesting wrinkles:

Your overall challenge here is that you need to extract rows from the middle of a data set and make it dynamic to deal with however many rows are evident before, in the middle of, and after the needed data.

In addition, as one final wrinkle, you’d like to determine the number of days between each order date compared to the previous order. This task will entail trying to subtract the date on one row from the data on another—something that can’t be done via the Power Query user interface.

Connecting to the Data

You need to connect to the data and then figure out which rows you need in order to perform your goals:

Note: You could accomplish the same goal by filtering to rows that begin with ID, but you don’t know if there are other rows in the data set that begin with ID. By making the terms as tight as possible, you can reduce the chance of errors in the future. (Of course, if another ID Number row enters the data, you will still have to work out how to deal with this.)

The query should now look as follows:

Figure 265 The query shows just the row indexes you need.

You now have a very cut-down view, showing the row numbers for both the ID Number and ID Total rows. Keep in mind that these are fully dynamic, as no matter how many rows precede, follow, or are in the middle of the data, the steps you have taken will still generate the correct rows.

Combining Power Query Steps

Now, could you make this code shorter if you wanted to? You are going to need the Source step again, but the Added Index step isn’t something you absolutely need in the Applied Steps box. It is, however, possible to combine these two steps into one. You can look at the code for those two steps by going to Home → Advanced Editor:

#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),

RowNumbers = Table.SelectRows(#"Added Index", each Text.StartsWith([Column1], "ID Number") or Text.StartsWith([Column1], "ID Total"))

Notice how the #”Added Index” step contains the Table.AddIndexColumn function. On the RowNumbers line, you also see that the #”Added Index” step is the first parameter fed into the Table.SelectRows() function. If you want to avoid having a separate #”Added Index” step, all you need to do is substitute the actual code within the #”Added Index” step in place of that term in the next line.

Warning: Remember when substituting code in this way that you want everything between the equals sign and the final comma only.

By substituting the code in place, adding some line breaks, and removing the #”Added Index” line completely, you end up with nested code, as follows:

RowNumbers = Table.SelectRows(

Table.AddIndexColumn(Source, "Index", 0, 1),

each Text.StartsWith([Column1], "ID Number")

or Text.StartsWith([Column1], "ID Total"))

As you can see, this code still runs flawlessly, but it removes the Added Index step from the Applied Steps box:

Figure 266 Eliminating the Added Index step while preserving the function.

Creating New Query Steps

Next, you need to be able to use this data in your query, but there is an issue: Power Query works as a sequential macro recorder, which means you’ve left your original source data behind. So how do you get it back?

The answer is to click the fx button in the formula bar:

Figure 267 Manually creating a new query step.

If you’re not paying attention when you click this button, you’d be forgiven for thinking that nothing happened. The data in the preview window doesn’t appear at all. You will see, however, that you have a new step in the Applied Steps box called Custom1, and the formula in the formula bar has changed to this:

=RowNumbers

Clicking the fx button always creates a new query step that refers to the previous step. So how is that useful? Change the formula to this:

=Source

You now see that you are back looking at the original data.

Note: If you check the Advanced Editor, you see that you could have typed this in manually by adding a new step right before the in keyword, as follows: Custom1 = Source Of course, in order to keep the code working, you would also need to add a comma to the preceding line and change the term after the in keyword to Custom1 from RowNumbers.

You’re now in good shape here, as you can select the RowNumbers step to see the first and last rows you want to find, then step back to Custom1 so that you can start doing exactly that.

Referring to a Previous Applied Step

You now want to restrict your data to just the necessary rows.

To start with, you set up your command via the user interface. The reason for this is that it will give you the syntax for the functionality you’re after—instead of looking up the function and doing things manually from the beginning.

The index number for the last row is 9 as the data set currently sits. If you filtered out the rows from the top first, however, this would need to be updated. Rather than complicate things, you can remove the bottom rows first, as the index for the top rows won’t change when we remove the bottom rows.

The approach you’re going to use here to remove the bottom rows isn’t actually driven by the Remove Rows feature, as it requests the number of rows to remove. This means that you’d need to calculate all rows in the data set, work out the last row, and perform some math to determine how many rows to remove. Rather than do that, you can identify that the final row you need is in row 9, so you can use the Keep Top Rows feature instead, to keep the top 9 rows:

You now get a table filtered to show only the top 9 rows:

Figure 268 The data now shows the top 9 rows.

Notice that you don’t have the row that starts with ID Total. This is to be expected because the ID Total row was actually the tenth row in the file, but you used a base 0 index to count rows. By pulling the ninth row, however, you get the data you’re after, without the total row (which you’d just need to filter out anyway).

Now how do you make this dynamic? Look at the code in the formula bar:

=Table.FirstN(Custom1,9)

In theory, you just need to replace the 9 with a reference to the correct value from the RowNumbers step. This can be done by referring to a previous step and extracting the value from the Index column in that step. To do this, you would use the following code:

RowNumbers[Index]{1}

As you can see, you refer to the RowNumbers Applied Step, provide the [Index] column, and then drill into the row you need. In this case, you want to use {1}, as this would refer to the second row of the column. (Remember that {0} returns the first value in a base 0 system, {1} returns the second, and so on.)

This means you can rewrite the formula in the formula bar as follows to pull the value dynamically from the table:

=Table.FirstN(Custom1,RowNumbers[Index]{1})

Can you do even better than this?

If you check back on the code for the Custom1 step, you see that it simply refers to Source. Can you skip using the Custom1 step and just refer to Source in the formula? You bet you can! Update that formula again:

=Table.FirstN(Source,RowNumbers[Index]{1})

The result still gives you exactly what you need:

Figure 269 The updated formula still works nicely.

And now, since the formula refers to the Source step, you don’t need the Custom1 step any more at all.

With that step cleaned up, you’re now down to a three-step query, and you’re on your way. Add your dynamic filter to filter out the top rows as well:

The result is a set of rows limited to the header row and the raw data you need:

Figure 270 The code is now dynamically pulling in both the starting and ending rows.

You can consolidate the last two steps as well, just to keep the code cleaner to this point:

Once again, you have shortened the number of applied steps, and you’ve ended up with compact code that dynamically retrieves the raw data set you need:

Figure 271 The dynamic data set is ready for cleanup.

The next steps are just run-of-the-mill cleanup steps:

The result is a perfectly clean data set:

Figure 272 The data set, all cleaned up and ready to use.

So far you have managed to import a set of data and dynamically restrict the rows that you wish to use. At this point, no matter how many rows of data exist before the ID Number row, the query will still start in the right place. In addition, regardless of how many rows exist after the ID Total row, you’ll never see them in your output query, and the query will also pull in any number of rows between those two headers.

Compacting the code is an entirely optional step. You are able to keep nesting your code by copying the previous step and substituting it into the next step in place of the previous step name. While this can make your code more compact and efficient, be aware that it can also drastically affect your ability to debug your code in the case that something goes wrong. The trick is to find a happy medium that makes sense to you.

Referring to the Previous Row Dynamically

There is one more task to accomplish with this data set: working out the number of days since the previous order. Unfortunately, while it would be incredible to just click a button to make this happen, you’re not so lucky, as Power Query doesn’t have this functionality built in. You can still accomplish your goal, but you have to do it manually.

To make this work, you need to figure out a way to create a new column that holds the previous order’s date.

You already know how to refer to a previous step, and you even know how to refer to the column within that step. You also know that you can provide a row number between curly braces to pull back a specific row. All you need to do here is extend that logic a little bit and provide a number that is one less than the current row number.

In order to work that out, you need the index number for each row in the data set. As you are going to be using this new step in future calculations, you should also rename it to avoid #” coding awkwardness:

You are now set up and ready to build a formula that spans rows:

Figure 273 You have everything you need to build cross-row formulas.

Add a new column:

In this case, you are referring to the Transactions step, and you want a value from the OrderDate column. It’s the next part that is the tricky piece. [Index] is inside the curly braces, and it isn’t prefixed by a step name. This means it is still referring to the current step rather than previous step’s value. In addition, since it was entered in the Add Custom Column dialog, it will be prefixed by the each keyword, operating on each row individually. You simply take the Index value for each row, subtract 1, and that will be the row you return from the Transactions step.

Note: The [Index] column is generated so that you can tell Power Query which row you want to retrieve from the target data set, but it is not necessary to have the Index column present in that target. You could just as easily use the formula =#”Changed Type” [OrderDate]{[Index]-1}, and it would work, despite the Changed Type step not containing an Index column.

Everything isn’t perfect, however, as you’ll see when you commit the formula:

Figure 274 Ugh. An error.

The very first record returns an error, even though the rest of the records are working well. Why?

On this row, what is the Index value? It’s 0. Subtract 1 from that, and you get -1. Which row would [OrderDate]{-1} return? Since you’re working in a base 0 system, the first row is {0}, so you can’t start at {-1}. This is the reason for the error.

So what should this value return? If there is no previous record, why not return the OrderDate instead? That way, when you do your calculation, you’ll just have 0 days since the previous order.

To fix this, you need to use the try statement, as described in Chapter 18:

The results make you much happier:

Figure 275 The PreviousOrder column returns the previous order date.

With those dates in place, you can now calculate the number of days between them:

And you have it:

Figure 276 The date difference has been calculated.

Note: The order in which you select columns here is important. If you select the PreviousOrder column first and then the OrderDate column, your difference will show up as negative.

At this point, the only thing left to do is the final cleanup:

At this point, you can test the query by opening the Ch20 Examples\Varying Header Rows.txt file and making modifications to the file. Here’s what you’ll see when you play with the data: