Chapter 14 Grouping and Summarizing

In many ways, grouping and summarizing sums up the life of an Excel pro. You’ve got all kinds of tools for the purpose, including Excel’s ability to subtotal, aggregate, and perform various other operations using PivotTables. Sometimes, however, when working with large data sets you need to group the records at the source before analysis.

Power Query adds another set of tools that you can use to quickly group and summarize data. Interestingly, it even gives you the ability to quickly remove extra columns, thereby thinning the data set to a truly manageable level.

Applying Grouping

To demonstrate how grouping works in Power Query, consider the case of a T-shirt manufacturer that has provided this table of data:

Figure 167 A list of sales transactions.

Your goal is to take this list and figure out two things:

Connecting to the Data

Start by loading your data in Power Query:

You now have your table in a nice state, ready to group and summarize the data:

Figure 168 The table is ready to summarize.

Grouping the Data

To group the data by date and by channel.

In this interface, you have the ability to define the items you want to group by, as well as how you’d like them grouped. Power Query starts with only the Date column in the Group By section. If you left that setting as is, you wouldn’t have the data broken down by Date and ChannelName, so you need to change that:

Figure 169 Adding grouping levels

Next, you need to determine how you want the data grouped. Because you are looking for a total of all channels, as well as a total of the products by channel, you need to take the following steps:

Figure 170 The grouping level is configured correctly.

The data is grouped for you, as shown below:

Figure 171 The data is nicely grouped by Date and ChannelName.

Notice that the grouping feature works quite nicely to get your data grouped, and you can have multiple levels stacked very easily for both grouping levels and calculations. You can aggregate the data in different ways here as well, including counting rows, counting distinct rows, or performing Sum, Average, Median, Min, or Max operations.

As an added bonus, the Group By feature removes all columns that were not specified in the original Group By section at the top of the dialog. This saves you the step of removing the unneeded columns before or after the operation is complete.

Now you can load this data to a table:

Creating Summary Statistics

You’ve managed to accomplish the first of your goals in grouping the data by Date and by ChannelName. Now you need to work out the top-selling product in each segment and what it represented as a percentage of total sales for that group.

Duplicating Queries

In order to accomplish your goal, you actually need to modify and add some steps to the original query. Rather than potentially break it, it’s not a bad idea to create a copy to work from. That gives you a fallback plan if things don’t work out the way you wanted:

You now have an exact copy of the previous query that we can load to the workbook:

Figure 172 An exact copy of the Grouped query.

Determining the Top Seller

Now you can set about determining the top-selling product. To do this, you are going to modify the Grouped Rows step and add another step:

This adds a column of tables to your query. These tables, however, are quite special. They contain the details of which rows from the previous step were summarized in order to come up with each row’s totals!

Figure 173 A peek at which rows were used to generate the grouped values.

The question is, how can you use these tables?

Your goal here is to work out the top-selling sales item for each day. You can identify that by looking at the highest value in the Amount column shown in the preview for each table. But how do you extract it? The answer is to reach to a custom column and use a formula. To do so, follow these steps:

You now have a column of records listed:

Figure 174 A column of . . . records?

Let’s circle back for a moment and look at what happened first and then look at what you got from it.

The formula you used examines a table and extracts the max value from the given column. For the parameters, you provide the [Details] column of the query, as that holds the tables to be examined. The column within that table, Amount, was then provided, wrapped in quotes as the function requires.

Unlike previous formulas you’ve seen, however, this function doesn’t return just the value. It returns a record. This is fantastic because the record doesn’t just hold the maximum value. It holds all of the details that go with that data point.

Note: You’ll learn more about records in Chapter 19.

In addition, you can expand the MaxRecord column by using the double-headed arrows on the top right to get the individual components into columns:

The result is that you now have a table that summarizes sales by Date and Channel and clearly shows the top-selling item, as well as its contribution:

Figure 175 The data grouped by Date and Channel with TopSeller details.

There is only one thing left to do: work out the percentage that the top seller represents of the total daily sales. That is easily accomplished with a simple calculation:

Note: Don’t let the hash marks and quotes mess you up here. The easiest way to build the formula is by clicking the field names on the right, and Power Query will put them in for you. You’ll learn more about why they show up when you read Chapter 21.

The result is a little ugly, with some numbers showing no decimals, some showing one, and others showing a ton. You can round them off quite easily:

Warning: Make sure you return to the Transform tab to perform this operation. If you don’t, Power Query will create a new column for you, and you’ll have to remove the previous column manually.

The results are nicely rounded for you, but they do show as decimal values like 1, 0.66, and so on. They certainly don’t show as nice percentages, but that’s okay. You can load the query to an Excel table and apply a percentage style to the column there. As you can see, this gets you to your original goal:

Figure 176 The completed analysis.

Note: Number styles applied to a table column stay in place after a Power Query query is updated.