Chapter 15 Transposing and Unpivoting ­Complex Data

When working with real-world data, Excel pros often encounter data that needs to be treated in a variety of ways. While unpivoting is extremely common, on occasion you may need to reverse this process and pivot data before making more complex transformations. Another technique that can be extremely useful is transposing data—for example, flipping data that’s laid out in a vertical manner and so that it’s displayed horizontally (or vice versa). Both of these functions are, fortunately, built in to the Power Query user interface and can aid in transforming data into useful tables.

Unpivoting Subcategorized Tables

The first complex issue we will deal with in this chapter is unpivoting a subcategorized table. While we looked at unpivoting in Chapter 6, this scenario, based on the figure below, adds an additional dimension to the issue.

Figure 177 A challenging unpivot scenario

What makes this problem any more challenging than a standard unpivot? It’s the additional level of having the month with the measure type.

When you unpivot, the data is de-aggregated, using the column names as data points. But what should the column names be here? If you promote the second row to headers, you will lose some key information about the month. It’s just not going to work.

But there is most certainly a way to unpivot this data; you’ll just need to use a bit of imagination and some more advanced tricks.

Loading the Data

For this example, you will unpivot the financial statement found in Ch15 Examples\UnPivot Sub Categories.xlsx:

Figure 178 A challenging data set to unpivot.

You begin by loading the data into Power Query. You can do that by setting up a named range, as you don’t really want to lock down the column headers with a table. Follow these steps:

Power Query opens, showing you a somewhat ugly yet informative view of the data:

Figure 179 The data set is rife with null values.

There are some key points to notice in this data set:

Now you have to figure out what you are going to do with this. You can start by looking at Column1.

Filling Vertically

You can’t get rid of Column1 because it contains information that tells if the account is a revenue classification or an expense classification. The problem is that the account classification doesn’t exist in the same row as the account description and values. This means you need to find a way to get the classification to fill down into the null areas.

As it happens, this is exactly how Power Query’s Fill Down command works:

Now you’ve got your classifications on each row where required:

Figure 180 The account classification has been filled down.

Power Query’s Fill Down command fills the value above into any cell that is null, but it does not overwrite any other data. The Fill Up command works the same way but fills up instead of down.

Note: A blank cell is not the same as a null. If you need to fill into blank areas, you must replace the data on the column, asking to replace an empty value (leave the field blank) with null. Power Query will convert the text-based null to null.

You now need a strategy to deal with the column headers. They seem to have the same issue, with April being in Column3 and missing from Column4 and Column5. Unfortunately, there is no Fill Right command in Power Query. So how do you deal with this?

Transposing Data

In order to fill data to the right in Power Query, you must transpose the table, flipping it on its ear:

The view changes drastically, with the former column headers now becoming rows and vice versa:

Figure 181 The result of transposing the table.

This opens up a couple of useful features. One of the issues you had earlier is that you couldn’t fill right, but now you can fill down on Column1. In addition, the main challenge with unpivoting subcategorized data is that you have to unpivot based on the column names, which means you can’t use two rows of column names to perform the task. But what if you were to temporarily merge Column1 and Column2, resulting in a single header? Try it to see what happens:

Note: The reason you use a | character is that this character is seldom found in the average data set. This means you can use this character later, confident that you’re not going to accidentally pick up a character that already existed in the data.

You now have a single column of labels that you can use for headers:

Figure 182 Merging the categories and subcategories into a single column.

Now you can flip the data back and promote the new headers:

Figure 183 You’re getting closer!

Unpivoting the Data

The rest of the process is fairly straightforward and standard for an unpivot scenario:

Now you have an unpivoted list:

Figure 184 The list is finally unpivoted.

Finally, you need to split the Attribute column back into its respective pieces and clean up the column headers:

The query is now finished, unpivoted, and ready to load:

Figure 185 The subcategorized list, fully unpivoted.

Method Recap

As you’ve seen in this chapter, the process for unpivoting subcategories involves the following eight-step process:

1. Load the data.

2. Transpose the data.

3. Fill the major heading into null areas.

4. Merge the major headings and subheadings into a single column.

5. Transpose the data back.

6. Promote the consolidated headings into header rows.

7. Unpivot the data.

8. Split the consolidated headings back into their individual components.

This technique is not limited to only two levels of headings. The secret is to combine all the headings and subheadings into one text string that can be promoted to headers. As long as you’ve done that, you can unpivot a dozen levels of subheadings if needed.

Replicating the Original Data

With the data completely unpivoted, the original data table can quickly be rebuilt using a PivotTable with the following configuration:

However, in order to rebuild the PivotTable so it’s laid out consistently with the original data, you also need to make the following modifications:

At this point, you should have replicated the original report so that it looks as shown below:

Figure 186 The original data, reconstructed via a PivotTable.

Transposing Stacked Tables

Another complicated transformation scenario is one where the details for a transaction are stacked vertically on top of each other, with blank rows separating the transactions:

Figure 187 Data where repeating the transpose operation is necessary.

In this scenario, you essentially need to transpose each block of data, then stack the blocks into one tall table. The issue, however, is that Power Query doesn’t really have an out-of-the-box command to do this. If you transpose the table, you’ll get one row of data and a large number of columns rather than getting each record transposed individually the way you need.

Again, getting the outcome you want is entirely possible, but it requires a little creative thinking and some tricks.

Loading the Data

In this example, you will transpose the list of credit card transactions found in Ch15 Examples\Transpose Stacked Tables.xlsx:

Figure 188 A stacked list of transactions in a single column.

Before you can do anything else, you need to get the data into Power Query:

The data is now ready for you, and it’s time to figure out how to approach it.

Creating a Modulo Column

Rather than try to transpose the records at this point, you’re going to take a different approach. You’re going to number the records, and you’re going to do it twice. To make this happen:

At this point, you have a numeric list of the transactions, starting from 0:

Figure 189 Each line of the file now has an index number.

You currently have a number indicating the numeric data line in the file. What we actually want is a number as it corresponds to each line of the transaction. In other words, the transaction date should always be 0, the vendor 1, and so on. To get that, you need to add a modulo column.

Before you do this, you need a key piece of information from the table: You need to know the value in the first row of the second record. In this case, the value you are looking for is 6, as shown in the Index column. (Power Query starts counting from 0, not 1.) That number is going to be the factor for the modulo calculation.

The modulo essentially takes a value, subtracts the closest multiple of the factor from the value provided, and returns the remainder. So if the value is 8, the modulo subtracts 6 and returns the remainder, which is 2. Try it out:

You now have a list of the transaction line numbers!

Figure 190 You now have a transaction line number for each item.

This looks interesting, but why set this up in the first place? The answer lies in the solution to this data set.

Pivoting the Data

When you have a list of repeating transaction numbers as you do in this example, you can pivot them. It will look a little ugly at first, but you’ll see how incredible this truly is.

Now comes the tricky part: The pivot needs to be set up correctly.

By default, when you ask to pivot a column, the values in that column will be used as the column headers. Power Query then asks which values you want in the values (or body) area of the pivoted data set. The trick here is that pivoting data is usually done to summarize the values, and you aren’t interested in doing that. You want the original values, so follow these steps:

The table changes shape and fills with null values everywhere:

Figure 191 The pivoted table, full of null values.

This is actually amazing.

Filling in the Blanks

The next step is to fill in the holes in the pivoted data set:

The transactions now cascade up to fill in the blanks:

Figure 192 The data, filled up.

Believe it or not, you’re very close to finished here:

You now have a data set that has been fully transposed from a single column of stacked records:

Figure 193 The transformation from stacked tables to useful data is almost complete.

The final thing to do to this query is to provide better column names and set the data types. As soon as that’s done, you’re finished!

The data is finally transformed:

Figure 194 The transformed data set.

Method Recap

This is the process for transposing and pivoting a single column of data:

1. Load the data.

2. Add an Index column from 0.

3. Add a modulo column, using the value that indicates the first value in the second data set.

4. Pivot the modulo column using the text as values with aggregation.

5. Fill the data up for every column after the initial one.

6. Filter out null rows in the first data column.

7. Do the final cleanup.

With practice, this process actually becomes quite quick, which is a good thing. You’ll be amazed how many times you need to do this in the real world.