Introduction: A New Revolution
Whether we are performing basic data entry, building simple reports, or designing full-blown business intelligence solutions using VBA, SQL, and other languages, we Excel pros all deal with data to a certain extent. Our skill sets vary greatly, but the overall jobs we are usually trying to perform include:
We may get tagged with the name “data monkey,” but we are actually information workers. But no matter what we call ourselves in our formal job descriptions, our role is to clean up data and turn it into information. Our jobs may not be glorious, but they are essential, and without our work done correctly, the end results of any analysis are suspect.
While Excel has an amazing toolset to help us build business intelligence out of data, converting raw data into consumable data has been a challenge for years. In fact, it’s this issue that we often spend most of our time on—prepping data for analysis and getting it into a nice tabular format to expose Excel’s most powerful analytical and reporting tools.
Figure 1 Behind the curtains, we are all information workers trying to reach our desired goal with data.
Despite the moniker “data monkey,” we information workers are often more like data magicians. Our data seldom enters our world in a ready-to-consume format; it can take hours of cleaning, filtering, and reshaping to get things ready to go.
Once our data is prepared and ready, we can perform a vast array of powerful analytical processes with ease. Tools including conditional formatting, filters, pivot tables, charts, slicers, and more let us work magic and impress our audience.
But getting the data prepped and ready is the hard part. We’re served dirty data, held in collections of text and Excel files (maybe a database, if we’re very lucky), and we somehow have to clean it up and get it ready to use. Our end goal is simple: Get the data into an Excel table as quickly as possible, while making sure it is scoped to our needs and accurate. And every solution needs a different combination of data coming from different sources . . . which takes magic.
Figure 2 Black magic is what really happens to data before consumption.
The Benefits and Dangers of Black Magic
The true wizards of Excel use many different techniques to make their magic happen—sometimes on their own and sometimes in combination. These types of magic include:
All these tools have something in common: For many years, they were essentially the only tools available for cleaning and transforming data into something useful. Despite their usefulness, many of these tools also have two serious weaknesses: They require time to build a solution and time to master the techniques.
While it’s true that truly savvy magicians can use these tools to build solutions to automate and import raw data in a clean format, this takes years of learning advanced languages as well as a significant amount of time scoping, developing, testing, and maintaining the solutions. Depending on the complexity of the solutions built, fixing the solutions for a minor change in the import format or extending them to embrace another source could be horrendous.
One hidden danger of having a true wizard in a company is that the person may build an incredible solution that works until long after he or she has left the company. At some point, though, others at the company realize that they don’t understand the solution and don’t have anyone to fix it when it eventually breaks.
On the flip side, many people tasked with this data cleanup didn’t have time or opportunity to learn these advanced magic techniques. And while we could say that maybe they’re better off never having a massive system collapse without anyone to fix it, instead they waste hours, days, weeks, months, and years of labor time and money performing repetitive data cleanup and imports on a regular basis.
Take a moment and think about how many hours are consumed on a monthly basis in your company simply performing repetitive data import and cleanup tasks in Excel. Multiply those hours by the average wage rate in your company . . . and by the number of companies in your industry worldwide and . . . you get the idea. The cost of productivity in this area is staggering.
Enter a product that tackles all these problems—one that is easy to learn and that others can pick up and understand with limited instruction. It’s a product that lets you automate the import and cleanup of data, so you can focus on turning that data into information, adding true value to your company. That product is called Power Query.
The Future Transforms
Power Query solves the problems related to the toolsets just described. It is very easy to learn and has one of the most intuitive user interfaces we’ve ever worked with. It’s easy to maintain, as it shows each step of the process, which you can review or update later. And everything done in Power Query can be refreshed with a couple of clicks.
We have spent years building solutions using black magic techniques, and we see Power Query as a game changer for many reasons. One of those is the speed with which it can be learned.
When it comes to importing, cleaning, and transforming data to get it ready for analysis, you can learn Power Query faster than you can learn Excel formulas, and it handles complex sources much more easily than VBA.
Figure 3 Power Query was designed to be an easy-to-use data transformation and manipulation tool.
Its ease of use makes Power Query the answer to the vanishing data magician problem that many businesses face. Even if a modern-day magician builds something complex in Power Query, you can have someone up to speed and able to maintain or fix the query with minimal training—we’re talking hours, not weeks.
As hard as it is for true Excel pros to understand, many users actually don’t want to master Excel formulas. They simply want to open up a tool, connect it to their data source, click a few buttons to clean it up and import it, and build the chart or report they need. It’s for exactly this reason that Power Query can reach even further than formulas. With the menu-driven interface, in many cases a user can avoid ever having to learn a single formula or line of code.
Figure 4 Power Query’s ease of use will impact more users than any of the classic methods.
There is no doubt in our minds that Power Query will change the way Excel pros work with data forever.
We want to make it quite clear that we are not discounting the value of formulas, VBA, or SQL. In fact, we couldn’t live without those tools. You can quickly knock out formulas to do many things outside the transformation context that Power Query will never do. VBA has a far greater reach in sheer capability and power, allowing you to reach to other applications, create programs to pull and push data, and so many other things. And a SQL query written by a SQL wizard will always be faster and better than one created by Power Query.
In the context of simply connecting to, cleaning, and importing data, however, Power Query offers more for less, allowing you to automate the job more quickly and in less time.
The good news for true wizards of data is that Power Query is yet another tool that you have access to. You can provide your own SQL queries if needed, refresh them with VBA when desired, load your Power Query–created queries directly to Power Pivot, and much more.
Why Power Query Is Magic
The number-one issue Excel pros face when building robust and stable solutions has been accessing, cleaning, and transforming the data. What we’ve needed, and yet many of us have never heard of, is an ETL tool—that is, a tool for extracting, transforming, and loading data.
Figure 5 ETL: extract, transform, load.
Power Query is an ETL tool; its function is to extract data from almost any source, transform it as desired, and then load it. But what does that truly mean to us as Excel pros?
Extract
Extraction can be targeted against one or more data sources, including text files, CSV files, Excel files, databases, and web pages. In addition, the Power Query team has built many connectors to data sources that have otherwise been tough to get at—Microsoft Exchange, Facebook, Salesforce, and other Software-as-a-Service (SaaS) sources.
Transform
When we talk about transformation, we include each of the following areas:
Power Query allows you to perform many transformations through menu commands rather than having to write formulas or code to do them. This tool was built for Excel pros, and with no coding experience whatsoever, you can use Power Query to perform transformations that would be incredibly complex in SQL or VBA. That’s a great thing!
If you’re the type of person who likes to get under the covers and tinker with formulas or code, however, you can. While there is no requirement to ever learn it, Power Query records everything in a language called M. (Languages A through L were taken.) And if you’re a wizard who decides to take advantage of this language, you can build even more efficient queries and do even more amazing things than without it.
Load
With Power Query you can load data into one of four places:
The last point might seem a bit mysterious, but it simply means that you can create a query that can be used by other queries. This allows for some very interesting use cases that we’ll explore more fully in the book.
While it’s interesting to look at where the data loads, that really isn’t the important part of the loading process in this ETL tool. It’s how it loads or rather how to load it again.
Power Query is essentially a macro recorder that keeps track of every bit of the extract and transform steps. You can define a query once and determine where you’d like to load it. After you’ve done that, you can simply refresh your query to run it again.
Figure 6 Define the transformation process once and consume anytime.
Consider this for a moment: You need a particular TXT file, and it takes you 20 minutes to import and clean it before you can use it. Power Query enables you to accomplish the same task in 10 minutes, which saves you 10 minutes the first time you use it. Then next month comes along, and you need the new version of the same TXT file. Without Power Query, you have to roll up your sleeves and relive the 20 minutes of Excel exuberance where you show Excel that you’re a master at reliving the past, performing those exhilarating steps over and over again each month. Wait . . . you don’t find that exhilarating? In that case, just save your new TXT file over the old one, go into Excel, and click Data → Refresh All. You’re finished. Seriously.
This is where you see the real power of Power Query. It’s easy to use, and it’s also easy to reuse. It changes your hard work into an investment and frees up your time during the next cycle to do something worthwhile.
Power Query Versions
Before we tell you where to get Power Query, let’s talk about the updates. Yes, that may seem like putting the cart before the horse, but there is a pretty solid reason for this.
The Update Cycle
The Power Query team releases monthly updates. We’re not talking bug fixes (although those are certainly included); we’re talking new features and performance enhancements. While some are small, others are much larger. In February 2014 the team added the ability to connect to Microsoft Exchange as a data source. In early 2015 the team released an update that cut query load time by 30%. In July 2015 the team released an update that solved some very serious issues with refreshing to Power Pivot.
Are there risks involved in installing the latest updates as soon as they become available? Sure there are. Bugs happen, particularly in complex software. But the reality is that the Power Query team works very hard to address serious bugs in the software. If you’re particularly concerned, download the installer and save it rather than installing directly from the web. This will allow you to roll back if the need ever arises.
If you currently have Power Query installed, make sure you update it. This book was written using version 2.24, released in July 2015, and you should be on at least this update.
Where Do I Get Power Query?
The answer depends on the version of Excel that you have:
Note: Even though Power Query handles data sourcing for Power BI Desktop, this book is written by Excel pros for Excel pros. Every solution in the pages of this book is illustrated with Excel.
How to Use This Book
This book is intended to be your number-one resource for understanding Power Query and the M language from a practical point of view as an Excel pro. Our goal is to address Excel problems that Excel pros commonly face and show you how to use Power Query to solve them. We also cover some more advanced scenarios as well, incorporating Power Query and M best practices throughout, to help you understand not only how to build Power Query solutions but how to make them last.
After working with Power Query for a long time, we’ve come up with a method that we believe is the optimal way to teach how to use this incredible tool. It’s one that we’ve tested and refined in our online http://powerquery.training workshops, and it involves carefully layering techniques that build on each other. The learning map is shown below.
Figure 7 An optimal learning path to master Power Query and the M language.
Where to Find the Power Query Commands
Power Query was initially released after Excel 2013 as a free add-in for Excel, and it was (rather shockingly) backward compatible with Excel 2010. In both of those versions, a unique ribbon tab holds all the Power Query commands.
Because Power Query is so incredibly useful, it only made sense to integrate the tool into Excel 2016. Due to concerns about bloating the user interface, though, Power Query was not given its own ribbon tab in Excel 2016 but was instead squished down into the Get & Transform group on the Data tab. The image below shows where Power Query is accessible in each application.
Figure 8 Locating Power Query in Excel 2010, Excel 2013, Excel 2016, and Power BI Desktop.
Since the routes to access the Power Query commands are different in different versions of Excel, we have settled on using the following command structure to describe how to get started for each query:
When you see this structure, you need to interpret it as follows:
Figure 9 Creating a query from a CSV file in Excel 2016 vs Excel 2010/2013.
So let’s put this to the test. If we want you to grab data from an Azure SQL Database, our directions read:
The process for Excel 2016 and for Excel 2013 would look as shown below.
Figure 10 Creating a query from an Azure SQL database in Excel 2016 vs Excel 2010/2013.
Before you read any further, we highly recommend that you download all the files used in this book so you can follow along with us. You can get them all at http://www.powerquery.training/book-files/.
It’s time to explore this amazing tool in depth. Let’s get started.
Special Elements
Notes will appear in an indented green paragraph. These delicious paragraphs point out special features, quirks, or software tricks that will help increase your productivity with Power Query.
Warnings appear in a yellow shaded paragraph. Pay special attention to the caution boxes as they can cause you to slip up. We want to prevent you from heading down a path that will make the query experience problems in the future.