Chapter 13 Loading Data from Exchange
In the past, trying to get data out of Microsoft Exchange was an absolute nightmare. It involved complex VBA scripts, significant development, and painful maintenance. The unfortunate part of this was that there is a significant amount of information buried in your email and calendar that can add great business value to you, but with it being so difficult to get at, it was a nonstarter. Power Query has changed that, allowing an easy way to connect to your Exchange database and start pulling out key drivers that can impact your business intelligence solutions.
Accessible Data
There are five main aspects that you can connect to in your Exchange database:
Each of these can be pulled in to Power Query, cleaned and turned into a table, and then served up to a table or the Power Pivot Data Model.
Potential Use Cases
The most difficult question to wrap your head around related to Exchange as a data source is not how to connect to it but rather why you would want to. The main reason for this is that you’ve never had the ability, so you haven’t thought about it much. Consider the following scenarios and how connection to your Exchange server could be used in your organization.
With Power Query’s ability to read email, you could create a simple survey with a table in the body and send it out to your audience. You could then use Power Query to read all the replies, extract the information from the individual tables, and combine the tables into a data source. Better yet, you don’t have to worry if someone sends in a copy late, as you can just refresh the data source to pull it in.
It’s a common practice for accountants to send Excel workbooks to key personnel and ask them to fill in budget information and return the spreadsheet via email. What if you could then use Power Query to scan for all emails with a certain subject and/or attachment and then open and combine each attachment and extract the data? This is entirely possible with Power Query.
Building an IT Alert Dashboard
In IT, we often get numerous email alerts from various servers and pieces of equipment. Some are informational and some critical, but depending on the volume of equipment you have and the frequency of alerts, they can quickly turn into information overload. Oftentimes these emails are automatically routed to a folder and ignored. But think about this: You can now use Power Query to aggregate the emails, chart the volume and severity, and build an entire systems alert dashboard from previously useless email.
Endless Possibilities
From extracting and matching contact details to appointments and emails, the real question today is not if you can use this feature but how. Maybe you need a dashboard of interaction you’ve had with a client, or you need to track who has replied to a critical email you sent out. There is a ton of valuable information sitting in your inbox that you now have access to in a way you never did before.
Building an Email Dashboard
For this example, you’ll take the role of a website owner to see how previously useless email can be turned into something interesting. While you won’t be able to replicate the steps in this chapter yourself (they are specific to the website owner’s inbox), you’ll see the process used to strip key information out of previously unread emails to drive a dashboard.
To begin with, you need to connect to Microsoft Exchange. This is done via the following process:
Figure 158 Trusting the Autodiscover Service.
Note: This service is configured by your IT department and needs to be trusted in order to access the Exchange server. It is typically already installed and configured to allow cell phones to access their email remotely.
After a short delay while Power Query connects to the Exchange server, you are taken to a Navigator window.
Warning: Don’t ever click the Load button after choosing the Mail table in the Navigator. You’ll be waiting for a long time as it loads your entire inbox!
Power Query launches and provides you with a preview of your inbox:
Figure 159 A list of emails in your inbox.
In order to reduce the number of emails to search through, you routed the relevant emails to a folder via a rule. This means that you need to filter the Folder Path column in order to locate only the emails that have been routed to that specific folder. Interestingly, when you attempt to set the filter, the relevant folder isn’t listed:
Figure 160 Locating the appropriate folder.
If you ever find that the values you’re sure should be in the column just aren’t there, look for the Load More option at the bottom right. By default, this list is shortened to the limited amount of data showing in the preview, but clicking the Load More button brings up all the folders in the inbox.
When the list appears in full, you select the correct folder, and the list of emails is filtered to show only those of interest—the earned commission emails from a specific affiliate source:
Figure 161 A list of emails with exciting subject lines.
While they don’t show in the image, a huge variety of columns are included with email, including such useful fields as To, CC, Sender, Attachments, and more. For the purposes of this dashboard, however, only the DateTimeSent and Body fields are required. For that reason, you select those two columns and use the Remove Other Columns command to remove the extraneous data:
Figure 162 A column full of records.
The interesting part of the Body column is that it is full of records, which you can think of as the individual cells of a table. As it happens, these records contain some very interesting information in the body of the email. You can also see the expansion arrow at the top-right corner of the Body column, which means it is possible to expand these records into a column. Doing so gives you another option: the choice between TextBody and HTMLBody. Since working with text is far easier than working with HTML, you will usually elect to expand TextBody only and then clean and trim the resulting records.
The result is a consistent data set, which can then be further cleaned:
Figure 163 The body of all matching emails.
At this point, the email body can be pared down the email to get at the only part of real interest: the money.
One of the great things about Power Query is the preview window that appears when you select any cell in the Power Query grid. Not only can you see the data, you can select the data in that window and copy it. Rather than split by the $ sign and have to remove a column afterward or type a huge amount of text in the replace window, why not just replace the leading text with nothing? To do this, follow these steps:
The results are shown below, with the money at the left side of the column:
Figure 164 The TextBody column now starts with the important part—the money.
Next, you need to get rid of the portion after the money. The reality, though, is that you aren’t sure if the rest of the email is consistent, as you lost interest and stopped reading after you saw the dollar signs. What you do know is that after each dollar value there is a space. So you just split based on the leftmost space in that column and remove the subsequent column.
After setting the data types and renaming the columns, you are left with data that looks much better:
Figure 165 Date sourced from email, cleaned and ready to be used.
The final step for this data source is to load it to Excel and build a PivotChart from it:
Figure 166 A profit-tracking chart built from previously useless email.
Consider the impact here where you’ve been collecting these emails for the better part of two years and never reading them. Suddenly they now have business value. These emails can be used as part of an email dashboard to review the profitability of one of the income streams that supports your website.
Best Practices for Exchange Sources
There are two main issues to consider when building a solution against a Microsoft Exchange service: portability and performance.
The very first thing you need to take into account is how long this solution is intended to survive and whether we will need to pass it on to another user. Why? Because you are targeting your own email account. Building a solution against your email account locks it down so that only you can use it. It can’t be shared, and it’s difficult to port to others later, as the source files for the solution (emails, attachments, and so on) are in your inbox.
The reality is that if you leave the organization, voluntarily or otherwise, the solution will most likely become useless to others. And if you stay and get promoted, you’ll be stuck updating the solution that is bound to your email account.
If you intend to use the solution for a longer period, or it serves other users, it may be best to try to figure out how to extract the data from your inbox first and build the solution against that source. For email attachments, that is fairly easy: Just save them to a folder. For the email itself, it may not be practical, and maybe Power Query is the method to use to extract the emails and archive them in another format. Regardless, you should definitely give some thought to a succession plan for your solution once it starts to get reused.
Performance
The other major consideration you should take into account is performance, including speed.
Let’s face it: Most of us don’t keep extremely organized email inboxes. Instead, they are huge buckets of email spanning back sometimes months but more often years. As a result, retrieving data from Exchange can be very slow.
The solution to this is to use rules to reroute key emails into subfolders. When the subfolders contain only relevant data, retrieving, transforming, and loading the finished product will happen much more quickly.