4- Loading Data Into PowerPivot

No Wizards Were Harmed in the Creation of this Chapter

I don’t intend to instruct you on how to use the import wizards in this chapter. They are mostly self-explanatory and there is plenty of existing literature on them. Instead I want to share with you the things I have learned about data import over time.

Think of this chapter as primarily “all the things I learned the hard way about data import.”

That said, all chapters need to start somewhere, so let’s cover a few fundamentals…

Everything Must “Land” in the PowerPivot Window

As I hinted in previous chapters, all of your relevant data MUST be loaded into the PowerPivot window rather than into normal Excel worksheets. But this is no more difficult than importing data into Excel has ever been. It’s probably easier in fact.

Launching the PowerPivot Window

The PowerPivot window is accessible via this button on the PowerPivot ribbon tab in Excel:

DAXCh04-1.png

Figure 6 This button launches the PowerPivot window

If the PowerPivot ribbon tab does not appear for you, the PowerPivot addin is either not installed or not enabled.

One Sheet Tab = One Table

Every table of data you load into PowerPivot gets its own sheet tab. So if you import three different tables of data, you will end up with something like this:

DAXCh04-2.png

Figure 7 Three tables loaded into PowerPivot. Each gets its own sheet tab.

You Cannot Edit Cells in the PowerPivot Window

That’s right, the PowerPivot sheets are read-only. You can’t just select a cell and start typing.

You can delete or rename entire sheet tabs and columns, and you can add calculated columns, but you cannot modify cells of data, ever.

Does that sound bad? Actually, it’s a good thing. It makes the data more trustworthy, but even more importantly, it forces you to do things in a way that saves you a lot of time later.

Everything in the PowerPivot Window Gets Saved into the Same XLSX File

DAXCh04-3.png

Figure 8 Both windows’ contents are saved into the same file, regardless of which window you save from

Each instance of the PowerPivot window is tightly “bound” to the XLSX (or XLSM/XLSB) you had open when you clicked the PowerPivot Window button in Excel. You can have three XLSX workbooks open at one time, for instance, and three different PowerPivot windows open, but the contents of each PowerPivot window are only available to (and saved into) its original XLSX.

Many Different Sources

PowerPivot can “eat” data from a very wide variety of sources, including the following:

From normal Excel sheets in the current workbook

From the clipboard – any copy/pasted data that is in the shape of a table, even tables from Word for instance

From text files – CSV, tab delimited, etc.

From databases - like Access and SQL Server, but also Oracle, DB2, MySQL, etc.

From SharePoint lists

From MS SQL Server Reporting Services (SSRS) reports

From cloud sources like Azure DataMarket and SQL Azure

From so-called “data feeds”

So there is literally something for everyone. I have been impressed by PowerPivot’s flexibility in terms of “eating” data from different sources, and have always found a way to load the data I need.

For each of the methods above, I will offer a brief description and my advice.

Linked Tables (Data Source Type)

If you have a table of data in Excel like this:

DAXCh04-4.png

Figure 9 Just a normal table of data in a normal Excel sheet

You can quickly grab it into PowerPivot by using the “Create Linked Table” button on the PowerPivot ribbon tab:

DAXCh04-5.png

Figure 10 This will duplicate the selected Excel table into the PowerPivot window

Advantages

This is the quickest way to get a table from Excel into PowerPivot

If you edit the data in Excel – change cells, add rows, etc. – PowerPivot will pick those changes up. So this is a sneaky way to work around the “cannot edit in PowerPivot window” limitation.

If you add columns, those will also be picked up. I call this out specifically because Copy/Paste (below) does not do this, and I frequently find myself wishing I had used Link rather than Copy/Paste for that reason.

Limitations

You cannot link a table in Workbook A to the PowerPivot window from Workbook B. This only creates a linked table in the PowerPivot window “tied” to the XLSX where the table currently resides.

This is not a good way to load large amounts of data into PowerPivot. A couple thousand rows is fine. But ten thousand rows or more may cause you trouble and grind your computer to a halt.

By default, PowerPivot will update its copy of this table every time you leave the PowerPivot window and come back to it. That happens whether you changed anything in Excel or not, and leads to a delay while PowerPivot re-loads the same data.

Linked Tables cannot be scheduled for auto-refresh on a PowerPivot server. They can only be updated on the desktop. (This is true for PowerPivot v1 and v2. I believe this is no longer true in 2013 but have not yet tried it myself).

You cannot subsequently change over to a different source type – this really isn’t a limitation specifically of linked tables. This is true of every source type in this list: whatever type of data source is used to create a table, that table cannot later be changed over to use another type of data source. So if you create a PowerPivot table via Linked Table, you cannot change it in the future to be sourced from a text file, database, or any other source. You will need to delete the table and re-create it from the new source.

It is often very tempting to start building a PowerPivot workbook from an “informal” source like Linked Tables or Copy/Paste, with a plan to switch over and connect the workbook to a more robust source (like a database) later. Resist this temptation whenever possible! If you plan to use a database later, load data from your informal source (like Excel) into that database and then import it from there. The extra step now will save you loads of time later.

Tips and Other Notes

To work around the “large data” problem, I often save a worksheet as CSV (comma separated values) and then import that CSV file into PowerPivot. I have imported CSV files with more than 10 million rows in the past.

To avoid the delay every time you return to the PowerPivot window, I highly recommend changing this setting in the PowerPivot window to “Manual”

DAXCh04-6.png

Figure 11 Change the Update Mode to Manual

Pasting Data Into PowerPivot (Data Source Type)

If you copy a table-shaped batch of data onto the Windows clipboard, this button in the PowerPivot window will light up:

DAXCh04-7.png

Figure 12 This button could have been named “Paste as New Table”

Advantages

You can paste from any table-shaped source and are not limited to using just Excel (unlike Linked Tables)

You can paste from other workbooks and are not limited to the same workbook as your PowerPivot window

Pasted tables support both “Paste/Replace” and “Paste/Append” as shown by the buttons below:

DAXCh04-8.png

Figure 13 These paste methods can come in handy

Limitations

Suffers from the same “large data set” drawback as Linked Tables.

You can never paste in an additional column. Once a table has been pasted, its columns are fixed. You can add a calculated column but can never change your mind and add that column you thought you omitted the first time you pasted. This becomes more of a drawback than you might expect.

Not all apparently table-shaped sources are truly table-shaped. Tables on web pages are notorious for this. Sometimes you are lucky and sometimes you are not.

Cannot be switched to another data source type (true of all data source types).

Importing From Text Files (Data Source Type)

DAXCh04-9.png

Figure 14 The text import button in the PowerPivot window

Advantages

Can handle nearly limitless data volumes

You can add new columns later (if you are a little careful about it, see below)

Text files can be located anywhere on your hard drive or even on network drives (but not on websites, at least not in my experience). So some backend process might update a text file every night in a fixed location (and filename), for example, and all you have to do is refresh the PowerPivot workbook the next day to pick up the new data.

Can be switched to point at a different text file, but still cannot be switched to an entirely different source type (like database).

Limitations

No reliable column names – unlike in a database, text files are not robust with regard to column names. If the order of columns in a CSV file gets changed, that will likely confuse PowerPivot on the next refresh.

Cannot be switched to another data source type (true of all data source types).

Databases (Data Source Type)

DAXCh04-10.png

Figure 15 The Database import button in the PowerPivot window

Advantages

Can handle nearly limitless data volumes

You can add new columns later

Can be switched to point at a different server, database, table, view, or query. Lots of “re-pointability” here, but you still can’t switch to another data source type.

Databases are a great place to add calculated columns. There are some significant advantages to building calculated columns in the database, and then importing them, rather than writing the calculated columns in PowerPivot itself. This is particularly true when your tables are quite large. We will talk about this later in this chapter.

PowerPivot really shines when paired with a good database. There is just an incredible amount of flexibility available when your data is coming from a database. More on this in the following two links.

If you are already curious, you can read the following posts about why PowerPivot is even better when “fed” from a database:

http://ppvt.pro/DBpart1

http://ppvt.pro/DBpart2

Limitations

Not always an option. Hey, not everyone has a SQL Server at their disposal, and/or not everyone knows how to work with databases.

Cannot switch between database types. A table sourced from Access cannot later be switched over and pointed to SQL Server. So in reality, these are separate data source types, but they are similar enough that I did not want to add a completely separate section for each.

Cannot be switched to another data source type (true of all data source types).

Less Common Data Source Types

SharePoint Lists

These are great when you have a data source that is maintained and edited by human beings, especially if more than one person shares that editing duty. But if your company does not use SharePoint, this isn’t terribly relevant to you.

Only SharePoint 2010 and above can be used as a PowerPivot data source.

The Great PowerPivot FAQ is an example of a public SharePoint list, where myself and others from the community can record the answers to frequently-asked questions, which are then shared with the world. It is located here:

http://ppvt.pro/TheFAQ

Reporting Services (SSRS) Reports

This is another example of “if your company already uses it, it’s a great data source,” but otherwise, not relevant.

Only SSRS 2008 R2 and above can be used as a PowerPivot data source.

Cloud Sources Like Azure DataMarket and SQL Azure

Folks, I am a huge, huge, HUGE fan of Azure DataMarket, and they improve it every day. Would you like to cross-reference your sales data with historical weather data for every single store location over the past three years? That data is now easily within reach. International exchange rate data? Yep, that too. Or maybe historical gas prices? Stock prices? Yes and yes. There are thousands of such sources available on DataMarket.

I don’t remotely have space here to gush about DataMarket, so I will point you to a few posts that explain what it is, how it works, and why I think it is a huge part of our future as Excel Pros. In the second post I explain how you can get 10,000 days of free weather data:

http://ppvt.pro/DataMktTruth

http://ppvt.pro/DataMktWeather

http://ppvt.pro/UltDate

SQL Azure is another one of those “if you are using it, it’s relevant, otherwise, let’s move on” sources. But like DataMarket, I think most of us will be encountering SQL Azure in our lives as Excel Pros over the next few years.

“Data Feeds”

Data Feeds are essentially a way in which a programmer can easily write an “adapter” that makes a particular data source available such that PowerPivot can pull data from it.

In fact, SharePoint and SSRS (and maybe DataMarket too, I forget) are exposed to PowerPivot via the Data Feed protocol – that is how that source types were enabled “under the hood.”

So I am mentioning this here in case your company has some sort of custom internal server application and you want to expose its data to PowerPivot. The quickest way to do that may be to expose that application’s data as a data feed, as long as you have a programmer available to do the work.

For more on the data feed protocol, which is also known as OData, see:

http://www.odata.org/

Other Important Features and Tips

Renaming up front – VERY Important!

The names of tables and columns are going to be used everywhere in your formulas. And PowerPivot does NOT “auto-fix” formulas when you rename a table or column! So if you decide to rename things later, you may have a lot of manual formula fixup to do.

And besides, bad table and column names in formulas just make things harder to read. So it’s worth investing a few minutes up front to fix things up.

I strongly recommend that you get into the habit of “import data, then immediately rename before doing anything else.” It has become a reflex for me. Don’t be the person whose formulas reference things like “Column1” and “Table1” ok?

Don’t import more columns than you need

I will explain why in a subsequent chapter, but for now just follow this simple rule:

If you don’t expect to use a column in your reports or formulas, don’t import it. You can always come back and add it later if needed, unless you are using Copy/Paste.

Table Properties Button

This is a very important button, but it is hiding on the second ribbon tab in the PowerPivot window:

DAXCh04-11.png

Figure 16 For all data source types other than Linked Tables and Copy/Paste, you will need this button

This button is what allows you to modify the query behind an existing table. So it’s gonna be pretty important to you at some point. I know someone who used PowerPivot for two months before realizing that there was a second ribbon tab!

When you click it, it returns you to one of the dialogs you saw in the original import sequence:

DAXCh04-14.png

Figure 17 Here you can select columns that you originally omitted, or even switch to using a different table, query, or view in a database

Table Properties button. Don’t leave home without it.

Existing Connections Button

Also located on the second (“Design”) ribbon tab:

DAXCh04-12.png

Figure 18 This button will also come in handy

Clicking this brings up a list of all connections previously established in the current workbook:

DAXCh04-13New.png

Figure 19 List of connections established in the current workbook

This dialog is important for two reasons:

1. The Edit button lets you modify existing connections. In the screenshot above, you see a path to an Access database. If I want to point to a different Access database, I would click Edit here. Same thing if I want to point to a different text file, or if I want to point to a different SQL Server database, etc.

2. The Open button lets you quickly import a new table from that existing connection. I highly recommend doing this rather than starting over from the “From Database” button on the first ribbon tab. You get to skip the first few screens of the wizard this way, AND you don’t litter your workbook with a million connections pointing to the same exact source.