8 Working with External Data

External data is any data from outside Excel. You may get external data from a SQL Server database, an Oracle data set, an Access database, or a Sybase database, for example. If Excel can connect to such an external source, you can pull that data into a Table. Excel offers a number of functions to query and manage data from external sources.

Why External Data Is Important

Acquiring and housing massive quantities of data has become an important part of the modern business landscape, and analyzing such big data has become increasingly challenging. To address this trend, Microsoft continues to add more powerful data querying and analysis tools to Excel. The latest addition is a suite of what Microsoft calls BI (business intelligence) tools to pull, aggregate, and transform very large data sets. These tools allow Excel users to present that data with traditional tools such as Tables, PivotTables, and charts to effectively tell a story.

The majority of this book to this point has focused on how to use Tables within the scope of a workbook. This chapter discusses how Excel extends the reach of Tables to data originating outside the workbook. It presents three scenarios covering three different kinds of sources: external databases, text files, and the Azure Marketplace. The data in these sources can be from any entity capable of producing data in a tabular form: from point-of-sale systems, accounting and finance departments, or other business sources anywhere there is Internet connectivity.

How Excel Exposes External Data Connections

Excel devotes an entire tab to data. The DATA tab provides functions for pulling data into Excel from external sources as well as transforming and working with local data. You can access the following external data sources via Excel's DATA ribbon tab:

Disabled Get External Data group on the DATA tab.

Data connections are stored with the workbook, and you can view them by selecting DATA | Get External Data | Existing Connections and navigating to the Connections tab. The Tables tab of this dialog box shows existing Data Model connections that use a Table as a source, such as when a PivotTable is created using a Table as a source. (See Chapter 5 for more on this.)

CAUTION

Data connections may be disabled on your computer. To enable data connections, select FILE | Options | Trust Center | Trust Center Settings | External Content and ensure that either Enable all Data Connections (not recommended) or Prompt user about Data Connections is enabled.

Refreshing External Data Connections

When you use Tables with data connections, you can rearrange and delete columns without affecting a query. Excel continues to map the queried data to the correct columns. Column widths, however, are typically not maintained and AutoFit on refresh.

TIP

To stop Excel from AutoFitting Table columns with every refresh, right-click anywhere on the Table and select Table | External Data Properties and then uncheck Adjust column width. Two other options are Preserve column sort/filter/layout and Preserve cell formatting. They are both selected by default. These options are not available to all types of external data connections.

Working with Data from a Database

Every desktop Excel installation provides many different drivers for connecting to external data sources. A driver is software that knows about a specific kind of data source, including how to query it and how to interpret the query result and transform the result into a form compatible with Excel. Older drivers use ODBC application API technology, and newer drivers use the Object Linking and Embedding, Database (OLEDB) application API methodology. The differences between these two methodologies are beyond the scope of this book, but basically, you should use the newer OLEDB whenever possible.

The database example in this chapter focuses on SQL Server, but there are drivers for just about any database source, including Oracle, MySQL, Access, Hadoop, SSAS, OData data feeds, and many others.

NOTE

When you’re connecting to an Access database, you cannot connect to a crosstab query. Chapter 9 presents a VBA solution to this problem.

Connecting to SQL Server

To connect to a SQL Server database, select DATA | Get External Data | From Other Sources | From SQL Server. The Data Connection Wizard appears, offering controls for specifying the server name and the type of login to use to open the connection. Contact your SQL Server or IT administrator to determine how to enter the login credentials.

NOTE

The following example uses the AdventureWorks2012 sample SQL Server database, which is freely available from Microsoft.

Creating the Query

Making a connection is only part of the process of pulling data out of a database. Once a connection is established, a query is defined to find and return the desired data in a meaningful form. Many databases support custom routines on the server that do a lot of the querying. A database administrator is responsible for creating tables and custom routines. A query originating from an external location such as Excel can be as simple as a query for the entire contents of a single table in the database, or it can be a query for a complex joining of many tables and filtering on specific values. How much or little support an Excel developer has from the database administrator determines how simple or complex the query on the Excel side of the connection will be.

Fortunately, Excel provides many powerful tools for performing virtually all the data transformation tasks that the database administrator can do in a stored routine. While this book does not delve into the many nuances of relational database queries, this example does present the basic process.

With an active connection to the source database, Excel asks for the database to which to connect. (Each SQL Server instance can house any number of databases, which may or may not be related to each other.) When a database is selected, the tables and views are displayed for selection, and you can select any number of them. When you click the button Select Related Tables, you select every table with any known relationship to any currently selected table. By clicking this button again, you add more tables to the selection if there are more relationships defined in the database.

When you import a single table, by default the destination is a new Excel Table, and the data is not loaded into the Data Model. There are two ways to load the queried data into the Data Model:

If the database administrator for a database has created a stored routine (call a stored procedure in SQL Server) for your use, then you need a custom query, often specifying parameters. To define a custom query, select any table on the Select Database and Table page of the Data Connection Wizard (it doesn't matter which one), click Next, and click Finish to display the Import Data dialog box. Then click Properties to open the Connection Properties dialog box and navigate to its Definition tab. The default query command text is displayed in the Command text text box. Change the Command type to "SQL" and enter the custom query string in the Command text text box. Click OK to execute the query.

By default, creating a query connection to SQL Server creates an ODC (Office Database Connection) file, which is, by default, stored in the folder "C:\Users\USERNAME\MyDocuments\My Data Sources\".

There are many different types of data sources available, and sometimes the default connection templates that Excel presents do not work. In these cases, Excel provides tools to define custom connections. A helpful website for determining connection strings for these custom connections is www.connectionstrings.com. Many examples of connection strings are available there, along with articles and Q&A forums.

Table Names

Excel imports each database table selected into a new Excel Table on a new sheet in the active workbook. It creates Table names using the format "Table_Name", where "Name" is the name of the database table or view in the SQL Server database. If you import a single table, the Table's name is "Table_ServerName_DatabaseName_TableName", where "ServerName" is the name of the server, "DatabaseName" is the name of the database, and "TableName" is the name of the Table or view imported.

NOTE

Crosstab queries cannot be returned natively when you connect to Access from Excel. There are two workarounds: Use Microsoft Query by selecting DATA | Get External Data | From Other Sources | From Microsoft Query or use VBA.

Working with Data from Text Files

Sometimes data comes in the form of simple text files. While this is not common these days, many applications still export data in text file format. Text files are usually in the form of a CSV (comma-separated value) file, a TXT (tab-delimited) file, or a PRN (fixed-field or space-delimited) file.

Excel still creates a data connection when importing text files, but it does not put data into a Table, as it does with other data connections. This is due to the lack of information about the data contained in a text file. Databases have certain rules, like Excel Tables, such as unique field/column names, whereas text files are not bound by these rules. Excel loads data as it finds it in the text file into a range of cells without any formatting; it does not try to force the data into a Table or other structured format.

CAUTION

If a Table is inserted on data that is connected to a text file, Excel displays a warning, as shown in the next figure. The warning states that the connection will be severed if the Table is inserted.

Error that appears when you insert a Table over a text file data connection.

Connecting to a Text File

To choose the text file to import, you select DATA | Get External Data | From Text. Default file formats are PRN, TXT, and CSV. After the file is selected, Excel opens the Text Import Wizard dialog box. The wizard walks you through three steps:

1. Select Delimited for CSV and TXT files or select Fixed width for PRN flies. Then click Next.

2. For a CSV or TXT file, select Comma for a CSV file or Tab for a TXT file. Then click Next. For a PRN file, define the column breaks and click Next.

3. Define the format to be used to interpret each column in the text file. Click Finish. The Import Data dialog box appears, as shown in the next figure. Enter the destination range and click OK to import the data.

Import Data dialog box.

NOTE

To refresh a data connection to a text file, you need to go through the import process again.

Working with Data from Azure Marketplace

The Azure Marketplace (datamarket.azure.com) is a repository of data and data mining applications. It is a cloud infrastructure and platform that is used to build, deploy, and manage cloud-based services. Data available from the Azure Marketplace can be exported to text files, pulled into a Power Pivot Data Model, or imported directly into a Table. If a queried record set is large, it may have to be imported directly into the Data Model, which supports many more rows than a worksheet.

NOTE

Connecting to the Azure Marketplace from Excel requires both the URL for the data and a primary Account Key. Both are available from the Azure site. If an Account Key is missing or invalid you will be prompted to verify it, as shown in the next figure.

An error message with a missing or invalid Account Key when connecting to an Azure data source.

At the Azure Marketplace home page, click the Data link in the top menu bar. The data page appears, listing all the catalogs or data sets available; some of them are free and some have an associated fee. Select one of these data sources to view that product's page, which displays the cost per transaction (query) and information about the data. Click the Signup link to open the Signup page. Click the check box stating that you agree to the terms of the agreement and click the Signup link to go to the Receipt page.

On the Receipt page, you see one or more links for using the data. These are the most commonly displayed links:

Click any of the displayed download options or use your URL and primary Account Key from within Excel to retrieve the data. You then get a standard data connection in Excel. The following is an example of a connection string:

Data Source=URL_Goes_Here;Namespaces to Include=*;Max Received Message Size=4398046511104;Integrated Security=Basic;User ID=AccountKey;Persist Security Info=false;Base Url=URL_Goes_Here

Once you're connected to the Azure Marketplace data set or catalog, the connection is like any other data connection and can be refreshed so you can keep updated with the most current data. Keep in mind that every connection counts as a transaction.