Chapter 4
IN THIS CHAPTER
Understanding external data sources
Exporting data from other programs
Importing data into Excel
Grabbing data from the web
Querying an external database
In many cases, the data that you want to analyze resides outside Excel. That data could be in a text file or a Word document, on a web page, in a database file, in a database program, such as a corporate accounting system, or on a special database server. Alas, you can’t analyze anything that’s hunkered down “out there” in a file, a program, or a server. Instead, you’ve got to figure out a way to get that data “in here,” by which I mean that you import that data into an Excel workbook and in the form of an Excel table. Importing can be quite a challenge, but fortunately, Excel offers many powerful tools for importing outside data.
You can use two basic approaches to grabbing the external data that you want to analyze. You can export data from another program and then import that data into Excel, or you can query a database directly from Excel. I describe both approaches in this chapter.
A vast amount of data exists in the world, and most of it resides in some kind of nonworkbook format. Some data exists in simple text files, perhaps as comma-separated lists of items. Other data resides in tables, either in Word documents or, more likely, Access databases. Also, an increasing amount of data is available on web pages. External data is data that resides outside Excel in a file, database, server, or website.
By definition, external data is not directly available to you via Excel. However, Excel offers a number of tools that enable you to import external data into the program, and from there you can break out Excel’s data-analysis tools to extract useful information from that data.
The world’s data exists in a seemingly endless variety of files and formats. Here are some of the external data types you’re most likely to come across:
Before getting to the techniques for importing data into Excel, take a minute to consider how you might export data from an external program into a format that Excel can use. Fortunately, most programs that store data also offer a (usually) straightforward way to export data. Even better, because Excel is the dominant data-analysis tool available to business, you can almost always tweak a program’s export routine to produce data in an Excel-friendly format.
What does it mean for exported data to be Excel friendly? It means that the resulting data is in one of the following two formats:
Therefore, when exporting data from some other program, your first step is to do a little bit of digging to see whether you have a way to easily and automatically export data to Excel. This fact-finding shouldn't take much time if you use the program’s Help system. Next, you need to find the program’s Export feature. You can usually find the Export command on the File menu, although you might first need to open a submenu with a name such as Import/Export, Save As, or Utilities. After you’ve launched the export process, look for an option to save the data directly to an Excel workbook. If you don’t see an Excel option, say “Aw, too bad!” and look for a way to export the data as a text file or XML file. If you go the text file route, be sure to export the data using one of the following two formats:
With your external data in a format that Excel can work with, you’re ready to get that data imported into Excel.
After you have access to the data, your next step is to import it into an Excel worksheet for analysis and manipulation. Depending on the amount of data, this can make your worksheet quite large. However, having direct access to the data gives you maximum flexibility when it comes to analyzing the data. Not only can you use Excel’s main data-analysis tools — tables, scenarios, and what-if analysis — but you can also create a PivotTable from the imported data. I talk about powerful PivotTables in Part 2, “Analyzing Data with PivotTables and PivotCharts.”
The next few sections take you through the specifics of importing various data types into Excel.
If you want to use Excel to analyze data from a table or query within an Access database, you can import the table or query into an Excel worksheet. You can use the Query Wizard (described later in this chapter) to perform this task. However, if you don’t need to filter or sort the data before importing it, you can import the table or query directly from the Access database, which is usually a bit faster. Here are the steps to follow to import a table or query directly from Access:
Choose Data ⇒ Get Data ⇒ From Database ⇒ From Microsoft Access Database.
The Import Data dialog box appears.
Open the folder that contains the database, select the Access database file, and then click Import.
Excel analyzes the Access database and then displays a window that lists the tables and queries that are available to import.
Alternatively, you might see message similar to The database has been place in a state by user ‘Whoever’ on machine ‘Whatever’ that prevents it from being opened or locked.
Yikes! This means that someone else is using the database right now, so you should try again later.
Use the Navigator pane to select the table or query you want to import.
A preview of the data appears, as shown in Figure 4-1.
Select OK.
Excel imports the Access data into the worksheet.
FIGURE 4-1: Select a table or query in the Navigator pane to see a preview of the data.
A Microsoft Word table is a collection of rows, columns, and cells, which means that it looks something like an Excel range. Moreover, you can insert fields into Word table cells to perform calculations. These fields support cell references, built-in functions, and operators. Cell references designate specific cells; for example, a reference such as B1 refers to the cell in the second column and first row of the table. You can use cell references with built-in functions such as SUM
and AVERAGE
, and operators such as addition (+
), multiplication (*
), and greater than (>
), to build formulas that calculate results based on the table data.
However, Excel still offers far more sophisticated data-analysis tools. Therefore, to analyze your Word table data properly, you should import the table into an Excel worksheet. Here are the steps to trudge through:
Select a cell inside the table you want to import.
Excel adds the Table Tools contextual tab to the ribbon.
Under the Table Tools contextual tab, choose Layout ⇒ Select ⇒ Select Table.
You can also select the table by selecting the table selection handle, which appears in the upper-left corner of the table.
Paste the table data.
How you paste the table depends on whether you want Excel to create a link with the original Word table:
Nowadays, most data reside in some kind of special format: an Excel workbook, an Access database, a server database, a web page, and so on. However, finding data stored in simple text files is still relatively common because text is a universal format that users can work with on any system and in a wide variety of programs. You can analyze the data contained in certain text files by importing the data into an Excel worksheet. Note, however, that you cannot import just any text file into Excel; the file needs to use either the delimited or fixed-width format.
A delimited text file uses a structure in which each item on a line of text is separated by a character called a delimiter. The most common text delimiter is the comma (,), and the resulting file is called a comma-separated values (or CSV) file. When Excel imports a delimited text file, it treats each line of text as a row (record) and each item between the delimiter as a column (field).
Follow these steps to import a delimited text file into Excel:
Choose Data ⇒ From Text/CSV (or, if you want the workout, choose Data ⇒ Get Data ⇒ From File ⇒ From Text/CSV).
The Import Data dialog box appears.
Open the folder that contains the text file, select the file, and then click Import.
Excel analyzes the file and then opens a window that displays a preview of the data in the file.
In the Delimiter list, select the delimiter character that your text data uses.
How do you know which delimiter character to choose? The comma is by far the most common delimiter, so start with that. You’ll know you’ve hit the delimiter jackpot when the previewed data appears in separate columns, as shown in Figure 4-2.
Either choose Load to import the data into a new worksheet or choose Load ⇒ Load To ⇒ Existing Worksheet ⇒ OK to import the data starting at the cell you selected in Step 1.
Excel imports the delimited text data into the worksheet.
FIGURE 4-2: Select the delimiter that gets the data into nice, neat columns.
A fixed-width text file uses a structure in which all the items use a set amount of space. The first item might always be, say, 10 characters wide (including spaces); the second item might always be 8 characters wide; and so on. Crucially, these fixed widths are the same on every line of text, which gives the entire file a predictable and regular structure that Excel can work with. Excel imports a fixed-width text file by treating each line of text as a row (record) and each fixed-width item as a column (field).
If you’re importing data that uses the fixed-width structure, you need to tell Excel where the separation between each field occurs. In a fixed-width text file, each column of data is a constant width, and Excel is usually quite good at determining these widths. So in most cases, Excel automatically sets up column break lines, which are vertical lines that separate one field from the next. However, titles or introductory text at the beginning of the file can impair the wizard’s calculations, so you should check carefully that the proposed break lines are accurate.
Follow these steps to import a fixed-width text file into Excel:
Choose Data ⇒ From Text/CSV (or, if you have extra time to kill, choose Data ⇒ Get Data ⇒ From File ⇒ From Text/CSV).
The Import Data dialog box appears.
Open the folder that contains the text file, select the file, and then click Import.
Excel analyzes the file and then opens a window that displays a preview of the data in the file.
In the Delimiter list, make sure that the Fixed Width item is selected.
Below the Delimiter list, Excel displays a series of numbers, each of which represents the starting point of a column in the text file. The first column always starts at 0, and each subsequent value depends on the width of each column. Figure 4-3 shows these values as 0, 30, 45.
Either choose Load to import the data into a new worksheet or choose Load ⇒ Load To ⇒ Existing Worksheet ⇒ OK to import the data starting at the cell you selected in Step 1.
Excel imports the fixed-width text data into the worksheet.
FIGURE 4-3: If needed, edit the column starting points until your columns are correct.
You already know that the web is home to more information than you can ever use, but did you know that some of that data is available to import into Excel? I talk about a more sophisticated web-based data format in the next section, but here I want to introduce you to web data that comes in the form of a table. A web page table is a rectangular array of rows and columns, with data values in the cells created by the intersection of the rows and columns. Why, that sounds just like an Excel table, doesn’t it? It sure does, and that similarity means that if you know the address of the page that contains the table, Excel offers a tool that lets you import that data into a worksheet for more data-analysis fun.
Here are the steps to follow to import a web page table into Excel:
Choose Data ⇒ From Web (or, if you prefer the scenic route, choose Data ⇒ Get Data ⇒ From Other Sources ⇒ From Web).
The From Web window appears.
Enter the address of the web page in the URL text box and then click OK.
Excel displays the Access Web Content dialog box.
Specify how you want to access the web page and then click Connect.
As shown in Figure 4-4, Excel offers five methods that you can use to access the web content:
Excel connects to the web content, analyzes the web page, and then opens the Navigator dialog box, which displays a list of the tables found on the page. One of these objects is always named Document, and it contains a table of data related to the entire web page (so you can safely ignore it).
Select the table you want to import.
Excel displays a preview of the table, as shown in Figure 4-5. If you want to see what the table looks like on the web page, click the Web View tab.
Either choose Load to import the data into a new worksheet or choose Load ⇒ Load To ⇒ Existing Worksheet ⇒ OK to import the data starting at the cell you selected in Step 1.
Excel imports the web table data into the worksheet.
FIGURE 4-4: Choose the method you want to use to access the web data.
FIGURE 4-5: Select a table to preview its data.
XML (Extensible Markup Language) is a standard that enables the management and sharing of structured data using simple text files. These XML files organize data using tags, among other elements, that specify the equivalent of a table name and field names. Here’s an example that shows the first few records in an XML table of book info:
<?xml version="1.0"?>
<catalog>
<book>
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book>
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book>
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
etc.
</catalog>
Because XML is just text, if you want to perform data analysis on the XML file, you must import the file into an Excel worksheet. Excel stores imported XML data in an Excel table.
Here are the steps to follow to import an XML file into an Excel worksheet:
Choose Data ⇒ Get Data ⇒ From File ⇒ From XML.
The Import Data dialog box appears.
Open the folder that contains the XML file, select the file, and then click Import.
Excel analyzes the XML file and then opens the Navigator dialog box, which displays a list of the tables found in the file.
Select the table you want to import.
Excel displays a preview of the table, as shown in Figure 4-6.
Either choose Load to import the data into a new worksheet or choose Load ⇒ Load To ⇒ Existing Worksheet ⇒ OK to import the data starting at the cell you selected in Step 1.
Excel imports the XML data into the worksheet.
FIGURE 4-6: Select an XML table to preview its data.
If you want to analyze data using a sorted, filtered subset of an external data source, Excel offers the Query Wizard tool, which enables you to specify the sorting and filtering options and the subset of the source data that you want to work with. Why bother doing all that work? Because databases such as those used in Microsoft Access and SQL Server are often very large indeed and contain a wide variety of data scattered over many different tables.
With data analysis, you rarely use an entire database as the source. Instead, you extract a subset of the database: a table, or perhaps two or three related tables. You might also require the data to be sorted in a certain way, and perhaps need to filter the data so that you work with only certain records. The specifics of these three operations — extracting a subset, sorting, and filtering — constitute the criteria for the data you want to work with, and all together they make up what’s known in the trade as a database query.
All database queries require two things at the very beginning: access to a database and an Open Database Connectivity, or ODBC, data source for the database. ODBC is a database standard that enables a program to connect to and manipulate a data source. An ODBC data source contains three things: a pointer to the file or server where the database resides; a driver that enables the Query Wizard to connect to, manipulate, and return data from the database; and optional login information that you require to access the database.
Before you can do any work with the Query Wizard, you must select the data source that you want to use. If you have a particular database that you want to query, you can define a new data source that points to the appropriate file or server.
Most data sources point to database files. For example, the relational database management program Microsoft Access uses file-based databases. You can also create data sources based on text files and Excel workbooks. However, some data sources point to server-based databases. For example, SQL Server and Oracle run their databases on special servers. As part of the data source definition, you need to include the software driver that the Query Wizard uses to communicate with the database, as well as any information that you require to access the database.
Follow these steps to define a data source for your query:
Choose Data ⇒ Get Data ⇒ From Other Sources ⇒ From Microsoft Query.
The Choose Data Source dialog box appears, as shown in Figure 4-7.
Your computer probably comes with a few predefined data sources that you can use instead of creating new ones. In the Choose Data Source dialog box, any predefined data sources appear in the Databases tab. For example, Microsoft Office creates two default data sources: Excel Files and MS Access Database. These incomplete data sources don’t point to a specific file. Instead, when you select one of these data sources and then click OK, Excel prompts you for the name and location of the file. These data sources are useful if you often switch the files that you’re using. However, if you want a data source that always points to a specific file, you need to follow the steps in this section.
Choose <New Data Source>, deselect the Use the Query Wizard to Create/Edit Queries check box, and then click OK.
The Create New Data Source dialog box appears.
In the What Name Do You Want to Give Your Data Source? text box, enter a name for your data source.
This name appears in the Databases tab of the Choose Data Source dialog box, so enter a name that helps you remember which data source you’re working with.
From the Select a Driver for the Type of Database You Want to Access list, select the database driver that your data source requires.
For example, for an Access database, select Microsoft Access Driver (*.mdb, *.accdb)
.
Many businesses store their data in Microsoft SQL Server databases (SQL is usually pronounced ess-kew-ell). This is a powerful server-based system that can handle the largest databases and thousands of users. To define an SQL Server data source, select SQL Server. Click Connect to display the SQL Server Login dialog box. Ask your SQL Server database administrator for the information you require to complete this dialog box. Type the name or remote address of the SQL Server in the Server text box, type your SQL Server login ID and password, and then click OK. Perform Steps 9 and 10 later in this section to complete the data source.
Click Connect.
The dialog box for the database driver appears. The steps that follow show you how to set up a data source for a Microsoft Access database.
Click the Select button.
The Select Database dialog box appears.
Open the folder that contains the database, select the database file, and then click OK.
Excel returns you to the database driver’s dialog box.
If you use a login name and password to access the database, click Advanced to display the Set Advanced Options dialog box. Type the login name and password and then click OK.
Click OK.
Excel returns you to the Create New Data Source dialog box. Figure 4-8 shows the completed dialog box for the Northwind Access database that I’m using.
You can use the Select a Default Table for Your Data Source list to select a table from the database. When you do this, each time you start a new query based on this data source, the Query Wizard automatically adds the default table to the query, thus saving you several steps.
If you specified a login name and password as part of the data source, you can select the Save My User ID and Password in the Data Source Definition check box to save the login data.
Click OK.
Excel returns to the Choose Data Source dialog box, which now displays your shiny, new data source.
Click Cancel to bypass the steps for importing the data.
You can now use the data source in the Query Wizard, which I talk about in the next section.
FIGURE 4-7: The Choose Data Source dialog box.
FIGURE 4-8: The completed Create New Data Source dialog box for the Northwind database.
To run a database query and import the query results, follow these steps:
Choose Data ⇒ Get Data ⇒ From Other Sources ⇒ From Microsoft Query.
Excel displays the Choose Data Source dialog box. In Figure 4-9, you can see that the Northwind data source I created in the previous section now appears in the Databases tab.
Select the Use the Query Wizard to Create/Edit Queries check box.
If you didn’t define a new data source as described in the previous section, this check box should already be selected for you.
On the Databases tab, select the database you want to query and then click OK.
If you choose one of the predefined database types, Excel displays the Select Database dialog box. In this dialog box, select the database that you want to query and then click OK.
Excel displays the Query Wizard - Choose Columns dialog box.
You use the Query Wizard - Choose Columns dialog box to pick which tables and which table columns (fields) you want to appear in your query results. In the Available Tables and Columns box, Excel lists tables and columns. Initially, this list shows only tables, but you can see the columns within a table by clicking the + icon next to the table name.
Populate the Columns in Your Query list with the columns you want to work with.
You have three techniques to use:
This all sounds very complicated, but it really isn’t. Essentially, all you do is identify the columns of information that you want in your Excel table. Figure 4-10 shows how the Query Wizard - Choose Columns dialog box looks after adding several columns from the Products table and one column (CategoryName) from the Categories table.
After you identify which columns you want in your query, click the Next button to filter the query data.
Excel displays the Query Wizard - Filter Data dialog box.
You can filter the data returned as part of your query by using the Only Include Rows Where text boxes. For example, to include only rows in which the CategoryName column equals Beverages, click the CategoryName column in the Column to Filter list box. Then select the Equals filtering operation from the first drop-down list and enter the value Beverages into the second drop-down list (or select it); you can see this filter in action in Figure 4-11.
The Query Wizard - Filter Data dialog box performs the same sorts of filtering that you can perform with the AutoFilter command and the Advanced Filter command. Because I discuss these tools in Chapter 3, I don't repeat that discussion here. However, note that you can perform quite sophisticated filtering as part of your query.
Click Next.
Excel displays the Query Wizard - Sort Order dialog box.
Choose a sort order for the query result data from the Query Wizard - Sort Order dialog box.
Select the column that you want to use for sorting from the Sort By drop-down list and then select one of the following radio buttons (see Figure 4-12):
You can also use additional sort keys by selecting fields in one or more of the Then By drop-down lists that appear in the Query Wizard – Sort Order dialog box. The wizard displays enough drop-down lists for every field in your query.
You sort query results the same way that you sort rows in an Excel table. If you have more questions about how to sort rows, refer to Chapter 3. Sorting works the same whether you’re talking about query results or rows in a table.
Click Next.
Excel displays the Query Wizard - Finish dialog box.
In the Query Wizard - Finish dialog box, specify where Excel should place the query results.
For most types of data, you have two choices:
Click the Finish button.
After you click the Finish button to complete the Query Wizard, Excel displays the Import Data dialog box.
In the Import Data dialog box, choose the worksheet location for the query result data.
Use this dialog box to specify where query-result data should be placed.
Click OK.
Excel places the data at the location that you chose.
FIGURE 4-9: Any data sources that you’ve created appear in the Databases tab.
FIGURE 4-10: The completed Query Wizard – Choose Columns dialog box.
FIGURE 4-11: The Query Wizard – Filter Data dialog box with a filter added.
FIGURE 4-12: The Query Wizard – Sort Order dialog box.
By using the instructions that I describe in this chapter to retrieve data from some external source, you can probably get the data rather quickly into an Excel workbook. But you may have also found that the data is rather raw and are saying to yourself (as I would be, in your shoes), “Wow, this stuff is pretty raw.”
But don't worry: You are where you need to be. Having your information be raw at this point is okay. In Chapter 5, I discuss how you clean up the workbook by eliminating rows and columns and information that are not part of your data. I also cover how you scrub and rearrange the actual data in your workbook so that it appears in a format and structure that’s useful to you in your upcoming analysis.
The bottom line is this: Don't worry that your data seems a bit ugly right now. Getting your data into a workbook accomplishes an important step. Now you need to spend a little time on your housekeeping. Read through the next chapter for the lowdown on how to do that.
By the way, if the process of importing data from some external source has resulted in very clean and pristine data — and this might be the case if you've grabbed data from a well-designed database or with help from the corporate database administrator — that's great. You can jump right into the data-analysis techniques that I start describing in Chapter 6.