20 Using Unlinked Data and “Driver” Tables

“If you only have a hammer, you tend to see every problem as a nail.”

—ABRAHAM MASLOW

Topics Covered in This Chapter

What Is Unlinked Data?

Solving Problems with Unlinked Data

Solving Problems Using “Driver” Tables

Sample Statements

Summary

Problems for You to Solve

Before you start this chapter, make sure you get a good night’s sleep! And while I’m doling out warnings, perhaps you should also make sure your seatbelt is securely fastened. I promised that I would introduce you to concepts that make you think “outside the box.” In this chapter, I am going to tackle problems that can be solved using unlinked data—problems where you will use more than one table in your FROM clause, but you won’t specify any linking criteria using an ON clause. Let’s get started.

Caution: I am going to use the CASE expression extensively in this chapter. If you are not familiar with using CASE, I strongly recommend you work through Chapter 19, “Condition Testing,” before tackling this chapter.

What Is Unlinked Data?

As you learned beginning in Chapter 7, “Thinking in Sets,” most problems you’ll solve using SQL involve gathering data from more than one table. In Chapter 8, “INNER JOINs,” I showed you how to fetch information from multiple tables by linking them on matching data in the Primary and Foreign keys where all the values match. In Chapter 9, “OUTER JOINs,” I showed you how to fetch all the rows from one table and any matching information from a related table again using matching data in the Primary and Foreign keys. In this chapter, I’ll use multiple tables, but I will purposefully not match on key values—I will be using “unlinked” tables.

Let’s take a look at the SQL syntax to create unlinked tables. First, Figure 20-1 shows you the syntax for the SELECT Statement.

Figure 20-1 The SELECT Statement

You need to study the Table Reference to understand how to put unlinked tables in a FROM clause. Figure 20-2 shows you the full diagram for Table Reference.

Figure 20-2 The structure of a Table Reference

And finally, you need to study the diagram for Joined Table. Even though you really aren’t going to “join” unlinked tables, the SQL Standard does show you how to do it in the Joined Table definition, as shown in Figure 20-3.

Figure 20-3 The diagram for Joined Table

To get unlinked tables, you need to do what the SQL Standard calls a CROSS JOIN. So what do you get when you put two or more tables in the FROM clause of your SQL using a CROSS JOIN? The result is something called a Cartesian Product. You’ll get all rows from the first table matched with all rows from the second table, and the total number of rows you will get will be the product of the number of rows in the first table times the number of rows in the second table. Let’s take a look at a simple example:

SELECT Customers.CustLastName,
Products.ProductName
FROM Customers CROSS JOIN Products;

In the Sales Orders sample database, you can find 28 customers and 40 products, so you’ll get 28 times 40 rows or 1,120 rows! The result looks like this:

CustLastName	ProductName
Viescas	Trek 9000 Mountain Bike
Thompson	Trek 9000 Mountain Bike
Hallmark	Trek 9000 Mountain Bike
Brown	Trek 9000 Mountain Bike
McCrae	Trek 9000 Mountain Bike
Viescas	Trek 9000 Mountain Bike
Sergienko	Trek 9000 Mountain Bike
Patterson	Trek 9000 Mountain Bike
Cencini	Trek 9000 Mountain Bike
Kennedy	Trek 9000 Mountain Bike
<< more rows here >>
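
If you want to confirm the row-count arithmetic on something small enough to check by eye, here is a minimal sketch. The DemoCustomers and DemoProducts tables and their rows are hypothetical stand-ins that I made up for illustration; they are not part of the Sales Orders sample database.

```sql
-- Hypothetical demo tables; names and rows are illustrative only.
CREATE TABLE DemoCustomers (CustLastName VARCHAR(25) NOT NULL);
CREATE TABLE DemoProducts (ProductName VARCHAR(50) NOT NULL);

INSERT INTO DemoCustomers (CustLastName)
VALUES ('Viescas'), ('Thompson'), ('Hallmark');

INSERT INTO DemoProducts (ProductName)
VALUES ('Trek 9000 Mountain Bike'), ('Demo Helmet');

-- Every customer is paired with every product: 3 * 2 = 6 rows.
SELECT DemoCustomers.CustLastName, DemoProducts.ProductName
FROM DemoCustomers CROSS JOIN DemoProducts;

-- The row count equals the product of the two table sizes.
SELECT COUNT(*) AS PairCount
FROM DemoCustomers CROSS JOIN DemoProducts;
```

With three customers and two products, the second query should report 6, just as 28 customers and 40 products yield 1,120.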

You might be asking: Why is this useful? Let’s say you need to produce a catalog of all products that is customized for each customer. Your sales department has asked you to create the information to be able to say “Dear Mr. Thompson” or “Dear Mrs. Brown,” print a mailing label on the outside cover, and then list all the products available. You could certainly include the Orders and Order_Details tables to fully link Customers with Products, but then you would get only the products that each customer had ever purchased. To solve your problem, you need to use unlinked tables that result in a Cartesian Product to get the information you need. (By the way, I saved the query to produce the list of all customers and products as CH20_Customer_Catalog in the Sales Orders sample database.)

Note: The SQL Standard allows you to simply list tables separated by commas when you want to use unlinked tables (see Figure 20-1 of the SELECT Statement shown previously), and nearly all database systems accept this syntax. However, as you have learned, the SQL Standard also defines the keywords CROSS JOIN to explicitly indicate that you intend to get the Cartesian Product of the table reference on the left with the table reference on the right.

When you save a View in Microsoft SQL Server using only commas to separate the table names, you’ll find the view saved with the commas replaced with CROSS JOIN. When you save a View in MySQL using only commas, you’ll find the view saved with the commas replaced with JOIN. (CROSS is the default if you don’t specify INNER or OUTER and do not include an ON clause.) PostgreSQL leaves the commas but replaces INNER JOIN with just JOIN—the default being INNER when not specified. Go figure! Microsoft Office Access doesn’t support CROSS JOIN, so I created all the sample queries using only the lowest common denominator—the comma syntax. I will, however, continue to use CROSS JOIN in the text and in the Clean Up steps to make it clear that’s what I am doing. In the SQL statements in the sample databases, I’ll use only commas.

Deciding When to Use a CROSS JOIN

Deciding to use a CROSS JOIN isn’t easy. You can think of these types of queries in two categories:

• Using data from two or more of the main data tables in your database—the tables that you built to store all the subjects and actions described by your application.

I mentioned Customers and all Products previously in this chapter. The same might apply to all Agents and Entertainers, Students and Courses, or even Teams unlinked with a second copy of the Teams table to list all potential matches.

• Using data from one or more of your main data tables and a “helper” or “driver” table that contains rows, for example, for all dates across a relevant time period.

You certainly have date information in your database, such as the OrderDate in the Orders table. But when you want to look at all dates across a range regardless of whether an order was placed on that date, you need a driver table to supply all the values. You can also use driver tables to supply “lookup” values such as a translation from Gender code to the relevant word or conversion of a grade point to a letter grade defined by a range of grade points.
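
As a small sketch of the first category, here is one way to pair a table with a second copy of itself to list potential matches. The DemoTeams table is a hypothetical stand-in, and the WHERE filter is my own addition: it keeps a team from being paired with itself and lists each pairing only once.

```sql
-- Hypothetical demo table; not the Teams table from any sample database.
CREATE TABLE DemoTeams (
    TeamID   INT         NOT NULL PRIMARY KEY,
    TeamName VARCHAR(50) NOT NULL
);

INSERT INTO DemoTeams (TeamID, TeamName)
VALUES (1, 'Marlins'), (2, 'Sharks'), (3, 'Terrapins');

-- Unlinked self-pairing: no ON clause, just two copies of the same table.
-- Without the WHERE clause you would get 3 * 3 = 9 rows, including
-- each team paired with itself and every pairing listed twice.
SELECT Teams1.TeamName AS Team1, Teams2.TeamName AS Team2
FROM DemoTeams AS Teams1 CROSS JOIN DemoTeams AS Teams2
WHERE Teams1.TeamID < Teams2.TeamID
ORDER BY Teams1.TeamID, Teams2.TeamID;
```

The three teams produce three pairings: Marlins with Sharks, Marlins with Terrapins, and Sharks with Terrapins.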

Solving Problems with Unlinked Data

Normally when you set about solving problems using data in your main data tables, you figure out where the data you want is stored and then gather all the tables required to link that data in some meaningful way. When the data you want is in two or more tables, you think about using a JOIN to link the tables, including any intervening tables necessary to logically link all the tables even if you don’t actually need data columns from some of those tables.

Solving problems with unlinked data involves breaking this mold and “thinking outside of the box” to get the answer you want. Let’s take a look again at the Customers and Products “catalog” problem, but let’s complicate it by flagging any products the customers have already ordered.

Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you, I have combined the Translation/Clean Up steps for all the following examples to simplify the process.

“Produce a list of all customer names and addresses and all products that we sell and indicate the products the customer has already purchased.”

From what you learned in Part III, “Working with Multiple Tables,” you would look at your table relationships to start to figure out how to proceed. Figure 20-4 shows you the standard way you would link Customers and Products, using the Orders and Order_Details tables as intermediaries.

Figure 20-4 The normal way to connect Customers to Products

Remember that I want all customers (including those who haven’t ordered anything) and all Products (including products never ordered). If you have your thinking cap on, you might come up with using a FULL OUTER JOIN (see Chapter 9), and you are correct—that would be one way to do it. Keep in mind that not all database systems support FULL OUTER JOINs, so that might not be a solution for you. You could also create one query (view) that LEFT JOINs Customers with Orders and Order_Details, and then use that query in another query to RIGHT JOIN with the Products table. When (remember CASE?) a key field in the Order_Details table is not Null, then indicate that the customer has previously ordered the product.

But this chapter is about solving problems with unlinked tables, so let’s tackle the problem head-on by using a CROSS JOIN of Customers and Products and a subquery in the SELECT clause to do a lookup to see if the customer ever ordered the product. Just for fun, let’s also look up the category description for each product. Here’s how to do it:

Translation/Clean Up

Select customer ID, customer first name, customer last name, customer street address, customer city, customer state, customer zip code, category description, product number, product name, retail price, and (CASE when the customer ID is IN the (selection of customer ID from the orders table inner joined with the order details table on Orders.order number = Order_Details.order number where Products.product number = Order_Details.product number) then display ‘You purchased this!’, else display a blank END) from the customers table CROSS JOIN the categories table inner joined with the products table on Categories.category ID = Products.category ID, ORDER BY customer ID, category description, and product number

SQL

SELECT Customers.CustomerID, Customers.CustFirstName,
   Customers.CustLastName,
   Customers.CustStreetAddress,
   Customers.CustCity, Customers.CustState,
   Customers.CustZipCode,
   Categories.CategoryDescription,
   Products.ProductNumber, Products.ProductName,
   Products.RetailPrice,
   (CASE WHEN Customers.CustomerID IN
      (SELECT Orders.CustomerID
       FROM Orders INNER JOIN Order_Details
         ON Orders.OrderNumber =
            Order_Details.OrderNumber
       WHERE Order_Details.ProductNumber =
            Products.ProductNumber)
    THEN 'You purchased this!'
    ELSE ' ' END) AS ProductOrdered
FROM Customers, Categories INNER JOIN Products
  ON Categories.CategoryID = Products.CategoryID
ORDER BY Customers.CustomerID,
   Categories.CategoryDescription,
   Products.ProductNumber;

Yes, there is an INNER JOIN to link Categories with Products, but the key part of the FROM clause is the CROSS JOIN with the Customers table. You can find this query saved as CH20_Customer_All_Products_PurchasedStatus in the Sales Orders sample database. As expected, the query returns 1,120 rows.
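
To see the flagging logic work on a data set small enough to verify by hand, here is a cut-down sketch of the same technique. All four demo tables and their rows are hypothetical, and I left out the Categories lookup to keep the focus on the CROSS JOIN and the subquery.

```sql
-- Hypothetical miniature tables (names and rows invented for illustration).
CREATE TABLE DemoBuyers (
    CustomerID   INT         NOT NULL PRIMARY KEY,
    CustLastName VARCHAR(25) NOT NULL
);
CREATE TABLE DemoItems (
    ProductNumber INT         NOT NULL PRIMARY KEY,
    ProductName   VARCHAR(50) NOT NULL
);
CREATE TABLE DemoOrders (
    OrderNumber INT NOT NULL PRIMARY KEY,
    CustomerID  INT NOT NULL
);
CREATE TABLE DemoOrderDetails (
    OrderNumber   INT NOT NULL,
    ProductNumber INT NOT NULL,
    PRIMARY KEY (OrderNumber, ProductNumber)
);

INSERT INTO DemoBuyers (CustomerID, CustLastName)
VALUES (1, 'Viescas'), (2, 'Thompson');
INSERT INTO DemoItems (ProductNumber, ProductName)
VALUES (1, 'Demo Bike'), (2, 'Demo Helmet');
-- Customer 1 placed one order containing product 2 only.
INSERT INTO DemoOrders (OrderNumber, CustomerID) VALUES (100, 1);
INSERT INTO DemoOrderDetails (OrderNumber, ProductNumber) VALUES (100, 2);

-- 2 customers * 2 products = 4 rows; the correlated subquery looks up
-- whether the customer on the current row ever ordered the current product.
SELECT DemoBuyers.CustLastName, DemoItems.ProductName,
   (CASE WHEN DemoBuyers.CustomerID IN
       (SELECT DemoOrders.CustomerID
        FROM DemoOrders INNER JOIN DemoOrderDetails
          ON DemoOrders.OrderNumber = DemoOrderDetails.OrderNumber
        WHERE DemoOrderDetails.ProductNumber = DemoItems.ProductNumber)
    THEN 'You purchased this!'
    ELSE ' ' END) AS ProductOrdered
FROM DemoBuyers CROSS JOIN DemoItems
ORDER BY DemoBuyers.CustomerID, DemoItems.ProductNumber;
```

Of the four rows returned, only the Viescas and Demo Helmet row carries the 'You purchased this!' flag. Thompson never ordered anything but still appears once for each product, which is exactly what the linked version of the query cannot give you.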

Note: Recall from Chapter 19 that Microsoft Office Access does not support the CASE expression. In the samples I created in the Access databases, you’ll find that I used a built-in function called Immediate If (IIf) that serves a similar purpose.

Solving Problems Using “Driver” Tables

Let’s move on now to solving problems that require you to set up one or more tables containing a list of values that you’ll CROSS JOIN with other tables in your database to get your answer. I call this sort of table a “driver” table because the contents of the table “drive” the result you get. (If you also own Effective SQL that I wrote with my good friends, Doug Steele and Ben Clothier, we decided to call them “tally” tables in that book, but they’re the same thing.) Arguably, the most common type of driver table contains a list of dates or weeks or months that you can CROSS JOIN with your data to list all days or weeks or months and any matching events that occur on those dates.

Another use of a driver table is to define a categorization of values across a set of defined ranges. Examples include assigning a letter grade to a grade point score, rating instructors based on their proficiency rating, evaluating bowlers based on their average score, categorizing product prices, or categorizing the amount spent by a customer.

A really creative use of a driver table lets you “pivot” your data to display a result that looks more like a spreadsheet. A common example would be to display sales or purchases by month, with the months listed across by product or customer.

Setting Up a Driver Table

The SQL Standard defines WITH RECURSIVE that allows you to execute a stated SQL query multiple times in a loop. This can be useful to load a driver table with consecutive dates across a date range. Unfortunately, only a few database systems support this. To load my large driver tables, I resorted to using Visual Basic in Microsoft Office Access to perform the recursion necessary to load hundreds of rows into a date range table. (You can actually find some of the code I used if you dig around in the sample databases that are in Microsoft Office Access format.)
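
If your database system does support WITH RECURSIVE, a sketch like the following can load consecutive values without any external code. I wrote it to run on PostgreSQL, MySQL 8.0 and later, and SQLite; Microsoft SQL Server supports recursion in a WITH clause but does not accept the RECURSIVE keyword or this placement of WITH. The ztblDemoNumbers table and its DayNumber column are hypothetical names, and I load integers rather than dates because the function that adds one day to a date differs from product to product.

```sql
-- Hypothetical driver table holding the numbers 1 through 31.
CREATE TABLE ztblDemoNumbers (
    DayNumber INT NOT NULL PRIMARY KEY
);

INSERT INTO ztblDemoNumbers (DayNumber)
WITH RECURSIVE NumberList (DayNumber) AS (
    -- Anchor row: the first value in the range.
    SELECT 1
    UNION ALL
    -- Recursive step: add 1 to the previous row until the limit is reached.
    SELECT DayNumber + 1
    FROM NumberList
    WHERE DayNumber < 31
)
SELECT DayNumber
FROM NumberList;

-- Check what was loaded: expect 31 rows running from 1 to 31.
SELECT COUNT(*) AS RowsLoaded,
       MIN(DayNumber) AS FirstValue,
       MAX(DayNumber) AS LastValue
FROM ztblDemoNumbers;
```

The same pattern should produce consecutive dates if you start from a date and have the recursive step add one day using your system's own date arithmetic.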

When your driver table is a simple set of ranges to translate to a value, it’s easy enough to load the data by hand. For example, here’s the list of values I entered into the ztblLetterGrades table you can find in the School Scheduling sample database:

LetterGrade	LowGradePoint	HighGradePoint
A	93	96.99
A-	90	92.99
A+	97	120
B	83	86.99
B-	80	82.99
B+	87	89.99
C	73	76.99
C-	70	72.99
C+	77	79.99
D	63	66.99
D-	60	62.99
D+	67	69.99
F	0	59.99

This should look familiar because it’s the same list of ranges that I used in the CH19_Students_Classes_Letter_Grades query in the previous chapter. (By the way, I named all the driver tables using a “ztbl” prefix to clearly separate them from the main data tables in each database.) One clear advantage to setting up a table like this is that you can easily change the range values should the need arise. You don’t have to go digging in the CASE clauses in each query that depends on the ranges to obtain the answer.
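
Here is a minimal sketch of that advantage in action. The ztblDemoLetterGrades table is a hypothetical three-row copy of the real driver table, and the column data types are my assumption. Changing a grade boundary takes two UPDATE statements against the table and no changes to any query.

```sql
-- Hypothetical three-row copy of the letter grade driver table.
CREATE TABLE ztblDemoLetterGrades (
    LetterGrade    VARCHAR(2)   NOT NULL PRIMARY KEY,
    LowGradePoint  DECIMAL(5,2) NOT NULL,
    HighGradePoint DECIMAL(5,2) NOT NULL
);

INSERT INTO ztblDemoLetterGrades (LetterGrade, LowGradePoint, HighGradePoint)
VALUES ('A', 93, 96.99), ('A-', 90, 92.99), ('B+', 87, 89.99);

-- Before the change, a score of 90.5 falls in the A- range.
SELECT LetterGrade
FROM ztblDemoLetterGrades
WHERE 90.5 BETWEEN LowGradePoint AND HighGradePoint;

-- Raise the floor for A- to 91 and extend B+ upward to meet it,
-- so the ranges stay contiguous.
UPDATE ztblDemoLetterGrades
SET LowGradePoint = 91
WHERE LetterGrade = 'A-';

UPDATE ztblDemoLetterGrades
SET HighGradePoint = 90.99
WHERE LetterGrade = 'B+';

-- The identical lookup now lands in the B+ range.
SELECT LetterGrade
FROM ztblDemoLetterGrades
WHERE 90.5 BETWEEN LowGradePoint AND HighGradePoint;
```

The first lookup returns A- and the second returns B+, even though the SELECT statement itself never changed.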

As I noted previously, a really creative use of a driver table lets you pivot your result to look like a spreadsheet. Quite a few database systems provide nonstandard ways to pivot data, but I’ll show you how to create a pivot using standard SQL and a driver table. You can find one such table I created for this purpose in the Sales Orders sample database called ztblMonths. Here is what part of the table looks like:

MonthYear	YearNumber	MonthNumber	MonthStart	MonthEnd
January 2017	2017	1	1/1/2017	1/31/2017
February 2017	2017	2	2/1/2017	2/28/2017
March 2017	2017	3	3/1/2017	3/31/2017
April 2017	2017	4	4/1/2017	4/30/2017
May 2017	2017	5	5/1/2017	5/31/2017
June 2017	2017	6	6/1/2017	6/30/2017
July 2017	2017	7	7/1/2017	7/31/2017
August 2017	2017	8	8/1/2017	8/31/2017
<< more rows here >>

Additional columns…

January	February	March	April	May	June
1	0	0	0	0	0
0	1	0	0	0	0
0	0	1	0	0	0
0	0	0	1	0	0
0	0	0	0	1	0
0	0	0	0	0	1
0	0	0	0	0	0
0	0	0	0	0	0
<< more rows here >>

Additional columns…

July	August	September	October	November	December
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0	0
1	0	0	0	0	0
0	1	0	0	0	0
<< more rows here >>

Looks a bit strange, doesn’t it? The little secret is you’ll use a WHERE clause to match the rows in this driver table with the date of the order, and then you will build columns by multiplying the total sales times the value found in a particular column to get a total for that month. When the order occurs in January 2017, only the January column on the matching row contains a 1 to result in 1 times quantity times the price. The value won’t be added to the columns for the other months because zero times any value is always zero. Another way to think of it is the ones and zeros define the horizontal “buckets” for each value encountered in your query that calculates the values you want to display. When a date matches the range defined by the row in the driver table, the 1 indicates the correct horizontal bucket in which to place the value. So, when a value is in January 2017, that value ends up in the January column by multiplying the column value times the expression that calculates the total.
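
To make the “bucket” idea concrete, here is a two-month sketch. Both tables are hypothetical: ztblDemoMonths is a cut-down version of the months driver table with only January and February columns, and DemoSales stands in for the joined order data with a single precalculated Amount column. I entered the dates as ISO-style strings, which PostgreSQL, MySQL, and SQLite all accept.

```sql
-- Hypothetical two-row months driver table.
CREATE TABLE ztblDemoMonths (
    MonthStart DATE NOT NULL,
    MonthEnd   DATE NOT NULL,
    January    INT  NOT NULL,
    February   INT  NOT NULL
);

INSERT INTO ztblDemoMonths (MonthStart, MonthEnd, January, February)
VALUES ('2017-01-01', '2017-01-31', 1, 0),
       ('2017-02-01', '2017-02-28', 0, 1);

-- Hypothetical sales rows standing in for Orders and Order_Details.
CREATE TABLE DemoSales (
    ProductName VARCHAR(50)   NOT NULL,
    SaleDate    DATE          NOT NULL,
    Amount      DECIMAL(10,2) NOT NULL
);

INSERT INTO DemoSales (ProductName, SaleDate, Amount)
VALUES ('Demo Bike',   '2017-01-10', 100),
       ('Demo Bike',   '2017-01-25', 20),
       ('Demo Bike',   '2017-02-05', 50),
       ('Demo Helmet', '2017-01-20', 30);

-- The WHERE clause matches each sale to exactly one driver row.
-- Multiplying by the 1 or 0 on that row drops the amount into the
-- correct column and contributes zero to the other.
SELECT DemoSales.ProductName,
       SUM(DemoSales.Amount * ztblDemoMonths.January)  AS January,
       SUM(DemoSales.Amount * ztblDemoMonths.February) AS February
FROM ztblDemoMonths CROSS JOIN DemoSales
WHERE DemoSales.SaleDate BETWEEN ztblDemoMonths.MonthStart
                             AND ztblDemoMonths.MonthEnd
GROUP BY DemoSales.ProductName
ORDER BY DemoSales.ProductName;
```

Demo Bike ends up with 120 in January (100 plus 20) and 50 in February, and Demo Helmet shows 30 in January and 0 in February. How the numbers are formatted depends on your database system.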

Using a Driver Table

Let’s use the two driver tables described in the previous section to solve problems. First, I want to display a grade letter based on each student’s numeric grade received for a class. I solved this problem using CASE in the previous chapter. Now I’ll solve it using the driver table.

“List all students, the classes for which they enrolled, the grade they received, and a conversion of the grade number to a letter.”

Translation/Clean Up

Select Students.student ID, Students.student first name, Students.student last name, Classes.class ID, Classes.start date, Subjects.subject code, Subjects.subject name, Student_Schedules.grade, and ztblLetterGrades.letter grade from the letter grades driver table CROSS JOIN the students table inner joined with the student schedules table on Students.student ID = Student_Schedules.student ID, then inner joined with the classes table on Student_Schedules.class ID = Classes.class ID, then inner joined with the subjects table on Classes.subject ID = Subjects.subject ID, then inner joined with the student class status table on Student_Schedules.class status = Student_Class_Status.class status where Student_Class_Status.class status description = ‘Completed’ and Student_Schedules.grade is between ztblLetterGrades.low grade point and ztblLetterGrades.high grade point

SQL

SELECT Students.StudentID, Students.StudFirstName,
   Students.StudLastName, Classes.ClassID,
   Classes.StartDate,
   Subjects.SubjectCode, Subjects.SubjectName,
   Student_Schedules.Grade,
   ztblLetterGrades.LetterGrade
FROM ztblLetterGrades, (((Students
   INNER JOIN Student_Schedules
     ON Students.StudentID =
        Student_Schedules.StudentID)
   INNER JOIN Classes
     ON Student_Schedules.ClassID = Classes.ClassID)
   INNER JOIN Subjects
     ON Classes.SubjectID = Subjects.SubjectID)
   INNER JOIN Student_Class_Status
     ON Student_Schedules.ClassStatus =
        Student_Class_Status.ClassStatus
WHERE
   (Student_Class_Status.ClassStatusDescription =
      'Completed')
   AND (Student_Schedules.Grade BETWEEN
      ztblLetterGrades.LowGradePoint
      AND ztblLetterGrades.HighGradePoint);

You can find this query saved as CH20_Students_Classes_Letter_Grades in the School Scheduling sample database. You’ll find that it returns the same 68 rows as the CH19_Students_Classes_Letter_Grades query that I showed you in Chapter 19.

Now let’s take a look at using the second driver table.

“Show product sales for each product for all months, listing the months as columns.”

Translation/Clean Up

Select Products.product name, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.January) as January, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.February) as February, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.March) as March, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.April) as April, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.May) as May, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.June) as June, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.July) as July, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.August) as August, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.September) as September, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.October) as October, the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.November) as November, and the sum of (Order_Details.quoted price * Order_Details.quantity ordered * ztblMonths.December) as December from the months driver table CROSS JOIN the products table, then inner joined with the order details table on Products.product number = Order_Details.product number, then inner joined with the orders table on Orders.order number = Order_Details.order number where Orders.order date is between ztblMonths.month start and ztblMonths.month end, grouped by Products.product name

SQL

SELECT Products.ProductName,

SUM(Order_Details.QuotedPrice *