“If you only have a hammer, you tend to see every problem as a nail.”
—ABRAHAM MASLOW
Solving Problems with Unlinked Data
Before you start this chapter, make sure you get a good night’s sleep! And while I’m doling out warnings, perhaps you should also make sure your seatbelt is securely fastened. I promised that I would introduce you to concepts that make you think “outside the box.” In this chapter, I am going to tackle problems that can be solved using unlinked data—problems where you will use more than one table in your FROM clause, but you won’t specify any linking criteria using an ON clause. Let’s get started.
Caution: I am going to use the CASE expression extensively in this chapter. If you are not
familiar with using CASE, I strongly recommend you work through Chapter 19, “Condition Testing,” before tackling this chapter.
As you learned beginning in Chapter 7, “Thinking in Sets,” most problems you’ll solve using SQL involve gathering data from more than one table. In Chapter 8, “INNER JOINs,” I showed you how to fetch information from multiple tables by linking them on matching data in the Primary and Foreign keys where all the values match. In Chapter 9, “OUTER JOINs,” I showed you how to fetch all the rows from one table and any matching information from a related table again using matching data in the Primary and Foreign keys. In this chapter, I’ll use multiple tables, but I will purposefully not match on key values—I will be using “unlinked” tables.
Let’s take a look at the SQL syntax to create unlinked tables. First, Figure 20-1 shows you the syntax for the SELECT Statement.
You need to study the Table Reference to understand how to put unlinked tables in a FROM clause. Figure 20-2 shows you the full diagram for Table Reference.
And finally, you need to study the diagram for Joined Table. Even though you really aren’t going to “join” unlinked tables, the SQL Standard does show you how to do it in the Joined Table definition, as shown in Figure 20-3.
To get unlinked tables, you need to do what the SQL Standard calls a CROSS JOIN. So what do you get when you put two or more tables in the FROM clause of your SQL using a CROSS JOIN? The result is something called a Cartesian Product. You’ll get all rows from the first table matched with all rows from the second table, and the total number of rows you will get will be the product of the number of rows in the first table times the number of rows in the second table. Let’s take a look at a simple example:
SELECT Customers.CustLastName,
Products.ProductName
FROM Customers CROSS JOIN Products;
In the Sales Orders sample database, you can find 28 customers and 40 products, so you’ll get 28 times 40 rows or 1,120 rows! The result looks like this:
CustLastName |
ProductName |
Viescas |
Trek 9000 Mountain Bike |
Thompson |
Trek 9000 Mountain Bike |
Hallmark |
Trek 9000 Mountain Bike |
Brown |
Trek 9000 Mountain Bike |
McCrae |
Trek 9000 Mountain Bike |
Viescas |
Trek 9000 Mountain Bike |
Sergienko |
Trek 9000 Mountain Bike |
Patterson |
Trek 9000 Mountain Bike |
Cencini |
Trek 9000 Mountain Bike |
Kennedy |
Trek 9000 Mountain Bike |
<< more rows here >> |
You might be asking: Why is this useful? Let’s say you need to produce a catalog of all products that is customized for each customer. Your sales department has asked you to create the information to be able to say “Dear Mr. Thompson” or “Dear Mrs. Brown,” print a mailing label on the outside cover, and then list all the products available. You could certainly include the Orders and Order_Details tables to fully link Customers with Products, but then you would get only the products that each customer had ever purchased. To solve your problem, you need to use unlinked tables that result in a Cartesian Product to get the information you need. (By the way, I saved the query to produce the list of all customers and products as CH20_Customer_Catalog in the Sales Orders sample database.)
Note: The SQL Standard allows you to simply list tables separated by commas when you want
to use unlinked tables (see Figure 20-1 of the SELECT Statement shown previously), and nearly all database systems accept
this syntax. However, as you have learned, the SQL Standard also defines the keywords
CROSS JOIN to explicitly indicate that you intend to get the Cartesian Product of
the table reference on the left with the table reference on the right.
When you save a View in Microsoft SQL Server using only commas to separate the table names, you’ll find the view saved with the commas replaced with CROSS JOIN. When you save a View in MySQL using only commas, you’ll find the view saved with the commas replaced with JOIN. (CROSS is the default if you don’t specify INNER or OUTER and do not include an ON clause.) PostgreSQL leaves the commas but replaces INNER JOIN with just JOIN—the default being INNER when not specified. Go figure! Microsoft Office Access doesn’t support CROSS JOIN, so I created all the sample queries using only the lowest common denominator—the comma syntax. I will, however, continue to use CROSS JOIN in the text and in the Clean Up steps to make it clear that’s what I am doing. In the SQL statements in the sample databases, I’ll use only commas.
Deciding to use a CROSS JOIN isn’t easy. You can think of these types of queries in two categories:
• Using data from two or more of the main data tables in your database—the tables that you built to store all the subjects and actions described by your application.
I mentioned Customers and all Products previously in this chapter. The same might apply to all Agents and Entertainers, Students and Courses, or even Teams unlinked with a second copy of the Teams table to list all potential matches.
• Using data from one or more of your main data tables and a “helper” or “driver” table that contains rows, for example, for all dates across a relevant time period.
You certainly have date information in your database, such as the OrderDate in the Orders table. But when you want to look at all dates across a range regardless of whether an order was placed on that date, you need a driver table to supply all the values. You can also use driver tables to supply “lookup” values such as a translation from Gender code to the relevant word or conversion of a grade point to a letter grade defined by a range of grade points.
Normally when you set about solving problems using data in your main data tables, you figure out where the data you want is stored and then gather all the tables required to link that data in some meaningful way. When the data you want is in two or more tables, you think about using a JOIN to link the tables, including any intervening tables necessary to logically link all the tables even if you don’t actually need data columns from some of those tables.
Solving problems with unlinked data involves breaking this mold and “thinking outside of the box” to get the answer you want. Let’s take a look again at the Customers and Products “catalog” problem, but let’s complicate it by flagging any products the customers have already ordered.
Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced
in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you,
I have combined the Translation/Clean Up steps for all the following examples to simplify
the process.
“Produce a list of all customer names and address and all products that we sell and indicate the products the customer has already purchased.”
From what you learned in Part III, “Working with Multiple Tables,” you would look at your table relationships to start to figure out how to proceed. Figure 20-4 shows you the standard way you would link Customers and Products, using the Orders and Order_Details tables as intermediaries.
Remember that I want all customers (including those who haven’t ordered anything) and all Products (including products never ordered). If you have your thinking cap on, you might come up with using a FULL OUTER JOIN (see Chapter 9), and you are correct—that would be one way to do it. Keep in mind that not all database systems support FULL OUTER JOINs, so that might not be a solution for you. You could also create one query (view) that LEFT JOINs Customers with Orders and Order_Details, and then use that query in another query to RIGHT JOIN with the Products table. When (remember CASE?) a key field in the Order_Details table is not Null, then indicate that the customer has previously ordered the product.
But this chapter is about solving problems with unlinked tables, so let’s tackle the problem head-on by using a CROSS JOIN of Customers and Products and a subquery in the SELECT clause to do a lookup to see if the customer ever ordered the product. Just for fun, let’s also look up the category description for each product. Here’s how to do it:
Translation/Clean Up |
Select customer first name, customer last name, customer street address, customer city, customer state, customer zip code, category description, product number, product name, retail price, and (CASE when the customer ID is in the (selection of customer ID from the orders table inner joined with the order details table on Orders.order number in the orders table equals = Order_Details.order number in the order details table where the Products.product number in the products table equals the = Order_Details.product number in the order details table then display ‘You purchased this!’, else ‘ ‘ END) display a blank from the customers table and the CROSS JOIN categories table inner joined with the products table on Categories.category ID in the categories table equals = Products.category ID in the products table sorted ORDER BY customer ID, category description, and product number |
|
Yes, there is an INNER JOIN to link Categories with Products, but the key part of the FROM clause is the CROSS JOIN with the Customers table. You can find this query saved as CH20_Customer_All_Products_PurchasedStatus in the Sales Orders sample database. As expected, the query returns 1,120 rows.
Note: Recall from Chapter 19 that Microsoft Office Access does not support the CASE expression. In the samples
I created in the Access databases, you’ll find that I used a built-in function called
Immediate If (IIf) that serves a similar purpose.
Let’s move on now to solving problems that require you to set up one or more tables containing a list of values that you’ll CROSS JOIN with other tables in your database to get your answer. I call this sort of table a “driver” table because the contents of the table “drive” the result you get. (If you also own Effective SQL that I wrote with my good friends, Doug Steele and Ben Clothier, we decided to call them “tally” tables in that book, but they’re the same thing.) Arguably, the most common type of driver table contains a list of dates or weeks or months that you can CROSS JOIN with your data to list all days or weeks or months and any matching events that occur on those dates.
Another use of a driver table is to define a categorization of values across a set of defined ranges. Examples include assigning a letter grade to a grade point score, rating instructors based on their proficiency rating, evaluating bowlers based on their average score, categorizing product prices, or categorizing the amount spent by a customer.
A really creative use of a driver table lets you “pivot” your data to display a result that looks more like a spreadsheet. A common example would be to display sales or purchases by month, with the months listed across by product or customer.
The SQL Standard defines WITH RECURSIVE that allows you to execute a stated SQL query multiple times in a loop. This can be useful to load a driver table with consecutive dates across a date range. Unfortunately, only a few database systems support this. To load my large driver tables, I resorted to using Visual Basic in Microsoft Office Access to perform the recursion necessary to load hundreds of rows into a date range table. (You can actually find some of the code I used if you dig around in the sample databases that are in Microsoft Office Access format.)
When your driver table is a simple set of ranges to translate to a value, it’s easy enough to load the data by hand. For example, here’s the list of values I entered into the ztblLetterGrades table you can find in the School Scheduling sample database:
LetterGrade |
LowGradePoint |
HighGradePoint |
A |
93 |
96.99 |
A- |
90 |
92.99 |
A+ |
97 |
120 |
B |
83 |
86.99 |
B- |
80 |
82.99 |
B+ |
87 |
89.99 |
C |
73 |
76.99 |
C- |
70 |
72.99 |
C+ |
77 |
79.99 |
D |
63 |
66.99 |
D- |
60 |
62.99 |
D+ |
67 |
69.99 |
F |
0 |
59.99 |
This should look familiar because it’s the same list of ranges that I used in the CH19_Students_Classes_Letter_Grades query in the previous chapter. (By the way, I named all the driver tables using a “ztbl” prefix to clearly separate them from the main data tables in each database.) One clear advantage to setting up a table like this is that you can easily change the range values should the need arise. You don’t have to go digging in the CASE clauses in each query that depends on the ranges to obtain the answer.
As I noted previously, a really creative use of a driver table lets you pivot your result to look like a spreadsheet. Quite a few database systems provide nonstandard ways to pivot data, but I’ll show you how to create a pivot using standard SQL and a driver table. You can find one such table I created for this purpose in the Sales Orders sample database called ztblMonths. Here is what part of the table looks like:
MonthYear |
YearNumber |
MonthNumber |
MonthStart |
MonthEnd |
January 2017 |
2017 |
1 |
1/1/2017 |
1/31/2017 |
February 2017 |
2017 |
2 |
2/1/2017 |
2/29/2017 |
2017 |
3 |
3/1/2017 |
3/31/2017 |
|
April 2017 |
2017 |
4 |
4/1/2017 |
4/30/2017 |
May 2017 |
2017 |
5 |
5/1/2017 |
5/31/2017 |
June 2017 |
2017 |
6 |
6/1/2017 |
6/30/2017 |
July 2017 |
2017 |
7 |
7/1/2017 |
7/31/2017 |
August 2017 |
2017 |
8 |
8/1/2017 |
8/31/2017 |
<< more rows here >> |
Additional columns…
January |
February |
March |
April |
May |
June |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
<< more rows here >> |
Additional columns…
July |
August |
September |
October |
November |
December |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
<< more rows here >> |
Looks a bit strange, doesn’t it? The little secret is you’ll use a WHERE clause to match the rows in this driver table with the date of the order, and then you will build columns by multiplying the total sales times the value found in a particular column to get a total for that month. When the order occurs in January 2017, only the January column on the matching row contains a 1 to result in 1 times quantity times the price. The value won’t be added to the columns for the other months because zero times any value is always zero. Another way to think of it is the ones and zeros define the horizontal “buckets” for each value encountered in your query that calculates the values you want to display. When a date matches the range defined by the row in the driver table, the 1 indicates the correct horizontal bucket in which to place the value. So, when a value is in January 2017, that value ends up in the January column by multiplying the column value times the expression that calculates the total.
Let’s use the two driver tables described in the previous section to solve problems. First, I want to display a grade letter based on each student’s numeric grade received for a class. I solved this problem using CASE in the previous chapter. Now I’ll solve it using the driver table.
“List all students, the classes for which they enrolled, the grade they received, and a conversion of the grade number to a letter.”
You can find this query saved as CH20_Students_Classes_Letter_Grades in the School Scheduling sample database. You’ll find that it returns the same 68 rows as the CH19_Student_Classes_Letter_Grades that I showed you in Chapter 19.
Now let’s take a look at using the second driver table.
“Show product sales for each product for all months, listing the months as columns.”
The “magic” happens when you restrict the row returned by the driver table to the month that matches the date in the orders table. When the date falls in January, only the January column has the value 1. That places the amount for that row in January in the correct “bucket” to be finally summed to get your pivoted result. You can find this query saved in the Sales Orders sample database as CH20_Product_Sales_Month_Pivot. The query returns 38 rows, but there are 40 products in the Products table. The two missing rows occur because two of the products have never been sold. (See CH09_Products_Never_Ordered to discover those two products.)
You now know the mechanics of constructing queries using CROSS JOIN and driver tables and have seen some of the types of requests you can answer with these techniques. Let’s take a look at a fairly robust set of samples, all of which use CROSS JOIN between two data tables or with a driver table. These examples come from each of the sample databases, and they illustrate the use of these techniques to solve “thinking outside the box” problems.
Note: Remember in the Introduction that I warned you that results from each database system
won’t necessarily match the sort order you see in examples in this book unless you
include an ORDER BY clause. Even when you include that specification, the system might
return results in columns not included in the ORDER BY clause in a different sequence
because of different optimization techniques.
If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. You must open the design of the view and execute it from there to see the ORDER BY clause honored.
Also, when you use GROUP BY, you’ll often see the results returned by your database system as though the rows are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.
I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data that you’ll find on the book’s website, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), using “CH20” as the leading part of the query or view name. You can follow the instructions in the Introduction to this book to load the samples onto your computer and try them out.
Note: Remember that all of the field names and table names used in these examples are drawn
from the sample database structures shown in Appendix B, “Schema for the Sample Databases.”
Because many of these examples use complex joins, the optimizer for your database system may choose a different way to solve these queries. For this reason, the first few rows I show you may not exactly match the result you obtain, but the total number of rows should be the same. Keep in mind that for any SQL Server View that contains an ORDER BY clause, you must open the view in Design mode first and then execute it to see the specified order. If you SELECT * from the View, SQL Server does not honor the ORDER BY clause.
This first set of sample statements shows you problems you can solve using unlinked tables. All of them use a CROSS JOIN between two data tables.
“List all employees and customers who live in the same state and indicate whether the customer has ever placed an order with the employee.”
CH20_Employees_Same_State_Customers (83 rows)
Emp FirstName |
Emp LastName |
Cust FirstName |
Cust LastName |
Cust AreaCode |
Cust PhoneNumber |
Cust Status |
Ann |
Patterson |
William |
Thompson |
425 |
555-2681 |
Ordered from you. |
Ann |
Patterson |
Gary |
Hallmark |
253 |
555-2676 |
Ordered from you. |
Ann |
Patterson |
Dean |
McCrae |
425 |
555-2506 |
Ordered from you. |
Ann |
Patterson |
John |
Viescas |
425 |
555-2511 |
Ordered from you. |
Ann |
Patterson |
Andrew |
Cencini |
206 |
555-2601 |
Ordered from you. |
Patterson |
Liz |
Keyser |
425 |
555-2556 |
Ordered from you. |
|
Ann |
Patterson |
Julia |
Schnebly |
206 |
555-9936 |
Ordered from you. |
Ann |
Patterson |
Suzanne |
Viescas |
425 |
555-2686 |
Ordered from you. |
Ann |
Patterson |
Jeffrey |
Tirekicker |
425 |
555-9999 |
|
Ann |
Patterson |
Joyce |
Bonnicksen |
425 |
555-2726 |
Ordered from you. |
<< more rows here >> |
Note: If you’re really sharp, you probably figured out that I could have solved the problem
using an INNER JOIN between Employees and Customers ON EmpState = CustState. As I
have stated many times before, there’s almost always more than one way to solve a
problem. Now you know how to solve it using a CROSS JOIN.
“List all customer preferences and the count of first, second, and third preferences.”
Note: This is a bit tricky because you first need to “pivot” each customer’s first, second,
and third preferences (as indicated by the PreferenceSeq column), and then count them.
You could use a driver table to help perform the pivot, but with only three unique
values to pivot into columns, it’s just as easy to do it with CASE.
CH20_Customer_Style_Preference_Rankings (20 rows)
StyleID |
StyleName |
FirstPreference |
SecondPreference |
ThirdPreference |
21 |
Standards |
2 |
2 |
0 |
15 |
Jazz |
2 |
1 |
0 |
19 |
Rhythm and Blues |
2 |
0 |
1 |
22 |
Top 40 Hits |
2 |
0 |
0 |
10 |
Contemporary |
1 |
2 |
0 |
8 |
Classic Rock & Roll |
1 |
1 |
0 |
20 |
Show Tunes |
1 |
1 |
0 |
3 |
60’s Music |
1 |
0 |
0 |
11 |
Country Rock |
1 |
0 |
0 |
14 |
Chamber Music |
1 |
0 |
0 |
23 |
Variety |
1 |
0 |
0 |
<< more rows here >> |
Notice that although there are 25 distinct musical styles defined in the database, the query returns only 20 rows. The styles that are missing aren’t ranked first, second, or third by any customer.
“List all students who have completed English courses and rank them by Quintile on the grades they received.”
A Quintile divides a group into five equal ranges. When applied to student rankings, a quintile will be 20% of students—those in the top 20 percent are in the first quintile, those in the next 20% are in the second quintile, and so on. To solve this, you need to CROSS JOIN two queries:
1. A query that assigns a ranking number to each student who completed an English course. You can calculate the rank by counting the number of students who have a grade greater than or equal to the current student’s grade. The student with the highest grade will be ranked #1, the student with the second highest grade #2, and so on.
2. A query that counts all students who completed an English course. You can use this count times 0.2, 0.4, 0.6, and 0.8 to figure out the quintile. The students whose rank (as calculated by the first query) is less than or equal to 0.2 times the total number of students is in the first quintile.
Translation/Clean Up |
Select S1.subject ID from the first query, S1.student first name from the first query, S1.student last name from the first query, S1.class status from the first query, S1.grade from the first query, S1.category ID from the first query, S1.subject name from the first query, S1.rank in category from the first query, StudCount.number of students from the student count query, and (CASE when the rank in category <= is less than or equal to 0.2 * times the number of students then return ‘First’ when the rank in category <= is less than or equal to 0.4 * times the number of students then return ‘Second’ when the rank in category <= is less than or equal to 0.6 * times the number of students then return ‘Third’ when the rank in category <= is less than or equal to 0.8 * times the number of students then return ‘Fourth’ else return ‘Fifth’ END) as the quintile from the (selection of Subjects.subject ID in the subjects table, Students.student first name in the students table, Students.student last name in the students table, Student_Schedules.class status in the student schedules table, Student_Schedules.grade in the student schedules table, Subjects.category ID in the subjects table, Subjects.subject name in the subjects table, and the (selection of the count(*) of all rows from the classes table inner joined with the student schedules table AS SS2 on Classes.class ID in the classes table = equals SS2.class ID in the student schedules table, then inner joined with the subjects table AS S3 on S3.subject ID in the subjects table = equals Classes.subject ID in the classes table where S3.category ID in the subjects table = equals ‘ENG’ and SS2.grade in the student schedules table >= is greater than or equal to Student_Schedules.grade in the student schedules table) as rank in category) from the subjects table inner joined with the classes table on Subjects.subject ID in the subjects table = equals Classes.subject ID in the classes table, then inner joined with the student schedules table on Student_Schedules.class ID in the student schedules table = equals Classes.class ID in the classes table, then inner joined with the students table on Students.student ID in the students table = equals Student_Schedules.student ID in the student schedules table where Student_Schedules.class status in the student schedules table = equals 2 and Subjects.category ID in the subjects table = equals ‘ENG’) AS S1 CROSS JOIN and the (selection of the count(*) of all rows as number of students from the classes table AS C2 inner joined with the student schedules table AS SS3 on C2.class id in the classes table = equals SS3.class ID in the student schedules table, then inner joined with the subjects table AS S2 on S2.subject ID in the subjects table = equals C2.subject ID in the classes table where SS3.class status in the student schedules table = equals 2 and S2.category ID in the subjects table = equals ‘ENG’) As student count ordered by S1.grade in the first query descending |
|
CH20_English_Student_Quintiles (18 rows)
Subject ID |
Stud First Name |
Stud Last Name |
Class Status |
Grade |
Category |
Subject Name |
RankIn Category |
Num Students |
Quintile |
37 |
Scott |
Bishop |
2 |
98.07 |
ENG |
Composition - Fundamentals |
1 |
18 |
First |
37 |
Sara |
Sheskey |
2 |
97.59 |
ENG |
Composition - Fundamentals |
2 |
18 |
First |
37 |
John |
Kennedy |
2 |
93.01 |
ENG |
Composition - Fundamentals |
3 |
18 |
First |
37 |
Brannon |
Jones |
2 |
91.66 |
ENG |
Composition - Fundamentals |
4 |
18 |
Second |
37 |
Janice |
Galvin |
2 |
91.44 |
ENG |
Composition - Fundamentals |
5 |
18 |
Second |
38 |
Kendra |
Bonnicksen |
2 |
88.91 |
ENG |
Composition - Intermediate |
6 |
18 |
Second |
37 |
George |
Chavez |
2 |
88.54 |
ENG |
Composition - Fundamentals |
7 |
18 |
Second |
37 |
Marianne |
Wier |
2 |
87.4 |
ENG |
Composition - Fundamentals |
8 |
18 |
Third |
<< more rows here >> |
Note: This query uses the grade from each English class to rank the students, so it is
possible to see a student listed more than once if the student has completed more
than one English class. To rank students for all English classes taken, you would
first have to calculate the average of credits times grade divided by credits for
each student and then rank those results.
“List all potential matches between teams without duplicating any team pairing.”
To solve this problem, you need two copies of the Teams table in your FROM clause. That will give you all combinations of two teams, but you obviously don’t want to list a team bowling itself. Think about dealing with each team one at a time. When looking at Team 1, you need to match it with any team that has a higher TeamID value. Looking at Team 2, you’ve already matched it with Team 1 on the first pass, but any higher value in TeamID will work. So as long as the TeamID in the second copy of the table has a higher value than the TeamID in the first copy of the table, you’re good to go!
Translation/Clean Up |
Select Teams.team ID from the 1st copy of the teams table as team1 ID, Teams.TeamName from the 1st copy of the teams table as team1 name, Teams_1.team ID from the 2nd copy of the teams table as team1 name, and Teams_1.team name from the 2nd copy of the teams table as team2 name from the teams table CROSS JOIN and a 2nd copy of the teams table AS Team_1 where Teams_1.team ID in the 2nd copy of the teams table is greater than > Teams.team ID in the 1st copy of the teams table ordered by Teams.team ID in the 1st copy of the teams table, and Teams_1.team ID in the 2nd copy of the teams table |
|
|
CH20_Team_Pairings (45 rows)
Team1ID |
Team1Name |
Team2ID |
Team2Name |
1 |
Marlins |
2 |
Sharks |
1 |
Marlins |
3 |
Terrapins |
1 |
Marlins |
4 |
Barracudas |
1 |
Marlins |
5 |
Dolphins |
1 |
Marlins |
6 |
Orcas |
1 |
Marlins |
7 |
Manatees |
1 |
Marlins |
8 |
Swordfish |
1 |
Marlins |
9 |
Huckleberrys |
1 |
Marlins |
10 |
MintJuleps |
2 |
Sharks |
3 |
Terrapins |
2 |
Sharks |
4 |
Barracudas |
<< more rows here >> |
Note: You might look at this query and ask: “Couldn’t I also solve this with an INNER JOIN
moving the WHERE clause to an ON clause?” You would be absolutely correct for most
database systems that support something other than an equijoin in the ON clause. As
usual, there’s always more than one way to solve a particular problem using SQL.
Let’s move on to solving some problems using driver tables. All the following solutions use driver tables that I’ve already built for you in the sample databases.
“The warehouse manager has asked you to print an identification label for each item in stock.”
You can look up the quantity on hand in the Products table. The trick here is to use a driver table that has one column of integers, and each row has a successive value from 1 to the maximum number you might have in stock. You can use ztblSeqNumbers in the sample database for this purpose.
Translation/Clean Up |
Select ztblSeqNumbers.Sequence from the sequence driver table, Products.product number from the products table, and Products.product name from the products table from ztblSeqNumbers the sequence driver table CROSS JOIN and the products table where ztblSeqNumbers.sequence in the sequence driver table is less than or equal to <= Products.quantity on hand in the products table ordered by Products.product number from the products table, and ztblSequenceNumbers.sequence from the sequence driver table |
|
|
CH20_Product_Stock_Labels (813 rows)
Sequence |
ProductNumber |
ProductName |
1 |
1 |
Trek 9000 Mountain Bike |
2 |
1 |
Trek 9000 Mountain Bike |
3 |
1 |
Trek 9000 Mountain Bike |
4 |
1 |
Trek 9000 Mountain Bike |
5 |
1 |
Trek 9000 Mountain Bike |
6 |
1 |
Trek 9000 Mountain Bike |
1 |
2 |
Eagle FS-3 Mountain Bike |
2 |
2 |
Eagle FS-3 Mountain Bike |
<< more rows here >> |
“Produce a booking calendar that lists for all weeks in January 2018 any engagement during that week.”
Note: Remember to find an engagement that occurs in any part of a date span you need to
find engagements that begin before or on the end date of the span and end after or
on the start date of the span. You need to do a similar thing to find weeks in which
any part of the week falls in January 2018.
CH20_All_Weeks_Jan2018_All_Engagements (50 rows)
Week Start |
WeekEnd |
Entertainer ID |
EntStage Name |
CustFirst Name |
CustLast Name |
Start Date |
End Date |
2017-12-31 |
2018-01-05 |
1001 |
Carol Peacock Trio |
Mark |
Rosales |
2017-12-30 |
2018-01-08 |
2018-01-07 |
2018-01-12 |
1001 |
Carol Peacock Trio |
Mark |
Rosales |
2017-12-30 |
2018-01-08 |
2018-01-12 |
1001 |
Carol Peacock Trio |
Matt |
Berg |
2017-01-09 |
2018-01-09 |
|
2018-01-21 |
2018-01-26 |
1001 |
Carol Peacock Trio |
Dean |
McCrae |
2017-01-23 |
2018-01-31 |
2018-01-28 |
2018-02-02 |
1001 |
Carol Peacock Trio |
Dean |
McCrae |
2018-01-23 |
2018-01-31 |
2017-12-31 |
2018-01-05 |
1002 |
Topazz |
Estella |
Pundt |
2018-01-02 |
2018-01-10 |
2018-01-07 |
2018-01-12 |
1002 |
Topazz |
Estella |
Pundt |
2018-01-02 |
2018-01-10 |
2017-12-31 |
2018-01-05 |
1003 |
JV & the Deep Six |
Carol |
Viescas |
2017-12-31 |
2018-01-05 |
2018-01-07 |
2018-01-12 |
1003 |
JV & the Deep Six |
Mark |
Rosales |
2018-01-09 |
2018-01-10 |
2018-01-28 |
2018-02-02 |
1003 |
JV & the Deep Six |
Zachary |
Ehrlich |
2018-01-29 |
2018-02-02 |
<< more rows here >> |
“Display a list of classes by semester, date, and subject.”
Note: This is a bit tricky because the Classes table has an unnormalized list of days listed
as columns. When a class is scheduled for a given day, the value is 1 or “true” for
that column. I need to use the ztblSemesterDays driver table and include a class when
the semester matches and the day of the week in the driver table has the “flag” turned
on in the appropriate day column. Because some database systems use -1 (Microsoft
Office Access, for example) and others use 1 for “true,” I will test for not equal
to 0 to determine whether the column has a “true” value.
Select ztblSemesterDays.semester no from the semester driver table, ztblSemesterDays.semester date from the semester driver table, Classes.start time from the classes table, ztblSemesterDays.semester day name from the semester driver table, Subjects.subject code from the subjects table, Subjects.subject name from the subjects table, Class_Rooms.building code from the class rooms table, Class_Rooms.class room ID from the class rooms table from ztblSemesterDays the semester driver table CROSS JOIN and the subjects table then inner joined with the classes table on Subjects.subject ID in the subjects table equals = Classes.subject ID in the classes table then inner joined with the class rooms table on Class_Rooms.class room ID in the class rooms table equals = Classes.class room ID in the classes table where Classes.semester number in the classes table equals = ztblSemesterDays.semester no in the semester driver table and 1 equals = (CASE when ztblSemesterDays.semester day name in the semester driver table equals = ‘Monday’ and Classes. Monday schedule in the classes table does not equal <> 0 then return 1 when ztblSemesterDays.semester day name in the semester driver table equals = ‘Tuesday’ and Classes.Tuesday schedule in the classes table does not equal <> 0 then return 1 when ztblSemesterDays.semester day name in the semester driver table equals = ‘Wednesday’ and Classes.Wednesday schedule in the classes table does not equal <> 0 then return 1 when ztblSemesterDays.semester day name in the semester driver table equals = ‘Thursday’ and Classes.Thursday schedule in the classes table does not equal <> 0 then return 1 when ztblSemesterDays.semester day name in the semester driver table equals = ‘Friday’ and Classes.Friday schedule in the classes table does not equal <> 0 then return 1 when ztblSemesterDays.semester day name in the semester driver table equals = ‘Saturday’ and Classes.Saturday schedule in the classes table does not equal <> 0 then return 1 else return 0 END) ordered by ztblSemesterDays.semester no in the semester driver table, ztblSemesterDays.semester date in the semester driver table, Subjects.subject code in the subjects table, Class_Rooms.building code in the class rooms table, Class_Rooms.class room ID in the class rooms table, and Classes.start time in the classes table |
|
|
|
CH20_Class_Schedule_Calendar (7,221 rows)
Semester No |
SemDate |
Start Time |
SemDay Name |
Subject Code |
Subject Name |
Building |
Class RoomID |
1 |
2018-09-11 |
16:00 |
Monday |
ACC 210 |
Financial Accounting Fundamentals I |
IB |
3305 |
1 |
2018-09-11 |
15:30 |
Monday |
ART 101 |
Design |
AS |
1619 |
1 |
2018-09-11 |
8:00 |
Monday |
ART 111 |
Drawing |
AS |
1627 |
1 |
2018-09-11 |
9:00 |
Monday |
ART 111 |
Drawing |
AS |
1627 |
2018-09-11 |
11:00 |
Monday |
ART 251 |
Art History |
LB |
1231 |
|
1 |
2018-09-11 |
14:00 |
Monday |
ART 251 |
Art History |
LB |
1231 |
1 |
2018-09-11 |
10:00 |
Monday |
BIO 100 |
Biological Principles |
AS |
1532 |
1 |
2018-09-11 |
12:00 |
Monday |
BIO 101 |
General Biology |
AS |
1532 |
1 |
2018-09-11 |
13:30 |
Monday |
BIO 280 |
Microbiology |
AS |
1530 |
1 |
2018-09-11 |
7:30 |
Monday |
CHE 101 |
Chemistry |
AS |
1519 |
<< more rows here >> |
“Print a bowler mailing list, but skip the first three labels on the first page that have already been used.”
Note: What you want to do is produce three blank name and address lines to bypass the used
labels, and then list all the bowlers and their addresses. You can use ztblSkipLabels
in a SELECT statement that substitutes blanks for all the fields and stops when the
number in the driver table becomes greater than the number of labels you want to skip.
Follow that with a UNION ALL of a SELECT statement to produce the names and addresses
for all bowlers. You must use a UNION ALL because a simple UNION would eliminate all
the duplicate blank lines you produced in the first query.
Translation / Clean Up |
Select blanks ‘ ‘ as bowler last name, blanks ‘ ‘ as bowler first name, blanks ‘ ‘ as bowler address, blanks ‘ ‘ as bowler city, blanks ‘ ‘ as bowler state, blanks ‘ ‘ as bowler zip from ztblSkipLabels the skip labels driver table where the ztblSkipLabels.label count in the skip labels driver table is less than or equal to <= 3 unioned with all rows in select bowler last name, bowler first name, bowler address, bowler city, bowler state, and bowler zip from the bowlers table ordered by bowler zip, and bowler last name |
|
CH20_Bowler_Mailing_Skip_3 (35 rows)
BowlerLast Name |
BowlerFirst Name |
Bowler Address |
Bowler City |
Bowler State |
Bowler Zip |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Patterson |
Kathryn |
16 Maple Lane |
Auburn |
WA |
98002 |
Patterson |
Rachel |
16 Maple Lane |
Auburn |
WA |
98002 |
Patterson |
Ann |
16 Maple Lane |
Auburn |
WA |
98002 |
Patterson |
Neil |
16 Maple Lane |
Auburn |
WA |
98002 |
Patterson |
Megan |
16 Maple Lane |
Auburn |
WA |
98002 |
Viescas |
Carol |
16345 NE 32nd Street |
Bellevue |
WA |
98004 |
Sheskey |
Sara |
17950 N 59th |
Seattle |
WA |
98011 |
<< more rows here >> |
I started this chapter with a definition of unlinked data and a discussion of how to use CROSS JOIN to handle unlinked data in your queries. At the end of the first section, I outlined the two cases where a CROSS JOIN can be useful—linking main data tables with each other and linking main data tables with a “driver” table.
I next covered solving problems using main data tables linked to each other and explained a complex example. After that, I covered using a driver table, showing you how to set one up and discussing another two examples. Finally, I showed you and explained examples from four of the sample databases using both main tables linked to each other and a driver table linked to one or more main tables. I encourage you to try to work out the problems presented in the final section that follows.
The following problems show you the request statement and the name of the solution query in the sample databases. If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your Result Set is the same.
1. “List months and the total sales by products for each month.”
(Hint: Use the ztblMonths driver table I provided.)
You can find the solution in CH20_Product_Sales_ByMonth (253 rows).
2. “Produce a customer mailing list, but skip the five labels already used on the first page of the labels.”
(Hint: Use the ztblSeqNumbers driver table I provided.)
You can find the solution in CH20_Customer_Mailing_Skip_5 (33 rows).
3. “The sales manager wants to send out 10% discount coupons for customers who made large purchases in December 2017. Use the ztblPurchaseCoupons table to determine how many coupons each customer gets based on the total purchases for the month.”
(Hint: You need to CROSS JOIN the driver table with the Customers table joined with a subquery that calculates the total spend for each customer.)
You can find the solution in CH20_Customer_Dec_2017_Order_Coupons (27 rows).
4. “Using the solution to #3 above, print out one 10% off coupon based on the number of coupons each customer earned.”
(Hint: Use the ztblSeqNumbers driver table that I provided with the query in the above problem.)
You can find the solution in CH20_Customer_Discount_Coupons_Print (309 rows).
5. “Display all months in 2017 and 2018, all products, and the total sales (if any) registered for the product in the month.”
(Hint: Use a CROSS JOIN between the ztblMonths driver table and the Products table and use a subquery to fetch the product sales for each product and month.)
You can find the solution in CH20_Product_Sales_All_Months_2017_2018 (960 rows).
6. “Display all products and categorize them from Affordable to Expensive.”
(Hint: Use a CROSS JOIN with ztblPriceRanges.) You can find the solution in CH20_Product_Price_Ranges (40 rows).
1. “List all agents and any entertainers who haven’t had a booking since February 1, 2018.”
(Hint: Use a CROSS JOIN between Agents and Entertainers and use NOT IN on a subquery in the WHERE clause to find entertainers not booked since February 1, 2018.)
You can find the solution in CH20_Agents_Entertainers_Unbooked_Feb1_2018 (162 rows).
2. “Show all entertainer styles and the count of the first, second, and third strengths.”
(Hint: This is similar to the CH20_Customer_Style_Preference_Rankings query I showed you earlier. Use a CROSS JOIN of the Musical_Styles table with a subquery that “pivots” the strengths into three columns, then count the columns.)
You can find the solution in CH20_Entertainer_Style_Strength_Rankings (17 rows).
3. “Display customers and their first, second, and third-ranked preferences along with entertainers and their first, second, and third-ranked strengths, then match customers to entertainers when the first or second preference matches the first or second strength.”
(Hint: Create a query on musical styles and customers and pivot the first, second, and third strengths using a CASE expression. You will need to use MAX and GROUP BY because the pivot will return Null values for some of the positions. Do the same with entertainers and musical styles, then CROSS JOIN the two subqueries and return the rows where the preferences and strengths match in the first two positions.)
You can find the solution in CH20_Customers_Match_Entertainers_FirstSecond_PrefStrength (6 rows).
4. “List all months across and calculate each entertainer’s income per month.”
(Hint: Use the ztblMonths driver table to pivot the amounts per month and use SUM to total the amounts per entertainer.)
You can find the solution in CH20_Entertainer_BookingAmount_ByMonth (12 rows).
5. “Display all dates in December 2017 and any entertainers booked on those days.”
(Hint: Build a subquery using a CROSS JOIN between the ztblDays driver table and a JOIN on entertainers and engagements, then LEFT JOIN that with ztblDays again to get all dates.)
You can find the solution in CH20_All_December_Days_Any_Bookings (79 rows).
6. “Produce a customer mailing list, but skip the four labels already used on the first page of labels.”
You can find the solution in CH20_Customer_Mailing_Skip_4 (19 rows).
1. “List all students and the classes they could take, excluding the subjects enrolled or already completed. Be sure to list any subject prerequisite.”
(Hint: Do a CROSS JOIN between students and subjects joined with classes, and use a subquery to eliminate classes found in the student schedules table for the current student where the class status in the student schedules table is not 1 (enrolled) or 2 (completed).)
You can find the solution in CH20_Students_Additional_Courses (1,894 rows).
2. “Display a count of students by gender and marital status by state of residence in columns across.”
(Hint: Use the ztblGenderMatrix and ztblMaritalStatusMatrix driver tables to pivot your values.)
You can find the solution in CH20_Student_Crosstab_Gender_MaritalStatus (4 rows).
3. “Calculate an average proficiency rating for all teaching staff across the subjects they teach and show an overall rating based on the values found in the ztblProfRatings driver table.”
You can find the solution in CH20_Staff_Proficiency_Ratings (24 rows).
4. “Create a mailing list for students, but skip the first two labels already used on the first page.”
You can find the solution in CH20_Student_Mailing_Skip_2 (20 rows).
1. “Show bowlers and a rating of their raw score averages based on the values found in the ztblBowlerRatings driver table.”
You can find the solution in CH20_Bowler_Ratings (32 rows).
2. “List all weeks from September through December 2017 and the location of any tournament scheduled for those weeks.”
You can find the solution in CH20_Tournament_Week_Schedule_2017 (19 rows).