“Do not quench your inspiration and your imagination; do not become the slave of your model.”
—VINCENT VAN GOGH
Up to this point, I have primarily focused on solving problems using single tables. You now know how to get simple answers from one table. You also know how to get slightly more complex answers by using expressions or by sorting the result set. In other words, you now can draw the perfect eyes, chin, mouth, or nose. In this chapter, I’ll show you how to link or join multiple parts to form a portrait.
In Chapter 2, “Ensuring Your Database Structure Is Sound,” I emphasized the importance of separating the data in your tables into individual subjects. Most problems you need to solve in real life, however, require that you link data from multiple tables—customers and their orders, customers and the entertainers they booked, bowlers and their scores, students and the classes they took, or recipes and the ingredients you need. To solve these more complex problems, you must link, or join, multiple tables to find your answer. You use the JOIN keyword to do so.
The previous chapter showed how useful it is to intersect two sets of data to solve problems. As you recall, however, an INTERSECT involves matching all the columns in both result sets to get the answer. A JOIN is also an intersection, but it’s different because you ask your database system to perform a JOIN only on the columns you specify. Thus, a JOIN lets you intersect two very dissimilar tables on matching column values. For example, you can use a JOIN to link customers to their orders by matching the CustomerID in the Customers table to the CustomerID in the Orders table.
As you’ll see later, you specify a JOIN as part of the FROM clause in an SQL statement. A JOIN defines a “logical table” that is the result of linking two tables or result sets. By placing the JOIN in a FROM clause, you define a linking of tables from which the query extracts the final result set. In other words, the JOIN replaces the single table name you learned to use in the FROM clause in earlier chapters. As you’ll learn later in this chapter, you can also specify multiple JOIN operations to create a complex result set on more than two tables.
The SQL Standard defines several ways to perform a JOIN, the most common of which is the INNER JOIN. Imagine for a moment that you’re linking students and the classes for which they registered. You might have some students who have been accepted to attend the school but have not yet registered for any classes, and you might also have some classes that are on the schedule but do not yet have any students registered.
An INNER JOIN between the Students table and the Classes table returns rows in the Students table linked with the related rows in the Classes table (via the Student_Schedules table)—but it returns neither students who have not yet registered for any classes nor any classes for which no student is registered. An INNER JOIN returns only those rows where the linking values match in both of the tables or in result sets.
Most of the time, you specify the primary key from one table and the related foreign key from the second table as the link that JOIN uses. If you remember from Chapter 2, a foreign key must be the same data type as its related primary key. However, it’s also “legal” to JOIN two tables or result sets on any columns that have what the SQL Standard calls “JOIN eligible” data types.
In general, you can join a character column to another character column or expression, any type of number column (for example, an integer) to any other type of number column (perhaps a floating-point value), and any date column to another date column. This allows you, for example, to JOIN rows from the Customers table to rows from the Employees table on the city or ZIP Code columns (perhaps to find out which Customers and Employees live in the same city or postal region).
Note: Just because you can define a JOIN on any JOIN eligible columns in two tables doesn’t mean you should. The linking columns must have the same data meaning for the JOIN to make sense.
For example, it doesn’t make sense to JOIN customer name with employee address even
though both columns are character data type. You won’t get any rows in the result
set unless someone has put a name in the employee address column by mistake. Likewise,
it doesn’t make sense to JOIN StudentID with ClassID even though both are numbers.
You might get some rows in the result set, but they won’t make any sense.
Even when it makes sense to JOIN linking columns, you might end up constructing a request that takes a long time to solve. For example, if you ask for a JOIN on columns for which your database administrator has not defined an index, your database system might have to do a lot of extra work. Also, if you ask for a JOIN on expressions—for example, a concatenation of first name and last name from two tables—your database system must not only form the result column from your expression for all rows but also might have to perform multiple scans of all the data in both tables to return the correct result.
Before I jump into the syntax for a JOIN, there’s a key bit of information that I haven’t covered yet. Because you’ve been creating queries on a single table, you haven’t had to worry about qualifying column names. But when you start to build queries that include multiple tables (as you will when you use a JOIN), you’ll often include two or more tables that each have columns with the same name. If you remember from Chapter 2, I recommended that you create a foreign key in a related table by copying the primary key—including its name—from one table into another.
So, how do you make it crystal clear to your database system which copy of a field you are talking about in your query syntax? The simple answer is that you provide a column reference that includes the table name. Figure 8-1 shows the diagram for a column reference.
Although you can use only the column name by itself in any clause in a statement that you write in SQL, you can also explicitly qualify a column name with the name of its parent table. If the column name isn’t unique in all the tables you include in your FROM clause, then you must qualify the column name with the name of its parent table. Here’s how you would write a simple SELECT statement on the Employees table to incorporate qualified column names:
|
|
Now that I’ve covered that little tidbit, you can move on to studying the syntax of a JOIN operation.
You can think of what you’ve studied so far as taking a nice ride down a country lane or a quick jaunt across town to pick up some groceries. Now let’s strap on our seat belts and venture out onto the highway—let’s examine the INNER JOIN syntax.
I’ll start with something simple—an INNER JOIN on two tables. Figure 8-2 shows the syntax for creating the query.
As you can see, the FROM clause is now just a little more complicated. (I left out the WHERE and ORDER BY clauses for now to simplify things.) Instead of a single table name, you specify two table names and link them with the JOIN keyword. Note that the INNER keyword, which is optional, specifies the type of JOIN. As you’ll learn in the next chapter, you can also specify an OUTER JOIN. If you don’t explicitly state the type of JOIN you want, the default is INNER. I recommend that you always explicitly state the type of JOIN you want so that the nature of your request is clear.
Note: Those who are following along with the complete syntax diagrams in Appendix A, “SQL Standard Diagrams,” will find
Table Reference JOIN Table Reference
described as part of the Joined Table defined term. Table Reference can be either a table_name
or a Joined Table
, and the FROM clause of a SELECT statement uses Table Reference
. I “rolled up” these complex definitions into a single diagram to make it easy to
study a simple two-table JOIN. I’ll be using this same simplification technique in
diagrams throughout the remainder of this chapter.
The critical part of an INNER JOIN is the ON or USING clause that follows the second table and tells your database system how to perform the JOIN. To solve the JOIN, your database system logically combines every row in the first table with every row in the second table. (This combination of all rows from one table with all rows from a second table is called a Cartesian product. I show you how to use a Cartesian product to solve problems in Chapter 20, “Using Unlinked Data and ‘Driver’ Tables.”) It then applies the criteria in the ON or USING clauses to filter out the actual rows to be returned.
You learned about using a search condition to form a WHERE clause in Chapter 6, “Filtering Your Data.” You can use a search condition in the ON clause within a JOIN to specify a logical test that must be true in order to return any two linked rows. Keep in mind that it only makes sense to write a search condition that compares at least one column from the first table with at least one column from the second table. Although you can write a very complex search condition, you’ll typically specify a simple equals comparison test on the primary key columns from one table with the foreign key columns from the other table.
Let’s look at a simple example. In a well-designed database, you should break out complex classification names into a second table and then link the names back to the primary subject table via a simple key value. You do this to help prevent data entry errors. Anyone using your database chooses from a list of classification names rather than typing the name (and perhaps misspelling it) in each row. For example, in the Recipes sample database, recipe classes appear in a table separate from recipes. Figure 8-3 shows the relationship between the Recipe_Classes and Recipes tables.
When you want to retrieve information about recipes and the related RecipeClassDescription from the database, you don’t want to see the RecipeClassID code numbers from the Recipes table. Let’s see how to approach this problem with a JOIN.
Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced
in Chapter 4, “Creating a Simple Query.”
“Show me the recipe title, preparation, and recipe class description of all recipes in my database.”
Translation |
Select recipe title, preparation, and recipe class description from the recipe classes table joined with the recipes table on recipe class ID in the recipe classes table matching recipe class ID in the recipes table |
Clean Up |
Select recipe title, preparation, and recipe class description from the recipe classes table inner joined with the recipes table on recipe_classes.recipe class ID in the recipe classes table matching = recipes.recipe class ID in the recipes table |
|
|
Note: You might have noticed that I’ve started to format the Clean Up step into phrases
that more closely mirror the final set of clauses I need in SQL. As you begin to build
more complex queries, I recommend this technique to help you move from the Clean Up
step to the final SQL.
When beginning to use multiple tables in your FROM clause, you should always fully qualify each column name with the table name, wherever you use it, to make absolutely clear what column from what table you want. (Now you know why I took a minute to explain a column reference!) Note that I had to qualify the name of RecipeClassID in the ON clause because there are two columns named RecipeClassID—one in the Recipes table and one in the Recipe_Classes table. I didn’t have to qualify RecipeTitle, Preparation, or RecipeClassDescription in the SELECT clause because each of these column names appears only once in all the tables. If I want to include RecipeClassID in the output, I must tell the database system which RecipeClassID I want—the one from Recipe_Classes or the one from Recipes. To write the query with all the names fully qualified, I should say this:
|
|
Note: Although most commercial implementations of SQL support the JOIN keyword, some do
not. If your database does not support JOIN, you can still solve the previous problem
by listing all the tables you need in the FROM clause and then moving your search
condition from the ON clause to the WHERE clause. In databases that do not support
JOIN, you solve the example problem like this:
SELECT Recipes.RecipeTitle, Recipes.Preparation,
Recipe_Classes.RecipeClassDescription
FROM Recipe_Classes, Recipes
WHERE Recipe_Classes.RecipeClassID =
Recipes.RecipeClassID
For a beginner, this syntax is probably much more intuitive for simple queries. However, the SQL Standard syntax allows you to fully define the source for the final result set entirely within the FROM clause. Think of the FROM clause as fully defining a linked result set from which the database system obtains your answer. In the SQL Standard, you use the WHERE clause only to filter rows out of the result set defined by the FROM clause.
Not too difficult, is it? But what happened to the USING clause that I showed you in Figure 8-2? If the matching columns in the two tables have the same name and all you want to do is join on equal values, use the USING clause and list the column names. Let’s do the previous problem again with USING.
“Show me the recipe title, preparation, and recipe class description of all recipes in my database.”
Select recipe title, preparation, and recipe class description from the recipe classes table joined with the recipes table using recipe class ID |
|
Clean Up |
Select recipe title, preparation, and recipe class description from the recipe classes table inner joined with the recipes table using recipe class ID |
|
|
Some database systems do not yet support USING. If you find that you can’t use USING with your database, you can always get the same result with an ON clause and an equals comparison.
Note: The SQL Standard also defines a NATURAL JOIN, which links the two specified tables
by matching all the columns with the same name. If the only common columns are the
linking columns and your database supports NATURAL JOIN, you can solve the example
problem like this:
SELECT Recipes.RecipeTitle, Recipes.Preparation,
Recipe_Classes.RecipeClassDescription
FROM Recipe_Classes
NATURAL INNER JOIN Recipes
Do not specify an ON or USING clause when using the NATURAL keyword. Keep in mind that the INNER keyword is optional. If you specify NATURAL JOIN, an INNER JOIN is assumed.
As mentioned earlier in this section, your database system logically creates the combination of every row in the first table with every row in the second table and then applies the criteria you specify in ON or USING. This sounds like a lot of extra work for your database to first build all the combinations and then filter out the potentially few matching rows.
Rest assured that all modern relational database systems evaluate the entire JOIN clause before starting to fetch rows. In the example I have been using so far, many database systems begin to solve this request by first fetching a row from Recipe_Classes. The database then uses an internal link—an index (if one has been defined by the designer of the tables)—to quickly find any matching rows in the Recipes table for the first row in the Recipe_Classes table before moving on to the next row in Recipe_Classes. In other words, your database uses a smart or optimized plan to fetch only the rows that match. This won’t seem important when your database tables contain only a few hundred rows, but it makes a big difference when your database has to deal with hundreds of thousands of rows!
The SQL Standard defines a way to assign an alias name—known as a correlation name in the Standard—to any table you list in your FROM clause. This feature can be very handy for building complex queries using tables that have long, descriptive names. You can assign a short correlation name to a table to make it easier to explicitly reference columns in a table with a long name.
Figure 8-4 shows how to assign a correlation name to a table in a FROM clause.
To assign a correlation name to a table, follow the table name with the optional keyword AS and then the correlation name you want to assign. (As with all optional keywords, I recommend including AS in order to make the query easier to read and understand.) After you have assigned a correlation name to a table, you use that name in place of the original table name in all other clauses, including the SELECT clause, the search conditions in the ON and WHERE clauses, and the ORDER BY clause. This can be confusing because you tend to write the SELECT clause before you write the FROM clause. If you plan to give a table an alias in the FROM clause, you must use that alias when you qualify column names in the SELECT clause.
Let’s reformulate the sample query I’ve been using with correlation names just to see how it looks. The query using R as the correlation name for the Recipes table and RC as the correlation name for the Recipe_Classes table is shown here:
|
|
Suppose you want to add a filter to see only recipes of class Main course or Dessert. (See Chapter 6 for details about defining filters.) After you assign a correlation name, you must continue to use the new name in all references to the table. Here’s the SQL:
|
|
You don’t have to assign a correlation name to all tables. In the previous example, I could have assigned a correlation name only to Recipes or only to Recipe_Classes.
In some cases, you must assign a correlation name to a table in a complex JOIN. Let’s hop over to the Bowling League database to examine a case where this is true. Figure 8-5 shows you the relationship between the Teams and Bowlers tables.
As you can see, TeamID is a foreign key in the Bowlers table that lets you find the information for all bowlers on a team. One of the bowlers on a team is the team captain, so there’s also a link from BowlerID in the Bowlers table to CaptainID in the Teams table.
If you want to list the team name, the name of the team captain, and the names of all the bowlers in one request, you must include two copies of the Bowlers table in your query—one to link to CaptainID to retrieve the name of the team captain and another to link to TeamID to get a list of all the team members. In this case, you must assign an alias name to one or both copies of the Bowlers table so that your database system can differentiate between the copy that links in the captain’s name and the copy that provides the list of all team members. Later in this chapter, I’ll show an example that requires including multiple copies of one table and assigning alias names. You can find this example using the Bowling League database in the “More Than Two Tables” subsection of “Sample Statements.”
Let’s make it more interesting. In most implementations of SQL, you can substitute an entire SELECT statement for any table name in your FROM clause. In the SQL Standard, an embedded SELECT statement like this is called a derived table. If you think about it, using a SELECT statement is simply a way to derive a subset of data from one or more tables. Of course, you must assign a correlation name so that the result of evaluating your embedded query has a name. Figure 8-6 shows how to assemble a JOIN clause using embedded SELECT statements.
Notice in the figure that a SELECT statement can include all query clauses except an ORDER BY clause. Also, you can mix and match SELECT statements with table names on either side of the INNER JOIN keywords.
Let’s look at the Recipes and Recipe_Classes tables again. I’ll assume that your request still needs only main courses and desserts. Here’s the query again with the Recipe_Classes table filtered in a SELECT statement that’s part of the INNER JOIN:
|
|
Note: Some database systems do not support embedding a SELECT statement inside a FROM clause.
If your system does not support this feature, you can often save the inner SELECT
statement as a view, and use the view name in place of the select statement.
Watch out! First, when you decide to substitute a SELECT statement for a table name, be sure to include not only the columns you want to appear in the final result but also any linking columns needed to perform the JOIN. That’s why you see both RecipeClassID and RecipeClassDescription in the embedded statement. Just for fun, I gave RecipeClassDescription an alias name of ClassName in the embedded statement. As a result, the SELECT clause asks for ClassName rather than RecipeClassDescription. Note that the ON clause now references the correlation name of the embedded SELECT statement—RCFiltered—rather than the original name of the table or the correlation name I assigned the table inside the embedded SELECT statement.
If your database system has a very smart optimizer, defining your request this way should be just as fast as the previous example where the filter on RecipeClassDescription was applied via a WHERE clause after the JOIN. You would like to think that your database system, in order to answer your request most efficiently, would first filter the rows from Recipe_Classes before attempting to find any matching rows in Recipes. It could be much slower to first join all rows from Recipe_Classes with matching rows from Recipes and then apply the filter. If you find it’s taking longer to solve this request than it should, moving the WHERE clause into a SELECT statement within the JOIN might force your database system to do the filtering on Recipe_Classes first.
Although you can solve many problems by linking only two tables, you’ll often need to link three, four, or more tables to get all the data you require. For example, you might want to fetch all the relevant information about recipes—the type of recipe, the recipe name, and all the ingredients for the recipe—in one query. Figure 8-7 shows the tables required to answer this request.
Figure 8-7 The tables needed from the Recipes sample database to fetch all the information about recipes
Looks like you need to get data from five different tables! (The Measurements and Recipe_Classes tables are “lookup” tables, hence the difference in the diagram.) Never fear—you can do this by constructing a more complex FROM clause, embedding JOIN clauses within JOIN clauses. Here’s the trick: Everywhere you can specify a table name, you can also specify an entire JOIN clause surrounded with parentheses. Figure 8-8 is a simplified version of Figure 8-4. (I’ve left off correlation name clauses and chosen the ON clause to form a simple JOIN of two tables.)
To add a third table to the mix, just place an open parenthesis before the first table name, add a close parenthesis after the search condition, and insert INNER JOIN, a table name, the ON keyword, and another search condition. Figure 8-9 shows how to do this.
If you think about it, the INNER JOIN of two tables inside the parentheses forms a logical table, or inner result set. This result set now takes the place of the first simple table name in Figure 8-8. You can continue this process of enclosing an entire JOIN clause in parentheses and then adding another JOIN keyword, table name, ON keyword, and search condition until you have all the result sets you need. Let’s make a request that needs data from all the tables shown in Figure 8-7 and see how it turns out:
“I need the recipe type, recipe name, preparation instructions, ingredient names, ingredient step numbers, ingredient quantities, and ingredient measurements from my recipes database, sorted in step number sequence.”
Translation |
Select the recipe class description, recipe title, preparation instructions, ingredient name, recipe sequence number, amount, and measurement description from the recipe classes table joined with the recipes table on recipe class ID in the recipe classes table matching recipe class ID in the recipes table, then joined with the recipe ingredients table on recipe ID in the recipes table matching recipe ID in the recipe ingredients table, then joined with the ingredients table on ingredient ID in the ingredients table matching ingredient ID in the recipe ingredients table, and then finally joined with the measurements table on measurement amount ID in the measurements table matching measurement amount ID in the recipe ingredients table, order by recipe title and recipe sequence number |
Select the recipe class description, recipe title, preparation instructions, ingredient name, recipe sequence number, amount, and measurement description from the recipe classes table inner joined with the recipes table on recipe_classes.recipe class ID in the recipe classes table matching = recipes.recipe class ID in the recipes table, then inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table matching = recipe_ ingredients.recipe ID in the recipe ingredients table, then inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table matching = ingredients.ingredient ID in the recipe ingredients table, and then finally inner joined with the measurements table on measurements.measurement amount ID in the measurements table matching = recipe ingredients. measurement amount ID in the recipe ingredients table, order by recipe title and recipe sequence number |
|
|
|
Wow! Anyone care to jump in and add a filter for recipe class Main courses? If you said you need to add the WHERE clause just before the ORDER BY clause, you guessed the correct way to do it.
In truth, you can substitute an entire JOIN of two tables anywhere you could otherwise place only a table name. In Figure 8-9, I implied that you must first join the first table with the second table and then join that result with the third table. You could also join the second and third tables first (as long as the third table is, in fact, related to the second table and not the first one) and then perform the final join with the first table. Figure 8-10 shows this alternate method.
Let’s look at the problem from a painting perspective. If you’re trying to get pastel green, the mixing sequence doesn’t matter that much. You can mix white with blue to get pastel blue and then mix in some yellow, or you can mix blue with yellow to get green and then add some white to get the final color.
To solve the request I just showed you using five tables, I could also have stated the SQL as follows:
|
You need to be aware of this feature because you might run into this sort of construction either in queries others have written or in the SQL built for you by Query By Example software. Also, the optimizers in some database systems are sensitive to the sequence of the JOIN definitions. If you find your query using many JOINs is taking a long time to execute on a large database, you might be able to get it to run faster by changing the sequence of JOINs in your SQL statement. For simplicity, I’ll build most of the examples later in this chapter using a direct construction of JOINs by following a simple path from left to right and top to bottom, using the diagrams that you can find in Appendix B, “Schema for the Sample Databases.”
It should be obvious at this point that knowing the relationships between your tables is of utmost importance. When you find that the columns of data you need reside in different tables, you might need to construct a FROM clause as complicated as the one I just showed you to be able to gather all the pieces in a way that logically makes sense. If you don’t know the relationships between your tables and the linking columns that form the relationships, you’ll paint yourself into a corner!
In many cases, you might have to follow a path through several relationships to get the data you want. For example, let’s simplify the previous request and just ask for recipe name and ingredient names:
“Show me the names of all my recipes and the names of all the ingredients for each of those recipes.”
Translation |
Select the recipe title and the ingredient name from the recipes table joined with the recipe ingredients table on recipe ID in the recipes table matching recipe ID in the recipe ingredients table, and then joined with the ingredients table on ingredient ID in the ingredients table matching ingredient ID in the recipe ingredients table |
Clean Up |
Select the recipe title and the ingredient name from the recipes table inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table matching = recipe_ingredients.recipe ID in the recipe ingredients table, and then inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table matching = recipe_ingredients.ingredient ID in the recipe ingredients table |
|
|
Did you notice that even though you don’t need any columns from the Recipe_Ingredients table, you still must include it in the query? You must do so because the only way that Recipes and Ingredients are related is through the Recipe_Ingredients table.
Now that you have a basic understanding of the mechanics for constructing an INNER JOIN, let’s look at some of the types of problems you can solve with it.
As you know, the most common use for an INNER JOIN is to link tables so that you can fetch columns from different tables that are related. Following is a sample list of the kinds of requests you can solve from the sample databases using an INNER JOIN:
“Show me the vendors and the products they supply to us.”
“List employees and the customers for whom they booked an order.”
“Display agents and the engagement dates they booked.”
“List customers and the entertainers they booked.”
“Find the entertainers who played engagements for customers Berg or Hallmark.”
“Display buildings and all the classrooms in each building.”
“List the faculty staff and the subject each teaches.”
“Display bowling teams and the name of each team captain.”
“List the bowling teams and all the team members.”
“Show me the recipes that have beef or garlic.”
“Display all the ingredients for recipes that contain carrots.”
I’ll show how to construct queries to answer requests like these (and more) in the Sample Statements section of this chapter.
A more esoteric use of an INNER JOIN is finding rows in two or more tables or result sets that match on one or more values that are not the related key values. Remember that in Chapter 7, ‘Thinking in Sets,’ I promised to show you how to perform the equivalent of an INTERSECT using an INNER JOIN. Following is a small sample of just some of the requests you can solve using this technique:
“Show me customers and employees who have the same name.”
“Show me customers and employees who live in the same city.”
“Find all the customers who ordered a bicycle and also ordered a helmet.”
“Find the agents and entertainers who live in the same postal code.”
“List the entertainers who played engagements for customers Bonnicksen and Rosales.”
“Show me the students and their teachers who have the same first name.”
“Show me the students who have an average score of 85 or better in Art and who also have an average score of 85 or better in Computer Science.”
“Find the bowlers who live in the same ZIP Code.”
“Find the bowlers who had a raw score of 155 or better at both Thunderbird Lanes and Bolero Lanes.”
“Find the ingredients that use the same default measurement amount.”
“Show me the recipes that have beef and garlic.”
The next section shows how to solve several problems like these.
You now know the mechanics of constructing queries using INNER JOIN and have seen some of the types of requests you can answer with an INNER JOIN. Let’s take a look at a fairly robust set of samples, all of which use INNER JOIN. These examples come from each of the sample databases, and they illustrate how you can use an INNER JOIN to fetch data from two tables, fetch data from more than two tables, and solve a problem using matching values.
I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for the book, www.informit.com/title/978013485833.
I stored each query in the appropriate sample database (as indicated within the example) and prefixed the names of the queries relevant to this chapter with “CH08.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.
Note: Because many of these examples use complex JOINs, your database system might choose
a different way to solve these queries. For this reason, the first few rows I show
you might not exactly match the result you obtain, but the total number of rows should
be the same. To simplify the process, I have combined the Translation and Clean Up
steps for all the following examples.
I’ll start out with simple primary colors and show you sample requests that require an INNER JOIN on only two tables.
“Display all products and their categories.”
Translation/Clean Up |
Select category description and product name from the categories table inner joined with the products table on categories.category ID in the categories table matching = products.category ID in the products table |
|
|
CH08_Products_And_Categories (40 rows)
CategoryDescription |
ProductName |
Accessories |
Dog Ear Cyclecomputer |
Accessories |
Dog Ear Helmet Mount Mirrors |
Accessories |
Viscount C-500 Wireless Bike Computer |
Accessories |
Kryptonite Advanced 2000 U-Lock |
Accessories |
Nikoma Lok-Tight U-Lock |
Accessories |
Viscount Microshell Helmet |
Accessories |
Viscount CardioSport Sport Watch |
Accessories |
Viscount Tru-Beat Heart Transmitter |
Accessories |
Dog Ear Monster Grip Gloves |
<< more rows here >> |
Note: Remember that in the absence of an ORDER BY clause, each database system might return
rows in a different sequence. The sequence you see above matches that from Microsoft
Access and MySQL because both systems appear to fetch rows from Categories first and
then find the matching rows in Products, so the rows appear in order by accessory
name. In Microsoft SQL Server and PostgreSQL, the systems fetch Products first and
then look up the matching row in Categories, so the sequence is different. Expect
to see a difference between systems whenever the query does not use ORDER BY.
“Show me entertainers, the start and end dates of their contracts, and the contract price.”
Translation/Clean Up |
Select entertainer stage name, start date, end date, and contract price from the entertainers table inner joined with the engagements table on entertainers.entertainer ID in the entertainers table matching = engagements.entertainer ID in the engagements table |
|
CH08_Entertainers_And_Contracts (111 rows)
EntStageName |
StartDate |
EndDate |
ContractPrice |
Carol Peacock Trio |
2017-09-18 |
2017-09-26 |
$1,670.00 |
Carol Peacock Trio |
2017-10-01 |
2017-10-07 |
$1,940.00 |
Carol Peacock Trio |
2017-10-14 |
2017-10-15 |
$410.00 |
Carol Peacock Trio |
2017-10-21 |
2017-10-21 |
$140.00 |
Carol Peacock Trio |
2017-11-13 |
2017-11-19 |
$680.00 |
Carol Peacock Trio |
2017-12-23 |
2017-12-26 |
$410.00 |
Carol Peacock Trio |
2017-12-29 |
2018-01-07 |
$1,400.00 |
Carol Peacock Trio |
2018-01-08 |
2018-01-08 |
$320.00 |
Carol Peacock Trio |
2018-01-22 |
2018-01-30 |
$1,670.00 |
Carol Peacock Trio |
2018-02-11 |
2018-02-19 |
$1,670.00 |
Carol Peacock Trio |
2018-02-25 |
2018-02-28 |
$770.00 |
Topazz |
2017-09-11 |
2017-09-18 |
$770.00 |
<< more rows here >> |
“List the subjects taught on Wednesday.”
Translation/Clean Up |
Select subject name from the subjects table inner joined with the classes table on subjects.subject ID in the subjects table matching = classes.subject ID in the classes table where Wednesday schedule is = true |
|
Note: Because several sections of the same class might be scheduled on the same day of
the week, I included the DISTINCT keyword to eliminate the duplicates. Some databases
do support a
TRUE
keyword, but I chose to use a more universal “integer with all bits on” value: –1.
If your database system stores a true/false value as a single bit, you can also test
for a true value of 1. A false value is always the number zero (0).
CH08_Subjects_On_Wednesday (34 rows)
SubjectName |
Advanced English Grammar |
Art History |
Biological Principles |
Chemistry |
Composition—Fundamentals |
Composition—Intermediate |
Design |
Drawing |
Elementary Algebra |
<< more rows here >> |
“Display bowling teams and the name of each team captain.”
Translation/Clean Up |
Select team name and captain full name from the teams table inner joined with the bowlers table on team captain ID equals = bowler ID |
|
CH08_Teams_And_Captains (10 rows)
TeamName |
CaptainName |
Marlins |
Fournier, David |
Sharks |
Patterson, Ann |
Terrapins |
Viescas, Carol |
Barracudas |
Sheskey, Richard |
Dolphins |
Viescas, Suzanne |
Orcas |
Thompson, Sarah |
Manatees |
Viescas, Michael |
Swordfish |
Rosales, Joe |
Huckleberrys |
Viescas, David |
MintJuleps |
Hallmark, Alaina |
“Show me the recipes that have beef or garlic.”
Translation/Clean Up |
Select unique distinct recipe title from the recipes table joined with the recipe ingredients table on recipes.recipe ID in the recipes table matching = recipe_ingredients.recipe ID in the recipe ingredients table where ingredient ID is in the list of beef and garlic IDs (1, 9) |
|
|
Note: Because some recipes might have both beef and garlic, I added the DISTINCT keyword
to eliminate potential duplicate rows.
CH08_Beef_Or_Garlic_Recipes (5 rows)
RecipeTitle |
Asparagus |
Garlic Green Beans |
Irish Stew |
Pollo Picoso |
Roast Beef |
Next, let’s add some spice by making requests that require a JOIN of more than two tables.
“Find all the customers who have ever ordered a bicycle helmet.”
Translation/Clean Up |
Select customer first name, customer last name from the customers table inner joined with the orders table on customers.customer ID in the customers table matching = orders.customer ID in the orders table, then inner joined with the order details table on orders.order number in the orders table matching = order_details.order number in the order details table, then inner joined with the products table on products.product number in the products table matching = order_details.product number in the order details table where product name contains LIKE ‘%Helmet%’ |
|
|
Caution: If your database system is case sensitive when performing searches in character fields,
you must be careful to enter the search criteria using the correct case for the letters.
For example, in many database systems, ‘helmet’ is not the same as ‘Helmet’.
Note: Because a customer might have ordered a helmet more than once, I included the DISTINCT
keyword to eliminate duplicate rows.
CH08_Customers_Who_Ordered_Helmets (25 rows)
CustFirstName |
CustLastName |
Andrew |
Cencini |
Angel |
Kennedy |
Caleb |
Viescas |
Darren |
Gehring |
David |
Smith |
Dean |
McCrae |
Estella |
Pundt |
Gary |
Hallmark |
Jim |
Wilson |
John |
Viescas |
<< more rows here >> |
“Find the entertainers who played engagements for customers Berg or Hallmark.”
Select unique distinct entertainer stage name from the entertainers table inner joined with the engagements table on entertainers.entertainer ID in the entertainers table matching = engagements.entertainer ID in the engagements table, then inner joined with the customers table on customers.customer ID in the customers table matching = engagements.customer ID in the engagements table where the customer last name is = ‘Berg’ or the customer last name is = ‘Hallmark’ |
|
|
|
CH08_Entertainers_For_Berg_ OR_Hallmark (8 rows)
EntStageName |
Carol Peacock Trio |
Coldwater Cattle Company |
Country Feeling |
Jim Glynn |
JV & the Deep Six |
Modern Dance |
Susan McLain |
Topazz |
“List all the tournaments, the tournament matches, and the game results.”
Select tourney ID, tourney location, match ID, lanes, odd lane team, even lane team, game number, game winner from the tournaments table inner joined with the tourney matches table on tournaments.tourney ID in the tournaments table matching = tourney_matches.tourney ID in the tourney matches table, then inner joined with the teams table aliased as odd team on oddteam.team ID in the odd team table matches = tourney_matches.odd lane team ID in the tourney matches table, then inner joined with the teams table aliased as even team on eventeam.team ID in the even team table matches = tourney_matches.even lane team ID in the tourney matches table, then inner joined with the match games table on match_games.match ID in the match games table matches = tourney_matches.match ID in the tourney matches table, then inner joined with the teams table aliased as winner on winner.team ID in the winner table matches = match_games.winning team ID in the match games table |
|
|
|
Note: This is a really fun query because it requires three copies of one table (Teams) to get the job done. I had to assign correlation names
to at least two of the tables to keep everything legal, but I went ahead and gave
them all alias names to reflect their specific roles in the query.
CH08_Tournament_Match_Game_Results (168 rows)
Tourney |
Location |
MatchID |
Lanes |
OddLane Team |
EvenLane Team |
GameNo |
Winner |
1 |
Red Rooster Lanes |
1 |
01-02 |
Marlins |
Sharks |
1 |
Marlins |
1 |
Red Rooster Lanes |
1 |
01-02 |
Marlins |
Sharks |
2 |
Sharks |
1 |
Red Rooster Lanes |
1 |
01-02 |
Marlins |
Sharks |
3 |
Marlins |
1 |
Red Rooster Lanes |
2 |
03-04 |
Terrapins |
Barracudas |
1 |
Terrapins |
1 |
Red Rooster Lanes |
2 |
03-04 |
Terrapins |
Barracudas |
2 |
Barracudas |
1 |
Red Rooster Lanes |
2 |
03-04 |
Terrapins |
Barracudas |
3 |
Terrapins |
1 |
Red Rooster Lanes |
3 |
05-06 |
Dolphins |
Orcas |
1 |
Dolphins |
1 |
Red Rooster Lanes |
3 |
05-06 |
Dolphins |
Orcas |
2 |
Orcas |
1 |
Red Rooster Lanes |
3 |
05-06 |
Dolphins |
Orcas |
3 |
Dolphins |
<< more rows here >> |
Note: Although the records appear to be sorted by tournament and match, this is simply
the sequence in which the database system I used (in this case, Microsoft Access)
chose to return the records. If you want to ensure that the records are sorted in
a specific sequence, you must supply an ORDER BY clause.
“Show me the main course recipes and list all the ingredients.”
Translation/Clean Up |
Select recipe title, ingredient name, measurement description, and amount from the recipe classes table inner joined with the recipes table on recipes.recipe class ID in the recipes table matches = recipe_classes.recipe class ID in the recipe classes table, then inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table matches = recipe_ingredients.recipe ID in the recipe ingredients table, then inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table, and finally inner joined with the measurements table on measurements.measure amount ID in the measurements table matches = recipe_ingredients.measure amount ID in the recipe ingredients table, where recipe class description is = ‘Main course’ |
|
|
Caution: You can find a MeasureAmountID in both the Ingredients and the Recipe_Ingredients
tables. If you define the final JOIN on MeasureAmountID using the Ingredients table
instead of the Recipe_Ingredients table, you’ll get the default measurement for the
ingredient rather than the one specified for the ingredient in the recipe.
CH08_Main_Course_Ingredients (53 rows)
RecipeTitle |
IngredientName |
Measurement Description |
Amount |
Irish Stew |
Beef |
Pound |
1 |
Irish Stew |
Onion |
Whole |
2 |
Irish Stew |
Potato |
Whole |
4 |
Irish Stew |
Carrot |
Whole |
6 |
Irish Stew |
Water |
Quarts |
4 |
Irish Stew |
Guinness Beer |
Ounce |
12 |
Fettuccini Alfredo |
Fettuccini Pasta |
Ounce |
16 |
Fettuccini Alfredo |
Vegetable Oil |
Tablespoon |
1 |
Fettuccini Alfredo |
Salt |
Teaspoon |
3 |
<< more rows here >> |
Finally, let’s add a third dimension to the picture. This last set of examples shows requests that use a JOIN on common values from two or more result sets or tables. (If your database supports the INTERSECT keyword, you can also solve many of these problems by intersecting the result sets.)
This request seems simple enough—perhaps too simple. Let’s ask it a different way so that it’s clearer what I need the database to do.
“Find all the customers who ordered a bicycle, then find all the customers who ordered a helmet, and finally list the common customers so that we know who ordered both a bicycle and a helmet.”
Translation 1 |
Select customer first name and customer last name from those common to the set of customers who ordered bicycles and the set of customers who ordered helmets |
Translation 2/Clean Up |
Select customer first name and customer last name from (Select unique distinct customer name, customer first name, customer last name from the customers table inner joined with the orders table on customers.customer ID in the customers table matches = orders.customer ID in the orders table, then inner joined with the order details table on orders.order number in the orders table matches = order_details.order number in the order details table, then inner joined with the products table on products.product number in the products table matches = order_details.product number in the order details table where product name contains LIKE ‘%Bike’) as cust bikes inner joined with (Select unique distinct customer ID from the customers table inner joined with the orders table on customers.customer ID in the customers table matches = orders.customer ID in the orders table, then inner joined with the order details table on orders.order number in the orders table matches = order_details.order number in the order details table, then joined with the products table on products.product number in the products table matches = order_details.product number in the order details table where product name contains LIKE ‘%Helmet’) as cust helmets on cust bikes.customer ID in the cust bikes table matches = cust helmets.customer ID in the cust helmets table |
|
|
Note: I simplified the second embedded SELECT statement to fetch only the CustomerID because
that’s the only column I need for the INNER JOIN of the two sets to work. I could
have actually eliminated the JOIN to the Customers table and fetched the CustomerID
from the Orders table. Remember that you can think of a SELECT s tatement embedded
in a FROM clause as a “logical table,” and I assigned a unique name to each statement
so that I could write the final ON clause.
You could also solve this problem as the INTERSECT of the two sets, but you would need to include all the output columns in both of the result sets that you intersect. Quite frankly, this might not be the best way to solve this problem. I’ll show you how to solve this problem more efficiently in Chapter 11, “Subqueries,” when I teach you how to use subqueries.
CH08_Customers_Both_Bikes_And_Helmets (21 rows)
CustFirstName |
CustLastName |
William |
Thompson |
Robert |
Brown |
Dean |
McCrae |
John |
Viescas |
Mariya |
Sergienko |
Neil |
Patterson |
Andrew |
Cencini |
Angel |
Kennedy |
Liz |
Keyser |
Rachel |
Patterson |
<< more rows here >> |
“List the entertainers who played engagements for both customers Berg and Hallmark.”
As you saw earlier, solving for Berg or Hallmark is easy. Let’s phrase the request a different way so that it’s clearer what we need the database to do for us.
“Find all the entertainers who played an engagement for Berg, then find all the entertainers who played an engagement for Hallmark, and finally list the common entertainers so that we know who played an engagement for both.”
CH08_Entertainers_Berg_AND_Hallmark (4 rows)
EntStageName |
Carol Peacock Trio |
JV & the Deep Six |
Modern Dance |
Country Feeling |
Note: This is another example of a request that can also be solved with INTERSECT. It can
also be solved more efficiently with subqueries, which you’ll learn about in Chapter 11.
“Show me the students and teachers who have the same first name.”
Translation/Clean Up |
Select student full name and staff full name from the students table inner joined with the staff table on students.first name in the students table matches = staff.first name in the staff table |
|
|
CH08_Students_Staff_Same_FirstName (2 rows)
StudFullName |
StfFullName |
Michael Viescas |
Michael Hernandez |
David Hamilton |
David Smith |
“Find the bowlers who had a raw score of 170 or better at both Thunderbird Lanes and Bolero Lanes.”
Yes, this is another “solve an intersection with a JOIN” problem. Let’s ask it a different way so that it’s clearer what I need the database to do for us:
“Find all the bowlers who had a raw score of 170 or better at Thunderbird Lanes, then find all the bowlers who had a raw score of 170 or better at Bolero Lanes, and finally list the common bowlers so that we know who had good scores at both bowling alleys.”
Note: Because a bowler might have had a high score at either bowling alley more than once,
I added the DISTINCT keyword to eliminate the duplicates. Again, this is a problem
that might be better solved with subqueries, which you’ll learn about in Chapter 11.
CH08_Good_Bowlers_TBird_And_Bolero (11 rows)
BowlerFullName |
Kennedy, John |
Patterson, Neil |
Kennedy, Angel |
Patterson, Kathryn |
Viescas, John |
Viescas, Caleb |
Thompson, Sarah |
Thompson, Mary |
Thompson, William |
Patterson, Rachel |
Clothier, Ben |
“Display all the ingredients for recipes that contain carrots.”
Translation/Clean Up |
Select recipe ID, recipe title, and ingredient name from the recipes table inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table matches = recipe_ingredients.recipe ID in the recipe ingredients table, inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table, then finally inner joined with (Select recipe ID from the ingredients table inner joined with the recipe ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table where ingredient name is = ‘Carrot’) as carrots on recipes.recipe ID in the recipes table matches = carrots.recipe ID in the carrots table |
|
|
Note: This request can be solved more simply with a subquery. I’ll show you how to do that
in Chapter 11.
CH08_Recipes_Containing_Carrots (16 rows)
RecipeID |
RecipeTitle |
IngredientName |
1 |
Irish Stew |
Beef |
1 |
Irish Stew |
Onion |
1 |
Irish Stew |
Potato |
1 |
Irish Stew |
Carrot |
1 |
Irish Stew |
Water |
1 |
Irish Stew |
Guinness Beer |
14 |
Salmon Filets in Parchment Paper |
Salmon |
14 |
Salmon Filets in Parchment Paper |
Carrot |
14 |
Salmon Filets in Parchment Paper |
Leek |
14 |
Salmon Filets in Parchment Paper |
Red Bell Pepper |
14 |
Salmon Filets in Parchment Paper |
Butter |
<< more rows here >> |
In this chapter, I thoroughly discussed how to link two or more tables or result sets on matching values. I began by defining the concept of a JOIN, and then I went into the details about forming an INNER JOIN. I discussed what is “legal” to use as the criteria for a JOIN but cautioned you about making nonsensical JOINs.
I started out simply with examples joining two tables. I next showed how to assign correlation (alias) names to tables within your FROM clause. You might want to do this for convenience—or you might be required to assign correlation names when you include the same table more than once or use an embedded SELECT statement.
I showed how to replace a reference to a table with a SELECT statement within your FROM clause. I next showed how to extend your horizons by joining more than two tables or result sets. I wrapped up the discussion of the syntax of an INNER JOIN by reemphasizing the importance of having a good database design and understanding how your tables are related.
I discussed some reasons why INNER JOINs are useful and gave you specific examples. The rest of the chapter provided more than a dozen examples of using INNER JOIN. I broke these examples into JOINs on two tables, JOINs on more than two tables, and JOINs on matching values. In the next chapter, I’ll explore another variant of JOIN—an OUTER JOIN.
The following section presents some requests to work out on your own.
Below, I show you the request statement and the name of the solution query in the sample databases. If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your result set is the same.
1. “List customers and the dates they placed an order, sorted in order date sequence.”
(Hint: The solution requires a JOIN of two tables.)
You can find the solution in CH08_Customers_And_OrderDates (944 rows).
2. “List employees and the customers for whom they booked an order.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Employees_And_Customers (211 rows).
3. “Display all orders, the products in each order, and the amount owed for each product, in order number sequence.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Orders_With_Products (3,973 rows).
4. “Show me the vendors and the products they supply to us for products that cost less than $100.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Vendors_And_Products_Less_Than_100 (66 rows).
5. “Show me customers and employees who have the same last name.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Customers_Employees_Same_LastName (16 rows).
6. “Show me customers and employees who live in the same city.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Customers_Employees_Same_City (10 rows).
1. “Display agents and the engagement dates they booked, sorted by booking start date.”
(Hint: The solution requires a JOIN of two tables.)
You can find the solution in CH08_Agents_Booked_Dates (111 rows).
2. “List customers and the entertainers they booked.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Customers_Booked_Entertainers (75 rows).
3. “Find the agents and entertainers who live in the same postal code.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Agents_Entertainers_Same_Postal (10 rows).
1. “Display buildings and all the classrooms in each building.”
(Hint: The solution requires a JOIN of two tables.)
You can find the solution in CH08_Buildings_Classrooms (47 rows).
2. “List students and all the classes in which they are currently enrolled.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Student_Enrollments (50 rows).
3. “List the faculty staff and the subject each teaches.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Staff_Subjects (110 rows).
4. “Show me the students who have a grade of 85 or better in art and who also have a grade of 85 or better in any computer course.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Good_Art_CS_Students (1 row).
1. “List the bowling teams and all the team members.”
(Hint: The solution requires a JOIN of two tables.)
You can find the solution in CH08_Teams_And_Bowlers (32 rows).
2. “Display the bowlers, the matches they played in, and the bowler game scores.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Bowler_Game_Scores (1,344 rows).
3. “Find the bowlers who live in the same ZIP Code.”
(Hint: The solution requires a JOIN on matching values, and be sure to not match bowlers with themselves.)
You can find the solution in CH08_Bowlers_Same_ZipCode (92 rows).
1. “List all the recipes for salads.”
(Hint: The solution requires a JOIN of two tables.)
You can find the solution in CH08_Salads (1 row).
2. “List all recipes that contain a dairy ingredient.”
(Hint: The solution requires a JOIN of more than two tables.)
You can find the solution in CH08_Recipes_Containing_Dairy (2 rows).
3. “Find the ingredients that use the same default measurement amount.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Ingredients_Same_Measure (628 rows).
4. “Show me the recipes that have beef and garlic.”
(Hint: The solution requires a JOIN on matching values.)
You can find the solution in CH08_Beef_And_Garlic_Recipes (1 row).