“We can’t solve problems by using the same kind of thinking we used when we created them.”
—ALBERT EINSTEIN
In the previous three chapters, I showed you many ways to work with data from more than one table. All the techniques I’ve covered to this point have been focused on linking subsets of information—one or more columns and one or more rows from an entire table or a query embedded in the FROM clause. I’ve also explored combining sets of information using the UNION operator. In this chapter, I’ll show you effective ways to fetch a single column from a table or query and use it as a value expression in either a SELECT clause or a WHERE clause.
There are two significant points you should learn in this chapter:
1. There’s always more than one way to solve a particular problem in SQL. In fact, this chapter will show you new ways to solve problems already covered in previous chapters.
2. You can build complex filters that do not rely on the tables in your FROM clause. This is an important concept because using a subquery in a WHERE clause is the only way to get the correct number of rows in your answer when you want rows from one table based on the filtered contents from other related tables. I’ll explain this in more detail later in the chapter.
Note: This chapter covers advanced concepts and assumes that you’ve read and thoroughly
understood Chapter 7, “Thinking in Sets”; Chapter 8, “INNER JOINs”; and Chapter 9, “OUTER JOINs.”
Simply put, a subquery is a SELECT expression that you embed inside one of the clauses of a SELECT statement to form your final query statement. In this chapter, I’ll define more formally a subquery and show how to use it other than in the FROM clause.
The SQL Standard defines three types of subqueries:
1. Row subquery—an embedded SELECT expression that returns more than one column and no more than one row
2. Table subquery—an embedded SELECT expression that returns one or more columns and zero to many rows
3. Scalar subquery—an embedded SELECT expression that returns only one column and no more than one row
You’ve already created queries that embed a SELECT statement in a FROM clause to let you filter rows before joining that result with other tables or queries. (That’s called a table subquery, as you’ll learn below.) A row subquery is a special form of a SELECT statement that returns more than one column but only one row.
In the SQL Standard, you can use a row subquery to build something the standard calls a row value constructor. When you create a WHERE clause, you build a search condition that is typically some sort of comparison of one column from one of your tables either with another column or with a literal. The SQL Standard, however, allows you to build a search condition that compares multiple values as a logical row with another set of values as a logical row (two row value constructors). You can enter the list of comparison values either by making a list in parentheses or by using a row subquery to fetch a single row from one of your tables. The bad news is that not many commercial database systems support this syntax.
Why might this be useful? Consider a Products table that has a compound part identifier in two separate fields. The first part of the identifier might be characters that indicate the subclass of parts (SKUClass), such as CPU or DSK for a computer parts manufacturer. The second part of the identifier could be a number that identifies the part within the subclass (SKUNumber). Let’s say you want all parts that have a combined identifier of DSK09775 or higher. Here’s an example of a WHERE clause that uses a row value constructor to solve the problem:
|
|
The preceding WHERE clause asks for rows where the combination of SKUClass and SKUNumber is greater than the combination of DSK and 9775. It’s the same as requesting the following:
|
|
Here’s where you could substitute a SELECT statement that returns a single row of two columns—a row subquery—for the second part of the comparison (probably using a WHERE clause to limit the result to one row). Most commercial databases support neither a row value constructor nor row subqueries. That’s all I’m going to say about them in this chapter.
Wait a minute! Didn’t I already show you how to embed a SELECT expression returning multiple rows and columns inside a FROM clause in the previous three chapters? The answer is yes—I snuck it in on you! I’ve already liberally used table subqueries in the previous chapters to specify a complex result that I then embedded in the FROM clause of another query. In this chapter, I’ll show you how to use a table subquery as the source for the list of comparison values for an IN predicate—something about which you learned the basics in Chapter 6, “Filtering Your Data.” I’ll also teach you a few new comparison predicate keywords that are used only with table subqueries.
In this chapter, I’ll also show how to use a scalar subquery anywhere you might otherwise use a value expression. A scalar subquery lets you fetch a single column or calculated expression from another table that does not have to be in the FROM clause of the main query. You can use the single value fetched by a scalar subquery in the list of columns you request in a SELECT clause or as a comparison value in a WHERE clause.
In Chapter 5, “Getting More Than Simple Columns,” you learned a lot about using expressions to generate calculated columns to be output by your query. I didn’t tell you then that you can also use a special type of SELECT statement—a subquery—to fetch data from another table, even if the table isn’t in your FROM clause.
Let’s go back to the basics and take a look at a simple form of a SELECT statement in Figure 11-1.
This looks simple, but it really isn’t! In fact, the value expression part can be quite complex. Figure 11-2 shows all the options that can constitute a value expression.
In Chapter 5, I showed you how to create basic value expressions using literal values, column references, and functions. I’ll explore CASE Expression in Chapter 19, “Condition Testing.” Notice that SELECT Expression now appears on the list. This means that you can embed a scalar subquery in the list of expressions immediately following the SELECT keyword. As noted earlier, a scalar subquery is a SELECT expression that returns exactly one column and no more than one row. This makes sense because you’re substituting the subquery where you would normally enter a single column name or expression that results in a single column.
You might be wondering at this point, “Why is this useful?” A subquery used in this way lets you pluck a single value from some other table or query to include in the output of your query. You don’t need to reference the table or query that is the source of the data in the FROM clause of the subquery at all in the FROM clause of the outer query. In most cases, you will need to add criteria in the WHERE clause of the subquery to make certain it returns no more than one row. You can even have the criteria in the subquery reference a value being returned by the outer query to pluck out the data related to the current row.
Let’s look at some simple examples using only the Customers and Orders tables from the Sales Orders example database. Figure 11-3 shows the relationship between these two tables.
Now, let’s build a query that lists all the orders for a particular date and plucks the related customer last name from the Customers table using a subquery.
Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced
in Chapter 4, “Creating a Simple Query.” Also, I include parentheses around the parts that are
subqueries in the Clean Up step and indent the subqueries where possible to help you
see how I am using them.
“Show me all the orders shipped on October 3, 2017, and each order’s related customer last name.”
Select order number, order date, ship date, and also select the related customer last name out of the customers table from the orders table where ship date is October 3, 2017 |
|
Clean Up |
Select order number, order date, ship date, and also (select the related customer last name out of the from customers table) from the orders table where ship date is = October 3, 2017 ‘2017-10-03’ |
|
|
Notice that I had to restrict the value of the CustomerID in the subquery to the value of the CustomerID in each row I’m fetching from the Orders table. Otherwise, I’ll get all the rows in Customers in the subquery. Remember that this must be a scalar subquery—a query that returns only one value from one row—so I must do something to restrict what gets returned to no more than one row. Because CustomerID is the primary key of the Customers table, I can be confident that the match on the CustomerID column from the Orders table will return exactly one row.
Those of you who caught on to the concept of INNER JOIN in Chapter 8 are probably wondering why you would want to solve this problem as just described rather than to JOIN Orders to Customers in the FROM clause of the outer query. Right now I’m focusing on the concept of using a subquery to create an output column with a very simple example. In truth, you probably should solve this particular problem with the following query using an INNER JOIN:
|
|
Now that you understand the basic concept of using a subquery to generate an output column, let’s expand your horizons and see how this feature can be really useful. First, I need to give you an overview of a couple of aggregate functions. (I’ll cover all the aggregate functions in detail in the next chapter.)
The SQL Standard defines many functions that calculate values in a query. One subclass of functions—aggregate functions—lets you calculate a single value for a group of rows in a result set. For example, you can use an aggregate function to count the rows, find the largest or smallest value within the set of rows, or calculate the average or total of some value or expression across the result set.
Let’s take a look at a couple of these functions and then see how they can be most useful in a subquery. Figure 11-4 shows the diagram for the COUNT and MAX functions, which can generate an output column in a SELECT clause.
You can use COUNT to determine the number of rows or the number of non-Null values in a result set. Use COUNT(*) to find out how many rows are in the entire set. If you specify a particular column in the result set using COUNT(column_name), the database system counts the number of rows with non-Null values in that column. You can also ask to count only the unique values by adding the DISTINCT keyword.
Likewise, you can find the largest value in a column by using MAX. If the value expression is numeric, you get the largest number value from the column or expression you specify. If the value expression returns a character data type, the largest value will depend on the collating sequence of your database system. If the value expression is a date or time, you get the latest date or time value from the column or expression.
Let’s use these functions in a subquery to solve a couple of interesting problems:
Translation |
Select customer first name, customer last name, and also select the count of orders from the orders table for this customer from the customers table |
Clean Up |
Select customer first name, customer last name, and also (select the count of orders (*) from the orders table for this customer where orders.customer ID = customers.customer ID) from the customers table |
|
|
Subqueries as output columns are starting to look interesting now! In Part IV, “Summarizing and Grouping Data,” you’ll learn more about creative ways to use aggregate
functions. But if all you want is a count of related rows, a subquery is a good way
to do it. In fact, if you don’t want anything other than the customer name and the
count of orders, this is just about the only way to solve the problem. If you add
the Orders table to the FROM clause of the main query (FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
), you’ll get multiple rows for each customer who placed more than one order. In Chapter 13, “Grouping Data,” you’ll learn about another way that involves grouping the rows
on customer name.
Let’s look at an interesting problem that takes advantage of another aggregate function—MAX:
“Show me a list of customers and the last date on which they placed an order.”
Translation |
Select customer first name, customer last name, and also select the highest order date from the orders table for this customer from the customers table |
Clean Up |
Select customer first name, customer last name, and also (select the highest max(order date) from the orders table for this customer where orders.customer ID = customers.customer ID) from the customers table |
|
As you can imagine, using MAX in this way works well for finding the highest or most recent value from any related table. I’ll show you several other ways to use these functions in the “Sample Statements” section later in this chapter.
In Chapter 6, you learned how to filter the information retrieved by adding a WHERE clause. You also learned how to use both simple and complex comparisons to get only the rows you want in your result set. Now I’ll build on your skills and show you how to use a subquery as one of the comparison arguments to do more sophisticated filtering.
Let’s revisit the SELECT statement from Figure 11-1 and look at the syntax for building a query with a simple comparison predicate in a WHERE clause. Figure 11-5 shows the simplified diagram.
As you remember from Figure 11-2, a value expression can be a subquery. In the simple example in Figure 11-5, you’re comparing the value expression to a single column. Thus, the value expression must be a single value—that is, a scalar subquery that returns exactly one column and no more than one row. Let’s solve a simple problem requiring a comparison to a value returned from a subquery. In this example, I am going to ask for all the details about customer orders, but I want only the last order for each customer. Figure 11-6 shows the tables needed.
“List customers and all the details from their last order.”
Translation |
Select customer first name, customer last name, order number, order date, product number, product name, and quantity ordered from the customers table joined with the orders table on customer ID in the customers table equals customer ID in the orders table, then joined with the order details table on order number in the orders table equals order number in the order details table, and then joined with the products table on product number in the products table equals product number in the order details table where the order date equals the maximum order date from the orders table for this customer |
Select customer first name, customer last name, order number, order date, product number, product name, and quantity ordered from the customers table inner joined with the orders table on customers.customer ID in the customers table equals = orders.customer ID in the orders table, then inner joined with the order details table on orders.order number in the orders table equals = order_details.order number in the order details table, and then inner joined with the products table on products.product number in the products table equals = order_details.product number in the order details table where the order date equals = (select the maximum (order date) from the orders table for this customer where orders.customer ID = customers.customer ID) |
|
|
|
Did you notice that I gave an alias name to the second reference to the Orders table (that is, the Orders table in the subquery)? Even if you leave out the alias name, many database systems will recognize that you mean the copy of the Orders table within the subquery. In fact, the SQL Standard dictates that any unqualified reference should be resolved from the innermost query first. Still, I added the alias reference to make it crystal clear that the copy of the Orders table I’m referencing in the WHERE clause of the subquery is the one in the FROM clause of the subquery. If you follow this practice, your request will be much easier to understand—either by you when you come back to it some months later or by someone else who has to figure out what your request meant.
The SQL Standard defines a number of special predicate keywords for use in a WHERE clause with a subquery.
You learned in Chapter 6 how to use the IN keyword in a WHERE clause to compare a column or expression to a list of values. You now know that each value expression in the IN list could be a scalar subquery. How about using a subquery to generate the entire list? As Figure 11-7 shows, you can certainly do that!
In this case, you can use a table subquery that returns one column and as many rows as necessary to build the list. Let’s use the Recipes sample database for an example. Figure 11-8 shows the tables of interest.
Let’s suppose you’re having someone over for dinner who just adores seafood. Although you know you have some recipes containing seafood ingredients, you’re not sure of all the ingredient names in your database. You do know that you have an IngredientClassDescription of Seafood, so you can join all the tables and filter on IngredientClassDescription—or you can get creative and use subqueries and the IN predicate instead.
“List all my recipes that have a seafood ingredient.”
Translation |
Select recipe title from the recipes table where the recipe ID is in the selection of recipe IDs from the recipe ingredients table where the ingredient ID is in the selection of ingredient IDs from the ingredients table joined with the ingredient classes table on ingredient class ID in the ingredients table matches ingredient class ID in the ingredient classes table where ingredient class description is ‘seafood’ |
Clean Up |
Select recipe title from the recipes table where the recipe ID is in the (selection of recipe IDs from the recipe ingredients table where the ingredient ID is in the (selection of ingredient IDs from the ingredients table inner joined with the ingredient classes table on ingredients.ingredient class ID in the ingredients table matches = ingredient_classes.ingredient class ID in the ingredient classes table where ingredient class description is = ‘seafood’)) |
|
Did it occur to you that you could put a subquery within a subquery? I actually could have gone one level deeper by eliminating the INNER JOIN from the second subquery. I could have stated the second subquery using the following syntax:
|
|
That would be overkill, however, because embedding IN clauses within IN clauses only makes the query harder to read. I did so in the previous example to show you that you can do it. It’s worth restating, though, that just because you can do something doesn’t mean you should! I think you’ll agree that it’s easier to see what’s going on by using a single IN predicate and a more complex JOIN in the subquery. Here’s another solution using this technique:
|
|
You might be asking at this point, “Why go to all this trouble? Why not just do the complex JOIN in the outer query and be done with it?” The reason is that you’ll get the wrong answer! Actually, the rows returned will all be rows from the Recipes table for seafood recipes, but you might get some rows more than once. Let’s try to solve this without the subquery to see why you get duplicate rows.
|
|
If you look back at Figure 11-8, you can see that the Recipe_Ingredients table might have many rows for each row in the Recipes table. The result set defined by the FROM clause will contain at least as many rows as there are in Recipe_Ingredients, with the RecipeTitle column value repeated many times. Even when I add the filter to restrict the result to ingredients in class Seafood, I will still get more than one row per recipe in any recipe that has more than one seafood ingredient.
Yes, you could include the DISTINCT keyword, but the odds are your database system will have to do more work to eliminate the duplicates. Although this statement saved as a view probably won’t be updatable in most database systems because the single output column might have duplicate values in many rows, keep in mind that a view using DISTINCT will never be updatable because DISTINCT masks the unique identity of each underlying row, so your database system won’t know which row to update.
Using this subquery technique also becomes really important when you want to list more than just the recipe title. For example, suppose you also want to list all the ingredients from any recipe that has a seafood ingredient. If you use a complex JOIN in the outer query and filter for an ingredient class of Seafood as I just did, all you will get is seafood ingredients—you won’t get all the other ingredients for the recipes. Let’s ask one additional and slightly more complex request:
“List recipes and all ingredients for each recipe for recipes that have a seafood ingredient.”
Translation |
Select recipe title and ingredient name from the recipes table joined with the recipe ingredients table on recipe ID in the recipes table equals recipe ID in the recipe ingredients table, and then joined with the ingredients table on ingredient ID in the ingredients table equals ingredient ID in the recipe ingredients table where the recipe ID is in the selection of recipe IDs from the recipe ingredients table joined with the ingredients table on ingredient ID in the recipe ingredients table equals ingredient ID in the ingredients table, and then joined with the ingredient classes table on ingredient class ID in the ingredients table equals ingredient class ID in the ingredient classes table where ingredient class description is ‘seafood’ |
|
Select recipe title, and ingredient name from the recipes table inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table equals = recipe_ingredients.recipe ID in the recipe ingredients table, and then inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table equals = recipe_ingredients.ingredient ID in the recipe ingredients table where the recipe ID is in the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on recipe_ingredients.ingredient ID in the recipe ingredients table equals = ingredients.ingredient ID in the ingredients table, and then inner joined with the ingredient classes table on ingredients.ingredient class ID in the ingredients table equals = ingredient_classes.ingredient class ID in the ingredient classes table where ingredient class description is = ‘seafood’) |
|
|
|
|
The key here is that the complex INNER JOIN in the main part of the query retrieves all the ingredients for the recipes selected, and the complex subquery returns a list of recipe IDs for just the seafood recipes. It seems like I’m doing a complex JOIN twice, but there’s method in the madness!
As you have just seen, the IN predicate lets you compare a column or expression to a list to see whether that column or expression is in the list. In other words, the column or expression equals one of the members of the list. If you want to find out whether the column or expression is greater than or less than any, all, or some of the items in the list, you can use a quantified predicate. Figure 11-9 shows the syntax.
In this case, the SELECT expression must be a table subquery that returns exactly one column and zero or more rows. When the subquery returns more than one row, the values in the rows make up a list. As you can see, this predicate combines a comparison operator with a keyword that tells your database system how to apply the operator to the members of the list. When you use the keyword ALL, the comparison must be true for all the values returned by the subquery. When you use either of the keywords SOME or ANY, the comparison need be true for only one value in the list.
If you think about it, when the subquery returns multiple rows, asking for = ALL will always be false unless all the values returned by the subquery are the same and the value expression on the left of the comparison equals all of them. By the same logic, you might think that <> ANY will always be false if the value expression on the left does equal any of the values in the list. In truth, the SQL Standard treats SOME and ANY the same. So if you say <> SOME or <> ANY, then the predicate is true if the value expression on the left does not equal at least one of the values in the list. Another confusing point is that if the subquery returns no rows, then any comparison predicate with the ALL keyword is true, and any comparison predicate with the SOME or ANY keyword is false.
Let’s work through a couple of requests to see quantified predicates in action. First, let’s do a problem in the Recipes database. Refer to Figure 11-8 to see the tables I’ll use.
“Show me the recipes that have beef or garlic.”
Translation |
Select recipe title from the recipes table where recipe ID is in the selection of recipe IDs from the recipe ingredients table where ingredient ID equals any of the selection of ingredient IDs from the ingredients table where ingredient name is ‘beef’ or ‘garlic’ |
Clean Up |
Select recipe title from the recipes table where recipe ID is in the (selection of recipe IDs from the recipe ingredients table where ingredient ID equals = any of the (selection of ingredient IDs from the ingredients table where ingredient name is in ‘beef’ or ‘garlic’)) |
|
|
Do you get the feeling I could have also used IN instead of = ANY? If so, you’re right! I could have also created a JOIN between Recipe_Ingredients and Ingredients in the first subquery to return the requisite list of RecipeIDs. As I stated at the beginning of the chapter, there’s almost always more than one way to solve a particular problem in SQL. Sometimes, using a quantified predicate might make your request clearer.
Let’s now solve a more complex problem to show you the real power of quantified predicates. This example uses the Sales Orders sample database, and Figure 11-10 shows the tables involved.
“Find all accessories that are priced greater than any clothing item.”
Translation |
Select product name and retail price from the products table joined with the categories table on category ID in the products table matches category ID in the categories table where category description is ‘accessories’ and retail price is greater than all the selection of retail price from the products table joined with the categories table on category ID in the products table matches category ID in the categories table where category name is ‘clothing’ |
Clean Up |
Select product name and retail price from the products table inner joined with the categories table on products.category ID in the products table matches = categories.category ID in the categories table where category description is = ‘accessories’ and retail price is greater than > all the (selection of retail price from the products table inner joined with the categories table on products.category ID in the products table matches = categories.category ID in the categories table where category name is = ‘clothing’) |
|
What’s happening here? The subquery fetches all the prices for clothing items. The outer query then lists all accessories whose prices are greater than all the prices in the clothing items subquery. Note that you could also solve this query by finding the RetailPrice that is greater than the MAX price fetched in a subquery, but the point here is to demonstrate a use of ALL.
Both set membership (IN) and quantified (SOME, ANY, ALL) predicates perform a comparison with a value expression—usually a column from the source you specify in the FROM clause of your outer query. Sometimes it’s useful to know simply that a related row EXISTS in the result set returned by a subquery. In Chapter 8, I showed a technique for solving AND problems using complex INNER JOINs. You can also use EXISTS to solve those same sorts of problems. Let’s take another look at a problem I solved in Chapter 8.
Translation |
Select customer ID, customer first name, and customer last name from the customers table where there exists some row from the orders table joined with the order details table on order ID in the orders table equals order ID in the order details table, and then joined with the products table on product ID in the products table equals product ID in the order details table where category ID equals 2 (Bikes) and the orders table customer ID equals the customers table customer ID |
Clean Up |
Select customer ID, customer first name, and customer last name from the customers table where there exists some row (select * from the orders table inner joined with the order details table on orders.order ID in the orders table equals = order_details.order ID in the order details table, and then inner joined with the products table on products.product ID in the products table equals = order_details.product ID in the order details table where product name category ID equals = 2 (Bikes) and the orders table customer ID equals = the customers table customer ID) |
|
|
Notice that you can use any column name from any of the tables in the FROM clause
as the column to be fetched in the SELECT clause of the subquery. I chose to use the
shorthand “*” for all columns. (Remember from the discussion earlier in “An Introduction
to Aggregate Functions: COUNT and MAX,” that using * or a column name in the COUNT
function does make a difference. It does not matter in this case.) Stated another
way, this query is asking, “Give me the customers for whom there exists some row in
order details for a bike.” Because I didn’t match on the OrderID column, I don’t care
which column gets returned by the subquery. Also, I took a shortcut by using CategoryID = 2
because I know that this category covers all bike products. That’s more efficient
than doing an additional JOIN to the Products table and using a test on the ProductName
column for LIKE '%Bike%'
.
Note: Because this is such an interesting query, I saved this solution as CH11_Customer_Ordered_Bikes_EXISTS
in the sample database. You can find the INNER JOIN solution in CH11_Customers_Ordered_Bikes_JOIN.
Because the INNER JOIN depends on using DISTINCT to avoid returning duplicate rows,
the JOIN solution won’t be updatable. You can also solve this using IN, but I’ll leave
that as a challenge for you to solve! (Hint: I saved a sample query using IN in the
sample database so that you can check your work.)
At this point, you should have a pretty good understanding of the concept of using a subquery either to generate an output column or to perform a complex comparison in a WHERE clause. The best way to give you an idea of the wide range of uses for subqueries is to list some problems you can solve with subqueries and then present a robust set of examples in the “Sample Statements” section.
As mentioned earlier in this chapter, using a subquery to fetch a single value from a related table is probably more effectively done with a JOIN. When you consider aggregate functions, however, subqueries to fetch the result of a function calculation make the idea much more interesting. I’ll explore this use of aggregate functions further in the next chapter. In the meantime, here are some problems you can solve using a subquery to generate an output column:
“List vendors and a count of the products they sell to us.”
“Display products and the latest date the product was ordered.”
“Show me entertainers and the count of each entertainer’s engagements.”
“Display all customers and the date of the last booking each made.”
“List all staff members and the count of classes each teaches.”
“Display all subjects and the count of classes for each subject on Monday.”
“Show me all the bowlers and a count of games each bowled.”
“Display the bowlers and the highest game each bowled.”
“List all the meats and the count of recipes each appears in.”
“Show me the types of recipes and the count of recipes in each type.”
Now that you know about subqueries, you can really expand your kit of tools for solving complex queries. In this chapter, I explored many interesting ways to use subqueries as filters in a WHERE clause. In Chapter 14, “Filtering Grouped Data,” I’ll show you how to use subqueries as filters for groups of information in a HAVING clause.
Here’s a sample of problems you can solve using subqueries as a filter for rows in a WHERE clause. Note that I solved many of these same problems in earlier chapters. Now, you get to think about solving them an alternate way by using a subquery!
Note: As a hint, I’ve included the keyword(s) you can use to solve the problem in parentheses
after the problem statement.
“List customers who ordered bikes.” (IN)
“Display customers who ordered clothing or accessories.” (= SOME)
“Find all the customers who ordered a bicycle helmet.” (IN)
“What products have never been ordered?” (NOT IN)
“List customers who have booked entertainers who play country or country rock.” (IN)
“Find the entertainers who played engagements for customers Bonnicksen or Rosales.” (= SOME)
“Display agents who haven’t booked an entertainer.” (NOT IN)
“List the entertainers who played engagements for customer Bonnicksen.” (EXISTS)
“Display students enrolled in a class on Tuesday.” (IN)
“Display students who have never withdrawn from a class.” (NOT IN)
“List the subjects taught on Wednesday.” (IN)
“Display team captains with a current average higher than all other members on their team.” (> ALL)
“List all the bowlers who have a current average that’s less than all the other bowlers on the same team.” (< ALL)
“Display all the ingredients for recipes that contain carrots.” (IN)
“List the ingredients that are used in some recipe where the measurement amount in the recipe is not the default measurement amount.” (<> SOME)
You now know the mechanics of constructing queries using subqueries and have seen some of the types of requests you can answer with a subquery. Let’s take a look at a fairly robust set of samples, all of which use one or more subqueries. These examples come from each of the sample databases, and they illustrate the use of the subqueries to either generate an output column or act as a filter.
I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for this book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), and I prefixed the names of the queries relevant to this chapter with “CH11.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.
Note: Remember that all the column names and table names used in these examples are drawn
from the sample database structures shown in Appendix B, “Schema for the Sample Databases.” Because many of these examples use complex JOINs,
your database system might choose a different way to solve these queries. For this
reason, the first few rows might not exactly match the result you obtain, but the
total number of rows should be the same. To simplify the process, I have combined
the Translation and Clean Up steps for all the following examples.
“List vendors and a count of the products they sell to us.”
Translation/Clean Up |
Select vendor name and also (select the count(*) of products from the product vendors table where the product vendor table vendor ID equals = the vendors table vendor ID) from the vendors table |
|
|
Note: I assigned an alias name to the subquery in the SELECT clause so that the output
displays a meaningful name. If you don’t do that, your database system will generate
something like Expr1.
CH11_Vendors_Product_Count (10 rows)
VendName |
VendProductCount |
Shinoman, Incorporated |
3 |
Viscount |
6 |
Nikoma of America |
5 |
ProFormance |
3 |
Kona, Incorporated |
1 |
Big Sky Mountain Bikes |
22 |
Dog Ear |
9 |
Sun Sports Suppliers |
5 |
Lone Star Bike Supply |
30 |
Armadillo Brand |
6 |
“Display all customers and the date of the last booking each made.”
Translation/Clean Up |
Select customer first name, customer last name, and also (select the highest MAX(start date) from the engagements table where the engagements table customer ID equals = the customers table customer ID) from the customers table |
|
|
CH11_Customers_Last_Booking (15 rows)
CustFirstName |
CustLastName |
LastBooking |
|
Doris |
Hartwig |
2018-02-24 |
|
Deb |
Waldal |
2018-02-18 |
|
Brehm |
2018-02-27 |
||
Dean |
McCrae |
2018-02-25 |
|
Elizabeth |
Hallmark |
2018-02-20 |
|
Matt |
Berg |
2018-02-24 |
|
Liz |
Keyser |
2018-02-20 |
|
Darren |
Gehring |
|
|
Sarah |
Thompson |
2018-02-25 |
|
<< more rows here >> |
Note: The LastBooking column for some customers is blank (Null) because those customers
have no bookings.
“Display all subjects and the count of classes for each subject on Monday.”
Translation/Clean Up |
Select subject name and also (select the count(*) of classes from the classes table where Monday schedule is = true and the classes table subject ID equals = the subjects table subject ID) from the subjects table |
|
|
Note: Be sure to use the test for true that your database system supports. Remember that
some database systems require you to compare to a keyword TRUE or to the integer value
1 or –1.
CH11_Subjects_Monday_Count (56 rows)
SubjectName |
MondayCount |
Financial Accounting Fundamentals I |
2 |
Financial Accounting Fundamentals II |
0 |
Fundamentals of Managerial Accounting |
0 |
Intermediate Accounting |
0 |
Business Tax Accounting |
0 |
Introduction to Business |
0 |
Developing A Feasibility Plan |
0 |
Introduction to Entrepreneurship |
0 |
<< more rows here >> |
Note: Rather than return a Null value when there are no rows, the COUNT aggregate function
returns a zero.
“Display the bowlers and the highest game each bowled.”
Translation/Clean Up |
Select bowler first name, bowler last name, and also (select the highest MAX(raw score) from the bowler scores table where the bowler scores table bowler ID equals = the bowlers table bowler ID) from the bowlers table |
|
|
CH11_Bowler_High_Score (32 rows)
BowlerFirstName |
BowlerLastName |
HighScore |
|
Barbara |
Fournier |
164 |
|
David |
Fournier |
178 |
|
John |
Kennedy |
191 |
|
Sara |
Sheskey |
149 |
|
Ann |
Patterson |
165 |
|
Neil |
Patterson |
179 |
|
David |
Viescas |
195 |
|
Stephanie |
Viescas |
150 |
|
<< more rows here >> |
“List all the meats and the count of recipes each appears in.”
Translation/Clean Up |
Select ingredient class description, ingredient name, and also (select the count(*) of rows from the recipe ingredients table where the recipe ingredients table ingredient ID equals = the ingredients table ingredient ID) from the ingredient classes table inner joined with the ingredients table on ingredient_classes.ingredient class ID in the ingredients classes table matches = ingredients.ingredient class ID in the ingredients table where ingredient class description is = ‘meat’ |
|
|
CH11_Meat_Ingredient_Recipe_Count (11 rows)
IngredientClassDescription |
IngredientName |
RecipeCount |
Meat |
Beef |
2 |
Meat |
Chicken, Fryer |
0 |
Meat |
Bacon |
0 |
Meat |
Chicken, Pre-cut |
0 |
Meat |
T-bone Steak |
0 |
Meat |
Chicken Breast |
0 |
Meat |
Chicken Leg |
1 |
Meat |
Chicken Wing |
0 |
Meat |
Chicken Thigh |
1 |
Meat |
New York Steak |
0 |
Meat |
Ground Pork |
1 |
“Display customers who ordered clothing or accessories.”
Translation/Clean Up |
Select customer ID, customer first name, customer last name from the customers table where customer ID is equal to = any of the (selection of customer ID from the orders table inner joined with the order details table on orders.order number in the orders table matches = order_details.order number in the order details table, then inner joined with the products table on products.product number in the products table matches = order_details.product number in the order details table, and then inner joined with the categories table on categories.category ID in the categories table matches = products.category ID in the products table where ‑category description is = ‘clothing’ or category description is = ‘accessories’) |
|
CH11_Customers_Clothing_OR_Accessories (27 rows)
CustomerID |
CustFirstName |
CustLastName |
|
1001 |
Suzanne |
Viescas |
|
1002 |
William |
Thompson |
|
1003 |
Gary |
Hallmark |
|
1004 |
Robert |
Brown |
|
1005 |
Dean |
McCrae |
|
1006 |
John |
Viescas |
|
1007 |
Mariya |
Sergienko |
|
1008 |
Neil |
Patterson |
|
<< more rows here >> |
Note: Just for fun, I solved this query by using = ANY. Can you think of a solution using
IN or EXISTS? You can find these solutions in the sample database saved as CH11_Customers_Clothing_OR_Accessories_IN
and CH11_Customers_Clothing_OR_Accessories_EXISTS. If you look at the scripts I supplied
for MySQL, you’ll find that I used ANY in the script, but if you look in the sample
database, you’ll find that MySQL and PostgreSQL converted the actual stored view to
use IN. Go figure.
“List the entertainers who played engagements for customer Berg.”
Translation/Clean Up |
Select entertainer ID, and entertainer stage name from the entertainers table where there exists (select * some row from the customers table inner joined with the engagements table on customers.customer ID in the customers table matches = engagements.customer ID in the engagements table where customer last name is = ‘Berg’ and the engagements table entertainer ID equals = the entertainers table entertainer ID) |
|
|
CH11_Entertainers_Berg_EXISTS (6 rows)
EntertainerID |
EntStageName |
1001 |
Carol Peacock Trio |
1003 |
JV & the Deep Six |
1004 |
Jim Glynn |
Modern Dance |
|
1007 |
Coldwater Cattle Company |
1008 |
Country Feeling |
Note: Just for a bit of challenge, I decided to solve this problem using EXISTS. Can you
solve it using IN? You can find the second solution in CH11_Entertainers_Berg_IN.
“Display students who have never withdrawn from a class.”
Translation/Clean Up |
Select student ID, student first name, and student last name from the students table where the student ID is not in the (selection of student ID from the student schedules table inner joined with the student class status table on student_schedules.class status in the student schedules table matches = student_class_status.class status in the student class status table where class status description is = ‘withdrew’) |
|
|
Note: This is a pretty simple query that finds all the students who ever withdrew from
a class in the subquery and then asks for all the students NOT IN this list. Can you
think how you would solve this with an OUTER JOIN?
CH11_Students_Never_Withdrawn (16 rows)
StudentID |
StudFirstName |
StudLastName |
1001 |
Kerry |
Patterson |
1003 |
Betsey |
Stadick |
1004 |
Janice |
Galvin |
1005 |
Doris |
Hartwig |
1006 |
Scott |
Bishop |
1007 |
Elizabeth |
Hallmark |
1008 |
Sara |
Sheskey |
1009 |
Karen |
Smith |
<< more rows here >> |
“Display team captains with a handicap score higher than all other members on their teams.”
Translation/Clean Up |
Select team name, bowler ID, bowler first name, bowler last name, and handicap score from the bowlers table inner joined with the teams table on bowlers.bowler ID in the bowlers table matches = teams.captain ID in the teams table inner joined with the bowler scores table on bowlers.bowler ID in the bowlers table matches = bowler_scores.bowler ID in the bowler scores table where the handicap score is greater than > all the (selection of handicap score from bowlers as B2 inner joined with the bowler scores table as BS2 on B2.bowler ID in the B2 table matches = BS2.bowler ID in the BS2 table where the B2 table bowler ID is not equal <> the bowlers table bowler ID and the B2 table team ID is equal = to the bowlers table team ID) |
|
Note: I explicitly gave aliases to the second copy of the Bowlers table and the second
copy of the Bowler_Scores table in the subquery to make it crystal clear what’s going
on. I specifically do not want to compare against the score of the current bowler—that
would cause the > ALL predicate to fail. I also want to compare only with the other
bowlers on the same team.
CH11_Team_Captains_High_Score (1 row)
TeamName |
BowlerID |
BowlerFirstName |
BowlerLastName |
HandiCapScore |
Huckleberrys |
7 |
David |
Viescas |
224 |
“Display all the ingredients for recipes that contain carrots.”
Note: I promised in Chapter 8 that I would show you how to solve this problem with a subquery. I keep my promises!
Select recipe title and ingredient name from the recipes table inner joined with the recipe ingredients table on recipes.recipe ID in the recipes table matches = recipe_ingredients.recipe ID in the recipe ingredients table, and then inner joined with the ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table where recipe ID is in the (selection of recipe ID from the ingredients table inner joined with the recipe ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table where ingredient name is = ‘carrot’) |
|
|
|
Note: If you place the filter for ‘carrot’ in the outer query, you will see only carrot
ingredients in the output. In this problem, you want to see all the ingredients from any recipe that uses carrots, so the subquery is a good way
to solve it. This query result appears to be sorted by recipe title even though there
is no ORDER BY clause. If you want to ensure this sequence in any database system,
be sure to include an ORDER BY clause.
CH11_Recipes_Ingredients_With_Carrots (16 rows)
RecipeTitle |
IngredientName |
Irish Stew |
Beef |
Irish Stew |
Onion |
Irish Stew |
Potato |
Irish Stew |
Carrot |
Irish Stew |
Water |
Irish Stew |
Guinness Beer |
Salmon Filets in Parchment Paper |
Salmon |
Salmon Filets in Parchment Paper |
Carrot |
Salmon Filets in Parchment Paper |
Leek |
<< more rows here >> |
I began the chapter with a definition of the three types of subqueries defined by the SQL Standard—row, table, and scalar—and recalled that I had already covered how to use table subqueries in a FROM clause. I also briefly described the use of a row subquery and explained that not many commercial implementations support this yet.
Next, I showed how to use a subquery to generate a column expression in a SELECT clause. I discussed a simple example and then introduced two aggregate functions that are useful for fetching related summary information from another table. (I’ll cover all the aggregate functions in detail in the next chapter.)
I then discussed using subqueries to create complex filters in the WHERE clause. I first covered simple comparisons and then introduced special comparison keywords—IN, SOME, ANY, ALL, and EXISTS—that are useful for building predicates with subqueries.
I summarized why subqueries are useful and provided a sample list of problems to solve using subqueries. The rest of the chapter showed examples of how to use subqueries. I broke these examples into two groups: using subqueries in column expressions and using subqueries in filters.
The following section presents several requests that you can work out on your own.
Below, I show you the request statement and the name of the solution query in the sample databases. If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your result set is the same.
1. “Display products and the latest date each product was ordered.”
(Hint: Use the MAX aggregate function.)
You can find the solution in CH11_Products_Last_Date (40 rows). Do you see any blank dates in the result? Can you explain why?
2. “List customers who ordered bikes.”
(Hint: Build a filter using IN.)
You can find the solution in CH11_Customers_Ordered_Bikes_IN (23 rows).
3. “What products have never been ordered?”
(Hint: Build a filter using NOT IN.)
You can find the solution in CH11_Products_Not_Ordered (2 rows).
1. “Show me all entertainers and the count of each entertainer’s engagements.”
(Hint: Use the COUNT aggregate function.)
You can find the solution in CH11_Entertainer_Engagement_Count (13 rows).
2. “List customers who have booked entertainers who play country or country rock.”
(Hint: Build a filter using IN.)
You can find the solution in CH11_Customers_Who_Like_Country (13 rows).
3. “Find the entertainers who played engagements for customers Berg or Hallmark.”
(Hint: Build a filter using = SOME.)
You can find the solution in CH11_Entertainers_Berg_OR_Hallmark_SOME (8 rows).
4. “Display agents who haven’t booked an entertainer.”
(Hint: Build a filter using NOT IN.)
You can find the solution in CH11_Bad_Agents (1 row).
1. “List all staff members and the count of classes each teaches.”
(Hint: Use the COUNT aggregate function.)
You can find the solution in CH11_Staff_Class_Count (27 rows).
2. “Display students enrolled in a class on Tuesday.”
(Hint: Build a filter using IN.)
You can find the solution in CH11_Students_In_Class_Tuesdays (18 rows).
3. “List the subjects taught on Wednesday.”
(Hint: Build a filter using IN.)
You can find the solution in CH11_Subjects_On_Wednesday (34 rows).
1. “Show me all the bowlers and a count of games each bowled.”
(Hint: Use the COUNT aggregate function.)
You can find the solution in CH11_Bowlers_And_Count_Games (32 rows).
2. “List all the bowlers who have a raw score that’s less than all of the other bowlers on the same team.”
(Hint: Build a filter using < ALL. Also use DISTINCT in case a bowler has multiple games with the same low score.)
You can find the solution in CH11_Bowlers_Low_Score (3 rows).
1. “Show me the types of recipes and the count of recipes in each type.”
(Hint: Use the COUNT aggregate function.)
You can find the solution in CH11_Count_Of_Recipe_Types (7 rows).
2. “List the ingredients that are used in some recipe where the measurement amount in the recipe is not the default measurement amount.”
(Hint: Build a filter using <> SOME.)
You can find the solution in CH11_Ingredients_Using_NonStandard_Measure (21 rows).