“For every complex problem, there’s an answer that is clear, simple, and wrong.”
—H. L. MENCKEN
At this point (especially if you’ve been a good student and have worked through all the sample statements and problems), you should be very comfortable with the basics of the SQL database language. Now, in the words of a famous Louisiana chef, it’s time to “kick it up a notch.” In this chapter, I’m going to walk you through more complex problems to find out when something “is not” and when something “is” under multiple conditions. In Chapter 19, “Condition Testing,” I’ll introduce you to logic testing in a Value Expression using CASE. In Chapter 20, “Using Unlinked Data and ‘Driver’ Tables,” I’ll prompt you to “think outside the box” using disconnected tables to solve problems. In Chapter 21, “Performing Complex Calculations on Groups,” I’ll show you how to get subtotals when you group data, and in Chapter 22, “Partitioning Data into ‘Windows,” I’ll lead you through looking at “windows” of related data. Let’s get started!
Remember in Chapter 7, “Thinking in Sets,” I used Venn diagrams to help you visualize how you need the overlapping part of two sets to solve “and” problems and the excluded part of two sets to solve “not” problems. Solving a problem requiring something to be “not” one criterion is easy (recipes that do not have beef), but it starts to get tough when your result must satisfy two or more “not” criteria (recipes that do not have beef OR carrots OR onions). The same is surely true when you’re looking for something that “is” with one criterion (recipes that have cheese). It gets a bit tougher when a set of things must satisfy two (recipes that have beef AND onions), and it becomes a head-scratcher when a set must satisfy three or more criteria (recipes that have beef AND onions AND carrots). And it can be mind-boggling to visualize a set of things that IS one or more criteria but is also NOT several other criteria.
Let’s take a look at the multiple “AND” (“IS”) case first because that’s easier to visualize. Figure 18-1 shows you a possible solution for recipes that have beef AND onions AND carrots.
Looks pretty simple, doesn’t it? But keep in mind that recipes with their ingredients are themselves sets of data with two or more rows. (You can’t think of a recipe that has only one ingredient, can you?) If you try to solve the problem with a search condition like this:
WHERE Ingredient IN ('Beef', 'Onions', 'Carrots')
you’ll get the wrong answer! Why? Well, remember that your database system tests each row against the search condition. If a recipe has beef OR onions OR carrots, the preceding search condition will be true. You want recipes that have all three, not just one. You need something more complex to find the recipes that have all three items, not just one, so you should state the problem like this:
“Find the recipes whose list of ingredients includes beef AND whose list of ingredients includes onions AND whose list of ingredients includes carrots.”
In Chapter 8, “INNER JOINs,” I showed you one way to search for recipes with two ingredients using individual SELECT statements inside the FROM clause (CH08_Beef_And_Garlic_Recipes). In Chapter 14, “Filtering Grouped Data,” I showed another way to do this for two ingredients with a creative use of GROUP BY and HAVING (CH14_Beef_And_Garlic_Recipes). In this chapter, I’ll show you some additional ways to tackle a problem like this.
Excluding multiple criteria involves finding all the items that DO include one of the criteria and then subtracting (removing) them all from the set of all items. If I want to find all recipes that do not have beef or onions or carrots, the Venn diagram looks like Figure 18-2.
Think of it as finding all the recipes that have beef and removing them from the set of all recipes, then finding the recipes that have onions and removing all of those, and finally finding the set of recipes that includes carrots and removing those as well. What you have left is the answer. Again, you might be tempted to solve it with a search condition like this:
WHERE Ingredient NOT IN ('Beef', 'Onions', 'Carrots')
From the previous discussion, you should be able to see why this won’t work. A search condition like the preceding one will return any recipe that has some ingredient other than beef, onions, or carrots. It will find and eliminate a recipe that has ONLY those three ingredients, but that would be a strange recipe, indeed! Because ingredients for any recipe form a set, you need to think of the problem like this:
“Find the recipes whose list of ingredients does not include beef, and whose list of ingredients does not include onions, and whose list of ingredients does not include carrots.”
Stated another way, you could also do:
“Find the recipes that are NOT in the list of recipes whose list of ingredients includes beef or onions or carrots.”
I haven’t solved this particular problem in previous chapters, but rest assured I will show you some ways to do it in this one.
Just for completeness, let’s take a quick look at the case where you want to include items that meet one or more criteria but exclude items that also meet one or more criteria. Suppose you want all recipes that have beef but do not want any recipes that have onions or carrots. Figure 18-3 shows you a possible situation for this problem.
I bet you can figure this one out on your own, but to make sure you really “get it,” you should not try to solve the problem like this:
WHERE Ingredient = 'Beef' AND Ingredient NOT IN ('Onions', 'Carrot')
A search like this will certainly find all recipes that have beef, but it will also include any recipe that has any ingredient other than onions or carrots, including all the recipes that have beef! Oops. Again, the ingredients for a recipe form a set, so you need to think of solving the problem like this:
“Find the recipes whose list of ingredients includes beef, and whose list of ingredients does not include onions, and whose list of ingredients does not include carrots.”
Let’s move on now to finding out exactly how to solve these complex “NOT” and “AND” problems.
Note: I’m going to show you several techniques for solving both “NOT” and “AND” problems
without regard to performance. In truth, some methods might perform terribly in one
database system but be the best solution in another. You can learn about advanced
techniques for discovering performance problems in Effective SQL: 61 Specific Ways to Write Better SQL (Addison-Wesley, ISBN 978-0134578897) that I wrote with Ben Clothier and Doug Steele.
Having said that, I do comment when appropriate where I think one technique is more
likely to run faster than another.
You might recall that you’ve already learned how to solve simple “not” cases. In Chapter 9, “OUTER JOINs,” I showed you, for example, how to find any ingredients not used in any recipe (CH09_Ingredients_Not_Used), customers who haven’t ordered a helmet (CH09_Customers_No_Helmets), and any agents who have no contracts (CH09_Agents_No_Contracts). In Chapter 11, “Subqueries,” I showed you how to find students who have never withdrawn (CH11_Students_Never_Withdrawn) and products not ordered (CH11_Products_Not_Ordered). Now let’s learn how to handle multiple “not” criteria using four different techniques:
• OUTER JOIN
• NOT IN
• NOT EXISTS
• GROUP BY/HAVING
Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced
in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you,
I have combined the Translation/Clean Up steps for all the following examples to simplify
the process.
In Chapter 9, you learned that you can use an OUTER JOIN with an IS NULL test to find rows in one table that do not have a matching row in another table. An example is the query to find ingredients that are not used:
“List ingredients not used in any recipe yet.”
Translation/Clean Up |
Select ingredient name from the ingredients table left outer joined with the recipe ingredients table on ingredients.ingredient ID in the ingredients table matches = recipe_ingredients.ingredient ID in the recipe ingredients table where recipe ID is null |
|
CH09_Ingredients_Not_Used (20 rows)
IngredientName |
Halibut |
Chicken, Fryer |
Bacon |
Iceberg Lettuce |
Butterhead Lettuce |
Scallop |
Vinegar |
Red Wine |
<< more rows here >> |
Notice that this works because where there is no match in the second table, your database engine returns a Null value for any column in that table. You can use the same technique to exclude rows from the second table that match certain criteria, but you must combine the OUTER JOIN technique that you learned in Chapter 9 with a subquery that you learned how to use in Chapter 11. You must do this because the second “table” that you want to match with must first be filtered by the excluding criteria.
Let’s solve the beef, onions, carrots problem using an OUTER JOIN and a subquery:
“Find the recipes that have neither beef, nor onions, nor carrots.”
Select recipe ID and recipe title from the recipes table left outer joined with the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is in the values (‘Beef’, ‘Onion’, or ‘Carrot’)) where the recipe ID in the selection is NULL empty |
|
|
|
CH18_Recipes_NOT_Beef_Onion_Carrot_OUTERJOIN (8 rows)
RecipeID |
RecipeTitle |
4 |
Garlic Green Beans |
5 |
Fettuccini Alfredo |
6 |
Pollo Picoso |
7 |
Mike’s Summer Salad |
8 |
Trifle |
10 |
Yorkshire Pudding |
12 |
Asparagus |
15 |
Coupe Colonel |
What is happening is the query to the right of the OUTER JOIN is finding the recipe ID for all recipes that have beef, onions, or carrots, then the OUTER JOIN with an IS NULL test eliminates all those recipe IDs from consideration—including only the recipes that do not match. You might be tempted to directly do a join on the recipe ingredients table and ingredients table and put the criteria for beef, onion, and carrot in the final WHERE clause. However, as I explained in Chapter 9, applying a filter to the “right” side of a “left” join (or vice-versa) effectively nullifies the “outer” part of the join. The result will be as though you had asked for an INNER JOIN, which won’t solve the problem.
You have already seen in Chapter 11 how to use NOT IN to solve simple “not” queries. For example, in the Sales Orders database, you can find a query to discover which products have never been ordered (CH11_Products_Not_Ordered). In the Entertainment database, there is a query to list agents who haven’t booked anything (CH11_Bad_Agents). But as you might suspect, it gets a bit tricky when you want to find rows using multiple “not” criteria.
Let’s solve our old friend in the Recipes database using NOT IN. First, let’s do it the hard way using three separate NOT IN clauses.
“Find the recipes that have neither beef, nor onions, nor carrots.”
Translation/Clean Up |
Select recipe ID and recipe title from the recipes table where the recipe ID is not in the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is = ‘Beef’) and the recipe ID is not in the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is = ‘Onion’) and the recipe ID is not in the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is = ‘Carrot’) |
|
Whew! This query is doing three eliminations, first eliminating all recipes that have beef, then eliminating all recipes that have onions, and finally eliminating all recipes that have carrots. (You can find this query saved as CH18_Recipes_NOT_Beef_Onion_Carrot_NOTIN_1.) But if you think about it, if you can collect all the recipes that have neither beef nor onions nor carrots in one subquery, you can do a single elimination like this:
“Find the recipes that have neither beef, nor onions, nor carrots.”
Translation/Clean Up |
Select recipe ID and recipe title from the recipes table where the recipe ID is not in the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is in the values (‘Beef’, ‘Onion’, or ‘Carrots’)) |
|
That’s really lots simpler, and, in fact, this is arguably the simplest way to solve a multiple “not” problem. It’s also very efficient because your database system will run the subquery once then use the result from that to eliminate recipes that match. You can find this query saved in the Recipes sample database as CH18_Recipes_NOT_Beef_Onion_ Carrot_NOTIN_2.
In Chapter 11, you also learned about using EXISTS and a subquery to search for related data on a single criterion. You can imagine how this can be expanded to handle multiple criteria. And it’s a simple matter to use NOT EXISTS to handle the “not” case. Let’s solve our trusty not beef-onions-carrots again using NOT EXISTS.
“Find the recipes that have neither beef, nor onions, nor carrots.”
Translation/Clean Up |
Select recipe ID and recipe title from the recipes table where does not exist the (selection of recipe IDs from the recipe ingredients table inner joined with the ingredients table on Recipe_Ingredients.recipe ID in the recipe ingredients table matches = Ingredients.recipe ID in the ingredients table where ingredient name is in the values (‘Beef’, ‘Onion’, or ‘Carrot’) and the Recipe_Ingredients.recipe ID from the recipe ingredients table matches the = Recipes.recipe ID from the recipes table) |
|
This operates similarly to the principles behind the NOT IN solution. You first find all the recipes that include beef, onions, or carrots, and then eliminate them by matching on recipe ID and using NOT EXISTS. The one drawback to this approach is the subquery must make a reference to a field in the main query. This means that your database system must execute the subquery once for every row it finds in the Recipes table—once for each unique RecipeID value. (In some more advanced books, you will find this sort of subquery called a “correlated” subquery because the subquery is, in effect, co-dependent on each row in the outer query.) You can find this query saved as CH18_Recipes_NOT_Beef_Onion_Carrot_NOTEXISTS in the Recipes sample database.
In Chapter 14 you learned how to find out if there are “n” or more rows that qualify for one or more criteria. For example, in the Entertainment database, you can find a query to show you the entertainers who play jazz and have three or more members (CH14_Jazz_Entertainers_More_Than_3).
Did it occur to you that you could test for a count of zero to find sets of data that do not qualify? Sure you can! Let’s solve our handy, not beef-onions-carrots using GROUP BY and HAVING COUNT = 0.
“Find the recipes that have neither beef, nor onions, nor carrots.”
Select recipe ID and recipe title from the recipes table left joined with the (selection of recipe ID from the recipe ingredients table inner joined with the ingredients table on Ingredients.ingredient ID in the ingredients table equals matches = Recipe_Ingredients.ingredient ID in the recipe ingredients table where ingredient name is in the values (‘beef’, ‘onion’, or ‘carrot’)) AS RIBOC on Recipes.recipe ID in the recipes table equals matches = RIBOC.recipe ID in the selection where RIBOC.recipe ID in the selection is NULL empty, then grouped by recipe ID and recipe title and having the count of RIBOC.recipe ID in the selection equals zero = 0 |
|
|
|
If you noticed that this looks a lot like the LEFT JOIN solution, you’re absolutely correct! In fact, the LEFT JOIN solution for a single table is the better method because it avoids the overhead of grouping the rows. If you want to do this sort of exclusion on a JOIN of two or more tables and some other criteria, however, using GROUP BY and COUNT is a good way to do it. Remember that you learned in Chapter 13, “Grouping Data,” that when you use COUNT (or, for that matter, any aggregate function) on a column and that column contains a Null value in some rows, the aggregate function ignores the Null values. This is why, when the LEFT JOIN returns no rows from the subquery, COUNT(RIBOC.RecipeID) = 0 works. When a recipe has no rows matching in the set of recipes that have beef, onions, or carrots, the COUNT is zero. (You can find this query saved as CH18_Recipes_NOT_Beef_Onion_Carrot_GROUPBY in the Recipes sample database.)
Let’s look at an example where the GROUP BY and HAVING make more sense:
“Find the recipes that have butter but have neither beef, nor onions, nor carrots.”
CH18_Recipes_Butter_NOT_Beef_Onion_Carrot_GROUPBY (2 rows)
RecipeID |
RecipeTitle |
5 |
Fettuccini Alfredo |
12 |
Asparagus |
Now, this makes more sense because the JOIN between Recipes, Recipe_Ingredients, and Ingredients will certainly return multiple rows, but I want only one row per recipe to appear in the final result. The GROUP BY accomplishes returning one row per recipe, and the HAVING eliminates all recipes that have beef, onions, or carrots in the ingredients.
That pretty much covers the different ways to solve “not” problems that have multiple criteria. I’ll show you some more sample statements and challenge you with some problems later in the chapter.
Now, let’s look at the other side of the coin—queries that need to find matches on multiple criteria. You had a taste of this in Chapter 8 when you learned how to find customers who have ordered both a bike and a helmet in the Sales Orders Database (CH08_Customers_Both_Bikes_And_Helmets), and in the Entertainment database to discover entertainers who played for both Berg and Hallmark (CH08_Entertainers_Berg_AND_Hallmark). Let’s explore the many ways to solve this type of problem in more detail:
• INNER JOIN
• IN
• EXISTS
• GROUP BY/HAVING
Remember from Chapter 7 that you can find matching items in two sets by performing an intersection of the two sets. I also told you that it’s most common when working in SQL to perform an intersection on key values using an INNER JOIN. Because the Primary Key of each row in a table uniquely identifies each row, an intersection on Primary Key values will show you the rows that are common to two sets.
So, one way to find rows that match multiple criteria is to create a set of data (using a subquery) for each criterion and then JOIN the multiple sets on Primary Key values. Let’s work through an example from the Entertainment database:
“List the customers who have booked Carol Peacock Trio, Caroline Coie Cuartet, and Jazz Persuasion.”
CH18_Customers_Peacock_Coie_Jazz_INNERJOIN (2 rows)
CustomerID |
CustFirstName |
CustLastName |
10004 |
Dean |
McCrae |
10010 |
Zachary |
Ehrlich |
The three SELECT expressions in the FROM clause fetch the three sets I want—one for customers who booked Carol Peacock Trio; one for customers who booked Caroline Coie Cuartet; and one for customers who booked Jazz Persuasion. I included the customer name fields in the first query so that I can display those fields in the final result, but all I need in the second and third queries is the CustomerID field (the Primary Key of the Customers table) to perform the JOIN to find out who booked all three groups. Finally, I used the DISTINCT keyword to eliminate any duplicate rows produced when a customer booked one of the entertainers multiple times.
If you look back in Chapter 8, you’ll find I use the same technique to solve CH08_Entertainers_Berg_AND_Hallmark. The only difference is that I used DISTINCT in each of the subqueries instead of in the outer SELECT statement.
Let’s solve our customers booking three entertainment groups problem using IN. When you want to find a match on multiple criteria using IN, you might be tempted to do it this simple way:
SELECT Customers.CustomerID, Customers.CustFirstName,
Customers.CustLastName
FROM Customers
WHERE Customers.CustomerID IN
(SELECT Customers.CustomerID
FROM (Customers INNER JOIN Engagements
ON Customers.CustomerID = Engagements.CustomerID)
INNER JOIN Entertainers
ON Engagements.EntertainerID = Entertainers.
EntertainerID
WHERE Entertainers.EntStageName IN
('Carol Peacock Trio', 'Caroline Coie Cuartet',
'Jazz Persuasion'))
Why won’t this work? The answer is you’ll get any customer who booked any of the three groups. You won’t get only the customers who booked all three groups! You can find this query saved as CH18_Customers_Peacock_Coie_Jazz_IN_WRONG in the Entertainment sample database.
Remember that to find the customers who booked all three, you need an intersection of three sets: one for the customers who booked Carol Peacock Trio; one for the customers who booked Caroline Coie Cuartet; and one for customers who booked Jazz Persuasion. To solve this with IN, you need three IN clauses, and you must find the customers who are IN the first set AND IN the second set AND IN the third set. Let’s take a whack at it:
“List the customers who have booked Carol Peacock Trio, Caroline Coie Cuartet, and Jazz Persuasion.”
Translation/Clean Up |
Select customer ID, customer first name, and customer last name from the customers table where customerID is in the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Carol Peacock Trio’) and customer id is in the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Caroline Coie Cuartet’) and customer id is in the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Jazz Persuasion’) |
|
You should get the same two rows that you found in the solution for INNER JOIN. I have specifically spaced out the three subqueries in the preceding SQL so that you can clearly see how to fetch the three sets. You can find this query in the Entertainment sample database saved as CH18_Customers_Peacock_Coie_Jazz_IN_RIGHT.
To solve our customers who booked three specific groups problem using EXISTS, you’ll use a technique similar to the one you used to solve the problem using IN. The key difference is that each of your subqueries must also match on customer ID. Because you’re testing for the existence of each set, each set must match the customer ID being examined in the current row. Here’s how to do it:
“List the customers who have booked Carol Peacock Trio, Caroline Coie Cuartet, and Jazz Persuasion.”
Translation/Clean Up |
Select customer ID, customer first name, and customer last name from the customers table where there exists the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Carol Peacock Trio’ and the Engagements.customer ID in the engagements table equals the = Customers.customer ID in the customers table) and there exists the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Caroline Coie Cuartet’ and the Engagements.customer ID in the engagements table equals the = Customers. customer ID in the customers table) and there exists the (selection of customer id from the engagements table inner joined with the entertainers table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table where Entertainers.entertainer stage name in the entertainers table equals = ‘Jazz Persuasion’ and the Engagements.customer ID in the engagements table equals the = Customers.customer ID in the customers table) |
|
|
This operates similarly to the principles behind the IN solution. You find the three sets of customers who have booked each of the groups and test using EXISTS. The one drawback to this approach is that the subqueries must make a reference to a field in the main query. This means that your database system must execute each of the subqueries once for every row it finds in the Customers table—once for each unique CustomerID value. (In some more advanced books, you will find this sort of subquery called a “correlated” subquery because the subquery is, in effect, co-dependent on each row in the outer query.) You can find this query saved as CH18_Customers_Peacock_Coie_Jazz_EXISTS in the Entertainment sample database.
You could try to solve your customers who booked three specific entertainers using GROUP BY and HAVING, but it would be difficult. When I did it for recipes and ingredients, I knew that any one ingredient appears only once in the Recipe_Ingredients table. That’s not the case for customers and entertainers because a customer can choose to book an entertainer more than once. Sure, I could do something with groupings on SELECT DISTINCT, but why bother when there are several other ways to solve the problem?
Instead, let’s tackle an interesting problem in the Entertainment sample database that is really best solved with GROUP BY and HAVING. Here’s the problem:
“Display customers and groups where the musical styles of the group match all of the musical styles preferred by the customer.”
This is a “match many” problem because each customer potentially has told the agency that there are several styles that they prefer. The difficulty is the “many” isn’t a fixed list—the list of potential matches changes with each customer!
Let’s take a look at the tables you need to see how you might begin to construct the request. Figure 18-4 shows the tables you need to find entertainers and all the styles they play, and Figure 18-5 shows the tables you need to find customers and all the styles they prefer.
Do you see any column that is common in the two sets of tables? How about the StyleID column? In fact, you probably don’t need the Musical_Styles table at all unless you also want to list the matching style. If you look at the full diagram for the Entertainment Agency sample database, you won’t see a direct relationship between StyleID in the Musical_Preferences table and StyleID in the Entertainer_Styles table. However, it’s perfectly legal to ask for a JOIN between those two tables on StyleID because the columns in both tables are the same data type. It’s also logical to do a JOIN this way because the columns you need to use in the JOIN have the same meaning. You want to find all styles that match between customers and entertainers, and you specifically want to find the matches where the number (COUNT) of matches between the two equals the total number of styles preferred by the customer. Let’s get started:
“Display customers and groups where the musical styles of the group match all of the musical styles preferred by the customer.”
Note: You could get the same result by doing:
HAVING Count(Musical_Preferences.StyleID) =
(SELECT Count(*)
FROM Entertainer_Styles
WHERE Entertainer_Styles.EntertainerID =
Entertainers.EntertainerID)
CH18_Entertainers_Fully_Match_Customers_Style (8 rows)
CustomerID |
CustFirstName |
CustLastName |
EntertainerID |
EntStageName |
CountOfStyleID |
10002 |
Deb |
Waldal |
1003 |
JV & the Deep Six |
2 |
10003 |
Peter |
Brehm |
1002 |
Topazz |
2 |
10005 |
Elizabeth |
Hallmark |
1009 |
Katherine Ehrlich |
2 |
Elizabeth |
Hallmark |
1011 |
Julia Schnebly |
2 |
|
10008 |
Darren |
Gehring |
1001 |
Carol Peacock Trio |
2 |
10010 |
Zachary |
Ehrlich |
1005 |
Jazz Persuasion |
3 |
10012 |
Kerry |
Patterson |
1001 |
Carol Peacock Trio |
2 |
10013 |
Estella |
Pundt |
1005 |
Jazz Persuasion |
2 |
This works because each customer or entertainer has a style listed only once. Note that you don’t need to know how many styles you have to match on—the query does that for you. I included the CountOfStyleID column only to demonstrate that the number of style preferences varies from customer to customer. Imagine what a sales tool this would be when one of the customers in the list calls up asking for a group recommendation. The agent can confidently recommend at least one group per customer where the group plays all the styles the customer prefers.
You now know the mechanics of constructing queries that solve complex “not” and “and” questions and have seen some of the types of requests you can answer. Let’s take a look at a fairly robust set of samples that solve a variety of “not” and “and” problems. These examples come from each of the sample databases, and they illustrate the use of the JOINs, IN, EXISTS, and grouping to find answers requiring multiple search criteria.
Note: Remember in the Introduction that I warned you that results from each database system
won’t necessarily match the sort order you see in examples in this book unless you
include an ORDER BY clause.
Even when you include that specification, the system might return results in columns not included in the ORDER BY clause in a different sequence because of different optimization techniques.
If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. You must open the design of the view and execute it from there to see the ORDER BY clause honored.
Also, when you use GROUP BY, you’ll often see the results returned by your database system as though the rows are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.
I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for the book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), using “CH18” as the leading part of the query or view name. You can follow the instructions at the beginning of this book to load the samples onto your computer and try them out.
Note: Remember that all of the field names and table names used in these examples are drawn
from the sample database structures shown in Appendix B, “Schema for the Sample Databases.”
Because many of these examples use complex joins, the optimizer for your database system may choose a different way to solve these queries. For this reason, the first few rows I show you may not exactly match the result you obtain, but the total number of rows should be the same.
“Find all the customers who ordered a bicycle and also ordered a helmet.”
Note: In Chapter 8, I solved this problem using an INNER JOIN of two SELECT DISTINCT subqueries. Here,
I solve it using EXISTS.
Translation/Clean Up |
Select customer ID, customer first name, and customer last name from the customers table where there exists some row in (SELECT * FROM the orders table inner joined with the order details table on orders.order number in the orders table equals = order_details.order number in the order details table, and then inner joined with the products table on products.product ID in the products table equals = order_details.product ID in the order details table where product name contains LIKE ‘%Bike’ and Orders.customer ID in the orders table equals = the Customers.customer ID in the customers table), and there also exists some row in (SELECT * FROM the orders table inner joined with the order details table on orders.order ID in the orders table equals = order_details.order ID in the order details table, and then inner joined with the products table on products.product ID in the products table equals = order_details.product ID in the order details table where product name contains LIKE ‘%Helmet’ and the Orders.customer ID in the orders table equals = the Customers.customer ID in the customers table) |
|
|
CH18_Cust_Bikes_And_Helmets_EXISTS (21 rows)
CustomerID |
CustFirstName |
CustLastName |
1002 |
William |
Thompson |
1004 |
Robert |
Brown |
1005 |
Dean |
McCrae |
1006 |
John |
Viescas |
1007 |
Mariya |
Sergienko |
1008 |
Neil |
Patterson |
1009 |
Andrew |
Cencini |
1010 |
Angel |
Kennedy |
1012 |
Liz |
Keyser |
1013 |
Rachel |
Patterson |
<< more rows here >> |
“Find all the customers who have not ordered either bikes or tires.”
Note: I simplified this a bit because I know the category ID for bikes is 2, and the category
ID for tires is 6. If I didn’t know this, I should have included an additional JOIN
to the Categories table and then looked for ‘Bikes’ and ‘Tires’.
Select customer ID, customer first name, and customer last name from the customers table where customer ID is not in the (selection of customer ID from the orders table inner joined with the order details table on Orders.order number in the orders table equals = Order_Details.order number in the order details table, and then inner joined with the products table on Products.product ID in the products table equals = Order_Details.product ID in the order details table where product category is = 2), and customer ID is not in the (selection of customer ID from the orders table inner joined with the order details table on Orders.order number in the orders table equals = Order_Details.order number in the order details table, and then inner joined with the products table on Products.product ID in the products table equals = Order_Details.product ID in the order details table where product category ID is = 6) |
|
|
|
CH18_Customers_Not_Bikes_Or_Tires_NOTIN_2 (2 rows)
CustomerID |
CustFirstName |
CustLastName |
1022 |
Caleb |
Viescas |
1028 |
Jeffrey |
Tirekicker |
Note: I would expect Jeffrey Tirekicker to show up in any query that asks for customers
who haven’t bought certain items because this customer has never bought anything!
See CH18_Customers_No_Orders_JOIN and CH18_Customers_No_Orders_NOT_IN to verify this.
“List the entertainers who played engagements for customers Berg and Hallmark.”
Note: I solved this problem in Chapter 8 with a JOIN of two complex table subqueries. This time, I’ll use EXISTS.
CH18_Entertainers_Berg_AND_Hallmark_EXISTS (4 rows)
EntertainerID |
EntStageName |
1001 |
Carol Peacock Trio |
1003 |
JV & the Deep Six |
1006 |
Modern Dance |
1008 |
Country Feeling |
“Display agents who have never booked a Country or Country Rock group.”
Translation/Clean Up |
Select agent ID, agent first name, and agent last name from the agents table where agent ID is not in the (selection of agent ID from the engagements table inner joined with the engagements table on Engagements.entertainer ID in the engagements table equals = Entertainers.entertainer ID in the entertainers table, and then inner joined with the entertainer styles table on Entertainers.entertainer ID in the entertainers table equals = Entertainer_Styles.entertainer ID in the entertainer styles table, and then inner joined with the musical styles table on Entertainer_Styles.style ID in the entertainer styles table equals = Musical_Styles.style ID in the musical styles table where style name is in (‘Country’, or ‘Country Rock’)) |
|
CH18_Agents_Not_Book_Country_CountryRock (3 rows)
AgentID |
AgtFirstName |
AgtLastName |
2 |
Scott |
Bishop |
8 |
Maria |
Patterson |
9 |
Daffy |
Dumbwit |
Note: I would expect Daffy Dumbwit to show up in any query that asks for agents who haven’t
booked certain items because this agent has never booked anything!
“List students who have a grade of 85 or better in both art and computer science.”
Note: I showed you how to solve this problem in Chapter 8 with an INNER JOIN of two DISTINCT subqueries. Here’s how to solve it using IN.
Select student ID, student first name, and student last name from the students table where student ID is in the (selection of student ID from the student schedules table inner joined with the classes table on Classes.student ID in the classes table equals = Student_Schedules.student ID in the student schedules table, then inner joined with the subjects table on Subjects.subject ID in the subjects table equals = Classes.subject ID in the classes styles table, and then inner joined with the categories table on Categories.category ID in the categories table equals = Subjects.category ID in the subjects table where category description is equal to = ‘art’ and grade is greater than or equal to >= 85) and student ID is in the (selection of student ID from the student schedules table inner joined with the classes table on Classes.student ID in the classes table equals = Student_Schedules.student ID in the student schedules table, then inner joined with the subjects table on Subjects.subject ID in the subjects table equals = Classes.subject ID in the classes styles table, and then inner joined with the categories table on Categories.category ID in the categories table equals = Subjects.category ID in the subjects table where category description contains LIKE ‘%computer%’ and grade is greater than or equal to >= 85) |
|
|
|
CH18_Good_Art_CS_Students_IN (1 row)
StudentID |
StudFirstName |
StudLastName |
1011 |
John |
Kennedy |
“Show me students registered for classes for which they have not completed the prerequisite course.”
Note: This is an interesting combination of “and” and “not.” The query needs to compare
an unknown number of classes for which a student has registered with an unknown number
of those classes that have prerequisites for which the student has not previously
registered or completed. (The problem assumes that it’s OK if a student is concurrently
registered for a prerequisite course.)
Let’s restate that so it’s a bit clearer how you should solve this problem.
“Show the students and the courses for which they are registered that have prerequisites for which there is not a registration for this student in the prerequisite course (and the student did not withdraw) with a start date of the prerequisite course that is equal to or earlier than the current course.”
Select student ID, student first name, student last name start date, subject code, subject name, and subject prereq from the students table inner joined with the student schedules table on Students.student ID in the students table equals = Student_Schedules.student ID in the student schedules table, then inner joined with the classes table on Classes.class ID in the classes table equals = Student_Schedules.class ID in the student schedules table, and then inner joined with the subjects table on Subjects.subject ID in the subjects table equals = Classes.subject ID in the classes table where subject prereq is not null and subject prereq is not in the (selection of subject code from the subjects table inner joined with the classes table aliased as c2 on Subjects.subject ID in the subjects table equals = C2.subject ID in the c2 aliased table, and then inner joined with the student schedules table on C2.class ID in the c2 aliased table equals = Student_Schedules.class ID in the student schedules table, and then inner joined with the student class status table on Student_Schedules.class status in the student schedules table equals = Student_Class_Status.class status in the student class status table where class status description does not equal <> ‘withdrew’ and Student_Schedules.student ID in the student schedules table equals = Students.student ID in the students table and C2.start date in the aliased c2 table is less than or equal to <= Classes.start date in the classes table) |
|
|
|
CH18_Students_Missing_Prerequisites (5 rows)
StudentID |
StudFirst Name |
StudLast Name |
StartDate |
Subject Code |
SubjectName |
Prerequisite |
1005 |
Doris |
Hartwig |
2017-09-11 |
ENG 102 |
Composition- Intermediate |
ENG 101 |
1007 |
Elizabeth |
Hallmark |
2017-09-11 |
ENG 102 |
Composition-Intermediate |
ENG 101 |
1012 |
Sarah |
Thompson |
2017-09-11 |
ENG 102 |
Composition-Intermediate |
ENG 101 |
1014 |
Kendra |
Bonnicksen |
2017-09-11 |
ENG 102 |
Composition-Intermediate |
ENG 101 |
1018 |
Richard |
Lum |
2017-09-11 |
ENG 102 |
Composition-Intermediate |
ENG 101 |
“List the bowlers, the match number, the game number, the handicap score, the tournament date, and the tournament location for bowlers who won a game with a handicap score of 190 or less at Thunderbird Lanes, Totem Lanes, and Bolero Lanes.”
Note: You first need to find all bowlers who won a game with a handicap score of 190 or
less at one of the three locations, then verify that the bowler ID is also in the
list of bowlers who won a game with a handicap score of 190 or less at each of the
three locations. (Remember, not in (a, b, c) but in (a) AND in (b), AND in (c).)
CH18_Bowlers_Won_LowScore_TBird_Totem_Bolero_RIGHT (11 rows)
Bowler ID |
Bowler FirstName |
Bowler LastName |
Match ID |
Game Number |
HandiCap Score |
Tourney Date |
Tourney Location |
13 |
Elizabeth |
Hallmark |
10 |
1 |
189 |
2017-09-18 |
Bolero Lanes |
13 |
Elizabeth |
Hallmark |
24 |
3 |
190 |
2017-10-09 |
Totem Lanes |
13 |
Elizabeth |
Hallmark |
34 |
1 |
189 |
2017-10-30 |
Thunderbird Lanes |
19 |
John |
Viescas |
7 |
3 |
185 |
2017-09-11 |
Thunderbird Lanes |
John |
Viescas |
12 |
1 |
181 |
2017-09-18 |
Bolero Lanes |
|
19 |
John |
Viescas |
36 |
1 |
179 |
2017-10-30 |
Thunderbird Lanes |
19 |
John |
Viescas |
52 |
2 |
185 |
2017-11-27 |
Totem Lanes |
25 |
Megan |
Patterson |
7 |
1 |
188 |
2017-09-11 |
Thunderbird Lanes |
25 |
Megan |
Patterson |
21 |
1 |
189 |
2017-10-09 |
Totem Lanes |
25 |
Megan |
Patterson |
35 |
1 |
187 |
2017-10-30 |
Thunderbird Lanes |
25 |
Megan |
Patterson |
39 |
2 |
181 |
2017-11-06 |
Bolero Lanes |
Note: You can also find the incorrect IN solution saved as CH18_Bowlers_Won_LowScore_TBird_Totem_Bolero_WRONG
in the Bowling League sample database.
“Show me the bowlers who have not bowled a raw score better than 165 at Thunderbird Lanes and Bolero Lanes.”
CH18_Bowlers_LTE_165_Thunderbird_Bolero (15 rows)
BowlerID |
BowlerLastName |
BowlerFirstName |
1 |
Fournier |
Barbara |
4 |
Sheskey |
Sara |
5 |
Patterson |
Ann |
8 |
Viescas |
Stephanie |
9 |
Black |
Alastair |
12 |
Viescas |
Carol |
13 |
Hallmark |
Elizabeth |
16 |
Sheskey |
Richard |
17 |
Hernandez |
Kendra |
20 |
Viescas |
Suzanne |
<< more rows here >> |
“Display the ingredients that are not used in the recipes for Irish Stew, Pollo Picoso, and Roast Beef.”
Translation/Clean Up |
Select ingredient ID, and ingredient name from the ingredients table where ingredient ID is not in the (selection of ingredient ID from the recipe ingredients table inner joined with the recipes table on Recipe_Ingredients.recipe ID in the recipe ingredients table equals = Recipes.recipe ID in the recipes table where recipe title is in the list of (‘Irish Stew’, ‘Pollo Picoso’, and ‘Roast Beef’)) |
|
CH18_Ingredients_NOTIN_IrishStew_PolloPicoso_RoastBeef (67 rows)
IngredientID |
IngredientName |
7 |
Tomato |
8 |
Jalapeno |
12 |
Halibut |
13 |
Chicken, Fryer |
14 |
Bacon |
15 |
Romaine Lettuce |
16 |
Iceberg Lettuce |
17 |
Butterhead Lettuce |
18 |
Scallop |
19 |
Salmon |
<< more rows here >> |
“List the pairs of recipes where both recipes have at least the same three ingredients.”
Note: This is similar to the query I showed you earlier matching customers and entertainers
who play all the customer’s preferred styles. Do you suppose you need two copies of
the recipes and recipe ingredients tables?
Select Recipes.recipe ID and Recipes.recipe title in the first copy of the recipes table and R2.recipe ID AS R2ID and R2.recipe title AS R2Title in the second copy of the recipes table, and the count of (Recipe_Ingredients.recipe ID) AS CountOfRecipeID in the first copy of the recipe ingredients table from the recipes table inner joined with the recipe ingredients table on Recipes.recipe ID in the recipes table equals = Recipe_Ingredients.recipe ID in the recipe ingredients table, then inner joined with the second copy of the recipe ingredients table AS RI2 on Recipe_Ingredients.ingredient ID in the recipe ingredients table equals = RI2.ingredient ID in the second copy of the recipe ingredients table, then inner joined with the second copy of the recipes table AS R2 on R2.recipe ID in the second copy of the recipes table equals = RI2.recipe ID in the second copy of the recipe ingredients table where RI2.recipe ID in the second copy of the recipe ingredients table is greater than > Recipes.recipe ID in the first copy of the recipes table, grouped by Recipes.recipe ID in the first copy of the recipes table, Recipes.recipe title in the first copy of the recipes table, R2.recipe ID in the second copy of the recipes table, and R2.recipe title in the second copy of the recipes table, and having the count of matching (Recipe_Ingredients.ingredient ID) in the recipe ingredients table greater than > 3 |
|
|
|
CH18_Recipes_AtLeast_3_Same_Ingredients (4 rows)
RecipeID |
RecipeTitle |
R2ID |
R2Title |
CountOf RecipeID |
2 |
Salsa Buena |
11 |
Huachinango Veracruzana (Red Snapper, Veracruz style |
3 |
2 |
Salsa Buena |
13 |
Tourtière (French-Canadian Pork Pie) |
3 |
6 |
Pollo Picoso |
9 |
Roast Beef |
3 |
11 |
Huachinango Veracruzana (Red Snapper, Veracruz style) |
13 |
Tourtière (French-Canadian Pork Pie) |
4 |
Note: I threw in the check to make sure the ID of the second recipe is always higher than
the ID of the first so that I don’t get a pair of recipes listed twice. Following
the SQL Standard, I could have put that filter in the JOIN on the two copies of the
Recipe_Ingredients tables, but I chose to put the filter in the WHERE clause to ensure
compatibility with most database systems.
I began the chapter with a review of sets to help you get a picture of how you go about solving problems that involve multiple “not” and “and” criteria. I then followed that with an extensive review of four different ways to approach solving problems with multiple “not” criteria, including OUTER JOIN, NOT IN, NOT EXISTS, and GROUP BY/HAVING.
I then covered four different ways to think about solving problems with multiple “and” criteria: INNER JOIN, IN, EXISTS, and GROUP BY/HAVING. To help cement the concepts, I provided five sets of sample statements for each of the sample databases. I was careful to include one “and” example and one “not” example for each sample database.
The following section presents several requests that you can work out on your own.
Below, I show you the request statement and the name of the solution query in the sample databases. If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your Result Set is the same.
1. “Display the customers who have never ordered bikes or tires.”
I showed you how to solve this earlier using NOT IN on multiple subqueries. Can you figure out a way to solve it more simply using NOT IN? You can find the solution in CH18_Customers_Not_Bikes_Or_Tires_NOTIN_1 (2 rows).
2. “List the customers who have purchased a bike but not a helmet.”
First, solve this problem using EXISTS and NOT EXISTS. The solution is in CH18_Cust_Bikes_No_Helmets_EXISTS (2 rows). For extra credit, solve the problem using IN and NOT IN. You can find the solution in CH18_Customer_Bikes_No_Helmets (2 rows).
3. “Show me the customer orders that have a bike but do not have a helmet.”
This might seem to be the same as problem 2 above, but it’s not. Show me the orders, not the customers. Solve it using EXISTS and NOT EXISTS. You can find the solution in CH18_Orders_Bikes_No_Helmet_EXISTS (402 rows).
4. “Display the customers and their orders that have a bike and a helmet in the same order.”
Solve this problem using EXISTS. You can find the solution in CH18_Customers_Bikes_And_Helmets_Same_Order (184 rows).
5. “Show the vendors who sell accessories, car racks, and clothing.”
Solve this problem using IN. You can find the solution in CH18_Vendors_Accessories_CarRacks_Clothing (3 rows).
1. “List the entertainers who play the Jazz, Rhythm and Blues, and Salsa styles.”
Solve the problem using IN, but be careful to not take the easy way out! You can find the solution in CH18_Entertainers_Jazz_RhythmBlues_Salsa_IN (1 row). CH18_Entertainers_Jazz_RhythmBlues_Salsa_IN_WRONG shows you the incorrect IN solution (4 rows). For extra credit, solve the problem using GROUP BY and HAVING. You can find the solution in CH18_Entertainers_Jazz_RhythmBlues_Salsa_HAVING (1 row).
2. “Show the entertainers who did not have a booking in the 90 days preceding May 1, 2018.”
You can solve this problem using NOT IN, but be careful to use the date and time function appropriate for your database system. You can find the solution in CH18_Entertainers_Not_Booked_90Days_Before_May1_2018 (2 rows).
3. “Display the customers who have not booked Topazz or Modern Dance.”
You can solve this problem in a couple of different ways using NOT IN. You can find one solution in CH18_Customers_Not_Booked_Topazz_Or_ModernDance (6 rows).
4. “List the entertainers who have performed for Hartwig, McCrae, and Rosales.”
There are several ways to solve this. You can find the solution using EXISTS in CH18_Entertainers_Hartwig_McCrae_AND_Rosales_EXISTS (2 rows).
5. “Display the customers who have never booked an entertainer.” “Show the entertainers who have no bookings.”
You can solve both problems using a simple NOT IN. You can find the solutions in CH18_Customers_No_Bookings_NOT_IN (2 rows), and CH18_Entertainers_Never_Booked_NOT_IN (1 row).
1. “Show students who have a grade of 85 or better in both Art and Computer Science.”
I showed you earlier how to solve this problem using IN. Now solve it using EXISTS. You can find the solution in CH18_Good_Art_CS_Students_EXISTS (1 row).
2. “Display the staff members who are teaching classes for which they are not accredited.”
The trick is to find rows in the faculty classes table that are not in the faculty subjects table.
You can find the solution in CH18_Staff_Teaching_NonAccredited_Classes (4 rows).
3. “List the students who have passed all completed classes with a grade of 80 or better.”
As you might suspect, this is best done with GROUP BY and HAVING. You can find the solution in CH18_Students_Passed_All_Grade_GTE_80 (3 rows).
4. Solve three of the following simple NOT problems: “Find classes with no students.” “Display staff members not teaching.” “Show which students have never withdrawn.” “List students not currently enrolled.” “Find subjects that have no faculty assigned.”
You can find the solutions in: CH18_Classes_No_Students_Enrolled_NOT_IN (118 rows), CH18_Staff_Not_Teaching_EXISTS (5 rows), CH18_Students_Never_Withdrawn_EXISTS (16 rows), CH18_Students_Not_Currently_Enrolled_NOT_IN (2 rows), and CH18_Subjects_No_Faculty_NOT_IN (1 row).
1. “Display the bowlers who have never bowled a raw score greater than 150.”
You can find one way to solve this in CH18_Mediocre_Bowlers (7 rows).
2. “Show the bowlers who have a raw score greater than 170 at both Thunderbird Lanes and Bolero Lanes.”
I have shown you how to solve this using an INNER JOIN of SELECT DISTINCT queries. Now solve it using EXISTS. You can find the solution in CH18_Good_Bowlers_TBird_And_Bolero_EXISTS (11 rows).
3. “List the tournaments that have not yet been played.”
This is an easy one to solve using NOT IN. You can find the solution in CH18_Tourney_Not_Yet_Played_NOT_IN (6 rows).
1. “Show me the recipes that have beef and garlic.”
Solve the problem this time using EXISTS. You can find the solution in CH18_Recipes_Beef_And_Garlic (1 row).
2. “List the recipes that have beef, onion, and carrot.”
This time, solve the problem using IN, but do it carefully! You can find the solution in CH18_Recipes_Beef_Onion_Carrot (1 row).
3. “Which recipes use no dairy products (cheese, butter, dairy)?”
Solve this using NOT IN, but be careful you do it correctly. You can find the correct solution in CH18_Recipes_No_Dairy_RIGHT (10 rows). If you did it incorrectly, your solution might look like CH18_Recipes_No_Dairy_WRONG (15 rows).
4. Solve both of the following using NOT IN: “Display ingredients not used in any recipe.” “Show recipe classes for which there is no recipe.”
You can find the solution in CH18_Ingredients_No_Recipe (20 rows) and CH18_Recipe_Classes_No_Recipes_NOT_IN (1 row).