14 Filtering Grouped Data

14
Filtering Grouped Data

“Let schoolmasters puzzle their brain, With grammar, and nonsense, and learning; Good liquor, I stoutly maintain, Gives genius a better discerning.”

—OLIVER GOLDSMITH

Topics Covered in This Chapter

A New Meaning of “Focus Groups”

Where You Filter Makes a Difference

Uses for HAVING

Sample Statements

Summary

Problems for You to Solve

In Chapter 12, “Simple Totals,” I gave you the details about all the aggregate functions defined in the SQL Standard. I followed that up in Chapter 13, “Grouping Data,” with a discussion of how to ask your database system to group sets of rows and then calculate aggregate values in each group. One of the advantages to grouping is that you can also display value expressions based on the grouping columns to identify each group.

In this chapter, I’ll put the final piece of the summarizing and grouping puzzle into place. After you group rows and calculate aggregate values, it’s often useful to filter further the final result using a predicate on an aggregate calculation. As you will soon see, you need the last piece of this puzzle—the HAVING clause—to do that.

A New Meaning for “Focus Groups”

You now know that once you’ve gathered your information into groups of rows, you can request the MIN, MAX, AVG, SUM, or COUNT of all the values in each group. Suppose you want to refine further the final result set—to focus the groups—by testing one of the aggregate values. Let’s take a look at a simple request:

“Show me the entertainer groups that play in a jazz style and have more than three members.”

Doesn’t sound too difficult, does it? Figure 14-1 shows the tables needed to solve this request.

Figure 14-1 The tables needed to figure out which entertainers play jazz and also have more than three members

Note: I again use the “Request/Translation/Clean Up/SQL” technique introduced in Chapter 4, “Creating a Simple Query.” I also use some JOIN and subquery techniques you learned in Chapter 8, “INNER JOINs”; Chapter 9, “OUTER JOINs”; and Chapter 11, “Subqueries.”

Without knowing about the HAVING clause, you’d probably be tempted to solve it in the following incorrect manner:

Translation

Select the entertainer stage name and the count of members from the entertainers table joined with the entertainer members table on entertainer ID in the entertainers table matches entertainer ID in the entertainer members table where the entertainer ID is in the selection of entertainer IDs from the entertainer styles table joined with the musical styles table on style ID in the entertainer styles table matches style ID in the musical styles table where the stylename is ‘Jazz’ and where the count of the members is greater than 3, grouped by entertainer stage name

Clean Up

Select the entertainer stage name and the count(*) of members as CountOfMembers from the entertainers table inner joined with the entertainer members table on entertainers.entertainer ID in the entertainers table matches = entertainer_members.entertainer ID in the entertainer members table where the entertainer ID is in the (selection of entertainer IDs from the entertainer styles table inner joined with the musical styles table on entertainer_styles.style ID in the entertainer styles table matches = musical_styles.style ID in the musical styles table where the style name is = ‘Jazz’) and where the count(*) of the members is greater than > 3, grouped by entertainer stage name

SQL

SELECT Entertainers.EntStageName,

COUNT(*) AS CountOfMembers

FROM Entertainers

INNER JOIN Entertainer_Members

ON Entertainers.EntertainerID =

Entertainer_Members.EntertainerID

WHERE Entertainers.EntertainerID

IN

(SELECT Entertainer_Styles.EntertainerID

FROM Entertainer_Styles

INNER JOIN Musical_Styles

ON Entertainer_Styles.StyleID =

Musical_Styles.StyleID

WHERE Musical_Styles.StyleName = 'Jazz')

AND COUNT(*) > 3

GROUP BY Entertainers.EntStageName

What’s wrong with this picture? The key is that any column you reference in a WHERE clause (remember Chapter 6, “Filtering Your Data”?) must be a column in one of the tables defined in the FROM clause. Is COUNT(*) a column generated from the FROM clause? I don’t think so! In fact, you can calculate COUNT for each group only after the rows are grouped.

Looks like I need a new clause after GROUP BY. Figure 14-2 shows the entire syntax for a SELECT statement, including the new HAVING clause.

Figure 14-2 The SELECT statement and all its clauses

Because the HAVING clause acts on rows after they have been grouped, the SQL Standard defines some restrictions on the columns you reference in any predicate in the search condition. Note that when you do not have a GROUP BY clause, the HAVING clause operates on all rows returned by the FROM and WHERE clauses as though they are a single group. Frankly, I can’t think of a good reason why you would want to construct a request with a HAVING clause and no GROUP BY clause.

The restrictions are the same as those for columns referenced in the SELECT clause of a grouped query. Any reference to a column in a predicate within the search condition of a HAVING clause either must name a column listed in the GROUP BY clause or must be enclosed within an aggregate function. This makes sense because any column comparisons must use something generated from the grouped rows—either a grouping value or an aggregate calculation across rows in each group.

Now that you know a bit about HAVING, let’s solve the earlier problem in the correct way:

“Show me the entertainer groups that play in a jazz style and have more than three members.”

Translation

Select the entertainer stage name and the count of members from the entertainers table joined with the entertainer members table on entertainer ID in the entertainers table matches entertainer ID in the entertainer members table then inner joined with the entertainer styles table on style ID in the entertainers table matches style ID in the entertainer styles table, then inner joined with the musical styles table on style ID in the entertainer styles table matches style ID in the musical styles table where the style name is ‘Jazz,’ grouped by entertainer stage name, and having the count of the members greater than 3

Clean Up

Select the entertainer stage name and the count(*) of members as CountOfMembers from the entertainers table inner joined with the entertainer members table on entertainers.entertainer ID in the entertainers table matches = entertainer_members.entertainer ID in the entertainer members table then inner joined with the entertainer styles table on entertainers.style ID in the entertainers table matches = entertainer_styles.style ID in the entertainer styles table, then inner joined with the musical styles table on entertainer_styles.style ID in the entertainer styles table matches = musical_styles.style ID in the musical styles table where the style name is = ‘Jazz’, grouped by entertainer stage name, and having the count(*) of the members greater than > 3

SQL

SELECT Entertainers.EntertainerID,

Entertainers.EntStageName,

Count(Entertainer_Members.EntertainerID)

AS CountOfMembers

FROM ((Entertainers INNER JOIN Entertainer_Members

ON Entertainers.EntertainerID =

Entertainer_Members.EntertainerID)

INNER JOIN Entertainer_Styles

ON Entertainers.EntertainerID =

Entertainer_Styles.EntertainerID)

INNER JOIN Musical_Styles

ON Musical_Styles.StyleID =

Entertainer_Styles.StyleID

WHERE Musical_Styles.StyleName = 'Jazz'

GROUP BY Entertainers.EntertainerID,

Entertainers.EntStageName

HAVING

Count(Entertainer_Members.EntertainerID) > 3

Although I also included the COUNT in the final output of the request, I didn’t need to do that to ask for COUNT(*) in the HAVING clause. As long as any calculated value or column reference I use in HAVING can be derived from the grouped rows, I’m OK. I saved this query in the Entertainment Agency sample database as CH14_Jazz_Entertainers_More_Than_3.

Where You Filter Makes a Difference

You now know two ways to filter your final result set: WHERE and HAVING. You also know that there are certain limitations on the predicates you can use within a search condition in a HAVING clause. In some cases, however, you have the choice of placing a predicate in either clause. Let’s take a look at the reasons for putting your filter in the WHERE clause instead of the HAVING clause.

Should You Filter in WHERE or in HAVING?

You learned in Chapter 6 about five major types of predicates you can build to filter the rows returned by the FROM clause of your request. These are comparison (=, <>, <, >, >=, <=), range (BETWEEN), set membership (IN), pattern match (LIKE), and Null (IS NULL). In Chapter 11, I expanded your horizons by showing you how to use a subquery as one of the arguments in comparison and set membership predicates. Also, I introduced you to two additional classes of predicates—quantified (ANY, SOME, ALL) and existence (EXISTS)—that require a subquery as one of the arguments.

Keep in mind that the search condition in a WHERE clause filters rows before your database system groups them. In general, when you want to ultimately group only a subset of rows, it’s better to eliminate unwanted rows first in the WHERE clause. For example, let’s assume you want to solve the following problem:

“Show me the states on the west coast of the United States where the total of the orders is greater than $1 million.”

Figure 14-3 shows the tables needed to solve this problem.

Figure 14-3 The tables needed to sum all orders by state

You could legitimately state the request in the following manner, placing the predicate on customer state into the HAVING clause:

Click here to view code image

SQL

SELECT Customers.CustState,

SUM(Order_Details.QuantityOrdered *

Order_Details.QuotedPrice) AS SumOfOrders

FROM (Customers

INNER JOIN Orders

ON Customers.CustomerID = Orders.CustomerID)

INNER JOIN Order_Details

ON Orders.OrderNumber =

Order_Details.OrderNumber

GROUP BY Customers.CustState

HAVING SUM(Order_Details.QuantityOrdered *

Order_Details.QuotedPrice) > 1000000

AND CustState IN ('WA', 'OR', 'CA')

Because you are grouping on the state column, you can construct a predicate on that column in the HAVING clause, but you might be asking your database system to do more work than necessary. As it turns out, the total of all orders for customers in the state of Texas also exceeds $1 million. If you place the filter on customer state in the HAVING clause as shown here, your database will calculate the total for all the rows in Texas as well, evaluate the first predicate in the HAVING clause and keep the result, and then finally throw it out when the Texas group isn’t one you want. In my sample database, I have customers only in the states of CA, TX, OR, and WA. You can imagine how this performance problem would be compounded if you had customers in all 50 states. Your database would do the calculation for all states and then throw out all but three of them!

If you want to calculate a result based on grouping by customer state but want only customers in Washington, Oregon, and California, it makes more sense to filter down to the rows in those three states using a WHERE clause before you ask to GROUP BY state. If you don’t do so, the FROM clause returns rows for all customers in all states and must do extra work to group rows you’re not even going to need. Here’s the better way to solve the problem:

Translation

Select customer state and the sum of quantity ordered times quoted price as SumOfOrders from the customers table joined with the orders table on customer ID in the customers table matches customer ID in the orders table, and then joined with the order details table on order number in the orders table matches order number in the order details table where customer state is in the list ‘WA’, ‘OR’, ‘CA’, grouped by customer state, and having the sum of the orders greater than $1 million

Clean Up

Select customer state, and the sum of (quantity ordered times * quoted price) as SumOfOrders from the customers table inner joined with the orders table on customers.customer ID in the customers table matches = orders.customer ID in the orders table, and then joined with the order details table on orders.order number in the orders table matches = order_details.order number in the order details table where customer state is in the list (‘WA’, ‘OR’, ‘CA’), grouped by customer state, and having the sum of the orders (quantity ordered * quoted price) greater than > $1 million 1000000

SQL

SELECT Customers.CustState,

SUM(Order_Details.QuantityOrdered *

Order_Details.QuotedPrice) AS SumOfOrders

FROM (Customers

INNER JOIN Orders

ON Customers.CustomerID = Orders.CustomerID)

INNER JOIN Order_Details

ON Orders.OrderNumber =

Order_Details.OrderNumber

WHERE Customers.CustState IN ('WA', 'OR', 'CA')

GROUP BY Customers.CustState

HAVING SUM(Order_Details.QuantityOrdered *

Order_Details.QuotedPrice) > 1000000

Notice that you must repeat the expression in the HAVING clause; you cannot use the alias name assigned in the SELECT clause. I saved this query in the sample database as CH14_West_Coast_Big_Order_States.

Avoiding the HAVING COUNT Trap

Often, you might want to know which categories of items have fewer than a certain number of members. For example, you might want to know which entertainment groups have two or fewer members, which recipes have two or fewer dairy ingredients, or which subjects have three or fewer full-time professors teaching. The trick here is you also want to know which categories have zero members.

Let’s look at a request that illustrates the trap you can fall into:

“Show me the subject categories that have fewer than three full professors teaching that subject.”

Figure 14-4 shows the tables needed to solve this problem.

Figure 14-4 The tables needed to find out which categories have fewer than three faculty teaching in that category

Translation

Select category description and the count of staff ID as ProfCount from the categories table joined with the faculty categories table on category ID in the categories table matches category ID in the faculty categories table, and then joined with the faculty table on staff ID in the faculty table matches staff ID in the faculty categories table where title is ‘Professor,’ grouped by category description, and having the count of staff ID less than 3

Clean Up

Select category description and the count of (staff ID) as ProfCount from the categories table inner joined with the faculty categories table on categories.category ID in the categories table matches = faculty_categories.category ID in the faculty categories table, and then inner joined with the faculty table on faculty.staff ID in the faculty table matches = faculty_categories.staff ID in the faculty categories table where title is = ‘Professor,’ grouped by category description, and having the count of (staff ID) less than < 3

SQL

SELECT Categories.CategoryDescription,

COUNT(Faculty_Categories.StaffID) AS

ProfCount

FROM (Categories

INNER JOIN Faculty_Categories

ON Categories.CategoryID =

Faculty_Categories.CategoryID)

INNER JOIN Faculty

ON Faculty.StaffID = Faculty_Categories.StaffID

WHERE Faculty.Title = 'Professor'

GROUP BY Categories.CategoryDescription

HAVING COUNT(Faculty_Categories.StaffID) < 3

Looks good, doesn’t it? Below is the result set returned from this query.

CH14_Subjects_Fewer_3_Professors_WRONG

CategoryDescription	ProfCount
Accounting	1
Business	2
Computer Information Systems	1
Economics	1
Geography	1
History	1
Journalism	1
Math	1
Political Science	1

Do you notice that the result set lists no subject categories with zero professors? This happened because the COUNT function is counting only the rows that are left in the Faculty_Categories table after filtering for full professors. I threw away any potential zero rows with the WHERE clause!

Just to confirm my suspicions that some categories exist with no full professors, let’s construct a query that will test my theory. Remember that the COUNT aggregate function will return a zero if I ask it to count an empty set, and I can get an empty set if I force the request to consider how many rows exist for a specific subject category. I do this by forcing the query to look at the subject categories one at a time. I’ll be counting category rows, not faculty subject rows. Consider the following SELECT statement:

Click here to view code image

SQL

SELECT COUNT(Faculty.StaffID)

AS BiologyProfessors

FROM (Faculty

INNER JOIN Faculty_Categories

ON Faculty.StaffID =

Faculty_Categories.StaffID)

INNER JOIN Categories

ON Categories.CategoryID =

Faculty_Categories.CategoryID

WHERE Categories.CategoryDescription =

'Biology'

AND Faculty.Title = 'Professor'

BiologyProfessors

I saved this query as CH14_Count_Of_Biology_Professors in the sample database. As you can see, there really are no full professors in the School Scheduling sample database who teach biology. I asked the query to consider just one subject category. Because there are no rows that are both Professor and Biology, I get a legitimate empty set. The COUNT function, therefore, returns a zero.

Now that I know this, I can embed this request as a subquery in a WHERE clause that extracts a match on category ID from the outer query. This forces the request to consider the categories one at a time as it fetches the category descriptions one row at a time from the Categories table in the outer request. The SQL is as follows:

Click here to view code image

SQL

SELECT Categories.CategoryDescription,

(SELECT COUNT(Faculty.StaffID)

FROM (Faculty

INNER JOIN Faculty_Categories

ON Faculty.StaffID =

Faculty_Categories.StaffID)

INNER JOIN Categories AS C2

ON C2.CategoryID =

Faculty_Categories.CategoryID

WHERE C2.CategoryID = Categories.CategoryID

AND Faculty.Title = 'Professor')

AS ProfCount

FROM Categories

WHERE

(SELECT COUNT(Faculty.StaffID)

FROM (Faculty

INNER JOIN Faculty_Categories

ON Faculty.StaffID =

Faculty_Categories.StaffID)

INNER JOIN Categories AS C3

ON C3.CategoryID =

Faculty_Categories.CategoryID

WHERE C3.CategoryID = Categories.CategoryID

AND Faculty.Title = 'Professor') < 3

I saved this query as CH14_Subjects_Fewer_3_Professors_RIGHT in the sample database. Notice that I also included a copy of the subquery in the SELECT clause so that I can see the actual counts per category. This now works correctly because the subquery in the WHERE clause legitimately returns zero for a category that has no full professors. The correct result is below.

CH14_Subjects_Fewer_3_Professors_RIGHT

CategoryDescription	ProfCount
Accounting	1
Biology	0
Business	2
Chemistry	0
Computer Information Systems	1
Computer Science	0
Economics	1
Geography	1
History	1
Journalism	1
Math	1
Physics	0
Political Science	1
Psychology	0
French	0
German	0

As you can see, many subject categories actually have no full professors assigned to teach the subject. Although this final solution does not use HAVING at all, I include it to make you aware that HAVING isn’t always the clear solution for this type of problem. Keep in mind that you can still use HAVING for many “… having fewer than …” problems. For example, if you want to see all customers who spent less than $500 last month, but you don’t care about customers who bought nothing at all, then the HAVING solution works just fine (and will most likely execute faster). However, if you also need to see customers who bought nothing, you will have to use the non-HAVING technique I just showed you.

But “having” said all that (pun intended), there actually is a way to solve this problem using GROUP BY and HAVING. Recall from Chapter 13 that I showed you how to get a zero count in a query that joined entertainers and engagements using an OUTER JOIN. The trick to solve the subject categories and professors problem is to use a subquery in the FROM clause that filters for professors. You must do this so that the result set you use in the JOIN is already filtered before you do the JOIN. I have left the final solution up to you in the “Problems for You to Solve” section at the end of this chapter. Never fear—the solution is in the sample database!

Uses for HAVING

At this point, you should have a good understanding of how to ask for subtotals across groups using aggregate functions and the GROUP BY clause and how to filter the grouped data using HAVING. The best way to give you an idea of the wide range of uses for HAVING is to list some problems you can solve with this new clause and then present a set of examples in the “Sample Statements” section.

“Show me each vendor and for each vendor, the average of the number of days to deliver products, but display only the vendors whose average number of days to deliver is greater than the average number of delivery days for all vendors.”

“Display for each product the product name and the total sales that are greater than the average of sales for all products in that category.”

“List for each customer and order date the customer full name and the total cost of items ordered that is greater than $1,000.”

“How many orders are for only one product?”

“Which agents booked more than $3,000 worth of business in December 2017?”

“Show me the entertainers who have more than two overlapped bookings.”

“Show each agent name, the sum of the contract price for the engagements booked, and the agent’s total commission for agents whose total commission is more than $1,000.”

“Do any team captains have a raw score that is higher than any other member of the team?”

“Display for each bowler the bowler name and the average of the bowler’s raw game scores for bowlers whose average is greater than 155.”

“List the bowlers whose highest raw scores are at least 20 higher than their current averages.”

“For completed classes, list by category and student the category name, the student name, and the student’s average grade of all classes taken in that category for those students who have an average of 90 or better.”

“Display by category the category name and the count of classes offered for those categories that have three or more classes.”

“List each staff member and the count of classes each is scheduled to teach for those staff members who teach at least one but fewer than three classes.”

“List the recipes that contain both beef and garlic.”

“Sum the amount of salt by recipe class, and display those recipe classes that require more than three teaspoons.”

“For what type of recipe do I have two or more recipes?”

Sample Statements

You now know the mechanics of constructing queries using a HAVING clause and have seen some of the types of requests you can answer. Let’s take a look at a set of samples, all of which request that the information be grouped and then filtered on an aggregate value from the group. These examples come from each of the sample databases.

Note: Remember in the Introduction that I warned you that results from each database system won’t necessarily match the sort order you see in examples in this book unless you include an ORDER BY clause. Even when you include that specification, the system might return results in columns not included in the ORDER BY clause in a different sequence because of different optimization techniques.

If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. For the ORDER BY clause to be honored, you must open the design of the view and execute it from there.

Also, when you use GROUP BY, you’ll often see the results returned by your database system as though they are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.

I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for this book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), and I prefixed the names of the queries relevant to this chapter with “CH14.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.

Note: Remember that all the column names and table names used in these examples are drawn from the sample database structures shown in Appendix B, “Schema for the Sample Databases.” To simplify the process, I have combined the Translation and Clean Up steps for all the examples. These samples assume you have thoroughly studied and understood the concepts covered in previous chapters, especially the chapters on JOINs and subqueries.

Sales Orders Database

“List for each customer and order date the customer’s full name and the total cost of items ordered that is greater than $1,000.”

Translation/Clean Up

Select customer first name and || ' ‘ || customer last name as CustFullName, order date, and the sum of (quoted price times * quantity ordered) as TotalCost from the customers table inner joined with the orders table on customers.customer ID in the customers table matches = orders.customer ID in the orders table, and then inner joined with the order details table on orders.order number in the orders table matches = order_details.order number in the order details table, grouped by customer first name, customer last name, and order date, having the sum of (quoted price times * quantity ordered) greater than > 1000

SQL

SELECT Customers.CustFirstName || ' ' ||

Customers.CustLastName AS CustFullName,

Orders.OrderDate,

SUM(Order_Details.QuotedPrice *

Order_Details.QuantityOrdered)

AS TotalCost

FROM (Customers

INNER JOIN Orders

ON Customers.CustomerID = Orders.CustomerID)

INNER JOIN Order_Details

ON Orders.OrderNumber =

Order_Details.OrderNumber

GROUP BY Customers.CustFirstName,

Customers.CustLastName, Orders.OrderDate

HAVING SUM(Order_Details.QuotedPrice *

Order_Details.QuantityOrdered) > 1000

CH14_Order_Totals_By_Customer_And_Date_GT1000 (649 rows)

CustFullName	OrderDate	TotalCost
Alaina Hallmark	2017-09-03	$4,699.98
Alaina Hallmark	2017-09-15	$4,433.95
Alaina Hallmark	2017-09-22	$3,951.90
Alaina Hallmark	2017-09-23	$10,388.68
Alaina Hallmark	2017-10-01	$3,088.00
Alaina Hallmark	2017-10-13	$6,775.06
Alaina Hallmark	2017-10-23	$15,781.10
Alaina Hallmark	2017-10-31	$15,969.50
<< more rows here >>

Entertainment Agency Database

“Which agents booked more than $3,000 worth of business in December 2017?”

Translation/Clean Up

Select the agent first name, agent last name, and the sum of (contract price) as TotalBooked from the agents table inner joined with the engagements table on agents.agent ID in the agents table matches = engagements.agent ID in the engagements table where the engagement start date is between December 1, 2017, ‘2017-12-01’ and December 31, 2017, ‘2017-12-31’, grouped by agent first name, and agent last name, and having the sum of (contract price) greater than > 3000

SQL

SELECT Agents.AgtFirstName, Agents.AgtLastName,

SUM(Engagements.ContractPrice)

AS TotalBooked

FROM Agents

INNER JOIN Engagements

ON Agents.AgentID = Engagements.AgentID

WHERE Engagements.StartDate

BETWEEN '2017-12-01' AND '2017-12-31'

GROUP BY Agents.AgtFirstName, Agents.AgtLastName

HAVING SUM(Engagements.ContractPrice) > 3000

CH14_Agents_Book_Over_3000_12_2017 (2 rows)

AgtFirstName	AgtLastName	TotalBooked
Marianne	Weir	$6,000.00
William	Thompson	$3,340.00

Caution: If your database uses a data type that stores both dates and times, the BETWEEN search condition might not work as expected because the user could have entered both a date and a time value in what you expect to contain only a date. (I entered only dates in the Microsoft Office Access samples where I was forced to use the Date/Time data type.) When a date and time column contains both a date and a time, the value is greater than just the date portion. For example, 2017-12-31 12:00:00 is greater than 2017-12-31, so the BETWEEN search condition will fail to fetch that row. If you suspect this might be the case, you should write the above search condition as

Click here to view code image

StartDate >= '2017-12-01' AND StartDate < '2018-01-01'

The second search condition ensures that you fetch all the rows for December 31, 2017, even if some of the rows have a time value in them.

School Scheduling Database

Translation/Clean Up

Select category description, student first name, student last name, and the average avg(of grade) as AvgOfGrade from the categories table inner joined with the subjects table on categories.category ID in the categories table matches = subjects.category ID in the subjects table, then inner joined with the classes table on subjects.subject ID in the subjects table matches = classes.subject ID in the classes table, then inner joined with the student schedules table on classes.class ID in the classes table matches = student_schedules.class ID in the student schedules table, then inner joined with the student class status table on student_class_status.class status in the student class status table matches = student_schedules.class status in the student schedules table, and finally inner joined with the students table on students.student ID in the students table matches = student_schedules.student ID in the student schedules table where class status description is = ‘Completed,’ grouped by category description, student first name, and student last name, and having the average avg(of grade) greater than > 90

SQL

SELECT Categories.CategoryDescription,

Students.StudFirstName,

Students.StudLastName,

AVG(Student_Schedules.Grade) AS AvgOfGrade

FROM ((((Categories

INNER JOIN Subjects

ON Categories.CategoryID = Subjects.CategoryID)

INNER JOIN Classes

ON Subjects.SubjectID = Classes.SubjectID)

INNER JOIN Student_Schedules

ON Classes.ClassID = Student_Schedules.ClassID)

INNER JOIN Student_Class_Status

ON Student_Class_Status.ClassStatus =

Student_Schedules.ClassStatus)

INNER JOIN Students

ON Students.StudentID =

Student_Schedules.StudentID

WHERE Student_Class_Status.ClassStatusDescription =

'Completed'

GROUP BY Categories.CategoryDescription,

Students.StudFirstName,

Students.StudLastName

HAVING AVG(Student_Schedules.Grade) > 90

CH14_A_Students (17 rows)

CategoryDescription	StudFirstName	StudLastName	AvgOfGrade
Accounting	Elizabeth	Hallmark	91.12
Art	Kerry	Patterson	99.83
Biology	Brannon	Jones	94.54
Biology	Karen	Smith	93.05
Chemistry	Richard	Lum	98.31
Computer Information Systems	Janice	Galvin	90.56
Computer Information Systems	John	Kennedy	92.36
Computer Information Systems	Steve	Pundt	98.01
English	Brannon	Jones	91.66
English	Janice	Galvin	91.44
<< more rows here >>

“List each staff member and the count of classes each is scheduled to teach for those staff members who teach at least one but fewer than three classes.”

Note: I avoided the HAVING COUNT zero problem by specifically stating that I want staff members who teach at least one class.

Translation/Clean Up

Select staff first name, staff last name, and the count of classes (*) as ClassCount from the staff table inner joined with the faculty classes table on staff.staff ID in the staff table matches = faculty_classes.staff ID in the faculty classes table, grouped by staff first name, and staff last name, and having the count of classes (*) less than < 3

SQL

SELECT Staff.StfFirstName, Staff.StfLastName,

COUNT(*) AS ClassCount

FROM Staff

INNER JOIN Faculty_Classes

ON Staff.StaffID = Faculty_Classes.StaffID

GROUP BY Staff.StfFirstName, Staff.StfLastName

HAVING COUNT(*) < 3

CH14_Staff_Class_Count_1_To_3 (2 rows)

StfFirstName	StfLastName	ClassCount
Luke	Patterson	2
Michael	Hernandez	2

Bowling League Database

“List the bowlers whose highest raw scores are more than 20 pins higher than their current averages.”

Translation/Clean Up

Select bowler first name, bowler last name, the average avg(raw score) as CurrentAverage, and the maximum (raw score) as HighGame from the bowlers table inner joined with the bowler scores table on bowlers.bowler ID in the bowlers table matches = bowler_scores.bowler ID in the bowler scores table, grouped by bowler first name, and bowler last name, and having the maximum (raw score) greater than > the average avg(raw score) plus + 20

SQL

SELECT Bowlers.BowlerFirstName,

Bowlers.BowlerLastName,

AVG(Bowler_Scores.RawScore) AS CurrentAverage,

MAX(Bowler_Scores.RawScore) AS HighGame

FROM Bowlers INNER JOIN Bowler_Scores

ON Bowlers.BowlerID = Bowler_Scores.BowlerID

GROUP BY Bowlers.BowlerFirstName,

Bowlers.BowlerLastName

HAVING MAX(Bowler_Scores.RawScore) >

(AVG(Bowler_Scores.RawScore) + 20)

CH14_Bowlers_Big_High_Score (15 rows)

BowlerFirstName	BowlerLastName	CurrentAverage	HighGame
Alaina	Hallmark	158	180
Angel	Kennedy	163	194
Ben	Clothier	163	192
Caleb	Viescas	164	193
David	Fournier	157	178
David	Viescas	168	195
Gary	Hallmark	157	179
John	Kennedy	166	191
John	Viescas	168	193
<< more rows here >>

Recipes Database

“List the recipes that contain both beef and garlic.”

Translation/Clean Up

Select recipe title from the recipes table where the recipe ID is in the (selection of recipe ID from the ingredients table inner joined with the recipe ingredients table on recipe_ingredients.ingredient ID in the recipe ingredients table matches = ingredients.ingredient ID in the ingredients table where the ingredient name is = ‘Beef’ or the ingredient name is = ‘Garlic,’ grouped by recipe ID and having the count of the values in (recipe ID) equal to = 2)

SQL

SELECT Recipes.RecipeTitle

FROM Recipes

WHERE Recipes.RecipeID

IN (SELECT Recipe_Ingredients.RecipeID

FROM Ingredients

INNER JOIN Recipe_Ingredients

ON Ingredients.IngredientID =

Recipe_Ingredients.IngredientID

WHERE Ingredients.IngredientName = 'Beef'

OR Ingredients.IngredientName = 'Garlic'

GROUP BY Recipe_Ingredients.RecipeID

HAVING COUNT(Recipe_Ingredients.RecipeID) = 2)

CH14_Recipes_Beef_And_Garlic (1 row)

RecipeTitle

Roast Beef

Note: This illustrates a creative use of GROUP BY and HAVING in a subquery to find recipes that have both ingredients. When a recipe has neither of the ingredients, the recipe won’t appear in the subquery. When a recipe has only one of the ingredients, the count will be 1, so the row will be eliminated. Only when a recipe has both will the COUNT be 2. Be careful, though. If a particular recipe calls for both minced and whole garlic but no beef, this technique won’t work! You will get a COUNT of 2 for the two garlic entries, so the recipe will be selected even though it has no beef.

If you wonder why I used an OR operator when I want both beef and garlic, be sure to review the Using OR topic in the Using Multiple Conditions section in Chapter 6. I showed you an alternative way to solve this problem in Chapter 8. In Chapter 18, “‘Not’ and ‘And’ Problems,” I’ll show you another creative way to solve this problem.

Summary

I started the chapter with a discussion about focusing the groups you form by using the HAVING clause to filter out groups based on aggregate calculations. I introduced the syntax of this final clause for a SELECT statement and explained a simple example.

Next, I showed an example of when to use the WHERE clause rather than the HAVING clause to filter rows. I explained that when you have a choice, you’re better off placing your filter in the WHERE clause. Before you got too comfortable with HAVING, I showed you a common trap to avoid when counting groups that might contain a zero result. I also showed you an alternative way to solve this type of problem.

Finally, I summarized why the HAVING clause is useful and gave you a sample list of problems you can solve using HAVING. The rest of the chapter provided examples of how to build requests that require the HAVING clause.

The following section presents several requests that you can work out on your own.

Problems for You to Solve

Below, I show you the request statement and the name of the solution query in the sample databases. If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your result set is the same.

Sales Orders Database

1. “Show me each vendor and the average by vendor of the number of days to deliver products that are greater than the average delivery days for all vendors.”

(Hint: You need a subquery to fetch the average delivery time for all vendors.)

You can find the solution in CH14_Vendor_Avg_Delivery_GT_Overall_Avg (5 rows).

2. “Display for each product the product name and the total sales that is greater than the average of sales for all products in that category.”

(Hint: To calculate the comparison value, you must first SUM the sales for each product within a category and then AVG those sums by category.)

You can find the solution in CH14_Sales_By_Product_GT_Category_Avg (13 rows).

3. “How many orders are for only one product?”

(Hint: You need to use an inner query in the FROM clause that lists the order numbers for orders having only one row and then COUNT those rows in the outer SELECT clause.)

You can find the solution in CH14_Single_Item_Order_Count (1 row).

Entertainment Agency Database

1. “Show me the entertainers who have more than two overlapped bookings.”

(Hint: Use a subquery to find those entertainers with overlapped bookings HAVING a COUNT greater than 2. Remember that in Chapter 6, I showed you how to compare for overlapping ranges efficiently.)

You can find the solution in CH14_Entertainers_MoreThan_2_Overlap (1 row).

2. “Show each agent’s name, the sum of the contract price for the engagements booked, and the agent’s total commission for agents whose total commission is more than $1,000.”

(Hint: Use the similar problem from Chapter 13 and add a HAVING clause.)

You can find the solution in CH14_Agent_Sales_Big_Commissions (4 rows).

School Scheduling Database

1. “Display by category the category name and the count of classes offered for those categories that have three or more classes.”

(Hint: JOIN categories to subjects and then to classes. COUNT the rows and add a HAVING clause to get the final result.)

You can find the solution in CH14_Category_Class_Count_3_Or_More (14 rows).

2. “List each staff member and the count of classes each is scheduled to teach for those staff members who teach fewer than three classes.”

(Hint: This is a HAVING COUNT zero trap! Use subqueries instead.)

You can find the solution in CH14_Staff_Teaching_LessThan_3 (7 rows).

3. “Show me the subject categories that have fewer than three full professors teaching that subject.”

I did show you one way to correctly solve this problem in the section “Avoiding the HAVING COUNT Trap” using subqueries. Now try to solve it correctly using JOINs and GROUP BY.

(Hint: Consider using OUTER JOIN and a subquery in the FROM clause.)

You can find the solution in CH14_Subjects_Fewer_3_Professors_Join_RIGHT (16 rows).

4. “Count the classes taught by every staff member.”

(Hint: This really isn’t a HAVING problem, but you might be tempted to solve it incorrectly using a GROUP BY using COUNT(*).)

You can find the correct solution in CH14_Staff_Class_Count_Subquery (27 rows) and CH14_Staff_Class_Count_GROUPED_RIGHT (27 rows). The incorrect solution is in CH14_Staff_Class_Count_GROUPED_WRONG (22 rows).

Bowling League Database

1. “Do any team captains have a raw score that is higher than any other member of the team?”

(Hint: You find out the top raw score for captains by JOINing teams to bowlers on captain ID and then to bowler scores. Use a HAVING clause to compare the MAX value for all other members from a subquery.)

You can find the solution in CH14_Captains_Who_Are_Hotshots (0 rows). (There are no captains who bowl better than their teammates!)

2. “Display for each bowler the bowler name and the average of the bowler’s raw game scores for bowlers whose average is greater than 155.”

(Hint: You need a simple HAVING clause comparing the AVG to a numeric literal.)

You can find the solution in CH14_Good_Bowlers (17 rows).

3. “List the last name and first name of every bowler whose average raw score is greater than or equal to the overall average score.”

(Hint: I showed you how to solve this in Chapter 12 in the “Sample Statements” section with a subquery in a WHERE clause. Now solve it using HAVING!) (use closing parenthesis, not backslash)

You can find the solution in CH14_Better_Than_Overall_Average_HAVING (17 rows).

Recipes Database

1. “Sum the amount of salt by recipe class, and display those recipe classes that require more than three teaspoons.”

(Hint: This requires a complex JOIN of five tables to filter out salt and teaspoon, SUM the result, and then eliminate recipe classes that use more than three teaspoons.)

You can find the solution in CH14_Recipe_Classes_Lots_Of_Salt (1 row).

2. “For what class of recipe do I have two or more recipes?”

(Hint: JOIN recipe classes with recipes, count the result, and keep the ones with two or more with a HAVING clause.)

You can find the solution in CH14_Recipe_Classes_Two_Or_More (4 rows).