In my head there are several windows, that I do know, but perhaps it is always the same one, open variously on the parading universe.
SAMUEL BECKETT
What You Can Do with a “Window” into Your Data
I’ve shown you several different ways that you can group and aggregate data, but I still haven’t shown you everything.
In earlier incarnations of the SQL standards, there really was no ability to work with data where the results depended on adjacent rows. It was always felt that the order of the rows didn’t matter, as long as you could match the rows to your filters. While the ability to sort the data using the ORDER BY clause existed, it was seen as being related to presentation, not to data manipulation. This meant that operations such as generating running sums, where the sum on a particular row is dependent on the values of the rows that precede that row, were very difficult (or even impossible) to write using only SQL.
With the introduction of the SQL:2003 Standard, though, this changed. The SQL:2003 Standard introduced the concept of window functions, which are functions that are applied to a set of rows defined by a window descriptor and return a single value for each row from the underlying query.
I intend to delve into the world of window functions in this chapter. Don’t worry if you use an Apple computer: window functions have nothing to do with the PC operating system. In fact, Microsoft’s SQL Server was late coming to the party with respect to window functions; they weren’t introduced until SQL Server 2017, and they still aren’t in Microsoft Access!
Note: You can find all the sample statements and solutions for the respective sample databases—SalesOrdersExample,
EntertainmentAgencyExample, RecipesExample, SchoolSchedulingExample, and BowlingLeagueExample.
Because neither Microsoft Access nor MySQL support window functions, you’ll find the
sample solutions only in the Microsoft SQL Server and PostgreSQL example databases.
In all the examples I’ve shown you to aggregate data so far, the aggregated data is replaced by a subtotal row. You lose the actual details of the data that went into the aggregation.
For instance, if you wanted to know how many styles of music each customer prefers, you might use a query like this:
|
|
You will obtain results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Count in the Entertainment Agency Example database.)
CustomerID |
Customer |
Preferences |
10001 |
Doris Hartwig |
2 |
10002 |
Deb Waldal |
2 |
10003 |
Peter Brehm |
2 |
10004 |
Dean McCrae |
2 |
10005 |
Elizabeth Hallmark |
2 |
10006 |
Matt Berg |
2 |
10007 |
Liz Keyser |
3 |
10008 |
Darren Gehring |
2 |
10009 |
Sarah Thompson |
3 |
10010 |
Zachary Ehrlich |
3 |
10011 |
Joyce Bonnicksen |
3 |
10012 |
Kerry Patterson |
2 |
10013 |
Estella Pundt |
2 |
10014 |
Mark Rosales |
3 |
10015 |
Carol Viescas |
3 |
Great. This does let you know exactly how many preferences each customer has indicated. However, it doesn’t tell you what their preferences are!
What if we could use a window function to figure out the total number of preferences for customers but also return what each of their preferences is? It turns out you can get results like the following table:
CustomerID |
Customer |
StyleName |
Preferences |
10001 |
Doris Hartwig |
Contemporary |
2 |
10001 |
Doris Hartwig |
Top 40 Hits |
2 |
Deb Waldal |
60’s Music |
2 |
|
10002 |
Deb Waldal |
Classic Rock & Roll |
2 |
10003 |
Peter Brehm |
Motown |
2 |
10003 |
Peter Brehm |
Rhythm and Blues |
2 |
10004 |
Dean McCrae |
Jazz |
2 |
10004 |
Dean McCrae |
Standards |
2 |
10005 |
Elizabeth Hallmark |
Classical |
2 |
10005 |
Elizabeth Hallmark |
Chamber Music |
2 |
10006 |
Matt Berg |
Folk |
2 |
10006 |
Matt Berg |
Variety |
2 |
10007 |
Liz Keyser |
70’s Music |
3 |
10007 |
Liz Keyser |
Classic Rock & Roll |
3 |
10007 |
Liz Keyser |
Rhythm and Blues |
3 |
10008 |
Darren Gehring |
Contemporary |
2 |
10008 |
Darren Gehring |
Standards |
2 |
10009 |
Sarah Thompson |
Country |
3 |
10009 |
Sarah Thompson |
Country Rock |
3 |
10009 |
Sarah Thompson |
Modern Rock |
3 |
10010 |
Zachary Ehrlich |
Jazz |
3 |
10010 |
Zachary Ehrlich |
Rhythm and Blues |
3 |
10010 |
Zachary Ehrlich |
Salsa |
3 |
10011 |
Joyce Bonnicksen |
40’s Ballroom Music |
3 |
10011 |
Joyce Bonnicksen |
Classical |
3 |
10011 |
Joyce Bonnicksen |
Standards |
3 |
10012 |
Kerry Patterson |
Contemporary |
2 |
10012 |
Kerry Patterson |
Show Tunes |
2 |
Estella Pundt |
Jazz |
2 |
|
10013 |
Estella Pundt |
Salsa |
2 |
10014 |
Mark Rosales |
80’s Music |
3 |
10014 |
Mark Rosales |
Modern Rock |
3 |
10014 |
Mark Rosales |
Top 40 Hits |
3 |
10015 |
Carol Viescas |
40’s Ballroom Music |
3 |
10015 |
Carol Viescas |
Show Tunes |
3 |
10015 |
Carol Viescas |
Standards |
3 |
Note: If you’re really on your toes, you would probably tell me, “But, John, I already
know how to get that result using a subquery, like this”:
SELECT CustomerID, CustFirstName || ' ' || CustLastName,
StyleName, (SELECT COUNT(*)
FROM Musical_Preferences AS MP
WHERE MP.CustomerID = Customers.
CustomerID)
AS Preferences
FROM Customers INNER JOIN Musical_Preferences
ON Customers.CustomerID = Musical_Preferences.
CustomerID
INNER JOIN Musical_Styles
ON Musical_Styles.StyleID = Musical_Preferences.
StyleID;
And you would be correct! But the focus of this chapter is to show you another, perhaps more efficient, way to get that result.
Now we know not only that Doris Hartwig has specified two preferences, but we also know that the two styles she prefers are Contemporary and Top 40 Hits! I’m sure you can see that these results could be far more useful.
So how do you go about achieving results such as this? You guessed it: you use window functions. So without further ado, let me show you how.
Let’s take a close look at the syntax for window functions. Figure 22-1 shows the basic diagram. Note that they are only valid in SELECT and ORDER BY clauses. Hmmm, that looks pretty complicated, doesn't it?
But that’s not all! (Sounds like a TV game show, doesn’t it?) And you can “power up” an Aggregate Function by turning it into a window, as shown in Figure 22-2.
Note: As you can imagine from how lengthy those lists in Figures 22-1 and 22-2 are, window functions are a huge topic; more than can possibly be covered in a single
chapter. As a result, I will be not covering ALL of the window functions, just the
ones that I think are most useful. However, you can find syntax for all the window
functions in Appendix A.
You can see that the Aggregate Functions (COUNT, SUM, AVG, MIN, and MAX) I taught you about in Chapter 12, “Simple Totals,” are in those figures, along with several new ones: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK(), and NTILE(), but there seems to be a lot more that can be specified as well.
The clause that allows you to use the windows functions is OVER(). That clause lets you define the range on which the aggregate functions are applied.
The key difference between GROUP BY and OVER() is that GROUP BY applies aggregations across the entire query, rolling up the specified non-aggregate fields, thereby reducing the number of rows returned (usually). With OVER(), however, the same number of rows will be returned as the base query. Any aggregations will be returned for each row in the range identified in the OVER() clause.
Within the OVER() clause, there are several predicates which can be used:
PARTITION BY
ORDER BY
ROWS (or RANGE)
The PARTITION BY predicate specifies how the window should be divided. In the example above, I’m dividing the window by the customer. The SQL for the table above that includes a Preferences column looks like this:
|
|
Now you know how to create that “magical” table I showed you at the beginning of the chapter. I saved this query as CH22_Customers_PreferredStyles_Details_Count in the Entertainment sample database.
If I don’t specify anything for the PARTITION BY predicate, the database system applies the function over the entire result set.
Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced
in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you,
I have combined the Translation/Clean Up steps for all the following examples to simplify
the process.
“For each customer, show me the musical preference styles they’ve selected. Show me a running total of the number of styles selected for all the customers.”
Translation/Clean Up |
Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, and the count(*) OVER with no partition but ordered by CustomerID from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID |
|
|
That query returns results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_NO_PARTITION in the Entertainment Agency Example database.)
CustomerID |
Customer |
StyleName |
Preferences |
10001 |
Doris Hartwig |
Contemporary |
2 |
10001 |
Doris Hartwig |
Top 40 Hits |
2 |
Deb Waldal |
60’s Music |
4 |
|
10002 |
Deb Waldal |
Classic Rock & Roll |
4 |
10003 |
Peter Brehm |
Motown |
6 |
10003 |
Peter Brehm |
Rhythm and Blues |
6 |
10004 |
Dean McCrae |
Jazz |
8 |
10004 |
Dean McCrae |
Standards |
8 |
10005 |
Elizabeth Hallmark |
Classical |
10 |
10005 |
Elizabeth Hallmark |
Chamber Music |
10 |
10006 |
Matt Berg |
Folk |
12 |
10006 |
Matt Berg |
Variety |
12 |
10007 |
Liz Keyser |
70’s Music |
15 |
10007 |
Liz Keyser |
Classic Rock & Roll |
15 |
10007 |
Liz Keyser |
Rhythm and Blues |
15 |
10008 |
Darren Gehring |
Contemporary |
17 |
10008 |
Darren Gehring |
Standards |
17 |
10009 |
Sarah Thompson |
Country |
20 |
10009 |
Sarah Thompson |
Country Rock |
20 |
10009 |
Sarah Thompson |
Modern Rock |
20 |
10010 |
Zachary Ehrlich |
Jazz |
23 |
10010 |
Zachary Ehrlich |
Rhythm and Blues |
23 |
10010 |
Zachary Ehrlich |
Salsa |
23 |
10011 |
Joyce Bonnicksen |
40’s Ballroom Music |
26 |
10011 |
Joyce Bonnicksen |
Classical |
26 |
10011 |
Joyce Bonnicksen |
Standards |
26 |
10012 |
Kerry Patterson |
Contemporary |
28 |
10012 |
Kerry Patterson |
Show Tunes |
28 |
10013 |
Estella Pundt |
Jazz |
30 |
10013 |
Estella Pundt |
Salsa |
30 |
Mark Rosales |
80’s Music |
33 |
|
10014 |
Mark Rosales |
Modern Rock |
33 |
10014 |
Mark Rosales |
Top 40 Hits |
33 |
10015 |
Carol Viescas |
40’s Ballroom Music |
36 |
10015 |
Carol Viescas |
Show Tunes |
36 |
10015 |
Carol Viescas |
Standards |
36 |
I still get the same data, but the totals are different. Rather than indicating the number of preferences for each customer, I’m now getting a running total. The totals in both rows for Doris Hartwig are still two, corresponding to the two preference styles she’s indicated, but now the totals for Deb Waldal are four (her two preference styles added to Doris’s two preference styles), the totals for Peter Brehm are six (his two preference styles added to Deb’s two preference styles and Doris’s two) and so on. Basically, I’ve generated a running sum of the preferences count.
Note that you can specify a different OVER() clause for each aggregate function. For instance, you can get both counts of preferences per customer plus overall counts of preferences using a query like the following:
“For each customer, show me the musical preference styles they’ve selected. Show me both the total for each customer plus a running total of the number of styles selected for all the customers.”
Translation/Clean Up |
Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by CustomerID and the count(*) OVER with no partition but ordered by CustomerID from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID |
|
|
That will return results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts in the Entertainment Agency Example database.)
Customer |
StyleName |
Customer Preferences |
Total Preferences |
|
10001 |
Doris Hartwig |
Contemporary |
2 |
2 |
10001 |
Doris Hartwig |
Top 40 Hits |
2 |
2 |
10002 |
Deb Waldal |
60’s Music |
2 |
4 |
10002 |
Deb Waldal |
Classic Rock & Roll |
2 |
4 |
10003 |
Peter Brehm |
Motown |
2 |
6 |
10003 |
Peter Brehm |
Rhythm and Blues |
2 |
6 |
10004 |
Dean McCrae |
Jazz |
2 |
8 |
10004 |
Dean McCrae |
Standards |
2 |
8 |
10005 |
Elizabeth Hallmark |
Classical |
2 |
10 |
10005 |
Elizabeth Hallmark |
Chamber Music |
2 |
10 |
10006 |
Matt Berg |
Folk |
2 |
12 |
10006 |
Matt Berg |
Variety |
2 |
12 |
10007 |
Liz Keyser |
70’s Music |
3 |
15 |
10007 |
Liz Keyser |
Classic Rock & Roll |
3 |
15 |
10007 |
Liz Keyser |
Rhythm and Blues |
3 |
15 |
10008 |
Darren Gehring |
Contemporary |
2 |
17 |
10008 |
Darren Gehring |
Standards |
2 |
17 |
10009 |
Sarah Thompson |
Country |
3 |
20 |
10009 |
Sarah Thompson |
Country Rock |
3 |
20 |
10009 |
Sarah Thompson |
Modern Rock |
3 |
20 |
10010 |
Zachary Ehrlich |
Jazz |
3 |
23 |
10010 |
Zachary Ehrlich |
Rhythm and Blues |
3 |
23 |
10010 |
Zachary Ehrlich |
Salsa |
3 |
23 |
10011 |
Joyce Bonnicksen |
40’s Ballroom Music |
3 |
26 |
10011 |
Joyce Bonnicksen |
Classical |
3 |
26 |
10011 |
Joyce Bonnicksen |
Standards |
3 |
26 |
10012 |
Kerry Patterson |
Contemporary |
2 |
28 |
10012 |
Kerry Patterson |
Show Tunes |
2 |
28 |
10013 |
Estella Pundt |
Jazz |
2 |
30 |
10013 |
Estella Pundt |
Salsa |
2 |
30 |
10014 |
Mark Rosales |
80’s Music |
3 |
33 |
10014 |
Mark Rosales |
Modern Rock |
3 |
33 |
10014 |
Mark Rosales |
Top 40 Hits |
3 |
33 |
10015 |
Carol Viescas |
40’s Ballroom Music |
3 |
36 |
10015 |
Carol Viescas |
Show Tunes |
3 |
36 |
10015 |
Carol Viescas |
Standards |
3 |
36 |
Note that this now gives me both a total count of preferences by customer as well as a running total over all customers. As its name suggests, the ORDER BY predicate controls the order in which the rows are returned. I could change the ORDER BY predicate in the previous query like this:
“For each customer, show me the musical preference styles they’ve selected. Show me both the total for each customer plus a running total of the number of styles selected for all the customers. I want to see the customers sorted by name.”
Translation/Clean Up |
Select the CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by CustLastName, CustFirstName and the count(*) OVER with no partition but ordered by CustLastName, CustFirstName from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID |
|
|
This returns results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts_Sort1 in the Entertainment Agency Example database.)
You can see that the final output is now sorted by customer last name and first name, not by customer ID. Note, however, that ORDER BY clause should be consistent with the PARTITION clause, or the results you get might be confusing.
“For each customer, show me the musical preference styles they’ve selected. Show me both the total for each customer plus a running total of the number of styles selected for all the customers. I want to see the styles sorted by name.”
Translation/Clean Up |
Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by StyleName and the count(*) OVER with no partition but ordered by StyleName from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID |
|
|
That query would lead to results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts_Sort2 in the Entertainment Agency Example database.) This time, the output is sorted by style name, as I requested.
CustomerID |
Customer |
StyleName |
Customer Preferences |
Total Preferences |
10011 |
Joyce Bonnicksen |
40’s Ballroom Music |
1 |
2 |
10015 |
Carol Viescas |
40’s Ballroom Music |
1 |
2 |
10002 |
Deb Waldal |
60’s Music |
1 |
3 |
10007 |
Liz Keyser |
70’s Music |
1 |
4 |
10014 |
Mark Rosales |
80’s Music |
1 |
5 |
10005 |
Elizabeth Hallmark |
Chamber Music |
1 |
6 |
10007 |
Liz Keyser |
Classic Rock & Roll |
2 |
8 |
10002 |
Deb Waldal |
Classic Rock & Roll |
2 |
8 |
10005 |
Elizabeth Hallmark |
Classical |
2 |
10 |
10011 |
Joyce Bonnicksen |
Classical |
2 |
10 |
10012 |
Kerry Patterson |
Contemporary |
1 |
13 |
10008 |
Darren Gehring |
Contemporary |
1 |
13 |
10001 |
Doris Hartwig |
Contemporary |
1 |
13 |
10009 |
Sarah Thompson |
Country |
1 |
14 |
10009 |
Sarah Thompson |
Country Rock |
2 |
15 |
10006 |
Matt Berg |
Folk |
1 |
16 |
Dean McCrae |
Jazz |
1 |
19 |
|
10010 |
Zachary Ehrlich |
Jazz |
1 |
19 |
10013 |
Estella Pundt |
Jazz |
1 |
19 |
10014 |
Mark Rosales |
Modern Rock |
2 |
21 |
10009 |
Sarah Thompson |
Modern Rock |
3 |
21 |
10003 |
Peter Brehm |
Motown |
1 |
22 |
10003 |
Peter Brehm |
Rhythm and Blues |
2 |
25 |
10007 |
Liz Keyser |
Rhythm and Blues |
3 |
25 |
10010 |
Zachary Ehrlich |
Rhythm and Blues |
2 |
25 |
10010 |
Zachary Ehrlich |
Salsa |
3 |
27 |
10013 |
Estella Pundt |
Salsa |
2 |
27 |
10015 |
Carol Viescas |
Show Tunes |
2 |
29 |
10012 |
Kerry Patterson |
Show Tunes |
2 |
29 |
10011 |
Joyce Bonnicksen |
Standards |
3 |
33 |
10015 |
Carol Viescas |
Standards |
3 |
33 |
10008 |
Darren Gehring |
Standards |
2 |
33 |
10004 |
Dean McCrae |
Standards |
2 |
33 |
10001 |
Doris Hartwig |
Top 40 Hits |
2 |
35 |
10014 |
Mark Rosales |
Top 40 Hits |
3 |
35 |
10006 |
Matt Berg |
Variety |
2 |
36 |
The TotalPreferences counts still look okay (two people like 40’s Ballroom Music, so the value for TotalPreferences is 2 for both those rows, one person likes 60’s Music, so the value for TotalPreferences for that style is 3. TotalPreferences is a running total—the total for the previous style plus the total for the current style and so on). The CustomerPreferences counts are a little harder to figure out, but they are simply a running sum of the number of preferences specified by each customer. The first row for any given customer shows the value 1, the second row the value 2, and so on.
Note: It is possible to include an OVER() clause with no optional clauses. The result is
the aggregate you specified calculated for all the rows returned by the FROM and WHERE
clauses and displayed on each row. For example, this query:
SELECT C.CustomerID,
C.CustFirstName || ' ' || C.CustLastName AS Customer,
MS.StyleName,
COUNT(*) OVER () AS NumRows
FROM Customers AS C
INNER JOIN Musical_Preferences AS MP
ON MP.CustomerID = C.CustomerID
INNER JOIN Musical_Styles AS MS
ON MS.StyleID = MP.StyleID;
… returns the count of all rows in the NumRows column. You can think of it as a way to display all rows and get a grand total without having to include a GROUP BY clause.
The ROWS (or RANGE) predicate allows you to further limit the rows of data that are included within the partition. It does this by letting you specify a range of rows with respect to the current row.
The ROWS clause lets you specify the range physically by specifying a fixed number of rows preceding or following the current row. For instance, you can specify ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING, which would mean that the rows to be considered are simply this row and the one right after it.
The RANGE clause lets you specify the range logically by specifying a range of values with respect to the value in the current row.
Note: Both the ROWS and RANGE clauses require that the ORDER BY clause be specified. If
ORDER BY specifies multiple columns, CURRENT ROW FOR RANGE considers all columns in
the ORDER BY list when determining what constitutes the range.
Let’s look at an example to see the difference between using ROWS and RANGES when calculating the total number of preferences. (Don’t worry, I’ll be talking more about using aggregate functions like SUM with the OVER() clause in a little bit.) Let’s just go straight to the SQL, asking for a SUM of the COUNT OVER ROWS and a SUM of the COUNT OVER RANGE to see the difference.
“For each city where we have customers, show me the customer and the number of musical preference styles they’ve selected. Also give me a running total by city, both for each customer in the city as well as for the city overall.”
|
|
That query would lead to results like the following table. (I saved this request as CH22_Customer_ByCity_PreferredStyles_Sums in the Entertainment Agency Example database.)
CustCity |
Customer |
Preferences |
Total UsingRows |
Total UsingRange |
Auburn |
Elizabeth Hallmark |
2 |
2 |
2 |
Bellevue |
Estella Pundt |
2 |
4 |
16 |
Bellevue |
Joyce Bonnicksen |
3 |
7 |
16 |
Bellevue |
Liz Keyser |
3 |
10 |
16 |
Mark Rosales |
3 |
13 |
16 |
|
Bellevue |
Sarah Thompson |
3 |
16 |
16 |
Kirkland |
Darren Gehring |
2 |
18 |
23 |
Kirkland |
Peter Brehm |
2 |
20 |
23 |
Kirkland |
Zachary Ehrlich |
3 |
23 |
23 |
Redmond |
Dean McCrae |
2 |
25 |
27 |
Redmond |
Kerry Patterson |
2 |
27 |
27 |
Seattle |
Carol Viescas |
3 |
30 |
32 |
Seattle |
Doris Hartwig |
2 |
32 |
32 |
Tacoma |
Deb Waldal |
2 |
34 |
36 |
Tacoma |
Matt Berg |
2 |
36 |
36 |
Look at the two calculated SUM fields. The first (TotalUsingRows) included a ROWS predicate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, while the second (TotalUsingRange) included a RANGE predicate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Each value in the TotalUsingRows column is the value in the Preferences column for that row added to the value in the TotalUsingRows column for the preceding row. In other words, using the SUM aggregate function with the ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW predicate results in a running total.
The values in the TotalUsingRange column are quite different (although the values on the first and last rows are the same). The RANGE predicate creates a range of all rows with the same value for the column specified in the ORDER BY predicate (CustCity) as the current row. In other words, the first row is the sum of Preferences for all rows corresponding to the city of the first row (Auburn). There are five rows for Bellevue, with preferences 2, 3, 3, 3, 3, respectively. Those five values result in 14 when added together. That value is added to the value for the previous range so that the TotalUsingRange column shows 16 for the five Bellevue rows. Similarly, there are three rows for Kirkland, which total 7. That total for Kirkland is added to the value for the previous range (16), resulting in the TotalUsingRange column showing 23 for the range consisting of the three Kirkland rows. You can still call it a running total, but the running total for each row in the range is the same. Looking at it another way, the value in the TotalUsingRange column for any city matches the last row for that city in the TotalUsingRows column (Auburn, Elizabeth Hallmark; Bellevue, Sarah Thompson; Kirkland, Zachary Ehrlich; and so on).
One thing that should be pointed out is that it’s only possible to use the ROWS (or RANGE) predicate with aggregate functions. Figure 22-3 summarizes the rules.
One of the new functions introduced in Figure 22-1 is ROW_NUMBER(). As you might expect from the name, this allows you to assign unique numbers to each row.
“Assign a number for each customer. Show me their CustomerID, their name and their state. Return the customers in alphabetic order.”
That query would lead to results like the following table. (I saved this request as CH22_Customers_Numbering in the Sales Orders Example database.)
RowNumber |
CustomerID |
CustomerName |
CustState |
1 |
1014 |
Sam Abolrous |
CA |
2 |
1020 |
Joyce Bonnicksen |
WA |
3 |
1004 |
Robert Brown |
TX |
4 |
1009 |
Andrew Cencini |
WA |
5 |
1026 |
Kirk DeGrasse |
TX |
6 |
1019 |
Zachary Ehrlich |
CA |
7 |
1015 |
Darren Gehring |
CA |
8 |
1011 |
Alaina Hallmark |
WA |
9 |
1003 |
Gary Hallmark |
WA |
10 |
1010 |
Angel Kennedy |
TX |
11 |
1012 |
Liz Keyser |
WA |
12 |
1005 |
Dean McCrae |
WA |
13 |
1027 |
Luke Patterson |
OR |
14 |
1025 |
Maria Patterson |
TX |
15 |
1008 |
Neil Patterson |
CA |
16 |
1013 |
Rachel Patterson |
CA |
17 |
1021 |
Estella Pundt |
TX |
18 |
1024 |
Mark Rosales |
TX |
19 |
1023 |
Julia Schnebly |
WA |
1017 |
Manuela Seidel |
OR |
|
21 |
1007 |
Mariya Sergienko |
OR |
22 |
1018 |
David Smith |
CA |
23 |
1002 |
William Thompson |
WA |
24 |
1028 |
Jeffrey Tirekicker |
WA |
25 |
1022 |
Caleb Viescas |
CA |
26 |
1006 |
John Viescas |
WA |
27 |
1001 |
Suzanne Viescas |
WA |
28 |
1016 |
Jim Wilson |
OR |
Of course, because ROW_NUMBER is a window function, we can use the OVER clause to partition the table differently.
“Assign a number for each customer within their state. Show me their CustomerID, their name, and their state. Return the customers in alphabetic order.”
Translation/Clean Up |
Select the ROW_NUMBER() OVER partition by CustState, ordered by CustLastName, CustFirstName, the CustomerID, CustFirstName || ' ' || CustLastName, and CustState from the Customers table |
|
|
That query would lead to results like the following table. (I saved this request as CH22_Customers_Numbering_By_State in the Sales Orders Example database.)
CustomerID |
CustomerName |
CustState |
|
1 |
1014 |
Sam Abolrous |
CA |
2 |
1019 |
Zachary Ehrlich |
CA |
3 |
1015 |
Darren Gehring |
CA |
4 |
1008 |
Neil Patterson |
CA |
5 |
1013 |
Rachel Patterson |
CA |
6 |
1018 |
David Smith |
CA |
7 |
1022 |
Caleb Viescas |
CA |
1 |
1027 |
Luke Patterson |
OR |
2 |
1017 |
Manuela Seidel |
OR |
3 |
1007 |
Mariya Sergienko |
OR |
4 |
1016 |
Jim Wilson |
OR |
1 |
1004 |
Robert Brown |
TX |
2 |
1026 |
Kirk DeGrasse |
TX |
3 |
1010 |
Angel Kennedy |
TX |
4 |
1025 |
Maria Patterson |
TX |
5 |
1021 |
Estella Pundt |
TX |
6 |
1024 |
Mark Rosales |
TX |
1 |
1020 |
Joyce Bonnicksen |
WA |
2 |
1009 |
Andrew Cencini |
WA |
3 |
1011 |
Alaina Hallmark |
WA |
4 |
1003 |
Gary Hallmark |
WA |
5 |
1012 |
Liz Keyser |
WA |
6 |
1005 |
Dean McCrae |
WA |
7 |
1023 |
Julia Schnebly |
WA |
8 |
1002 |
William Thompson |
WA |
9 |
1028 |
Jeffrey Tirekicker |
WA |
10 |
1006 |
John Viescas |
WA |
11 |
1001 |
Suzanne Viescas |
WA |
Because I asked for a partition by state, the RowNumber column restarts at 1 for each different state, and the rows are sorted by customer last name and first name within each state.
Another one of the new functions introduced in Figure 22-1 is RANK(). As you might expect from the name, this allows you to rank rows of data relative to one another.
“List all students who have completed English courses and rank them by the grade they received.”
Translation/Clean Up |
Select the SubjectID, StudFirstName, StudLastName, SubjectName, Grade, and RANK() OVER ordered by Grade DESC from the Students table inner joined with the Student_Schedules table on Students.StudentID = Student_Schedules.StudentID inner joined with the Classes table on Classes.ClassID = Student_Schedules.ClassID inner joined with the Subjects table on Subjects.SubjectID = Classes.SubjectID where ClassStatus = 2 and CategoryID = ‘ENG’ |
|
|
That query would lead to results like the following table. (I saved this request as CH22_English_Students_Rank in the School Scheduling Example database.)
Stud FirstName |
Stud LastName |
SubjectName |
Grade |
Rank |
|
37 |
Scott |
Bishop |
Composition - Fundamentals |
98.07 |
1 |
37 |
Sara |
Sheskey |
Composition - Fundamentals |
97.59 |
2 |
37 |
John |
Kennedy |
Composition - Fundamentals |
93.01 |
3 |
37 |
Brannon |
Jones |
Composition - Fundamentals |
91.66 |
4 |
37 |
Janice |
Galvin |
Composition - Fundamentals |
91.44 |
5 |
38 |
Kendra |
Bonnicksen |
Composition - Intermediate |
88.91 |
6 |
37 |
George |
Chavez |
Composition - Fundamentals |
88.54 |
7 |
37 |
Marianne |
Wier |
Composition - Fundamentals |
87.4 |
8 |
37 |
David |
Hamilton |
Composition - Fundamentals |
86.33 |
9 |
37 |
Steve |
Pundt |
Composition - Fundamentals |
82.58 |
10 |
38 |
Doris |
Hartwig |
Composition - Intermediate |
81.66 |
11 |
37 |
Michael |
Viescas |
Composition - Fundamentals |
77.59 |
12 |
38 |
Elizabeth |
Hallmark |
Composition - Intermediate |
72.88 |
13 |
37 |
Karen |
Smith |
Composition - Fundamentals |
72.05 |
14 |
37 |
Betsy |
Stadick |
Composition - Fundamentals |
71.09 |
15 |
37 |
Kerry |
Patterson |
Composition - Fundamentals |
70 |
16 |
38 |
Sarah |
Thompson |
Composition - Intermediate |
67.6 |
17 |
38 |
Richard |
Lum |
Composition - Intermediate |
67.19 |
18 |
In this case, there are no duplicate grade values, so the ranking is simply the grades in descending order.
If two or more rows tie for a rank in the same partition, each of the tied rows receives the same rank. If we take the above table and give both Brannon Jones and Janice Galvin a grade of 91.66 for Composition – Fundamentals, both Brannon and Janice would be listed at rank value 4, and Kendra Bonnicksen would still have rank 6, with the value 5 skipped. Like this:
SubjectID |
StudFirstName |
StudLastName |
SubjectName |
Grade |
Rank |
37 |
Scott |
Bishop |
Composition - Fundamentals |
98.07 |
1 |
37 |
Sara |
Sheskey |
Composition - Fundamentals |
97.59 |
2 |
37 |
John |
Kennedy |
Composition - Fundamentals |
93.01 |
3 |
37 |
Brannon |
Jones |
Composition - Fundamentals |
91.66 |
4 |
37 |
Janice |
Galvin |
Composition - Fundamentals |
91.66 |
4 |
38 |
Kendra |
Bonnicksen |
Composition - Intermediate |
88.91 |
6 |
<< more rows here >> |
Figure 22-1 also included the DENSE_RANK() and PERCENT_RANK() functions. The difference between those two functions and RANK() is straightforward. Because there is only one distinct value that precedes the row, the DENSE_RANK() function would not skip a value. It would return the value 5 for Kendra Bonnicksen. This is one more than the number of distinct rows that come before the current row. The numbers returned by the DENSE_RANK function do not have gaps and always have consecutive ranks.
The PERCENT_RANK() function, however, returns a number that represents the percentage of values less than the current value in the group, excluding the highest value. In the absence of ties, the PERCENT_RANK() function will always return 0 for the first value in the group, and 1 for the last value in a group. For the remaining rows in the partition, the PERCENT_RANK function ranks a value by calculating its rank minus 1 (rk − 1), and dividing that value by the number of rows in the partition minus 1 (nr − 1). Here’s the formula:
Let’s take a look at an example.
“List all bowlers in the league, ranking them by their average handicapped score. Show all three of RANK(), DENSE_RANK(), and PERCENT_RANK() to show the difference. (Remember that bowling scores are reported as rounded integer values.)”
Translation/Clean Up |
Select the BowlerID, BowlerName, ROUND(AVG(HandiCapScore), 0), RANK() OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS Rank, DENSE_RANK () OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS DenseRank and PERCENT_RANK () OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS PercentRank from table Bowlers inner joined with the Bowler_Scores table ON Bowler_Scores.BowlerID = Bowlers.BowlerID, grouped by BowlerID, BowlerFirstName and BowlerLastName |
|
|
Note: I rounded the averages because that’s what a real bowling league does. You need a
rounded integer value to be able to calculate the handicap. I had to include the rounded
expression in each of the ORDER BY clauses to get the correct answer. And if you look
at the code I had to use in Microsoft SQL Server, I also CAST the HandiCapScore column
AS FLOAT because, if I don’t do that, the database system returns a truncated, not
rounded, average value.
That query would lead to results like the following table. (I saved this request as CH22_Bowlers_Average_Score_Rankings in the Bowling League Example database.)
BowlerID |
BowlerName |
Avg Handicap |
Rank |
Dense Rank |
Percent Rank |
15 |
Kathryn Patterson |
198 |
1 |
1 |
0 |
6 |
Neil Patterson |
198 |
1 |
1 |
0 |
27 |
William Thompson |
198 |
1 |
1 |
0 |
19 |
John Viescas |
198 |
1 |
1 |
0 |
25 |
Megan Patterson |
197 |
5 |
2 |
0.129032258064516 |
3 |
John Kennedy |
197 |
5 |
2 |
0.129032258064516 |
29 |
Bailey Hallmark |
197 |
5 |
2 |
0.129032258064516 |
14 |
Gary Hallmark |
197 |
5 |
2 |
0.129032258064516 |
2 |
David Fournier |
196 |
9 |
3 |
0.258064516129032 |
31 |
Ben Clothier |
196 |
9 |
3 |
0.258064516129032 |
11 |
Angel Kennedy |
196 |
9 |
3 |
0.258064516129032 |
26 |
Mary Thompson |
196 |
9 |
3 |
0.258064516129032 |
7 |
David Viescas |
196 |
9 |
3 |
0.258064516129032 |
1 |
Barbara Fournier |
196 |
9 |
3 |
0.258064516129032 |
24 |
Sarah Thompson |
196 |
9 |
3 |
0.258064516129032 |
18 |
Michael Hernandez |
195 |
16 |
4 |
0.483870967741936 |
10 |
Doug Steele |
195 |
16 |
4 |
0.483870967741936 |
Carol Viescas |
195 |
16 |
4 |
0.483870967741936 |
|
9 |
Alastair Black |
195 |
16 |
4 |
0.483870967741936 |
5 |
Ann Patterson |
195 |
16 |
4 |
0.483870967741936 |
22 |
Alaina Hallmark |
195 |
16 |
4 |
0.483870967741936 |
16 |
Richard Sheskey |
194 |
22 |
5 |
0.67741935483871 |
20 |
Suzanne Viescas |
194 |
22 |
5 |
0.67741935483871 |
28 |
Michael Viescas |
194 |
22 |
5 |
0.67741935483871 |
23 |
Caleb Viescas |
194 |
22 |
5 |
0.67741935483871 |
21 |
Zachary Ehrlich |
194 |
22 |
5 |
0.67741935483871 |
4 |
Sara Sheskey |
194 |
22 |
5 |
0.67741935483871 |
13 |
Elizabeth Hallmark |
194 |
22 |
5 |
0.67741935483871 |
30 |
Rachel Patterson |
194 |
22 |
5 |
0.67741935483871 |
32 |
Joe Rosales |
193 |
30 |
6 |
0.935483870967742 |
8 |
Stephanie Viescas |
193 |
30 |
6 |
0.935483870967742 |
17 |
Kendra Hernandez |
193 |
30 |
6 |
0.935483870967742 |
You can see the difference between the values in the Rank and DenseRank columns. The first four bowlers (Kathryn Patterson, Neil Patterson, William Thompson, and John Viescas) all have the same highest average score (198), so they’re each ranked 1 by both functions. The next four people (Megan Patterson, John Kennedy, Bailey Hallmark, and Gary Hallmark) also have the same average score (197), so the values in the Rank and DenseRank columns are the same for all of them as well. Since four bowlers have been ranked before them, the RANK() function returns a value of 5, one more than the number already ranked. However, there’s only one distinct value ahead of them (198), so the DENSE_RANK() function returns a value of 2, one more than the number of distinct values.
Ranking continues in this manner. The next seven bowlers (David Fournier, Ben Clothier, Angel Kennedy, Mary Thompson, David Viescas, Barbara Fournier, and Sarah Thompson) have the same average score (196). Since eight bowlers have already been ranked ahead of them, the RANK() function returns 9 for each of them. However, there are only two distinct values ahead of them (198 and 197), so the DENSE_RANK() function returns 3. Finally, Joe Rosales, Stephanie Viescas, and Kendra Hernandez all get a rank of 30 (because twenty-nine bowlers have already been ranked ahead of them) and a dense rank of 6 (because there are only five distinct values ahead of theirs)
Look, too, at the PercentRank column. Once again, since Kathryn Patterson, Neil Patterson, William Thompson, and John Viescas have the highest average score, the PERCENT_RANK() function returns 0 for their rows. The next four bowlers (Megan Patterson, John Kennedy, Bailey Hallmark, and Gary Hallmark) are all ranked as 5, and there are 32 rows in the partition. As explained above, that means the PERCENT_RANK function returns
Similarly, the next seven bowlers (David Fournier, Ben Clothier, Angel Kennedy, Mary Thompson, David Viescas, Barbara Fournier, and Sarah Thompson) are all ranked and 9, so the PERCENT_RANK function returns
One thing that doesn’t show up in this example is the fact that, should the last row represent a single individual, the PERCENT_RANK function would return 1 for that row. Because Joe Rosales, Stephanie Viescas, and Kendra Hernandez are tied for last place, the PERCENT_RANK function returns the same value for all three of them:
You may remember that I showed you how to calculate quintiles in Chapter 20, “Using Unlinked Data and ‘Driver’ Tables.” You may also recall that you needed to CROSS JOIN two separate queries (one that assigned a ranking number to each student who completed an English course and one that counted all students who completed an English course). That’s a lot of data retrieval that can be prevented through the use of window functions! In fact, one of the aggregate functions introduced along with window functions is NTILE, which lets you divide the data into as many different ranges as you want.
“List all students who have completed English courses, rank them by the grades they received, and indicate the Quintile into which they fall.”
Translation/Clean Up |
Select the SubjectID, StudFirstName, StudLastName, SubjectName, Grade, RANK() OVER ordered by Grade DESC and NTILE (5) OVER ordered by Grade DESC from the Students table inner joined with the Student_Schedules table on Students.StudentID = Student_Schedules.StudentID inner joined with the Classes table on Classes.ClassID = Student_Schedules.ClassID inner joined with the Subjects table on Subjects.SubjectID = Classes.SubjectID where ClassStatus = 2 and CategoryID = ’ENG’ |
|
|
That query would lead to results like the following table. (I saved this request as CH22_English_Students_Quintiles in the School Scheduling Example database.)
Stud FirstName |
Stud LastName |
Subject Name |
Grade |
Rank |
Quintile |
|
37 |
Scott |
Bishop |
Composition - Fundamentals |
98.07 |
1 |
1 |
37 |
Sara |
Sheskey |
Composition - Fundamentals |
97.59 |
2 |
1 |
37 |
John |
Kennedy |
Composition - Fundamentals |
93.01 |
3 |
1 |
37 |
Brannon |
Jones |
Composition - Fundamentals |
91.66 |
4 |
1 |
37 |
Janice |
Galvin |
Composition - Fundamentals |
91.44 |
5 |
2 |
38 |
Kendra |
Bonnicksen |
Composition - Intermediate |
88.91 |
6 |
2 |
37 |
George |
Chavez |
Composition - Fundamentals |
88.54 |
7 |
2 |
37 |
Marianne |
Wier |
Composition - Fundamentals |
87.4 |
8 |
2 |
37 |
David |
Hamilton |
Composition - Fundamentals |
86.33 |
9 |
3 |
37 |
Steve |
Pundt |
Composition - Fundamentals |
82.58 |
10 |
3 |
38 |
Doris |
Hartwig |
Composition - Intermediate |
81.66 |
11 |
3 |
37 |
Michael |
Viescas |
Composition - Fundamentals |
77.59 |
12 |
3 |
38 |
Elizabeth |
Hallmark |
Composition - Intermediate |
72.88 |
13 |
4 |
37 |
Karen |
Smith |
Composition - Fundamentals |
72.05 |
14 |
4 |
37 |
Betsy |
Stadick |
Composition - Fundamentals |
71.09 |
15 |
4 |
37 |
Kerry |
Patterson |
Composition - Fundamentals |
70 |
16 |
5 |
Sarah |
Thompson |
Composition - Intermediate |
67.6 |
17 |
5 |
|
38 |
Richard |
Lum |
Composition - Intermediate |
67.19 |
18 |
5 |
Note: The more astute of you might notice that the quintiles returned are slightly different
in this example from those returned by the example in Chapter 20, CH20_English_Student_Quintiles.
The NTILE() function is used to distribute rows into a specified number of groups, When the number of rows is not divisible equally by the number of groups, larger groups will come before smaller groups. For instance, if you have 10 rows of data, NTILE(2) can divide the rows into two equal size groups, but NTILE(3) will put 4 in the first group, and 3 in each of the second and third groups. That’s what happened in this case: there were 18 rows being divided into five groups, so the groups were 4, 4, 4, 3 and 3. That’s different from the algorithm used to distribute the rows among the five groups in the CH20_English_Student_Quintiles.
One approach is not necessarily better or worse than the other. If there had been twenty students, the quintiles would have been the same.
As I already pointed out, you can use any of the Aggregate Functions I showed you in Chapter 12 with the OVER() clause. You saw the full diagram earlier as Figure 22–2. I’ve already shown the use of the COUNT(*) function and SUM(COUNT(*)), but let’s take a quick look at it again in conjunction with the ROWS and RANGE predicates.
Remember that both the ROWS and RANGE predicates limit the data on which the aggregate function will work. Using the same ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW predicate in conjunction with the COUNT(*) function really doesn’t provide much. It’ll let you count all the rows “in front” of the current row: in other words, it’ll simply provide you with the same values as the ROW_NUMBER() function you’ve already seen. Using RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW potentially is more useful: it’ll give you a running total by range.
“Give a count of how many detail lines are associated with each order placed. I want to see the order number, the product purchased and a count of how many items are on the invoice. I’d also like to see how many detail lines there are in total.”
Translation/Clean Up |
Select the OrderNumber, ProductName, COUNT(*) OVER partition by OrderNumber, COUNT(*) OVER order by OrderNumber rows between unbounded preceding and current row, COUNT(*) OVER order by OrderNumber ranges between unbounded preceding and current row from the Orders table inner joined with the Order_Details table on Order_Details.OrderNumber = Orders.OrderNumber inner joined with the Products table on Products.ProductNumber = OrderDetails.ProductNumber |
|
|
That query would lead to results like the following table. (I saved this request as CH22_Order_Counts_ByInvoice_ROWS_RANGE in the Sales Orders Example database.)
OrderNo |
ProductName |
Total |
Total UsingRows |
Total UsingRange |
1 |
Trek 9000 Mountain Bike |
7 |
1 |
7 |
1 |
Viscount Mountain Bike |
7 |
2 |
7 |
1 |
GT RTS-2 Mountain Bike |
7 |
3 |
7 |
1 |
ProFormance ATB All-Terrain Pedal |
7 |
4 |
7 |
1 |
Dog Ear Aero-Flow Floor Pump |
7 |
5 |
7 |
1 |
Glide-O-Matic Cycling Helmet |
7 |
6 |
7 |
1 |
Ultimate Export 2G Car Rack |
7 |
7 |
7 |
2 |
X-Pro All Weather Tires |
2 |
8 |
9 |
2 |
Ultimate Export 2G Car Rack |
2 |
9 |
9 |
3 |
Trek 9000 Mountain Bike |
8 |
10 |
17 |
3 |
Viscount Mountain Bike |
8 |
11 |
17 |
<< more rows here >> |
You can see that there are seven separate products associated with order 1, so the Total column contains 7 for each of the rows. Similarly, the two separate rows associated with order 2 both have 2 for Total associated with them. (Only part of order 3 is shown, but you can see that the two rows that are shown both have the same value associated with them as well.)
As mentioned, the TotalUsingRows column is simply consecutive numbers, the same as the ROW_NUMBER() function would provide.
Finally, the TotalUsingRange column shows the total number of products by order. In other words, it shows 7 for all seven rows in order 1, then 9 for both rows in order 2 (corresponding to the seven rows from order 1 plus the two rows from order 2). You can see that the Total column indicates that there are eight rows in order 3, so the TotalUsingRange column contains 17 (7 + 2 + 8) for the rows associated with order 3.
Of course, other aggregate functions can be useful. For instance, perhaps you want to see the details of all the orders your customers have placed, and you’d like to see the Order total associated with each.
“List all orders placed, including the customer name, the order number, the product name, the quantity ordered, the quoted price and the total price per order.”
Translation/Clean Up |
Select the CustFirstName || ' ' ||, CustLastName, OrderNumber, ProductName, QuantityOrdered, QuotedPrice, and SUM(QuotedPrice) OVER partition by OrderNumber from the Orders table inner joined with the Order_Details table on Order_Details.OrderNumber = Orders.OrderNumber inner joined with the Customers table on Customers.CustomerID = Orders.CustomerID inner joined with the Products table on Products.ProductNumber = OrderDetails.ProductNumber |
|
|
That query would lead to results like the following table. (I saved this request as CH22_Order_Totals_ByInvoice in the Sales Orders Example database.)
Order |
ProductName |
Quantity |
Price |
OrderTotal |
|
David Smith |
1 |
Trek 9000 Mountain Bike |
2 |
1200.00 |
3863.85 |
David Smith |
1 |
Viscount Mountain Bike |
3 |
635.00 |
3863.85 |
David Smith |
1 |
GT RTS-2 Mountain Bike |
4 |
1650.00 |
3863.85 |
David Smith |
1 |
ProFormance ATB All-Terrain Pedal |
1 |
28.00 |
3863.85 |
David Smith |
1 |
Dog Ear Aero-Flow Floor Pump |
3 |
55.00 |
3863.85 |
David Smith |
1 |
Glide-O-Matic Cycling Helmet |
5 |
121.25 |
3863.85 |
David Smith |
1 |
Ultimate Export 2G Car Rack |
6 |
174.60 |
3863.85 |
Suzanne Viescas |
2 |
X-Pro All Weather Tires |
4 |
24.00 |
204.00 |
Suzanne Viescas |
2 |
Ultimate Export 2G Car Rack |
4 |
180.00 |
204.00 |
William Thompson |
3 |
Trek 9000 Mountain Bike |
5 |
1164.00 |
3824.29 |
William Thompson |
3 |
Viscount Mountain Bike |
5 |
615.95 |
3824.29 |
<< more rows here >> |
You can see that the seven separate rows associated with order 1 all have the same OrderTotal associated with them, while the two separate rows associated with order 2 all have the same OrderTotal associated with them. (Only part of order 3 is shown, but you can see that the two rows that are shown both have the same value associated with them as well.)
Let’s look at one more example of using the ROWS predicate. (Although I can’t see a legitimate use for this example, hopefully, it’ll give you a little more insight into how the ROWS predicate works!)
“List all engagements booked, showing the customer, the start date, the contract price and the sum of the current row plus the row before and after. Also, show the sum of the current row plus the row before and after partitioned by customer.”
Select the CustFirstName || ' ' ||, CustLastName, StartDate, ContractPrice, SUM(ContractPrice) OVER order by CustLastName, CustFirstName rows between 1 preceding and 1 following and SUM(ContractPrice) OVER partition by CustLastName, CustFirstName order by CustLastName, CustFirstName rows between 1 preceding and 1 following from the Engagements table inner joined with the Customers table on Customers.CustomerID = Engagements.CustomerID = Orders.CustomerID |
|
|
|
That query would lead to results like the following table. (I saved this request as CH22_Odd_Contract_Sums in the Entertainment Agency Example database.)
Customer |
StartDate |
ContractPrice |
SumOf3 |
PartitionedSumOf3 |
Matt Berg |
2017-09-02 |
200.00 |
1330.00 |
1330.00 |
Matt Berg |
2017-09-12 |
1130.00 |
3180.00 |
3180.00 |
Matt Berg |
2017-09-19 |
1850.00 |
5655.00 |
5655.00 |
Matt Berg |
2017-10-14 |
2675.00 |
6450.00 |
6450.00 |
Matt Berg |
2017-10-23 |
1925.00 |
6150.00 |
6150.00 |
Matt Berg |
2017-12-31 |
1550.00 |
3795.00 |
3795.00 |
Matt Berg |
2018-01-09 |
320.00 |
3540.00 |
3540.00 |
2018-02-12 |
1670.00 |
3840.00 |
3840.00 |
|
Matt Berg |
2018-02-24 |
1850.00 |
4320.00 |
3520.00 |
Peter Brehm |
2018-02-17 |
800.00 |
2940.00 |
1090.00 |
Peter Brehm |
2018-01-07 |
290.00 |
1860.00 |
1860.00 |
Peter Brehm |
2018-01-30 |
770.00 |
1380.00 |
1380.00 |
Peter Brehm |
2018-02-27 |
320.00 |
1590.00 |
1590.00 |
Peter Brehm |
2017-12-10 |
500.00 |
4620.00 |
4620.00 |
Peter Brehm |
2017-10-07 |
3800.00 |
5070.00 |
5070.00 |
Peter Brehm |
2017-09-18 |
770.00 |
6120.00 |
4570.00 |
Zachary Ehrlich |
2017-10-03 |
1550.00 |
3690.00 |
2920.00 |
Zachary Ehrlich |
2017-09-19 |
1370.00 |
4170.00 |
4170.00 |
Zachary Ehrlich |
2017-10-08 |
1250.00 |
3030.00 |
3030.00 |
<< more rows here >> |
The column labeled SumOf3 represents the sum of the ContractPrice for the current row, plus the ContractPrice on the previous row and the following row. Look at the first row, where it’s 1330.00. There is no previous row, the ContractPrice on that row is 200.00, and the ContractPrice on the next row is 1130.00, so the sum is indeed 1330.00. For the second row, the ContractPrice on the previous row is 200.00, the ContractPrice on the current row is 1130.00, and the ContractPrice on the following row is 1850.00, so the value in the SumOf3 column is 200.00 + 1130.00 + 1850.00, or 3180.00. Look down to the tenth row (the first row for Peter Brehm). Even though it’s for a different customer, there is a previous row (remember, I only specified ORDER BY, not PARTITION BY!), with a ContractPrice of 1850.00. The ContractPrice on the current row is 800.00, and on the next row it is 290.00, so the SumOf3 value for that row is 1850.00 + 800.00 + 290.00, or 2940.00.
The first eight rows are all for the same customer (Matt Berg), so there’s no difference between the value in the SumOf3 and the PartitionedSumOf3 columns. However, the ninth row is the last row for Matt, so there isn’t a following row. That means that the value in PartitionedSumOf3 is simply the value of the ContractPrice on the previous row (1670.00) plus the value of the ContractPrice on the current row (1850.00), or 3520.00. Now, there’s no previous row for the tenth row (the first row for Peter Brehm), so the PartitionedSumOf3 is the value of the ContractPrice for the current row (800.00) plus the value of the ContractPrice for the following row (290.00), or 1090.00.
You now know the mechanics of constructing queries using window functions and have seen some of the types of requests you can answer with window functions. Let’s take a look at a fairly robust set of samples, all of which use one or more window functions. These examples come from each of the sample databases, and they illustrate the use of the window functions.
Note: Remember in the Introduction that I warned you that results from each database system
won’t necessarily match the sort order you see in examples in this book unless you
include an ORDER BY clause. Even when you include that specification, the system might
return results in columns not included in the ORDER BY clause in a different sequence
because of different optimization techniques.
If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. You must open the design of the view and execute it from there to see the ORDER BY clause honored.
Also, when you use GROUP BY, you’ll often see the results returned by your database system as though the rows are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.
I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for this book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), and I prefixed the names of the queries relevant to this chapter with “CH22.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.
Note: Remember that all the column names and table names used in these examples are drawn
from the sample database structures shown in Appendix B, “Schema for the Sample Databases.” Because many of these examples use complex JOINs,
your database system might choose a different way to solve these queries. For this
reason, the first few rows might not exactly match the result you obtain, but the
total number of rows should be the same. To simplify the process, I have combined
the Translation and Clean Up steps for all the following examples.
“I’d like a list of all of the engagements. Show me the start date for each engagement, the name of the customer, and the entertainer. Number the engagements overall, plus number the engagements within each start date.”
Translation/Clean Up |
Select the StartDate, CustFirstName || ' ' || CustLastName AS Customer, EntStartName AS Entertaine, the ROW_NUMBER() ordered by StartDate AS Number and the ROW_NUMBER() partitioned by StartDate and ordered by StartDate AS NumberByDate from the Engagements table inner joined with the Entertainers table on Entertainers.EntertainerID = Engagements.EntertainerID inner joined with the Customers table on Customers.CustomerID = Engagements.CustomerID |
|
|
CH22_Engagements_Numbered (111 rows)
Number |
StartDate |
Number ByDate |
Customer |
Entertainer |
1 |
2017-09-02 |
1 |
Matt Berg |
Jim Glynn |
2 |
2017-09-11 |
1 |
Doris Hartwig |
Jazz Persuasion |
3 |
2017-09-11 |
2 |
Mark Rosales |
Country Feeling |
4 |
2017-09-12 |
1 |
Dean McCrae |
Topazz |
5 |
2017-09-12 |
2 |
Liz Keyser |
Jim Glynn |
6 |
2017-09-12 |
3 |
Matt Berg |
JV & the Deep Six |
7 |
2017-09-16 |
1 |
Elizabeth Hallmark |
Country Feeling |
8 |
2017-09-18 |
1 |
Elizabeth Hallmark |
JV & the Deep Six |
9 |
2017-09-18 |
2 |
Peter Brehm |
Modern Dance |
10 |
2017-09-19 |
1 |
Matt Berg |
Coldwater Cattle Company |
11 |
2017-09-19 |
2 |
Zachary Ehrlich |
Saturday Revue |
12 |
2017-09-19 |
3 |
Mark Rosales |
Carol Peacock Trio |
13 |
2017-09-25 |
1 |
Doris Hartwig |
Country Feeling |
2017-09-25 |
2 |
Liz Keyser |
Caroline Coie Cuartet |
|
15 |
2017-09-30 |
1 |
Deb Waldal |
Saturday Revue |
16 |
2017-09-30 |
2 |
Sarah Thompson |
Jim Glynn |
<< more rows here >> |
||||
110 |
2018-02-27 |
1 |
Peter Brehm |
Julia Schnebly |
111 |
2018-03-04 |
1 |
Mark Rosales |
JV & the Deep Six |
“I’d like a numbered list of all of the recipes. Number the recipes overall, plus number each recipe within its recipe class. Sort the lists alphabetically by recipe name within recipe class. Don’t forget to include any recipe classes that don’t have any recipes in them.”
Translation/Clean Up |
Select the RecipeClassDescription, the RecipeTitle, the ROW_NUMBER() ordered by RecipeClassDescription, RecipeTitle AS OverallNumber and the ROW_NUMBER() partitioned by RecipeClassDescription and ordered by RecipeTitle AS ClassNumber from the Recipe_Classes table inner joined with the Recipes table on Recipes.RecipeClass ID = Recipe_Classes.RecipeClassID |
|
|
CH22_Recipe_Classes_Numbered (16 rows)
Overall Number |
RecipeClass Description |
Number InClass |
RecipeTitle |
1 |
Dessert |
1 |
Coupe Colonel |
2 |
Dessert |
2 |
Trifle |
3 |
Hors d’oeuvres |
1 |
Machos Nachos |
4 |
Hors d’oeuvres |
2 |
Salsa Buena |
5 |
Main course |
1 |
Fettuccini Alfredo |
6 |
Main course |
2 |
Huachinango Veracruzana (Red Snapper, Veracruz style) |
7 |
Main course |
3 |
Irish Stew |
8 |
Main course |
4 |
Pollo Picoso |
9 |
Main course |
5 |
Roast Beef |
10 |
Main course |
6 |
Salmon Filets in Parchment Paper |
11 |
Main course |
7 |
Tourtière (French-Canadian Pork Pie) |
12 |
Salad |
1 |
Mike’s Summer Salad |
13 |
Soup |
1 |
NULL |
14 |
Starch |
1 |
Yorkshire Pudding |
15 |
Vegetable |
1 |
Asparagus |
16 |
Vegetable |
2 |
Garlic Green Beans |
“Rank all employees by the number of orders with which they’re associated.”
Translation/Clean Up |
Select the EmployeeID, EmpFirstName || ' ' || EmpLastName AS EmployeeName, COUNT(DISTINCT OrderNumber) AS OrdersReceived and RANK() OVER ordered by COUNT(DISTINCT OrderNumber DESC from the Employees table inner joined with the Orders table ON Orders.EmployeeID = Employees.EmployeeID inner joined with the Order_Details table on OrderDetails.OrderNumber = Orders.OrderNumber, grouped by EmployeeID, EmpFirstName and EmpLastName |
|
CH22_Employee_Sales_Ranked (8 rows)
EmployeeID |
Employee |
OrdersReceived |
Rank |
707 |
Kathryn Patterson |
138 |
1 |
708 |
Susan McLain |
129 |
2 |
702 |
Mary Thompson |
117 |
3 |
704 |
Carol Viescas |
117 |
3 |
705 |
Kirk DeGrasse |
115 |
5 |
701 |
Ann Patterson |
109 |
6 |
706 |
David Viescas |
104 |
7 |
703 |
Matt Berg |
104 |
7 |
If you’re wondering why I included the Order_Details table in the above SQL, the answer is that I know there are some rows in the Orders table that do not have any matching rows in the Order_Details table. I don’t want to include orders that have no items specified, so including the Order_Details in an INNER JOIN with the Orders table eliminates those “empty” orders.
“Rank the staff by how long they’ve been with us as of January 1, 2018. I don’t want to see any gaps in the rank numbers.”
(Remember that I showed you how to calculate the years of service in CH19_Length_Of_Service)
Translation/Clean Up |
Select the StaffID, StfFirstName || ' ' || StfLastName AS StaffName, LengthOfService, and DENSE_ RANK() OVER ordered by LengthOfService DESC from table Staff |
|
|
CH22_Staff_Service_Ranked (27 rows)
StaffID |
StaffName |
LengthOfService |
Rank |
98036 |
Sam Abolrous |
27 |
1 |
98062 |
Caroline Coie |
27 |
1 |
98025 |
Carol Viescas |
26 |
2 |
98028 |
Alaina Hallmark |
26 |
2 |
98010 |
Jeffrey Smith |
26 |
2 |
98011 |
Ann Patterson |
26 |
2 |
98007 |
Gary Hallmark |
25 |
3 |
98020 |
Jim Glynn |
25 |
3 |
98043 |
Kathryn Patterson |
25 |
3 |
98052 |
Katherine Ehrlich |
25 |
3 |
98013 |
Deb Waldal |
24 |
4 |
98014 |
Peter Brehm |
24 |
4 |
98005 |
Suzanne Viescas |
24 |
4 |
98048 |
Joyce Bonnicksen |
24 |
4 |
98059 |
Maria Patterson |
24 |
4 |
98040 |
Jim Wilson |
23 |
5 |
98030 |
Liz Keyser |
22 |
6 |
98063 |
Kirk DeGrasse |
22 |
6 |
98064 |
Luke Patterson |
21 |
7 |
98055 |
Alastair Black |
21 |
7 |
98057 |
Joe Rosales III |
21 |
7 |
98012 |
Robert Brown |
21 |
7 |
98021 |
Tim Smith |
21 |
7 |
98019 |
Mariya Sergienko |
20 |
8 |
98045 |
Michael Hernandez |
20 |
8 |
98053 |
Caleb Viescas |
18 |
9 |
98042 |
David Smith |
18 |
9 |
“Rank all the teams from best to worst based on the average handicap score of all the players. Arrange the teams into four quartiles.”
Translation/Clean Up |
Select the TeamName, AVG(HandiCapScore) and NTILE(4) OVER ordered by AVG(HandiCapScore) DESC from the Teams table inner joined with the Bowlers table ON Bowlers.TeamID = Teams.TeamID inner joined with the Bowler_Scores table on BowlerScores.BowlerID = Bowlers.BowerID, grouped by TeamName |
|
|
CH22_Teams_In_Quartiles (8 rows)
TeamName |
AvgTeamHandicap |
Quartile |
Marlins |
196 |
1 |
Barracudas |
196 |
1 |
Manatees |
196 |
2 |
Swordfish |
195 |
2 |
Orcas |
195 |
3 |
Dolphins |
195 |
3 |
Sharks |
195 |
4 |
Terrapins |
195 |
4 |
I bet you’re wondering what’s going on here. Because there’s so little difference between the team handicaps, the database system sorts the teams by average handicap and then applies the value 1 to the first two rows, 2 to the second two rows, 3 to the fifth and sixth rows, and 4 to the remaining ones. In fact, the sorting within an equal is arbitrary and can vary from system to system. If you run both the Microsoft SQL Server and PostgreSQL examples, you’ll find that the choice of “top two” is entirely different.
“Rank all the entertainers based on the number of engagements booked for each. Arrange the entertainers into three groups. Remember to include any entertainers who haven't been booked for any engagements.”
Translation/Clean Up |
Select the EntStageName, COUNT(Engagements.EntertainerID), and NTILE(3) OVER ordered by COUNT(Engagements.EntertainerID) DESC from the Entertainers table left joined with the Engagements table ON Engagements.EntertainerID = Entertainers.EntertainerID, grouped by EntStageName |
|
|
CH22_Entertainer_3_Groups (13 rows)
Entertainer |
Gigs |
Group |
Country Feeling |
15 |
1 |
Carol Peacock Trio |
11 |
1 |
Caroline Coie Cuartet |
11 |
1 |
JV & the Deep Six |
10 |
1 |
10 |
1 |
|
Saturday Revue |
9 |
2 |
Jim Glynn |
9 |
2 |
Julia Schnebly |
8 |
2 |
Coldwater Cattle Company |
8 |
2 |
Jazz Persuasion |
7 |
3 |
Topazz |
7 |
3 |
Susan McLain |
6 |
3 |
Katherine Ehrlich |
0 |
3 |
“For each team, show me the details of all of the games bowled by the team captains. Include the date and location for each tournament, their handicap score, whether or not they won the game. Include counts for how many games they won and their average handicap score.”
Translation/Clean Up |
Select the TeamName, BowlerFirstName || ' ' || BowlerLastName AS Captain, TourneyDate, TourneyLocation, HandiCapScore, CASE WonGame WHEN 1 THEN ‘Won’ ELSE ‘Lost’ END As WonGame, SUM(CAST(WonGame AS INT) OVER partitioned by TeamName AS TotalWindws, AVG(HandiCapScore) OVER (partitioned by TeamName) AS AvgHandicap from table Teams inner joined with the Bowlers table ON Bowler.BowlerID = Teams.CaptainID inner joined with the Bowler_Scores table on Bowler_Scores.BowlerID = Teams.CaptainID inner joined with the Match_Games table on MatchGames.MatchID = Bowler_Scores.MatchID and MatchGames.GameNumber = Bowler_Scores.GameNumber inner joined with the Tourney_Matches table on Tourney_Matches.MatchID = Match_Games.MatchID inner joined with the Tournaments table on Tournaments.TourneyID = Tourney_Matches.TourneyID |
|
CH22_Comparing_Team_Captains (420 rows)
TeamName |
Captain |
Tourney Date |
Tourney Location |
HandiCap Score |
Won Game |
Total Wins |
Avg Handicap |
Barracudas |
Richard Sheskey |
2017-09-04 |
Red Rooster Lanes |
195 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-04 |
Red Rooster Lanes |
191 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-04 |
Red Rooster Lanes |
195 |
Won |
25 |
194 |
Richard Sheskey |
2017-09-11 |
Thunderbird Lanes |
201 |
Won |
25 |
194 |
|
Barracudas |
Richard Sheskey |
2017-09-11 |
Thunderbird Lanes |
201 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-11 |
Thunderbird Lanes |
188 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-18 |
Bolero Lanes |
196 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-18 |
Bolero Lanes |
189 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-18 |
Bolero Lanes |
193 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-25 |
Imperial Lanes |
190 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-25 |
Imperial Lanes |
188 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-09-25 |
Imperial Lanes |
198 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-02 |
Sports World Lanes |
195 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-02 |
Sports World Lanes |
189 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-02 |
Sports World Lanes |
190 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-09 |
Totem Lanes |
191 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-09 |
Totem Lanes |
200 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-09 |
Totem Lanes |
191 |
Lost |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-16 |
Acapulco Lanes |
190 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-16 |
Acapulco Lanes |
200 |
Won |
25 |
194 |
Barracudas |
Richard Sheskey |
2017-10-16 |
Acapulco Lanes |
187 |
Lost |
25 |
194 |
<< more rows here >> |
Note: I could have just as easily partitioned by BowlerFirstName, BowlerLastName rather
than by TeamName, but I feel it’s usually a good idea not to group (or partition)
by people names, since they’re often not unique. Other possibilities would have been
BowlerID or CaptainID.
I’ll confess that this is yet another somewhat artificial example of using ROWS to try and give you another example of how it works.
“For each order, give me a list of the customer, the product, and the quantity ordered. Give me the total quantity of products for each order. As well, for every group of three products on the invoice, show me their total and the highest and lowest value.”
CH22_Orders_Min_Max (3973 rows)
Order Number |
Customer |
Product Name |
Quantity Ordered |
Total Quantity |
Quantity3 |
Minimum3 |
Maximum3 |
1 |
David Smith |
Trek 9000 Mountain Bike |
2 |
24 |
5 |
2 |
3 |
1 |
David Smith |
Viscount Mountain Bike |
3 |
24 |
9 |
2 |
4 |
1 |
David Smith |
GT RTS-2 Mountain Bike |
4 |
24 |
8 |
1 |
4 |
1 |
David Smith |
ProFormance ATB All-Terrain Pedal |
1 |
24 |
8 |
1 |
4 |
David Smith |
Dog Ear Aero-Flow Floor Pump |
3 |
24 |
9 |
1 |
5 |
|
1 |
David Smith |
Glide-O-Matic Cycling Helmet |
5 |
24 |
14 |
3 |
6 |
1 |
David Smith |
Ultimate Export 2G Car Rack |
6 |
24 |
11 |
5 |
6 |
2 |
Suzanne Viescas |
X-Pro All Weather Tires |
4 |
8 |
8 |
4 |
4 |
2 |
Suzanne Viescas |
Ultimate Export 2G Car Rack |
4 |
8 |
8 |
4 |
4 |
3 |
William Thompson |
Trek 9000 Mountain Bike |
5 |
28 |
10 |
5 |
5 |
3 |
William Thompson |
Viscount Mountain Bike |
5 |
28 |
11 |
1 |
5 |
3 |
William Thompson |
GT RTS-2 Mountain Bike |
1 |
28 |
8 |
1 |
5 |
3 |
William Thompson |
ProFormance ATB All-Terrain Pedal |
2 |
28 |
6 |
1 |
3 |
3 |
William Thompson |
Dog Ear Aero-Flow Floor Pump |
3 |
28 |
8 |
2 |
3 |
3 |
William Thompson |
Glide-O-Matic Cycling Helmet |
3 |
28 |
11 |
3 |
5 |
3 |
William Thompson |
True Grip Competition Gloves |
5 |
28 |
12 |
3 |
5 |
3 |
William Thompson |
Cosmic Elite Road Warrior Wheels |
4 |
28 |
9 |
4 |
5 |
4 |
Andrew Cencini |
Trek 9000 Mountain Bike |
4 |
19 |
7 |
3 |
4 |
<< more rows here >> |
The QuantityOrdered column is hopefully straightforward to understand. The first order (number 1, ordered by David Smith) consisted of seven products. David ordered 2 Trek 9000 Mountain Bikes, 3 Viscount Mountain Bikes, 4 GT RTS-2 Mountain Bikes, 1 ProFormance ATB All-Terrain Pedal, 3 Dog Ear Aero-Flow Floor Pumps, 5 Glide-O-Matic Cycling Helmets, and 6 Ultimate Export 2G Car Racks, for a total of 24 items. The second order (number 2, ordered by Suzanne Viescas) consisted of 4 X-Pro All Weather Tires and 4 Ultimate Export 2G Car Racks for a total of 8 items. That’s pretty standard.
The other three columns are the off-beat ones. I asked for the products on each invoice to be divided into groups of three, and for the sum, the minimum and the maximum of each of those groups. Remember how ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING works. Because I’m partitioning on Order, there is no preceding row for the first product on the first order (Trek 9000 Mountain Bike), so only the current row (with a quantity of 2) and the following row (with a quantity of 3) are considered. The total of those quantities is 5, the minimum is 2, and the maximum is 3. The next row (Viscount Mountain Bike) does have a preceding row, so there are three quantities to be considered: 2 (the preceding row), 3 (the current row), and 4 (the following row). The sum of those quantities is 9, the minimum is 2, and the maximum is 4. Skip down to the seventh row (Ultimate Export 2G Car Rack). Again, because I’m partitioning on Order, there is no following row, so only the preceding row (with a quantity of 5) and the current row (with a quantity of 6) are considered. The total of those two quantities is 11, the minimum is 5, and the maximum is 6.
Because the second order consists of only two rows, those two rows (both with quantities of 4) are the only ones considered in both cases. The sum of the quantities of those two rows is 8, the minimum is 4, and the maximum is 4.
For the third order, there is no preceding row for the Trek 9000 Mountain Bike row, so only it and the following Viscount Mountain Bike row are considered for the first calculation. Similarly, there is no following row for the Cosmic Elite Road Warrior Wheels row, so only it and the preceding True Grip Competition Gloves row are considered for the last calculation.
“For each subject, give me the highest mark that’s been received. Also, show me the highest mark that’s been received for each category of subjects, as well as the highest mark that’s been received overall.”
Translation/Clean Up |
Select the CategoryID, SubjectCode, SubjectName, MAX(Grade) OVER (partioned by SubjectID) AS SubjectMax, MAX(Grade) OVER (partitioned by CategoryID) AS CategoryMax, MAX(Grade) OVER() AS OverallMax from the Subjects table inner joined with the Classes table ON Classes.ClassID = Subjects.ClassID inner joined with the Student_Schedules table on Student_Schedules.ClassID = Classes.ClassID where ClassStatus = 2 |
|
|
CH22_Top_Marks (14 rows)
CategoryID |
SubjectCode |
SubjectName |
Subject Max |
Category Max |
Overall Max |
ACC |
ACC 210 |
Financial Accounting Fundamentals I |
91.12 |
91.12 |
99.83 |
ART |
ART 100 |
Introduction to Art |
99.83 |
99.83 |
99.83 |
ART 210 |
Computer Art |
87.65 |
99.83 |
99.83 |
|
ART |
ART 251 |
Art History |
97.81 |
99.83 |
99.83 |
BIO |
BIO 101 |
General Biology |
94.54 |
94.54 |
99.83 |
CHE |
CHE 139 |
Fundamentals of Chemistry |
98.31 |
98.31 |
99.83 |
CIS |
CIS 101 |
Microcomputer Applications |
98.01 |
98.01 |
99.83 |
ENG |
ENG 101 |
Composition - Fundamentals |
98.07 |
98.07 |
99.83 |
ENG |
ENG 102 |
Composition - Intermediate |
88.91 |
98.07 |
99.83 |
HIS |
HIS 111 |
U.S. History to 1877 |
87.14 |
87.14 |
99.83 |
MAT |
MAT 080 |
Preparatory Mathematics |
93.19 |
94.33 |
99.83 |
MAT |
MAT 097 |
Elementary Algebra |
94.33 |
94.33 |
99.83 |
MUS |
MUS 100 |
Music in the Western World |
97.84 |
97.84 |
99.83 |
MUS |
MUS 101 |
First Year Theory and Ear Training |
86.57 |
97.84 |
99.83 |
I began the chapter by describing to you why you might want to aggregate data differently from how you were shown in previous chapters and included an example to show one possibility.
I then explored ways to use “windows” into your data and talked about how the OVER() clause is the key to using window functions, and I explained the difference between GROUP BY and OVER().
I showed you how PARTITION BY, ORDER BY, and ROWS or RANGE influence your results.
• The PARTITION BY predicate divides the result set from the query into data subsets.
• The ORDER BY predicate controls the order that the rows are evaluated.
• The ROWS or RANGE predicate determines the subset of rows within the partition that are to be applied (and can only be used with aggregate functions).
I went on to show that window functions include all of the aggregate functions I’d already shown you, plus several others: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK and NTILE. Then I described the use of both the PARTITION BY and ORDER BY predicates with the OVER() clause. Finally, I gave several examples of how to use window functions to solve real-world problems.
The following section presents some requests that you can work out on your own.
1.“Divide the teams into quartiles based on the best raw score bowled by any member of the team.”
You can find my solution in CH22_Team_Quartiles_Best_RawScore (8 rows).
2. “Give me a list of all of the bowlers in the league. Number the bowlers overall, plus number them within their teams, sorting their names by LastName then FirstName.”
You can find my solution in CH22_Bowler_Numbers (32 rows).
3. “Rank all of the bowlers in the league by their average handicap score. Show me the “standard” ranking, but also show me the ranking with no gaps.”
You can find my solution in CH22_Bowler_Ranks (32 rows).
1. “Rank all the agents based on the total dollars associated with the engagements that they’ve booked. Make sure to include any agents that haven’t booked any acts.”
You can find my solution in CH22_Agent_Ranks (9 rows).
2. “Give me a list of all of the engagements our entertainers are booked for. Show me the entertainer’s stage name, the customer’s name, and the start date for each engagements, as well as the total number of engagements booked for each entertainer.”
You can find my solution in CH22_Entertainer_Engagements (111 rows).
3. “Give me a list of all of the Entertainers and their members. Number each member within a group.”
You can find my solution in CH22_Entertainer_Lists (40 rows).
1. “Give me a list of all of the recipes I've got. For each recipe, I want to see all of the ingredients in the recipe, plus a count of how many different ingredients there are.”
You can find my solution in CH22_Recipe_Ingredient_Counts (88 rows).
2. “I’d like a list of all the different ingredients, with each recipe that contains that ingredient. While you’re at it, give me a count of how many recipes there are that use each ingredient.”
You can find my solution in CH22_Ingredient_Recipe_Counts (88 rows).
3. “I want a numbered list of all of the ingredients. Number the ingredients overall, plus number each ingredient within its ingredient class. Sort the lists alphabetically by ingredient within ingredient class. Don’t forget to include any ingredient classes that don’t have any ingredients in them.”
You can find my solution in CH22_Ingredients_By_Ingredient_Class (83 rows, including four classes with no ingredients in them).
1. “Show totals for each invoice, ranking them from highest purchase value to lowest.”
You can find my solution in CH22_Order_Totals_RankedByInvoiceTotal (933 rows).
2. “Produce a list of each category and the total purchase price of all products in each category. Include a column showing the total purchase price regardless of category as well.”
You can find my solution in CH22_Sales_Totals (6 rows).
How would you write this differently if you knew there were categories that had no sales in them? (The sample database does have sales in each category, but you can always add a new category with no sales to see whether your query works!) My solution is CH22_Sales_Totals_Handle_Nulls. If you didn't add a new category, it would return the same as CH22_Sales_Totals (6 rows). If you did add a new category, it would return seven rows.
3. “Rank each customer by the number of orders which they’ve placed. Be sure to include customers that haven’t placed any orders yet.”
You can find my solution in CH22_Customer_Order_Counts_Ranked (28 rows, including Jeffrey Tirekicker having placed no orders).
1. “Rank the students in terms of the number of classes they’ve completed.”
You can find my solution in CH22_Student_Class_Totals_Rank (18 rows). (Pretty homogenous class, isn’t it?)
2. “Rank the faculty in terms of the number of classes they’re teaching.”
You can find my solution in CH22_Staff_Class_Totals_Rank (22 rows).
3. “Arrange the students into 3 groups depending on their average grade for each of the classes they’ve completed.”
You can find my solution in CH22_Student_AverageGrade_Groups (18 rows).
4. “For each student, give me a list of each course he or she has completed and the mark he or she got in that course. Show me their overall average for all the completed courses, plus, for every group of three courses, show me their average and the highest and lowest marks of the three highest marks that have been received.”