22 Partitioning Data into Windows

22
Partitioning Data into Windows

In my head there are several windows, that I do know, but perhaps it is always the same one, open variously on the parading universe.

SAMUEL BECKETT

Topics in this Chapter

What You Can Do with a “Window” into Your Data

Calculating a Row Number

Ranking Data

Splitting Data into Quintiles

Using Windows with Aggregate Functions

Sample Statements

Summary

Problems for You to Solve

I’ve shown you several different ways that you can group and aggregate data, but I still haven’t shown you everything.

In earlier incarnations of the SQL standards, there really was no ability to work with data where the results depended on adjacent rows. It was always felt that the order of the rows didn’t matter, as long as you could match the rows to your filters. While the ability to sort the data using the ORDER BY clause existed, it was seen as being related to presentation, not to data manipulation. This meant that operations such as generating running sums, where the sum on a particular row is dependent on the values of the rows that precede that row, were very difficult (or even impossible) to write using only SQL.

With the introduction of the SQL:2003 Standard, though, this changed. The SQL:2003 Standard introduced the concept of window functions, which are functions that are applied to a set of rows defined by a window descriptor and return a single value for each row from the underlying query.

I intend to delve into the world of window functions in this chapter. Don’t worry if you use an Apple computer: window functions have nothing to do with the PC operating system. In fact, Microsoft’s SQL Server was late coming to the party with respect to window functions; they weren’t introduced until SQL Server 2017, and they still aren’t in Microsoft Access!

Note: You can find all the sample statements and solutions for the respective sample databases—SalesOrdersExample, EntertainmentAgencyExample, RecipesExample, SchoolSchedulingExample, and BowlingLeagueExample. Because neither Microsoft Access nor MySQL support window functions, you’ll find the sample solutions only in the Microsoft SQL Server and PostgreSQL example databases.

What You Can Do With a “Window” into Your Data

In all the examples I’ve shown you to aggregate data so far, the aggregated data is replaced by a subtotal row. You lose the actual details of the data that went into the aggregation.

For instance, if you wanted to know how many styles of music each customer prefers, you might use a query like this:

Click here to view code image

SQL

SELECT CustomerID, C.CustFirstName || ' ' || C.CustLastName

AS Customer,

COUNT(*) AS Preferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

GROUP BY C.CustFirstName, C.CustLastName;

You will obtain results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Count in the Entertainment Agency Example database.)

CustomerID	Customer	Preferences
10001	Doris Hartwig	2
10002	Deb Waldal	2
10003	Peter Brehm	2
10004	Dean McCrae	2
10005	Elizabeth Hallmark	2
10006	Matt Berg	2
10007	Liz Keyser	3
10008	Darren Gehring	2
10009	Sarah Thompson	3
10010	Zachary Ehrlich	3
10011	Joyce Bonnicksen	3
10012	Kerry Patterson	2
10013	Estella Pundt	2
10014	Mark Rosales	3
10015	Carol Viescas	3

Great. This does let you know exactly how many preferences each customer has indicated. However, it doesn’t tell you what their preferences are!

What if we could use a window function to figure out the total number of preferences for customers but also return what each of their preferences is? It turns out you can get results like the following table:

CustomerID	Customer	StyleName	Preferences
10001	Doris Hartwig	Contemporary	2
10001	Doris Hartwig	Top 40 Hits	2
10002	Deb Waldal	60’s Music	2
10002	Deb Waldal	Classic Rock & Roll	2
10003	Peter Brehm	Motown	2
10003	Peter Brehm	Rhythm and Blues	2
10004	Dean McCrae	Jazz	2
10004	Dean McCrae	Standards	2
10005	Elizabeth Hallmark	Classical	2
10005	Elizabeth Hallmark	Chamber Music	2
10006	Matt Berg	Folk	2
10006	Matt Berg	Variety	2
10007	Liz Keyser	70’s Music	3
10007	Liz Keyser	Classic Rock & Roll	3
10007	Liz Keyser	Rhythm and Blues	3
10008	Darren Gehring	Contemporary	2
10008	Darren Gehring	Standards	2
10009	Sarah Thompson	Country	3
10009	Sarah Thompson	Country Rock	3
10009	Sarah Thompson	Modern Rock	3
10010	Zachary Ehrlich	Jazz	3
10010	Zachary Ehrlich	Rhythm and Blues	3
10010	Zachary Ehrlich	Salsa	3
10011	Joyce Bonnicksen	40’s Ballroom Music	3
10011	Joyce Bonnicksen	Classical	3
10011	Joyce Bonnicksen	Standards	3
10012	Kerry Patterson	Contemporary	2
10012	Kerry Patterson	Show Tunes	2
10013	Estella Pundt	Jazz	2
10013	Estella Pundt	Salsa	2
10014	Mark Rosales	80’s Music	3
10014	Mark Rosales	Modern Rock	3
10014	Mark Rosales	Top 40 Hits	3
10015	Carol Viescas	40’s Ballroom Music	3
10015	Carol Viescas	Show Tunes	3
10015	Carol Viescas	Standards	3

Note: If you’re really on your toes, you would probably tell me, “But, John, I already know how to get that result using a subquery, like this”:

Click here to view code image

SELECT CustomerID, CustFirstName || ' ' || CustLastName,
   StyleName, (SELECT COUNT(*)
               FROM Musical_Preferences AS MP
               WHERE MP.CustomerID = Customers.
                  CustomerID)
             AS Preferences
FROM Customers INNER JOIN Musical_Preferences
  ON Customers.CustomerID = Musical_Preferences.
     CustomerID
INNER JOIN Musical_Styles
  ON Musical_Styles.StyleID = Musical_Preferences.
     StyleID;

And you would be correct! But the focus of this chapter is to show you another, perhaps more efficient, way to get that result.

Now we know not only that Doris Hartwig has specified two preferences, but we also know that the two styles she prefers are Contemporary and Top 40 Hits! I’m sure you can see that these results could be far more useful.

So how do you go about achieving results such as this? You guessed it: you use window functions. So without further ado, let me show you how.

Syntax

Let’s take a close look at the syntax for window functions. Figure 22-1 shows the basic diagram. Note that they are only valid in SELECT and ORDER BY clauses. Hmmm, that looks pretty complicated, doesn't it?

Figure 22-1 Syntax of the primary Window functions

But that’s not all! (Sounds like a TV game show, doesn’t it?) And you can “power up” an Aggregate Function by turning it into a window, as shown in Figure 22-2.

Note: As you can imagine from how lengthy those lists in Figures 22-1 and 22-2 are, window functions are a huge topic; more than can possibly be covered in a single chapter. As a result, I will be not covering ALL of the window functions, just the ones that I think are most useful. However, you can find syntax for all the window functions in Appendix A.

Figure 22-2 Extended syntax when using an Aggregate Function

You can see that the Aggregate Functions (COUNT, SUM, AVG, MIN, and MAX) I taught you about in Chapter 12, “Simple Totals,” are in those figures, along with several new ones: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK(), and NTILE(), but there seems to be a lot more that can be specified as well.

The clause that allows you to use the windows functions is OVER(). That clause lets you define the range on which the aggregate functions are applied.

The key difference between GROUP BY and OVER() is that GROUP BY applies aggregations across the entire query, rolling up the specified non-aggregate fields, thereby reducing the number of rows returned (usually). With OVER(), however, the same number of rows will be returned as the base query. Any aggregations will be returned for each row in the range identified in the OVER() clause.

Within the OVER() clause, there are several predicates which can be used:

PARTITION BY

ORDER BY

ROWS (or RANGE)

The PARTITION BY predicate specifies how the window should be divided. In the example above, I’m dividing the window by the customer. The SQL for the table above that includes a Preferences column looks like this:

Click here to view code image

SQL

SELECT C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName AS Customer,

MS.StyleName,

COUNT(*) OVER (

PARTITION BY C.CustomerID

) AS Preferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

ON MS.StyleID = MP.StyleID;

Now you know how to create that “magical” table I showed you at the beginning of the chapter. I saved this query as CH22_Customers_PreferredStyles_Details_Count in the Entertainment sample database.

If I don’t specify anything for the PARTITION BY predicate, the database system applies the function over the entire result set.

Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you, I have combined the Translation/Clean Up steps for all the following examples to simplify the process.

“For each customer, show me the musical preference styles they’ve selected. Show me a running total of the number of styles selected for all the customers.”

Translation/Clean Up

Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, and the count(*) OVER with no partition but ordered by CustomerID from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID

SQL

SELECT C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName AS

Customer,

MS.StyleName,

COUNT(*) OVER (

ORDER BY C.CustomerID

) AS Preferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

ON MS.StyleID = MP.StyleID;

That query returns results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_NO_PARTITION in the Entertainment Agency Example database.)

CustomerID	Customer	StyleName	Preferences
10001	Doris Hartwig	Contemporary	2
10001	Doris Hartwig	Top 40 Hits	2
10002	Deb Waldal	60’s Music	4
10002	Deb Waldal	Classic Rock & Roll	4
10003	Peter Brehm	Motown	6
10003	Peter Brehm	Rhythm and Blues	6
10004	Dean McCrae	Jazz	8
10004	Dean McCrae	Standards	8
10005	Elizabeth Hallmark	Classical	10
10005	Elizabeth Hallmark	Chamber Music	10
10006	Matt Berg	Folk	12
10006	Matt Berg	Variety	12
10007	Liz Keyser	70’s Music	15
10007	Liz Keyser	Classic Rock & Roll	15
10007	Liz Keyser	Rhythm and Blues	15
10008	Darren Gehring	Contemporary	17
10008	Darren Gehring	Standards	17
10009	Sarah Thompson	Country	20
10009	Sarah Thompson	Country Rock	20
10009	Sarah Thompson	Modern Rock	20
10010	Zachary Ehrlich	Jazz	23
10010	Zachary Ehrlich	Rhythm and Blues	23
10010	Zachary Ehrlich	Salsa	23
10011	Joyce Bonnicksen	40’s Ballroom Music	26
10011	Joyce Bonnicksen	Classical	26
10011	Joyce Bonnicksen	Standards	26
10012	Kerry Patterson	Contemporary	28
10012	Kerry Patterson	Show Tunes	28
10013	Estella Pundt	Jazz	30
10013	Estella Pundt	Salsa	30
10014	Mark Rosales	80’s Music	33
10014	Mark Rosales	Modern Rock	33
10014	Mark Rosales	Top 40 Hits	33
10015	Carol Viescas	40’s Ballroom Music	36
10015	Carol Viescas	Show Tunes	36
10015	Carol Viescas	Standards	36

I still get the same data, but the totals are different. Rather than indicating the number of preferences for each customer, I’m now getting a running total. The totals in both rows for Doris Hartwig are still two, corresponding to the two preference styles she’s indicated, but now the totals for Deb Waldal are four (her two preference styles added to Doris’s two preference styles), the totals for Peter Brehm are six (his two preference styles added to Deb’s two preference styles and Doris’s two) and so on. Basically, I’ve generated a running sum of the preferences count.

Note that you can specify a different OVER() clause for each aggregate function. For instance, you can get both counts of preferences per customer plus overall counts of preferences using a query like the following:

“For each customer, show me the musical preference styles they’ve selected. Show me both the total for each customer plus a running total of the number of styles selected for all the customers.”

Translation/Clean Up

Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by CustomerID and the count(*) OVER with no partition but ordered by CustomerID from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID

SQL

SELECT C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName AS

Customer,

MS.StyleName,

COUNT(*) OVER (

PARTITION BY C.CustomerID

ORDER BY C.CustomerID

) AS CustomerPreferences,

COUNT(*) OVER (

ORDER BY C.CustomerID

) AS TotalPreferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

ON MS.StyleID = MP.StyleID;

That will return results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts in the Entertainment Agency Example database.)

CustomerID	Customer	StyleName	Customer Preferences	Total Preferences
10001	Doris Hartwig	Contemporary	2	2
10001	Doris Hartwig	Top 40 Hits	2	2
10002	Deb Waldal	60’s Music	2	4
10002	Deb Waldal	Classic Rock & Roll	2	4
10003	Peter Brehm	Motown	2	6
10003	Peter Brehm	Rhythm and Blues	2	6
10004	Dean McCrae	Jazz	2	8
10004	Dean McCrae	Standards	2	8
10005	Elizabeth Hallmark	Classical	2	10
10005	Elizabeth Hallmark	Chamber Music	2	10
10006	Matt Berg	Folk	2	12
10006	Matt Berg	Variety	2	12
10007	Liz Keyser	70’s Music	3	15
10007	Liz Keyser	Classic Rock & Roll	3	15
10007	Liz Keyser	Rhythm and Blues	3	15
10008	Darren Gehring	Contemporary	2	17
10008	Darren Gehring	Standards	2	17
10009	Sarah Thompson	Country	3	20
10009	Sarah Thompson	Country Rock	3	20
10009	Sarah Thompson	Modern Rock	3	20
10010	Zachary Ehrlich	Jazz	3	23
10010	Zachary Ehrlich	Rhythm and Blues	3	23
10010	Zachary Ehrlich	Salsa	3	23
10011	Joyce Bonnicksen	40’s Ballroom Music	3	26
10011	Joyce Bonnicksen	Classical	3	26
10011	Joyce Bonnicksen	Standards	3	26
10012	Kerry Patterson	Contemporary	2	28
10012	Kerry Patterson	Show Tunes	2	28
10013	Estella Pundt	Jazz	2	30
10013	Estella Pundt	Salsa	2	30
10014	Mark Rosales	80’s Music	3	33
10014	Mark Rosales	Modern Rock	3	33
10014	Mark Rosales	Top 40 Hits	3	33
10015	Carol Viescas	40’s Ballroom Music	3	36
10015	Carol Viescas	Show Tunes	3	36
10015	Carol Viescas	Standards	3	36

Note that this now gives me both a total count of preferences by customer as well as a running total over all customers. As its name suggests, the ORDER BY predicate controls the order in which the rows are returned. I could change the ORDER BY predicate in the previous query like this:

Translation/Clean Up

Select the CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by CustLastName, CustFirstName and the count(*) OVER with no partition but ordered by CustLastName, CustFirstName from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID

SQL

SELECT C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName

AS Customer,

MS.StyleName,

COUNT(*) OVER (

PARTITION BY C.CustomerID

ORDER BY C.CustLastName, C.CustFirstName

) AS CustomerPreferences,

COUNT(*) OVER (

ORDER BY C.CustLastName, C.CustFirstName

) AS TotalPreferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

ON MS.StyleID = MP.StyleID;

This returns results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts_Sort1 in the Entertainment Agency Example database.)

CustomerID	Customer	StyleName	Customer Preferences	Total Preferences
10006	Matt Berg	Folk	2	2
10006	Matt Berg	Variety	2	2
10011	Joyce Bonnicksen	40’s Ballroom Music	3	5
10011	Joyce Bonnicksen	Classical	3	5
10011	Joyce Bonnicksen	Standards	3	5
10003	Peter Brehm	Motown	2	7
10003	Peter Brehm	Rhythm and Blues	2	7
10010	Zachary Ehrlich	Jazz	3	10
10010	Zachary Ehrlich	Rhythm and Blues	3	10
10010	Zachary Ehrlich	Salsa	3	10
10008	Darren Gehring	Contemporary	2	12
10008	Darren Gehring	Standards	2	12
10005	Elizabeth Hallmark	Classical	2	14
10005	Elizabeth Hallmark	Chamber Music	2	14
10001	Doris Hartwig	Contemporary	2	16
10001	Doris Hartwig	Top 40 Hits	2	16
10007	Liz Keyser	70’s Music	3	19
10007	Liz Keyser	Classic Rock & Roll	3	19
10007	Liz Keyser	Rhythm and Blues	3	19
10004	Dean McCrae	Jazz	2	21
10004	Dean McCrae	Standards	2	21
10012	Kerry Patterson	Contemporary	2	23
10012	Kerry Patterson	Show Tunes	2	23
10013	Estella Pundt	Jazz	2	25
10013	Estella Pundt	Salsa	2	25
10014	Mark Rosales	80’s Music	3	28
10014	Mark Rosales	Modern Rock	3	28
10014	Mark Rosales	Top 40 Hits	3	28
10009	Sarah Thompson	Country	3	31
10009	Sarah Thompson	Country Rock	3	31
10009	Sarah Thompson	Modern Rock	3	31
10015	Carol Viescas	40’s Ballroom Music	3	34
10015	Carol Viescas	Show Tunes	3	34
10015	Carol Viescas	Standards	3	34
10002	Deb Waldal	60’s Music	2	36
10002	Deb Waldal	Classic Rock & Roll	2	36

You can see that the final output is now sorted by customer last name and first name, not by customer ID. Note, however, that ORDER BY clause should be consistent with the PARTITION clause, or the results you get might be confusing.

Translation/Clean Up

Select the customer CustomerID, CustFirstName || ' ' || CustLastName, StyleName, the count(*) OVER partition by CustomerID ordered by StyleName and the count(*) OVER with no partition but ordered by StyleName from the Customers table, inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.Customer ID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Preferences.StyleID

SQL

SELECT C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName AS

Customer,

MS.StyleName,

COUNT(*) OVER (

PARTITION BY C.CustomerID

ORDER BY MS.StyleName

) AS CustomerPreferences,

COUNT(*) OVER (

ORDER BY MS.StyleName

) AS TotalPreferences

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

ON MS.StyleID = MP.StyleID;

That query would lead to results like the following table. (I saved this request as CH22_Customers_PreferredStyles_Details_Multiple_Counts_Sort2 in the Entertainment Agency Example database.) This time, the output is sorted by style name, as I requested.

CustomerID	Customer	StyleName	Customer Preferences	Total Preferences
10011	Joyce Bonnicksen	40’s Ballroom Music	1	2
10015	Carol Viescas	40’s Ballroom Music	1	2
10002	Deb Waldal	60’s Music	1	3
10007	Liz Keyser	70’s Music	1	4
10014	Mark Rosales	80’s Music	1	5
10005	Elizabeth Hallmark	Chamber Music	1	6
10007	Liz Keyser	Classic Rock & Roll	2	8
10002	Deb Waldal	Classic Rock & Roll	2	8
10005	Elizabeth Hallmark	Classical	2	10
10011	Joyce Bonnicksen	Classical	2	10
10012	Kerry Patterson	Contemporary	1	13
10008	Darren Gehring	Contemporary	1	13
10001	Doris Hartwig	Contemporary	1	13
10009	Sarah Thompson	Country	1	14
10009	Sarah Thompson	Country Rock	2	15
10006	Matt Berg	Folk	1	16
10004	Dean McCrae	Jazz	1	19
10010	Zachary Ehrlich	Jazz	1	19
10013	Estella Pundt	Jazz	1	19
10014	Mark Rosales	Modern Rock	2	21
10009	Sarah Thompson	Modern Rock	3	21
10003	Peter Brehm	Motown	1	22
10003	Peter Brehm	Rhythm and Blues	2	25
10007	Liz Keyser	Rhythm and Blues	3	25
10010	Zachary Ehrlich	Rhythm and Blues	2	25
10010	Zachary Ehrlich	Salsa	3	27
10013	Estella Pundt	Salsa	2	27
10015	Carol Viescas	Show Tunes	2	29
10012	Kerry Patterson	Show Tunes	2	29
10011	Joyce Bonnicksen	Standards	3	33
10015	Carol Viescas	Standards	3	33
10008	Darren Gehring	Standards	2	33
10004	Dean McCrae	Standards	2	33
10001	Doris Hartwig	Top 40 Hits	2	35
10014	Mark Rosales	Top 40 Hits	3	35
10006	Matt Berg	Variety	2	36

The TotalPreferences counts still look okay (two people like 40’s Ballroom Music, so the value for TotalPreferences is 2 for both those rows, one person likes 60’s Music, so the value for TotalPreferences for that style is 3. TotalPreferences is a running total—the total for the previous style plus the total for the current style and so on). The CustomerPreferences counts are a little harder to figure out, but they are simply a running sum of the number of preferences specified by each customer. The first row for any given customer shows the value 1, the second row the value 2, and so on.

Note: It is possible to include an OVER() clause with no optional clauses. The result is the aggregate you specified calculated for all the rows returned by the FROM and WHERE clauses and displayed on each row. For example, this query:

Click here to view code image

SELECT C.CustomerID,
  C.CustFirstName || ' ' || C.CustLastName AS Customer,
  MS.StyleName,
  COUNT(*) OVER () AS NumRows
FROM Customers AS C
  INNER JOIN Musical_Preferences AS MP
    ON MP.CustomerID = C.CustomerID
  INNER JOIN Musical_Styles AS MS
    ON MS.StyleID = MP.StyleID;

… returns the count of all rows in the NumRows column. You can think of it as a way to display all rows and get a grand total without having to include a GROUP BY clause.

The ROWS (or RANGE) predicate allows you to further limit the rows of data that are included within the partition. It does this by letting you specify a range of rows with respect to the current row.

The ROWS clause lets you specify the range physically by specifying a fixed number of rows preceding or following the current row. For instance, you can specify ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING, which would mean that the rows to be considered are simply this row and the one right after it.

The RANGE clause lets you specify the range logically by specifying a range of values with respect to the value in the current row.

Note: Both the ROWS and RANGE clauses require that the ORDER BY clause be specified. If ORDER BY specifies multiple columns, CURRENT ROW FOR RANGE considers all columns in the ORDER BY list when determining what constitutes the range.

Let’s look at an example to see the difference between using ROWS and RANGES when calculating the total number of preferences. (Don’t worry, I’ll be talking more about using aggregate functions like SUM with the OVER() clause in a little bit.) Let’s just go straight to the SQL, asking for a SUM of the COUNT OVER ROWS and a SUM of the COUNT OVER RANGE to see the difference.

“For each city where we have customers, show me the customer and the number of musical preference styles they’ve selected. Also give me a running total by city, both for each customer in the city as well as for the city overall.”

Click here to view code image

SQL

SELECT C.CustCity,

C.CustFirstName || ' ' || C.CustLastName AS Customer,

COUNT(*) AS Preferences,

SUM(COUNT(*)) OVER (

ORDER BY C.CustCity

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS TotalUsingRows,

SUM(COUNT(*)) OVER (

ORDER BY C.CustCity

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS TotalUsingRange

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

ON MP.CustomerID = C.CustomerID

GROUP BY C.CustCity, C.CustFirstName, C.CustLastName;

That query would lead to results like the following table. (I saved this request as CH22_Customer_ByCity_PreferredStyles_Sums in the Entertainment Agency Example database.)

CustCity	Customer	Preferences	Total UsingRows	Total UsingRange
Auburn	Elizabeth Hallmark	2	2	2
Bellevue	Estella Pundt	2	4	16
Bellevue	Joyce Bonnicksen	3	7	16
Bellevue	Liz Keyser	3	10	16
Bellevue	Mark Rosales	3	13	16
Bellevue	Sarah Thompson	3	16	16
Kirkland	Darren Gehring	2	18	23
Kirkland	Peter Brehm	2	20	23
Kirkland	Zachary Ehrlich	3	23	23
Redmond	Dean McCrae	2	25	27
Redmond	Kerry Patterson	2	27	27
Seattle	Carol Viescas	3	30	32
Seattle	Doris Hartwig	2	32	32
Tacoma	Deb Waldal	2	34	36
Tacoma	Matt Berg	2	36	36

Look at the two calculated SUM fields. The first (TotalUsingRows) included a ROWS predicate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, while the second (TotalUsingRange) included a RANGE predicate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Each value in the TotalUsingRows column is the value in the Preferences column for that row added to the value in the TotalUsingRows column for the preceding row. In other words, using the SUM aggregate function with the ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW predicate results in a running total.

The values in the TotalUsingRange column are quite different (although the values on the first and last rows are the same). The RANGE predicate creates a range of all rows with the same value for the column specified in the ORDER BY predicate (CustCity) as the current row. In other words, the first row is the sum of Preferences for all rows corresponding to the city of the first row (Auburn). There are five rows for Bellevue, with preferences 2, 3, 3, 3, 3, respectively. Those five values result in 14 when added together. That value is added to the value for the previous range so that the TotalUsingRange column shows 16 for the five Bellevue rows. Similarly, there are three rows for Kirkland, which total 7. That total for Kirkland is added to the value for the previous range (16), resulting in the TotalUsingRange column showing 23 for the range consisting of the three Kirkland rows. You can still call it a running total, but the running total for each row in the range is the same. Looking at it another way, the value in the TotalUsingRange column for any city matches the last row for that city in the TotalUsingRows column (Auburn, Elizabeth Hallmark; Bellevue, Sarah Thompson; Kirkland, Zachary Ehrlich; and so on).

One thing that should be pointed out is that it’s only possible to use the ROWS (or RANGE) predicate with aggregate functions. Figure 22-3 summarizes the rules.

Figure 22-3 The clauses that are required and optional for each window function

Calculating a Row Number

One of the new functions introduced in Figure 22-1 is ROW_NUMBER(). As you might expect from the name, this allows you to assign unique numbers to each row.

“Assign a number for each customer. Show me their CustomerID, their name and their state. Return the customers in alphabetic order.”

Translation/Clean Up

Select the ROW_NUMBER (*) OVER ordered by CustLastName, CustFirstName, the customer and state CustomerID, CustFirstName || ' ' || CustLastName, CustState from the Customers table

SQL

SELECT ROW_NUMBER() OVER (

ORDER BY CustLastName, CustFirstName

) AS RowNumber,

C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName

AS CustomerName,

C.CustState

FROM Customers AS C;

That query would lead to results like the following table. (I saved this request as CH22_Customers_Numbering in the Sales Orders Example database.)

RowNumber	CustomerID	CustomerName	CustState
1	1014	Sam Abolrous	CA
2	1020	Joyce Bonnicksen	WA
3	1004	Robert Brown	TX
4	1009	Andrew Cencini	WA
5	1026	Kirk DeGrasse	TX
6	1019	Zachary Ehrlich	CA
7	1015	Darren Gehring	CA
8	1011	Alaina Hallmark	WA
9	1003	Gary Hallmark	WA
10	1010	Angel Kennedy	TX
11	1012	Liz Keyser	WA
12	1005	Dean McCrae	WA
13	1027	Luke Patterson	OR
14	1025	Maria Patterson	TX
15	1008	Neil Patterson	CA
16	1013	Rachel Patterson	CA
17	1021	Estella Pundt	TX
18	1024	Mark Rosales	TX
19	1023	Julia Schnebly	WA
20	1017	Manuela Seidel	OR
21	1007	Mariya Sergienko	OR
22	1018	David Smith	CA
23	1002	William Thompson	WA
24	1028	Jeffrey Tirekicker	WA
25	1022	Caleb Viescas	CA
26	1006	John Viescas	WA
27	1001	Suzanne Viescas	WA
28	1016	Jim Wilson	OR

Of course, because ROW_NUMBER is a window function, we can use the OVER clause to partition the table differently.

“Assign a number for each customer within their state. Show me their CustomerID, their name, and their state. Return the customers in alphabetic order.”

Translation/Clean Up

Select the ROW_NUMBER() OVER partition by CustState, ordered by CustLastName, CustFirstName, the CustomerID, CustFirstName || ' ' || CustLastName, and CustState from the Customers table

SQL

SELECT ROW_NUMBER() OVER (

PARTITION BY CustState

ORDER BY CustLastName, CustFirstName

) AS RowNumber,

C.CustomerID,

C.CustFirstName || ' ' || C.CustLastName AS

CustomerName,

C.CustState

FROM Customers AS C;

That query would lead to results like the following table. (I saved this request as CH22_Customers_Numbering_By_State in the Sales Orders Example database.)

RowNumber	CustomerID	CustomerName	CustState
1	1014	Sam Abolrous	CA
2	1019	Zachary Ehrlich	CA
3	1015	Darren Gehring	CA
4	1008	Neil Patterson	CA
5	1013	Rachel Patterson	CA
6	1018	David Smith	CA
7	1022	Caleb Viescas	CA
1	1027	Luke Patterson	OR
2	1017	Manuela Seidel	OR
3	1007	Mariya Sergienko	OR
4	1016	Jim Wilson	OR
1	1004	Robert Brown	TX
2	1026	Kirk DeGrasse	TX
3	1010	Angel Kennedy	TX
4	1025	Maria Patterson	TX
5	1021	Estella Pundt	TX
6	1024	Mark Rosales	TX
1	1020	Joyce Bonnicksen	WA
2	1009	Andrew Cencini	WA
3	1011	Alaina Hallmark	WA
4	1003	Gary Hallmark	WA
5	1012	Liz Keyser	WA
6	1005	Dean McCrae	WA
7	1023	Julia Schnebly	WA
8	1002	William Thompson	WA
9	1028	Jeffrey Tirekicker	WA
10	1006	John Viescas	WA
11	1001	Suzanne Viescas	WA

Because I asked for a partition by state, the RowNumber column restarts at 1 for each different state, and the rows are sorted by customer last name and first name within each state.

Ranking Data

Another one of the new functions introduced in Figure 22-1 is RANK(). As you might expect from the name, this allows you to rank rows of data relative to one another.

“List all students who have completed English courses and rank them by the grade they received.”

Translation/Clean Up

Select the SubjectID, StudFirstName, StudLastName, SubjectName, Grade, and RANK() OVER ordered by Grade DESC from the Students table inner joined with the Student_Schedules table on Students.StudentID = Student_Schedules.StudentID inner joined with the Classes table on Classes.ClassID = Student_Schedules.ClassID inner joined with the Subjects table on Subjects.SubjectID = Classes.SubjectID where ClassStatus = 2 and CategoryID = ‘ENG’

SQL

SELECT Su.SubjectID, St.StudFirstName,

St.StudLastName, Su.SubjectName,

SS.Grade,

RANK() OVER (

ORDER BY SS.Grade DESC

) AS Rank

FROM Students AS St

INNER JOIN Student_Schedules AS SS

ON SS.StudentID = St.StudentID

INNER JOIN Classes AS C

ON C.ClassID = SS.ClassID

INNER JOIN Subjects AS Su

ON Su.SubjectID = C.SubjectID

WHERE SS.ClassStatus = 2

AND Su.CategoryID = 'ENG';

That query would lead to results like the following table. (I saved this request as CH22_English_Students_Rank in the School Scheduling Example database.)

SubjectID	Stud FirstName	Stud LastName	SubjectName	Grade	Rank
37	Scott	Bishop	Composition - Fundamentals	98.07	1
37	Sara	Sheskey	Composition - Fundamentals	97.59	2
37	John	Kennedy	Composition - Fundamentals	93.01	3
37	Brannon	Jones	Composition - Fundamentals	91.66	4
37	Janice	Galvin	Composition - Fundamentals	91.44	5
38	Kendra	Bonnicksen	Composition - Intermediate	88.91	6
37	George	Chavez	Composition - Fundamentals	88.54	7
37	Marianne	Wier	Composition - Fundamentals	87.4	8
37	David	Hamilton	Composition - Fundamentals	86.33	9
37	Steve	Pundt	Composition - Fundamentals	82.58	10
38	Doris	Hartwig	Composition - Intermediate	81.66	11
37	Michael	Viescas	Composition - Fundamentals	77.59	12
38	Elizabeth	Hallmark	Composition - Intermediate	72.88	13
37	Karen	Smith	Composition - Fundamentals	72.05	14
37	Betsy	Stadick	Composition - Fundamentals	71.09	15
37	Kerry	Patterson	Composition - Fundamentals	70	16
38	Sarah	Thompson	Composition - Intermediate	67.6	17
38	Richard	Lum	Composition - Intermediate	67.19	18

In this case, there are no duplicate grade values, so the ranking is simply the grades in descending order.

If two or more rows tie for a rank in the same partition, each of the tied rows receives the same rank. If we take the above table and give both Brannon Jones and Janice Galvin a grade of 91.66 for Composition – Fundamentals, both Brannon and Janice would be listed at rank value 4, and Kendra Bonnicksen would still have rank 6, with the value 5 skipped. Like this:

SubjectID	StudFirstName	StudLastName	SubjectName	Grade	Rank
37	Scott	Bishop	Composition - Fundamentals	98.07	1
37	Sara	Sheskey	Composition - Fundamentals	97.59	2
37	John	Kennedy	Composition - Fundamentals	93.01	3
37	Brannon	Jones	Composition - Fundamentals	91.66	4
37	Janice	Galvin	Composition - Fundamentals	91.66	4
38	Kendra	Bonnicksen	Composition - Intermediate	88.91	6
<< more rows here >>

Figure 22-1 also included the DENSE_RANK() and PERCENT_RANK() functions. The difference between those two functions and RANK() is straightforward. Because there is only one distinct value that precedes the row, the DENSE_RANK() function would not skip a value. It would return the value 5 for Kendra Bonnicksen. This is one more than the number of distinct rows that come before the current row. The numbers returned by the DENSE_RANK function do not have gaps and always have consecutive ranks.

The PERCENT_RANK() function, however, returns a number that represents the percentage of values less than the current value in the group, excluding the highest value. In the absence of ties, the PERCENT_RANK() function will always return 0 for the first value in the group, and 1 for the last value in a group. For the remaining rows in the partition, the PERCENT_RANK function ranks a value by calculating its rank minus 1 (rk − 1), and dividing that value by the number of rows in the partition minus 1 (nr − 1). Here’s the formula:

Let’s take a look at an example.

“List all bowlers in the league, ranking them by their average handicapped score. Show all three of RANK(), DENSE_RANK(), and PERCENT_RANK() to show the difference. (Remember that bowling scores are reported as rounded integer values.)”

Translation/Clean Up

Select the BowlerID, BowlerName, ROUND(AVG(HandiCapScore), 0), RANK() OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS Rank, DENSE_RANK () OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS DenseRank and PERCENT_RANK () OVER ordered by ROUND(AVG(HandiCapScore), 0) DESC AS PercentRank from table Bowlers inner joined with the Bowler_Scores table ON Bowler_Scores.BowlerID = Bowlers.BowlerID, grouped by BowlerID, BowlerFirstName and BowlerLastName

SQL

SELECT B.BowlerID,

B.BowlerFirstName || ' ' || B.BowlerLastName

AS BowlerName,

ROUND(AVG(BS.HandiCapScore), 0) AS AvgHandicap,

RANK () OVER (

ORDER BY ROUND(AVG(BS.HandiCapScore), 0) DESC)

AS Rank,

DENSE_RANK () OVER (

ORDER BY ROUND(AVG(BS.HandiCapScore), 0) DESC)

AS DenseRank,

PERCENT_RANK () OVER (

ORDER BY ROUND(AVG(BS.HandiCapScore), 0) DESC)

AS PercentRank

FROM Bowlers AS B

INNER JOIN Bowler_Scores AS BS

ON BS.BowlerID = B.BowlerID

GROUP BY B.BowlerID, B.BowlerFirstName,

B.BowlerLastName;

Note: I rounded the averages because that’s what a real bowling league does. You need a rounded integer value to be able to calculate the handicap. I had to include the rounded expression in each of the ORDER BY clauses to get the correct answer. And if you look at the code I had to use in Microsoft SQL Server, I also CAST the HandiCapScore column AS FLOAT because, if I don’t do that, the database system returns a truncated, not rounded, average value.

That query would lead to results like the following table. (I saved this request as CH22_Bowlers_Average_Score_Rankings in the Bowling League Example database.)

BowlerID	BowlerName	Avg Handicap	Rank	Dense Rank	Percent Rank
15	Kathryn Patterson	198	1	1	0
6	Neil Patterson	198	1	1	0
27	William Thompson	198	1	1	0
19	John Viescas	198	1	1	0
25	Megan Patterson	197	5	2	0.129032258064516
3	John Kennedy	197	5	2	0.129032258064516
29	Bailey Hallmark	197	5	2	0.129032258064516
14	Gary Hallmark	197	5	2	0.129032258064516
2	David Fournier	196	9	3	0.258064516129032
31	Ben Clothier	196	9	3	0.258064516129032
11	Angel Kennedy	196	9	3	0.258064516129032
26	Mary Thompson	196	9	3	0.258064516129032
7	David Viescas	196	9	3	0.258064516129032
1	Barbara Fournier	196	9	3	0.258064516129032
24	Sarah Thompson	196	9	3	0.258064516129032
18	Michael Hernandez	195	16	4	0.483870967741936
10	Doug Steele	195	16	4	0.483870967741936
12	Carol Viescas	195	16	4	0.483870967741936
9	Alastair Black	195	16	4	0.483870967741936
5	Ann Patterson	195	16	4	0.483870967741936
22	Alaina Hallmark	195	16	4	0.483870967741936
16	Richard Sheskey	194	22	5	0.67741935483871
20	Suzanne Viescas	194	22	5	0.67741935483871
28	Michael Viescas	194	22	5	0.67741935483871
23	Caleb Viescas	194	22	5	0.67741935483871
21	Zachary Ehrlich	194	22	5	0.67741935483871
4	Sara Sheskey	194	22	5	0.67741935483871
13	Elizabeth Hallmark	194	22	5	0.67741935483871
30	Rachel Patterson	194	22	5	0.67741935483871
32	Joe Rosales	193	30	6	0.935483870967742
8	Stephanie Viescas	193	30	6	0.935483870967742
17	Kendra Hernandez	193	30	6	0.935483870967742

You can see the difference between the values in the Rank and DenseRank columns. The first four bowlers (Kathryn Patterson, Neil Patterson, William Thompson, and John Viescas) all have the same highest average score (198), so they’re each ranked 1 by both functions. The next four people (Megan Patterson, John Kennedy, Bailey Hallmark, and Gary Hallmark) also have the same average score (197), so the values in the Rank and DenseRank columns are the same for all of them as well. Since four bowlers have been ranked before them, the RANK() function returns a value of 5, one more than the number already ranked. However, there’s only one distinct value ahead of them (198), so the DENSE_RANK() function returns a value of 2, one more than the number of distinct values.

Ranking continues in this manner. The next seven bowlers (David Fournier, Ben Clothier, Angel Kennedy, Mary Thompson, David Viescas, Barbara Fournier, and Sarah Thompson) have the same average score (196). Since eight bowlers have already been ranked ahead of them, the RANK() function returns 9 for each of them. However, there are only two distinct values ahead of them (198 and 197), so the DENSE_RANK() function returns 3. Finally, Joe Rosales, Stephanie Viescas, and Kendra Hernandez all get a rank of 30 (because twenty-nine bowlers have already been ranked ahead of them) and a dense rank of 6 (because there are only five distinct values ahead of theirs)

Look, too, at the PercentRank column. Once again, since Kathryn Patterson, Neil Patterson, William Thompson, and John Viescas have the highest average score, the PERCENT_RANK() function returns 0 for their rows. The next four bowlers (Megan Patterson, John Kennedy, Bailey Hallmark, and Gary Hallmark) are all ranked as 5, and there are 32 rows in the partition. As explained above, that means the PERCENT_RANK function returns

Similarly, the next seven bowlers (David Fournier, Ben Clothier, Angel Kennedy, Mary Thompson, David Viescas, Barbara Fournier, and Sarah Thompson) are all ranked and 9, so the PERCENT_RANK function returns

One thing that doesn’t show up in this example is the fact that, should the last row represent a single individual, the PERCENT_RANK function would return 1 for that row. Because Joe Rosales, Stephanie Viescas, and Kendra Hernandez are tied for last place, the PERCENT_RANK function returns the same value for all three of them:

Splitting Data into Quintiles

You may remember that I showed you how to calculate quintiles in Chapter 20, “Using Unlinked Data and ‘Driver’ Tables.” You may also recall that you needed to CROSS JOIN two separate queries (one that assigned a ranking number to each student who completed an English course and one that counted all students who completed an English course). That’s a lot of data retrieval that can be prevented through the use of window functions! In fact, one of the aggregate functions introduced along with window functions is NTILE, which lets you divide the data into as many different ranges as you want.

“List all students who have completed English courses, rank them by the grades they received, and indicate the Quintile into which they fall.”

Translation/Clean Up

Select the SubjectID, StudFirstName, StudLastName, SubjectName, Grade, RANK() OVER ordered by Grade DESC and NTILE (5) OVER ordered by Grade DESC from the Students table inner joined with the Student_Schedules table on Students.StudentID = Student_Schedules.StudentID inner joined with the Classes table on Classes.ClassID = Student_Schedules.ClassID inner joined with the Subjects table on Subjects.SubjectID = Classes.SubjectID where ClassStatus = 2 and CategoryID = ’ENG’

SQL

SELECT Su.SubjectID, St.StudFirstName,

St.StudLastName,

SS.ClassStatus,

SS.Grade, Su.CategoryID,

Su.SubjectName,

RANK() OVER (ORDER BY Grade DESC) AS Rank,

NTILE(5) OVER(ORDER BY Grade DESC) AS Quintile

FROM Subjects AS Su

INNER JOIN Classes AS C

ON C.SubjectID = S.SubjectID

INNER JOIN Student_Schedules AS SS

ON SS.ClassID = C.ClassID)

INNER JOIN Students AS St

ON St.StudentID = SS.StudentID

WHERE SS.ClassStatus = 2

AND Su.CategoryID = 'ENG';

That query would lead to results like the following table. (I saved this request as CH22_English_Students_Quintiles in the School Scheduling Example database.)

Subject ID	Stud FirstName	Stud LastName	Subject Name	Grade	Rank	Quintile
37	Scott	Bishop	Composition - Fundamentals	98.07	1	1
37	Sara	Sheskey	Composition - Fundamentals	97.59	2	1
37	John	Kennedy	Composition - Fundamentals	93.01	3	1
37	Brannon	Jones	Composition - Fundamentals	91.66	4	1
37	Janice	Galvin	Composition - Fundamentals	91.44	5	2
38	Kendra	Bonnicksen	Composition - Intermediate	88.91	6	2
37	George	Chavez	Composition - Fundamentals	88.54	7	2
37	Marianne	Wier	Composition - Fundamentals	87.4	8	2
37	David	Hamilton	Composition - Fundamentals	86.33	9	3
37	Steve	Pundt	Composition - Fundamentals	82.58	10	3
38	Doris	Hartwig	Composition - Intermediate	81.66	11	3
37	Michael	Viescas	Composition - Fundamentals	77.59	12	3
38	Elizabeth	Hallmark	Composition - Intermediate	72.88	13	4
37	Karen	Smith	Composition - Fundamentals	72.05	14	4
37	Betsy	Stadick	Composition - Fundamentals	71.09	15	4
37	Kerry	Patterson	Composition - Fundamentals	70	16	5
38	Sarah	Thompson	Composition - Intermediate	67.6	17	5
38	Richard	Lum	Composition - Intermediate	67.19	18	5

Note: The more astute of you might notice that the quintiles returned are slightly different in this example from those returned by the example in Chapter 20, CH20_English_Student_Quintiles.

The NTILE() function is used to distribute rows into a specified number of groups, When the number of rows is not divisible equally by the number of groups, larger groups will come before smaller groups. For instance, if you have 10 rows of data, NTILE(2) can divide the rows into two equal size groups, but NTILE(3) will put 4 in the first group, and 3 in each of the second and third groups. That’s what happened in this case: there were 18 rows being divided into five groups, so the groups were 4, 4, 4, 3 and 3. That’s different from the algorithm used to distribute the rows among the five groups in the CH20_English_Student_Quintiles.

One approach is not necessarily better or worse than the other. If there had been twenty students, the quintiles would have been the same.

Using Windows with Aggregate Functions

As I already pointed out, you can use any of the Aggregate Functions I showed you in Chapter 12 with the OVER() clause. You saw the full diagram earlier as Figure 22–2. I’ve already shown the use of the COUNT(*) function and SUM(COUNT(*)), but let’s take a quick look at it again in conjunction with the ROWS and RANGE predicates.

Remember that both the ROWS and RANGE predicates limit the data on which the aggregate function will work. Using the same ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW predicate in conjunction with the COUNT(*) function really doesn’t provide much. It’ll let you count all the rows “in front” of the current row: in other words, it’ll simply provide you with the same values as the ROW_NUMBER() function you’ve already seen. Using RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW potentially is more useful: it’ll give you a running total by range.

“Give a count of how many detail lines are associated with each order placed. I want to see the order number, the product purchased and a count of how many items are on the invoice. I’d also like to see how many detail lines there are in total.”

Translation/Clean Up

Select the OrderNumber, ProductName, COUNT(*) OVER partition by OrderNumber, COUNT(*) OVER order by OrderNumber rows between unbounded preceding and current row, COUNT(*) OVER order by OrderNumber ranges between unbounded preceding and current row from the Orders table inner joined with the Order_Details table on Order_Details.OrderNumber = Orders.OrderNumber inner joined with the Products table on Products.ProductNumber = OrderDetails.ProductNumber

SQL

SELECT O.OrderNumber AS OrderNo, P.ProductName,

COUNT(*) OVER (

PARTITION BY O.OrderNumber

) AS Total,

COUNT(*) OVER (

ORDER BY O.OrderNumber

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT

ROW

) AS TotalUsingRows,

COUNT(*) OVER (

ORDER BY O.OrderNumber

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT

ROW

) AS TotalUsingRange

FROM Orders AS O

INNER JOIN Order_Details AS OD

ON OD.OrderNumber = O.OrderNumber

INNER JOIN Products AS P

ON P.ProductNumber = OD.ProductNumber;

That query would lead to results like the following table. (I saved this request as CH22_Order_Counts_ByInvoice_ROWS_RANGE in the Sales Orders Example database.)

OrderNo	ProductName	Total	Total UsingRows	Total UsingRange
1	Trek 9000 Mountain Bike	7	1	7
1	Viscount Mountain Bike	7	2	7
1	GT RTS-2 Mountain Bike	7	3	7
1	ProFormance ATB All-Terrain Pedal	7	4	7
1	Dog Ear Aero-Flow Floor Pump	7	5	7
1	Glide-O-Matic Cycling Helmet	7	6	7
1	Ultimate Export 2G Car Rack	7	7	7
2	X-Pro All Weather Tires	2	8	9
2	Ultimate Export 2G Car Rack	2	9	9
3	Trek 9000 Mountain Bike	8	10	17
3	Viscount Mountain Bike	8	11	17
<< more rows here >>

You can see that there are seven separate products associated with order 1, so the Total column contains 7 for each of the rows. Similarly, the two separate rows associated with order 2 both have 2 for Total associated with them. (Only part of order 3 is shown, but you can see that the two rows that are shown both have the same value associated with them as well.)

As mentioned, the TotalUsingRows column is simply consecutive numbers, the same as the ROW_NUMBER() function would provide.

Finally, the TotalUsingRange column shows the total number of products by order. In other words, it shows 7 for all seven rows in order 1, then 9 for both rows in order 2 (corresponding to the seven rows from order 1 plus the two rows from order 2). You can see that the Total column indicates that there are eight rows in order 3, so the TotalUsingRange column contains 17 (7 + 2 + 8) for the rows associated with order 3.

Of course, other aggregate functions can be useful. For instance, perhaps you want to see the details of all the orders your customers have placed, and you’d like to see the Order total associated with each.

“List all orders placed, including the customer name, the order number, the product name, the quantity ordered, the quoted price and the total price per order.”

Translation/Clean Up

Select the CustFirstName || ' ' ||, CustLastName, OrderNumber, ProductName, QuantityOrdered, QuotedPrice, and SUM(QuotedPrice) OVER partition by OrderNumber from the Orders table inner joined with the Order_Details table on Order_Details.OrderNumber = Orders.OrderNumber inner joined with the Customers table on Customers.CustomerID = Orders.CustomerID inner joined with the Products table on Products.ProductNumber = OrderDetails.ProductNumber

SQL

SELECT C.CustFirstName || ' ' || C.CustLastName

AS Customer,

O.OrderNumber AS Order, P.ProductName,

OD.QuantityOrdered AS Quantity,

OD.QuotedPrice AS Price,

SUM(OD.QuotedPrice) OVER (

PARTITION BY O.OrderNumber

) AS OrderTotal

FROM Orders AS O

INNER JOIN Order_Details AS OD

ON OD.OrderNumber = O.OrderNumber

INNER JOIN Customers AS C

ON C.CustomerID = O.CustomerID

INNER JOIN Products AS P

ON P.ProductNumber = OD.ProductNumber;

That query would lead to results like the following table. (I saved this request as CH22_Order_Totals_ByInvoice in the Sales Orders Example database.)

Customer	Order	ProductName	Quantity	Price	OrderTotal
David Smith	1	Trek 9000 Mountain Bike	2	1200.00	3863.85
David Smith	1	Viscount Mountain Bike	3	635.00	3863.85
David Smith	1	GT RTS-2 Mountain Bike	4	1650.00	3863.85
David Smith	1	ProFormance ATB All-Terrain Pedal	1	28.00	3863.85
David Smith	1	Dog Ear Aero-Flow Floor Pump	3	55.00	3863.85
David Smith	1	Glide-O-Matic Cycling Helmet	5	121.25	3863.85
David Smith	1	Ultimate Export 2G Car Rack	6	174.60	3863.85
Suzanne Viescas	2	X-Pro All Weather Tires	4	24.00	204.00
Suzanne Viescas	2	Ultimate Export 2G Car Rack	4	180.00	204.00
William Thompson	3	Trek 9000 Mountain Bike	5	1164.00	3824.29
William Thompson	3	Viscount Mountain Bike	5	615.95	3824.29
<< more rows here >>

You can see that the seven separate rows associated with order 1 all have the same OrderTotal associated with them, while the two separate rows associated with order 2 all have the same OrderTotal associated with them. (Only part of order 3 is shown, but you can see that the two rows that are shown both have the same value associated with them as well.)

Let’s look at one more example of using the ROWS predicate. (Although I can’t see a legitimate use for this example, hopefully, it’ll give you a little more insight into how the ROWS predicate works!)

“List all engagements booked, showing the customer, the start date, the contract price and the sum of the current row plus the row before and after. Also, show the sum of the current row plus the row before and after partitioned by customer.”

Translation/Clean Up

Select the CustFirstName || ' ' ||, CustLastName, StartDate, ContractPrice, SUM(ContractPrice) OVER order by CustLastName, CustFirstName rows between 1 preceding and 1 following and SUM(ContractPrice) OVER partition by CustLastName, CustFirstName order by CustLastName, CustFirstName rows between 1 preceding and 1 following from the Engagements table inner joined with the Customers table on Customers.CustomerID = Engagements.CustomerID = Orders.CustomerID

SQL

SELECT C.CustFirstName || ' ' || C.CustLastName AS Customer,

E.StartDate, E.ContractPrice,

SUM(E.ContractPrice) OVER (

ORDER BY C.CustLastName, C.CustFirstName

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS SumOf3,

SUM(E.ContractPrice) OVER (

PARTITION BY C.CustLastName, C.CustFirstName

ORDER BY C.CustLastName, C.CustFirstName

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS PartitionedSumOf3

FROM Engagements AS E

INNER JOIN Customers AS C

ON C.CustomerID = E.CustomerID;

That query would lead to results like the following table. (I saved this request as CH22_Odd_Contract_Sums in the Entertainment Agency Example database.)

Customer	StartDate	ContractPrice	SumOf3	PartitionedSumOf3
Matt Berg	2017-09-02	200.00	1330.00	1330.00
Matt Berg	2017-09-12	1130.00	3180.00	3180.00
Matt Berg	2017-09-19	1850.00	5655.00	5655.00
Matt Berg	2017-10-14	2675.00	6450.00	6450.00
Matt Berg	2017-10-23	1925.00	6150.00	6150.00
Matt Berg	2017-12-31	1550.00	3795.00	3795.00
Matt Berg	2018-01-09	320.00	3540.00	3540.00
Matt Berg	2018-02-12	1670.00	3840.00	3840.00
Matt Berg	2018-02-24	1850.00	4320.00	3520.00
Peter Brehm	2018-02-17	800.00	2940.00	1090.00
Peter Brehm	2018-01-07	290.00	1860.00	1860.00
Peter Brehm	2018-01-30	770.00	1380.00	1380.00
Peter Brehm	2018-02-27	320.00	1590.00	1590.00
Peter Brehm	2017-12-10	500.00	4620.00	4620.00
Peter Brehm	2017-10-07	3800.00	5070.00	5070.00
Peter Brehm	2017-09-18	770.00	6120.00	4570.00
Zachary Ehrlich	2017-10-03	1550.00	3690.00	2920.00
Zachary Ehrlich	2017-09-19	1370.00	4170.00	4170.00
Zachary Ehrlich	2017-10-08	1250.00	3030.00	3030.00
<< more rows here >>

The column labeled SumOf3 represents the sum of the ContractPrice for the current row, plus the ContractPrice on the previous row and the following row. Look at the first row, where it’s 1330.00. There is no previous row, the ContractPrice on that row is 200.00, and the ContractPrice on the next row is 1130.00, so the sum is indeed 1330.00. For the second row, the ContractPrice on the previous row is 200.00, the ContractPrice on the current row is 1130.00, and the ContractPrice on the following row is 1850.00, so the value in the SumOf3 column is 200.00 + 1130.00 + 1850.00, or 3180.00. Look down to the tenth row (the first row for Peter Brehm). Even though it’s for a different customer, there is a previous row (remember, I only specified ORDER BY, not PARTITION BY!), with a ContractPrice of 1850.00. The ContractPrice on the current row is 800.00, and on the next row it is 290.00, so the SumOf3 value for that row is 1850.00 + 800.00 + 290.00, or 2940.00.

The first eight rows are all for the same customer (Matt Berg), so there’s no difference between the value in the SumOf3 and the PartitionedSumOf3 columns. However, the ninth row is the last row for Matt, so there isn’t a following row. That means that the value in PartitionedSumOf3 is simply the value of the ContractPrice on the previous row (1670.00) plus the value of the ContractPrice on the current row (1850.00), or 3520.00. Now, there’s no previous row for the tenth row (the first row for Peter Brehm), so the PartitionedSumOf3 is the value of the ContractPrice for the current row (800.00) plus the value of the ContractPrice for the following row (290.00), or 1090.00.

Sample Statements

You now know the mechanics of constructing queries using window functions and have seen some of the types of requests you can answer with window functions. Let’s take a look at a fairly robust set of samples, all of which use one or more window functions. These examples come from each of the sample databases, and they illustrate the use of the window functions.

Note: Remember in the Introduction that I warned you that results from each database system won’t necessarily match the sort order you see in examples in this book unless you include an ORDER BY clause. Even when you include that specification, the system might return results in columns not included in the ORDER BY clause in a different sequence because of different optimization techniques.

If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. You must open the design of the view and execute it from there to see the ORDER BY clause honored.

Also, when you use GROUP BY, you’ll often see the results returned by your database system as though the rows are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.

I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for this book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), and I prefixed the names of the queries relevant to this chapter with “CH22.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.

Note: Remember that all the column names and table names used in these examples are drawn from the sample database structures shown in Appendix B, “Schema for the Sample Databases.” Because many of these examples use complex JOINs, your database system might choose a different way to solve these queries. For this reason, the first few rows might not exactly match the result you obtain, but the total number of rows should be the same. To simplify the process, I have combined the Translation and Clean Up steps for all the following examples.

Examples Using ROW_NUMBER

Entertainment Agency Database

“I’d like a list of all of the engagements. Show me the start date for each engagement, the name of the customer, and the entertainer. Number the engagements overall, plus number the engagements within each start date.”

Translation/Clean Up

Select the StartDate, CustFirstName || ' ' || CustLastName AS Customer, EntStartName AS Entertaine, the ROW_NUMBER() ordered by StartDate AS Number and the ROW_NUMBER() partitioned by StartDate and ordered by StartDate AS NumberByDate from the Engagements table inner joined with the Entertainers table on Entertainers.EntertainerID = Engagements.EntertainerID inner joined with the Customers table on Customers.CustomerID = Engagements.CustomerID

SQL

SELECT ROW_NUMBER() OVER (

ORDER BY Engagements.StartDate

) AS Number,

Engagements.StartDate,

ROW_NUMBER() OVER (

PARTITION BY Engagements.StartDate

ORDER BY Engagements.StartDate

) AS NumberByDate,

Customers.CustFirstName || ' ' ||

Customers.CustLastName AS Customer,

Entertainers.EntStageName AS Entertainer

FROM Engagements

INNER JOIN Entertainers

ON Entertainers.EntertainerID =

Engagements.EntertainerID

INNER JOIN Customers

ON Customers.CustomerID =

Engagements.CustomerID;

CH22_Engagements_Numbered (111 rows)

Number	StartDate	Number ByDate	Customer	Entertainer
1	2017-09-02	1	Matt Berg	Jim Glynn
2	2017-09-11	1	Doris Hartwig	Jazz Persuasion
3	2017-09-11	2	Mark Rosales	Country Feeling
4	2017-09-12	1	Dean McCrae	Topazz
5	2017-09-12	2	Liz Keyser	Jim Glynn
6	2017-09-12	3	Matt Berg	JV & the Deep Six
7	2017-09-16	1	Elizabeth Hallmark	Country Feeling
8	2017-09-18	1	Elizabeth Hallmark	JV & the Deep Six
9	2017-09-18	2	Peter Brehm	Modern Dance
10	2017-09-19	1	Matt Berg	Coldwater Cattle Company
11	2017-09-19	2	Zachary Ehrlich	Saturday Revue
12	2017-09-19	3	Mark Rosales	Carol Peacock Trio
13	2017-09-25	1	Doris Hartwig	Country Feeling
14	2017-09-25	2	Liz Keyser	Caroline Coie Cuartet
15	2017-09-30	1	Deb Waldal	Saturday Revue
16	2017-09-30	2	Sarah Thompson	Jim Glynn
<< more rows here >>
110	2018-02-27	1	Peter Brehm	Julia Schnebly
111	2018-03-04	1	Mark Rosales	JV & the Deep Six

Recipes Database

“I’d like a numbered list of all of the recipes. Number the recipes overall, plus number each recipe within its recipe class. Sort the lists alphabetically by recipe name within recipe class. Don’t forget to include any recipe classes that don’t have any recipes in them.”

Translation/Clean Up

Select the RecipeClassDescription, the RecipeTitle, the ROW_NUMBER() ordered by RecipeClassDescription, RecipeTitle AS OverallNumber and the ROW_NUMBER() partitioned by RecipeClassDescription and ordered by RecipeTitle AS ClassNumber from the Recipe_Classes table inner joined with the Recipes table on Recipes.RecipeClass ID = Recipe_Classes.RecipeClassID

SQL

SELECT ROW_NUMBER() OVER (

ORDER BY RC.RecipeClassDescription,

R.RecipeTitle

) AS OverallNumber,

RC.RecipeClassDescription,

ROW_NUMBER() OVER (

PARTITION BY RC.RecipeClassDescription

ORDER BY R.RecipeTitle

) AS NumberInClass,

R.RecipeTitle

FROM Recipe_Classes AS RC

LEFT JOIN Recipes AS R

ON R.RecipeClassID = RC.RecipeClassID;

CH22_Recipe_Classes_Numbered (16 rows)

Overall Number	RecipeClass Description	Number InClass	RecipeTitle
1	Dessert	1	Coupe Colonel
2	Dessert	2	Trifle
3	Hors d’oeuvres	1	Machos Nachos
4	Hors d’oeuvres	2	Salsa Buena
5	Main course	1	Fettuccini Alfredo
6	Main course	2	Huachinango Veracruzana (Red Snapper, Veracruz style)
7	Main course	3	Irish Stew
8	Main course	4	Pollo Picoso
9	Main course	5	Roast Beef
10	Main course	6	Salmon Filets in Parchment Paper
11	Main course	7	Tourtière (French-Canadian Pork Pie)
12	Salad	1	Mike’s Summer Salad
13	Soup	1	NULL
14	Starch	1	Yorkshire Pudding
15	Vegetable	1	Asparagus
16	Vegetable	2	Garlic Green Beans

Examples Using RANK, DENSE_RANK, and PERCENT_RANK

Sales Orders Database

“Rank all employees by the number of orders with which they’re associated.”

Translation/Clean Up

Select the EmployeeID, EmpFirstName || ' ' || EmpLastName AS EmployeeName, COUNT(DISTINCT OrderNumber) AS OrdersReceived and RANK() OVER ordered by COUNT(DISTINCT OrderNumber DESC from the Employees table inner joined with the Orders table ON Orders.EmployeeID = Employees.EmployeeID inner joined with the Order_Details table on OrderDetails.OrderNumber = Orders.OrderNumber, grouped by EmployeeID, EmpFirstName and EmpLastName

SQL

SELECT E.EmployeeID, E.EmpFirstName || ' ' ||

E.EmpLastName AS Employee,

COUNT(DISTINCT O.OrderNumber) AS OrdersReceived,

RANK() OVER (

ORDER BY COUNT(DISTINCT O.OrderNumber) DESC

) AS Rank

FROM Employees AS E

INNER JOIN Orders AS O

ON O.EmployeeID = E.EmployeeID

INNER JOIN Order_Details AS OD

ON OD.OrderNumber = O.OrderNumber

GROUP BY E.EmployeeID, E.EmpFirstName,

E.EmpLastName;

CH22_Employee_Sales_Ranked (8 rows)

EmployeeID	Employee	OrdersReceived	Rank
707	Kathryn Patterson	138	1
708	Susan McLain	129	2
702	Mary Thompson	117	3
704	Carol Viescas	117	3
705	Kirk DeGrasse	115	5
701	Ann Patterson	109	6
706	David Viescas	104	7
703	Matt Berg	104	7

If you’re wondering why I included the Order_Details table in the above SQL, the answer is that I know there are some rows in the Orders table that do not have any matching rows in the Order_Details table. I don’t want to include orders that have no items specified, so including the Order_Details in an INNER JOIN with the Orders table eliminates those “empty” orders.

School Scheduling Database

“Rank the staff by how long they’ve been with us as of January 1, 2018. I don’t want to see any gaps in the rank numbers.”

(Remember that I showed you how to calculate the years of service in CH19_Length_Of_Service)

Translation/Clean Up

Select the StaffID, StfFirstName || ' ' || StfLastName AS StaffName, LengthOfService, and DENSE_ RANK() OVER ordered by LengthOfService DESC from table Staff

SQL

SELECT StaffID, StfFirstName || ' ' || StfLastname

AS StaffName,

YEAR(CAST('2018-01-01' As Date)) -

YEAR(DateHired) -

(CASE WHEN Month(DateHired) < 10

THEN 0

WHEN Month(DateHired) > 10

THEN 1

WHEN Day(DateHired) > 1

THEN 1

ELSE 0 END

) AS LengthOfService,

DENSE_RANK() OVER (ORDER BY YEAR

(CAST('2018-01-01' As Date)) - YEAR(DateHired) -

(CASE WHEN Month(DateHired) < 10

THEN 0

WHEN Month(DateHired) > 10

THEN 1

WHEN Day(DateHired) > 1

THEN 1

ELSE 0 END

) DESC) AS Rank

FROM Staff;

CH22_Staff_Service_Ranked (27 rows)

StaffID	StaffName	LengthOfService	Rank
98036	Sam Abolrous	27	1
98062	Caroline Coie	27	1
98025	Carol Viescas	26	2
98028	Alaina Hallmark	26	2
98010	Jeffrey Smith	26	2
98011	Ann Patterson	26	2
98007	Gary Hallmark	25	3
98020	Jim Glynn	25	3
98043	Kathryn Patterson	25	3
98052	Katherine Ehrlich	25	3
98013	Deb Waldal	24	4
98014	Peter Brehm	24	4
98005	Suzanne Viescas	24	4
98048	Joyce Bonnicksen	24	4
98059	Maria Patterson	24	4
98040	Jim Wilson	23	5
98030	Liz Keyser	22	6
98063	Kirk DeGrasse	22	6
98064	Luke Patterson	21	7
98055	Alastair Black	21	7
98057	Joe Rosales III	21	7
98012	Robert Brown	21	7
98021	Tim Smith	21	7
98019	Mariya Sergienko	20	8
98045	Michael Hernandez	20	8
98053	Caleb Viescas	18	9
98042	David Smith	18	9

Examples Using NTILE

Bowling League Database

“Rank all the teams from best to worst based on the average handicap score of all the players. Arrange the teams into four quartiles.”

Translation/Clean Up

Select the TeamName, AVG(HandiCapScore) and NTILE(4) OVER ordered by AVG(HandiCapScore) DESC from the Teams table inner joined with the Bowlers table ON Bowlers.TeamID = Teams.TeamID inner joined with the Bowler_Scores table on BowlerScores.BowlerID = Bowlers.BowerID, grouped by TeamName

SQL

SELECT Teams.TeamName,

ROUND(AVG(Bowler_Scores.HandiCapScore), 0)

AS AvgTeamHandicap,

NTILE(4) OVER (

ORDER BY ROUND(AVG(Bowler_Scores.HandiCap-

Score), 0)

DESC) AS Quartile

FROM Teams INNER JOIN Bowlers

ON Bowlers.TeamID = Teams.TeamID

INNER JOIN Bowler_Scores

ON Bowler_Scores.BowlerID = Bowlers.BowlerID

GROUP BY Teams.TeamName;

CH22_Teams_In_Quartiles (8 rows)

TeamName	AvgTeamHandicap	Quartile
Marlins	196	1
Barracudas	196	1
Manatees	196	2
Swordfish	195	2
Orcas	195	3
Dolphins	195	3
Sharks	195	4
Terrapins	195	4

I bet you’re wondering what’s going on here. Because there’s so little difference between the team handicaps, the database system sorts the teams by average handicap and then applies the value 1 to the first two rows, 2 to the second two rows, 3 to the fifth and sixth rows, and 4 to the remaining ones. In fact, the sorting within an equal is arbitrary and can vary from system to system. If you run both the Microsoft SQL Server and PostgreSQL examples, you’ll find that the choice of “top two” is entirely different.

Entertainment Agency Database

“Rank all the entertainers based on the number of engagements booked for each. Arrange the entertainers into three groups. Remember to include any entertainers who haven't been booked for any engagements.”

Translation/Clean Up

Select the EntStageName, COUNT(Engagements.EntertainerID), and NTILE(3) OVER ordered by COUNT(Engagements.EntertainerID) DESC from the Entertainers table left joined with the Engagements table ON Engagements.EntertainerID = Entertainers.EntertainerID, grouped by EntStageName

SQL

SELECT Entertainers.EntStageName AS Entertainer,

COUNT(Engagements.EntertainerID) AS Gigs,

NTILE(3) OVER (

ORDER BY COUNT(Engagements.EntertainerID) DESC

) AS [Group]

FROM Entertainers

LEFT JOIN Engagements

ON Engagements.EntertainerID = Entertainers.

EntertainerID

GROUP BY Entertainers.EntStageName;

CH22_Entertainer_3_Groups (13 rows)

Entertainer	Gigs	Group
Country Feeling	15	1
Carol Peacock Trio	11	1
Caroline Coie Cuartet	11	1
JV & the Deep Six	10	1
Modern Dance	10	1
Saturday Revue	9	2
Jim Glynn	9	2
Julia Schnebly	8	2
Coldwater Cattle Company	8	2
Jazz Persuasion	7	3
Topazz	7	3
Susan McLain	6	3
Katherine Ehrlich	0	3

Examples Using Aggregate Functions

Bowling League Database

“For each team, show me the details of all of the games bowled by the team captains. Include the date and location for each tournament, their handicap score, whether or not they won the game. Include counts for how many games they won and their average handicap score.”

Translation/Clean Up

Select the TeamName, BowlerFirstName || ' ' || BowlerLastName AS Captain, TourneyDate, TourneyLocation, HandiCapScore, CASE WonGame WHEN 1 THEN ‘Won’ ELSE ‘Lost’ END As WonGame, SUM(CAST(WonGame AS INT) OVER partitioned by TeamName AS TotalWindws, AVG(HandiCapScore) OVER (partitioned by TeamName) AS AvgHandicap from table Teams inner joined with the Bowlers table ON Bowler.BowlerID = Teams.CaptainID inner joined with the Bowler_Scores table on Bowler_Scores.BowlerID = Teams.CaptainID inner joined with the Match_Games table on MatchGames.MatchID = Bowler_Scores.MatchID and MatchGames.GameNumber = Bowler_Scores.GameNumber inner joined with the Tourney_Matches table on Tourney_Matches.MatchID = Match_Games.MatchID inner joined with the Tournaments table on Tournaments.TourneyID = Tourney_Matches.TourneyID

SQL

SELECT Teams.TeamName,

B.BowlerFirstName || ' ' || B.BowlerLastName

AS Captain,

T.TourneyDate, T.TourneyLocation,

BS.HandiCapScore,

CASE BS.WonGame

WHEN 1 THEN 'Won'

ELSE 'Lost'

END AS WonGame,

SUM(CAST(BS.WonGame AS INT)) OVER (

PARTITION BY Teams.TeamName

) AS TotalWins,

AVG(BS.HandiCapScore) OVER (

PARTITION BY Teams.TeamName

) AS AvgHandicap

FROM Teams

INNER JOIN Bowlers AS B

ON B.BowlerID = Teams.CaptainID

INNER JOIN Bowler_Scores AS BS

ON BS.BowlerID = B.BowlerID

INNER JOIN Match_Games AS MG

ON MG.MatchID = BS.MatchID

AND MG.GameNumber = BS.GameNumber

INNER JOIN Tourney_Matches AS TM

ON TM.MatchID = MG.MatchID

INNER JOIN Tournaments AS T

ON T.TourneyID = TM.TourneyID;

CH22_Comparing_Team_Captains (420 rows)

TeamName	Captain	Tourney Date	Tourney Location	HandiCap Score	Won Game	Total Wins	Avg Handicap
Barracudas	Richard Sheskey	2017-09-04	Red Rooster Lanes	195	Won	25	194
Barracudas	Richard Sheskey	2017-09-04	Red Rooster Lanes	191	Lost	25	194
Barracudas	Richard Sheskey	2017-09-04	Red Rooster Lanes	195	Won	25	194
Barracudas	Richard Sheskey	2017-09-11	Thunderbird Lanes	201	Won	25	194
Barracudas	Richard Sheskey	2017-09-11	Thunderbird Lanes	201	Won	25	194
Barracudas	Richard Sheskey	2017-09-11	Thunderbird Lanes	188	Won	25	194
Barracudas	Richard Sheskey	2017-09-18	Bolero Lanes	196	Won	25	194
Barracudas	Richard Sheskey	2017-09-18	Bolero Lanes	189	Lost	25	194
Barracudas	Richard Sheskey	2017-09-18	Bolero Lanes	193	Won	25	194
Barracudas	Richard Sheskey	2017-09-25	Imperial Lanes	190	Lost	25	194
Barracudas	Richard Sheskey	2017-09-25	Imperial Lanes	188	Lost	25	194
Barracudas	Richard Sheskey	2017-09-25	Imperial Lanes	198	Won	25	194
Barracudas	Richard Sheskey	2017-10-02	Sports World Lanes	195	Lost	25	194
Barracudas	Richard Sheskey	2017-10-02	Sports World Lanes	189	Won	25	194
Barracudas	Richard Sheskey	2017-10-02	Sports World Lanes	190	Lost	25	194
Barracudas	Richard Sheskey	2017-10-09	Totem Lanes	191	Lost	25	194
Barracudas	Richard Sheskey	2017-10-09	Totem Lanes	200	Won	25	194
Barracudas	Richard Sheskey	2017-10-09	Totem Lanes	191	Lost	25	194
Barracudas	Richard Sheskey	2017-10-16	Acapulco Lanes	190	Won	25	194
Barracudas	Richard Sheskey	2017-10-16	Acapulco Lanes	200	Won	25	194
Barracudas	Richard Sheskey	2017-10-16	Acapulco Lanes	187	Lost	25	194
<< more rows here >>

Note: I could have just as easily partitioned by BowlerFirstName, BowlerLastName rather than by TeamName, but I feel it’s usually a good idea not to group (or partition) by people names, since they’re often not unique. Other possibilities would have been BowlerID or CaptainID.

Sales Orders Database

I’ll confess that this is yet another somewhat artificial example of using ROWS to try and give you another example of how it works.

“For each order, give me a list of the customer, the product, and the quantity ordered. Give me the total quantity of products for each order. As well, for every group of three products on the invoice, show me their total and the highest and lowest value.”

Translation/Clean Up

Select the CustFirstName || ' ' || CustLastName, OrderNumber, ProductName, QuantityOrders, SUM(QuantityOrdered) OVER (partitioned by OrderNumber) AS TotalQuantity, SUM(QuantityOrdered) OVER (partitioned by OrderNumber ordered by OrderNumber ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS Quantity3, MIN(QuantityOrdered) OVER (partitioned by OrderNumber ordered by OrderNumber ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS Minimum3, MAX(QuantityOrdered) OVER (partitioned by OrderNumber ordered by OrderNumber ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS Maximum3 from the Orders table inner joined with the Order_Details table ON Order_Details.OrderNumber = Orders.OrderNumber inner joined with the Customers table on Customers.CustomerID = Orders.CustomerID inner joined with the Products table on Products.ProductNumber = Order_Details.ProductNumber

SQL

SELECT O.OrderNumber,

C.CustFirstName || ' ' || C.CustLastName AS

Customer,

O.OrderNumber,

P.ProductName,

OD.QuantityOrdered,

SUM(OD.QuantityOrdered) OVER (

PARTITION BY O.OrderNumber

) AS TotalQuantity,

SUM(OD.QuantityOrdered) OVER (

PARTITION BY O.OrderNumber

ORDER BY O.OrderNumber

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS Quantity3,

MIN(OD.QuantityOrdered) OVER (

PARTITION BY O.OrderNumber

ORDER BY O.OrderNumber

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS Minimum3,

MAX(OD.QuantityOrdered) OVER (

PARTITION BY O.OrderNumber

ORDER BY O.OrderNumber

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS Maximum3

FROM Orders AS O

INNER JOIN Order_Details AS OD

ON OD.OrderNumber = O.OrderNumber

INNER JOIN Customers AS C

ON C.CustomerID = O.CustomerID

INNER JOIN Products AS P

ON P.ProductNumber = OD.ProductNumber;

CH22_Orders_Min_Max (3973 rows)

Order Number	Customer	Product Name	Quantity Ordered	Total Quantity	Quantity3	Minimum3	Maximum3
1	David Smith	Trek 9000 Mountain Bike	2	24	5	2	3
1	David Smith	Viscount Mountain Bike	3	24	9	2	4
1	David Smith	GT RTS-2 Mountain Bike	4	24	8	1	4
1	David Smith	ProFormance ATB All-Terrain Pedal	1	24	8	1	4
1	David Smith	Dog Ear Aero-Flow Floor Pump	3	24	9	1	5
1	David Smith	Glide-O-Matic Cycling Helmet	5	24	14	3	6
1	David Smith	Ultimate Export 2G Car Rack	6	24	11	5	6
2	Suzanne Viescas	X-Pro All Weather Tires	4	8	8	4	4
2	Suzanne Viescas	Ultimate Export 2G Car Rack	4	8	8	4	4
3	William Thompson	Trek 9000 Mountain Bike	5	28	10	5	5
3	William Thompson	Viscount Mountain Bike	5	28	11	1	5
3	William Thompson	GT RTS-2 Mountain Bike	1	28	8	1	5
3	William Thompson	ProFormance ATB All-Terrain Pedal	2	28	6	1	3
3	William Thompson	Dog Ear Aero-Flow Floor Pump	3	28	8	2	3
3	William Thompson	Glide-O-Matic Cycling Helmet	3	28	11	3	5
3	William Thompson	True Grip Competition Gloves	5	28	12	3	5
3	William Thompson	Cosmic Elite Road Warrior Wheels	4	28	9	4	5
4	Andrew Cencini	Trek 9000 Mountain Bike	4	19	7	3	4
<< more rows here >>

The QuantityOrdered column is hopefully straightforward to understand. The first order (number 1, ordered by David Smith) consisted of seven products. David ordered 2 Trek 9000 Mountain Bikes, 3 Viscount Mountain Bikes, 4 GT RTS-2 Mountain Bikes, 1 ProFormance ATB All-Terrain Pedal, 3 Dog Ear Aero-Flow Floor Pumps, 5 Glide-O-Matic Cycling Helmets, and 6 Ultimate Export 2G Car Racks, for a total of 24 items. The second order (number 2, ordered by Suzanne Viescas) consisted of 4 X-Pro All Weather Tires and 4 Ultimate Export 2G Car Racks for a total of 8 items. That’s pretty standard.

The other three columns are the off-beat ones. I asked for the products on each invoice to be divided into groups of three, and for the sum, the minimum and the maximum of each of those groups. Remember how ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING works. Because I’m partitioning on Order, there is no preceding row for the first product on the first order (Trek 9000 Mountain Bike), so only the current row (with a quantity of 2) and the following row (with a quantity of 3) are considered. The total of those quantities is 5, the minimum is 2, and the maximum is 3. The next row (Viscount Mountain Bike) does have a preceding row, so there are three quantities to be considered: 2 (the preceding row), 3 (the current row), and 4 (the following row). The sum of those quantities is 9, the minimum is 2, and the maximum is 4. Skip down to the seventh row (Ultimate Export 2G Car Rack). Again, because I’m partitioning on Order, there is no following row, so only the preceding row (with a quantity of 5) and the current row (with a quantity of 6) are considered. The total of those two quantities is 11, the minimum is 5, and the maximum is 6.

Because the second order consists of only two rows, those two rows (both with quantities of 4) are the only ones considered in both cases. The sum of the quantities of those two rows is 8, the minimum is 4, and the maximum is 4.

For the third order, there is no preceding row for the Trek 9000 Mountain Bike row, so only it and the following Viscount Mountain Bike row are considered for the first calculation. Similarly, there is no following row for the Cosmic Elite Road Warrior Wheels row, so only it and the preceding True Grip Competition Gloves row are considered for the last calculation.

School Scheduling Database

“For each subject, give me the highest mark that’s been received. Also, show me the highest mark that’s been received for each category of subjects, as well as the highest mark that’s been received overall.”

Translation/Clean Up

Select the CategoryID, SubjectCode, SubjectName, MAX(Grade) OVER (partioned by SubjectID) AS SubjectMax, MAX(Grade) OVER (partitioned by CategoryID) AS CategoryMax, MAX(Grade) OVER() AS OverallMax from the Subjects table inner joined with the Classes table ON Classes.ClassID = Subjects.ClassID inner joined with the Student_Schedules table on Student_Schedules.ClassID = Classes.ClassID where ClassStatus = 2

SQL

SELECT DISTINCT Subjects.CategoryID,

Subjects.SubjectCode, Subjects.SubjectName,

MAX(Student_Schedules.Grade) OVER (

PARTITION BY Subjects.SubjectID

) AS SubjectMax,

MAX(Student_Schedules.Grade) OVER (

PARTITION BY Subjects.CategoryID

) AS CategoryMax,

MAX(Student_Schedules.Grade) OVER (

) AS OverallMax

FROM Subjects

INNER JOIN Classes

ON Classes.SubjectID = Subjects.SubjectID

INNER JOIN Student_Schedules

ON Student_Schedules.ClassID = Classes.ClassID

WHERE Student_Schedules.ClassStatus = 2;

CH22_Top_Marks (14 rows)

CategoryID	SubjectCode	SubjectName	Subject Max	Category Max	Overall Max
ACC	ACC 210	Financial Accounting Fundamentals I	91.12	91.12	99.83
ART	ART 100	Introduction to Art	99.83	99.83	99.83
ART	ART 210	Computer Art	87.65	99.83	99.83
ART	ART 251	Art History	97.81	99.83	99.83
BIO	BIO 101	General Biology	94.54	94.54	99.83
CHE	CHE 139	Fundamentals of Chemistry	98.31	98.31	99.83
CIS	CIS 101	Microcomputer Applications	98.01	98.01	99.83
ENG	ENG 101	Composition - Fundamentals	98.07	98.07	99.83
ENG	ENG 102	Composition - Intermediate	88.91	98.07	99.83
HIS	HIS 111	U.S. History to 1877	87.14	87.14	99.83
MAT	MAT 080	Preparatory Mathematics	93.19	94.33	99.83
MAT	MAT 097	Elementary Algebra	94.33	94.33	99.83
MUS	MUS 100	Music in the Western World	97.84	97.84	99.83
MUS	MUS 101	First Year Theory and Ear Training	86.57	97.84	99.83

Summary

I began the chapter by describing to you why you might want to aggregate data differently from how you were shown in previous chapters and included an example to show one possibility.

I then explored ways to use “windows” into your data and talked about how the OVER() clause is the key to using window functions, and I explained the difference between GROUP BY and OVER().

I showed you how PARTITION BY, ORDER BY, and ROWS or RANGE influence your results.

• The PARTITION BY predicate divides the result set from the query into data subsets.

• The ORDER BY predicate controls the order that the rows are evaluated.

• The ROWS or RANGE predicate determines the subset of rows within the partition that are to be applied (and can only be used with aggregate functions).

I went on to show that window functions include all of the aggregate functions I’d already shown you, plus several others: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK and NTILE. Then I described the use of both the PARTITION BY and ORDER BY predicates with the OVER() clause. Finally, I gave several examples of how to use window functions to solve real-world problems.

The following section presents some requests that you can work out on your own.

Problems for You to Solve

Bowling League Database

1.“Divide the teams into quartiles based on the best raw score bowled by any member of the team.”

You can find my solution in CH22_Team_Quartiles_Best_RawScore (8 rows).

2. “Give me a list of all of the bowlers in the league. Number the bowlers overall, plus number them within their teams, sorting their names by LastName then FirstName.”

You can find my solution in CH22_Bowler_Numbers (32 rows).

3. “Rank all of the bowlers in the league by their average handicap score. Show me the “standard” ranking, but also show me the ranking with no gaps.”

You can find my solution in CH22_Bowler_Ranks (32 rows).

Entertainment Agency Database

1. “Rank all the agents based on the total dollars associated with the engagements that they’ve booked. Make sure to include any agents that haven’t booked any acts.”

You can find my solution in CH22_Agent_Ranks (9 rows).

2. “Give me a list of all of the engagements our entertainers are booked for. Show me the entertainer’s stage name, the customer’s name, and the start date for each engagements, as well as the total number of engagements booked for each entertainer.”

You can find my solution in CH22_Entertainer_Engagements (111 rows).

3. “Give me a list of all of the Entertainers and their members. Number each member within a group.”

You can find my solution in CH22_Entertainer_Lists (40 rows).

Recipes Database

1. “Give me a list of all of the recipes I've got. For each recipe, I want to see all of the ingredients in the recipe, plus a count of how many different ingredients there are.”

You can find my solution in CH22_Recipe_Ingredient_Counts (88 rows).

2. “I’d like a list of all the different ingredients, with each recipe that contains that ingredient. While you’re at it, give me a count of how many recipes there are that use each ingredient.”

You can find my solution in CH22_Ingredient_Recipe_Counts (88 rows).

3. “I want a numbered list of all of the ingredients. Number the ingredients overall, plus number each ingredient within its ingredient class. Sort the lists alphabetically by ingredient within ingredient class. Don’t forget to include any ingredient classes that don’t have any ingredients in them.”

You can find my solution in CH22_Ingredients_By_Ingredient_Class (83 rows, including four classes with no ingredients in them).

Sales Orders Database

1. “Show totals for each invoice, ranking them from highest purchase value to lowest.”

You can find my solution in CH22_Order_Totals_RankedByInvoiceTotal (933 rows).

2. “Produce a list of each category and the total purchase price of all products in each category. Include a column showing the total purchase price regardless of category as well.”

You can find my solution in CH22_Sales_Totals (6 rows).

How would you write this differently if you knew there were categories that had no sales in them? (The sample database does have sales in each category, but you can always add a new category with no sales to see whether your query works!) My solution is CH22_Sales_Totals_Handle_Nulls. If you didn't add a new category, it would return the same as CH22_Sales_Totals (6 rows). If you did add a new category, it would return seven rows.

3. “Rank each customer by the number of orders which they’ve placed. Be sure to include customers that haven’t placed any orders yet.”

You can find my solution in CH22_Customer_Order_Counts_Ranked (28 rows, including Jeffrey Tirekicker having placed no orders).

School Scheduling Database

1. “Rank the students in terms of the number of classes they’ve completed.”

You can find my solution in CH22_Student_Class_Totals_Rank (18 rows). (Pretty homogenous class, isn’t it?)

2. “Rank the faculty in terms of the number of classes they’re teaching.”

You can find my solution in CH22_Staff_Class_Totals_Rank (22 rows).

3. “Arrange the students into 3 groups depending on their average grade for each of the classes they’ve completed.”

You can find my solution in CH22_Student_AverageGrade_Groups (18 rows).

4. “For each student, give me a list of each course he or she has completed and the mark he or she got in that course. Show me their overall average for all the completed courses, plus, for every group of three courses, show me their average and the highest and lowest marks of the three highest marks that have been received.”

You can find my solution in CH22_Marks_Min_Max (68 rows).