21
Performing Complex Calculations on Groups

Before a group can enter the open society, it must first close ranks.

STOKELY CARMICHAEL

Topics in this Chapter

Grouping in Sub-Groups

Extending the GROUP BY Clause

Getting Totals in a Hierarchy Using ROLLUP

Calculating Totals on Combinations Using CUBE

Creating a Union of Totals with GROUPING SETS

Variations on Grouping Techniques

Sample Statements

Summary

Problems for You to Solve

In Part IV, I showed you how to summarize and group data. Specifically, in Chapter 12, “Simple Totals,” I showed how to calculate totals, counts, and other aggregate functions, in Chapter 13, “Grouping Data,” I showed how to group data, and in Chapter 14, “Filtering Grouped Data,” I showed both how to filter the grouped data, as well as how to filter the data that gets grouped. However, I’ve always found that it can be a somewhat limited way to report the data. You can only group one specific way at a time, and sometimes that’s just not enough.

Let’s see whether we can do a little more with grouping in this chapter.

Grouping in Sub-Groups

In Chapter 13, I showed how to use the GROUP BY clause to specify one or more columns to specify how to group the aggregated data. When you specify more than one column, the aggregation is done for each unique combination of values for all of those columns.

Let's look at our Student Population. Using the techniques I’ve already shown in Chapter 13, I can build a query to fetch the data summarizing by state, gender, and marital status. Here’s the SQL:

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY StudState, StudGender, StudMaritalStatus;

The result looks like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_GROUP_BY in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

CA

F

W

1

CA

M

S

1

OR

F

M

1

OR

F

S

1

OR

M

S

2

TX

F

S

2

TX

M

S

1

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

M

S

4

It’s a useful way of looking at the data, but what if I also want to know the total number of students by State, or the total number of Female students? I’d have to create one or more additional queries for any other breakdowns I want to see.

What if there were another way to retrieve the data? Perhaps I could return something like the following:

State

Gender

MaritalStatus

Number

Any State

Any Gender

Any Status

18

Any State

Any Gender

D

1

Any State

Any Gender

M

2

Any State

Any Gender

S

14

Any State

Any Gender

W

1

Any State

F

Any Status

10

Any State

F

D

1

Any State

F

M

2

Any State

F

S

6

Any State

F

W

1

Any State

M

Any Status

8

Any State

M

S

8

CA

Any Gender

Any Status

2

CA

Any Gender

S

1

CA

Any Gender

W

1

CA

F

Any Status

1

CA

F

W

1

CA

M

Any Status

1

CA

M

S

1

OR

Any Gender

Any Status

4

OR

Any Gender

M

1

OR

Any Gender

S

3

OR

F

Any Status

2

OR

F

M

1

OR

F

S

1

OR

M

Any Status

2

OR

M

S

2

TX

Any Gender

Any Status

3

TX

Any Gender

S

3

TX

F

Any Status

2

TX

F

S

2

TX

M

Any Status

1

TX

M

S

1

WA

Any Gender

Any Status

9

WA

Any Gender

D

1

WA

Any Gender

M

1

WA

Any Gender

S

7

WA

F

Any Status

5

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

M

Any Status

4

WA

M

S

4

This new table may require some explanation. Look at the shaded rows: they represent the 11 rows of data returned by the first query that give the number of students broken down by the various combination of State, Gender, and Marital Status. However, let’s look at some of the other rows in that table. In them, one or more of the columns represents all the values. For instance, the first row represents the total number of students, regardless of the state they’re from, their gender, or their marital status (18 in this case). The next row represents the total number of Divorced students, regardless of the state they’re from or their gender (one in this case). The sixth row represents the total number of Female students, regardless of the state they’re from or their marital status (10 in this case).

Does this look like it might be useful in analyzing the demographics of your student population? I’ll show you how to achieve this in the following section. (And if you’re curious and want to look at the SQL, I saved the above request as CH21_Students_State_Gender_MaritalStatus_Count_CUBE_No_Nulls in the Student Scheduling sample database.)

Extending the GROUP BY Clause

What I talked about in Chapter 13 applies to the more comprehensive groupings I’m going to talk about in this chapter. In fact, when you look at the syntax for these more comprehensive groupings, you’ll see it’s almost identical to the diagram shown there.

Syntax

Let’s take a close look at the complete GROUP BY clause. Figure 21-1 shows the basic diagram for a SELECT statement with GROUP BY expanded to show the additional features you’ll learn about in this chapter. If you compare this figure to Figure 13-1, you can see the difference is the addition of one of three different keywords (ROLLUP, CUBE, or GROUPING SETS), plus the fact that the Column Reference clause for each of these is enclosed in parentheses.

As you might expect, each of the different keywords results in different results being returned.

Note: Because the extensions to GROUP BY are just variations on the GROUP BY syntax, all of the restrictions discussed in Chapter 13 apply equally here. Specifically, any column that’s listed in the SELECT clause that’s not part of an aggregate expression must be included in the GROUP BY clause, and the GROUP BY clause must refer to columns created by the FROM and WHERE, not by expressions created in the SELECT clause.

Image

Figure 21-1Full syntax for the GROUP BY clause

Getting Totals in a Hierarchy Using Rollup

As you’ve already seen, when you use GROUP BY, the results summarize the data for all existing combinations of values that exist for those columns. When you add ROLLUP to the grouping clause, your database system adds new rows to produce group subtotals, plus a grand total. If there are n columns listed in the ROLLUP, there will be n+1 levels of subtotals. Note that the subtotals are added from right to left, so the order in which you list the columns in the clause is important.

Note: You can find all the sample statements and solutions in the respective sample databases—SalesOrdersExample, Entertainment AgencyExample, RecipesExample, SchoolSchedulingExample, and BowlingLeagueExample. Because neither Microsoft Access nor MySQL support grouping sets, you’ll find the sample solutions only in the Microsoft SQL Server and PostgreSQL sample databases.

Suppose you wanted a total by student state, student gender, and student marital status, but with subtotals by state and by state and gender. Your request might look like the following:

Note: Throughout this chapter, I use the “Request/Translation/Clean Up/SQL” technique introduced in Chapter 4, “Creating a Simple Query.” Because this process should now be very familiar to you, I have combined the Translation/Clean Up steps for all the following examples to simplify the process.

“Show me the count for all unique combinations of student state, student gender, and student marital status, summarized for each combination of state and gender and the total by state.”

Translation/Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by and rolled up by student state, student gender, and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY ROLLUP

(StudState, StudGender, StudMaritalStatus);

Note: When you find yourself using “summarized for each unique combination” of a subset but not all combinations, you should replace that with the ROLLUP keyword.

That request results look like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_ROLLUP_Order1 in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

CA

F

W

1

CA

F

NULL

1

CA

M

S

1

CA

M

NULL

1

CA

NULL

NULL

2

OR

F

M

1

OR

F

S

1

OR

F

NULL

2

OR

M

S

2

OR

M

NULL

2

OR

NULL

NULL

4

TX

F

S

2

TX

F

NULL

2

TX

M

S

1

TX

M

NULL

1

TX

NULL

NULL

3

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

F

NULL

5

WA

M

S

4

WA

M

NULL

4

WA

NULL

NULL

9

NULL

NULL

NULL

18

Let’s look at those results. When a column on a particular row is Null, you can interpret that to mean that it represents all the values.

Note: Remember in the Introduction that I warned you that results from each database system won’t necessarily match the sort order you see in examples in this book unless you include an ORDER BY clause. Even when you include that specification, the system might return results in columns not included in the ORDER BY clause in a different sequence because of different optimization techniques.

If you’re running the examples in Microsoft SQL Server, simply selecting the rows from the view does not honor any ORDER BY clause specified in the view. You must open the design of the view and execute it from there to see the ORDER BY clause honored.

Also, when you use GROUP BY, you’ll often see the results returned by your database system as though the rows are sorted by the columns you specified. This happens because some optimizers first sort the data internally to make it faster to process your GROUP BY. Keep in mind that if you want a specific sort order, you must also include an ORDER BY clause.

The first row shows that there is one widowed female from California. Because there are no other females from California, the second row summarizes that there is one female from California. (Remember that the Null value in the StudMaritalStatus means “Any Status”). The third row shows that there is one single male from California. Again, because there are no other males from California, the fourth row summarizes that there is one male from California. After covering all the genders and marital statuses recorded in the table, the fifth row summarizes that there are two students from California.

Rows six and seven indicate that there’s one married female from Oregon and one single female from Oregon respectively. That’s all the females from Oregon, so row eight summarizes that there are two females from Oregon.

This summarization continues to be interspersed with the grouping results until the last row, which summarizes that there are eighteen students in total.

Because there are three columns listed in the ROLLUP clause (StudState, StudGender, and StudMaritalStatus, in that order), there are four levels of subtotals:

1. Unique combinations of StudState, StudGender, and StudMaritalStatus

2. Unique combinations of StudState and StudGender regardless of StudMaritalStatus values

3. Unique values of StudState regardless of StudGender and StudMaritalStatus values

4. Grand total

Note: The query actually returns Null as the value for the columns that are being rolled up. If you like, you can use the CASE expression or the ISNULL or COALESCE functions (in SQL Server) or the CASE expression or COALESCE function (in PostgreSQL) to convert that Null value to something more meaningful in the results.

Perhaps the user would like to see words instead of Null. This query actually returns Null as the value for the columns that are being rolled up. However, that will not always be the case if the column in your table actually contains NULL values. (None of the columns do in this case.) So, you could use the CASE expression or the ISNULL or COALESCE functions (in SQL Server) or the CASE expression or COALESCE function (in PostgreSQL) to convert that Null value to something more meaningful in the results. But there’s a better way. There’s a cool function called GROUPING as shown in Figure 21-2.

Image

Figure 21-2The GROUPING function

This handy little function—when used on a column that is being grouped using CASE, ROLLUP, or GROUPING SETS—returns a numeric value to indicate the level of grouping. When the value returned is zero, the value is not “rolled up” or “summarized,” so you can display the column value. When the value is other than zero, the column is being summarized, so it will definitely contain a Null value. You can use this function in a CASE expression (See Chapter 19, “Condition Testing”) to decide whether to display the column value or an “Any” literal.

So, the request could have been like this:

“Show me the count for all unique combinations of student state, student gender, and student marital status, summarized for each combination of state and gender and the total by state. Show ‘Any State,’ ‘Any Gender,’ or ‘Any Status’ for the subtotaled rows.”

Translation/Clean Up

Select when (CASE WHEN the grouping level of GROUPING(student state) is = 0 then display student state else ‘Any State’ END), when (CASE WHEN the grouping level of GROUPING(student gender) is = 0 then display student gender else ‘Any Gender’ END), when (CASE WHEN the grouping level of GROUPING(student marital status ) is = 0 then display student marital status else ‘Any Status’ END), and the count (*) of rows from the Students table, grouped by and rolled up by student state, student gender, and student marital status

SQL

SELECT (CASE WHEN GROUPING(StudState) = 0

   THEN StudState

   ELSE 'Any State' END) AS State,

 (CASE WHEN GROUPING(StudGender) = 0

THEN StudGender

ELSE 'Any Gender') AS Gender,

 (CASE WHEN GROUPING(StudMaritalStatus) = 0

THEN StudMaritalStatus

ELSE 'Any Status') AS MaritalStatus,

 Count(*) AS Number

FROM Students

GROUP BY ROLLUP (StudState, StudGender,

StudMaritalStatus);

That query returns the following table (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_ROLLUP_No_Nulls in the School Scheduling Example database.):

State

Gender

MaritalStatus

Number

CA

F

W

1

CA

F

Any Status

1

CA

M

S

1

CA

M

Any Status

1

CA

Any Gender

Any Status

2

OR

F

M

1

OR

F

S

1

OR

F

Any Status

2

OR

M

S

2

OR

M

Any Status

2

OR

Any Gender

Any Status

4

TX

F

S

2

TX

F

Any Status

2

TX

M

S

1

TX

M

Any Status

1

TX

Any Gender

Any Status

3

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

F

Any Status

5

WA

M

S

4

WA

M

Any Status

4

WA

Any Gender

Any Status

9

Any State

Any Gender

Any Status

18

The above request returns four levels of subtotals:

1. Unique combinations of StudState, StudGender, and StudMaritalStatus

2. Unique combinations of StudState and StudGender regardless of StudMaritalStatus values

3. Unique values of StudState regardless of StudGender and StudMaritalStatus values

4. Grand total

Note: The query above illustrates the rule that the GROUP BY clause must refer to columns created by the FROM and WHERE, not by expressions created in the SELECT clause. You can see that the columns have been assigned aliases in the SELECT clause, but the GROUP BY clause cannot use those aliases: it must use the original names of the columns.

Let’s take a look at what happens when you use a different column sequence in ROLLUP. Consider the following request:

“Show me the count for all unique combinations of student marital status, student gender, and student state, summarized for each combination of marital status with gender and a total by marital status.”

Translation/Clean Up

Select the student marital status, student gender, student state, and the count (*) of rows from the Students table, grouped by and rolled up by student marital status, student gender, and student state

SQL

SELECT StudMaritalStatus, StudGender, StudState,

Count(*) AS Number

FROM Students

GROUP BY ROLLUP

(StudMaritalStatus, StudGender, StudState);

The result returned looks like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_ROLLUP_Order2 in the School Scheduling Example database.)

StudMaritalStatus

StudGender

StudState

Number

D

F

WA

1

D

F

NULL

1

D

NULL

NULL

1

M

F

OR

1

M

F

WA

1

M

F

NULL

2

M

NULL

NULL

2

S

F

OR

1

S

F

TX

2

S

F

WA

3

S

F

NULL

6

S

M

CA

1

S

M

OR

2

S

M

TX

1

S

M

WA

4

S

M

NULL

8

S

NULL

NULL

14

W

F

CA

1

W

F

NULL

1

W

NULL

NULL

1

NULL

NULL

NULL

18

Rather than 24 rows returned by the first query, there are now only 21 rows returned. Whereas before it was easy to see that there are two students from California, four students from Oregon, three students from Texas, and nine students from Washington making up the 18 students at the school, now it’s easy to see that there is one divorced student, two married students, fourteen single students, and one widowed student.

And because the three columns listed in the ROLLUP clause are StudMaritalStatus, StudGender, and StudState, in that order, the four levels of subtotals are:

1. Unique combinations of StudMaritalStatus, StudGender, and StudState

2. Unique combinations of StudMaritalStatus and StudGender regardless of StudState values

3. Unique values of StudMaritalStatus regardless of StudGender and StudState values

4. Grand total

Notice that the focus is now on marital status, so that is the first column in the ROLLUP. Putting gender second generates a subtotal for each combination of marital status and gender, and finally adding state creates totals for each combination of marital status, gender, and state. The totals are “rolled up” first into gender and then into marital status, with a final “rolled up” grand total of the count of all rows.

If the request is as follows:

“Show me the count for all unique combinations of student state, student gender, and student marital status, summarized for each combination of marital status with gender and a total by marital status.”

Translation/Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by and rolled up by student marital status, student gender, and student state

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY ROLLUP

(StudMaritalStatus, StudGender, StudState);

All I did was change the order of the columns in the SELECT clause. The result returned looks like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_ROLLUP_Order3 in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

WA

F

D

1

NULL

F

D

1

NULL

NULL

D

1

OR

F

M

1

WA

F

M

1

NULL

F

M

2

NULL

NULL

M

2

OR

F

S

1

TX

F

S

2

WA

F

S

3

NULL

F

S

6

CA

M

S

1

OR

M

S

2

TX

M

S

1

WA

M

S

4

NULL

M

S

8

NULL

NULL

S

14

CA

F

W

1

NULL

F

W

1

NULL

NULL

W

1

NULL

NULL

NULL

18

There are still 21 rows returned, and you still get the following sets of totals:

1. Unique combinations of StudMaritalStatus, StudGender, and StudState

2. Unique combinations of StudMaritalStatus and StudGender regardless of StudState values

3. Unique values of StudMaritalStatus regardless of StudGender and StudState values

4. Grand total

However, because I listed the StudState column first, it appears first in the output.

Note: It should be noted that MySQL does, in fact, support the ROLLUP extension. However, the syntax for using it is different, which is why I chose to omit it from the examples. Check your MySQL documentation for more information. (MySQL does not, however, support either the CUBE or GROUPING SETS extensions.)

Calculating Totals on Combinations Using CUBE

You saw how using ROLLUP results in group subtotals from right to left, plus a grand total. The CUBE extension will generate those same group subtotals, but will also produce subtotals for all combinations of the columns specified in the CUBE clause. If there are n columns listed in the CUBE, there will be 2n subtotal combinations generated.

You’ve already seen the results of using the CUBE extension as the second table of results above, although I will admit that I cheated and added an ORDER BY clause to make the different subtotal rows stand out.

If the request were as follows:

“Show me the count for all combinations of student state, student gender, and student marital status, with summarized sets for each combination of state, gender, and marital status, for each combination of state and gender, state and marital status, gender and marital status, and for each state, gender, and marital status on its own.”

Translation / Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, summarized in sets GROUP BY CUBE student state, student gender, and student marital status and by all combinations of pairs of student state, student gender and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY CUBE

(StudState, StudGender, StudMaritalStatus);

Note: When you find yourself using “and each combination of … every combination” or “summarized in sets,” you should replace that with the CUBE keyword.

The results look like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_CUBE_Order1 in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

WA

F

D

1

NULL

F

D

1

NULL

NULL

D

1

OR

F

M

1

WA

F

M

1

NULL

F

M

2

NULL

NULL

M

2

OR

F

S

1

TX

F

S

2

WA

F

S

3

NULL

F

S

6

CA

M

S

1

OR

M

S

2

TX

M

S

1

WA

M

S

4

NULL

M

S

8

NULL

NULL

S

14

CA

F

W

1

NULL

F

W

1

NULL

NULL

W

1

NULL

NULL

NULL

18

CA

NULL

S

1

CA

NULL

W

1

CA

NULL

NULL

2

OR

NULL

M

1

OR

NULL

S

3

OR

NULL

NULL

4

TX

NULL

S

3

TX

NULL

NULL

3

WA

NULL

D

1

WA

NULL

M

1

WA

NULL

S

7

WA

NULL

NULL

9

CA

F

NULL

1

OR

F

NULL

2

TX

F

NULL

2

WA

F

NULL

5

NULL

F

NULL

10

CA

M

NULL

1

OR

M

NULL

2

TX

M

NULL

1

WA

M

NULL

4

NULL

M

NULL

8

There are three columns listed in the CUBE, so your database system generates 23, or 8, subtotals:

1. Unique combinations of StudState, StudGender, and StudMaritalStatus

2. Unique combinations of StudState and StudGender regardless of StudMaritalStatus values

3. Unique combinations of StudState and StudMaritalStatus regardless of StudGender values

4. Unique combinations of StudGender and StudMaritalStatus regardless of StudState values

5. Unique values of StudState regardless of StudGender and StudMaritalStatus values

6. Unique values of StudGender regardless of StudState and StudMaritalStatus values

7. Unique values of StudMaritalStatus regardless of StudState and StudGender combinations

8. Grand total

As you might expect from the fact that a CUBE produces subtotals for all combinations of the columns specified in the GROUP BY CUBE clause, changing the order of the columns in the SQL statement doesn’t affect the results (other than the order).

If instead, the request was as follows:

“Show me the count for all combinations of student marital status, student gender, and student state, with summarized sets for each combination of marital status, gender, and state, for each combination of state and gender, state and marital status, gender and marital status, and for each state, gender, and marital status on its own.”

Translation/Clean Up

Select the student marital status, student gender, student state, and the count (*) of rows from the Students table, summarized in sets GROUP BY CUBE student marital status, student gender, and student state and by all combinations of pairs of student marital status, student gender and student state

SQL

SELECT StudMaritalStatus, StudGender, StudState,

Count(*) AS Number

FROM Students

GROUP BY CUBE

(StudMaritalStatus, StudGender, StudState);

You will see results as shown in the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_CUBE_Order2 in the School Scheduling Example database.)

StudMaritalStatus

StudGender

StudState

Number

W

F

CA

1

NULL

F

CA

1

S

M

CA

1

NULL

M

CA

1

NULL

NULL

CA

2

M

F

OR

1

S

F

OR

1

NULL

F

OR

2

S

M

OR

2

NULL

M

OR

2

NULL

NULL

OR

4

S

F

TX

2

NULL

F

TX

2

S

M

TX

1

NULL

M

TX

1

NULL

NULL

TX

3

D

F

WA

1

M

F

WA

1

S

F

WA

3

NULL

F

WA

5

S

M

WA

4

NULL

M

WA

4

NULL

NULL

WA

9

NULL

NULL

NULL

18

D

NULL

WA

1

D

NULL

NULL

1

M

NULL

OR

1

M

NULL

WA

1

M

NULL

NULL

2

S

NULL

CA

1

S

NULL

OR

3

S

NULL

TX

3

S

NULL

WA

7

S

NULL

NULL

14

W

NULL

CA

1

W

NULL

NULL

1

D

F

NULL

1

M

F

NULL

2

S

F

NULL

6

W

F

NULL

1

NULL

F

NULL

10

S

M

NULL

8

NULL

M

NULL

8

If you look closely at the two sets of results, you’ll see that the 43 rows in both represent the same results. The changes are the order of the columns, and the order of the rows is different.

Creating a Union of Totals with GROUPING SETS

The third possible extension for GROUP BY is GROUPING SETS. As you can probably imagine, calculating all of the possible subtotals in a cube can require a lot of resources, particularly when there are many dimensions in the data, and not all of the subtotals may be of interest to you. If you don’t need all of the subtotals, but want more than either GROUP BY alone or GROUP BY ROLLUP can provide, GROUPING SETS may be the answer for you.

Using GROUPING SETS is similar to having several different GROUP BY queries combined by a UNION statement.

On the simplest level, the request might be as follows:

“Show me the count for all combinations of student state, student gender, and student marital status, with subtotals for each of student state, student gender, and student marital status.”

Translation / Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by and in GROUPING SETS of student state, student gender, and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY GROUPING SETS

(StudState, StudGender, StudMaritalStatus);

This will return results shown the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_GROUPING_SETS in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

NULL

NULL

D

1

NULL

NULL

M

2

NULL

NULL

S

14

NULL

NULL

W

1

NULL

F

NULL

10

NULL

M

NULL

8

CA

NULL

NULL

2

OR

NULL

NULL

4

TX

NULL

NULL

3

WA

NULL

NULL

9

Yes, if your database system doesn’t support the GROUPING SETS syntax, you can obtain the same results by using the UNION operator on three separate GROUP BY queries, as follows:

SQL

SELECT NULL AS StudState, NULL AS StudGender,

StudMaritalStatus, Count(*) AS Number

FROM Students

GROUP BY StudMaritalStatus

UNION

SELECT NULL, StudGender, NULL, Count(*)

FROM Students

GROUP BY StudGender

UNION

SELECT StudState, NULL, NULL, Count(*)

FROM Students

GROUP BY StudState;

(I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_GROUP_BY_UNION in the School Scheduling Example database.)

Where the power of GROUPING SETS comes in is that they give you the flexibility to choose which subtotals you wish to see.

Let’s examine the following request.

“Show me the count for all combinations of student state, student gender, and student marital status, with subtotals for student state, for the combination of student state and student gender and for the combination of student state and student marital status, but no grand total by student state, student gender, and student marital status.”

Note: When you find yourself asking for a subtotal on the combination as well as the individual columns and some but not all combinations of columns, you should replace that with the sets of GROUPING SETS. This is particularly true if you do not need a grand total across all the grouped columns because CUBE and ROLLUP do return a grand total, but GROUPING SETS does not return a grand total unless you include an empty set in the GROUPING SETS list.

Translation/Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by and in GROUPING SETS of the combination of student state and student gender, and by the combination of student state and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY GROUPING SETS

(StudState,

(StudState, StudGender),

(StudState, StudMaritalStatus));

This will return results as shown in the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_GROUPING_SETS_1 in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

CA

NULL

S

1

CA

NULL

W

1

CA

NULL

NULL

2

OR

NULL

M

1

OR

NULL

S

3

OR

NULL

NULL

4

TX

NULL

S

3

TX

NULL

NULL

3

WA

NULL

D

1

WA

NULL

M

1

WA

NULL

S

7

WA

NULL

NULL

9

CA

F

NULL

1

OR

F

NULL

2

TX

F

NULL

2

WA

F

NULL

5

CA

M

NULL

1

OR

M

NULL

2

TX

M

NULL

1

WA

M

NULL

4

If you look carefully at those results, you’ll see three different sets of subtotals:

1. Unique values of StudState, regardless of values of StudGender or StudMaritalStatus

2. Unique combinations of StudState and StudGender, regardless of values of StudMaritalStatus

3. Unique combinations of StudState and StudMaritalStatus, regardless of values of StudGender

Notice that unlike ROLLUP that asks for subtotals to be “rolled up” right to left or CUBE that asks for subtotals of all combinations, you can exactly specify the combinations of columns on which you want subtotals.

You can get a nearly identical result by using a GROUP BY column followed by a CUBE on the remaining columns. I’ll show you how to do that in the following section.

Variations on Grouping Techniques

If you take a closer look at Figure 21-1, you should be able to figure out that it’s perfectly legal to use combinations of simple grouping columns, ROLLUP, CUBE, and GROUPING SETS in a GROUP BY clause. You can also list combinations of columns in the list of columns you pass to ROLLUP, CUBE, or GROUPING SETS. Quite frankly, I’m hard-pressed to think of a case in which you would want to use a combination of ROLLUP, CUBE, and/or GROUPING SETS, but it is reasonable to “promote” one or more columns to simple grouping columns or to specify sub-groups of columns within a grouped set. Let’s take a look at a couple of examples.

Suppose you don’t need all the subtotals; your request could be as follows:

“Show me the count for all combinations of student state, student gender, and student marital status. Summarize by student state and by the combination of student state and student marital status.”

Translation/Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by and in a GROUPING SETS set of student state, student gender, and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY GROUPING SETS (StudState,

(StudGender, StudMaritalStatus));

The returned results look like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_GROUPING_SETS_2 in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

NULL

F

D

1

NULL

F

M

2

NULL

F

S

6

NULL

M

S

8

NULL

F

W

1

CA

NULL

NULL

2

OR

NULL

NULL

4

TX

NULL

NULL

3

WA

NULL

NULL

9

There are only two different sets of subtotals:

1. Unique values of StudState, regardless of values of StudGender or StudMaritalStatus

2. Unique combinations of StudGender and StudMaritalStatus, regardless of values of StudState

“Show me the count for all combinations of student state, student gender, and student marital status, with subtotals for each state and for each combination of gender and marital status.”

The database system returned this result because I specified two different sets of columns in the GROUPING SETS list: StudState by itself and StudGender and StudMaritalStatus in a group. You can see that the result is a total for each of the states and a total for each of the combinations of gender and marital status.

Now, let’s see what happens when you “promote” one of the columns by moving it out of the grouping sets clause and using it as a simple grouped column. Take a look at the following request.

Translation/Clean Up

Select the student state, student gender, student marital status, and the count (*) of rows from the Students table, grouped by student state and rolled up by student gender and student marital status

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY StudState,

ROLLUP (StudMaritalStatus, StudGender);

In that case, the results look like the following table. (I saved this request as CH21_Students_State_Gender_MaritalStatus_Count_ROLLUP_Partial in the School Scheduling Example database.)

StudState

StudGender

StudMaritalStatus

Number

CA

F

W

1

CA

F

NULL

1

CA

M

S

1

CA

M

NULL

1

CA

NULL

NULL

2

OR

F

M

1

OR

F

S

1

OR

F

NULL

2

OR

M

S

2

OR

M

NULL

2

OR

NULL

NULL

4

TX

F

S

2

TX

F

NULL

2

TX

M

S

1

TX

M

NULL

1

TX

NULL

NULL

3

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

F

NULL

5

WA

M

S

4

WA

M

NULL

4

WA

NULL

NULL

9

You can see that it groups by StudState, but does a ROLLUP on the combination of StudGender and StudMaritalStatus, resulting in three levels of subtotals:

1. StudState and unique combinations of StudGender and StudMaritalStatus

2. StudState and unique values of StudGender regardless of StudMaritalStatus values

3. StudState, regardless of StudGender and StudMaritalStatus values

By moving StudState outside the GROUPING SETS, I get the same results as you saw in CH21_Students_State_Gender_MaritalStatus_Count_CUBE_Order1 earlier, but without the grand totals row.

Let’s try that again, but this time use CUBE instead of GROUPING SETS and sort the results to make it easier to see what’s happening. I’ll skip the request and Translation/Cleanup and go straight to the SQL.

SQL

SELECT StudState, StudGender, StudMaritalStatus,

Count(*) AS Number

FROM Students

GROUP BY StudState,

CUBE (StudMaritalStatus, StudGender)

ORDER BY StudState, StudGender, StudMaritalStatus

That gives you in the following result:

StudState

StudGender

StudMaritalStatus

Number

CA

NULL

NULL

2

CA

NULL

S

1

CA

NULL

W

1

CA

F

NULL

1

CA

F

W

1

CA

M

NULL

1

CA

M

S

1

OR

NULL

NULL

4

OR

NULL

M

1

OR

NULL

S

3

OR

F

NULL

2

OR

F

M

1

OR

F

S

1

OR

M

NULL

2

OR

M

S

2

TX

NULL

NULL

3

TX

NULL

S

3

TX

F

NULL

2

TX

F

S

2

TX

M

NULL

1

TX

M

S

1

WA

NULL

NULL

9

WA

NULL

D

1

WA

NULL

M

1

WA

NULL

S

7

WA

F

NULL

5

WA

F

D

1

WA

F

M

1

WA

F

S

3

WA

M

NULL

4

WA

M

S

4

I saved this example as CH21_Students_State_Gender_MaritalStatus_CUBE_Partial in the School Scheduling sample database.

Because StudState is now outside the CUBE, it doesn’t participate directly in the cubing action, but you do get a total for all states and all combinations of gender and marital status. Notice that moving one column out now avoids the grand total column—a result similar to what you would get with GROUPING SETS and multiple combinations of columns. Does your head hurt yet?

Feel free to play with other variations or combinations using my sample databases. You can simply copy the SQL from one of the Views and tinker with it to see what happens.

Sample Statements

You now know the mechanics of constructing queries using grouping sets and have seen some of the types of requests you can answer with grouping sets. Let’s stop worrying about the gender or marital status of students and take a look at a fairly robust set of samples, all of which use one or more grouping set specifications. These examples come from each of the sample databases, and they illustrate the use of the grouping sets to generate subtotals in various ways.

I’ve also included sample result sets that would be returned by these operations and placed them immediately after the SQL syntax line. The name that appears immediately above a result set is the name I gave each query in the sample data on the companion website for this book, www.informit.com/title/9780134858333. I stored each query in the appropriate sample database (as indicated within the example), and I prefixed the names of the queries relevant to this chapter with “CH21.” You can follow the instructions in the Introduction of this book to load the samples onto your computer and try them.

Note: Remember that all the column names and table names used in these examples are drawn from the sample database structures shown in Appendix B, “Schema for the Sample Databases.” Because many of these examples use complex JOINs, your database system might choose a different way to solve these queries. For this reason, the first few rows might not exactly match the result you obtain, but the total number of rows should be the same. To simplify the process, I have combined the Translation and Clean Up steps for all the following examples.

Examples using ROLLUP

Sales Orders Database

“For each category of product, show me, by state, the count of orders and how much revenue the customers have generated. Give me a subtotal for each category plus a grand total.”

Translation/Clean Up

Select CategoryDescription, CustState, the count of DISTINCT orders.OrderNumber, and the sum of (QuotedPrice times * QuantityOrdered) as Price from the Order_Details table inner joined with the Orders table on Orders.OrderNumber = Order_Details.OrderNumber inner joined with the Customers table on Customers.CustomerID = Orders.CustomerID inner joined with the Products table on Products.ProductNumber = Order_Details.ProductNumber inner joined with the Categories table on Categories.CategoryID = Products.Category ID summarized by GROUP BY ROLLUP (CategoryDescription and CustState)

talSQL

SELECT PC.CategoryDescription, C.CustState,

COUNT(DISTINCT O.OrderNumber) AS OrderCount,

SUM(OD.QuotedPrice * QuantityOrdered) AS Revenue

FROM Order_Details AS OD

INNER JOIN Orders AS O

 ON O.OrderNumber = OD.OrderNumber

INNER JOIN Customers AS C

 ON C.CustomerID = O.CustomerID

INNER JOIN Products AS P

 ON P.ProductNumber = OD.ProductNumber

INNER JOIN Categories AS PC

 ON PC.CategoryID = P.CategoryID

GROUP BY ROLLUP

(PC.CategoryDescription, C.CustState)

CH21_ProductCategory_CustomerState_Revenue_ROLLUP (31 rows)

CategoryDescription

CustState

OrderCount

Price

Accessories

CA

174

$85,201.52

Accessories

OR

122

$56,551.79

Accessories

TX

112

$89,104.78

Accessories

WA

37

$141,212.93

Accessories

NULL

94

$372,071.02

Bikes

CA

69

$729,481.45

<< more rows here >>

Tires

NULL

257

$25,249.24

Components

NULL

586

$244242.53

NULL

NULL

933

$4,630,731.37

Just for comparison, if you GROUP BY the two columns, you get one total for each combination of category description and state, with no grand total. If you CUBE the two columns, you get a total for each combination of category description and state, subtotals by category, subtotals by state, and a grand total. Because the request asked for a subtotal only by category, I used ROLLUP.

School Scheduling Database

“Show me how many sessions are scheduled for each classroom over the next two semesters. Give me subtotals by building, by classroom, by semester, and by subject, plus a grand total.”

Because this involves a rather tricky bit of SQL due to the unnormalized list of days in the Classes table, I’m going to use the solution presented in CH20_Class_Schedule_Calendar, which returns one row for each individual class session, as a starting point. You might wish to review that example in the previous chapter if you’re uncertain.

Translation/Clean Up

Select BuildingCode, ClassRoomID, SemesterNo, SubjectCode, and the count of classes Count(*) from the CH20_Class_Schedule_Calendar view summarized by GROUP BY ROLLUP (BuildingCode, ClassRoomID, SemesterNo, SubjectCode)

talSQL

SELECT BuildingCode, ClassRoomID, SemesterNo,

SubjectCode, Count(*) AS NumberOfSessions

FROM CH20_Class_Schedule_Calendar

GROUP BY ROLLUP(BuildingCode, ClassRoomID,

SemesterNo, SubjectCode);

CH21_Building_ClassRoom_Semester_Subject_Count_ROLLUP (212 rows)

BuildingCode

ClassRoomID

SemesterNo

SubjectCode

NumberOfSessions

AS

1514

1

JRN 104

29

AS

1514

1

NULL

29

AS

1514

2

JRN 104

29

AS

1514

2

NULL

29

AS

1514

NULL

NULL

58

<< more rows here >>

TB

1642

2

CIS 114

58

TB

1642

2

NULL

58

TB

1642

NULL

NULL

117

TB

NULL

NULL

NULL

439

NULL

NULL

NULL

NULL

7221

Examples using CUBE

Bowling League Database

“I want to know the average handicap score for each bowler by team and city. Give me subtotals for each combination of team and city, for each team, for each city, plus a grand total.”

Translation/Clean Up

Select TeamName, BowlerCity, and the average of Avg(HandicapScore) as AvgHandicap from the Teams table inner joined with the Bowlers table on Bowlers.TeamID = Teams.TeamID inner joined with the Bowler_Scores table on Bowler_Scores.BowlerID = Bowlers.BowlerID summarized in sets by GROUP BY CUBE (TeamName, and BowlerState)

SQL

SELECT T.TeamName, B.BowlerCity,

Avg(BS.HandicapScore) AS AvgHandicap

FROM Teams AS T

INNER JOIN Bowlers AS B

 ON B.TeamID = T.TeamID

INNER JOIN Bowler_Scores AS BS

 ON BS.BowlerID = B.BowlerID

GROUP BY CUBE (T.TeamName, B.BowlerCity);

The clue here is the request asks not only for the average for each combination of team and city and for subtotals by all individual columns. This calls for CUBE.

CH21_Team_City_AverageHandicapScore_CUBE (44 rows)

TeamName

BowlerCity

AvgHandicap

Barracudas

Auburn

197

Manatees

Auburn

196

Sharks

Auburn

196

Swordfish

Auburn

193

NULL

Auburn

196

Marlins

Ballard

196

Terrapins

Ballard

195

NULL

Ballard

196

Terrapins

Bellevue

194

<< more rows here >>

Sales Orders Database

“For each category of product, show me, by state, how much quantity the vendors have on hand. Give me subtotals for each category, for each state, plus a grand total.”

Translation/Clean Up

Select CategoryDescription, VendState, and the sum of (QuantityOfHand) as Price from the Products table inner joined with the Categories table on Orders.OrderNumber = Categories.CategoryID = Products.CategoryID inner joined with the Product_Vendors table on Product_Vendors.ProductNumber = Products.ProductNumber inner joined with the Vendors table on Vendors.VendorID = Product_Vendors.VendorID summarized in sets by GROUP BY CUBE (CategoryDescription, and VendState)

SQL

SELECT PC.CategoryDescription, V.VendState,

SUM(P.QuantityOnHand) AS QOH

FROM Products AS P

INNER JOIN Categories AS PC

 ON PC.CategoryID = P.CategoryID

 INNER JOIN Product_Vendors AS PV

ON PV.ProductNumber = P.ProductNumber

 INNER JOIN Vendors AS V

ON V.VendorID = PV.VendorID

GROUP BY CUBE (PC.CategoryDescription,

 V.VendState)

CH21_ProductCategory_VendorState_QOH_CUBE (39 rows)

CategoryDescription

VendState

QOH

Accessories

AK

48

Bikes

AK

8

Car racks

AK

14

Clothing

AK

94

Components

AK

278

Tires

AK

60

NULL

AK

502

Accessories

CA

54

NULL

NULL

1914

Accessories

NULL

642

Bikes

NULL

48

Car racks

NULL

28

Clothing

NULL

222

Components

NULL

794

Tires

NULL

180

<< more rows here >>

Examples using GROUPING SETS

Bowling League Database

“Show me how many games each bowler has participated in, summarized by both team and city.”

Translation/Clean Up

Select TeamName, BowlerCity, and the count count(*) of from the Teams table inner joined with the Bowlers table on Bowlers.TeamID = Teams.TeamID inner joined with the Bowler_Scores table on Bowler_Scores.BowlerID = Bowlers.BowlerID summarized both by GROUP BY GROUPING SETS (TeamName, and BowlerCity)

SQL

SELECT T.TeamName, B.BowlerCity, Count(*) AS

GamesBowled

FROM Teams AS T

INNER JOIN Bowlers AS B

 ON B.TeamID = T.TeamID

INNER JOIN Bowler_Scores AS BS

 ON BS.BowlerID = B.BowlerID

GROUP BY GROUPING SETS (T.TeamName,

B.BowlerCity);

Notice that I didn’t ask for subtotals for each team and city combination. I only want summaries by team and by city, so GROUPING SETS is ideal.

CH21_Team_City_GamesBowled_GROUPING_SETS (18 rows)

TeamName

BowlerCity

GamesBowled

NULL

Auburn

210

NULL

Ballard

84

NULL

Bellevue

42

NULL

Bothell

84

NULL

Duvall

126

NULL

Kirkland

126

NULL

Redmond

294

NULL

Seattle

168

NULL

Tacoma

42

NULL

Woodinville

168

Barracudas

NULL

168

Dolphins

NULL

168

Manatees

NULL

168

Marlins

NULL

168

Orcas

NULL

168

Sharks

NULL

168

Swordfish

NULL

168

Terrapins

NULL

168

Entertainment Agency Database

“Show me counts of our customers summarized by both style and zip code.”

Translation/Clean Up

Select StyleName, CustZipCode, and the count count(*) of from the Customers table inner joined with the Musical_Preferences table on Musical_Preferences.CustomerID = Customers.CustomerID inner joined with the Musical_Styles table on Musical_Styles.StyleID = Musical_Styles.Style summarized both by GROUP BY GROUPING SETS (StyleName, and CustZipCode)

talSQL

SELECT MS.StyleName, C.CustZipCode, Count(*) AS

Instances

FROM Customers AS C

INNER JOIN Musical_Preferences AS MP

 ON MP.CustomerID = C.CustomerID

INNER JOIN Musical_Styles AS MS

 ON MS.StyleID = MP.StyleID

GROUP BY GROUPING SETS (MS.StyleName,

 C.CustZipCode)

CH21_Style_CustomerZipCode_Count_GROUPING_SETS (27 rows)

StyleName

CustZipCode

Instances

NULL

98002

2

NULL

98006

14

NULL

98033

7

NULL

98052

4

NULL

98105

2

NULL

98115

3

NULL

98413

4

40’s Ballroom Music

NULL

2

60’s Music

NULL

1

70’s Music

NULL

1

80’s Music

NULL

1

<< more rows here >>

Summary

I began the chapter by describing to you why you might need to group data differently from how you were shown in Chapter 13. This included an example illustrating one possibility.

I went on to explain the difference between the three extensions to GROUP BY:

• GROUP BY ROLLUP

• GROUP BY CUBE

• GROUP BY GROUPING SETS

I kept mentioning that the three keywords ROLLUP, CUBE and GROUPING SETS are simply extensions of the GROUP BY syntax I discussed in Chapter 13, so all restrictions from that chapter apply equally here.

I summarized why these extensions to GROUP BY can be useful, and I provided you with examples of how to build requests that require these extensions to the GROUP BY clause.

The following section presents some requests that you can work out on your own.

Problems for You to Solve

Below, I show you the request statement and the name of the solution query in the sample databases. (Hint: the name of the saved query lets you know whether you should use ROLLUP, CUBE, or GROUPING SETS.) If you want some practice, you can work out the SQL you need for each request and then check your answer with the query I saved in the samples. Don’t worry if your syntax doesn’t exactly match the syntax of the queries I saved—as long as your result set is the same.

Bowling League Database

1. “Show me how many bowlers live in each city. Give me totals for each combination of Team and City, for each Team, for each City plus a grand total.”

You can find my solution in CH21_Team_City_Count_CUBE (44 rows).

2. “Show me the average raw score for each bowler. Give me totals by Team and by City.”

You can find my solution in CH21_Team_City_AverageRawScore_GROUPING_SETS (18 rows).

3. “Show me the average handicap score for each bowler. For each team, give me average for each city in which the bowlers live. Also give me the average for each team, and the overall average for the entire league.”

You can find my solution in CH21_Team_City_AverageHandicapScore_ROLLUP (34 rows).

Entertainment Agency Database

1. “For each city where our entertainers live, show me how many different musical styles are represented. Give me totals for each combination of City and Style, for each City plus a grand total.”

You can find my solution in CH21_EntertainerCity_Style_ROLLUP (36 rows).

2. “For each city where our customers live, show me how many different musical styles they’re interested in. Give me total counts by city, total counts by style and total counts for each combination of city and style.”

You can find my solution in CH21_CustomerCity_Style_GROUPING_SETS (18 rows).

3. “Give me an analysis of all the bookings we’ve had. I want to see the number of bookings and the total charge broken down by the city the agent lives in, the city the customer lives in, and the combination of the two.”

You can find my solution in CH21_AgentCity_CustomerCity_Count_Charge_GROUPING_SETS (34 rows).

Recipes Database

1. “I want to know how many recipes there are in each of the recipe classes in my cookbook, plus an overall total of all the recipes regardless of recipe class. Make sure to include any recipe classes that don’t have any recipes in them.”

You can find my solution in CH21_RecipeClass_Recipe_Counts_ROLLUP (8 rows).

2. “I want to know the relationship between RecipeClasses and IngredientClasses. For each recipe class, show me how many different ingredient classes are represented, and for each ingredient class, show me how many different recipe classes are represented.”

You can find my solution in CH21_RecipeClass_IngredClass_Counts_GROUPING_SETS (25 rows).

3. “I want to know even more about the relationship between RecipeClasses and IngredientClasses. Show me how many recipes there are in each combination of recipe class and ingredient class. Also show me how many recipes there are in each ingredient class regardless of the recipe class, how many recipes there are in each recipe class regardless of the ingredient class, and how many recipes there are in total.”

You can find my solution in CH21_RecipeClass_IngredClass_CUBE (61 rows).

Sales Orders Database

1. “For each category of product, show me, by state, how much revenue the customers have generated. Give me subtotals for each state, for each category, plus a grand total.”

You can find my solution in CH21_ProductCategory_CustomerState_Revenue_CUBE (35 rows).

2. “For each category of product, show me, by state, how much quantity the vendors have on hand. Give me subtotals for each state within a category, plus a grand total.”

You can find my solution in CH21_ProductCategory_VendorState_QOH_ROLLUP (33 rows).

3. “For each of our vendors, let me know how many products they supply in each category. I want to see this broken down by state. For each state, show me the number of products in each category. Show me the number of products for all categories and a grand total as well.”

Note that the counts will not represent the number of different products that are sold!

You can find my solution in CH21_VendorState_Category_Count_ROLLUP (43 rows).

School Scheduling Database

1. “Summarize the number of class sessions scheduled, showing semester, building, classroom, and subject. Give me subtotals for each semester, for each combination of building and classroom and for each subject.”

You can find my solution in CH21_Semester_Building_ClassRoom_Subject_Count_GROUPING_SETS (82 rows).

2. “For each department, show me the number of courses that could be offered, and whether they’re taught by a Professor, an Associate Professor, or an Instructor. Give me total courses per department and total courses overall as well.”

Note that the number of courses returned will be greater than the number of courses offered by the school because some courses could be taught by more than instructors.

You can find my solution in CH21_Department_Title_Count_ ROLLUP (20 rows).

3. “I want to know how many courses our students have been in contact with. Give me totals by whether they completed the course, are currently enrolled in it or withdrew. I’d also like to see this broken down by student major. May as well give me the total courses completed, enrolled and withdrawn while you’re at it. Don’t worry about splitting it up by semester.”

You can find my solution in CH21_Major_ClassStatus_Count_GROUPING_SETS (26 rows).