Hour 17

Summarizing Data Results from a Query

What You’ll Learn in This Hour:

In this hour, you learn about the aggregate functions of SQL. You can perform a variety of useful functions with aggregate functions, such as getting the highest total of a sale or counting the number of orders processed on a given day. The real power of aggregate functions is discussed in the next hour when you tackle the GROUP BY clause. You have already learned how to query the database and return data in an organized way. You have also learned how to sort data from a query. During this hour, you learn how to return data from a query and break it into groups for improved readability.

Using Aggregate Functions

Functions are keywords in SQL that you use to manipulate values within columns for output purposes. A function is a command normally used with a column name or expression that processes the incoming data to produce a result. SQL contains several types of functions. This hour covers aggregate functions. An aggregate function provides summarization information for a SQL statement, such as counts, totals, and averages.

This basic set of aggregate functions is discussed in this hour:

  •    COUNT

  •    SUM

  •    MAX

  •    MIN

  •    AVG

In previous hours, you updated some of the data in the BIRDS database. For example, several records in the table were updated to place a NULL value in the WINGSPAN column. During this hour, you rerun the scripts to rebuild the tables and the data, to return the data to its original form. Remember that at any point, you can execute the script tables.sql to drop and rebuild your tables, and the script data.sql to insert the original data back into your tables. The following tables are used for examples during this hour:

  •    BIRDS

  •    MIGRATION, BIRDS_MIGRATION

  •    FOOD, BIRDS_FOOD

  •    PHOTOGRAPHERS (created during the previous hour)

Be sure to check your data and reset it, if necessary, so that your results match the results in the examples during this hour. For example, if there are null values in the WINGSPAN column of BIRDS, any aggregate functions will return different results than if numeric values existed in all rows of data for WINGSPAN . Remember that you can reset your example BIRDS database at any time by rerunning the provided scripts tables.sql and data.sql to rebuild your tables and insert fresh data back into your tables.

The COUNT Function

You use the COUNT function to count rows or values of a column that do not contain a NULL value. When used within a query, the COUNT function returns a numeric value. You can also use the COUNT function with the DISTINCT command to count only the distinct rows of a dataset. ALL (opposite of DISTINCT) is the default; including ALL in the syntax is not necessary. Duplicate rows are counted if DISTINCT is not specified. One other option with the COUNT function is to use it with an asterisk. COUNT(*) counts all the rows of a table, including duplicates, regardless of whether a column contains a NULL value.

Note

Use DISTINCT Only in Certain Circumstances

You cannot use the DISTINCT command with COUNT(*), only with COUNT (column_name). This is because DISTINCT is a function that looks for a unique value in a column, whereas (*) represents all columns in a table or a complete row of data.

The syntax for the COUNT function follows:

COUNT [ (*) | (DISTINCT | ALL) ] (COLUMN NAME)

Caution

COUNT(*) Differs from Other Count Variations

COUNT(*) produces slightly different calculations than other count variations. When the COUNT function is used with the asterisk, it counts the rows in the returned result set without regard for duplicates and NULL values. This is an important distinction. If you need your query to return a count of a particular field and include NULLs, you should use a function such as ISNULL to replace the NULL values.

The following example simply counts all rows of data in the BIRDS table. In other words, this example tells you how many types of birds are in the database.

SQL> select count(*) from birds;

  COUNT(*)
----------
        23

1 row selected.

Before looking at the next example of the COUNT function, select all the photographers and their associated mentor identifications from the PHOTOGRAPHERS table. Remember that this table is related to itself through the mentor identification and the photographer identification; this is an example of a self join, as you have already learned. Take a minute to study the results, and try to match up photographers with their corresponding mentors.

SQL> select p_id, photographer, mentor_p_id
  2  from photographers;

      P_ID PHOTOGRAPHER                   MENTOR_P_ID
---------- ------------------------------ -----------
         7 Ryan Notstephens
         8 Susan Willamson
         9 Mark Fife
         1 Shooter McGavin
         2 Jenny Forest                             8
         3 Steve Hamm
         4 Harry Henderson                          9
         5 Kelly Hairtrigger                        8
         6 Gordon Flash
        10 Kate Kapteur                             7

10 rows selected.

The following example counts every occurrence of a mentor photographer identification that is NOT NULL. This gives you the total number of photographers in the database that are mentored by another photographer.

SQL> select count(mentor_p_id) "TOTAL PHOTOGRAPHERS MENTORED"
  2  from photographers;

TOTAL PHOTOGRAPHERS MENTORED
----------------------------
                           4

This example is a query that provides a count of any distinct occurrences of the mentor photographer identification, which gives you a total count of the number of photographers that mentor other photographers in the database.

SQL> select count(distinct(mentor_p_id)) "TOTAL MENTORS"
  2  from photographers;

TOTAL MENTORS
-------------
            3

1 row selected.

Note

Data Types Do Not Use COUNT

Because the COUNT function counts the rows, data types do not play a part. The rows can contain columns with any data type. The only consideration that actually counts is whether the value is NULL.

The SUM Function

The SUM function returns a total on the values of a column for a group of rows. You can also use the SUM function with DISTINCT. When you use SUM with DISTINCT, only the distinct rows are totaled, which might not have much purpose. Your total is not accurate in that case because rows of data are omitted.

The syntax for the SUM function follows:

SUM ([ DISTINCT ] COLUMN NAME)

Caution

SUM Must Be Numeric

The value of an argument must be numeric to use the SUM function. You cannot use the SUM function on columns that have a data type other than numeric.

This example simply returns the sum of the wingspan stored for all birds in the database. This simplistic example is not very useful, but you build upon it during this hour.

SQL> select sum(wingspan) from birds;

SUM(WINGSPAN)
-------------
       1163.1

1 row selected.

This next example is a bit more useful: An arithmetic calculation is embedded within the SUM function. First you multiply the number of eggs that a bird lays times a number of broods per year; then you apply the SUM function on that value for every bird in the BIRDS table. This returns the total number of eggs laid by all birds in a season or year.

SQL> select sum(eggs * broods) "TOTAL EGGS LAYED BY ALL BIRDS IN A SEASON"
  2  from birds;

TOTAL EGGS LAYED BY ALL BIRDS IN A SEASON
-----------------------------------------
                                      127

1 row selected.

The AVG Function

The AVG function finds the average value for a given group of rows. When used with the DISTINCT command, the AVG function returns the average of the distinct rows. The syntax for the AVG function follows:

AVG ([ DISTINCT ] COLUMN NAME)

Note

AVG Must Be Numeric

The value of the argument must be numeric for the AVG function to work.

The average value for all values in the BIRDS table’s WINGSPAN column is retrieved in the following example:

SQL> select avg(wingspan) from birds;

AVG(WINGSPAN)
-------------
   50.5695652

1 row selected.

Caution

Sometimes Your Data Is Truncated

In some implementations, the results of your query are truncated to the precision of the data type. Review your database system’s documentation to ensure that you understand the normal precision for the various data types. This will prevent you from unnecessarily truncating data and possibly getting an unexpected result from improper precision.

The MAX Function

The MAX function returns the maximum value from the values of a column in a group of rows. NULL values are ignored when using the MAX function. Using MAX with the DISTINCT command is an option, but because the maximum value for all the rows is the same as the distinct maximum value, DISTINCT is useless.

The syntax for the MAX function follows:

MAX([ DISTINCT ] COLUMN NAME)

The following example returns the highest WINGSPAN in the BIRDS table:

SQL> select max(wingspan) from birds;

MAX(WINGSPAN)
-------------
         94.8

1 row selected.

You can also use aggregate functions such as MAX and MIN on character data. When using these values, collation of your database comes into play again. Most commonly, your database collation is set to a dictionary order, so the results are ranked according to that. For example, say that you perform a MAX operation on the BIRD_NAME column of the BIRDS table:

SQL> select max(bird_name)
  2  from birds;

MAX(BIRD_NAME)
------------------------------
Turkey Vulture

1 row selected.

In this instance, the function returns the largest value according to a dictionary ordering of the data in the column.

The MIN Function

The MIN function returns the minimum value of a column for a group of rows. NULL values are ignored when using the MIN function. Using MIN with the DISTINCT command is an option. However, because the minimum value for all rows is the same as the minimum value for distinct rows, DISTINCT is useless.

The syntax for the MIN function follows:

MIN([ DISTINCT ] COLUMN NAME)

The following example returns the lowest WINGSPAN in the BIRDS table:

SQL> select min(wingspan) from birds;

MIN(WINGSPAN)
-------------
          3.2

1 row selected.

Note

DISTINCT and Aggregate Functions Don’t Always Mix

One important consideration when using aggregate functions with the DISTINCT command is that your query might not return the wanted results. The purpose of aggregate functions is to return summarized data based on all rows of data in a table. When DISTINCT is used, first it is applied to the results and then those results are passed on to the aggregate function, which can dramatically alter the results. Be sure you understand this when you use DISTINCT with aggregate functions.

As with the MAX function, the MIN function can work against character data. It returns the minimum value according to the dictionary ordering of the data.

SQL> select min(bird_name)
  2  from birds;

MIN(BIRD_NAME)
------------------------------
American Coot

1 row selected.

Grouping Data

Grouping data is the process of combining columns with duplicate values in a logical order. For example, a database might contain information about employees; many employees live in different cities, but some employees live in the same city. You might want to execute a query that shows employee information for each particular city. You can group employee information by city and create a summarized report.

Or perhaps you want to figure the average salary paid to employees, according to each city. You can do this by using the aggregate function AVG on the SALARY column, as you learned in the previous hour, and by using the GROUP BY clause to group the output by city.

Grouping data is accomplished by using the GROUP BY clause of a SELECT statement (query). In this lesson, you learn how to use aggregate functions with the GROUP BY clause to display results more effectively.

Using the GROUP BY Clause

The GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups. This clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.

The position of the GROUP BY clause in a query follows:

SELECT
FROM
WHERE
GROUP BY
ORDER BY

The following is the SELECT statement’s syntax, including the GROUP BY clause:

SELECT COLUMN1, COLUMN2
FROM TABLE1, TABLE2
WHERE CONDITIONS
GROUP BY COLUMN1, COLUMN2
ORDER BY COLUMN1, COLUMN2

This ordering normally takes a little getting used to when writing your first queries with the GROUP BY clause; however, it is logical. The GROUP BY clause is normally a much more CPU-intensive operation, and if you do not constrain the rows provided to it, you are grouping unnecessary data that later is discarded. Thus, you intentionally reduce the data set with the WHERE clause so that you perform your grouping only on the rows you need.

You can use the ORDER BY clause, but normally the RDBMS also orders the results by the column ordering in the GROUP BY clause, which is discussed more in depth later in this hour, in the section “Understanding the Difference Between GROUP BY and ORDER BY.” Unless you need to order the values in a different pattern than the GROUP BY clause, the ORDER BY clause is redundant. However, sometimes it is provided because you use aggregate functions in the SELECT statement that are not in the GROUP BY clause, or because your particular RDBMS functions slightly differently from the standard.

The following sections explain how to use the GROUP BY clause and provide examples of using it in a variety of situations.

Group Functions

Typical group functions—functions that the GROUP BY clause uses to arrange data in groups—include AVG, MAX, MIN, SUM, and COUNT. These are the aggregate functions that you learned about in Hour 17. Remember that the aggregate functions were used for single values in Hour 17; now you use the aggregate functions for group values.

Grouping Selected Data

Grouping data is simple. The selected columns (the column list following the SELECT keyword in a query) are the columns you can reference in the GROUP BY clause. If a column is not in the SELECT statement, you cannot use it in the GROUP BY clause. How can you group data on a report if the data is not displayed?

If the column name has been qualified, the qualified name must go into the GROUP BY clause. The column name can also be represented by a number. When grouping the data, the order of columns grouped does not have to match the column order in the SELECT clause.

Creating Groups and Using Aggregate Functions

The SELECT clause has conditions that must be met when using GROUP BY. Specifically, whatever columns are selected must appear in the GROUP BY clause, except for any aggregate values. If the columns in the SELECT clause are qualified, the qualified names of the columns must be used in the GROUP BY clause. Some examples of syntax for the GROUP BY clause are shown next.

The following SQL statement selects the MIGRATION_LOCATION and a count of all entries in the BIRD_ID column in the BIRDS_MIGRATION table. Remember that the BIRDS_MIGRATION table links the MIGRATION table to the BIRDS table, so every BIRD_ID in the BIRDS_MIGRATION table that corresponds with a MIGRATION_ID value indicates the number of birds that migrate to a specific location. The tables are joined appropriately, of course, and the query is grouped by the migration location. This gives you a total count of birds in the database that migrate to each individual location.

SQL> select migration.migration_location, count(birds_migration.bird_id) birds
  2  from migration, birds_migration
  3  where migration.migration_id = birds_migration.migration_id
  4  group by migration.migration_location;

MIGRATION_LOCATION                  BIRDS
------------------------------ ----------
Central America                        12
Mexico                                 14
No Significant Migration                5
Partial, Open Water                     1
South America                           6
Southern United States                 18

6 rows selected.

This next example query focuses on the different types of foods that birds eat. You select the food name and the average wingspan of birds and then apply the ROUND function on the average wingspan to get a whole number. You can see in the FROM clause that you are selecting from three tables and that you have joined those three tables appropriately in the WHERE clause. The results are grouped by the FOOD_NAME and ordered by the second column, which is the AVG(WINGSPAN), in descending order. This sort operation shows the food names with the largest average wingspan of birds first, all the way down to the least. This query gives you a list of all food items in the database and the average wingspan of birds that eat certain food items.

SQL> select food.food_name, round(avg(birds.wingspan)) avg_wingspan
  2  from food, birds, birds_food
  3  where food.food_id = birds_food.food_id
  4  and birds.bird_id = birds_food.bird_id
  5  group by food.food_name
  6  order by 2 desc;

FOOD_NAME                      AVG_WINGSPAN
------------------------------ ------------
Reptiles                                 90
Deer                                     90
Ducks                                    84
Frogs                                    73
Birds                                    69
Crayfish                                 67
Small Mammals                            65
Snakes                                   63
Insects                                  61
Carrion                                  53
Fish                                     52
Plants                                   49
Rodents                                  48
Aquatic Plants                           43
Crustaceans                              41
Fruit                                    40
Aquatic Insects                          40
Seeds                                    38
Corn                                      3

19 rows selected.

Understanding the Difference Between GROUP BY and ORDER BY

The GROUP BY clause works the same as the ORDER BY clause, in that both sort data. Specifically, you use the ORDER BY clause to sort data from a query. The GROUP BY clause also sorts data from a query to properly group the data.

However, some differences and disadvantages arise when you use GROUP BY instead of ORDER BY for sorting operations:

  •    All non-aggregate columns selected must be listed in the GROUP BY clause.

  •    The GROUP BY clause generally is not necessary unless you use aggregate functions.

Let’s look at a few examples. In the following query, you select the migration location and wingspan for birds that migrate to either Central America or Mexico; the results are sorted by the migration location. This is an example of the sort operation performed by the ORDER BY clause.

SQL> select m.migration_location, b.wingspan
  2  from birds b,
  3    birds_migration bm,
  4    migration m
  5  where b.bird_id = bm.bird_id
  6    and m.migration_id = bm.migration_id
  7    and m.migration_location in ('Central America', 'Mexico')
  8  order by 1;

MIGRATION_LOCATION               WINGSPAN
------------------------------ ----------
Central America                        78
Central America                        54
Central America                        72
Central America                        23
Central America                       6.5
Central America                        29
Central America                        18
Central America                        54
Central America                        34
Central America                        72
Central America                      26.8
Central America                      67.2
Mexico                                 78
Mexico                                 54
Mexico                                 72
Mexico                                 23
Mexico                                6.5
Mexico                                 29
Mexico                                 18
Mexico                                 50
Mexico                                 54
Mexico                                 34
Mexico                                 72
Mexico                               26.8
Mexico                               67.2
Mexico                                 42

26 rows selected.

The following example is essentially the same as the first, except that you are trying to improperly use the GROUP BY clause to sort the data in the place of an ORDER BY. The GROUP BY clause performs a sort operation for the purpose of grouping data to perform aggregate functions, but it is not the same as the ORDER BY clause. You can see that an error is returned in this example because, again, you have to group any non-aggregate functions in the query that are in the SELECT clause.

SQL> select m.migration_location, b.wingspan
  2  from birds b,
  3    birds_migration bm,
  4    migration m
  5  where b.bird_id = bm.bird_id
  6    and m.migration_id = bm.migration_id
  7    and m.migration_location in ('Central America', 'Mexico')
  8  group by migration_location;
select m.migration_location, b.wingspan
                             *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

Note

Error Messages Differ

Different SQL implementations return errors in different formats.

Following is a proper and useful example of using the GROUP BY clause in the same query. Here you perform the AVG aggregate function on the WINGSPAN column and group by the MIGRATION_LOCATION, which is required in this query.

SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN"
  2  from birds b,
  3    birds_migration bm,
  4    migration m
  5  where b.bird_id = bm.bird_id
  6    and m.migration_id = bm.migration_id
  7  group by migration_location;

MIGRATION_LOCATION             AVG WINGSPAN
------------------------------ ------------
Partial, Open Water                    94.8
Southern United States           46.3166667
Mexico                                44.75
South America                          48.3
No Significant Migration              61.32
Central America                  44.5416667

6 rows selected.

Now say that you want to sort the same grouped results, but independent of the GROUP BY expression. You simply add the ORDER BY clause to the end of the SQL statement to perform a “final” sort by the AVG(WINGSPAN). This shows locations that have birds migrating there with the lowest collective value of AVG(WINGSPAN).

SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN"
  2  from birds b,
  3    birds_migration bm,
  4    migration m
  5  where b.bird_id = bm.bird_id
  6    and m.migration_id = bm.migration_id
  7  group by migration_location
  8  order by 2;

MIGRATION_LOCATION             AVG WINGSPAN
------------------------------ ------------
Central America                  44.5416667
Mexico                                44.75
Southern United States           46.3166667
South America                          48.3
No Significant Migration              61.32
Partial, Open Water                    94.8

6 rows selected.

Although GROUP BY and ORDER BY perform a similar function, they have one major difference: The GROUP BY clause is designed to group identical data, whereas the ORDER BY clause is designed merely to put data into a specific order. You can use GROUP BY and ORDER BY in the same SELECT statement, but you must follow a specific order.

Using CUBE and ROLLUP Expressions

Sometimes getting summary totals within a certain group is advantageous. For instance, you might want to see a breakdown of the SUM of sales per year, country, and product type, but you also want to see the totals in each year and country. The ANSI SQL standard provides for such functionality using the CUBE and ROLLUP expressions.

The ROLLUP expression is used to get subtotals, or what is commonly referred to as super-aggregate rows, along with a grand total row. The ANSI syntax follows:

GROUP BY ROLLUP(ordered column list of grouping sets)

The ROLLUP expression works in this way: For every change in the LAST column provided for the grouping set, an additional row is inserted into the result set with a NULL value for that column and the subtotal of the values in the set. In addition, a row is inserted at the end of the result set with NULL values for each of the group columns and a grand total for the aggregate information. Both Microsoft SQL Server and Oracle follow the ANSI-compliant format.

First, examine a result set of a simple GROUP BY statement in which you examine the average wingspan of birds by MIGRATION:

SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN"
  2  from migration m,
  3    birds b,
  4    birds_migration bm
  5  where m.migration_id = bm.migration_id
  6    and b.bird_id = bm.bird_id
  7  group by migration_location;

MIGRATION_LOCATION             AVG WINGSPAN
------------------------------ ------------
Partial, Open Water                    94.8
Southern United States           46.3166667
Mexico                                44.75
South America                          48.3
No Significant Migration              61.32
Central America                  44.5416667

6 rows selected.

The following is an example of using the ROLLUP expression to get subtotals of the average wingspan of each migration location:

SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN"
  2  from migration m,
  3    birds b,
  4    birds_migration bm
  5  where m.migration_id = bm.migration_id
  6    and b.bird_id = bm.bird_id
  7  group by rollup (migration_location);

MIGRATION_LOCATION             AVG WINGSPAN
------------------------------ ------------
Central America                  44.5416667
Mexico                                44.75
No Significant Migration              61.32
Partial, Open Water                    94.8
South America                          48.3
Southern United States           46.3166667
                                    47.9625

7 rows selected.

Notice that you now get an average super-aggregate row for each one of the migration locations and an overall average for the entire set as the last row.

The CUBE expression is different: It returns a single row of data with every combination of the columns in the column list, along with a row for the grand total of the whole set. Because of its unique nature, CUBE is often used to create crosstab reports. The syntax for the CUBE expression follows:

GROUP BY CUBE(column list of grouping sets)

For a useful illustration of CUBE, study the example that follows. Let’s suppose a column has been added to the MIGRATION table called REGION.

The following SQL query selects the REGION, MIGRATION_LOCATION, and AVG(WINGSPAN) from the MIGRATIONS table, BIRDS table, and BIRDS_MIGRATION table. The average wingspan was already calculated based on the migration location, but the CUBE expression provides an overall average wingspan for each region, each migration location itself, and the data set returned as a whole.

SQL> select m.region, m.migration_location,
  2      avg(b.wingspan) "AVG WINGSPAN"
  3  from migration m,
  4    birds b,
  5    birds_migration bm
  6  where m.migration_id = bm.migration_id
  7    and b.bird_id = bm.bird_id
  8  group by cube(region, migration_location)
  9  order by 1;

REGION               MIGRATION_LOCATION             AVG WINGSPAN
-------------------- ------------------------------ ------------
Minimal Migration    No Significant Migration              61.32
Minimal Migration    Partial, Open Water                    94.8
Minimal Migration                                           66.9
North                Mexico                                44.75
North                Southern United States           46.3166667
North                                                   45.63125
South                Central America                  44.5416667
South                South America                          48.3
South                                                 45.7944444
                     Central America                  44.5416667
                     Mexico                                44.75
                     No Significant Migration              61.32
                     Partial, Open Water                    94.8
                     South America                          48.3
                     Southern United States           46.3166667
                                                         47.9625
16 rows selected.

You can see that, with the CUBE expression, there are even more rows because the statement needs to return each combination of columns within the column set that you provide. An aggregate (average, in this case) is returned for the region, each location, and the result set as a whole.

Using the HAVING Clause

When the HAVING clause is used with the GROUP BY clause in a SELECT statement, it tells GROUP BY which groups to include in the output. HAVING is to GROUP BY as WHERE is to SELECT. In other words, the WHERE clause places conditions on the selected columns, and the HAVING clause places conditions on groups created by the GROUP BY clause. Therefore, when you use the HAVING clause, you are effectively including (or excluding) whole groups of data from the query results.

The following shows the position of the HAVING clause in a query:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

This is the syntax of the SELECT statement that includes the HAVING clause:

SELECT COLUMN1, COLUMN2
FROM TABLE1, TABLE2
WHERE CONDITIONS
GROUP BY COLUMN1, COLUMN2
HAVING CONDITIONS
ORDER BY COLUMN1, COLUMN2

In this final example, you select the FOOD_NAME and calculate the average wingspan of birds related to each food item. The results are grouped by FOOD_NAME, which is a requirement in this query, and the HAVING clause is applied to show only the food names that are eaten by birds that have an average wingspan greater than 50 inches. Finally, you order the results by the second column, AVG(BIRDS.WINGSPAN), in descending order to show the food items that are eaten by birds with the greatest wingspan, all the way down to the food items eaten by birds with the smallest wingspan.

SQL> select food.food_name, round(avg(birds.wingspan)) avg_wingspan
  2  from food, birds, birds_food
  3  where food.food_id = birds_food.food_id
  4  and birds.bird_id = birds_food.bird_id
  5  group by food.food_name
  6  having avg(birds.wingspan) > 50
  7  order by 2 desc;

FOOD_NAME                      AVG_WINGSPAN
------------------------------ ------------
Deer                                     90
Reptiles                                 90
Ducks                                    84
Frogs                                    73
Birds                                    69
Crayfish                                 67
Small Mammals                            65
Snakes                                   63
Insects                                  61
Carrion                                  53
Fish                                     52

11 rows selected.

Summary

Aggregate functions can be useful and are quite simple to use. In this hour, you learned how to count values in columns, count rows of data in a table, get the maximum and minimum values for a column, figure the sum of the values in a column, and figure the average value for values in a column. Remember that NULL values are not considered when using aggregate functions, except when using the COUNT function in the format COUNT(*). Aggregate functions are the first functions in SQL that you learned in this book, but more follow in the coming hours. You can also use aggregate functions for group values, which are discussed during the next hour. As you learn about other functions, you see that the syntaxes of most functions are similar to one another and that their concepts of use are relatively easy to understand.

You also learned in this hour how to group the results of a query using the GROUP BY clause. The GROUP BY clause is primarily used with aggregate SQL functions, such as SUM, AVG, MAX, MIN, and COUNT. The nature of GROUP BY is like that of ORDER BY, in that both sort query results. The GROUP BY clause must sort data to group results logically, but you can also use it exclusively to sort data. However, an ORDER BY clause is much simpler for this purpose. The HAVING clause, an extension to the GROUP BY clause, places conditions on the established groups of a query. The WHERE clause places conditions on a query’s SELECT clause. During the next hour, you learn a new arsenal of functions that enable you to further manipulate query results.

Q&A

Q. Why are NULL values ignored when using the MAX or MIN functions?

A. A NULL value means that nothing is there, so no maximum or minimum value is possible.

Q. Why don’t data types matter when using the COUNT function?

A. The COUNT function counts only rows.

Q. Does the data type matter when using the SUM or AVG functions?

A. Not exactly. If the data can be implicitly converted to numeric data, it will still work. The data type matters less than what data is stored in it.

Q. Am I limited to using only column names inside of aggregate functions?

A. No, you can use any type of calculation or formula, as long as the output corresponds to the proper type of data that the function is expecting to use.

Q. Is using the ORDER BY clause mandatory when using the GROUP BY clause in a SELECT statement?

A. No, using the ORDER BY clause is strictly optional, but it can be helpful when used with GROUP BY.

Q. Must a column appear in the SELECT statement to use a GROUP BY clause on it?

A. Yes, a column must be in the SELECT statement to use a GROUP BY clause on it.

Q. Must all columns that appear in the SELECT statement be used in the GROUP BY clause?

A. Yes, every column that appears in the SELECT statement (except for aggregate functions) must be used in the GROUP BY clause, to avoid getting an error.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

  1. 1. True or false: The AVG function returns an average of all rows from a SELECT column, including any NULL values.

  2. 2. True or false: The SUM function adds column totals.

  3. 3. True or false: The COUNT(*) function counts all rows in a table.

  4. 4. True or false: The COUNT([column name]) function counts NULL values.

  5. 5. Do the following SELECT statements work? If not, what fixes the statements?

    1. SELECT COUNT *

      FROM BIRDS;

    2. SELECT COUNT(BIRD_ID), BIRD_NAME
      FROM BIRDS;
    3. SELECT MIN(WEIGHT), MAX(HEIGHT)
      FROM BIRDS
      WHERE WINGSPAN > 48;
    4. SELECT COUNT(DISTINCT BIRD_ID) FROM BIRDS;
    5. SELECT AVG(BIRD_NAME) FROM BIRDS;
  6. 6. What is the purpose of the HAVING clause, and which other clause is it closest to?

  7. 7. True or false: You must also use the GROUP BY clause when using the HAVING clause.

  8. 8. True or false: The columns selected must appear in the GROUP BY clause in the same order.

  9. 9. True or false: The HAVING clause tells the GROUP BY clause which groups to include.

Exercises

  1. 1. What is the average wingspan of birds?

  2. 2. What is the average wingspan of birds that eat fish?

  3. 3. How many different types of food does the Common Loon eat?

  4. 4. What is the average number of eggs per type of nest?

  5. 5. What is the lightest bird?

  6. 6. Generate a list of birds that are above average in all the following areas: height, weight, and wingspan.

  7. 7. Write a query to generate a list of all migration locations and their average wingspans, but only for locations of birds that have an average wingspan greater than 48 inches.

  8. 8. Write a query showing a list of all photographers and the number of photographers mentored by each photographer.

  9. 9. Experiment on your own using aggregate functions, along with other functions that you learned in previous hours.