What You’ll Learn in This Hour:
▶ Defining functions
▶ Using aggregate functions
▶ Summarizing data with aggregate functions
▶ Getting results from functions
▶ Understanding why you want to group data
▶ Grouping results with the GROUP BY clause
▶ Using group value functions
▶ Understanding group functions
▶ Grouping by columns
▶ Deciding between GROUP BY and ORDER BY
▶ Reducing groups with the HAVING clause
In this hour, you learn about the aggregate functions of SQL. You can perform a variety of useful functions with aggregate functions, such as getting the highest total of a sale or counting the number of orders processed on a given day. The real power of aggregate functions is discussed in the next hour when you tackle the GROUP BY
clause. You have already learned how to query the database and return data in an organized way. You have also learned how to sort data from a query. During this hour, you learn how to return data from a query and break it into groups for improved readability.
Functions are keywords in SQL that you use to manipulate values within columns for output purposes. A function is a command normally used with a column name or expression that processes the incoming data to produce a result. SQL contains several types of functions. This hour covers aggregate functions. An aggregate function provides summarization information for a SQL statement, such as counts, totals, and averages.
This basic set of aggregate functions is discussed in this hour:
▶ COUNT
▶ SUM
▶ MAX
▶ MIN
▶ AVG
In previous hours, you updated some of the data in the BIRDS
database. For example, several records in the table were updated to place a NULL
value in the WINGSPAN
column. During this hour, you rerun the scripts to rebuild the tables and the data, to return the data to its original form. Remember that at any point, you can execute the script tables.sql
to drop and rebuild your tables, and the script data.sql
to insert the original data back into your tables. The following tables are used for examples during this hour:
▶ BIRDS
▶ MIGRATION
, BIRDS_MIGRATION
▶ FOOD
, BIRDS_FOOD
▶ PHOTOGRAPHERS
(created during the previous hour)
Be sure to check your data and reset it, if necessary, so that your results match the results in the examples during this hour. For example, if there are null values in the WINGSPAN
column of BIRDS
, any aggregate functions will return different results than if numeric values existed in all rows of data for WINGSPAN
. Remember that you can reset your example BIRDS
database at any time by rerunning the provided scripts tables.sql
and data.sql
to rebuild your tables and insert fresh data back into your tables.
COUNT
FunctionYou use the COUNT
function to count rows or values of a column that do not contain a NULL
value. When used within a query, the COUNT
function returns a numeric value. You can also use the COUNT
function with the DISTINCT
command to count only the distinct rows of a dataset. ALL
(opposite of DISTINCT
) is the default; including ALL
in the syntax is not necessary. Duplicate rows are counted if DISTINCT
is not specified. One other option with the COUNT
function is to use it with an asterisk. COUNT(*)
counts all the rows of a table, including duplicates, regardless of whether a column contains a NULL
value.
Note
Use DISTINCT
Only in Certain Circumstances
You cannot use the DISTINCT
command with COUNT(*)
, only with COUNT
(column_name
). This is because DISTINCT
is a function that looks for a unique value in a column, whereas (*)
represents all columns in a table or a complete row of data.
The syntax for the COUNT
function follows:
COUNT [ (*) | (DISTINCT | ALL) ] (COLUMN NAME)
Caution
COUNT(*)
Differs from Other Count Variations
COUNT(*)
produces slightly different calculations than other count variations. When the COUNT
function is used with the asterisk, it counts the rows in the returned result set without regard for duplicates and NULL
values. This is an important distinction. If you need your query to return a count of a particular field and include NULLs
, you should use a function such as ISNULL
to replace the NULL
values.
The following example simply counts all rows of data in the BIRDS
table. In other words, this example tells you how many types of birds are in the database.
SQL> select count(*) from birds; COUNT(*) ---------- 23 1 row selected.
Before looking at the next example of the COUNT
function, select all the photographers and their associated mentor identifications from the PHOTOGRAPHERS
table. Remember that this table is related to itself through the mentor identification and the photographer identification; this is an example of a self join, as you have already learned. Take a minute to study the results, and try to match up photographers with their corresponding mentors.
SQL> select p_id, photographer, mentor_p_id 2 from photographers; P_ID PHOTOGRAPHER MENTOR_P_ID ---------- ------------------------------ ----------- 7 Ryan Notstephens 8 Susan Willamson 9 Mark Fife 1 Shooter McGavin 2 Jenny Forest 8 3 Steve Hamm 4 Harry Henderson 9 5 Kelly Hairtrigger 8 6 Gordon Flash 10 Kate Kapteur 7 10 rows selected.
The following example counts every occurrence of a mentor photographer identification that is NOT NULL
. This gives you the total number of photographers in the database that are mentored by another photographer.
SQL> select count(mentor_p_id) "TOTAL PHOTOGRAPHERS MENTORED" 2 from photographers; TOTAL PHOTOGRAPHERS MENTORED ---------------------------- 4
This example is a query that provides a count of any distinct occurrences of the mentor photographer identification, which gives you a total count of the number of photographers that mentor other photographers in the database.
SQL> select count(distinct(mentor_p_id)) "TOTAL MENTORS" 2 from photographers; TOTAL MENTORS ------------- 3 1 row selected.
Note
Data Types Do Not Use COUNT
Because the COUNT
function counts the rows, data types do not play a part. The rows can contain columns with any data type. The only consideration that actually counts is whether the value is NULL
.
SUM
FunctionThe SUM
function returns a total on the values of a column for a group of rows. You can also use the SUM
function with DISTINCT
. When you use SUM
with DISTINCT
, only the distinct rows are totaled, which might not have much purpose. Your total is not accurate in that case because rows of data are omitted.
The syntax for the SUM
function follows:
SUM ([ DISTINCT ] COLUMN NAME)
Caution
SUM
Must Be Numeric
The value of an argument must be numeric to use the SUM
function. You cannot use the SUM
function on columns that have a data type other than numeric.
This example simply returns the sum of the wingspan stored for all birds in the database. This simplistic example is not very useful, but you build upon it during this hour.
SQL> select sum(wingspan) from birds; SUM(WINGSPAN) ------------- 1163.1 1 row selected.
This next example is a bit more useful: An arithmetic calculation is embedded within the SUM
function. First you multiply the number of eggs that a bird lays times a number of broods per year; then you apply the SUM
function on that value for every bird in the BIRDS
table. This returns the total number of eggs laid by all birds in a season or year.
SQL> select sum(eggs * broods) "TOTAL EGGS LAYED BY ALL BIRDS IN A SEASON" 2 from birds; TOTAL EGGS LAYED BY ALL BIRDS IN A SEASON ----------------------------------------- 127 1 row selected.
AVG
FunctionThe AVG
function finds the average value for a given group of rows. When used with the DISTINCT
command, the AVG
function returns the average of the distinct rows. The syntax for the AVG
function follows:
AVG ([ DISTINCT ] COLUMN NAME)
Note
AVG
Must Be Numeric
The value of the argument must be numeric for the AVG
function to work.
The average value for all values in the BIRDS
table’s WINGSPAN
column is retrieved in the following example:
SQL> select avg(wingspan) from birds; AVG(WINGSPAN) ------------- 50.5695652 1 row selected.
Caution
Sometimes Your Data Is Truncated
In some implementations, the results of your query are truncated to the precision of the data type. Review your database system’s documentation to ensure that you understand the normal precision for the various data types. This will prevent you from unnecessarily truncating data and possibly getting an unexpected result from improper precision.
MAX
FunctionThe MAX
function returns the maximum value from the values of a column in a group of rows. NULL
values are ignored when using the MAX
function. Using MAX
with the DISTINCT
command is an option, but because the maximum value for all the rows is the same as the distinct maximum value, DISTINCT
is useless.
The syntax for the MAX
function follows:
MAX([ DISTINCT ] COLUMN NAME)
The following example returns the highest WINGSPAN
in the BIRDS
table:
SQL> select max(wingspan) from birds; MAX(WINGSPAN) ------------- 94.8 1 row selected.
You can also use aggregate functions such as MAX
and MIN
on character data. When using these values, collation of your database comes into play again. Most commonly, your database collation is set to a dictionary order, so the results are ranked according to that. For example, say that you perform a MAX
operation on the BIRD_NAME
column of the BIRDS
table:
SQL> select max(bird_name) 2 from birds; MAX(BIRD_NAME) ------------------------------ Turkey Vulture 1 row selected.
In this instance, the function returns the largest value according to a dictionary ordering of the data in the column.
MIN
FunctionThe MIN
function returns the minimum value of a column for a group of rows. NULL
values are ignored when using the MIN
function. Using MIN
with the DISTINCT
command is an option. However, because the minimum value for all rows is the same as the minimum value for distinct rows, DISTINCT
is useless.
The syntax for the MIN
function follows:
MIN([ DISTINCT ] COLUMN NAME)
The following example returns the lowest WINGSPAN
in the BIRDS
table:
SQL> select min(wingspan) from birds; MIN(WINGSPAN) ------------- 3.2 1 row selected.
Note
DISTINCT
and Aggregate Functions Don’t Always Mix
One important consideration when using aggregate functions with the DISTINCT
command is that your query might not return the wanted results. The purpose of aggregate functions is to return summarized data based on all rows of data in a table. When DISTINCT
is used, first it is applied to the results and then those results are passed on to the aggregate function, which can dramatically alter the results. Be sure you understand this when you use DISTINCT
with aggregate functions.
As with the MAX
function, the MIN
function can work against character data. It returns the minimum value according to the dictionary ordering of the data.
SQL> select min(bird_name) 2 from birds; MIN(BIRD_NAME) ------------------------------ American Coot 1 row selected.
Grouping data is the process of combining columns with duplicate values in a logical order. For example, a database might contain information about employees; many employees live in different cities, but some employees live in the same city. You might want to execute a query that shows employee information for each particular city. You can group employee information by city and create a summarized report.
Or perhaps you want to figure the average salary paid to employees, according to each city. You can do this by using the aggregate function AVG
on the SALARY
column, as you learned in the previous hour, and by using the GROUP BY
clause to group the output by city.
Grouping data is accomplished by using the GROUP BY
clause of a SELECT
statement (query). In this lesson, you learn how to use aggregate functions with the GROUP BY
clause to display results more effectively.
The GROUP BY
clause is used in collaboration with the SELECT
statement to arrange identical data into groups. This clause follows the WHERE
clause in a SELECT
statement and precedes the ORDER BY
clause.
The position of the GROUP BY
clause in a query follows:
SELECT FROM WHERE GROUP BY ORDER BY
The following is the SELECT
statement’s syntax, including the GROUP BY
clause:
SELECT COLUMN1, COLUMN2 FROM TABLE1, TABLE2 WHERE CONDITIONS GROUP BY COLUMN1, COLUMN2 ORDER BY COLUMN1, COLUMN2
This ordering normally takes a little getting used to when writing your first queries with the GROUP BY
clause; however, it is logical. The GROUP BY
clause is normally a much more CPU-intensive operation, and if you do not constrain the rows provided to it, you are grouping unnecessary data that later is discarded. Thus, you intentionally reduce the data set with the WHERE
clause so that you perform your grouping only on the rows you need.
You can use the ORDER BY
clause, but normally the RDBMS also orders the results by the column ordering in the GROUP BY
clause, which is discussed more in depth later in this hour, in the section “Understanding the Difference Between GROUP BY
and ORDER BY
.” Unless you need to order the values in a different pattern than the GROUP BY
clause, the ORDER BY
clause is redundant. However, sometimes it is provided because you use aggregate functions in the SELECT
statement that are not in the GROUP BY
clause, or because your particular RDBMS functions slightly differently from the standard.
The following sections explain how to use the GROUP BY
clause and provide examples of using it in a variety of situations.
Typical group functions—functions that the GROUP BY
clause uses to arrange data in groups—include AVG
, MAX
, MIN
, SUM
, and COUNT
. These are the aggregate functions that you learned about in Hour 17. Remember that the aggregate functions were used for single values in Hour 17; now you use the aggregate functions for group values.
Grouping data is simple. The selected columns (the column list following the SELECT
keyword in a query) are the columns you can reference in the GROUP BY
clause. If a column is not in the SELECT
statement, you cannot use it in the GROUP BY
clause. How can you group data on a report if the data is not displayed?
If the column name has been qualified, the qualified name must go into the GROUP BY
clause. The column name can also be represented by a number. When grouping the data, the order of columns grouped does not have to match the column order in the SELECT
clause.
The SELECT
clause has conditions that must be met when using GROUP BY
. Specifically, whatever columns are selected must appear in the GROUP BY
clause, except for any aggregate values. If the columns in the SELECT
clause are qualified, the qualified names of the columns must be used in the GROUP BY
clause. Some examples of syntax for the GROUP BY
clause are shown next.
The following SQL statement selects the MIGRATION_LOCATION
and a count of all entries in the BIRD_ID
column in the BIRDS_MIGRATION
table. Remember that the BIRDS_MIGRATION
table links the MIGRATION
table to the BIRDS
table, so every BIRD_ID
in the BIRDS_MIGRATION
table that corresponds with a MIGRATION_ID
value indicates the number of birds that migrate to a specific location. The tables are joined appropriately, of course, and the query is grouped by the migration location. This gives you a total count of birds in the database that migrate to each individual location.
SQL> select migration.migration_location, count(birds_migration.bird_id) birds 2 from migration, birds_migration 3 where migration.migration_id = birds_migration.migration_id 4 group by migration.migration_location; MIGRATION_LOCATION BIRDS ------------------------------ ---------- Central America 12 Mexico 14 No Significant Migration 5 Partial, Open Water 1 South America 6 Southern United States 18 6 rows selected.
This next example query focuses on the different types of foods that birds eat. You select the food name and the average wingspan of birds and then apply the ROUND
function on the average wingspan to get a whole number. You can see in the FROM
clause that you are selecting from three tables and that you have joined those three tables appropriately in the WHERE
clause. The results are grouped by the FOOD_NAME
and ordered by the second column, which is the AVG(WINGSPAN)
, in descending order. This sort operation shows the food names with the largest average wingspan of birds first, all the way down to the least. This query gives you a list of all food items in the database and the average wingspan of birds that eat certain food items.
SQL> select food.food_name, round(avg(birds.wingspan)) avg_wingspan 2 from food, birds, birds_food 3 where food.food_id = birds_food.food_id 4 and birds.bird_id = birds_food.bird_id 5 group by food.food_name 6 order by 2 desc; FOOD_NAME AVG_WINGSPAN ------------------------------ ------------ Reptiles 90 Deer 90 Ducks 84 Frogs 73 Birds 69 Crayfish 67 Small Mammals 65 Snakes 63 Insects 61 Carrion 53 Fish 52 Plants 49 Rodents 48 Aquatic Plants 43 Crustaceans 41 Fruit 40 Aquatic Insects 40 Seeds 38 Corn 3 19 rows selected.
GROUP BY
and ORDER BY
The GROUP BY
clause works the same as the ORDER BY
clause, in that both sort data. Specifically, you use the ORDER BY
clause to sort data from a query. The GROUP BY
clause also sorts data from a query to properly group the data.
However, some differences and disadvantages arise when you use GROUP BY
instead of ORDER BY
for sorting operations:
▶ All non-aggregate columns selected must be listed in the GROUP BY
clause.
▶ The GROUP BY
clause generally is not necessary unless you use aggregate functions.
Let’s look at a few examples. In the following query, you select the migration location and wingspan for birds that migrate to either Central America or Mexico; the results are sorted by the migration location. This is an example of the sort operation performed by the ORDER BY
clause.
SQL> select m.migration_location, b.wingspan 2 from birds b, 3 birds_migration bm, 4 migration m 5 where b.bird_id = bm.bird_id 6 and m.migration_id = bm.migration_id 7 and m.migration_location in ('Central America', 'Mexico') 8 order by 1; MIGRATION_LOCATION WINGSPAN ------------------------------ ---------- Central America 78 Central America 54 Central America 72 Central America 23 Central America 6.5 Central America 29 Central America 18 Central America 54 Central America 34 Central America 72 Central America 26.8 Central America 67.2 Mexico 78 Mexico 54 Mexico 72 Mexico 23 Mexico 6.5 Mexico 29 Mexico 18 Mexico 50 Mexico 54 Mexico 34 Mexico 72 Mexico 26.8 Mexico 67.2 Mexico 42 26 rows selected.
The following example is essentially the same as the first, except that you are trying to improperly use the GROUP BY
clause to sort the data in the place of an ORDER BY
. The GROUP BY
clause performs a sort operation for the purpose of grouping data to perform aggregate functions, but it is not the same as the ORDER BY
clause. You can see that an error is returned in this example because, again, you have to group any non-aggregate functions in the query that are in the SELECT
clause.
SQL> select m.migration_location, b.wingspan 2 from birds b, 3 birds_migration bm, 4 migration m 5 where b.bird_id = bm.bird_id 6 and m.migration_id = bm.migration_id 7 and m.migration_location in ('Central America', 'Mexico') 8 group by migration_location; select m.migration_location, b.wingspan * ERROR at line 1: ORA-00979: not a GROUP BY expression
Note
Error Messages Differ
Different SQL implementations return errors in different formats.
Following is a proper and useful example of using the GROUP BY
clause in the same query. Here you perform the AVG
aggregate function on the WINGSPAN
column and group by the MIGRATION_LOCATION
, which is required in this query.
SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN" 2 from birds b, 3 birds_migration bm, 4 migration m 5 where b.bird_id = bm.bird_id 6 and m.migration_id = bm.migration_id 7 group by migration_location; MIGRATION_LOCATION AVG WINGSPAN ------------------------------ ------------ Partial, Open Water 94.8 Southern United States 46.3166667 Mexico 44.75 South America 48.3 No Significant Migration 61.32 Central America 44.5416667 6 rows selected.
Now say that you want to sort the same grouped results, but independent of the GROUP BY
expression. You simply add the ORDER BY
clause to the end of the SQL statement to perform a “final” sort by the AVG(WINGSPAN)
. This shows locations that have birds migrating there with the lowest collective value of AVG(WINGSPAN)
.
SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN" 2 from birds b, 3 birds_migration bm, 4 migration m 5 where b.bird_id = bm.bird_id 6 and m.migration_id = bm.migration_id 7 group by migration_location 8 order by 2; MIGRATION_LOCATION AVG WINGSPAN ------------------------------ ------------ Central America 44.5416667 Mexico 44.75 Southern United States 46.3166667 South America 48.3 No Significant Migration 61.32 Partial, Open Water 94.8 6 rows selected.
Although GROUP BY
and ORDER BY
perform a similar function, they have one major difference: The GROUP BY
clause is designed to group identical data, whereas the ORDER BY
clause is designed merely to put data into a specific order. You can use GROUP BY
and ORDER BY
in the same SELECT
statement, but you must follow a specific order.
CUBE
and ROLLUP
ExpressionsSometimes getting summary totals within a certain group is advantageous. For instance, you might want to see a breakdown of the SUM
of sales per year, country, and product type, but you also want to see the totals in each year and country. The ANSI SQL standard provides for such functionality using the CUBE
and ROLLUP
expressions.
The ROLLUP
expression is used to get subtotals, or what is commonly referred to as super-aggregate rows, along with a grand total row. The ANSI syntax follows:
GROUP BY ROLLUP(ordered column list of grouping sets)
The ROLLUP
expression works in this way: For every change in the LAST
column provided for the grouping set, an additional row is inserted into the result set with a NULL
value for that column and the subtotal of the values in the set. In addition, a row is inserted at the end of the result set with NULL
values for each of the group columns and a grand total for the aggregate information. Both Microsoft SQL Server and Oracle follow the ANSI-compliant format.
First, examine a result set of a simple GROUP BY
statement in which you examine the average wingspan of birds by MIGRATION
:
SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN" 2 from migration m, 3 birds b, 4 birds_migration bm 5 where m.migration_id = bm.migration_id 6 and b.bird_id = bm.bird_id 7 group by migration_location; MIGRATION_LOCATION AVG WINGSPAN ------------------------------ ------------ Partial, Open Water 94.8 Southern United States 46.3166667 Mexico 44.75 South America 48.3 No Significant Migration 61.32 Central America 44.5416667 6 rows selected.
The following is an example of using the ROLLUP
expression to get subtotals of the average wingspan of each migration location:
SQL> select m.migration_location, avg(b.wingspan) "AVG WINGSPAN" 2 from migration m, 3 birds b, 4 birds_migration bm 5 where m.migration_id = bm.migration_id 6 and b.bird_id = bm.bird_id 7 group by rollup (migration_location); MIGRATION_LOCATION AVG WINGSPAN ------------------------------ ------------ Central America 44.5416667 Mexico 44.75 No Significant Migration 61.32 Partial, Open Water 94.8 South America 48.3 Southern United States 46.3166667 47.9625 7 rows selected.
Notice that you now get an average super-aggregate row for each one of the migration locations and an overall average for the entire set as the last row.
The CUBE
expression is different: It returns a single row of data with every combination of the columns in the column list, along with a row for the grand total of the whole set. Because of its unique nature, CUBE
is often used to create crosstab reports. The syntax for the CUBE
expression follows:
GROUP BY CUBE(column list of grouping sets)
For a useful illustration of CUBE
, study the example that follows. Let’s suppose a column has been added to the MIGRATION
table called REGION
.
The following SQL query selects the REGION
, MIGRATION_LOCATION
, and AVG(WINGSPAN)
from the MIGRATIONS
table, BIRDS
table, and BIRDS_MIGRATION
table. The average wingspan was already calculated based on the migration location, but the CUBE
expression provides an overall average wingspan for each region, each migration location itself, and the data set returned as a whole.
SQL> select m.region, m.migration_location, 2 avg(b.wingspan) "AVG WINGSPAN" 3 from migration m, 4 birds b, 5 birds_migration bm 6 where m.migration_id = bm.migration_id 7 and b.bird_id = bm.bird_id 8 group by cube(region, migration_location) 9 order by 1; REGION MIGRATION_LOCATION AVG WINGSPAN -------------------- ------------------------------ ------------ Minimal Migration No Significant Migration 61.32 Minimal Migration Partial, Open Water 94.8 Minimal Migration 66.9 North Mexico 44.75 North Southern United States 46.3166667 North 45.63125 South Central America 44.5416667 South South America 48.3 South 45.7944444 Central America 44.5416667 Mexico 44.75 No Significant Migration 61.32 Partial, Open Water 94.8 South America 48.3 Southern United States 46.3166667 47.9625 16 rows selected.
You can see that, with the CUBE
expression, there are even more rows because the statement needs to return each combination of columns within the column set that you provide. An aggregate (average, in this case) is returned for the region, each location, and the result set as a whole.
HAVING
ClauseWhen the HAVING
clause is used with the GROUP BY
clause in a SELECT
statement, it tells GROUP BY
which groups to include in the output. HAVING
is to GROUP BY
as WHERE
is to SELECT
. In other words, the WHERE
clause places conditions on the selected columns, and the HAVING
clause places conditions on groups created by the GROUP BY
clause. Therefore, when you use the HAVING
clause, you are effectively including (or excluding) whole groups of data from the query results.
The following shows the position of the HAVING
clause in a query:
SELECT FROM WHERE GROUP BY HAVING ORDER BY
This is the syntax of the SELECT
statement that includes the HAVING
clause:
SELECT COLUMN1, COLUMN2 FROM TABLE1, TABLE2 WHERE CONDITIONS GROUP BY COLUMN1, COLUMN2 HAVING CONDITIONS ORDER BY COLUMN1, COLUMN2
In this final example, you select the FOOD_NAME
and calculate the average wingspan of birds related to each food item. The results are grouped by FOOD_NAME
, which is a requirement in this query, and the HAVING
clause is applied to show only the food names that are eaten by birds that have an average wingspan greater than 50 inches. Finally, you order the results by the second column, AVG(BIRDS.WINGSPAN)
, in descending order to show the food items that are eaten by birds with the greatest wingspan, all the way down to the food items eaten by birds with the smallest wingspan.
SQL> select food.food_name, round(avg(birds.wingspan)) avg_wingspan 2 from food, birds, birds_food 3 where food.food_id = birds_food.food_id 4 and birds.bird_id = birds_food.bird_id 5 group by food.food_name 6 having avg(birds.wingspan) > 50 7 order by 2 desc; FOOD_NAME AVG_WINGSPAN ------------------------------ ------------ Deer 90 Reptiles 90 Ducks 84 Frogs 73 Birds 69 Crayfish 67 Small Mammals 65 Snakes 63 Insects 61 Carrion 53 Fish 52 11 rows selected.
Aggregate functions can be useful and are quite simple to use. In this hour, you learned how to count values in columns, count rows of data in a table, get the maximum and minimum values for a column, figure the sum of the values in a column, and figure the average value for values in a column. Remember that NULL
values are not considered when using aggregate functions, except when using the COUNT
function in the format COUNT(*)
. Aggregate functions are the first functions in SQL that you learned in this book, but more follow in the coming hours. You can also use aggregate functions for group values, which are discussed during the next hour. As you learn about other functions, you see that the syntaxes of most functions are similar to one another and that their concepts of use are relatively easy to understand.
You also learned in this hour how to group the results of a query using the GROUP BY
clause. The GROUP BY
clause is primarily used with aggregate SQL functions, such as SUM
, AVG
, MAX
, MIN
, and COUNT
. The nature of GROUP BY
is like that of ORDER BY
, in that both sort query results. The GROUP BY
clause must sort data to group results logically, but you can also use it exclusively to sort data. However, an ORDER BY
clause is much simpler for this purpose. The HAVING
clause, an extension to the GROUP BY
clause, places conditions on the established groups of a query. The WHERE
clause places conditions on a query’s SELECT
clause. During the next hour, you learn a new arsenal of functions that enable you to further manipulate query results.
Q. Why are NULL
values ignored when using the MAX
or MIN
functions?
A. A NULL
value means that nothing is there, so no maximum or minimum value is possible.
Q. Why don’t data types matter when using the COUNT
function?
A. The COUNT
function counts only rows.
Q. Does the data type matter when using the SUM
or AVG
functions?
A. Not exactly. If the data can be implicitly converted to numeric data, it will still work. The data type matters less than what data is stored in it.
Q. Am I limited to using only column names inside of aggregate functions?
A. No, you can use any type of calculation or formula, as long as the output corresponds to the proper type of data that the function is expecting to use.
Q. Is using the ORDER BY
clause mandatory when using the GROUP BY
clause in a SELECT
statement?
A. No, using the ORDER BY
clause is strictly optional, but it can be helpful when used with GROUP BY
.
Q. Must a column appear in the SELECT
statement to use a GROUP BY
clause on it?
A. Yes, a column must be in the SELECT
statement to use a GROUP BY
clause on it.
Q. Must all columns that appear in the SELECT
statement be used in the GROUP BY
clause?
A. Yes, every column that appears in the SELECT
statement (except for aggregate functions) must be used in the GROUP BY
clause, to avoid getting an error.
The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. True or false: The AVG
function returns an average of all rows from a SELECT
column, including any NULL
values.
2. True or false: The SUM
function adds column totals.
3. True or false: The COUNT(*)
function counts all rows in a table.
4. True or false: The COUNT([column name])
function counts NULL
values.
5. Do the following SELECT
statements work? If not, what fixes the statements?
SELECT COUNT *
FROM BIRDS;
SELECT COUNT(BIRD_ID), BIRD_NAME FROM BIRDS;
SELECT MIN(WEIGHT), MAX(HEIGHT) FROM BIRDS WHERE WINGSPAN > 48;
SELECT COUNT(DISTINCT BIRD_ID) FROM BIRDS;
SELECT AVG(BIRD_NAME) FROM BIRDS;
6. What is the purpose of the HAVING
clause, and which other clause is it closest to?
7. True or false: You must also use the GROUP BY
clause when using the HAVING
clause.
8. True or false: The columns selected must appear in the GROUP BY
clause in the same order.
9. True or false: The HAVING
clause tells the GROUP BY
clause which groups to include.
1. What is the average wingspan of birds?
2. What is the average wingspan of birds that eat fish?
3. How many different types of food does the Common Loon eat?
4. What is the average number of eggs per type of nest?
5. What is the lightest bird?
6. Generate a list of birds that are above average in all the following areas: height, weight, and wingspan.
7. Write a query to generate a list of all migration locations and their average wingspans, but only for locations of birds that have an average wingspan greater than 48 inches.
8. Write a query showing a list of all photographers and the number of photographers mentored by each photographer.
9. Experiment on your own using aggregate functions, along with other functions that you learned in previous hours.