Hour 18

Using Subqueries to Define Unknown Data

What You’ll Learn in This Hour:

In this hour, you look at the concept of subqueries, a means by which you can perform additional queries of information from within the same SQL statement. Using subqueries enables you to easily perform complex queries that might rely on complex subsets of data in your database.

Defining Subqueries

A subquery, also known as a nested query, is a query embedded within the WHERE clause of another query to further restrict data returned by the query. A subquery returns data that is used in the main query as a condition to further restrict the data to be retrieved. Subqueries are employed with the SELECT, INSERT, UPDATE, and DELETE statements.

In some cases, you can use a subquery in place of a join operation by indirectly linking data between the tables based on one or more conditions. When you have a subquery in a query, the subquery is resolved first and then the main query is resolved according to the condition(s) resolved by the subquery. The results of the subquery process expressions in the WHERE clause of the main query. You can use the subquery in either the WHERE clause or the HAVING clause of the main query. Additionally, you can use logical and relational operators such as =, >, <, <>,!=, IN, NOT IN, AND, OR, and so on within the subquery to evaluate a subquery in the WHERE or HAVING clause.

Subqueries must follow a few rules:

  •    Subqueries must be enclosed within parentheses.

  •    A subquery can have only one column in the SELECT clause, unless multiple columns exist in the main query for the subquery to compare its selected columns.

  •    You cannot use an ORDER BY clause in a subquery, although the main query can use an ORDER BY clause. You can use the GROUP BY clause to perform the same function as the ORDER BY clause in a subquery.

  •    You can use only subqueries that return more than one row with multiple value operators, such as the IN operator.

  •    The SELECT list cannot include references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.

  •    You cannot immediately enclose a subquery in a SET function.

  •    You cannot use the BETWEEN operator with a subquery; however, you can use the BETWEEN operator within the subquery.

Note

The Rules of Using Subqueries

The same rules that apply to standard queries also apply to subqueries. You can use join operations, functions, conversions, and other options within a subquery.

The basic syntax for a subquery follows:

SELECT COLUMN_NAME
FROM TABLE
WHERE COLUMN_NAME = (SELECT COLUMN_NAME
                           FROM TABLE
                           WHERE CONDITIONS);

The following examples show how you can and cannot use the BETWEEN operator with a subquery. Consider this example of the correct use of BETWEEN in the subquery:

SELECT COLUMN_NAME
FROM TABLE_A
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
                                    FROM TABLE_B)
                                     WHERE VALUE BETWEEN VALUE)

You cannot use BETWEEN as an operator outside the subquery. The following is an example of the illegal use of BETWEEN with a subquery:

SELECT COLUMN_NAME
FROM TABLE_A
WHERE COLUMN_NAME BETWEEN VALUE AND (SELECT COLUMN_NAME
                                               FROM TABLE_B)

As you soon see in this hour, subqueries produce either a single value or a list of values, just as a query does. When a subquery is embedded within the main query, the result set of the main query must logically make sense when compared to the data set that the subquery returns. The subquery is resolved first and the data returned is substituted into the main query.

Subqueries with the SELECT Statement

Subqueries are most frequently used with the SELECT statement, although you can use them within a data manipulation statement as well. When employed with the SELECT statement, the subquery retrieves data for the main query to use.

The basic syntax follows:

SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR
                      (SELECT COLUMN_NAME [, COLUMN_NAME ]
                       FROM TABLE1 [, TABLE2 ]
                      [ WHERE ])

Consider an example. Let’s say that you want to query the database to get a list of birds that have an above average wingspan. The WINGSPAN column is stored in the database, but the average wingspan is not stored. Therefore, you cannot know the value of the average wingspan without first querying for the average wingspan from the BIRDS table. However, even if you perform this query, the average wingspan is a dynamic data value, which means that it can change at any given point in time as new birds are inserted into the database or as the wingspan data is updated. Thus, not only do you not know the value that you need to supply to the query, but that value might change the next time the query is run.

First, you query for the average wingspan from the BIRDS table.

SQL> select avg(wingspan)
  2  from birds;

AVG(WINGSPAN)
-------------
   50.5695652

1 row selected.

Now that you have the current value for the average wingspan (which is only for comparison purposes in this example), you use the same SELECT statement and substitute that value into the WHERE clause of the query. This is a simple example of a subquery.

SQL> select bird_name "BIRD",
  2  wingspan "WINGSPAN ABOVE AVERAGE"
  3  from birds
  4  where wingspan >
  5        (select avg(wingspan) from birds)
  6  order by 2 desc;

BIRD                           WINGSPAN ABOVE AVERAGE
------------------------------ ----------------------
Mute Swan                                        94.8
Brown Pelican                                      90
Golden Eagle                                       90
Bald Eagle                                         84
Great Blue Heron                                   78
Osprey                                             72
Canadian Goose                                     72
Turkey Vulture                                     72
Great Egret                                      67.2
Common Loon                                        54
Double-crested Cormorant                           54

11 rows selected.

You can see here that output was generated showing the name of a bird and the wingspan of each bird, but only for birds that have an above average wingspan. Compare the value of the previous query of 50 inches to the output of this query.

Tip

Using Subqueries for Unknown Values

Subqueries are frequently used to place conditions on a query when the exact conditions are unknown. The salary for 3908 in the previous example was unknown, but the subquery was designed to do the footwork for you.

Following is an example of a subquery attempt using an operator such as BETWEEN that creates an illogical comparison. Notice that the Oracle database returns an error. This is because the average wingspan value that the subquery returns is a single value, and the BETWEEN operator is looking for exactly two values. Additionally, the AND operator is required when using BETWEEN, which cannot be done here in the subquery and simply does not make sense.

SQL> select bird_name "BIRD",
  2      wingspan "WINGSPAN ABOVE AVERAGE"
  3  from birds
  4  where wingspan between
  5   (select avg(wingspan) from birds)
  6  order by 2 desc;
order by 2 desc
*
ERROR at line 6:
ORA-00905: missing keyword

The following query is a more appropriate example with a subquery that returns multiple records that are substituted into the main query using the IN operator.

SQL> select bird_name "BIRD",
  2      wingspan "WINGSPAN ABOVE AVERAGE"
  3  from birds
  4  where wingspan in
  5   (select wingspan from birds
  6    where wingspan > 48)
  7  order by 2 desc;




BIRD                           WINGSPAN ABOVE AVERAGE
------------------------------ ----------------------
Mute Swan                                        94.8
Brown Pelican                                      90
Golden Eagle                                       90
Bald Eagle                                         84
Great Blue Heron                                   78
Canadian Goose                                     72
Turkey Vulture                                     72
Osprey                                             72
Great Egret                                      67.2
Double-crested Cormorant                           54
Common Loon                                        54
Ring-billed Gull                                   50

12 rows selected.

Subqueries with the CREATE TABLE Statement

You can also use subqueries with Data Definition Language (DDL) statements. The CREATE TABLE that follows uses a subquery to gather the appropriate data. A table called ABOVE_AVG_BIRDS is being created based on the original BIRDS table, but only for birds that have an above average wingspan, an above average height, and an above average weight. All three conditions that have subqueries must be true for data to be returned. For example, records are created in this new table, ABOVE_AVG_BIRDS, only where:

  •    A bird’s wingspan is greater than the average wingspan of all birds (subquery 1) and

  •    A bird’s height is greater than the average height of all birds (subquery 2) and

  •    A bird’s weight is greater than the average weight of all birds (subquery 3)

SQL> create table above_avg_birds as
  2  select bird_id, bird_name, wingspan, height, weight
  3  from birds
  4  where wingspan > (select avg(wingspan) from birds)
  5    and height > (select avg(height) from birds)
  6    and weight > (select avg(weight) from birds);

Table created.

Study the following data within the new ABOVE_AVG_BIRDS table:

SQL> select *
  2  from above_avg_birds;

   BIRD_ID BIRD_NAME             WINGSPAN     HEIGHT     WEIGHT
---------- ------------------- ---------- ---------- ----------
         3 Common Loon                 54         36         18
         4 Bald Eagle                  84         37         14
         5 Golden Eagle                90         40         15
         9 Canadian Goose              72         43         14
        19 Mute Swan                 94.8         60         26
        20 Brown Pelican               90         54        7.6

6 rows selected.

You do not know the current value of any of these averages, but using a subquery, you can easily get a real-time value for each and create a new table specifically based on those results.

Subqueries with the INSERT Statement

You can also use subqueries with Data Manipulation Language (DML) statements. The INSERT statement is the first instance you examine. It uses the data that the subquery returns and inserts it into another table. You can modify the selected data in the subquery with any of the character, date, or number functions.

Note

Always Remember to Save Your Transactions

Remember to use the COMMIT and ROLLBACK commands when using DML commands such as the INSERT statement.

The basic syntax follows:

INSERT INTO TABLE_NAME [ (COLUMN1 [, COLUMN2 ]) ]
SELECT [ *|COLUMN1 [, COLUMN2 ]
FROM TABLE1 [, TABLE2 ]
[ WHERE VALUE OPERATOR ]

Before proceeding, you first need to truncate the ABOVE_AVG_BIRDS table to remove any existing data from that table.

SQL> truncate table above_avg_birds;

Table truncated.

Now you select all records from the ABOVE_AVG_BIRDS table to verify that no data exists.

SQL> select *
  2  from above_avg_birds;

no rows selected

Next, you issue an INSERT statement into the ABOVE_AVG_BIRDS table, selecting data from the BIRDS table using a subquery. You can see that six new rows of data are created in the new table.

SQL> insert into above_avg_birds
  2  select bird_id, bird_name, wingspan, height, weight
  3  from birds
  4  where wingspan > (select avg(wingspan) from birds)
  5    and height > (select avg(height) from birds)
  6    and weight > (select avg(weight) from birds);

6 rows created.

Review the following results and study the data that was inserted into the new table.

SQL> select *
  2  from above_avg_birds;

   BIRD_ID BIRD_NAME             WINGSPAN     HEIGHT     WEIGHT
---------- ------------------- ---------- ---------- ----------
         3 Common Loon                 54         36         18
         4 Bald Eagle                  84         37         14
         5 Golden Eagle                90         40         15
         9 Canadian Goose              72         43         14
        19 Mute Swan                 94.8         60         26
        20 Brown Pelican               90         54        7.6

6 rows selected.

This INSERT statement inserts the EMPLOYEEID, LASTNAME, FIRSTNAME, and SALARY into a table called RICH_EMPLOYEES for all records of employees who have a pay rate greater than the pay rate of the employee with identification 3908.

Subqueries with the UPDATE Statement

You can use subqueries with the UPDATE statement to update single or multiple columns in a table. The basic syntax follows:

UPDATE TABLE
SET COLUMN_NAME [, COLUMN_NAME) ] =
    (SELECT ]COLUMN_NAME [, COLUMN_NAME) ]
    FROM TABLE
    [ WHERE ]

Following is an example of an UPDATE statement that uses similar subqueries as the previous example to update the values in the WINGSPAN column for birds that are above average in three categories: wingspan, height, and weight.

SQL> update birds
  2  set wingspan = 99, height = 99, weight = 99
  3  where wingspan > (select avg(wingspan) from birds)
  4    and height > (select avg(height) from birds)
  5    and weight > (select avg(weight) from birds);

6 rows updated.

Now if you query the BIRDS table, you see that the values in the columns have been set to the number 99, based on the criteria used in the subqueries.

SQL> select bird_name, wingspan, height, weight
  2  from birds;

BIRD_NAME                        WINGSPAN     HEIGHT     WEIGHT
------------------------------ ---------- ---------- ----------
Great Blue Heron                       78         52        5.5
Mallard                               3.2         28        3.5
Common Loon                            99         99         99
Bald Eagle                             99         99         99
Golden Eagle                           99         99         99
Red Tailed Hawk                        48         25        2.4
Osprey                                 72         24          3
Belted Kingfisher                      23         13        .33
Canadian Goose                         99         99         99
Pied-billed Grebe                     6.5         13          1
American Coot                          29         16          1
Common Sea Gull                        18         18          1
Ring-billed Gull                       50         19        1.1
Double-crested Cormorant               54         33        5.5
Common Merganser                       34         27        3.2
Turkey Vulture                         72         32        3.3
American Crow                        39.6         18        1.4
Green Heron                          26.8         22         .4
Mute Swan                              99         99         99
Brown Pelican                          99         99         99
Great Egret                          67.2         38        3.3
Anhinga                                42         35        2.4
Black Skimmer                          15         20          1

23 rows selected.

Subqueries with the DELETE Statement

You can also use subqueries with the DELETE statement. The basic syntax follows:

DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
                (SELECT COLUMN_NAME
                FROM TABLE_NAME)
                [ WHERE) ]

Let’s review the data in the ABOVE_AVG_BIRDS table.

SQL> select bird_name, wingspan, height, weight
  2  from above_avg_birds;

BIRD_NAME                        WINGSPAN     HEIGHT     WEIGHT
------------------------------ ---------- ---------- ----------
Common Loon                            54         36         18
Bald Eagle                             84         37         14
Golden Eagle                           90         40         15
Canadian Goose                         72         43         14
Mute Swan                            94.8         60         26
Brown Pelican                          90         54        7.6

6 rows selected.

You delete the rows of data from the ABOVE_AVG_BIRDS table for any birds that have either a wingspan equal to the maximum wingspan currently stored in the table or a height that is exactly the same as the minimum height value for any birds in the table.

SQL> delete from above_avg_birds
  2  where wingspan = (select max(wingspan) from above_avg_birds)
  3  or height = (select min(height) from above_avg_birds);

2 rows deleted.

Now if you select all records from the ABOVE_AVG_BIRDS table, you can see that two rows of data are deleted for the birds that meet the criteria in the subqueries of the DELETE statement.

SQL> select bird_name, wingspan, height, weight
  2  from above_avg_birds;

BIRD_NAME                        WINGSPAN     HEIGHT     WEIGHT
------------------------------ ---------- ---------- ----------
Bald Eagle                             84         37         14
Golden Eagle                           90         40         15
Canadian Goose                         72         43         14
Brown Pelican                          90         54        7.6

4 rows selected.

Embedded Subqueries

You can embed a subquery within another subquery, just as you can embed the subquery within a regular query. When a subquery is used, that subquery is resolved before the main query. Likewise, the lowest-level subquery is resolved first in embedded or nested subqueries, working out to the main query.

Note

Check the Limits of Your System

Check your particular implementation for limits on the number of subqueries, if any, that you can use in a single statement. This might differ between vendors.

The basic syntax for embedded subqueries follows:

SELECT COLUMN_NAME [, COLUMN_NAME ]
FROM TABLE1 [, TABLE2 ]
WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME
                                   FROM TABLE
                                   WHERE COLUMN_NAME OPERATOR
                                          (SELECT COLUMN_NAME
                                          FROM TABLE
                                         [ WHERE COLUMN_NAME OPERATOR VALUE ]))

In the following query, you are looking for a list of birds whose BIRD_ID is found in a list of BIRD_IDs for only birds that migrate to the location of Mexico. A simple subquery gathers this data.

SQL> select bird_name
  2  from birds
  3  where bird_id in (select bird_id
  4         from birds_migration bm,
  5       migration m
  6         where bm.migration_id = m.migration_id
  7    and m.migration_location = 'Mexico');

BIRD_NAME
------------------------------
Great Blue Heron
Common Loon
Osprey
Belted Kingfisher
Pied-billed Grebe
American Coot
Common Sea Gull
Ring-billed Gull
Double-crested Cormorant
Common Merganser
Turkey Vulture
Green Heron
Great Egret
Anhinga

14 rows selected.

The following query, which does not use a subquery, takes a different approach to verify the output of the previous query. The data results are the same.

SQL> select m.migration_location, b.bird_name
  2  from birds b,
  3    birds_migration bm,
  4    migration m
  5  where b.bird_id = bm.bird_id
  6    and bm.migration_id = m.migration_id
  7    and m.migration_location = 'Mexico';

MIGRATION_LOCATION             BIRD_NAME
------------------------------ ------------------------------
Mexico                         Great Blue Heron
Mexico                         Common Loon
Mexico                         Osprey
Mexico                         Belted Kingfisher
Mexico                         Pied-billed Grebe
Mexico                         American Coot
Mexico                         Common Sea Gull
Mexico                         Ring-billed Gull
Mexico                         Double-crested Cormorant
Mexico                         Common Merganser
Mexico                         Turkey Vulture
Mexico                         Green Heron
Mexico                         Great Egret
Mexico                         Anhinga

14 rows selected.

In this next example, embedded subqueries further break down data. Not only do you want to see a list of birds found in a list of birds that migrate to the location of Mexico, but you also want to see only birds within that location that eat fish. Study the results closely and compare them to the previous examples.

SQL> select bird_name
  2  from birds
  3  where bird_id in
  4    (select bird_id
  5     from birds_migration bm,
  6  migration m
  7     where bm.migration_id = m.migration_id
  8       and m.migration_location = 'Mexico'
  9       and bm.bird_id in
 10   (select bf.bird_id
 11    from birds_food bf,
 12         food f
 13    where bf.food_id = f.food_id
 14      and f.food_name = 'Fish'));

BIRD_NAME
------------------------------
Great Blue Heron
Common Loon
Osprey
Belted Kingfisher
Common Sea Gull
Ring-billed Gull
Double-crested Cormorant
Common Merganser
Green Heron
Great Egret
Anhinga

11 rows selected.

Caution

Subquery Performance

Subquery use can either improve or degrade query performance—it depends on how many tables exist, how the tables are joined, how many conditions exist, how many embedded subqueries are used, and many other factors (including indexes on tables). The use of multiple subqueries results in slower response time and can result in reduced accuracy of the results due to possible mistakes in the statement coding. Consider that a subquery must be evaluated before the main part of the query, so the time that it takes to execute the subquery has a direct effect on the time it takes for the main query to execute.

Using Correlated Subqueries

Correlated subqueries are common in many SQL implementations. The concept of correlated subqueries is discussed as an ANSI-standard SQL topic and is covered briefly in this hour. A correlated subquery is a subquery that is dependent on information in the main query. This means that tables in a subquery can be related to tables in the main query.

Note

Proper Use of Correlated Subqueries

For a correlated subquery, you must reference the table in the main query before you can resolve the subquery.

In the following example, you are looking for a list of only birds and their associated wingspan that have a nickname containing the word eagle. In this example, instead of joining the BIRDS and NICKNAME tables together in the main query, you join the NICKNAMES table found in the subquery with the BIRDS table found in the main query. In this case, the subquery is dependent on the main query to return data.

SQL> select bird_id, bird_name, wingspan
  2  from birds
  3  where bird_id in (select bird_id
  4         from nicknames
  5         where birds.bird_id = nicknames.bird_id
  6   and nicknames.nickname like '%Eagle%');

   BIRD_ID BIRD_NAME                        WINGSPAN
---------- ------------------------------ ----------
         4 Bald Eagle                             99
         5 Golden Eagle                           99

2 rows selected.

Summary

By simple definition and general concept, a subquery is a query that is performed within another query to place further conditions on a query. You can use a subquery in a SQL statement’s WHERE clause or HAVING clause. Queries are typically used within other queries (Data Query Language), but you can also use them in the resolution of DML statements such as INSERT, UPDATE, and DELETE. All basic rules for DML apply when using subqueries with DML commands.

The subquery’s syntax is virtually the same as that of a standalone query, with a few minor restrictions. One of these restrictions is that you cannot use the ORDER BY clause within a subquery; however, you can use a GROUP BY clause, which renders virtually the same effect. Subqueries are used to place conditions that are not necessarily known for a query, providing more power and flexibility with SQL.

Q&A

Q. Is there a limit on the number of embedded subqueries that can be used in a single query?

A. Limitations such as the number of embedded subqueries allowed and the number of tables joined in a query are specific to each implementation. Some implementations do not have limits, although using too many embedded subqueries can drastically hinder SQL statement performance. Most limitations are affected by the actual hardware, CPU speed, and system memory available, but many other considerations also come into play.

Q. It seems that debugging a query with subqueries can be confusing, especially with embedded subqueries. What is the best way to debug a query with subqueries?

A. The best way to debug a query with subqueries is to evaluate the query in sections. First, evaluate the lowest-level subquery and then work your way to the main query (the same way the database evaluates the query). When you evaluate each subquery individually, you can substitute the returned values for each subquery to check your main query’s logic. An error with a subquery often results from the use of the operator that evaluates the subquery, such as (=), IN, >, and <.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

  1. 1. What is the function of a subquery when used with a SELECT statement?

  2. 2. Can you update more than one column when using the UPDATE statement with a subquery?

  3. 3. Can you embed subqueries within other subqueries?

  4. 4. What is a subquery called that has a column related to a column in the main query?

  5. 5. Can you embed subqueries within other subqueries?

  6. 6. What is an example of an operator that cannot be used when accessing a subquery?

Exercises

  1. 1. Write a query with a subquery to create a list of birds and their wingspans for birds that have a wingspan less than the average wingspan in the BIRDS table.

  2. 2. Produce a list of birds and their associated migration locations for only birds that migrate to locations that have birds migrating there with an above average wingspan.

  3. 3. Use a subquery to find any food items that are eaten by the shortest bird in the database.

  4. 4. Applying the concept of a subquery, create a new table called BIRD_APPETIZERS based on the following information: This new table should list the FOOD_ID and FOOD_NAME, but only for food items associated with birds that are in the bottom 25 percentile of height.