What You’ll Learn in This Hour:
▶ Defining a subquery
▶ Justifying the use of subqueries
▶ Understanding examples of subqueries in regular database queries
▶ Using subqueries with data manipulation commands
▶ Using correlated subqueries to make subqueries specific
In this hour, you look at the concept of subqueries, a means by which you can perform additional queries of information from within the same SQL statement. Using subqueries enables you to easily perform complex queries that might rely on complex subsets of data in your database.
A subquery, also known as a nested query, is a query embedded within the WHERE
clause of another query to further restrict data returned by the query. A subquery returns data that is used in the main query as a condition to further restrict the data to be retrieved. Subqueries are employed with the SELECT
, INSERT
, UPDATE
, and DELETE
statements.
In some cases, you can use a subquery in place of a join operation by indirectly linking data between the tables based on one or more conditions. When you have a subquery in a query, the subquery is resolved first and then the main query is resolved according to the condition(s) resolved by the subquery. The results of the subquery process expressions in the WHERE
clause of the main query. You can use the subquery in either the WHERE
clause or the HAVING
clause of the main query. Additionally, you can use logical and relational operators such as =
, >
, <
, <>
,!=, IN
, NOT IN
, AND
, OR
, and so on within the subquery to evaluate a subquery in the WHERE
or HAVING
clause.
Subqueries must follow a few rules:
▶ Subqueries must be enclosed within parentheses.
▶ A subquery can have only one column in the SELECT
clause, unless multiple columns exist in the main query for the subquery to compare its selected columns.
▶ You cannot use an ORDER BY
clause in a subquery, although the main query can use an ORDER BY
clause. You can use the GROUP BY
clause to perform the same function as the ORDER BY
clause in a subquery.
▶ You can use only subqueries that return more than one row with multiple value operators, such as the IN
operator.
▶ The SELECT
list cannot include references to values that evaluate to a BLOB
, ARRAY
, CLOB
, or NCLOB
.
▶ You cannot immediately enclose a subquery in a SET
function.
▶ You cannot use the BETWEEN
operator with a subquery; however, you can use the BETWEEN
operator within the subquery.
Note
The Rules of Using Subqueries
The same rules that apply to standard queries also apply to subqueries. You can use join operations, functions, conversions, and other options within a subquery.
The basic syntax for a subquery follows:
SELECT COLUMN_NAME FROM TABLE WHERE COLUMN_NAME = (SELECT COLUMN_NAME FROM TABLE WHERE CONDITIONS);
The following examples show how you can and cannot use the BETWEEN
operator with a subquery. Consider this example of the correct use of BETWEEN
in the subquery:
SELECT COLUMN_NAME FROM TABLE_A WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME FROM TABLE_B) WHERE VALUE BETWEEN VALUE)
You cannot use BETWEEN
as an operator outside the subquery. The following is an example of the illegal use of BETWEEN
with a subquery:
SELECT COLUMN_NAME FROM TABLE_A WHERE COLUMN_NAME BETWEEN VALUE AND (SELECT COLUMN_NAME FROM TABLE_B)
As you soon see in this hour, subqueries produce either a single value or a list of values, just as a query does. When a subquery is embedded within the main query, the result set of the main query must logically make sense when compared to the data set that the subquery returns. The subquery is resolved first and the data returned is substituted into the main query.
SELECT
StatementSubqueries are most frequently used with the SELECT
statement, although you can use them within a data manipulation statement as well. When employed with the SELECT
statement, the subquery retrieves data for the main query to use.
The basic syntax follows:
SELECT COLUMN_NAME [, COLUMN_NAME ] FROM TABLE1 [, TABLE2 ] WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME [, COLUMN_NAME ] FROM TABLE1 [, TABLE2 ] [ WHERE ])
Consider an example. Let’s say that you want to query the database to get a list of birds that have an above average wingspan. The WINGSPAN
column is stored in the database, but the average wingspan is not stored. Therefore, you cannot know the value of the average wingspan without first querying for the average wingspan from the BIRDS
table. However, even if you perform this query, the average wingspan is a dynamic data value, which means that it can change at any given point in time as new birds are inserted into the database or as the wingspan data is updated. Thus, not only do you not know the value that you need to supply to the query, but that value might change the next time the query is run.
First, you query for the average wingspan from the BIRDS
table.
SQL> select avg(wingspan) 2 from birds; AVG(WINGSPAN) ------------- 50.5695652 1 row selected.
Now that you have the current value for the average wingspan (which is only for comparison purposes in this example), you use the same SELECT
statement and substitute that value into the WHERE
clause of the query. This is a simple example of a subquery.
SQL> select bird_name "BIRD", 2 wingspan "WINGSPAN ABOVE AVERAGE" 3 from birds 4 where wingspan > 5 (select avg(wingspan) from birds) 6 order by 2 desc; BIRD WINGSPAN ABOVE AVERAGE ------------------------------ ---------------------- Mute Swan 94.8 Brown Pelican 90 Golden Eagle 90 Bald Eagle 84 Great Blue Heron 78 Osprey 72 Canadian Goose 72 Turkey Vulture 72 Great Egret 67.2 Common Loon 54 Double-crested Cormorant 54 11 rows selected.
You can see here that output was generated showing the name of a bird and the wingspan of each bird, but only for birds that have an above average wingspan. Compare the value of the previous query of 50 inches to the output of this query.
Tip
Using Subqueries for Unknown Values
Subqueries are frequently used to place conditions on a query when the exact conditions are unknown. The salary for 3908
in the previous example was unknown, but the subquery was designed to do the footwork for you.
Following is an example of a subquery attempt using an operator such as BETWEEN
that creates an illogical comparison. Notice that the Oracle database returns an error. This is because the average wingspan value that the subquery returns is a single value, and the BETWEEN
operator is looking for exactly two values. Additionally, the AND
operator is required when using BETWEEN
, which cannot be done here in the subquery and simply does not make sense.
SQL> select bird_name "BIRD", 2 wingspan "WINGSPAN ABOVE AVERAGE" 3 from birds 4 where wingspan between 5 (select avg(wingspan) from birds) 6 order by 2 desc; order by 2 desc * ERROR at line 6: ORA-00905: missing keyword
The following query is a more appropriate example with a subquery that returns multiple records that are substituted into the main query using the IN
operator.
SQL> select bird_name "BIRD", 2 wingspan "WINGSPAN ABOVE AVERAGE" 3 from birds 4 where wingspan in 5 (select wingspan from birds 6 where wingspan > 48) 7 order by 2 desc; BIRD WINGSPAN ABOVE AVERAGE ------------------------------ ---------------------- Mute Swan 94.8 Brown Pelican 90 Golden Eagle 90 Bald Eagle 84 Great Blue Heron 78 Canadian Goose 72 Turkey Vulture 72 Osprey 72 Great Egret 67.2 Double-crested Cormorant 54 Common Loon 54 Ring-billed Gull 50 12 rows selected.
CREATE TABLE
StatementYou can also use subqueries with Data Definition Language (DDL) statements. The CREATE TABLE
that follows uses a subquery to gather the appropriate data. A table called ABOVE_AVG_BIRDS
is being created based on the original BIRDS
table, but only for birds that have an above average wingspan, an above average height, and an above average weight. All three conditions that have subqueries must be true for data to be returned. For example, records are created in this new table, ABOVE_AVG_BIRDS
, only where:
▶ A bird’s wingspan is greater than the average wingspan of all birds (subquery 1) and
▶ A bird’s height is greater than the average height of all birds (subquery 2) and
▶ A bird’s weight is greater than the average weight of all birds (subquery 3)
SQL> create table above_avg_birds as 2 select bird_id, bird_name, wingspan, height, weight 3 from birds 4 where wingspan > (select avg(wingspan) from birds) 5 and height > (select avg(height) from birds) 6 and weight > (select avg(weight) from birds); Table created.
Study the following data within the new ABOVE_AVG_BIRDS
table:
SQL> select * 2 from above_avg_birds; BIRD_ID BIRD_NAME WINGSPAN HEIGHT WEIGHT ---------- ------------------- ---------- ---------- ---------- 3 Common Loon 54 36 18 4 Bald Eagle 84 37 14 5 Golden Eagle 90 40 15 9 Canadian Goose 72 43 14 19 Mute Swan 94.8 60 26 20 Brown Pelican 90 54 7.6 6 rows selected.
You do not know the current value of any of these averages, but using a subquery, you can easily get a real-time value for each and create a new table specifically based on those results.
INSERT
StatementYou can also use subqueries with Data Manipulation Language (DML) statements. The INSERT
statement is the first instance you examine. It uses the data that the subquery returns and inserts it into another table. You can modify the selected data in the subquery with any of the character, date, or number functions.
Note
Always Remember to Save Your Transactions
Remember to use the COMMIT
and ROLLBACK
commands when using DML commands such as the INSERT
statement.
The basic syntax follows:
INSERT INTO TABLE_NAME [ (COLUMN1 [, COLUMN2 ]) ] SELECT [ *|COLUMN1 [, COLUMN2 ] FROM TABLE1 [, TABLE2 ] [ WHERE VALUE OPERATOR ]
Before proceeding, you first need to truncate the ABOVE_AVG_BIRDS
table to remove any existing data from that table.
SQL> truncate table above_avg_birds; Table truncated.
Now you select all records from the ABOVE_AVG_BIRDS
table to verify that no data exists.
SQL> select * 2 from above_avg_birds; no rows selected
Next, you issue an INSERT
statement into the ABOVE_AVG_BIRDS
table, selecting data from the BIRDS
table using a subquery. You can see that six new rows of data are created in the new table.
SQL> insert into above_avg_birds 2 select bird_id, bird_name, wingspan, height, weight 3 from birds 4 where wingspan > (select avg(wingspan) from birds) 5 and height > (select avg(height) from birds) 6 and weight > (select avg(weight) from birds); 6 rows created.
Review the following results and study the data that was inserted into the new table.
SQL> select * 2 from above_avg_birds; BIRD_ID BIRD_NAME WINGSPAN HEIGHT WEIGHT ---------- ------------------- ---------- ---------- ---------- 3 Common Loon 54 36 18 4 Bald Eagle 84 37 14 5 Golden Eagle 90 40 15 9 Canadian Goose 72 43 14 19 Mute Swan 94.8 60 26 20 Brown Pelican 90 54 7.6 6 rows selected.
This INSERT
statement inserts the EMPLOYEEID
, LASTNAME
, FIRSTNAME
, and SALARY
into a table called RICH_EMPLOYEES
for all records of employees who have a pay rate greater than the pay rate of the employee with identification 3908
.
UPDATE
StatementYou can use subqueries with the UPDATE
statement to update single or multiple columns in a table. The basic syntax follows:
UPDATE TABLE SET COLUMN_NAME [, COLUMN_NAME) ] = (SELECT ]COLUMN_NAME [, COLUMN_NAME) ] FROM TABLE [ WHERE ]
Following is an example of an UPDATE
statement that uses similar subqueries as the previous example to update the values in the WINGSPAN
column for birds that are above average in three categories: wingspan, height, and weight.
SQL> update birds 2 set wingspan = 99, height = 99, weight = 99 3 where wingspan > (select avg(wingspan) from birds) 4 and height > (select avg(height) from birds) 5 and weight > (select avg(weight) from birds); 6 rows updated.
Now if you query the BIRDS
table, you see that the values in the columns have been set to the number 99
, based on the criteria used in the subqueries.
SQL> select bird_name, wingspan, height, weight 2 from birds; BIRD_NAME WINGSPAN HEIGHT WEIGHT ------------------------------ ---------- ---------- ---------- Great Blue Heron 78 52 5.5 Mallard 3.2 28 3.5 Common Loon 99 99 99 Bald Eagle 99 99 99 Golden Eagle 99 99 99 Red Tailed Hawk 48 25 2.4 Osprey 72 24 3 Belted Kingfisher 23 13 .33 Canadian Goose 99 99 99 Pied-billed Grebe 6.5 13 1 American Coot 29 16 1 Common Sea Gull 18 18 1 Ring-billed Gull 50 19 1.1 Double-crested Cormorant 54 33 5.5 Common Merganser 34 27 3.2 Turkey Vulture 72 32 3.3 American Crow 39.6 18 1.4 Green Heron 26.8 22 .4 Mute Swan 99 99 99 Brown Pelican 99 99 99 Great Egret 67.2 38 3.3 Anhinga 42 35 2.4 Black Skimmer 15 20 1 23 rows selected.
DELETE
StatementYou can also use subqueries with the DELETE
statement. The basic syntax follows:
DELETE FROM TABLE_NAME [ WHERE OPERATOR [ VALUE ] (SELECT COLUMN_NAME FROM TABLE_NAME) [ WHERE) ]
Let’s review the data in the ABOVE_AVG_BIRDS
table.
SQL> select bird_name, wingspan, height, weight 2 from above_avg_birds; BIRD_NAME WINGSPAN HEIGHT WEIGHT ------------------------------ ---------- ---------- ---------- Common Loon 54 36 18 Bald Eagle 84 37 14 Golden Eagle 90 40 15 Canadian Goose 72 43 14 Mute Swan 94.8 60 26 Brown Pelican 90 54 7.6 6 rows selected.
You delete the rows of data from the ABOVE_AVG_BIRDS
table for any birds that have either a wingspan equal to the maximum wingspan currently stored in the table or a height that is exactly the same as the minimum height value for any birds in the table.
SQL> delete from above_avg_birds 2 where wingspan = (select max(wingspan) from above_avg_birds) 3 or height = (select min(height) from above_avg_birds); 2 rows deleted.
Now if you select all records from the ABOVE_AVG_BIRDS
table, you can see that two rows of data are deleted for the birds that meet the criteria in the subqueries of the DELETE
statement.
SQL> select bird_name, wingspan, height, weight 2 from above_avg_birds; BIRD_NAME WINGSPAN HEIGHT WEIGHT ------------------------------ ---------- ---------- ---------- Bald Eagle 84 37 14 Golden Eagle 90 40 15 Canadian Goose 72 43 14 Brown Pelican 90 54 7.6 4 rows selected.
You can embed a subquery within another subquery, just as you can embed the subquery within a regular query. When a subquery is used, that subquery is resolved before the main query. Likewise, the lowest-level subquery is resolved first in embedded or nested subqueries, working out to the main query.
Note
Check the Limits of Your System
Check your particular implementation for limits on the number of subqueries, if any, that you can use in a single statement. This might differ between vendors.
The basic syntax for embedded subqueries follows:
SELECT COLUMN_NAME [, COLUMN_NAME ] FROM TABLE1 [, TABLE2 ] WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME FROM TABLE WHERE COLUMN_NAME OPERATOR (SELECT COLUMN_NAME FROM TABLE [ WHERE COLUMN_NAME OPERATOR VALUE ]))
In the following query, you are looking for a list of birds whose BIRD_ID
is found in a list of BIRD_ID
s for only birds that migrate to the location of Mexico. A simple subquery gathers this data.
SQL> select bird_name 2 from birds 3 where bird_id in (select bird_id 4 from birds_migration bm, 5 migration m 6 where bm.migration_id = m.migration_id 7 and m.migration_location = 'Mexico'); BIRD_NAME ------------------------------ Great Blue Heron Common Loon Osprey Belted Kingfisher Pied-billed Grebe American Coot Common Sea Gull Ring-billed Gull Double-crested Cormorant Common Merganser Turkey Vulture Green Heron Great Egret Anhinga 14 rows selected.
The following query, which does not use a subquery, takes a different approach to verify the output of the previous query. The data results are the same.
SQL> select m.migration_location, b.bird_name 2 from birds b, 3 birds_migration bm, 4 migration m 5 where b.bird_id = bm.bird_id 6 and bm.migration_id = m.migration_id 7 and m.migration_location = 'Mexico'; MIGRATION_LOCATION BIRD_NAME ------------------------------ ------------------------------ Mexico Great Blue Heron Mexico Common Loon Mexico Osprey Mexico Belted Kingfisher Mexico Pied-billed Grebe Mexico American Coot Mexico Common Sea Gull Mexico Ring-billed Gull Mexico Double-crested Cormorant Mexico Common Merganser Mexico Turkey Vulture Mexico Green Heron Mexico Great Egret Mexico Anhinga 14 rows selected.
In this next example, embedded subqueries further break down data. Not only do you want to see a list of birds found in a list of birds that migrate to the location of Mexico, but you also want to see only birds within that location that eat fish. Study the results closely and compare them to the previous examples.
SQL> select bird_name 2 from birds 3 where bird_id in 4 (select bird_id 5 from birds_migration bm, 6 migration m 7 where bm.migration_id = m.migration_id 8 and m.migration_location = 'Mexico' 9 and bm.bird_id in 10 (select bf.bird_id 11 from birds_food bf, 12 food f 13 where bf.food_id = f.food_id 14 and f.food_name = 'Fish')); BIRD_NAME ------------------------------ Great Blue Heron Common Loon Osprey Belted Kingfisher Common Sea Gull Ring-billed Gull Double-crested Cormorant Common Merganser Green Heron Great Egret Anhinga 11 rows selected.
Caution
Subquery Performance
Subquery use can either improve or degrade query performance—it depends on how many tables exist, how the tables are joined, how many conditions exist, how many embedded subqueries are used, and many other factors (including indexes on tables). The use of multiple subqueries results in slower response time and can result in reduced accuracy of the results due to possible mistakes in the statement coding. Consider that a subquery must be evaluated before the main part of the query, so the time that it takes to execute the subquery has a direct effect on the time it takes for the main query to execute.
Correlated subqueries are common in many SQL implementations. The concept of correlated subqueries is discussed as an ANSI-standard SQL topic and is covered briefly in this hour. A correlated subquery is a subquery that is dependent on information in the main query. This means that tables in a subquery can be related to tables in the main query.
Note
Proper Use of Correlated Subqueries
For a correlated subquery, you must reference the table in the main query before you can resolve the subquery.
In the following example, you are looking for a list of only birds and their associated wingspan that have a nickname containing the word eagle. In this example, instead of joining the BIRDS
and NICKNAME
tables together in the main query, you join the NICKNAMES
table found in the subquery with the BIRDS
table found in the main query. In this case, the subquery is dependent on the main query to return data.
SQL> select bird_id, bird_name, wingspan 2 from birds 3 where bird_id in (select bird_id 4 from nicknames 5 where birds.bird_id = nicknames.bird_id 6 and nicknames.nickname like '%Eagle%'); BIRD_ID BIRD_NAME WINGSPAN ---------- ------------------------------ ---------- 4 Bald Eagle 99 5 Golden Eagle 99 2 rows selected.
By simple definition and general concept, a subquery is a query that is performed within another query to place further conditions on a query. You can use a subquery in a SQL statement’s WHERE
clause or HAVING
clause. Queries are typically used within other queries (Data Query Language), but you can also use them in the resolution of DML statements such as INSERT
, UPDATE
, and DELETE
. All basic rules for DML apply when using subqueries with DML commands.
The subquery’s syntax is virtually the same as that of a standalone query, with a few minor restrictions. One of these restrictions is that you cannot use the ORDER BY
clause within a subquery; however, you can use a GROUP BY
clause, which renders virtually the same effect. Subqueries are used to place conditions that are not necessarily known for a query, providing more power and flexibility with SQL.
Q. Is there a limit on the number of embedded subqueries that can be used in a single query?
A. Limitations such as the number of embedded subqueries allowed and the number of tables joined in a query are specific to each implementation. Some implementations do not have limits, although using too many embedded subqueries can drastically hinder SQL statement performance. Most limitations are affected by the actual hardware, CPU speed, and system memory available, but many other considerations also come into play.
Q. It seems that debugging a query with subqueries can be confusing, especially with embedded subqueries. What is the best way to debug a query with subqueries?
A. The best way to debug a query with subqueries is to evaluate the query in sections. First, evaluate the lowest-level subquery and then work your way to the main query (the same way the database evaluates the query). When you evaluate each subquery individually, you can substitute the returned values for each subquery to check your main query’s logic. An error with a subquery often results from the use of the operator that evaluates the subquery, such as (=)
, IN
, >
, and <
.
The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. What is the function of a subquery when used with a SELECT
statement?
2. Can you update more than one column when using the UPDATE
statement with a subquery?
3. Can you embed subqueries within other subqueries?
4. What is a subquery called that has a column related to a column in the main query?
5. Can you embed subqueries within other subqueries?
6. What is an example of an operator that cannot be used when accessing a subquery?
1. Write a query with a subquery to create a list of birds and their wingspans for birds that have a wingspan less than the average wingspan in the BIRDS
table.
2. Produce a list of birds and their associated migration locations for only birds that migrate to locations that have birds migrating there with an above average wingspan.
3. Use a subquery to find any food items that are eaten by the shortest bird in the database.
4. Applying the concept of a subquery, create a new table called BIRD_APPETIZERS
based on the following information: This new table should list the FOOD_ID
and FOOD_NAME
, but only for food items associated with birds that are in the bottom 25 percentile of height.