What You’ll Learn in This Hour:
▶ Defining table joins
▶ Identifying the different types of joins
▶ Understanding how and when joins are used
▶ Taking a look at practical examples of table joins
▶ Understanding the effects of improperly joined tables
▶ Renaming tables in a query using an alias
To this point, all database queries you have executed in this book have extracted data from a single table. However, examples of manual joins have been shown in exercises. In a manual join, you separately select data for multiple tables to get the final result set that you want. During this hour, you learn how to join tables in a query so that you can efficiently and automatically retrieve data from multiple tables.
One of the most powerful features of SQL is the capability to select data from multiple tables. Without this capability, the entire relational database concept would not be feasible. Single-table queries are sometimes quite informative, but in the real world, the most practical queries are the ones whose data is acquired from multiple tables within the database.
As you witnessed throughout this book, especially during the database design hours, a relational database is broken into smaller, more manageable tables for simplicity and the ease of overall management. As tables are divided into smaller tables, the related tables are created with common columns: primary keys and foreign keys. These keys are used to join related tables to one another.
Why should you normalize tables if, in the end, you are going to rejoin the tables to retrieve the data you want? You rarely select all data from all tables, so picking and choosing according to the needs of each query is more beneficial. Performance might suffer slightly from a normalized database, but overall coding and maintenance are much simpler. Remember that you generally normalize the database to reduce redundancy and increase data integrity. Your overreaching task as a database administrator is to safeguard data.
A join combines two or more tables to retrieve data from multiple tables. Different implementations have many ways of joining tables, so you concentrate on the most common joins in this lesson. This hour covers the following types of joins:
▶ Equijoins, or inner joins
▶ Non-equijoins
▶ Outer joins
▶ Self joins
As you have learned in previous hours, both the SELECT
and FROM
clauses are required SQL statement elements; the WHERE
clause is a required element of a SQL statement when joining tables. The tables joined are listed in the FROM
clause. The join is performed in the WHERE
clause. Several operators can be used to join tables, including =
, <
, >
, <>
, <=
, >=
, !=
, BETWEEN
, LIKE
, and NOT
. However, the most common operator is the equals symbol.
Figure 14.1 shows two tables in a database, EMPLOYEES
and DEPENDENTS
, that have a relationship through the ID
column. Each employee can have multiple dependents or none. Each dependent must have an associated employee record. (This is an example shown in Hour 1, “Understanding the Relational Database and SQL.”) This concept encapsulates the primary concepts of a relational database. In this case, two tables are shown with a parent/child relationship. A parent record can have zero to many children. Each child record must have a parent (at least, if primary key and foreign key relationships are properly defined). Study the data in Figure 14.1.
FIGURE 14.1
Joining tables
Let’s query data from the two tables in Figure 14.1. Who are the dependents of Kelly Mitchell? This is easy to resolve because you can see all the data. Kelly Mitchell’s ID
is 4
. If you look to the DEPENDENTS
table and search for Kelly Mitchell’s ID
of 4
, you can easily see that Kelly Mitchell has three records, or rows of data, associated with her ID
: Laura
, Amy
, and Kevin
. This is a simple yet effective example of a table join. The rest of the hour elaborates on this concept.
Perhaps the most used and most important of the joins is the equijoin, also referred to as an inner join. The equijoin joins two tables with a common column, in which each is usually the primary key.
The syntax for an equijoin follows:
SELECT TABLE1.COLUMN1, TABLE2.COLUMN2... FROM TABLE1, TABLE2 [, TABLE3 ] WHERE TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME [ AND TABLE1.COLUMN_NAME = TABLE3.COLUMN_NAME ]
Look at the following example:
SQL> select birds.bird_id, birds.bird_name, 2 nicknames.bird_id, nicknames.nickname 3 from birds, nicknames 4 where birds.bird_id = nicknames.bird_id 5 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- ------------------------------ ---------- ------------------------------ 1 Great Blue Heron 1 Big Cranky 1 Great Blue Heron 1 Blue Crane 2 Mallard 2 Green Cap 2 Mallard 2 Green Head 3 Common Loon 3 Great Northern Diver 4 Bald Eagle 4 Eagle 4 Bald Eagle 4 Sea Eagle 5 Golden Eagle 5 War Eagle 6 Red Tailed Hawk 6 Chicken Hawk 7 Osprey 7 Sea Hawk 8 Belted Kingfisher 8 Preacher Bird 19 Mute Swan 19 Tundra 20 Brown Pelican 20 Pelican 21 Great Egret 21 Common Egret 21 Great Egret 21 White Egret 22 Anhinga 22 Snake Bird 22 Anhinga 22 Spanish Crossbird 22 Anhinga 22 Water Turkey 23 Black Skimmer 23 Sea Dog 19 rows selected.
This SQL statement is a query against the BIRDS
and NICKNAMES
tables. It displays the BIRD_NAME
and the associated NICKNAME
for each bird. The BIRD_ID
is selected twice, to prove that the two tables have been properly joined. For this example, numerous records were deleted from the NICKNAMES
table so that the result set displays easily on one page in the book. You need to tell the query how the tables are related, which is the purpose of the WHERE
clause. Here you specify that the two tables are linked through the BIRD_ID
column. Because the BIRD_ID
exists in both tables, you must justify both columns with the table name in your column listing. By justifying the columns with the table names, you tell the database server where to get the data.
Data in the following example is selected from BIRDS
and NICKNAMES
because the desired data resides in each of the two tables. An equijoin is used.
SQL> select birds.bird_id, birds.bird_name, 2 nicknames.bird_id, nicknames.nickname 3 from birds, nicknames 4 where birds.bird_id = nicknames.bird_id 5 and birds.bird_name like '%Eagle%' 6 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- ------------------------------ ---------- ------------------------------ 4 Bald Eagle 4 Eagle 4 Bald Eagle 4 Sea Eagle 5 Golden Eagle 5 War Eagle 3 rows selected.
Notice that each column in the SELECT
clause is preceded by the associated table name to identify each column. This is called qualifying columns in a query. Qualifying columns is necessary only for columns that exist in more than one table referenced by a query. You usually qualify all columns, for consistency and to avoid questions when debugging or modifying SQL code.
In addition, the SQL syntax provides a version by introducing the JOIN
syntax. The JOIN
syntax follows; it is essentially the same as performing the join in the WHERE
clause if you study the syntax:
SELECT TABLE1.COLUMN1, TABLE2.COLUMN2... FROM TABLE1 INNER JOIN TABLE2 ON TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME
As you can see, the JOIN
operator is removed from the WHERE
clause and instead is replaced with the JOIN
syntax. The table joined is added after the JOIN
syntax; then the JOIN
operators are placed after the ON
qualifier.
You use table aliases to rename a table in a particular SQL statement. Renaming is temporary; the actual table name does not change in the database. As you learn later in the “Self Joins” section, giving the tables aliases is a necessity for the self join. Aliases are most often used to save keystrokes, which results in a shorter and easier-to-read SQL statement. In addition, fewer keystrokes means fewer keystroke errors. Programming errors also are typically less frequent if you can refer to an alias, which is often shorter and more descriptive of the data you are working with. Using table aliases also means that the columns selected must be qualified with the table alias.
Following is the same query as the previous one, but aliases have been assigned to table names. After you assign an alias to a table name, you can use that alias within the query for shorthand and better readability.
SQL> select B.bird_id, B.bird_name, 2 N.bird_id, N.nickname 3 from birds B, 4 nicknames N 5 where B.bird_id = N.bird_id 6 and B.bird_name like '%Eagle%' 7 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- ------------------------------ ---------- ------------------------------ 4 Bald Eagle 4 Eagle 4 Bald Eagle 4 Sea Eagle 5 Golden Eagle 5 War Eagle 3 rows selected.
In the preceding SQL statement, BIRDS
has been renamed B
and NICKNAMES
has been renamed N
. The choice of what to rename the tables is arbitrary; these letters were chosen here because BIRDS
starts with B
and NICKNAMES
starts with N
. The selected columns were justified with the corresponding table alias. Note that BIRD_NAME
was used in the WHERE
clause and was justified with the table alias as well.
A non-equijoin joins two or more tables based on a specified column value not equaling a specified column value in another table. The syntax for the non-equijoin follows:
FROM TABLE1, TABLE2 [, TABLE3 ] WHERE TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME [ AND TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME ]
The example that follows issues the same query as before, but it replaces the join of equality on line 5 with a test for non-equality. Every row of data in the BIRDS
table is paired with every unrelated row of data in the NICKNAMES
table, so this query basically tells you which nicknames each bird does not have. Again, this is a subset of data from the NICKNAMES
table.
SQL> select B.bird_id, B.bird_name, 2 N.bird_id, N.nickname 3 from birds B, 4 nicknames N 5 where B.bird_id != N.bird_id 6 and B.bird_name like '%Eagle%' 7 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- ------------------------------ ---------- ------------------------------ 4 Bald Eagle 1 Big Cranky 4 Bald Eagle 1 Blue Crane 4 Bald Eagle 2 Green Cap 4 Bald Eagle 2 Green Head 4 Bald Eagle 3 Great Northern Diver 4 Bald Eagle 5 War Eagle 4 Bald Eagle 6 Chicken Hawk 4 Bald Eagle 7 Sea Hawk 4 Bald Eagle 8 Preacher Bird 4 Bald Eagle 19 Tundra 4 Bald Eagle 20 Pelican 4 Bald Eagle 21 Common Egret 4 Bald Eagle 21 White Egret 4 Bald Eagle 22 Snake Bird 4 Bald Eagle 22 Spanish Crossbird 4 Bald Eagle 22 Water Turkey 4 Bald Eagle 23 Sea Dog 5 Golden Eagle 1 Big Cranky 5 Golden Eagle 1 Blue Crane 5 Golden Eagle 2 Green Cap 5 Golden Eagle 2 Green Head 5 Golden Eagle 3 Great Northern Diver 5 Golden Eagle 4 Eagle 5 Golden Eagle 4 Sea Eagle 5 Golden Eagle 6 Chicken Hawk 5 Golden Eagle 7 Sea Hawk 5 Golden Eagle 8 Preacher Bird 5 Golden Eagle 19 Tundra 5 Golden Eagle 20 Pelican 5 Golden Eagle 21 Common Egret 5 Golden Eagle 21 White Egret 5 Golden Eagle 22 Snake Bird 5 Golden Eagle 22 Spanish Crossbird 5 Golden Eagle 22 Water Turkey 5 Golden Eagle 23 Sea Dog 35 rows selected.
Caution
Non-Equijoins Can Add Data
When using non-equijoins, you might receive several rows of data that are of no use to you. Check your results carefully.
In the earlier section’s example test for equality, each row in the first table was paired with only one row in the second table (each row’s corresponding row).
An outer join returns all rows that exist in one table, even though corresponding rows do not exist in the joined table. The (+)
symbol denotes an outer join in a query and is placed at the end of the table name in the WHERE
clause. The table with the (+)
should be the table that does not have matching rows. In many implementations, the outer join is broken into joins called left outer join, right outer join, and full outer join. The outer join in these implementations is normally optional.
Caution
Join Syntax Varies Widely
Check your particular implementation for the exact use and syntax of the outer join. The (+)
symbol is used in some major implementations, but it is nonstandard. This varies somewhat among versions of implementations. For example, Microsoft SQL Server 2000 supports this type of join syntax, but SQL Server 2005 and newer versions do not. Be sure to carefully consider using this syntax before implementing it.
The general syntax for an outer join follows:
FROM TABLE1 {RIGHT | LEFT | FULL} [OUTER] JOIN ON TABLE2
The Oracle syntax is
FROM TABLE1, TABLE2 [, TABLE3 ] WHERE TABLE1.COLUMN_NAME[(+)] = TABLE2.COLUMN_NAME[(+)] [ AND TABLE1.COLUMN_NAME[(+)] = TABLE3.COLUMN_NAME[(+)]]
To understand how the outer join works, first write a query as before to acquire the bird name and associated nicknames from the database. You can see in the following example that 19 rows of data are returned. Now, even though you previously temporarily deleted some records from the NICKNAMES
table to create a subset of data, some birds still are not listed here. Also prior to this particular example, the NICKNAME
column was updated in the NICKNAMES
table to a NULL
value for many of the birds, to illustrate the example. The following 19 rows thus represent only the birds in the database that currently have nicknames.
SQL> select B.bird_id, B.bird_name, 2 N.bird_id, N.nickname 3 from birds B, 4 nicknames N 5 where B.bird_id = N.bird_id 6 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- -------------------- ---------- ------------------------------ 1 Great Blue Heron 1 Big Cranky 1 Great Blue Heron 1 Blue Crane 2 Mallard 2 Green Cap 2 Mallard 2 Green Head 3 Common Loon 3 Great Northern Diver 4 Bald Eagle 4 Eagle 4 Bald Eagle 4 Sea Eagle 5 Golden Eagle 5 War Eagle 6 Red Tailed Hawk 6 Chicken Hawk 7 Osprey 7 Sea Hawk 8 Belted Kingfisher 8 Preacher Bird 19 Mute Swan 19 Tundra 20 Brown Pelican 20 Pelican 21 Great Egret 21 Common Egret 21 Great Egret 21 White Egret 22 Anhinga 22 Snake Bird 22 Anhinga 22 Spanish Crossbird 22 Anhinga 22 Water Turkey 23 Black Skimmer 23 Sea Dog 19 rows selected.
If you wanted to show all birds in the result set, regardless of whether they have an associated nickname, you would use an outer join. The BIRDS
table has a list of all birds in the database. However, some birds do not have related rows of data in the NICKNAMES
table in this situation. To show a list of all birds, the outer join operator (+) was used next to the BIRD_ID
in the join associated with the NICKNAMES
table (see line 5 in the query), which is where missing entries of birds are found. When you look at the result set, now you can see that all birds from the database are listed, whether they have a nickname or not.
SQL> select B.bird_id, B.bird_name, 2 N.bird_id, N.nickname 3 from birds B, 4 nicknames N 5 where B.bird_id = N.bird_id(+) 6 order by 1; BIRD_ID BIRD_NAME BIRD_ID NICKNAME ---------- ---------------------- ---------- ------------------------------ 1 Great Blue Heron 1 Big Cranky 1 Great Blue Heron 1 Blue Crane 2 Mallard 2 Green Cap 2 Mallard 2 Green Head 3 Common Loon 3 Great Northern Diver 4 Bald Eagle 4 Eagle 4 Bald Eagle 4 Sea Eagle 5 Golden Eagle 5 War Eagle 6 Red Tailed Hawk 6 Chicken Hawk 7 Osprey 7 Sea Hawk 8 Belted Kingfisher 8 Preacher Bird 9 Canadian Goose 10 Pied-billed Grebe 11 American Coot 12 Common Sea Gull 13 Ring-billed Gull 14 Double-crested Cormorant 15 Common Merganser 16 Turkey Vulture 17 American Crow 18 Green Heron 19 Mute Swan 19 Tundra 20 Brown Pelican 20 Pelican 21 Great Egret 21 White Egret 21 Great Egret 21 Common Egret 22 Anhinga 22 Snake Bird 22 Anhinga 22 Spanish Crossbird 22 Anhinga 22 Water Turkey 23 Black Skimmer 23 Sea Dog 29 rows selected.
Tip
Use of Outer Joins
You can use the outer join on only one side of a JOIN
condition; however, you can use an outer join on more than one column of the same table in the JOIN
condition.
A self join joins a table to itself, as if the table were two tables. Then it temporarily renames at least one table in the SQL statement using a table alias. The syntax follows:
SELECT A.COLUMN_NAME, B.COLUMN_NAME, [ C.COLUMN_NAME ] FROM TABLE1 A, TABLE2 B [, TABLE3 C ] WHERE A.COLUMN_NAME = B.COLUMN_NAME [ AND A.COLUMN_NAME = C.COLUMN_NAME ]
Following is an example with a table called PHOTOGRAPHERS
. First, take a look at the CREATE TABLE
statement. This is a very simple table of photographers that contains the identification number, the photographer’s name, and the identification number of a photographer’s mentor, which is another photographer in the PHOTOGRAPHERS
table. So in this example, the PHOTOGRAPHER_ID
is the primary key, which ensures that every row of data in the table contains a unique value for the identification. A foreign key constraint was also created on the MENTOR_PHOTOGRAPHER_ID
column, which references the PHOTOGRAPHER_ID
column in the same table.
SQL> create table photographers 2 (photographer_id number(3) not null primary key, 3 photographer varchar(30) not null, 4 mentor_photographer_id number(3) null, 5 constraint p_fk1 foreign key (mentor_photographer_id) references photographers (photographer_id)); Table created.
Next, sample records were inserted into the PHOTOGRAPHERS
table, as follows.
SQL> insert into photographers values ( 7, 'Ryan Notstephens' , null); 1 row created. SQL> insert into photographers values ( 8, 'Susan Willamson' , null); 1 row created. SQL> insert into photographers values ( 9, 'Mark Fife' , null); 1 row created. SQL> insert into photographers values ( 1, 'Shooter McGavin' , null); 1 row created. SQL> insert into photographers values ( 2, 'Jenny Forest' , 8); 1 row created. SQL> insert into photographers values ( 3, 'Steve Hamm' , null); 1 row created. SQL> insert into photographers values ( 4, 'Harry Henderson' , 9); 1 row created. SQL> insert into photographers values ( 5, 'Kelly Hairtrigger' , 8); 1 row created. SQL> insert into photographers values ( 6, 'Gordon Flash' , null); 1 row created. SQL> insert into photographers values ( 10, 'Kate Kapteur' , 7); 1 row created.
Now you can query the PHOTOGRAPHERS
table to show all the data. In the result set that follows, you can see 10 photographers in the table. Four of the photographers have mentors who are other photographers in the same table. This also means that some photographers mentor other photographers, if you look at the relationship from the other side. Some photographers are neither mentored nor mentor anyone. Familiarize yourself with the data. Look at the following returned data set and determine who Kate Kapteur’s mentor is.
SQL> select * from photographers; PHOTOGRAPHER_ID PHOTOGRAPHER MENTOR_PHOTOGRAPHER_ID --------------- -------------------- ---------------------- 7 Ryan Notstephens 8 Susan Willamson 9 Mark Fife 1 Shooter McGavin 2 Jenny Forest 8 3 Steve Hamm 4 Harry Henderson 9 5 Kelly Hairtrigger 8 6 Gordon Flash 10 Kate Kapteur 7 10 rows selected.
Kate Kapteur’s mentor is Ryan Notstephens. You can see this because the MENTOR_PHOTOGRAPHER_ID
associated with Kate Kapteur is 7
, which corresponds to Ryan Notstephens’ PHOTOGRAPHER_ID
. Remember that in this table, MENTOR_PHOTOGRAPHER_ID
is a foreign key that references the primary key of PHOTOGRAPHER_ID
in the very same table.
In the next example, you apply the concept of a self join to the PHOTOGRAPHERS
table. Suppose that you want to produce a list of mentors and their protégé. Looking at the SELECT
statement, you can see that you are selecting from the PHOTOGRAPHERS
table twice. The first occurrence of the PHOTOGRAPHERS
table in the FROM
clause has an alias of “mentors”; the second occurrence of the PHOTOGRAPHERS
table in the FROM
clause has an alias of “proteges”. Essentially, you are seeing the PHOTOGRAPHERS
table as two separate tables, for the purpose of this query. The PHOTOGRAPHER_ID
in the “mentors” version of the PHOTOGRAPHERS
table is being joined to the MENTOR_PHOTOGRAPHER_ID
column of the “proteges” version of the same table. Study the results and compare to them the complete output in the previous example. Notice that this output lists only photographers who have protégé.
SQL> select mentors.photographer mentor, 2 proteges.photographer protege 3 from photographers mentors, 4 photographers proteges 5 where mentors.photographer_id = proteges.mentor_photographer_id 6 order by 1; MENTOR PROTEGE ------------------------------ ----------------------------- Mark Fife Harry Henderson Ryan Notstephens Kate Kapteur Susan Willamson Kelly Hairtrigger Susan Willamson Jenny Forest 4 rows selected.
This final example uses the exact same query as before but applies the concept of an outer join along with the self join. If you want to see all photographers, regardless of whether there is an associated record as a mentor or protégé, the outer join allows for this. Study the results and review the previous discussion of the outer join, if necessary. The PROTEGES.PHOTOGRAPHER
column was left as “photographer” because some of the photographers in this output are not necessarily protégés.
SQL> select mentors.photographer mentor, 2 proteges.photographer "PHOTOGRAPHER" 3 from photographers mentors, 4 photographers proteges 5 where mentors.photographer_id(+) = proteges.mentor_photographer_id 6 order by 1; MENTOR PHOTOGRAPHER ------------------------------ ------------------------------ Mark Fife Harry Henderson Ryan Notstephens Kate Kapteur Susan Willamson Kelly Hairtrigger Susan Willamson Jenny Forest Steve Hamm Shooter McGavin Mark Fife Susan Willamson Gordon Flash Ryan Notstephens 10 rows selected.
Self joins are useful when all the data that you want to retrieve resides in one table, but you must somehow compare records in the table to other records in the table.
Most join operations involve merging data based on a key in one table and a key in another table. Depending on how your database has been designed, you might have to join on more than one key field to accurately depict that data in your database. You might have a table with a primary key that consists of more than one column. You might also have a foreign key in a table that consists of more than one column, which references the multiple-column primary key.
The BIRDS
database has not been designed in this way, but consider the following example tables:
SQL> desc products Name Null? Type ----------------------------------- ------- ----------------------------- SERIAL_NUMBER NOT NULL NUMBER(10) VENDOR_NUMBER NOT NULL NUMBER(10) PRODUCT_NAME NOT NULL VARCHAR2(30) COST NOT NULL NUMBER(8,2) SQL> desc orders Name Null? Type ---------------------------------- ------- ----------------------------- ORD_NO NOT NULL NUMBER(10) PROD_NUMBER NOT NULL NUMBER(10) VENDOR_NUMBER NOT NULL NUMBER(10) QUANTITY NOT NULL NUMBER(5) ORD_DATE NOT NULL DATE
The primary key in PROD
is the combination of the columns SERIAL_NUMBER
and VENDOR_NUMBER
. Perhaps two products can have the same serial number within the distribution company, but each serial number is unique per vendor.
The foreign key in ORD
is also the combination of the columns SERIAL_NUMBER
and VENDOR_NUMBER
.
When selecting data from both tables (PROD
and ORD
), the join operation might appear as follows:
SELECT P.PRODUCT_NAME, O.ORD_DATE, O.QUANTITY FROM PRODUCTS P, ORDERS O WHERE P.SERIAL_NUMBER = O.SERIAL_NUMBER AND P.VENDOR_NUMBER = O.VENDOR_NUMBER;
Consider several points before using joins: what columns(s) to join on, whether there is a common column to join on, and any performance issues. More joins in a query means the database server has to do more work, which means that more time is taken to retrieve data. You cannot avoid joins when retrieving data from a normalized database, but it is imperative to ensure that joins are performed correctly from a logical standpoint. Incorrect joins can result in serious performance degradation and inaccurate query results. Hour 23, “Improving Database Performance,” discusses performance issues in more detail.
In a normalized database, a common need involves selecting data from multiple tables that do not have a direct relationship. What should you join on? If you need to retrieve data from two tables that do not have a common column to join, you must join on another table that has a common column or columns to both tables. That table becomes the base table. A base table joins one or more tables that have common columns, or joins tables that do not have common columns.
Thinking back to the previous database design, Hours 5 through 7 (particularly Hour 7 on normalization), you should recall that several intermediate, or base tables, have been created. For instance, Figure 14.1 shows how the original BIRDS
and FOOD
entities are related via a many-to-many relationship. Many-to-many relationships have redundant data; as you continue to normalize the database and remove that redundant data, you end up with tables in between. Figure 14.2 shows how the two tables are normalized into three tables. The middle table is used as a base table to provide a relationship between the BIRDS
and FOOD
tables. Take a minute to refresh your memory on this concept.
FIGURE 14.2
Joining tables through a base table
In the following query, you need to get information about the food items that specific birds eat. To get this information, you need to use the three tables in Figure 14.2. You can see in the FROM
clause that all three tables are listed and given an alias, to simplify and for better readability. You want to see only the BIRD_NAME
and the FOOD
columns. Notice that because table aliases are used, every column in the query is qualified using the alias, which indicates the table in which that column resides. The join operation consists of lines 3 and 4 in the WHERE
clause. The BIRDS
table is joined to the BIRDS_FOOD
table using the BIRD_ID
column, and the FOOD
table is joined to the BIRDS_FOOD
table using the FOOD_ID
column. The BIRDS
table has only one row of data for each bird in the database. The FOOD
table has only one row of data for each food item in the database. The BIRDS_FOOD
table consists of the BIRD_ID
and FOOD_ID
, which together comprise the primary key in that table. Thus, you can have only one combination in that table of a specific bird and a specific food item. You can also see in the following query that you want to see only birds that eat fish, insects, or plants and have a valid entry in the WINGSPAN
column. Finally, you are sorting the results by the bird’s name here. Take some time to study the query and results.
SQL> select B.bird_name, F.food_name 2 from birds B, food F, birds_food BF 3 where B.bird_id = BF.bird_id 4 and F.food_id = BF.food_id 5 and F.food_name in ('Fish', 'Insects', 'Plants') 6 and B.wingspan is not null 7 order by 1; BIRD_NAME FOOD_NAME ------------------------------ ------------------------------ Bald Eagle Fish Brown Pelican Fish Canadian Goose Insects Common Loon Fish Double-crested Cormorant Fish Golden Eagle Insects Golden Eagle Fish Great Blue Heron Insects Great Blue Heron Fish Great Egret Fish Mute Swan Plants Osprey Fish Red Tailed Hawk Insects Ring-billed Gull Fish Ring-billed Gull Insects 15 rows selected.
The Cartesian product is a result of a Cartesian join or “no join.” If you select from two or more tables and do not join the tables, your output is all possible rows from all the tables selected. If your tables are large, the result could be hundreds of thousands, or even millions, of rows of data. A WHERE
clause is highly recommended for SQL statements that retrieve data from two or more tables. The Cartesian product is also known as a cross join.
The syntax follows:
FROM TABLE1, TABLE2 [, TABLE3 ] WHERE TABLE1, TABLE2 [, TABLE3 ]
Following is an example of a cross join, or the dreaded Cartesian product:
SQL> select bird_name, food_name 2 from birds, food; BIRD_NAME FOOD_NAME ------------------------------ --------------- American Coot Aquatic Insects American Coot Aquatic Plants American Coot Berries American Coot Birds American Coot Carrion American Coot Corn American Coot Crayfish American Coot Crustaceans American Coot Deer American Coot Ducks American Coot Fish American Coot Frogs American Coot Fruit American Coot Insects American Coot Moths American Coot Nectar American Coot Nuts American Coot Plants American Coot Pollen American Coot Reptiles American Coot Rodents American Coot Seeds American Coot Small Mammals American Coot Snakes American Coot Worms American Crow Aquatic Insects American Crow Aquatic Plants American Crow Berries ……… Turkey Vulture Crustaceans Turkey Vulture Deer Turkey Vulture Ducks Turkey Vulture Fish Turkey Vulture Frogs Turkey Vulture Fruit Turkey Vulture Insects Turkey Vulture Moths Turkey Vulture Nectar Turkey Vulture Nuts Turkey Vulture Plants Turkey Vulture Pollen Turkey Vulture Reptiles Turkey Vulture Rodents Turkey Vulture Seeds Turkey Vulture Small Mammals Turkey Vulture Snakes Turkey Vulture Worms 575 rows selected.
Data is selected from two separate tables, yet no JOIN
operation is performed. Because you have not specified how to join rows in the first table with rows in the second table, the database server pairs every row in the first table with every row in the second table. Because each table has numerous rows of data each, the product of 575 rows selected is achieved from 23 rows multiplied by 25 rows. The BIRDS
database is a small sample database with limited data, but you can imagine the results of a Cartesian product in a real-world database that has thousands or even millions of rows of data in various tables.
To fully understand exactly how the Cartesian product is derived, study the following example:
SQL> SELECT X FROM TABLE1; X - A B C D 4 rows selected. SQL> SELECT V FROM TABLE2; X - A B C D 4 rows selected. SQL> SELECT TABLE1.X, TABLE2.X 2* FROM TABLE1, TABLE2; X X - - A A B A C A D A A B B B C B D B A C B C C C D C A D B D C D D D 16 rows selected.
Caution
Ensure That All Tables Are Joined
Be careful to join all tables in a query. If two tables in a query have not been joined and each table contains 1,000 rows of data, the Cartesian product then consists of 1,000 rows multiplied by 1,000 rows, which results in a total of 1,000,000 rows of data returned. Cartesian products dealing with large amounts of data can cause the host computer to stall or crash, in some cases, based on resource usage on the host computer. Therefore, it is important for the database administrator (DBA) and the system administrator to closely monitor for long-running queries.
This hour introduced you to one of the most robust features of SQL: the table join. Imagine the limits and intense manual effort needed if you could not extract data from more than one table in a single query. You saw several types of joins in this hour, each serving its own purpose based on the conditions placed on the query. Joins are used to link data from tables based on equality and non-equality. Outer joins are powerful, allowing data to be retrieved from one table even though associated data is not found in a joined table. Self joins are used to join a table to itself. Beware of the cross join, more commonly known as the Cartesian product. The Cartesian product is the result set of a multiple-table query without a join, often yielding a large amount of unwanted output. When selecting data from more than one table, be sure to properly join the tables according to the related columns (normally, primary keys). Failing to properly join tables can result in incomplete or inaccurate output.
Q. When joining tables, must they be joined in the same order that they appear in the FROM
clause?
A. No, they do not have to appear in the same order; however, performance might be impacted, depending on the order of tables in the FROM
clause and the order in which tables are joined.
Q. When using a base table to join unrelated tables, must I select any columns from the base table?
A. No, using a base table to join unrelated tables does not mandate selecting columns from the base table.
Q. Can I join on more than one column between tables?
A. Yes, some queries might require you to join on more than one column per table, to provide a complete relationship between rows of data in the joined tables.
The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. What type of join do you use to return records from one table, regardless of the existence of associated records in the related table?
2. The JOIN
conditions are located in which parts of the SQL statement?
3. What type of JOIN
do you use to evaluate equality among rows of related tables?
4. What happens if you select from two different tables but fail to join the tables?
1. Type the following code into the database and study the result set (Cartesian product):
select bird_name, migration_location from birds, migration;
2. Now modify the previous query with a proper table join to return useful data and avoid the Cartesian product. You might have to review the ERD for the BIRDS
database in Hour 3, “Getting to Know Your Data,” to refresh your memory on how these two tables are related to one another.
3. Generate a list of the food items eaten by the Great Blue Heron.
4. Which birds in the database eat fish?
5. Create a report showing the BIRD_NAME
and MIGRATION_LOCATION
for birds that migrate to South America.
6. Do any birds have a wingspan less than 30 inches and also eat fish?
7. Write a query to display the following results: the BIRD_NAME
, FOOD_NAME
, and NEST_TYPE
for any birds that eat fish or build a platform nest.
8. Ask some questions you would anticipate database users, photographers, bird rescues, and so forth might inquire about the BIRDS
database. Experiment with some of your own queries using table joins.