Hour 14

Joining Tables in Queries

What You’ll Learn in This Hour:

To this point, all database queries you have executed in this book have extracted data from a single table. However, examples of manual joins have been shown in exercises. In a manual join, you separately select data for multiple tables to get the final result set that you want. During this hour, you learn how to join tables in a query so that you can efficiently and automatically retrieve data from multiple tables.

Selecting Data from Multiple Tables

One of the most powerful features of SQL is the capability to select data from multiple tables. Without this capability, the entire relational database concept would not be feasible. Single-table queries are sometimes quite informative, but in the real world, the most practical queries are the ones whose data is acquired from multiple tables within the database.

As you witnessed throughout this book, especially during the database design hours, a relational database is broken into smaller, more manageable tables for simplicity and the ease of overall management. As tables are divided into smaller tables, the related tables are created with common columns: primary keys and foreign keys. These keys are used to join related tables to one another.

Why should you normalize tables if, in the end, you are going to rejoin the tables to retrieve the data you want? You rarely select all data from all tables, so picking and choosing according to the needs of each query is more beneficial. Performance might suffer slightly from a normalized database, but overall coding and maintenance are much simpler. Remember that you generally normalize the database to reduce redundancy and increase data integrity. Your overreaching task as a database administrator is to safeguard data.

Understanding Joins

A join combines two or more tables to retrieve data from multiple tables. Different implementations have many ways of joining tables, so you concentrate on the most common joins in this lesson. This hour covers the following types of joins:

  •    Equijoins, or inner joins

  •    Non-equijoins

  •    Outer joins

  •    Self joins

As you have learned in previous hours, both the SELECT and FROM clauses are required SQL statement elements; the WHERE clause is a required element of a SQL statement when joining tables. The tables joined are listed in the FROM clause. The join is performed in the WHERE clause. Several operators can be used to join tables, including =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most common operator is the equals symbol.

Figure 14.1 shows two tables in a database, EMPLOYEES and DEPENDENTS, that have a relationship through the ID column. Each employee can have multiple dependents or none. Each dependent must have an associated employee record. (This is an example shown in Hour 1, “Understanding the Relational Database and SQL.”) This concept encapsulates the primary concepts of a relational database. In this case, two tables are shown with a parent/child relationship. A parent record can have zero to many children. Each child record must have a parent (at least, if primary key and foreign key relationships are properly defined). Study the data in Figure 14.1.

The relationship between two tables namely EMPLOYEES and DEPENDENTS.

FIGURE 14.1
Joining tables

Let’s query data from the two tables in Figure 14.1. Who are the dependents of Kelly Mitchell? This is easy to resolve because you can see all the data. Kelly Mitchell’s ID is 4. If you look to the DEPENDENTS table and search for Kelly Mitchell’s ID of 4, you can easily see that Kelly Mitchell has three records, or rows of data, associated with her ID: Laura, Amy, and Kevin. This is a simple yet effective example of a table join. The rest of the hour elaborates on this concept.

Joins of Equality

Perhaps the most used and most important of the joins is the equijoin, also referred to as an inner join. The equijoin joins two tables with a common column, in which each is usually the primary key.

The syntax for an equijoin follows:

SELECT TABLE1.COLUMN1, TABLE2.COLUMN2...
FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME
[ AND TABLE1.COLUMN_NAME = TABLE3.COLUMN_NAME ]

Look at the following example:

SQL> select birds.bird_id, birds.bird_name,
  2  nicknames.bird_id, nicknames.nickname
  3  from birds, nicknames
  4  where birds.bird_id = nicknames.bird_id
  5  order by 1;


   BIRD_ID BIRD_NAME              BIRD_ID NICKNAME
---------- ------------------------------ ---------- ------------------------------
         1 Great Blue Heron             1 Big Cranky
         1 Great Blue Heron             1 Blue Crane
         2 Mallard                      2 Green Cap
         2 Mallard                      2 Green Head
         3 Common Loon                  3 Great Northern Diver
         4 Bald Eagle                   4 Eagle
         4 Bald Eagle                   4 Sea Eagle
         5 Golden Eagle                 5 War Eagle
         6 Red Tailed Hawk              6 Chicken Hawk
         7 Osprey                       7 Sea Hawk
         8 Belted Kingfisher            8 Preacher Bird
        19 Mute Swan                   19 Tundra
        20 Brown Pelican               20 Pelican
        21 Great Egret                 21 Common Egret
        21 Great Egret                 21 White Egret
        22 Anhinga                     22 Snake Bird
        22 Anhinga                     22 Spanish Crossbird
        22 Anhinga                     22 Water Turkey
        23 Black Skimmer               23 Sea Dog

19 rows selected.

This SQL statement is a query against the BIRDS and NICKNAMES tables. It displays the BIRD_NAME and the associated NICKNAME for each bird. The BIRD_ID is selected twice, to prove that the two tables have been properly joined. For this example, numerous records were deleted from the NICKNAMES table so that the result set displays easily on one page in the book. You need to tell the query how the tables are related, which is the purpose of the WHERE clause. Here you specify that the two tables are linked through the BIRD_ID column. Because the BIRD_ID exists in both tables, you must justify both columns with the table name in your column listing. By justifying the columns with the table names, you tell the database server where to get the data.

Data in the following example is selected from BIRDS and NICKNAMES because the desired data resides in each of the two tables. An equijoin is used.

SQL> select birds.bird_id, birds.bird_name,
  2      nicknames.bird_id, nicknames.nickname
  3  from birds, nicknames
  4  where birds.bird_id = nicknames.bird_id
  5    and birds.bird_name like '%Eagle%'
  6  order by 1;

   BIRD_ID BIRD_NAME                  BIRD_ID NICKNAME
---------- ------------------------------ ---------- ------------------------------
         4 Bald Eagle                       4 Eagle
         4 Bald Eagle                       4 Sea Eagle
         5 Golden Eagle                     5 War Eagle

3 rows selected.

Notice that each column in the SELECT clause is preceded by the associated table name to identify each column. This is called qualifying columns in a query. Qualifying columns is necessary only for columns that exist in more than one table referenced by a query. You usually qualify all columns, for consistency and to avoid questions when debugging or modifying SQL code.

In addition, the SQL syntax provides a version by introducing the JOIN syntax. The JOIN syntax follows; it is essentially the same as performing the join in the WHERE clause if you study the syntax:

SELECT TABLE1.COLUMN1, TABLE2.COLUMN2...
FROM TABLE1
INNER JOIN TABLE2 ON TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME

As you can see, the JOIN operator is removed from the WHERE clause and instead is replaced with the JOIN syntax. The table joined is added after the JOIN syntax; then the JOIN operators are placed after the ON qualifier.

Table Aliases

You use table aliases to rename a table in a particular SQL statement. Renaming is temporary; the actual table name does not change in the database. As you learn later in the “Self Joins” section, giving the tables aliases is a necessity for the self join. Aliases are most often used to save keystrokes, which results in a shorter and easier-to-read SQL statement. In addition, fewer keystrokes means fewer keystroke errors. Programming errors also are typically less frequent if you can refer to an alias, which is often shorter and more descriptive of the data you are working with. Using table aliases also means that the columns selected must be qualified with the table alias.

Following is the same query as the previous one, but aliases have been assigned to table names. After you assign an alias to a table name, you can use that alias within the query for shorthand and better readability.

SQL> select B.bird_id, B.bird_name,
  2  N.bird_id, N.nickname
  3  from birds B,
  4  nicknames N
  5  where B.bird_id = N.bird_id
  6    and B.bird_name like '%Eagle%'
  7  order by 1;

   BIRD_ID BIRD_NAME                         BIRD_ID NICKNAME
---------- ------------------------------ ---------- ------------------------------
         4 Bald Eagle                              4 Eagle
         4 Bald Eagle                              4 Sea Eagle
         5 Golden Eagle                            5 War Eagle

3 rows selected.

In the preceding SQL statement, BIRDS has been renamed B and NICKNAMES has been renamed N. The choice of what to rename the tables is arbitrary; these letters were chosen here because BIRDS starts with B and NICKNAMES starts with N. The selected columns were justified with the corresponding table alias. Note that BIRD_NAME was used in the WHERE clause and was justified with the table alias as well.

Joins of Non-Equality

A non-equijoin joins two or more tables based on a specified column value not equaling a specified column value in another table. The syntax for the non-equijoin follows:

FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME
[ AND TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME ]

The example that follows issues the same query as before, but it replaces the join of equality on line 5 with a test for non-equality. Every row of data in the BIRDS table is paired with every unrelated row of data in the NICKNAMES table, so this query basically tells you which nicknames each bird does not have. Again, this is a subset of data from the NICKNAMES table.

SQL> select B.bird_id, B.bird_name,
  2      N.bird_id, N.nickname
  3  from birds B,
  4    nicknames N
  5  where B.bird_id != N.bird_id
  6    and B.bird_name like '%Eagle%'
  7  order by 1;

   BIRD_ID BIRD_NAME            BIRD_ID NICKNAME
---------- ------------------------------ ---------- ------------------------------
         4 Bald Eagle                 1 Big Cranky
         4 Bald Eagle                 1 Blue Crane
         4 Bald Eagle                 2 Green Cap
         4 Bald Eagle                 2 Green Head
         4 Bald Eagle                 3 Great Northern Diver
         4 Bald Eagle                 5 War Eagle
         4 Bald Eagle                 6 Chicken Hawk
         4 Bald Eagle                 7 Sea Hawk
         4 Bald Eagle                 8 Preacher Bird
         4 Bald Eagle                19 Tundra
         4 Bald Eagle                20 Pelican
         4 Bald Eagle                21 Common Egret
         4 Bald Eagle                21 White Egret
         4 Bald Eagle                22 Snake Bird
         4 Bald Eagle                22 Spanish Crossbird
         4 Bald Eagle                22 Water Turkey
         4 Bald Eagle                23 Sea Dog
         5 Golden Eagle               1 Big Cranky
         5 Golden Eagle               1 Blue Crane
         5 Golden Eagle               2 Green Cap
         5 Golden Eagle               2 Green Head
         5 Golden Eagle               3 Great Northern Diver
         5 Golden Eagle               4 Eagle
         5 Golden Eagle               4 Sea Eagle
         5 Golden Eagle               6 Chicken Hawk
         5 Golden Eagle               7 Sea Hawk
         5 Golden Eagle               8 Preacher Bird
         5 Golden Eagle              19 Tundra
         5 Golden Eagle              20 Pelican
         5 Golden Eagle              21 Common Egret
         5 Golden Eagle              21 White Egret
         5 Golden Eagle              22 Snake Bird
         5 Golden Eagle              22 Spanish Crossbird
         5 Golden Eagle              22 Water Turkey
         5 Golden Eagle              23 Sea Dog

35 rows selected.

Caution

Non-Equijoins Can Add Data

When using non-equijoins, you might receive several rows of data that are of no use to you. Check your results carefully.

In the earlier section’s example test for equality, each row in the first table was paired with only one row in the second table (each row’s corresponding row).

Outer Joins

An outer join returns all rows that exist in one table, even though corresponding rows do not exist in the joined table. The (+) symbol denotes an outer join in a query and is placed at the end of the table name in the WHERE clause. The table with the (+) should be the table that does not have matching rows. In many implementations, the outer join is broken into joins called left outer join, right outer join, and full outer join. The outer join in these implementations is normally optional.

Caution

Join Syntax Varies Widely

Check your particular implementation for the exact use and syntax of the outer join. The (+) symbol is used in some major implementations, but it is nonstandard. This varies somewhat among versions of implementations. For example, Microsoft SQL Server 2000 supports this type of join syntax, but SQL Server 2005 and newer versions do not. Be sure to carefully consider using this syntax before implementing it.

The general syntax for an outer join follows:

FROM TABLE1
{RIGHT | LEFT | FULL} [OUTER] JOIN
ON TABLE2

The Oracle syntax is

FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1.COLUMN_NAME[(+)] = TABLE2.COLUMN_NAME[(+)]
[ AND TABLE1.COLUMN_NAME[(+)] = TABLE3.COLUMN_NAME[(+)]]

To understand how the outer join works, first write a query as before to acquire the bird name and associated nicknames from the database. You can see in the following example that 19 rows of data are returned. Now, even though you previously temporarily deleted some records from the NICKNAMES table to create a subset of data, some birds still are not listed here. Also prior to this particular example, the NICKNAME column was updated in the NICKNAMES table to a NULL value for many of the birds, to illustrate the example. The following 19 rows thus represent only the birds in the database that currently have nicknames.

SQL> select B.bird_id, B.bird_name,
  2  N.bird_id, N.nickname
  3  from birds B,
  4  nicknames N
  5  where B.bird_id = N.bird_id
  6  order by 1;

   BIRD_ID BIRD_NAME                BIRD_ID NICKNAME
---------- -------------------- ---------- ------------------------------
         1 Great Blue Heron               1 Big Cranky
         1 Great Blue Heron               1 Blue Crane
         2 Mallard                        2 Green Cap
         2 Mallard                        2 Green Head
         3 Common Loon                    3 Great Northern Diver
         4 Bald Eagle                     4 Eagle
         4 Bald Eagle                     4 Sea Eagle
         5 Golden Eagle                   5 War Eagle
         6 Red Tailed Hawk                6 Chicken Hawk
         7 Osprey                         7 Sea Hawk
         8 Belted Kingfisher              8 Preacher Bird
        19 Mute Swan                     19 Tundra
        20 Brown Pelican                 20 Pelican
        21 Great Egret                   21 Common Egret
        21 Great Egret                   21 White Egret
        22 Anhinga                       22 Snake Bird
        22 Anhinga                       22 Spanish Crossbird
        22 Anhinga                       22 Water Turkey
        23 Black Skimmer                 23 Sea Dog

19 rows selected.

If you wanted to show all birds in the result set, regardless of whether they have an associated nickname, you would use an outer join. The BIRDS table has a list of all birds in the database. However, some birds do not have related rows of data in the NICKNAMES table in this situation. To show a list of all birds, the outer join operator (+) was used next to the BIRD_ID in the join associated with the NICKNAMES table (see line 5 in the query), which is where missing entries of birds are found. When you look at the result set, now you can see that all birds from the database are listed, whether they have a nickname or not.

SQL> select B.bird_id, B.bird_name,
  2  N.bird_id, N.nickname
  3  from birds B,
  4  nicknames N
  5  where B.bird_id = N.bird_id(+)
  6  order by 1;

   BIRD_ID BIRD_NAME                BIRD_ID NICKNAME
---------- ---------------------- ---------- ------------------------------
         1 Great Blue Heron               1 Big Cranky
         1 Great Blue Heron               1 Blue Crane
         2 Mallard                        2 Green Cap
         2 Mallard                        2 Green Head
         3 Common Loon                    3 Great Northern Diver
         4 Bald Eagle                     4 Eagle
         4 Bald Eagle                     4 Sea Eagle
         5 Golden Eagle                   5 War Eagle
         6 Red Tailed Hawk                6 Chicken Hawk
         7 Osprey                         7 Sea Hawk
         8 Belted Kingfisher              8 Preacher Bird
         9 Canadian Goose
        10 Pied-billed Grebe
        11 American Coot
        12 Common Sea Gull
        13 Ring-billed Gull
        14 Double-crested Cormorant
        15 Common Merganser
        16 Turkey Vulture
        17 American Crow
        18 Green Heron
        19 Mute Swan                     19 Tundra
        20 Brown Pelican                 20 Pelican
        21 Great Egret                   21 White Egret
        21 Great Egret                   21 Common Egret
        22 Anhinga                       22 Snake Bird
        22 Anhinga                       22 Spanish Crossbird
        22 Anhinga                       22 Water Turkey
        23 Black Skimmer                 23 Sea Dog

29 rows selected.

Tip

Use of Outer Joins

You can use the outer join on only one side of a JOIN condition; however, you can use an outer join on more than one column of the same table in the JOIN condition.

Self Joins

A self join joins a table to itself, as if the table were two tables. Then it temporarily renames at least one table in the SQL statement using a table alias. The syntax follows:

SELECT A.COLUMN_NAME, B.COLUMN_NAME, [ C.COLUMN_NAME ]
FROM TABLE1 A, TABLE2 B [, TABLE3 C ]
WHERE A.COLUMN_NAME = B.COLUMN_NAME
[ AND A.COLUMN_NAME = C.COLUMN_NAME ]

Following is an example with a table called PHOTOGRAPHERS. First, take a look at the CREATE TABLE statement. This is a very simple table of photographers that contains the identification number, the photographer’s name, and the identification number of a photographer’s mentor, which is another photographer in the PHOTOGRAPHERS table. So in this example, the PHOTOGRAPHER_ID is the primary key, which ensures that every row of data in the table contains a unique value for the identification. A foreign key constraint was also created on the MENTOR_PHOTOGRAPHER_ID column, which references the PHOTOGRAPHER_ID column in the same table.

SQL> create table photographers
  2  (photographer_id     number(3)      not null   primary key,
  3   photographer        varchar(30)    not null,
  4   mentor_photographer_id number(3)   null,
  5   constraint p_fk1 foreign key (mentor_photographer_id) references
photographers (photographer_id));

Table created.

Next, sample records were inserted into the PHOTOGRAPHERS table, as follows.

SQL> insert into photographers values ( 7, 'Ryan Notstephens' , null);

1 row created.

SQL> insert into photographers values ( 8, 'Susan Willamson' , null);

1 row created.

SQL> insert into photographers values ( 9, 'Mark Fife' , null);

1 row created.

SQL> insert into photographers values ( 1, 'Shooter McGavin' , null);

1 row created.

SQL> insert into photographers values ( 2, 'Jenny Forest' , 8);

1 row created.

SQL> insert into photographers values ( 3, 'Steve Hamm' , null);

1 row created.

SQL> insert into photographers values ( 4, 'Harry Henderson' , 9);

1 row created.

SQL> insert into photographers values ( 5, 'Kelly Hairtrigger' , 8);

1 row created.

SQL> insert into photographers values ( 6, 'Gordon Flash' , null);

1 row created.

SQL> insert into photographers values ( 10, 'Kate Kapteur' , 7);

1 row created.

Now you can query the PHOTOGRAPHERS table to show all the data. In the result set that follows, you can see 10 photographers in the table. Four of the photographers have mentors who are other photographers in the same table. This also means that some photographers mentor other photographers, if you look at the relationship from the other side. Some photographers are neither mentored nor mentor anyone. Familiarize yourself with the data. Look at the following returned data set and determine who Kate Kapteur’s mentor is.

SQL> select * from photographers;

PHOTOGRAPHER_ID PHOTOGRAPHER       MENTOR_PHOTOGRAPHER_ID
--------------- -------------------- ----------------------
              7 Ryan Notstephens
              8 Susan Willamson
              9 Mark Fife
              1 Shooter McGavin
              2 Jenny Forest           8
              3 Steve Hamm
              4 Harry Henderson        9
              5 Kelly Hairtrigger      8
              6 Gordon Flash
             10 Kate Kapteur           7

10 rows selected.

Kate Kapteur’s mentor is Ryan Notstephens. You can see this because the MENTOR_PHOTOGRAPHER_ID associated with Kate Kapteur is 7, which corresponds to Ryan Notstephens’ PHOTOGRAPHER_ID. Remember that in this table, MENTOR_PHOTOGRAPHER_ID is a foreign key that references the primary key of PHOTOGRAPHER_ID in the very same table.

In the next example, you apply the concept of a self join to the PHOTOGRAPHERS table. Suppose that you want to produce a list of mentors and their protégé. Looking at the SELECT statement, you can see that you are selecting from the PHOTOGRAPHERS table twice. The first occurrence of the PHOTOGRAPHERS table in the FROM clause has an alias of “mentors”; the second occurrence of the PHOTOGRAPHERS table in the FROM clause has an alias of “proteges”. Essentially, you are seeing the PHOTOGRAPHERS table as two separate tables, for the purpose of this query. The PHOTOGRAPHER_ID in the “mentors” version of the PHOTOGRAPHERS table is being joined to the MENTOR_PHOTOGRAPHER_ID column of the “proteges” version of the same table. Study the results and compare to them the complete output in the previous example. Notice that this output lists only photographers who have protégé.

SQL> select mentors.photographer mentor,
  2  proteges.photographer protege
  3  from photographers mentors,
  4  photographers proteges
  5  where mentors.photographer_id = proteges.mentor_photographer_id
  6  order by 1;

MENTOR                         PROTEGE
------------------------------ -----------------------------
Mark Fife                      Harry Henderson
Ryan Notstephens               Kate Kapteur
Susan Willamson                Kelly Hairtrigger
Susan Willamson                Jenny Forest

4 rows selected.

This final example uses the exact same query as before but applies the concept of an outer join along with the self join. If you want to see all photographers, regardless of whether there is an associated record as a mentor or protégé, the outer join allows for this. Study the results and review the previous discussion of the outer join, if necessary. The PROTEGES.PHOTOGRAPHER column was left as “photographer” because some of the photographers in this output are not necessarily protégés.

SQL> select mentors.photographer mentor,
  2  proteges.photographer "PHOTOGRAPHER"
  3  from photographers mentors,
  4  photographers proteges
  5  where mentors.photographer_id(+) = proteges.mentor_photographer_id
  6  order by 1;

MENTOR                         PHOTOGRAPHER
------------------------------ ------------------------------
Mark Fife                      Harry Henderson
Ryan Notstephens               Kate Kapteur
Susan Willamson                Kelly Hairtrigger
Susan Willamson                Jenny Forest
                               Steve Hamm
                               Shooter McGavin
                               Mark Fife
                               Susan Willamson
                               Gordon Flash
                               Ryan Notstephens

10 rows selected.

Self joins are useful when all the data that you want to retrieve resides in one table, but you must somehow compare records in the table to other records in the table.

Joins on Multiple Keys

Most join operations involve merging data based on a key in one table and a key in another table. Depending on how your database has been designed, you might have to join on more than one key field to accurately depict that data in your database. You might have a table with a primary key that consists of more than one column. You might also have a foreign key in a table that consists of more than one column, which references the multiple-column primary key.

The BIRDS database has not been designed in this way, but consider the following example tables:

SQL> desc products

Name                                 Null?    Type
-----------------------------------  -------  -----------------------------
SERIAL_NUMBER                        NOT NULL NUMBER(10)
VENDOR_NUMBER                        NOT NULL NUMBER(10)
PRODUCT_NAME                         NOT NULL VARCHAR2(30)
COST                                 NOT NULL NUMBER(8,2)

SQL> desc orders

Name                                 Null?    Type
----------------------------------   -------  -----------------------------
ORD_NO                               NOT NULL NUMBER(10)
PROD_NUMBER                          NOT NULL NUMBER(10)
VENDOR_NUMBER                        NOT NULL NUMBER(10)
QUANTITY                             NOT NULL NUMBER(5)
ORD_DATE                             NOT NULL DATE

The primary key in PROD is the combination of the columns SERIAL_NUMBER and VENDOR_NUMBER. Perhaps two products can have the same serial number within the distribution company, but each serial number is unique per vendor.

The foreign key in ORD is also the combination of the columns SERIAL_NUMBER and VENDOR_NUMBER.

When selecting data from both tables (PROD and ORD), the join operation might appear as follows:

SELECT P.PRODUCT_NAME, O.ORD_DATE, O.QUANTITY
FROM PRODUCTS P, ORDERS O
WHERE P.SERIAL_NUMBER = O.SERIAL_NUMBER
  AND P.VENDOR_NUMBER = O.VENDOR_NUMBER;

Join Considerations

Consider several points before using joins: what columns(s) to join on, whether there is a common column to join on, and any performance issues. More joins in a query means the database server has to do more work, which means that more time is taken to retrieve data. You cannot avoid joins when retrieving data from a normalized database, but it is imperative to ensure that joins are performed correctly from a logical standpoint. Incorrect joins can result in serious performance degradation and inaccurate query results. Hour 23, “Improving Database Performance,” discusses performance issues in more detail.

Base Tables

In a normalized database, a common need involves selecting data from multiple tables that do not have a direct relationship. What should you join on? If you need to retrieve data from two tables that do not have a common column to join, you must join on another table that has a common column or columns to both tables. That table becomes the base table. A base table joins one or more tables that have common columns, or joins tables that do not have common columns.

Thinking back to the previous database design, Hours 5 through 7 (particularly Hour 7 on normalization), you should recall that several intermediate, or base tables, have been created. For instance, Figure 14.1 shows how the original BIRDS and FOOD entities are related via a many-to-many relationship. Many-to-many relationships have redundant data; as you continue to normalize the database and remove that redundant data, you end up with tables in between. Figure 14.2 shows how the two tables are normalized into three tables. The middle table is used as a base table to provide a relationship between the BIRDS and FOOD tables. Take a minute to refresh your memory on this concept.

Illustration of Joining tables through a base table.

FIGURE 14.2
Joining tables through a base table

In the following query, you need to get information about the food items that specific birds eat. To get this information, you need to use the three tables in Figure 14.2. You can see in the FROM clause that all three tables are listed and given an alias, to simplify and for better readability. You want to see only the BIRD_NAME and the FOOD columns. Notice that because table aliases are used, every column in the query is qualified using the alias, which indicates the table in which that column resides. The join operation consists of lines 3 and 4 in the WHERE clause. The BIRDS table is joined to the BIRDS_FOOD table using the BIRD_ID column, and the FOOD table is joined to the BIRDS_FOOD table using the FOOD_ID column. The BIRDS table has only one row of data for each bird in the database. The FOOD table has only one row of data for each food item in the database. The BIRDS_FOOD table consists of the BIRD_ID and FOOD_ID, which together comprise the primary key in that table. Thus, you can have only one combination in that table of a specific bird and a specific food item. You can also see in the following query that you want to see only birds that eat fish, insects, or plants and have a valid entry in the WINGSPAN column. Finally, you are sorting the results by the bird’s name here. Take some time to study the query and results.

SQL> select B.bird_name, F.food_name
  2  from birds B, food F, birds_food BF
  3  where B.bird_id = BF.bird_id
  4    and F.food_id = BF.food_id
  5    and F.food_name in ('Fish', 'Insects', 'Plants')
  6    and B.wingspan is not null
  7  order by 1;

BIRD_NAME                      FOOD_NAME
------------------------------ ------------------------------
Bald Eagle                     Fish
Brown Pelican                  Fish
Canadian Goose                 Insects
Common Loon                    Fish
Double-crested Cormorant       Fish
Golden Eagle                   Insects
Golden Eagle                   Fish
Great Blue Heron               Insects
Great Blue Heron               Fish
Great Egret                    Fish
Mute Swan                      Plants
Osprey                         Fish
Red Tailed Hawk                Insects
Ring-billed Gull               Fish
Ring-billed Gull               Insects

15 rows selected.

The Cartesian Product

The Cartesian product is a result of a Cartesian join or “no join.” If you select from two or more tables and do not join the tables, your output is all possible rows from all the tables selected. If your tables are large, the result could be hundreds of thousands, or even millions, of rows of data. A WHERE clause is highly recommended for SQL statements that retrieve data from two or more tables. The Cartesian product is also known as a cross join.

The syntax follows:

FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1, TABLE2 [, TABLE3 ]

Following is an example of a cross join, or the dreaded Cartesian product:

SQL> select bird_name, food_name
  2  from birds, food;

BIRD_NAME                      FOOD_NAME
------------------------------ ---------------
American Coot                  Aquatic Insects
American Coot                  Aquatic Plants
American Coot                  Berries
American Coot                  Birds
American Coot                  Carrion
American Coot                  Corn
American Coot                  Crayfish
American Coot                  Crustaceans
American Coot                  Deer
American Coot                  Ducks
American Coot                  Fish
American Coot                  Frogs
American Coot                  Fruit
American Coot                  Insects
American Coot                  Moths
American Coot                  Nectar
American Coot                  Nuts
American Coot                  Plants
American Coot                  Pollen
American Coot                  Reptiles
American Coot                  Rodents
American Coot                  Seeds
American Coot                  Small Mammals
American Coot                  Snakes
American Coot                  Worms
American Crow                  Aquatic Insects
American Crow                  Aquatic Plants
American Crow                  Berries
………
Turkey Vulture                 Crustaceans
Turkey Vulture                 Deer
Turkey Vulture                 Ducks
Turkey Vulture                 Fish
Turkey Vulture                 Frogs
Turkey Vulture                 Fruit
Turkey Vulture                 Insects
Turkey Vulture                 Moths
Turkey Vulture                 Nectar
Turkey Vulture                 Nuts
Turkey Vulture                 Plants
Turkey Vulture                 Pollen
Turkey Vulture                 Reptiles
Turkey Vulture                 Rodents
Turkey Vulture                 Seeds
Turkey Vulture                 Small Mammals
Turkey Vulture                 Snakes
Turkey Vulture                 Worms

575 rows selected.

Data is selected from two separate tables, yet no JOIN operation is performed. Because you have not specified how to join rows in the first table with rows in the second table, the database server pairs every row in the first table with every row in the second table. Because each table has numerous rows of data each, the product of 575 rows selected is achieved from 23 rows multiplied by 25 rows. The BIRDS database is a small sample database with limited data, but you can imagine the results of a Cartesian product in a real-world database that has thousands or even millions of rows of data in various tables.

To fully understand exactly how the Cartesian product is derived, study the following example:

SQL> SELECT X FROM TABLE1;

X
-
A
B
C
D

4 rows selected.

SQL> SELECT V FROM TABLE2;

X
-
A
B
C
D

4 rows selected.

SQL> SELECT TABLE1.X, TABLE2.X
  2* FROM TABLE1, TABLE2;

X X
- -
A A
B A
C A
D A
A B
B B
C B
D B
A C
B C
C C
D C
A D
B D
C D
D D

16 rows selected.

Caution

Ensure That All Tables Are Joined

Be careful to join all tables in a query. If two tables in a query have not been joined and each table contains 1,000 rows of data, the Cartesian product then consists of 1,000 rows multiplied by 1,000 rows, which results in a total of 1,000,000 rows of data returned. Cartesian products dealing with large amounts of data can cause the host computer to stall or crash, in some cases, based on resource usage on the host computer. Therefore, it is important for the database administrator (DBA) and the system administrator to closely monitor for long-running queries.

Summary

This hour introduced you to one of the most robust features of SQL: the table join. Imagine the limits and intense manual effort needed if you could not extract data from more than one table in a single query. You saw several types of joins in this hour, each serving its own purpose based on the conditions placed on the query. Joins are used to link data from tables based on equality and non-equality. Outer joins are powerful, allowing data to be retrieved from one table even though associated data is not found in a joined table. Self joins are used to join a table to itself. Beware of the cross join, more commonly known as the Cartesian product. The Cartesian product is the result set of a multiple-table query without a join, often yielding a large amount of unwanted output. When selecting data from more than one table, be sure to properly join the tables according to the related columns (normally, primary keys). Failing to properly join tables can result in incomplete or inaccurate output.

Q&A

Q. When joining tables, must they be joined in the same order that they appear in the FROM clause?

A. No, they do not have to appear in the same order; however, performance might be impacted, depending on the order of tables in the FROM clause and the order in which tables are joined.

Q. When using a base table to join unrelated tables, must I select any columns from the base table?

A. No, using a base table to join unrelated tables does not mandate selecting columns from the base table.

Q. Can I join on more than one column between tables?

A. Yes, some queries might require you to join on more than one column per table, to provide a complete relationship between rows of data in the joined tables.

Workshop

The following workshop consists of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises give you the opportunity to apply the concepts discussed during the current hour, as well as build on the knowledge you acquired in previous hours of study. Be sure to complete the quiz questions and exercises before continuing to the next hour. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.

Quiz

  1. 1. What type of join do you use to return records from one table, regardless of the existence of associated records in the related table?

  2. 2. The JOIN conditions are located in which parts of the SQL statement?

  3. 3. What type of JOIN do you use to evaluate equality among rows of related tables?

  4. 4. What happens if you select from two different tables but fail to join the tables?

Exercises

  1. 1. Type the following code into the database and study the result set (Cartesian product):

    select bird_name, migration_location
    from birds, migration;
  2. 2. Now modify the previous query with a proper table join to return useful data and avoid the Cartesian product. You might have to review the ERD for the BIRDS database in Hour 3, “Getting to Know Your Data,” to refresh your memory on how these two tables are related to one another.

  3. 3. Generate a list of the food items eaten by the Great Blue Heron.

  4. 4. Which birds in the database eat fish?

  5. 5. Create a report showing the BIRD_NAME and MIGRATION_LOCATION for birds that migrate to South America.

  6. 6. Do any birds have a wingspan less than 30 inches and also eat fish?

  7. 7. Write a query to display the following results: the BIRD_NAME, FOOD_NAME, and NEST_TYPE for any birds that eat fish or build a platform nest.

  8. 8. Ask some questions you would anticipate database users, photographers, bird rescues, and so forth might inquire about the BIRDS database. Experiment with some of your own queries using table joins.