Query Set #2b – adding columns using JOIN

While the ALTER TABLE and UPDATE sequence is a good way to add columns to a table one at a time, it can be tedious when you want to copy many columns from the same table. A JOIN operation gives us a second option for copying many columns from the same table.

In a JOIN operation, two tables are combined to produce a single table. In the following example query, the selected columns of the VITALS table are appended on the end of the MORT_FINAL table.

However, the MORT_FINAL table and VITALS table both contain several rows. How does the query know which rows of each table correspond to each other? This is specified using an ON clause (at the end of the query). The ON clause says, "When joining the tables, combine those rows where the visit IDs are equal." So for each row of the MORT_FINAL table, there will be one and only one row of the VISITS table to which it corresponds: the row that has the same visit ID. This makes sense because we are interested in collecting the information from individual visits in their own separate rows.

Another thing to know about JOINs is that there are four different JOIN types in standard SQL: LEFT JOINs, RIGHT JOINs, INNER JOINs, and OUTER JOINs. A LEFT JOIN (referred to as a LEFT OUTER JOIN in SQLite) is the type we use here; it says, "For every row of the first table (MORT_FINAL, in this case), add the corresponding VISIT columns where the visit IDs are equal, and add NULL values if there is no corresponding visit ID in the VISIT table." In other words, all of the rows of the first table are preserved, whether or not there is a corresponding row in the right table. Visits that have a row in the second table but are missing from the first table are discarded.

In a RIGHT JOIN, the opposite is true: unique visit IDs of the second table are preserved, and they are aligned to the corresponding visit IDs of the first table. Visit IDs present in the first table but missing in the second table are discarded. INNER JOINs include in the final result: only visit IDs that are present in both tables. OUTER JOINs include all rows of both tables and replace all missing entries with NULL values. As a note, we should add that RIGHT JOINs and OUTER JOINs are not supported in SQLite.

So why did we choose a LEFT JOIN? Fundamentally, our job is to specify a prediction for every single visit, whether or not vital signs were recorded at that visit. Therefore, every visit ID present in the MORT_FINAL table should be in the final result, and the LEFT JOIN ensures that will be true.

In the following code, we see that by using a JOIN, only one total query is needed to add eight columns of the VITALS table. What are the downsides of this method? For one thing, notice that a new table is created: MORT_FINAL_2. We cannot append to the old MORT_FINAL table; a new table must be created. Also, notice that we have to type out each column that we wish to preserve in the final result. In SQL, the asterisk (*) indicates to add all columns from both tables; we could have written SELECT * FROM MORT_FINAL .... However, if we used an asterisk, there would be duplicate columns (for example, the Visit_id column would be present twice, since it is in both tables).

Then we would have to exclude the duplicate columns with a SELECT statement. Nevertheless, JOINs are useful when there are many columns in a second table that you would like to merge into a first table:

sqlite> CREATE TABLE MORT_FINAL_2 AS
SELECT M.Visit_id, M.Pid, M.Attending_md, M.Visit_date, M.Pri_dx_icd, M.Sec_dx_icd, M.Bdate, M.Sex, V.Height_in, V.Weight_lb, V.Temp_f, V.Pulse, V.Resp_rate, V.Bp_syst, V.Bp_Diast, V.SpO2
FROM MORT_FINAL AS M LEFT OUTER JOIN VITALS AS V ON M.Visit_id = V.Visit_id;