One-to-many join

A few readers may have been able to spot instances of one-to-many joins in the aforementioned examples in this chapter. Let me change the datasets that we use to discuss this join. After going through the example, I would encourage readers to look at the inner, left, right, and full join examples and see if they can spot instances of one-to-many joins.

Let's assume we have datasets X and Y:

Data X;
Input ID VarTabA VarTabB;
Datalines;
1 66 77
2 55 66
3 77 55
;


Data Y;
Input ID Category $ VarTabC VarTabD;
Datalines;
1 A 60 70
1 B 50 60
2 A 50 60
3 C 70 50
;

The datasets formed with X and Y have the same ID variables. However, the Y dataset has an instance where the ID is repeated and has two different sets of values of VarTabC and VarTabD. The following tables shows a one-to-many dataset:

The following query was run to get a one-to-many match:

Proc Sql;
  Create table One_to_One as
Select Coalesce(A.ID, B.ID) as ID, VarTabA, VarTabB, VarTabC, 
VarTabD
  From X as A, Y as B
  Where A.ID=B.ID
  ;
Quit;

Due to the data structure, the resulting join leads to a one-to-many match where one record in table X is mapped to multiple records in table Y where the ID value equals 1:

The preceding query we wrote is an inner join. Don't get confused with the change in the syntax. The query is the same as the following:

Proc Sql;
  Create table One_to_Many as
Select Coalesce(A.ID, B.ID) as ID, VarTabA, VarTabB, VarTabC, 
VarTabD
  From X as A Inner Join Y as B
  On A.ID=B.ID
  ;
Quit;