Union, merging mismatched fields, and removing unnecessary fields

We want to bring together the booking data for all the airlines, so we'll union the two paths in the flow:

  1. Drag the Clean 2 step onto the Clean 1 step and drop it onto the Union box that appears. This creates a new Union step with input connections from both clean steps:

  2. The Union pane, which appears when the Union step is selected, lists the mismatched fields, indicates the input each one comes from, and gives you options for removing or merging them. For example, Fare Type and Ticket Type are named differently in the Excel file and the text files, but hold the same data. Hold down the Ctrl key and select both fields. Then, select Merge Fields from the toolbar at the top of the pane or from the right-click menu:

  3. Also, merge Row ID and Row_ID.
  4. File Paths applies only to the Southwest files, which were unioned together in the Input step. While this auto-generated field can be very useful at times, it does not add anything to the data in this example. Select the field and then click Remove Field from the menu.
  5. Similarly, Travel Insurance? and Passenger ID each apply to only one of the inputs and will be of little use in our analysis. Remove those fields as well.
  6. The single remaining mismatched field, Airline, is useful. Leave it for now. Click the + icon on the Union 1 step in the flow pane and extend the flow by selecting Add Step. At this point, your flow should look like this:

There is an icon above the Union 1 step in the flow, indicating changes that were made within this step. In this case, the changes are the removal of several of the fields. Each step with changes will have similar icons, which will reveal tooltip details when you hover over them and also allow you to interact with the changes. You can see a complete list of changes, edit them, reorder them, and remove them by clicking the step and opening the Changes pane. Depending on the step type, the Changes pane is available by either expanding it or selecting the Changes tab.
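If it helps to think of the Union step in code, the following is a minimal pandas sketch of the same idea: stack two inputs, merge the mismatched field pairs, and drop the fields that add nothing. It is not how Tableau Prep works internally. The sample values, the assignment of each field to a particular input, and the choice to keep Fare Type and Row ID as the merged names are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the outputs of Clean 1 and Clean 2.
# Which input carries which field, and all values, are assumptions.
clean_1 = pd.DataFrame({
    "Row ID": [1, 2],
    "Fare Type": ["Economy", "First"],
    "Airline": ["Example Air", "Example Air"],
    "Travel Insurance?": [True, False],
    "Passenger ID": [101, 102],
})
clean_2 = pd.DataFrame({
    "Row_ID": [3, 4],
    "Ticket Type": ["Economy", "Business"],
    "File Paths": ["a.txt", "b.txt"],
})

# Union: stack the rows. A field present in only one input
# is filled with nulls for rows that come from the other input.
unioned = pd.concat([clean_1, clean_2], ignore_index=True)

# Merge mismatched fields: take the first non-null value of each pair.
unioned["Fare Type"] = unioned["Fare Type"].combine_first(unioned["Ticket Type"])
unioned["Row ID"] = unioned["Row ID"].combine_first(unioned["Row_ID"]).astype(int)

# Remove the now-redundant halves of each pair and the unneeded fields.
unioned = unioned.drop(
    columns=["Ticket Type", "Row_ID", "File Paths", "Travel Insurance?", "Passenger ID"]
)

print(unioned)
```

In this sketch, Airline is left in place with nulls for the rows from the second input, which mirrors leaving the mismatched Airline field alone in the Union step so that it can be handled in a later step.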