Customizing the way of merging streams

When you use an arbitrary step to unify streams, the rows of each stream keep their original relative order, but the streams themselves are joined in no particular order. Take a look at the example's preview: the rows with new features and the rows with bugs remain sorted within their original groups. However, you did not tell PDI whether to put the new features before or after the rows of the other stream. If you care about the order in which the union is made, there are some steps that can help you. Here are the options you have:

If you want to ...

You can do this ...

Append two or more streams and don't care about the order

Use any step. The selected step will take all the incoming streams in any order and then will proceed with the specific task.

Append two or more streams in a given order

For two streams, use the Append streams step from the Flow category. It allows you to decide which stream goes first.

For two or more streams, use the Prioritize streams step from the Flow category. It allows you to decide the order of all the incoming streams.
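Outside PDI, the idea of appending streams in a chosen order can be sketched in a few lines of Python. This is only an illustration of the concept, not how the Append streams or Prioritize streams steps are implemented; the function name `append_in_order` and the sample rows are made up for the example.

```python
from itertools import chain


def append_in_order(*streams):
    """Yield all rows of the first stream, then all rows of the next, and so on.

    Hypothetical helper: the order of the arguments is the order of the union.
    """
    return chain.from_iterable(streams)


# Invented sample rows standing in for the two incoming streams.
new_features = [{"type": "feature", "id": 1}, {"type": "feature", "id": 2}]
bugs = [{"type": "bug", "id": 7}]

# Here the bugs are explicitly placed before the new features.
rows = list(append_in_order(bugs, new_features))
```

Each stream keeps its internal order; the only thing you decide is which stream is read first.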

Merge two streams ordered by one or more fields

Use the Sorted Merge step from the Joins category. This step lets you choose the field(s) by which the incoming rows are ordered before they are sent to the destination step(s). Both input streams must already be sorted on those field(s).
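To see why the inputs must already be sorted, here is a minimal Python sketch of a sorted merge, assuming rows are dictionaries. It relies on the standard library's `heapq.merge`, which interleaves inputs that are already sorted and does not sort them itself. The helper name `sorted_merge` and the sample rows are assumptions made for the illustration; this is not PDI code.

```python
import heapq
from operator import itemgetter


def sorted_merge(stream_a, stream_b, *fields):
    """Interleave two streams that are ALREADY sorted on `fields`.

    Hypothetical helper. If an input is not sorted beforehand, the
    result is not sorted either, which mirrors the requirement above.
    """
    return heapq.merge(stream_a, stream_b, key=itemgetter(*fields))


# Invented sample rows, each list sorted by "priority".
features = [{"priority": 1, "title": "A"}, {"priority": 5, "title": "B"}]
bugs = [{"priority": 3, "title": "C"}]

merged = list(sorted_merge(features, bugs, "priority"))
```

The merge only compares the current head of each stream, so it never needs to hold a whole stream in memory.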

Merge two streams keeping the newest when there are duplicates

Use the Merge Rows (diff) step from the Joins category.

You tell PDI the key fields, that is, the fields that identify a row as being the same in both streams. You also give PDI the fields to compare when a row is found in both streams.

PDI tries to match rows of both streams based on the key fields. Then it creates a field that will act as a flag and fills it as follows:

  • If a row was only found in the first stream, the flag is set to deleted

  • If a row was only found in the second stream, the flag is set to new

  • If the row was found in both streams, and the fields to compare are the same, the flag is set to identical

  • If the row was found in both streams, and at least one of the fields to compare is different, the flag is set to changed
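The flagging rules above can be sketched in Python as follows. This is a simplified illustration, not the step's actual implementation: it loads both streams into dictionaries instead of reading them row by row, it assumes the key fields are unique within each stream, and it assumes the row from the second stream is the one emitted whenever it exists. The function name, the `flag` field name, and the sample rows are all made up.

```python
def merge_rows_diff(first, second, key_fields, compare_fields, flag_field="flag"):
    """Flag each row as deleted, new, identical or changed (hypothetical helper)."""

    def key_of(row):
        return tuple(row[f] for f in key_fields)

    first_by_key = {key_of(r): r for r in first}
    second_by_key = {key_of(r): r for r in second}

    result = []
    for key in sorted(first_by_key.keys() | second_by_key.keys()):
        old = first_by_key.get(key)
        new = second_by_key.get(key)
        if new is None:
            flag = "deleted"      # only found in the first stream
        elif old is None:
            flag = "new"          # only found in the second stream
        elif all(old[f] == new[f] for f in compare_fields):
            flag = "identical"    # in both, compared fields are equal
        else:
            flag = "changed"      # in both, at least one field differs
        # Assumption: emit the second stream's row whenever there is one.
        row = new if new is not None else old
        result.append({**row, flag_field: flag})
    return result


# Invented sample rows.
first = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}, {"id": 3, "status": "open"}]
second = [{"id": 2, "status": "open"}, {"id": 3, "status": "closed"}, {"id": 4, "status": "open"}]

flagged = merge_rows_diff(first, second, key_fields=["id"], compare_fields=["status"])
```

Filtering out the rows flagged as deleted leaves one row per key, with the second stream's version kept for the duplicates.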


Whether you use arbitrary steps or some of the special steps mentioned here to merge streams, don't forget to verify the layout of the streams you are merging. Pay attention to the warnings of the trap detector and avoid mixing row layouts.
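As a rough illustration of what verifying the layout means, the following Python sketch compares the field names, their order, and their value types across streams. It is a conceptual stand-in and does not reproduce PDI's trap detector; the helper names and the sample rows are assumptions.

```python
def layout(row):
    """Describe a row as an ordered list of (field name, type name) pairs."""
    return [(name, type(value).__name__) for name, value in row.items()]


def same_layout(*streams):
    """Return True if the first row of every non-empty stream has the same layout.

    Hypothetical helper: it inspects only the first row of each stream.
    """
    layouts = [layout(stream[0]) for stream in streams if stream]
    return all(current == layouts[0] for current in layouts)


# Invented sample rows: "severity" is a number in one stream and text in the other.
features = [{"title": "Export to CSV", "severity": 2}]
bugs = [{"title": "Crash on save", "severity": "high"}]
more_bugs = [{"title": "Wrong total", "severity": 1}]

mixed = same_layout(features, bugs)
consistent = same_layout(features, more_bugs)
```

Here `mixed` is False because the two streams disagree on the type of `severity`, which is the kind of mismatch to fix before merging.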