When you use an arbitrary step to unify streams, the rows keep the order they had in their original stream, but the streams are joined in any order. Look at the example's preview: the rows with new features and the rows with bugs remained sorted within their original group. However, you did not tell PDI whether to put the new features before or after the rows of the other stream. If you care about the order in which the union is made, some steps can help you. Here are your options:
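To make the behavior described above concrete, here is a small Python model of an arbitrary union. It is not how PDI implements anything: PDI steps run concurrently, and the random choice of the next stream below merely stands in for that unpredictable timing. The stream contents and the helper names `arbitrary_union` and `preserves_stream_order` are hypothetical.

```python
import random
from collections import deque


def arbitrary_union(streams, seed=None):
    """Interleave streams unpredictably while keeping each stream's own row order.

    The random choice is a stand-in for thread timing; it is not PDI's mechanism.
    """
    rng = random.Random(seed)
    queues = [deque(stream) for stream in streams]
    result = []
    while any(queues):
        # Any stream that still has rows may supply the next one.
        candidates = [queue for queue in queues if queue]
        result.append(rng.choice(candidates).popleft())
    return result


def preserves_stream_order(result, stream):
    """Check that the rows of `stream` appear in `result` in their original order.

    Assumes every row value is unique, so list.index finds the right position.
    """
    positions = [result.index(row) for row in stream]
    return positions == sorted(positions)


features = ["feature-1", "feature-2", "feature-3"]
bugs = ["bug-1", "bug-2"]

merged = arbitrary_union([features, bugs], seed=7)
print(merged)  # some interleaving; it depends on the seed
print(preserves_stream_order(merged, features))  # True
print(preserves_stream_order(merged, bugs))  # True
```

Changing the seed changes where the bugs land relative to the features, but never the order of the features among themselves: that is both the guarantee and the missing guarantee described above.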
| If you want to... | You can do this... |
| --- | --- |
| Append two or more streams and don't care about the order | Use any step. The selected step takes the rows of all the incoming streams in arbitrary order and then performs its specific task. |
| Append two or more streams in a given order | For two streams, use the Append streams step from the Flow category, which lets you decide which stream goes first. For two or more streams, use the Prioritize streams step from the Flow category, which lets you decide the order of all the incoming streams. |
| Merge two streams ordered by one or more fields | Use the Sorted Merge step from the Joins category. This step lets you choose the field(s) by which the incoming rows are ordered before they are sent to the destination step(s). Both input streams must already be sorted on those field(s). |
| Merge two streams, keeping the newest row when there are duplicates | Use the Merge Rows (diff) step from the Joins category. You tell PDI which stream holds the reference (old) rows and which holds the compare (new) rows, along with the key fields, that is, the fields that identify a row as the same in both streams. You also give PDI the fields to compare when a row is found in both streams. PDI tries to match the rows of both streams on the key fields. Then it creates a field that acts as a flag and fills it as follows: "identical" if the row is found in both streams and the compared fields are equal; "changed" if the row is found in both streams but at least one compared field differs; "new" if the row is found only in the compare stream; "deleted" if the row is found only in the reference stream. |
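The two ordered options in the table amount to plain concatenation. The following Python sketch models only the resulting row order of the Append streams and Prioritize streams steps, not how PDI reads its input hops; the function names and sample rows are hypothetical.

```python
from itertools import chain


def append_streams(first, second):
    """Model of Append streams: every row of the stream chosen to go first,
    followed by every row of the other stream."""
    return list(chain(first, second))


def prioritize_streams(*streams):
    """Model of Prioritize streams: all rows of each stream, taken in the
    priority order given by the arguments."""
    return list(chain.from_iterable(streams))


features = ["feature-1", "feature-2"]
bugs = ["bug-1", "bug-2"]
tasks = ["task-1"]

print(append_streams(features, bugs))
# ['feature-1', 'feature-2', 'bug-1', 'bug-2']
print(prioritize_streams(tasks, bugs, features))
# ['task-1', 'bug-1', 'bug-2', 'feature-1', 'feature-2']
```

Unlike the arbitrary union, the output here is fully determined by the order in which you list the streams.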
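A sorted merge corresponds to the merge phase of merge sort: because both inputs are already ordered on the key, the output can be built by repeatedly taking the row with the smaller key. The sketch below uses Python's `heapq.merge` to show the idea; the dictionary rows, the `id` key, and the explicit sortedness check are illustrative assumptions, not features of the PDI step.

```python
import heapq
from operator import itemgetter


def sorted_merge(stream_a, stream_b, key_fields):
    """Merge two streams that are already sorted on key_fields into one sorted stream."""
    key = itemgetter(*key_fields)
    # The inputs must be sorted; this check makes a violation visible
    # instead of silently producing misordered output.
    for name, stream in (("first", stream_a), ("second", stream_b)):
        keys = [key(row) for row in stream]
        if keys != sorted(keys):
            raise ValueError(f"{name} stream is not sorted on {key_fields}")
    return list(heapq.merge(stream_a, stream_b, key=key))


features = [{"id": 1, "kind": "feature"}, {"id": 4, "kind": "feature"}, {"id": 5, "kind": "feature"}]
bugs = [{"id": 2, "kind": "bug"}, {"id": 3, "kind": "bug"}]

for row in sorted_merge(features, bugs, ["id"]):
    print(row["id"], row["kind"])  # ids 1 to 5, in ascending order
```

Because each input is consumed front to back, this approach never needs to hold more than one pending row per stream, which is why sorted input is a precondition rather than something the merge fixes.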
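The flagging rules of Merge Rows (diff) can be modeled in Python as follows. This is a simplified sketch, assuming each key value appears at most once per stream; it matches rows through dictionaries, which is not necessarily how the PDI step works, and it emits rows sorted by key. The function name, the `flag` output field, and the sample data are hypothetical. When a row exists in both streams, the sketch keeps the compare (newer) version, which is what "keeping the newest" refers to.

```python
def merge_rows_diff(reference, compare, key_fields, value_fields, flag_field="flag"):
    """Flag each row as identical, changed, new, or deleted.

    Assumes each key value appears at most once per stream. Rows are matched
    through dictionaries and the result is sorted by key.
    """
    def key_of(row):
        return tuple(row[field] for field in key_fields)

    old_rows = {key_of(row): row for row in reference}
    new_rows = {key_of(row): row for row in compare}
    result = []
    for key in sorted(old_rows.keys() | new_rows.keys()):
        old, new = old_rows.get(key), new_rows.get(key)
        if old is None:
            row, flag = new, "new"  # only in the compare stream
        elif new is None:
            row, flag = old, "deleted"  # only in the reference stream
        elif all(old[field] == new[field] for field in value_fields):
            row, flag = new, "identical"
        else:
            row, flag = new, "changed"  # keep the newer version
        result.append({**row, flag_field: flag})
    return result


reference = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
    {"id": 3, "status": "closed"},
]
compare = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "fixed"},
    {"id": 4, "status": "open"},
]

for row in merge_rows_diff(reference, compare, ["id"], ["status"]):
    print(row)
```

For the sample data, ids 1, 2, 3, and 4 come out flagged identical, changed, deleted, and new respectively, and id 2 carries the newer status "fixed".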
Whether you use arbitrary steps or some of the special steps mentioned here to merge streams, don't forget to verify the layout of the streams you are merging. Pay attention to the warnings of the trap detector and avoid mixing row layouts.
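As a rough illustration of what verifying the layout means, the sketch below treats a row's layout as the ordered list of its field names and value types, and refuses to union streams whose rows disagree. It is a hypothetical helper, not the trap detector's actual logic. It assumes rows are Python dictionaries without null values (since `None` would report its own type), and it treats field order as part of the layout.

```python
def layout_of(row):
    """Describe a row's layout as an ordered tuple of (field name, type name) pairs."""
    return tuple((name, type(value).__name__) for name, value in row.items())


def check_same_layout(*streams):
    """Raise ValueError at the first row whose layout differs from the first row seen."""
    expected = None
    for stream_index, stream in enumerate(streams):
        for row_index, row in enumerate(stream):
            layout = layout_of(row)
            if expected is None:
                expected = layout
            elif layout != expected:
                raise ValueError(
                    f"stream {stream_index}, row {row_index}: "
                    f"layout {layout} differs from {expected}"
                )


features = [{"id": 1, "title": "Export to CSV"}]
bugs = [{"id": 2, "title": "Crash on save"}]
mixed = [{"title": "Wrong total", "id": 3}]  # same fields, different order

check_same_layout(features, bugs)  # passes silently
try:
    check_same_layout(features, mixed)
except ValueError as error:
    print(error)
```

Running such a check before the union turns a mixed layout into an immediate, explicit error instead of rows whose fields are silently misread downstream.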