Treating invalid data by splitting and merging streams

When you are transforming data, you will often detect inaccuracies or errors. Sometimes the issues you find are not severe enough to justify discarding the rows. You may be able to guess what data was supposed to be there instead of the current values, or you may have default values to substitute for the invalid ones. Let's see some examples:

In these cases and many others like them, the problem is not critical, and you can do some work to avoid aborting or discarding data because of these anomalies:

In general, you can do your best to fix the issues and send the rows back to the main stream. This applies both to regular streams and to streams that result from error handling.
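Outside of any particular tool, the idea of splitting a stream, repairing the invalid rows, and merging them back can be sketched in a few lines of Python. This is only an illustration under assumed names: the `country` field, the `DEFAULT_COUNTRY` value, and the helper functions `is_valid`, `split`, `fix`, and `process` are all hypothetical and do not come from the text.

```python
# Hypothetical default used when a value cannot be recovered.
DEFAULT_COUNTRY = "unknown"


def is_valid(row: dict) -> bool:
    """Assumed rule: a row is valid when 'country' is a non-empty string."""
    value = row.get("country")
    return isinstance(value, str) and value.strip() != ""


def split(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split the main stream into a valid stream and an invalid stream."""
    valid, invalid = [], []
    for row in rows:
        (valid if is_valid(row) else invalid).append(row)
    return valid, invalid


def fix(row: dict) -> dict:
    """Repair an invalid row by substituting the default value."""
    return {**row, "country": DEFAULT_COUNTRY}


def process(rows: list[dict]) -> list[dict]:
    """Split, repair the invalid rows, and merge both streams again."""
    valid, invalid = split(rows)
    repaired = [fix(row) for row in invalid]
    # Merge: no row is discarded, but the original order is not preserved.
    return valid + repaired


rows = [
    {"id": 1, "country": "Argentina"},
    {"id": 2, "country": ""},
    {"id": 3, "country": None},
]
result = process(rows)
```

Note that the merged stream carries the same number of rows as the input; only the invalid values have changed. If the original row order matters, the merged rows would need to be sorted again by a key such as `id`.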

In this section, you will see an example of fixing these kinds of issues so that you do not have to discard the rows that cause errors or are considered invalid. You will do it using the concepts learned in the previous chapter: splitting and merging streams.