As you saw in the last Transformation, once the data is created in the first step, it travels from step to step through the hops that link those steps. A hop only directs data from the output buffer of one step to the input buffer of the next. The actual manipulation of data, including changes to the stream such as adding or removing fields, happens in the steps. In the last Transformation that you created, you used the Calculator step to create new fields and add them to your dataset.
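The division of labor between hops and steps can be pictured with a small Python sketch. This is only an illustration of the idea, not how PDI is implemented: the hop is modelled as a plain FIFO buffer, and the function names, the `name_length` field, and the sample rows are all hypothetical.

```python
from collections import deque

# A hop is modelled here as a FIFO buffer; it never changes the rows it carries.
hop = deque()


def generate_rows(output_buffer):
    """First step: creates the data and writes it to its output buffer."""
    for name in ("Project A", "Project B"):
        output_buffer.append({"project_name": name})


def add_name_length(input_buffer):
    """Downstream step: reads rows from its input buffer and adds a field."""
    results = []
    while input_buffer:
        row = input_buffer.popleft()
        # The step, not the hop, is where the stream gains a new field.
        results.append({**row, "name_length": len(row["project_name"])})
    return results


generate_rows(hop)
rows = add_name_length(hop)
```

Note that only `add_name_length` touches the content of a row; the buffer just hands rows over in the order in which they arrived.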
The Calculator step is one of the many steps that PDI offers for creating new fields. You create the fields from scratch or by combining existing ones. Usually, you will find these steps under the Transform category of the Steps tree. The following table describes some of the most commonly used steps. The examples reference the first Transformation you created in this chapter:
| Step | Description | Example |
| --- | --- | --- |
| Add constants | Adds one or more fields with constant values. | If the start date was the same for all the projects, you could add that field with an Add constants step. |
| Add sequence | Adds a field with a sequence. By default, the generated sequence will be 1, 2, 3 ... but you can change the start, increment, and maximum values to generate different sequences. | You could have created the delta field with an Add sequence step instead of using the Clone num field option in the Clone row step. |
| Number range | Creates a new field based on ranges of values. Applies to a numeric field. | You used this step for creating the performance field based on the duration of the project. |
| Replace in string | Replaces all occurrences of a text in a string field with another text. | The value for the project_name field includes the word Project. With this step, you could remove the word or replace it with a shorter one. The final name for Project A could be Proj A or simply A. |
| Split Fields | Splits a single field into two or more new fields. You have to specify which character acts as a separator. | Split the name of the project into two fields: the first word (which in this case is always Project) and the rest. The separator would be a space character. |
| String operations | Applies some operations on strings: trimming and removing special characters, among others. | You could convert the project name to uppercase. |
| Value Mapper | Creates a correspondence between the values of a field and a new set of values. | You could define a new field based on the performance field. The value could be Rejected if the performance is poor or unknown, and Approved for the rest of the performance values. |
| User Defined Java Expression | Creates a new field by using a Java expression that involves one or more fields. This step can potentially replace any of the previous steps. | You used this step in the first section for creating two strings: duration and message. |
Note that some of these steps can be used for generating more than one field at a time; an example of that is the User Defined Java Expression step (or UDJE for short).
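To make the behavior of a few of these steps more concrete, here is a rough Python sketch of what Add sequence, Number range, and Value Mapper do to a dataset. It is not PDI code. The function names, the `status` field, the sample durations, the range boundaries, and the `good` label are assumptions made for illustration, and the sketch assumes that a range includes its lower bound and excludes its upper bound. The maximum value of the sequence is left out for brevity.

```python
from itertools import count


def add_sequence(rows, field, start=1, increment=1):
    """Add a counter field: start, start + increment, ..."""
    seq = count(start, increment)
    return [{**row, field: next(seq)} for row in rows]


def number_range(rows, source, target, ranges, default="unknown"):
    """Label a numeric field; each range is (lower, upper, label)."""
    out = []
    for row in rows:
        label = default
        for low, high, name in ranges:
            # Assumed convention: lower bound inclusive, upper bound exclusive.
            if low <= row[source] < high:
                label = name
                break
        out.append({**row, target: label})
    return out


def value_mapper(rows, source, target, mapping, default):
    """Translate the values of one field into a new set of values."""
    return [{**row, target: mapping.get(row[source], default)} for row in rows]


# Hypothetical sample data; the durations are invented for this sketch.
projects = [
    {"project_name": "Project A", "duration": 12},
    {"project_name": "Project B", "duration": 45},
    {"project_name": "Project C", "duration": 200},
]

projects = add_sequence(projects, "delta")
projects = number_range(
    projects, "duration", "performance",
    ranges=[(0, 30, "good"), (30, 90, "poor")],
)
projects = value_mapper(
    projects, "performance", "status",
    mapping={"poor": "Rejected", "unknown": "Rejected"},
    default="Approved",
)
```

With this sample data, Project A ends up as Approved, while Project B (poor) and Project C (outside every range, so unknown) end up as Rejected.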
Any of these steps, when added to your Transformation, is executed for every row in the stream. The step takes the row, identifies the fields it needs for its task, calculates the new field or fields, and adds them to the dataset.
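The per-row cycle described above can be sketched in Python as a generator. Again, this is a simplified model rather than PDI's implementation: `run_step`, `describe`, the `duration_text` field, and the sample row are hypothetical, and the expressions only loosely imitate the two strings built earlier with the User Defined Java Expression step.

```python
def run_step(rows, needed_fields, compute):
    """Apply one step to a stream, one row at a time."""
    for row in rows:
        # 1. Identify the fields the step needs.
        inputs = {name: row[name] for name in needed_fields}
        # 2. Calculate the new field or fields.
        new_fields = compute(**inputs)
        # 3. Add them to the row and pass it on.
        yield {**row, **new_fields}


def describe(project_name, duration):
    """Hypothetical expressions that produce two new fields at once."""
    return {
        "duration_text": f"{duration} days",
        "message": f"{project_name} took {duration} days",
    }


stream = [{"project_name": "Project A", "duration": 12}]
result = list(run_step(stream, ["project_name", "duration"], describe))
```

Because `run_step` is a generator, each row is processed and handed on as soon as it arrives, which mirrors the way rows flow through a step instead of waiting for the whole dataset.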