Explaining how metadata injection works

Let's see how metadata injection works through a very simple example. Suppose that you have an Excel file with sales data. There are several sheets in the file, but only one has the data you want to read. The name of the sheet is not fixed, so you have it in a Kettle variable. The quick solution to read the spreadsheet would be to explicitly declare the sheet name.

Unfortunately, PDI is not prepared to put a field or variable as the name of the sheet. Which other options do you have? As you know, the Sheet tab in the Excel configuration window can be left empty and PDI will read the sheets anyway. However, this would cause an error as, by default, PDI reads all the sheets, and in this case, the rest of the sheets may have a different kind of data.

In a few words, in a development stage, you don't have the means to create the final Transformation that reads the Excel file. The solution is to inject the variable information at execution time.

The general idea is as follows. You should have two transformations:

A template Transformation, here you will implement all the tasks needed, but the Transformation will lack some metadata information. In our example, the missing metadata information will be the sheet name.
A Transformation that injects data, the purpose of this one is to inject the missing metadata into the template, generating a working Transformation.

The following screenshot depicts this idea:

Metadata injection

With this concept in mind, let's implement this example.

In order to follow the instructions for the following examples, you will need the sample Excel files that come with the code bundle for the chapter.