Pentaho Data Integration

  • Pentaho Data Integration (PDI), it is also called as Kettle. Pentaho responsible for the Extract, Transform and Load (ETL) processes to the PDI component.
  • Data warehouses environments are most frequently used by this ETL tools. Other purposes are also used this PDI: Migrating data between applications or databases.
pentaho data integration

Example

  • Lets create a simple transformation to convert a CSV into an XML file.
  • Our Transformation has to do the following:
    • Read the CSV file.
    • Build the greetings message.
    • Save the greetings in the XML file.

Create a Transformation:

  • Now Start the Transformation:

Step1:

    • Left workspace is the Palette, Select the Input category.

Step 2:

    • Drag the CSV file onto the workspace on the right.

Step 3:

    • Select the Scripting category.

Step 4:

    • Drag the Modified JavaScript Value icon to the workspace.

Step 5:

    • Select the Output category.

Step 6:

    • Drag the XML Output icon to the workspace.
  • Now link the CSV file input with the Modified Java Script Value by creating a Hop:

Step 1:

    • Select the first Step.

Step 2:

    • Hold the Shift key and drag the icon onto the second Step.

Step 3:

    • Link the Modified Java Script Value with the XML Output via this same process.

Configuring the CSV file input:

Step 1:

    • Double-click on the CSV file input. The configuration window for the step will appear. The file location will be indicated, file format (e.g. delimiters, enclosure characters, etc.) and column metadata (e.g. column name, data type, etc).

Step 2:

    • Change the step name with one that is more representative of this Step’s function. In this case, list out type in name list.

Step 3:

    • For the Filename field, click Browse and select the input file.

Step 4:

    • Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).

Step 5:

    • The grid has now the names of the columns of your file.

Step 6:

    • Click Preview to ensure that the file will be read as expected. The file will be appear then data showing in window. Click OK to finish.

Configuring the Modified JavaScript Value:

Step 1:

    • Double-click on the Modified JavaScript Value.

Step 2:

    • The Step configuration window will appear. This is different compare to the previous Step config window in that it allows to write JavaScript code. To build the message “Hello, ” concatenated with each of the names.

Step 3:

    • Name this Step Greetings.

Step 4:

    • The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two fields: last_name and name. Write the following code:
var msg = 'Hello, ' + name + "!";

Step 5:

    • Variable created in the code through the bottom. Variable named msg we have created. This message will be send to the output file, the variable name in the grid to write.

Step 6:

    • Configuring the Modified Java Script Value step on click ok to finish.

Step 7:

    • Select the Step you just configured. To check that the new field will leave this Step, we will now see the Input and Output Fields. Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. Simply transform the input data this Steps. In this case, the input and output fields are usually the same. There are Steps, however, to the Output that add fields – Calculator, for example. Filter or combine data for other Steps that causing that the Output has less fields that the Input – Group by, for example.

Step 8:

    • On click-right the Step to bring up a context menu.

Step 9:

    • Input Fields show to select. The CSV file input Step come to the Input Fields are last_name and name.

Step 10:

    • Output Fields show to select. We see that not only do we have the existing fields, but also the new msg field.

Configuring the XML Output:

Step 1:

    • Double-click the XML Output. This kind of step will appear while configuration in window. To set the name and location of the output file, and we want to include which of the fields that to be established. We may include all or some of the fields.

Step 2:

    • Name the Step File: Greetings.

Step 3:

    • To write in the File box:
${Internal.Transformation.Filename.Directory}/Hello.xml

Step 4:

    • Click Get Fields to fill the grid with the three input fields.

Step 5:

    • Save the Transformation again.

RUN

Step 1:

    • Click on the RUN button on the menu bar and Launch the transformation.

Step 2:

    • We also create a Job which may be used to schedule multiple transformations and then run it.

Categorized in:

Tagged in:

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,