Learning Pentaho Data Integration 8 CE(Third Edition)

Implementing the error handling functionality

With the error handling functionality, you can capture errors that would otherwise cause the Transformation to halt. Instead of aborting, the Transformation sends the rows that cause the errors to a different stream for further treatment.

The error handling functionality is implemented at the step level. You don't need to implement error handling in every step; in fact, you cannot, because not all steps support it. The idea is to implement error handling in the steps where errors are most likely to occur.

A typical situation where you should consider handling errors is when changing the metadata of fields. Such a change works perfectly as long as you know that the data is good, but it might fail when executed against real data. Let's explain this with a practical example.

In this case, we will work with the original version of the projects.txt file. Remember that you removed an invalid row from that file. Restore it and then proceed:

  1. Open the Transformation of projects and save it under a different name. You can do it from the main menu by navigating to File | Save as... or from the main toolbar.
  2. Edit the CSV file input step and change all the data types from Date to String. Also, delete the values under the Format column.
  3. Now add a Select values step and insert it between the CSV file input step and the Calculator step. We will use it to convert the strings to the Date type.
  4. Double-click on the Select values step and select the Meta-data tab. Fill the tab as follows:

Configuring a Meta-data tab

  5. Close the window and run a preview. There is an error in the Select values step when trying to convert the invalid value ??? to a Date type:
Select values.0 - end_date String<binary-string> : couldn't convert string [???] to a date using format [yyyy-MM-dd] on offset location 0
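To see why this single value stops the whole run, here is a minimal Python sketch of a strict string-to-date conversion. It is only an analogy: PDI performs the conversion in Java, and the strptime format %Y-%m-%d is assumed here to be the equivalent of the yyyy-MM-dd pattern shown in the error. The function name to_date is hypothetical.

```python
from datetime import date, datetime

# PDI's pattern yyyy-MM-dd written as a Python strptime format (assumed equivalent).
DATE_FORMAT = "%Y-%m-%d"


def to_date(value: str) -> date:
    """Strictly convert a string field to a date, as a metadata change would."""
    try:
        return datetime.strptime(value, DATE_FORMAT).date()
    except ValueError as exc:
        # Report which value failed and which format was expected.
        raise ValueError(
            f"couldn't convert string [{value}] to a date using format [{DATE_FORMAT}]"
        ) from exc


print(to_date("2017-01-15"))  # prints 2017-01-15 (the start_date of Project C)

try:
    to_date("???")
except ValueError as err:
    # Without error handling, this exception would stop the whole run.
    print(err)
```

Valid strings convert cleanly, but a value such as ??? raises an exception, and with no alternative path for the row, the only option is to abort.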

Now let's get rid of that error by using the error handling feature:

  1. Drag the Write to log step to the canvas. You will find it in the Utility category of steps.
  2. Create a new hop from the Select values step toward the Write to log step. When asked for the kind of hop to create, select Error handling of step. Then, the following Warning window will appear:

Copy or Distribute

  3. Click on Copy.

For now, you don't have to worry about these two options. You will learn about them in Chapter 6, Controlling the Flow of Data.

  4. Now your Transformation should look as shown in the following screenshot:

Handling errors in a select values step

  5. Double-click on the Write to log step. In the Write to log textbox, type "There was an error changing the metadata of a field".
  6. Click on Get Fields. The grid will be populated with the names of the fields coming from the previous step.
  7. Close the window and save the Transformation.
  8. Now run it. Look at the Logging tab in the Execution Results window. The log will look like this:
- Write to log.0 - ------------> Linenr 1------------------------------
- Write to log.0 - There was an error changing the metadata of a field
- Write to log.0 -
- Write to log.0 - project_name = Project C
- Write to log.0 - start_date = 2017-01-15
- Write to log.0 - end_date = ???
- Write to log.0 -
- Write to log.0 - ====================
- Write to log.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
  9. Run a preview of the Calculator step. You will see all the lines except the line containing the invalid date. This output is exactly the same as the one in the screenshot titled Preview of a Transformation.
  10. Now run a preview on the Write to log step. You will only see the line that had the invalid end_date value:

Preview of data with errors

With just a couple of clicks, you redirected the rows with errors to an alternative stream, represented by the hop in red. As you saw, both in the preview and in the Execution Results window, the rows with valid values continued on their way toward the Calculator step, while the row whose end_date field could not be converted to Date went to the Write to log step. In the Write to log step, you wrote an informative message along with the values of all the fields, so it was easy to identify which row (or rows) caused the error.
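The routing idea can be sketched outside PDI in a few lines of Python. This is an illustrative analogy, not how PDI implements it: the function route_rows and the stream names are hypothetical, the Project A row and its dates are invented for the example, and only the Project C row comes from the log shown earlier.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def route_rows(
    rows: Iterable[Row],
    fields: Tuple[str, ...] = ("start_date", "end_date"),
    fmt: str = "%Y-%m-%d",
) -> Tuple[List[dict], List[Row]]:
    """Convert date fields; send failing rows to an error stream instead of aborting."""
    main_stream: List[dict] = []
    error_stream: List[Row] = []
    for row in rows:
        converted = dict(row)
        try:
            for field in fields:
                converted[field] = datetime.strptime(row[field], fmt).date()
        except ValueError:
            # Error hop: keep the original, unconverted row for inspection.
            error_stream.append(row)
            continue
        # Main hop: the converted row continues toward the next step.
        main_stream.append(converted)
    return main_stream, error_stream


rows = [
    # Hypothetical valid row; its dates are illustrative, not from projects.txt.
    {"project_name": "Project A", "start_date": "2017-01-10", "end_date": "2017-02-20"},
    # The invalid row shown in the Write to log output.
    {"project_name": "Project C", "start_date": "2017-01-15", "end_date": "???"},
]

main_stream, error_stream = route_rows(rows)
for bad in error_stream:
    # Stand-in for the Write to log step: a message plus every field value.
    print("There was an error changing the metadata of a field")
    for name, value in bad.items():
        print(f"{name} = {value}")
```

In this sketch a single bad field diverts the whole row, and the diverted row keeps its original string values, which matches what the Write to log output shows (end_date = ???).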

Note that we redirected the errors to a Write to log step just for demonstration purposes. You are free to use any other step for that.
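As an example of a different destination, the rejected rows could be kept in a file for later review instead of being logged. The Python sketch below writes them as CSV. The function name write_error_rows is hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# The rejected row from the log above, still holding its original string values.
error_rows = [
    {"project_name": "Project C", "start_date": "2017-01-15", "end_date": "???"},
]


def write_error_rows(rows, out) -> int:
    """Write rejected rows as CSV so they can be inspected or fixed later."""
    if not rows:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


buffer = io.StringIO()  # stand-in for a real output file
count = write_error_rows(error_rows, buffer)
print(buffer.getvalue())
```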

As mentioned earlier, not all steps support error handling, and it's easy to tell whether a step implements the feature.

If the Error Handling ... option in the step's contextual menu is disabled, the step doesn't support error handling. Also, when you try to create a hop from such a step toward another one, the menu with the Error handling of step option will not show up.