Learning Pentaho Data Integration 8 CE (Third Edition)

Reading a file whose name is known at runtime

Suppose you face any of the following situations:

  • The name of the file to read is specified in an XML file
  • The name of the file to read is obtained as a result of a database query
  • The name of the file to read is the combination of a fixed text followed by the current year and month
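
To make the third situation concrete, here is a minimal Python sketch (outside PDI) of a name built from a fixed text plus the current year and month. The prefix `sales_`, the `yyyyMM` layout, and the `.csv` extension are assumptions chosen for illustration; the text does not specify them.

```python
from datetime import date


def monthly_filename(prefix, day=None, extension=".csv"):
    """Return prefix + year and month (yyyyMM, an assumed layout) + extension."""
    if day is None:
        day = date.today()  # the "current year and month" case
    return f"{prefix}{day:%Y%m}{extension}"


# Hypothetical prefix and a fixed date, so the result is reproducible.
print(monthly_filename("sales_", date(2024, 3, 15)))  # sales_202403.csv
```

The point is only that the name cannot be typed into a configuration window in advance; it has to be computed when the process runs.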

None of the special cases explained previously covers these or similar situations. There are a couple of ways to handle them with PDI. Let's learn the simplest one.

Our objective is to read one of the sales files. The exact name of the file to read is in an XML file.

In order to run this exercise, create a file named configuration.xml and inside the file, type the following:

<settings>
<my_file>sales_data_Japan</my_file>
</settings>
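
As a quick check of what this file carries, the following Python sketch (not part of PDI) parses the same content and pulls out the text of each my_file node, the node that the exercise addresses as /settings/my_file. The function name `read_filenames` and the row layout are hypothetical; the field name `filename` matches the one used in the exercise.

```python
import xml.etree.ElementTree as ET

# Same content as configuration.xml in the exercise.
CONFIG_XML = """<settings>
<my_file>sales_data_Japan</my_file>
</settings>"""


def read_filenames(xml_text):
    """Return one row per <my_file> node, each with a 'filename' field."""
    root = ET.fromstring(xml_text)  # root is the <settings> element
    if root.tag != "settings":
        raise ValueError("expected <settings> as the root element")
    # Relative to the root, "my_file" selects the nodes at /settings/my_file.
    return [{"filename": node.text or ""} for node in root.findall("my_file")]


print(read_filenames(CONFIG_XML))  # [{'filename': 'sales_data_Japan'}]
```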

The idea is to dynamically build a string with the full filename and then pass this information to the Text file input step, as follows:

  1. Create a new Transformation.
  2. From the Input category of steps, drag the Get data from XML step to the canvas.
  3. Double-click the step to edit it. In the File tab, browse for the configuration.xml file and add it to the grid.
  4. Select the Content tab. In the Loop XPath textbox, type /settings/my_file.
  5. Finally, select the Fields tab. Fill the first row of the grid as follows: under Name, type filename; under XPath, type a dot (.); and as Type, select or type String.
  6. Preview the data. You should see a single row whose filename field contains sales_data_Japan:

Previewing data from an XML file

  7. After this step, add a User Defined Java Expression (UDJE) step. Double-click it and configure it to create the full path for the file, as in the following example:

Configuring a UDJE step

  8. Close the window, select the UDJE step, and run a preview. You should see the full name of the file to read: D:/LearningPDI/SAMPLEFILES/sales_data_Japan.csv.
  9. Close the preview window, add a Text file input step, and create a link from the UDJE step towards this step.
  10. Double-click on the Text file input step and fill the lower grid as shown in the following screenshot:

Accepting a filename for incoming steps

  11. Fill in the Content and Fields tabs just like you did before. Note that the Get Fields button will not populate the grid as expected because the filename is not explicit in the configuration window. To avoid typing the fields manually, you can refer to the following tip:

Instead of configuring the tabs again, you can open any of the earlier transformations that read the sales files, copy the Text file input step, and paste it here. Leave the Content and Fields tabs untouched and just configure the File tab as explained previously.
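
The overall flow of the exercise (build the full path from a field, then let the reader take the filename from the incoming row) can be sketched in Python as follows. This is an illustration outside PDI, not the UDJE expression from the screenshot, which is not reproduced here. The concatenation of folder, name, and `.csv` is an assumption that is merely consistent with the previewed path; the function names, the `full_path` field, and the sample columns are hypothetical.

```python
import csv
import os
import tempfile


def build_full_path(folder, filename, extension=".csv"):
    """Concatenate folder, base name, and extension (assumed layout)."""
    return folder.rstrip("/") + "/" + filename + extension


def read_files_named_in_rows(rows, path_field="full_path", delimiter=","):
    """Read every file whose path arrives in a field of the incoming rows."""
    for row in rows:
        with open(row[path_field], newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle, delimiter=delimiter):
                yield record


# The path shown in the preview of the exercise.
print(build_full_path("D:/LearningPDI/SAMPLEFILES", "sales_data_Japan"))
# D:/LearningPDI/SAMPLEFILES/sales_data_Japan.csv

# Demo with a throwaway file; the columns below are made up for illustration.
with tempfile.TemporaryDirectory() as folder:
    path = build_full_path(folder.replace(os.sep, "/"), "sales_data_Japan")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("product,quantity\nwidget,3\n")
    incoming_rows = [{"full_path": path}]  # one row, as in the exercise
    records = list(read_files_named_in_rows(incoming_rows))

print(records)
```

Because the reader only learns the path when a row arrives, it cannot inspect the file at design time, which is the same reason the Get Fields button has nothing to work with in the exercise.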