How to Split and Repeat Values within an XML Element using Groovy

Document created by Adam Arrowsmith (Employee) on Feb 16, 2012. Last modified by Adam Arrowsmith (Employee) on Oct 17, 2016.
Version 4
This article describes how to "split" an XML element containing a delimited set of values into separate XML elements using a Groovy script.

Use Case

You have an XML document with an element whose value contains a variable number of delimited values, and you need to map each value to a destination element.

 

For example, note the values separated by semicolons in the territory elements in this sample XML:
<customers>
  <customer>
    <name>Customer A</name>
    <territory>001;002;003</territory>
    <type>direct</type>
  </customer>
  <customer>
    <name>Customer B</name>
    <territory>002;004</territory>
    <type>channel</type>
  </customer>
</customers>
Because each territory element holds all of its values in a single string, the individual values cannot readily be mapped using a standard XML Profile and/or Map Functions.

 

Approach

The solution is to use a Data Process Custom Scripting step to reshape the source data so that each delimited value is in its own element. This allows you to use the natural looping capabilities of the XML Profile.
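To illustrate the idea outside a Boomi Process, here is a minimal Groovy sketch. It assumes the JDK's built-in DOM API (javax.xml.parsers, org.w3c.dom) in place of the JDOM library used by the script in this article, works on an XML string instead of Boomi's dataContext, and uses a hypothetical helper name, splitElements. Every element with the given tag name is replaced by one element per delimited value, appended to the same parent.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import java.util.regex.Pattern
import org.w3c.dom.Document
import org.w3c.dom.Element
import org.w3c.dom.Node
import org.w3c.dom.NodeList

// Hypothetical helper: split every <tagName> element whose text holds delimited
// values into one element per value. Uses the JDK DOM API, not JDOM.
String splitElements(String xml, String tagName, String delimiter) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

    // getElementsByTagName returns a live list, so copy it before changing the tree
    NodeList live = doc.getElementsByTagName(tagName)
    List<Element> targets = []
    for (int i = 0; i < live.getLength(); i++) {
        targets.add(live.item(i) as Element)
    }

    for (Element original : targets) {
        Node parent = original.getParentNode()
        // Pattern.quote() matches the delimiter literally, even "|" or "."
        for (String part : original.getTextContent().split(Pattern.quote(delimiter))) {
            Element copy = doc.createElement(original.getTagName())
            copy.setTextContent(part)
            parent.appendChild(copy)   // appended at the end of the parent
        }
        parent.removeChild(original)
    }

    // Serialize the modified document back to a string
    StringWriter out = new StringWriter()
    TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(out))
    return out.toString()
}
```

For example, calling splitElements on a customer whose territory is "001;002" with "territory" and ";" returns XML in which that customer contains `<type>direct</type><territory>001</territory><territory>002</territory>`.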

 

Implementation

1. Add a Data Process Custom Scripting step with the script below to your Process before the Map step. Modify the two variables at the beginning of the script to match your specific XML structure and delimiter:

String xpathElementToSplit = "/customers/customer/territory";
String delimiter = ";";

 

2. Edit the XML Profile, navigate to the element in question, and set the Max Occurs to "unbounded". For this example, you would edit the territory element. This will let the Map know this element can repeat.
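One detail about the delimiter variable: it ends up in Java's String.split(), which interprets its argument as a regular expression. A semicolon is safe, but an unquoted "|" or "." would split between the wrong characters unless it is quoted with java.util.regex.Pattern.quote(). The sketch below shows this with a hypothetical splitValues helper; the trimming and the dropping of empty values are assumptions added for illustration, not behavior of the article's script.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: split a delimited value on a literal delimiter.
// Pattern.quote() stops split() from treating "|" or "." as regex syntax.
// Trimming and dropping empty values are illustrative extras only.
List<String> splitValues(String value, String delimiter) {
    List<String> parts = value.split(Pattern.quote(delimiter)) as List
    return parts.collect { it.trim() }.findAll { !it.isEmpty() }
}

println splitValues("001|002", "|")      // [001, 002]
println splitValues("001; 002;;", ";")   // [001, 002]
```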

 

Script:

import org.jdom.input.SAXBuilder;
import org.jdom.Document;
import org.jdom.Namespace;
import org.jdom.Element;
import org.jdom.xpath.XPath;
import org.jdom.output.XMLOutputter;
import java.util.regex.Pattern;

// Set the full path to the XML element containing the values to split.
String xpathElementToSplit = "/customers/customer/territory";

// Set the delimiter separating the values (matched literally, not as a regular expression).
String delimiter = ";";

// Loop through the Process Documents
for ( int i = 0; i < dataContext.getDataCount(); i++ ) {

     InputStream is = dataContext.getStream(i);
     Properties props = dataContext.getProperties(i);

     // Build XML Document
     SAXBuilder builder = new SAXBuilder();
     Document doc = builder.build(is);

     XPath x = XPath.newInstance(xpathElementToSplit);
     /* OPTIONAL: If XML data has namespaces, uncomment this section and declare
        all namespaces present in data as follows:
     x.addNamespace("<NAMESPACE_PREFIX>", "<NAMESPACE_URI>");
     END OPTIONAL */

     // Select all matching elements and loop through them
     List myElements = x.selectNodes(doc);

     for (Element myElement : myElements) {
        // Get the element name, value, and namespace
        String elementName = myElement.getName();
        String elementValue = myElement.getText();
        Namespace elementNS = myElement.getNamespace();

        // Get parent element and remove current element
        Element parentElement = myElement.getParentElement();
        myElement.detach();

        // Add one new element per value to the end of the parent.
        // Pattern.quote() makes split() treat the delimiter literally,
        // so delimiters such as "|" or "." also work.
        String[] parts = elementValue.split(Pattern.quote(delimiter));
        for (int j = 0; j < parts.length; j++) {
           Element newElement = new Element(elementName, elementNS).addContent(parts[j]);
           parentElement.addContent(newElement);
        }
     }

     // Serialize the modified document (XMLOutputter declares UTF-8) and store it
     XMLOutputter outputter = new XMLOutputter();
     is = new ByteArrayInputStream(outputter.outputString(doc).getBytes("UTF-8"));
     dataContext.storeStream(is, props);
}

 

Result Document Data:
<customers>
  <customer>
    <name>Customer A</name>
    <type>direct</type>
    <territory>001</territory>
    <territory>002</territory>
    <territory>003</territory>
  </customer>
  <customer>
    <name>Customer B</name>
    <type>channel</type>
    <territory>002</territory>
    <territory>004</territory>
  </customer>
</customers>
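Note that the script adds the new elements to the end of the parent element, so they now follow the type element. The sequence of the elements does not matter for XML mapping.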
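If you prefer to keep the split values in their original position (for example, to make the output easier to compare with the input), a variant can insert each new element before the original one instead of appending it. This is a minimal Groovy sketch, not the article's script: it assumes the JDK's DOM API instead of JDOM, works on a string instead of Boomi's dataContext, and splitInPlace and childElementNames are hypothetical names.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import java.util.regex.Pattern
import org.w3c.dom.Document
import org.w3c.dom.Element
import org.w3c.dom.Node
import org.w3c.dom.NodeList

// Hypothetical variant: each new element takes the original element's place,
// so sibling order is preserved. Returns the DOM Document for inspection.
Document splitInPlace(String xml, String tagName, String delimiter) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

    // Copy the live NodeList before modifying the tree
    NodeList live = doc.getElementsByTagName(tagName)
    List<Element> targets = []
    for (int i = 0; i < live.getLength(); i++) {
        targets.add(live.item(i) as Element)
    }

    for (Element original : targets) {
        Node parent = original.getParentNode()
        for (String part : original.getTextContent().split(Pattern.quote(delimiter))) {
            Element copy = doc.createElement(original.getTagName())
            copy.setTextContent(part)
            parent.insertBefore(copy, original)   // place it just before the original
        }
        parent.removeChild(original)   // then drop the original element
    }
    return doc
}

// Hypothetical usage helper: names of the child elements of a node, in order
List<String> childElementNames(Node parent) {
    List<String> names = []
    NodeList kids = parent.getChildNodes()
    for (int i = 0; i < kids.getLength(); i++) {
        if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
            names.add(kids.item(i).getNodeName())
        }
    }
    return names
}
```

With Customer A from the sample, the child elements come out as name, territory, territory, territory, type, matching the order of the source data.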

Handling XML with Namespaces

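If your XML data uses namespace prefixes, the script above can handle that situation: include the prefixes in the xpathElementToSplit string, and the script recreates each new element in the original element's namespace. If a prefix cannot be resolved from the data (for example, when the data uses a default namespace with no prefix), uncomment the OPTIONAL section of the script and declare each namespace with x.addNamespace(), choosing a prefix for any default namespace.
For example, if the sample data above used namespaces instead: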
<myprefix:customers xmlns:myprefix="mynamespace" >
  <myprefix:customer>
    <myprefix:name>Customer A</myprefix:name>
    <myprefix:territory>001;002;003</myprefix:territory>
    <myprefix:type>direct</myprefix:type>
  </myprefix:customer>
  <myprefix:customer>
    <myprefix:name>Customer B</myprefix:name>
    <myprefix:territory>002;004</myprefix:territory>
    <myprefix:type>channel</myprefix:type>
  </myprefix:customer>
</myprefix:customers>
Change the xpathElementToSplit variable to:
String xpathElementToSplit = "/myprefix:customers/myprefix:customer/myprefix:territory";

 

Result Document Data:
<?xml version="1.0" encoding="UTF-8"?>
<myprefix:customers xmlns:myprefix="mynamespace">
  <myprefix:customer>
    <myprefix:name>Customer A</myprefix:name>
    <myprefix:type>direct</myprefix:type>
    <myprefix:territory>001</myprefix:territory>
    <myprefix:territory>002</myprefix:territory>
    <myprefix:territory>003</myprefix:territory>
  </myprefix:customer>
  <myprefix:customer>
    <myprefix:name>Customer B</myprefix:name>
    <myprefix:type>channel</myprefix:type>
    <myprefix:territory>002</myprefix:territory>
    <myprefix:territory>004</myprefix:territory>
  </myprefix:customer>
</myprefix:customers>
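To experiment with namespace handling outside Boomi, the following Groovy sketch uses the JDK's javax.xml.xpath API instead of JDOM's XPath class. The parser must be namespace-aware for prefixed XPath steps to match. The hypothetical prefixes map passed to splitByXPath plays the same role as x.addNamespace() in the OPTIONAL section of the script; for data in a default namespace, the XPath expression needs a prefix of your choosing mapped to that namespace URI.

```groovy
import javax.xml.XMLConstants
import javax.xml.namespace.NamespaceContext
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory
import java.util.regex.Pattern
import org.w3c.dom.Document
import org.w3c.dom.Element
import org.w3c.dom.Node
import org.w3c.dom.NodeList

// Hypothetical helper: split the elements selected by an XPath expression.
// The prefixes map stands in for x.addNamespace(...) in the article's script.
Document splitByXPath(String xml, String xpathExpr, String delimiter, Map<String, String> prefixes) {
    def factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)   // without this, prefixed XPath steps never match
    Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

    def xpath = XPathFactory.newInstance().newXPath()
    // Map coercion implements NamespaceContext; getNamespaceURI resolves the
    // prefixes, the other two methods are stubs for this sketch
    xpath.setNamespaceContext([
        getNamespaceURI: { String prefix -> prefixes[prefix] ?: XMLConstants.NULL_NS_URI },
        getPrefix      : { String uri -> null },
        getPrefixes    : { String uri -> null }
    ] as NamespaceContext)

    // Copy the selected nodes into a list before modifying the tree
    NodeList selected = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET)
    List<Element> targets = []
    for (int i = 0; i < selected.getLength(); i++) {
        targets.add(selected.item(i) as Element)
    }

    for (Element original : targets) {
        Node parent = original.getParentNode()
        for (String part : original.getTextContent().split(Pattern.quote(delimiter))) {
            // Recreate the element with the same namespace URI and qualified name
            Element copy = doc.createElementNS(original.getNamespaceURI(), original.getTagName())
            copy.setTextContent(part)
            parent.appendChild(copy)
        }
        parent.removeChild(original)
    }
    return doc
}
```

With the namespaced sample above, calling splitByXPath(xml, "/myprefix:customers/myprefix:customer/myprefix:territory", ";", [myprefix: "mynamespace"]) leaves Customer A with three myprefix:territory elements holding 001, 002, and 003.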