How to filter a unique set of documents using Groovy

Document created by Adam Arrowsmith (Employee) on Oct 26, 2011. Last modified by Adam Arrowsmith (Employee) on Aug 8, 2017.
Version 3

This script can be useful if your source application or data source returns multiple documents/records with the same "value of interest" within a given result set and you want to skip any records with a duplicate value.

 

 

Use Case

This script outputs a unique set of documents based on the value of a "FILTER_BY_VALUE" dynamic document property. The first time a given FILTER_BY_VALUE value is encountered, the document is output and the value is recorded. If the same value is encountered again in the same group of documents, the later documents are silently skipped and not output.

 

This can be used to filter out duplicate documents within the same document group.
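
As a rough illustration of the "first occurrence wins" behavior described above, here is a minimal sketch in plain Groovy that runs outside the platform. The maps, the id and filterBy keys, and the ORD-... values are made-up sample data standing in for documents and their FILTER_BY_VALUE property; they are not part of the original script.

```groovy
// Hypothetical sample data: each map stands in for one document,
// and filterBy stands in for its FILTER_BY_VALUE property.
def documents = [
    [id: 1, filterBy: "ORD-100"],
    [id: 2, filterBy: "ORD-200"],
    [id: 3, filterBy: "ORD-100"],  // same value as document 1
    [id: 4, filterBy: "ORD-300"],
    [id: 5, filterBy: "ORD-200"]   // same value as document 2
]

// Values already encountered in this group of documents.
def seen = new HashSet<String>()

// Set.add returns true only when the value was not already in the set,
// so findAll keeps just the first document for each value, in input order.
def output = documents.findAll { doc -> seen.add(doc.filterBy) }

println output*.id   // prints [1, 2, 4]
```

Documents 3 and 5 are dropped because their values were already recorded when documents 1 and 2 were processed.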

 

Approach

 

  1. Pre-process - Perform whatever steps are necessary to manipulate your data so that each document contains the discrete unit by which you want to filter. Typically, this simply means getting one logical record per document. Most application connectors return one record per document by default; however, if you are reading data from a database or flat files, you will need to split or combine as necessary to prepare the document data.
  2. Set the filter-by property - Capture the "filter by" value in a dynamic document property. The value by which you want to filter could come from the data itself (e.g. a profile element such as "customer name" or "order number") or from something like the source file name (e.g. a connector document property).
  3. Groovy script - Add a Data Process step with a Custom Scripting step. Replace the default script with the one in the Implementation section below. Note that the filterByValuePropName variable must match the name you gave the dynamic document property in Step #2.
  4. Post-process - Now that your documents are filtered, perform whatever steps are next in your integration scenario. You may want to combine the documents or continue processing them individually.
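
However you set the property in Step #2, the script in Step #3 looks it up under the key prefix document.dynamic.userdefined. followed by the property name. The following is a minimal sketch of that naming, runnable as plain Groovy outside the platform. The sample record, and the assumption that the first comma-separated field is the value to filter by, are purely illustrative.

```groovy
// Hypothetical flat-file record; the first field is assumed to be the order number.
String record = "ORD-100,Acme Corp,250.00"
String filterByValue = record.split(",")[0].trim()

// Name of the dynamic document property; it must match filterByValuePropName in the script.
String propName = "FILTER_BY_VALUE"

// The script reads the property from the document's Properties object using this key.
Properties props = new Properties()
props.setProperty("document.dynamic.userdefined." + propName, filterByValue)

println props.getProperty("document.dynamic.userdefined.FILTER_BY_VALUE")   // prints ORD-100
```

If the name used when setting the property differs from the one in the script, even only in case, the lookup returns null and the filter will not see the intended value.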

 

Implementation

Script:

 

/*
This script outputs a unique group of documents based on the value of a
"FILTER_BY_VALUE" user defined document property. When a new FILTER_BY_VALUE
value is encountered, the document is output and the FILTER_BY_VALUE recorded.
If the same FILTER_BY_VALUE is encountered in the same group of documents,
subsequent documents are silently skipped and not output.

This can be used to filter out "duplicate" documents within the same
document group.
*/


import java.util.Properties;
import java.io.InputStream;

// Name of the User Defined Document Property that contains the filter-by values.
String filterByValuePropName = "FILTER_BY_VALUE";

// Filter-by values already encountered in this group of documents.
Set<String> seenValues = new HashSet<String>();

// Loop through the documents and inspect each filter-by value. If it has not been
// encountered before, store the value in the HashSet and output the document;
// otherwise silently skip it.
for ( int i = 0; i < dataContext.getDataCount(); i++ ) {
  InputStream is = dataContext.getStream(i);
  Properties props = dataContext.getProperties(i);

  // getProperty returns null when the property is not set, so all documents
  // without the property are treated as sharing a single value.
  String recordId = props.getProperty("document.dynamic.userdefined." + filterByValuePropName);

  if ( seenValues.contains(recordId) ) {
    // Skip the document by simply not passing it to storeStream.
    continue;
  } else {
    seenValues.add(recordId);
    dataContext.storeStream(is, props);
  }
}
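
To try the filtering logic before deploying it, one option is to run it in plain Groovy against a stand-in for dataContext. The sketch below is only that: StubDataContext is a hypothetical class written for this example, exposing just the four methods the script calls (getDataCount, getStream, getProperties, storeStream). It is not the platform's implementation, and the sample values are made up. A null entry in the sample list means the property is left unset on that document.

```groovy
// Hypothetical stand-in for the dataContext object that the platform provides.
class StubDataContext {
    List<Properties> inputs = []   // properties of the incoming documents
    List<Properties> stored = []   // properties of the documents passed to storeStream

    int getDataCount() { inputs.size() }
    InputStream getStream(int i) { new ByteArrayInputStream(("doc " + i).getBytes("UTF-8")) }
    Properties getProperties(int i) { inputs[i] }
    void storeStream(InputStream is, Properties props) { stored << props }
}

String key = "document.dynamic.userdefined.FILTER_BY_VALUE"

// Build six sample documents; null means "property not set on this document".
def dataContext = new StubDataContext()
["A", "B", "A", null, "B", null].each { v ->
    def p = new Properties()
    if (v != null) {
        p.setProperty(key, v)
    }
    dataContext.inputs << p
}

// Same filtering logic as the script: output a document only the first time its value is seen.
String filterByValuePropName = "FILTER_BY_VALUE"
Set<String> seenValues = new HashSet<String>()
for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i)
    Properties props = dataContext.getProperties(i)
    String recordId = props.getProperty("document.dynamic.userdefined." + filterByValuePropName)
    if (seenValues.contains(recordId)) {
        continue
    } else {
        seenValues.add(recordId)
        dataContext.storeStream(is, props)
    }
}

def keptValues = dataContext.stored.collect { it.getProperty(key) }
println keptValues
```

In this run three of the six documents are output: the first "A", the first "B", and the first document with no property set. The second unset document is skipped because a missing property reads as null, and null is treated like any other repeated value. If documents without the property should all pass through, they would need to be handled before this step or with an explicit null check.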