How to Sort a Group of Documents Using Groovy

Document created by Adam Arrowsmith on Sep 11, 2015. Last modified by Adam Arrowsmith on Jul 13, 2017.
Version 5

Use Case

Some integration use cases require documents to be processed in a specific order, either to ensure data integrity or for convenience. Depending on the source Connector and application, you can often specify an "ORDER BY" field in the initial query. However, if the source application or data source does not support sorting, or if you need to sort records within a single document (after splitting it into one document per record), you can use the script below to rearrange the group of documents that reaches a given step, based on a value you choose.

 

A few important notes:

  • The script sorts the documents in ascending string order based on the value of a Dynamic Document Property named "SORT_BY_VALUE".
  • The comparison is a plain character-by-character string comparison, not a numeric or case-insensitive one. The script also appends "_" and the document index to each value, and "_" sorts after digits, which is why 1 comes after 10, 11, and 12. For example:
    • 10, 11, 12, 1, 20, 2, 3...
    • A, B, C, a, b, c...
  • This script sorts a set of documents, NOT multiple records within a single document. For example, if you read in a CSV file with 100 lines and you want to sort those lines, you first need to split the document by line into 100 documents (each with one line), do the sort, and then recombine.
  • This script is not intended for very high volumes (millions of records). With very large numbers of documents, it can run into out-of-memory errors.
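
The ordering described in the notes above can be reproduced outside the process with a small sketch. This is plain Java rather than the script itself, and the class and method names (SortOrderDemo, sortLikeScript) are made up for illustration. It assumes only that each key has the form value + "_" + index and that keys are compared as ordinary strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class SortOrderDemo {
    // Hypothetical helper: imitates the key scheme "sort-by value" + "_" + document index
    // and returns the values in the order a TreeMap yields them.
    static List<String> sortLikeScript(List<String> values) {
        TreeMap<String, String> sorted = new TreeMap<String, String>();
        for (int i = 0; i < values.size(); i++) {
            // Keys are compared character by character; '_' sorts after digits
            // and uppercase letters but before lowercase letters.
            sorted.put(values.get(i) + "_" + i, values.get(i));
        }
        return new ArrayList<String>(sorted.values());
    }
}
```

Called with the values 1, 2, 3, 10, 11, 12, 20, this sketch returns them in the order 10, 11, 12, 1, 20, 2, 3, because the key "10_3" compares lower than "1_0". If you need numeric order, one option is to zero-pad the values to a fixed width when you set the property.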

 

 

Implementation

  1. Pre-process -- Perform whatever steps are necessary to manipulate your data so that each Document contains the discrete unit by which you want to sort. Typically, this simply means getting one logical record per Document. Most application Connectors return one record per Document by default. However, if you are reading data from a database or flat files, you will need to split or combine as necessary to prepare the Document data.
  2. Set the sort-by property -- Add a Set Properties step to capture the "sort by" value in a Dynamic Document Property. The value could come from the data itself (e.g. a Profile Element such as "customer name" or "order number") or from something like the source file name (e.g. a Connector Document Property).
  3. Groovy script -- Add a Data Process step with a Custom Scripting step. Replace the default script with the one below. Note that the sortByValuePropName variable must match the name you gave the Dynamic Document Property in Step #2.
  4. Post-process -- Now that your Documents are sorted, perform whatever steps come next in your integration scenario. You may want to combine the Documents or continue processing them individually.
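
To make the four steps concrete, here is a rough stand-alone sketch of the same split, sort, and recombine flow applied to CSV text. In a real process each stage would be its own step (split, Set Properties, Data Process, combine). The class name, the method name, and the assumptions of "\n" line endings, comma delimiters, and no header row are illustrative only.

```java
import java.util.TreeMap;

class SplitSortCombineDemo {
    // Hypothetical helper: sorts the lines of a CSV string by the column at sortColumn.
    static String sortCsvLines(String csv, int sortColumn) {
        // 1. Pre-process: one record per "document" (assumes "\n" line endings).
        String[] lines = csv.split("\n");

        TreeMap<String, String> sorted = new TreeMap<String, String>();
        for (int i = 0; i < lines.length; i++) {
            // 2. Set the sort-by value: here, one comma-delimited column.
            String sortBy = lines[i].split(",", -1)[sortColumn];
            // 3. Sort: the TreeMap orders keys as plain strings. The zero-padded
            //    index keeps keys unique and keeps ties in their original order.
            sorted.put(sortBy + "_" + String.format("%010d", i), lines[i]);
        }

        // 4. Post-process: recombine the sorted records into one document.
        return String.join("\n", sorted.values());
    }
}
```

For example, sorting the three lines "3,carol", "1,alice", "2,bob" on column 1 puts alice first, then bob, then carol.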

 

Script

import java.util.Properties;

// Name of the Dynamic Document Property that contains the sorting values.
String sortByValuePropName = "SORT_BY_VALUE";

// Sorted map of sort key -> document index. A TreeMap keeps its keys in ascending String order.
SortedMap<String, Integer> sortedMap = new TreeMap<String, Integer>();

// Loop through the documents and store each sort-by value with its document index.
// The zero-padded index suffix keeps every key unique and keeps documents that
// share a value in their original order. A missing property sorts as the text "null".
for ( int i = 0; i < dataContext.getDataCount(); i++ ) {
  Properties props = dataContext.getProperties(i);
  String sortByValue = props.getProperty("document.dynamic.userdefined." + sortByValuePropName) + "_" + String.format("%010d", i);
  sortedMap.put(sortByValue, i);
}

// Walk the document indices in sorted key order and output each document to the next process shape.
for ( Integer i : sortedMap.values() ) {
  dataContext.storeStream(dataContext.getStream(i), dataContext.getProperties(i));
}
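
If the sort-by values are whole numbers and you need true numeric order instead of string order, one possible variation is to sort the document indices with a numeric comparator. The sketch below is plain Java, not part of the original script, and the names NumericSortDemo and sortedIndices are hypothetical. It assumes every value parses as a long (a non-numeric value would throw a NumberFormatException), and it would need adapting before it could replace the TreeMap logic in a Data Process script.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumericSortDemo {
    // Hypothetical helper: returns document indices ordered by the numeric value
    // of each sort-by string.
    static List<Integer> sortedIndices(final List<String> sortByValues) {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < sortByValues.size(); i++) {
            indices.add(i);
        }
        // List.sort is stable, so documents with equal values keep their original order.
        indices.sort(new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                long left = Long.parseLong(sortByValues.get(a).trim());
                long right = Long.parseLong(sortByValues.get(b).trim());
                return Long.compare(left, right);
            }
        });
        return indices;
    }
}
```

With the values 10, 2, 1, 2 (document indices 0 to 3), the sketch returns the indices 2, 1, 3, 0. The script would then pass each index to dataContext.getStream and dataContext.getProperties in that order, just as the final loop does.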