Use Case
Some integration use cases require documents to be processed in a specific order, either to ensure data integrity or for convenience. Depending on the source Connector and application, you can often specify an "ORDER BY" field in the initial query. However, if sorting is not supported by the source application or data source, or if you need to sort records within a single document, you can use the script below to reorder the group of documents that reaches a given step, based on a value you choose.
A few important notes:
- The script sorts the documents in alphabetical order based on the value of a dynamic document property named "SORT_BY_VALUE".
- The sort is a plain character-by-character string comparison, not a numeric one. For example:
  - 10, 11, 12, 1, 20, 2, 3...
  - A, B, C, a, b, c...
- This script sorts the set of documents, NOT multiple records within a given document. For example, if you read in a CSV file with 100 lines and you want to sort those lines, you will need to first split the document by line into 100 documents (each with one line), do the sort, and then recombine.
- This script is not intended to be used with very high volumes (millions of records). It is possible to run into out-of-memory errors with very large numbers of documents.
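The ordering shown in the examples above follows from how the script builds its sort key: it appends "_" and the document index to each value and compares the resulting strings. The sketch below reproduces that outside of Boomi using only standard collections. The helper name sortLikeScript and the sample values are illustrative assumptions, not part of the original script.

```groovy
// Hypothetical helper: orders values the same way the sort script orders documents.
def sortLikeScript(List<String> values) {
    // Key = value + "_" + index, as in the script; a TreeMap keeps keys in String order.
    def sortedMap = new TreeMap<String, Integer>()
    values.eachWithIndex { value, i -> sortedMap.put(value + "_" + i, i) }
    // Map the sorted indices back to the original values.
    return sortedMap.values().collect { values[it] }
}

def numbers = ["1", "2", "3", "10", "11", "12", "20"]
println sortLikeScript(numbers)

def letters = ["a", "B", "A", "c", "b", "C"]
println sortLikeScript(letters)

// Possible workaround (an assumption, not from the original article):
// zero-pad numeric values to a fixed width so string order matches numeric order.
def padded = numbers.collect { it.padLeft(5, "0") }
println sortLikeScript(padded)
```

If numeric order matters, padding the value to a fixed width before storing it in the property is one option; the width of 5 used here is arbitrary.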
Implementation
- Pre-process -- Perform whatever steps are necessary to manipulate your data so that each Document contains the discrete unit by which you want to sort. Typically, this simply means getting one logical record per Document. Most application Connectors return one record per Document by default; however, if you are reading in data from a database or flat files, you will need to split/combine as necessary to prepare the Document data.
- Set the sort-by property -- Add a Set Properties step to capture the "sort by" value in a Dynamic Document Property. The value by which you want to sort could come from the data itself (e.g. reference a Profile Element such as "customer name" or "order number") or could be something like the source file name (e.g. reference a Connector Document Property).
- Groovy script -- Add a Data Process step with a Custom Scripting step. Replace the default script with the one below. Note that the sortByValuePropName variable must match the name you gave the Dynamic Document Property in the previous step.
- Post-process -- Now that your Documents are sorted, perform whatever steps are next in your integration scenario. You may want to combine the Documents or continue processing them individually.
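The four steps above can be sketched in plain Groovy, outside of a Boomi process, to show how the data moves through them. Everything here is illustrative: the sample CSV, the choice of the second column as the sort-by value, and the variable names are assumptions, and in a real process the split, Set Properties, and combine steps would be process shapes instead of code.

```groovy
// Hypothetical input: one CSV document with a header row and three data rows.
def csv = "id,customer\n3,Carol\n1,alice\n2,Bob"

// Pre-process: split into one record per "document", keeping the header aside.
def rows = csv.readLines()
def header = rows.head()
def documents = rows.tail()

// Set the sort-by property: here the second column (customer) is the sort-by value.
def keyed = documents.collect { row -> [sortBy: row.split(",")[1], data: row] }

// Sort: plain string comparison, so uppercase letters sort before lowercase ones.
def sorted = keyed.sort { it.sortBy }

// Post-process: recombine the sorted records under the original header.
def result = ([header] + sorted*.data).join("\n")
println result
```

Note that "alice" ends up last because lowercase letters sort after uppercase ones in a plain string comparison.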
Script
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

// Name of the Dynamic Document Property that contains the sorting values.
String sortByValuePropName = "SORT_BY_VALUE";

// Map from sort key to document index. A TreeMap keeps its keys in sorted (string) order.
SortedMap<String, Integer> sortedMap = new TreeMap<String, Integer>();

// Loop through the documents and store each sort-by value with its document index.
// Appending "_" + i keeps the keys unique, so documents with equal values are not overwritten.
for (int i = 0; i < dataContext.getDataCount(); i++) {
    Properties props = dataContext.getProperties(i);
    String sortByValue = props.getProperty("document.dynamic.userdefined." + sortByValuePropName) + "_" + i;
    sortedMap.put(sortByValue, i);
}

// Loop through the sorted indices to output the documents for the next process shape.
for (Integer i : sortedMap.values()) {
    dataContext.storeStream(dataContext.getStream(i), dataContext.getProperties(i));
}
The script above works if the source data contains unique key/value pairs. If you instead want to sort CSV data row by row in alphanumeric order, you need a different script. The difference is that the first script uses a TreeMap of key/value pairs, while the second script collects the rows in an ArrayList and sorts that list.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Properties;
import com.boomi.execution.ExecutionUtil;

// Retrieve a handle to the Logger.
logger = ExecutionUtil.getBaseLogger();

// Create a list to store the content of each document (one line per document).
ArrayList<String> lines = new ArrayList<String>();
logger.info("Start");

for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i);
    // Read the current line as UTF-8 and add it to the list.
    String line = is.getText("UTF-8");
    lines.add(line);
    logger.info("Added line " + i);
}

// Sort the list in natural (character-by-character) string order.
Collections.sort(lines);
logger.info("Sorted all lines");

for (int i = 0; i < dataContext.getDataCount(); i++) {
    // Properties are taken by position, so they do not follow the line that was sorted.
    Properties props = dataContext.getProperties(i);
    // Retrieve the next line in sorted order.
    String line = lines.get(i);
    logger.info("Get line " + i);
    // Store the sorted line as the document content.
    InputStream is = new ByteArrayInputStream(line.getBytes("UTF-8"));
    dataContext.storeStream(is, props);
}
logger.info("End");
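Because Collections.sort(lines) compares whole rows as strings, rows that start with an uppercase letter sort ahead of rows that start with a lowercase one. The sketch below shows that behavior on illustrative sample rows, along with one possible variation that passes String.CASE_INSENSITIVE_ORDER as the comparator. The variation is an assumption for illustration and is not part of the original script.

```groovy
// Sample rows, standing in for the content of three single-line documents.
def lines = ["banana,3", "Cherry,1", "apple,2"]

// Same call as the script: natural String order, uppercase before lowercase.
def natural = new ArrayList<String>(lines)
Collections.sort(natural)
println natural

// Variation (assumption): ignore case when comparing rows.
def ignoreCase = new ArrayList<String>(lines)
Collections.sort(ignoreCase, String.CASE_INSENSITIVE_ORDER)
println ignoreCase
```

In the script itself, the only change for the variation would be the extra comparator argument on the Collections.sort call.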