Process performance is taking too long how do I make it run faster?

Document created by mike_aronson (Employee) on Sep 12, 2014. Last modified by dave_lesshafft on Mar 2, 2016.
As a first step, always download and analyze the Process Log to determine which shapes/steps are taking the most time.

Here are common shapes/steps that impact process performance and how to make them run faster:

·         Shape/Step: Connectors with Get/Query action are too slow.  Recommendation: When using connectors to query data, experiment with the Batch Results option (if available) and test various batch sizes to see how they impact performance.  Also, some connector operations let you select each field to return (e.g., Salesforce).  Select only the necessary fields rather than accepting the default, which returns all fields.
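The trade-off behind batching can be sketched outside of Boomi. This is an illustrative Python analogy, not Boomi code; `get_record` and `get_batch` are hypothetical stand-ins for connector calls against an in-memory data set:

```python
# Illustrative sketch: per-record retrieval vs. batched retrieval,
# plus returning only the fields that are actually needed.

def fetch_one_by_one(ids, get_record):
    # One round trip per record: N calls, N network overheads.
    return [get_record(i) for i in ids]

def fetch_in_batches(ids, get_batch, batch_size=200):
    # One round trip per batch: ceil(N / batch_size) calls.
    results = []
    for start in range(0, len(ids), batch_size):
        results.extend(get_batch(ids[start:start + batch_size]))
    return results

# Demo with an in-memory "connector" so the call counts are visible.
calls = {"single": 0, "batch": 0}
data = {i: {"id": i, "name": f"rec{i}", "notes": "x" * 100} for i in range(1000)}

def get_record(i):
    calls["single"] += 1
    return data[i]

def get_batch(batch_ids):
    calls["batch"] += 1
    # Return only the needed fields, not every column.
    return [{"id": data[i]["id"], "name": data[i]["name"]} for i in batch_ids]

fetch_one_by_one(range(1000), get_record)        # 1000 calls
fetch_in_batches(list(range(1000)), get_batch)   # 5 calls at batch_size=200
print(calls["single"], calls["batch"])
```

The batch size worth testing depends on record width and the endpoint's limits, which is why the text above recommends experimenting with several values.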

·         Shape/Step: Connectors with Send action are too slow.  Recommendation: When sending documents/data to connectors, make sure any available batching options are enabled.

·         Shape/Step: Flow Control is configured to Run Each Document Individually.  Recommendation: Avoid Flow Control/Run Each Document Individually whenever possible.  Data integration is most efficient when data is processed in batches, not individually.  If needed, use this option for testing purposes only.  If your documents/data must be processed in sequence (order), assess your architecture to ensure all other aspects are optimized for efficient processing; for example, consider turning on Low Latency mode for this case.

·         Shape/Step: Connector Call lookups per document/record are slowing down the process or Map.  Recommendations: 1) Use the Map Function Caching: Cache By Map option when performing a lookup within a Map Function. This remembers the output for a given set of inputs and skips the actual API call.  2) If it's still too slow, use the Document Cache instead.  Branch the process, perform a single query to look up all records at once and store them in the Document Cache, then look up the records from the Document Cache in the map later.
Some other things to consider: Remove unnecessary lookups or consolidate connector call lookups.  In addition to the above recommendations for map function caching or a document cache, look for opportunities to reduce/consolidate calls, such as writing SQL queries that join data together in a single call, or advanced SQL update statements that avoid the need to select records first.
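The two caching patterns above can be sketched by analogy. This is illustrative Python, not Boomi internals; `lookup_account` and `query_all_accounts` are hypothetical stand-ins for connector calls:

```python
# Pattern 1 (~ Cache By Map): memoize the lookup so repeated inputs
# skip the call. Pattern 2 (~ Document Cache): one bulk query up front,
# then purely local lookups.
from functools import lru_cache

calls = {"n": 0}
ACCOUNTS = {"A1": "Acme", "A2": "Globex", "A3": "Initech"}

def lookup_account(account_id):
    calls["n"] += 1          # stands in for a real connector call
    return ACCOUNTS[account_id]

# Pattern 1: memoized lookup — one call per distinct input.
memoized = lru_cache(maxsize=None)(lookup_account)

docs = ["A1", "A2", "A1", "A3", "A2", "A1"]
names = [memoized(d) for d in docs]
print(calls["n"])            # 3 calls for 6 documents

# Pattern 2: bulk query once, then look up locally.
calls["n"] = 0
def query_all_accounts():
    calls["n"] += 1          # a single query for all records
    return dict(ACCOUNTS)

cache = query_all_accounts()
names2 = [cache[d] for d in docs]
print(calls["n"])            # 1 call total
```

Memoization helps when the same inputs recur within a run; the bulk-query pattern helps even when every input is distinct, at the cost of loading the full record set up front.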

·         Shape/Step: Map Function Ordering is slowing down the Map.  Recommendation: Use only if absolutely necessary.  Map Function Ordering controls whether map functions are executed in a user-defined sequence (slower) or naturally based on the profile (faster).


·         Shape/Step: Try/Catch is slowing down the process.  Recommendation: Use only if absolutely necessary.  Try/Catch shapes require extra overhead and should be used sparingly.  Nested Try/Catch shapes (using more than one in the same branch of a process) slow down processing even further.  Evaluate your requirements for catching errors and implement only what is absolutely necessary.

Some important considerations:

Data volumes – Improving the performance of the integration process does not address the volume of data being integrated.  Ways to reduce the volume include:
a)      For every connector operation, review query/selection criteria to extract only the records and fields that need to be synced.
b)      Avoid syncing all of the data every time.  Use parameters to sync only data that has changed since prior runs (e.g., using a Sync Me flag field or a last-modified date field).
c)      Batch data across executions whenever possible.
d)      Reduce the frequency of scheduled executions, especially if the same data needs to be pulled multiple times.
e)      Processing large files – increase the maximum memory available to the atom to handle larger files.  The OS that the atom runs on generally has a fixed overhead; typically, the machine running an atom needs 512 MB to 1 GB for the OS itself.  Assuming only the atom is running on the machine, in theory the rest of the memory can be allocated to the atom's JVM.  If other applications/processes are running, as a rule of thumb you may want to allocate half of the remaining resources to the atom and see how it performs.  For a 32-bit OS, allocate at most 1 GB RAM to the atom.  For a 64-bit OS with 8 GB RAM available, allocate 4 GB RAM to the atom and see how it performs, etc.
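The memory rule of thumb in (e) can be written out as a small calculation. This is a rough sketch of the guidance above, not a Boomi formula; the function name and defaults are illustrative:

```python
# Rule of thumb: reserve OS overhead, then give the atom either the
# remainder (atom-only machine) or half the remainder (shared machine),
# capped at ~1 GB on a 32-bit OS.

def suggested_atom_heap_gb(total_ram_gb, os_overhead_gb=1.0,
                           atom_only=False, is_32_bit=False):
    """Suggest a starting JVM heap size (in GB) for the atom."""
    remaining = max(total_ram_gb - os_overhead_gb, 0)
    if atom_only:
        heap = remaining        # in theory, the atom can take the rest
    else:
        heap = remaining / 2    # share with other applications/processes
    if is_32_bit:
        heap = min(heap, 1.0)   # 32-bit OS: cap at ~1 GB
    return heap

print(suggested_atom_heap_gb(8))                   # 3.5 — roughly the ~4 GB above
print(suggested_atom_heap_gb(8, atom_only=True))   # 7.0
print(suggested_atom_heap_gb(4, is_32_bit=True))   # 1.0 (capped)
```

Treat the result as a starting point only: the text advises setting a value, observing performance, and adjusting.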

Process Design - When processing large volumes of records, small design inefficiencies quickly become very large inefficiencies.
Re-evaluate your process design:
a)      Consolidate shapes/steps – there is some nominal overhead associated with each step execution
b)      Instead of Process Properties, consider using a Document Cache and/or User Defined Document Properties. A Document Cache lets you efficiently and temporarily store and reference entire documents (indexed by profile fields) anywhere downstream; for example, you could cache the original data up front to reference later, after mapping/connector calls. Document Properties also let you capture original values per document to reference later; however, the properties are not propagated through all types of connectors.
c)       Look for opportunities to eliminate “extra” steps, such as temporary maps, multiple set properties, etc. For example, instead of mapping database data to an XML profile and then doing a split, simply split the data within the database connector operation using a batch count=1.
d)      Consider turning on Low Latency mode.

When to consider Flow Control with Parallel Processing: If you've performed all of the improvement recommendations above but are still experiencing issues, consider parallel processing to spread the processing load across multiple logical/physical nodes.  You can use the Flow Control step to split document execution across multiple threads, allowing records to execute simultaneously.  Parallel Processing can be used to "multi-thread" steps or sections of processes that run slowly.  This is recommended when running on a Molecule or Atom Cloud.  When running in the Atom Cloud, the maximum number of threads or "units" is 10.
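By analogy, splitting document execution across a bounded number of threads looks like the sketch below. This is illustrative Python, not Boomi internals; `process_document` is a hypothetical stand-in for the slow section of a process:

```python
# Fan documents across a worker pool, bounded the same way the Atom
# Cloud bounds parallel "units" (maximum of 10).
from concurrent.futures import ThreadPoolExecutor

def process_document(doc):
    # Stand-in for the slow step(s); here, a trivial transformation.
    return doc * 2

documents = list(range(100))

with ThreadPoolExecutor(max_workers=10) as pool:
    # pool.map preserves input order while running up to 10 at once.
    results = list(pool.map(process_document, documents))

print(results[:5])  # [0, 2, 4, 6, 8]
```

Note that, as with Flow Control, parallelism helps only when the work is slow per document and safely independent; it does not reduce the total amount of work.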