Process performance and/or Map performance is taking too long, or is making the atom run out of memory

Document created by mike_aronson (Employee) on Sep 12, 2014. Last modified by chris_stevens on Mar 11, 2016.

Process performance and/or map performance is taking too long, or is making the atom run out of memory (either a Java heap space Out of Memory Error (OOME) or a "GC overhead limit exceeded" error).


With regard to Atom RAM usage, each step utilizes RAM differently, depending on how the step is developed.

  • Connector steps typically use little RAM because data is streamed to/from disk instead of being held in memory.
  • Map steps will use as much RAM as is available; once that RAM is exhausted, the atom begins streaming the mapping/transformation to disk, which can significantly degrade performance for very large file transformations.
  • Logic steps and other steps that don't operate on document data tend to use little RAM.


Because the performance of the Map step is memory sensitive, there are several techniques that can speed up mapping or prevent memory crashes:

  1. For processing large files, increase the maximum memory available to the atom. The OS that the atom runs on generally has a fixed overhead; typically, the machine running an atom needs 512 MB to 1 GB for the OS itself. Assuming that only the atom is running on the machine, in theory the rest of the memory can be allocated to the atom's JVM. If other applications/processes are running, as a rule of thumb allocate half of the remaining memory to the atom and see how it performs. On a 32-bit OS, allocate at most 1 GB of RAM to the atom (not more); if you need more than that, consider moving to a 64-bit architecture. On a 64-bit OS with 8 GB of RAM available, allocate 4 GB of RAM to the atom and see how it performs, and so on.
  2. Experiment with smaller batch counts (1, 10, 100, 250, 500, 1000, etc.) and observe map performance. If your data has many columns, smaller batches lower the per-document data volume, which helps keep the mapping in memory and avoids buffering to disk.
  3. Split up large files into smaller files of fewer lines (for example, split a file with 100,000 lines into smaller files of 5,000 to 10,000 lines each).
  4. On an atom, spread the data out and process it in batches across multiple executions.
  5. On an atom, molecule, or cloud, use a Flow Control step with the batch setting enabled to process the data in batches. With very large data volumes, do not use Run Each Document Individually, as this will significantly slow down performance.
  6. On an atom, molecule, or cloud, use a Flow Control step with multiple threads. Configure it with no batching and threads=4 (to match the CPU count of a typical machine). Experiment with the thread count, but stay conscious of the atom server's and the JVM's limits (e.g., if the machine has 4 CPUs and the atom has 1 GB of RAM allocated, be careful not to exceed them). If you configure too many threads, performance will decrease as the CPU thrashes between threads.
  7. On a molecule or cloud, execute the smaller files in parallel using a Flow Control step with parallel processing enabled.
  8. If needed, combine the smaller files back into larger files using the Combine Documents step (this will use little RAM).
  9. Other considerations: during testing, remove all map functions and observe map performance at various batch sizes. If this yields a significant performance improvement, consider using a document cache or dynamic process properties for looking up or configuring values instead of process property components, which should parse more quickly. Additionally, with dynamic process properties, map function caching can be used.
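For technique 1, on a typical local atom installation the maximum JVM heap is set in the `atom.vmoptions` file under the atom's `bin` directory. A minimal sketch (the installation path is illustrative; check your own installation before editing, and restart the atom afterward):

```
# <atom_installation_directory>/bin/atom.vmoptions
# Raise the JVM maximum heap to 4 GB
# (e.g., a 64-bit OS with 8 GB of RAM, per technique 1 above)
-Xmx4096m
```

Heap sizes can also be given in megabytes (e.g., `-Xmx1024m` for the 32-bit ceiling discussed above).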
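Technique 3 (splitting a large file into smaller ones) can also be done outside the process, before the atom picks the files up. A minimal sketch in Python; the `.partNNNN` naming and the 5,000-line chunk size are illustrative, not a Boomi convention:

```python
def split_file(path, lines_per_chunk=5000):
    """Split a large text file into numbered chunk files of at most
    lines_per_chunk lines each; return the list of chunk file names."""
    chunk_names = []
    chunk, index = [], 0

    def flush(chunk, index):
        # Write the accumulated lines to the next numbered chunk file.
        name = f"{path}.part{index:04d}"
        with open(name, "w") as out:
            out.writelines(chunk)
        chunk_names.append(name)

    with open(path) as src:
        for line in src:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                flush(chunk, index)
                chunk, index = [], index + 1
        if chunk:  # write any remaining lines
            flush(chunk, index)
    return chunk_names
```

The chunk files can then be dropped into the directory the process polls, so each execution maps a bounded amount of data.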