Understanding Atom Runtime Pausing During Restarts and Updates

Document created by Adam Arrowsmith on Jun 2, 2016. Last modified by Adam Arrowsmith on Sep 15, 2016.
Version 6

This article describes the impact to process executions when an Atom performs a graceful shutdown and provides configuration recommendations to minimize disruption.

 

 

Overview

Stopping and restarting your Atom, Molecule, or Atom Cloud runtime is a common task. However, it is important to understand the implications for current and future integration process executions so that executions are not terminated unexpectedly. The behavior differs slightly depending on the type of runtime (e.g. single Atom vs. Molecule/Cloud cluster) and its configuration options. In particular, the Force Restart After X Minutes property plays a critical role in determining the fate of in-progress executions. Recommendations for its use are listed below.

 

Restart/stop requests can be initiated:

  • Manually through the Atom Management UI following certain configuration changes
  • Manually via the Atom/node services scripts
  • Automatically upon receiving an Atom or certain connector version updates as part of the monthly releases

 

What Happens when the Atom is Pausing

As part of the restart/stop/upgrade procedure, the runtime first goes into a pausing state to avoid impacting in-progress executions.

 

When the runtime is in a pausing state:

  • The scheduler will not initiate new scheduled executions.
  • The shared web server (e.g. web services, AS2) and other listeners (e.g. JMS, Atom Queue, MLLP) will be stopped and will not receive new inbound requests. Incoming client requests will be denied.
  • The runtime's status in Atom Management will show as stopping/restarting.
  • A status update entry will be written to the container log. 

 

The runtime remains in the pausing state until all in-progress executions have completed OR the Force Restart After X Minutes limit has been reached, whichever occurs first.

 

This means the Force Restart After X Minutes property plays a critical role in the behavior of the runtime during pausing and restarts by controlling whether in-progress executions are aborted or not: 

  • When the property is not set (default), the runtime will wait indefinitely for all in-progress executions to complete. Depending on the duration of those executions, it could mean no new scheduled or listener processes will run for a long time.
  • When the property is set, a process that is still running when the Force Restart limit is reached may be aborted mid-execution, depending on the type of runtime and its configuration. See the Pausing Behavior table below.
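
The wait rule described above can be sketched as a small Python model. This is an illustration only, not Boomi code: the function name `simulate_pause` and its parameters are hypothetical, and the sketch assumes a single JVM (such as a single Atom) in which any execution still running at the limit is aborted.

```python
from typing import List, Optional, Tuple


def simulate_pause(
    remaining_minutes: List[float],
    force_restart_minutes: Optional[float] = None,
) -> Tuple[float, int]:
    """Return (minutes spent pausing, number of executions aborted).

    remaining_minutes: time each in-progress execution still needs.
    force_restart_minutes: hypothetical stand-in for the Force Restart
    After X Minutes property; None models the unset default.
    """
    longest = max(remaining_minutes, default=0.0)
    if force_restart_minutes is None or longest <= force_restart_minutes:
        # Every execution finishes before any limit applies.
        return longest, 0
    # The limit is reached first: executions still running are cut off.
    aborted = sum(1 for m in remaining_minutes if m > force_restart_minutes)
    return force_restart_minutes, aborted
```

With the property unset, `simulate_pause([2, 45])` gives `(45, 0)`: nothing is aborted, but no new work starts for 45 minutes. With a 10-minute limit, `simulate_pause([2, 45], 10)` gives `(10, 1)`: the pause is short, at the cost of one aborted execution.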

 

Rolling Restarts for Molecules and Atom Clouds

As of the August 2016 release, a rolling restart is performed for Molecule and Atom Cloud clusters. In a rolling restart, one subset of the cluster nodes at a time is paused and restarted while the other nodes remain operational, which avoids downtime.

 

There are several properties that govern the behavior and timing of the rolling restart. See Rolling restart of Molecules and Atom Clouds for complete details.
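
As a rough illustration of the idea, the Python sketch below splits a cluster into subsets and lists, for each step, which nodes restart and which stay operational. The node names and the `batch_size` parameter are hypothetical; the actual subset size and timing are governed by the rolling restart properties, which this sketch does not model.

```python
from typing import Dict, Iterator, List


def rolling_batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive subsets of nodes to pause and restart together."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def restart_plan(nodes: List[str], batch_size: int) -> List[Dict[str, List[str]]]:
    """List, per step, which nodes restart and which remain operational."""
    plan = []
    for batch in rolling_batches(nodes, batch_size):
        plan.append({
            "restarting": batch,
            "operational": [n for n in nodes if n not in batch],
        })
    return plan
```

For a three-node cluster restarted one node at a time, `restart_plan(["node1", "node2", "node3"], 1)` produces three steps, and in each step two nodes remain available to run new executions.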

 

Note that it is possible to gracefully pause and stop individual Molecule/Cloud nodes, for example to perform server maintenance, by manually updating the Atom Status JMX properties (see Shutting down a Molecule node gracefully). However, this cannot be done for the purpose of updating the version.

 

Additional Considerations for Atom Clouds and Molecules with Forked Executions

If you have an Atom Cloud or Molecule with forked executions enabled, there are several additional concepts and considerations to understand for proper configuration. This also applies to the hosted Dell Boomi Atom Clouds.

 

As a reminder, when forked executions are enabled, each process execution runs in a temporary JVM separate from the persistent node JVM. See Forked execution for Molecules and Clouds for more information. Note the Force Restart property applies to the node JVMs, not the forked execution JVMs.

 

In general, the separation of JVMs means individual forked process executions, such as long-running scheduled or manual executions, can continue to run while the node JVM restarts.

 

 

However some listener executions running in forked execution JVMs (this includes synchronous listener processes running in an Atom Worker JVM) are "tied" to the node JVM. For example:

  • A synchronous web service listener that needs to return a response upon completion to the shared web server running in the node JVM
  • A transacted JMS/Atom Queue listener that needs to acknowledge the message upon completion to the transaction manager running in the node JVM

 

Because of this dependency, the node JVM cannot be restarted independently of the process execution without disrupting the execution. This needs to be considered when deciding a Force Restart limit. See the table below.

 

 Note: The restart timeout for Atom Worker JVMs is technically governed by the "Maximum Forked Execution Time in Cloud" property, not the Force Restart property. This means asynchronous listener executions (e.g. a web service without an Output Type/no response) can continue to execute while the node JVM restarts, and the Atom Worker JVM will restart after the execution has completed or the Maximum Forked Execution Time has been reached, whichever occurs first.
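
The distinction between independent and "tied" executions can be summarized in a short Python sketch. It is a simplified model under stated assumptions, not Boomi's implementation: the function and parameter names are hypothetical, and it ignores the Maximum Forked Execution Time that separately bounds Atom Worker JVMs.

```python
from typing import Optional


def outcome_on_node_restart(
    forked: bool,
    tied_to_node: bool,
    remaining_minutes: float,
    force_restart_minutes: Optional[float],
) -> str:
    """Classify one in-progress execution as "completes" or "disrupted"."""
    limit_hit_first = (
        force_restart_minutes is not None
        and remaining_minutes > force_restart_minutes
    )
    if not limit_hit_first:
        # The node waits long enough for the execution to finish.
        return "completes"
    if forked and not tied_to_node:
        # Separate JVM with no dependency on the node JVM.
        return "completes"
    # Same JVM as the node, or a listener that must report back to it.
    return "disrupted"
```

For example, with a 10-minute limit, a forked scheduled execution needing 30 more minutes completes (`outcome_on_node_restart(True, False, 30, 10)`), while a forked synchronous web service listener needing the same time is disrupted (`outcome_on_node_restart(True, True, 30, 10)`).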

 

Pausing Behavior and Force Restart Recommendations

The table below lists, for each runtime, its Pause and Restart Behavior followed by its Force Restart Recommendations.

Single Atom

  1. Atom receives restart request.

  2. Atom pauses itself and waits for all in-progress executions to complete or until the Force Restart limit is reached (if set).

  3. Atom restarts.

Because individual process executions and the node itself run in the same JVM, they cannot restart independently. This means either some in-progress executions will be aborted OR the runtime may not execute new processes for a long time.


If in-progress integrations are critical, consider establishing maintenance periods during which no processes are scheduled, and perform restarts and upgrades (using Release Control) within those periods. Otherwise, analyze the average expected execution duration and set the property accordingly. This could be minutes or even hours depending on your processing requirements.

 

Also consider using a Molecule instead for more fault-tolerant execution of critical processes.

Molecule (without forked executions)

  1. Head node receives restart request.

  2. The head node notifies a subset of the other nodes to pause and restart themselves according to the rolling restart configuration. Other nodes remain fully operational.

  3. Each notified node pauses itself and waits for all in-progress executions to complete or until the Force Restart limit is reached (if set).

  4. Node(s) restarts.

  5. The head node notifies the next subset of nodes to pause and restart. When the head node needs to restart, it passes the headship to another node.

 

Similar to the single Atom above, individual process executions and the node itself run in the same JVM and they cannot restart independently. However because the Molecule cluster performs a rolling restart, some nodes will still be available to execute new processes while the other nodes are paused waiting for their executions to complete. Therefore you can afford to set a greater Force Restart limit to accommodate longer-running processes.

 

Again this value could be minutes or even hours depending on your processing requirements. It is still recommended 

Atom Cloud and Molecule with forked executions

The steps are the same as for a Molecule without forked executions, with one difference: a Force Restart restarts the node JVM independently of the individual process execution JVMs. In other words, the node can restart without aborting in-progress executions, apart from the listener executions that are tied to the node JVM.

The Force Restart property is especially critical for multi-tenant Atom Clouds to prevent a single tenant from delaying the restart of the entire cloud and thereby indirectly impacting other tenants. You should always set this property.


It might be tempting to always set this to 1 minute. However, the synchronous/transacted listener executions need to be considered, so the Force Restart limit should be set to accommodate the longest anticipated listener execution. This value can be coordinated with any response timeouts and execution duration limits you have already set.


In practice, listener executions tend to be relatively short-lived (on the order of seconds or minutes at the most), so a value of 1-3 minutes is generally appropriate. Also remember listeners in other nodes will still be fully operational as part of the rolling restart.
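
One way to turn this guidance into a number is sketched below in Python. The function name and inputs are hypothetical, and two choices are assumptions rather than documented behavior: the suggestion is capped at an existing response timeout (one possible way of "coordinating" the values), and it is rounded up to whole minutes with a floor of 1.

```python
import math
from typing import Iterable, Optional


def suggest_force_restart_minutes(
    listener_seconds: Iterable[float],
    response_timeout_seconds: Optional[float] = None,
) -> int:
    """Suggest a whole-minute limit covering the longest listener execution."""
    longest = max(listener_seconds, default=0.0)
    if response_timeout_seconds is not None:
        # Assumption: waiting beyond the response timeout adds no benefit.
        longest = min(longest, response_timeout_seconds)
    # Round up to whole minutes; never suggest less than 1 minute.
    return max(1, math.ceil(longest / 60))
```

With observed listener durations of 4, 20, and 95 seconds, `suggest_force_restart_minutes([4, 20, 95])` returns 2, which falls within the 1-3 minute range mentioned above.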

 
