Operational Monitoring Best Practices for Atom, Molecule, and Atom Cloud Runtimes

Document created by Adam Arrowsmith on Nov 17, 2015. Last modified by Adam Arrowsmith on Mar 27, 2017.
Version 11

This document provides an overview of the various techniques and best practices that should be used to monitor your integration runtime (e.g. Atom, Molecule, or Atom Cloud) to ensure availability and performance for your production environment. This is especially important for local runtimes that are installed and managed by your team.

Local Runtime Monitoring

To ensure processing and performance for critical production integrations, it is important to monitor the local Atom, Molecule, or Private Atom Cloud runtime for availability and overall system health. Monitoring the runtime and underlying infrastructure is best achieved using a combination of techniques that watch different aspects of the runtime environment and provide a level of redundancy for critical integration operations.

The recommendations below apply to Atom, Molecule, and Private Atom Cloud runtimes, except where noted.

Aspect / Monitoring Techniques
Server Infrastructure
  • Monitor basic server and OS vitals for the infrastructure running the Atom. Key metrics include server availability, CPU usage, memory usage, hard disk usage, and disk I/O wait latency. These should be monitored for point-in-time anomalies and trending over time for capacity planning.
  • When starting out, monitor the servers closely while running “normal” integration loads to establish a baseline from which to drive threshold reporting.
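As a rough illustration of turning a baseline into a threshold, the Python sketch below derives an alert level from samples collected under normal load (mean plus a multiple of the standard deviation). The function names, the sample values, and the three-sigma multiplier are illustrative assumptions, not Boomi recommendations; a monitoring tool's built-in baselining would normally take the place of this code.

```python
import statistics


def baseline_threshold(samples, sigmas=3.0):
    """Return mean + sigmas * sample standard deviation of the baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return statistics.mean(samples) + sigmas * statistics.stdev(samples)


def breaches(current_value, threshold):
    """True when the current reading exceeds the baseline-derived threshold."""
    return current_value > threshold
```

For example, CPU readings (in percent) gathered during a typical integration load could be passed to `baseline_threshold`, and each new reading compared with the result using `breaches`. The same approach applies to memory, disk usage, or I/O wait figures.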
General Atom Application Availability and Status
  • Monitor that the atom.exe/atom.sh OS process is running. For Molecules and Private Clouds, verify that the atom.exe/atom.sh OS process is running on each node in the cluster.
  • Subscribe to AtomSphere platform email alerts for ATOM.STATUS to be alerted of communication issues between the local Atom and the platform. Alternatively, the AtomSphere platform API can be queried for the same events, either using an AtomSphere process running in the Atom Cloud or from a client outside of AtomSphere. See Understanding the Event Framework.
  • Use a log monitoring tool such as Splunk to monitor/tail the ../<atom install>/logs/container.log for SEVERE entries to identify problems with the Atom container outside the context of an integration process execution. This includes problems with the embedded web server and other listeners.
  • The runtime status can be viewed manually within AtomSphere in the Atom Management screen. Note that this is the status as determined by the AtomSphere platform's communication with the local runtime.
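Where a dedicated log monitoring tool is not yet in place, the SEVERE check on container.log can be approximated with a short script. The Python sketch below only looks for the literal text SEVERE in each line; it makes no assumption about the rest of the log line layout, and it does not handle log rotation or continuous tailing. The function names are hypothetical.

```python
def find_severe_entries(lines, marker="SEVERE"):
    """Return (line_number, text) pairs for lines containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_log_file(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_severe_entries(handle)
```

A scheduled job could call `scan_log_file` with the path to container.log under the Atom installation directory and raise an alert when the returned list is not empty.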
Atom Application Internal Health
  • Use a JMX-compliant monitoring tool to connect to the Atom’s JVM and gain real-time insight into application metrics, including both generic Java metrics and Atom-specific metrics. These should be monitored for point-in-time anomalies and for trends over time. See More on monitoring your system with JMX.
  • Key metrics include:
    • JVM heap used
    • JVM thread count
    • OS open file count
    • OS system load average
    • Container status
    • Whether the Atom has entered the "low memory" status
    • Number of times the scheduler has missed the time at which it should run schedules
    • Running executions
  • When starting out, monitor the JMX metrics closely while running normal integration loads to establish a baseline from which to drive threshold reporting.
  • For Molecules and Atom Clouds, do not try to hook into and monitor the JVMs for forked executions and Atom Workers, because they are transient and short-lived. Instead, monitor their impact indirectly based on server utilization.
  • For more details about monitoring via JMX and a full list of metrics, see Using JMX to Monitor Your System.
  • Check out this tutorial on using Java VisualVM to monitor JMX: How to use JVisualVM to Monitor your integration runtime.
General Integration Execution Health
  • Use simple “heartbeat” processes to verify integration processes are actually executing. See How to create a heartbeat process for Atom runtime monitoring.
    • To verify scheduled executions, create a process that updates a local file or updates a database or application field. Watch this file or value using an external script or monitoring tool to ensure it is updated regularly, and raise an alert if it is not.
    • To verify web service executions, publish a “ping” or “echo” web service. This service should be regularly invoked from an external client or script. An alert should be raised if a failed response is received.
  • Additionally, to verify connectivity to application endpoints, the heartbeat processes can be designed to perform a “ping” against each application and report problems in the response or file output. Note that the specific “ping” operation will vary by application type.
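The external side of both heartbeat checks can be sketched in a few lines of Python. The first function treats a heartbeat file as failed when it is missing or has not been modified within an allowed window; the second calls a “ping” web service and reports whether it answered with an HTTP 2xx status. The function names, the timeout, and any path or URL passed in are assumptions for illustration; the actual file location and service URL depend on how the heartbeat processes are built.

```python
import os
import time
import urllib.error
import urllib.request


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """True if the heartbeat file is missing or older than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return True  # a missing or unreadable file counts as a failed heartbeat
    return (now - modified) > max_age_seconds


def ping_succeeds(url, timeout_seconds=10):
    """True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False  # HTTP errors, refused connections, and timeouts all fail
```

A reasonable choice for `max_age_seconds` is a small multiple of the heartbeat process schedule interval, so that a single delayed run does not trigger an alert. If the web service requires authentication, the request would need the appropriate headers, which this sketch omits.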
Cluster Status
(Molecules and Private Clouds only)
  • Monitor the following JMX property in each node's JVM:
    • ClusterProblem
  • (DEPRECATED) Monitor the views and container.log files for each node for intra-cluster communication problems. More on cluster monitoring.
  • The individual node status can be viewed manually within AtomSphere in the Atom Management screen.

Dell Boomi Atom Cloud Runtime Monitoring

As a hosted service, the infrastructure capacity, application health, and general system availability of the Dell Boomi Atom Cloud are monitored by the Dell Boomi cloud operations team. Individual customers should still implement some monitoring controls for account-specific configuration (e.g., web services authentication) and automated awareness of the Atom Cloud availability.

Aspect / Monitoring Technique
General Atom Cloud Availability and Status
  • Subscribe to AtomSphere platform email alerts for ATOM.STATUS.
  • Alternatively, the AtomSphere platform API can be queried for the same events, using a client running outside of AtomSphere.
  • Check trust.boomi.com for Atom Cloud status.
General Integration Execution Health
  • Techniques are identical to those above for a local runtime, with one exception: if a file is used to verify scheduled executions, it must be written to a publicly accessible data store, such as an FTP site.
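When the heartbeat file lives on an FTP site, its freshness can be checked remotely. The Python sketch below is one possible approach, assuming the FTP server supports the MDTM command (RFC 3659), which reports a file's modification time in UTC. The host, credentials, and file name are placeholders supplied by the caller, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from ftplib import FTP


def parse_mdtm_reply(reply):
    """Parse an FTP MDTM reply such as '213 20240101120000' into a UTC datetime."""
    code, _, stamp = reply.partition(" ")
    if code != "213" or len(stamp) < 14:
        raise ValueError(f"unexpected MDTM reply: {reply!r}")
    # Ignore optional fractional seconds after the 14-digit timestamp.
    parsed = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def remote_heartbeat_age_seconds(host, user, password, filename, now=None):
    """Return the age in seconds of a heartbeat file on an FTP server."""
    now = now or datetime.now(timezone.utc)
    with FTP(host, timeout=30) as ftp:
        ftp.login(user, password)
        reply = ftp.sendcmd(f"MDTM {filename}")
    return (now - parse_mdtm_reply(reply)).total_seconds()
```

An external script could compare the returned age with the expected schedule interval and raise an alert when the file has not been refreshed in time. Servers without MDTM support would need a different check, such as downloading the file and reading a timestamp written inside it.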