Linux server systemd service unit definition to address rolling restart failures and server reboots due to dependencies

Document created by walter_bissic603837 Employee on Mar 9, 2018. Last modified by frank_wetzler970218 on Apr 11, 2018.
Version 6

Restarting an atom or molecule sometimes requires additional dependencies to be in place, for example after a server reboot or during rolling restarts. Worker threads may persist after an atom is stopped (or after the atom service is killed manually), and the file share used by the atom/molecule may still be synchronizing after a reboot or restart.

 

Issue

When the Linux server restarts, the atom service should restart automatically as well. Instead, you may observe that the atom service fails with no apparent error. Review the /var/log/messages file and the journalctl traces for diagnostic details. If they show atom.service starting almost immediately after the server reboots, the service is starting before its dependencies (network interfaces, file shares, etc.) are available, and it must be made to wait for them.

 

Common traces when atom.service starts too soon after boot. Note that the atom.service start job is installed 3 seconds after the server begins rebooting (07:54:12), and the service itself starts at 07:54:22.

messages file: 
Aug 25 07:54:09 cip001 kernel: Linux version 4.4.74-92.35-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SM 
Aug 25 07:54:09 cip001 kernel: Command line: BOOT_IMAGE=/vmlinuz-4.4.74-92.35-default root=/dev/mapper/system-root resume=/ 

... 
journalctl trace ... 

Aug 25 07:54:12 cip001 systemd[1]: [/etc/systemd/system/atom.service:11] Unknown lvalue 'ExecStat 
Aug 25 07:54:12 cip001 systemd[1]: atom.service: Installed new job atom.service/start as 311 
Aug 25 07:54:22 cip001 systemd[1]: Starting Ctac Atom Cloud... 
Aug 25 07:54:22 cip001 systemd[1967]: atom.service: Executing: /opt/dellboomi/shared/Cloud/bin/at 
Aug 25 07:54:22 cip001 atom[1967]: Starting atom 
Aug 25 07:54:22 cip001 systemd[1]: Started Ctac Atom Cloud. 
Aug 25 07:59:21 cip001 systemd[1]: atom.service: Child 2122 belongs to atom.service 
Aug 25 07:59:21 cip001 systemd[1]: atom.service: Main process exited, code=exited, status=0/SUCCE 
Aug 25 07:59:21 cip001 systemd[1]: atom.service: About to execute: /opt/dellboomi/shared/Cloud/bi 
Aug 25 07:59:21 cip001 systemd[1]: atom.service: Forked /opt/dellboomi/shared/Cloud/bin/atom as 9
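
To quantify how soon after boot the service started, the timestamps in a trace like the one above can be compared. The following is a minimal sketch and not part of the original procedure: start_delay_seconds is a hypothetical helper name, and it assumes syslog-style lines (month, day, HH:MM:SS as the first three fields) with both events on the same day.

```shell
# Print the seconds between the first kernel boot line and the first line
# containing the given text (for example, the unit's "Starting ..." message).
# Reads syslog-style text on stdin; only HH:MM:SS is compared, so both
# events are assumed to fall on the same day.
start_delay_seconds() {
  awk -v pat="$1" '
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    boot == ""  && /kernel: Linux version/ { boot = secs($3) }
    start == "" && index($0, pat)          { start = secs($3) }
    END { if (boot != "" && start != "") print start - boot }
  '
}
```

On a live host, something like `journalctl -b -o short | start_delay_seconds 'Starting Ctac Atom Cloud'` should work, where the pattern is the "Starting ..." text that matches your own unit's Description. A very small value suggests the service is not waiting for its dependencies.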

 

Log entries that you may see during a failed restart of atom.service after a Boomi Atom software release or rolling restart:

(Note: the last log entry, at 11:20:53 PM, is about 19 hours after the Atom restart was initiated at 4:13:05 AM. The atom had to be restarted manually after its dependencies were started.)

 

Apr 4, 2018 4:12:58 AM UTC INFO [com.boomi.container.plugin.BaseFeatureManager setBaseDownloadUrl] Setting the base download url to https://software.cdn.boomi.com 
Apr 4, 2018 4:12:59 AM UTC INFO [com.boomi.container.core.FeatureManagerImpl downloadAndInstallContainerUpdate] Atom is installing pending updates [Atom feature 'Core' (CORE)] from /usr/local/Boomi_AtomSphere/Atom/Atom_forge_cloud_prod/tmp/updates/install-51300_1316393276603554106.tmp. 
Apr 4, 2018 4:12:59 AM UTC INFO [com.boomi.container.core.FeatureManagerImpl restartContainerImpl] Updates completed, restarting Atom. 
Apr 4, 2018 4:12:59 AM UTC INFO [com.boomi.container.core.BaseContainer restart] Atom restart requested in 5000 milliseconds: Atom is restarting in order to apply updates 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.BaseContainer pause] Atom pause (for stop) requested. Current container status was RUNNING 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.config.ContainerConfig setStatus] Container status changed from RUNNING to PAUSING_FOR_STOP: Atom is restarting in order to apply updates 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.BaseContainer prepareForPausingForStop] Atom is stopping listeners before pausing execution. 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.AccountManager updateStatus] Updating account manager listener status from STARTED to STOPPED 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.AccountManager updateStatus] Account manager listener status is now STOPPED 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.BaseContainer pause] Atom is checking for running processes before pausing execution. 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.config.ContainerConfig setStatus] Container status changed from PAUSING_FOR_STOP to PAUSED_FOR_STOP 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.core.BaseContainer stopImpl] Atom stop requested. Current status is PAUSED_FOR_STOP 
Apr 4, 2018 4:13:04 AM UTC INFO [com.boomi.container.config.ContainerConfig setStatus] Container status changed from PAUSED_FOR_STOP to STOPPING 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.core.MessagePollerThread run] Message polling thread shutting down, stopping executors. 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.core.AccountManager updateStatus] Updating account manager status from STARTED to STOPPED 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.core.AccountManager updateStatus] Account manager status is now STOPPED 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.config.ContainerConfig setStatus] Container status changed from STOPPING to STOPPED: Atom is restarting in order to apply updates 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.core.StatusReporter stop] Stopping Status Reporter 
Apr 4, 2018 4:13:05 AM UTC INFO [com.boomi.container.core.BaseContainer restartContainerProcess] Atom restart initiated. 
Apr 4, 2018 11:20:53 PM UTC INFO [com.boomi.util.management.ServiceRegistry register] ServiceRegistry[Container[05e93beb-c6d1-44ec-be2a-fd0aed5868de]] registered ContainerController to com.boomi.container.core.Container@1283bb96
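
A stalled restart shows up as a long silence in the container log right after the "Atom restart initiated" entry. As a rough sketch (restart_gap_seconds is a hypothetical helper, and it assumes the timestamp layout shown in the snippet above, with both entries on the same day), the gap can be measured like this:

```shell
# Print the seconds between the "Atom restart initiated" entry and the
# next log entry. Reads the container log on stdin. Expects timestamps
# such as "Apr 4, 2018 4:13:05 AM UTC", so field 4 is the time and
# field 5 is AM or PM. Same-day entries are assumed.
restart_gap_seconds() {
  awk '
    function secs(t, ampm,   p) {
      split(t, p, ":")
      # Convert the 12-hour clock to seconds since midnight.
      return ((p[1] % 12) + (ampm == "PM" ? 12 : 0)) * 3600 + p[2] * 60 + p[3]
    }
    pending { print secs($4, $5) - t0; exit }
    /Atom restart initiated/ { t0 = secs($4, $5); pending = 1 }
  '
}
```

Feed it the container log, for example `restart_gap_seconds < container.log` (the file name here is a placeholder). A healthy restart normally resumes logging within seconds or minutes, so a gap of several hours points to the service never having come back on its own.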

 

Cause

 

The dependencies of atom.service must be available before the service can start. In addition, any remaining child processes or worker threads must be terminated before atom.service can be restarted successfully.
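
Before restarting, it can help to confirm whether any Java processes from a previous run are still alive. The sketch below is only illustrative: leftover_java_pids is a hypothetical helper, and it assumes the atom runs as the boomi user (as in the unit file in this article) and that the process command name is java.

```shell
# Print the PIDs of java processes owned by the given user.
# Reads "PID USER COMMAND" lines on stdin, as produced by:
#   ps -e -o pid= -o user= -o comm=
leftover_java_pids() {
  awk -v user="$1" '$2 == user && $3 == "java" { print $1 }'
}
```

Typical use after stopping the service: `ps -e -o pid= -o user= -o comm= | leftover_java_pids boomi`. If PIDs are still listed, those processes need to exit or be terminated before the service is started again.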

 

Solution

 

The systemd unit for the atom service may require additional parameters, as shown in the unit file below, to handle dependencies and remaining Java worker threads.

 

File location:

/etc/systemd/system/atom.service

 

#Automatically generated by systemd-sysv-generator

[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/opt/dellboomi/shared/Cloud/bin/atom
Description=LSB: Atom
After=local-fs.target network.target remote-fs.target nss-lookup.target ntpd.service
Conflicts=shutdown.target

[Service]
Type=forking
Restart=always
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/opt/dellboomi/shared/Cloud/bin/atom start
ExecStop=/opt/dellboomi/shared/Cloud/bin/atom stop
ExecReload=/opt/dellboomi/shared/Cloud/bin/atom restart
User=boomi
Group=boomi

[Install]
WantedBy=multi-user.target
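
To check which units a given unit file is ordered after, the After= line can be listed one unit per line. This is a small sketch rather than an official tool: list_after_units is a hypothetical helper, and it only reads the file it is given, so it does not see drop-ins or implicit dependencies.

```shell
# List every unit named on After= lines of a unit file, one per line.
# Argument: path to the unit file.
list_after_units() {
  awk -F= '$1 == "After" { n = split($2, u, " "); for (i = 1; i <= n; i++) print u[i] }' "$1"
}
```

For example, `list_after_units /etc/systemd/system/atom.service` should print the five units from the After= line above. For the effective ordering that systemd actually applies, `systemctl show -p After atom.service` can be used instead. Note that in systemd, network.target by itself does not wait for the network to be configured; network-online.target is the target intended for that, so it may be worth checking whether your environment needs it, especially when the atom depends on a network file share.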

 

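Use the following steps to test this atom.service unit. The commands assume the new unit file has been saved as /tmp/atom.service; if your unit has a different name (for example, boomi.service), substitute it throughout.

1)   Back up your current copy of the atom.service file 
2)   Install the new atom.service unit file in your /etc/systemd/system directory 
3)   Reload the systemd configuration so the new unit file is read, then restart the service and check its status 

# cd /etc/systemd/system 
# cp atom.service atom.service.old 
# cp /tmp/atom.service atom.service 
# systemctl daemon-reload 
# systemctl reload-or-restart atom.service 
# systemctl start atom.service 
# systemctl status atom.service 
# ps -elf | grep java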
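
After installing a unit file, it is also worth checking the journal for parse warnings such as the "Unknown lvalue" message in the journalctl trace earlier in this article, since a misspelled directive is silently ignored. The following sketch extracts the file and line number from such warnings. unit_parse_warnings is a hypothetical helper, and it assumes the bracketed message layout shown in that trace; other systemd versions may format the warning differently.

```shell
# Print "file:line" for each unit-file "Unknown lvalue" warning on stdin.
# Expects lines such as:
#   ... systemd[1]: [/etc/systemd/system/atom.service:11] Unknown lvalue ...
unit_parse_warnings() {
  sed -n 's/.*systemd\[[0-9]*]: \[\([^]]*\)] Unknown lvalue.*/\1/p'
}
```

For example, `journalctl -b | unit_parse_warnings` lists the locations reported since the last boot. Where available, `systemd-analyze verify /etc/systemd/system/atom.service` performs a more thorough check of the unit file.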

 

 

 

References include:
systemd.service 

10.6. Creating and Modifying systemd Unit Files - Red Hat Customer Portal 

systemctl 

 
