Windows server atom service fails to restart  ...

Document created by walter_bissic603837 Employee on Mar 9, 2018
Version 1

There have been cases where, after a new release of the Atom software triggers a restart of the Atom service on Windows, the service stays down and fails to restart.  Once the Atom is restarted manually, no further action is required.  The Atom could be waiting for executing processes to complete (which is already addressed by "Atom is not restarting after installing updates" and "My atom did not restart after a release and was waiting for processes to complete"), but often there are no logs or errors that show what caused the restart to fail.

 

Issue

When either a software release or the Restart Atom action in Atom Management restarts the Atom on Windows, the service can stay down/inactive.  The restart.log shows a series of retry attempts, usually five (5), to restart the service, after which no further attempts are made.  Once the restart attempts fail, only manual intervention will restart the Windows service.

 

 

In the restart.log, the retry count reaches 5 and then the restart attempts stop.  The first attempt prints the date/time.
Note:  This log shows Central European Time. 

02/04/2018 - 1:02:10,29
Executing stop Atom 
Stopping service 'Atom'.
Error while stopping service
Not running.
Auto-start.
Atom status is 3 (1)
Executing start Atom  
Service is already running.
Not running.
Auto-start.
Atom status is 3 (1)
Service is already running.
Not running.
Auto-start.
Atom  status is 3 (2)
Service is already running.
Not running.
Auto-start.
Atom status is 3 (3)
Service is already running.
Not running.
Auto-start.
Atom status is 3 (4)
Service is already running.
Not running.
Auto-start.
Atom status is 3 (5)
Service is already running.
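
To check a restart.log for this pattern without reading it by eye, the retry counter in parentheses can be extracted with a short script. This is a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name `max_retry_count` and the limit of 5 are illustrative assumptions taken from this one log, not documented Atom behavior.

```python
import re

# Matches lines such as "Atom status is 3 (5)"; tolerates extra spaces
# after "Atom", as seen in the excerpt above.
STATUS_LINE = re.compile(r"^Atom\s+status is (\d+) \((\d+)\)\s*$")


def max_retry_count(log_text):
    """Return the highest retry counter found in restart.log text, or 0."""
    highest = 0
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            highest = max(highest, int(match.group(2)))
    return highest


# Shortened sample in the same format as the log above.
sample = """Atom status is 3 (1)
Service is already running.
Not running.
Auto-start.
Atom  status is 3 (5)
Service is already running."""

RETRY_LIMIT = 5  # assumption: the limit observed in this log
exhausted = max_retry_count(sample) >= RETRY_LIMIT
print("retries exhausted:", exhausted)
```

In practice the text would be read from the restart.log file of the Atom installation instead of the inline sample.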

 

Cause

In the container.log, the Atom is stopped, in this case for the new software release.  The last entry made before the Atom stopped is at 01 h 02 CEST.

 

2 avr. 2018 01 h 02 CEST INFO [com.boomi.container.core.AccountManager updateStatus] Account manager status is now STOPPED
2 avr. 2018 01 h 02 CEST INFO [com.boomi.container.config.ContainerConfig setStatus] Container status changed from STOPPING to STOPPED: Atom is restarting in order to apply updates
2 avr. 2018 01 h 02 CEST INFO [com.boomi.container.core.StatusReporter stop] Stopping Status Reporter
2 avr. 2018 01 h 02 CEST INFO [com.boomi.container.core.BaseContainer restartContainerProcess] Atom restart initiated.
2 avr. 2018 11 h 54 CEST INFO [com.boomi.util.management.ServiceRegistry register] ServiceRegistry[Container[fbcb0f18-c528-4e11-98ab-99c0ff6ea569]] registered ContainerController to com.boomi.container.core.Container@26edda40
2 avr. 2018 11 h 54 CEST INFO [com.boomi.container.core.BaseContainer start] Initializing Integration Process Container build 51300, vmId 4312@DCVSVF120, dir E:\Boomi AtomSphere\Atom 
2 avr. 2018 11 h 54 CEST INFO [com.boomi.container.core.StatusReporter start] Starting Status Reporter

 

The restart.log shows that the restart is attempted five (5) times and then the attempts stop.  The Windows EventSynapse shows, under Event Properties, Event 7036 from the Service Control Manager, which reported "The Atom service entered the stopped state."  However, per the EventSynapse, the Atom service was not actually stopped until TimeCreated SystemTime 2018-04-01T23:04:46.018071200Z, which is 11:04:46 PM UTC, or 01:04:46 AM CEST on April 2.

 

EventSynapse XML output:

- <System>
  <Provider Name="Service Control Manager" Guid="{YYYYYY-1234-5678-9abc-XXXXXXXXXX}" EventSourceName="Service Control Manager" />
  <EventID Qualifiers="16384">7036</EventID>
  <Version>0</Version>
  <Level>4</Level>
  <Task>0</Task>
  <Opcode>0</Opcode>
  <Keywords>0x8080000000000000</Keywords>
  <TimeCreated SystemTime="2018-04-01T23:04:46.018071200Z" />
  <EventRecordID>102534</EventRecordID>
  <Correlation />
  <Execution ProcessID="688" ThreadID="3172" />
  <Channel>System</Channel>
  <Computer>Computer_Name</Computer>
  <Security />
  </System>
- <EventData>
  <Data Name="param1">Atom</Data>
  <Data Name="param2">stopped</Data>
  <Binary>410074006F006D0020002D002000530079006E0061007000730065002F0031000000</Binary>
  </EventData>
  </Event>

 

Note that this is 2 minutes past the time the Atom was reported as stopped in the container.log (the 01 h 02 CEST timestamp above).  Therefore, the conclusion is that if the Atom service is not completely stopped by the time restart.bat (in this case, on Windows) reaches five (5) attempts, the Atom service remains stopped.
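
The time comparison can be reproduced with a few lines of Python. This is a sketch under stated assumptions: CEST is modeled as a fixed UTC+2 offset, the restart.log date 02/04/2018 is read as day/month/year (consistent with the "2 avr. 2018" entries in the container.log), the hundredths of a second in the restart.log timestamp are dropped, and the event time is truncated to microseconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST as a fixed UTC+2 offset (valid for April 2018).
CEST = timezone(timedelta(hours=2), "CEST")

# Event 7036 TimeCreated SystemTime, truncated to microseconds.
event_utc = datetime(2018, 4, 1, 23, 4, 46, 18071, tzinfo=timezone.utc)
event_local = event_utc.astimezone(CEST)

# First line of restart.log: "02/04/2018 - 1:02:10,29" (local time).
restart_started = datetime.strptime(
    "02/04/2018 1:02:10", "%d/%m/%Y %H:%M:%S"
).replace(tzinfo=CEST)

# How long after the first restart attempt the service actually stopped.
gap = event_local - restart_started

print("service stopped at:", event_local.isoformat())
print("time since first restart attempt:", gap)
```

The gap falls between two and three minutes, which matches the roughly 2 minute difference described above.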

 

Solution

Even though there are published guides to help with restart failures, in this particular case they are not completely effective (short of adding a wait time between retries to start the Atom).  However, the Windows service has a Recovery tab in the Atom service Properties (right-click the service):

 

Review the current Atom service Properties and its Recovery options.

Modify the Recovery option as follows:

First failure: Restart the Service
Second failure: Restart the Service
Subsequent failures: None (Or continue to Restart the Service. Please tailor to your test Restart criteria...)

Reset fail count after 1 days
Restart service after: 5 minutes (Tailor/Adjust to your Restart criteria)
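
The same Recovery settings can also be applied from the command line with the Windows `sc failure` command instead of the Properties dialog. The sketch below only builds and prints the command. The service name "Atom" is an assumption (check the actual service name in the service Properties), the helper `build_sc_failure_command` is hypothetical, and this version uses the "continue to Restart the Service" variant for subsequent failures.

```python
def build_sc_failure_command(service_name, reset_days=1, restart_minutes=5):
    """Build an 'sc failure' command that restarts the service on each failure."""
    reset_seconds = reset_days * 24 * 60 * 60   # sc expects seconds
    delay_ms = restart_minutes * 60 * 1000      # sc expects milliseconds
    # Three restart actions: first, second, and subsequent failures.
    actions = "/".join(["restart", str(delay_ms)] * 3)
    return [
        "sc.exe", "failure", service_name,
        "reset=", str(reset_seconds),
        "actions=", actions,
    ]


# Assumption: the service is registered under the name "Atom".
command = build_sc_failure_command("Atom")
print(" ".join(command))
```

The printed command can be run from an elevated command prompt on the Atom server, or the list can be passed to `subprocess.run(command, check=True)`. Adjust the reset period and restart delay to your own restart criteria, as noted above.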

 

References/Related Links

Improve Atom restart

Atom offline for longer period after the server got updated or restarted 
