
Centralizing Atom Logging with Elastic Stack

Blog post created by jplater on Jul 19, 2018

If you’ve ever had to support a production application, you know how important logs can be.  Like most applications, the Dell Boomi Atom generates log messages that are written to files on disk. When problems arise, these logs help you figure out what is going on.  Sometimes troubleshooting is as easy as opening the log file and reviewing the most recent messages.  However, sometimes it’s not. Sometimes you need to correlate the Dell Boomi Atom logs with logs from other applications (possibly on different servers) or observe patterns in logs over time.  Attempting to do this manually can be a daunting task. This is where a centralized logging solution can really help. By centralizing all logging, including the Dell Boomi Atom log, in a single application, you gain the ability to quickly search across all of your logs from one place.

 

I recently spent some time experimenting with a popular open source logging platform called Elastic Stack.  In this blog post we will step through what I did to install the Elastic Stack and how I configured it to ingest container logs.  By the end of this blog, we will have the following setup:

Dell Boomi Atom + Elastic Stack

Prerequisites

In order to follow along with this blog, there are a few things you will need to set up first:

  • Docker: To simplify my setup, I chose to install all the Elastic Stack components with Docker.  If you don't have Docker installed already, you can find information on how to install it at https://docs.docker.com/install/
  • Docker Compose: Docker Compose allows you to coordinate the install of multiple Docker containers.  For my setup, I wanted to install Elasticsearch, Kibana and Logstash all on the same server, and Docker Compose made this easy.  You can install Docker Compose by following the steps outlined at https://docs.docker.com/compose/install/
  • Atom: I will be demonstrating sending container logs from a single Atom that is set up to use UTC and the en_US locale (the importance of this will be explained later).  I recommend using a fresh Atom so that you don't impact anything else. Instructions for installing a local Atom can be found under the Atom setup topic in our documentation.  While I am only demoing an Atom, the ideas I cover in this blog can be applied to Molecules and Clouds as well.
  • Configuration Files: All the configuration files referenced in this blog (filebeat.yml, docker-compose.yml, etc.) are available on Bitbucket.
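
Before moving on, it's worth confirming that Docker and Docker Compose are both installed and on your PATH. Each provides a version command:

    # Both commands should print a version string if the prerequisites are in place
    $ docker --version
    $ docker-compose --version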

 

What is the Elastic Stack?

Elastic Stack (formerly known as ELK) bills itself as "the most popular open source logging platform."  It is made up of four main components that work together to ship, transform, index and visualize logs.  The four components are:

  • Beats: A lightweight agent that ships data from the server where it is running to Logstash or Elasticsearch.  (In this blog we will use Filebeat, the Beat for log files.)
  • Logstash: A data processing pipeline that ingests data from Beats (and other sources), transforms it and sends it along to Elasticsearch.
  • Elasticsearch: A distributed, RESTful search and analytics engine that centrally stores all your data.
  • Kibana: A data visualization tool that allows you to slice and dice your Elasticsearch data (i.e. your log data).

 

All four components were designed with scale in mind. This means you can start out small, like we will in this blog, and scale them out later to meet the demands of your architecture.

 

What is the Container Log?

Now that we’ve gone over what the Elastic Stack is, let’s take a look at the log that we are going to process. If you aren’t already familiar with the container log, I encourage you to read Understanding the Container Log.  For our purposes, the most important thing to understand is the format of the log. This information will be needed when we configure the Filebeat prospector and the Logstash pipeline in the next sections. As the “Log Structure and Columns” section explains, each log message is composed of five fields:

<TIMESTAMP> <LOG LEVEL> [<JAVA CLASS> <JAVA METHOD>] <MESSAGE> 

The best way to understand the structure is to look at some example log messages. Here are two from my container log: 

May 31, 2018 2:15:22 AM UTC INFO [com.boomi.container.core.AccountManager updateStatus] Account manager status is now STARTED

 

May 31, 2018 2:25:08 AM UTC SEVERE [com.boomi.process.ProcessExecution handleProcessFailure] Unexpected error executing process: java.lang.RuntimeException: There was an error parsing the properties of the Decision task. Please review the task ensuring the proper fields are filled out.
java.lang.RuntimeException: There was an error parsing the properties of the Decision task. Please review the task ensuring the proper fields are filled out.
        at com.boomi.process.util.PropertyExtractor.resolveProfileParams(PropertyExtractor.java:355)
        at com.boomi.process.util.PropertyExtractor.initGetParams(PropertyExtractor.java:183)
        at com.boomi.process.shape.DecisionShape.execute(DecisionShape.java:101)
        at com.boomi.process.graph.ProcessShape.executeShape(ProcessShape.java:559)
        at com.boomi.process.graph.ProcessGraph.executeShape(ProcessGraph.java:489)
        at com.boomi.process.graph.ProcessGraph.executeNextShapes(ProcessGraph.java:573)
        at com.boomi.process.graph.ProcessGraph.execute(ProcessGraph.java:308)
        at com.boomi.process.ProcessExecution.call(ProcessExecution.java:812)
        at com.boomi.execution.ExecutionTask.call(ExecutionTask.java:935)
        at com.boomi.execution.ExecutionTask.call(ExecutionTask.java:61)
        at com.boomi.util.concurrent.CancellableFutureTask.run(CancellableFutureTask.java:160)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:748)

 

The first log message is a simple single-line log message. The second log message is an example of a multi-line message that includes a stack trace in the MESSAGE field. Another thing that might not be obvious just by looking at the log is that the TIMESTAMP and LOG LEVEL fields are dependent on your time zone and locale. This means that if you have multiple Dell Boomi Atoms running in different locations, you might need to have different Logstash and Filebeat configurations (or at least more complicated grok patterns than I show later). As mentioned earlier, my Atom was configured to log using UTC and en_US.
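
If you want to pin your Atom to the same UTC/en_US logging behavior that I am using, one option (this is an assumption about your setup, so check the Atom documentation for your version) is to set the standard JVM time zone and locale properties in the Atom's vmoptions file and restart the Atom. Note that this changes the default time zone and locale for the entire Atom JVM, not just logging.

# <YOUR_ATOM_HOME>/bin/atom.vmoptions -- add these JVM properties, then restart the Atom
-Duser.timezone=UTC
-Duser.language=en
-Duser.country=US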

 

Setting up Elasticsearch, Kibana and Logstash via Docker

Before we can start sending logs via Filebeat, we need somewhere to send them. This means installing Elasticsearch, Kibana and Logstash.  Since I was just exploring Elastic Stack, I decided to install all three products on the same physical server (but separate from the server where my Atom was running) using Docker Compose. Running them all on a single server wouldn't be a good idea in a production environment, but it allowed me to get up and running fast.

  1. Clone the elastic-stack-demo repository and checkout the 'part1' tag.

    $ git clone https://bitbucket.org/boomi-community/elastic-stack-demo.git

    Cloning into 'elastic-stack-demo'...

    $ cd elastic-stack-demo

    $ git checkout part1

  2. Start up the Elastic Stack using docker-compose.

    $ docker-compose up

    Creating elasticsearch ... done
    Creating kibana ... done
    Creating logstash ... done
    Attaching to elasticsearch, kibana, logstash
    elasticsearch | [2018-07-03T04:15:14,294][INFO ][o.e.n.Node ] [] initializing ...

  3. That's it. Once everything starts up, point your browser at http://<your_server>:5601 and confirm that Kibana loads.
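
Optionally, you can also check that the stack is reachable from outside the containers. This assumes the compose file publishes the standard Elasticsearch port 9200 to the host (the Filebeat setup later makes the same assumption):

    # All three containers should show an "Up" state
    $ docker-compose ps

    # Elasticsearch should respond with a small JSON banner (name, cluster_name, version)
    $ curl http://<your_server>:9200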

At this point, you have a Logstash pipeline running that is ready to consume and process container log messages.  To understand the pipeline a bit more, let’s take a look at the pipeline configuration file.

input {
  beats {
    port => 5044
  }
}


filter {
  if [sourcetype] == "container" {
    grok {
        match => { "message" => "(?<log_timestamp>%{MONTH} %{MONTHDAY}, %{YEAR} %{TIME} (?:AM|PM)) %{WORD} (?<log_level>(FINEST|FINER|FINE|INFO|WARNING|SEVERE))%{SPACE}\[%{JAVACLASS:class} %{WORD:method}\] %{GREEDYDATA:log_message}" }
    }
    date {
        match => [ "log_timestamp", "MMM dd, yyyy KK:mm:ss a" ]
        timezone => "UTC"
        remove_field => [ "log_timestamp" ]
    }
  }
}

output {
  elasticsearch {
    hosts => [ "elasticsearch:9200" ]
  }
}

The input and output stages of the pipeline are pretty standard.  The pipeline is configured to receive log messages from the Elastic Beats framework and ultimately send them to Elasticsearch.  The interesting part is the filter stage, which uses the grok filter to parse container log messages into fields so that the information can be easily queried from Kibana.  It also uses the date filter to parse the timestamp from the log message and set it as the event's @timestamp. This way the timestamp in Kibana will be the timestamp of the log message, not the timestamp of when the message was processed by the pipeline.
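
If you want to see exactly what the filter stage produces while you are experimenting, one common Logstash trick (not part of the demo configuration) is to temporarily add a stdout output next to the elasticsearch output:

output {
  elasticsearch {
    hosts => [ "elasticsearch:9200" ]
  }
  # Temporary, for debugging only: print every parsed event to Logstash's own log
  stdout {
    codec => rubydebug
  }
}

You can then watch the parsed events with docker-compose logs -f logstash (assuming the Logstash service is named logstash in your compose file) and remove the stdout block once you are happy with the parsing.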

 

As a reminder, log message content depends on the Atom's time zone and locale, so the grok and date filters shown in this configuration might need to be tweaked for your environment.

 

Setting up Filebeat

Now that the Logstash pipeline is up and running, we can set up Filebeat to send log messages to it.  Filebeat should be installed on the same server as your Atom. There are multiple ways to install Filebeat. I chose to install it using Docker.  My install steps below reference a few variables that you will need to replace with your information:

 

YOUR_ES_HOST - The hostname or IP address where you just installed Elasticsearch.

YOUR_ATOM_HOME - The directory where your Atom is installed.

YOUR_CONTAINER_NAME - The name you gave your Atom.  This will be queryable in Kibana. 

YOUR_CONTAINER_ID - The unique ID of your Atom (aka Atom ID).

YOUR_LOGSTASH_HOST - The hostname or IP address where you just installed Logstash (in this example, it is the same as YOUR_ES_HOST).

 

Once you've collected that information, you can install and configure Filebeat on the Atom server by following these steps:

  1. Start your Atom if it isn't running already.

    $ <YOUR_ATOM_HOME>/bin/atom start

  2. Pull down the Filebeat Docker image.

    $ docker pull docker.elastic.co/beats/filebeat:6.2.4

  3. Manually load the Filebeat index template into Elasticsearch (as per the Filebeat documentation).

    $ docker run docker.elastic.co/beats/filebeat:6.2.4 setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["<YOUR_ES_HOST>:9200"]'

  4. Clone the elastic-stack-demo repository and checkout the 'part1' tag.

    $ git clone https://bitbucket.org/boomi-community/elastic-stack-demo.git

    Cloning into 'elastic-stack-demo'...

    $ cd elastic-stack-demo

    $ git checkout part1

  5. Update the ownership and permissions of the Filebeat configuration file (see Config file ownership and permissions for more information on why this is needed).

    $ chmod g-w filebeat.yml
    $ sudo chown root filebeat.yml

  6. Start the Filebeat Docker container.

    $ docker run -d -v "$(pwd)"/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro -v <YOUR_ATOM_HOME>/logs:/var/log/boomi:ro -e CONTAINER_NAME='<YOUR_CONTAINER_NAME>' -e CONTAINER_ID='<YOUR_CONTAINER_ID>'  -e LOGSTASH_HOSTS='<YOUR_LOGSTASH_HOST>:5044' --name filebeat docker.elastic.co/beats/filebeat:6.2.4
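
If you want to confirm that Filebeat started cleanly and picked up the container log, you can follow the Filebeat container's own log output:

    # Filebeat logs the files its harvesters start reading and any errors connecting to Logstash
    $ docker logs -f filebeat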

Once Filebeat starts up, it will use the prospector defined in filebeat.yml to locate, process and ship container log messages to the Logstash pipeline we set up earlier.  Let's quickly review how the prospector is configured.

#=========================== Filebeat prospectors =============================
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/boomi/*.container.log
  multiline.pattern: '^[A-Za-z]{3}\s[0-9]{1,2},\s[0-9]{4}'
  multiline.negate: true
  multiline.match: after
  fields:
    sourcetype: container
    atomname: '${CONTAINER_NAME}'
    containerid: '${CONTAINER_ID}'
  fields_under_root: true

As you can see, the prospector is configured to:

  • Read all container log files present in the /var/log/boomi directory (the directory inside the Filebeat container that the docker run command above mounts from your Atom's logs directory)
  • Handle multi-line messages so that log messages with stack traces are parsed correctly.  Note that you may need to adjust the multi-line pattern shown here if your Atom is using a different locale.
  • Add additional informational fields (sourcetype, atomname and containerid) to the output that is sent to Logstash.  These fields will end up as queryable fields in Kibana.   
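
At this point log messages should be flowing all the way into Elasticsearch. Before moving on to Kibana, a quick way to confirm that documents are being indexed is to count them in the daily logstash-* indices (the default index name used by the Logstash elasticsearch output):

    # The response contains a "count" field; anything greater than zero means messages are arriving
    $ curl 'http://<YOUR_ES_HOST>:9200/logstash-*/_count?pretty'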

 

Query the Container Logs

The last step before you can explore your log messages is to tell Kibana which index(es) you want to search.  This is done by creating an index pattern in Kibana.

  1. Open up Kibana (http://<your_server>:5601).
  2. Click on Management.
  3. Click on Index Patterns.
  4. Enter ‘logstash-*’ as the Index pattern and click Next step.
  5. Select '@timestamp' as the Time Filter field name and click Create index pattern.
  6. Once created, you can explore the fields that are available in the new index.

 

It is finally time to test our setup end to end. Let's generate some log messages and search for them in Kibana.

  1. Stop your Atom. This will generate some log messages.

  2. On the Discover tab in Kibana, run a search for "Atom is stopping".
  3. Click on View single document to see all the fields that the Elastic Stack is tracking.
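
Because the Logstash filter and the Filebeat prospector added structured fields to each event, you can also filter on them directly in the Discover search bar. For example, this query (Lucene syntax, the default in Kibana 6.x, using the atomname and log_level fields defined in the configurations above) shows only SEVERE messages from your Atom:

    atomname:"<YOUR_CONTAINER_NAME>" AND log_level:SEVERE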

 

Isn't that much easier than searching the container log directly?  Searching for keywords is just the tip of the iceberg; I encourage you to explore the Kibana User Guide to see all the ways that Kibana can help you search, view and interact with the container log.

 

Are you using Elastic Stack (or another product) to centralize your Atom logs? If so, I'd love to hear about it.

 

Jeff Plater is a Software Architect at Dell Boomi.  He enjoys everything search, but not before his morning coffee.
