Logs with docker and logstash
Posted on January 12, 2017

It would be nice to have all container logs from a docker cluster sent to … let’s say, an ELK stack. Right?
Right.
So we did.
Here’s how.
Logstash sees a stream of lines. Depending on what software runs in the containers, those lines come in a lot of different formats. So to filter for a container / service / instance in Kibana, we need at least the container image name, so we can say “give me the logs of this service type”, and that doesn’t even separate by different stages yet.
GELF gives us this information out-of-the-box:
{
    "source_host" => "10...",
    "level" => 6,
    "created" => "2017-01-09T13:41:06.039364628Z",
    "message" => "This is da log!",
    "version" => "1.1",
    "command" => "./run.sh",
    "tags" => [],
    "image_name" => "hub/repo/image",
    "@timestamp" => 2017-01-12T11:56:34.395Z,
    "container_name" => "ze-container",
    "host" => "ip-10-9-8-7",
    "@version" => "1",
    "tag" => "",
    "image_id" => "sha256:...",
    "container_id" => "..."
}
This is a lot of information that will be useful later.
Compared to that, syslog looks rather bleak (all according to the examples on the docker pages):
Aug 7 18:33:19 HOSTNAME docker/hello-world/foobar/5790672ab6a0[9103]: Hello from Docker.
Yes, you can tag that, but you also have to. Not so with GELF. Also, GELF is already beautifully structured into fields, whereas with syslog you’d probably have to agree on a scheme and deconstruct the message later.
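(For reference, the tag in the example above comes from the syslog driver’s tag option; roughly something like the following, going by the docker log-tag examples, so treat the exact template as an approximation:)

# tag syslog messages with image name, container name and container ID
docker run \
  --log-driver syslog \
  --log-opt tag="{{.ImageName}}/{{.Name}}/{{.ID}}" \
  hello-world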
Over to fluentd:
2015-09-01 15:10:40 -0600 docker.3fd8678d487e: \
{"source":"stdout","log":"Hello Fluentd!",\
"container_id":"3fd8678d487e540c7a303e1613101e746c5012f3317434eda93f24351c1928f7",\
"container_name":"/angry_kalam"}
This is … not so helpful. Especially the container name and the container ID are super useless, because there’s no way to get the image name out of them. You might be able to tag accordingly, but then the same applies as with syslog (except that here we already seem to have a document structure). I would have liked to use fluentd, because it runs as a low-resource agent on each host, and we wouldn’t need a central collector (i.e. logstash) receiving lossy UDP traffic. But those limitations made me choose GELF for the first try.
Cloudwatch currently only logs into log groups, which are basically a big bunch of text. If you want to separate things, you have to create a log group for everything, so pretty much one for every dimension (host, container name, …). Either I have not fully understood this, or this is really really useless.
(Disclaimer: I am fairly new to the whole ELK ecosystem, so this is all based on my potentially limited knowledge, and might be wrong in places.)
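So GELF it is. For completeness: wiring a container up to logstash is a single log driver flag. The address below is just a placeholder for wherever your logstash GELF input listens (12201 is the default GELF port).

# ship this container's stdout/stderr as GELF messages over UDP
docker run \
  --log-driver gelf \
  --log-opt gelf-address=udp://logstash.example.com:12201 \
  hub/repo/image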
We committed ourselves to using JSON formatted log output for our log messages. Now, not every program is without flaws, so what we usually get is a couple of stack traces mixed into the output, which are also usually multi-line (read: multiline events in ELK speak).
So one of my biggest requirements was: parse the JSON logs as JSON when I have them, and re-assemble the several distinct log messages which form a multi-line event into a single one.
The JSON messages are not a problem. Check whether something starts and ends with a curly brace, and you’re good to go. And it’s always a single line anyway.
The multiline messages though … From logstash’s perspective, there’s just a stream of lines. For each line the container information is present, but to logstash that’s just dumb metadata without any meaning. So imagine this sequence of events:
EVENT 0: CONTAINER_1_HOST_1 JSON-MESSAGE-LINE
EVENT 1: CONTAINER_2_HOST_2 JSON-MESSAGE-LINE
EVENT 2: CONTAINER_3_HOST_1 MULTILINE LOG - LINE 0
EVENT 3: CONTAINER_4_HOST_3 JSON-MESSAGE-LINE
EVENT 4: CONTAINER_1_HOST_1 MULTILINE LOG - LINE 0
If I naively use the multiline codec like this:
input {
  stdin {
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
I guess that would not work. Why? Because either Event 4 gets appended to the line of Event 3, which is wrong, or (I have no clue how the codecs work) to Event 2, which is equally wrong (different container source).
Now, the obvious solution of using the GELF input with a multiline codec does not seem to work, because there is a bug in logstash which prevents the use of codecs on the GELF input. Luckily there’s a workaround outside of the input: the multiline filter plugin (yes, named just like the codec), which supports a per-stream identity and seems to be working. Let’s try this. Unfortunately this plugin is no longer bundled with Logstash 5.x. (Sucks? Yes.) It was “replaced” in favor of the multiline codec, which reads like “deprecated” to me, although I get the motivation. But it makes me feel weird using it. (Sucks even more? Yes.)
To add insult to injury, I deploy logstash using Docker, and the official image doesn’t ship this plugin any more, so I have to build my own logstash image. I don’t like to repeat myself, so I built a logstash image which can dynamically install plugins. That adds ages to the startup time, but if I ever need another plugin I don’t have to rebuild the image, just restart it.
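As an illustration (this is a sketch of the idea, not the exact image I use), it boils down to a small entrypoint wrapper that installs whatever plugins are listed in an environment variable, here hypothetically called LOGSTASH_PLUGINS, before starting logstash:

#!/bin/sh
# entrypoint wrapper: install the plugins listed in LOGSTASH_PLUGINS
# (space-separated) at container start, then hand over to logstash
set -e
for plugin in ${LOGSTASH_PLUGINS:-}; do
  # skip plugins that are already installed to save some startup time
  if ! logstash-plugin list | grep -q "^${plugin}$"; then
    logstash-plugin install "$plugin"
  fi
done
exec logstash "$@"

Start the container with e.g. LOGSTASH_PLUGINS=logstash-filter-multiline and the filter becomes available without rebuilding the image.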
Our test-config for now, which seems to be working well so far (couple of test runs only), is:
input { gelf {} }

filter {

  # if we have json, let's use it.
  if [message] =~ /^{.*}$/ {
    json {
      source => 'message'
      target => 'data'
    }
    mutate {
      remove_field => 'message'
    }
  }

  multiline {
    pattern => '^\s'
    what => 'previous'
    stream_identity => "%{container_id}"
  }

  # let's clone and add the token field. the ORIGINAL input goes to S3,
  # the CLONED input with the token field goes to logz.io
  clone {
    clones => [ "one" ]
    add_field => { "token" => "asdf" }
  }
}

output { # ... }
That works somewhat well. All in all, I am pretty pleased with the whole thing, because once you have overcome the obstacles of the logstash config language, plugins, docker configuration, etc., it runs pretty smoothly.
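The output section is elided above; just to illustrate the routing described in the comments, it could look roughly like this, using the type field that the clone filter sets on cloned events (the endpoint, bucket and region here are placeholders, not our actual values):

output {
  if [type] == "one" {
    # the clone carrying the token field goes to logz.io
    # (placeholder endpoint; check the logz.io docs for the real one)
    tcp {
      host => "listener.logz.io"
      port => 5050
      codec => json_lines
    }
  } else {
    # the original event goes to S3 (bucket and region are made up)
    s3 {
      bucket => "my-log-archive"
      region => "eu-west-1"
      codec => json_lines
    }
  }
}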
Now, I still don’t like this solution. I hope I am doing a couple of things “wrong” here, because while it runs fine as long as it runs, making changes to this setup is not exactly agile. I still might check out fluentd in more detail, because if that kind of workflow is needed, fluentd might be a bit more flexible.
But all in all a really great first start.