Does sensu handle logs?


#1

Hi,

can you handle log files with sensu?

Is it common to handle log files with sensu?

How do sensu users handle their log files?

I searched Google before asking here, but could not find good results.

Most results were about handling Sensu's own log files. But I want
to handle log files which get generated by the systems Sensu monitors.

Regards,
Thomas Güttler


#2

I personally don't like to monitor log files with a monitoring system,
but I understand that sometimes that is the only way to do it.
Here are the (few) community logging plugins:
https://github.com/sensu/sensu-community-plugins/tree/master/plugins/logging

Remember, Sensu just runs check commands. If you have a command that
can check whatever log you have, then Sensu can run it (even just
"grep").

I'm assuming this is what you mean by "handle".
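To make the "Sensu just runs check commands" point concrete, here is a minimal sketch of such a command in Python. It assumes the Nagios-style exit-code convention that Sensu checks follow (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with the printed text used as the check output); the script name check_log.py and its warn/crit thresholds are hypothetical. It rescans the whole file on every run, so a production check would usually remember how far it read last time to avoid re-reporting old matches.

```python
"""check_log.py (hypothetical name): a minimal Sensu-style log check.

Sensu reads the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL,
3 = UNKNOWN) and uses whatever the command prints as the check output.
"""
import re

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log(lines, pattern, warn=1, crit=5):
    """Count lines matching `pattern` and map the count to a status."""
    regex = re.compile(pattern)
    matches = sum(1 for line in lines if regex.search(line))
    if matches >= crit:
        return CRITICAL, f"CRITICAL: {matches} lines match {pattern!r}"
    if matches >= warn:
        return WARNING, f"WARNING: {matches} lines match {pattern!r}"
    return OK, f"OK: {matches} lines match {pattern!r}"


def main(argv):
    """Handle `check_log.py LOGFILE PATTERN`: print one status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN: usage: check_log.py LOGFILE PATTERN")
        return UNKNOWN
    path, pattern = argv[1], argv[2]
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            status, message = check_log(fh, pattern)
    except (OSError, re.error) as exc:
        # Unreadable file or invalid regex: report UNKNOWN instead of crashing.
        print(f"UNKNOWN: {exc}")
        return UNKNOWN
    print(message)
    return status

# Installed as a Sensu check command, the script would end with
#     import sys; sys.exit(main(sys.argv))
# (left out here so the functions can be imported and tested on their own).
```

Run as something like `check_log.py /path/to/app.log 'ERROR|Traceback'` (path and pattern are placeholders), it exits 1 or 2 once matching lines appear, and Sensu turns that into a WARNING or CRITICAL event.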


In reply to Thomas Güttler <hv@tbz-pariv.de>, Fri, Mar 20, 2015 at 11:37 AM.


#3

One of the better ways we’ve found to monitor log files (which I agree is something to be avoided when possible) is to use monit: http://mmonit.com/monit/documentation/monit.html#FILE-CONTENT-TESTING

Since we have all of our monit events routing through Sensu, the net result is that we get what I think you may be looking for.
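The post doesn't say how the Monit events are routed into Sensu. One possible approach, sketched here under assumptions: Monit can run an arbitrary program through its `exec` action, and the Sensu client of that era accepts external check results as JSON with `name`, `status`, and `output` fields on a local socket, by default port 3030. The script name monit_to_sensu.py, the check name, and the message are hypothetical.

```python
"""monit_to_sensu.py (hypothetical): push one check result to a local Sensu client."""
import json
import socket

# Assumed default address of the Sensu client socket (Sensu 0.x/1.x).
SENSU_CLIENT = ("127.0.0.1", 3030)


def build_result(name, status, output):
    """Encode a check result as the JSON object the client socket accepts."""
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    return json.dumps({"name": name, "status": status, "output": output}).encode("utf-8")


def send_result(payload, address=SENSU_CLIENT, timeout=5.0):
    """Send one encoded result over TCP; raises OSError if no client is listening."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(payload)


# A wrapper invoked from a Monit exec action could then do, for example:
#     send_result(build_result("app_log_errors", 2, "ERROR found in app.log"))
```

Results delivered this way become ordinary Sensu events, so Monit's fast local file-content tests share the same handlers and notifications as every other check.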


Reinhardt Quelle

quelle@infrasystems.com


#4

Hi Reinhardt,

I understand if someone says “monitoring logs with sensu is not a good solution”.

But I don’t understand your first sentence: do you avoid checking logs altogether?

Regards,
Thomas Güttler


In reply to Reinhardt Quelle, Saturday, March 21, 2015 17:39:53 UTC+1.


#5

We generally avoid alerting from log contents - we alert from direct checks (Sensu and Sensu-routed Monit checks, where the latter are used because monit is doing fast local checks and process control), from metrics thresholds, and from a variety of other checks both on and off-net (external checks of our service APIs and “ping” APIs that we require all internally built applications to have).

I have a number of issues with using logs as a primary alert source:

  • log volume is very high - many tens of GB per day for each environment. Parsing all of that text - even JSON-formatted events - is relatively costly.

  • logs are relatively “brittle” - application developers change log formats all the time, and it’s difficult to track those changes, which results in false positives or, even worse, false negatives.

  • there are a lot of moving parts in the log infrastructure, and slowness or interruptions of logs would leave us blind to alerts. This is a simple economic trade-off; we could make logs more reliable, but there are costs to doing so, and we’d rather invest in application performance and resilience.

I prefer to have explicitly different “streams” for metrics, logs, alerts, and health checks. They can and do all feed each other, but each is distinct and each is optimized for its purpose rather than trying to be one-size-fits-all.

Where some prefer a “single source of truth”, I prefer to have a “second opinion” - when an alert fires, we turn to our metrics and logs to understand what is happening. Indeed, when our log volume drops unexpectedly, we alert. This is why we love tools like Sensu - its toolchain/pipeline approach makes it easy to integrate with our other systems.
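The "alert when log volume drops" idea can itself be written as an ordinary check. A minimal sketch, assuming per-interval line counts are already available from the log pipeline (the post doesn't say where they come from): compare the latest count with the median of recent intervals and map the ratio to a status. The function name and the 50%/20% thresholds are hypothetical.

```python
import statistics

OK, WARNING, CRITICAL = 0, 1, 2


def volume_status(current, history, warn_ratio=0.5, crit_ratio=0.2):
    """Compare the latest per-interval log line count with the recent median.

    `current` is the number of lines seen in the latest interval and
    `history` holds the counts from previous intervals (both assumed inputs).
    """
    if not history:
        return OK, "OK: no baseline yet"
    baseline = statistics.median(history)
    if baseline == 0:
        return OK, "OK: baseline volume is zero"
    ratio = current / baseline
    message = f"{current} lines vs. median {baseline:g} ({ratio:.0%} of normal)"
    if ratio < crit_ratio:
        return CRITICAL, "CRITICAL: log volume dropped, " + message
    if ratio < warn_ratio:
        return WARNING, "WARNING: log volume dropped, " + message
    return OK, "OK: " + message
```

Using the median rather than the mean keeps a single unusually busy interval from inflating the baseline and triggering false drop alerts afterwards.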

Sometimes, we don’t have a choice - if we’re using off-the-shelf software, we sometimes can’t find a better way to alert than to scrape logs. In that case, we do it at the source (in the “log shipper” in our Logstash logging infra).

In fact, the only case I can think of in our current stack where we were doing that was fixed in a later release, and we’re now alerting off collectd stats wired into that component - 500 errors in HTTP logs, in particular.
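Alerting on counters such as collectd's, rather than on log text, mostly comes down to differencing two cumulative samples. A minimal sketch, assuming two hypothetical counters (total requests and 5xx responses) read one interval apart; the names and the 1%/5% thresholds are illustrative, not taken from the post. A counter that goes backwards (for example after a restart) is reported as UNKNOWN rather than guessed at.

```python
from typing import NamedTuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class Sample(NamedTuple):
    """Cumulative counters read at one point in time (hypothetical names)."""
    requests: int
    errors_5xx: int


def error_ratio_status(prev, curr, warn=0.01, crit=0.05):
    """Map the 5xx ratio between two counter samples to a Sensu-style status."""
    d_req = curr.requests - prev.requests
    d_err = curr.errors_5xx - prev.errors_5xx
    if d_req < 0 or d_err < 0:
        # Counters went backwards, most likely a restart: no valid delta.
        return UNKNOWN, "UNKNOWN: counter reset detected"
    if d_req == 0:
        return OK, "OK: no requests in interval"
    ratio = d_err / d_req
    message = f"{d_err}/{d_req} requests returned 5xx ({ratio:.2%})"
    if ratio >= crit:
        return CRITICAL, "CRITICAL: " + message
    if ratio >= warn:
        return WARNING, "WARNING: " + message
    return OK, "OK: " + message
```

Because the component maintains the counters itself, this check keeps working even if the log pipeline is slow or interrupted, which is the failure mode described in the third bullet above.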



Reinhardt Quelle
quelle@infrasystems.com

In reply to Thomas Güttler (hv@tbz-pariv.de), March 22, 2015 at 1:38:17 PM.


#6

Dear Reinhardt Quelle,

thank you very much for sharing your knowledge.

Regards,
Thomas Güttler


In reply to Reinhardt Quelle, Monday, March 23, 2015 00:43:14 UTC+1.