Alerting on errors stored into Elasticsearch

Hi,

We have a straightforward Logstash->Elasticsearch stack that collects log data from various applications and stores everything in, well, Elasticsearch.
We also use Kibana to search for the relevant information.

I would like to be able to trigger alerts on specific log entries, which we can currently find with Kibana running on top of Elasticsearch.

I was thinking I could leverage Sensu to do this, but I’m not sure how, or whether it’s actually a good idea.
A naive approach would be to query Elasticsearch for the relevant data from the past X minutes. A better one would be to run the queries and fetch only the results produced since the last time the check ran. How would I store that information, then?
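
To make the "since the last run" idea concrete, here is a minimal sketch that keeps the previous run's timestamp in a small JSON state file and builds an Elasticsearch query body for the window in between. The state file layout, the function names and the `@timestamp` field (the Logstash default) are assumptions for illustration; the query body still has to be sent to Elasticsearch with whatever client you use.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_last_run(state_file: Path, default: datetime) -> datetime:
    """Return the timestamp saved by the previous run, or `default` on the first run."""
    try:
        return datetime.fromisoformat(json.loads(state_file.read_text())["last_run"])
    except (FileNotFoundError, KeyError, ValueError):
        # Missing or unreadable state: fall back to the caller's default window start.
        return default


def save_last_run(state_file: Path, when: datetime) -> None:
    """Persist the end of the window that was just queried."""
    state_file.write_text(json.dumps({"last_run": when.isoformat()}))


def build_query(query_string: str, since: datetime, until: datetime) -> dict:
    """Build a query body matching `query_string` for since <= @timestamp < until."""
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": query_string}}],
                "filter": [
                    {
                        "range": {
                            "@timestamp": {
                                "gte": since.isoformat(),
                                "lt": until.isoformat(),
                            }
                        }
                    }
                ],
            }
        }
    }
```

Saving the window end only after the query has succeeded means a failed run is retried over the same window instead of silently skipping it.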

Is there anybody doing something like this with Sensu?

Jonathan

Yes. At Yelp we built this to do what you are describing:
https://github.com/Yelp/elastalert

One important point is that the "sensu" part just emits events to
the localhost:3030 socket. This is (personally) my favorite kind of
sensu "integration": it is really just a mechanism to push events,
and that is the extent of it.
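
As an illustration of pushing an event to that socket, here is a minimal sketch. It assumes the Sensu client's socket input on localhost:3030, which accepts a JSON check result with `name`, `output` and `status`; the function names and the example check name are hypothetical, and this is not how elastalert itself is implemented.

```python
import json
import socket


def build_check_result(name: str, output: str, status: int) -> bytes:
    """Serialize a check result as JSON for the Sensu client socket."""
    if status not in (0, 1, 2, 3):
        raise ValueError("status must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    return json.dumps({"name": name, "output": output, "status": status}).encode("utf-8")


def send_check_result(payload: bytes, host: str = "127.0.0.1", port: int = 3030) -> None:
    """Push the payload to the local Sensu client socket over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `send_check_result(build_check_result("es_error_logs", "3 matching log lines", 2))` would raise a critical event for a check named `es_error_logs` on the local client.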

Another thing that is important to us is being able to control the
check definitions declaratively (in contrast to other solutions that
are point-and-click on a web interface).

One significant downside to elastalert is that it doesn't support "resolves" :(

Hey,

Slightly off-topic, but maybe relevant to your ‘resolves’ issue?

We send non-0 application events through the sensu socket. Unfortunately, for whatever reason, we can’t send status-0 events to trigger resolves.

Our solution involves the following:

  • we rely on applications in a bad state sending non-0 events every X seconds

  • we rely on the interval between events for an application being Y seconds

  • based on the above, we can calculate a reasonable TTL

  • we can safely assume that when a TTL-timeout event for an application is detected, the application is no longer in a bad state (since it has stopped sending non-0 events).
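
The reasoning in the list above can be sketched roughly as follows. This is only an illustration of the TTL idea, not the handler itself: the class name, the `grace` multiplier and the choice of TTL = interval × grace are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BadStateTracker:
    """Tracks when an application last reported a non-0 status."""

    send_interval: float  # seconds between non-0 events while the app is unhealthy
    grace: float = 2.0  # assumption: tolerate one late or dropped event
    last_event_at: Optional[float] = None

    @property
    def ttl(self) -> float:
        """Seconds of silence after which the bad state is considered over."""
        return self.send_interval * self.grace

    def record_event(self, now: float) -> None:
        """Note that a non-0 event arrived at time `now` (seconds)."""
        self.last_event_at = now

    def is_resolved(self, now: float) -> bool:
        """True once no non-0 event has arrived for longer than the TTL."""
        if self.last_event_at is None:
            return True
        return now - self.last_event_at > self.ttl
```

With a 30-second send interval and the default grace, the application counts as recovered once more than 60 seconds pass without a non-0 event.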

Here’s the handler we use, which essentially resolves events (through sensu-api) that are no longer considered active: https://gist.github.com/roobert/2cd85ce2bbbeaad1748c7149ba1fd2a1
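
For readers who have not used the API for this, here is a minimal sketch of resolving an event over HTTP. It assumes the Sensu Core API's `POST /resolve` endpoint on the default port 4567 with a `client`/`check` JSON body; the function names are hypothetical, and this is not a copy of the gist linked above.

```python
import json
import urllib.request


def build_resolve_request(
    client: str, check: str, api_url: str = "http://127.0.0.1:4567"
) -> urllib.request.Request:
    """Build the HTTP request asking sensu-api to resolve one client/check event."""
    body = json.dumps({"client": client, "check": check}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_url}/resolve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def resolve_event(client: str, check: str, api_url: str = "http://127.0.0.1:4567") -> int:
    """Send the resolve request and return the HTTP status code."""
    request = build_resolve_request(client, check, api_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A handler triggered by the TTL-timeout event could call `resolve_event` with the client and check names taken from that event.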

I’ve only had a brief look at elastalert, but perhaps you could do something similar with a low realert time and this handler to resolve events automatically?

Cheers,

Rob

Hmmmm. This is an interesting approach.

Proposed feature: https://github.com/sensu/sensu/issues/1228

Hey Kyle,

Thanks a lot for mentioning Elastalert, it looks really cool :)

I'm still playing with it and have got some interesting results, but "sadly" our applications don't log enough errors at the moment :P

Also, thank you for the reminder about the client's localhost:3030 socket. I often forget that it's a way to interact with the system as a whole. We already use it for a kind of dead man's switch with a dedicated TTL, but not for this kind of thing...

  Jonathan
