Sensu resolve email

So I’ve set up a monitoring system with Sensu that emails me when certain services on my VMs are in critical condition. However, I’ve used filters so that the emails are only sent if a VM has been in critical condition for over an hour. Now, I would like to send an email when the VMs are no longer in critical condition.

For example, if the memory usage was over 90% for two hours, I would get one email per hour, resulting in two emails. Then, say shortly after that, the memory usage went below 90%, so the check switched back to status 0. In this case, I would like to get an email saying that the memory is below 90% (no longer in critical condition).

I’ve looked at the “resolve” documentation and it seems like this is what I would want to use. Here’s where I get stuck: how can I get the resolve to only email me if the critical condition lasted for more than an hour? Take the previous example, but change it so that the memory usage was over 90% for only half an hour. In this case I would receive 0 emails, because it was not in critical condition for more than an hour. However, it still switched from critical condition to normal condition, so theoretically I would still get a resolve email even though I don’t want one. How can I only get the resolve email if the critical condition lasted for at least an hour, or for, say, at least 120 occurrences (running the check once every 30 seconds, 120 × 30 s = 3600 s = 1 hour)?

Thanks!!

Here’s an example of one of my handlers:

{
  "handlers": {
    "memory-email": {
      "type": "pipe",
      "filter": "20_sec_recurrences",
      "severities": [
        "critical"
      ],
      "mutator": "mutated",
      "command": "mailx -s 'The memory usage is more than 90%' dmak1112@bu.edu"
    }
  }
}
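The behaviour asked for here could be sketched as a small decision function placed in front of mailx. This is a hypothetical helper, not part of Sensu: it assumes the pipe handler is given the event as JSON with `action` and `occurrences` fields, and that a resolve event carries the occurrence count of the incident it closes. The function name, the 30-second interval, and the resolve subject line are illustrative assumptions.

```python
import json

CHECK_INTERVAL_S = 30   # assumed check interval, taken from the post
MIN_DURATION_S = 3600   # only notify for incidents lasting at least an hour
MIN_OCCURRENCES = MIN_DURATION_S // CHECK_INTERVAL_S  # 3600 / 30 = 120


def email_subject(event, min_occurrences=MIN_OCCURRENCES):
    """Return the subject to send for this event, or None to stay silent."""
    occurrences = int(event.get("occurrences", 0))
    if occurrences < min_occurrences:
        # Incident shorter than the threshold: no alert and no resolve email.
        return None
    if event.get("action") == "resolve":
        # The incident lasted long enough to have been reported, so report
        # its end as well.
        return "Memory usage is back below 90%"
    if occurrences % min_occurrences == 0:
        # Critical for another full hour: one alert per hour.
        return "The memory usage is more than 90%"
    return None


# Hypothetical event payload, trimmed to the two fields the sketch reads.
event = json.loads('{"action": "resolve", "occurrences": 130}')
print(email_subject(event))  # Memory usage is back below 90%
```

A wrapper script could read the event from standard input, call `email_subject`, and only invoke mailx when the result is not `None`.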

On Tuesday, June 14, 2016 at 2:04:00 PM UTC-4, Dimitri Makrigiorgos wrote the original post above.

If you have a look at my email here, I give details about using the sensu-plugins-mailer handler, which lets you specify the time between emails and also sends an email as soon as the alert clears. It does what you want, and is probably easier than trying to roll your own. It also lets you specify templates for the emails.

Cheers,

Joel

Have you looked at the occurrences option on the check? It lets you specify that the alert only fires once there have been x occurrences, e.g. 120 in your case. If the check recovers before reaching 120 occurrences, the alert never fires, so no resolve message is sent; a resolve message only goes out if there were at least 120 occurrences to trigger the alert.
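The two scenarios from the question can be checked against this threshold rule with a small simulation. This is a toy model, not Sensu code: it assumes one check result every 30 seconds, a counter that increments on each consecutive critical result, an alert on every 120th occurrence, and a resolve that is only reported if the counter reached 120. All names are hypothetical.

```python
def simulate(statuses, threshold=120):
    """Return the list of emails a run of check statuses would produce.

    statuses: iterable of check exit statuses (0 = OK, 2 = critical).
    """
    emails = []
    occurrences = 0  # consecutive critical results so far
    for status in statuses:
        if status == 2:
            occurrences += 1
            if occurrences % threshold == 0:
                # Another full hour in critical condition.
                emails.append("critical")
        else:
            if occurrences >= threshold:
                # Only announce recovery for incidents that were reported.
                emails.append("resolved")
            occurrences = 0
    return emails


# Critical for two hours (240 results at 30 s each), then OK.
print(simulate([2] * 240 + [0]))  # ['critical', 'critical', 'resolved']
# Critical for half an hour (60 results), then OK.
print(simulate([2] * 60 + [0]))   # []
```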

On Wednesday, 15 June 2016 09:17:37 UTC+1, joel....@hscic.gov.uk wrote the reply above about sensu-plugins-mailer.