Auto Correct event


#1

I just stumbled upon Sensu yesterday and started to play with it. I am trying to figure out the proper flow for implementing an “auto correct” attempt before sending an alert.

For example, if cron is not running on a server, I would want to attempt to start cron before raising the event as critical. My first thought would be to have that in the handlers. I would have a check run the first handler to attempt to start the process, then set a refresh of 60 seconds; the second handler would check whether cron was still down and then send an email. I don’t think this will work, since handlers only run on the server. Would a better approach be to have the “auto correct” attempt built into the check script itself? If the script was able to auto correct the event, I would still want a handler to send an email telling me that it was able to fix it. Do handlers only trigger when an event is in “critical”?


#2

I would still use handlers to decide what needs to happen when a check fails, and keep the “auto correction” logic out of the check script itself. This keeps the responsibilities of the handlers and the checks very clear.
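As a rough sketch of that split, a check might do nothing but report state through its exit code (the 0/1/2/3 OK/WARNING/CRITICAL/UNKNOWN convention that Sensu checks follow). The function name `check_process` is made up for illustration, and the sketch assumes `pgrep` is available and that the daemon's process name is `cron` (on some distributions it is `crond`).

```python
#!/usr/bin/env python3
"""Hypothetical check: report whether a process is running, nothing more."""
import subprocess
import sys

# Nagios-style exit codes, which Sensu checks use to signal status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_process(name, runner=subprocess.run):
    """Return (exit_code, message) for the named process. Never tries to fix anything."""
    try:
        # pgrep -x exits 0 if a process with exactly this name exists, 1 if none does.
        result = runner(
            ["pgrep", "-x", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError as exc:
        return UNKNOWN, f"CheckProcess UNKNOWN: could not run pgrep: {exc}"
    if result.returncode == 0:
        return OK, f"CheckProcess OK: {name} is running"
    if result.returncode == 1:
        return CRITICAL, f"CheckProcess CRITICAL: {name} is not running"
    return UNKNOWN, f"CheckProcess UNKNOWN: pgrep exited with {result.returncode}"


# Invoked as: check_process.py <process-name>
if __name__ == "__main__" and len(sys.argv) == 2:
    code, message = check_process(sys.argv[1])
    print(message)
    sys.exit(code)
```

Because the check only observes and reports, any decision about restarting or alerting stays with whatever handles the resulting event.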

To implement auto correction I would use some kind of command and control tool like MCollective or Ansible. So in your case, when the cron process isn’t running on an instance, a check would fail and the handler would use the command and control tool to trigger an action on that instance to restart the cron service.
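A minimal sketch of such a handler, under several assumptions: it is a pipe handler that receives the event as JSON on stdin, the event carries `action`, `occurrences`, `client.name` and `check.status` fields, the client name matches a host in the Ansible inventory, and Ansible is installed on the Sensu server. The names `plan`, `restart_command` and `handle` are hypothetical, and `notify` is a stand-in for whatever actually sends the email. The policy shown (restart on the first failing occurrence, alert on later ones, report on resolve) is just one way to arrange it.

```python
#!/usr/bin/env python3
"""Hypothetical Sensu pipe handler: try a restart first, alert only if that fails."""
import json
import subprocess
import sys


def plan(event):
    """Decide what to do: 'remediate', 'alert', 'notify_fixed' or 'ignore'."""
    if event.get("action") == "resolve":
        return "notify_fixed"
    if event.get("check", {}).get("status", 0) == 0:
        return "ignore"
    # First failing occurrence: attempt the fix. Later ones mean the fix did not hold.
    return "remediate" if event.get("occurrences", 1) <= 1 else "alert"


def restart_command(host, service):
    """Ansible ad-hoc command that starts `service` on `host` (assumed to be in the inventory)."""
    return ["ansible", host, "-b", "-m", "service", "-a", f"name={service} state=started"]


def handle(event, service, run=subprocess.run, notify=print):
    """Carry out the plan and return the action taken."""
    action = plan(event)
    host = event.get("client", {}).get("name", "unknown")
    check = event.get("check", {}).get("name", "unknown")
    if action == "remediate":
        result = run(restart_command(host, service))
        if result.returncode != 0:
            notify(f"{check} on {host}: attempt to start {service} failed")
    elif action == "alert":
        notify(f"{check} on {host}: {service} is still down after a restart attempt")
    elif action == "notify_fixed":
        notify(f"{check} on {host}: {service} has recovered")
    return action


# Invoked as: remediate.py <service>, with the event JSON piped to stdin.
if __name__ == "__main__" and len(sys.argv) == 2:
    handle(json.load(sys.stdin), sys.argv[1])
```

Passing `run` and `notify` as parameters keeps the decision logic testable without touching a real host or mail server.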



Diptanu

(In reply to Quenten Griffith, Fri, Jun 14, 2013 at 6:28 PM)


#3

Thank you, that does make sense. I did want to keep the logic in the handler but didn’t think about having the handler call something like MCollective. Thank you for the help.


(In reply to Diptanu Choudhury, Friday, June 14, 2013 10:31:26 AM UTC-4)


#4