David - you’re right. The remediation handler triggers additional ‘checks’ for the box via API calls. Those ‘checks’ are just scripts, written in Ruby or any other language you want, so they can do anything you like - they just need to report a check status back to sensu-client when they’re finished. You could certainly write a check that shells out to start a process, then reports OK if the process started or CRITICAL if it failed to start. This same pattern could be used for automatic disk cleanup, terminating inactive connections after reaching a certain threshold, emptying a message queue, and so on.
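As a rough sketch of that kind of remediation check (untested against a real Sensu setup): the `remediate` helper, the `RemediateProcess` label, and the `service myapp start` command are all made-up names for illustration. The only thing taken as given is the usual convention that sensu-client reads the script's exit code (0 for OK, 2 for CRITICAL) and its output.

```ruby
#!/usr/bin/env ruby
# Hypothetical remediation check: run a start command, then report OK or CRITICAL.

# Exit codes that sensu-client interprets as the check status.
OK = 0
CRITICAL = 2

# Runs the given command (an argv array) with its output discarded and
# maps success or failure to a [exit_code, message] pair.
def remediate(start_command)
  started = system(*start_command, out: File::NULL, err: File::NULL)
  if started
    [OK, "RemediateProcess OK: start command succeeded"]
  else
    # system returns false on a non-zero exit and nil if the command could not run.
    [CRITICAL, "RemediateProcess CRITICAL: start command failed"]
  end
end

# In a real check script you would finish with something like:
#   status, message = remediate(%w[service myapp start])
#   puts message
#   exit status
```

The same shape should work for the other cases mentioned (disk cleanup, clearing a queue): swap in a different command and keep the status reporting.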
Another way to handle this would be to write a custom check script that tries to directly resolve the issue before reporting results. For the process check example, write a check that does the following:
- Check to see if the process is running - if it is, return status OK.
- If the process is not running, run a shell command to start it; if that succeeds, return status WARNING.
- If the process could not be started, return CRITICAL.
Then set your occurrences for that check to 1, and tune your handlers to report differently for each state. For example, you could send an email notification for the WARNING state to let you know that a problem was automatically resolved; you could send a PagerDuty alert for the CRITICAL state telling you that the process is down and needs to be manually fixed.
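Here is a minimal sketch of the check script itself, following the three steps listed above. The `check_and_heal` and `succeeds?` helpers, the `CheckProcess` label, and the `pgrep` / `service myapp start` commands are assumptions for illustration only; substitute whatever detects and starts your process. It relies on the standard exit codes sensu-client understands (0 OK, 1 WARNING, 2 CRITICAL).

```ruby
#!/usr/bin/env ruby
# Sketch of a self-healing process check (helper names and commands are hypothetical).

# Exit codes that sensu-client interprets as the check status.
EXIT_CODES = { ok: 0, warning: 1, critical: 2 }.freeze

# Runs a command (argv array) quietly; true only if it exits with status 0.
def succeeds?(command)
  system(*command, out: File::NULL, err: File::NULL) == true
end

# Returns [status, message]:
#   :ok       if the process is already running
#   :warning  if it was down and the start command succeeded
#   :critical if it was down and the start command failed
def check_and_heal(name, running_cmd, start_cmd)
  return [:ok, "#{name} is running"] if succeeds?(running_cmd)

  if succeeds?(start_cmd)
    [:warning, "#{name} was down and has been restarted"]
  else
    [:critical, "#{name} is down and could not be restarted"]
  end
end

# To use this as a Sensu check, finish the script with something like:
#   status, message = check_and_heal("myapp", %w[pgrep -x myapp], %w[service myapp start])
#   puts "CheckProcess #{status.to_s.upcase}: #{message}"
#   exit EXIT_CODES.fetch(status)
```

Note that this only trusts the start command's exit status; you may want to re-run the "is it running" test after a short delay before reporting WARNING.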
On Tuesday, January 13, 2015 at 8:19:47 PM UTC-6, David Petzel wrote:
On Tuesday, January 13, 2015 at 8:58:05 PM UTC-5, Matt Jones wrote:
This will just run other checks, not perform system actions. The other checks are called using the sensu API. The OP, I believe, wants to restart a service or something to that effect. This would entail the handler kicking off a script to perform this action on the server that created the event.
Sorry if I’m being dense here, but I’m not grokking the distinction. Isn’t that remediation plugin “performing system actions” by invoking the unpublished checks? Aren’t those additional “_remediation” checks just system actions, and couldn’t one of those be to restart the process in question?
Again, sorry if this is a dumb question. If I’m way out in left field, I’ll start a fresh thread so as not to further derail this one.