Could you please help me find the proper way to avoid “passing” messages being sent when the corresponding “failing” messages were filtered out based on the content of the events?
Here are the details:
I have a check-http.rb check, but sometimes the DNS resolution on that agent does not work properly, and the check throws an exception with the message: “Hostname not known”. This kind of error is irrelevant to me as long as it recovers quickly, so I’ve created a filter with the following expression:
event.check.output.indexOf("Hostname not known") !== -1 ? event.check.occurrences == 3 : event.check.occurrences == 1
This part works fine: I no longer get alerts about DNS issues. BUT I still receive the “passing” alerts when the DNS lookup works again, and I would like to not receive these messages either. As the event no longer contains that output once DNS works again, I can’t filter the success events that way.
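To make the behaviour concrete, here is a minimal sketch of how the expression evaluates, wrapped in a plain JavaScript function. The function name `filterAllows` and the three event objects are made up for illustration; they contain only the fields the expression reads (plus a status), and the output strings are placeholders rather than real check-http.rb output.

```javascript
// The filter expression from above, wrapped in a function (hypothetical name).
function filterAllows(event) {
  return event.check.output.indexOf("Hostname not known") !== -1
    ? event.check.occurrences == 3
    : event.check.occurrences == 1;
}

// Hypothetical, simplified events: only the fields the expression reads.
const dnsFailureFirst = {
  check: { status: 2, occurrences: 1, output: "... Hostname not known ..." },
};
const dnsFailureThird = {
  check: { status: 2, occurrences: 3, output: "... Hostname not known ..." },
};
const recovered = {
  check: { status: 0, occurrences: 1, output: "... OK ..." },
};

console.log(filterAllows(dnsFailureFirst)); // false: first DNS failure is suppressed
console.log(filterAllows(dnsFailureThird)); // true: third DNS failure in a row alerts
console.log(filterAllows(recovered));       // true: the "passing" event still gets through
```

The last case is the one I want to get rid of: the recovery event has no “Hostname not known” in its output, so it falls into the second branch and passes on its first occurrence, even though the failure it resolves was never reported.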
I could not find any guides on how to set this up properly. I have some ideas that might work, but could not find examples of how to implement them:
- Mark the event as resolved somehow from the handler when it’s filtered by the above filter. The problem here is that I would still like to receive an alert when that output keeps coming.
- Create a silence on that check from the handler somehow. The silence would expire on resolve, or would be deleted after 2 failures. But I don’t know how to manage silences from handlers. Is it possible at all? Is there such a handler already?
- Tag the event somehow when its “failing” alert was suppressed. Subsequent occurrences of the event could then be filtered by that tag. Maybe stashes could be used for this purpose, but I’m not sure.