We’ve got some routers, each with several hundred BGP sessions, that I want to have checks for.
My first attempt at this was to write a script which looped round these, generated a check definition for each session and poked them into the Sensu API.
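For reference, a per-session generator along those lines might look roughly like the sketch below. It assumes Sensu Go's core/v2 API (`PUT /api/core/v2/namespaces/:namespace/checks/:check`) with API-key auth. The backend URL, the `bgp` subscription, the naming scheme and the `check_bgp_session` script are all hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: adjust to your own backend and namespace.
SENSU_API = "http://sensu-backend.example.com:8080"
NAMESPACE = "default"


def check_definition(router: str, peer: str, interval: int = 300) -> dict:
    """Build a Sensu Go core/v2 check definition for one BGP session."""
    # Swap dots/colons in the peer address for dashes to keep names simple.
    safe_peer = peer.replace(".", "-").replace(":", "-")
    return {
        "metadata": {"name": f"bgp-{router}-{safe_peer}", "namespace": NAMESPACE},
        # check_bgp_session is a hypothetical check script.
        "command": f"check_bgp_session --router {router} --peer {peer}",
        "subscriptions": ["bgp"],
        # Attach resulting events to a proxy entity named after the router.
        "proxy_entity_name": router,
        "interval": interval,
        "publish": True,
    }


def put_check(check: dict, api_key: str) -> int:
    """Create or update one check via the backend API; returns the HTTP status."""
    name = check["metadata"]["name"]
    req = urllib.request.Request(
        f"{SENSU_API}/api/core/v2/namespaces/{NAMESPACE}/checks/{name}",
        data=json.dumps(check).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Key {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

This produces one check object per session, which is exactly what makes `sensuctl check list` grow with the session count.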
That works OK (well, except that using an interval of 60s for hundreds of checks wasn’t the best idea!), but I feel like there must be a better way.
That’s a lot of checks (which are almost identical) to manage and maintain. It makes things like ‘sensuctl check list’ very long, and it’s a lot of checks/events happening: on the order of 10 per second just for a single entity.
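The rate is simply the number of scheduled checks divided by the interval. As a quick back-of-the-envelope sketch, assuming a hypothetical 600 sessions:

```python
def checks_per_second(num_checks: int, interval_s: float) -> float:
    """Average rate at which scheduled checks (and so events) occur."""
    return num_checks / interval_s


# Hypothetical session count: 600 sessions, one check each.
print(checks_per_second(600, 60))   # every 60 s
print(checks_per_second(600, 300))  # every 5 minutes
```

By contrast, a single aggregate check produces one event per interval regardless of the session count.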
Anyone have a similar scenario or have any advice?
I don’t believe I can use token substitution, as tokens can only reference entity attributes?
I’m thinking of having a single check with a script which loops round the sessions (I could have something regularly dump them out to JSON or YAML to make this quick and easy) and performs the check on each.
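A minimal sketch of the looping half, assuming the dump is a JSON list of objects with hypothetical `peer` and `state` keys. A session counts as up only when its state is `Established`, which is the BGP state for a fully up session.

```python
import json
from dataclasses import dataclass


@dataclass
class SessionResult:
    peer: str
    state: str

    @property
    def up(self) -> bool:
        # In BGP, only the Established state means the session is up.
        return self.state == "Established"


def load_sessions(path: str) -> list[SessionResult]:
    """Read a dump like [{"peer": "192.0.2.1", "state": "Established"}, ...].

    The file layout is hypothetical; it assumes some other job regularly
    dumps the router's session table in this shape.
    """
    with open(path) as f:
        return [SessionResult(s["peer"], s["state"]) for s in json.load(f)]
```

Reading a pre-made dump keeps the check itself fast, since it never has to query the router for hundreds of sessions one at a time.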
It could then use the Sensu API to add an event for each session. The check itself could then return an aggregate (“there are 195 sessions up, 5 sessions down”) and alert on that.
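A rough sketch of the per-session events plus aggregate, assuming the script runs as a Sensu Go check on a host with an agent. That lets it submit events through the agent’s local events API (`POST http://127.0.0.1:3031/events`) instead of authenticating to the backend. The event naming scheme is hypothetical. Attaching the events to the router’s proxy entity, rather than the agent’s own entity, is left out.

```python
import json
import urllib.request

AGENT_EVENTS_URL = "http://127.0.0.1:3031/events"  # local agent's events API


def session_event(router: str, peer: str, up: bool) -> dict:
    """One event per BGP session, to be submitted through the local agent."""
    # Hypothetical naming scheme for the per-session "check".
    name = "bgp-{}-{}".format(router, peer.replace(".", "-").replace(":", "-"))
    return {
        "check": {
            "metadata": {"name": name},
            "status": 0 if up else 2,  # 0 = OK, 2 = CRITICAL
            "output": f"BGP session to {peer} is {'up' if up else 'down'}",
        }
    }


def post_event(event: dict) -> None:
    """Submit one event to the agent, which forwards it to the backend."""
    req = urllib.request.Request(
        AGENT_EVENTS_URL,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()


def aggregate(results: dict[str, bool]) -> tuple[int, str]:
    """Return (exit status, output) for the single parent check."""
    up = sum(results.values())  # True counts as 1
    down = len(results) - up
    return (2 if down else 0), f"{up} sessions up, {down} sessions down"
```

The check script would call `post_event(session_event(...))` for each session, then print the aggregate output and exit with the aggregate status, so the parent check alerts on the summary.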
That’s still a lot of events (and I’m not sure how the backend/API would cope with having a few hundred suddenly spat at it like that every few minutes), but at least it would only be a single check.
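If bursting a few hundred events at the backend is a worry, one option is to pace them over part of the check interval rather than sending them all at once. A minimal sketch: `send` stands in for whatever actually submits an event (for example a POST to the agent events API). Whether pacing is needed at all depends on the backend’s capacity, which is an assumption here, not something measured.

```python
import time
from typing import Callable, Iterable


def submit_spread(events: Iterable[dict], send: Callable[[dict], None],
                  window_s: float) -> int:
    """Send events evenly spaced over roughly window_s seconds; return the count."""
    events = list(events)
    if not events:
        return 0
    gap = window_s / len(events)
    for i, event in enumerate(events):
        send(event)
        if i < len(events) - 1:  # no need to sleep after the last one
            time.sleep(gap)
    return len(events)
```

Keeping the window well under the check interval (say half of it) avoids one run’s submissions overlapping the next run.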
Is that sane, or is there a better way?
I guess I could just create events for sessions which fail, but then the script would somehow have to keep track of those failures so it can send the resolution events.
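The bookkeeping for the failures-only approach can be fairly small: persist the set of sessions that were down on the previous run, and on each run treat anything that was down before but isn’t now as recovered. A minimal sketch, assuming a local JSON state file at a path of your choosing (the file and its format are hypothetical):

```python
import json
import os


def recovered_sessions(state_path: str, current_down: set[str]) -> set[str]:
    """Return sessions that were down last run but aren't now, then save state.

    The caller sends a CRITICAL event for every session in current_down and
    an OK (status 0) event for every session returned, so Sensu resolves them.
    """
    previous_down: set[str] = set()
    if os.path.exists(state_path):
        with open(state_path) as f:
            previous_down = set(json.load(f))
    # Write to a temp file and rename, so a crash can't leave a half-written file.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(sorted(current_down), f)
    os.replace(tmp_path, state_path)
    return previous_down - current_down
```

The trade-off is that the state file becomes the source of truth: if it is lost, sessions that were down and have since recovered will never get their resolution event.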