I inherited a Sensu install. (It mainly feeds metrics into Graphite, but there are some other checks as well.)
(Sensu was upgraded from a mix of 0.16 - 0.20 to 1.1.2, in case that is relevant.)
After a few days Redis ended up using huge amounts of memory, resulting in the kernel oom-killer being invoked.
I have since set maxmemory in Redis (to 6GiB) to keep this from happening; however, that limit still gets reached after a few days.
From redis-cli, it seems like the “events” list is the cause: running “DEL events” makes Redis’s memory use drop back to reasonable levels.
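To confirm from a script that this one key is what is growing, something like the following could be used. It is only a minimal sketch: it assumes a redis-py style client object (with `type`, `llen` and `memory_usage` methods), and `describe_key` is a made-up helper name. `MEMORY USAGE` needs Redis 4.0 or newer, so the sketch falls back to `None` when that call fails.

```python
def describe_key(client, key):
    """Return the type, length and approximate memory footprint of one key.

    `client` is assumed to be a redis-py style client (hypothetical setup).
    """
    key_type = client.type(key)
    if isinstance(key_type, bytes):
        # redis-py returns bytes unless decode_responses=True was set
        key_type = key_type.decode("ascii")

    # LLEN only makes sense for lists
    length = client.llen(key) if key_type == "list" else None

    try:
        # MEMORY USAGE exists only on Redis >= 4.0
        memory_bytes = client.memory_usage(key)
    except Exception:
        memory_bytes = None

    return {"type": key_type, "length": length, "memory_bytes": memory_bytes}


# Example use (assumes redis-py is installed and Redis listens on localhost:6379):
#   import redis
#   print(describe_key(redis.Redis(host="localhost", port=6379), "events"))
```

Running this every few minutes and watching `length` grow would show how fast entries are being appended.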
What could cause this? Is there some way to limit how many events are kept?
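As a stopgap while the real cause is unknown, the list could be capped from outside with the Redis `LTRIM` command. The sketch below is an assumption-laden workaround, not something Sensu provides: `cap_list` is a hypothetical helper, the client is assumed to be redis-py style, and whether the newest entries sit at the tail or the head of the list depends on whether the writer uses `RPUSH` or `LPUSH`, which I have not verified.

```python
def cap_list(client, key, max_len, newest_at_tail=True):
    """Trim a Redis list to at most `max_len` entries and return its new length.

    newest_at_tail=True assumes the writer appends with RPUSH (unverified);
    pass False if entries are added with LPUSH instead.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    if newest_at_tail:
        # Keep the last max_len elements (LTRIM indices are inclusive)
        client.ltrim(key, -max_len, -1)
    else:
        # Keep the first max_len elements
        client.ltrim(key, 0, max_len - 1)

    return client.llen(key)


# Example use (assumes redis-py and a local Redis; run periodically, e.g. from cron):
#   import redis
#   cap_list(redis.Redis(), "events", 1000)
```

This only hides the symptom; whatever keeps pushing to the list would still need to be found.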
(Some of the entries seem to hold the rather large contents of some of the metrics scripts… Redis Desktop Manager is slow when browsing the key.)
Some info on the entries: they seem to include successful check results…
Currently, about 2 hours after a restart (after deleting the key from redis-cli), there are more than 6000 entries present.
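More than 6000 entries in roughly 120 minutes is at least about 50 entries per minute, or around 8 per host per minute across the 6 hosts. To see which checks account for them, a sample of the list could be grouped by client, check name and status. The sketch below assumes the entries are JSON objects shaped like Sensu events, with `client.name`, `check.name` and `check.status` fields; that layout is an assumption, and `summarize_entries` is a hypothetical helper.

```python
import json
from collections import Counter


def _field(obj, name):
    """Return obj[name] if obj is a dict that has it, else None."""
    return obj.get(name) if isinstance(obj, dict) else None


def summarize_entries(raw_entries):
    """Count entries and total bytes per (client, check, status).

    Entries are assumed to be JSON-encoded Sensu-style events; anything
    that does not parse is grouped under "<not json>".
    """
    counts = Counter()
    sizes = Counter()
    for raw in raw_entries:
        data = raw if isinstance(raw, bytes) else raw.encode("utf-8")
        try:
            event = json.loads(data.decode("utf-8", errors="replace"))
        except ValueError:
            key = ("<not json>", None, None)
        else:
            check = _field(event, "check")
            key = (
                _field(_field(event, "client"), "name"),
                _field(check, "name"),
                _field(check, "status"),
            )
        counts[key] += 1
        sizes[key] += len(data)
    return counts, sizes


# Example use (assumes redis-py and a local Redis; samples the first 500 entries):
#   import redis
#   counts, sizes = summarize_entries(redis.Redis().lrange("events", 0, 499))
#   for key, n in counts.most_common(10):
#       print(key, n, sizes[key])
```

If most of the sample turns out to have status 0, that would back up the impression that successful results are being stored as well.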
Uchiwa shows one active event across the 6 hosts. (That active event currently shows >2000 occurrences… I am not sure how that happens with a check configured to run once a minute, assuming the occurrences were reset by the “DEL events”.)
(Ironically, the problematic check is /opt/sensu/embedded/bin/metrics-redis-llen.rb, which complains about its connection to Redis for some keys for some reason…)
Where do I start troubleshooting?