Sensu Performance Testing: Redis

Hello!

We’re setting up a Sensu PoC system, and I’ve been having trouble finding any information regarding performance/load testing. The Sensu documentation for Redis mentions the existence of redis-benchmark, but doesn’t give any indication of how to model Sensu’s access patterns. Eventually we want to run Sensu and its dependencies in Docker containers (in a scalable HA configuration), but we need a way to measure the effect this will have on the performance of Sensu, Redis, RabbitMQ, etc. I’ve looked through the sensu-redis library, but there doesn’t seem to be much to indicate how the connection is configured. Is pipelining enabled? Is the connection kept alive?

Monitoring goals:

  • Primarily monitor Kubernetes containers, pods, etc., as well as some stand-alone Docker hosts and software routers.
  • Work with AWS, GCE, and bare metal.
  • Scale across AZs and regions.
  • Regions are tolerant of failures in other regions.
  • Monitor 10,000 hosts per region.

I ended up creating 3 nodes in VirtualBox (sensu/uchiwa, redis, and rabbitmq with SSL), and used redis-cli to observe Sensu’s interactions with Redis. All 3 nodes are configured to run 'metrics-cpu.rb' and 'check-process.rb -p cron' checks to generate some simple test data. It appears that sensu-server and sensu-api each create a single client connection to Redis. I’m guessing we can ignore the sensu-api (and, in HA, Redis Sentinel) connections, as I assume they have a minor impact on Redis’s performance compared to sensu-server (equating to the '-c 1' switch in redis-benchmark). Should I increase this in a Redis HA config to account for the slaves reading from the master? Also, the number of connections never changes, so I assume sensu-server and sensu-api are using keepalive ('-k 1' in redis-benchmark). Running 'redis-cli --bigkeys', I get the following output:

```
$ redis-cli -h redis-0 -a REDIS-PASSWD --bigkeys

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far 'client:rabbitmq-0' with 208 bytes
[00.00%] Biggest string found so far 'result:sensu-0:cpu_metrics' with 228 bytes
[00.00%] Biggest list found so far 'history:rabbitmq-0:keepalive' with 21 items
[00.00%] Biggest set found so far 'clients' with 3 members
[31.25%] Biggest string found so far 'result:sensu-0:cron' with 248 bytes

-------- summary -------

Sampled 32 keys in the keyspace!
Total key length in bytes is 680 (avg len 21.25)

Biggest string found 'result:sensu-0:cron' has 248 bytes
Biggest list found 'history:rabbitmq-0:keepalive' has 21 items
Biggest set found 'clients' has 3 members

18 strings with 2827 bytes (56.25% of keys, avg size 157.06)
9 lists with 189 items (28.12% of keys, avg size 21.00)
5 sets with 13 members (15.62% of keys, avg size 2.60)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
```
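(Aside: for anyone wanting to reproduce the connection observations above, a couple of stock redis-cli commands along these lines should be enough; the host and password match my test setup.)

```
# Show every client connection, one per line, to count how many
# connections sensu-server and sensu-api each hold open.
redis-cli -h redis-0 -a REDIS-PASSWD CLIENT LIST

# Stream every command the server receives, to watch the live access
# pattern. MONITOR has significant overhead -- test nodes only.
redis-cli -h redis-0 -a REDIS-PASSWD MONITOR
```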

I know this is a very small sample to base my testing config on, but lacking any official guidance, I needed SOME starting point. Please correct me if I’m wrong, but based on the 'avg size 157.06' line, I’ve set the redis-benchmark value data size to 160 bytes ('-d 160'). I’m using the default number of requests (100,000) with the keyspace randomization set to 10,000 ('-r 10000') to mimic 10,000 hosts/pods/containers/etc. each sending 10 metrics. So my final redis-benchmark command is:

```
redis-benchmark -h redis-0 -a REDIS-PASSWD -q --csv -n 100000 -c 1 -r 10000 -d 160
```

I then loop this to execute 5 times from a benchmark node against the Redis server for each of the following configurations (a sketch of the loop follows the list):

  1. Debian 8 VM with Redis 3.2.8 installed from jessie-backports.
  2. Redis 3.2.8 Docker containers.
  3. A 10GB LVM volume mounted for writing the AOF and dump.rdb to.
  4. With 2 slaves, and Redis Sentinels running on the master and each slave.
  5. Changing '-c 1' to '-c 3' to simulate scaling up to 3 sensu-server nodes.
  6. All the various combinations of 1-5.
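The loop itself is nothing fancy; roughly this, with the CSV output collected per configuration (the results filename is just my convention):

```
#!/usr/bin/env bash
# Run the benchmark 5 times against the current configuration and
# collect the CSV output in one file for later comparison.
for run in 1 2 3 4 5; do
    echo "run ${run}" >> redis-bench-results.csv
    redis-benchmark -h redis-0 -a REDIS-PASSWD -q --csv \
        -n 100000 -c 1 -r 10000 -d 160 >> redis-bench-results.csv
done
```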

Thoughts? Advice?

It would REALLY help if this were documented somewhere.

Ditto for RabbitMQ perf testing.

Also, official Sensu HA docs would be nice.

Also, I want a pony.

Thanks,

Gabriel Burkholder

Hi Gabriel,

Thanks for bringing this up. I agree that our documentation with regard to performance testing leaves a lot to be desired. I’ve opened an issue against the sensu-docs project to track the documentation needed for this sort of testing. If you care to share any further details of your approach and findings in that issue, it would be helpful.

I think you are on the right track with regard to Redis benchmarking; this approach is probably pretty close to what will need to be officially documented. Reviewing instantaneous_ops_per_sec and total_commands_processed via the Redis INFO command after running redis-benchmark may also be helpful.
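For example, something along these lines (reusing the host and password from your test setup) filters the stats section of INFO down to just those two counters:

```
# Pull the two relevant counters from the stats section of INFO.
redis-cli -h redis-0 -a REDIS-PASSWD INFO stats \
    | grep -E 'instantaneous_ops_per_sec|total_commands_processed'
```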

You are correct in assuming that sensu-api and sensu-server use keepalive. Both daemons use a single persistent, pipelined connection to Redis.
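Given that, it may also be worth repeating your runs with redis-benchmark’s -P option, which pipelines N requests per round trip, to see how sensitive the numbers are to pipelining; the depth of 16 below is an arbitrary example, not a measured Sensu value:

```
# Same benchmark as before, but with 16 requests pipelined at a time.
redis-benchmark -h redis-0 -a REDIS-PASSWD -q --csv \
    -n 100000 -c 1 -r 10000 -d 160 -P 16
```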

I see that you are using the CPU metrics check to generate a reasonably sized check result output. You may also consider using the system profiler extension to generate even larger metric payloads for your testing.
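In the meantime, sweeping the value size is a cheap way to approximate larger payloads; the sizes below are arbitrary guesses rather than measured Sensu result sizes:

```
# Re-run the benchmark with progressively larger value sizes.
for size in 160 512 2048 8192; do
    redis-benchmark -h redis-0 -a REDIS-PASSWD -q --csv \
        -n 100000 -c 1 -r 10000 -d "${size}"
done
```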
