RabbitMQ in AWS

Hi All,

I’ve been speaking with a couple of people on IRC about this but just thought I’d reach out to a wider audience to hopefully get some feedback on how people are setting up Sensu and its dependencies in AWS. I’ve only just set-up a single instance of Sensu/RabbitMQ/API/Uchiwa/Redis so very much new to this, but liking Sensu so far.

Looking at making the set-up robust, the one dependency I’m concerned about is RabbitMQ. I plan on running Sensu/API/Uchiwa on N number of Sensu servers (with a load-balancer in front) and Redis will be managed by AWS (ElastiCache) so not worrying about that one.

I’ve found various blog posts on how people are deploying RabbitMQ including:

  • Independent RabbitMQ instance on every Sensu server

  • Separate RabbitMQ Cluster using an ASG + ELB

  • Single RabbitMQ instance per AZ and each client connecting to their local (AZ) RabbitMQ instance. Sensu server configured to talk to each RabbitMQ instance in the AZs

There are pros and cons to each, many that I don’t even know about, so to save me from going down a painful road, I’m looking for feedback from real-life experience with RabbitMQ in AWS.

One thing that concerns me about setting up RabbitMQ clustering is that I’m using Puppet and I’d have to set-up a script to join an existing cluster (auto-discovering existing nodes in the cluster). I’m sure it’s do-able, but it sounds… fiddly.

Anyway, would certainly appreciate any pointers/feedback.

- Gonzalo

I use the first method, sorta. (3 instances, one on each sensu server,
all clustered, haproxy in front)
I would *not* write a script to join a cluster, I would use the
puppetlabs-rabbitmq module and let it do the clustering for you.
You should have some sort of "source of truth" about your cluster
topology, hiera? This allows you to configure clients *and* servers
with the same data. (I use dns for legacy reasons and use
https://github.com/Yelp/puppet-netstdlib#gethostbyaddr2array to get an
array out of it)

If you are more comfortable with ASG+ELB's, that might be a way to go
for you. ELB+ASG has its own idiosyncrasies. If you are using puppet
for other things, I would vote against this method and use the same
tool for your other components (puppet).

Certainly think hard about your partition handling strategy:
https://www.rabbitmq.com/partitions.html

I chose autoheal originally because I didn't have 3 everywhere. Now
that I have sets of 3 in all my clusters I want to switch to
pause_minority to get the "C" back in CAP.
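
For anyone weighing the two modes, the pause_minority rule can be sketched in a few lines of Python. This is only an illustration of the rule described on the partitions page (a node pauses when its side of the split holds half or fewer of the cluster's nodes), not RabbitMQ code, and the node names are made up:

```python
def paused_nodes(cluster, partitions):
    """Return the nodes that a pause_minority-style rule would pause.

    cluster: set of all configured node names.
    partitions: list of sets, each holding nodes that can still see each other.
    A side pauses when it holds half or fewer of the cluster's nodes.
    """
    total = len(cluster)
    paused = set()
    for side in partitions:
        if len(side) * 2 <= total:
            paused |= set(side)
    return paused


three = {"rabbit@sensu1", "rabbit@sensu2", "rabbit@sensu3"}
# One node is cut off from the other two: only the isolated node pauses.
print(paused_nodes(three, [{"rabbit@sensu1", "rabbit@sensu2"}, {"rabbit@sensu3"}]))

two = {"rabbit@sensu1", "rabbit@sensu2"}
# A two-node cluster splits 1/1: neither side is a majority, so both pause.
print(paused_nodes(two, [{"rabbit@sensu1"}, {"rabbit@sensu2"}]))
```

The two-node case is why pause_minority only makes sense with sets of three (or more, odd) nodes: with two, any partition stops the whole cluster.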

···

On Wed, Dec 31, 2014 at 7:37 PM, Gonzalo Servat <gservat@gmail.com> wrote:

Hi All,

I've been speaking with a couple of people on IRC about this but just
thought I'd reach out to a wider audience to hopefully get some feedback on
how people are setting up Sensu and its dependencies in AWS. I've only just
set-up a single instance of Sensu/RabbitMQ/API/Uchiwa/Redis so very much new
to this, but liking Sensu so far.

Looking at making the set-up robust, the one dependency I'm concerned about
is RabbitMQ. I plan on running Sensu/API/Uchiwa on N number of Sensu servers
(with a load-balancer in front) and Redis will be managed by AWS
(ElastiCache) so not worrying about that one.
I've found various blog posts on how people are deploying RabbitMQ
including:

- Independent RabbitMQ instance on every Sensu server
- Separate RabbitMQ Cluster using an ASG + ELB
- Single RabbitMQ instance per AZ and each client connecting to their local
(AZ) RabbitMQ instance. Sensu server configured to talk to each RabbitMQ
instance in the AZs

There are pros and cons to each, many that I don't even know about, so to
save me from going down a painful road, I'm looking for feedback from
real-life experience with RabbitMQ in AWS.

One thing that concerns me about setting up RabbitMQ clustering is that I'm
using Puppet and I'd have to set-up a script to join an existing cluster
(auto-discovering existing nodes in the cluster). I'm sure it's do-able, but
it sounds... fiddly.

Anyway, would certainly appreciate any pointers/feedback.

- Gonzalo

Thanks very much Kyle for your reply. If you don’t mind, a few questions on your setup:

a) You said you have 3 instances with Sensu and RabbitMQ (clustered) and HAProxy in front. You have a separate HAProxy cluster that load balances RabbitMQ to the 3x Sensu nodes? or HAProxy also runs on each of the Sensu nodes? Do you have an elastic IP attached to the active HAProxy that you float to another if it fails?

2) I do use Hiera and totally agree that it should be the source of truth. Only issue I’m having is how I can determine the names of the other cluster nodes (for the join command) keeping in mind the hostnames and IPs are random. Ideally I want to be able to scale up the number of Sensu servers to whatever number I need (not necessarily 1 per AZ) so using the AZ in the name to determine the other cluster nodes is not ideal. Any ideas?

3) Your comment about using ASG+ELB was in regards to using that vs HAProxy?

As for the partitions, I’m not really sure yet what I’ll use. I’ve had a read of that page but it’s not crystal clear yet so I’ll probably just have to experiment.

Cheers

Gonzalo

···

On Fri Jan 02 2015 at 4:40:09 AM Kyle Anderson <kyle@xkyle.com> wrote:

I use the first method, sorta. (3 instances, one on each sensu server,
all clustered, haproxy in front)
I would *not* write a script to join a cluster, I would use the
puppetlabs-rabbitmq module and let it do the clustering for you.
You should have some sort of “source of truth” about your cluster
topology, hiera? This allows you to configure clients *and* servers
with the same data. (I use dns for legacy reasons and use
https://github.com/Yelp/puppet-netstdlib#gethostbyaddr2array to get an
array out of it)

If you are more comfortable with ASG+ELB’s, that might be a way to go
for you. ELB+ASG has its own idiosyncrasies. If you are using puppet
for other things, I would vote against this method and use the same
tool for your other components (puppet).

Certainly think hard about your partition handling strategy:
https://www.rabbitmq.com/partitions.html

I chose autoheal originally because I didn’t have 3 everywhere. Now
that I have sets of 3 in all my clusters I want to switch to
pause_minority to get the “C” back in CAP.


> Thanks very much Kyle for your reply. If you don't mind, a few questions on
> your setup:
>
> a) You said you have 3 instances with Sensu and RabbitMQ (clustered) and
> HAProxy in front. You have a separate HAProxy cluster that load balances
> RabbitMQ to the 3x Sensu nodes? or HAProxy also runs on each of the Sensu
> nodes? Do you have an elastic IP attached to the active HAProxy that you
> float to another if it fails?

In my setup, HAProxy runs on each node, and they load balance each other.
I've mimicked this: http://failshell.io/sensu/high-availability-sensu/
I use normal IPs and DNS round robin.
If there is an issue with an HAProxy connection, the Sensu client will
re-resolve and retry another server in the DNS round robin, which is
nice.
I don't float IPs, or use keepalived or Elastic IPs. DNS round robin
has worked for me in both the datacenter and EC2.
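
The re-resolve-and-retry behaviour can be sketched roughly as below. This is a minimal Python illustration of the idea, not Sensu's actual client code; `resolve_all` and `connect_any` are hypothetical helper names, and the resolver and connector are passed in as parameters only so the logic can be exercised without a network:

```python
import socket


def resolve_all(name, port, resolver=socket.getaddrinfo):
    """Return every distinct address behind a round-robin DNS name."""
    seen = []
    for info in resolver(name, port, type=socket.SOCK_STREAM):
        addr = info[4][0]  # sockaddr is (address, port, ...)
        if addr not in seen:
            seen.append(addr)
    return seen


def connect_any(name, port, resolver=socket.getaddrinfo,
                connector=socket.create_connection, timeout=5):
    """Re-resolve the name, then try each address until one accepts."""
    last_error = None
    for addr in resolve_all(name, port, resolver):
        try:
            return connector((addr, port), timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next record
    raise ConnectionError(
        "no host behind %s accepted a connection" % name) from last_error
```

With three A records behind one name, a dead HAProxy costs the client one failed attempt before it moves on to the next address.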

> 2) I do use Hiera and totally agree that it should be the source of truth.
> Only issue I'm having is how I can determine the names of the other cluster
> nodes (for the join command) keeping in mind the hostnames and IPs are
> random. Ideally I want to be able to scale up the number of Sensu servers to
> whatever number I need (not necessarily 1 per AZ) so using the AZ in the
> name to determine the other cluster nodes is not ideal. Any ideas?

Ug. Yea rabbitmq isn't going to respond well to random hostnames.
When one of the members changes, you will have to reconfigure everyone.
Having short (non-fqdn) resolvable hostnames is kinda baked into rabbitmq:
https://www.rabbitmq.com/ec2.html#issues-hostname
You can override the node name. I don't know, this is probably the
hardest part for you if they are random.

The nice part about having the names or ips in a central store
(dns/hiera) is that you can use that same data to inform the servers
how to cluster *and* how the clients should connect to the cluster.
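
As a rough illustration of "one list, two consumers", here is a small Python sketch. The host names and helper functions are hypothetical; the only RabbitMQ facts assumed are the default `rabbit@<short hostname>` node naming and the default AMQP port 5672. In practice the list would come from Hiera or DNS rather than being hard-coded:

```python
# Hypothetical topology data, standing in for what Hiera or DNS would supply.
SENSU_HOSTS = ["sensu1.example.com", "sensu2.example.com", "sensu3.example.com"]


def short_name(fqdn):
    """Strip the domain: RabbitMQ node names default to short hostnames."""
    return fqdn.split(".", 1)[0]


def cluster_nodes(hosts):
    """Node names the servers would be told to cluster with."""
    return ["rabbit@" + short_name(h) for h in hosts]


def client_endpoints(hosts, port=5672):
    """Connection targets the clients would be configured with."""
    return [{"host": h, "port": port} for h in hosts]


print(cluster_nodes(SENSU_HOSTS))
# → ['rabbit@sensu1', 'rabbit@sensu2', 'rabbit@sensu3']
print(client_endpoints(SENSU_HOSTS)[0])
# → {'host': 'sensu1.example.com', 'port': 5672}
```

Adding a fourth server then means changing one list, and both the cluster definition and the client configuration follow from it.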

> 3) Your comment about using ASG+ELB was in regards to using that vs HAProxy?

Sorta. ASG+ELB is more than just replacing "haproxy" of course. You
have to define your application around the architecture.
Running an ELB in front is more like replacing haproxy. That will probably be fine.
Running clustered rabbit inside an ASG will be hard. You might want to
seek help from the rabbitmq-users list in this regard.

···

On Thu, Jan 1, 2015 at 4:02 PM, Gonzalo Servat <gservat@gmail.com> wrote:

As for the partitions, I'm not really sure yet what I'll use. I've had a
read of that page but it's not crystal clear yet so I'll probably just have
to experiment.

Cheers
Gonzalo


Sorry, more questions (BTW: are you on freenode → #sensu? Would be good to have a quick chat if you’re around…)

> In my setup, HAProxy runs on each node, and they load balance each-other.
> I’ve mimicked this: http://failshell.io/sensu/high-availability-sensu/
> I use normal ips and DNS round robin.
> If there is an issue with a haproxy connection, the sensu client will
> re-resolve and retry another server in the DNS round robin, which is
> nice.

Oh, I see. So you have a DNS record such as rabbitmq.foo.com with weighted round robin to spread the load around?

I assume you have your Sensu/RabbitMQ nodes in an ASG. Do you not? Do you assign them static hostnames so that you can define them in Hiera?

The issue I haven’t yet solved is that they are in an ASG, and there could potentially be N number of Sensu/RabbitMQ servers. So I’m still unsure about how to cluster them together.

> Ug. Yea rabbitmq isn’t going to respond well to random hostnames.
> When one of the members changes, you will have to reconfigure everyone.
> Having short (non-fqdn) resolvable hostnames is kinda baked into rabbitmq:
> https://www.rabbitmq.com/ec2.html#issues-hostname
> You can override the node name. I don’t know, this is probably the
> hardest part for you if they are random.

Maybe I need to play with RabbitMQ clustering first as I have a few question marks on this.

> The nice part about having the names or ips in a central store
> (dns/hiera) is that you can use that same data to inform the servers
> how to cluster *and* how the clients should connect to the cluster.

Yep, I agree, although the DNS name that the clients will connect to will be a single record that does round-robin.

> Sorta. ASG+ELB is more than just replacing “haproxy” of course. You
> have to define your application around the architecture.
> Running an ELB in front is more like replacing haproxy. That will probably be fine.

Agreed. I think I need to give up the idea of putting Sensu/RabbitMQ in an ASG and just have individual instances that get added to an ELB. That way I can put something in the UserData in each instance to set the hostname, and register a DNS record for it.

> Running clustered rabbit inside an ASG will be hard. You might want to
> seek help from the rabbitmq-users list in this regard.

Yeah, I might give up on the ASG idea and just scale up myself if I feel two nodes don’t cut it. I will just have to update hiera and the CloudFormation template if I decide I want more nodes.

- Gonz

···

On Fri Jan 02 2015 at 12:52:32 PM Kyle Anderson kyle@xkyle.com wrote:

> Sorry, more questions (BTW: are you on freenode -> #sensu? Would be good to
> have a quick chat if you're around...)

I do not. I tag team with a co-worker of mine: "bobtfish" does IRC and
I do the mailing list.
Bug him :)

>> In my setup, HAProxy runs on each node, and they load balance each-other.
>> I've mimicked this: http://failshell.io/sensu/high-availability-sensu/
>> I use normal ips and DNS round robin.
>> If there is an issue with a haproxy connection, the sensu client will
>> re-resolve and retry another server in the DNS round robin, which is
>> nice.
>
> Oh, I see. So you have a DNS record such as rabbitmq.foo.com with weighted
> round robin to spread the load around?

"weighted round robin" sounds fancy. Mine is just normal dns round robin.

> I assume you have your Sensu/RabbitMQ nodes in an ASG. Do you not? Do you
> assign them static hostnames so that you can define them in Hiera?

They are not in an ASG. They are just manually launched and given
static hostnames.

> The issue I haven't yet solved is that they are in an ASG, and there could
> potentially be N number of Sensu/RabbitMQ servers. So I'm still unsure about
> how to cluster them together.

Having them static allows configuration management (puppet) an easy
time. I can just have an array of nodes with stable hostnames, puppet
takes care of the clustering for me.

Puppet was not really designed to run in an ASG environment like that.
If you ask me, rabbitmq also wasn't designed to be run (operated?) in
an ASG-like environment either :(

>> Ug. Yea rabbitmq isn't going to respond well to random hostnames.
>> When one of the members changes, you will have to reconfigure everyone.
>> Having short (non-fqdn) resolvable hostnames is kinda baked into rabbitmq:
>> https://www.rabbitmq.com/ec2.html#issues-hostname
>> You can override the node name. I don't know, this is probably the
>> hardest part for you if they are random.
>
> Maybe I need to play with RabbitMQ clustering first as I have a few question
> marks on this.
>
>> The nice part about having the names or ips in a central store
>> (dns/hiera) is that you can use that same data to inform the servers
>> how to cluster *and* how the clients should connect to the cluster.
>
> Yep, I agree, although the DNS name that the clients will connect to will be
> a single record that does round-robin.

Correct.

···

On Thu, Jan 1, 2015 at 7:30 PM, Gonzalo Servat <gservat@gmail.com> wrote:

On Fri Jan 02 2015 at 12:52:32 PM Kyle Anderson <kyle@xkyle.com> wrote:

Sorta. ASG+ELB is more than just replacing "haproxy" of course. You
have to define your application around the architecture.
Running an ELB in front is more like replacing haproxy. That will probably
be fine.

Agreed. I think I need to give up the idea of putting Sensu/RabbitMQ in an
ASG and just have individual instances that get added to an ELB. That way I
can put something in the UserData in each instance to set the hostname, and
register a DNS record for it.

Running clustered rabbit inside an ASG will be hard. You might want to
seek help from the rabbitmq-users list in this regard.

Yeah, I might give up on the ASG idea and just scale up myself if I feel two
nodes don't cut it. I will just have to update hiera and the
CloudFormation template if I decide I want more nodes.

- Gonz