SAS® Message Broker cluster message queues are out of sync after a node failure


If a node within the SAS® Message Broker cluster (based on Pivotal RabbitMQ) fails for any reason, the message queues might become out of sync. The most common failure is a network partition. RabbitMQ refers to this scenario as "split-brain." 

To circumvent this problem, complete these steps:

  1. In the /opt/sas/viya/config/etc/rabbitmq-server/rabbitmq.config.ssl file on each node, add the following:
     cluster_partition_handling = pause_minority
  2. Restart the sas-viya-rabbitmq-server-default service on all SAS Message Broker/RabbitMQ nodes.
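
Before restarting the service, it can help to confirm that the setting is present on every node. The following is a minimal sketch, not a SAS-supplied tool: it assumes the setting appears as a plain `key = value` line exactly as shown in step 1, and the helper name `partition_handling` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Path taken from step 1; adjust it if your deployment differs.
CONFIG_PATH = Path("/opt/sas/viya/config/etc/rabbitmq-server/rabbitmq.config.ssl")

# Matches an uncommented "cluster_partition_handling = <value>" line.
# Assumption: the file uses the key = value form shown in step 1.
_SETTING = re.compile(
    r"^[ \t]*cluster_partition_handling[ \t]*=[ \t]*(\S+)[ \t]*$",
    re.MULTILINE,
)


def partition_handling(config_text: str) -> Optional[str]:
    """Return the configured partition-handling mode, or None if it is not set."""
    match = _SETTING.search(config_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        mode = partition_handling(CONFIG_PATH.read_text())
        print(f"{CONFIG_PATH}: cluster_partition_handling = {mode}")
    else:
        print(f"{CONFIG_PATH} not found on this host")
```

Run the script on each node. Any node that reports `None` or a value other than `pause_minority` still needs the change from step 1.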

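You should implement this change only on clusters that contain an odd number of nodes. With pause_minority, RabbitMQ pauses every node that finds itself in a partition containing half or fewer of the cluster's nodes, so a cluster with an even number of nodes that splits evenly would pause all of its nodes. SAS recommends that RabbitMQ clusters contain three nodes.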
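
The reason for the odd-node recommendation can be illustrated with the majority rule that pause_minority applies. The sketch below is a simplified model for illustration only, not RabbitMQ code: the function name `surviving_nodes` is hypothetical, and it assumes the cluster is described as a list of partition sizes.

```python
from typing import Sequence


def surviving_nodes(partition_sizes: Sequence[int]) -> int:
    """Count the nodes that keep running under pause_minority.

    A partition keeps running only if it holds a strict majority of all
    cluster nodes; nodes in every other partition are paused.
    """
    total = sum(partition_sizes)
    return sum(size for size in partition_sizes if size > total / 2)


if __name__ == "__main__":
    # Three-node cluster split 2/1: the two-node side keeps running.
    print(surviving_nodes([2, 1]))  # 2
    # Two-node cluster split 1/1: neither side has a majority.
    print(surviving_nodes([1, 1]))  # 0
    # Four-node cluster split 2/2: again no majority, so all nodes pause.
    print(surviving_nodes([2, 2]))  # 0
```

In a cluster with an odd number of nodes, a split into two partitions always leaves one side with a strict majority, so part of the cluster keeps serving messages.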