sas-rabbitmq-server persistent volume runs out of space after upgrading to Stable 2025.12 with RabbitMQ version 4.2.0


In Stable 2025.12, SAS® Viya® upgraded RabbitMQ to version 4.2.0. After the upgrade, the sas-rabbitmq-server pods can consume all of the disk space allocated to the sas-viya-rabbitmq-data-volume-sas-rabbitmq-server-x persistent volume. When that happens, the SAS Viya system can become unstable, and errors might occur when you start new jobs or compute sessions.

Cause

The policies that are applied to the quorum queues used by many of the pods configure a dead-letter exchange, but that exchange is not created. Because the dead-letter exchange does not exist, messages cannot be routed to it. When a message needs to be sent to the exchange, the following warning is logged:

[warning] <0.1138.0> Cannot forward any dead-letter messages from source quorum queue 'sas.application.audit.AuditConsumer.quorum' in vhost '/' because its configured dead-letter-exchange exchange 'sas.dlq' in vhost '/' does not exist. Either create the configured dead-letter-exchange or re-configure the dead-letter-exchange policy for the source quorum queue to prevent dead-lettered messages from piling up in the source quorum queue. This message will not be logged again.
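To see which queues have reported this condition, you can filter the sas-rabbitmq-server container log for this warning. The following shell function is a minimal sketch and is not part of the documented workaround; the name dlx_warning_queues is hypothetical, and the function assumes that the warning text matches the example above. Because the warning states that it is not logged again, an empty result does not prove that the problem is absent (for example, if the pod log no longer contains the first occurrence).

```shell
# Hypothetical helper (not SAS-provided): read RabbitMQ log lines on stdin and
# print each source quorum queue named in a "Cannot forward any dead-letter
# messages" warning, one per line, without duplicates.
# Assumption: the warning text has the same shape as the example in this note.
dlx_warning_queues() {
  grep 'Cannot forward any dead-letter messages' \
    | sed -n "s/.*from source quorum queue '\([^']*\)'.*/\1/p" \
    | sort -u
}

# Example usage against a live pod (replace <NS> with your namespace):
#   kubectl -n <NS> logs sas-rabbitmq-server-0 -c sas-rabbitmq-server | dlx_warning_queues
```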

Because the message cannot complete dead-letter processing, it remains in the queue. RabbitMQ therefore cannot clean up the segment file that contains this message in the /rabbitmq/data directory, which is mounted on the persistent volume claim (PVC). As a result, PVC usage keeps increasing until the volume runs out of space.
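Because the affected messages stay in their quorum queues, listing quorum queues that still hold messages can help show where the backlog is. The following function is a hedged sketch with a hypothetical name (quorum_backlog). It filters the output of rabbitmqctl list_queues name type messages and assumes that the output columns are tab-separated. A nonzero count is only a rough signal, because a healthy queue can also hold messages that are waiting to be consumed.

```shell
# Hypothetical helper (not SAS-provided): read the output of
#   rabbitmqctl list_queues name type messages
# on stdin and print "<queue> <messages>" for each quorum queue that still
# holds messages, largest backlog first. Lines whose third tab-separated
# field is not a number (table headers, "Listing queues ..." banners) are skipped.
quorum_backlog() {
  awk -F'\t' '$2 == "quorum" && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 { print $1, $3 }' \
    | sort -k2,2nr
}

# Example usage (replace <NS> with your namespace):
#   kubectl -n <NS> exec sas-rabbitmq-server-0 -c sas-rabbitmq-server -- \
#     rabbitmqctl list_queues name type messages | quorum_backlog
```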

Use the following command to check the current size of the /rabbitmq/data directory on the persistent volume:

kubectl -n <NS> exec -it sas-rabbitmq-server-0 -- /bin/sh -c 'du -ah /rabbitmq/data | sort -h | tail -1' 
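If your deployment runs several sas-rabbitmq-server pods, you can check all of them in one pass. The following functions are a sketch under stated assumptions: kubectl is configured for the SAS Viya cluster, the pods are named sas-rabbitmq-server-0, sas-rabbitmq-server-1, and so on, and the container name sas-rabbitmq-server is the one used in the exec command later in this note. The function names are hypothetical. The sketch uses du -sh, which reports the directory total that the last line of the du -ah | sort -h | tail -1 command above shows.

```shell
# Print the sas-rabbitmq-server pod names in namespace $1.
# Assumption: pods follow the sas-rabbitmq-server-<N> naming used in this note.
rabbitmq_pods() {
  kubectl -n "$1" get pods -o name \
    | sed -n 's#^pod/\(sas-rabbitmq-server-[0-9][0-9]*\)$#\1#p'
}

# Print "<pod> <size>" for each sas-rabbitmq-server pod in namespace $1.
report_data_usage() {
  local ns="$1" pod size
  for pod in $(rabbitmq_pods "$ns"); do
    # du -sh prints "<size><TAB><path>"; keep only the size field.
    size=$(kubectl -n "$ns" exec "$pod" -c sas-rabbitmq-server -- \
      du -sh /rabbitmq/data | cut -f1)
    echo "$pod $size"
  done
}

# Usage: report_data_usage <NS>
```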

Resolution

To work around this issue, change the policies of the quorum queues so that they no longer specify a dead-letter exchange.

To change the policies, use kubectl exec to open a bash shell in each sas-rabbitmq-server-x pod, and then run the following commands. The example below is for sas-rabbitmq-server-0.

kubectl -n <NS> exec sas-rabbitmq-server-0 -c sas-rabbitmq-server -it -- bash

rabbitmqctl set_policy --priority 1 --apply-to all qq "^sas\." '{"dead-letter-strategy":"at-least-once","expires":432000000,"overflow":"reject-publish","queue-leader-locator":"balanced"}'

rabbitmqctl set_policy --priority 10 --apply-to quorum_queues qq-retry '.\.retry\..' '{"dead-letter-strategy": "at-least-once", "overflow":"reject-publish","queue-leader-locator":"balanced"}'
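As an alternative to opening an interactive shell in each pod, the same two rabbitmqctl set_policy commands can be passed directly to kubectl exec. The following function is a sketch, not a SAS-provided script; the name apply_workaround_policies is hypothetical, and you supply the pod names for your deployment. Because kubectl exec passes the arguments after -- to rabbitmqctl without a shell, the regular expressions and JSON definitions need no extra escaping. The JSON is the same as in the commands above, with whitespace normalized.

```shell
# Hypothetical helper: apply the two workaround policies from this note to each
# pod named after the namespace argument. Returns 1 if any command fails.
apply_workaround_policies() {
  local ns="$1" pod rc=0
  shift
  for pod in "$@"; do
    echo "Setting policies on $pod"
    kubectl -n "$ns" exec "$pod" -c sas-rabbitmq-server -- \
      rabbitmqctl set_policy --priority 1 --apply-to all qq '^sas\.' \
      '{"dead-letter-strategy":"at-least-once","expires":432000000,"overflow":"reject-publish","queue-leader-locator":"balanced"}' \
      || rc=1
    kubectl -n "$ns" exec "$pod" -c sas-rabbitmq-server -- \
      rabbitmqctl set_policy --priority 10 --apply-to quorum_queues qq-retry '.\.retry\..' \
      '{"dead-letter-strategy":"at-least-once","overflow":"reject-publish","queue-leader-locator":"balanced"}' \
      || rc=1
  done
  return "$rc"
}

# Usage (list every sas-rabbitmq-server pod in your deployment):
#   apply_workaround_policies <NS> sas-rabbitmq-server-0 sas-rabbitmq-server-1 sas-rabbitmq-server-2
```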

After you run the commands on all of the sas-rabbitmq-server pods, the storage usage of the PVC should decrease.

Note: To check the size of the /rabbitmq/data directory again, rerun the following command:

kubectl -n <NS> exec -it sas-rabbitmq-server-0 -- /bin/sh -c 'du -ah /rabbitmq/data | sort -h | tail -1'

If you restart the sas-rabbitmq-server pods, you must run the commands again to reset the policies.
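To check whether a restart has brought back a policy that points to a dead-letter exchange, you can inspect the output of rabbitmqctl list_policies. The following function is a hedged sketch with a hypothetical name (policies_need_reapply). It assumes that list_policies prints each policy definition as JSON, so a definition that still configures a dead-letter exchange contains the key "dead-letter-exchange"; the policies set by the workaround commands do not contain that key.

```shell
# Hypothetical helper (not SAS-provided): read "rabbitmqctl list_policies"
# output on stdin, print any row whose definition still contains a
# "dead-letter-exchange" key, and exit 0 if at least one such row exists
# (meaning the set_policy commands should be rerun). Exits 1 otherwise.
policies_need_reapply() {
  grep -F '"dead-letter-exchange"'
}

# Example usage (replace <NS> with your namespace):
#   if kubectl -n <NS> exec sas-rabbitmq-server-0 -c sas-rabbitmq-server -- \
#        rabbitmqctl list_policies | policies_need_reapply; then
#     echo "A dead-letter exchange is configured again; rerun the set_policy commands."
#   fi
```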

A patch that permanently updates the policies is available for this issue. To apply it, run the Update Checker Report and apply the available updates to the sas-rabbitmq-server image.