This SAS KB article explains VMware ESXi 6 best practices for SAS.
Generic Overview of VM Farms
VM farms are thinly provisioned, shared, movable, commodity resources. They are designed for thin applications that run well in shared resource environments. Virtual farms are thin, but SAS is thick: its workloads are resource intensive and perform best with dedicated, statically provisioned resources.
Environment Considerations
- On-premises (on-prem): VMware on-prem is the easiest environment to control and to plan for performance.
- Cloud: VMware in the cloud is highly dependent on shared cloud resources and adds another layer to the virtualization stack.
Resource Provisioning
vSphere Version: SAS always recommends using the latest stable release. (SAS tends to avoid *.0 releases.)
System Tuning – Hypervisor versus Host
On a physical host, many RHEL tunables are set through udev rules, tuned profiles, and similar mechanisms. These cover items such as power management, IO elevators, virtual memory manager (VMM) dirty page handling, OS readahead, and LVM construction. VMware administrators typically prefer to control these items through ESXi at the hypervisor layer.
There are exceptions (such as building host-striped LVM volumes from multiple LUNs inside the VM) where the underlying hardware cannot deliver the required performance with the default hypervisor-set configuration and settings. In these scenarios, do the following:
- Follow the Red Hat Tuning Guide for the suggested Red Hat Linux settings for RHEL 6 and RHEL 7 (for now, RHEL 8 uses the RHEL 7 settings).
- If VMware hypervisor governance sets and enforces these tunables, SAS wants them set as close as possible to the host settings in Optimizing SAS on Red Hat Enterprise Linux (RHEL) 6 & 7.
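Before comparing a guest against the tuning guides, it helps to see what it currently uses for two of the tunables named above: the IO elevator and OS readahead. The following is a minimal Python sketch that only reads the standard Linux sysfs files under /sys/block/<device>/queue; the target values still come from the Red Hat and SAS guides, not from this code. The function name and the sys_root parameter are illustrative.

```python
from pathlib import Path


def read_block_tunables(device: str, sys_root: str = "/sys") -> dict:
    """Report the active IO elevator and readahead size of one block device."""
    queue = Path(sys_root) / "block" / device / "queue"
    # The scheduler file lists the available elevators and marks the active
    # one with square brackets, for example "noop [deadline] cfq".
    # If no entry is bracketed (a device that only reports "none"), return "none".
    choices = (queue / "scheduler").read_text().split()
    active = next((c.strip("[]") for c in choices if c.startswith("[")), "none")
    # read_ahead_kb holds the OS readahead size in KiB.
    read_ahead_kb = int((queue / "read_ahead_kb").read_text().strip())
    return {"device": device, "scheduler": active, "read_ahead_kb": read_ahead_kb}


if __name__ == "__main__":
    for dev in sorted(p.name for p in Path("/sys/block").iterdir()):
        try:
            print(read_block_tunables(dev))
        except OSError as err:
            print(f"{dev}: could not read tunables ({err})")
```

Passing sys_root makes it possible to run the same check against a saved copy of sysfs rather than the live system.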
VM Placement
Here are some details to help you configure VM placement.
- Are the SAS VMs in the same ESXi cluster?
  - Keeping them in the same cluster is optimal.
  - Crossing clusters can cause significant performance degradation.
- Are the SAS virtual machine hosts contained within the same local network subnet?
  - It is best to configure the hosts this way if possible.
  - Note: In a large, commodity VM farm, this configuration might not be feasible.
- NUMA avoidance:
  - Are the CPUs and memory of each individual SAS VM contained within a single socket? If not, NUMA problems can persist despite VMware's attempts to control them.
  - The VMware CPU Hot Add setting can affect NUMA alignment if it adds a CPU outside the current socket. If possible, disable it for SAS VMs.
  - 2- or 4-socket VM placement: 2-socket hosts are easier to tune and carry less risk of NUMA issues. However, most VM farms use larger 4-socket hosts to give vMotion more room to place resources as workloads fluctuate.
- Are there adequate NIC/FC card resources to support each VM when multiple VMs share an underlying physical host?
  - Each VM also needs enough NIC or FC card bandwidth (end to end to external storage) to provide at least 100 MB/sec of IO bandwidth per core.
  - This applies primarily to SAS Grid compute nodes and to the CAS controller for serial-path CAS table loads.
  - For Fibre Channel adapters, SAS recommends dedicated ASICs (no sharing) and PVSCSI adapters with a minimum queue depth of 1024:
    - Have the queue depth and ring page count within the virtual machine been increased? See https://kb.vmware.com/s/article/2053145. The following settings are specific to paravirtual SCSI (PVSCSI) adapters (recommended for SAS usage) and tend to help with queue depth issues on single-LUN VMDKs:
      vmw_pvscsi.cmd_per_lun=254
      vmw_pvscsi.ring_pages=32
    - To make these settings persist after a reboot, see https://kb.vmware.com/s/article/2150431.
  - For network interface cards, SAS recommends VMXNET3 adapters:
    - Configure the network adapter settings for Large Send Offload (LSO) and Large Receive Offload (LRO) in the guest host.
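The PVSCSI values above can be spot-checked from inside a guest. This minimal Python sketch assumes they were applied as kernel boot parameters: it scans a kernel command line (as found in /proc/cmdline) for vmw_pvscsi.<name>=<value> entries and reports which of the two quoted values are missing or different. Settings applied another way, for example as module options under /etc/modprobe.d, do not appear on the command line, so a reported gap may only mean the values were set elsewhere. The function names are illustrative.

```python
from pathlib import Path

# The two vmw_pvscsi values quoted in this article.
RECOMMENDED = {"cmd_per_lun": "254", "ring_pages": "32"}


def pvscsi_boot_params(cmdline: str) -> dict:
    """Collect vmw_pvscsi.<name>=<value> entries from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("vmw_pvscsi.") and "=" in token:
            name, value = token[len("vmw_pvscsi."):].split("=", 1)
            params[name] = value
    return params


def pvscsi_gaps(cmdline: str) -> dict:
    """Map each recommended parameter that is absent or different to its
    current value on the command line (None when it is absent)."""
    found = pvscsi_boot_params(cmdline)
    return {name: found.get(name)
            for name, wanted in RECOMMENDED.items()
            if found.get(name) != wanted}


if __name__ == "__main__":
    gaps = pvscsi_gaps(Path("/proc/cmdline").read_text())
    print(gaps or "vmw_pvscsi boot parameters match the values above")
```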
VM Subscription
Here are some details to help you configure VM subscription.
- Are the VMs statically reserved, or are they moved with vMotion?
  - If vMotion runs during a SAS process, the disruption can be crippling.
  - PETs nodes in SAS Viya cannot be live-migrated with vMotion without causing quorum loss and cluster failure.
- Thin vs. thick provisioning
  - Are the VMs sharing, borrowing, or stealing resources (CPU/memory)?
  - SAS prefers VMs to be thickly provisioned and as static as possible (not moved with vMotion).
SAS Application Roles for VMs & Underlying Physical Host Sharing
Watch the CPU count and the bandwidth required to support IO. For SAS Viya system networks, follow the guidance in the Network section below.
- Network Bandwidth Allocation for each VM
- Storage Bandwidth Allocation for each VM
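The 100 MB/sec/core guideline from VM Placement turns bandwidth allocation into simple arithmetic. This minimal Python sketch totals that demand for the SAS VMs sharing one physical host and compares it with the host's aggregate link rate. It assumes raw line rate in decimal units (10 Gbit/s = 1,250 MB/sec) and ignores protocol overhead, so real headroom is smaller. The VM core counts and link sizes in the example are hypothetical.

```python
MB_PER_SEC_PER_CORE = 100  # minimum IO bandwidth per core quoted in this article


def required_mb_per_sec(cores: int) -> int:
    """Minimum IO bandwidth (MB/sec) for a VM with this many cores."""
    return cores * MB_PER_SEC_PER_CORE


def link_mb_per_sec(gbit: float) -> float:
    """Raw line rate of a link in MB/sec (decimal units, no protocol overhead)."""
    return gbit * 1000 / 8


def host_shortfall(vm_cores: list, host_links_gbit: float) -> float:
    """MB/sec by which co-resident VMs exceed the host's aggregate links.

    Returns 0.0 when the links cover the combined 100 MB/sec/core demand.
    """
    demand = sum(required_mb_per_sec(c) for c in vm_cores)
    return max(0.0, demand - link_mb_per_sec(host_links_gbit))


if __name__ == "__main__":
    # Hypothetical host: three SAS VMs (16, 16, 8 cores) sharing 2 x 10 Gbit links.
    print(required_mb_per_sec(16))            # 1600
    print(host_shortfall([16, 16, 8], 20.0))  # 4000 - 2500 = 1500.0
```

By the same arithmetic, a single 10 Gbit link (about 1,250 MB/sec raw) cannot by itself meet the guideline for one 16-core VM.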
VM Host Backups
Here are some details to help you configure VM host backups.
- Are hypervisor-level backups used?
  - It is recommended not to run hypervisor-level backups during SAS job execution.
  - These backups run over the network at the hypervisor layer and can interfere with network bandwidth and latency.
  - If the disruption is large enough, it can affect the stability of SAS Viya services (such as Consul, RabbitMQ, and PostgreSQL).
  - Schedule these backups so that they do not interfere with SAS jobs.
- Are physical host-level backups used?
  - It is recommended not to run host-level backups during SAS job execution.
  - Schedule these backups so that they do not interfere with SAS jobs.
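A non-interfering schedule comes down to checking that no backup window overlaps a SAS job window. This minimal Python sketch treats each window as a half-open (start, end) interval, so a backup that starts exactly when a job ends does not count as a conflict. How the windows are obtained (from a job scheduler, vCenter, or a backup tool) is outside this sketch, and the dates in the example are hypothetical.

```python
from datetime import datetime


def overlaps(start_a: datetime, end_a: datetime,
             start_b: datetime, end_b: datetime) -> bool:
    """True if the half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def conflicting_backups(backups: list, sas_jobs: list) -> list:
    """Return the (start, end) backup windows that overlap any SAS job window."""
    return [b for b in backups
            if any(overlaps(b[0], b[1], j[0], j[1]) for j in sas_jobs)]


if __name__ == "__main__":
    # Hypothetical schedule for one night.
    jobs = [(datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 2, 2, 0))]
    backups = [
        (datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 3, 0)),  # collides
        (datetime(2024, 1, 2, 3, 0), datetime(2024, 1, 2, 5, 0)),  # clear
    ]
    print(conflicting_backups(backups, jobs))
```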
CPU
CPU ratios from physical hosts to VMs are typically 1 PCPU = 0.85 VCPU. Consider this ratio when you provision VM CPUs. If a shared file system is used, the ratio can drop by another 0.05 (to about 0.80).
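The ratio can be applied as simple arithmetic when sizing a VM. This minimal Python sketch assumes the ratio means each VCPU delivers about 0.85 of a physical core's throughput (about 0.80 with a shared file system), so matching a given number of physical cores takes proportionally more VCPUs, rounded up. That interpretation, and the function and constant names, are assumptions for illustration. Integer percentages are used so that floating-point rounding cannot push an exact result up by one VCPU.

```python
BASE_PERCENT = 85       # assumed: one VCPU delivers ~85% of a physical core
SHARED_FS_PENALTY = 5   # further drop (percentage points) with a shared file system


def vcpus_needed(physical_cores: int, shared_fs: bool = False) -> int:
    """VCPUs to provision so a VM roughly matches `physical_cores` of capacity."""
    percent = BASE_PERCENT - (SHARED_FS_PENALTY if shared_fs else 0)
    # Integer ceiling division: ceil(physical_cores * 100 / percent).
    return -(-physical_cores * 100 // percent)


if __name__ == "__main__":
    print(vcpus_needed(16))                  # 1600 / 85 = 18.8 -> 19
    print(vcpus_needed(16, shared_fs=True))  # 1600 / 80 = 20.0 -> 20
```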
Network
Here are some details to help you configure the network.
- Is the VM cluster using VMXNET3 network adapters? Most do, and SAS highly recommends them.
- SAS requires a minimum network bandwidth of 10 Gbit between VMs (NIC through switch).
- Co-locating VM/host resources on local subnets is best.
- Paravirtual SCSI (PVSCSI) controllers/adapters provide the best throughput and stability.
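From inside a guest, the adapter type and reported link speed can be read from sysfs. This minimal Python sketch reads /sys/class/net/<iface>/speed (in Mbit/s) and the driver name behind the device/driver symlink, then compares the speed with the 10 Gbit minimum. Note that the speed a guest sees for a virtual adapter such as VMXNET3 is a property of the virtual device and does not necessarily reflect the physical uplinks or the switch path, so this only complements checks on the ESXi host and network. The function name and sys_root parameter are illustrative.

```python
import os
from pathlib import Path

MIN_MBIT = 10_000  # the 10 Gbit minimum quoted in this article, in Mbit/s


def nic_report(iface: str, sys_root: str = "/sys") -> dict:
    """Read the reported link speed and driver name of one interface from sysfs."""
    base = Path(sys_root) / "class" / "net" / iface
    try:
        speed = int((base / "speed").read_text().strip())  # Mbit/s
    except (OSError, ValueError):
        speed = -1  # not reported, for example when the link is down
    # device/driver is a symlink whose last path component is the driver name.
    driver_link = base / "device" / "driver"
    driver = (os.path.basename(os.readlink(driver_link))
              if driver_link.is_symlink() else "unknown")
    return {"iface": iface, "speed_mbit": speed, "driver": driver,
            "meets_10gbit": speed >= MIN_MBIT}


if __name__ == "__main__":
    for name in sorted(p.name for p in Path("/sys/class/net").iterdir()):
        print(nic_report(name))
```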
Storage
Here are some details to help you configure storage.
VM Disks
Most modern all-flash storage arrays can support single-LUN VMDKs for SAS file systems. There are many considerations about how to arrange and tune these. When single-LUN VMDKs are not performant enough because of the underlying storage, a fallback is to build host-created striped LVMs that stripe across multiple VMDKs.
- LUNs, logical volumes, file systems
  - Single LUN vs. concatenated LUN vs. host LVM file system mounts
- Underlying storage considerations
  - Flash vs. non-flash storage
  - LUN sizes/amounts/queue depths
  - File system size effects on maintenance/HA
  - Backup/HA considerations for storage type
  - VM disk tuning: thick provisioned/eager zeroed?
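For the host-striped LVM fallback, the key point is one stripe per VMDK. This minimal Python sketch builds (but does not run) an lvcreate command that stripes one logical volume across every listed physical volume, using the standard lvcreate options --type striped, -i (stripe count), -I (stripe size), -l (extents), and -n (name). It assumes the VMDKs are already presented as disks, initialized with pvcreate, and grouped with vgcreate. The 64 KiB stripe size is a placeholder rather than a SAS recommendation, and the volume group, logical volume, and device names are hypothetical.

```python
def striped_lvcreate_cmd(vg: str, lv: str, pvs: list,
                         stripe_kib: int = 64, extents: str = "100%FREE") -> list:
    """Build (but do not run) an lvcreate command that stripes one logical
    volume across every physical volume in `pvs`, one stripe per VMDK."""
    if len(pvs) < 2:
        raise ValueError("a striped LV needs at least two physical volumes")
    return [
        "lvcreate", "--type", "striped",
        "-i", str(len(pvs)),      # number of stripes
        "-I", f"{stripe_kib}k",   # stripe size (placeholder default, not a SAS value)
        "-l", extents,            # extents to allocate, e.g. 100%FREE
        "-n", lv, vg, *pvs,       # listing PVs limits allocation to them
    ]


if __name__ == "__main__":
    # Hypothetical names: volume group vg_saswork built from four VMDK-backed disks.
    cmd = striped_lvcreate_cmd("vg_saswork", "lv_saswork",
                               ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"])
    print(" ".join(cmd))
```

Building the argument list instead of running it leaves room to review the command, or to pass the list to subprocess.run once the stripe size and layout have been confirmed.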
FC vs. NIC vs. host-attached storage
- Card and Switch bandwidth to Storage
- NIC MTU Settings
- Bi-directional IO Flow, Active-Active Ports
- Storage should be in the same data center as the VMs
- Storage Vendor and Model Specifics
LUN queue depths for VMDKs or LVMs can be specific to the underlying storage brand and model. Customers should consult their storage vendor to set them to the maximum supported value.
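The target queue depth comes from the storage vendor, but the current value of each SCSI disk in a Linux guest is exposed in sysfs at /sys/block/<device>/device/queue_depth. This minimal Python sketch collects those values for sd* devices so they can be compared with the vendor's recommendation; it does not change anything. The function name and sys_root parameter are illustrative.

```python
from pathlib import Path


def scsi_queue_depths(sys_root: str = "/sys") -> dict:
    """Map each SCSI disk (sd*) to its current device queue depth."""
    depths = {}
    for dev in sorted((Path(sys_root) / "block").glob("sd*")):
        qd_file = dev / "device" / "queue_depth"
        if qd_file.is_file():
            depths[dev.name] = int(qd_file.read_text().strip())
    return depths


if __name__ == "__main__":
    for name, depth in scsi_queue_depths().items():
        print(f"{name}: queue_depth={depth}")
```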
Dedicated SAS versus Shared Storage
Performance can be affected if customer storage arrays or network storage appliances are shared with non-SAS applications. Noisy neighbors and non-sequential workloads can degrade SAS storage performance.
IO and Storage Tuning Precepts
If you have questions about storage under VMware, contact SAS for help with these settings and subjects:
- Multipathing
- VAAI Primitives
- DiskMAXIO Settings
- Device Queue Lengths
- iSCSI Settings
SAS Viya Specific Considerations (SAS Viya 3.5)
This section provides more information about SAS Viya 3.5.
Network
- SAS Viya requires a network with a minimum bandwidth of 10 Gbit. It also requires low latency.
  - When latency in parts of the PETs microservices stack begins to exceed 500 microseconds per transaction, slowness can occur.
  - When it exceeds 1 millisecond, a heavily loaded system (such as one with heavy RabbitMQ message handling) can become unresponsive and unstable.
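The two latency thresholds above can be applied directly to measured per-transaction latencies. This minimal Python sketch labels a set of samples (in microseconds) as "ok", "warning" (above 500 microseconds), or "critical" (above 1 millisecond) based on the worst sample. How the samples are gathered (application timers, network probes, and so on) is outside this sketch, and the labels are illustrative.

```python
WARN_US = 500       # above this, slowness can occur
CRITICAL_US = 1000  # above 1 ms, a heavily loaded system can become unstable


def classify_latency(samples_us: list) -> str:
    """Classify per-transaction latency samples (in microseconds) by the worst one."""
    if not samples_us:
        raise ValueError("need at least one latency sample")
    worst = max(samples_us)
    if worst > CRITICAL_US:
        return "critical"
    if worst > WARN_US:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(classify_latency([180, 240, 620]))  # warning
```

Using the worst sample is deliberately conservative; a high percentile would be less sensitive to one-off spikes.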
Host IO Bandwidth Notes
- CAS serial path loads
  - Storage-to-CAS-controller traffic uses heavy IO bandwidth, usually over Fibre Channel connections.
  - The CAS controller then spreads the data in parallel to the CAS workers, which places a heavy load on the network.
- CAS DNFS path loads
  - SAS encourages Fibre Channel-attached storage links for CAS DNFS path loads.
  - While these can be single-LUN VMDKs, it is important to watch LUN sizing, tuning, and queue depth arrangements when multiple CAS workers use a single VMDK repository.
- CAS_Disk_Cache
  - CAS_Disk_Cache backing stores are typically provisioned on local host bus-attached storage (flash drives internal to the server).
  - These can be provisioned as multiple high-speed, write-intensive flash drives striped together in a software stripe, or placed behind a hardware RAID controller.
    - SAS does not use parity above RAID 5.
  - Virtual NVMe (vNVMe) set at the hypervisor is an option for the internal CDC or CAS_Controller_Temp flash storage if NVMe is used instead of standard SSD flash.
    - NVMe drives are software striped and cannot be installed in a hardware RAID controller arrangement.
- Host memory utilization
  - CAS tables are in-memory tables, and what used to be SASWORK space is now in memory as well.
  - This can place a significant load on memory when many tables are loaded and operated on with little sharing. CAS was primarily designed for applications that share tables.
  - It is important to provision enough memory and to keep virtual memory tuned for optimal performance.
    - Cgroup management helps with this.
  - Do not use memory ballooning (“balloon memory”) or memory sharing across CAS nodes in VMs.
    - This can result in unanticipated shortages, on-demand retrieval of borrowed memory, and inefficient NUMA access.
  - Memory allocations to VMs should be well sized and static.
- Microservices-specific needs
  - Host stability
    - VM resource sizing and monitoring are crucial.
    - VMs that constantly overrun memory can become unstable. This is especially true for PETs servers.
    - Specific VM issues to watch for are:
      - port queueing (the CAS controller has specific NIC RX/TX increases)
      - IO bandwidth
      - Java heap sizing and tuning
      - buffer sizing (especially for RabbitMQ)
  - NIC tuning
    - Microservices are very latency sensitive: if latency rises too high, quorum and heartbeat issues occur.
    - VM NIC arrangements are important to get right for bandwidth, port assignment, and queueing.
- RabbitMQ-specific needs
  - Sufficient Java heap sizing in servers, low transaction network latency (host to host, host to logging), and good network buffer performance.
  - RabbitMQ is very latency sensitive. A very large system can build up message queues, back up ports, experience port exhaustion, deplete Java heap, and so on.
  - NIC, port, Java, and network performance are important.
  - These should not be watered down in a commodity farm by oversharing or noisy neighbors.
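The port-queueing item above mentions NIC RX/TX increases for the CAS controller. As a starting point, this minimal Python sketch parses the text output of `ethtool -g <iface>` and reports RX/TX rings whose current size is below the driver's pre-set maximum. The parser assumes the common layout (a "Pre-set maximums" block followed by "Current hardware settings") and ignores RX Mini, RX Jumbo, and any extra fields newer ethtool versions print. It does not say what the right ring sizes are; any change (for example with `ethtool -G`) should follow SAS guidance for the CAS controller. The interface name is hypothetical.

```python
import subprocess


def parse_ring_params(text: str) -> dict:
    """Parse `ethtool -g <iface>` output into pre-set maximum and current RX/TX sizes."""
    result = {"max": {}, "current": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            # Keep only the plain RX and TX rows.
            if key in ("RX", "TX") and value.strip().isdigit():
                result[section][key] = int(value.strip())
    return result


def rings_below_max(parsed: dict) -> dict:
    """Return {ring: (current, maximum)} for RX/TX rings set below the maximum."""
    return {k: (parsed["current"][k], parsed["max"][k])
            for k in ("RX", "TX")
            if k in parsed["current"] and k in parsed["max"]
            and parsed["current"][k] < parsed["max"][k]}


if __name__ == "__main__":
    # "ens192" is a hypothetical interface name; ethtool must be installed.
    out = subprocess.run(["ethtool", "-g", "ens192"],
                         capture_output=True, text=True, check=True).stdout
    print(rings_below_max(parse_ring_params(out)))
```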