Best Practices for Rancher Managed vSphere Clusters
This guide outlines a reference architecture for provisioning downstream Rancher clusters in a vSphere environment. It complements the standard vSphere best practices documented by VMware.
- 1. VM Considerations
- 2. Network Considerations
- 3. Storage Considerations
- 4. Backups and Disaster Recovery
Figure: Solution Overview

1. VM Considerations
Leverage VM Templates to Construct the Environment
To keep the deployed virtual machines consistent across the environment, consider using "Golden Images" in the form of VM templates. Packer can be used to build these templates and offers greater customisation options.
Leverage DRS Anti-Affinity Rules (Where Possible) to Separate Downstream Cluster Nodes Across ESXi Hosts
Doing so ensures that node VMs are spread across multiple ESXi hosts, preventing a single point of failure at the host level.
Leverage DRS Anti-Affinity Rules (Where Possible) to Separate Downstream Cluster Nodes Across Datastores
Doing so ensures that node VMs are spread across multiple datastores, preventing a single point of failure at the datastore level.
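As a rough illustration of what these rules are meant to guarantee, the following Python sketch scans an inventory for nodes of the same role that share a failure domain (an ESXi host or a datastore). The inventory format, the node, host, and datastore names, and the `find_colocated` helper are all hypothetical; in a real environment the placement data would have to be exported from vCenter.

```python
from collections import defaultdict


def find_colocated(nodes, domain_key):
    """Return {(role, domain): [node names]} for every failure domain
    that holds more than one node of the same role."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node["role"], node[domain_key])].append(node["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical inventory: three etcd nodes on three hosts but only two datastores.
nodes = [
    {"name": "etcd-1", "role": "etcd", "host": "esxi-a", "datastore": "ds-1"},
    {"name": "etcd-2", "role": "etcd", "host": "esxi-b", "datastore": "ds-1"},
    {"name": "etcd-3", "role": "etcd", "host": "esxi-c", "datastore": "ds-2"},
]

host_violations = find_colocated(nodes, "host")
datastore_violations = find_colocated(nodes, "datastore")

print(host_violations)       # {}
print(datastore_violations)  # {('etcd', 'ds-1'): ['etcd-1', 'etcd-2']}
```

In this made-up inventory the hosts are well separated, but losing `ds-1` would take out two of the three etcd nodes, which is the situation the datastore rule is intended to prevent.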
Configure VMs as Appropriate for Kubernetes
It's important to follow Kubernetes and etcd best practices when deploying your nodes. These include disabling swap, verifying full network connectivity between all machines in the cluster, and using a unique hostname, MAC address, and product_uuid for every node.
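Two of these checks are easy to automate. The sketch below is a minimal illustration, not an official tool: `swap_enabled` interprets the contents of `/proc/swaps`, and `duplicated_values` looks for identifiers shared by more than one node. The helper names and the sample node records are hypothetical; on a Linux node the real values would typically be read from `/proc/swaps`, the network interface, and `/sys/class/dmi/id/product_uuid`.

```python
from collections import Counter


def swap_enabled(proc_swaps_text):
    """True if the text of /proc/swaps lists at least one swap device.
    The first line is a header; every further non-empty line is a device."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) > 1


def duplicated_values(nodes, field):
    """Return the sorted values of `field` that occur on more than one node."""
    counts = Counter(node[field] for node in nodes)
    return sorted(value for value, count in counts.items() if count > 1)


# Sample /proc/swaps contents (illustrative).
header = "Filename\tType\tSize\tUsed\tPriority\n"
print(swap_enabled(header))                                         # False
print(swap_enabled(header + "/dev/sda2 partition 2097148 0 -2\n"))  # True

# Hypothetical node records; node-3 shares a product_uuid with node-1.
nodes = [
    {"hostname": "node-1", "mac": "00:50:56:00:00:01", "product_uuid": "uuid-a"},
    {"hostname": "node-2", "mac": "00:50:56:00:00:02", "product_uuid": "uuid-b"},
    {"hostname": "node-3", "mac": "00:50:56:00:00:03", "product_uuid": "uuid-a"},
]
for field in ("hostname", "mac", "product_uuid"):
    print(field, duplicated_values(nodes, field))
```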
2. Network Considerations
Leverage Low-Latency, High-Bandwidth Connectivity Between etcd Nodes
Deploy etcd members within a single data center where possible to avoid latency overheads and reduce the likelihood of network partitioning. For most setups, 1Gb connections will suffice. For large clusters, 10Gb connections can reduce the time taken to restore from backup.
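To get a feel for why link speed matters during a restore, the following back-of-the-envelope sketch computes the ideal time to move a snapshot across a link. The 8 GB snapshot size is a hypothetical figure chosen for illustration, and the result is a lower bound: it ignores protocol overhead, disk throughput, and the time etcd needs to process the data.

```python
def transfer_seconds(size_bytes, link_gbps):
    """Ideal transfer time in seconds for `size_bytes` over a link of
    `link_gbps` gigabits per second (1 Gb = 10**9 bits)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


snapshot_bytes = 8 * 10**9  # hypothetical 8 GB snapshot

print(transfer_seconds(snapshot_bytes, 1))   # 64.0
print(transfer_seconds(snapshot_bytes, 10))  # 6.4
```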
Consistent IP Addressing for VMs
Each node should have a static IP address configured. If DHCP is used, each node should have a DHCP reservation so that it is always allocated the same IP address.
3. Storage Considerations
Leverage SSD Drives for etcd Nodes
etcd is very sensitive to write latency. Therefore, use SSD disks where possible.
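A quick way to get an impression of a disk's write latency is to time small synchronous writes. The Python sketch below is only a rough probe under stated assumptions: the write count, block size, and helper names are arbitrary choices, and a dedicated benchmarking tool such as fio gives far more reliable numbers. A figure commonly cited from etcd's guidance is a 99th percentile fsync latency below 10 ms.

```python
import math
import os
import tempfile
import time


def fsync_latencies_ms(directory, writes=200, block_size=2048):
    """Write small blocks to a temporary file in `directory`, calling fsync
    after each write, and return the sorted per-write latencies in ms."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"\0" * block_size
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the data to stable storage
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(latencies)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Point this at the disk that will hold the etcd data; the system temp
# directory is used here only so that the example runs anywhere.
latencies = fsync_latencies_ms(tempfile.gettempdir())
print(f"p99 fsync latency: {percentile(latencies, 99):.2f} ms")
```

Note that a temporary directory backed by memory (for example tmpfs) will report unrealistically low latencies, so the target directory matters.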
4. Backups and Disaster Recovery
Perform Regular Downstream Cluster Backups
Kubernetes uses etcd to store all of its data, including configuration, state, and metadata. Backing up this data is crucial for disaster recovery.
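Regular backups are only useful if they keep happening, so it can help to monitor how old the newest snapshot is. The sketch below is one hypothetical way to do that for snapshots stored as files in a directory; the helper names, the snapshot file name, and the 12-hour threshold are assumptions made for illustration and do not reflect Rancher defaults.

```python
import os
import tempfile
import time


def newest_snapshot_age_seconds(snapshot_dir, now=None):
    """Age in seconds of the most recently modified file in `snapshot_dir`,
    or None if the directory contains no files."""
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(snapshot_dir)
        if entry.is_file()
    ]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_fresh(snapshot_dir, max_age_seconds, now=None):
    """True if at least one snapshot exists and the newest is recent enough."""
    age = newest_snapshot_age_seconds(snapshot_dir, now)
    return age is not None and age <= max_age_seconds


MAX_AGE = 12 * 3600  # hypothetical 12-hour backup interval

# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    empty_result = backup_is_fresh(demo_dir, MAX_AGE)
    with open(os.path.join(demo_dir, "snapshot-example.db"), "wb") as f:
        f.write(b"placeholder")
    fresh_result = backup_is_fresh(demo_dir, MAX_AGE)
    # Pretend the check runs two days after the snapshot was written.
    stale_result = backup_is_fresh(demo_dir, MAX_AGE, now=time.time() + 2 * 86400)

print(empty_result, fresh_result, stale_result)  # False True False
```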
Back Up Downstream Node VMs
Incorporate the Rancher downstream node VMs into a standard VM backup policy.