Extending a VMware vSphere Cluster Deployment
This document explains how to extend a baseline VMware vSphere cluster deployment after the minimum single-datacenter workflow is running successfully.
TOC

- Scenarios
- Prerequisites
- Add a second NIC
- Expand from one NIC to two NICs
- Revert from two NICs to one NIC
- Enable multiple datacenters and failure domains
- Add or remove data disks
- Scale out worker nodes
- Verification
- Next Steps

Scenarios
Use this document in the following scenarios:
- You need a second NIC on control plane or worker nodes.
- You want to distribute nodes across multiple datacenters or deployment zones.
- You want to add more data disks.
- You want to scale out the worker pool.
Prerequisites
Before you begin, ensure the following conditions are met:
- The baseline workflow in Creating a VMware vSphere Cluster in the global Cluster completed successfully.
- You validated the extra parameters in Preparing Parameters for a VMware vSphere Cluster.
- You understand which manifests own the network, placement, and disk settings in your deployment.
Add a second NIC
When nodes require an additional management, storage, or service network, extend the manifests in the following resources:
- 02-vsphereresourcepool-control-plane.yaml
- 03-vsphereresourcepool-worker.yaml
- 20-control-plane.yaml
- 30-workers-md-0.yaml
- 04-failure-domains.yaml (if failure domains are enabled)
Add the second NIC to each node slot in the static allocation pools:
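The VSphereResourcePool schema is specific to this deployment, so the following is only a sketch built from the field path used in this document (spec.resources[].network). The slot name, the <network_1>/<network_2> network placeholders, and the first-NIC address placeholder are assumptions; only <master_01_nic2_ip> comes from the checklist.

```yaml
# Fragment of 02-vsphereresourcepool-control-plane.yaml (sketch;
# exact schema depends on your deployment's CRDs)
spec:
  resources:
    - name: master-01                      # hypothetical slot name
      network:
        devices:
          - networkName: <network_1>       # existing first NIC
            ipAddrs:
              - <master_01_nic1_ip>        # assumed placeholder
          - networkName: <network_2>       # new second NIC
            ipAddrs:
              - <master_01_nic2_ip>
```

Repeat the second device entry for every control plane slot, substituting <master_02_nic2_ip> and <master_03_nic2_ip>.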
Apply the same pattern to the worker node slots:
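Under the same assumptions about the VSphereResourcePool schema, the worker slots follow the identical pattern; the slot name and first-NIC placeholder below are hypothetical:

```yaml
# Fragment of 03-vsphereresourcepool-worker.yaml (sketch)
spec:
  resources:
    - name: worker-01                      # hypothetical slot name
      network:
        devices:
          - networkName: <network_1>       # existing first NIC
            ipAddrs:
              - <worker_01_nic1_ip>        # assumed placeholder
          - networkName: <network_2>       # new second NIC
            ipAddrs:
              - <worker_01_nic2_ip>
```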
Add the second NIC to the machine templates:
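The device list follows the upstream CAPV VSphereMachineTemplate schema; the template name and the <network_1>/<network_2> placeholders in this sketch are assumptions:

```yaml
# Fragment of 20-control-plane.yaml (sketch); apply the same change
# to the worker template in 30-workers-md-0.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: <cluster_name>-control-plane     # hypothetical name
spec:
  template:
    spec:
      network:
        devices:
          - networkName: <network_1>     # existing first NIC
            dhcp4: false
          - networkName: <network_2>     # new second NIC
            dhcp4: false
```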
If failure domains are enabled, update the network list in VSphereFailureDomain.spec.topology.networks:
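A sketch of the updated network list, following the upstream CAPV VSphereFailureDomain schema; the <dc_name_1> and <network_1>/<network_2> placeholders are assumptions extrapolated from the checklist:

```yaml
# Fragment of 04-failure-domains.yaml (sketch)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: <fd_name_1>
spec:
  topology:
    datacenter: <dc_name_1>
    computeCluster: <compute_cluster_1>
    networks:
      - <network_1>                      # existing first NIC network
      - <network_2>                      # new second NIC network
```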
When you define the second NIC values, prepare the following placeholders in the checklist and manifests:
- <master_01_nic2_ip>
- <master_02_nic2_ip>
- <master_03_nic2_ip>
- <worker_01_nic2_ip>
- <worker_02_nic2_ip> (when you also expand the worker pool)
When you move between one NIC and two NICs, apply the following rules:
Expand from one NIC to two NICs
Update all of the following fields together:
- VSphereResourcePool.spec.resources[].network
- VSphereMachineTemplate.spec.template.spec.network.devices
- VSphereFailureDomain.spec.topology.networks (when failure domains are enabled)
Revert from two NICs to one NIC
Remove the second NIC block from all of the following fields:
- The second NIC entry in VSphereResourcePool.spec.resources[].network
- The second device entry in VSphereMachineTemplate.spec.template.spec.network.devices
- The second network name in VSphereFailureDomain.spec.topology.networks
Enable multiple datacenters and failure domains
Use multiple datacenters and failure domains when you need node placement across different vCenter datacenters or compute clusters.
The following principles apply:
- One cluster can define multiple VSphereFailureDomain objects.
- Each VSphereDeploymentZone references one VSphereFailureDomain.
- The control plane uses VSphereCluster.spec.failureDomainSelector.
- A worker MachineDeployment uses spec.template.spec.failureDomain when it must target a specific deployment zone.
Prepare the following placeholders for the first datacenter:
- <compute_cluster_1>
- <default_datastore_1>
- <resource_pool_path_1>
- <fd_name_1>
- <dz_name_1>
Prepare the following placeholders for the second datacenter:
- <dc_name_2>
- <fd_name_2>
- <dz_name_2>
- <compute_cluster_2>
- <default_datastore_2>
- <resource_pool_path_2>
If you add a third datacenter, continue with the same placeholder pattern:
- <dc_name_3>
- <fd_name_3>
- <dz_name_3>
- <compute_cluster_3>
- <default_datastore_3>
- <resource_pool_path_3>
Create the failure-domain objects in 04-failure-domains.yaml. The first datacenter also needs a VSphereFailureDomain and VSphereDeploymentZone when failure domains are enabled:
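A sketch of the object pair for the first datacenter, following the upstream CAPV schema. The <dc_name_1>, <network_1>, and <vcenter_server> placeholders and the tag category names are assumptions; adjust them to your environment:

```yaml
# 04-failure-domains.yaml (sketch, first datacenter)
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: <fd_name_1>
spec:
  region:
    name: <dc_name_1>
    type: Datacenter
    tagCategory: k8s-region              # assumed tag category
  zone:
    name: <compute_cluster_1>
    type: ComputeCluster
    tagCategory: k8s-zone                # assumed tag category
  topology:
    datacenter: <dc_name_1>
    computeCluster: <compute_cluster_1>
    datastore: <default_datastore_1>
    networks:
      - <network_1>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: <dz_name_1>
spec:
  server: <vcenter_server>               # assumed placeholder
  failureDomain: <fd_name_1>             # references the object above
  controlPlane: true
  placementConstraint:
    resourcePool: <resource_pool_path_1>
```

Repeat the pair with the _2 (and _3) placeholders for each additional datacenter.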
Enable control plane selection across the available failure domains:
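In upstream CAPV, failureDomainSelector is a label selector over VSphereDeploymentZone objects; the label key and value in this sketch are hypothetical and must match labels you set on your deployment zones:

```yaml
# Fragment (sketch): let the control plane spread across matching zones
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: <cluster_name>
spec:
  failureDomainSelector:
    matchLabels:
      topology.example.com/zone-group: control-plane   # hypothetical label
```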
Set a worker deployment zone when a worker MachineDeployment must be pinned to one deployment target:
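The failureDomain field is part of the standard Cluster API MachineSpec; a minimal fragment, using the placeholder from this document:

```yaml
# Fragment of 30-workers-md-0.yaml (sketch)
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <cluster_name>-md-0              # hypothetical name
spec:
  template:
    spec:
      failureDomain: <worker_failure_domain>   # a VSphereDeploymentZone name
```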
Use a VSphereDeploymentZone name for <worker_failure_domain>, not a VSphereFailureDomain name.
Recommendation: Before you enable multiple datacenters, confirm that the VM template, networks, and datastores are available in every target datacenter.
Before you enable multiple datacenters, also confirm the following prerequisites:
- The template is already synchronized to every target datacenter.
- The network names are resolvable in every target datacenter.
- The datastore names are resolvable in every target datacenter.
- The vSphere CPI datacenter list covers every target datacenter.
Add or remove data disks
The baseline example includes dedicated data disks for control plane nodes and can include dedicated data disks for worker nodes.
In the baseline target scenario, the control plane data disk is part of the minimum deployment set. If the control plane design depends on an additional disk for /var/cpaas or another directory, do not remove it.
Worker data disks are optional and depend on workload requirements.
If a worker node does not need a data disk, remove the persistentDisks section entirely from the corresponding node slot:
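A before/after sketch under the same assumptions about the VSphereResourcePool schema; the slot name and field names inside the slot are hypothetical:

```yaml
# Fragment of 03-vsphereresourcepool-worker.yaml (sketch)
spec:
  resources:
    - name: worker-01                    # hypothetical slot name
      network:
        devices:
          - networkName: <network_1>
            ipAddrs:
              - <worker_01_nic1_ip>      # assumed placeholder
      # persistentDisks:                 <- remove this whole section
      #   - size: 100Gi                     when the node needs no data disk
```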
If no worker node requires a data disk, remove that block from every worker slot in the worker CAPV static allocation pool.
If a node needs multiple data disks, append more entries to the same persistentDisks list:
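A sketch of an extended list; the size field name and values are assumptions about the deployment-specific schema:

```yaml
# Fragment (sketch): one slot with two data disks
persistentDisks:
  - size: 100Gi                          # existing data disk
  - size: 200Gi                          # additional disk appended to the list
```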
Scale out worker nodes
Worker scale-out depends on the relationship between MachineDeployment.spec.replicas and the available node slots in the worker CAPV static allocation pool, VSphereResourcePool.spec.resources[].
Apply the following rules:
- The number of node slots can be greater than replicas.
- Idle slots do not affect a running cluster.
- If replicas exceeds the number of available slots, CAPV cannot assign the new worker nodes correctly.
Use the following order when you scale out workers:
1. Add new worker node slots to 03-vsphereresourcepool-worker.yaml.
2. Increase MachineDeployment.spec.replicas in 30-workers-md-0.yaml.
The following example adds a new worker slot:
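A sketch under the same assumptions about the VSphereResourcePool schema; the slot name and the new address placeholder are hypothetical:

```yaml
# Fragment of 03-vsphereresourcepool-worker.yaml (sketch): new slot appended
spec:
  resources:
    - name: worker-03                    # new slot, hypothetical name
      network:
        devices:
          - networkName: <network_1>
            ipAddrs:
              - <worker_03_nic1_ip>      # assumed placeholder
```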
Then update the worker replicas:
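A minimal fragment; keep replicas at or below the number of worker slots:

```yaml
# Fragment of 30-workers-md-0.yaml (sketch)
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <cluster_name>-md-0              # hypothetical name
spec:
  replicas: 3                            # example value; must not exceed slots
```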
Verification
After each extension, validate the cluster state with the following commands:
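For example, the following kubectl queries cover the resources changed in this document; run the first group against the management cluster and the last command against the workload cluster (the kubeconfig placeholder is an assumption):

```shell
# Inspect cluster-level and machine-level resources in the management cluster
kubectl get cluster,vspherecluster -A
kubectl get machinedeployment,machine -A
kubectl get vspheremachinetemplate,vspherefailuredomain,vspheredeploymentzone -A

# Confirm node state in the workload cluster
kubectl --kubeconfig <workload_kubeconfig> get nodes -o wide
```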
Confirm the following results:
- The new placement, NIC, or disk definitions are reflected in the target resources.
- New worker nodes reach the Ready state.
- Existing nodes remain healthy after the change.
Next Steps
Apply one extension at a time. Validate the result before you combine multiple changes in the same cluster.