Did OpenShift Go Crazy? 💣 Introducing the OpenShift 4- and 5-Node Control Plane Architecture

November 17, 2025

eprieto
Veeam Legend

Boosting Resilience in Bare-Metal Active-Active Clusters: 4- and 5-Node Control Plane Architecture (Version 4.17 and Later)

Organizations running active-active deployments across two locations—especially those hosting stateful workloads like OpenShift Virtualization VMs that run only a single instance—depend heavily on the underlying infrastructure to guarantee availability.
While traditional virtualization platforms handle this natively, running these workloads on OpenShift bare metal introduces new architectural considerations.

The Challenge: What Happens When the Primary Site Fails? ⚠️

In typical stretched OpenShift clusters, the control plane is often deployed in a 2+1 or 1+1+1 topology.
But if the data center hosting the majority of control-plane nodes goes down:

The surviving control-plane node becomes the only source of truth for the cluster.
That single node must switch to read-write mode and act as the exclusive etcd copy.
If that node fails… recovery becomes catastrophic, especially when running stateful VMs.

This risk becomes even more critical in environments leveraging OpenShift Virtualization for production workloads.

The Solution: 4-Node and 5-Node Control Plane for Stretched Clusters 🚀

To increase resiliency during data-center-level failures, OpenShift can leverage 4-node or 5-node control-plane deployments, such as:

2+2
3+2

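With these designs, even if an entire site is lost, the remaining location still retains at least two copies of etcd. When only two members survive, quorum is still lost (a 4- or 5-member cluster needs three), so those copies are read-only until recovery. Recovering from two copies instead of one significantly boosts cluster recoverability, because the loss of one more node no longer destroys the last copy of the cluster state.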
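
The quorum arithmetic behind these topologies can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard etcd majority rule (quorum = floor(n/2) + 1). The function names and the tuple encoding of a topology (one integer per site) are hypothetical and not part of any OpenShift tooling.

```python
def quorum(members: int) -> int:
    """Majority of members that etcd needs to accept writes."""
    return members // 2 + 1


def survivors_after_site_loss(topology: tuple[int, ...], lost_site: int) -> int:
    """Count control-plane nodes left when the site at index lost_site fails."""
    return sum(n for i, n in enumerate(topology) if i != lost_site)


def site_loss_report(topology: tuple[int, ...]) -> list[dict]:
    """For each site, describe what remains if that whole site is lost."""
    needed = quorum(sum(topology))
    report = []
    for site in range(len(topology)):
        left = survivors_after_site_loss(topology, site)
        report.append({
            "lost_site": site,
            "survivors": left,
            "quorum_needed": needed,
            "has_quorum": left >= needed,
        })
    return report


if __name__ == "__main__":
    # Topologies mentioned in the post: per-site control-plane node counts.
    for topo in [(2, 1), (1, 1, 1), (2, 2), (3, 2)]:
        print(topo, site_loss_report(topo))
```

Under these assumptions, the worst-case site loss leaves one survivor in a 2+1 layout and two survivors in 2+2 or 3+2. Two is still below the quorum of three, so the gain is a second surviving copy of the etcd data rather than retained quorum.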

Today, the cluster-etcd-operator already supports up to five etcd members, automatically scaling in environments using MachineSets.
But in bare-metal or agent-based installations, MachineSets are not available—meaning the operator won't scale automatically but will adjust etcd peers when control-plane nodes are added manually.

This is exactly the workflow we aim to validate and officially support.

🔧 Note: This capability is specifically targeted at bare-metal clusters, with a strong focus on OpenShift Virtualization use cases.

Goals 🎯

Validate and support 4-node and 5-node control-plane architectures for bare-metal stretched clusters, under the following constraints:

Bare-metal control-plane nodes
Installed via Assisted Installer or Agent-based Installer
Shared Layer 3 network across locations
Latency < 10 ms between all control-plane nodes
Minimum 10 Gbps bandwidth
etcd stored on SSD or NVMe
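
As a rough pre-flight sketch, the constraints above could be checked against a planned deployment like this. The `ControlPlanePlan` class, its field names, and the installer labels are hypothetical, chosen only for illustration. The thresholds are the ones listed above, and the measured values would have to come from your own tooling.

```python
from dataclasses import dataclass

# Thresholds taken from the constraints listed in the post.
MAX_LATENCY_MS = 10.0
MIN_BANDWIDTH_GBPS = 10.0
ALLOWED_INSTALLERS = {"assisted", "agent-based"}   # hypothetical labels
ALLOWED_ETCD_STORAGE = {"ssd", "nvme"}


@dataclass
class ControlPlanePlan:
    """Hypothetical description of a planned stretched control plane."""
    bare_metal: bool
    installer: str             # "assisted" or "agent-based"
    shared_l3_network: bool
    max_latency_ms: float      # worst latency between any two control-plane nodes
    bandwidth_gbps: float
    etcd_storage: str          # "ssd" or "nvme"


def constraint_violations(plan: ControlPlanePlan) -> list[str]:
    """Return a human-readable list of constraints the plan does not meet."""
    violations = []
    if not plan.bare_metal:
        violations.append("control-plane nodes must be bare metal")
    if plan.installer.lower() not in ALLOWED_INSTALLERS:
        violations.append("must use Assisted Installer or Agent-based Installer")
    if not plan.shared_l3_network:
        violations.append("sites must share a Layer 3 network")
    if plan.max_latency_ms >= MAX_LATENCY_MS:
        # The constraint is strict: latency must be below 10 ms.
        violations.append("latency between control-plane nodes must be < 10 ms")
    if plan.bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        violations.append("bandwidth must be at least 10 Gbps")
    if plan.etcd_storage.lower() not in ALLOWED_ETCD_STORAGE:
        violations.append("etcd must be stored on SSD or NVMe")
    return violations


if __name__ == "__main__":
    plan = ControlPlanePlan(True, "agent-based", True, 4.2, 25.0, "nvme")
    print(constraint_violations(plan) or "all constraints met")
```

An empty list means the plan satisfies every listed constraint. It says nothing about whether the configuration is officially supported.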

Acceptance Criteria ✔️

📌 Performance

Control plane performance and scalability must show less than 10% degradation when compared to standard HA clusters.
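
One way to read the 10% criterion is as a relative difference against a baseline measurement from a standard HA cluster. The sketch below assumes that reading. The function names and the choice of metrics are hypothetical, and the post does not say which benchmarks are used.

```python
def degradation_pct(baseline: float, candidate: float,
                    higher_is_better: bool = True) -> float:
    """Relative degradation of candidate versus baseline, in percent.

    For throughput-style metrics (higher is better) a drop counts as degradation.
    For latency-style metrics (lower is better) an increase counts as degradation.
    A negative result means the candidate performed better than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if higher_is_better:
        return (baseline - candidate) / baseline * 100.0
    return (candidate - baseline) / baseline * 100.0


def meets_criterion(baseline: float, candidate: float,
                    higher_is_better: bool = True,
                    limit_pct: float = 10.0) -> bool:
    """True when degradation stays strictly below the limit (10% by default)."""
    return degradation_pct(baseline, candidate, higher_is_better) < limit_pct


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real cluster.
    print(meets_criterion(1000.0, 950.0))                      # throughput-style
    print(meets_criterion(20.0, 23.0, higher_is_better=False)) # latency-style
```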

📌 Recovery Procedures

Documentation must be validated and updated for manual control-plane recovery in cases of quorum loss.

Chris.Childerhose
Veeam Legend, Veeam Vanguard
November 17, 2025

Nice to see how other hypervisors are handling clustering and recovery. Thanks for sharing this one Esteban.

JailBreak
Veeam Vanguard
November 18, 2025

OpenShift is not a hypervisor per se, but a Kubernetes-based container platform that now runs VMs through OpenShift Virtualization (KubeVirt VMs on KVM). Still, it is good to see that they are improving the product a lot. That is a very good sign for a possible solid and trustworthy enterprise alternative to VMware.

Best Regards, Luciano Patrão

lukas.k
Influencer
November 18, 2025

Nice writeup, thanks for the input!

Does anyone already have experience with the support? Imo there are great products (“Broadcom competitors”) on the market but it often comes down to support - that’s the feedback from my customers.

LK | Enterprise Architect @ Veeam Software | Former Veeam Vanguard | Security Specialist

eprieto
Author
Veeam Legend
November 19, 2025

Red Hat has a lot of experience with Kubernetes and KubeVirt/KVM, so I don't think support will be a problem in this case. For example, here in Latin America, we have support in Spanish, which is highly valued. But it's something the client needs to be aware of when switching from one technology to another.

Esteban - Red Hat Certified Specialist in OpenShift Virtualization -VMCE - VMCAv1 - VMCT - https://estebanprieto.home.blog/

eprieto
Author
Veeam Legend
November 19, 2025

Thanks, Luciano. We certainly strive to make improvements with each OpenShift release and achieve customer satisfaction.

Esteban - Red Hat Certified Specialist in OpenShift Virtualization -VMCE - VMCAv1 - VMCT - https://estebanprieto.home.blog/
