Self-Managing Clusters via the Pivot Pattern

Your management cluster is the single point of failure for your entire fleet. The more clusters you add, the worse it gets. I solved this by making every cluster manage itself — after provisioning, Lattice transfers infrastructure ownership to the workload cluster and the management cluster can be deleted.

In the standard multi-cluster model, a management cluster runs Cluster API and owns the Machine, MachineDeployment, and Cluster resources for every workload cluster. If the management cluster goes down, no workload cluster can scale, upgrade, or replace failed nodes. MachineHealthChecks stop running. Every cluster in your fleet is waiting for a brain that’s not there anymore.

The management cluster's blast radius is proportional to the number of clusters it manages — the more you grow, the more fragile you become.

At 5 workload clusters, this is an uncomfortable risk. At 50, it’s an existential one. And here’s what nobody talks about: upgrading the management cluster is the riskiest operation in your fleet. Every managed cluster depends on it. You know something’s wrong with your architecture when the scariest change is maintaining the thing that’s supposed to maintain everything else.

CAPI provisions workload clusters by creating Machine and MachineDeployment resources on the management cluster. The CAPI controllers that reconcile these resources — scaling nodes, rolling upgrades, replacing unhealthy machines — run on the management cluster. Your workload cluster’s infrastructure is managed from outside the cluster.

The constraint: after provisioning, the CAPI resources and controllers need to move to the workload cluster. But the workload cluster doesn’t have CAPI controllers yet (you need to install them during bootstrapping), and clusterctl move requires direct network connectivity — which conflicts with the outbound-only architecture.

The deeper constraint: the transfer must happen while the cluster is running workloads. No downtime. No gap where neither cluster is managing the infrastructure. And it must be idempotent — if it fails midway, rerunning it picks up where it left off without duplicating resources.

The solution: the distributed pivot protocol

I built a pivot protocol that transfers CAPI resource ownership over the gRPC stream.

```mermaid
flowchart TD
  subgraph "Before Pivot"
    P1[Parent Cluster] -->|"owns"| M1[CAPI Resources]
    M1 -->|"manages"| W1[Workload Cluster]
  end
  subgraph "Pivot"
    P2[Parent] -- "PivotCommand<br/>(serialized resources)" --> A[Agent on Workload]
    A --> I["Import in<br/>topological order"]
  end
  subgraph "After Pivot"
    W2[Workload Cluster] -->|"owns"| M2[CAPI Resources]
    W2 -->|"manages"| W2
  end
```

The sequence:

  1. Parent provisions the workload cluster through standard CAPI. The parent owns all infrastructure resources.
  2. Platform operator is installed on the workload cluster during bootstrapping — including CAPI controllers and the agent component.
  3. Agent establishes an outbound gRPC stream to the parent.
  4. Parent sends a PivotCommand over the stream. The command contains the serialized CAPI resources — Cluster, Machine, MachineDeployment, InfrastructureMachineTemplate — with their full specifications and dependency relationships.
  5. Agent imports the resources locally. It creates them on the workload cluster’s API server in topological order (dependency-free resources first, dependent resources after), remaps UIDs to maintain owner references, and waits for the local CAPI controllers to begin reconciling.
  6. Parent cleans up only after the agent confirms. The parent doesn’t delete its copy of the CAPI resources until the agent sends back an acknowledgment that local CAPI controllers are reconciling and healthy. Until that ack arrives, the parent retains ownership.
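Step 5's ordering requirement can be sketched on its own. This is a simplified model, not the operator's actual import code: resources carry only a name and the names of the objects they depend on via owner references, and the import order is a plain depth-first topological sort so that every owner exists before its dependents.

```go
package main

import "fmt"

// A simplified stand-in for a serialized CAPI resource in a PivotCommand.
// Real resources carry full specs and metadata; the import ordering only
// needs the dependency edges (owner references).
type resource struct {
	name   string
	owners []string // resources this one depends on
}

// importOrder returns resource names in topological order: a resource is
// emitted only after everything it depends on. CAPI owner references form
// a DAG, so this sketch does no cycle detection.
func importOrder(resources []resource) []string {
	byName := make(map[string]resource, len(resources))
	for _, r := range resources {
		byName[r.name] = r
	}
	var order []string
	visited := make(map[string]bool)
	var visit func(name string)
	visit = func(name string) {
		if visited[name] {
			return
		}
		visited[name] = true
		for _, owner := range byName[name].owners {
			visit(owner) // create owners before dependents
		}
		order = append(order, name)
	}
	for _, r := range resources {
		visit(r.name)
	}
	return order
}

func main() {
	// Machines depend on their MachineDeployment, which depends on the Cluster.
	order := importOrder([]resource{
		{name: "machine-a", owners: []string{"md-workers"}},
		{name: "md-workers", owners: []string{"cluster"}},
		{name: "cluster"},
	})
	fmt.Println(order) // prints [cluster md-workers machine-a]
}
```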

What about split brain? If the pivot fails midway — agent crash, network drop, partial import — both clusters have copies of the CAPI resources. But only the parent’s controllers are unpaused and active. The imported resources on the workload side are paused until the full transfer completes. No dual reconciliation, no conflicting scaling decisions.

Rerun the pivot and the idempotent protocol picks up where it left off. Resources that already exist on the workload cluster are skipped. The protocol tracks what’s been transferred and what hasn’t.
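Both guarantees — no dual reconciliation and safe rerun — come down to two small behaviors in the agent's import loop. A hypothetical sketch (the map stands in for the workload API server; function names are illustrative): skip anything that already exists, pause everything created, and unpause only once the full set is present. The paused annotation itself, `cluster.x-k8s.io/paused`, is the standard CAPI mechanism for telling controllers to leave a resource alone.

```go
package main

import "fmt"

// pausedAnnotation is CAPI's annotation for suppressing reconciliation.
// Setting it on imported copies keeps the workload cluster's controllers
// inert until the transfer completes.
const pausedAnnotation = "cluster.x-k8s.io/paused"

type obj struct {
	name        string
	annotations map[string]string
}

// importResources sketches the agent's idempotent import: resources that
// already exist (from a prior partial run) are skipped, and every newly
// created resource arrives paused.
func importResources(desired []string, store map[string]*obj) {
	for _, name := range desired {
		if _, ok := store[name]; ok {
			continue // rerun: already transferred, don't duplicate
		}
		store[name] = &obj{
			name:        name,
			annotations: map[string]string{pausedAnnotation: "true"},
		}
	}
}

// unpauseAll hands control to the local controllers once the parent has
// been acked and the full set is present.
func unpauseAll(store map[string]*obj) {
	for _, o := range store {
		delete(o.annotations, pausedAnnotation)
	}
}

func main() {
	store := map[string]*obj{} // stands in for the workload API server
	desired := []string{"cluster", "md-workers", "machine-a"}

	importResources(desired[:2], store) // first attempt dies midway
	importResources(desired, store)     // rerun picks up the remainder
	fmt.Println(len(store), "resources imported")

	unpauseAll(store) // transfer complete: local CAPI takes over
}
```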

Why not clusterctl move: It requires direct network connectivity from source to target, which breaks outbound-only. It’s a one-shot CLI operation, not reconciliation-based — there’s no retry on partial failure. And it doesn’t handle the bootstrapping problem (CAPI must be running on the target before you move resources to it).

Why gRPC over the agent stream: The stream already exists for cluster communication. Serializing CAPI resources as Protobuf messages over the same authenticated, encrypted channel avoids opening a new protocol or network path. The pivot is just another message type on the existing bidirectional stream. No new ports, no new auth, no new anything.
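"Just another message type" can be made concrete. A hypothetical shape — the real protocol uses Protobuf-generated types over gRPC, so these hand-written structs only stand in for a message with a oneof-style payload — showing how the agent's existing dispatch loop routes a pivot like any other command:

```go
package main

import "fmt"

// AgentMessage mimics a protobuf message with a oneof payload: exactly
// one of the pointer fields is set. Field and type names are illustrative.
type AgentMessage struct {
	Heartbeat *Heartbeat
	Pivot     *PivotCommand
}

type Heartbeat struct{ ClusterName string }

// PivotCommand carries the serialized CAPI objects (Cluster,
// MachineDeployment, Machine, ...) for the agent to apply locally.
type PivotCommand struct {
	Resources [][]byte
}

// handle is the agent's dispatch: the pivot is one more case in the
// switch, on the same authenticated stream as everything else.
func handle(msg AgentMessage) string {
	switch {
	case msg.Pivot != nil:
		return fmt.Sprintf("pivot: importing %d resources", len(msg.Pivot.Resources))
	case msg.Heartbeat != nil:
		return "heartbeat from " + msg.Heartbeat.ClusterName
	default:
		return "unknown message"
	}
}

func main() {
	msg := AgentMessage{Pivot: &PivotCommand{Resources: make([][]byte, 4)}}
	fmt.Println(handle(msg)) // prints pivot: importing 4 resources
}
```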

After pivot, the workload cluster is fully independent:

  • Scaling: Change the MachineDeployment replica count locally. Local CAPI provisions or decommissions nodes.
  • Upgrades: Update the Kubernetes version or machine image in the local MachineDeployment. Local CAPI does rolling replacement.
  • Self-healing: MachineHealthCheck runs locally, detects unhealthy nodes, triggers replacement.
  • Parent deletion: The parent can be destroyed. The workload cluster continues operating.
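What local scaling looks like in practice: an ordinary edit to an object that now lives on the workload cluster itself. A minimal config fragment (resource names and the replica numbers are hypothetical, and required fields like the selector and template are elided):

```yaml
# Lives on the workload cluster after the pivot — no parent involved.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: md-workers
spec:
  clusterName: prod-east   # hypothetical cluster name
  replicas: 5              # bumped from 3; local CAPI provisions two nodes
```

Apply this against the local API server and the cluster's own CAPI controllers reconcile it.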

That last one is the test. Not “does it look self-managing in a diagram.” Does it actually survive the parent being deleted? If yes, it’s self-managing. If no, you just have a distributed management cluster with extra steps.

The management cluster was a single pane of glass — and a single point of failure. After pivot, you don’t have a single pane of glass anymore. You have 50 independent clusters, each managing its own infrastructure.

“But how do I see what’s happening across 50 clusters?”

Fair question. The answer is that the pivot separates the control plane (now local to each cluster) from the observation plane (still global). The outbound gRPC stream that each cluster maintains to the parent carries telemetry, health status, and API proxy access. The parent can still kubectl get nodes on any workload cluster — it just can’t manage the nodes if its own CAPI controllers are down. Visibility survives. Management is local.

You still need something — Flux, ArgoCD, a CI pipeline, or the parent’s gRPC stream — to deliver configuration changes to the fleet. Rolling out a Kubernetes upgrade across 30 clusters, distributing updated Cedar policies, deploying new platform operator versions — that’s operational coordination, and it still needs a channel. The difference: if that coordination channel goes down, clusters keep running. They just stop receiving updates until the channel recovers. That’s degradation, not failure.

And in the worst case — coordination is down, the parent is gone, Flux isn’t syncing — you can still kubectl directly into any self-managing cluster and operate it. Scale nodes, roll an upgrade, replace a failed machine. The CAPI resources are local. The cluster doesn’t need to phone home to do infrastructure work. That’s the break-glass path, and it works precisely because the pivot moved ownership to the cluster itself.

I accept this tradeoff. Not reluctantly — deliberately. The alternative is a management cluster whose failure stops every cluster from scaling, upgrading, or self-healing. I’ll take 50 independent clusters with stale config over one dead brain and a frozen fleet.

  • Hub-and-spoke management is a liability at scale. The management cluster is the riskiest thing in your infrastructure. The pivot eliminates it as a dependency.
  • The pivot is split-brain-safe. Parent retains ownership until the agent confirms. Paused resources prevent dual reconciliation. Idempotent retry on failure.
  • The test is simple: delete the parent. If the workload cluster keeps running, the pivot worked. Everything else is just claims.