This page explains how to add and remove CockroachDB nodes on Kubernetes.
The CockroachDB operator is in Preview.
Add nodes
Before scaling up CockroachDB, note the following topology recommendations:
- Each CockroachDB node (running in its own pod) should run on a separate Kubernetes worker node.
- Each availability zone should have the same number of CockroachDB nodes.
If your cluster has 3 CockroachDB nodes distributed across 3 availability zones (as in our deployment example), Cockroach Labs recommends scaling up by a multiple of 3 to retain an even distribution of nodes. You should therefore scale up to a minimum of 6 CockroachDB nodes, with 2 nodes in each zone.
Run
kubectl get nodes
to list the worker nodes in your Kubernetes cluster. There should be at least as many worker nodes as pods you plan to add; this ensures that no more than one pod is placed on each worker node. If you need to add worker nodes, resize your cluster by specifying the desired number of worker nodes in each zone. Using Google Kubernetes Engine as an example:
gcloud container clusters resize {cluster-name} --region {region-name} --num-nodes 2
This example distributes 2 worker nodes across the default 3 zones, raising the total to 6 worker nodes.
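To confirm how the worker nodes are distributed, you can list them together with their zone labels. This sketch assumes your nodes carry the standard topology.kubernetes.io/zone label, which most cloud providers apply automatically:
# -L adds a column showing each node's zone label
kubectl get nodes -L topology.kubernetes.io/zone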
Update
cockroachdb.crdbCluster.regions.code.nodes
in the values file used to deploy the cluster, with the target size of the CockroachDB cluster in the specified region. This value refers to the number of CockroachDB nodes, each running in one pod:
cockroachdb:
  crdbCluster:
    regions:
      - code: us-central1
        cloudProvider: gcp
        domain: cluster.domain.us-central
        nodes: 6
Apply the new settings to the cluster:
helm upgrade --reuse-values $CRDBCLUSTER ./cockroachdb-parent/charts/cockroachdb --values ./cockroachdb-parent/charts/cockroachdb/values.yaml -n $NAMESPACE
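To confirm that Helm recorded the new node count, you can inspect the release's user-supplied values; this is a standard Helm command with no chart-specific assumptions:
# Prints the values currently applied to the release
helm get values $CRDBCLUSTER -n $NAMESPACE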
Verify that the new pods were successfully started:
kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cockroach-operator-655fbf7847-zn9v8   1/1     Running   0          30m
cockroachdb-0                         1/1     Running   0          24m
cockroachdb-1                         1/1     Running   0          24m
cockroachdb-2                         1/1     Running   0          24m
cockroachdb-3                         1/1     Running   0          30s
cockroachdb-4                         1/1     Running   0          30s
cockroachdb-5                         1/1     Running   0          30s
Each pod should be running on one of the 6 worker nodes.
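To check the pod-to-node placement directly, use the wide output format, which adds a NODE column showing where each pod is scheduled:
kubectl get pods -o wide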
Remove nodes
If your nodes are distributed across 3 availability zones (as in our deployment example), Cockroach Labs recommends scaling down by a multiple of 3 to retain an even distribution. If your cluster has 6 CockroachDB nodes, you should therefore scale down to 3, with 1 node in each zone.
Do not scale down to fewer than 3 nodes; this is considered an anti-pattern in CockroachDB and will cause errors. Before scaling down CockroachDB, note that each availability zone should have the same number of CockroachDB nodes.
Update
cockroachdb.crdbCluster.regions.code.nodes
in the values file used to deploy the cluster, with the target size of the CockroachDB cluster. For instance, to scale a cluster in Google Cloud down to 3 nodes:
cockroachdb:
  crdbCluster:
    regions:
      - code: us-central1
        cloudProvider: gcp
        domain: cluster.domain.us-central
        nodes: 3
Apply the new settings to the cluster:
helm upgrade --reuse-values $CRDBCLUSTER ./cockroachdb-parent/charts/cockroachdb --values ./cockroachdb-parent/charts/cockroachdb/values.yaml -n $NAMESPACE
Verify that the pods were successfully removed:
kubectl get pods
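You can also confirm from the database side that the remaining nodes are healthy and the removed nodes were decommissioned cleanly by running the cockroach CLI inside one of the surviving pods. This is a sketch that assumes a secure cluster with certificates mounted at /cockroach/cockroach-certs; adjust the pod name and certs path for your deployment:
# /cockroach/cockroach-certs is an assumed certs path; adjust for your deployment
kubectl exec -it cockroachdb-0 -- ./cockroach node status --decommission --certs-dir=/cockroach/cockroach-certs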
Decommission nodes
When a Kubernetes node is scheduled for removal or maintenance, the CockroachDB operator can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline.
Once annotated, the Kubernetes node is cordoned so no further pods are scheduled on the node. The annotation is not a mark for future removal, as CockroachDB is decommissioned on the node immediately.
If cluster capacity is limited, replacement pods may remain in the Pending state until new nodes are available.
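To list any replacement pods that are stuck waiting for capacity, you can filter on the Pending phase; this is a plain kubectl query with no operator-specific assumptions:
kubectl get pods --field-selector=status.phase=Pending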
The following prerequisites are necessary for the CockroachDB operator to be able to decommission a CockroachDB node:
The
enable-k8s-node-controller=true
flag must be set in the operator's values.yaml file, for example:
containers:
  - name: cockroach-operator
    image: /:
    args:
      - "-enable-k8s-node-controller=true"
At least one replica of the operator must be running on a node other than the target node.
There must be no under-replicated ranges on the CockroachDB cluster.
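One way to check the last prerequisite is to read the ranges_underreplicated metric from a CockroachDB node's Prometheus endpoint. This is a sketch, not part of the operator's workflow; it assumes the default HTTP port (8080), a pod named cockroachdb-0, and that curl is available in the pod:
# Assumes default HTTP port 8080 and curl in the image; a value of 0 means no under-replicated ranges
kubectl exec cockroachdb-0 -- curl -ks https://localhost:8080/_status/vars | grep ranges_underreplicated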
To mark a node for decommissioning, follow these steps:
Identify the name of the Kubernetes node that is to be removed.
Annotate the Kubernetes node with
crdb.cockroachlabs.com/decommission="true"
. The decommissioning process begins immediately after this annotation is applied. Using kubectl, for example:
kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
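Before moving on, you can read the annotation back to confirm it was applied; this is a generic kubectl sketch using a JSONPath query:
# Dots in the annotation key are escaped with backslashes for JSONPath
kubectl get node {example-node-name} -o jsonpath='{.metadata.annotations.crdb\.cockroachlabs\.com/decommission}'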
Monitor the cluster:
- Confirm the decommissioned node's cordoned status:
kubectl describe node {example-node-name}
- Monitor operator events and logs for decommission start and completion messages:
kubectl logs {operator-pod-name}
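Recent cluster events can also surface cordon and pod eviction activity alongside the operator logs; this is a generic kubectl sketch, sorted by timestamp:
kubectl get events -n $NAMESPACE --sort-by=.lastTimestamp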
If the replacement pods remain in a Pending state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
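To find out why a specific pod cannot be scheduled, describe it and check the Events section for scheduler messages; {pending-pod-name} is a placeholder for the stuck pod's name:
kubectl describe pod {pending-pod-name}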