Deploy a CockroachDB cluster in Amazon EKS by using Terraform - AWS Prescriptive Guidance

Deploy a CockroachDB cluster in Amazon EKS by using Terraform

Sandip Gangapadhyay and Kalyan Senthilnathan, Amazon Web Services

Summary

This pattern provides a HashiCorp Terraform module for deploying a multi-node CockroachDB cluster on Amazon Elastic Kubernetes Service (Amazon EKS) by using the CockroachDB operator. CockroachDB is a distributed SQL database that provides automatic horizontal sharding, high availability, and consistent performance across geographically distributed clusters. This pattern uses Amazon EKS as the managed Kubernetes platform and implements cert-manager for TLS-secured node communication. It also uses a Network Load Balancer for traffic distribution and creates CockroachDB StatefulSets with pods that automatically replicate data for fault tolerance and performance.

Intended audience

To implement this pattern, we recommend that you have familiarity with the following:

  • HashiCorp Terraform concepts and infrastructure as code (IaC) practices

  • AWS services, particularly Amazon EKS

  • Kubernetes fundamentals, including StatefulSets, operators, and service configurations

  • Distributed SQL databases

  • Security concepts, such as TLS certificate management

  • DevOps practices, CI/CD workflows, and infrastructure automation

Prerequisites and limitations

Prerequisites

Limitations

  • The CockroachDB Kubernetes operator does not support multiple Kubernetes clusters for multi-Region deployments. For more information about limitations, see Orchestrate CockroachDB Across Multiple Kubernetes Clusters (CockroachDB documentation) and CockroachDB Kubernetes Operator (GitHub).

  • Automatic pruning of persistent volume claims (PVCs) is currently disabled by default. This means that after decommissioning and removing a node, the operator will not remove the persistent volume that was mounted to its pod. For more information, see Automatic PVC pruning in the CockroachDB documentation.

Product versions

  • CockroachDB version 22.2.2

Architecture

Target architecture

The following diagram shows a highly available CockroachDB deployment across three AWS Availability Zones within a virtual private cloud (VPC). The CockroachDB pods are managed through Amazon EKS. The architecture illustrates how users access the database through a Network Load Balancer, which distributes traffic to the CockroachDB pods. The pods run on Amazon Elastic Compute Cloud (Amazon EC2) instances in each Availability Zone, which provides resilience and fault tolerance.


Resources created

Deploying the Terraform module used in this pattern creates the following resources:

  1. Network Load Balancer – This resource serves as the entry point for client requests and evenly distributes traffic across the CockroachDB instances.

  2. CockroachDB StatefulSet – The StatefulSet defines the desired state of the CockroachDB deployment within the Amazon EKS cluster. It manages the ordered deployment, scaling, and updates of CockroachDB pods.

  3. CockroachDB pods – These pods are instances of CockroachDB running as containers within Kubernetes pods. These pods store and manage the data across the distributed cluster.

  4. CockroachDB database – This is the distributed database that is managed by CockroachDB, spanning multiple pods. It replicates data for high availability, fault tolerance, and performance.

Tools

AWS services

Other tools

  • HashiCorp Terraform is an infrastructure as code (IaC) tool that helps you use code to provision and manage cloud infrastructure and resources.

  • kubectl is a command-line interface that helps you run commands against Kubernetes clusters.

Code repository

The code for this pattern is available in the GitHub Deploy a CockroachDB cluster in Amazon EKS using Terraform repository. The code repository contains the following files and folders for Terraform:

  • modules folder – This folder contains the Terraform module for CockroachDB.

  • main folder – This folder contains the root module that calls the CockroachDB child module to create the CockroachDB database cluster.
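The relationship between the two folders can be sketched as follows. The module source path, module name, and variable names (region, eks_cluster_name, number_of_nodes) are assumptions based on the repository layout and the variables described later in this pattern; check main/main.tf in the repository for the actual definitions.

```hcl
# main/main.tf (sketch) — the root module passes cluster-level settings
# down to the CockroachDB child module. All names are illustrative.
module "cockroachdb" {
  source = "../modules/cockroachdb" # hypothetical path to the child module

  region           = var.region           # target AWS Region
  eks_cluster_name = var.eks_cluster_name # target Amazon EKS cluster
  number_of_nodes  = var.number_of_nodes  # CockroachDB node count (3 or more)
}
```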

Best practices

  • Do not scale down to fewer than three nodes. This is considered an anti-pattern in CockroachDB and can cause errors. For more information, see Cluster Scaling in the CockroachDB documentation.

  • Implement Amazon EKS autoscaling by using Karpenter or Cluster Autoscaler. This allows the CockroachDB cluster to scale horizontally as new Amazon EKS nodes are provisioned automatically. For more information, see Scale cluster compute with Karpenter and Cluster Autoscaler in the Amazon EKS documentation.

    Note

    Due to the podAntiAffinity Kubernetes scheduling rule, only one CockroachDB pod can be scheduled on each Amazon EKS node.

  • For Amazon EKS security best practices, see Best Practices for Security in the Amazon EKS documentation.

  • For SQL performance best practices for CockroachDB, see SQL Performance Best Practices in the CockroachDB documentation.

  • For more information about setting up an Amazon Simple Storage Service (Amazon S3) remote backend for the Terraform state file, see Amazon S3 in the Terraform documentation.
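For the last point, a remote backend block looks like the following sketch. The bucket name, state file key, Region, and DynamoDB table are placeholders that you would replace with your own resources.

```hcl
# Sketch of an Amazon S3 remote backend for the Terraform state file.
# Bucket, key, region, and DynamoDB table values are placeholders.
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"          # pre-created S3 bucket
    key            = "crdb-cluster-eks/terraform.tfstate" # state file path
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"               # optional state locking
    encrypt        = true
  }
}
```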

Epics

Task | Description | Skills required

Clone the code repository.

Enter the following command to clone the repository:

git clone https://github.com/aws-samples/crdb-cluster-eks-terraform.git
DevOps engineer, Git

Update the Terraform variables.

  1. Enter the following command to navigate into the main folder in the cloned repository:

    cd crdb-cluster-eks-terraform/main
  2. Open the variable.tf file.

  3. Configure the default value for the following variables:

    • region – Enter the target AWS Region

    • eks_cluster_name – Enter the name for the target Amazon EKS cluster

    • number_of_nodes – Enter the number of nodes to deploy

  4. Save and close the variable.tf file.

DevOps engineer, Terraform
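The variables in the task above might look like the following sketch. The variable names come from the steps; the descriptions and default values are illustrative only, so check the actual variable.tf file in the repository.

```hcl
# main/variable.tf (sketch) — defaults shown are examples only.
variable "region" {
  description = "Target AWS Region"
  type        = string
  default     = "us-east-1"
}

variable "eks_cluster_name" {
  description = "Name of the target Amazon EKS cluster"
  type        = string
  default     = "crdb-eks-cluster"
}

variable "number_of_nodes" {
  description = "Number of CockroachDB nodes to deploy (three or more)"
  type        = number
  default     = 3
}
```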
Task | Description | Skills required

Deploy the infrastructure.

  1. Enter the following command to initialize the Terraform deployment:

    terraform init
  2. Enter the following command to generate an execution plan:

    terraform plan
  3. Review the plan, and validate the resources and infrastructure components that will be created.

  4. Enter the following command to deploy the infrastructure:

    terraform apply
  5. When prompted, enter yes to confirm the deployment.

  6. Wait until the deployment is completed.

DevOps engineer, Terraform
Task | Description | Skills required

Verify resource creation.

  1. Enter the following command to set the Amazon EKS context using the AWS CLI:

    aws eks update-kubeconfig --region <region> --name <eks_cluster_name>
  2. Enter the following command to verify the number of pods that are using CockroachDB:

    kubectl get pods -n <namespace>

    The following is a sample output.

    NAME                                  READY   STATUS    RESTARTS   AGE
    cockroach-operator-655fbf7847-zn9v8   1/1     Running   0          30m
    cockroachdb-0                         1/1     Running   0          24m
    cockroachdb-1                         1/1     Running   0          24m
    cockroachdb-2                         1/1     Running   0          24m
  3. Verify that the number of pods matches the value that you defined in the variable.tf file.

DevOps engineer
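To compare the pod count against number_of_nodes without reading the output by eye, you can count the pods whose names start with the cockroachdb- prefix (an assumption based on the sample output above, which excludes the operator pod). The following sketch runs the filter against that sample output; in practice you would pipe in `kubectl get pods -n <namespace> --no-headers` instead of the printf lines.

```shell
# Count data pods (names beginning with "cockroachdb-"); the operator pod
# does not match. Replace the printf lines with real kubectl output.
printf '%s\n' \
  'cockroach-operator-655fbf7847-zn9v8   1/1   Running   0   30m' \
  'cockroachdb-0                         1/1   Running   0   24m' \
  'cockroachdb-1                         1/1   Running   0   24m' \
  'cockroachdb-2                         1/1   Running   0   24m' \
  | grep -c '^cockroachdb-'
# prints 3
```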

(Optional) Scale up or down.

  1. In the variable.tf file, increase or decrease the number of nodes and then save the file.

  2. Repeat the steps to deploy the infrastructure through Terraform. Terraform adds or removes pods.

  3. Repeat the steps to verify the number of pods that are using CockroachDB. For example, if you increased the number of nodes from three to four, you should now see four pods running.

DevOps engineer, Terraform
Task | Description | Skills required

Delete the infrastructure.

Scaling nodes to 0 will reduce compute costs. However, you will still incur charges for the persistent Amazon EBS volumes that were created by this module. To eliminate storage costs, follow these steps to delete all volumes:

  1. Enter the following command to delete the infrastructure:

    terraform destroy
  2. When prompted, enter yes to confirm.

Terraform

Troubleshooting

IssueSolution

Error validating provider credentials

When you run the Terraform apply or destroy command, you might encounter the following error:

Error: configuring Terraform AWS Provider: error validating provider  credentials: error calling sts:GetCallerIdentity: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: 123456a9-fbc1-40ed-b8d8-513d0133ba7f, api error InvalidClientTokenId: The security token included in the request is invalid.

This error is caused by the expiration of the security token for the credentials used in your local machine’s configuration. For instructions on how to resolve the error, see Set and view configuration settings in the AWS CLI documentation.

CockroachDB pods in pending state

  1. Due to the podAntiAffinity Kubernetes scheduling rule, only one CockroachDB pod can be scheduled on each Amazon EKS node. If the number of CockroachDB pods exceeds the number of available Amazon EKS nodes, the CockroachDB pods might remain in a pending state. In that case, implement Cluster Autoscaler or Karpenter so that the Amazon EKS nodes scale automatically. For more information, see Scale cluster compute with Karpenter and Cluster Autoscaler.

  2. Check if the Kubernetes worker nodes have node=cockroachdb labels by entering the following command:

    kubectl get nodes --show-labels

    If the labels are missing, make sure that all of the worker nodes are correctly labeled. For more information, see Edit a node group configuration in the Amazon EKS documentation.
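The scheduling constraints behind both checks can be sketched in pod-spec terms. The label keys and values below (node=cockroachdb on the worker nodes, app: cockroachdb on the pods) follow the troubleshooting text above but are assumptions, so verify them against the manifests that the operator actually generates.

```yaml
# Sketch of the scheduling constraints described above (illustrative labels).
spec:
  # Schedule only onto worker nodes labeled node=cockroachdb (check 2).
  nodeSelector:
    node: cockroachdb
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: cockroachdb             # assumed pod label
          topologyKey: kubernetes.io/hostname # at most one pod per node
```

Because the anti-affinity rule is required rather than preferred, the scheduler leaves any extra pods pending until a new labeled node appears, which is why autoscaling resolves check 1.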

Related resources

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip