This blog post has been updated from an earlier one to use new features that make deploying on your own system easier.
Vertica released the VerticaDB operator in August 2021, beginning Vertica’s integration with Kubernetes. The operator automates many Vertica administrator tasks, such as restarting Vertica if any of the nodes go down, upgrading Vertica to a new version while keeping the database online, and integrating with the Kubernetes HorizontalPodAutoscaler.
This blog post shows you how easy it is to get Vertica up and running inside Kubernetes.
You don’t need a full understanding of Kubernetes to follow this blog post. It doesn’t go into detail about Kubernetes concepts, but plenty of great resources available online cover them.
Kubernetes Setup
Many developers assume that Kubernetes must run on a large multi-node cluster. For testing purposes, however, Kubernetes is small enough to run on your own computer, and that is what we do in this tutorial.
There are a few ways to run Kubernetes locally. This tutorial uses kind, which stands for “Kubernetes IN Docker”.
In addition to kind, you must install the following:
- Docker
- kubectl
- Helm
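Before you continue, you can check which of these tools are already on your PATH. This is a minimal sketch: the function name `missing_tools` is hypothetical, and it only checks that each command exists, not which version is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kind, Docker, kubectl, or Helm):
# print the name of each tool that is not on PATH and return 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Running `missing_tools kind docker kubectl helm` prints the name of each tool you still need to install and returns a nonzero status if anything is missing.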
First, download the kind binary. If you are running Linux, you can copy and paste the commands below to download it. For other operating systems, follow the instructions at the kind quick start page.
$ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-linux-amd64
$ chmod +x ./kind
$ sudo mv ./kind /usr/local/bin
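On a different platform, typically only the file name at the end of the download URL changes. The sketch below maps `uname` output to that file name. It assumes that kind publishes binaries named `kind-<os>-<arch>`, following the `kind-linux-amd64` pattern above, for amd64 and arm64 on Linux and macOS; the kind quick start page is the authoritative list. The function name `kind_asset` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` and `uname -m` output to a kind binary name.
# Assumption: release files follow the kind-<os>-<arch> naming of kind-linux-amd64.
kind_asset() {
  local os arch
  case "$1" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  echo "kind-${os}-${arch}"
}
```

For example, `curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.14.0/$(kind_asset "$(uname -s)" "$(uname -m)")"` downloads the binary for the current machine, after which the `chmod` and `mv` steps are the same.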
The next tool is kubectl, the CLI that you use to talk to a Kubernetes cluster. If you are running Linux, you can copy the commands below. For other operating systems, see the kubectl download page.
$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
The last tool you need is Helm, a package manager for Kubernetes, which we use to install the operator. You can download and run its installer with these commands:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
After you install these tools, you can create the Kubernetes cluster. First, create a config file, either by pasting the command below into your shell or by using your favorite editor:
$ cat << EOF > kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 32001
    hostPort: 32001
EOF
After you create the config, you can create the cluster:
$ kind create cluster --config kind.yaml
Creating cluster "kind" …
✓ Ensuring node image (kindest/node:v1.24.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
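Before installing anything into the cluster, you can confirm that it responds and that its node is Ready. This sketch assumes the default cluster name, `kind`, for which kind creates a kubectl context named `kind-kind`; the helper name `wait_for_kind` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the kind cluster answers and wait for its nodes.
# Assumes kind's default cluster name "kind", whose kubectl context is "kind-kind".
wait_for_kind() {
  local context="${1:-kind-kind}" timeout="${2:-120s}"
  # Fails fast if the API server is unreachable through this context.
  kubectl cluster-info --context "$context" >/dev/null || return 1
  # Blocks until every node reports the Ready condition, or the timeout expires.
  kubectl wait --context "$context" --for=condition=Ready nodes --all \
    --timeout="$timeout"
}
```

Run `wait_for_kind` once `kind create cluster` returns; a nonzero exit status means the cluster is not usable yet.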
Deploy the Operator
There are two official ways to install the operator: with Helm, which this tutorial uses, or with the Operator Lifecycle Manager (OLM). For additional details, see the official documentation.
Run this command to install using the official Helm chart:
$ helm repo add vertica-charts https://vertica.github.io/charts
$ helm repo update
$ helm install vdb-op --wait --namespace my-verticadb-operator --create-namespace vertica-charts/verticadb-operator
This installs the operator in the my-verticadb-operator namespace, creating the namespace if it does not already exist. Because of the --wait flag, the command does not return until the pod running the operator is ready.
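If you want to double-check the installation, the sketch below verifies that the VerticaDB custom resource definition is registered and that the operator's deployment is Available. The CRD name `verticadbs.vertica.com` is an assumption derived from the `VerticaDB` kind and the `vertica.com` API group used by the CR in the next section, and the helper name `operator_ready` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the operator installed by Helm is usable.
# Assumption: the CRD for the VerticaDB kind is named verticadbs.vertica.com.
operator_ready() {
  local ns="${1:-my-verticadb-operator}"
  # The CRD must exist before `kubectl apply` can accept a VerticaDB object.
  kubectl get crd verticadbs.vertica.com >/dev/null || return 1
  # Wait for every deployment in the operator namespace to report Available.
  kubectl wait --namespace "$ns" --for=condition=Available deployment --all \
    --timeout=60s
}
```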
VerticaDB
At this point, we have created the Kubernetes cluster and deployed the VerticaDB operator. Now, we are ready to create the Vertica database.
To create a Vertica database, we create an instance of the VerticaDB custom resource (CR). The operator reacts to the new CR by creating the Kubernetes objects it needs and then bootstrapping a new database.
We use the following sample CR:
$ cat << EOF > vdb.yaml
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  name: verticadb-sample
spec:
  annotations:
    VERTICA_MEMDEBUG: "2" # Required if running macOS with an Arm-based chip
  communal:
    path: "/communal/vertica-db-tutorial"
    includeUIDInPath: true
  subclusters:
  - name: sc
  volumes:
  - name: hostpath
    hostPath:
      path: /tmp
  volumeMounts:
  - name: hostpath
    mountPath: /communal
EOF
NOTE: The CR supports many more parameters; the sample above is a minimal configuration. Refer to the Vertica documentation for a complete list.
This CR creates a new database that uses a hostPath volume for communal storage. This works only because we created a single-node Kubernetes cluster; on a multi-node Kubernetes cluster, you need a communal storage location that all nodes can access. The database has a single three-node subcluster named sc. Because the CR does not specify a license, the database uses the Community Edition license, which restricts you to at most three Vertica pods.
To begin creating the database, apply the CR with the following command:
$ kubectl apply --namespace my-verticadb-operator -f vdb.yaml
The above command returns immediately, but it will take a few minutes to download the Vertica image and create the database. You can wait for this to complete with the following command:
$ kubectl wait --for=condition=DBInitialized=True --namespace my-verticadb-operator vdb/verticadb-sample --timeout=10m
This command shows no output until everything is set up, but there are a few ways to monitor progress in the meantime. While Kubernetes is still downloading the Vertica container image, you can check the pod status with this command:
$ kubectl get pods --namespace my-verticadb-operator --selector app.kubernetes.io/instance=verticadb-sample
NAME                    READY   STATUS              RESTARTS   AGE
verticadb-sample-sc-0   0/1     ContainerCreating   0          97s
verticadb-sample-sc-1   0/1     ContainerCreating   0          97s
verticadb-sample-sc-2   0/1     ContainerCreating   0          97s
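If you would rather watch progress than wait silently, you can break the ten-minute wait into shorter waits and print the pod status between them. This is a hypothetical helper, `wait_with_progress`, that reuses the same `kubectl wait` condition and pod selector used in this section; 20 attempts of 30 seconds each roughly match the `--timeout=10m` used earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same DBInitialized wait as the kubectl wait command,
# split into 30-second waits with a pod listing after each unsuccessful wait.
wait_with_progress() {
  local ns="${1:-my-verticadb-operator}" vdb="${2:-verticadb-sample}"
  local attempts="${3:-20}" i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl wait --for=condition=DBInitialized=True --namespace "$ns" \
         "vdb/$vdb" --timeout=30s >/dev/null 2>&1; then
      echo "database initialized"
      return 0
    fi
    # Shows the STATUS (e.g. ContainerCreating) and READY count of each pod.
    kubectl get pods --namespace "$ns" \
      --selector "app.kubernetes.io/instance=$vdb"
    i=$((i + 1))
  done
  echo "timed out waiting for $vdb" >&2
  return 1
}
```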
While the operator is creating the database, it logs events as it runs various commands under the hood. You can view these events to see what the operator is doing. To see them, run the describe command against your CR:
$ kubectl describe --namespace my-verticadb-operator vdb/verticadb-sample
…
Events:
  Type    Reason             Age    From                Message
  ----    ------             ----   ----                -------
  Normal  CreateDBStart      2m59s  verticadb-operator  Calling 'admintools -t create_db'
  Normal  CreateDBSucceeded  113s   verticadb-operator  Successfully created database with subcluster sc. It took 1m6.188642254s
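The describe output can be long. As an alternative, this sketch lists only the events recorded for the VerticaDB object, oldest first, using a standard field selector on events. The helper name `vdb_events` is hypothetical, and it shows events for the CR only, not for its pods.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list events whose involved object is the named VerticaDB,
# sorted by creation time.
vdb_events() {
  local ns="${1:-my-verticadb-operator}" vdb="${2:-verticadb-sample}"
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.name=$vdb" \
    --sort-by=.metadata.creationTimestamp
}
```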
You can also see a quick status line about the VerticaDB by issuing the following command:
$ kubectl --namespace my-verticadb-operator get vdb
NAME               AGE     SUBCLUSTERS   INSTALLED   DBADDED   UP
verticadb-sample   9m51s   1             3           3         3
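To use this status line in a script, you can compare the INSTALLED and UP columns. The sketch below assumes the default column layout shown above; if a status column is still empty, the fields shift and the check simply reports not ready. The helper name `vdb_all_up` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the INSTALLED and UP counts match.
# Assumes the default `kubectl get vdb` columns:
#   NAME AGE SUBCLUSTERS INSTALLED DBADDED UP
vdb_all_up() {
  local ns="${1:-my-verticadb-operator}" vdb="${2:-verticadb-sample}"
  kubectl get vdb "$vdb" --namespace "$ns" --no-headers |
    awk '{ if ($4 > 0 && $4 == $6) ok = 1 }
         END { exit (ok ? 0 : 1) }'
}
```

For example, a script could run `until vdb_all_up; do sleep 10; done` before it tries to connect.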
Client Access
Now that you have a database, how can you access it? For ad hoc queries, you can run vsql directly from one of the pods:
$ kubectl exec -it --namespace my-verticadb-operator verticadb-sample-sc-0 -- vsql
However, this approach isn’t application-friendly: you need to know the name of the pod, and that pod must be in the UP state. The best way to connect to Vertica is through service objects. By default, the operator creates one service object for each subcluster. You can see the service objects with this command:
$ kubectl --namespace my-verticadb-operator get service --selector vertica.com/svc-type=external
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
verticadb-sample-sc   ClusterIP   10.96.110.181   <none>        5433/TCP,5444/TCP   24m
For easier discovery, the name of each service object contains both the VerticaDB name and the subcluster name. The Kubernetes networking model is flat, which means that any pod can reach any other pod. Service names are registered in the cluster’s DNS, so each service can be reached by a fully qualified domain name. Service objects also load balance across a set of pods, routing only to pods in the “Ready” state. For pods running Vertica, this means that the server is UP and accepting connections. By default, however, a service is accessible only from within Kubernetes, on a cluster-internal IP address; this service type is called ClusterIP.
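As an illustration of the DNS naming, this sketch builds the fully qualified name of a subcluster’s service. It assumes the service name is `<VerticaDB name>-<subcluster name>`, as described above, and that the cluster uses the default Kubernetes cluster domain, `cluster.local`; the helper name `svc_fqdn` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the in-cluster DNS name of a subcluster's service.
# Assumes the service is named <vdb>-<subcluster> and the cluster domain is
# the Kubernetes default, cluster.local.
svc_fqdn() {
  local vdb="$1" subcluster="$2" ns="${3:-my-verticadb-operator}"
  echo "${vdb}-${subcluster}.${ns}.svc.cluster.local"
}
```

From inside the cluster, a client can then connect through the service rather than a specific pod, for example `kubectl exec -it --namespace my-verticadb-operator verticadb-sample-sc-0 -- vsql -h verticadb-sample-sc.my-verticadb-operator.svc.cluster.local`.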
If you want to access the service from outside Kubernetes, the options depend on where Kubernetes is deployed. All major cloud vendors provide a LoadBalancer service type that exposes the service externally through the cloud vendor’s load balancer, but this isn’t available out of the box with kind. Our only option is to use NodePort, which exposes the service on a port on every Kubernetes node. The extraPortMappings entry we added to kind.yaml when creating the cluster forwards that port, 32001, from the kind node to your host machine.
To use NodePort so that you can access Vertica from your host machine, run this patch:
$ kubectl patch vdb --namespace my-verticadb-operator verticadb-sample --type=merge --patch '{"spec": {"subclusters": [{"name": "sc", "serviceType": "NodePort", "nodePort": 32001}]}}'
verticadb.vertica.com/verticadb-sample patched
The operator responds by changing the service type to NodePort. You can now access Vertica from your host machine through localhost on port 32001:
$ vsql -U dbadmin -h localhost -p 32001
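The operator updates the service asynchronously, so a connection attempted immediately after the patch may fail. This sketch polls the service until it reports `NodePort` with the expected node port. It assumes the client port, 5433, is the first entry in the service’s port list, as in the PORT(S) column shown earlier; the helper name `wait_for_nodeport` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until the subcluster service is a NodePort service
# on the expected port. Assumes the client port (5433) is the first port listed.
wait_for_nodeport() {
  local ns="${1:-my-verticadb-operator}" svc="${2:-verticadb-sample-sc}"
  local want="${3:-32001}" attempts="${4:-30}" i=1 got
  while [ "$i" -le "$attempts" ]; do
    got=$(kubectl get service "$svc" --namespace "$ns" \
      -o jsonpath='{.spec.type} {.spec.ports[0].nodePort}' 2>/dev/null)
    [ "$got" = "NodePort $want" ] && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep 2
    i=$((i + 1))
  done
  return 1
}
```

If vsql reports a refused connection, running `wait_for_nodeport` first and then retrying is a reasonable check.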
Cleanup
When you are done with your system, delete your Kubernetes cluster and any files under the hostPath set in vdb.yaml (/tmp, unless you changed it).
To delete the K8s cluster, issue the following command:
$ kind delete cluster
Deleting cluster "kind" …
This should be enough to get you started using Vertica in Kubernetes. The CR that we used is minimal; many more parameters are available to customize your database. For more information, visit the GitHub page that hosts the operator, or our official documentation.