Velero
Velero, a VMware-backed open-source project, is a mature cloud-native backup solution, able to selectively back up / restore your various workloads / data.
For ElfHosted, I rely on Velero to automatically snapshot TBs of data, so that in the event of a disaster which impacts user-managed data, I'll be able to perform a quick restore.
What is ElfHosted?
ElfHosted is "self-hosting as a service" (SHAAS? ) - Using our Kubernetes / GitOps designs, we've build infrastructure and automation to run popular self-hosted apps (think "Plex, Radarr, Mattermost..") and attach your own cloud storage ("bring-your-own-storage").
You get $10 free credit when you sign up, so you can play around without commitment!
We're building "in public", so follow the progress in the open-source repos, the blog or in Discord.
TL;DR? Here's a guide to getting started, and another to migrating from another provider.
Ingredients
- A Kubernetes cluster
- Flux deployment process bootstrapped
Optional:
- S3-based storage for off-cluster backup
Optionally for volume snapshot support:
- Persistence supporting PVC snapshots for in-cluster backup (i.e., Rook Ceph)
- Snapshot controller with validation webhook
Terminology
Let's get some terminology out of the way. Velero manages Backups and Restores, to BackupStorageLocations, and optionally snapshots volumes to VolumeSnapshotLocations, either manually or on a Schedule.
Clear as mud?
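To make the terminology concrete, here's a minimal sketch of what a one-off Backup object looks like once Velero is installed (the my-app namespace and the backup name are just hypothetical examples); Schedules, BackupStorageLocations and VolumeSnapshotLocations are defined with the same sort of custom resources:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-manual # hypothetical one-off backup name
  namespace: velero # Velero's own custom resources live in its namespace
spec:
  includedNamespaces:
    - my-app # hypothetical namespace to back up
  storageLocation: default # which BackupStorageLocation to write to
  ttl: 240h0m0s # how long to keep this backup before garbage collection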
Preparation
Velero Namespace
We need a namespace to deploy our HelmRelease and associated YAMLs into. Per the flux design, I create this example yaml in my flux repo at /bootstrap/namespaces/namespace-velero.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: velero
Velero HelmRepository
We're going to install the Velero helm chart from the vmware-tanzu repository, so I create the following in my flux repo (assuming it doesn't already exist):
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: vmware-tanzu
  namespace: flux-system
spec:
  interval: 15m
  url: https://vmware-tanzu.github.io/helm-charts
Velero Kustomization
Now that the "global" elements of this deployment (just the HelmRepository in this case) have been defined, we do some "flux-ception", and go one layer deeper, adding another Kustomization, telling flux to deploy any YAMLs found in the repo at /velero/
. I create this example Kustomization in my flux repo:
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: velero
  namespace: flux-system
spec:
  interval: 30m
  path: ./velero
  prune: true # remove any elements later removed from the above path
  timeout: 10m # if not set, this defaults to interval duration, which is 1h
  sourceRef:
    kind: GitRepository
    name: flux-system
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta1
      kind: HelmRelease
      name: velero
      namespace: velero
Fast-track your fluxing!
Is crafting all these YAMLs by hand too much of a PITA?
"Premix" is a git repository, which includes an ansible playbook to auto-create all the necessary files in your flux repository, for each chosen recipe!
Let the machines do the TOIL!
SealedSecret
We'll need credentials to be able to access our S3 storage, so let's create them now. Velero will use AWS credentials in the standard format preferred by the AWS SDK, so create a temporary file like this:
[default]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_OR_S3_COMPATIBLE_EQUIVALENT
aws_secret_access_key = YOUR_AWS_SECRET_KEY_OR_S3_COMPATIBLE_EQUIVALENT
And then turn this file into a secret, and seal it, with:
kubectl create secret generic -n velero velero-credentials \
--from-file=cloud=mysecret.aws.is.dumb \
-o yaml --dry-run=client \
| kubeseal > velero/sealedsecret-velero-credentials.yaml
You can now delete mysecret.aws.is.dumb.
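For reference, the resulting sealed secret should look roughly like this (ciphertext elided); it's safe to commit, since only the sealed-secrets controller in your cluster can decrypt it:

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: velero-credentials
  namespace: velero
spec:
  encryptedData:
    cloud: AgB... # encrypted copy of the credentials file (ciphertext elided)
  template:
    metadata:
      name: velero-credentials
      namespace: velero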
Velero HelmRelease
Lastly, having set the scene above, we define the HelmRelease which will actually deploy velero into the cluster. We start with a basic HelmRelease YAML, like this example:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: velero
  namespace: velero
spec:
  chart:
    spec:
      chart: velero
      version: 5.1.x # auto-update to semver bugfixes only (1)
      sourceRef:
        kind: HelmRepository
        name: vmware-tanzu
        namespace: flux-system
  interval: 15m
  timeout: 5m
  releaseName: velero
  values: # paste contents of upstream values.yaml below, indented 4 spaces (2)
- I like to set this to the semver minor version of the current Velero helm chart, so that I'll inherit bug fixes but not any new features (since I'll need to manually update my values to accommodate new releases anyway)
- Paste the full contents of the upstream values.yaml here, indented 4 spaces under the values: key
If we deploy this HelmRelease as-is, we'll inherit every default from the upstream Velero helm chart. That's hardly ever what we want, so my preference is to take the entire contents of the Velero helm chart's values.yaml and paste it (indented) under the values: key. I can then make my own changes in the context of the entire values.yaml, rather than cherry-picking just the items I want to change, which makes future chart upgrades simpler.
Why not put values in a separate ConfigMap?
Didn't you previously advise putting helm chart values into a separate ConfigMap?
Yes, I did. And in practice, I've changed my mind.
Why? Because having the helm values directly in the HelmRelease offers the following advantages:
- If you use the YAML extension in VSCode, you'll see a full path to the YAML elements, which can make grokking complex charts easier.
- When flux detects a change to a value in a HelmRelease, this forces an immediate reconciliation of the HelmRelease, as opposed to the ConfigMap solution, which requires waiting on the next scheduled reconciliation.
- Renovate can parse HelmRelease YAMLs and create PRs when they contain docker image references which can be updated.
- In practice, adapting a HelmRelease to match upstream chart changes is no different to adapting a ConfigMap, and so there's no real benefit to splitting the chart values into a separate ConfigMap, IMO.
Then work your way through the values you pasted, and change any which are specific to your configuration.
Configure Velero
Here are some areas of the upstream values.yaml to pay attention to..
initContainers
Uncomment velero-plugin-for-aws to use an S3 target for backup, and additionally uncomment velero-plugin-for-csi if you plan to create volume snapshots:
# Init containers to add to the Velero deployment's pod spec. At least one plugin provider image is required.
# If the value is a string then it is evaluated as a template.
initContainers:
  - name: velero-plugin-for-csi
    image: velero/velero-plugin-for-csi:v0.6.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.8.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
backupStorageLocation
Additionally, it's required to configure certain values (highlighted below) under the configuration key:
configuration:
  # Parameters for the BackupStorageLocation(s). Configure multiple by adding other element(s) to the backupStorageLocation slice.
  # See https://velero.io/docs/v1.6/api-types/backupstoragelocation/
  backupStorageLocation:
    # name is the name of the backup storage location where backups should be stored. If a name is not provided,
    # a backup storage location will be created with the name "default". Optional.
    - name:
      # provider is the name for the backup storage location provider.
      provider: aws # if we're using S3-compatible storage (1)
      # bucket is the name of the bucket to store backups in. Required.
      bucket: my-awesome-bucket # the name of my specific bucket (2)
      # caCert defines a base64 encoded CA bundle to use when verifying TLS connections to the provider. Optional.
      caCert:
      # prefix is the directory under which all Velero data should be stored within the bucket. Optional.
      prefix: optional-subdir # a path under the bucket in which the backup data should be stored (3)
      # default indicates this location is the default backup storage location. Optional.
      default: true # prevents annoying warnings in the log
      # validationFrequency defines how frequently Velero should validate the object storage. Optional.
      validationFrequency:
      # accessMode determines if velero can write to this backup storage location. Optional.
      # defaults to ReadWrite; ReadOnly is used during migrations and restores.
      accessMode: ReadWrite
      credential:
        # name of the secret used by this backupStorageLocation.
        name: velero-credentials # this is the sealed-secret we created above
        # name of key that contains the secret data to be used.
        key: cloud # this is the key we used in the sealed-secret we created above
      # Additional provider-specific configuration. See link above
      # for details of required/optional fields for your provider.
      config:
        region: # set this to your B2 region, for example us-west-002
        s3ForcePathStyle:
        s3Url: # set this to the https URL of your endpoint, for example "https://s3.us-west-002.backblazeb2.com"
        # kmsKeyId:
        # resourceGroup:
        # The ID of the subscription containing the storage account, if different from the cluster's subscription. (Azure only)
        # subscriptionId:
        # storageAccount:
        # publicUrl:
        # Name of the GCP service account to use for this backup storage location. Specify the
        # service account here if you want to use workload identity instead of providing the key file. (GCP only)
        # serviceAccount:
        # Option to skip certificate validation, if insecureSkipTLSVerify is set to true. The client side should also set the
        # flag: for Velero client commands like velero backup describe and velero backup logs, add the flag --insecure-skip-tls-verify
        # insecureSkipTLSVerify:
- There are other providers
- Your bucket name, unique to your S3 provider
- I use prefixes to back up multiple clusters to the same bucket
volumeSnapshotLocation
Also under the config
key, you'll find the volumeSnapshotLocation
section. Use this if you're using a supported provider, and you want to create in-cluster snapshots. In the following example, I'm creating Velero snapshots with rook-ceph using the CSI provider. Take note of the highlighted sections, these are the minimal options you'll want to set:
volumeSnapshotLocation:
  # name is the name of the volume snapshot location where snapshots are being taken. Required.
  - name: rook-ceph
    # provider is the name for the volume snapshot provider. If omitted
    # `configuration.provider` will be used instead.
    provider: csi
    # Additional provider-specific configuration. See link above
    # for details of required/optional fields for your provider.
    config: {}
    # region:
    # apiTimeout:
    # resourceGroup:
    # The ID of the subscription where volume snapshots should be stored, if different from the cluster's subscription. If specified, also requires `configuration.volumeSnapshotLocation.config.resourceGroup` to be set. (Azure only)
    # subscriptionId:
    # incremental:
    # snapshotLocation:
    # project:

# These are server-level settings passed as CLI flags to the `velero server` command. Velero
# uses default values if they're not passed in, so they only need to be explicitly specified
# here if using a non-default value. The `velero server` default values are shown in the
# comments below.
# --------------------
# `velero server` default: restic
uploaderType:
# `velero server` default: 1m
backupSyncPeriod:
# `velero server` default: 4h
fsBackupTimeout:
# `velero server` default: 30
clientBurst:
# `velero server` default: 500
clientPageSize:
# `velero server` default: 20.0
clientQPS:
# Name of the default backup storage location. Default: default
defaultBackupStorageLocation:
# How long to wait by default before backups can be garbage collected. Default: 72h
defaultBackupTTL:
# Name of the default volume snapshot location.
defaultVolumeSnapshotLocations: csi:rook-ceph
# `velero server` default: empty
disableControllers:
# `velero server` default: 1h
garbageCollectionFrequency:
# Set log-format for Velero pod. Default: text. Other option: json.
logFormat:
# Set log-level for Velero pod. Default: info. Other options: debug, warning, error, fatal, panic.
logLevel:
# The address to expose prometheus metrics. Default: :8085
metricsAddress:
# Directory containing Velero plugins. Default: /plugins
pluginDir:
# The address to expose the pprof profiler. Default: localhost:6060
profilerAddress:
# `velero server` default: false
restoreOnlyMode:
# `velero server` default: customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
restoreResourcePriorities:
# `velero server` default: 1m
storeValidationFrequency:
# How long to wait on persistent volumes and namespaces to terminate during a restore before timing out. Default: 10m
terminatingResourceTimeout:
# Comma separated list of velero feature flags. default: empty
# features: EnableCSI
features: EnableCSI
# `velero server` default: velero
namespace:
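One assumption worth calling out if you're following the rook-ceph / CSI approach above: Velero's CSI plugin expects a VolumeSnapshotClass labelled for its use. A minimal sketch, assuming a rook-ceph RBD deployment in the rook-ceph namespace (the class name is hypothetical; adjust the driver and parameters to match your own cluster), might look like:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true" # tells the Velero CSI plugin to use this class
driver: rook-ceph.rbd.csi.ceph.com # assumes the default rook-ceph RBD CSI driver name
deletionPolicy: Retain # keep the underlying snapshot even if the VolumeSnapshot object is deleted
parameters:
  clusterID: rook-ceph # assumes rook-ceph is installed in the rook-ceph namespace
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph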
schedules
Set up backup schedule(s) for your preferred coverage, TTL, etc. See Schedule for a list of available configuration options under the template key:
schedules:
  daily-backups-r-cool:
    disabled: false
    labels:
      myenv: foo
    annotations:
      myenv: foo
    schedule: "0 0 * * *" # once a day, at midnight
    useOwnerReferencesInBackup: false
    template:
      ttl: "240h"
      storageLocation: default # use the same name you defined above in backupStorageLocation
      includedNamespaces:
        - foo
Install Velero!
Commit the changes to your flux repository, and either wait for the reconciliation interval, or force a reconciliation using flux reconcile source git flux-system. You should see the kustomization appear...
~ ❯ flux get kustomizations velero
NAME    READY   MESSAGE                          REVISION       SUSPENDED
velero  True    Applied revision: main/70da637   main/70da637   False
~ ❯
The helmrelease should be reconciled...
~ ❯ flux get helmreleases -n velero velero
NAME    READY   MESSAGE                            REVISION   SUSPENDED
velero  True    Release reconciliation succeeded   v5.1.x     False
~ ❯
And you should have happy pods in the velero namespace:
~ ❯ k get pods -n velero -l app.kubernetes.io/name=velero
NAME                      READY   STATUS    RESTARTS   AGE
velero-7c94b7446d-nwsss   1/1     Running   0          5m14s
~ ❯
Is it working?
Confirm that the basic config is good, by running kubectl logs -n velero -l app.kubernetes.io/name=velero:
time="2023-10-17T22:24:40Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/b2 controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:152"
time="2023-10-17T22:24:41Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/b2 controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:137"
Confirm Velero is happy with your BackupStorageLocation
The pod output will tell you if Velero is unable to access your BackupStorageLocation. If this happens, the most likely cause will be a misconfiguration of your S3 settings!
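You can also ask Velero directly; the following quick checks (a sketch, assuming the default velero namespace) should show your BackupStorageLocation in an Available phase:

# List BackupStorageLocations as Velero sees them (look for PHASE "Available")
velero backup-location get

# Or inspect the underlying custom resource directly
kubectl get backupstoragelocations -n velero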
Test backup
Next, you'll need the Velero CLI, which you can install on your OS based on the instructions here.
Create a "quickie" backup of a namespace you can afford to loose, like this:
velero backup create goandbeflameretardant --include-namespaces=chartmuseum --wait
Confirm your backup completed successfully, with:
velero backup describe goandbeflameretardant
Then, like a boss, delete the original namespace (you can afford to lose it, right?) with some bad-ass command like kubectl delete ns chartmuseum. Now it's gone.
Test restore
Finally, in a kick-ass move of ninja sysadmin awesomeness, restore your backup with:
velero create restore --from-backup goandbeflameretardant --wait
Confirm that your pods / data have been restored.
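If you want something more concrete than eyeballing pods, a couple of quick checks (assuming the chartmuseum example namespace from above):

# Confirm the restore object itself completed
velero restore get

# Confirm the restored workload is back and running
kubectl get pods -n chartmuseum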
Congratulations, you have a backup!
Test scheduled backup
Confirm the basics are working by running velero get schedules, to list your schedules:
davidy@gollum01:~$ velero get schedules
NAME           STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR   PAUSED
velero-daily   Enabled   2023-10-13 04:20:42 +0000 UTC   0 0 * * *   240h0m0s     22h ago       <none>     false
davidy@gollum01:~$
Force an immediate backup per the schedule, by running velero backup create --from-schedule=velero-daily:
davidy@gollum01:~$ velero backup create --from-schedule=velero-daily
Creating backup from schedule, all other filters are ignored.
Backup request "velero-daily-20231017222207" submitted successfully.
Run `velero backup describe velero-daily-20231017222207` or `velero backup logs velero-daily-20231017222207` for more details.
davidy@gollum01:~$
Use the describe and logs commands output above to check the state of your backup (you'll only get the backup logs after the backup has completed).
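For example, using the backup name returned above (yours will differ):

# Summarise the backup; add --details to also list the individual resources and volume snapshots included
velero backup describe velero-daily-20231017222207 --details

# Fetch the backup logs (only available once the backup has completed)
velero backup logs velero-daily-20231017222207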
When describing your completed backup, if the result is anything but a complete success, then further investigation is required.
Summary
What have we achieved? We've got scheduled backups running, and we've successfully tested a restore!
Created:
- Velero running and creating restorable backups on schedule