Hi all,
The infrastructure is as follows:
RKE2 Kubernetes cluster provisioned by Rancher
VMware vSphere 7.0.3
MinIO S3 bucket for exports
This is our default storage class, which uses a tag-based placement vSphere storage policy:
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      meta.helm.sh/release-name: rancher-vsphere-csi
      meta.helm.sh/release-namespace: kube-system
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2023-05-11T12:34:14Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: vsphere-csi-sc
    resourceVersion: "731"
    uid: d91c57ea-3c0d-4fab-bcca-ae1adbd1d84e
  parameters:
    storagepolicyname: <REDACTED>
  provisioner: csi.vsphere.vmware.com
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
When I run a backup, it completes the first two steps of the “Backup” stage (i.e. “Snapshotting workload” and “Snapshotting Application configuration”), but it fails at the “Snapshotting Application Components” step. The error seems to imply that Kasten cannot find the volume within vSphere.
Below are the logs from the executor-svc pod.
{
  "File": "kasten.io/k10/kio/exec/internal/runner/runner.go",
  "Function": "kasten.io/k10/kio/exec/internal/runner.(*Runner).maybeExecJob",
  "JobID": "f4bfb713-4325-11ee-887a-4e5290b9f5f1",
  "Line": 230,
  "ManifestID": "f4bf47ba-4325-11ee-8358-4eafe08e0033",
  "QueuedJobID": "f4bfb713-4325-11ee-887a-4e5290b9f5f1",
  "RequestID": "8dbb15f3-4314-11ee-91c4-b2e3db1d0aa0",
  "SubjectRef": "kasten-io:nagios-db",
  "cluster_name": "09ad248c-168b-4943-9984-5d4498ee291b",
  "error": {
    "message": "Failed checking jobs in group",
    "function": "kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).Run",
    "linenumber": 96,
    "file": "kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:96",
    "fields": [
      {
        "name": "manifestID",
        "value": "f4bf47ba-4325-11ee-8358-4eafe08e0033"
      },
      {
        "name": "jobID",
        "value": "f4bfb713-4325-11ee-887a-4e5290b9f5f1"
      },
      {
        "name": "groupIndex",
        "value": 0
      }
    ],
    "cause": {
      "message": "Failure in snapshotting workload nagios-db",
      "function": "kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).processGroup",
      "linenumber": 196,
      "file": "kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:196",
      "fields": [
        {
          "name": "FailedSubPhases",
          "value": [
            {
              "Phase": "Snapshotting Workload nagios-db",
              "Err": {
                "cause": {
                  "cause": {
                    "cause": {
                      "message": "Failure in snapshotting application components"
                    },
                    "fields": [
                      {
                        "name": "FailedSubPhases",
                        "value": [
                          {
                            "Err": {
                              "cause": {
                                "cause": {
                                  "cause": {
                                    "cause": {
                                      "cause": {
                                        "message": "Failed to query the disk: ServerFaultCode: The object or item referred to could not be found."
                                      },
                                      "fields": [
                                        {
                                          "name": "VolumeID",
                                          "value": "52ae2e54-cb86-4b0e-9af1-f4be68224e12"
                                        }
                                      ],
                                      "file": "kasten.io/k10/kio/exec/phases/phase/snapshot.go:636",
                                      "function": "kasten.io/k10/kio/exec/phases/phase.ProviderSnapshot",
                                      "linenumber": 636,
                                      "message": "Volume unavailable"
                                    },
                                    "fields": [
                                      {
                                        "name": "volumeName",
                                        "value": "nagios-mariadb"
                                      },
                                      {
                                        "name": "volumeNamespace",
                                        "value": "nagios-mariadb"
                                      }
                                    ],
                                    "file": "kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:848",
                                    "function": "kasten.io/k10/kio/exec/phases/backup.basicVolumeSnapshot.func1.1",
                                    "linenumber": 848,
                                    "message": "Error snapshotting volume"
                                  },
                                  "fields": [
                                    {
                                      "name": "appName",
                                      "value": "nagios-mariadb"
                                    },
                                    {
                                      "name": "appType",
                                      "value": "statefulset"
                                    },
                                    {
                                      "name": "namespace",
                                      "value": "nagios-mariadb"
                                    }
                                  ],
                                  "file": "kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:859",
                                  "function": "kasten.io/k10/kio/exec/phases/backup.basicVolumeSnapshot",
                                  "linenumber": 859,
                                  "message": "Failed to snapshot volumes"
                                },
                                "file": "kasten.io/k10/kio/exec/phases/backup/snapshot_data_phase.go:385",
                                "function": "kasten.io/k10/kio/exec/phases/backup.processVolumeArtifacts",
                                "linenumber": 385,
                                "message": "Failed snapshots for workload"
                              },
                              "fields": [],
                              "message": "Job failed to be executed"
                            },
                            "ID": "f50019b9-4325-11ee-8358-4eafe08e0033",
                            "Phase": "Snapshotting Application Components"
                          }
                        ]
                      }
                    ],
                    "file": "kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:196",
                    "function": "kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).processGroup",
                    "linenumber": 196,
                    "message": "Failure in snapshotting application components"
                  },
                  "fields": [
                    {
                      "name": "manifestID",
                      "value": "f4fa5fba-4325-11ee-8358-4eafe08e0033"
                    },
                    {
                      "name": "jobID",
                      "value": "f4fbe6ab-4325-11ee-887a-4e5290b9f5f1"
                    },
                    {
                      "name": "groupIndex",
                      "value": 0
                    }
                  ],
                  "file": "kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:96",
                  "function": "kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).Run",
                  "linenumber": 96,
                  "message": "Failed checking jobs in group"
                },
                "fields": [],
                "message": "Job failed to be executed"
              },
              "ID": "f4fa5fba-4325-11ee-8358-4eafe08e0033"
            }
          ]
        }
      ],
      "cause": {
        "message": "Failure in snapshotting workload nagios-db"
      }
    }
  },
  "hostname": "executor-svc-547b97c699-zpggs",
  "level": "error",
  "msg": "Job failed",
  "time": "2023-08-25T09:04:56.848Z",
  "version": "6.0.5"
}
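Since the error is nested many levels deep, a small script can flatten it to make the root cause easier to spot. This is only a rough sketch, not a Kasten tool: the helper name walk_errors and the trimmed sample are hypothetical, and it assumes the log entry has the shape shown above (dicts with "message", an optional "fields" list of name/value pairs, and nested "cause" or "Err" objects).

```python
import json


def walk_errors(node, depth=0):
    """Yield (depth, message, simple_fields) for every error-like dict.

    Assumes the shape seen in the log above: a dict with "message",
    an optional "fields" list of {"name", "value"} pairs, and nested
    errors under "cause", "Err", or inside a field's list value.
    """
    if isinstance(node, dict):
        if "message" in node:
            # Keep only scalar field values; nested lists are walked below.
            simple = {
                f["name"]: f["value"]
                for f in node.get("fields", [])
                if isinstance(f, dict)
                and "name" in f
                and not isinstance(f.get("value"), (list, dict))
            }
            yield depth, node["message"], simple
        for value in node.values():
            yield from walk_errors(value, depth + 1)
    elif isinstance(node, list):
        for item in node:
            yield from walk_errors(item, depth + 1)


# Hypothetical, heavily trimmed sample in the same shape as the log entry.
sample = json.loads("""
{
  "message": "Job failed to be executed",
  "cause": {
    "message": "Volume unavailable",
    "fields": [
      {"name": "VolumeID", "value": "52ae2e54-cb86-4b0e-9af1-f4be68224e12"}
    ],
    "cause": {"message": "Failed to query the disk"}
  }
}
""")

entries = list(walk_errors(sample))
root_cause = max(entries, key=lambda entry: entry[0])
volume_ids = [fields["VolumeID"] for _, _, fields in entries if "VolumeID" in fields]

print("deepest message:", root_cause[1])
print("volume IDs:", volume_ids)
```

Run against the full entry (passing the "error" object), this should surface the "Failed to query the disk" message and the VolumeID without having to read the whole trace.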
When I use govc to look up the volume by the VolumeID from the logs above, the volume is found:
govc volume.ls | grep 52ae2e54-cb86-4b0e-9af1-f4be68224e12
52ae2e54-cb86-4b0e-9af1-f4be68224e12 pvc-b6f70289-5c79-4d20-bfa5-bac1d2f60190
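To repeat this check for several volumes, the listing can be parsed instead of grepped. The sketch below assumes that govc volume.ls prints one volume per line as "ID name", as in the output above; parse_volume_ls is a hypothetical helper, and the listing string is just the line copied from above.

```python
def parse_volume_ls(output):
    """Map volume ID -> volume name from `govc volume.ls`-style text.

    Assumes one volume per line, with the ID and the name separated
    by whitespace, as in the output shown above.
    """
    volumes = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            volumes[parts[0]] = parts[1].strip()
    return volumes


# Output copied from the govc command above.
listing = "52ae2e54-cb86-4b0e-9af1-f4be68224e12 pvc-b6f70289-5c79-4d20-bfa5-bac1d2f60190\n"
volumes = parse_volume_ls(listing)

# VolumeID reported in the Kasten error.
failed_id = "52ae2e54-cb86-4b0e-9af1-f4be68224e12"
print(failed_id, "->", volumes.get(failed_id, "not found"))
```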
It’s also important to note that although the k8s cluster is deployed using Rancher, I did not use Rancher’s partner repository to install Kasten. It was installed from Kasten’s own Helm repo.
My end goal is to export these backups to Veeam Backup & Replication 12.
Please let me know if you need any further info.
Any assistance is greatly appreciated.
Matt