I recently did an “unscheduled rebuild” of my cluster and restored K10 using the disaster recovery process outlined in the docs.
I did run into an issue where a couple of the restore pods hung waiting for a volume to be created, so I worked around it by manually creating the PVC, which allowed the restore to complete successfully.
I mention this because I suspect it might be related to my current issue: the ‘media’ namespace backup is failing nightly with the following error:
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "[PUT /artifacts/{itemId}][409] updateArtifactConflict &{Code:409
              Message:Conflict. Cause: Key collision. UserMessages:[]}"
          fields:
            - name: artID
              value: 38c0aa5e-c56f-11ec-98a8-02e5dc1cf223
          file: kasten.io/k10/kio/rest/clients/catalogclient.go:363
          function: kasten.io/k10/kio/rest/clients.UpdateArtifact
          linenumber: 363
          message: Unable to update artifact
        file: kasten.io/k10/kio/repository/utils.go:96
        function: kasten.io/k10/kio/repository.CreateOrUpdateRepositoryArtifact
        linenumber: 96
        message: Failed to update Repository artifact
      file: kasten.io/k10/kio/collections/kopia/manager.go:153
      function: kasten.io/k10/kio/collections/kopia.(*KopiaManager).Export
      linenumber: 153
      message: Failed to add repository artifact for collections
    file: kasten.io/k10/kio/exec/phases/phase/migrate.go:146
    function: kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run
    linenumber: 146
    message: Failed to export collection
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "[PUT /artifacts/{itemId}][409] updateArtifactConflict &{Code:409
              Message:Conflict. Cause: Key collision. UserMessages:[]}"
          fields:
            - name: artID
              value: 38c0aa5e-c56f-11ec-98a8-02e5dc1cf223
          file: kasten.io/k10/kio/rest/clients/catalogclient.go:363
          function: kasten.io/k10/kio/rest/clients.UpdateArtifact
          linenumber: 363
          message: Unable to update artifact
        file: kasten.io/k10/kio/repository/utils.go:96
        function: kasten.io/k10/kio/repository.CreateOrUpdateRepositoryArtifact
        linenumber: 96
        message: Failed to update Repository artifact
      file: kasten.io/k10/kio/collections/kopia/manager.go:153
      function: kasten.io/k10/kio/collections/kopia.(*KopiaManager).Export
      linenumber: 153
      message: Failed to add repository artifact for collections
    file: kasten.io/k10/kio/exec/phases/phase/migrate.go:146
    function: kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run
    linenumber: 146
    message: Failed to export collection
  message: Job failed to be executed
- cause:
    cause:
      cause:
        cause:
          cause:
            message: "[PUT /artifacts/{itemId}][409] updateArtifactConflict &{Code:409
              Message:Conflict. Cause: Key collision. UserMessages:[]}"
          fields:
            - name: artID
              value: 38c0aa5e-c56f-11ec-98a8-02e5dc1cf223
          file: kasten.io/k10/kio/rest/clients/catalogclient.go:363
          function: kasten.io/k10/kio/rest/clients.UpdateArtifact
          linenumber: 363
          message: Unable to update artifact
        file: kasten.io/k10/kio/repository/utils.go:96
        function: kasten.io/k10/kio/repository.CreateOrUpdateRepositoryArtifact
        linenumber: 96
        message: Failed to update Repository artifact
      file: kasten.io/k10/kio/collections/kopia/manager.go:153
      function: kasten.io/k10/kio/collections/kopia.(*KopiaManager).Export
      linenumber: 153
      message: Failed to add repository artifact for collections
    file: kasten.io/k10/kio/exec/phases/phase/migrate.go:146
    function: kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run
    linenumber: 146
    message: Failed to export collection
  message: Job failed to be executed
Here is one of the problem PVCs from the restore:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2023-02-04T02:19:58Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    kasten.io/backup-volume: enabled
    kustomize.toolkit.fluxcd.io/name: apps-media-jellyfin
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: jellyfin-config-v1
  namespace: media
  resourceVersion: "19044618"
  uid: 3a2355f7-2197-43fa-8853-2043bef564af
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
  volumeName: pvc-3a2355f7-2197-43fa-8853-2043bef564af
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 30Gi
  phase: Bound
I have tried deleting ALL of the exported backups and snapshots, and even went so far as to recreate the policy, but the error comes back. How can I resolve this and get the backups back to a working state? My next step will probably be something drastic, a complete reinstall of K10, but I am hoping to avoid that if possible.
Thanks in advance for your input.