Solved

Key collision errors after K10 cluster restore



I recently did an “unscheduled rebuild” of my cluster and restored K10 using the disaster recovery process outlined in the docs.
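For reference, the documented DR recovery boils down to installing the k10restore helm chart pointed at the DR location profile, roughly like this (the source cluster ID and profile name below are placeholders):

helm install k10-restore kasten/k10restore --namespace=kasten-io \
  --set sourceClusterID=<source-cluster-id> \
  --set profile.name=<dr-profile-name>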

 

I did run into an issue with a couple of the restore pods hanging while waiting for a volume to be created, so I worked around this by manually creating the PVC, which allowed the restore to complete successfully.
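In case it helps anyone hitting the same hang, the manual workaround was essentially pre-creating the claim the restore pod was waiting on, something like this (the name, namespace, storage class, and size here are copied from the jellyfin-config-v1 PVC shown further down; adjust for the volume your restore is actually waiting on):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-config-v1
  namespace: media
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 30Gi
EOF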

 

I mention this because I suspect it might have something to do with the current issue I'm facing: the ‘media’ namespace backup is failing nightly with the following error.

 

- cause:
    cause:
      cause:
        cause:
          cause:
            message: "[PUT /artifacts/{itemId}][409] updateArtifactConflict &{Code:409
              Message:Conflict. Cause: Key collision. UserMessages:[]}"
          fields:
            - name: artID
              value: 38c0aa5e-c56f-11ec-98a8-02e5dc1cf223
          file: kasten.io/k10/kio/rest/clients/catalogclient.go:363
          function: kasten.io/k10/kio/rest/clients.UpdateArtifact
          linenumber: 363
          message: Unable to update artifact
        file: kasten.io/k10/kio/repository/utils.go:96
        function: kasten.io/k10/kio/repository.CreateOrUpdateRepositoryArtifact
        linenumber: 96
        message: Failed to update Repository artifact
      file: kasten.io/k10/kio/collections/kopia/manager.go:153
      function: kasten.io/k10/kio/collections/kopia.(*KopiaManager).Export
      linenumber: 153
      message: Failed to add repository artifact for collections
    file: kasten.io/k10/kio/exec/phases/phase/migrate.go:146
    function: kasten.io/k10/kio/exec/phases/phase.(*migrateSendPhase).Run
    linenumber: 146
    message: Failed to export collection
  message: Job failed to be executed
(The same error entry repeats two more times, verbatim, with the identical artID.)


Here is one of the problem PVCs from the restore:

 

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2023-02-04T02:19:58Z"
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    kasten.io/backup-volume: enabled
    kustomize.toolkit.fluxcd.io/name: apps-media-jellyfin
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: jellyfin-config-v1
  namespace: media
  resourceVersion: "19044618"
  uid: 3a2355f7-2197-43fa-8853-2043bef564af
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
  volumeName: pvc-3a2355f7-2197-43fa-8853-2043bef564af
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 30Gi
  phase: Bound

 

I have tried deleting ALL of the exported backups and snapshots, and even went so far as recreating the policy, but the error comes back. How can I resolve this and get the backups back to a working state? My next step would probably be something drastic like a complete reinstall of K10, but I am hoping to avoid that if possible.
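For completeness, deleting the exported backups amounted to retiring the restore points through K10's CRDs, roughly like this (the names are placeholders; as I understand it, RestorePoint is a namespaced reference, while deleting the cluster-scoped RestorePointContent is what actually retires the underlying data):

kubectl get restorepoints.apps.kio.kasten.io -n media
kubectl delete restorepoints.apps.kio.kasten.io <restore-point-name> -n media
kubectl get restorepointcontents.apps.kio.kasten.io | grep media
kubectl delete restorepointcontents.apps.kio.kasten.io <restore-point-content-name>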

 

Thanks in advance for your input.


Best answer by rust84 14 March 2023, 22:50


5 comments


@jaiganeshjk 


@rust84 I am not exactly sure about this error. However, the message suggests that there is an artifact in the catalog with the same key/value as the resource you are trying to back up.

 

There are a few logs and outputs we would like to look at, and this would be better handled and tracked through a support case (as it may involve looking through the saved artifacts from your namespace).

 

Would you be able to open a case with us from my.veeam.com, selecting Veeam Kasten by K10 Trial as the product, and attach the debug logs?
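If it helps, the debug logs can be gathered with the collection script from our docs, something along these lines (check the docs for the current URL and options):

curl -s https://docs.kasten.io/tools/k10_debug.sh | bash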

Thanks @jaiganeshjk, I have now raised a case with the requested logs.

I’ve resolved my issue for now with the help of the support team. In case somebody comes across this issue in their own install and needs a quick workaround: creating a new profile pointed at the same NFS directory, under a different profile name, does the trick.

This generated a new repositoryArtifact, removing the conflict, and allowed the export to complete successfully. Thanks to @jaiganeshjk for the excellent support.
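For anyone who wants to script the workaround rather than use the dashboard, it boils down to duplicating the existing NFS location profile under a new name, something like this (the profile name is a placeholder):

kubectl get profiles.config.kio.kasten.io <existing-nfs-profile> -n kasten-io -o yaml > new-profile.yaml
# Edit new-profile.yaml: change metadata.name, and remove metadata.uid,
# metadata.resourceVersion, metadata.creationTimestamp, and status.
kubectl apply -f new-profile.yaml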


@rust84 Thank you for your support and patience thus far.

We found the root cause of the issue in this case and are currently working on enhancing the product to avoid such situations in the future.

We don’t expect or support importing restore points into the same K10 installation they were originally exported from.

Below is the gist of what caused this issue.

 

• You seem to have run an import for the media namespace after the catalog was restored using K10 DR.
• Prior to 5.5.4, we didn't have the concept of imported repositories.
• In 5.5.4 we introduced a change to the repositoryArtifact to track and tell apart imported and exported repositories (imported repositories are set to read-only).

With this particular timing, the DR restore put the catalog back to a state where the export-side repo artifact did not yet have API keys, because it was created prior to 5.5.4. Then, before running another export to that same repo on 5.5.4, you happened to run an import into the same K10 instance first. Due to the hashing divergence, the catalog happily added the import-side repo artifact (API keys included), since those keys didn't clash with any existing ones.

Then, when exports started running again, K10 tried to add the now-expected keys to the export-side artifact, which conflicted with the new import-side artifact that already held them.

Two conflicting artifacts were created on two different versions of K10:

Created in April 2022:
  "path": "media"
  "repoPath": "media/kopia"

Created in February 2023:
  "path": "media/media"
  "repoPath": "kopia"
