Solved

Failed to connect to the backup repository - invalid repository password


Userlevel 4

Hello! 

 

I have a big problem. We had a small network outage in our datacenter, and after that I needed to restore an application. I am shocked because my backups seem to be “lost”. After a while of troubleshooting and restarting the K10 services, I tried a restore and it works, but the export to NFS isn't working. Maybe that's the reason why K10 tells me that there are no restore points for my application.

 

cause:
cause:
fields:
- name: FailedSubPhases
value:
- Err:
cause:
cause:
cause:
cause:
cause:
message: command terminated with exit code 1
message: invalid repository password
file: kasten.io/k10/kio/kopia/repository.go:549
function: kasten.io/k10/kio/kopia.ConnectToKopiaRepository
linenumber: 549
message: Failed to connect to the backup repository
fields:
- name: appNamespace
value: <APPNAME>-test
file: kasten.io/k10/kio/exec/phases/phase/export.go:216
function: kasten.io/k10/kio/exec/phases/phase.prepareKopiaRepoIfExportingData
linenumber: 216
message: Failed to create Kopia repository for data export
file: kasten.io/k10/kio/exec/phases/phase/export.go:138
function: kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run
linenumber: 138
message: Failed to copy artifacts
fields: []
message: Job failed to be executed
ID: d5e372eb-db46-11ec-ba36-aae5192d65c4
Phase: Exporting RestorePoint
file: kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:185
function: kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).processGroup
linenumber: 185
message: Failure in exporting restorepoint
fields:
- name: manifestID
value: a7222bc2-db46-11ec-ba36-aae5192d65c4
- name: jobID
value: a7297043-db46-11ec-a99e-e65a53c09c98
- name: groupIndex
value: 0
file: kasten.io/k10/kio/exec/phases/phase/queue_and_wait_children.go:87
function: kasten.io/k10/kio/exec/phases/phase.(*queueAndWaitChildrenPhase).Run
linenumber: 87
message: Failed checking jobs in group
message: Job failed to be executed
fields: []

 

  • Why don't I see my restore points anymore? There should be at least one restore point left as a VolumeSnapshot in the cluster (I can see them with kubectl get volumesnapshot --all-namespaces; see the commands below).
  • Why did the password of my repository on NFS change? Can I recover it?
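
In case it helps, these are the commands I am using to compare what is still in the cluster with what K10 shows. The restore point resource names below are the ones I see under the apps.kio.kasten.io API group; they may differ between K10 versions.

# VolumeSnapshots that are still present in the cluster
kubectl get volumesnapshot --all-namespaces

# K10's own restore point objects for my application namespace
kubectl get restorepoints.apps.kio.kasten.io -n <APPNAME>-test
kubectl get restorepointcontents.apps.kio.kasten.io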

 

Thanks a lot, let me know if you need more information

 

Infos: 

  • Kubernetes: 1.21
  • K10: 4.5.14

Best answer by jaiganeshjk 8 June 2022, 07:50


10 comments

Userlevel 7
Badge +13


Let's see if our captain @Geoff Burke has a solution, but in the meantime have you opened a support ticket directly with Veeam?

Userlevel 4

Let's see if our captain @Geoff Burke has a solution, but in the meantime have you opened a support ticket directly with Veeam?

 

Hey @marcofabbri

No, we have no subscription at the moment because we are still validating K10 to make sure it does its job the way we want. We are also on a six-node cluster at the moment.

 

Thanks a lot for your really fast reply.

Userlevel 7
Badge +22

Hi Guys,

 

Please follow the procedures here to troubleshoot and gather logs: https://docs.kasten.io/latest/operating/support.html, and post any bits that you see that could be related (warnings, errors, etc.).

From what is presented here there is simply not enough information to make any assessment. Off the top of my head, did you try re-validating your location profile? Passwords seeming to change would hint at secrets being deleted, but then restores would not work. Also, the snapshots are on the cluster, whereas the NFS location you mentioned holds the exports.

My guess is that the restores came from the local snapshots and the NFS target has somehow lost connectivity. That is only a guess though :) better to see what the logs say.
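
If it helps, this is roughly what I start with to check the K10 services and pull logs, assuming the default kasten-io namespace; the deployment names may differ slightly between K10 versions.

# Check that all K10 services are up
kubectl get pods -n kasten-io

# Grab logs from the services involved in exports
kubectl logs -n kasten-io deployment/executor-svc --all-containers > executor.log
kubectl logs -n kasten-io deployment/catalog-svc --all-containers > catalog.log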

cheers

 

Userlevel 4

Hi @Geoff Burke

 

Thank you, and sorry for the missing information. I'll try to give you as much as I can. These are the things I found out over the last few hours:

  • The snapshots are only left inside the cluster. K10 shows “0” restore points for my application.
  • I validated the application and the export profile. 
  • I recreated the NFS profile, but no change. I made sure that it has the same name and path, and K10 detects it after recreation (I got a warning that some policies would be affected while the profile was missing; after recreating it, the warnings were gone).
  • I checked whether the secret is missing, and it is still there. I also decoded it and compared it with the secret I stored at creation time; it is the same and didn't change (see the commands after this list).
  • So K10 doesn't see any snapshots; exports and VolumeSnapshots are missing. I can only create a new snapshot, and it works until it gets to the NFS export; at that point, the error message shown above is generated.
  • I checked and recreated the NFS PVC. Everything is there, no missing files, and the PVC is bound.
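
For reference, this is how I checked and decoded the secret; <profile-secret> and <key> are placeholders for my actual NFS location profile secret name and data key.

# List the secrets in the K10 namespace
kubectl get secrets -n kasten-io

# Decode one key of the profile secret to compare it with the value saved at creation
kubectl get secret <profile-secret> -n kasten-io -o jsonpath='{.data.<key>}' | base64 -d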

Log files are attached. I wiped out private data; let me know if there are problems reading them.

 

Thanks a lot, and sorry for not providing enough information; I just didn't know what's relevant to you.

 

Userlevel 4

Hey, 

Can anyone say something about this kind of issue? I am really afraid of ending up with an inaccessible backup again. Could I prevent this kind of issue by using another external backup location like S3?

 

Thanks a lot

Userlevel 6
Badge +2

Hi @SMT-Almato 
Thank you for sharing the logs.

An invalid repository password usually happens when you reinstall K10. In that case, the master key K10 uses changes, causing a mismatch when connecting to the Kopia repository in your target NFS location.

However, I don't see any recent re-installation in the logs you have shared. Also, I see that all the K10 secrets are intact.

This shouldn’t be happening unless there is something wrong.

Would you be able to elaborate more on the network issue that you mentioned in the post?

Userlevel 6
Badge +2

This needs further troubleshooting. Would you be able to create a case with us by selecting Kasten By Veeam K10 Trial under the Select Products dropdown?

Hello, I have a similar issue.

Some snapshot exports work, others do not. I see the error:


"Failed to exec command in pod: command terminated with exit code 1"},"message":"invalid repository password"},"file":"kasten.io/k10/kio/kopia/repository.go:552","function":"kasten.io/k10/kio/kopia.ConnectToKopiaRepository","linenumber":552,"message":"Failed to connect to the backup repository"},"fields":[{"name":"appNamespace","value":"myApp"}],"file":"kasten.io/k10/kio/exec/phases/phase/export.go:216","function":"kasten.io/k10/kio/exec/phases/phase.prepareKopiaRepoIfExportingData","linenumber":216,"message":"Failed to create Kopia repository for data export"},"file":"kasten.io/k10/kio/exec/phases/phase/export.go:138","function":"kasten.io/k10/kio/exec/phases/phase.(*exportRestorePointPhase).Run","linenumber":138,"message":"Failed to copy artifacts"}

All backups use the same credentials for my S3 target. One thing: I switched my backup target a few weeks ago. The old target had a different hostname and different credentials. Could this be the cause? If yes, why do some exports work and others not?

 

Regards

Userlevel 4


Thanks a lot for your response @jaiganeshjk. We don't know exactly what happened, and we won't get a more detailed explanation for this. In the end I found out that one of the nodes had also had problems since that day, and I decided to reboot every node in the cluster. After that, K10 could find my backups on NFS again, up until the day of the disaster. The backups taken after this outage were gone; I only had the VolumeSnapshots remaining in the cluster.

I will now take the chance to reinstall K10 with the newest version, configure the encryption key as mentioned in the documentation, and increase the retention of the VolumeSnapshots inside the cluster to make sure we always have something to restore from.
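
For the retention part, my plan is to adjust the policy objects directly. The resource name below is the K10 Policy CRD as I see it on my cluster, and <my-backup-policy> is a placeholder; the exact spec fields may differ between K10 versions.

# List the K10 backup policies
kubectl get policies.config.kio.kasten.io -n kasten-io

# Increase the local snapshot retention counts (under spec.retention) for a policy
kubectl edit policies.config.kio.kasten.io <my-backup-policy> -n kasten-io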

One question, maybe you guys know this: where inside the cluster are VolumeSnapshots stored (the storage needed should only cover the delta, right)?

Thanks a lot for helping me out. I can't create a case for this particular outage because the problem is simply gone, but it leaves a bad taste.

Userlevel 6
Badge +2

One question, maybe you guys know this: where inside the cluster are VolumeSnapshots stored (the storage needed should only cover the delta, right)?

It depends on the backend storage you use; in most cases the snapshots only hold the delta.

I can't create a case for this particular outage because the problem is simply gone, but it leaves a bad taste.

I understand. The reason I wanted to work with you over a case is that there are a few variables (NFS PVC claim name, repo path (which is derived from the namespace UID)) that are used for generating the repository password.

If any of the above changed as part of the restore you did (for example, if you deleted the namespace and recreated it with the restore), it might be the reason for `invalid repository password`.

Basically, K10 doesn't store the keys. They are derived on demand from the above variables whenever needed.
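
As a quick check, you can look at the namespace UID that the repo path is derived from; a namespace that was deleted and recreated during a restore will have a new UID.

kubectl get namespace <APPNAME>-test -o jsonpath='{.metadata.uid}'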
