Solved

K10 Disaster recovery to a new cluster



Hi all,

I'm new to K10 and am running a POC to check all the features and decide whether we will adopt this solution for our needs.

 

I have run into a situation where I can back up apps that use PVs/PVCs and restore them in the same cluster, but I cannot restore them in a different cluster with k10restore.

 

Both clusters are set up the same way, using the same scripts. Each cluster runs under a different SP.


Azure setup details:

2x AKS: 1.23.8 test1+test2

agentpool: 2x Standard_B2ms

testnp: 1x Standard_D4ads_v5

Using azure CNI in same vnet, different subnets


Location profile:
Locally-redundant storage (LRS)
private endpoint in the same vnet as aks clusters

 

Scenario:
aks test1:

Apps:
k10-disaster-recovery-policy: backup & snapshot, kasten-io namespace
default-backup: backup & snapshot, all namespaces (Testapp resides here)

 

aks test2:
storage class created
location profile pointing to the same blob storage created
k10-dr-secret created
k10-restore using test1 cluster id and profile name created 
 

So far so good.

Applications->Removed->Testapp->Restore initiated
namespace created
pods created 
pvc created
pv pending

failed to provision volume with StorageClass "default": rpc error: code = Internal desc = Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"LinkedAuthorizationFailed","message":"The client '2d587b0f-redacted' with object id '2d587b0f-redacted' has permission to perform action 'Microsoft.Compute/disks/write' on scope '/subscriptions/60f08bbf-redacted/resourceGroups/mc_test-vs_vs-aks-test2_westeurope/providers/Microsoft.Compute/disks/pvc-4605c466-3190-49ab-9204-cf1a471e99b2'; however, it does not have permission to perform action 'Microsoft.Compute/disks/beginGetAccess/action' on the linked scope(s) '/subscriptions/60f08bbf-redacted/resourceGroups/mc_test-vs_vs-aks-test1_westeurope/providers/Microsoft.Compute/snapshots/snapshot-51e1b7de-9ae3-4997-8f82-2681a3ffb3a0' or the linked scope(s) are invalid."}}

 

It is trying to use the client '2d587b0f-redacted' (the SP that the test2 cluster runs under) to read the snapshot from the test1 cluster. I thought it should be able to get the snapshot directly from the storage account, not from the previous cluster, which in a real DR scenario may already be buried 6 ft under.
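For anyone reading a similar 403, the two useful pieces of a LinkedAuthorizationFailed message are the denied action and the linked scope it was denied on. The following is a minimal sketch for pulling them out with sed; the function names are hypothetical, and it assumes the message keeps the wording shown in the error above.

```shell
# Hypothetical helpers: read an Azure LinkedAuthorizationFailed message on
# stdin and print the parts that matter. Assumes the wording of the error
# quoted in this thread.

# Print the action the client was NOT allowed to perform.
denied_action() {
  sed -n "s/.*does not have permission to perform action '\([^']*\)'.*/\1/p"
}

# Print the full resource ID of the linked scope that was denied.
linked_scope() {
  sed -n "s/.*on the linked scope(s) '\([^']*\)'.*/\1/p"
}

# Print only the resource group that the linked scope lives in.
linked_resource_group() {
  linked_scope | sed -n 's#.*/resourceGroups/\([^/]*\)/.*#\1#p'
}

# Tail of the error from this thread, used as sample input.
sample="however, it does not have permission to perform action 'Microsoft.Compute/disks/beginGetAccess/action' on the linked scope(s) '/subscriptions/60f08bbf-redacted/resourceGroups/mc_test-vs_vs-aks-test1_westeurope/providers/Microsoft.Compute/snapshots/snapshot-51e1b7de-9ae3-4997-8f82-2681a3ffb3a0' or the linked scope(s) are invalid."

printf '%s\n' "$sample" | denied_action
printf '%s\n' "$sample" | linked_resource_group
```

If the resource group printed belongs to the source cluster rather than the cluster you are restoring into, the restore is reaching for a snapshot in the original cluster, not for data in the storage account.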

Any idea how to make this work?
Thanks


Best answer by jaiganeshjk 10 October 2022, 11:32


Userlevel 7
Badge +7

@Debarshi_K10 

@Geoff Burke 

Userlevel 7
Badge +22

So yeah, the storage is not being provisioned on the DR cluster for some reason. This is Azure territory, so not my area of expertise. Just to state the obvious: the persistent storage is not being allocated, and at first glance the log points to a permissions issue, as you mention. You also mention that each cluster runs under a different SP. Are there any other differences between the clusters? Even if there are, you can use “transforms” to overcome those: https://docs.kasten.io/latest/api/transforms.html?highlight=transforms

What is the result of

kubectl get sc 

on both clusters?

Userlevel 1


 

 

AKS test1:
% kubectl get sc --kubeconfig VS-aks-test1.conf
NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile               file.csi.azure.com   Delete          Immediate              true                   6d23h
azurefile-csi           file.csi.azure.com   Delete          Immediate              true                   6d23h
azurefile-csi-premium   file.csi.azure.com   Delete          Immediate              true                   6d23h
azurefile-premium       file.csi.azure.com   Delete          Immediate              true                   6d23h
default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   6d23h
managed                 disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   6d23h
managed-csi             disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   6d23h
managed-csi-premium     disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   6d23h
managed-premium         disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   6d23h

AKS test2:
% kubectl get sc --kubeconfig VS-aks-test2.conf
NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile               file.csi.azure.com   Delete          Immediate              true                   21h
azurefile-csi           file.csi.azure.com   Delete          Immediate              true                   21h
azurefile-csi-premium   file.csi.azure.com   Delete          Immediate              true                   21h
azurefile-premium       file.csi.azure.com   Delete          Immediate              true                   21h
default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   21h
managed                 disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   21h
managed-csi             disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   21h
managed-csi-premium     disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   21h
managed-premium         disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   21h
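The two listings above differ only in the AGE column. When comparing clusters this way, a small sketch like the following can confirm that without eyeballing it. The function names and the saved file names are hypothetical; it assumes each listing was saved to a file first, for example with `kubectl get sc --kubeconfig VS-aks-test1.conf > test1-sc.txt`.

```shell
# Hypothetical helpers for comparing two saved `kubectl get sc` listings.

# Print a listing with its last column (AGE) removed, since the age
# always differs between clusters created at different times.
strip_age() {
  awk '{ $NF = ""; sub(/[ \t]+$/, ""); print }' "$1"
}

# Succeed (exit 0) when both listings match once AGE is ignored.
sc_same() {
  [ "$(strip_age "$1")" = "$(strip_age "$2")" ]
}

# Example use, assuming the two files exist:
#   sc_same test1-sc.txt test2-sc.txt && echo "storage classes match"
```

A match here only shows that the storage class names, provisioners and modes agree; it says nothing about the permissions of the identity each cluster runs under.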

Userlevel 1

And regarding the SP, I should explain: it's a Service Principal, basically the “identity” that owns and runs the environment.

Userlevel 7
Badge +22

Ah yes, I do remember that now, Azure IAM language :). So off the top of my head I would narrow in on permissions, authorizations, etc. Kasten support has a few Azure folks who can chime in better than me on this one.

Badge

@Vojtas Can you confirm whether export is enabled in the DR?

Userlevel 1


@satish.kumar Here is the screenshot of the test1 (origin) cluster backup policies (they are paused now to reduce storage costs).
 

 

Userlevel 6
Badge +2

@Vojtas Thank you for confirming that you have enabled the exports in the source cluster.

The reason you are seeing the error message below is that you are trying to restore a local restorepoint from the secondary cluster.

Local restorepoints point to the local snapshots created in the source cluster, so restoring one will try to use that local snapshot, which requires the SP to have access to the snapshot.

 

failed to provision volume with StorageClass "default": rpc error: code = Internal desc = Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"LinkedAuthorizationFailed","message":"The client '2d587b0f-redacted' with object id '2d587b0f-redacted' has permission to perform action 'Microsoft.Compute/disks/write' on scope '/subscriptions/60f08bbf-redacted/resourceGroups/mc_test-vs_vs-aks-test2_westeurope/providers/Microsoft.Compute/disks/pvc-4605c466-3190-49ab-9204-cf1a471e99b2'; however, it does not have permission to perform action 'Microsoft.Compute/disks/beginGetAccess/action' on the linked scope(s) '/subscriptions/60f08bbf-redacted/resourceGroups/mc_test-vs_vs-aks-test1_westeurope/providers/Microsoft.Compute/snapshots/snapshot-51e1b7de-9ae3-4997-8f82-2681a3ffb3a0' or the linked scope(s) are invalid."}}

 

Would you be able to try restoring the exported restorepoint and see if it succeeds?

Userlevel 1

Hi, I figured out how to restore from the storage account snapshot; it's well hidden. :)


Thank you all for the efforts.

We can close this ticket/question.
