
Problem installing K10 with Helm


Hello, I'm trying to install K10 on an on-premises k8s cluster.

Kubernetes v1.25.1 with 4 worker nodes and 3 control-plane nodes.
The StorageClass is provided by the Rook-Ceph filesystem (CephFS), and the Rook-Ceph environment is at the latest release.
In short, everything seems to be working fine, including the pre-flight checks, but after installing with

helm install my-k10 kasten/k10 --version 5.5.3

kubectl get pods
 

aggregatedapis-svc-9b5775d64-db4bj       1/1     Running                 0               23m
auth-svc-cf4646c89-vwjm5                 1/1     Running                 0               23m
catalog-svc-5bddbd67b5-dm7rs             0/2     Init:CrashLoopBackOff   9 (2m29s ago)   23m
controllermanager-svc-78b8fb4bf7-td5z9   1/1     Running                 0               23m
crypto-svc-7f4ff8b479-wzvck              4/4     Running                 0               23m
dashboardbff-svc-d87fc5bc-fmr7h          2/2     Running                 0               23m
executor-svc-7f5dc6c874-2858w            2/2     Running                 0               23m
executor-svc-7f5dc6c874-5cqjg            2/2     Running                 0               23m
executor-svc-7f5dc6c874-kvsz8            2/2     Running                 0               23m
frontend-svc-c69bf6fb6-vnznl             1/1     Running                 0               23m
gateway-9dd654864-s7mq6                  1/1     Running                 0               23m
jobs-svc-c89f77974-wgntb                 0/1     Init:CrashLoopBackOff   9 (2m2s ago)    23m
k10-grafana-7fc8b45cd-6wtvz              1/1     Running                 0               23m
kanister-svc-6656bc89d5-fp5wl            1/1     Running                 0               23m
logging-svc-6d95d9dd85-98mt2             0/1     Init:CrashLoopBackOff   9 (112s ago)    23m
metering-svc-5b855f46c5-g27wq            1/1     Running                 0               23m
prometheus-server-9f7769bbb-24w2l        1/2     CrashLoopBackOff        9 (2m17s ago)   23m
state-svc-7765f59cc-h4kvh                2/2     Running                 0               23m

 

Looking inside one of the failing pods, for example:
kubectl logs catalog-svc-5bddbd67b5-dm7rs -c upgrade-init

2023/01/19 13:43:54 Fresh install detected
panic: {"message":"Failed to change state owner","function":"main.main","linenumber":55,"file":"kasten.io/k10/cmd/upgrade/upgrade.go:55","cause":{"message":"Failed to create store directory","function":"main.changeStoreOwner","linenumber":30,"file":"kasten.io/k10/cmd/upgrade/upgrade.go:30","fields":[{"name":"model_store_dir","value":"//mnt/k10state/kasten-io/"}],"cause":{"message":"mkdir //mnt/k10state/kasten-io/: permission denied"}}}

goroutine 1 [running]:
main.main()
        /codefresh/volume/k10/go/src/kasten.io/k10/cmd/upgrade/upgrade.go:55 +0x54

It seems to be the same permission problem on the remaining pods in Init:CrashLoopBackOff and CrashLoopBackOff.
Can anyone help with this issue?
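Since several pods seem to fail the same way, it can help to list every container stuck in CrashLoopBackOff together with the pod-level securityContext (runAsUser, fsGroup) it requests. A minimal Python sketch, assuming the pod list was saved with `kubectl get pods -n kasten-io -o json > pods.json`; the function names `crashlooping` and `print_report` are hypothetical helpers, not part of K10:

```python
def crashlooping(pod_list):
    """Yield (pod, container, runAsUser, fsGroup) for every container,
    init containers included, whose waiting reason is CrashLoopBackOff.

    pod_list is the parsed output of `kubectl get pods -o json`. Only the
    pod-level securityContext is reported; a container-level
    securityContext could still override runAsUser.
    """
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        sc = pod.get("spec", {}).get("securityContext") or {}
        status = pod.get("status", {})
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                yield name, cs["name"], sc.get("runAsUser"), sc.get("fsGroup")


def print_report(pod_list, namespace="kasten-io"):
    """Print each crashing container and the kubectl command to read its logs."""
    for pod, container, uid, gid in crashlooping(pod_list):
        print(f"{pod}/{container}: runAsUser={uid} fsGroup={gid}")
        print(f"  kubectl logs -n {namespace} {pod} -c {container}")
```

Usage: `print_report(json.load(open("pods.json")))`. An init container in CrashLoopBackOff is what kubectl shows as `Init:CrashLoopBackOff`, so both kinds of failure in the listing above are picked up.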

Best answer by stefanodemartini


3 comments

Madi.Cristil
  • Community Manager
  • 617 comments
  • January 19, 2023

jaiganeshjk
  • Experienced User
  • 274 comments
  • January 20, 2023

@stefanodemartini Thanks for posting the question here.

From the above error messages, I understand that there is a problem with the permissions of the PVCs.

All the pods that use PVCs are failing to start/initialise, and the error message I see in their logs is "permission denied" (except Grafana, which uses an initContainer to change the permissions of the directory).

There is something wrong with the directory permissions and ownership for those PVCs.
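The `mkdir ... permission denied` error follows from ordinary POSIX permission rules: creating an entry in a directory needs write and search (execute) permission for whichever class (owner, group, other) matches the process. The Python sketch below models that check, plus an approximation of what kubelet normally does when it applies a pod's fsGroup to a volume (group set to the fsGroup GID, group rwx added, setgid bit set; the fsGroup is also added to the process's supplementary groups). The UID/GID 1000 and the `root:root 0755` volume root in the example are assumptions for illustration, not values read from this cluster:

```python
import stat


def can_create_entry(uid, gids, dir_uid, dir_gid, dir_mode):
    """Return True if a process (uid, set of gids) may mkdir/create inside a
    directory with the given owner, group and mode (POSIX DAC only)."""
    if uid == 0:
        return True  # root normally bypasses DAC checks (CAP_DAC_OVERRIDE)
    # Only the first matching class is used: owner, then group, then other.
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


def apply_fsgroup(dir_uid, dir_mode, fs_group):
    """Approximate kubelet's fsGroup handling for a directory: owner kept,
    group set to fsGroup, group rwx added and the setgid bit set."""
    return dir_uid, fs_group, dir_mode | stat.S_IRWXG | stat.S_ISGID
```

For example, `can_create_entry(1000, {1000}, 0, 0, 0o755)` is False (a non-root process falls into the "other" class, which lacks write permission), while after `apply_fsgroup(0, 0o755, 1000)` the same process can create the directory. If the fsGroup step is skipped for a volume, the first case is what the K10 init containers would hit.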

 

Let me check whether CephFS has any limitations regarding the fsGroup settings in the workload.
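For CSI volumes, whether kubelet applies fsGroup at all is governed by the CSIDriver object's `spec.fsGroupPolicy` (visible with `kubectl get csidriver -o yaml`). Per the Kubernetes documentation, `None` never modifies ownership, `File` always does, and the default `ReadWriteOnceWithFSType` only does so when the PV defines an fsType and its access modes include ReadWriteOnce. A minimal Python sketch of that rule; which policy the Rook CephFS driver uses on this cluster, and whether its PVs set an fsType, are not known from this thread and would need to be checked:

```python
def fsgroup_applied(policy, fs_type, access_modes, fs_group):
    """Approximate kubelet's decision whether to chown/chmod a CSI volume to
    the pod's fsGroup, based on CSIDriver.spec.fsGroupPolicy."""
    if fs_group is None:
        return False  # the pod requests no fsGroup, so nothing is applied
    policy = policy or "ReadWriteOnceWithFSType"  # API default when unset
    if policy == "None":
        return False  # volumes are mounted without any ownership changes
    if policy == "File":
        return True  # applied regardless of fsType or access mode
    if policy == "ReadWriteOnceWithFSType":
        return bool(fs_type) and "ReadWriteOnce" in access_modes
    raise ValueError(f"unknown fsGroupPolicy: {policy}")
```

Under this rule, a RWO CephFS PV with an empty fsType and the default policy would be mounted unchanged, which would match the symptoms above; this is a hypothesis to verify, not a confirmed diagnosis.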


  • Author
  • Not a newbie anymore
  • 1 comment
  • Answer
  • January 20, 2023

 Thank you for your kind answer.

k get pvc -n kasten-io
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
catalog-pv-claim    Bound    pvc-b77f1112-53bf-4318-bd8c-a2eb1e311e10   20Gi       RWO            rook-cephfs    18h
jobs-pv-claim       Bound    pvc-30ca4284-2231-4d86-bd02-b3a963343e45   20Gi       RWO            rook-cephfs    18h
k10-grafana         Bound    pvc-3a19897f-6f9d-4b64-8080-c846d399e39c   5Gi        RWO            rook-cephfs    18h
logging-pv-claim    Bound    pvc-8d5891ec-76a1-4ac4-8fcb-5d06113c1bbd   20Gi       RWO            rook-cephfs    18h
metering-pv-claim   Bound    pvc-a94319a8-35f4-464b-8395-c82d45b6a7d3   2Gi        RWO            rook-cephfs    18h
prometheus-server   Bound    pvc-80eb78fa-6d15-4fce-b64e-34fbd17321da   8Gi        RWO            rook-cephfs    18h

That's what I guessed, but there is nothing strange about these PVCs, as they were created directly by the Helm chart. I don't think it's a problem with the underlying StorageClass, since other charts don't have problems…
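Since the PVCs themselves are Bound, the details that matter for fsGroup live on the PVs behind them: `spec.csi.driver`, `spec.csi.fsType`, `spec.accessModes` and `spec.claimRef`. Together with the CSIDriver's fsGroupPolicy, these decide whether kubelet adjusts the volume ownership. A minimal Python sketch that summarises those fields for the kasten-io claims, assuming the output of `kubectl get pv -o json` was saved to a file; `summarize_pvs` is a hypothetical helper name:

```python
def summarize_pvs(pv_list, namespace="kasten-io"):
    """Return {claim_name: (csi_driver, fs_type, access_modes)} for PVs
    bound to claims in the given namespace."""
    summary = {}
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        claim = spec.get("claimRef") or {}
        if claim.get("namespace") != namespace:
            continue  # skip PVs that belong to other namespaces
        csi = spec.get("csi") or {}
        summary[claim.get("name")] = (
            csi.get("driver"),
            csi.get("fsType", ""),  # empty if the provisioner left it unset
            spec.get("accessModes", []),
        )
    return summary
```

Usage: `summarize_pvs(json.load(open("pvs.json")))`. Charts whose containers run as root, or that fix permissions with an initContainer as Grafana does here, would not be affected even if fsGroup is never applied.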

 

