
Hello, I'm trying to install K10 on an on-premise Kubernetes cluster.

Kubernetes v1.25.1 with 4 worker nodes and 3 control planes.
The StorageClass is provided by a Rook-Ceph filesystem, and the rook-ceph environment is at the latest release.
In short, everything seems to be working fine, including the pre-flight checks, but after

helm install my-k10 kasten/k10 --version 5.5.3

kubectl get pods
 

NAME                                     READY   STATUS                  RESTARTS        AGE
aggregatedapis-svc-9b5775d64-db4bj       1/1     Running                 0               23m
auth-svc-cf4646c89-vwjm5                 1/1     Running                 0               23m
catalog-svc-5bddbd67b5-dm7rs             0/2     Init:CrashLoopBackOff   9 (2m29s ago)   23m
controllermanager-svc-78b8fb4bf7-td5z9   1/1     Running                 0               23m
crypto-svc-7f4ff8b479-wzvck              4/4     Running                 0               23m
dashboardbff-svc-d87fc5bc-fmr7h          2/2     Running                 0               23m
executor-svc-7f5dc6c874-2858w            2/2     Running                 0               23m
executor-svc-7f5dc6c874-5cqjg            2/2     Running                 0               23m
executor-svc-7f5dc6c874-kvsz8            2/2     Running                 0               23m
frontend-svc-c69bf6fb6-vnznl             1/1     Running                 0               23m
gateway-9dd654864-s7mq6                  1/1     Running                 0               23m
jobs-svc-c89f77974-wgntb                 0/1     Init:CrashLoopBackOff   9 (2m2s ago)    23m
k10-grafana-7fc8b45cd-6wtvz              1/1     Running                 0               23m
kanister-svc-6656bc89d5-fp5wl            1/1     Running                 0               23m
logging-svc-6d95d9dd85-98mt2             0/1     Init:CrashLoopBackOff   9 (112s ago)    23m
metering-svc-5b855f46c5-g27wq            1/1     Running                 0               23m
prometheus-server-9f7769bbb-24w2l        1/2     CrashLoopBackOff        9 (2m17s ago)   23m
state-svc-7765f59cc-h4kvh                2/2     Running                 0               23m

 

Looking inside one of the failing pods, for example:
kubectl logs catalog-svc-5bddbd67b5-dm7rs -c upgrade-init

2023/01/19 13:43:54 Fresh install detected
panic: {"message":"Failed to change state owner","function":"main.main","linenumber":55,"file":"kasten.io/k10/cmd/upgrade/upgrade.go:55","cause":{"message":"Failed to create store directory","function":"main.changeStoreOwner","linenumber":30,"file":"kasten.io/k10/cmd/upgrade/upgrade.go:30","fields":"{"name":"model_store_dir","value":"//mnt/k10state/kasten-io/"}],"cause":{"message":"mkdir //mnt/k10state/kasten-io/: permission denied"}}}

goroutine 1 [running]:
main.main()
/codefresh/volume/k10/go/src/kasten.io/k10/cmd/upgrade/upgrade.go:55 +0x54
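To look at the volume directly, I can mount the catalog PVC into a throwaway debug pod and check the ownership of the mount point; this is just a rough sketch (the pod name and busybox image are arbitrary, the claim name is the catalog PVC created by the chart):

apiVersion: v1
kind: Pod
metadata:
  name: pvc-inspect
  namespace: kasten-io
spec:
  containers:
  - name: shell
    image: busybox
    # print numeric owner/group of the mount and the container's own uid/gid
    command: ["sh", "-c", "ls -ldn /mnt/k10state && id && sleep 3600"]
    volumeMounts:
    - name: state
      mountPath: /mnt/k10state
  volumes:
  - name: state
    persistentVolumeClaim:
      # note: this claim is RWO, so the debug pod may need to land on the same
      # node as the catalog pod (or scale K10 down first)
      claimName: catalog-pv-claim

kubectl apply -f pvc-inspect.yaml
kubectl -n kasten-io logs pvc-inspect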

It looks like a permission problem, and the same applies to the remaining pods stuck in Init:CrashLoopBackOff and CrashLoopBackOff.
Is there someone who can help with this issue?

@jaiganeshjk @Yongkang 


@stefanodemartini Thanks for posting the question here.

From the above error messages, I understand that there is a problem with the permissions of the PVCs.

All the pods that use PVCs are failing to start/initialise, and the error message I see in the logs is permission denied (except Grafana, which uses an initContainer to change the permissions of the directory).
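For context, the pattern Grafana relies on is roughly an init container that chowns the data directory before the main container starts; a simplified sketch of that idea (image, UID/GID and path are illustrative, not the exact chart template):

initContainers:
- name: init-chown-data
  image: busybox
  # runs as root so chown is allowed, then hands the directory to the app user
  securityContext:
    runAsUser: 0
  command: ["chown", "-R", "472:472", "/var/lib/grafana"]
  volumeMounts:
  - name: storage
    mountPath: /var/lib/grafana

The other K10 services don't do this; they expect the kubelet to apply the pod's fsGroup to the volume when it is mounted.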

There is something wrong with the directory permissions and ownership for those PVCs.

 

Let me take a look at whether CephFS has any limitations with respect to fsGroup settings in the workload.
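In the meantime, it may be worth checking whether the CephFS CSIDriver object allows the kubelet to apply fsGroup at all; the driver name below is the usual one for a Rook deployment, yours may differ:

kubectl get csidriver
kubectl get csidriver rook-ceph.cephfs.csi.ceph.com -o jsonpath='{.spec.fsGroupPolicy}{"\n"}'

If fsGroupPolicy is None, the kubelet will not change the ownership of the mounted volume to the pod's fsGroup, so the directories stay root-owned and you get exactly this kind of permission denied error.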


 Thank you for your kind answer.

k get pvc -n kasten-io
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
catalog-pv-claim    Bound    pvc-b77f1112-53bf-4318-bd8c-a2eb1e311e10   20Gi       RWO            rook-cephfs    18h
jobs-pv-claim       Bound    pvc-30ca4284-2231-4d86-bd02-b3a963343e45   20Gi       RWO            rook-cephfs    18h
k10-grafana         Bound    pvc-3a19897f-6f9d-4b64-8080-c846d399e39c   5Gi        RWO            rook-cephfs    18h
logging-pv-claim    Bound    pvc-8d5891ec-76a1-4ac4-8fcb-5d06113c1bbd   20Gi       RWO            rook-cephfs    18h
metering-pv-claim   Bound    pvc-a94319a8-35f4-464b-8395-c82d45b6a7d3   2Gi        RWO            rook-cephfs    18h
prometheus-server   Bound    pvc-80eb78fa-6d15-4fce-b64e-34fbd17321da   8Gi        RWO            rook-cephfs    18h

That's what I guessed, but there is nothing strange about these PVCs, as they were created directly by the Helm chart. I don't think it's a problem with the underlying StorageClass, since other charts don't have problems…
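As a comparison, here is a minimal test I could run: a fresh PVC on the same StorageClass mounted by a pod that sets fsGroup, to see whether the kubelet actually changes the ownership of the mount (names and UID/GID are arbitrary):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsgroup-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
  - name: shell
    image: busybox
    # if fsGroup is honoured, the mount is group-owned by 1000 and writable
    command: ["sh", "-c", "ls -ldn /data && touch /data/ok && echo write ok && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fsgroup-test-pvc

If the touch fails here as well, the problem is in the CSI/fsGroup handling rather than in the K10 chart.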

 

