Hi! I’ve been kicking the tires on Kasten K10 in my home k8s environment and had it mostly working until I upgraded to 4.5.1 recently. I’m not exactly sure whether it was the K10 upgrade that broke things or something else I did around the same time (upgrading Kubernetes from 1.21.3 to 1.22.3?).
All policy runs fail now. Using the dashboard and clicking into the details of a run gives me essentially the same error each time:
cause:
  cause:
    cause:
      cause:
        ErrStatus:
          code: 404
          details:
            causes:
            - message: 404 page not found
              reason: UnexpectedServerResponse
          message: the server could not find the requested resource
          metadata: {}
          reason: NotFound
          status: Failure
        fields:
        - name: Resource
          value:
            Group: extensions
            Resource: ingresses
            Version: v1beta1
        function: kasten.io/k10/kio/exec/phases/phase.k8sObjectTypeObjects
        linenumber: 302
        message: Failed to list resource
      function: kasten.io/k10/kio/exec/phases/backup.backupNamespaceToCatalog
      linenumber: 258
      message: Failed to snapshot objects in the K10 Namespace
    fields:
    - name: namespace
      value: kasten-io
    function: kasten.io/k10/kio/exec/phases/backup.(*BackupK10Phase).Run
    linenumber: 72
    message: Failed to backup namespace specs
  message: Job failed to be executed
fields:
I completely removed K10 (helm uninstall, but also kubectl delete namespace) and re-installed it back to “factory” settings (all of my policies are gone and I wiped every snapshot that had been made), but the problem persists, even with just the single k10-disaster-recovery-policy.
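For completeness, this is roughly what I ran (the release name k10 and the kasten/k10 chart are just what my install uses, so adjust if yours differs):
helm uninstall k10 --namespace kasten-io
kubectl delete namespace kasten-io
helm repo update
helm install k10 kasten/k10 --namespace kasten-io --create-namespace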
I don’t have any ingress resources.
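The failing resource in the error is ingresses under extensions/v1beta1, though, so one thing worth checking is which ingress API versions the API server still serves (plain kubectl, nothing K10-specific):
kubectl api-versions | grep -E 'extensions|networking'
kubectl get --raw /apis/extensions/v1beta1
If that second call 404s (extensions/v1beta1 was removed upstream in Kubernetes 1.22), it would line up with the “404 page not found” in the policy error above.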
Looking through the logs from all of the Kasten pods (pulled as sketched below the snippet), the only error I saw repeating was from the aggregatedapis-svc pod:
[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:34.328266 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:34.328273 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:35.333158 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
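(I have been pulling these with plain kubectl; deploy/aggregatedapis-svc is my guess at the owning Deployment based on the pod name, so adjust as needed:)
kubectl --namespace kasten-io logs deployment/aggregatedapis-svc --tail=200
kubectl --namespace kasten-io logs aggregatedapis-svc-584b7f4799-xkrfc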
Kubernetes environment:
v1.22.3
6 nodes (3 master/control plane)
storage: rook-ceph
network: calico
load balancing: metallb
NFS set up as a location for Kasten.
All pods show as Running or Completed.
ceph status shows HEALTHY
All node syslogs messages look routine.
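(How I am checking those, in case it matters; the rook-ceph-tools toolbox Deployment is an assumption from the standard Rook setup, so swap in whatever toolbox you run:)
kubectl get pods --all-namespaces | grep -vE 'Running|Completed'
kubectl --namespace rook-ceph exec deploy/rook-ceph-tools -- ceph status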
Any advice on how I can further dig into this? Maybe it’s a service account permission thing?
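To poke at the permissions angle, I was thinking of checks along these lines; the service account name k10-k10 is a guess based on the default release name, so it may need adjusting:
kubectl auth can-i list ingresses.networking.k8s.io --as=system:serviceaccount:kasten-io:k10-k10 --namespace kasten-io
kubectl auth can-i list namespaces --as=system:serviceaccount:kasten-io:k10-k10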
k10tools primer output:
I1031 20:48:11.872606 7 request.go:655] Throttling request took 1.032036175s, request: GET:https://10.96.0.1:443/apis/rook.io/v1alpha2?timeout=32s
Kubernetes Version Check:
Valid kubernetes version (v1.22.3) - OK
RBAC Check:
Kubernetes RBAC is enabled - OK
Aggregated Layer Check:
The Kubernetes Aggregated Layer is enabled - OK
Found multiple snapshot API group versions, using preferred.
CSI Capabilities Check:
Using CSI GroupVersion snapshot.storage.k8s.io/v1 - OK
Found multiple snapshot API group versions, using preferred.
Validating Provisioners:
rook-ceph.rbd.csi.ceph.com:
Is a CSI Provisioner - OK
Missing/Failed to Fetch CSIDriver Object
Storage Classes:
rook-ceph-block
Valid Storage Class - OK
Volume Snapshot Classes:
csi-rbdplugin-snapclass
Has k10.kasten.io/is-snapshot-class annotation set to true - OK
Has deletionPolicy 'Delete' - OK
k10-clone-csi-rbdplugin-snapclass
Validate Generic Volume Snapshot:
Pod Created successfully - OK
GVS Backup command executed successfully - OK
Pod deleted successfully - OK
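One other thing that jumps out in the primer output is the “Missing/Failed to Fetch CSIDriver Object” line under the rook-ceph.rbd.csi.ceph.com provisioner. My next step is to confirm whether that CSIDriver object actually exists:
kubectl get csidriver
kubectl get csidriver rook-ceph.rbd.csi.ceph.com -o yaml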