Solved

aggregatedapis-svc retrywatcher.go:130 Watch failed: the server could not find the requested resource

3 years ago
October 31, 2021
11 comments
784 views

Snakebyte
New Here
1 comment

Hi! I’ve been kicking the tires of Kasten k10 in my home k8s environment and had it mostly working until I upgraded to 4.5.1 recently. I’m not exactly sure it was the upgrade that broke things, or me doing something else around the same time (upgrade from 1.21.3 to 1.22.3?).

All policy runs fail now. Using the dashboard, and clicking in to details, gives me this similar error:

ause:
cause:
cause:
cause:
ErrStatus:
code: 404
details:
causes:
- message: 404 page not found
reason: UnexpectedServerResponse
message: the server could not find the requested resource
metadata: {}
reason: NotFound
status: Failure
fields:
- name: Resource
value:
Group: extensions
Resource: ingresses
Version: v1beta1
function: kasten.io/k10/kio/exec/phases/phase.k8sObjectTypeObjects
linenumber: 302
message: Failed to list resource
function: kasten.io/k10/kio/exec/phases/backup.backupNamespaceToCatalog
linenumber: 258
message: Failed to snapshot objects in the K10 Namespace
fields:
- name: namespace
value: kasten-io
function: kasten.io/k10/kio/exec/phases/backup.(*BackupK10Phase).Run
linenumber: 72
message: Failed to backup namespace specs
message: Job failed to be executed
fields: []

I completely removed (helm uninstall, but also kubectl delete namespace), and re-installed back to “factory” settings (all my policies are removed and I wiped all snapshots made), but the problem persists, even with just the single k10-disaster-recovery-policy.

I don’t have any ingress resources.

Looking through all of the logs from all of the kasten pods, the only trouble I would see repeat itself was from the aggregatedapis-svc pod:

[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:34.328266 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:34.328273 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
[aggregatedapis-svc-584b7f4799-xkrfc] E1031 20:33:35.333158 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource

Kubernetes environment:

v1.22.3
6 nodes (3 master/control panel)

storage: rook-ceph

network: calico

metallb: load balancing
NFS set up as a location for Kasten.

All pods show as Running or Completed.

ceph status shows HEALTHY

All node syslogs messages look routine.

Any advice on how I can further dig into this? Maybe it’s a service account permission thing?

k10tools primer output:

I1031 20:48:11.872606 7 request.go:655] Throttling request took 1.032036175s, request: GET:https://10.96.0.1:443/apis/rook.io/v1alpha2?timeout=32s
Kubernetes Version Check:
Valid kubernetes version (v1.22.3) - OK
RBAC Check:
Kubernetes RBAC is enabled - OK
Aggregated Layer Check:
The Kubernetes Aggregated Layer is enabled - OK
Found multiple snapshot API group versions, using preferred.
CSI Capabilities Check:
Using CSI GroupVersion snapshot.storage.k8s.io/v1 - OK
Found multiple snapshot API group versions, using preferred.
Validating Provisioners:
rook-ceph.rbd.csi.ceph.com:
Is a CSI Provisioner - OK
Missing/Failed to Fetch CSIDriver Object
Storage Classes:
rook-ceph-block
Valid Storage Class - OK
Volume Snapshot Classes:
csi-rbdplugin-snapclass
Has k10.kasten.io/is-snapshot-class annotation set to true - OK
Has deletionPolicy 'Delete' - OK
k10-clone-csi-rbdplugin-snapclass
Validate Generic Volume Snapshot:
Pod Created successfully - OK
GVS Backup command executed successfully - OK
Pod deleted successfully - OK

Best answer by Geoff Burke

Looks like your cluster version is 1.22.3, I think Kasten only supports up to 1.21

I say that only because I upgraded to 1.22 and had issues with the dashboard and then when I went back to 1.21 everything was fine.

Still that might not have anything to do with it especially if it was working for you on 4.13 but thought I would mention it.

https://docs.kasten.io/latest/operating/support.html?highlight=support

cheers

View original

Did this topic help you find an answer to your question?

+22

Geoff Burke
Veeam Legend, Veeam Vanguard
1317 comments
Answer
3 years ago
October 31, 2021

Looks like your cluster version is 1.22.3, I think Kasten only supports up to 1.21

I say that only because I upgraded to 1.22 and had issues with the dashboard and then when I went back to 1.21 everything was fine.

Still that might not have anything to do with it especially if it was working for you on 4.13 but thought I would mention it.

https://docs.kasten.io/latest/operating/support.html?highlight=support

cheers

VMCA2024, VMCE2023, VMCE2024-SP,CKA, LFCS, PCA,TARS

+22

Geoff Burke
Veeam Legend, Veeam Vanguard
1317 comments
3 years ago
October 31, 2021

That page also has a bunch of debugging suggestions too which might be of help.

cheers

VMCA2024, VMCE2023, VMCE2024-SP,CKA, LFCS, PCA,TARS

Snakebyte
Author
New Here
1 comment
3 years ago
November 1, 2021

Geoff Burke wrote:

Looks like your cluster version is 1.22.3, I think Kasten only supports up to 1.21

How did I miss that! lol, well, if that turns out to be the case, my first PR will be to update their k10primer tool to error out if version >= 1.22.0

I ran the debug script -- looks like it’s returning the stuff I already looked at (pod logs, pod details). There was something I missed for the gateway service though -- it was complaining about not having CRDs for Ambassador Edge Stack in the beginning of the logs, but didn’t seem to stop it from eventually working (I think I access the dashboard gui through it?). It is interesting though since I think Ambassador does ingress stuff, and it’s the ingress lookup that seems to be failing here.

mparise-sorint
New Here
1 comment
3 years ago
December 21, 2021

Hi everyone,

any info about when support for 1.22.3 (and OpenShift 4.9 that brings 1.22.3 on the table) will be released?
Thank you!

Mattia

flo-mic
Not a newbie anymore
4 comments
3 years ago
January 2, 2022

I’m facing the same issues on a fresh installed k3s single node cluster v1.21.8. When you browse the web application you don’t see any errors and it looks like the application is running. But when you have a look on the log files you can see that the aggregatedapis are not working. In the nodes system logs I can also see that specific apis are constantly queried but they are missing. So it seams like some CRDs are missing in the helm package v4.5.6

Pod aggregatedapis-svc logs:

E0102 09:06:49.837482 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837374 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837397 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837445 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837455 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837447 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837474 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource
E0102 09:06:50.837474 1 retrywatcher.go:130] Watch failed: the server could not find the requested resource

System logs:

I0102 10:13:12.084062 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.apps.kio.kasten.io: Rate Limited Requeue.
E0102 10:13:12.084049 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.apps.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:13:12.082927 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.actions.kio.kasten.io: Rate Limited Requeue.
E0102 10:13:12.082913 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.actions.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:13:12.081747 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.vault.kio.kasten.io: Rate Limited Requeue.
E0102 10:13:12.081721 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.vault.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:11:12.088633 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.actions.kio.kasten.io: Rate Limited Requeue.
E0102 10:11:12.088612 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.actions.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:11:12.087444 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.vault.kio.kasten.io: Rate Limited Requeue.
E0102 10:11:12.087418 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.vault.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:11:12.086301 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.apps.kio.kasten.io: Rate Limited Requeue.
E0102 10:11:12.086278 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.apps.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:09:12.086698 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.actions.kio.kasten.io: Rate Limited Requeue.
E0102 10:09:12.086655 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.actions.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:09:12.081189 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.vault.kio.kasten.io: Rate Limited Requeue.
E0102 10:09:12.081162 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.vault.kio.kasten.io" failed with: OpenAPI spec does not exist
I0102 10:09:12.079924 1487 controller.go:129] OpenAPI AggregationController: action for item v1alpha1.apps.kio.kasten.io: Rate Limited Requeue.
E0102 10:09:12.079896 1487 controller.go:116] loading OpenAPI spec for "v1alpha1.apps.kio.kasten.io" failed with: OpenAPI spec does not exist

KelianSB
New Here
7 comments
3 years ago
January 31, 2022

We are facing the same issue with Kasten 4.5.7 and Kuberneters 1.21, which sould be supported. Did you finally manage to solve this @flo-mic ?

flo-mic
Not a newbie anymore
4 comments
3 years ago
January 31, 2022

@KelianSB no i still have the issue. And 4.5.7 solved it only partly. I can perform backup and restore operations but the log still shows thousands of this messages.

I have an open support case for this

KelianSB
New Here
7 comments
3 years ago
February 1, 2022

Ok thanks for your answer. Please let us know if support finds anything to solve this issue, it’s very annoying.

flo-mic
Not a newbie anymore
4 comments
3 years ago
February 6, 2022

@KelianSB I was able to fix this error. Maybe the below steps can help you as well:

Uninstall the k10 helm package
```
helm uninstall k10 -n kasten-io
```
Add the following two lines to “/etc/sysctl.conf” on the nodes to allow more file descriptors and file notifier (or create a custom config like “/etc/sysctl.d/30-file-system.conf” with the below content)
```
fs.file-max = 100000
fs.inotify.max_user_instances = 512
```
Set a higher soft and hard limit for any user in “/etc/security/limits.conf” on the nodes (Add the lines at the end of the file)
```
* soft nofile 65532
* hard nofile 100000
```
Reboot the node (complete reboot required due to the sysctl modifications)

Install the helm chart without any custom values

helm upgrade --install --atomic k10 kasten/k10 -n kasten-io

Now update the chart with the custom settings needed, e.g.

helm upgrade --install --atomic k10 kasten/k10 -n kasten-io -f values.yaml

Maybe the step 5 is not required and you can directly use step 6, but at least this was the way how I was getting it working.

KelianSB
New Here
7 comments
3 years ago
February 10, 2022

flo-mic wrote:

@KelianSBI was able to fix this error. Maybe the below steps can help you as well:

Uninstall the k10 helm package
```
helm uninstall k10 -n kasten-io
```
Add the following two lines to “/etc/sysctl.conf” on the nodes to allow more file descriptors and file notifier (or create a custom config like “/etc/sysctl.d/30-file-system.conf” with the below content)
```
fs.file-max = 100000
fs.inotify.max_user_instances = 512
```
Set a higher soft and hard limit for any user in “/etc/security/limits.conf” on the nodes (Add the lines at the end of the file)
```
* soft nofile 65532
* hard nofile 100000
```
Reboot the node (complete reboot required due to the sysctl modifications)

Install the helm chart without any custom values

helm upgrade --install --atomic k10 kasten/k10 -n kasten-io

Now update the chart with the custom settings needed, e.g.

helm upgrade --install --atomic k10 kasten/k10 -n kasten-io -f values.yaml

Maybe the step 5 is not required and you can directly use step 6, but at least this was the way how I was getting it working.

Hello @flo-mic, thanks for your response. I didn't expect to have to change system settings on Kubernetes nodes for an error like this in k10... Is this the answer you got from support?

flo-mic
Not a newbie anymore
4 comments
3 years ago
February 10, 2022

@KelianSB the support was not able to help. The configuration of k10 was already done properly and the primer tool was successfully validating the setup.

This changes on OS level are only needed in case you have setup your own cluster on e.g. Debian 11. If you use AKS or GKs this settings are not needed as the nodes are already configured in a proper way and the file limits are already increased

Comment

Related topics

IBM i-serie VTLicon

IBM i File Systemicon

Get to Know Michael Stempf: A Journey to Cyber Resiliency, and the One Thing to Do Now

ibm i backup is going into pending at scan phaseicon

Restore spool file using IBM i File System Agenticon

Sign up

Login to the community

Scanning file for viruses.

This file cannot be downloaded