
All snapshots and actions succeed in my policy run, yet the policy is marked Failed, and I can't find an error message anywhere at the policy level. How can I troubleshoot this?

Do you have a screenshot of the error message? That might help in getting an answer.


There is no error message, just a failed policy run state.


Anything under “Show Details” or the logs?


I didn’t see anything on the details page, but I just discovered that if I view the YAML, there is an error listed.

    cause: '{"fields":"{"name":"entry","value":{"type":"ArtifactReferenceGroup"}}],"file":"kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177","function":"kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries","linenumber":177,"message":"Unexpected
manifest entry"}'
message: Job failed to be executed
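
In case it helps anyone else, something like this should pull the same run object straight from the K10 actions API instead of going through the dashboard's YAML view. It is only a rough sketch: the API version (v1alpha1), the assumption that run objects are cluster-scoped, and the exact run name are mine, not anything Kasten documents in this thread.

```python
# Rough sketch (not official K10 tooling): fetch a policy run's YAML via the
# actions.kio.kasten.io API. Assumes v1alpha1 and cluster-scoped run objects.
from kubernetes import client, config
import yaml

config.load_kube_config()
api = client.CustomObjectsApi()

run = api.get_cluster_custom_object(
    group="actions.kio.kasten.io",
    version="v1alpha1",          # assumed; confirm with your cluster's API resources
    plural="runactions",
    name="run-m9dp9v4pwg",       # the run shown on my failed policy run
)
# The failure cause shows up in the dumped YAML, same as in the dashboard view.
print(yaml.safe_dump(run))
```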

 


Not sure if this will help, but I found this error a couple of times on the Kasten Troubleshooting page. It references Object Storage, though, so I'm not sure whether it applies to your setup.

K10 Troubleshooting Guide (kasten.io)


Thanks. I am storing to S3; however, the error indicates an unexpected manifest entry, not a problem communicating with the object store or a full bucket.


K10 support, @Hagag , @Satish-- What manifest is K10 complaining about and how can I fix this? My critical backups are currently failing.


@Aaron Oneal I think you might be able to fix this by recreating the policy; please try this workaround and let me know.
I would also appreciate it if you could share the debug logs so we can try to understand this error.

 

Thanks

Ahmed Hagag


@Hagag, deleting and recreating the policy did not fix this.

I cannot share the debug logs, as they contain information whose disclosure would be a privacy breach. However, I am happy to share specifics from them or to redact files if you can tell me what you're looking for.

`executor-svc*` appears to be the only file that is relevant. Clearly K10 is attempting to “retire a policy” and manifest "c374f458-cf86-11ec-a067-0ef546421892" has an unexpected entry.

Too bad it doesn’t say what the unexpected entry is.

How can I find and clean up this manifest and any related data manually?

 

{
  "File": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go",
  "Function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).checkRetirePolicyRun",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 316,
  "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892",
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "SubjectRef": "kasten-io:cloud-daily-backup",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "info",
  "manifestID": "e9cfdcbd-ce69-11ec-a067-0ef546421892",
  "msg": "Retiring policy run manifest",
  "time": "20220509-11:04:01.082Z",
  "version": "4.5.14"
}
{
  "File": "kasten.io/k10/kio/exec/internal/runner/runner.go",
  "Function": "kasten.io/k10/kio/exec/internal/runner.(*Runner).maybeExecJob",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 177,
  "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892",
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "SubjectRef": "kasten-io:cloud-daily-backup",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "error": {
    "message": "Unexpected manifest entry",
    "function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries",
    "linenumber": 177,
    "file": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177",
    "fields": [
      {
        "name": "entry",
        "value": {
          "type": "ArtifactReferenceGroup"
        }
      }
    ]
  },
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "error",
  "msg": "Job failed",
  "time": "20220509-11:04:01.786Z",
  "version": "4.5.14"
}
{
  "File": "kasten.io/k10/kio/daemon/daemon.go",
  "Function": "kasten.io/k10/kio/daemon.(*Daemon).run",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 133,
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "info",
  "msg": "Daemon Shutting Down",
  "time": "20220509-11:04:01.854Z",
  "version": "4.5.14"
}
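
For reference, this is roughly how I pulled the entries above out of the executor-svc log: it is a JSON-lines file, so a small filter on the manifest and job IDs is enough. Just a sketch; the log file path is whatever the debug bundle calls it on your side.

```python
# Sketch: filter a JSON-lines executor-svc log down to entries that mention
# the failing manifest or job ID, and pretty-print them.
import json
import sys

MANIFEST_ID = "c374f458-cf86-11ec-a067-0ef546421892"
JOB_ID = "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4"

with open(sys.argv[1]) as log:        # path to the executor-svc log file
    for line in log:
        line = line.strip()
        if not line.startswith("{"):
            continue                   # skip any non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if MANIFEST_ID in line or entry.get("JobID") == JOB_ID:
            print(json.dumps(entry, indent=2))
```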

 


It appears these manifests are kept in the `model-store.db` of either the `catalog-pv-claim` or `jobs-pv-claim`. What format is the database?


I still don’t know what database format `model-store.db` is so I can’t correct any records, but I managed to extract the manifest from the catalog with a hex dump. The error indicates that `ArtifactReferenceGroup` is not a valid entry in the `entries` collection. My backup cluster is still failing due to this. Please advise.

{
  "creationTime": "2022-05-09T10:57:05.341Z",
  "destructionTime": "2022-05-10T10:59:57.196Z",
  "id": "c374f458-cf86-11ec-a067-0ef546421892",
  "meta": {
    "manifest": {
      "action": "snapshot",
      "apiKeys": [
        "/actions.kio.kasten.io/runactions/run-m9dp9v4pwg"
      ],
      "apiMeta": {
        "annotations": null,
        "labels": [
          {
            "key": "k10.kasten.io/policyName",
            "value": "cloud-daily-backup"
          },
          {
            "key": "k10.kasten.io/policyNamespace",
            "value": "kasten-io"
          }
        ]
      },
      "endTime": "2022-05-09T11:04:44.306Z",
      "entries": [
        {
          "artifactReferenceGroup": [
            "c5957844-cf86-11ec-a067-0ef546421892",
            "c5a31150-cf86-11ec-a067-0ef546421892",
            "c5b0ee68-cf86-11ec-a067-0ef546421892",
            "c5b82f2f-cf86-11ec-a067-0ef546421892",
            "c5cb0bf5-cf86-11ec-a067-0ef546421892",
            "c5d53747-cf86-11ec-a067-0ef546421892",
            "c5dab78b-cf86-11ec-a067-0ef546421892",
            "c5dfdfc4-cf86-11ec-a067-0ef546421892",
            "c5e420e2-cf86-11ec-a067-0ef546421892",
            "c5e96d04-cf86-11ec-a067-0ef546421892",
            "c5ee36dd-cf86-11ec-a067-0ef546421892",
            "c5f512d0-cf86-11ec-a067-0ef546421892",
            "c5fac0de-cf86-11ec-a067-0ef546421892",
            "c600262f-cf86-11ec-a067-0ef546421892",
            "c60621bb-cf86-11ec-a067-0ef546421892",
            "c60b5118-cf86-11ec-a067-0ef546421892",
            "c6104b10-cf86-11ec-a067-0ef546421892",
            "c614738b-cf86-11ec-a067-0ef546421892",
            "c6277635-cf86-11ec-a067-0ef546421892",
            "c62e03f3-cf86-11ec-a067-0ef546421892",
            "c6460e3f-cf86-11ec-a067-0ef546421892",
            "c64abda5-cf86-11ec-a067-0ef546421892",
            "c65a41f9-cf86-11ec-a067-0ef546421892"
          ],
          "type": "ArtifactReferenceGroup"
        }
      ],
      "exceptions": null,
      "finalFailure": {
        "cause": {
          "fields": [
            {
              "name": "entry",
              "value": {
                "type": "ArtifactReferenceGroup"
              }
            }
          ],
          "file": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177",
          "function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries",
          "linenumber": 177,
          "message": "Unexpected manifest entry"
        },
        "fields": [],
        "message": "Job failed to be executed"
      },
      "jobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
      "originatingPolicies": [
        {
          "id": "8fec6a66-7136-4cfe-9819-d9ffccf90c41"
        }
      ],
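
For the record, this is roughly what the "hex dump" extraction amounted to: copy `model-store.db` off the catalog volume, scan it for the manifest ID, and keep the printable text around each hit. A sketch only; the file path and the context window size are arbitrary choices on my part.

```python
# Sketch: carve manifest text out of a raw copy of model-store.db by searching
# for the manifest ID and keeping the printable bytes around each hit.
import re
import string

MANIFEST_ID = b"c374f458-cf86-11ec-a067-0ef546421892"
DB_COPY = "model-store.db"            # copied off the catalog-pv-claim volume
WINDOW = 16 * 1024                    # bytes of context to keep around a hit

PRINTABLE = set(string.printable.encode())

with open(DB_COPY, "rb") as f:
    blob = f.read()

for hit in re.finditer(re.escape(MANIFEST_ID), blob):
    start = max(0, hit.start() - WINDOW)
    end = min(len(blob), hit.end() + WINDOW)
    chunk = blob[start:end]
    # Non-printable bytes (length prefixes and the like) become '.' so the
    # embedded JSON-like manifest text stays readable.
    text = bytes(b if b in PRINTABLE else ord(".") for b in chunk)
    print(f"--- hit at offset {hit.start()} ---")
    print(text.decode("ascii"))
```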

 


Hello Aaron,

 

Could you please provide us with your debug logs so that we may look into this further? You can find them by going to Settings > Support > Download Logs.

 

If you could please attach the executor-svc logs (all three), the logging-svc log file, and lastly the k10_debug file, we will have a good basis for further troubleshooting.

 

Thanks 

Emmanuel


These were submitted. Backups still failing.



Hello @Aaron Oneal,

Kasten K10 4.5.15 is out; please check the release notes here: https://docs.kasten.io/latest/releasenotes.html

There is a fix in this release for a case where policies with selective export and an independent export retention schedule are marked Failed even though all of their run actions succeed, caused by a failure during automatic retirement.

I would recommend trying this latest version and, if possible, recreating your policy or creating a new one to test.

Thanks

Fernando


@FRubens I am already using that version and it still fails after running snapshots if it is not time for export.

E.g. hourly snapshot, daily export.


Actually, it still fails when it is time for export too. It looks like I was able to get a single manual run to work, but anything scheduled still fails.


Hello @Aaron Oneal,

Thank you for the information.

It would be great to have your debug logs from the new version (4.5.15), or at least the same executor-svc output of the error that you shared earlier in this post, but from 4.5.15. Since we had a fix in the same function, I would like to see whether anything changed in the error message between 4.5.14 and the latest version that would help us investigate.

Regards
Fernando



Hi @Aaron Oneal,

Also, if possible, can you provide the policy YAML or a screenshot of the policy setup from the dashboard? I would like to check the retention/export retention settings you selected so we can try to replicate this on our side.

Thank you

Regards

Fernando


Attached to the support case. What I noted there is that creating a new policy with a new name works; creating a policy with the old name does not. It appears there must be some old state in the catalog related to the original policy.
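
If it helps the next person, a quick way to look for runs still tied to the old policy name is to list the K10 run objects by the policy-name label that appears in the manifest earlier in this thread. A sketch only: the API version and the assumption that run objects carry that label are mine.

```python
# Sketch: list K10 run objects still labeled with the old policy name, to see
# whether stale per-policy state is lingering. Assumes the actions API version
# is v1alpha1 and that runs carry the k10.kasten.io/policyName label.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

runs = api.list_cluster_custom_object(
    group="actions.kio.kasten.io",
    version="v1alpha1",
    plural="runactions",
    label_selector="k10.kasten.io/policyName=cloud-daily-backup",
)
for item in runs.get("items", []):
    meta = item["metadata"]
    state = item.get("status", {}).get("state")
    print(meta["name"], meta.get("creationTimestamp"), state)
```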

