Solved

Policy run fails even though all actions succeed


All snapshots and actions succeed on my policy run yet the policy is marked Failed and there is no place I can find an error message at the policy level. How can I troubleshoot?


19 comments

Chris.Childerhose

Do you have a screenshot of the error message?  That might help to get an answer.


  • Author
  • Comes here often
  • 29 comments
  • May 9, 2022

There is no error message, just a failed policy run state.


Chris.Childerhose

Anything under “Show Details” or the logs?


  • Author
  • Comes here often
  • 29 comments
  • May 9, 2022

I didn’t see anything on the details page but I just discovered if I view the YAML there is an error listed.

    cause: '{"fields":[{"name":"entry","value":{"type":"ArtifactReferenceGroup"}}],"file":"kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177","function":"kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries","linenumber":177,"message":"Unexpected
      manifest entry"}'
    message: Job failed to be executed
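For anyone else reading the YAML: the `cause` value is a JSON string, so it can be unpacked to see which function raised the error and which entry type it rejected. A minimal sketch, assuming the string has been copied out of the YAML and the folded line break rejoined with a space (a YAML parser does this for you); `summarize_cause` is a hypothetical helper name, not part of K10.

```python
import json

# The `cause` value from the YAML above, rejoined into a single JSON string.
cause_text = (
    '{"fields":[{"name":"entry","value":{"type":"ArtifactReferenceGroup"}}],'
    '"file":"kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177",'
    '"function":"kasten.io/k10/kio/exec/phases/phase.'
    '(*retirePolicyPhase).retireMultiActionPolicyEntries",'
    '"linenumber":177,"message":"Unexpected manifest entry"}'
)

def summarize_cause(text):
    """Return (message, function, fields) from a JSON-encoded cause string."""
    cause = json.loads(text)
    # Flatten the list of {"name": ..., "value": ...} pairs into a dict.
    fields = {f["name"]: f["value"] for f in cause.get("fields", [])}
    return cause["message"], cause["function"], fields

message, function, fields = summarize_cause(cause_text)
print("message: ", message)
print("function:", function.rsplit(".", 1)[-1])  # short function name only
print("fields:  ", fields)
```

This only reformats what the YAML already contains; it does not reveal anything beyond the entry type.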

 


Chris.Childerhose

I'm not sure if this will help, but I found this error a couple of times on the Kasten troubleshooting page. It refers to object storage there, though, and I don't know whether you are using that.

K10 Troubleshooting Guide (kasten.io)


  • Author
  • Comes here often
  • 29 comments
  • May 9, 2022

Thanks. I am storing to S3, however the error indicates an unexpected manifest entry and not an error communicating with the object store or a full bucket.


  • Author
  • Comes here often
  • 29 comments
  • May 10, 2022

K10 support, @Hagag , @Satish-- What manifest is K10 complaining about and how can I fix this? My critical backups are currently failing.


Hagag
  • Experienced User
  • 154 comments
  • May 10, 2022

@Aaron Oneal I think you might be able to fix this by recreating the policy. Please try this workaround and let me know.
It would also help if you could share the debug logs so that we can try to understand this error.

 

Thanks

Ahmed Hagag


  • Author
  • Comes here often
  • 29 comments
  • May 12, 2022

@Hagag, deleting and recreating the policy did not fix this.

I cannot share the debug logs as there is information in them that would be a privacy breach. However, I am happy to share specifics from them or redact files if you can tell me what you’re looking for. 

`executor-svc*` appears to be the only file that is relevant. Clearly K10 is attempting to “retire a policy” and manifest "c374f458-cf86-11ec-a067-0ef546421892" has an unexpected entry.

Too bad it doesn’t say what the unexpected entry is.

How can I find and clean up this manifest and any related data manually?

 

{
  "File": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go",
  "Function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).checkRetirePolicyRun",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 316,
  "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892",
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "SubjectRef": "kasten-io:cloud-daily-backup",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "info",
  "manifestID": "e9cfdcbd-ce69-11ec-a067-0ef546421892",
  "msg": "Retiring policy run manifest",
  "time": "20220509-11:04:01.082Z",
  "version": "4.5.14"
}
{
  "File": "kasten.io/k10/kio/exec/internal/runner/runner.go",
  "Function": "kasten.io/k10/kio/exec/internal/runner.(*Runner).maybeExecJob",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 177,
  "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892",
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "SubjectRef": "kasten-io:cloud-daily-backup",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "error": {
    "message": "Unexpected manifest entry",
    "function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries",
    "linenumber": 177,
    "file": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177",
    "fields": [
      {
        "name": "entry",
        "value": {
          "type": "ArtifactReferenceGroup"
        }
      }
    ]
  },
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "error",
  "msg": "Job failed",
  "time": "20220509-11:04:01.786Z",
  "version": "4.5.14"
}
{
  "File": "kasten.io/k10/kio/daemon/daemon.go",
  "Function": "kasten.io/k10/kio/daemon.(*Daemon).run",
  "JobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "Line": 133,
  "QueuedJobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
  "cluster_name": "053cf100-3f34-47f3-a4b5-0d18f067dc50",
  "hostname": "executor-svc-5c97596977-5vf59",
  "level": "info",
  "msg": "Daemon Shutting Down",
  "time": "20220509-11:04:01.854Z",
  "version": "4.5.14"
}
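If it helps anyone who also cannot share full logs: the error records can be pulled out of an `executor-svc*` file so that only the relevant fields need to be redacted and posted. A rough sketch, assuming the file contains nothing but JSON objects back to back as in the excerpt above; `iter_json_objects` and `failed_jobs` are hypothetical helper names, and the sample is trimmed from the records in this post.

```python
import json

def iter_json_objects(text):
    """Yield each JSON object from text that holds several objects back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def failed_jobs(text):
    """Collect the manifest ID and error details of every error-level record."""
    failures = []
    for record in iter_json_objects(text):
        if record.get("level") != "error":
            continue
        error = record.get("error", {})
        failures.append({
            "manifest_id": record.get("ManifestID"),
            "policy": record.get("SubjectRef"),
            "message": error.get("message"),
            "fields": error.get("fields", []),
        })
    return failures

# Two records trimmed from the executor-svc output above.
sample = """
{"level": "info", "msg": "Retiring policy run manifest",
 "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892"}
{"level": "error", "msg": "Job failed",
 "ManifestID": "c374f458-cf86-11ec-a067-0ef546421892",
 "SubjectRef": "kasten-io:cloud-daily-backup",
 "error": {"message": "Unexpected manifest entry",
           "fields": [{"name": "entry",
                       "value": {"type": "ArtifactReferenceGroup"}}]}}
"""

for failure in failed_jobs(sample):
    print(failure)
```

A real log file with non-JSON lines mixed in would need those filtered out first, since `raw_decode` raises on anything that is not valid JSON.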

 


  • Author
  • Comes here often
  • 29 comments
  • May 12, 2022

It appears these manifests are kept in the `model-store.db` of either the `catalog-pv-claim` or `jobs-pv-claim`. What format is the database?


  • Author
  • Comes here often
  • 29 comments
  • May 12, 2022

I still don’t know what database format `model-store.db` is so I can’t correct any records, but I managed to extract the manifest from the catalog with a hex dump. The error indicates that `ArtifactReferenceGroup` is not a valid entry in the `entries` collection. My backup cluster is still failing due to this. Please advise.

{
  "creationTime": "2022-05-09T10:57:05.341Z",
  "destructionTime": "2022-05-10T10:59:57.196Z",
  "id": "c374f458-cf86-11ec-a067-0ef546421892",
  "meta": {
    "manifest": {
      "action": "snapshot",
      "apiKeys": [
        "/actions.kio.kasten.io/runactions/run-m9dp9v4pwg"
      ],
      "apiMeta": {
        "annotations": null,
        "labels": [
          {
            "key": "k10.kasten.io/policyName",
            "value": "cloud-daily-backup"
          },
          {
            "key": "k10.kasten.io/policyNamespace",
            "value": "kasten-io"
          }
        ]
      },
      "endTime": "2022-05-09T11:04:44.306Z",
      "entries": [
        {
          "artifactReferenceGroup": [
            "c5957844-cf86-11ec-a067-0ef546421892",
            "c5a31150-cf86-11ec-a067-0ef546421892",
            "c5b0ee68-cf86-11ec-a067-0ef546421892",
            "c5b82f2f-cf86-11ec-a067-0ef546421892",
            "c5cb0bf5-cf86-11ec-a067-0ef546421892",
            "c5d53747-cf86-11ec-a067-0ef546421892",
            "c5dab78b-cf86-11ec-a067-0ef546421892",
            "c5dfdfc4-cf86-11ec-a067-0ef546421892",
            "c5e420e2-cf86-11ec-a067-0ef546421892",
            "c5e96d04-cf86-11ec-a067-0ef546421892",
            "c5ee36dd-cf86-11ec-a067-0ef546421892",
            "c5f512d0-cf86-11ec-a067-0ef546421892",
            "c5fac0de-cf86-11ec-a067-0ef546421892",
            "c600262f-cf86-11ec-a067-0ef546421892",
            "c60621bb-cf86-11ec-a067-0ef546421892",
            "c60b5118-cf86-11ec-a067-0ef546421892",
            "c6104b10-cf86-11ec-a067-0ef546421892",
            "c614738b-cf86-11ec-a067-0ef546421892",
            "c6277635-cf86-11ec-a067-0ef546421892",
            "c62e03f3-cf86-11ec-a067-0ef546421892",
            "c6460e3f-cf86-11ec-a067-0ef546421892",
            "c64abda5-cf86-11ec-a067-0ef546421892",
            "c65a41f9-cf86-11ec-a067-0ef546421892"
          ],
          "type": "ArtifactReferenceGroup"
        }
      ],
      "exceptions": null,
      "finalFailure": {
        "cause": {
          "fields": [
            {
              "name": "entry",
              "value": {
                "type": "ArtifactReferenceGroup"
              }
            }
          ],
          "file": "kasten.io/k10/kio/exec/phases/phase/retire_policy.go:177",
          "function": "kasten.io/k10/kio/exec/phases/phase.(*retirePolicyPhase).retireMultiActionPolicyEntries",
          "linenumber": 177,
          "message": "Unexpected manifest entry"
        },
        "fields": [],
        "message": "Job failed to be executed"
      },
      "jobID": "c378bd89-cf86-11ec-ac01-6e30f4d8aeb4",
      "originatingPolicies": [
        {
          "id": "8fec6a66-7136-4cfe-9819-d9ffccf90c41"
        }
      ],
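To check other extracted manifests for the same condition, the entry types can be tallied directly. This is only a sketch, assuming a complete manifest with the `meta.manifest.entries` layout shown in the excerpt above (the excerpt itself is cut off, so the sample below is a trimmed stand-in with two of the artifact IDs); `entry_types` is a hypothetical helper name.

```python
import json
from collections import Counter

def entry_types(manifest):
    """Count the `type` of each item in meta.manifest.entries."""
    entries = manifest.get("meta", {}).get("manifest", {}).get("entries") or []
    return Counter(entry.get("type", "<missing>") for entry in entries)

# Trimmed stand-in for the manifest above; only the fields used here are kept.
manifest = json.loads("""
{
  "id": "c374f458-cf86-11ec-a067-0ef546421892",
  "meta": {
    "manifest": {
      "action": "snapshot",
      "entries": [
        {
          "artifactReferenceGroup": [
            "c5957844-cf86-11ec-a067-0ef546421892",
            "c5a31150-cf86-11ec-a067-0ef546421892"
          ],
          "type": "ArtifactReferenceGroup"
        }
      ]
    }
  }
}
""")

for type_name, count in entry_types(manifest).items():
    print(f"{type_name}: {count}")
```

Which entry types the retire phase actually accepts is not stated anywhere in this thread, so the tally only shows what is present, not what is valid.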

 


  • Comes here often
  • 89 comments
  • May 14, 2022

Hello Aaron,

 

Could you please provide your debug logs so that we can look into this further? You can download them from Settings > Support > Download Logs.

If you could attach the executor-svc logs (all three), the logging-svc log file, and the k10_debug file, we will have a good basis for further troubleshooting.

 

Thanks 

Emmanuel

 

 

 

 

 


  • Author
  • Comes here often
  • 29 comments
  • May 17, 2022

These were submitted. Backups still failing.


FRubens
  • Experienced User
  • 96 comments
  • Answer
  • May 18, 2022
Aaron Oneal wrote:

These were submitted. Backups still failing.

Hello @Aaron Oneal .

Kasten K10 4.5.15 is out. Please check the release notes here: https://docs.kasten.io/latest/releasenotes.html

This release includes a fix for a case where policies with selective export and an independent export retention schedule fail even though all run actions succeeded. The failure was related to automatic retirement.

I recommend trying this latest version and, if possible, recreating your policy or creating a new one to test.

Thanks

Fernando


  • Author
  • Comes here often
  • 29 comments
  • May 18, 2022

@FRubens I am already using that version and it still fails after running snapshots if it is not time for export.

E.g. hourly snapshot, daily export.


  • Author
  • Comes here often
  • 29 comments
  • May 18, 2022

Actually, it still fails when time for export too. Looks like I was able to get a single manual run to work but anything scheduled still fails.


FRubens
  • Experienced User
  • 96 comments
  • May 20, 2022

Hello @Aaron Oneal ,

Thank you for the information.

It would be great to have your debug logs from the new version (4.5.15). Failing that, could you please provide the same executor-svc error output that you posted earlier in this thread, but from version 4.5.15? Since we shipped a fix for the same function, I would like to see whether anything changed in the error message between 4.5.14 and the latest version; that would help us investigate.

Regards
Fernando


FRubens
  • Experienced User
  • 96 comments
  • May 20, 2022
Hi @Aaron Oneal ,

Also, if possible, could you provide the policy YAML or a dashboard screenshot of the policy setup? I would like to check the retention and export retention settings you selected so that we can try to replicate the issue on our side.

Thank you

Regards

Fernando


  • Author
  • Comes here often
  • 29 comments
  • May 20, 2022

Attached to the support case. As I noted there, creating a new policy with a new name works, but creating a policy with the old name does not. It appears there must be some old state in the catalog related to the original policy.

