@voarsh Thank you for posting your question and for being a long-time K10 user.
Do you see K10 databases crashing due to storage I/O issues? Which databases do you see such issues with? Is it Catalog, jobs, or metering?
We don’t use SQLite in any of our K10 services, except for the upstream Grafana that we deploy along with K10.
K10 uses a Go-based key-value store in all of its stateful services.
Oh, my bad. I assumed it was SQLite because I had an I/O problem and an unclean shutdown a few days ago, and a lot of my SQLite-based PVCs (I have a few) were corrupted, along with K10's. It’s funny, because they’re Ceph-backed, so there can’t really be corruption at the storage level, but SQLite databases need repairing after an unclean shutdown or image close. I haven't had this problem with real database engines, though. I have had to repair Redis AOF (append-only) files after an unclean close, but there’s a tool for that, so it's easy to fix. With K10, it’s usually the Catalog PVC/database (the one that holds all the backup/K10 data, excluding the logs PVC) that has been corrupted several times in the past.
E.g., it results in backup jobs always being pending, or showing that they're snapshotting application components but not progressing. The Catalog pod shows different kinds of errors and “panics”. I don’t have a detailed error log, but needless to say, there’s some sort of corruption when it is shut off uncleanly. I always just purge the entire K10 install.
Even more annoying is restoring from DR backups. The issues could have started a fairly long time ago, so you need to blindly restore to an earlier date without knowing which one is good, and K10 doesn’t “recreate” the backup entries for the backups you have in NFS/S3. So if I restore K10 to 1 month ago, a backup I made today or yesterday (for example) that’s safely stored in S3/NFS won’t show up under my namespace/application backups. That also means I have orphaned S3/NFS backups sitting around unusable.
Sorry my reply isn’t so helpful.