Question

Block size for Kasten


Hi,

 

I am testing Kasten K10, backing up PostgreSQL in Kubernetes to Scality ARTESCA which is an S3-compatible object storage. I see that the average object size is about 20MB, but my data was generated using pgbench.

I couldn’t find this anywhere in the documentation, community, or blog posts: is the average object size 20MB, or would it be closer to 10MB if the data is actually compressible?

Hello @rahulpadigela, thank you for trying Kasten!

I am having a little bit of trouble understanding your question: block size is a block storage concept, so it should not apply to object storage.

From https://docs.kasten.io/latest/operating/footprint.html#requirement-types:

Backup Requirements: Resources for backup are required when data is transferred from volume snapshots to object storage or NFS file storage. While the backup requirements depend on your data, churn rate, and file system layout, the requirements are not unbounded and can easily fit in a relatively narrow band.

When Kasten uses object storage for an artifact repository, you will not see a direct, immediate translation of each exported backup artifact into objects in the repository. The repository also contains management metadata, and each artifact is compressed, encrypted, etc. Over time, the repository is managed for deduplication and for removal of artifacts and other expired data. This can be a complex topic and I’m still learning the details (for instance, deduplication may be handled both earlier and later in the process), but I’ve highlighted some of the key characteristics that go into the operations.

To summarize, Kasten optimizes repository storage over time, so an immediate one-to-one comparison of your first backup to the repository will show that there is more going on. Many aspects of your data usage will impact the repository and the resources consumed.

I hope this is helpful, please let us know if you have further questions.

--Mark


Thanks @mark.lavi for the response!

Sorry, I was referring to the chunk size that Kasten would use to upload objects to an S3 compatible object storage. Block size is what Veeam typically uses within their glossary of terms to mean chunk size, so I was using that.

From what I have seen, Kasten uses a data aging technique similar to the one Veeam Backup and Replication uses when deduplicating and removing expired chunks/objects.


Does anyone have any experience you can share when you are using Kasten with an S3-compatible object storage?


Following up on the chunk size to S3 question: since this would be a network transmission, I believe we are talking about network packets and MTU for size.

If you are asking about what chunk size is used for dedupe and expiration inside the artifact repository: I believe that it can vary because we are dealing with all sorts of objects, metadata, and blocks, very much depending on your data characteristics, as mentioned above. I believe it is dynamic, not fixed, because Kasten cannot use one size that fits all possible use cases.

I can try to find out more, but perhaps I can ask: what are your concerns?


Thanks! It’s not the MTU size, I am just trying to find out the object size we will see in the S3-compatible object storage. My plan is to specify this in our documentation as it will help with sizing and performance testing efforts.
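For reference, this is roughly how an average like the 20MB figure can be computed from a bucket listing. It is only a minimal sketch: the function name `summarize_object_sizes` and the sample sizes are made up for illustration, and in practice the sizes would come from an object listing of the bucket rather than a hard-coded list.

```python
from statistics import mean, median

MIB = 1024 * 1024


def summarize_object_sizes(sizes_in_bytes):
    """Return count, total, mean, and median (in MiB) for object sizes in bytes."""
    if not sizes_in_bytes:
        raise ValueError("no objects to summarize")
    return {
        "count": len(sizes_in_bytes),
        "total_mib": sum(sizes_in_bytes) / MIB,
        "mean_mib": mean(sizes_in_bytes) / MIB,
        "median_mib": median(sizes_in_bytes) / MIB,
    }


# Hypothetical sizes (assumption): three large data objects and one small object.
sample_sizes = [20 * MIB, 22 * MIB, 18 * MIB, 4096]
summary = summarize_object_sizes(sample_sizes)
print(summary)
```

Reporting the median next to the mean helps, because a handful of small objects in the bucket can pull the average well below the size of the typical data object.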


It is entirely dependent on your workload. I’m sorry that I can’t give you a better calculation: there are too many variables, as documented above. If you can provide those, then you can estimate what will be in the repository and add overhead for the repository metadata, which will also depend on the inputs.

I would make a gross capacity estimate based on your inputs, then reduce it by some factor that comes from experience after reaching a somewhat steady state of retained artifacts in your repository. This is why we have our sales engineers assist with their field experience. Have you considered that?
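To make the shape of that estimate concrete, here is a rough sketch. It is not a Kasten formula: the function `estimate_repository_gib`, its parameters, and every number in the example are assumptions for illustration, and the reduction factor has to come from what you observe in your own repository at steady state.

```python
def estimate_repository_gib(source_gib, daily_change_rate, retained_backups,
                            reduction_factor, metadata_overhead=0.0):
    """Rough repository size in GiB: gross capacity scaled by an observed factor.

    All inputs are assumptions supplied by the person doing the sizing.
    """
    if retained_backups < 1:
        raise ValueError("retained_backups must be at least 1")
    # Gross capacity: one full copy plus one day's changed data
    # for each additional retained backup.
    gross = source_gib * (1 + daily_change_rate * (retained_backups - 1))
    # Apply the experience-based reduction, then add metadata overhead on top.
    return gross * reduction_factor * (1 + metadata_overhead)


# Hypothetical inputs: 100 GiB source, 5% daily churn, 7 retained backups,
# 0.6 reduction factor, 2% metadata overhead.
estimate = estimate_repository_gib(100, 0.05, 7,
                                   reduction_factor=0.6,
                                   metadata_overhead=0.02)
print(f"{estimate:.2f} GiB")
```

With these made-up inputs, the gross figure is 130 GiB and the reduced estimate is about 79.56 GiB. The point is the structure of the calculation, not the specific values.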

I hope this helps illuminate the components of an answer and the difficulty involved.

