
A Deeper Look at the Smart-Entity

In my Veeam Amazing Object Storage Tips & Techniques Part 4, I provided a high-level introduction to the Smart-Entity feature of the Smart Object Storage API (SOSAPI). In this article, I will explain how our Technical Alliance Partner (TAP) Object First has implemented the Smart-Entity as part of their object storage solution.

Let’s start off with a reminder of how the Smart-Entity feature works:

  1. Veeam Backup & Replication (VBR) tells the object storage platform how much data it is about to send, as well as which object the data is for (virtual machine, cloud VM, NAS file share, or physical machine).
  2. The object storage platform can then use this information to determine which node/endpoint will receive the backup data.
  3. The object storage platform then provides VBR with the node name/endpoint address to send the data to.
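
The three steps above can be sketched as a small exchange. This is only an illustration of the idea, not the actual SOSAPI wire format or Object First's code: the class names, the field names, and the "most free space" policy are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityAnnouncement:
    """Step 1: what VBR announces before sending data (hypothetical fields)."""
    entity_name: str   # e.g. the VM name
    entity_type: str   # e.g. "vm", "cloud-vm", "nas", "physical"
    size_bytes: int    # how much backup data is about to be sent


@dataclass
class Node:
    """A storage node as seen by the platform (hypothetical model)."""
    address: str
    free_bytes: int


def choose_node(announcement: EntityAnnouncement, nodes: list[Node]) -> Node:
    """Step 2: pick a node. Assumed policy: the node with the most free
    space among those that can hold the announced amount of data."""
    candidates = [n for n in nodes if n.free_bytes >= announcement.size_bytes]
    if not candidates:
        raise RuntimeError("no node has room for this entity")
    return max(candidates, key=lambda n: n.free_bytes)


def handle_announcement(announcement: EntityAnnouncement, nodes: list[Node]) -> str:
    """Step 3: return the node address that VBR should send the data to."""
    node = choose_node(announcement, nodes)
    node.free_bytes -= announcement.size_bytes  # reserve the announced space
    return node.address
```

A real platform would weigh more than free space, but the shape of the conversation is the same: size and entity go in, a target address comes out.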

Before I dive into the details of how Smart-Entity works, let me define what an entity is.  Veeam Backup & Replication uses backup jobs to protect the object(s) mentioned above within your environment.  An entity is an individual object that is being backed up.  If you had a backup job that was protecting 100 virtual machine objects, you would have 100 entities being protected.

For the rest of this article, I will be referring to this diagram to demonstrate how Object First uses the Smart-Entity:

 

When you create the object storage repository using the repository wizard, you need to specify the S3 endpoint/service point that Veeam Backup & Replication (VBR) will use to access the object storage bucket:

 

Once the object storage repository is created, you will see the S3 endpoint within the path for the repository:

 

Since Object First is using the SOSAPI, you will see the blue bucket icon next to the repository’s name.  The repository “Type” is “S3-integrated” which is another indicator that the SOSAPI is used by Object First.  Both of those indicators are used to highlight the deeper level of integration Object First and other partners using the SOSAPI have versus traditional S3 compatible solutions.

Object First also implemented the SOSAPI’s Capacity feature, which allows VBR to report the repository’s capacity and free space. Neither of those values is available for object storage repositories that don’t use the SOSAPI.

Because Object First also implemented the SOSAPI Smart-Entity feature, let’s look at its benefits. The first benefit is that the Smart-Entity allows Object First to load balance the backup data being sent to their storage by VBR. This is done by VBR telling Object First the name of the entity from which the backup data is coming, as well as the size of the backup data. Based on this information, Object First determines which node within the cluster the backup data should be sent to.

In the example below, VBR is going to back up three virtual machines (VM-1, VM-2, and VM-3). The process using Smart-Entity is as follows:

  1. All three backup jobs are sent to the S3 endpoint (192.168.1.100) by VBR, along with the Smart-Entity information mentioned earlier (VM name and size of the backup job).
  2. Based on the Smart-Entity information, Object First decides which node the VM backup data should be sent to.
  3. Object First uses Smart-Entity to note the node where the backup data was stored.

The last step, in which Object First notes the node where the VM backup data was stored, matters for subsequent backups and restores. This is a very important feature of the Smart-Entity. VBR prefers to store the backup chains for the entity being protected on the same node. This “intelligent data placement” improves both backup and restore performance.

In the example above, whenever VM-3 is backed up, Object First will store the backup on the 192.168.1.101 node. VM-2 will be stored on the 192.168.1.103 node, and VM-1 will be stored on the 192.168.1.102 node.

The same Smart-Entity information and logic is used whenever VBR needs to restore an entity. VBR makes the restore request to the S3 endpoint address of 192.168.1.100, and Object First uses the Smart-Entity information it has stored for that entity to determine which node holds the backup data. The data is then returned to VBR from that node, which results in faster restores.
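
To make the "remember where the entity lives" idea concrete, here is a minimal sketch of a placement map that keeps each entity's backup chain on one node and answers restore lookups from the same record. The `PlacementMap` class, its method names, and the least-loaded policy for a first backup are assumptions for illustration only; the article does not describe how Object First stores this information internally. The node addresses are the ones used in the example above.

```python
class PlacementMap:
    """Remembers which node holds each entity's backup chain (hypothetical)."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._assigned = {}                       # entity name -> node address
        self._load = {node: 0 for node in self._nodes}  # bytes placed per node

    def node_for_backup(self, entity, size_bytes):
        """Return the node for this backup, reusing an earlier placement."""
        node = self._assigned.get(entity)
        if node is None:
            # First backup of this entity: assumed policy is least-loaded node.
            node = min(self._nodes, key=lambda n: self._load[n])
            self._assigned[entity] = node
        self._load[node] += size_bytes
        return node

    def node_for_restore(self, entity):
        """Return the node that already holds the entity's backup chain."""
        try:
            return self._assigned[entity]
        except KeyError:
            raise LookupError(f"no recorded placement for {entity!r}") from None


placement = PlacementMap(["192.168.1.101", "192.168.1.102", "192.168.1.103"])
for vm in ("VM-3", "VM-1", "VM-2"):
    placement.node_for_backup(vm, size_bytes=50 * 1024**3)

# Later backups and restores of VM-3 resolve to the node chosen the first time.
print(placement.node_for_backup("VM-3", size_bytes=5 * 1024**3))
print(placement.node_for_restore("VM-3"))
```

In this sketch the first placement depends on the order in which entities arrive; what matters is that every later backup or restore of the same entity lands on the node recorded the first time.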

Object First also uses the Smart-Entity feature to help with a customer’s lifecycle management of the Object First cluster. When a customer adds storage and/or nodes to an Object First cluster, the Smart-Entity is used to balance capacity, I/O performance, and network performance automatically.
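
The article does not say how this balancing is carried out, so the following is purely a hypothetical sketch of the capacity part of the idea: after a node is added, whole entities are moved from the fullest node to the emptiest one as long as each move narrows the gap. The function name, the node names, and the greedy policy are all assumptions, and I/O and network balancing are not modeled.

```python
def plan_rebalance(assignment, sizes, nodes):
    """Return a list of (entity, source, target) moves that even out used
    capacity. Whole entities move, so each backup chain stays on one node.

    assignment: entity -> node, sizes: entity -> bytes, nodes: all node names
    (hypothetical inputs for this sketch).
    """
    assignment = dict(assignment)  # work on a copy
    used = {node: 0 for node in nodes}
    for entity, node in assignment.items():
        used[node] += sizes[entity]

    moves = []
    while True:
        fullest = max(nodes, key=lambda n: used[n])
        emptiest = min(nodes, key=lambda n: used[n])
        gap = used[fullest] - used[emptiest]
        # Moving an entity smaller than the gap always reduces the imbalance.
        movable = [e for e, n in assignment.items()
                   if n == fullest and 0 < sizes[e] < gap]
        if not movable:
            return moves
        entity = max(movable, key=lambda e: sizes[e])
        assignment[entity] = emptiest
        used[fullest] -= sizes[entity]
        used[emptiest] += sizes[entity]
        moves.append((entity, fullest, emptiest))


# Example: node-c was just added and is still empty.
moves = plan_rebalance(
    assignment={"VM-1": "node-a", "VM-2": "node-a", "VM-3": "node-a", "VM-4": "node-b"},
    sizes={"VM-1": 400, "VM-2": 300, "VM-3": 100, "VM-4": 200},
    nodes=["node-a", "node-b", "node-c"],
)
print(moves)
```

Because the platform already knows each entity and its size from the Smart-Entity information, it has what it needs to plan moves like these without involving VBR.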

Hopefully you now have a better understanding of what the Smart-Entity feature of the SOSAPI is and the value it can provide to an object storage partner who implements it.  By implementing and taking advantage of the Smart Object Storage API’s Smart-Entity feature, Object First has created a deeply integrated solution with Veeam Backup & Replication.  The combination of Object First and SOSAPI can help customers with both their backup and restore operations as well as their Object First storage lifecycle management.

Very cool read @SteveF thanks for sharing this one.  Really want to look at Object First.


Another good article and good to know. Thank you @SteveF 


Amazing @SteveF thanks for sharing


Great article @SteveF!


Nice writeup. Object First is something I’ve been keeping my eye on as well!


Interesting read.

 

What I’m curious about with the “Smart-Entity” implementation is how Object First stores their data. I would assume this can’t be some sort of erasure coded data, as that would spread copies of the data across other nodes in the cluster. Hence, if data is spread across other nodes, there wouldn’t be any benefit to grabbing the data only from one node in the cluster (at least, not for restores). If the erasure coded set is large enough, the data could be on every node. If the data is made redundant across multiple nodes (which I hope it is, and probably is), how does the Smart-Entity feature provide any benefit? Or does it say “Grab your data from Node 1 & 2” in the event of a restore, allowing additional speed benefits over one node doing the heavy lifting?

 

I also wouldn’t expect this to work really well as a service provided behind any kind of load balancer, as it would fight with the Smart-Entity implementation of the SOSAPI, nor could one really provide this as an internet facing service, unless you wanted to have a public IP for every node. Having this as a cluster on premises would be quite beneficial though. I would assume the implementation guide for Object First likely specifies to not use any load balancer at all, which leads me to my second point. 

 

Please don’t take my comments or questions as criticisms. I am mainly looking to improve my understanding of Object First and comparing to other Object Storage implementations I know. 


You are correct, Object First does not leverage erasure coding. The first backup location must have strong performance capabilities to deal with restores, especially when taking into consideration the random I/O operations needed in some types of restores, as opposed to sequential ones. The problem with erasure coding is that it requires a database that slows the whole process down significantly. So, in a nutshell, erasure coding is not leveraged in order to improve performance. That said, each node is protected with RAID6, but as Veeam recommends, all users should follow the 3-2-1-1-0 rule to get their data tiered to multiple locations, for example with a Service Provider leveraging Cloud Connect. In the case of Object First, a load balancer is not needed, as stated in the article, since the Smart-Entity feature allows Object First to load balance based on the information it receives from VBR.


Really appreciate the response @Geoff Burke . That definitely helps my understanding of how Object Storage works. 

 

Digging into the world of erasure coding gets super interesting. Some implementations require a database (which as you said can slow the whole process down), but some implementations use consistent hashing and the file system to track object metadata, alleviating that bottleneck.

 

All very fascinating, hence my questions around what’s under the hood of Object First. Also makes a lot of sense why some features of the SOSAPI are pointless to some S3 integrations, while useful to others. 


@TylerJurgens you are correct that most of the features of the SOSAPI are targeted at on-prem object storage solutions. I am not saying that public cloud offerings couldn’t use the SOSAPI, but the on-prem solutions certainly can benefit the most at this time by adopting it. Object First is a great example of that.


It also depends on how the on-premises implementations use SOSAPI. For example, the Smart-Entity feature would be a huge benefit to Object First, but not really useful to an erasure coded S3 implementation because of the way erasure coding works. Some features definitely could be useful to other offerings, such as the size reporting. An S3 bucket with a quota can show the quota limit on the repository, rather than the ‘unknown’ that you get from an S3 bucket without SOSAPI integration.

 

I’m really interested in how Veeam is going to develop and improve SOSAPI. Lots of neat features and possibilities!

