
MongoDB and Veeam

Traditional relational database systems such as Oracle, Microsoft SQL Server and PostgreSQL are built around tables of information organized into rows and columns, which store data in a rigid structure. When a new row is created in a table, it has the same fields as every other row in that table. Some fields can reference information in other tables through foreign keys. A schema defines which fields exist in which table and how tables reference each other. Data is entered and retrieved through Structured Query Language (SQL), which is similar across vendors but differs slightly in syntax. Each platform has its own way of managing scale, performance and storage, as well as its own implementations of features like replication and high availability.

Relational databases aren’t the only option for storing data. NoSQL databases offer more flexibility in how each entry is organized. Common categories of NoSQL databases are key-value, document-oriented, column-oriented, graph-based and time series. AWS has a high-level overview of the different categories in “Types of NoSQL databases - Choosing an AWS NoSQL Database” (amazon.com).

MongoDB falls into the document database category of NoSQL database servers and is one of the first platforms people consider when thinking of NoSQL. In MongoDB, a database is made up of one or more collections (similar to tables), which store individual entries as Binary JSON (BSON) documents. A document is a group of related data, kind of like a row in a table. Some fields in a document can hold subdocuments. The list of data types a document field can have, which includes documents, arrays and arrays of documents, is in “BSON Types - MongoDB Manual v7.0”. From an architectural perspective there are a few different deployment types:

  • Standalone instance, which is primarily used for dev/test or other non-production work.
  • Replica Sets, which have a Primary node (historically called the Master) that replicates data to Secondary nodes for failover or for reading data. Since all writes go to the Primary, it can become a bottleneck, as adding more nodes doesn’t improve write performance.
  • Sharding, which splits the database across nodes so that multiple nodes can each be the Primary for a section of the database. This could improve performance depending on the structure of the data and which shard the data lives on. Each shard is typically also deployed as a replica set, ensuring extra copies for HA/durability.
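To make the collection, document and subdocument vocabulary above more concrete, here is a minimal sketch using plain Python dictionaries in place of BSON documents. The collection idea, field names and values are made up for illustration, and `field_paths` is a hypothetical helper, not part of any MongoDB driver. It lists fields using the dotted paths MongoDB uses to address fields inside subdocuments.

```python
# Hypothetical document for an illustrative "customers" collection.
customer = {
    "_id": 1001,                                        # unique key of the document
    "name": "Ada Example",
    "address": {"city": "Columbus", "country": "US"},   # subdocument
    "tags": ["vip", "newsletter"],                      # array
    "orders": [                                         # array of documents
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# Unlike rows in a relational table, a second document in the same
# collection does not need to carry the same fields.
other_customer = {"_id": 1002, "name": "Bo Sample", "loyalty_points": 40}


def field_paths(doc, prefix=""):
    """Return dotted paths for the fields of a document.

    Subdocuments (dicts) are descended into; arrays are treated as
    single fields and are not expanded.
    """
    paths = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths.extend(field_paths(value, prefix=path + "."))
        else:
            paths.append(path)
    return paths


customer_fields = field_paths(customer)
other_fields = field_paths(other_customer)
```

The two documents share only `_id` and `name`, which is the per-entry flexibility that a fixed relational schema does not allow.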

MongoDB offers a hosted PaaS of its own called Atlas and has managed offerings with hyperscalers such as AWS. MongoDB can also be deployed on one’s own servers in Community Edition or Enterprise Edition. It is common to see MongoDB installed inside a container and run on Docker or in a pod in a StatefulSet in Kubernetes. MongoDB can be installed on Linux, Mac and Windows. Production instances are typically set up on Linux servers.

By default, MongoDB is deployed with authentication turned off, so anyone who can reach it over the network can connect from a shell or console. The default is to listen only on 127.0.0.1, so this exposure applies once MongoDB is set up to listen on an IP other than the local loopback address. To set up a Replica Set, you must bind the mongo daemon to other IPs/DNS names so the nodes can communicate with each other. It is recommended to set up and enforce authentication. There is an admin database that stores information about authentication/users/roles for the different databases on the server.
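The interaction between the bind address and authentication can be sketched as a small check. This is only an illustration: it assumes the settings have already been loaded into a nested dictionary that mirrors the `net.bindIp` and `security.authorization` options of the mongod configuration file, and `audit_mongod_settings` and `_is_loopback` are hypothetical helper names, not MongoDB tooling.

```python
import ipaddress


def _is_loopback(host):
    """Return True for 'localhost' or a loopback IP address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): assume it is
        # reachable from other machines.
        return False


def audit_mongod_settings(settings):
    """Return warnings for a mongod-style settings dictionary.

    Missing keys fall back to the defaults described in the text:
    listen on 127.0.0.1 only, authorization disabled.
    """
    bind_ip = settings.get("net", {}).get("bindIp", "127.0.0.1")
    authorization = settings.get("security", {}).get("authorization", "disabled")

    hosts = [h.strip() for h in bind_ip.split(",") if h.strip()]
    exposed = [h for h in hosts if not _is_loopback(h)]

    warnings = []
    if exposed and authorization != "enabled":
        warnings.append(
            "Listening on " + ", ".join(exposed) + " with authentication disabled"
        )
    return warnings


default_warnings = audit_mongod_settings({})
replica_warnings = audit_mongod_settings({"net": {"bindIp": "127.0.0.1,10.0.0.5"}})
secured_warnings = audit_mongod_settings(
    {
        "net": {"bindIp": "127.0.0.1,10.0.0.5"},
        "security": {"authorization": "enabled"},
    }
)
```

A default install produces no warning because nothing outside the host can connect. The warning appears once a replica set member is bound to a routable address without authentication enforced.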

Veeam Backup for MongoDB is documented in the Enterprise Plug-ins documentation under “MongoDB Backup - Veeam Plug-ins for Enterprise Applications”. The current version only supports deployments using Replica Sets on self-managed servers. Standalone and Sharded configurations aren’t supported. Other Enterprise Plug-ins in Veeam let the native backup software send data to VBR repositories through a Plug-in interface and data mover. MongoDB Backup instead uses the Veeam Agent for Linux to process data. A Mongo Agent is also deployed, which collects the database hierarchy and communicates with the mongo daemon to orchestrate backup operations. Deployment involves creating a Protection Group from the Inventory view in VBR, specifying one of the nodes in the Replica Set and specifying a MongoDB user that has the required permissions in the admin database. User accounts for the OS can then be specified for each node in the cluster to allow the install of the Veeam software. When a new backup is started, or if the previous node isn’t available to perform the backup, Veeam uses the following rules to determine which server to pick:

  1. Excludes arbiter nodes. These nodes are used to provide quorum and do not store data.
  2. Excludes unhealthy and unreachable nodes.
  3. Excludes hidden nodes.
  4. From the nodes that are not excluded, Veeam Mongo Agent selects a secondary node with a call back delay below 30 seconds.
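The selection rules above can be sketched as a simple filter. This is an illustration of the listed steps only, not Veeam's implementation: the `Node` class, its field names and `pick_backup_node` are hypothetical, and choosing the lowest delay when several secondaries qualify is an assumption, since the steps don't say how ties are resolved.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DELAY_SECONDS = 30  # threshold taken from step 4


@dataclass
class Node:
    """Hypothetical view of one replica set member."""
    name: str
    is_primary: bool = False
    is_arbiter: bool = False
    is_healthy: bool = True   # False covers unhealthy or unreachable
    is_hidden: bool = False
    delay_seconds: float = 0.0


def pick_backup_node(nodes: List[Node]) -> Optional[Node]:
    """Apply the four selection steps and return a node, or None."""
    # Steps 1-3: drop arbiters, unhealthy/unreachable nodes and hidden nodes.
    eligible = [
        n for n in nodes
        if not n.is_arbiter and n.is_healthy and not n.is_hidden
    ]
    # Step 4: keep secondaries whose delay is below the threshold.
    candidates = [
        n for n in eligible
        if not n.is_primary and n.delay_seconds < MAX_DELAY_SECONDS
    ]
    # Assumption: prefer the lowest delay when several nodes qualify.
    return min(candidates, key=lambda n: n.delay_seconds, default=None)


cluster = [
    Node("db1", is_primary=True),
    Node("arb1", is_arbiter=True),
    Node("db2", delay_seconds=45),
    Node("db3", delay_seconds=4),
    Node("db4", is_hidden=True, delay_seconds=1),
    Node("db5", is_healthy=False),
]
chosen = pick_backup_node(cluster)
```

In this made-up cluster only `db3` survives every step: `db2` exceeds the delay threshold, `db4` is hidden, `db5` is unhealthy and `db1` is the primary.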

It’s possible to tell the software to select the Primary node to perform the backup instead. Because the job uses the Veeam Agent for Linux (VAL) engine to create a blksnap snapshot of the storage and read data, the databases are protected and written in the normal Veeam backup format using forward incremental. Most of the same options as VAL are supported, such as Hardened Repositories, dedupe appliances and direct to object storage. Veeam Backup for MongoDB doesn’t back up the entire operating system, but it can be paired with Veeam Agent for Linux jobs to capture the rest of the server. Along with backups, there is a restore explorer that allows backup administrators to restore one or more collections or even an entire instance of a Replica Set.

Thanks for the explanation! I’ve been asked about supporting MongoDB for some time, and now we have at least a partial answer. Atlas might be in the future?

