
Another interesting case from the field today. I had a customer that had historically cloned the disk of their AIX platform, without any further modification. This had suited them fine for years until the Veeam Agent for AIX came along and wanted to utilise some supposedly unique attributes of the installation as the basis of its UUID.

This manifested in a bizarre scenario that didn’t appear during testing, but started when the agents were deployed out to all the relevant servers. Instead of seeing a long list of hosts with the Veeam Agent for AIX deployed, I saw no more than four.

Because the Veeam Agents were being rolled out manually, I didn’t yet realise this was a problem; my first assumption was simply that the rollout was progressing slowly.

It wasn’t long however before I was alerted to an issue by the AIX team. Some installations were failing, with the following error:

cli    | Thread [Establish connect to VBR <VBR.FQDN.TLD>:10006; <VBRHOSTNAME>:10006; <VBRIPAddress>:10006 by certificate.]. Failed.
Thread [Establish connect to VBR <VBR.FQDN.TLD>:10006; <VBRHOSTNAME>:10006; <VBRIPAddress>:10006 by certificate.](3117) failed.
| >> |Bad auth result: PersonalCertificateNotFound
| >> |--tr:CSslVbrConnection: Failed to perform handshake with VBR server.
| >> |--tr:at (/home/teamcity/src/VmbPlatformLib/vmbplatform/vbr/conn/SslVbrConnection.cpp, 179: void vmbplatform::vbr::CSslVbrConnection::ProtocolHandShake(vmbplatform::CBinaryStream&))
| >> |Unable to authenticate on backup server <VBR.FQDN.TLD>:10006; <VBRHOSTNAME>:10006; <VBRIPAddress>:10006.

I wanted to be sure that there was no interruption or interception in the communication between the host and the VBR server, and I found earlier in the log that all was as expected, ruling out SSL interception and firewall issues:

       | Thread started.  Role: '(async) reconn. sender {d57ac3a0-6251-48bb-ad11-0e5e36475823}', thread id: 2829, parent id: 1546.
alg | Sender stage: running send cycle. Resend list: [0].
cli | Thread [Establish connect to VBR <VBRIPAddress>:10006 by certificate.].
vmb | Authenticating against backup server [<VBRIPAddress>:10006] with certificate [<Certificate Thumbprint>]
| Default CA certificates will be loaded by openssl api
| CA certificates was loaded from OS specific paths
vmb | Validate VBR certificate.
vmb | VBR certificate subject: /CN=Veeam Backup Server Certificate.
vmb | VBR certificate issuer : /CN=Veeam Backup Server Certificate.
vmb | VBR certificate is self signed: true.
vmb | Client certificate subject: /CN=<Certificate UUID>.
vmb | Client certificate issuer : /CN=Veeam Backup Server Certificate.
vmb | Signature check passed.
vmb | Validate VBR certificate. ok.
cli | Thread [Establish connect to VBR <VBRIPAddress>:10006 by certificate.]. ok.

Seeing this output meant that the agent could indeed connect to the VBR server successfully, and see its certificate, which I manually validated on the server.
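If you need to repeat this check on several hosts, the two signals above can be pulled out of a log with a couple of fixed-string searches. This is only a minimal sketch: `triage_log` is a hypothetical helper name of my own, and the search strings are copied from the log excerpts quoted in this post, so they may differ in other agent versions.

```shell
#!/bin/sh
# triage_log FILE
# Hypothetical helper: reports whether the agent validated the VBR
# certificate and whether VBR then rejected the client certificate.
# The search strings are taken from the log excerpts in this post.
triage_log() {
    log=$1
    if grep -Fq 'Validate VBR certificate. ok' "$log"; then
        echo "tls: VBR certificate validated"
    else
        echo "tls: no successful VBR certificate validation found"
    fi
    if grep -Fq 'Bad auth result: PersonalCertificateNotFound' "$log"; then
        echo "auth: VBR does not recognise the client certificate"
    else
        echo "auth: no PersonalCertificateNotFound error found"
    fi
}
```

Call it as `triage_log <agent-log-file>`, pointing it at wherever your agent writes its log. A "validated" TLS line combined with the auth error suggests the network path is fine and the problem is on the identity side.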

With this ruled out, I speculated that the VBR server might have been under stress and failed to save the client’s certificate to its database, so we performed a full reconfiguration using the following:

stopsrc -s veeamsvc #This will stop the Veeam Service on the AIX server
mv /var/lib/veeam/veeam_db.sqlite /var/lib/veeam/veeam_db.sqlite.bak #This will rename the veeam_db.sqlite file to have a .bak suffix, forcing Veeam to recreate the file on next start up.
veeamconfig mode setVBRsettings --cfg <configfile.xml> --force #This will redeploy the configuration.
startsrc -s veeamsvc #OPTIONAL - If the service doesn't automatically start for whatever reason, we can invoke this here
veeamconfig mode syncnow #OPTIONAL - Veeam should automatically sync after installation, but we can be sure it has by running this, as officially Veeam will sync every 6 hours.

After performing this, however, I noticed something peculiar. My list of servers still contained only four AIX hosts, but the server I had been troubleshooting was now on the list, and another server had dropped off it.

 

Perfectly Cloned Servers

 

At this point I had a conversation with the AIX team to present my findings, and I found out that all the servers impacted had been cloned bit-for-bit identically from the same disk, with subsequent hostname & IP address configuration changes afterwards.

This created a problem because, after discussions with Veeam’s Product Management team, it transpired that the boot volume’s (supposedly) unique ID is used to generate the Veeam Agent’s UUID.

To ensure the message never gets lost, this is what Veeam’s “PTide” had to say:

We look at the disk that holds root filesystem and then we look closer on the disk’s first partition. In all cases this is going to be some LVM logical volume. Then we get the LV’s ID and calculate its MD5 hash. The result is the UUID. Generating a new LV ID for cloned machines would resolve the issue completely.

PTide – Veeam R&D Forums
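Since identical LV IDs necessarily hash to identical UUIDs, one way to find the affected hosts is to collect the LV ID from each server and look for repeats. The sketch below assumes a hypothetical inventory file with one "hostname lv_id" pair per line; `find_cloned_lv_ids` is my own helper name, and which LV Veeam actually inspects on your systems is something to confirm rather than take from this example.

```shell
#!/bin/sh
# find_cloned_lv_ids FILE
# FILE holds one "hostname lv_id" pair per line (a hypothetical inventory
# format). Prints each LV ID that appears on more than one host, followed
# by the hosts that share it.
#
# One possible way to build a line on each host, ASSUMING the root
# filesystem LV is hd4 and that lslv prints an "LV IDENTIFIER:" field:
#   echo "$(hostname) $(lslv hd4 | awk '/LV IDENTIFIER:/ {print $3}')"
find_cloned_lv_ids() {
    awk '
        NF >= 2 {
            count[$2]++
            hosts[$2] = (hosts[$2] == "" ? $1 : hosts[$2] " " $1)
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ": " hosts[id]
        }
    ' "$1" | sort
}
```

Any line this prints is a group of servers that would present the same agent UUID to VBR, which matches the behaviour of hosts replacing each other in the list.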

The conversation then evolved into how to generate a new UUID for Veeam to use. There are two options currently available, depending on which version of the Veeam Agent for AIX you’re using.

 

Veeam Agent for AIX v4

Within Veeam Agent for AIX v4 you can amend your veeam.ini configuration file to include a path to a file that contains your UUID.

 

The following command creates a new file containing a UUID in the format that Veeam Agent for AIX expects. If you’re creating this file manually, just know that it needs to contain nothing more than a UUID, surrounded by {} curly brackets.

echo "{$(uuid_get)}" > /etc/veeamagentid
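Before pointing the agent at the file, it can be worth checking that it really holds a single brace-wrapped UUID and nothing else. A minimal sketch follows; `is_valid_agent_id_file` is a hypothetical helper of mine, and accepting both upper- and lower-case hex digits is an assumption, since I haven't confirmed what the agent itself tolerates.

```shell
#!/bin/sh
# is_valid_agent_id_file FILE
# Hypothetical helper: succeeds only if FILE is exactly one line holding
# a UUID wrapped in {} curly brackets.
is_valid_agent_id_file() {
    # Exactly one line (wc pads its output on some systems, so strip spaces).
    [ "$(wc -l < "$1" | tr -d ' ')" -eq 1 ] || return 1
    # 8-4-4-4-12 hex digits between literal braces.
    grep -Eq '^\{[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}$' "$1"
}
```

For example: `is_valid_agent_id_file /etc/veeamagentid && echo "ID file looks good"`.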

Once this is done, you need to perform the following actions:

#Stop Veeam Service with:
stopsrc -s veeamsvc
#Remove or Rename the existing database
mv /var/lib/veeam/veeam_db.sqlite /var/lib/veeam/veeam_db.sqlite.bak
#Use vi to edit the /etc/veeam/veeam.ini file, find the [core] section, and add the following line within this section: agentIdFilePath = /etc/veeamagentid
vi /etc/veeam/veeam.ini
#Start the Veeam Service again
startsrc -s veeamsvc
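If you have many servers to fix, editing each file in vi gets tedious. The sketch below is one possible non-interactive alternative to the vi step, not something Veeam provides: `add_agent_id_path` is a hypothetical helper that prints a copy of the ini file with the line added directly under the core section header, and leaves the content unchanged if the key is already set.

```shell
#!/bin/sh
# add_agent_id_path INI_FILE ID_FILE
# Hypothetical helper: writes INI_FILE to stdout with
# "agentIdFilePath = ID_FILE" added right after the [core] header.
# If the key is already set, the file is printed unchanged.
# Returns non-zero if no [core] section was found.
add_agent_id_path() {
    ini=$1
    id_file=$2
    if grep -Eq '^[[:space:]]*agentIdFilePath[[:space:]]*=' "$ini"; then
        cat "$ini"
        return 0
    fi
    awk -v line="agentIdFilePath = $id_file" '
        { print }
        /^\[core\]/ { print line; added = 1 }
        END { if (!added) exit 1 }
    ' "$ini"
}
```

A cautious way to use it is to write to a temporary file, review it, and only then copy it over the original (after taking a backup): `add_agent_id_path /etc/veeam/veeam.ini /etc/veeamagentid > /tmp/veeam.ini.new`.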

In some circumstances, you might have a broken connection to VBR that requires the agent to be reconnected. To do so, run the following commands:

#Reset Veeam Agent to standalone mode, ready for re-joining
veeamconfig mode reset --force
#Reattach Veeam Agent to VBR server
veeamconfig mode setVBRsettings --cfg <CONFIGFILE.XML>

At the end of this, your AIX server will be joined using its newly generated UUID!

 

Veeam Agent for AIX v3 (And Possibly Older Versions Too)

I’ve tested this with the Veeam Agent for AIX v3, but I haven’t tried with an older version to confirm any further backwards compatibility.

The v3 version of this agent doesn’t support this additional parameter within the veeam.ini file that we just saw in v4, so we have to do something else to support a different UUID.

For this approach, we’re actually going to replace a Veeam binary with a script.

Veeam supply a binary at ‘/opt/veeam/bin/veeamagentid’ that generates the UUID via the method I detailed earlier in this post. This is the file we’re going to replace. Let’s get started with the following commands:

#Stop the Veeam Service
stopsrc -s veeamsvc
#Backup the veeamagentid binary
mv /opt/veeam/bin/veeamagentid /opt/veeam/bin/veeamagentid.bak
#Generate a UUID, create a script to echo this back when executed
echo "echo $(uuid_get)" > /opt/veeam/bin/veeamagentid
#Grant execution permissions to the new file
chmod +x /opt/veeam/bin/veeamagentid
#Start the Veeam Service
startsrc -s veeamsvc
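One detail worth understanding here: because the outer echo uses double quotes, the shell expands $(uuid_get) once, at the moment the file is written, so the resulting script echoes the same fixed UUID every time it runs. With single quotes it would instead call uuid_get on each run and hand Veeam a different ID each time. A small sanity check, using a hypothetical helper name of my own, is to run the replacement twice and confirm the output doesn't change:

```shell
#!/bin/sh
# stub_is_stable SCRIPT
# Hypothetical helper: runs SCRIPT twice and succeeds only if both runs
# print the same non-empty value.
stub_is_stable() {
    first=$("$1") || return 1
    second=$("$1") || return 1
    [ -n "$first" ] && [ "$first" = "$second" ]
}
```

For example: `stub_is_stable /opt/veeam/bin/veeamagentid && echo "agent ID is fixed"`.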

Once again, we’ll likely have a broken connection to VBR that we need to fix, so we’ll run the following commands:

#Reset Veeam Agent to standalone mode, ready for re-joining
veeamconfig mode reset --force
#Reattach Veeam Agent to VBR server
veeamconfig mode setVBRsettings --cfg <CONFIGFILE.XML>

And that’s it! It’s all fixed!

 

Special Thanks

 

This issue wouldn’t have been so thoroughly unravelled, and therefore this blog post wouldn’t exist without the support of the many awesome Veeam resources.

I’d like to thank the following people at Veeam for their support in providing all of the information contained above:

  • Allan H
  • Ali S
  • Damon D
  • Justin A
  • Milos P
  • PTide

Interesting situation.
Seems that the native AIX solutions for backup (mksysb, etc.) do not rely on the UUIDs.

Thank you for the detailed description.


This was a great article.  I read them before you post them here which is nice as I get the email updates from your blog.  Interesting situation for sure.

