Veeam Backup & Replication Monitoring Feature: Tape Drive Alerts


Userlevel 7
Badge +20

Every day is a school day, and today I found out about something really cool that Veeam has been doing that I never knew about, because “It Just Works”.

Today a customer system generated an alert that I had never seen before. I’ve seen plenty of alerts for different backup issues, whether they’re caused by networking, BSODs, disk space constraints, etc., but this completely new one surprised me.

 

A Tape Drive Alert, but of an unexpected variety

 

Warning: “TapeDrive alert: The voltage supply to the tape drive is outside the specified range.”

As I said above, I’d never seen this warning before! I didn’t know that Veeam was tracking such attributes of the tape drives it uses. So I set about looking for the root cause of the problem, busted out some “Google-Fu” to find who else had hit this issue in the past, and found this page of Veeam documentation:

Tape Drive Alerts – Veeam Backup Guide for vSphere

This page lists every alert code, along with its severity, a description of the issue, and its cause.

Some of these errors you might expect to see, such as “Media Error”, “your data is at risk”, or read/write warnings, but some of the warnings here go well beyond what Veeam NEEDS to look at to do its job. For example:

  • Cooling fan failure
  • Voltage supply low/high / Power Consumption low/high
  • Humidity
  • Firmware
  • Redundant power supply failed

 

Ok cool, but why is this a big deal?

 

As technologies have evolved, we’ve added more layers of abstraction and isolation between tiers, which is great from some perspectives but restrictive from others. These tiers are often unaware of faults or early warning signs above or below them, for example:

  • In the storage world, an operating system doesn’t know that a disk in a RAID array has failed if the array is managed by a RAID controller; as long as it can continue to read and write, it’s unaware of any problem in the underlying layer.
  • In the networking world, a device doesn’t know that a highly available route has lost one of its paths; the traffic still flows, so it is unaware.
  • In the computing world, a virtual machine doesn’t know that the physical host it resides on has lost a redundant power supply; it is still running, so it is unaware.

Because of scenarios like these, we’ve come to expect little interaction between our systems: we must either aggregate this information ourselves or use dedicated reporting systems to aggregate it for us. Some exceptions exist, of course, but awareness of other layers is largely siloed. This is a missed opportunity, as awareness of each of these scenarios could give us early warning of a problem before it becomes an outage.

Veeam has its own monitoring and reporting product, Veeam ONE, so surely this is the perfect time to talk about it and how it collects all of this information? Wrong!

Veeam Backup & Replication actually performs some key monitoring tasks itself, and these tape drive alerts are one of them. Realistically, Veeam only needs to concern itself with data that isn’t being read or written successfully, and normally we would expect the usual siloed approach to our infrastructure: as long as Veeam can write a backup to tape, its job is done, right?

Veeam goes above and beyond here: sensor information provided by the tape drive is fed back into the job report. If the tape drive reports that the humidity is too high, Veeam will tell you! Loss of power redundancy? Veeam will tell you that too! And Veeam puts that information straight into the output of the affected tape jobs, so you can immediately see what percentage of your tape jobs are at risk!
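If you want to put a number on “what percentage of your tape jobs are at risk” yourself, here is a minimal sketch. It assumes you have already collected each tape job session’s name and the drive alerts attached to it into plain Python objects; the `TapeJobSession` and `DriveAlert` types, their fields, and the job names are hypothetical illustrations, not Veeam’s API or export format. Only the three severity categories (Critical, Warning, Information) and the voltage alert text come from this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # The three categories used in Veeam's tape drive alert table.
    CRITICAL = "Critical"
    WARNING = "Warning"
    INFORMATION = "Information"


@dataclass
class DriveAlert:
    # Hypothetical record: alert text as it appears in the job report.
    message: str
    severity: Severity


@dataclass
class TapeJobSession:
    # Hypothetical record for one tape job run and the drive alerts it reported.
    job_name: str
    alerts: list[DriveAlert] = field(default_factory=list)


def percent_at_risk(sessions: list[TapeJobSession]) -> float:
    """Share of sessions (0-100) that reported at least one drive alert."""
    if not sessions:
        return 0.0
    affected = sum(1 for s in sessions if s.alerts)
    return 100.0 * affected / len(sessions)


if __name__ == "__main__":
    voltage = DriveAlert(
        "TapeDrive alert: The voltage supply to the tape drive is outside the specified range.",
        Severity.WARNING,
    )
    # Hypothetical job names; two of the four sessions carry the voltage alert.
    sessions = [
        TapeJobSession("Monthly tape", [voltage]),
        TapeJobSession("Weekly tape"),
        TapeJobSession("Daily tape"),
        TapeJobSession("Archive tape", [voltage]),
    ]
    print(f"{percent_at_risk(sessions):.0f}% of tape jobs reported drive alerts")
    # prints "50% of tape jobs reported drive alerts"
```

Counting any alert keeps the sketch simple; filtering on severity first would be an easy variation if Information-level alerts are too noisy for your environment.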

 

Conclusion

 

Credit needs to be given to Veeam for including this in their base product rather than hiding it behind an additional monitoring tool. Using these sensors to detect potential issues allows customers to proactively replace failing components and protect their data availability. I’ve gone through the help center version history and can see this alert table was first added in version 9.5u4, so the functionality has existed for some time.



Userlevel 7
Badge +17

Nice feature :sunglasses: Good to know that hardware problems with a tape drive are shown within Veeam.

I have never seen these alerts until now. The tape drives we use with Veeam haven’t had a problem since we started working with them….

Userlevel 7
Badge +20

Very interesting, as we are using tape more often with VCC, so I will look into this.

Userlevel 7
Badge +4

@MicoolPaul: Is there any automatic categorization of these alerts, like Critical, Major, Minor, or just Warning?

Userlevel 7
Badge +13

Great finding, Michael!

Conclusion (2): tape is still not dead, and voltage outside the specified range will not kill it either! :joy:

Userlevel 7
Badge +17

Mhh, the drives are very robust nowadays.

I know only two ways to kill a drive: water and dust….

Userlevel 7
Badge +13

and this guy

 

Userlevel 7
Badge +17

Ok, ok 😀😀😀

You are right. I forgot the third way: pure violence….

 

Userlevel 7
Badge +20

@MicoolPaul: Is there any automatic categorization of these alerts, like Critical, Major, Minor, or just Warning?

There are three categories: “Critical”, “Warning” and “Information”. I’ve submitted feedback on this page’s documentation, as there’s no mention of how an alert may affect the job’s success/warning/error status. In my case, the notification was of the “Warning” type and the job finished with a warning, but I haven’t got a spare stack of tape drives to start trashing in different ways to test…

I’ll update this if/when Veeam replies to me :slight_smile:

 

@vNote42 @JMeixner you’ve forgotten about the REAL destroyer of tapes… kids! :joy:

 

Userlevel 7
Badge +17

Haha, yes. But I don’t let the kids play with the tape drives. They may destroy a single tape but not the drives :sunglasses:

Userlevel 7
Badge +13

Right! But I think even my kids would need some time to destroy one of these tapes! :zap::rofl:

Userlevel 7
Badge +11

Thx @MicoolPaul for sharing; I also wasn’t aware of those notifications. As you mentioned, even when you use the product every day, you are definitely still learning new things :-)!

Userlevel 7
Badge +17

Great finding, Michael!

Conclusion (2): tape is still not dead, and voltage outside the specified range will not kill it either! :joy:


No, tape is definitely not dead.

Have a look at this roadmap…
https://blocksandfiles.com/2021/02/04/petabyte-tape-cartridges-are-coming/

Userlevel 7
Badge +4

@MicoolPaul: Is there any automatic categorization of these alerts, like Critical, Major, Minor, or just Warning?

There are three categories: “Critical”, “Warning” and “Information”. I’ve submitted feedback on this page’s documentation, as there’s no mention of how an alert may affect the job’s success/warning/error status. In my case, the notification was of the “Warning” type and the job finished with a warning, but I haven’t got a spare stack of tape drives to start trashing in different ways to test…

I’ll update this if/when Veeam replies to me :slight_smile:

 

@vNote42 @JMeixner you’ve forgotten about the REAL destroyer of tapes… kids! :joy:

 

:thumbsup:

Userlevel 7
Badge +20

Great finding, Michael!

Conclusion (2): tape is still not dead, and voltage outside the specified range will not kill it either! :joy:


No, tape is definitely not dead.

Have a look at this roadmap…
https://blocksandfiles.com/2021/02/04/petabyte-tape-cartridges-are-coming/

No, it is definitely not. We have a service here built around tape, using Veeam and another product, for clients’ offline backups.

Userlevel 7
Badge +20

As promised, an update on this topic.


I’ve heard back from Veeam’s technical writers, who confirmed that, within reason, the severity of an alert has no bearing on the tape job’s success: a Critical alert could still accompany a successful tape job, whilst a Warning could still be the cause of a job failure.

As a result, these alerts are provided for convenience and informational purposes, to help you maintain a healthy tape infrastructure :relaxed:
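Since the alert severity doesn’t determine the job result, anything you build on top of these alerts should treat the two as separate signals rather than inferring one from the other. Here is a minimal sketch, assuming job results and alert severities have already been collected elsewhere. The `JobResult` values mirror the success/warning/error statuses discussed in this thread, and the `needs_attention` rule (escalate on any unsuccessful job, or on any Critical or Warning drive alert even when the job succeeded) is a hypothetical policy, not something Veeam implements.

```python
from enum import Enum


class JobResult(Enum):
    # Mirrors the success/warning/error job statuses discussed in this thread.
    SUCCESS = "Success"
    WARNING = "Warning"
    ERROR = "Error"


class Severity(Enum):
    # The three tape drive alert categories: Critical, Warning, Information.
    CRITICAL = "Critical"
    WARNING = "Warning"
    INFORMATION = "Information"


def needs_attention(result: JobResult, alert_severities: list[Severity]) -> bool:
    """Hypothetical follow-up policy that checks two independent signals.

    A job can succeed while the drive raises a Critical alert, so the job
    result is never used to infer drive health (or vice versa).
    """
    job_problem = result is not JobResult.SUCCESS
    drive_problem = any(
        s in (Severity.CRITICAL, Severity.WARNING) for s in alert_severities
    )
    return job_problem or drive_problem


if __name__ == "__main__":
    # A successful job that carried a Critical drive alert is still flagged.
    print(needs_attention(JobResult.SUCCESS, [Severity.CRITICAL]))  # True
    # Information-only alerts on a successful job are not escalated.
    print(needs_attention(JobResult.SUCCESS, [Severity.INFORMATION]))  # False
```

Keeping the checks separate means a once-a-month tape job that “succeeds” with a voltage warning still gets looked at before the next run.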

Userlevel 7
Badge +20

I’ve got another update to share on this topic. The customer whose environment generated the voltage warning only performs a tape backup once a month. This month’s backup failed completely, so the voltage warning was a great indicator of a pending problem. Well done Veeam for surfacing this, or the customer would’ve been taken completely by surprise! It’s also sped up the troubleshooting process, as we have an idea of what was happening before “the event”.

Userlevel 7
Badge +17

Nice 😎

But… don't you monitor the library to catch such events?

Userlevel 7
Badge +20

Normally we would, but this customer’s tape drive predates our services. Due to COVID restrictions we’ve not been able to go on site and inventory their physical estate, and they claim they have no monitoring capabilities for the hardware… it’s highly doubtful, but it’s a long story why we haven’t been given all the information we need. So until then, we’ve said we’ll only manage Veeam. Hardware is theirs to support!

Userlevel 7
Badge +17

Aha, ok 😀 It is a single drive...

Most libraries have some monitor and notification features on board.

Userlevel 7
Badge +4

@MicoolPaul: Hats off to you, sir; hope to see more posts from you!
