Linux Backup Repository woes

Forum|Forum|5 years ago
May 16, 2021
11 comments
1601 views

Ralf
Comes here often

Hi,

I’m a bit confused about the difference between the old Veeam forum and the new community discussion board. I just posted in the old forum, but as it’s not allowed to post log file content there, I’ll give it a try here too.

I'm setting up our first Linux-based backup repository on an Apollo 4510 server (Veeam 10). Now the jobs have different kinds of errors. Before I open a case, I wanted to know if those problems are known issues with an easy workaround. This all seems to be related to load, as it happens only when multiple backups are running. But there is not much CPU load and there are no errors in the usual Linux logs.

#1 At this time backup jobs were writing active fulls at 1.5 GB/s over LAN to the server; the CPU was 90% idle (52 cores). Nothing interesting in /var/log/messages.

[15.05.2021 15:19:27] <326> Error Failed to upload file D:\Veeam\Backup\VeeamAgent64 to /tmp/VeeamAgent0bc9a8bd-ebd8-44b8-a373-44510aefd89f
[15.05.2021 15:19:27] <326> Error Failed to find terminal prompt: timeout occurred (60 sec) (System.Exception)

#2 Sometimes the extents are just gone. I don't see any warning in Linux; for the Linux server itself the device was present the whole time.

backup: 15.05.2021 16:40:19 :: Error: DE-WOP-B01-E01-Test extent is offline.
copy: 15.05.2021 16:29:11 :: Error: Some extents storing required backup files are offline

#3 There sometimes seems to be a problem with the password too (sudo), but as this is the same server and the same job, just another task, it can't be a general problem with permissions.

15.05.2021 15:37:34 :: Error: Permission denied (password).

#4 Connection attempts are failing sometimes

12.05.2021 16:50:37 :: Processing SDET2509 Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

#5 /tmp and Veeam user home are not cleaned up

4-7 GB are still in each of those directories. /tmp was only a 4GB partition at the beginning, but I had to expand it to 14GB to finish a job without errors. Shouldn't Veeam clean up its own mess?

-rwxrwxrwx. 1 root root 62964808 May 12 16:45 VeeamAgent655b0e07-2e34-4899-8087-a76ed7a69971
.....

-rwxrwxrwx. 1 xxxxxx xxxxx 62964808 May 12 16:51 79bfc3e6-a1bb-4f44-881c-1625b4f7509b
...
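
For anyone who wants to see how much is left behind before opening a case, here is a minimal Python sketch that only reports leftover files and deletes nothing. The `VeeamAgent*` pattern is taken from the /tmp listing above; the function name, the 24-hour age threshold, and the idea that anything older is stale are assumptions for illustration. The files in the Veeam user's home directory have bare UUID names, so they would need a different pattern.

```python
import time
from pathlib import Path


def find_stale_agent_files(directory, pattern="VeeamAgent*", min_age_hours=24.0, now=None):
    """Return (path, size_in_bytes) for regular files that match `pattern`
    and were last modified more than `min_age_hours` ago.

    Assumption: an agent binary older than the threshold is a leftover.
    Check that no job is running before removing anything by hand.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    stale = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        info = path.stat()
        if info.st_mtime < cutoff:
            stale.append((path, info.st_size))
    return stale


# Example use (report only):
#   for path, size in find_stale_agent_files("/tmp"):
#       print(f"{path}\t{size / 1e6:.1f} MB")
```

Summing the reported sizes gives a rough idea of how large /tmp has to be if nothing is cleaned up between jobs.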


Mildur
Veeam Product Management
May 16, 2021

The R&D forums are not intended for free technical support if you face issues. Veeam has a support department for that. Because of that, it’s not allowed to upload log files to the R&D Forum.

You can ask for technical advice or talk directly to Veeam’s product managers there. And of course, if you want to post a technical issue, you can do that. But you have to post your case number so the product manager has a reference for the issue.

————-

Back to your problem.

I will soon get some HPE Apollo appliances. If I face the same issue, I can report back to you. I don’t see any problems at the moment with a small Linux hardened repo server (Ubuntu 20.04, 40TB), but I am not sure about the tmp partition. It’s possible that it is on the same partition as the root disk, not on a dedicated partition.

Senior Analyst, Product Management @ Veeam Software

Ralf
Author
Comes here often
May 16, 2021

Well, I know that there is Veeam support and I’ve created a lot of cases there. But sometimes I got better feedback and hints in the forum than from support, and usually quicker. Maybe I should then post those questions on Reddit, where it’s OK to post technical problems.

Anyhow, I’ll create a ticket (probably more than one, as those are different issues) on Monday. I was a bit naive to believe Linux repositories would be less trouble than SMB shares ;)

Mildur
Veeam Product Management
May 16, 2021

Linux works fine, but I have learned that it needs more care than a simple SMB share to work 100% flawlessly :)

The community pages here are a good place for these technical issues. You should be able to upload logs here. Someone will have an answer or the experience to comment on your issues :)

Ralf
Author
Comes here often
May 16, 2021

Is there documentation about Linux specific settings, other than https://www.veeam.com/kb2216?

Mildur
Veeam Product Management
May 16, 2021

We have dedicated teams for managing the Linux servers in my company. They install and configure the Linux OS and hardware for me.
I only configure the part in the KB you mentioned and the part where the xfs filesystem is created for cfs with reflink support.

I'm glad I don't have to do everything myself under Linux.

Ralf
Author
Comes here often
May 16, 2021

It’s the same here. But there is not much in this KB. There are other KBs for ssh settings etc.; it would be nice to have everything in one place.

Ralf
Author
Comes here often
May 17, 2021

I’ve found some KBs and forum threads with more information regarding Linux repositories. I also reduced the concurrent tasks to 26 for each of the 2 volumes, as this looks more realistic.

- Added the registry key LinAgentFolder on the VBR server with the value /opt/veeam, changed permissions to 770 and the owner to the Veeam user (https://forums.veeam.com/vmware-vsphere-f24/default-execute-directory-tmp-for-linux-servers-t65091.html)

- Added the registry key ConnectByIPsTimeoutSec with the value 1200 (https://www.veeam.com/kb1976)

- Added some sshd options to solve connection issues, as Veeam in V10 uses ssh/sftp to copy the data mover to the servers. This changed in V11 and will be much easier (https://www.veeam.com/kb2985):

    ClientAliveInterval 30
    TCPKeepAlive yes
    ClientAliveCountMax 99999
    MaxSessions 200
    MaxStartups 100:30:200

- Changed the network interface buffers (https://forums.veeam.com/vmware-vsphere-f24/recommendation-for-linux-proxy-t65637.html#p405034):

    ethtool -G eno3 rx 2048 tx 2048
    ethtool -G eno4 rx 2048 tx 2048

  then added

    ETHTOOL_OPTS="-G ${DEVICE} rx 2048 tx 2048"

  to /etc/sysconfig/network-scripts/ifcfg-bond0-port1 and /etc/sysconfig/network-scripts/ifcfg-bond0-port2

- Created /etc/sysctl.d/99-mellanox.conf with some more tuning parameters (https://community.mellanox.com/s/article/linux-sysctl-tuning):

    net.ipv4.tcp_timestamps=0
    net.ipv4.tcp_sack=1
    net.core.netdev_max_backlog=250000
    net.core.rmem_max=4194304
    net.core.wmem_max=4194304
    net.core.rmem_default=4194304
    net.core.wmem_default=4194304
    net.core.optmem_max=4194304
    net.ipv4.tcp_rmem=4096 87380 4194304
    net.ipv4.tcp_wmem=4096 65536 4194304
    net.ipv4.tcp_low_latency=1
    net.ipv4.tcp_adv_win_scale=1
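
To check that the sysctl settings above are really active after a reboot, something like the following Python sketch could compare the file against the running values. It is only an illustration: the function names are made up, and the key-to-path mapping simply turns dots into directory separators under /proc/sys, which is a simplification that does not handle keys containing interface names with dots.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Collapse whitespace so "4096  87380" and "4096\t87380" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a key such as net.core.rmem_max to /proc/sys/net/core/rmem_max."""
    return Path(root, *key.split("."))


def find_mismatches(expected, root="/proc/sys"):
    """Return {key: (expected, actual)} for keys whose running value differs.

    `actual` is None when the key cannot be read (for example, a parameter
    that does not exist on the running kernel).
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            actual = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            actual = None
        if actual != want:
            mismatches[key] = (want, actual)
    return mismatches


# Example use:
#   conf = Path("/etc/sysctl.d/99-mellanox.conf").read_text()
#   print(find_mismatches(parse_sysctl_conf(conf)))
```

An empty result means every parameter in the file matches the running kernel; a `None` entry means the key could not be read at all.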

Ralf
Author
Comes here often
May 17, 2021

If you have backup and copy jobs between 2 datacenters with, let's say, 3 servers at each DC, and each server has 2 RAID volumes, would you use dedicated servers for backup and copy jobs, or would you use one volume in each server for backup and the other for copy?

Mildur
Veeam Product Management
May 18, 2021

I see two scenarios:

3 Servers in DC1 —> Backup Jobs

3 Servers in DC2 —> Backup Copy Jobs

but only if all your workloads are running in DC1. Consider a SOBR that includes the RAID volumes.

—————-

If you have workloads in both datacenters, consider a mix of backup jobs and backup copy jobs on each server.

Backup Job from Workload in DC1 on Raid Volumes in DC1, Copy Job to Raid Volumes in DC2

Backup Job from Workload in DC2 on Raid Volumes in DC2, Copy Job to Raid Volumes in DC1
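
The second scenario boils down to "back up locally, copy to the other side". A small Python sketch of that rule, purely as an illustration (the names `Placement` and `plan_placements` are made up, and it assumes exactly two datacenters):

```python
from typing import NamedTuple


class Placement(NamedTuple):
    workload_dc: str      # where the protected workload runs
    backup_repo_dc: str   # where the backup job writes
    copy_repo_dc: str     # where the backup copy job writes


def plan_placements(workload_dcs, all_dcs=("DC1", "DC2")):
    """Place each backup in the workload's own DC and its copy in the other DC."""
    if len(all_dcs) != 2:
        raise ValueError("this sketch assumes exactly two datacenters")
    plans = []
    for dc in workload_dcs:
        if dc not in all_dcs:
            raise ValueError(f"unknown datacenter: {dc}")
        other = all_dcs[1] if dc == all_dcs[0] else all_dcs[0]
        plans.append(Placement(dc, dc, other))
    return plans


for plan in plan_placements(["DC1", "DC2"]):
    print(f"workload in {plan.workload_dc}: backup in {plan.backup_repo_dc}, "
          f"copy in {plan.copy_repo_dc}")
```

With workloads in only one datacenter, the same rule gives the first scenario: all backups in DC1 and all copies in DC2.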

JMeixner
On the path to Greatness
May 18, 2021

But you don’t have a dedicated VBR server in this scenario…

Ok, in such a small environment this is perhaps a bit oversized. :sunglasses:

Ralf
Author
Comes here often
May 18, 2021

We have workloads in both DCs. It’s a stretched setup with mirrored IBM SVC volumes. We are backing up some parts in one DC and others in the second one. Then we copy the jobs from one side to the other so that we always have either the backup or the copy of a VM available if a DC is down.

We have one VBR server in a mgmt cluster; we are thinking about a cold standby VBR VM where we can import the config backup if needed.
