Experience with HPE Apollo or other high-density servers as backup repository?


Userlevel 6
Badge +1

Hi,

I’d be interested in feedback from people who are already using HPE Apollos or other high-density servers with local storage as repository servers. It seems that a lot of people are currently planning to use them.

  • What is your setup?
  • Number of servers? Number of disks?
  • Which type and size of RAID?
  • Number of VMs / amount of source data?
  • How is your performance?
  • Do you use them only as a backup target or as a copy source as well?
  • Which filesystem: ReFS or XFS with block cloning/reflink?


Userlevel 6
Badge +1

The Apollos are not deployed yet; we are currently installing them with RHEL 8.2. It will take a couple of days until I’m able to test.

 

Just curious, why did you choose RHEL 8.2 in your case instead of, e.g., Ubuntu?

 

RHEL is the default for all Linux servers here. 

Userlevel 6
Badge +1

Just noticed that my last post was in the wrong thread. Here are the two options I’m thinking about. We’ll probably try out a backup and copy extent on each server.

 

 

[Attached image: diagram of the two repository layout options being considered]

 

Userlevel 7
Badge +8

Out of curiosity, what kind of object storage are you using? Any example of copy job performance?

I will deploy on RHEL 8 too; we’re using Kickstart or Satellite provisioning. Do you apply any hardening to your repo? RHEL CIS?

Userlevel 6
Badge +1

For capacity tier offloading we use AWS S3 buckets; we want to look into Wasabi in the next couple of weeks now that they have Object Lock too.
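
If it helps, here is a quick sketch of how Object Lock can be checked on an S3-compatible bucket with the standard AWS CLI; the bucket name and the Wasabi endpoint are just examples:

# Create the bucket with Object Lock enabled
aws s3api create-bucket --bucket veeam-capacity-tier --object-lock-enabled-for-bucket --endpoint-url https://s3.wasabisys.com

# Verify the lock configuration before pointing a capacity tier extent at it
aws s3api get-object-lock-configuration --bucket veeam-capacity-tier --endpoint-url https://s3.wasabisys.com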

 

The servers were deployed by the Linux team; they use Satellite. No special hardening yet.

Userlevel 7
Badge +8

Hey Ralf, any news about your test?

Userlevel 6
Badge +1

No more tests 😉 I have already migrated most of the jobs. I had some problems with sudo/SSH: some jobs failed, a lot of data was left in /opt/veeam (configured in the registry) and in the Veeam user’s home, and /var is growing rapidly. At one point Veeam could not perform any task on one server because the Veeam user was no longer able to connect to its NFS home (>100 defunct processes). After disabling sudo and using the root user this did not happen again, but it’s not so nice needing root for this to work properly.

As we are still on v10, this might be much better in v11 with the persistent data mover. None of these are Apollo issues; it’s Veeam + Linux. Real-world performance is still pretty good. I’m just struggling with the available repository task slots, as 52 cores per server are a bit too few for backup + copy + offload tasks. These 2 Apollos were initially bought only as a copy target, but - as usual - the mission has changed and they are now also used for backup. We’ll have 2-4 additional Apollos shortly; then the task slot issue should also be solved.
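
A couple of commands I find useful for keeping an eye on this kind of thing (just a sketch, nothing Veeam-specific):

# Count defunct (zombie) processes left behind on the repository server
ps -eo stat,ppid,pid,cmd | awk '$1 ~ /^Z/' | wc -l

# See what is actually filling /var (logs vs. something else)
du -xh --max-depth=1 /var | sort -rh | head

# Check how much leftover job data sits in the non-default work directory
du -sh /opt/veeam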

Userlevel 7
Badge +8

Hey @Ralf, when you say /var is growing rapidly, what is it? Logs? Cache? How many MB/GB per day/job?

I’m wondering, how did you mount the XFS partition? Are you using LVM?

Userlevel 7
Badge +13

I would also be interested if you are using LVM. IMHO it is not necessary for repositories.

Userlevel 6
Badge +1

Yes, we use LVM by default, but not for the Veeam extents. /var has a size of 25 GB now; 17 GB are used, 12 GB of which are Veeam logs.

 

xfs mount options (not much tuning):

… type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=6144,noquota)
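
For reference, creating such an extent from scratch usually boils down to something like this (a minimal sketch; /dev/sdb and the mount point are placeholders):

# Format the RAID volume with reflink and CRC enabled (required for fast clone)
mkfs.xfs -b size=4096 -m reflink=1,crc=1 /dev/sdb

# Mount it and check that reflink is active
mkdir -p /mnt/veeam-repo
mount /dev/sdb /mnt/veeam-repo
xfs_info /mnt/veeam-repo | grep reflink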

Userlevel 6
Badge +1

Just a note: we configured the 4510 servers with the 547FLR-QSFP 40G network adapter. This was not the best idea, as the maximum supported DAC cable length is 5 m and the only alternatives are an MPO transceiver or AOC. BiDi transceivers like the “HPE X140 40G QSFP+ LC BiDi MM” are not supported for this adapter; there is an HPE advisory that certain HPE 40/100G network adapters do not work with BiDi due to a power problem. As we do not use MPO or AOC in our datacenter, we will probably have to replace the adapters now.

Userlevel 7
Badge +8

After all, I’m now a member of the Apollo gang :). Don’t forget, like I did, to configure the controller cache; by default it was 100% read. The difference was HUGE. Interesting fact: my processing rate quadrupled compared to my old dedup appliance in the same backup scenario. I will continue my tests with a specific restore scenario to compare read performance on the backup repo.
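
For reference, a rough sketch of how the read/write cache split can be checked and changed with HPE’s ssacli; the slot number and the 10/90 ratio are assumptions, so verify them against your own controller first:

# Show the current cache settings on the array controller (slot number is an example)
ssacli ctrl slot=0 show detail | grep -i cache

# Shift the cache split toward writes, e.g. 10% read / 90% write
ssacli ctrl slot=0 modify cacheratio=10/90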

 

fio --rw=write --name=test --size=50G --direct=1 --bs=512k --numjobs=20 --ioengine=libaio --iodepth=16 --refill_buffers --group_reporting

test: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16

...

fio-3.19

Starting 20 processes

test: Laying out IO file (1 file / 51200MiB)

[line repeated once per fio job]

Jobs: 11 (f=10): [f(1),W(1),_(1),W(1),_(1),W(1),_(1),W(2),_(4),W(1),_(2),W(4)][99.5%][w=2398MiB/s][w=4795 IOPS][eta 00m:02s]

test: (groupid=0, jobs=20): err= 0: pid=276626: Fri Feb 18 12:28:06 2022

  write: IOPS=5030, BW=2515MiB/s (2637MB/s)(1000GiB/407115msec); 0 zone resets

    slat (usec): min=5, max=204138, avg=2829.68, stdev=8357.51

    clat (usec): min=1612, max=905853, avg=60461.38, stdev=32858.38

     lat (usec): min=1621, max=914383, avg=63291.66, stdev=33526.12

    clat percentiles (msec):

     |  1.00th=[   35],  5.00th=[   39], 10.00th=[   41], 20.00th=[   43],

     | 30.00th=[   45], 40.00th=[   47], 50.00th=[   50], 60.00th=[   53],

     | 70.00th=[   57], 80.00th=[   65], 90.00th=[   90], 95.00th=[  155],

     | 99.00th=[  186], 99.50th=[  197], 99.90th=[  215], 99.95th=[  224],

     | 99.99th=[  241]

   bw (  MiB/s): min= 1528, max= 4374, per=100.00%, avg=2521.54, stdev=17.08, samples=16126

   iops        : min= 3055, max= 8738, avg=5032.83, stdev=34.26, samples=16126

  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.09%, 50=51.84%

  lat (msec)   : 100=39.01%, 250=9.04%, 500=0.01%, 750=0.01%, 1000=0.01%

  cpu          : usr=5.01%, sys=1.05%, ctx=2038785, majf=0, minf=77794

  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%

     issued rwts: total=0,2048000,0,0 short=0,0,0,0 dropped=0,0,0,0

     latency   : target=0, window=0, percentile=100.00%, depth=16

 

Run status group 0 (all jobs):

  WRITE: bw=2515MiB/s (2637MB/s), 2515MiB/s-2515MiB/s (2637MB/s-2637MB/s), io=1000GiB (1074GB), run=407115-407115msec

 

Disk stats (read/write):

  sda: ios=0/2046821, merge=0/125, ticks=0/103413090, in_queue=103413090, util=100.00%
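
As a possible counterpart for the restore comparison, a read variant of the same fio test could look like this (a sketch only, reusing the parameters of the write run above):

# Sequential read test to approximate full-restore reads from the repository
fio --rw=read --name=restore-test --size=50G --direct=1 --bs=512k --numjobs=20 --ioengine=libaio --iodepth=16 --group_reporting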
