Immutable Storage actual disk used per folder



Inherency:

Since we bill by actual disk usage for our clients utilizing CloudConnect, the existing reports are non-functional for reporting and billing (they report pre-deduplication/pre-reflink data).  It took a while to figure out, but there is a way to calculate actual disk space used per directory on an immutable repository.


Solution:

Using: https://community.veeam.com/blogs-and-podcasts-57/check-reflink-and-spared-space-on-xfs-repositories-244

 

I have adapted a script that will give the disk usage of each folder (i.e. client) on an immutable repo.  This isn't "Data used"; that is what Veeam reports.  This is "Disk used": the actual size on disk after reflinks (duplicated data is only counted once).

 

A note about this script: it appears that the original blog entry is wrong about the size of a block.  It attributes 4096 bytes... which is true... on disk... but the utility used explicitly reports in units of 512 bytes:
https://linux.die.net/man/8/xfs_bmap
"units of 512-byte blocks"

 

To use this script, we run it from a cron task and pipe the output to a mail client on the repo itself (e.g.: script.bash 2>&1 | mail -s "Immutable storage report for $HOSTNAME" someemail@email.place)
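As a sketch, the cron entry might look like the following. The schedule, script path, and address here are assumptions for illustration, not from the original setup:

```shell
# Hypothetical crontab entry: run the report every Monday at 06:00 and
# mail stdout+stderr to the billing address.
# m h dom mon dow   command
0 6 * * 1   /usr/local/bin/immutable-report.bash 2>&1 | mail -s "Immutable storage report for $HOSTNAME" someemail@email.place
```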

 

#!/bin/bash

# Report actual on-disk usage (after reflink dedup) per client folder
# on an XFS immutable repository.
find /backups/disk-01/backups/ -mindepth 1 -maxdepth 1 -type d | while read -r clientDir
do
    echo "$clientDir"
    # xfs_bmap -l reports extents in units of 512-byte blocks; column 3 is the
    # physical block range and column 4 the block count.  De-duplicating on the
    # physical range counts each reflinked (fast-cloned) extent only once.
    clientSpaceUsed=$(find "$clientDir"/*/* -xdev -type f -exec xfs_bmap -l {} + \
        | awk '{ print $3 " " $4 }' | sort -k 1 | uniq \
        | awk '{ print $2 }' | grep -Eo '[0-9]{1,7}' \
        | paste -sd+ | bc \
        | awk '{ print $1 * 512 / 1024 / 1024 / 1024 }')
    # Blocks of 512 bytes.  Divided by 1024 for KB, 1024 for MB, 1024 for GB.
    echo "$clientSpaceUsed GB"
done

To break down how this works, for each client directory in "/backups/disk-01/backups/":

- output the directory being reported on
- run xfs_bmap -l on every file (this tells us all about the blocks in question)
- take columns 3 and 4, the physical block range and block count (they become columns 1 and 2; the rest are discarded)
- sort by column 1
- remove duplicate rows of data (reflinks from fast cloning; keeps a single copy of each extent for counting purposes)
- select only column 2 (now becomes column 1)
- remove anything other than numbers
- add those numbers together
- multiply by the block size (512)
- divide by 1024 (now KB)
- divide by 1024 (now MB)
- divide by 1024 (now GB)
- output the result
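The de-duplication steps above can be sketched on fabricated xfs_bmap -l output (the extent numbers are made up; two files share one reflinked extent, and the final awk here sums directly rather than going through paste and bc):

```shell
# Two files share the extent 1144..1151 (8 blocks); the pipeline counts it once.
printf '%s\n' \
    '0: [0..7]: 1144..1151 8 blocks' \
    '0: [0..7]: 1144..1151 8 blocks' \
    '1: [8..23]: 2000..2015 16 blocks' \
    | awk '{ print $3 " " $4 }' \
    | sort -k 1 | uniq \
    | awk '{ sum += $2 } END { print sum * 512 " bytes" }'
# 8 + 16 = 24 unique blocks * 512 bytes = 12288 bytes
```

Without the sort | uniq step the shared extent would be counted twice, which is exactly the overcount the stock reports suffer from.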


6 comments

Userlevel 7
Badge +5

Great post.  Another script to add to my repo. :sunglasses:

Userlevel 7
Badge +4

Interesting script 👍🏼

Do you have a similar solution for ReFS repositories, too?

Userlevel 1


See:
http://dewin.me/refs/

 

He has a tool that will do that for you.  Parsing that output into a script shouldn’t be too hard.  Should be called “blockstat” or something.

Userlevel 7
Badge +5

This one is very good for ReFS.  Use it all the time.

Userlevel 7
Badge +4

Thank you for your reply.

Yes, I know about blockstat. That tool takes a lot of time to produce results.
I was hoping there was a faster solution out there somewhere. :sunglasses:

I have a script ready which parses the information via blockstat for each folder in a repository….

Userlevel 5
Badge

Never seen this before, thanks @ThePlaneskeeper 
