Understanding VDO Vs ZFS

Folks

I’m looking for a solution for backups because ZFS has failed on me too many times. In my environment, I have a large amount of data
(around 2 TB) that I periodically back up, and I keep the last 5
“snapshots”. I use rsync so that when I overwrite the oldest backup, most of the data is already there and the backup completes quickly, because only a small number of files have actually changed.
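
In rough outline, the rotation looks something like this (the paths are illustrative placeholders, not my real layout):

SRC=/data                                   # data to be backed up
DEST=/backup/store                          # holds the 5 snapshot directories
oldest=$(ls -1t "$DEST" | tail -n 1)        # pick the oldest snapshot directory
new="$DEST/snap-$(date +%Y%m%d)"
mv "$DEST/$oldest" "$new"                   # reuse the oldest copy as the new target
rsync -a --delete "$SRC/" "$new/"           # only the files that changed get copied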

Because of this low change rate, I have used ZFS with its deduplication feature to store the data. I started with a CentOS 6
installation and upgraded years ago to CentOS 7. CentOS 8 is on my agenda. However, I’ve had several data-loss events with ZFS where, because of a combination of errors and/or mistakes, the entire store was lost. I’ve also noticed that ZFS is maintained separately from CentOS; at this moment, the CentOS 8 update causes ZFS to fail. Looking for an alternative, I’m trying VDO.

In the VDO installation, I created a logical volume spanning two hard drives and defined VDO on top of that logical volume. It appears to be running, yet the deduplication numbers don’t pass the smell test. I would expect that if the logical volume contains three copies of essentially identical data, I should see a deduplication ratio close to 3.00, but instead I’m seeing numbers like 1.15. I compute the ratio as follows:
Use df and take the used-space value (1K blocks) from the third column.
Use vdostats --verbose and take the number titled “1K-blocks used”.

Divide the first by the second.
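
In script form, that calculation looks roughly like this (the mount point and device name are placeholders, not my actual ones):

# Placeholder names; substitute the real mount point and VDO device.
logical=$(df --output=used /mnt/backup | tail -n 1)    # 1K blocks used, per the filesystem
physical=$(vdostats --verbose /dev/mapper/vdo0 | awk -F: '/1K-blocks used/ {gsub(/ /, "", $2); print $2}')
echo "scale=2; $logical / $physical" | bc              # I expect ~3.00 but see ~1.15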

Can you provide any advice on my use of ZFS or VDO without telling me that I should be doing backups differently?

Thanks

David

10 thoughts on - Understanding VDO Vs ZFS

  • My two cents:
    1- Do you have an encrypted filesystem on top of VDO? If yes, you will see no benefit from dedupe.
    2- Can you post the stats of vdostats --verbose /dev/mapper/xxxxx (replace with your device)?

    You can do something like: "vdostats --verbose /dev/mapper/xxxxxxxx | grep -B6 'save percentage'"

    ———————
    Erick Perez

  • Sorry, corrections:
    For this test I created a 40GB LVM volume group with /dev/sdb and /dev/sdc, then a 40GB LV,
    then a 60GB VDO volume (for testing purposes).

    vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    Output from the just-created vdoas:

    [root@localhost ~]# vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    physical blocks : 10483712
    logical blocks : 15728640
    1K-blocks : 41934848
    1K-blocks used : 4212024
    1K-blocks available : 37722824
    used percent : 10
    saving percent : 99
    [root@localhost ~]#

    FIRST copy CentOS-7-x86_64-Minimal-2003.iso (1.1G) to vdoas from a source outside the vdo volume
    [root@localhost ~]# vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    1K-blocks used : 4721348
    1K-blocks available : 37213500
    used percent : 11
    saving percent : 9

    SECOND copy CentOS-7-x86_64-Minimal-2003.iso (1.1G) to vdoas from a source outside the vdo volume
    # cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/CentOS-7-x86_64-Minimal-2003-version2.iso
    1K-blocks used : 5239012
    1K-blocks available : 36695836
    used percent : 12
    saving percent : 52

    THIRD copy CentOS-7-x86_64-Minimal-2003.iso (1.1G) from inside the vdo volume to inside the vdo volume
    1K-blocks used : 5248060
    1K-blocks available : 36686788
    used percent : 12
    saving percent : 67

    Then I did this a total of 9 more times to have 10 ISOs copied. Total data copied 10.6GB.

    Do note this:
    When using df, it will show the VDO size, in my case 60G.
    When using vdostats, it will show the size of the LV, in my case 40G.
    Remember, dedupe AND compression are enabled.

    The df -hT output shows the logical space occupied by these ISO files as seen by the filesystem on the VDO volume. Since VDO maintains a logical-to-physical block map, df sees logical space consumed according to the filesystem that resides on top of the VDO
    volume. vdostats --hu views the physical block device as managed by VDO. Physically, a single ISO image resides on the disk, but logically the filesystem thinks there are 10 copies, occupying 10.6GB.

    So at the end I have 10 .ISOs of 1086 1MB blocks (total 10860 1MB blocks)
    that yield these results:
    1K-blocks used : 5248212
    1K-blocks available : 36686636
    used percent : 12
    saving percent : 89

    So in the end it is using 5248212 1K blocks minus the 4212024 1K blocks used initially, which gives (5248212 - 4212024) = 1036188 1K blocks / 1024 = about 1012MB
    total.
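
    A quick way to double-check that arithmetic:

    echo "scale=1; (5248212 - 4212024) / 1024" | bc    # -> 1011.9, i.e. about 1012MB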

    Hope this helps explain where the space goes.

    BTW: the test system is stock CentOS Linux release 7.8.2003, with only “yum install vdo kmod-kvdo” added.

    History of commands:
    [root@localhost vdomounts]# history
    2 pvcreate /dev/sdb
    3 pvcreate /dev/sdc
    8 vgcreate -v -A y vgvol01 /dev/sdb /dev/sdc
    9 vgdisplay
    13 lvcreate -l 100%FREE -n lvvdo01 vgvol01
    14 yum install vdo kmod-kvdo
    18 vdo create --name=vdoas --device=/dev/vgvol01/lvvdo01 --vdoLogicalSize=60G --writePolicy=async
    19 mkfs.xfs -K /dev/mapper/vdoas
    20 ls /mnt
    21 mkdir /mnt/vdomounts
    22 mount /dev/mapper/vdoas /mnt//vdomounts/
    26 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    28 cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/ -vvv
    29 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    30 cp /root/CentOS-7-x86_64-Minimal-2003.iso /mnt/vdomounts/CentOS-7-x86_64-Minimal-2003-version2.iso
    31 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    33 cd /mnt/vdomounts/
    35 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version3.iso
    36 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    37 df
    39 vdostats --hu
    40 ls -l --block-size=1MB /root/CentOS-7-x86_64-Minimal-2003.iso
    41 df -hT
    42 vdo status | grep Dedupl
    43 vdostats --hu
    44 vdostats
    48 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version4.iso
    49 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version5.iso
    50 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version6.iso
    51 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version7.iso
    52 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version8.iso
    53 cp CentOS-7-x86_64-Minimal-2003-version2.iso ./CentOS-7-x86_64-Minimal-2003-version9.iso
    54 df -hT
    55 ls -l --block-size=1MB
    56 vdostats --hu
    57 df -hT
    58 df
    59 vdostats --hu
    60 vdostats
    61 vdostats --verbose /dev/mapper/vdoas | grep -B6 'saving percent'
    62 cat /etc/centos-release
    63 history
    [root@localhost vdomounts]#

    ———————
    Erick Perez Quadrian Enterprises S.A. – Panama, Republica de Panama Skype chat: eaperezh WhatsApp IM: +507-6675-5083
    ———————

  • On 03/05/20 04:50, david wrote:
    Hi David, I’m not an expert on VDO, but I will try it for backup purposes with rsync + hardlinks. I know this is not the answer you asked for, sorry about that.

    Many users have suggested that I use

  • Strahil, I am using about 1012MB for the first ISO; I believe that’s because of compression. From there, vdostats --hu reports 5.0G usage, or 12%, with savings of 89% for the original + 9 copies of the same ISO.

    ———————
    Erick Perez Quadrian Enterprises S.A. – Panama, Republica de Panama Skype chat: eaperezh WhatsApp IM: +507-6675-5083
    ———————

  • Hi David,

    In my opinion, VDO isn’t worth the effort. I tried VDO for the same use case: backups. My dataset is 2-3TB and I back up daily. Even with a smaller dataset, VDO couldn’t live up to its promises. It used tons of CPU and memory, and with a lot of tuning I could get it to kind of work, but it became corrupted at the slightest problem (even a shutdown could do this, and shutdowns could also take hours).

    I have tried a number of things and I use a combination of two things now:
    1. a btrfs volume with compress-force enabled to store the intermediate data; it compresses my data to about 60% and that’s enough for me
    2. use of bup (https://bup.github.io/) to store long-term backups.

    bup is incredibly efficient for my use case (full VM backups). Over the course of a whole month, the dataset only increases by about 30% from the initial size (I create a new full backup each month) – and this is with FULL backups of all VMs every day. bup backupsets can also be mounted via FUSE, giving you access to all stored versions in a filesystem-like manner.
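
    A typical bup run, roughly sketched (the repository location, branch name, and paths are illustrative examples, not the exact setup described above):

    bup init                                         # create the default repository (~/.bup)
    bup index -u /var/backups/vm-images              # scan and index the files to back up
    bup save -n vm-images /var/backups/vm-images     # store a deduplicated snapshot
    bup fuse /mnt/bup                                # browse every saved version via FUSE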

    If you can backup at will you can probably forego the btrfs volume for intermediate storage – that is just a band-aid to work around a specific issue here.

    Stefan

  • I’d like to know what kind of data you’re looking to back up (that will just help give an idea of whether it’s even a good fit for dedupe; though if it dedupes well on ZFS, it probably is fine). I’d also like to know how you configured your VDO volume (please provide the ‘vdo create’ command you used). As mentioned in some other responses, can you provide vdostats output (the full ‘vdostats --verbose’ output as well as base ‘vdostats’) and df output for this volume? That would help us understand a bit more about what you’re experiencing.

    The default deduplication window for a VDO volume is ~250G (--indexMem=0.25). Assuming you’re writing the full 2T of data each time and want to achieve deduplication across that entire 2T, it would require an “--indexMem=2G” configuration. You may want to account for growth as well, which means considering even more memory for the ‘--indexMem’ parameter. Alternatively, if memory isn’t as plentiful, you could enable the sparse index option to cover a significantly larger dedupe window for a smaller memory commitment; there is an additional on-disk footprint requirement that goes with it. You can look at the documentation [0] for the specific requirements. For this setup, a sparse index with the default memory footprint (0.25G) would cover ~2.5T, but would require an additional ~20G of storage over the default index configuration.

    [0] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/deduplicating_and_compressing_storage/deploying-vdo_deduplicating-and-compressing-storage#vdo-memory-requirements_vdo-requirements
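
    For illustration, those two approaches might look roughly like this (the device and volume names are placeholders, not a recommendation for your exact layout):

    # Dense index sized for ~2T of unique data (2 GB of index memory):
    vdo create --name=vdo_backup --device=/dev/vg_backup/lv_backup --indexMem=2
    # Or a sparse index: the default 0.25 GB of memory covers ~2.5T,
    # at the cost of ~20G of extra on-disk index space:
    vdo create --name=vdo_backup --device=/dev/vg_backup/lv_backup --sparseIndex=enabled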

    Without more information about what you’re attempting to do, I can’t really say that what you’re doing is wrong, but I also can’t yet say that VDO is failing to meet any reasonable expectation. More context would certainly help get to the bottom of this question.

  • I’m sorry to hear you feel that way. I would be interested to understand the situations in which you experienced these problems so that they can be addressed better in the future. Did you reach out for any guidance when it was happening?

  • Duplicity works well on CentOS. I had to perform a restore of a website and wiki after I [accidentally] deleted both. Backups go to another machine over SSH and are scheduled through systemd.

    A Duplicity-based backup may help protect your data until you get something in place that you like better.
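
    A minimal invocation along those lines might look like this (host and paths are placeholders):

    # Placeholder host/paths; duplicity makes encrypted, incremental backups over SSH.
    duplicity /var/www sftp://backup@backuphost/backups/www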

    Jeff

  • Rather than dedupe at the filesystem level, I found the application-level dedupe in BackupPC works really well… I’ve run BackupPC on both a big ZFS volume and on a giant XFS-over-LVM-over-MDRAID volume (24 x 3TB disks organized as 2 x 11 RAID6 plus 2 hot spares). The BackupPC server I built at my last $job had 30 days of daily incrementals and 12 months of monthlies of about 25 servers+VMs (including Linux, Solaris, AIX, and Windows). The dedupe is done globally at the file level, so no matter how many instances of a file exist across all those backups ((30+12) * 25), there’s only one file in the ‘hive’. Bonus: BackupPC has a nice web UI for retrieving backups, so I could create accounts for my various developers, and they could retrieve stuff from any covered date on any of the servers they had access to without my intervention.

    About the only manual intervention I ever needed over the several years this was running involved the Windows rsync client needing a PID file deleted after an unexpected reboot.