KVM Vs. Incremental Remote Backups


Hi,

Up until recently I’ve hosted all my stuff (web & mail) on a handful of bare metal servers. Web applications (WordPress, OwnCloud, Dolibarr, GEPI, Roundcube) as well as mail and a few other things were hosted mostly on one big machine.

Backups for this setup were done using Rsnapshot, a nifty utility that combines Rsync over SSH and hard links to make incremental backups.
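
The hard-link idea can be sketched with plain rsync and its --link-dest option. This is not rsnapshot itself, only an illustration of the same principle; the host name, the backup root and the "latest" symlink convention are assumptions made for the example.

```shell
#!/bin/sh
# Hard-link based incremental backup with rsync --link-dest.
# Illustration only: host name and paths are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# backup_host HOST DEST_ROOT
# Pulls HOST:/ into DEST_ROOT/HOST/<date>. Files unchanged since the
# previous run are hard-linked against DEST_ROOT/HOST/latest, so they
# cost no extra space and are not sent over the network again.
backup_host() {
    src_host=$1
    dest_dir=$2/$1
    today=$(date +%Y-%m-%d)
    run mkdir -p "$dest_dir"
    run rsync -a --one-file-system --delete --numeric-ids \
        --link-dest="$dest_dir/latest" \
        "$src_host:/" "$dest_dir/$today/"
    # Point "latest" at the run that just finished.
    run ln -sfn "$today" "$dest_dir/latest"
}
```

Calling `DRY_RUN=1 backup_host web1 /srv/backup` only prints the commands. On the very first run the "latest" directory does not exist yet; rsync warns about that and performs a full copy.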

This approach has become problematic for two reasons. First, web applications have increasingly specific and sometimes mutually exclusive requirements. Second, last month I had a server crash, and even though I had backups for everything, restoring them meant a fair amount of downtime.

So I’ve opted to go for KVM-based solutions, with everything split up over a series of KVM guests. I wrapped my head around KVM, played around with it (a lot) and now I’m more or less ready to go.

One detail is nagging me though: backups.

Let’s say I have one VM that handles only DNS (base installation + BIND) and one other VM that handles mail (base installation + Postfix + Dovecot).

Under the hood that’s two QCOW2 images stored in /var/lib/libvirt/images.

With the old “bare metal” approach I could perform remote backups using Rsync, so only the difference between two backups would get transferred over the network. Now with KVM images it looks like every day I have to transfer the whole image again. As soon as some images have lots of data on them (say, 100 GB for a small OwnCloud server), this quickly becomes unmanageable.

I googled around for quite some time for “KVM backup best practices” and was a bit puzzled to find many folks asking the same question and no real answer, at least not without having to jump through burning hoops.

Any suggestions?

Niki


Microlinux – Solutions informatiques durables
7, place de l’église – 30730 Montpezat Site : https://www.microlinux.fr Blog : https://blog.microlinux.fr Mail : info@microlinux.fr Tél. : 04 66 63 10 32
Mob. : 06 51 80 12 12

11 thoughts on - KVM Vs. Incremental Remote Backups

  • For Fedora Infrastructure we use a three-pronged approach:
    1. Kickstarts for the basic system
    2. Ansible for the deployment and ‘general configuration management’
    3. rdiff-backup of things which ansible would not be able to bring back.

    So most of our infrastructure is KVM only and the only systems we have to kickstart by ‘hand’ are the bare metal. The guests are then fired off with an ansible playbook which uses libvirt to fire up the initial guest and kickstart from known data. Then the playbook continues and builds out the system for the rest of the deployment. [Our guests are also usually lvm partitions so we can use LVM tools to snapshot the system in different ways.]

    Once that is done, there are usually scripts which do things like ASCII dumps of databases and such.

    As you pointed out this isn’t the only way to do so. Other sites have a master qemu image for all their guests on a machine and clone that instead of doing kickstarts for each. They also do snapshots of the images via lvm or some other tool in order to make backups that way.

    hope this helps.


    Stephen J Smoogen.

  • What *I* do for backing up KVM VMs is use LVM volumes, not QCOW2 images. I take an LVM “snapshot” volume, mount it locally and read-only on the host, and use tar (via Amanda). Another option is to install Amanda’s client on the VM itself and have Amanda run tar inside the VM. I use the latter for VMs whose filesystem is not mountable on the host (usually due to ext4 version issues: CentOS 6’s mount.ext4 did not like the ext4 filesystem of Ubuntu 18.04). I have always found using container image files with VMs a bit too opaque.
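
    A minimal sketch of the snapshot, mount and tar sequence described above. It uses plain tar rather than Amanda to stay short, and it assumes the logical volume holds a filesystem directly (not a partitioned disk). The volume group, volume name, 5G snapshot size and paths are all hypothetical.

```shell
#!/bin/sh
# Snapshot a guest's logical volume, mount the snapshot read-only on the
# host and archive its contents. All names and sizes are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_backup VG LV MOUNTPOINT ARCHIVE
snapshot_backup() {
    vg=$1
    lv=$2
    mnt=$3
    archive=$4
    snap="${lv}_snap"
    # Copy-on-write snapshot; the size only has to hold the blocks the
    # guest overwrites while the backup is running.
    run lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv"
    run mkdir -p "$mnt"
    run mount -o ro "/dev/$vg/$snap" "$mnt"
    run tar -C "$mnt" --numeric-owner -czf "$archive" .
    run umount "$mnt"
    run lvremove -f "/dev/$vg/$snap"
}
```

    Running `DRY_RUN=1 snapshot_backup vg0 mail /mnt/mail_snap /backup/mail.tar.gz` prints the six commands without touching any volume. The sketch does no error handling; a real script would want to unmount and remove the snapshot even when tar fails.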

    Since you are using QCOW2 images, your best option would be to treat the VMs as if they were just bare metal servers and rsync over the virtual network (à la ‘rsync -a vmhostname:/ /backupdisk/vmhostname_backup/’, run on the backup server, since rsync cannot copy between two remote hosts) and not even try to back up the QCOW2 image files, except maybe once in a while for “disaster” recovery purposes (e.g. if you need to recreate the VM from scratch from a known state).

    At Wed, 31 Mar 2021 14:41:09 +0200 CentOS mailing list wrote:

  • As others pointed out – LVM would be a smart solution and BTW rsnapshot supports LVM snapshot backups.

    If you want a raw approach against the image file, then use a deduplication backup tool (block based backups).

  • We’re doing rsnapshot based backups for everything, VMs and bare metal systems. We don’t care about KVM image files for backups.

    When a new host is included in the backup, we first make a hard-link-based copy, on the backup server, of the backup of another, similar server. That way most of the OS is already there on the backup server and the real backup consumes only a little space.

    The only problem we had with rsnapshot is that rsync by default can’t handle a lot of hard links. We’re now using our own build of rsync 3.2.3 with --max-alloc=0, and multiple millions of hard links are not a problem anymore.

    Regards, Simon

  • On 2021-03-31 14:41 Nicolas Kovacs wrote:

    Hi Nicolas, the simplest approach would be to use a filesystem which natively supports send/recv to another host.

    You may be tempted to use btrfs, but having tested it I strongly advise against it: it will fragment horribly and performance will be bad even with CoW disabled (which, by the way, is automatically re-enabled by snapshots).

    I currently just use ZFS on Linux and it works very well. However, using it on CentOS is not trouble-free, and it has its own CLI and specific issues to be aware of; so I understand if you don’t want to go down this rabbit hole.
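
    For readers who have not used send/recv, here is a rough sketch of one incremental replication step with ZFS. The dataset names, snapshot names and backup host are hypothetical, and it assumes the receiving side already holds the previous snapshot and has not been modified since.

```shell
#!/bin/sh
# One incremental ZFS replication step: snapshot the dataset, then send
# only the blocks changed since the previous snapshot to another host.
# Dataset, snapshot and host names are hypothetical.

# replicate DATASET PREV_SNAP NEW_SNAP REMOTE_HOST REMOTE_DATASET
replicate() {
    dataset=$1
    prev=$2
    new=$3
    remote=$4
    target=$5
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed.
        echo "zfs snapshot $dataset@$new"
        echo "zfs send -i $dataset@$prev $dataset@$new | ssh $remote zfs receive $target"
    else
        zfs snapshot "$dataset@$new" &&
            zfs send -i "$dataset@$prev" "$dataset@$new" |
            ssh "$remote" zfs receive "$target"
    fi
}
```

    For example, `replicate tank/vm/mail monday tuesday backuphost backup/vm/mail` would transfer only what changed between the two snapshots, regardless of how large the VM's disk is.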

    The next best thing I can suggest is to use lvmthin and XFS, with efficient block-level copies done to another host via tools such as bdsync [1] or blocksync [2] (of which I forked an advanced version). On the receiving host, you should (again) use lvmthin and XFS with periodic snapshots.

    Finally, I would leave the current rsnapshot backups in place: you will simply copy from a virtual machine rather than from a bare metal host. I have found rsnapshot really useful and reliable, so I suggest you continue using it even if efficient block-level backups are taken.

    Just my 2 cents. Regards.

  • On 31/03/2021 at 21:35, Gionatan Danti wrote:

    First of all, thanks to everybody for your competent input.

    Indeed, there’s (almost) nothing wrong with Rsnapshot. It even saved me on March 7th when my main production server crashed.

    The problem with using Rsnapshot on the VM’s filesystems rather than backing up the whole VM is the time it takes to restore all the mess.

    Niki



  • Hi Niki,

    I’m using an approach similar to Stephen’s, but with a twist.

    * Kickstart all machines from a couple of ISOs, depending on the requirements (the Kickstart process is controlled by Ansible)
    * Machines that have persistent data (about 50% on average) have at least two virtual disk devices: one for the OS (which gets overwritten by Kickstart when a machine is re-created), and another for persistent data (which Kickstart doesn’t touch)
    * Ansible sets up everything on the base server Kickstart provides, starting from basic OS hardening, authentication and ending with monitoring and backup of the data volume
    * Backup is done via Bareos to a redundant storage server

    That way I can reinitialise a VM at any time without having to care for the persistent data in most cases. If persistent data need to be restored as well, Bareos can handle that as soon as the machine has been set up via Ansible. OS files are never backed up at all.

    An improvement I’m planning to look into is moving from Kickstart to Terraform for the provisioning of the base machines. Currently it takes me about 10 minutes to recreate a broken VM provided the persistent data is left intact.

    Cheers,

    Peter.

  • Whenever I read such things I’m wondering, what about things like log files? Do you call them OS files or persistent data? How do you back’em up then?

    Regards, Simon

  • Hi Simon,

    I don’t.

    All relevant logging is centralised to a server cluster running Graylog.

    Regards,

    Peter.

  • … and, because I forgot to mention it: Yes, that server cluster has a “persistent data” device.

    Regards,

    Peter.

  • All the same, backing up the VM filesystem from within the VM is the best way to back them up using rsnapshot.

    rsnapshot’s approach of hard links and rsync necessarily means that whenever any byte of a file changes, the new copy in the backup set consumes the entire file size. If you’re backing up VM images, you’re giving up all of the efficiency that rsnapshot was designed for.

    I’d note that your original message said that you were transferring the entire VM image.  That *shouldn’t* be the case. rsync should be transferring only the changed bits over the network, but on disk you’ll have an entirely new file.

    There are a few ways you can work around that with rsnapshot, but I’m not aware of an easy solution.

    One option would be to use btrfs as your backup volume and write wrapper scripts for cmd_cp and cmd_rm.  Rather than the default behavior, you’d want to create a snapshot (for cmd_cp) and remove snapshots (for cmd_rm).

    The other option that comes to mind would be to use either XFS or btrfs as your backup volume and write a wrapper script for cmd_cp.  This would be simpler, the script would just be:

        #!/bin/sh
        exec cp --reflink=always "$@"

    If you pursued either option, you’d want to modify the rsnapshot rsync_long_args setting, and add --inplace.
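
    As a rough illustration, the two rsnapshot.conf settings involved could be generated as follows. The wrapper path is hypothetical, and the other rsync options are what I believe the stock sample configuration lists as defaults, so compare them with your own file before copying. rsnapshot requires a tab between a setting's name and its value, which is why printf is used.

```shell
#!/bin/sh
# Print the rsnapshot.conf lines for the reflink wrapper setup.
# Assumption: the rsync options other than --inplace mirror the
# defaults of the stock sample configuration; verify against yours.

# print_cow_settings WRAPPER_PATH
print_cow_settings() {
    # Name and value must be separated by a tab, not spaces.
    printf 'cmd_cp\t%s\n' "$1"
    printf 'rsync_long_args\t%s\n' \
        '--delete --numeric-ids --relative --delete-excluded --inplace'
}
```

    For example, `print_cow_settings /usr/local/bin/cp-reflink` prints two lines that can be reviewed and then merged into rsnapshot.conf by hand.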

    Those two approaches would take advantage of CoW filesystem capabilities to conserve disk space.  If you decide to pursue them, bear in mind that “du” will report that each of the resulting VM images is full size, even though that’s not really the case.  The only way (that I know of) to accurately measure disk use will be to run “df” before a backup and after, and compare the disk use of the filesystem.
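
    The before-and-after comparison with “df” can be scripted in a few lines. This is only a sketch: the function names are made up for the example, and the result is only meaningful if nothing else writes to the backup filesystem while the command runs.

```shell
#!/bin/sh
# Measure how much a command grows a filesystem by comparing "df"
# output before and after it runs. Function names are hypothetical.

# used_kb PATH: print the KiB used on the filesystem containing PATH.
used_kb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # the third column is the "Used" figure.
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# measure_growth PATH COMMAND [ARGS...]
# Runs COMMAND and prints the change in used space, in KiB.
measure_growth() {
    fs_path=$1
    shift
    before=$(used_kb "$fs_path")
    # Keep the command's own output away from our result.
    "$@" >&2 || return
    after=$(used_kb "$fs_path")
    echo $((after - before))
}
```

    For instance, `measure_growth /backup rsnapshot daily` (with /backup standing in for the backup volume) prints how many KiB that run really consumed, which is the figure “du” cannot give on reflinked or snapshotted files.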