Using Rsync To Backup / Restore – When To Use (or Not Use) The -H Option Switch?

Hello.

After some reading, including the rsync man page, I am still not clear on this:
When using rsync to back up and restore, when should one, and when should one *not*, preserve hard links (by using the -H option switch)?

A simple example case would be backups for one or just a few light-duty local workstations. Is there a simple, clear rule about that, or is it too complicated for that?

6 thoughts on - Using Rsync To Backup / Restore – When To Use (or Not Use) The -H Option Switch?

  • Use it unless the resulting backup run is too slow to be practical, which will only happen when the filesystem contains vast numbers of hard links. That is pretty rare (BackupPC’s archive is one example). In other words, if you don’t know whether you need it, you are better off retaining as many of the original filesystem attributes as possible in your backups. But keep in mind that it can only reproduce the hard links that exist within the portion of the filesystem covered by one run. If you do multiple runs covering different subdirectories, it can’t reproduce hard links that cross from one run’s subtree into another’s.

  • We have been using rsync for backups for years with no issues. We back up Oracle archive logs with rsync every 15 minutes.

  • /var/lib/yum/yumdb and, to a lesser extent, /usr/share/zoneinfo are two places that use hard links a lot. If you _don’t_ use “-H” you make multiple, independent copies of each file and have no way to restore the original hard link structure. If all you care about is not losing data, then it’s just a space issue. If the ability to restore the original hard link relationships is important, then using “-H” is a must, no matter the performance penalty.

  • It’s probably too site or application specific to give any general advice.

    Run this command across the filesystem you’re going to back up:
    find /path -type f -links +1

    All of the files listed in find’s output have multiple links, and will benefit from using -H.

    The cost associated with -H is that rsync has to keep a table in memory of all of the inodes and paths that it processes. A large filesystem can cause rsync to consume a lot of RAM. If sufficient RAM is available, I would always recommend -H.
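
As a rough way to gauge what skipping -H would cost in space, the find command above can be extended to total up the duplicated bytes. This is only a sketch: hardlink_report is a made-up helper name, and it assumes GNU find (for -printf) and awk are available. It counts only the links that fall inside the given tree, since those are the only ones a single rsync run could reproduce.

```shell
#!/bin/sh
# hardlink_report DIR (hypothetical helper, assumes GNU find):
# summarise regular files under DIR that have more than one link.
#   inodes      = distinct multiply-linked files found
#   paths       = path names under DIR that point at those files
#   extra_bytes = additional space used if every path became an
#                 independent copy (i.e. a backup made without -H)
hardlink_report() {
    # %D:%i identifies a file by device and inode number, %s is its size.
    find "$1" -xdev -type f -links +1 -printf '%D:%i %s\n' |
    awk '
        { paths[$1]++; size[$1] = $2 }
        END {
            for (id in paths) {
                inodes++
                total += paths[id]
                extra += (paths[id] - 1) * size[id]
            }
            printf "inodes=%d paths=%d extra_bytes=%d\n", inodes, total, extra
        }'
}

# Usage (placeholder path):
#   hardlink_report /var/lib/yum/yumdb
```

A file whose other links all live outside the tree shows up in paths but adds nothing to extra_bytes, which matches the earlier point that -H cannot help with links outside one run.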

  • I don’t know about the actual implementation, but wouldn’t it really only need to track the inodes/paths of the files with >1 link?

  • Okay, thanks guys. It seems that -H should be included by default, unless there is a specific reason not to.

    Maybe the rsync -a option switch should include hard links by default. Rsync tutorials usually list generic examples such as:

    sudo rsync -avz

    without addressing the subject of hard links.

    And you weren’t kidding about the number of entries in /var/lib/yum/yumdb. Wow!
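
To make the conclusion of the thread concrete, here is a minimal sketch of a backup run that adds -H to the usual -a, along with a quick way to check the result. The function names and paths are placeholders invented for this example, and link_count assumes GNU stat. Because -H only links files that are transferred together, the whole tree is copied in one run.

```shell
#!/bin/sh
# backup_tree SRC DEST (hypothetical helper): copy SRC into DEST in a
# single rsync run.
#   -a  archive mode; it does not imply -H
#   -H  preserve hard links among the files transferred in this run
backup_tree() {
    rsync -aH "$1"/ "$2"/
}

# link_count FILE (hypothetical helper, assumes GNU stat): print the
# number of hard links to FILE, to verify a backup or a restore.
link_count() {
    stat -c %h "$1"
}

# Usage (placeholder paths):
#   backup_tree /home /mnt/backup/home
#   link_count /mnt/backup/home/some/file
```

A file that had two names in the source should report a link count of 2 in the copy; with plain -a it would report 1 for each name.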