Rsync Question


I’m trying to rsync an 8TB data folder containing squillions of small files and it’s taking forever (i.e. weeks) to get anywhere. I’m assuming the slow bit is checksumming everything with a single CPU (even though it’s on a 12-core server ;-( )

Is it possible to do something simple like scp the whole dir in one go so they’re duplicates in the first instance, then get rsync to just keep them in sync without an initial transfer?

Or is there a better way?

Thanx,

25 thoughts on - Rsync Question

  • Use the rsync mode that goes by file timestamp and size. The block-checksumming algorithm is only useful on large files that get small random block changes.

  • I use tar and ttcp for an initial transfer:

    On the receiving end:

    ttcp -l5120 -r | tar xf -

    On the transmitter:

    tar cf - . | ttcp -l5120 -t name-of-receiver

    Note: The files are transmitted without encryption.

    I easily get 110 Mbytes/sec. on a gigabit connection.

    If you need encryption, and your transfer is CPU limited, you should investigate which cipher to use. In my case arcfour128 is the fastest, so I use:

    rsync --rsh='/usr/bin/ssh -c arcfour128' …

    after the initial transfer with ttcp.

    Mogens
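
The tar pipeline in the comment above can be tried out locally before involving the network. The following is a minimal sketch in which a plain shell pipe stands in for the ttcp link; the two temporary directories and the file names are assumptions made up for the demonstration, not part of the original setup.

```shell
# Local stand-in for the tar-over-a-pipe transfer: the pipe plays the role
# of the network link. Both directories are throwaway temp dirs (hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)

mkdir -p "$src/sub"
printf 'alpha\n' > "$src/a.txt"
printf 'beta\n' > "$src/sub/b.txt"

# Sender archives the tree to stdout; receiver unpacks stdin, keeping permissions.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)

# List what arrived, relative to the destination root.
copied=$(cd "$dst" && find . -type f | LC_ALL=C sort)
echo "$copied"
```

Because tar streams the whole tree in one pass, there is no per-file negotiation, which is why this tends to suit a first copy of many small files; a later rsync run then only has to deal with differences.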

  • As far as I can see timestamp and size is the default. I’ve turned off compression and I think I’m getting better throughput. Running 4 rsync tasks and getting sustained transfers for several hours of just over 800Mb/sec :- )

    –Russell
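
The comment above does not say how the four rsync tasks were split up. One possible approach, sketched here purely as an assumption, is to start one rsync per top-level directory and let xargs cap the number running at once. The source tree, the directory names, and the destination `backuphost:/data` are hypothetical, and the sketch only prints the commands instead of running them.

```shell
# Sketch: one rsync job per top-level directory, at most four at a time.
# SRC, DEST and the directory names are hypothetical. RSYNC is set to
# "echo rsync" so this only prints the commands; set RSYNC=rsync to run them.
SRC=$(mktemp -d)
DEST="backuphost:/data"
RSYNC="echo rsync"

mkdir "$SRC/projA" "$SRC/projB" "$SRC/projC"

# printf emits one "name/" per line; xargs -P 4 keeps up to four jobs running.
# Names containing quotes or newlines would need more careful handling.
planned=$(
  cd "$SRC" &&
  printf '%s\n' */ |
  xargs -P 4 -I{} $RSYNC -a "$SRC/{}" "$DEST/{}" |
  LC_ALL=C sort
)
echo "$planned"
```

Files that sit directly in the top level of the source are not covered by this split and would need one extra pass.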


  • I am trying to rsync the named files under /etc for backup purposes. I tried:

    rsync -ah --stats --delete -e "ssh -p613 -l root" 192.168.192.2:/etc/name* /home/rgm/data/htt/httnet/homebase/new/etc

    The stats shows it sees all the files, but only moves the dir /etc/named and the files within it.

    It does not move the /etc/name* files (like /etc/named.conf).

    By file count, it is ‘seeing’ all the files, but not moving them.

  • Hi Robert,

    First, a trailing slash specified at the end of the source directory means ‘copy everything underneath the specified directory without copying the directory itself.’ Omitting the trailing slash will cause rsync to first create the directory at the target and then copy the specified contents underneath it. Your invocation ‘/etc/name*’ probably needs to be split into successive command strings, one specifying the directory to back up and the other(s) specifying the file(s) under /etc that you want to back up as well.

    Also:

    Do you really mean ‘-h’ human-readable vs. ‘-H’ preserve hard links?

    Why ‘-e’ (specify remote shell to use)? Are these systems running disparate operating systems?

    I use ‘-v’ so rsync echoes what it’s doing in real time to the terminal as opposed to ‘--stats’, but that’s just my personal preference. This allows me to monitor what’s going on in real time and to scroll up afterward to review discrete actions after the fact. There is also the ‘--log-file’ logging capability for those situations where the actions taken might exceed the number of lines buffered by the terminal.

    Since ‘--delete’ implies that you will be synchronizing the source and backup directories in future, you might consider setting up public key authentication between the two systems.

    Since this is a backup, you really should consider preserving ACLs and extended attributes (-A -X), too.

    Given all of the above, with public key authentication set up, you could then invoke the following command string from the parent directory of the backup (/home/rgm/data/htt/httnet/homebase/new/etc):

    rsync -avAX --delete root@192.168.192.2:/etc/named/ named
    rsync -avAX --delete root@192.168.192.2:/etc/named.conf named.conf
    … and so on

    hth & regards,

    Carl
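
The trailing-slash rule described in the comment above is easy to check on a scratch directory. This is a small local sketch; the temporary paths and the file `example.zone` are made up for the demonstration, and the rsync calls are skipped if rsync is not installed.

```shell
# Local demo of rsync's trailing-slash rule; all paths are throwaway temp dirs.
work=$(mktemp -d)
mkdir -p "$work/src/named" "$work/with_slash" "$work/without_slash"
printf 'zone data\n' > "$work/src/named/example.zone"

if command -v rsync >/dev/null 2>&1; then
  # Trailing slash: copy the contents of named/ straight into the target.
  rsync -a "$work/src/named/" "$work/with_slash"
  # No trailing slash: create named/ inside the target, then copy into it.
  rsync -a "$work/src/named" "$work/without_slash"
  # Show where the file ended up in each case.
  (cd "$work" && find with_slash without_slash -type f | LC_ALL=C sort)
fi
```

In the first case the file lands directly in `with_slash/`; in the second it lands one level down, in `without_slash/named/`.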

  • Yes.

    Somewhere I read that is what you need to run this over SSH. Otherwise you need to have rsyncd running on the remote system.

    This is not an automated system. It is typically a onetime thing to get a backup of what I did to set up a server (or the other way around). I have this aversion to leaving my public key all over the place.

    Maybe, but then I can’t edit it on my system if it is root:named!

    In /etc there are 4 named.* files. Do I have to do each separately?

  • 8< - - - - - trimmed - - - - - >8

    Okay, good to know you’re cognizant of the difference. ;-)

    My understanding is that rsync defaults to SSH these days.

    8< - - - - - trimmed - - - - - >8

    Then ‘--delete’ is superfluous. It really only comes into play on subsequent synchronizations when files that exist in the target need to be deleted because they no longer exist at the source.

    Your public key is only useful when it is paired with the corresponding private key. And, since only you have access to the private key and it resides only on systems where you’ve installed it, I think your aversion is not well thought out.

    Then you’re not creating a true ‘backup’ as I understand the term. :-)
    In any case, you can still edit these files as root. They’ll retain the original attributes when you save them.

    8< - - - - - trimmed - - - - - >8

    Two lines, assuming first pass without public key authentication, executed from the parent directory (/home/rgm/data/htt/httnet/homebase/new/etc) of the backup that you’re creating:

    rsync -avAX root@192.168.192.2:/etc/named/ named
    --> Since the default protocol is SSH, you will be prompted for the remote system’s root password.

    rsync -avAX root@192.168.192.2:/etc/named.* .
    --> This will pick up the four named.* files and land them in the local pwd due to the ‘dot’ passed to rsync as the target.

    If you need to ‘freshen up’ a local snapshot, reinsert the ‘--delete’ flag.

    hth & regards,

    Carl

  • I tried your rsync command and it worked on my LAN over ssh. The following was placed in the destination directory:

    drwxr-x--- 2 root smmsp 4.0K Jul 28 21:05 named/
    -rw-r----- 1 root smmsp 1.6K Oct 30 2013 named.conf
    -rw-r--r-- 1 root smmsp 2.4K Jul 28 21:05 named.iscdlv.key
    -rw-r----- 1 root smmsp 931 Jun 21 2007 named.rfc1912.zones
    -rw-r--r-- 1 root smmsp 487 Jul 19 2010 named.root.key

  • What?! Straight from the documentation:

    “-e, --rsh=COMMAND specify the remote shell to use”

    I didn’t explicitly state that it was “wrong,” just implied (correctly)
    that it was unnecessary.

    regards,

    Carl

  • I ran it again but with:

    rsync -ah --stats --delete -e "ssh -p613 -l root" 192.168.192.2:/etc/name* /home/rgm/data/htt/httnet/homebase/new/newetc

    And the newetc directory was created with all the files. I again ran:

    rsync -ah --stats --delete -e "ssh -p613 -l root" 192.168.192.2:/etc/name* /home/rgm/data/htt/httnet/homebase/new/etc

    And none of the /etc/name.* files get moved. Strange.

    Well I got my backup.

  • Mark, I would prefer it if you would please send your replies to the list and not to me personally. I *do* get them if you send them to the list.

    All of these fine-grained points you’ve made regarding options that are “required for (good) automation” are irrelevant. Robert’s post concerned invoking rsync manually “for backup purposes”, not for “automation” as you’re envisioning. He wrote “This is not an automated system. It is typically a onetime thing …”

    Moreover, my “Why ‘-e’?” query was paired with a second very specific question: “Are these systems running disparate operating systems?” The implication seems pretty clear to me: not that ‘-e’ was somehow “wrong” but simply likely unnecessary in his scenario. I stand by that somewhat informal (less strict than you) evaluation unless and until we learn that he’s actually operating on truly disparate systems.

    regards,

    Carl

  • I just tried the following:

    rsync -ah --stats "ssh -p613 -l root" 192.168.192.2:/root/samba.PDC/ /home/rgm/data/htt/httnet/homebase/new/root/

    And it failed with:

    Unexpected remote arg: 192.168.192.2:/root/samba.PDC/
    rsync error: syntax or usage error (code 1) at main.c(1330) [sender=3.1.1]

    I tried again with:

    rsync -ah --stats -e "ssh -p613 -l root" 192.168.192.2:/root/samba.PDC/ /home/rgm/data/htt/httnet/homebase/new/root/

    and it worked. This is what I read from the manpage, that “-e” is needed for the SSH command.

  • From ‘man rsync’:

    -p, --perms This option causes the receiving rsync to set the destination permissions to be the same as the source permissions.

    Try this:

    rsync -ah --stats --delete root@192.168.129.2:613:/etc/dhcp/ /home/rgm/data/htt/httnet/homebase/new/dhcp

  • SSH is not parsing the port the way http does, it seems:

    $ rsync -ah --stats root@192.168.129.2:613:/etc/dhcp/ /home/rgm/data/htt/httnet/homebase/new/dhcp
    ssh: connect to host 192.168.129.2 port 22: No route to host
    rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
    rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]

    The reason why I change my SSH port is a simple way to keep port knocker robots away. Different hosts use different ports…
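
As the error above shows, rsync's host:path syntax has no slot for a port, so the port has to reach ssh some other way. Besides the -e "ssh -p613 -l root" form that worked earlier in the thread, the port can live in an ssh configuration entry. The sketch below is only an illustration: the host alias `c7box` and the temporary config file are assumptions (normally the entry would go into ~/.ssh/config), and the rsync command is printed, not executed.

```shell
# Sketch: hand ssh the non-default port through a config file instead of the
# rsync command line. The alias "c7box" and the temp file are hypothetical.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host c7box
    HostName 192.168.192.2
    Port 613
    User root
EOF

# Printed rather than executed, since running it needs the real remote host.
# ssh -F points ssh at the config file written above.
cmd="rsync -ah --stats -e 'ssh -F $cfg' c7box:/etc/dhcp/ ./dhcp"
echo "$cmd"
```

With the entry in ~/.ssh/config, the -e option could be dropped and the source written simply as c7box:/etc/dhcp/, which also keeps a different port per host manageable.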

  • It’s fairly common on technical lists like this for people to become fixated on minor problems or inefficiencies in a command or configuration when the actual issue is more complicated. Try not to let it bother you too much.

    As for your original question, I’m not sure why the files weren’t copied as expected. I ran your exact command with only the server names, port, and destination directory changed and all the files were copied. You can try running the command without a destination and it should return a list of files found in the source directories. If it doesn’t list everything, there is some problem with how the source files are being specified or something preventing them from being read. SELinux is always suspected in cases of strange permission problems.

  • There is really something going on, perhaps with SELinux. The files DID get copied, but I cannot get any tool to list them on my F22 target! I copy back to someplace else on the C7 system, and there they all are. If I copy into a nonexistent folder on my F22 notebook, they all show in the new directory. Freaky. Same thing when I backed up my /etc/dhcpd/ directory. Only dhcpd.conf shows in the file listing. The others came across, but don’t show.
