Network Copy Performance Is Poor (rsync) – Debugging Suggestions?

Hi,

I have two CentOS 6.6 servers. With a “performance optimized” rsync I get a speed of 15–20 MB/s.

The options I use are:

rsync -aHAXxv --numeric-ids --progress -e "ssh -T -c arcfour -o Compression=no -x"

If I copy files via SMB to/from the servers I get 60–80 MB/s. A dd (read/write) on the attached storage gives 90 MB/s on the 1 Gbit iSCSI storage (source server) and up to 600 MB/s on the 10 Gbit iSCSI storage (destination server).

Both servers have plenty of memory and CPU usage looks low.

Currently we don’t use jumbo frames. Overall network usage is moderate to low. There are no special sysctl tweaks in use yet.

As mentioned, I’m confused that even with SMB I get 3 to 4 times better performance.

Any hint or suggestion to track this problem down is welcome!

Thanks and best regards, Götz

11 thoughts on - Network Copy Performance Is Poor (rsync) – Debugging Suggestions?

  • Not an expert in rsync/ssh, but I’m pretty sure it’s ssh’s TCP window size that is causing the slowness. SSH tries to leak as little information as possible to anyone eavesdropping. If speed is your main concern, look at HPN-SSH (http://psc.edu/index.php/hpn-ssh), or rsync to an NFS mount.
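
    A minimal sketch of the “rsync to an NFS mount” idea, assuming the destination exports /srv/www and a hypothetical mount point /mnt/dest (hostnames, paths, and the exact rsync options are placeholders to adapt):

    # on the source server: mount the destination's export, then rsync locally (no ssh in the data path)
    mount -t nfs destserver:/srv/www /mnt/dest
    rsync -aHx --numeric-ids --progress /srv/www/ /mnt/dest/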

  • I’m not certain what the problem could be, but enabling jumbo frames would be the first thing I would try, after turning off the firewall to test whether it is involved. Joining the “other advice” theme: we usually use bbftp and gridftp for massive data transfers. bbftp:

    http://doc.in2p3.fr/bbftp/

    gridftp is available with a Globus installation.
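
    As for the firewall test mentioned above, a quick and reversible way to rule it out on CentOS 6 is roughly this (re-enable the firewall right after the test):

    service iptables stop     # on both hosts, then rerun the rsync test
    service iptables start    # restore the firewall afterwards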

    Just my $0.02

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • That *is* pretty slow for sustained writes. Does the same rate hold true for individual large files as it does for lots of small ones? What filesystem are you using on each side?

    It’s worth noting that -X and -A are going to perform filesystem IO that you don’t see with SMB, because SMB isn’t going to preserve/set ACLs and extended attributes (IIRC). So, one possibility is that you’re seeing a difference in rate because you’re copying lots of small files and those extra filesystem operations are relatively slow.

    You might drop those two options and see how that affects the rate (see the sketch at the end of this reply). If you determine that those are the cause of the performance difference, you can turn them back on, understanding that there’s a cost associated with preserving that data.

    Define low. If you’re running top, press ‘1’ to expand the per-CPU lines; you’ll probably see one CPU with a higher “us” percentage, which is SSH encrypting the data. What percentage is that? Is there a large value in “sy” or “hi” on any CPU? Probably not, since you see good rates using ‘dd’ and smb copies, but I’ve seen systems where interrupt processing was a major bottleneck, so I make it a standard check.
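
    For example, the original command with -A and -X dropped and nothing else changed, so any change in rate comes from ACL/xattr handling alone (the source and destination paths here are placeholders):

    rsync -aHxv --numeric-ids --progress -e "ssh -T -c arcfour -o Compression=no -x" /srv/data/ destserver:/srv/data/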

  • We routinely have to sync 4TB, which is about 2M files…

    Rsync never does well for us; it just can’t fill the line at all.

    So, this may or may not work for you, but this was a huge problem for us, so we tried a whole Excel spreadsheet’s worth of combinations, every protocol imaginable, to make this happen.

    In the end, after a year of constant work on this, we found that if we map a network share from the source server to the destination server, use the CIFS protocol to “map a drive”, and then sync, say,

    /srv/www -> /mnt/shadow-www

    it ran at 99% of line rate, but ONLY if we used the cp command to sync the source and destination:

    cd /srv/www

    root@pas01# cp -R -u * /mnt/shadow-www/
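
    For reference, the “map a drive” step might look roughly like this (server name, share name, and username are made up; check the mount.cifs options for your own setup):

    mount -t cifs //dest-server/shadow-www /mnt/shadow-www -o username=syncuser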

    Something to consider if you find yourself not getting “line rate”.

    Our investigation showed that the rsync process, even with all the switches we found, has to “open” each file a bit before it copies it… so rsync sucks for this kind of stuff with 2 MILLION small files; it never gets going because it has to keep reading. There’s a switch that says don’t do that, but it never really helped :)

    Cheers


  • +1 on all above.

    Also, it’s likely that your SSH process is going to limit the transfer. Yet, if you remotely mount the share (CIFS/NFS) and run rsync on top of it, it may give you line rate, but it may also end up transferring data that wouldn’t have to be transferred (especially if you use rsync’s -c option). It will also send millions of syscalls over the network just to read mtimes, so if your sync was only going to transfer 5% of the total payload, it may take longer than doing it via ssh.

    If that’s the case (ssh limiting), I would simply consider splitting this process into several rsyncs. Spawn one for each subdirectory, for example (and maybe limit it to 4 or 8 simultaneous processes). That should scale well if your storage doesn’t complain. It’s an easy shell script to write (a sketch is below), and that’s essentially what the parallel rsync wrappers end up doing anyway.
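
    A rough sketch of that idea, assuming GNU xargs, subdirectory names without whitespace, and placeholder paths/hostname:

    # spawn up to 4 rsyncs, one per top-level subdirectory of /srv/www
    cd /srv/www
    ls -d */ | xargs -P 4 -I{} rsync -aHx --numeric-ids -e "ssh -T -o Compression=no" {} destserver:/srv/www/{}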

    Never used it myself, just found it when searching for “parallel rsync”: http://moo.nac.uci.edu/~hjm/parsync/ may be useful.

    Marcelo

  • Rsync is going to read the directory tree first, then walk it on both sides comparing timestamps (for incrementals) and block checksums. Pre-3.0 versions would read the entire directory tree before even starting anything else. So there is quite a bit of overhead, the point being to avoid using network bandwidth when the source and destination are already mostly identical. Splitting the work into some sensible directory tree structure might help a lot. Or, if you know the trees are mostly different, just tar it up and stream it.
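
    If the trees really are mostly different, a plain tar-over-ssh pipe avoids rsync’s per-file comparison entirely (a sketch; paths and hostname are placeholders):

    tar -cf - -C /srv/www . | ssh destserver 'tar -xf - -C /srv/www'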

  • Probably not? TCP window size is usually only a problem if the sender is capable of sending more data than the window size before the receiver can ACK. On low latency networks, it usually isn’t a problem.
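
    Rough numbers, assuming a ~0.5 ms LAN round-trip time (an assumption, not a measured value): 1 Gbit/s is about 125 MB/s, so the bandwidth-delay product is roughly 125 MB/s × 0.0005 s ≈ 62 KB, well within what the kernel’s autotuned TCP windows reach by default.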

  • Jumbo frames are an excellent option if your system is spending a lot of time processing interrupts, which is why I asked about the “hi” value in ‘top’. However, jumbo frames aren’t free: you have to set the same MTU on ALL of the hardware in a given broadcast domain (LAN). Because of that, jumbo frames are typically only deployed once interrupt processing has been demonstrated to be a cost that needs to be reduced.

    In this case, CIFS is performing reasonably well without jumbo frames, so jumbo frames probably won’t solve the problem.
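
    If you do end up testing jumbo frames, the per-host side is just an MTU change (eth0 is a placeholder; the switch ports and the peer must use the same value, and this change is not persistent across reboots):

    ip link set dev eth0 mtu 9000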

  • Note that rsync only opens files and performs block checksums if the source and destination modification times and sizes don’t match.

    You should not see most files opened during an incremental copy, unless one of the filesystems is MS-FAT, where modification time has a 2 second resolution, and it may not be possible to preserve a timestamp exactly during a copy.
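
    If one side really is FAT, rsync’s --modify-window option relaxes the timestamp comparison; a sketch, with placeholder paths and an otherwise ordinary option set:

    rsync -aHxv --numeric-ids --modify-window=1 --progress /srv/data/ destserver:/srv/data/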