Best Practices For Copying Lots Of Files Machine-to-machine


An entire filesystem (~180g) needs to be copied from one local Linux machine to another. Since both systems are on the same local subnet, there’s no need for encryption.

I’ve done this sort of thing a few times in the past in different ways, but wanted to get input from others on what’s worked best for them.

One consideration is that the source filesystem contains quite a few hardlinks and symlinks and of course I want to preserve these, and preserve all timestamps and ownerships and permissions as well.
Maintaining the integrity of this metadata and the integrity of the files themselves is of course the top priority.

Speed is also a consideration, but having done this before, I find it even more important to have a running progress report or log so I can see how the session is proceeding, approximately how much longer it will be until it finishes, and whether something has hung up.

One other consideration: There isn’t much disk space left on the source machine, so creating a tar file, even compressed, isn’t an option.

What relevant methods have you been impressed by?

10 thoughts on - Best Practices For Copying Lots Of Files Machine-to-machine

  • Hi,

    Rsync can maintain symlinks, hardlinks and give you a progress report as well; not to mention it can resume interruptions should they occur.

    Having said that, even with your space problem, it is possible to use tar to pack files on the fly during the transfer, which should be faster than rsync. Just search for “tar over ssh”.
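A minimal local sketch of the “tar over ssh” idea. The host and path names are placeholders; the ssh form is shown in the first comment, and the rest demonstrates the same tar pipe between two temporary directories so the hard-link and symlink preservation can be checked:

```shell
# Over the network the pipe would look roughly like:
#   tar -C /source -cf - . | ssh desthost 'tar -C /destination -xpf -'
# Local demonstration of the same pipe (temp dirs stand in for the two hosts):
set -e
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/dir"
echo hello > "$src/dir/file"
ln "$src/dir/file" "$src/dir/hardlink"   # hard link: tar stores it as a link, not a copy
ln -s file "$src/dir/symlink"            # symlink: preserved as a symlink
tar -C "$src" -cf - . | tar -C "$dst" -xpf -   # -p keeps permissions on extract
```

Since nothing is compressed or encrypted, on a trusted LAN this is close to the raw disk/network speed; note that unlike rsync, an interrupted tar pipe cannot be resumed.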

  • I use rsync for such work. It is good at maintaining hard and sym links and timestamps. It can give you a running progress as well.

    One thing I have learned is that crud happens and I lose my local session for some stupid reason or another, thus I often run rsync in a screen shell that I can easily reconnect to.

  • If shutting the machines down is feasible, I’d put the source hard drive into the destination machine and use rsync to copy it from one drive to the other (rather than using rsync to copy from one machine to the other over the network).

  • I’m not so sure about that. Probably the disk is the bottleneck, not the network. Assuming that one of the drives is a 7.2krpm drive, it will have a sequential read/write performance of somewhere around 120MB/s, although newer spinning drives do seem a tad faster than that. Running uncompressed rsync over SSH on a Gbit network I have seen more than 100MB/s; however, if your files are compressible you can see much better performance.

    If your ethernet network is 100Mbit/s or the source and destination drives are both SSDs then yea… what he said :)
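The back-of-envelope arithmetic behind those numbers, as a sketch: ~180G moved at the rates mentioned above (120MB/s disk-limited, 100MB/s over GigE, and roughly 12MB/s on a 100Mbit link), using integer seconds:

```shell
# Estimated transfer time for ~180G at each of the rates discussed above
size_mb=$((180 * 1024))          # ~180G expressed in MB
for rate in 120 100 12; do       # MB/s: disk-limited, GigE, 100Mbit ethernet
  secs=$((size_mb / rate))
  echo "${rate} MB/s: ${secs}s (~$((secs / 60)) min)"
done
```

So the disk-limited and GigE cases both land around half an hour, while a 100Mbit link stretches the same copy to over four hours, which is why the network only matters in the slow-ethernet case.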


  • Vanhorn, Mike wrote:

    Why? I just rsync’d 159G in less than one workday from one server to another. Admittedly, we allegedly have a 1G network, but….


  • you can parallelize rsync with xargs’s -P (max-procs) option (man xargs).

    rsync -a -f"+ */" -f"- *" source/ server:/destination/  # sync directory tree first
    cd source/; find . -type f | xargs -n1 -P0 -I% rsync -az % server:/destination/%  # -P0 lets xargs decide the number of procs


  • Well, I don’t recall ever having to rsync more than 100G (although I am doing multiple rsyncs of about 86G as we speak). I’ve never been able to do it with machines on their own, isolated switch, so my rsyncs are competing with everything else on the network, and it’s been a while since I’ve actually tried it multiple ways and measured it, but in my experience I’ve never seen the network outperform the system bus.

  • Vanhorn, Mike wrote:

    I wasn’t saying the network outperformed the system bus. Most of the time, though, I don’t have that as a possibility. Usually, all the drive bays are full and in use.

    When we get to terabytes, that’s at least overnight; but a few hundred gig I can do in a day, if I start early.