Need Help To Fix Bug In Rsync

Hi,

I’ve discovered a bug in rsync which leads to increased CPU usage and slower transfers in many situations.

When syncing with compression (-z), certain file types should not be compressed during the transfer because they are already compressed. The file types that are not to be compressed are listed in the --skip-compress section of the man page.

Unfortunately skipping the default file types doesn’t work and all transferred data is being compressed during the transfer. This is true for all versions since 3.1.0.

Steps to Reproduce:
1. run 'rm -f Z ; rsync -azv alpha:z.gz Z'

Actual results (the transferred data was compressed during the transfer):
sent 43 bytes  received 63,873 bytes  25,566.40 bytes/sec
total size is 628,952  speedup is 9.84

Expected results:
No compression should happen for the .gz file.

Additional info:
Note that the source file 'z.gz' was actually an ASCII text file, to show clearly that compression took place.
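
For anyone who wants to reproduce this, here is a rough sketch of the setup (the host 'alpha' is a placeholder; any highly compressible ASCII file with a .gz suffix will do):

  ssh alpha "yes 'some compressible line of text' | head -c 628952 > z.gz"
  rm -f Z
  rsync -azv alpha:z.gz Z

If the default --skip-compress list were honored, the 'received' count would be close to the full file size; with the bug present it drops far below the file size, as in the numbers above.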

Please see the following BZ entries for more info:
https://bugzilla.samba.org/show_bug.cgi?id323
https://bugzilla.redhat.com/show_bug.cgi?id16528

I tried to locate the bug in the code but failed. That said, I'm not a C
developer and therefore need some help from someone who understands C
better than I do. Any help is much appreciated!

Thanks and kind regards, Simon

13 thoughts on - Need Help To Fix Bug In Rsync

  • Tbh, using -z with rsync is almost always a bad idea (unless you’re on some prehistoric type of network link…).

    /Peter

  • Since you state that using -z is almost always a bad idea, could you provide the rationale for that? I must be missing something.

  • I think the “rationale” is that at some point the compression/decompression takes longer than the time saved by sending a compressed file. It depends on the relative speeds of the machines and the network.

    You have the most to gain from compressing large files, but if those are already compressed you gain nothing, and compressing only the small files buys you little.

    It obviously depends on your network speed and on whether you have a metered connection. But does anyone really still have such an ancient network connection these days? If you have machines at both ends fast enough to do rapid compression/decompression, it seems unlikely that you have a damp piece of string connecting them.

    P.

  • I can’t speak to that, but the obvious workaround is to use ssh’s compression instead of rsync’s:

    rsync -av -e 'ssh -C' remotehost:remote.file local.file

  • I really don’t understand the discussion here. What is wrong with using -z with rsync? We’re using rsync with -z for backups and just don’t want to waste bandwidth for nothing. We have better use for our bandwidth and it makes quite a difference when backing up terabytes of data.

    The only reason I asked for help is that we don’t want to double-compress data which is already compressed. This is what is currently broken in rsync unless you manually specify a skip-compress list. Fixing it would help all those who don’t know it’s broken.
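
    Until it’s fixed, a possible workaround (untested here, and assuming an explicitly given list is honored even where the built-in default is not) is to pass the suffix list by hand; the suffixes are separated by slashes, as described in the man page (paths below are placeholders):

      rsync -azv --skip-compress=gz/zip/jpg/jpeg/mp[34]/7z/bz2/tgz/png alpha:backup/ /backup/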

    Thanks, Simon

  • From what I understand that’s not really the same: having compression inside rsync makes rsync transfers even better than only compressing the transport.

    Thanks, Simon

  • I don’t really care if you use -z, but you asked for the rationale and I gave it to you. I’m not telling you what you should do.

    I’ll try to make it simpler: if rsync takes 1 second to compress the file, then 1 second to decompress it, and the whole transfer takes 11 seconds uncompressed vs 10 seconds compressed, then dealing with the file takes 12 seconds overall when compressed vs 11 seconds uncompressed. It’s not worth it.

    But as I said, it depends on your network and your machine speeds. It’s up to you to decide what is best in your own situation.
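
    If in doubt, it’s easy enough to measure on your own link with something like this (host and file names are placeholders):

      time rsync -a  remotehost:big.file /tmp/plain.file
      time rsync -az remotehost:big.file /tmp/compressed.file

    and then simply keep whichever variant wins.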

    P.

  • That’s why I asked: I wanted to know if there was something inherently bad with “-z”. I had a situation where PostgreSQL was replicating 16 MB files every few minutes (“log shipping”) on approximately 10 systems. It got behind, which resulted in almost continuous file transfer (of mostly-null 16 MB files) and saturated the common link. Specifying compression for the file transfer cut transmission time by 5-10x, resolving the problem.

  • I appreciate the reply; it keeps me from wondering “is there something I should be concerned about?”. We use a co-location facility where we pay for bandwidth utilization, so it’s still an issue.

  • On 25.03.20 at 19:15, Simon Matter via CentOS wrote:

    Until this is fixed, as a workaround I would do a two-pass transfer with filters via a “.rsync-filter” file: first rsync -azvF for everything with a high compression ratio, then rsync -av for everything, including the compressed data. The “.rsync-filter” file holds the exclude statements for the compressed formats. This only makes sense if the compression savings are higher than the metadata transfer of the second run …
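
    As a rough sketch of what I mean (suffix list abbreviated, hosts and paths are placeholders), the “.rsync-filter” file in the source tree would contain exclude rules like:

      - *.gz
      - *.zip
      - *.jpg
      - *.mp4

    and the two passes would then be:

      rsync -azvF source/ remotehost:dest/   # pass 1: compressible data only, with -z
      rsync -av   source/ remotehost:dest/   # pass 2: everything, without compression

    The second pass then only sends the already-compressed files that the filter excluded from the first pass.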

  • That may invalidate your testing.

    rsync may not depend upon the filename extension but may instead check the magic numbers within the file to determine whether or not to compress.

    Jon

    One could think so, but it’s not the case: rsync just uses the file suffix to decide which files to compress. It even lowercases them, leaving the good old UNIX .Z files behind… also a bug IMHO, but not as important.

    Simon

  • Hi,

    You are right, bandwidth always costs, always.

    Who pays for it can differ a lot, but there is always someone who pays.

    It can be an individual, a company, the taxpayer, whoever, but bandwidth is never free.

    Thanks, Simon