Need Help To Fix Bug In Rsync

Hi,

I’ve discovered a bug in rsync which leads to increased CPU usage and slower transfers in many situations.

When syncing with compression (-z), certain file types should not be compressed during the transfer because they are already compressed. The file types that are not to be compressed are listed in the --skip-compress section of the man page.

Unfortunately skipping the default file types doesn’t work and all transferred data is being compressed during the transfer. This is true for all versions since 3.1.0.

Steps to Reproduce:
1. run 'rm -f Z ; rsync -azv alpha:z.gz Z'

Actual results (the transferred data was compressed during the transfer):
sent 43 bytes  received 63,873 bytes  25,566.40 bytes/sec
total size is 628,952  speedup is 9.84

Expected results:
No compression should happen for the .gz file.

Additional info:
Note that the source file 'z.gz' was actually an ASCII text file, to show clearly that compression took place.
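
For anyone who wants to reproduce this, here is a rough sketch of the setup (the host 'alpha' is a placeholder; any highly compressible ASCII file with a .gz suffix will do):

  ssh alpha "yes 'some compressible line of text' | head -c 628952 > z.gz"
  rm -f Z
  rsync -azv alpha:z.gz Z

If the default --skip-compress list were honored, the 'received' count would be close to the full file size; with the bug present it drops far below the file size, as in the numbers above.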

Please see the following BZ entries for more info:
https://bugzilla.samba.org/show_bug.cgi?id323
https://bugzilla.redhat.com/show_bug.cgi?id16528

I tried to locate the bug in the code but failed. That said, I'm not a C
developer and therefore need some help from someone who understands C
better than I do. Any help is much appreciated!

Thanks and kind regards, Simon

13 thoughts on - Need Help To Fix Bug In Rsync

  • Tbh, using -z with rsync is almost always a bad idea (unless you’re on some prehistoric type of network link…).

    /Peter

  • Since you state that using -z is almost always a bad idea, could you provide the rationale for that? I must be missing something.

  • I think the “rationale” is that at some point the compression/decompression takes longer than the time saved by sending a compressed file. It depends on the relative speeds of the machines and the network.

    You have the most to gain from compressing large files, but if those are already compressed you gain nothing, and compressing only the small files buys you little.

    It obviously depends on your network speed and on whether you have a metered connection. But does anyone really still have such an ancient network connection these days? If you have machines at both ends fast enough to do rapid compression/decompression, it seems unlikely that you have a damp piece of string connecting them.

    P.

  • I can’t speak to that, but the obvious workaround is to use ssh’s compression instead of rsync’s:

    rsync -av -e 'ssh -C' remotehost:remote.file local.file

  • I really don’t understand the discussion here. What is wrong with using -z with rsync? We’re using rsync with -z for backups and just don’t want to waste bandwidth for nothing. We have better use for our bandwidth and it makes quite a difference when backing up terabytes of data.

    The only reason I asked for help is that we don’t want to double-compress data which is already compressed. This is what is currently broken in rsync unless you manually specify a skip-compress list. Fixing it would help all those who don’t know it’s broken.
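
    Until it’s fixed, a possible workaround (untested here, and assuming an explicitly given list is honored even where the built-in default is not) is to pass the suffix list by hand; the suffixes are separated by slashes, as described in the man page (paths below are placeholders):

      rsync -azv --skip-compress=gz/zip/jpg/jpeg/mp[34]/7z/bz2/tgz/png alpha:backup/ /backup/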

    Thanks, Simon

  • From what I understand that’s not really the same: having compression inside rsync makes rsync transfers even better than only compressing the transport.

    Thanks, Simon

  • I don’t really care if you use -z, but you asked for the rationale and I gave it to you. I’m not telling you what you should do.

    I’ll try to make it simpler: if rsync takes 1 second to compress the file, then 1 second to decompress it, and the whole transfer takes 11 seconds uncompressed vs 10 seconds compressed, then dealing with the file takes 12 seconds overall when compressed vs 11 seconds uncompressed. It’s not worth it.

    But as I said, it depends on your network and your machine speeds. It’s up to you to decide what is best in your own situation.
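
    If in doubt, it’s easy enough to measure on your own link with something like this (host and file names are placeholders):

      time rsync -a  remotehost:big.file /tmp/plain.file
      time rsync -az remotehost:big.file /tmp/compressed.file

    and then simply keep whichever variant wins.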

    P.

  • That’s why I asked: I wanted to know if there was something inherently bad with “-z”. I had a situation where PostgreSQL was replicating 16 MB files every few minutes (“log shipping”) on approximately 10 systems. It got behind, which resulted in almost continuous file transfer (of mostly-null 16 MB files) and saturated the common link. Specifying compression for the file transfer cut transmission time by 5-10x, resolving the problem.

  • I appreciate the reply; it keeps me from wondering “is there something I should be concerned about?”. We use a co-location facility where we pay for bandwidth utilization, so it’s still an issue.

  • On 25.03.20 at 19:15, Simon Matter via CentOS wrote:

    Until this is fixed, as a workaround I would do a two-pass transfer with filters via a “.rsync-filter” file: first rsync -azvF for everything with a high compression ratio, then rsync -av for everything, including the compressed data. The “.rsync-filter” file holds the exclude statements for the compressed formats. This only makes sense if the compression savings are higher than the metadata transfer of the second run …
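
    As a rough sketch of what I mean (suffix list abbreviated, hosts and paths are placeholders), the “.rsync-filter” file in the source tree would contain exclude rules like:

      - *.gz
      - *.zip
      - *.jpg
      - *.mp4

    and the two passes would then be:

      rsync -azvF source/ remotehost:dest/   # pass 1: compressible data only, with -z
      rsync -av   source/ remotehost:dest/   # pass 2: everything, without compression

    The second pass then only sends the already-compressed files that the filter excluded from the first pass.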

  • That may invalidate your testing.

    rsync may not depend upon the filename extension but may instead check the magic numbers within the file to determine whether or not to compress.

    Jon

    One could think so, but it’s not the case: rsync just uses the file suffix to decide which files to compress. It even lowercases them, leaving the good old UNIX .Z files behind… also a bug IMHO, but not as important.

    Simon

  • Hi,

    You are right, bandwidth always costs, always.

    Who pays for it can differ a lot, but there is always someone who pays.

    It can be an individual, a company, the taxpayer, whoever, but bandwidth is never free.

    Thanks, Simon