Rsync Over SSH Stalls After Completing The Job


Here’s a weird one.

I have two CentOS 8 machines that use rsync-over-ssh to back up files between each other. (Each machine acts as a backup machine for the other one.)

Nightly cronjobs do the backing up, and the commands look like this:

rsync -av --delete /home/mydirectory jeff:/home/mydirectorybackup

That command works fine when it’s run through the cronjob.

The problem starts when I run an rsync command between mutt and jeff from the command line. It worked a few days ago. Now, when I log into jeff and rsync to or from mutt, it works fine. When I log into mutt and rsync to or from jeff, it does the job, but then it seems to stall afterward and I have to hit ctrl-c to get my cursor back.

Here’s a test run so you can see what happens.

[me@mutt temp]$ rsync -avv ../temp/ jeff:temp
opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
sending incremental file list
delta-transmission enabled
bookmarks.html is uptodate
./
abc
total: matches=0 hash_hits=0 false_alarms=0 data=26321
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [sender=3.1.3]

A file named bookmarks.html existed in both directories so it wasn’t changed, and a new file abc was copied to the backup directory. Then my ctrl-c stopped the job and brought the cursor back after it stalled.

scp works fine to copy files either way when logged into either machine and, again, my backup-this-to-that cronjobs that run rsync seem to be working fine as well. I just discovered this last night when I went to rsync some files manually between these machines. The last time I did that was at least a few days ago and it worked fine then.

12 thoughts on - Rsync Over SSH Stalls After Completing The Job

  • Does it behave any differently when adding a & at the end of the command when running it manually, or running in a screen session?

    Chris

  • Nope. I get the same stall both ways. Running it in a screen session looks exactly like what I posted earlier, and the & at the end of the command looks like this:

    [me@mutt temp]$ rsync -avv ../temp/ jeff:temp &
    [1] 15694
    opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
    [me@mutt temp]$ sending incremental file list
    delta-transmission enabled
    bookmarks.html is uptodate
    abc
    total: matches=38 hash_hits=38 false_alarms=0 data=0
    ^C

    The difference here is that I don’t see the “rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(644) [sender=3.1.3]” after the ctrl-c when I use the &. That line shows up only without the &.

  • Using that command, it says “delta-transmission disabled for local transfer or --whole-file”, but it still stalls at the end so there’s no change.

    Since it works fine transferring files over nfs, it has to be something about the interaction with ssh. (The fact that it also works fine with SSH as a cronjob adds to the weirdness, of course.)

    Reading up a bit on what can cause SSH to appear to stall, I just un-commented “useDNS no” in sshd_config on both machines and restarted sshd. I think that’s the default setting anyway, but I made the change and that didn’t do anything for me either.

    I notice, though, that if I SSH from jeff to mutt and type exit, it logs me out as soon as I hit enter. If I SSH from mutt to jeff and type exit, it logs me out after a delay of about one second. I never really noticed if that delay was there before or not. I wonder if it’s somehow related.

  • Is there any chance that your shell is configured to emit anything to stderr or stdout when you logout of jeff? It’s fairly rare, but I’ve seen logout messages mess up rsync before.

  • I don’t think so. The only change from the default .bashrc on both machines is the addition of “unset command_not_found_handle”, and in .bash_profile on mutt I have xmodmap -e “keycode 135 = 0x0000”, which hasn’t changed since 2016 according to the timestamp on the file. Other than that, the .bash* files are just the defaults. sshd_config has nothing other than the default settings for logout messages, and in any event none of these things have changed since last week and it did work then.

    I wonder if it’s something to do with the last CentOS 8 update. There was a fair amount of stuff updated including the kernel. I just rebooted both machines with the previous kernel and nothing changed, so that doesn’t appear to be it either.

  • You could try running strace on the hanging process to see what it’s doing.

    Regards, Simon

  • [frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
    opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
    sending incremental file list
    delta-transmission enabled
    abc is uptodate
    total: matches=0 hash_hits=0 false_alarms=0 data=0

    Leaving that sit there apparently doing nothing (but still not giving me my cursor back) I switched to another terminal window and did the following:

    [frankcox@mutt ~]$ ps -FA | grep rsync
    frankcox  5400  2435  0 60586  3160   5 14:52 pts/0    00:00:00 rsync -avv ../temp/ jeff:temp
    frankcox  5401  5400  0 67980  7440   1 14:52 pts/0    00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
    frankcox  5526  5416  0 55476  1076   3 14:53 pts/1    00:00:00 grep --color=auto rsync

    [frankcox@mutt ~]$ strace -p 5401
    strace: Process 5401 attached
    select(11, [5 9 10], [], NULL, NULL

    Then it just sits there with no further action. I get my cursor back when I hit ctrl-c.

    [frankcox@mutt ~]$ strace -p 5400
    strace: Process 5400 attached
    restart_syscall(<... resuming interrupted nanosleep ...>) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
    nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
    wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0

    The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.

    Unfortunately, I have no idea what any of the above actually means. Does it tell us anything interesting?
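
For what it's worth, the two traces can be read from the system calls alone. Process 5400 (rsync) keeps calling wait4() with WNOHANG in a short sleep loop and keeps getting 0, which means its child 5401 (ssh) has not exited. Process 5401 is blocked in select() with no timeout, waiting for one of file descriptors 5, 9 or 10 to become readable. So rsync is done and is waiting for ssh, and ssh is waiting on something that never arrives. A minimal sketch for seeing what those descriptors are, assuming a Linux /proc filesystem; show_fds is a hypothetical helper name, not part of rsync or ssh:

```shell
# Hypothetical helper: list what each open file descriptor of a process
# points to. Linux-only, because it reads /proc/<pid>/fd.
show_fds() {
    pid=$1
    ls -l "/proc/$pid/fd"
}
```

Run it as "show_fds 5401" while the transfer is hung, using whatever PID ps reports for the ssh process. Sockets show up as socket:[inode] and pipes as pipe:[inode].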

  • jeff rsync --server -vvlogDtpre.iLsfxC . temp

    Yay!  I am glad someone else on the planet is experiencing this. I noticed this started happening to me after updating some CentOS Linux 8 systems today.

    I discovered if I set ForwardX11=no (either on SSH command line or in ~/.ssh/config) the hang does not happen.  But why does that matter?  No updates to openssh.

    It is not the systemd update doing something silly with session management. I painfully downgraded manually and rebooted to no effect.

    As an aside, why can’t we have nice things in life, like 'dnf downgrade systemd\*' actually working? I did the below. It might be dumb, but it worked. Alternate suggestions for downgrading are appreciated; searching the list didn’t help and my google-fu was off the mark today.

      cd [path-to-repo]/CentOS/8/BaseOS/x86_64/os/Packages
      dnf downgrade $(rpm -qa systemd\* | grep 239-41.el8_3.2 | sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/')

    Chris
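
The ForwardX11=no workaround can also be applied to a single transfer without editing ~/.ssh/config, because rsync's -e option sets the remote shell command. A minimal sketch; rsync_nox11 and rsync_nodisplay are hypothetical helper names. The second helper is only an experiment: ssh requests X11 forwarding only when DISPLAY is set in its environment, so if the hang disappears without DISPLAY, that points at X11 forwarding. Cron jobs normally run without DISPLAY, which may be why the nightly runs are unaffected.

```shell
# Hypothetical wrapper: run rsync with X11 forwarding switched off for the
# underlying ssh connection, leaving ~/.ssh/config untouched.
rsync_nox11() {
    rsync -e "ssh -o ForwardX11=no" "$@"
}

# Hypothetical experiment: same transfer, but with DISPLAY removed from the
# environment. The subshell keeps the calling shell's DISPLAY intact.
rsync_nodisplay() {
    ( unset DISPLAY; rsync "$@" )
}
```

Usage would look like "rsync_nox11 -avv ../temp/ jeff:temp", with the same arguments as the plain rsync command.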

  • Far out! That’s the solution!

    Well, not really a solution but it’s certainly a work-around that makes rsync over SSH work without hanging.

    I discovered that you have to put the “ForwardX11 no” into the default part of .ssh/config.

    It doesn’t work if you just specify the host involved.

    This doesn’t work:
    Host *
    ForwardX11 yes
    Host jeff
    ForwardX11 no

    This does work:
    Host *
    ForwardX11 no

    Someone should probably file a bug report about this, but I have no idea what package it pertains to. As you said, openssh wasn’t updated at the point that this broke.

    If it’s of any value, here’s a list of what was updated last on this computer. The ones that look most suspicious to me would be kernel, crypto-policies and/or systemd.

    Packages Altered:
    Install kernel-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Install kernel-core-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Install kernel-debug-devel-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Install kernel-devel-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Install kernel-modules-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgrade dbus-x11-1:1.12.8-12.el8_3.x86_64 @appstream
    Upgraded dbus-x11-1:1.12.8-11.el8.x86_64 @@System
    Upgrade flatpak-1.6.2-6.el8_3.x86_64 @appstream
    Upgraded flatpak-1.6.2-5.el8_3.x86_64 @@System
    Upgrade flatpak-libs-1.6.2-6.el8_3.x86_64 @appstream
    Upgraded flatpak-libs-1.6.2-5.el8_3.x86_64 @@System
    Upgrade flatpak-selinux-1.6.2-6.el8_3.noarch @appstream
    Upgraded flatpak-selinux-1.6.2-5.el8_3.noarch @@System
    Upgrade flatpak-session-helper-1.6.2-6.el8_3.x86_64 @appstream
    Upgraded flatpak-session-helper-1.6.2-5.el8_3.x86_64 @@System
    Upgrade java-1.8.0-openjdk-1:1.8.0.282.b08-2.el8_3.x86_64 @appstream
    Upgraded java-1.8.0-openjdk-1:1.8.0.275.b01-1.el8_3.x86_64 @@System
    Upgrade java-1.8.0-openjdk-devel-1:1.8.0.282.b08-2.el8_3.x86_64 @appstream
    Upgraded java-1.8.0-openjdk-devel-1:1.8.0.275.b01-1.el8_3.x86_64 @@System
    Upgrade java-1.8.0-openjdk-headless-1:1.8.0.282.b08-2.el8_3.x86_64 @appstream
    Upgraded java-1.8.0-openjdk-headless-1:1.8.0.275.b01-1.el8_3.x86_64 @@System
    Upgrade bpftool-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgraded bpftool-4.18.0-240.15.1.el8_3.x86_64 @@System
    Upgrade crypto-policies-20210209-1.gitbfb6bed.el8_3.noarch @baseos
    Upgraded crypto-policies-20200713-1.git51d1222.el8.noarch @@System
    Upgrade crypto-policies-scripts-20210209-1.gitbfb6bed.el8_3.noarch @baseos
    Upgraded crypto-policies-scripts-20200713-1.git51d1222.el8.noarch @@System
    Upgrade dbus-1:1.12.8-12.el8_3.x86_64 @baseos
    Upgraded dbus-1:1.12.8-11.el8.x86_64 @@System
    Upgrade dbus-common-1:1.12.8-12.el8_3.noarch @baseos
    Upgraded dbus-common-1:1.12.8-11.el8.noarch @@System
    Upgrade dbus-daemon-1:1.12.8-12.el8_3.x86_64 @baseos
    Upgraded dbus-daemon-1:1.12.8-11.el8.x86_64 @@System
    Upgrade dbus-libs-1:1.12.8-12.el8_3.x86_64 @baseos
    Upgraded dbus-libs-1:1.12.8-11.el8.x86_64 @@System
    Upgrade dbus-tools-1:1.12.8-12.el8_3.x86_64 @baseos
    Upgraded dbus-tools-1:1.12.8-11.el8.x86_64 @@System
    Upgrade file-5.33-16.el8_3.1.x86_64 @baseos
    Upgraded file-5.33-16.el8.x86_64 @@System
    Upgrade file-libs-5.33-16.el8_3.1.x86_64 @baseos
    Upgraded file-libs-5.33-16.el8.x86_64 @@System
    Upgrade kernel-headers-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgraded kernel-headers-4.18.0-240.15.1.el8_3.x86_64 @@System
    Upgrade kernel-tools-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgraded kernel-tools-4.18.0-240.15.1.el8_3.x86_64 @@System
    Upgrade kernel-tools-libs-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgraded kernel-tools-libs-4.18.0-240.15.1.el8_3.x86_64 @@System
    Upgrade python3-perf-4.18.0-240.22.1.el8_3.x86_64 @baseos
    Upgraded python3-perf-4.18.0-240.15.1.el8_3.x86_64 @@System
    Upgrade systemd-239-41.el8_3.2.x86_64 @baseos
    Upgraded systemd-239-41.el8_3.1.x86_64 @@System
    Upgrade systemd-libs-239-41.el8_3.2.x86_64 @baseos
    Upgraded systemd-libs-239-41.el8_3.1.x86_64 @@System
    Upgrade systemd-pam-239-41.el8_3.2.x86_64 @baseos
    Upgraded systemd-pam-239-41.el8_3.1.x86_64 @@System
    Upgrade systemd-udev-239-41.el8_3.2.x86_64 @baseos
    Upgraded systemd-udev-239-41.el8_3.1.x86_64 @@System
    Upgrade zlib-1.2.11-16.2.el8_3.x86_64 @baseos
    Upgraded zlib-1.2.11-16.el8_2.x86_64 @@System
    Upgrade zlib-devel-1.2.11-16.2.el8_3.x86_64 @baseos
    Upgraded zlib-devel-1.2.11-16.el8_2.x86_64 @@System
    Removed kernel-4.18.0-240.1.1.el8_3.x86_64 @@System
    Removed kernel-core-4.18.0-240.1.1.el8_3.x86_64 @@System
    Removed kernel-debug-devel-4.18.0-240.1.1.el8_3.x86_64 @@System
    Removed kernel-devel-4.18.0-240.1.1.el8_3.x86_64 @@System
    Removed kernel-modules-4.18.0-240.1.1.el8_3.x86_64 @@System

  • I think that’s right. My SSH config has what amounts to four sections:

    1. Directives that should not be overridden, ever
    2. Host-specific directives
    3. Network-specific directives
    4. Fall-through defaults

    For example:

    # ===== %< =====
    # don't override
    StrictHostKeyChecking ask

    # host settings
    Host dev.my.net prod.my.net
      ForwardAgent yes
      ForwardX11 yes
      ForwardX11Trusted yes

    # network settings
    Host *.my.net
      Compression yes
      IdentityFile ~/.ssh/id_ed25519

    # defaults
    Host *
      Compression no
      ForwardAgent no
      ForwardX11 no
      ForwardX11Trusted no
      Protocol 2
    # ===== %< =====
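
One likely reason the host-specific override earlier in the thread had no effect: ssh reads its config from top to bottom and, for each option, keeps the first value it obtains. A "Host *" block with "ForwardX11 yes" placed before "Host jeff" therefore wins, which is why the defaults go last in the layout above. Running ssh with -G prints the configuration it would use for a host without connecting. A minimal sketch for checking this, where effective_x11 is a hypothetical helper name:

```shell
# Hypothetical helper: print the ForwardX11 value ssh would actually use
# for a host, given a config file. ssh -G only evaluates the config and
# prints the result; it does not open a connection.
effective_x11() {
    cfg=$1
    host=$2
    # The trailing space keeps forwardx11timeout and forwardx11trusted out.
    ssh -G -F "$cfg" "$host" | grep -i '^forwardx11 '
}
```

For example, "effective_x11 ~/.ssh/config jeff" shows which setting applies to jeff after all Host blocks have been evaluated.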