How Does Autofs Deal With Stuck NFS Mounts And Suspending To RAM?


Hi,

After trying SSHFS to mount a remote file system from a server, only to have SSHFS sooner or later get stuck and require a reboot of the client, I’m fed up with it and am looking for alternatives.

So next I would like to use NFS over a VPN connection instead. To minimize the instances of the NFS mount getting stuck, it might be helpful to use autofs.

What happens when the mount is stuck because the connection is down and autofs figures the idle timeout has expired and tries to unmount the remote file system?

What happens when I put the client to sleep by suspending to RAM? Will autofs automatically unmount first, or will the server have to deal with a client that has apparently gone away and might re-appear later in unexpected ways?

Is there a way to tell NFS to retry an operation _now_ after the connection went down and came back, rather than having to wait for a possibly rather long time?

Is there a better alternative for mounting remote file systems over unreliable connections?

7 thoughts on - How Does Autofs Deal With Stuck NFS Mounts And Suspending To RAM?

  • I don’t have a good answer for you, because if you’d asked me without all this backstory whether NFS or SSHFS is more tolerant of bad connections, I’d have told you SSHFS.

    NFS comes out of the “Unix lab” world, where all of the computers are hard-wired to nearby servers. It gets really annoyed when packet loss starts happening, and since it’s down in the kernel, that can mean the whole box locks up until NFS gets happy again.

    NFS is that way on purpose: it’s often used to provide critical file service (e.g. root-on-NFS) so if file I/O stops happening it *must* block and wait out the failure, else all I/O dependent on NFS starts failing.

    Some of this affects SSHFS as well. To some extent, the solution to the broader problem is “Dropbox” et al. That is, a solution that was designed around the idea that connectivity might not be constant.

    This is also why DVCSes like Git have become popular.

  • That’s what I thought. Should I make a bug report? SSHFS is clearly intended to reconnect automatically when mounted like that, and it doesn’t do that.

    It’s intended to do that, which is fine. SSHFS is intended to do that as well. Both are supposed to reconnect when the connection is back. So far, SSHFS has failed to do that to the extent that it is unusable. So far, NFS with autofs hasn’t caused issues, yet the testing continues. It’s also a lot faster, even though I used compression with SSHFS.

    Well, I need the file system accessible like a file system, not involving storing files somewhere else and downloading them somewhere else or somehow syncing some files manually between servers and clients once in a while. How am I supposed to work remotely when I don’t have access to the files involved?

    Are you sure that’s the reason?

  • Not so clearly. Look at the SSHFS reconnect option, and also the ssh/sshfs options ServerAliveInterval and ServerAliveCountMax.
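
The three options named in this reply can be passed together through sshfs's -o flag. A minimal sketch, assuming a hypothetical helper called sshfs_opts; the host, paths, and the values 15 and 3 are placeholders rather than settings anyone in the thread reported using:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): build the -o option string for sshfs.
# "reconnect" is an sshfs option; ServerAliveInterval and ServerAliveCountMax
# are ssh client options that sshfs hands through to ssh.
sshfs_opts() {
    interval=${1:-15}   # seconds between keep-alive probes (illustrative value)
    count=${2:-3}       # unanswered probes before ssh drops the connection
    printf 'reconnect,ServerAliveInterval=%s,ServerAliveCountMax=%s\n' \
        "$interval" "$count"
}

# Example invocation; user, host and both paths are placeholders:
#   sshfs -o "$(sshfs_opts 15 3)" user@server.example.com:/srv/data /mnt/data
```

With these placeholder values ssh would give up on a dead connection after roughly 45 seconds of silence, after which the reconnect option is meant to let sshfs re-establish the session. As the thread shows, that did not prevent freezes for the original poster.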

  • On the other hand, NFS is a fully-featured filesystem that supports fancy features like locking and a full ACL system. SSHFS is a FUSE filesystem that will break a lot of software if you try to use it for anything more complex than ‘ls’ and ‘cp’.

    For what it’s worth, Samba with SMBv3 and the POSIX extension[1] is a lot more tolerant of bad connections, and presents itself as a real filesystem under Linux.

    1. https://wiki.samba.org/index.php/SMB3-Linux


    Jonathan Billings

  • Nothing good, and bad things happen before this.

    This is the mechanism that I use to try to mitigate this on our systems:

    This triggers on suspend type events:

    # cat /etc/systemd/system/suspend.target.wants/offnet.service
    [Unit]
    Description=Unmount all NFS mounts before disconnecting from network
    Before=systemd-hibernate.service
    Before=systemd-shutdown.service
    Before=systemd-suspend.service

    [Service]
    ExecStart=/usr/local/sbin/offnet
    Type=oneshot

    [Install]
    WantedBy=hibernate.target
    WantedBy=shutdown.target
    WantedBy=suspend.target
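
The contents of /usr/local/sbin/offnet are not shown in the post. The following is only a sketch of what such a script might do, assuming it simply unmounts every NFS file system listed in /proc/mounts; the function names and the --run argument are made up for illustration:

```shell
#!/bin/sh
# Hypothetical stand-in for /usr/local/sbin/offnet; the real script is not shown.

# Print the mount point of every NFS file system in a mounts table.
# The table uses the /proc/mounts layout: device, mount point, type, options...
# Limitation: mount points containing spaces appear escaped (\040) and are
# not decoded here.
nfs_mount_points() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Unmount in reverse sorted order so nested mounts go before their parents.
# -f forces the unmount when the server is unreachable, -l detaches lazily.
unmount_nfs() {
    nfs_mount_points "$@" | sort -r | while read -r mp; do
        umount -f -l "$mp" || echo "offnet: could not unmount $mp" >&2
    done
}

# Only unmount when called as "offnet --run", so the file can be sourced safely.
case "${1:-}" in
    --run) unmount_nfs ;;
esac
```

Under this sketch the unit's ExecStart line would have to read /usr/local/sbin/offnet --run. A lazy unmount only detaches the mount point; processes that still hold files open on the share can keep blocking until the server is reachable again.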

  • Isn’t the reconnect option intended to re-establish the connection after it was interrupted, while the keep-alive options are supposed to detect when the connection is interrupted? I used them, and still SSHFS will just freeze and require a reboot, which makes it unusable.

    It’s easy to verify if the connection is back or not by logging in manually with ssh. So how isn’t this a bug with SSHFS in that it should resume rather than freeze up?

  • Ok, I won’t use it anymore then.

    I could use it as well. How does it deal with interrupted connections? I don’t want to lose data or otherwise break things when the connection is interrupted. I know that NFS is supposed to resume when the connection is back, but what does samba/cifs do?