NFS Mount On CentOS 7 Crashing


Hello,

We have a VM (under KVM – a VPS service by our ISP) running CentOS 7.

On it we have 2 NFS mounts, one for backup and one as a live file system
(where there are two user homes as well):

———————————————————————————————————————

12 thoughts on - NFS Mount On CentOS 7 Crashing

  • On 02/06/2017 at 08:41, Nikolaos Milas wrote:

    I have had the same problem since the last rpcbind package update
    (rpcbind-0.2.0-38.el7_3).

    Reverting to rpcbind-0.2.0-38.el7 solves the problem for me.


    Philippe BOURDEU d’AGUERRE
    AIME – Campus de l’INSA http://aime-toulouse.fr/
    135 av. de Rangueil Tél +33 561 559 885
    31077 TOULOUSE Cedex 4 – FRANCE Fax +33 561 559 870

  • Thank you very much Philippe,

    I notice that I have upgraded to rpcbind-0.2.0-38.el7_3.x86_64 on May 26.

    Have you checked if this bug/behavior has been reported or should we file a bug report?

    Nick

  • I have been working fine with CentOS 7.3, since I downgraded to rpcbind-0.2.0-38.el7.x86_64.

    Today, I decided to upgrade to 7.4 (which, among several hundred updates, includes rpcbind-0.2.0-42.el7.x86_64); after that I started having similar NFS issues again: NFS communication hangs. In
    /var/log/messages:

    —————————————————————————————–
    …
    Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
    Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
    Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
    Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
    Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
    Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
    Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
    Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
    …
    Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
    Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
    Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
    Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
    Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
    Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
    Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
    Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
    Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
    Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
    …
    Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
    …
    Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying

    —————————————————————————————–

    I tried downgrading to rpcbind-0.2.0-38.el7.x86_64 but this time it didn’t help.

    I mount either directly:

      mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2

    or through /etc/fstab:

      10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0

    The box may even hang during reboot, which has never happened in the past.

    It needs a hard reboot (via VM admin console) to boot again.

    I have confirmed the above behavior multiple times.

    Please advise me on how to resolve this situation. We are very much dependent on NFS mounts.

    Is it a known bug? (As far as I could search, I didn’t come up with anything.)

    The earlier bug report appears resolved:
    https://bugzilla.redhat.com/show_bug.cgi?id=1454876

    Can I safely/easily revert to 7.3?

    Thanks in advance, Nick
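
A minimal sketch for quantifying the stalls quoted above, assuming a syslog-style file such as /var/log/messages. It counts the kernel's "not responding" messages per NFS server. The function name `nfs_stalls` is hypothetical and not part of any package.

```shell
# Count "nfs: server ... not responding" events per server in a log file.
# nfs_stalls is an illustrative helper name, not an existing tool.
nfs_stalls() {
    awk '/nfs: server .* not responding/ {
             # The server address is the field right after the word "server".
             for (i = 1; i < NF; i++)
                 if ($i == "server") { count[$(i + 1)]++; break }
         }
         END { for (s in count) print s, count[s] }' "$1"
}
```

Usage would be `nfs_stalls /var/log/messages`; comparing the counts before and after a change (package downgrade, kernel switch) gives a rough indication of whether the change had any effect.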

  • Correction: the /etc/fstab nfs mount line has one more zero:

      10.201.40.34:/data/col1/hesperia-mount /hesperiamount2   nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0

    I am looking forward to your feedback.

    Based on the facts and experience, it looks like a bug. After all, it occurred right after the upgrade to 7.4, without any system configuration changes.

    Please help!
    Nick

    I have created a bug report for this: https://bugs.CentOS.org/view.php?id=13891

    Is anyone else having NFS mount issues after upgrading to 7.4?

    (I have found this report: https://access.redhat.com/solutions/3146191
    which I think is not directly related.)

    Another error report that could be related:
    https://www.reddit.com/r/ansible/comments/6tu9c4/mounting_a_nfs_share_from_aix_to_rhel_74_remote/dlpdco6/?st=j7w56e1a&sh=065301d7

    Please let me know if there can be a workaround or something.

    Thanks, Nick

    This config is working for me, just using an older kernel.

    [root@mnemosyne ~]# uname -r
    3.10.0-514.21.2.el7.x86_64
    [root@mnemosyne ~]# rpm -qa | grep rpcbind
    rpcbind-0.2.0-42.el7.x86_64
    [root@mnemosyne ~]# rpm -qa | grep nfs
    libnfsidmap-0.25-17.el7.x86_64
    nfs-utils-1.3.0-0.48.el7.x86_64

    Patrick

  • Thanks Patrick,

    Unfortunately, it doesn’t work for me. I tried booting with an older kernel (3.10.0-514.21.1.el7.x86_64) and/or downgrading rpcbind to rpcbind-0.2.0-38.el7.x86_64 (all three combinations: each one separately and both at the same time), but it didn’t work. I have not tried to downgrade nfs-utils as well…

    Note that on the same VLAN, on the same cluster, I have another VM which I have not upgraded yet (thankfully) to 7.4 and this works normally
    (using 7.3 and rpcbind-0.2.0-38.el7).

    My findings and logs are available at:
    https://bugzilla.redhat.com/show_bug.cgi?id=1494834

    Nick

  • Problem solved – at least in my case – by changing the NFS Export Options (of the NFS shared directory, at the data storage system) from secure to insecure. That is, I changed from:

    rw,no_root_squash,no_all_squash,secure,nolog

    to:

    rw,no_root_squash,no_all_squash,insecure,nolog

    I don’t know if the behavior I had described can be explained by using the “secure” option, but after I changed to “insecure” everything works fine, using the latest packages – latest kernel and latest rpms.

    If anyone can provide some insight on it, that would be appreciated
    (since I know little about NFS).

    Cheers, Nick
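
On the secure/insecure question: in Linux exports(5), `secure` requires that requests originate from a source port below 1024, and `insecure` lifts that requirement. Whether the storage system here uses the same semantics is an assumption. A rough sketch for checking which source ports the client actually uses, assuming `ss -tn` prints the local address in column 4 and the peer address in column 5 (the helper name `nfs_source_ports` is hypothetical):

```shell
# Read `ss -tn` output on stdin and, for each connection to an NFS server
# (peer port 2049), print the peer, the local source port, and whether
# that port is in the reserved range (< 1024).
# nfs_source_ports is an illustrative helper name, not an existing tool.
nfs_source_ports() {
    awk '$5 ~ /:2049$/ {
             n = split($4, a, ":")      # last piece is the local port
             port = a[n] + 0
             print $5, port, (port < 1024 ? "reserved" : "unreserved")
         }'
}
```

It could be run as `ss -tn | nfs_source_ports`. If every NFS connection already uses a reserved port, the `secure` option alone would not be expected to block traffic.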

  • In the end, it turned out that the issue reappeared after a couple of days.

    So, it seems that this change did not actually solve the problem.

    I am still trying to find a solution.

    Nick

  • The problem was finally traced to a Cisco ASA bug (this firewall device sits between the connected networks); bug CSCuq80704 was resolved by an ASA software update.

    NFS packets were incorrectly being dropped by the ASA, which caused NFS traffic to stall. After the ASA software upgrade the problem has not occurred again.

    I can’t tell why this did not happen for many months and only started recently.

    Case closed.

    Cheers, Nick