12 thoughts on - NFS Mount On CentOS 7 Crashing
On 02/06/2017 at 08:41, Nikolaos Milas wrote:
I have had the same problem since the last rpcbind package update
(rpcbind-0.2.0-38.el7_3).
Reverting to rpcbind-0.2.0-38.el7 solves the problem for me.
—
Philippe BOURDEU d’AGUERRE
AIME – Campus de l’INSA http://aime-toulouse.fr/
135 av. de Rangueil Tél +33 561 559 885
31077 TOULOUSE Cedex 4 – FRANCE Fax +33 561 559 870
Thank you very much Philippe,
I notice that I upgraded to rpcbind-0.2.0-38.el7_3.x86_64 on May 26.
Have you checked if this bug/behavior has been reported or should we file a bug report?
Nick
After a bit of searching, I found the associated reports:
https://bugs.CentOS.org/view.php?id=13351
https://bugzilla.redhat.com/show_bug.cgi?id=1454876
No solution yet, but -as a workaround- it seems that -at least- nfs problems are indeed solved with downgrading.
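For anyone wanting to try the same workaround, a minimal sketch follows. It assumes the older build (0.2.0-38.el7, the one named in this thread) is still available in an enabled repository. `downgrade_rpcbind` is a hypothetical helper name, and it only prints the `yum downgrade` command unless `RUN=1` is set, so nothing changes by accident.

```shell
# downgrade_rpcbind (hypothetical helper): build the yum command that would
# downgrade rpcbind to a given version-release. It prints the command by
# default and executes it only when RUN=1 is set in the environment.
downgrade_rpcbind() {
    target=${1:-0.2.0-38.el7}    # version-release mentioned in this thread
    cmd="yum downgrade rpcbind-$target"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd                     # needs root and an enabled repo with that build
    else
        echo "$cmd"              # dry run: show what would be executed
    fi
}

# Usage:
#   downgrade_rpcbind                  # prints the command
#   RUN=1 downgrade_rpcbind            # actually runs yum
```

Running `rpm -q rpcbind` afterwards should show which build ended up installed.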
Cheers, Nick
I have been working fine with CentOS 7.3, since I downgraded to rpcbind-0.2.0-38.el7.x86_64.
Today, I decided to upgrade to 7.4 (which, among several hundred updates, includes rpcbind-0.2.0-42.el7.x86_64); after that I have started having similar NFS issues again: NFS communication hangs. In
/var/log/messages:
—————————————————————————————–
…
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
…
Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
…
Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
…
Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
…
—————————————————————————————–
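To get a feel for how often the client reports a stall, the "not responding" messages in the log excerpt above can be counted per server. This is only a sketch: `count_nfs_stalls` is a made-up helper name, and it assumes the kernel message has the same wording as in the excerpt.

```shell
# count_nfs_stalls (hypothetical helper): count kernel "not responding"
# messages per NFS server in a syslog-style file such as /var/log/messages.
# Output: one line per server, "<server> <count>".
count_nfs_stalls() {
    logfile=$1
    # Keep only the NFS client stall messages and extract the server address.
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' "$logfile" |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (reading the system log usually needs root):
#   count_nfs_stalls /var/log/messages
```

Comparing the count before and after a change (package downgrade, export option, firewall update) gives a rough idea of whether the change made any difference.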
I tried downgrading to rpcbind-0.2.0-38.el7.x86_64 but this time it didn’t help.
I mount either directly:
mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 \
    -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2
or through /etc/fstab:
10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0
The box may even hang during reboot, which has never happened in the past.
It needs a hard reboot (via VM admin console) to boot again.
I have confirmed the above behavior multiple times.
Please advise me on how to resolve this situation. We are very much dependent on NFS mounts.
Is it a known bug? (As far as I could search, I didn’t come up with anything.)
The earlier bug report appears resolved:
https://bugzilla.redhat.com/show_bug.cgi?id=1454876
Can I safely/easily revert to 7.3?
Thanks in advance, Nick
Correction: the /etc/fstab nfs mount line has one more zero:
10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
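As far as fstab(5) describes it, the fifth (dump) and sixth (fsck pass) fields default to 0 when they are missing, so the five-field line would probably have behaved the same as the corrected one. Spelling out all six is still clearer. A small sketch for spotting short entries (`check_fstab_fields` is a hypothetical helper, not a standard tool):

```shell
# check_fstab_fields (hypothetical helper): report fstab entries that have
# fewer than six whitespace-separated fields. Comment lines and blank lines
# are skipped.
check_fstab_fields() {
    awk '!/^[ \t]*#/ && NF > 0 && NF < 6 {
        printf "line %d: %d fields: %s\n", NR, NF, $0
    }' "$1"
}

# Usage:
#   check_fstab_fields /etc/fstab
```

No output means every entry has all six fields.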
I am looking forward to your feedback.
Based on the facts and experience, it looks like a bug. After all, it occurred right after upgrade to 7.4, without any system configuration changes.
Please help!
Nick
I have created bug report: https://bugs.CentOS.org/view.php?id=13891 for this.
Isn’t there anyone else having NFS mount issues after upgrade to 7.4?
(I have found this report: https://access.redhat.com/solutions/3146191
which I think is not directly related.)
Another error report which could possibly be related:
https://www.reddit.com/r/ansible/comments/6tu9c4/mounting_a_nfs_share_from_aix_to_rhel_74_remote/dlpdco6/?st=j7w56e1a&sh=065301d7
Please let me know if there can be a workaround or something.
Thanks, Nick
I have also created
https://bugzilla.redhat.com/show_bug.cgi?id=1494834
and I have uploaded a lot of (hopefully useful) information, but there doesn’t seem to be any activity on the issue, nor have I had any feedback, although it’s been a week since the report.
I continue to have this issue.
Nick
This config is working for me, with just using an older kernel.
[root@mnemosyne ~]# uname -r
3.10.0-514.21.2.el7.x86_64
[root@mnemosyne ~]# rpm -qa | grep rpcbind
rpcbind-0.2.0-42.el7.x86_64
[root@mnemosyne ~]# rpm -qa | grep nfs
libnfsidmap-0.25-17.el7.x86_64
nfs-utils-1.3.0-0.48.el7.x86_64
Patrick
Thanks Patrick,
Unfortunately, it doesn’t work for me. I tried booting with an older kernel (3.10.0-514.21.1.el7.x86_64) and/or downgrading rpcbind to rpcbind-0.2.0-38.el7.x86_64 (all three combinations: each one separately and both at the same time), but it didn’t work. I have not tried to downgrade nfs-utils as well…
Note that on the same VLAN, on the same cluster, I have another VM which I have not upgraded yet (thankfully) to 7.4 and this works normally
(using 7.3 and rpcbind-0.2.0-38.el7).
My findings and logs are available at:
https://bugzilla.redhat.com/show_bug.cgi?id=1494834
Nick
Problem solved – at least in my case – by changing the NFS Export Options (of the NFS shared directory, at the data storage system) from secure to insecure. That is, I changed from:
rw,no_root_squash,no_all_squash,secure,nolog
to:
rw,no_root_squash,no_all_squash,insecure,nolog
I don’t know if the behavior I had described can be explained by using the “secure” option, but after I changed to “insecure” everything works fine, using the latest packages – latest kernel and latest rpms.
If anyone can provide some insight on it, that would be appreciated
(since I know little about NFS).
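On a Linux NFS server, exports(5) describes `secure` as requiring that requests come from a source port below 1024, and the Linux NFS client uses such a reserved port by default. Whether the storage system here gives the option the same meaning is an assumption. One way to see which source ports the client actually uses is to look at its TCP connections to port 2049. The sketch below filters `ss -tn` output; `nfs_source_ports` is a hypothetical helper name.

```shell
# nfs_source_ports (hypothetical helper): read `ss -tn` output on stdin and,
# for every TCP connection whose peer port is 2049 (NFS), print the local
# source port and whether it is a reserved port (below 1024).
nfs_source_ports() {
    awk '$5 ~ /:2049$/ {
        n = split($4, parts, ":")   # local "address:port"; port is the last piece
        port = parts[n] + 0
        print port, (port < 1024 ? "reserved" : "unreserved")
    }'
}

# Usage:
#   ss -tn | nfs_source_ports
```

If every connection shows a reserved port, a `secure` export should in principle accept the client, unless something in between (NAT, a firewall) rewrites the source port.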
Cheers, Nick
In the end, it turned out that the issue reappeared after a couple of days.
So, it seems that this change did not actually solve the problem.
I am still trying to find a solution.
Nick
The problem was finally traced to a Cisco ASA bug (this firewall device sits between the connected networks); bug CSCuq80704 was resolved by an ASA software update.
NFS packets were incorrectly being dropped by the ASA, which caused NFS traffic to stall. Since the ASA software upgrade the problem has not occurred again.
I can’t tell why this only started happening lately, after many months without it.
Case closed.
Cheers, Nick