NFS Mount On CentOS 7 Crashing

June 2, 2017, Nikolaos Milas, CentOS

Hello,

We have a VM (under KVM – a VPS service by our ISP) running CentOS 7.

On it we have 2 NFS mounts, one for backup and one as a live file system
(where there are two user homes as well):

———————————————————————————————————————

12 thoughts on - NFS Mount On CentOS 7 Crashing

Philippe BOURDEU says:

June 2, 2017 at 2:40 am

On 02/06/2017 at 08:41, Nikolaos Milas wrote:

I have had the same problem since the last rpcbind package update
(rpcbind-0.2.0-38.el7_3).

Reverting to rpcbind-0.2.0-38.el7 solves the problem for me.

—
Philippe BOURDEU d’AGUERRE
AIME – Campus de l’INSA http://aime-toulouse.fr/
135 av. de Rangueil Tél +33 561 559 885
31077 TOULOUSE Cedex 4 – FRANCE Fax +33 561 559 870
Nikolaos Milas says:

June 2, 2017 at 2:58 am

Thank you very much Philippe,

I see that I upgraded to rpcbind-0.2.0-38.el7_3.x86_64 on May 26.

Have you checked if this bug/behavior has been reported or should we file a bug report?

Nick
Nikolaos Milas says:

June 2, 2017 at 5:47 am

After a bit of search, I found the associated reports:

https://bugs.CentOS.org/view.php?id=13351
https://bugzilla.redhat.com/show_bug.cgi?id=1454876

No solution yet, but as a workaround, downgrading does seem to resolve at least the NFS problems.

Cheers, Nick
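The downgrade workaround discussed above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: `run` and `downgrade_rpcbind` are hypothetical helper names, the script only prints the commands unless `APPLY=1` is set, and it assumes the older rpcbind build is still available from a configured repository.

```shell
#!/bin/sh
# Sketch only: run() and downgrade_rpcbind() are hypothetical helper names.
# Dry run by default; set APPLY=1 (as root) to actually execute the commands.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Show the installed rpcbind build, then downgrade to the given version-release.
downgrade_rpcbind() {
    run rpm -q rpcbind
    run yum downgrade -y "rpcbind-$1"
}

downgrade_rpcbind 0.2.0-38.el7
```

Printing the commands first makes it easy to check the exact package name before touching a production box.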
Nikolaos Milas says:

September 22, 2017 at 6:58 am

CentOS 7.3 has been working fine for me since I downgraded to rpcbind-0.2.0-38.el7.x86_64.

Today, I decided to upgrade to 7.4 (which, among several hundred updates, includes rpcbind-0.2.0-42.el7.x86_64); after that I started having similar NFS issues again: NFS communication hangs. In /var/log/messages:

—————————————————————————————–
…
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
…
Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
…
Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
…
Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
…
—————————————————————————————–
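To see how often the client reports the server as unresponsive, the "not responding" lines in the log excerpt above can be tallied per server. The following is a minimal sketch; `count_nfs_timeouts` is a hypothetical helper name, and it assumes the kernel message format shown in the excerpt.

```shell
#!/bin/sh
# count_nfs_timeouts is a hypothetical helper: it reads syslog lines on stdin
# and prints "<count> <server>" for each server reported as not responding.
count_nfs_timeouts() {
    grep 'nfs: server .* not responding' |
        sed 's/.*nfs: server \([^ ]*\) not responding.*/\1/' |
        sort | uniq -c | awk '{print $1, $2}'
}

# Example usage (as root, on the affected client):
#   count_nfs_timeouts < /var/log/messages
```

A rising count for a single server address points at the path to that server rather than at the client as a whole.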

I tried downgrading to rpcbind-0.2.0-38.el7.x86_64 but this time it didn’t help.

I mount either directly:

mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 \
    -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2

or through /etc/fstab:

10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0

The box may even hang during reboot, which never happened in the past.

It needs a hard reboot (via VM admin console) to boot again.

I have confirmed the above behavior multiple times.

Please advise me on how to resolve this situation. We are very much dependent on NFS mounts.

Is it a known bug? (I searched, but did not come up with anything.)

The earlier bug report appears resolved:
https://bugzilla.redhat.com/show_bug.cgi?id=1454876

Can I safely/easily revert to 7.3?

Thanks in advance, Nick
Nikolaos Milas says:

September 22, 2017 at 7:46 am

Correction: the /etc/fstab nfs mount line has one more zero:

10.201.40.34:/data/col1/hesperia-mount /hesperiamount2 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
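A quick way to catch a short fstab entry like the one corrected here is to count fields per line. This is a minimal sketch; `check_fstab_fields` is a hypothetical helper name. Note that fstab(5) treats missing trailing dump and pass fields as zero, so a five-field line is still valid and this is a consistency check rather than a likely explanation for the hang.

```shell
#!/bin/sh
# check_fstab_fields is a hypothetical helper: it reads fstab-style lines on
# stdin and reports every non-comment, non-blank line that does not spell out
# all six fields. It exits non-zero if any such line was found.
check_fstab_fields() {
    awk '!/^[ \t]*(#|$)/ && NF != 6 {
             printf "line %d: %d fields\n", NR, NF
             bad = 1
         }
         END { exit bad }'
}

# Example usage:
#   check_fstab_fields < /etc/fstab
```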

I am looking forward to your feedback.

Based on the facts and on experience, it looks like a bug. After all, it occurred right after the upgrade to 7.4, without any system configuration changes.

Please help!
Nick
Nikolaos Milas says:

September 22, 2017 at 12:15 pm

I have created a bug report for this: https://bugs.CentOS.org/view.php?id=13891

Is no one else having NFS mount issues after upgrading to 7.4?

(I have found this report: https://access.redhat.com/solutions/3146191
which I think is not directly related.)

Another report that could be related:
https://www.reddit.com/r/ansible/comments/6tu9c4/mounting_a_nfs_share_from_aix_to_rhel_74_remote/dlpdco6/?st=j7w56e1a&sh=065301d7

Please let me know if there can be a workaround or something.

Thanks, Nick
Nikolaos Milas says:

September 30, 2017 at 4:09 pm

I have also created

https://bugzilla.redhat.com/show_bug.cgi?id=1494834

and I have uploaded a lot of (hopefully useful) information, but there does not seem to be any activity on the issue, and I have had no feedback, although it has been a week since the report.

I continue to have this issue.

Nick
Patrick Bégou says:

October 2, 2017 at 3:19 am

This configuration is working for me; I am just using an older kernel.

[root@mnemosyne ~]# uname -r
3.10.0-514.21.2.el7.x86_64
[root@mnemosyne ~]# rpm -qa | grep rpcbind
rpcbind-0.2.0-42.el7.x86_64
[root@mnemosyne ~]# rpm -qa | grep nfs
libnfsidmap-0.25-17.el7.x86_64
nfs-utils-1.3.0-0.48.el7.x86_64

Patrick
Nikolaos Milas says:

October 2, 2017 at 3:46 pm

Thanks Patrick,

Unfortunately, it doesn’t work for me. I tried booting with an older kernel (3.10.0-514.21.1.el7.x86_64) and/or downgrading rpcbind to rpcbind-0.2.0-38.el7.x86_64 (all three combinations: each one separately and both at the same time), but it didn’t work. I have not tried to downgrade nfs-utils as well…

Note that on the same VLAN, on the same cluster, I have another VM which I have not upgraded yet (thankfully) to 7.4 and this works normally
(using 7.3 and rpcbind-0.2.0-38.el7).

My findings and logs are available at:
https://bugzilla.redhat.com/show_bug.cgi?id=1494834

Nick
Nikolaos Milas says:

October 4, 2017 at 7:10 am

Problem solved – at least in my case – by changing the NFS Export Options (of the NFS shared directory, at the data storage system) from secure to insecure. That is, I changed from:

rw,no_root_squash,no_all_squash,secure,nolog

to:

rw,no_root_squash,no_all_squash,insecure,nolog

I don’t know if the behavior I had described can be explained by using the “secure” option, but after I changed to “insecure” everything works fine, using the latest packages – latest kernel and latest rpms.

If anyone can provide some insight on it, that would be appreciated
(since I know little about NFS).

Cheers, Nick
Nikolaos Milas says:

October 7, 2017 at 11:35 am

In the end, it turned out that the issue reappeared after a couple of days.

So, it seems that this change did not actually solve the problem.

I am still trying to find a solution.

Nick
Nikolaos Milas says:

October 20, 2017 at 3:30 am

The problem was finally traced to a Cisco ASA bug (this firewall device sits between the connected networks); bug CSCuq80704 was resolved by an ASA software update.

The ASA was incorrectly dropping NFS packets, which caused NFS traffic to stall. Since the ASA software upgrade, the problem has not occurred again.

I can't tell why this did not happen for many months and only started recently.

Case closed.

Cheers, Nick
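A stall of the kind described in this thread can be spotted from the client with a time-limited probe of the mount point. This is a minimal sketch under stated assumptions: `probe_mount` is a hypothetical helper name, it relies on GNU coreutils `timeout` and `stat -f`, and on a hard mount the probe itself may take longer than the limit if the kernel cannot interrupt the blocked call.

```shell
#!/bin/sh
# probe_mount is a hypothetical helper: it prints "ok" if the path answers a
# filesystem status query within the time limit (default 10 seconds),
# "stalled" if the query timed out, and "error" for any other failure.
probe_mount() {
    path=$1
    limit=${2:-10}
    rc=0
    timeout "$limit" stat -f -- "$path" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo stalled ;;   # exit status used by timeout(1) on expiry
        *)   echo error ;;
    esac
}

# Example usage, with the mount point from this thread:
#   probe_mount /hesperiamount2 10
```

Run periodically, a probe like this gives timestamps for when the mount stops answering, which can then be compared with firewall logs.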
