Nfs Causes CentOS 7.7 System To Hang

Home » CentOS » Nfs Causes CentOS 7.7 System To Hang
CentOS 2 Comments

Hello,

MERRY CHRISTMAS to all in list!

After I upgraded to latest: CentOS Linux release 7.7.1908 (Core) I am facing nfs crashes which cause the system to hang frequently.

This is caused by cp to nfs-mounted shares.

Below is dmesg output; you will see call traces. These cause system to gradually overload:

[root@hesperia1 ~]# top top – 10:09:40 up 10:16,  1 user,  load average: 53.66, 54.13, 52.98
Tasks: 475 total,   2 running, 436 sleeping,   0 stopped,  37 zombie
%Cpu(s):  0.1 us,  0.6 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi, 0.0 si, 
0.1 st KiB Mem :  3879928 total,   813504 free,  1733216 used,  1333208 buff/cache KiB Swap:  4063228 total,  4062708 free,      520 used.  1797264 avail Mem

and finally hangs showing messages (which I have not recorded precisely)
in the CLI login screen like “System out of memory”. Then I have to reboot.

I tried to downgrade nfs-utils and rpcbind to earlier versions (in case there is a bug in latest ones), but I couldn’t:

[root@hesperia1 ~]# yum downgrade rpcbind-0.2.0-47 nfs-utils-1.3.0-61
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: ftp.ntua.gr
 * epel: mirrors.daticum.com
 * extras: ftp.ntua.gr
 * updates: ftp.ntua.gr
No package rpcbind-0.2.0-47 available.
No package nfs-utils-1.3.0-61 available.
Error: Nothing to do

Do you know if this is a known bug? (I couldn’t find something in my searches.)

Can you suggest a solution / workaround? I am facing this problem constantly and the system has become unstable.

[root@hesperia1 ~]# dmesg

[   26.525584] RPC: Registered named UNIX socket transport module.
[   26.525588] RPC: Registered udp transport module.
[   26.525590] RPC: Registered tcp transport module.
[   26.525592] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   27.103268] type=1305 audit(1577317991.465:3): audit_pid=836 old=0
auid=4294967295 ses=4294967295 res=1
[   29.384755] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   29.629701] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   30.773604] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   31.977476] FS-Cache: Loaded
[   32.170682] FS-Cache: Netfs ‘nfs’ registered for caching
[   53.671997] random: crng init done
[26399.967630] INFO: task cp:14560 blocked for more than 120 seconds.
[26399.967984] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[26399.968202] cp              D ffff89373fc9ac80     0 14560 14053
0x00000080
[26399.968211] Call Trace:
[26399.968272]  [] ? nfs_permission+0x105/0x220 [nfs]
[26399.968321]  [] ? security_inode_permission+0x22/0x30
[26399.968345]  [] schedule_preempt_disabled+0x29/0x70
[26399.968375]  [] __mutex_lock_slowpath+0xc7/0x1d0
[26399.968395]  [] ? unlazy_walk+0x133/0x140
[26399.968404]  [] mutex_lock+0x1f/0x2f
[26399.968410]  [] lookup_slow+0x33/0xa7
[26399.968421]  [] path_lookupat+0x838/0x8b0
[26399.968445]  [] ? kmem_cache_alloc+0x35/0x1f0
[26399.968475]  [] ? getname_flags+0x4f/0x1a0
[26399.968481]  [] filename_lookup+0x2b/0xc0
[26399.968487]  [] user_path_at_empty+0x67/0xc0
[26399.968492]  [] ? cp_new_stat+0x165/0x1a0
[26399.968497]  [] user_path_at+0x11/0x20
[26399.968501]  [] vfs_fstatat+0x63/0xc0
[26399.968505]  [] SYSC_newstat+0x2e/0x60
[26399.968530]  [] ? __audit_syscall_exit+0x1e6/0x280
[26399.968534]  [] SyS_newstat+0xe/0x10
[26399.968544]  [] system_call_fastpath+0x25/0x2a
[27479.967352] INFO: task cp:4656 blocked for more than 120 seconds.
[27479.967414] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[27479.967711] cp              D ffff8936b6548000     0  4656 2914
0x00000080
[27479.967726] Call Trace:
[27479.967749]  [] ? nfs_permission+0x105/0x220 [nfs]
[27479.967762]  [] ? security_inode_permission+0x22/0x30
[27479.967769]  [] schedule_preempt_disabled+0x29/0x70
[27479.967774]  [] __mutex_lock_slowpath+0xc7/0x1d0
[27479.967786]  [] ? unlazy_walk+0x133/0x140
[27479.967792]  [] mutex_lock+0x1f/0x2f
[27479.967821]  [] lookup_slow+0x33/0xa7
[27479.967828]  [] path_lookupat+0x838/0x8b0
[27479.967836]  [] ? kmem_cache_alloc+0x35/0x1f0
[27479.967841]  [] ? getname_flags+0x4f/0x1a0
[27479.967846]  [] filename_lookup+0x2b/0xc0
[27479.967852]  [] user_path_at_empty+0x67/0xc0
[27479.967857]  [] ? cp_new_stat+0x165/0x1a0
[27479.967862]  [] user_path_at+0x11/0x20
[27479.967866]  [] vfs_fstatat+0x63/0xc0
[27479.967870]  [] SYSC_newstat+0x2e/0x60
[27479.967880]  [] ? __audit_syscall_exit+0x1e6/0x280
[27479.967884]  [] SyS_newstat+0xe/0x10
[27479.967892]  [] system_call_fastpath+0x25/0x2a
[27479.967910] INFO: task cp:7754 blocked for more than 120 seconds.
[27479.968147] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[27479.968437] cp              D ffff89373fc1ac80     0  7754 7285
0x00000080
[27479.968443] Call Trace:
[27479.968455]  [] ? nfs_permission+0x105/0x220 [nfs]
[27479.968461]  [] ? security_inode_permission+0x22/0x30
[27479.968466]  [] schedule_preempt_disabled+0x29/0x70
[27479.968470]  [] __mutex_lock_slowpath+0xc7/0x1d0
[27479.968475]  [] ? unlazy_walk+0x133/0x140
[27479.968481]  [] mutex_lock+0x1f/0x2f
[27479.968486]  [] lookup_slow+0x33/0xa7
[27479.968491]  [] path_lookupat+0x838/0x8b0
[27479.968500]  [] ? kmem_cache_alloc+0x35/0x1f0
[27479.968505]  [] ? getname_flags+0x4f/0x1a0
[27479.968510]  [] filename_lookup+0x2b/0xc0
[27479.968515]  [] user_path_at_empty+0x67/0xc0
[27479.968520]  [] ? cp_new_stat+0x165/0x1a0
[27479.968525]  [] user_path_at+0x11/0x20
[27479.968528]  [] vfs_fstatat+0x63/0xc0
[27479.968533]  [] SYSC_newstat+0x2e/0x60
[27479.968539]  [] ? __audit_syscall_exit+0x1e6/0x280
[27479.968543]  [] SyS_newstat+0xe/0x10
[27479.968547]  [] system_call_fastpath+0x25/0x2a
[27479.968557] INFO: task cp:20086 blocked for more than 120 seconds.
[27479.968798] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[27479.969064] cp              D ffff89373fc9ac80     0 20086 19644
0x00000080
[27479.969069] Call Trace:
[27479.969080]  [] ? nfs_permission+0x105/0x220 [nfs]
[27479.969086]  [] ? security_inode_permission+0x22/0x30
[27479.969090]  [] schedule_preempt_disabled+0x29/0x70
[27479.969094]  [] __mutex_lock_slowpath+0xc7/0x1d0
[27479.969099]  [] ? unlazy_walk+0x133/0x140
[27479.969105]  [] mutex_lock+0x1f/0x2f
[27479.969110]  [] lookup_slow+0x33/0xa7
[27479.969115]  [] path_lookupat+0x838/0x8b0
[27479.969119]  [] ? kmem_cache_alloc+0x35/0x1f0
[27479.969124]  [] ? getname_flags+0x4f/0x1a0
[27479.969129]  [] filename_lookup+0x2b/0xc0
[27479.969134]  [] user_path_at_empty+0x67/0xc0
[27479.969139]  [] ? cp_new_stat+0x165/0x1a0
[27479.969144]  [] user_path_at+0x11/0x20
[27479.969148]  [] vfs_fstatat+0x63/0xc0
[27479.969152]  [] SYSC_newstat+0x2e/0x60
[27479.969157]  [] ? __audit_syscall_exit+0x1e6/0x280
[27479.969161]  [] SyS_newstat+0xe/0x10
[27479.969166]  [] system_call_fastpath+0x25/0x2a
[30359.967071] INFO: task cp:4656 blocked for more than 120 seconds.
[30359.967120] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.967345] cp              D ffff89373fd9ac80     0  4656 2914
0x00000080
[30359.967369] Call Trace:
[30359.967391]  [] ? nfs_permission+0x105/0x220 [nfs]
[30359.967403]  [] ? security_inode_permission+0x22/0x30
[30359.967410]  [] schedule_preempt_disabled+0x29/0x70
[30359.967415]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.967421]  [] ? unlazy_walk+0x133/0x140
[30359.967427]  [] mutex_lock+0x1f/0x2f
[30359.967432]  [] lookup_slow+0x33/0xa7
[30359.967438]  [] path_lookupat+0x838/0x8b0
[30359.967444]  [] ? kmem_cache_alloc+0x35/0x1f0
[30359.967449]  [] ? getname_flags+0x4f/0x1a0
[30359.967454]  [] filename_lookup+0x2b/0xc0
[30359.967459]  [] user_path_at_empty+0x67/0xc0
[30359.967464]  [] ? cp_new_stat+0x165/0x1a0
[30359.967469]  [] user_path_at+0x11/0x20
[30359.967473]  [] vfs_fstatat+0x63/0xc0
[30359.967477]  [] SYSC_newstat+0x2e/0x60
[30359.967486]  [] ? __audit_syscall_exit+0x1e6/0x280
[30359.967490]  [] SyS_newstat+0xe/0x10
[30359.967497]  [] system_call_fastpath+0x25/0x2a
[30359.967502] INFO: task cp:6363 blocked for more than 120 seconds.
[30359.967708] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.967933] cp              D ffff89373fd9ac80     0  6363 6002
0x00000080
[30359.967938] Call Trace:
[30359.968064]  [] ? rpcauth_lookupcred+0x91/0xd0 [sunrpc]
[30359.968071]  [] schedule_preempt_disabled+0x29/0x70
[30359.968076]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.968082]  [] mutex_lock+0x1f/0x2f
[30359.968088]  [] do_last+0x298/0x1290
[30359.968093]  [] path_openat+0xcd/0x5a0
[30359.968100]  [] ? __check_object_size+0x1ca/0x250
[30359.968121]  [] do_filp_open+0x4d/0xb0
[30359.968129]  [] ? __alloc_fd+0xc2/0x170
[30359.968135]  [] do_sys_open+0x124/0x220
[30359.968140]  [] SyS_open+0x1e/0x20
[30359.968145]  [] system_call_fastpath+0x25/0x2a
[30359.968151] INFO: task cp:9517 blocked for more than 120 seconds.
[30359.968414] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.968649] cp              D ffff89373fd9ac80     0  9517 9100
0x00000080
[30359.968656] Call Trace:
[30359.968670]  [] ? nfs_permission+0x105/0x220 [nfs]
[30359.968677]  [] ? security_inode_permission+0x22/0x30
[30359.968682]  [] schedule_preempt_disabled+0x29/0x70
[30359.968686]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.968691]  [] ? unlazy_walk+0x133/0x140
[30359.968696]  [] mutex_lock+0x1f/0x2f
[30359.968701]  [] lookup_slow+0x33/0xa7
[30359.968706]  [] path_lookupat+0x838/0x8b0
[30359.968711]  [] ? kmem_cache_alloc+0x35/0x1f0
[30359.968716]  [] ? getname_flags+0x4f/0x1a0
[30359.968721]  [] filename_lookup+0x2b/0xc0
[30359.968726]  [] user_path_at_empty+0x67/0xc0
[30359.968730]  [] ? cp_new_stat+0x165/0x1a0
[30359.968735]  [] user_path_at+0x11/0x20
[30359.968739]  [] vfs_fstatat+0x63/0xc0
[30359.968743]  [] SYSC_newstat+0x2e/0x60
[30359.968749]  [] ? __audit_syscall_exit+0x1e6/0x280
[30359.968753]  [] SyS_newstat+0xe/0x10
[30359.968760]  [] system_call_fastpath+0x25/0x2a
[30359.968767] INFO: task cp:15665 blocked for more than 120 seconds.
[30359.969037] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.969287] cp              D ffff89373fd1ac80     0 15665 15170
0x00000080
[30359.969308] Call Trace:
[30359.969319]  [] ? nfs_permission+0x105/0x220 [nfs]
[30359.969325]  [] ? security_inode_permission+0x22/0x30
[30359.969330]  [] schedule_preempt_disabled+0x29/0x70
[30359.969334]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.969339]  [] ? unlazy_walk+0x133/0x140
[30359.969344]  [] mutex_lock+0x1f/0x2f
[30359.969348]  [] lookup_slow+0x33/0xa7
[30359.969354]  [] path_lookupat+0x838/0x8b0
[30359.969358]  [] ? kmem_cache_alloc+0x35/0x1f0
[30359.969363]  [] ? getname_flags+0x4f/0x1a0
[30359.969368]  [] filename_lookup+0x2b/0xc0
[30359.969375]  [] user_path_at_empty+0x67/0xc0
[30359.969380]  [] ? cp_new_stat+0x165/0x1a0
[30359.969385]  [] user_path_at+0x11/0x20
[30359.969389]  [] vfs_fstatat+0x63/0xc0
[30359.969393]  [] SYSC_newstat+0x2e/0x60
[30359.969398]  [] ? __audit_syscall_exit+0x1e6/0x280
[30359.969402]  [] SyS_newstat+0xe/0x10
[30359.969407]  [] system_call_fastpath+0x25/0x2a
[30359.969413] INFO: task cp:21924 blocked for more than 120 seconds.
[30359.969657] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.969902] cp              D ffff89373fc9ac80     0 21924 21419
0x00000080
[30359.969907] Call Trace:
[30359.969917]  [] ? nfs_permission+0x105/0x220 [nfs]
[30359.969923]  [] ? security_inode_permission+0x22/0x30
[30359.969927]  [] schedule_preempt_disabled+0x29/0x70
[30359.969931]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.969936]  [] ? unlazy_walk+0x133/0x140
[30359.969941]  [] mutex_lock+0x1f/0x2f
[30359.969945]  [] lookup_slow+0x33/0xa7
[30359.969950]  [] path_lookupat+0x838/0x8b0
[30359.969969]  [] ? kmem_cache_alloc+0x35/0x1f0
[30359.969974]  [] ? getname_flags+0x4f/0x1a0
[30359.969979]  [] filename_lookup+0x2b/0xc0
[30359.969984]  [] user_path_at_empty+0x67/0xc0
[30359.969988]  [] ? cp_new_stat+0x165/0x1a0
[30359.969994]  [] user_path_at+0x11/0x20
[30359.970000]  [] vfs_fstatat+0x63/0xc0
[30359.970004]  [] SYSC_newstat+0x2e/0x60
[30359.970009]  [] ? __audit_syscall_exit+0x1e6/0x280
[30359.970013]  [] SyS_newstat+0xe/0x10
[30359.970018]  [] system_call_fastpath+0x25/0x2a
[30359.970024] INFO: task cp:28139 blocked for more than 120 seconds.
[30359.970260] “echo 0 > /proc/sys/kernel/hung_task_timeout_secs”
disables this message.
[30359.970497] cp              D ffff8937380a5780     0 28139 27597
0x00000080
[30359.970504] Call Trace:
[30359.970510]  [] ? security_inode_permission+0x22/0x30
[30359.970514]  [] schedule_preempt_disabled+0x29/0x70
[30359.970518]  [] __mutex_lock_slowpath+0xc7/0x1d0
[30359.970523]  [] ? unlazy_walk+0x133/0x140
[30359.970529]  [] mutex_lock+0x1f/0x2f
[30359.970533]  [] lookup_slow+0x33/0xa7
[30359.970538]  [] path_lookupat+0x838/0x8b0
[30359.970542]  [] ? kmem_cache_alloc+0x35/0x1f0
[30359.970547]  [] ? getname_flags+0x4f/0x1a0
[30359.970552]  [] filename_lookup+0x2b/0xc0
[30359.970557]  [] user_path_at_empty+0x67/0xc0
[30359.970561]  [] ? cp_new_stat+0x165/0x1a0
[30359.970566]  [] user_path_at+0x11/0x20
[30359.970570]  [] vfs_fstatat+0x63/0xc0
[30359.970592]  [] SYSC_newstat+0x2e/0x60
[30359.970598]  [] ? __audit_syscall_exit+0x1e6/0x280
[30359.970602]  [] SyS_newstat+0xe/0x10
[30359.970607]  [] system_call_fastpath+0x25/0x2a

Please advise!

Cheers, Nick

2 thoughts on - Nfs Causes CentOS 7.7 System To Hang

  • This is happening on the client, right?  What system is providing the NFS service?

    Assuming this is a bug, it’s probably in the kernel and not one of those two packages.  Select an older kernel from the GRUB list when the system boots, and see if the problem goes away.

  • you don’t have to specify versions, just

    yum downgrade rpcbind nfs-utils

    and then yum will work out the versions by itself, and their dependencies – in this case, it appears that you’ll have to downgrade some libraries as well.

    HTH, Kay