NFS Causes CentOS 7.7 System to Hang
Hello,
MERRY CHRISTMAS to all on the list!
After upgrading to the latest release, CentOS Linux release 7.7.1908 (Core), I have been facing NFS crashes that frequently cause the system to hang.
They are triggered by copying files (cp) to NFS-mounted shares.
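Because the hangs happen while copying to NFS shares, it helps to know which server and export each mount uses and with which options. With a hard mount (the NFS default), requests to a server that stops answering are retried indefinitely, so a process such as cp can stay blocked for as long as the server is unreachable. A minimal sketch, assuming a Linux client; `nfs_mounts` is a hypothetical helper that filters text in /proc/mounts format for nfs and nfs4 entries:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NFS mounts from mount-table text on stdin.
# Input format is that of /proc/mounts:
#   device mountpoint fstype options dump pass
# Output: "server:/export mountpoint fstype options" for nfs/nfs4 entries.
nfs_mounts() {
  awk '$3 == "nfs" || $3 == "nfs4" { print $1, $2, $3, $4 }'
}

# Inspect the live mount table on this client.
nfs_mounts < /proc/mounts
```

The options column includes the NFS version (vers=) and whether the mount is hard or soft; `nfsstat -m` from nfs-utils reports similar per-mount information.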
The dmesg output, including call traces, is further below. The hangs gradually overload the system:
[root@hesperia1 ~]# top
top - 10:09:40 up 10:16, 1 user, load average: 53.66, 54.13, 52.98
Tasks: 475 total, 2 running, 436 sleeping, 0 stopped, 37 zombie
%Cpu(s): 0.1 us, 0.6 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 3879928 total, 813504 free, 1733216 used, 1333208 buff/cache
KiB Swap: 4063228 total, 4062708 free, 520 used. 1797264 avail Mem
and eventually the system hangs, showing messages on the CLI login screen along the lines of "System out of memory" (I did not record them precisely). Then I have to reboot.
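The top output above shows a load average of about 54 while the CPUs are 99% idle. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, and the blocked cp processes in the dmesg output below are reported in state D, so stuck NFS copies can drive the load up without using CPU. A minimal sketch for listing such tasks; the helper name `d_state_tasks` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tasks in uninterruptible sleep (state D).
# Reads "STAT PID COMMAND" lines on stdin, as printed by
# `ps -eo stat=,pid=,comm=` (the "=" suffixes suppress the header).
# Prints "PID COMMAND" for every STAT that begins with "D".
d_state_tasks() {
  awk '$1 ~ /^D/ { print $2, $3 }'
}

# Snapshot of the current process table.
ps -eo stat=,pid=,comm= | d_state_tasks
```

Running it a few times while a copy is stuck should show whether the same cp PIDs stay in state D. On kernels that provide it, /proc/<PID>/stack (readable by root) shows where a given task is blocked.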
I tried to downgrade nfs-utils and rpcbind to earlier versions (in case the latest ones have a bug), but I couldn't:
[root@hesperia1 ~]# yum downgrade rpcbind-0.2.0-47 nfs-utils-1.3.0-61
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: ftp.ntua.gr
* epel: mirrors.daticum.com
* extras: ftp.ntua.gr
* updates: ftp.ntua.gr
No package rpcbind-0.2.0-47 available.
No package nfs-utils-1.3.0-61 available.
Error: Nothing to do
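yum prints "No package ... available" when nothing in the enabled repositories matches the spec it was given, so it can help to first list the exact versions the repositories offer. A sketch, assuming the usual `yum --showduplicates list` layout of "name.arch [epoch:]version-release repo"; the helper `yum_specs` is hypothetical and turns those lines into name-version-release strings, one of the package-spec forms yum accepts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `yum --showduplicates list` output on stdin
# into "name-version-release" specs usable with `yum downgrade`.
# Only three-field lines of the form "name.arch [epoch:]ver-rel repo"
# are used; headers, mirror lines and wrapped lines are skipped.
yum_specs() {
  awk 'NF == 3 && $1 ~ /\./ && $2 ~ /-/ {
    name = $1; sub(/\.[^.]+$/, "", name)  # drop ".arch"
    ver = $2;  sub(/^[0-9]+:/, "", ver)   # drop "epoch:"
    print name "-" ver
  }'
}

# Only query when yum is present (e.g. CentOS/RHEL 7).
if command -v yum >/dev/null 2>&1; then
  yum --showduplicates list nfs-utils rpcbind | yum_specs
fi
```

Any printed spec can then be passed to yum downgrade. If no older version is listed, the enabled repositories simply do not carry one.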
Do you know if this is a known bug? (I couldn't find anything in my searches.)
Can you suggest a solution or workaround? I am facing this problem constantly, and the system has become unstable.
[root@hesperia1 ~]# dmesg
…
[ 26.525584] RPC: Registered named UNIX socket transport module.
[ 26.525588] RPC: Registered udp transport module.
[ 26.525590] RPC: Registered tcp transport module.
[ 26.525592] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 27.103268] type=1305 audit(1577317991.465:3): audit_pid=836 old=0 auid=4294967295 ses=4294967295 res=1
[ 29.384755] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 29.629701] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 30.773604] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 31.977476] FS-Cache: Loaded
[ 32.170682] FS-Cache: Netfs 'nfs' registered for caching
[ 53.671997] random: crng init done
[26399.967630] INFO: task cp:14560 blocked for more than 120 seconds.
[26399.967984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[26399.968202] cp D ffff89373fc9ac80 0 14560 14053 0x00000080
[26399.968211] Call Trace:
…
[27479.967352] INFO: task cp:4656 blocked for more than 120 seconds.
[27479.967414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27479.967711] cp D ffff8936b6548000 0 4656 2914 0x00000080
[27479.967726] Call Trace:
…
[27479.967910] INFO: task cp:7754 blocked for more than 120 seconds.
[27479.968147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27479.968437] cp D ffff89373fc1ac80 0 7754 7285 0x00000080
[27479.968443] Call Trace:
…
[27479.968557] INFO: task cp:20086 blocked for more than 120 seconds.
[27479.968798] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27479.969064] cp D ffff89373fc9ac80 0 20086 19644 0x00000080
[27479.969069] Call Trace:
…
[30359.967071] INFO: task cp:4656 blocked for more than 120 seconds.
[30359.967120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.967345] cp D ffff89373fd9ac80 0 4656 2914 0x00000080
[30359.967369] Call Trace:
…
[30359.967502] INFO: task cp:6363 blocked for more than 120 seconds.
[30359.967708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.967933] cp D ffff89373fd9ac80 0 6363 6002 0x00000080
[30359.967938] Call Trace:
…
[30359.968151] INFO: task cp:9517 blocked for more than 120 seconds.
[30359.968414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.968649] cp D ffff89373fd9ac80 0 9517 9100 0x00000080
[30359.968656] Call Trace:
…
[30359.968767] INFO: task cp:15665 blocked for more than 120 seconds.
[30359.969037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.969287] cp D ffff89373fd1ac80 0 15665 15170 0x00000080
[30359.969308] Call Trace:
…
[30359.969413] INFO: task cp:21924 blocked for more than 120 seconds.
[30359.969657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.969902] cp D ffff89373fc9ac80 0 21924 21419 0x00000080
[30359.969907] Call Trace:
…
[30359.970024] INFO: task cp:28139 blocked for more than 120 seconds.
[30359.970260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30359.970497] cp D ffff8937380a5780 0 28139 27597 0x00000080
[30359.970504] Call Trace:
…
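With this many repeated warnings it can be easier to summarise which tasks were reported as hung and how often. A sketch with a hypothetical helper name, assuming the "INFO: task NAME:PID blocked for more than" wording seen in the log above and command names without spaces; it reads kernel log text (for example from dmesg) and counts the warnings per task:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise hung-task warnings in kernel log text.
# Matches lines like
#   [27479.967352] INFO: task cp:4656 blocked for more than 120 seconds.
# and prints "COMMAND PID COUNT" once per distinct task.
hung_tasks() {
  sed -n 's/.*INFO: task \([^ ]*\):\([0-9]*\) blocked for more than.*/\1 \2/p' |
    sort | uniq -c |
    awk '{ print $2, $3, $1 }'
}

# Summarise the current kernel ring buffer (may require root).
dmesg | hung_tasks
```

Applied to the log above, it would report cp:4656 twice and every other cp once. The log contains exactly ten warnings; on mainline kernels the kernel.hung_task_warnings sysctl defaults to 10 and is decremented on each report, after which no further warnings are printed, so later hangs may go unreported. `sysctl kernel.hung_task_warnings` shows the remaining count.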
Please advise!
Cheers, Nick
This is happening on the client, right? What system is providing the NFS service?
Assuming this is a bug, it’s probably in the kernel and not one of those two packages. Select an older kernel from the GRUB list when the system boots, and see if the problem goes away.
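Choosing a kernel in the GRUB menu normally affects only that one boot. To keep an older kernel while testing, CentOS 7's grubby can change the default entry. A sketch, assuming `rpm -q --last kernel` lists installed kernels most recently installed first as kernel-VERSION-RELEASE.ARCH, each with a matching /boot/vmlinuz-VERSION-RELEASE.ARCH; the helper `previous_kernel` is hypothetical and picks the kernel installed before the newest one:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rpm -q --last kernel` output on stdin
# (most recently installed first) and print the /boot/vmlinuz path of
# the second entry, or of the only entry if just one kernel is installed.
previous_kernel() {
  awk 'NR <= 2 { pkg = $1 }
       END { sub(/^kernel-/, "/boot/vmlinuz-", pkg); print pkg }'
}

if command -v grubby >/dev/null 2>&1; then
  echo "current default: $(grubby --default-kernel)"
  target=$(rpm -q --last kernel | previous_kernel)
  echo "candidate older kernel: $target"
  # To make it the default (as root), then reboot:
  # grubby --set-default="$target"
fi
```

If the problem disappears on the older kernel, that points at a kernel regression rather than nfs-utils or rpcbind.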
…
You don't have to specify versions; just run
yum downgrade rpcbind nfs-utils
and yum will work out the versions and their dependencies by itself. In this case, it appears that you'll have to downgrade some libraries as well.
HTH, Kay