NFS Server On CentOS 7.6.1810 Dying Without Reason


Good morning. In order to post proper documentation, what logs (or log levels) do I
need to troubleshoot a CentOS 7.6.1810 (kernel 3.10.0-957.27.2.el7.x86_64) box that is running an NFS server on top of LVM, on top of XFS, on top of VDO, on top of mdadm on a 6-SSD RAID6?

This physical NFS server is serving 2 NFS v4.2 shares to 2 physical KVM virtualization hosts.

When the remote NFS clients start doing intensive work (massive writes), the NFS server crashes and sometimes the whole Linux server reboots.

From the KVM side, this is all I get:
hvm002 kernel: nfs: server 10.10.10.2 not responding, timed out
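
By "log level" I mean, concretely, something like the nfsd/sunrpc debug flags and persistent journald storage (a rough sketch, assuming the stock nfs-utils rpcdebug tool; is there anything beyond this worth enabling?):

# on the NFS server: raise nfsd (and optionally sunrpc) debug verbosity
rpcdebug -m nfsd -s all
rpcdebug -m rpc -s all
# clear the flags again when done
rpcdebug -m nfsd -c all

# keep the journal across reboots so the last messages before a crash survive
mkdir -p /var/log/journal
systemctl restart systemd-journald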

thanks

——————-

2 thoughts on - NFS Server On CentOS 7.6.1810 Dying Without Reason

  • I managed to get NFS statistics just before the server rebooted. I can reproduce this every time. Question: am I saturating the SATA bus on the NFS server so violently with the SSDs that the server hard reboots?
    Perhaps due to the nature of RAID 6 operations (the extra parity writes)?
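
    To check that, the plan is to sample per-device throughput and utilization on the server during the copy; a minimal sketch, assuming the sysstat and nfs-utils packages:

    # extended per-device stats every 2 s, in MB/s; %util pinned near 100%
    # on the member disks or the md device would point at bus/array saturation
    iostat -x -m 2

    # server-side NFS RPC counters, taken before and after the test run
    nfsstat -s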

    Server info: AMD EPYC 7281, 16 cores
    32 GB RAM
    6x SATA3 SSD 1.9 TB in mdadm RAID6
    OS: CentOS 7.6
    SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA
    Controller [AHCI mode] (rev 51)

    NFS server parameters:
    /etc/sysconfig/nfs: RPCNFSDCOUNT=32
    All other parameters are at the defaults of a CentOS 7.6 install.
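
    For completeness, this is how that single change is applied and verified (a sketch, everything else left at the install defaults):

    # /etc/sysconfig/nfs -- only non-default line
    RPCNFSDCOUNT=32

    # apply and check the running thread count
    systemctl restart nfs-config nfs-server
    cat /proc/fs/nfsd/threads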

    From NFS client, share is mounted as:
    10.10.10.2:/mnt/lv_vdo0 on
    /rhev/data-center/mnt/10.10.10.2:_mnt_lv__vdo0 type nfs4
    (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.10.10.5,local_lock=none,addr=10.10.10.2)

    1- My test (spelled out as commands after the notes below):
    a- mount NFS share A1 from the NFS server on the NFS client
    b- mount NFS share B1 from the SAME NFS server on the SAME NFS client
    c- copy file1.iso from /mnt/A1/file1.iso to /mnt/B1/file1.iso

    Notes:
    If the file is small (tried with 1 GB ISO images), no problem happens.
    If the file is bigger (tried with a 3 GB ISO), the NFS server hard reboots. Just before the server hard reboots, iostat shows 16 nfsd worker threads and a sustained 211.03 MB/s of reads and 223.68 MB/s of writes.
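
    Spelled out as commands (the A1/B1 export paths and mount points are illustrative; the real options are the ones in the RHEV mount above):

    mkdir -p /mnt/A1 /mnt/B1
    mount -t nfs4 -o vers=4.2,rsize=1048576,wsize=1048576,soft,timeo=600,retrans=6 10.10.10.2:/mnt/A1 /mnt/A1
    mount -t nfs4 -o vers=4.2,rsize=1048576,wsize=1048576,soft,timeo=600,retrans=6 10.10.10.2:/mnt/B1 /mnt/B1
    # a ~1 GB image completes fine; a ~3 GB image reliably hard-reboots the server
    cp /mnt/A1/file1.iso /mnt/B1/file1.iso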

    2- From the NFS server, these are the last logs. The 14:26:52 entry is the last line before the NFS server suddenly hard reboots.

    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 1
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 1641
    slot_seqid 1640
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8b6a7be080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8b6a7be080
    opcnt 3 #3: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8bd7bbd000
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #3: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8bd7bba000
    Aug 30 14:16:12 hvmnas01 kernel: renewing client (clientid 5d694c2d/12bb1419)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd_dispatch: vers 4 proc 1
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #1/3: 53 (OP_SEQUENCE)
    Aug 30 14:16:12 hvmnas01 kernel: __find_in_sessionid_hashtbl:
    1567181869:314250265:6:0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 0
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 12545
    slot_seqid 12544
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd_dispatch: vers 4 proc 1
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #1/3: 53 (OP_SEQUENCE)
    Aug 30 14:16:12 hvmnas01 kernel: __find_in_sessionid_hashtbl:
    1567181869:314250265:6:0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 1
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 1642
    slot_seqid 1641
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8b6a7be080
    opcnt 3 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8b6a7be080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8b6a7be080
    opcnt 3 #3: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8bd7bba000
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #3: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8bd7bbd000
    Aug 30 14:16:12 hvmnas01 kernel: renewing client (clientid 5d694c2d/12bb1419)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd_dispatch: vers 4 proc 1
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #1/4: 53 (OP_SEQUENCE)
    Aug 30 14:16:12 hvmnas01 kernel: __find_in_sessionid_hashtbl:
    1567181869:314250264:5:0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 0
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 8036
    slot_seqid 8035
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 4 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/4: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(32: 81060001 87b2cdfe a744333d d3d8b5a0 d514631c 04000440)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 4 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/4: 3 (OP_ACCESS)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(32: 81060001 87b2cdfe a744333d d3d8b5a0 d514631c 04000440)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 4 #3: 3: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #4/4: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(32: 81060001 87b2cdfe a744333d d3d8b5a0 d514631c 04000440)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 4 #4: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8b38ab0000
    Aug 30 14:16:12 hvmnas01 kernel: renewing client (clientid 5d694c2d/12bb1418)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd_dispatch: vers 4 proc 1
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #1/3: 53 (OP_SEQUENCE)
    Aug 30 14:16:12 hvmnas01 kernel: __find_in_sessionid_hashtbl:
    1567181869:314250265:6:0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 0
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 12546
    slot_seqid 12545
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #3: 9: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound returned 0
    Aug 30 14:16:12 hvmnas01 kernel: –> nfsd4_store_cache_entry slot ffff9c8bd7bbd000
    Aug 30 14:16:12 hvmnas01 kernel: renewing client (clientid 5d694c2d/12bb1419)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd_dispatch: vers 4 proc 1
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #1/3: 53 (OP_SEQUENCE)
    Aug 30 14:16:12 hvmnas01 kernel: __find_in_sessionid_hashtbl:
    1567181869:314250265:6:0
    Aug 30 14:16:12 hvmnas01 kernel: nfsd4_sequence: slotid 0
    Aug 30 14:16:12 hvmnas01 kernel: check_slot_seqid enter. seqid 12547
    slot_seqid 12546
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #1: 53: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op ffff9c8bfcaa8080
    opcnt 3 #2: 22: status 0
    Aug 30 14:16:12 hvmnas01 kernel: nfsv4 compound op #3/3: 9 (OP_GETATTR)
    Aug 30 14:16:12 hvmnas01 kernel: nfsd: fh_verify(20: 00060001 9799be55
    5a4051e2 b5f423b9 801ff55f 00000000)
    Aug 30 14:26:52 hvmnas01 kernel: do_IRQ: 1.35 No irq handler for vector (irq -1)
    Aug 30 14:26:52 hvmnas01 rsyslogd: imjournal: 233164 messages lost due to rate-limiting
    Aug 30 14:30:01 hvmnas01 systemd: Started Session 38 of user root.
    Aug 30 14:38:29 hvmnas01 journal: Runtime journal is using 8.0M (max allowed 1.5G, trying to leave 2.3G free of 15.5G available → current limit 1.5G).
    Aug 30 14:38:29 hvmnas01 kernel: Initializing cgroup subsys cpuset
    Aug 30 14:38:29 hvmnas01 kernel: Initializing cgroup subsys cpu
    Aug 30 14:38:29 hvmnas01 kernel: Initializing cgroup subsys cpuacct
    Aug 30 14:38:29 hvmnas01 kernel: Linux version 3.10.0-957.27.2.el7.x86_64 (mockbuild@kbuilder.bsys.CentOS.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Mon Jul 29 17:46:05 UTC 2019
    Aug 30 14:38:29 hvmnas01 kernel: Command line:
    BOOT_IMAGE=/vmlinuz-3.10.0-957.27.2.el7.x86_64
    root=/dev/mapper/CentOS-root ro crashkernel=auto rd.lvm.lv=CentOS/root rd.lvm.lv=CentOS/swap rhgb quiet LANG=en_US.UTF-8 skew_tick=1
    Aug 30 14:38:29 hvmnas01 kernel: e820: BIOS-provided physical RAM map:
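
    Given that rsyslog dropped 233164 messages to rate-limiting and no panic ever made it to disk before the reset, a sketch of raising the imjournal rate limit and making sure kdump is armed for the next run (crashkernel=auto is already on the command line above; the values here are illustrative):

    # /etc/rsyslog.conf -- legacy imjournal directives, raise the rate limit
    $imjournalRatelimitInterval 300
    $imjournalRatelimitBurst 500000

    # make sure a vmcore is written to /var/crash on the next panic
    systemctl enable kdump
    systemctl start kdump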

    ———————
    Erick Perez
    ———————

  • First I’d replace the network card, preferably with something like an Intel e1000. Perhaps use a dedicated network card for the server itself and another card for the KVM clients.