BUG: Soft Lockup – CPU#0 Stuck For 36s! [swapper/0:0]

This bug is reported only on VMs with CentOS 7 running on VMware ESXi 5.1. The vSphere performance graph shows high CPU consumption and disk activity only on the CentOS 7 VMs. Sometimes I cannot connect remotely over ssh (timeout error).

The details of the latest issues were reported to retrace.fedoraproject.org.

Do you have a hint?

[root@vmguest ~]# abrt-cli list
id c52b463b15cfa94af7a96f237e5f525332750dd3
reason: systemd-journald killed by SIGABRT
time: Tue 16 Aug 2016 03:10:52 PM CLST
cmdline: /usr/lib/systemd/systemd-journald
package: systemd-219-19.el7_2.12
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/ccpp-2016-08-16-15:10:52-458
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/d5f5d4f75b200eeab2f83c8340a37c5d507e29a3

id e6955aa621a0d296d4cdd05421523885e85a179b
reason: BUG: soft lockup – CPU#0 stuck for 36s! [swapper/0:0]
time: Tue 09 Aug 2016 04:46:33 PM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.28.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-08-09-16:46:33-3165-0
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/4e231b49f72864c3487d8985337f9d41ce43da28

id 402f22e7214ea6bddcb0db9a9315527be245f943
reason: systemd-logind killed by SIGABRT
time: Wed 20 Jul 2016 06:10:55 AM CLT
cmdline: /usr/lib/systemd/systemd-logind
package: systemd-219-19.el7_2.9
uid: 0 (root)
count: 3
Directory: /var/spool/abrt/ccpp-2016-07-20-06:10:55-32283
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/307b26a77cc6d5005ce2fdf18ff010fe3dc94401

id 58a46f4a45699384ad74850f53e749c702ee7b0b
reason: systemd-journald killed by SIGABRT
time: Tue 02 Aug 2016 05:44:50 PM CLT
cmdline: /usr/lib/systemd/systemd-journald
package: systemd-219-19.el7_2.11
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/ccpp-2016-08-02-17:44:50-454
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/1af5058d2b9650a1d91c676fe15feb5612afccb9

id f4a35ca85a046e74bf4a4382c9f9a5c8dd8be149
reason: BUG: soft lockup – CPU#0 stuck for 24s! [vmtoolsd:579]
time: Tue 02 Aug 2016 06:45:29 AM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.22.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-08-02-06:45:29-11859-0
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/87298dcaf6b7dea7a92b136e3332209f3fd7c4d2

id edaec629ccce62943e9bdb514fe6e319ab320669
reason: BUG: soft lockup – CPU#0 stuck for 27s! [khugepaged:51]
time: Tue 26 Jul 2016 06:00:13 PM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.18.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-07-26-18:00:08-641-4
Reported: cannot be reported

id b707fd06199e2e1edcb105878a46e238a50746f3
reason: BUG: soft lockup – CPU#3 stuck for 23s! [systemd-journal:422]
time: Tue 26 Jul 2016 06:00:10 PM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.18.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-07-26-18:00:08-641-1
Reported: cannot be reported

id a39eead9c9f75c2dc94df0852cd24260f414b80b
reason: BUG: soft lockup – CPU#2 stuck for 22s! [swapper/2:0]
time: Tue 26 Jul 2016 06:00:08 PM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.18.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-07-26-18:00:08-641-0
Reported: cannot be reported

id fe7e2542e93848e41d9d702af5c4d12b1d833b72
reason: systemd-logind killed by SIGABRT
time: Wed 20 Jul 2016 08:20:29 AM CLT
cmdline: /usr/lib/systemd/systemd-logind
package: systemd-219-19.el7_2.9
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/ccpp-2016-07-20-08:20:29-32660

id e58538a3aa1f01f384fe9bdd40a693f9d2f32889
reason: BUG: soft lockup – CPU#2 stuck for 24s! [kworker/2:0:31607]
time: Mon 13 Jun 2016 04:48:05 PM CLT
cmdline: BOOT_IMAGE=/vmlinuz-3.10.0-327.18.2.el7.x86_64 root=/dev/mapper/CentOS_vmguest-root ro crashkernel=auto rd.lvm.lv=CentOS_vmguest/root rd.lvm.lv=CentOS_vmguest/swap rhgb quiet LANG=en_US.UTF-8
package: kernel
uid: 0 (root)
count: 1
Directory: /var/spool/abrt/oops-2016-06-13-16:48:05-10215-0

id a4dd378d494c8eee43407b572e2b314f38f7c5b9
reason: systemd-journald killed by SIGABRT
time: Sat 28 May 2016 06:24:40 PM CLT
cmdline: /usr/lib/systemd/systemd-journald
uid: 0
Directory: /var/spool/abrt/ccpp-2016-05-28-18:24:40-6058
Reported: cannot be reported
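
A listing this long is easier to read once the entries are grouped by cause. The following is only a sketch, not part of the original report: `summarize_reasons` is a hypothetical helper name, and it assumes the text it reads contains `reason:` fields like the ones above. It counts how many problems fall under each reason and folds every soft lockup into a single bucket regardless of CPU number or duration.

```shell
# Hypothetical helper (not part of abrt): count problems by their "reason:" field.
# Reads `abrt-cli list` style text on stdin and prints "<count><TAB><reason>".
summarize_reasons() {
    awk '
        /reason:/ {
            # Keep only the text after "reason:", even if other fields precede it.
            sub(/^.*reason:[ \t]*/, "")
            # Fold all soft lockups together, whatever the CPU, duration, or task.
            if ($0 ~ /soft lockup/) $0 = "soft lockup"
            count[$0]++
        }
        END { for (r in count) printf "%d\t%s\n", count[r], r }
    ' | sort -rn
}
```

It could be used as `abrt-cli list | summarize_reasons`. Note that this counts listed problems; it does not add up the separate `count:` field of each entry.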

14 thoughts on - BUG: Soft Lockup – CPU#0 Stuck For 36s! [swapper/0:0]

  • Does this happen (only) while taking or consolidating snapshots? The VM
    is suspended during these operations and the OS isn’t too crazy about it, especially if you have slow storage.

    Jack

  • Yes, I tried it, but it does not exist:

    vmguest # cat /proc/sys/kernel/softlockup_thresh
    cat: /proc/sys/kernel/softlockup_thresh: No such file or directory

  • 2016-08-18 13:32 GMT-04:00 JJB :

    Nope, no snapshots. Just plain running. In fact, many times the guests are under light usage (internal instrumentation, no external VMware stats). We’re investigating because we do have reasons to believe that our provider is probably overcommitting or overselling (not out of malice, AFAIK).

    HTH, Carlos.

  • No, I don’t use snapshots.

    It is a Dell 2 TB Enterprise 3.5″ SATA Hard Drive.

    The disk activity of the host is normal to low. Few VM’s.

  • Not sure if this was the last email on this. If not, ignore me. However, I found a post saying that on newer operating systems you set the watchdog_thresh value instead of softlockup_thresh:
    http://askubuntu.com/questions/592412/why-is-there-no-proc-sys-kernel-softlockup-thresh
    It is an Ubuntu post, but on my CentOS 7 system this parameter exists and softlockup_thresh does not. I have set it, but I will need to see whether I still get the CPU lockup messages on my VM. I hope this helps.

    KM

    From: correomm
    To: CentOS mailing list
    Sent: Thursday, August 18, 2016 1:50 PM
    Subject: Re: [CentOS] BUG: soft lockup – CPU#0 stuck for 36s! [swapper/0:0]

    Yes, I tried it, but it does not exist:

    vmguest # cat /proc/sys/kernel/softlockup_thresh
    cat: /proc/sys/kernel/softlockup_thresh: No such file or directory

    CentOS mailing list CentOS@CentOS.org https://lists.CentOS.org/mailman/listinfo/CentOS

  • All,

    This happens on all of our CentOS 7 VMs, but as stated in the email trail, the file softlockup_thresh does not exist. Should it be added? What is the best way to get rid of this behavior? Thanks in advance, and sorry if I missed something along the way.

    KM

    From: correomm
    To: CentOS mailing list
    Sent: Thursday, August 18, 2016 1:55 PM
    Subject: Re: [CentOS] BUG: soft lockup – CPU#0 stuck for 36s! [swapper/0:0]

    Yes, I tried it, but it does not exist:

    vmguest # cat /proc/sys/kernel/softlockup_thresh
    cat: /proc/sys/kernel/softlockup_thresh: No such file or directory

  • Never saw this email... Did anyone get it? Does anyone know how to fix this? Thanks again.

    From: KM
    To: CentOS mailing list
    Sent: Monday, August 7, 2017 11:26 AM
    Subject: Re: [CentOS] BUG: soft lockup – CPU#0 stuck for 36s! [swapper/0:0]

    All,

    This happens on all of our CentOS 7 VMs, but as stated in the email trail, the file softlockup_thresh does not exist. Should it be added? What is the best way to get rid of this behavior? Thanks in advance, and sorry if I missed something along the way.

    KM

  • Yes, I see this behavior as well. Never have found a solution – other than increasing the threshold and pretending it doesn’t happen.

  • On bare metal it usually means some hardware has gone into an uninterruptible IRQ and the CPU is waiting for it to go away. I saw this a while ago on systems with Green disk drives: anything that went to talk to the drive would just sit for a long time while the drive spun up, the cache was validated, etc. Another cause would be drives on USB when some other USB device started needing input. Since it is a hub environment, they can spew for a while and the CPU would report a soft lockup.

  • Not hardly. We discovered green drives were nothing we wanted right after they came out. And I’m talking at work, with servers, all drives are either enterprise, as we bought them, or NAS-rated (e.g. WD Red).

    mark
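
Regarding the watchdog_thresh suggestion earlier in the thread: the kernel's sysctl documentation describes the soft lockup threshold as twice kernel.watchdog_thresh, with a default of 10 seconds, so a 20 second threshold would be consistent with the 22s to 36s reports above. The sketch below is only illustrative. `softlockup_threshold` is a hypothetical helper name, and its optional argument (an alternate root directory) exists purely so the function can be tried against a fake proc tree.

```shell
# Hypothetical helper: print the soft lockup threshold in seconds.
# $1 (optional) is an alternate root directory, assumed here only for testing.
softlockup_threshold() {
    dir="${1:-}/proc/sys/kernel"
    if [ -r "$dir/watchdog_thresh" ]; then
        # Kernels with watchdog_thresh: soft lockup threshold is 2 * watchdog_thresh.
        echo $(( $(cat "$dir/watchdog_thresh") * 2 ))
    elif [ -r "$dir/softlockup_thresh" ]; then
        # Older kernels expose the soft lockup threshold directly.
        cat "$dir/softlockup_thresh"
    else
        echo "no lockup threshold sysctl found under $dir" >&2
        return 1
    fi
}
```

Running `softlockup_threshold` with no argument reads the live system. The value can be raised as root with `sysctl -w kernel.watchdog_thresh=<seconds>`, but as noted above, that only quiets the message and does not address whatever is stalling the guest.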