Debugging Kernel Problems


I'm not sure this is the correct subject line, but my recently installed CentOS build (Linux localhost.localdomain 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) periodically freezes: it locks up completely, with no activity and nothing in the logs, and stays dead until I power it off and reboot.

I’ve looked around for the _best_ way to set up debugging, but a lot has been written about it by many different parties and I’m not sure which source is definitive.

I did try booting with the ‘CentOS Linux (3.10.0-229.14.1.el7.x86_64) 7 (Core) with debugging’ option, but that didn’t bring me any closer to a solution. dmesg did report this, however:

dmesg|grep debug
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.14.1.el7.x86_64 root=UUID28d2da-784c-4b18-868c-f9858bceea6d ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.14.1.el7.x86_64 root=UUID28d2da-784c-4b18-868c-f9858bceea6d ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug
[ 0.940176] ehci-pci 0000:00:12.2: debug port 1
[ 0.946472] ehci-pci 0000:00:13.2: debug port 1
[ 1.238335] systemd[1]: Unknown kernel switch systemd.debug. Ignoring.
[ 5.083981] SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
[ 5.206423] systemd[1]: Unknown kernel switch systemd.debug. Ignoring.

Which I found interesting.
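The "Unknown kernel switch systemd.debug. Ignoring." lines mean systemd did not recognize the parameter that the debugging boot entry passed, so that entry changed nothing on the systemd side. As a minimal sketch, the following shell function lists the systemd.* switches present on a kernel command line so they can be checked against the systemd documentation (which, for example, describes systemd.log_level=debug). The function name list_systemd_switches is a hypothetical helper, and the sample string is abbreviated from the output above.

```shell
# Sketch: list the systemd.* switches found on a kernel command line.
# list_systemd_switches is a hypothetical helper name, not a standard tool.
list_systemd_switches() {
    # Put each space-separated parameter on its own line, then keep
    # only the ones that start with "systemd.".
    # "|| true" keeps the exit status 0 when nothing matches.
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^systemd\.' || true
}

# Abbreviated sample taken from the dmesg output in the post.
sample='BOOT_IMAGE=/vmlinuz-3.10.0-229.14.1.el7.x86_64 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug'
list_systemd_switches "$sample"    # prints: systemd.debug
```

On a running system the same check can be made against the live command line with list_systemd_switches "$(cat /proc/cmdline)".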

So, can anyone point me to a detailed guide to debugging my build that will help me solve the lockups? BTW, I suspect my Atheros-equipped wireless PCI card, but I can’t be sure because I have no real proof that it’s the issue:

dmesg|grep -i atheros
[ 2.231264] ath5k: phy0: Atheros AR2414 chip found (MAC: 0x79, PHY: 0x45)

Thanks in advance for your patience and assistance.

4 thoughts on - Debugging Kernel Problems

  • “nothing in the logs”? Have you run memtest for an extended period of time? You might first want to eliminate the possibility that this is a hardware problem.

    Akemi

  • If you have hardware RAID on this machine, try mounting your XFS partitions with nobarrier. We had similar freezes and this helped us.


    Marius Vaitiekūnas
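Before trying the nobarrier suggestion it helps to know which XFS filesystems already have barriers turned off. The sketch below reads mount entries in /proc/mounts format and reports each XFS mount point as "barrier" or "nobarrier". The function name xfs_barrier_status and the /data mount point in the comments are assumptions for illustration only. Note that disabling barriers is generally only considered safe when the RAID controller has a battery-backed write cache.

```shell
# Sketch: report whether each XFS mount has barriers disabled.
# xfs_barrier_status is a hypothetical helper name.
# It reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
xfs_barrier_status() {
    awk '$3 == "xfs" {
        state = "barrier"                # assume barriers are on
        n = split($4, opts, ",")         # split the option list on commas
        for (i = 1; i <= n; i++)
            if (opts[i] == "nobarrier") state = "nobarrier"
        print $2, state                  # mount point and barrier state
    }'
}

# Usage on a live system:
#   xfs_barrier_status < /proc/mounts
#
# To try the suggestion on one filesystem (as root; /data is a
# hypothetical mount point):
#   mount -o remount,nobarrier /data
```

A remount only lasts until the next reboot; making the change permanent would mean adding nobarrier to the options field of the matching /etc/fstab entry.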

  • Akemi Yagi wrote:

    Actually, we’ve had that occasionally on a number of boxes. I *think* they were all SuperMicros (sold by Penguin). They become unresponsive: when I plug in the monitor-on-a-stick, there’s no response at all on the console and keys do nothing. We have to power cycle them, and nothing ever shows up, not in dmesg.old, not in messages, nowhere. We never figured it out.

    mark