Panic On Boot With 7.3 Kernels When Decrypting Hard Drive

I was excited to update my system to CentOS 7.3/1611 yesterday when the release announcement came through, but since then I’ve been running into some serious kernel issues.

I’ve been successfully running 7.2 on a ThinkPad X301 for about 6 months now, with an encrypted disk that unlocks with a password at boot time. I think that’s LUKS on LVM? I didn’t tinker with the default encryption options so it’s nothing too exotic: unencrypted /boot on sda1, encrypted system/home on sda2. I’ve used both the standard 3.10 kernels and 4.4 kernels from the kernel-lt package from ELRepo with no problems so far.

When performing the update to 7.3 yesterday, the packages kernel-lt-4.4.38-1.el7.elrepo.x86_64 and kernel-3.10.0-514.2.2.el7.x86_64 were installed. With both of those, my system comes to a complete, frozen halt on boot, after I’ve entered my disk decryption key and before I see a login screen. The little progress spinner freezes, and the caps lock light on my keyboard starts blinking, which I’ve been told indicates a kernel panic. No key combination, not even the “magic SysRq” combinations, does anything to recover the system; the only way out is to cut the power. I was able to fall back to the still-installed 4.4.36-1.el7.elrepo.x86_64 kernel with no problems. On a whim, I tried installing the latest mainline kernel from ELRepo as well (kernel-ml-4.9.0-1.el7.elrepo.x86_64) and encountered a similar hard lock, minus the blinkenlight in the caps lock key.

Has anyone else had a similar issue? Does anything in what I’ve described point to an obvious solution? It really just seems like the problems are with kernels that have come through following the 7.3 release. I’m still somewhat new at getting into the inner workings of Linux, but is there a log that might shed some light on what went wrong?

Thanks in advance,

–C

3 thoughts on - Panic On Boot With 7.3 Kernels When Decrypting Hard Drive

  • You need to capture the actual panic, or else it’s just guessing. Boot without the rhgb quiet kernel args.

    Also check it’s not just something silly like running out of space on /boot causing an incomplete/corrupt initramfs.

    jh
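
To make the two checks above concrete, here is a minimal Python sketch. It is only an illustration, not a tool from this thread: the function names `strip_boot_args` and `free_fraction` are hypothetical, and the command line shown in the usage section is a made-up example rather than the poster's actual boot entry. The first function removes the `rhgb` and `quiet` arguments from a kernel command line string; the second reports how much of a filesystem is free.

```python
import shutil


def strip_boot_args(cmdline, drop=("rhgb", "quiet")):
    """Return the kernel command line with the given bare arguments removed."""
    return " ".join(arg for arg in cmdline.split() if arg not in drop)


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    example = "ro crashkernel=auto rhgb quiet"
    print(strip_boot_args(example))
    # "/" is used so the sketch runs anywhere; on the affected machine
    # the path of interest would be "/boot".
    print("{:.0%} free".format(free_fraction("/")))
```

The sketch only edits a string; actually booting without those arguments still means editing the boot entry itself.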

  • Thanks for the advice. df reports that /boot has 80% free space right now; I would assume that’s enough but I don’t know. I need to remove a few of the old kernels I’ve got taking up space. A few initramfs-*kdump.img files have also appeared; I assume they might be helpful in tracking down the problems?

    I changed the boot arguments (and also added the args to automatically reboot following a panic, which fixes one of my problems), and I’m definitely getting kernel panics. Both 4.4.39-1 and 4.9.0-1 end up with pretty much the same issue: “Kernel panic – not syncing: Fatal exception in interrupt.” I know the details of the panic can/should be kept in a log somewhere, but that’s where my limited experience fails me: I see references to Xorg.0.log and boot.log online, but those seem to only keep the details of the most recent boot. I used the not-so-high-tech method of taking a photo of my laptop’s screen output before it restarted, and it seems like the panic details for both kernels started with a “BUG: unable to handle kernel NULL pointer dereference at 0000000000000008”, followed shortly by an “Oops: 0000 [#1] SMP.”

    Beyond that, the behavior seems to be somewhat inconsistent. The first time I booted the 4.9 kernel without the quiet rhgb args, it seemed to hang on something related to ipv6 setup without actually getting into a panic, but I foolishly didn’t capture any of the other details. The pre-7.3 4.4.36-1 kernel that had been working just fine now seems to fail in a panic every other boot, and I’m currently working from within the latest 3.10.0-514 kernel that came with 7.3, which had previously failed to boot in a panic as well.

    This is somewhat embarrassing because I was feeling pretty confident that I had gotten the hang of regular Linux use, but now it feels like I somehow caused some major problems when I updated to 7.3 (though I didn’t do anything beyond “yum update”). If I were back on Windows, this would be the part where I back up my files and start from a fresh install, since that was usually the least painful way to fix major system problems. Is there a possibility that this is fixable, or would that be a good strategy to employ here? Or does this suggest something more serious, like a hardware issue?

    My apologies for dragging beginner’s issues into a mailing list for an enterprise OS; CentOS just works so nicely: it is decently responsive with low resource usage and was (up until now) completely stable on my aging X301.

    Thanks again.
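
Since the panic text in this comment was captured by photographing the screen, one way to compare panics across kernels is to type up or save the console output and pull out only the headline lines. The following is a rough Python sketch under that assumption; the function name `key_panic_lines` and the choice of markers are illustrative, picked to match the three messages quoted above, and the result is not a complete oops parser.

```python
import re

# Markers chosen to match the messages quoted in this thread.
PANIC_MARKERS = re.compile(r"BUG:|Oops:|Kernel panic")


def key_panic_lines(log_text):
    """Return the stripped lines of log_text that contain a panic marker, in order."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if PANIC_MARKERS.search(line)
    ]


if __name__ == "__main__":
    # Sample text assembled from the messages quoted in the comment above.
    sample = (
        "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008\n"
        "Oops: 0000 [#1] SMP\n"
        "Kernel panic - not syncing: Fatal exception in interrupt\n"
    )
    for line in key_panic_lines(sample):
        print(line)
```

Running the same filter over the output from each failing kernel makes it easier to see whether they all stop at the same address and message.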

  • I have the exact same problem. I upgraded to kernel-3.10.0-514.2.2.el7.x86_64 yesterday and the system fails to boot. Downgrading to the old kernel fixes the issue temporarily. Reinstalling the new kernel doesn’t help. I’m also using encrypted disks. I’ll see if I can get some details (a photo of the screen, for example), but the server is heavily used so I really can’t afford much downtime. I was very happy when I got it up again!