Kernel Updates Do Not Boot – Always Boots Oldest Kernel

This issue has been around for some months, but other things keep crowding out a fix.

uname gives me

3.10.0-1160.36.2.el7.x86_64 #1 SMP Wed Jul 21 11:57:15 UTC 2021

yet I have

3.10.0-1160.76.1.el7.x86_64
3.10.0-1160.81.1.el7.x86_64
3.10.0-1160.83.1.el7.x86_64
3.10.0-1160.88.1.el7.x86_64

installed.
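The mismatch described above can be checked by comparing the running release with the newest installed one. This is a minimal sketch, not a definitive tool: the function names `newest_kernel` and `kernel_is_current` are made up for illustration, and it assumes GNU `sort` with `-V` (version sort) is available.

```shell
# Print the highest release from a list of kernel release strings on stdin.
# Assumes GNU sort, whose -V option orders version-like strings.
newest_kernel() {
    sort -V | tail -n 1
}

# Succeed (exit 0) only if the release given as $1 equals the newest
# release in the list read from stdin.
kernel_is_current() {
    running=$1
    newest=$(newest_kernel)
    [ "$running" = "$newest" ]
}
```

On an RPM-based system it could be used along these lines: `rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | kernel_is_current "$(uname -r)" || echo "not running the newest installed kernel"`.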

The system uses UEFI to boot.

sudo ls -l /sys/firmware/efi gives:

total 0
-r--r--r--.  1 root root 4096 Feb 19 16:47 config_table
drwxr-xr-x.  2 root root    0 Feb 19 16:47 efivars
-r--r--r--.  1 root root 4096 Mar 14 17:57 fw_platform_size
-r--r--r--.  1 root root 4096 Feb 19 16:47 fw_vendor
drwxr-xr-x.  2 root root    0 Mar 14 17:57 mok-variables
-r--r--r--.  1 root root 4096 Feb 19 16:47 runtime
drwxr-xr-x.  9 root root    0 Feb 19 16:47 runtime-map
-r--------.  1 root root 4096 Feb 19 16:47 systab
drwxr-xr-x. 65 root root    0 Mar 14 17:57 vars

and

sudo efibootmgr

gives:

BootCurrent: 000F
BootOrder: 000F,000D,000B,000E,0008,0000,0002,0003,0004,0005,0006,0007
Boot0000* CD/DVD Rom
Boot0002* PXE Network
Boot0003  Enter Setup
Boot0004  Boot Devices
Boot0005  Boot Manager
Boot0006  Setup
Boot0007  Diagnostics
Boot0008* Embedded Hypervisor
Boot000B* CentOS Linux
Boot000D* CentOS-AltDrv
Boot000E* Hard Disk 3
Boot000F* CentOS-MainDrv

This is a remote server, so I need a sure-fire fix. My previous attempts have either had no effect (the old kernel still boots) or left the machine hung, so that I needed to make a trip to the site.

This issue could be a leftover from my initial setup, when I
installed two 3.x TB SSDs and needed to change manually from BIOS/grub2
boot to UEFI.

I have already spent tens of hours on this system and just want it to run the latest kernels, for obvious reasons.

Some other items:

sudo grep "^menuentry" /boot/grub2/grub.cfg | cut -d "'" -f2

gives:

CentOS Linux (3.10.0-1160.88.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.83.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.81.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.76.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-a39773847cf34651bc34d02222566f53) 7 (Core)

indicating that .88.1 should boot.

sudo grub2-editenv list

gives:

saved_entry=CentOS Linux (3.10.0-1160.88.1.el7.x86_64) 7 (Core)

also as expected.

/etc/default/grub exists and contains

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.md.uuid=066ffecb:69137a0b:4e579b4f:dfbf1696 rd.md.uuid=bd87f682:e6df10e2:d2a6e247:834133f7 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

the /boot/grub2/grubenv contains

# GRUB Environment Block
saved_entry=CentOS Linux (3.10.0-1160.88.1.el7.x86_64) 7 (Core)
#######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################

All of these point to the correct kernel version, yet the system always boots the old .36.2 version.

Just realized these files only relate to BIOS boot, and my system is UEFI boot.

Now documentation seems to get scarce.

It seems the boot files now reside in

/boot/efi/EFI/CentOS

AND

/boot/efi2/EFI/CentOS

although looking at timestamps the latter directory is not being updated.

/boot/efi/EFI/CentOS contains

total 7028
-rwx------. 1 root root     134 Aug  1  2020 BOOT.CSV
-rwx------. 1 root root     134 Aug  1  2020 BOOTX64.CSV
drwx------. 2 root root    4096 Dec 23 22:01 fonts
-rwx------. 1 root root    8589 Mar 14 17:51 grub.cfg
-rwx------. 1 root root    1024 Aug 26  2021 grubenv
-rwx------. 1 root root 1125704 Dec 17 06:13 grubx64.efi
-rwx------. 1 root root 1154640 Aug  1  2020 mmx64.efi
-rwx------. 1 root root 1154640 Aug  1  2020 MokManager.efi
-rwx------. 1 root root 1243864 Aug  1  2020 shim.efi
-rwx------. 1 root root 1237824 Aug  1  2020 shimx64-CentOS.efi
-rwx------. 1 root root 1243864 Aug  1  2020 shimx64.efi

and we see that the grub.cfg is being updated.

However, here the grubenv file contains

# GRUB Environment Block
saved_entry=CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)
#######################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################

So maybe I have the reason.
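The comparison made by hand above, reading `saved_entry` from both grubenv files, can be sketched as a small script. The helper names `saved_entry` and `compare_grubenv` are hypothetical, and the sketch assumes each file holds a plain `saved_entry=...` line as shown in the listings:

```shell
# Print the value of saved_entry from the grubenv file given as $1.
# A grubenv file is a header line, key=value lines, then '#' padding.
saved_entry() {
    sed -n 's/^saved_entry=//p' "$1"
}

# Compare saved_entry in two grubenv files; exit status 1 on mismatch.
compare_grubenv() {
    a=$(saved_entry "$1")
    b=$(saved_entry "$2")
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "MISMATCH"
        echo "  $1: $a"
        echo "  $2: $b"
        return 1
    fi
}
```

With the paths from this post it would be called as `compare_grubenv /boot/grub2/grubenv /boot/efi/EFI/CentOS/grubenv` (run as root, since the EFI directory is not world-readable).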

Now how to cure??

I can obviously edit this file to point to the latest kernel, but this would likely need to be done each time I update the system and get a new kernel – not my preferred option.

How come the BIOS boot files are updated each time, but not the UEFI-based
ones? Well, grub.cfg is, but not grubenv.

Can I edit /etc/default/grub and change

GRUB_DEFAULT=saved

to something else?

Stumped.

I think I have some basic UEFI install stuff missing. Back when I
manually changed the boot system from BIOS to UEFI, I was told only anaconda does this at install time, and I was unwilling to do a complete reinstall on a system already in production. After much trial and error I did get it booting reliably; only this one issue remains.

Any pointers appreciated.

TIA Rob.

8 thoughts on - Kernel Updates Do Not Boot – Always Boots Oldest Kernel

  • OK,

    I found out why it doesn’t boot any kernel except 36.2.

    The system reports that it cannot find

    vmlinuz-3.10.0-1160.88.1.el7.x86_64

    or any of the others, except for vmlinuz-3.10.0-1160.36.2.el7.x86_64.

    Hence a manual selection from the grub menu, when in front of the machine, will only load the 36.2 kernel.

    I found that under /boot/grub2 there were two .rpmnew files that mucked up the symbolic link to the grubenv file – so fixed that and did a reinstall of the latest kernel.

    Now all the grub and efi files appear to update correctly – progress.

    Now I just need to work out why the EFI boot process can see the old
    (original) kernel (36.2) but none of the later ones.

    Any ideas where to look for this? It seems a much more fundamental problem related to kernel install and EFI booting.

    Thanks Rob

  • I had something like this happen some years ago on a workstation with
    2-disk (software/Linux) RAID 1. It turned out one of the disks had been ejected from the RAID array. The ejected disk was the one getting the updates, but since it was no longer in the array it wasn’t the one being booted; the system booted from the other disk, which wasn’t getting the updates.

    Fred

  • On 14.03.23 at 12:30, Rob Kampen wrote:

    What’s the _complete_ output of cat /etc/default/grub?

  • Here is the entire output of

    cat /etc/default/grub

    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=0
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="crashkernel=auto rd.md.uuid=066ffecb:69137a0b:4e579b4f:dfbf1696 rd.md.uuid=bd87f682:e6df10e2:d2a6e247:834133f7 rhgb quiet"
    GRUB_DISABLE_RECOVERY="true"

    I have only changed GRUB_DEFAULT from “saved” to “0”

    I have also run

    /usr/sbin/grub2-mkconfig -o /boot/efi/EFI/CentOS/grub.cfg

    and seen the grub.cfg and grubenv updated in /boot/efi/EFI/CentOS

    At this point I think I have grub doing its stuff in the correct
    folder/destination used by UEFI for booting.

    When I look at grub.cfg there is some stuff I cannot understand

    There are five menuentry stanzas in this file, like:

    menuentry 'CentOS Linux (3.10.0-1160.88.1.el7.x86_64) 7 (Core)' --class CentOS --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-1160.81.1.el7.x86_64-advanced-7276336b-d2f2-4b94-b491-ad8c5662acb3' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_gpt
        insmod part_gpt
        insmod diskfilter
        insmod mdraid1x
        insmod xfs
        set root='mduuid/bd87f682e6df10e2d2a6e247834133f7'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint='mduuid/bd87f682e6df10e2d2a6e247834133f7' f12be7f3-a6c6-4b90-8c51-286c32d11d12
        else
          search --no-floppy --fs-uuid --set=root f12be7f3-a6c6-4b90-8c51-286c32d11d12
        fi
        linuxefi /vmlinuz-3.10.0-1160.88.1.el7.x86_64 root=UUID=7276336b-d2f2-4b94-b491-ad8c5662acb3 ro crashkernel=auto rd.md.uuid=066ffecb:69137a0b:4e579b4f:dfbf1696 rd.md.uuid=bd87f682:e6df10e2:d2a6e247:834133f7 rhgb quiet LANG=en_US.UTF-8
        initrdefi /initramfs-3.10.0-1160.88.1.el7.x86_64.img
    }

    the above is the latest kernel – doesn’t boot as the console tells me it cannot load the vmlinuz file

    the kernel that boots looks like:

    menuentry 'CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)' --class CentOS --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-1160.36.2.el7.x86_64-advanced-7276336b-d2f2-4b94-b491-ad8c5662acb3' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_gpt
        insmod part_gpt
        insmod diskfilter
        insmod mdraid1x
        insmod xfs
        set root='mduuid/bd87f682e6df10e2d2a6e247834133f7'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint='mduuid/bd87f682e6df10e2d2a6e247834133f7' f12be7f3-a6c6-4b90-8c51-286c32d11d12
        else
          search --no-floppy --fs-uuid --set=root f12be7f3-a6c6-4b90-8c51-286c32d11d12
        fi
        linuxefi /vmlinuz-3.10.0-1160.36.2.el7.x86_64 root=UUID=7276336b-d2f2-4b94-b491-ad8c5662acb3 ro crashkernel=auto rd.md.uuid=066ffecb:69137a0b:4e579b4f:dfbf1696 rd.md.uuid=bd87f682:e6df10e2:d2a6e247:834133f7 rhgb quiet
        initrdefi /initramfs-3.10.0-1160.36.2.el7.x86_64.img
    }

    I see that the first line names the kernel in brackets (correctly) but the $menuentry_id_option ‘…..’ doesn’t make sense to me.

    For the kernel that boots (3.10.0-1160.36.2) the entry is
    'gnulinux-3.10.0-1160.36.2.el7.x86_64-advanced-7276336b-d2f2-4b94-b491-ad8c5662acb3'

    For kernels that don’t boot, e.g. (3.10.0-1160.88.1), we see

    'gnulinux-3.10.0-1160.81.1.el7.x86_64-advanced-7276336b-d2f2-4b94-b491-ad8c5662acb3'

    and this entry just seems wrong.

    Firstly, the kernel version doesn’t match: it has been set to 81.1 rather than 88.1.

    Secondly, the last part of the line is the same for every menuentry, namely

    -advanced-7276336b-d2f2-4b94-b491-ad8c5662acb3

    Where does this come from, and what is this part for?

    Thanks Rob
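The title-versus-id comparison described in this comment can be automated. The following is only a sketch: `check_menuentry_ids` is a hypothetical name, and it assumes each `menuentry` line in grub.cfg carries the kernel release in parentheses in the title and an id of the form `gnulinux-<release>-advanced-<uuid>`, as in the stanzas quoted above. A reported mismatch only flags an entry worth a closer look; it does not by itself show that the entry fails to boot.

```shell
# List menuentry lines in the grub.cfg given as $1 whose title release
# differs from the release embedded in the $menuentry_id_option string.
check_menuentry_ids() {
    awk '
    /^menuentry / {
        title = ""; id = ""
        # Release inside the first pair of parentheses in the title.
        if (match($0, /\([^)]*\)/))
            title = substr($0, RSTART + 1, RLENGTH - 2)
        # Release between "gnulinux-" (9 chars) and "-advanced-" (10 chars).
        if (match($0, /gnulinux-.*-advanced-/))
            id = substr($0, RSTART + 9, RLENGTH - 19)
        if (title != "" && id != "" && title != id)
            printf "title %s has id for %s\n", title, id
    }' "$1"
}
```

For the file discussed here it would be run as `check_menuentry_ids /boot/efi/EFI/CentOS/grub.cfg`.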

  • I may be wrong here but IIRC, using grub2-mkconfig as described in the Grub docs didn’t work for me when I tried to use it years ago.

    I think you have to find out what is done when installing kernels and try to find out where it goes wrong in your case. When you look at ‘rpm -q --scripts kernel’ you can see that new kernels are registered with the script ‘/usr/sbin/new-kernel-pkg’. I suggest analyzing what exactly it does. I think it calls ‘grubby’ to do further work…

    Regards, Simon

  • If you have not already done so, you can also go through the official documentation page for working with GRUB 2 on RHEL 7 and the different commands it describes, for both BIOS and UEFI based systems:
    https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-working_with_the_grub_2_boot_loader

    You might first try out some of the commands on another UEFI
    based system/VM that is more practical for you to work on, since the target one is a remote system, as you wrote. HIH, Gianluca

  • Thanks all for your comments and suggestions.

    The main fix for the topic fault was fixing a soft link to
    /boot/efi/EFI/CentOS/grubenv, which is the one location used by UEFI.

    It turns out that the update process for this file, when a new kernel is installed, uses /boot/grub2/grubenv.

    In my case a /boot/grub2/grubenv.rpmnew soft link was pointing to the correct file in /boot/efi/EFI/CentOS/, while the original(?) grubenv in
    /boot/grub2/ was the one being updated correctly; it is just that UEFI booting doesn’t use any files in that location. I fixed the soft link and the file now gets updated correctly. Thus I can use

    GRUB_DEFAULT=saved
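The soft-link condition described above can be verified with a short check. This is a sketch under assumptions: `check_grubenv_link` is a made-up helper name, and `readlink -f` is the GNU coreutils form that resolves a link to an absolute path.

```shell
# Verify that $1 is a symbolic link that resolves to the same file as $2.
check_grubenv_link() {
    link=$1      # for example /boot/grub2/grubenv
    target=$2    # for example /boot/efi/EFI/CentOS/grubenv
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link"
        return 1
    fi
    # Compare fully resolved paths so relative links are handled too.
    if [ "$(readlink -f "$link")" != "$(readlink -f "$target")" ]; then
        echo "$link points to $(readlink -f "$link"), not $target"
        return 1
    fi
    echo "$link -> $target"
}
```

Any leftover `.rpmnew` files next to the link can be spotted with `ls -l /boot/grub2/*.rpmnew`.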

    However my booting problems were a little more obscure.

    The grub.cfg file menuentry stanza for each kernel was correct. The set root='mduuid/' line points to the /boot UUID where the vmlinuz files live.

    Also the linuxefi /vmlinuz-3.10.0-1180 ….. has both ‘/boot’ and ‘/’
    UUIDs included.

    In my case there had been a manual migration from BIOS boot (MBR partitions) to UEFI boot (GPT partitions) on the server, plus a manual disc upgrade from a pair of RAID1 500GB HDDs (MBR partitioned) to a pair of RAID1 3.4TB SSDs
    (GPT partitioned). Everything appeared to be working, BUT I left the old HDDs plugged in.

    The old HDDs only had the 36.2 kernel installed. All the updated kernels were correctly installed onto the new SSDs. HOWEVER, due to the migration process I employed, the UUIDs for the partitions were the same. UEFI
    boot, before the OS is loaded via vmlinuz, only knows about the UUIDs visible on the partition tables, since MDRAID hasn’t loaded yet. Thus in my case the hardware had four storage devices (2x RAID1), all with the same UUID for /boot [ blkid is your friend ]. Unfortunately I didn’t realize this, and so UEFI simply looked at the first drive with that UUID:
    one of the original HDDs, not the SSDs that were being updated correctly.

    Removed the old drives and presto, UEFI now sees the new /boot and loads the later kernels.

    Not sure if this will help anyone else; I had to track this one down by fully walking through the UEFI boot process step by step and understanding how grub2 updates are applied.
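The duplicate-UUID situation described in this comment can be looked for with a small filter over a device/UUID listing. This is a hedged sketch: `dup_uuids` is a hypothetical name, and it expects "DEVICE UUID" pairs on stdin. Members of the same RAID1 array share an array UUID by design, so some output is expected; what deserves attention is a UUID shared by devices that should be unrelated, such as old and new disks.

```shell
# Read "DEVICE UUID" pairs on stdin and print every UUID that appears on
# more than one distinct device, followed by those devices.
dup_uuids() {
    awk '
    # Skip lines without a UUID and repeated device/UUID pairs.
    NF >= 2 && !seen[$1, $2]++ {
        count[$2]++
        devs[$2] = (devs[$2] == "" ? $1 : devs[$2] " " $1)
    }
    END {
        for (u in count)
            if (count[u] > 1)
                print u ": " devs[u]
    }' | sort
}
```

One possible input is `lsblk -rno NAME,UUID | dup_uuids`, where `-r` selects raw output, `-n` drops the header and `-o` picks the columns.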