CentOS6: HELP! EFI Boot Fails After Replacing Disks…

OK, I wanted to replace the 500G disks in a Dell T20 server with new 2TB disks. The machine has 4 SATA ports, one used for the optical disk and three for the hard drives. It is set up with /dev/sda and /dev/sdb, each with three partitions:

1 — VFAT (for EFI)
2 — ext4 (for /boot)
3 — LVM

/dev/sda2 and /dev/sdb2 are a mirror raid (/dev/md0)
/dev/sda3 and /dev/sdb3 are a mirror raid (/dev/md1), with LVM on top of that.

/dev/sdc is a disk containing one file system and mostly used by AMANDA for backup (it has a “virtual” tape changer).

This morning I shut the machine down, pulled [the 500G] /dev/sdb, and installed a new 2TB disk as the new /dev/sdb. I partitioned it (with parted) much like /dev/sda (except partition 3 is way bigger), added the new /dev/sdb2 to the /boot RAID set (/dev/md0, alongside /dev/sda2), and dd'ed /dev/sda1 to /dev/sdb1. I then created a new RAID set (degraded) with /dev/sdb3, ran pvcreate on it, used vgextend to add it to the system volume group, then used pvmove to move the extents from the old disk to the new disk. Meanwhile I partitioned a second 2TB disk using a USB SATA dock and copied the old 500G /dev/sdc1 to the new 2TB disk. So far so good.
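
For what it's worth, the RAID/LVM part of that was roughly the following (reconstructed from memory; the new array name /dev/md2 and the volume group name vg_system are just placeholders):

# mdadm /dev/md0 --add /dev/sdb2        (add the new partition to the /boot mirror)
# dd if=/dev/sda1 of=/dev/sdb1 bs=1M    (copy the EFI system partition)
# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb3 missing
# pvcreate /dev/md2
# vgextend vg_system /dev/md2
# pvmove /dev/md1 /dev/md2              (move all extents off the old PV)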

Then I shut the machine down, swapped in the new backup [2TB] disk, pulled the old system [500G] disk, and installed the third new [2TB] disk. The system won't boot that way. It seems there is something in the UEFI (secure boot) logic that wants the original disk, even though everything has been moved over.

I ended up putting the original /dev/sda in. The machine boots and is using the new system disk (a raid array in degraded mode).

What is the magic to fix this?

I tried to run efibootmgr, but it wants a kernel module named efivars loaded, and there is no such module available.
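
For the record, this is about as far as I can get before it falls over, both on the live system and booted from the install DVD:

# modprobe efivars
# efibootmgr -v

The modprobe reports no such module, and efibootmgr then refuses to run.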

What am I missing?

12 thoughts on - CentOS6: HELP! EFI Boot Fails After Replacing Disks…

  • OK, one other tidbit:

    The EFI BIOS has a UUID in its boot options. I expect that this identifies the old system disk, but I don't know where that UUID comes from. It is NOT
    the VFAT UUID of the EFI partition, and it is not any of the UUIDs of the Linux file systems or RAID arrays, or really anything else I can find under Linux. I'm guessing it is something the EFI BIOS came up with, but I am not sure what exactly. *I* don't remember filling that in; I think anaconda filled it in during the install process. So presumably there is some magic under Linux to get this UUID (for the new disk(s)) and fill it in, but I
    cannot figure out what. Maybe efibootmgr has something to do with it, but efibootmgr does not work, either on the live system or on a system booted from a DVD (the CentOS 6.9 boot/install DVD): efibootmgr wants a kernel module named efivars that does not seem to exist.


  • No fix here, but I've had this issue for a while; the only “fix”, apparently, is not booting with UEFI.

  • Are you not running a CentOS kernel? That module should be available. The UUID of the VFAT volume (not the mirror) would be used in the EFI boot entry. Boot off a rescue disk when the new disk is installed and add an additional boot entry for the new disk. It will reflect the UUID of the EFI partition on the new disk. Run ‘blkid’ to compare.
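
    Something along these lines, with the new disk in place (the label is arbitrary, and the loader path assumes the usual CentOS 6 location of \EFI\redhat\grub.efi):

    # modprobe efivars
    # efibootmgr -c -d /dev/sdb -p 1 -L "CentOS" -l '\EFI\redhat\grub.efi'
    # efibootmgr -v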


    Jonathan Billings

  • That’s interesting.  Can you post the command and output where you see that?

    Also, post the output of “dmesg | grep efi:” and “ls /sys/firmware/efi/”

  • It should be the UUID of the partition, not of the VFAT volume. The partition UUID is stored in the GPT.  The volume UUID is stored in the filesystem header (I believe).

    For example, on my laptop:

    # efibootmgr -v | grep Fedora
    Boot0000* Fedora  PciRoot(0x0)/Pci(0x17,0x0)/Sata(2,65535,0)/HD(1,GPT,39484dd8-b1d9-47b2-b4d7-89dfe3ce5e09,0x800,0x12c000)/File(\EFI\fedora\shimx64.efi)

    # blkid | grep sda1
    /dev/sda1: LABEL="ESP" UUID="3850-574E" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="39484dd8-b1d9-47b2-b4d7-89dfe3ce5e09"

    # sgdisk -i1 /dev/sda
    Partition GUID code: C12A7328-F81F-11D2-BA4B-00A0C93EC93B (EFI System)
    Partition unique GUID: 39484DD8-B1D9-47B2-B4D7-89DFE3CE5E09
    First sector: 2048 (at 1024.0 KiB)
    Last sector: 1230847 (at 601.0 MiB)
    Partition size: 1228800 sectors (600.0 MiB)
    Attribute flags: 0000000000000000
    Partition name: 'EFI System Partition'

  • Look into the possibility that you’re booting via BIOS and not UEFI.

    It’s possible to have /boot and /boot/efi, and also to have GRUB2
    installed on the disk, in which case the system is able to boot from both BIOS and UEFI.

    If that’s the case, then you’d need to install GRUB2 on the new disk as well.
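
    With GRUB2 that is roughly just (on CentOS 6 the equivalent would be plain grub-install from legacy GRUB):

    # grub2-install /dev/sdb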

  • At Mon, 28 May 2018 20:54:32 -0700 CentOS mailing list wrote:

    No, it was booting from UEFI.

    I don't believe grub2 is available for CentOS *6*; at least I don't see an RPM for it.

  • That’s what I meant, I think.  Legacy mode is BIOS-compatible.  If you’re booting in legacy mode, you can’t access the UEFI variables. The old disk probably has GRUB installed on the first block.  It might be booting in legacy mode *because* the UEFI boot option’s UUID doesn’t match your partition.

    You might want to disable legacy mode entirely.  Boot a rescue disk if needed, and fix the boot option from there.
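
    From the rescue environment that is roughly (0001 here is only a stand-in for whichever entry carries the stale UUID):

    # efibootmgr -v           (list the entries, find the one with the stale UUID)
    # efibootmgr -b 0001 -B   (delete it)

    and then recreate the entry pointing at the current EFI partition, as described above.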

  • This caught my attention. Did you mean that you are using the secure boot options? I don't know whether that ties a system down to a specific disk until it is cleared after the install. From all the other items you listed in the thread, your system looks like it is booting in a mode where it no longer reports itself as UEFI, which would be a boot firmware option. The firmware can lock down what it thinks is OK to boot from, and it may require some sort of flush depending on which disks it considers acceptable. I had this with one set of systems that required a hard flush of the UEFI buffers before they would boot from a larger disk. The same old drive model was fine, but a new one was not possible.

    My debugging methodology at this point would be the following:

    1. Boot from EL6 iso and see if EFI variables/modules work
    2. Boot from EL7 iso and see if EFI variables/modules work
    3. See if a Dell firmware update set is available and if the changelogs say it fixes EFI boot issues
    4. Repeat 1 & 2 if a firmware update is done

    From what I can tell, most of the install methods and zero-downtime
    ‘hacks’ we have used for the last 30 years on BIOS systems are either impossible or need serious fixes to work again in a UEFI world.
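
    Concretely, steps 1 and 2 above amount to something like this from the booted ISO (the EL6 check assumes the efivars module is actually present in that kernel):

    # ls /sys/firmware/efi                            (only exists when booted via UEFI)
    # modprobe efivars && ls /sys/firmware/efi/vars   (EL6-style interface)
    # ls /sys/firmware/efi/efivars                    (EL7 efivarfs mount)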

  • At Tue, 29 May 2018 10:03:07 -0400 CentOS mailing list wrote:

    Not using “secure boot” in the sense of setting any particular security (the
    “secure” section of the BIOS is not enabled). What *was* enabled was UEFI
    boot. At this point I have mostly given up on UEFI. I disconnected the optical disk (we don't really use it anyway), connected the original boot disk as /dev/sdd, and installed the third 2TB disk. The system boots (in Legacy Mode) and it is now rebuilding the RAID array onto the new disk. The
    /boot array now has three elements (partition 2 of each of the two new disks and partition 2 of the old disk). I condemned the old (empty) RAID array on the old disk; it now has an unformatted third partition that is not being used for anything. I will *eventually* be upgrading the system to CentOS 7
    (sometime later this summer). Maybe at that time I will revisit the world of UEFI…
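
    For the archives, the rebuild boiled down to something like this (again reconstructed from memory; sdX stands in for whatever letter the third new disk came up as, /dev/md2 is the new LVM mirror and vg_system the volume group as above, and the old disk is now /dev/sdd):

    # mdadm /dev/md0 --add /dev/sdX2            (third member for the /boot mirror)
    # mdadm --grow /dev/md0 --raid-devices=3
    # mdadm /dev/md2 --add /dev/sdX3            (second half of the new LVM mirror, now rebuilding)
    # vgreduce vg_system /dev/md1               (drop the old, now-empty PV from the VG)
    # pvremove /dev/md1
    # mdadm --stop /dev/md1                     (retire the old array)
    # mdadm --zero-superblock /dev/sdd3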

  • Robert Heller wrote:
    (/dev/sda).

    Y’know, what you just wrote above… that makes it sound like you need to go into the BIOS and reset the boot order.

    mark

  • At Tue, 29 May 2018 11:35:39 -0400 CentOS mailing list wrote:

    Tried that. Right now, the boot order lists all of the disks in *Legacy* boot mode.