“Can’t Find Root Device” With LVM Root After Moving Drive On CentOS 6.3


I have an 8-core SuperMicro Xeon server running CentOS 6.3. The OS is installed on a 120 GB SSD connected by SATA; the machine also contains an Areca SAS controller with 24 drives attached. The motherboard is a SuperMicro X9DA7.

When I installed the OS, I used the default options, which creates an LVM volume group to contain / and /home, and keeps /boot and /boot/efi outside the volume group.

The machine is a couple of months old and has been stable. While installing some new hardware, I decided to also clean up the cabling in the box, since it was a bit messy. In doing this, I probably moved the boot SSD to another port on the motherboard (it has a bunch: two SATA 6 Gb/s and six SATA 3 Gb/s).

When I booted the box after this, I got a kernel panic, with the typical “Can’t find root device”.

I read some docs and first tried booting from a rescue disc and reinstalling GRUB, but that didn’t change anything. Further Googling turned up the rdshell kernel parameter, which dropped me to a shell when the boot failed to find the root device.

Reading https://fedoraproject.org/wiki/How_to_debug_Dracut_problems, I did the following:

# lvm vgscan
# lvm vgchange -ay

And then

# ln -s /dev/mapper/vg_resolve02-lv_root /dev/root
# exit

After this, the box boots up normally, and everything works as it should. However, when I reboot, it again fails to find the root device.

So, after all this, my question is: how do I make Dracut (I’m assuming it’s Dracut’s job) understand that this LVM volume is my root device and pick it up automatically?

And, is there a way to avoid this problem in the future, if I move drives around? Surely it can’t be normal for this to happen just because I connect a drive to another port?

20 thoughts on - “Can’t Find Root Device” With LVM Root After Moving Drive On CentOS 6.3

  • Grub (in the menu) has the following commands:

    root (hd0,1)

    kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg_resolve02-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD crashkernel=128M rd_LVM_LV=vg_resolve02/lv_root SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet pcie_aspm=off

    When I successfully booted manually, I removed “rhgb quiet” and added “rdshell” to that line.

    To the best of my memory, that line is stock, I don’t recall ever changing it permanently.

    The names of the volume group and logical volume in that line correspond to my actual root device.

  • Oh, and the exact Dracut error I get is:

    dracut Warning: No root device “block:/dev/mapper/vg_resolve02-lv_root” found

    dracut Warning: LVM vg_resolve02/lv_root not found

    But then:

    # lvm vgscan

    Found volume group “vg_resolve02” using metadata type lvm2

    # lvm vgchange -ay

    1 logical volume(s) in volume group “vg_resolve02” now active

    # ls /dev/mapper

    control vg_resolve02-lv_root

    # ln -s /dev/mapper/vg_resolve02-lv_root /dev/root

    ln: creating symbolic link “/dev/root”: File exists

    # ls -l /dev/root

    /dev/root -> dm-0

    # rm /dev/root
    # ln -s /dev/mapper/vg_resolve02-lv_root /dev/root
    # exit

    And everything boots normally.

    Apologies if there are minor mistakes or omissions in this text. Since I can’t copy/paste, I’ve transcribed it, excluding some parts, like the permissions of the symlink in the ls output. I have, however, double-checked the important parts, like the names of devices and files.

  • I’ve looked through Dracut trying to spot circumstances that might cause the problem that you’ve described, but came up with nothing. udev should be scanning block devices as they become available, and setting up any logical volumes on all of the available block devices.

    It may be useful to capture some information in the debugging shell, before running vgscan.

    As suggested in the fedora debugging document, capture the output of the following commands to get a better idea of what the kernel knows about block devices before you manually start the volumes, and maybe that’ll lead us to some conclusion about why the devices aren’t found.

    lvm pvdisplay; lvm vgdisplay; lvm lvdisplay; blkid; dmesg

  • Thank you, I will do this tomorrow. It’ll take me a little time, since I need to transcribe everything manually, but I’ll get it done. It’s just a very weird problem all in all.

  • You should be able to pipe the output into a file and copy it to the actual root filesystem after vgchange & mount. Unless there’s absolutely no writable space in the rescue shell?

  • I haven’t actually tried writing anywhere in the rescue shell before vgchange and mount. I’ll give it a try, that would simplify things.

  • If nothing else, you probably can fit most or all of that in the shell’s environment. In case it’s ever useful:

    # debug=$(lvm pvdisplay; lvm vgdisplay; lvm lvdisplay; blkid; dmesg)
    # vgchange -a y
    # mount …
    # echo "$debug" > /mnt/sysroot/root/debug

  • Hm, it seems the list strips attachments, and just pasting it makes the mail too big to go through, so, pastebin to the rescue:

    http://pastebin.com/x4wqzHyc

    That’s the output of, like you suggested:

    lvm pvdisplay; lvm vgdisplay; lvm lvdisplay; blkid; dmesg

    In that order.

  • And you ran that before you ran “vgchange -a y”? That doesn’t make any sense. The commands show the volume group active. I can’t see any reason why the system wouldn’t boot.

    I hate for you to keep rebooting your server, but do the device nodes look correct in both /dev/mapper and /dev/vg_resolve02 at that point?

  • Yes, I ran that immediately after getting dropped to the shell. I can take a look at the device nodes tomorrow, but if I remember correctly, /dev/mapper contained only the file “control” before running vgchange -ay; that is, there was no “vg_resolve02-lv_root” device there. That device only shows up after I run vgchange -ay.

    I did not check whether /dev/vg_resolve02 exists, I can do that tomorrow.

  • Apologies if someone mentioned this already (I don’t have the whole thread in my mailbox), but whenever I’ve had to rename a root LVM volume, I also had to recreate the initrd. I haven’t done it on 6.x, but I assume it applies to initramfs as well. The notes in my corp wiki link back to this Red Hat Bugzilla report: https://bugzilla.redhat.com/show_bug.cgi?id=230190. Maybe try that?

    Patrick

  • I haven’t actually renamed the root LVM volume, it’s had the same name since install. I just moved some drives around on the SATA ports. Is it still worth recreating initrd?

  • I wouldn’t expect it to make a difference, but it probably wouldn’t hurt anything. Copy or rename your existing initrd to a path in /boot, so that you can revert if anything goes wrong, and then create a new one. If that fixes the problem, I’d be curious to know why; we can compare the contents of the two images, and I’ll learn something. As far as I know, the path to the devices isn’t included in the initrd.

    # mkinitrd /boot/initramfs-$(uname -r).img $(uname -r)
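
    If you’d rather call the underlying tool directly, dracut can rebuild the image itself. This is only a sketch of the backup-and-rebuild sequence described above, to be run as root on the installed system (paths assume the stock CentOS 6 layout):

    ```
    # Back up the current initramfs so you can revert from the GRUB menu
    cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

    # Rebuild it; -f overwrites the existing image
    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

    # Optionally confirm the LVM pieces made it into the new image
    lsinitrd /boot/initramfs-$(uname -r).img | grep -i lvm
    ```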

  • Recreating initrd made no difference.

    Immediately after getting dropped to rdshell, I looked around in /dev, which brought me a few surprises…

    /dev/mapper contains only “control”, that is, “vg_resolve02-lv_root” is missing.

    /dev/root is a symlink to /dev/dm-0

    Which is a bit surprising, since when I do lvm vgscan and lvm vgchange -ay, /dev/mapper/vg_resolve02-lv_root appears. But I just now noticed that it’s a symlink to /dev/dm-0, so, in effect, when I symlink /dev/root to /dev/mapper/vg_resolve02-lv_root, I’m just creating the same symlink that was already there, with one more level of indirection.

    That means /dev/root was already correct, so the only thing I’m actually doing to make the system boot is scanning for volume groups and activating them.

    The big question then becomes: why do I have to do this manually? How do I make Dracut (I assume this is Dracut’s job) do it automatically?

  • Did you get to look at or for /dev/vg_resolve02 as well?

    Does /dev/dm-0 exist?

    Does the system boot if you just “exit” from the rdshell? What about if you “vgchange -a y” without changing the symlink?

    udev should be doing this. And… I was just looking at this again, because the last time I came up with nothing useful. Look at /usr/share/dracut/modules.d/90lvm/64-lvm.rules. If I’m reading it correctly, udev will look for dm-0 in /sys and will not run lvm_scan if it’s found. I wonder if it’s possible that the /sys nodes are getting set up, but device-mapper isn’t setting up the nodes in /dev?

    I’m really at a loss… a much simpler explanation would be that the devices take so long to detect that init gives up waiting; by the time you run vgchange, they’ve had the time they need. But that idea is inconsistent with the fact that your dmesg output shows what I assume are the correct devices and partition tables.

    You could try adding “rdinitdebug rdudevdebug” to your kernel command line, but you’re going to see a LOT of output, and it’s only really going to be meaningful if you’ve read the /init script that Dracut creates, and understand more or less what it’s doing, particularly in the “main_loop” section.
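
    For reference, with those flags added (and “rhgb quiet” dropped so the messages stay visible), the kernel line quoted earlier in the thread would look roughly like this; this is just an illustration built from that line, not a tested configuration:

    ```
    kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg_resolve02-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD crashkernel=128M rd_LVM_LV=vg_resolve02/lv_root SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM pcie_aspm=off rdshell rdinitdebug rdudevdebug
    ```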

  • I checked this a bit more thoroughly. The status is as follows:

    When I boot up and get dropped to rdshell, neither /dev/root, /dev/vg_resolve02, nor /dev/dm-0 exists. Just exiting at this point drops me back into rdshell. Waiting a few minutes makes no difference.

    Doing lvm vgscan finds the volume group, but creates no device nodes. Just exiting at this point drops me back into rdshell as well.

    When I do lvm vgchange -ay, /dev/dm-0 is created, and /dev/root is created as a symlink to it, along with /dev/vg_resolve02/lv_root and /dev/mapper/vg_resolve02-lv_root. I don’t need to change the symlink or do anything else; if I exit after doing lvm vgchange -ay, everything is OK.

    It turns out I was wrong about dm-0 already existing; it’s created on vgchange -ay. I’m looking at the file you mention, but I’m afraid I don’t know LVM well enough to make much sense of it. From what I can tell, it calls lvm_scan for each device, and there’s an lvm_scan.sh in there that looks like it should be doing lvchange -ay, but if dm-0 doesn’t already exist, I don’t think this will do anything. Am I wrong?

    I can try this, but it might be a bit beyond my area of expertise, I’m afraid.

    If I were to just try a brute-force approach, what RPM packages should I reinstall/update to get all this stuff reinstalled as it was when I first installed the system?
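
    A sketch of that brute-force approach, assuming the package names from a stock CentOS 6 install (dracut, dracut-kernel, and lvm2 are my assumption; verify with rpm -q first):

    ```
    # Confirm the packages are present (names assumed from a stock CentOS 6 install)
    rpm -q dracut dracut-kernel lvm2

    # Reinstall them; this rewrites the dracut modules under /usr/share/dracut
    yum reinstall dracut dracut-kernel lvm2

    # Then rebuild the initramfs so the reinstalled modules are actually picked up
    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
    ```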

  • Just bumping this up, any ideas about this? It’s a little annoying not having this box boot by itself…
