EL9/udev Generates Wrong Device Nodes/symlinks With HPE Smart Array Controller
Hi,
I see some strange and dangerous things happening on an HPE server with an HPE
Smart Array controller, where EL9 ends up with wrong device nodes/symlinks to the attached disks/RAID volumes:
(I didn’t touch anything here, but at 08:09 some symlinks were changed.)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root 9 Mar 1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar 1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar 1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 9 Mar 1 07:57 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sdb
lrwxrwxrwx 1 root root 9 Mar 1 08:09 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sda
lrwxrwxrwx 1 root root 9 Mar 1 07:57 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdd
lrwxrwxrwx 1 root root 9 Mar 1 08:09 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar 1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar 1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdc2
/dev/disk/by-path/:
lrwxrwxrwx 1 root root 9 Mar 1 07:57 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar 1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar 1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 9 Mar 1 07:57 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sdb
lrwxrwxrwx 1 root root 9 Mar 1 08:09 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sda
lrwxrwxrwx 1 root root 9 Mar 1 07:57 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdd
After rebooting, things are different but still wrong:
(here nothing has changed since boot, but the symlinks are already wrong)
/dev/disk/by-id/:
lrwxrwxrwx 1 root root 9 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 9 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sda
lrwxrwxrwx 1 root root 9 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sdd
lrwxrwxrwx 1 root root 9 Mar 1 10:56 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdc
lrwxrwxrwx 1 root root 9 Mar 1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar 1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar 1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
/dev/disk/by-path/:
lrwxrwxrwx 1 root root 9 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 9 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sda
lrwxrwxrwx 1 root root 9 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sdd
lrwxrwxrwx 1 root root 9 Mar 1 10:56 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdc
Note that two things are strange:
1) the /dev/sd* nodes are in a random order after every restart.
# lsscsi
[1:0:0:0] storage HP P410i 6.64 -
[1:1:0:0] disk HP LOGICAL VOLUME 6.64 /dev/sdb
[1:1:0:1] disk HP LOGICAL VOLUME 6.64 /dev/sda
[1:1:0:2] disk HP LOGICAL VOLUME 6.64 /dev/sdd
[1:1:0:3] disk HP LOGICAL VOLUME 6.64 /dev/sdc
2) some symlinks created by udev are just wrong and therefore very dangerous to use:
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2
While 1) may be expected (???), I think 2) should really never happen.
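The inconsistency in point 2 can be detected mechanically. What follows is only a minimal Python sketch, not anything udev itself provides: it assumes sd-style naming, where partition N of sdX is called sdXN (the pN suffix used by NVMe is not handled), and the names read_links and find_mismatched_partitions are made up for illustration. The sample data is the by-id listing taken after the reboot.

```python
import os
import re

# Matches "<disk link>-part<N>" as used under /dev/disk/by-id and by-path.
PART_RE = re.compile(r"(.+)-part(\d+)")

def read_links(directory="/dev/disk/by-id"):
    """Map each symlink name in `directory` to the kernel name it resolves to."""
    links = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.path.basename(os.path.realpath(path))
    return links

def find_mismatched_partitions(links):
    """Return (link, actual, expected) for every partition link that does not
    sit on the disk its parent link points at (assumes sdX + N naming)."""
    mismatches = []
    for name, target in sorted(links.items()):
        match = PART_RE.fullmatch(name)
        if match is None:
            continue  # not a partition link
        parent, number = match.groups()
        disk = links.get(parent)
        if disk is None:
            continue  # no whole-disk link to compare against
        expected = f"{disk}{number}"
        if target != expected:
            mismatches.append((name, target, expected))
    return mismatches

# by-id state after the reboot, copied from the listing above.
POST_REBOOT = {
    "scsi-0HP_LOGICAL_VOLUME_00000000": "sdb",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part1": "sdb1",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part2": "sdb2",
    "scsi-0HP_LOGICAL_VOLUME_01000000": "sda",
    "scsi-0HP_LOGICAL_VOLUME_02000000": "sdd",
    "scsi-0HP_LOGICAL_VOLUME_03000000": "sdc",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0": "sda",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1": "sdb1",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2": "sdb2",
}

for link, actual, expected in find_mismatched_partitions(POST_REBOOT):
    print(f"{link}: points at {actual}, expected {expected}")
```

On the listing above this flags exactly the two scsi-SHP partition links. On a live system, find_mismatched_partitions(read_links()) would run the same check against the real directory.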
I’ve tried to find out where things go wrong but the whole udev stuff started to hurt my brain :)
I’m quite sure HPE Smart Array based servers are quite common, so my big question is: do others see the same?
While it’s possible to live with this mess, I’d really like to fix it somehow.
Thanks, Simon
3 thoughts on - EL9/udev Generates Wrong Device Nodes/symlinks With HPE Smart Array Controller
Simon Matter
I think it may be caused by the sd driver’s asynchronous scanning.
I am lucky that I haven’t seen this before. NVMe may have similar issues, but NVMe has a boot parameter to avoid it.
SUSE has a boot parameter to avoid it.
With EL9 we will have to wait until EL 9.3, if we are lucky.
I have reported the issue: https://bugzilla.redhat.com/show_bug.cgi?id=2140017
Hi,
Thanks for confirming that I’m not alone with this “feature”.
In the above example, the fun starts if you want to wipe the two partitions on
/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 and therefore wipe this device: you end up wiping the wrong disk!
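One way to guard against this before a destructive operation is to cross-check the resolved node against the SCSI address the kernel reports in sysfs. The following Python sketch is only an illustration under stated assumptions: the function names scsi_address and check_link are invented here, the expected H:C:T:L address has to come from a source you trust (for example the lsscsi output or the controller's own tools), and it relies on /sys/block/<name>/device resolving to a directory named after the H:C:T:L address, as it does for sd devices.

```python
import os

def scsi_address(dev_name, sysfs_root="/sys"):
    """Return the H:C:T:L address of an sd device such as 'sda'.

    For SCSI disks, <sysfs_root>/block/<dev_name>/device is a symlink to a
    directory whose name is the H:C:T:L address.
    """
    device_link = os.path.join(sysfs_root, "block", dev_name, "device")
    return os.path.basename(os.path.realpath(device_link))

def check_link(link_path, expected_address, sysfs_root="/sys"):
    """Resolve a /dev/disk/by-* symlink and confirm its SCSI address.

    Returns the kernel name (e.g. 'sda') if the address matches, and raises
    RuntimeError otherwise so that a calling script stops before wiping.
    """
    node = os.path.basename(os.path.realpath(link_path))
    actual = scsi_address(node, sysfs_root)
    if actual != expected_address:
        raise RuntimeError(
            f"{link_path} resolves to {node} at {actual}, "
            f"expected {expected_address}"
        )
    return node
```

A script would call something like check_link("/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_01000000", "1:1:0:1") first and only continue if no exception is raised. The address in that call is taken from the lsscsi output above purely as an example.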
When I see such things my blood starts boiling :(
Regards, Simon
Simon Matter
As I said, I am lucky that my storage controllers don’t show such behavior. NVMe has a similar situation, so there are people who have destroyed the wrong drive: https://github.com/linux-nvme/nvme-cli/issues/501