// RESEND // 7.6: Software RAID1 Fails The Only Meaningful Test


(Resend: my message didn’t show up; was the original too big? I’ve posted one of the output files to a website to find out.)

The point of RAID1 is to allow for continued uptime in a failure scenario. When I assemble servers with RAID1, I set up two HDDs to mirror each other, and test by booting from each drive individually to verify that it works. For the OS partitions, I use simple partitions and ext4 so it’s as simple as possible.
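
For reference, a minimal sketch of the kind of setup I mean (device and array names here are placeholders, not taken from the machine below):

# Two partitions of type fd (Linux raid autodetect), one per disk,
# mirrored into a single md device and formatted ext4.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
# Record the array so it assembles the same way at every boot.
mdadm --detail --scan >> /etc/mdadm.conf
# Watch the initial sync complete before doing any pull-a-drive testing.
cat /proc/mdstat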

Using the CentOS 7.6 installer (v 1810), I cannot get this test to pass in any way, with or without LVM. Using an older installer (v 1611), it works fine and I am able to boot from either drive, but as soon as I do a yum update, it fails.

I think this may be related to, or the same as, the issue reported in “LVM failure after CentOS 7.6 upgrade”, since that also involves booting from a degraded RAID1 array.

This is a terrible bug.

See below for some (hopefully) useful output while in recovery mode after a failed boot.

### output of fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000c1fd0

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   629409791   314703872   fd  Linux raid autodetect
/dev/sda2   *   629409792   839256063   104923136   fd  Linux raid autodetect
/dev/sda3       839256064   944179199    52461568   fd  Linux raid autodetect
/dev/sda4       944179200   976773119    16296960    5  Extended
/dev/sda5       944181248   975654911    15736832   fd  Linux raid autodetect

### output of cat /proc/mdstat

Personalities :
md126 : inactive sda5[0](S)
15727616 blocks super 1.2

md127 : inactive sda2[0](S)
104856576 blocks super 1.2

unused devices: <none>
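
For what it's worth, from the dracut emergency shell an inactive, degraded array can usually be forced to run by hand; a sketch using the device names from the mdstat output above (a manual workaround attempt, not a fix for the bug):

# Stop the half-assembled arrays, then reassemble with --run so they start
# even though one member is missing.
mdadm --stop /dev/md127
mdadm --assemble --run /dev/md127 /dev/sda2
mdadm --stop /dev/md126
mdadm --assemble --run /dev/md126 /dev/sda5
cat /proc/mdstat        # should now show the arrays active (degraded)
exit                    # leave the emergency shell and let boot continue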

### content of rdsosreport.txt

It’s big; see http://chico.benjamindsmith.com/rdsosreport.txt

2 thoughts on - // RESEND // 7.6: Software RAID1 Fails The Only Meaningful Test

  • I used my test system to test RAID failures.  It has a two-disk RAID1
    mirror.  I pulled one drive, waited for the kernel to acknowledge the missing drive, and then rebooted.  The system started up normally with just one disk (which was originally sdb).
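
    (For what it's worth, the same degraded state can be produced without pulling hardware; a sketch, assuming a two-member array /dev/md0 and dropping /dev/sdb1:)

    mdadm /dev/md0 --fail /dev/sdb1      # mark one member as failed
    mdadm /dev/md0 --remove /dev/sdb1    # take it out of the array
    cat /proc/mdstat                     # array stays active but degraded, e.g. [U_]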

    The thing that stands out as odd, to me, is that your kernel command line includes “root=UUID=1b0d6168-50f1-4ceb-b6ac-85e55206e2d4” but that UUID doesn’t appear anywhere in the blkid output.  It should, as far as I know.

    Your root filesystem is in a RAID1 device that includes sda2 as a member.  Its UUID is listed as an rd.md.uuid option on the command line so it should be assembled (incomplete) during boot.  But I think your kernel command line should include
    “root=UUID=f127cce4-82f6-fa86-6bc5-2c6b8e3f8e7a” and not
    “root=UUID=1b0d6168-50f1-4ceb-b6ac-85e55206e2d4”
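
    A quick way to check that on the affected box (the UUIDs below are the ones from your report):

    cat /proc/cmdline                  # note the root=UUID= and rd.md.uuid= values
    blkid | grep -i 1b0d6168           # the root= UUID, apparently on no device
    blkid | grep -i f127cce4           # the RAID1 array UUID given via rd.md.uuid
    # If root= really is stale, regenerating grub.cfg from the healthy
    # two-disk system is one way to get it rewritten:
    grub2-mkconfig -o /boot/grub2/grub.cfg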

  • My procedure was to shut down with the system “whole” – both drives working. Then, with the machine powered off, I removed either disk and started up the server. Regardless of which drive I tried to boot from, the failure was consistent.

    Except that the UUID exists when both drives are present. And this even though, under an earlier CentOS version, the system booted fine from either drive singly with the same procedure, before doing a yum update. To clarify my procedure (a rough command-level sketch follows the list):

    1) Set up system with 7.3, RAID1 on bare partitions.
    2) Wait for the mdstat sync to finish.
    3) Shut down the system.
    4) Remove either drive.
    5) System boots fine.
    6) Resync the drives.
    7) yum update -y to 7.6.
    8) Shut down the system.
    9) Remove either drive.
    10) Boot fails (bad putty tat).
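
    Roughly, in command form (step 1 is done through the 7.3 installer; device names here are placeholders):

    watch cat /proc/mdstat            # step 2: wait for the initial resync to finish
    shutdown -h now                   # steps 3 and 8
    mdadm /dev/md0 --add /dev/sdb2    # step 6: re-add the pulled drive and let it resync
    yum update -y                     # step 7: 7.3 -> 7.6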

    Unfortunately, I have since used this same system for other tests and no longer have those UUIDs to check further. However, I can set it up again and reproduce the problem as soon as I have something to test.

    I’m going to see if using EXT4 as the file system has any effect.