*very* Ugly Mdadm Issue

August 29, 2014 CentOS 16 Comments

We have a machine that’s a distro mirror – a *lot* of data, not just CentOS. We had the data on /dev/sdc. I added another drive, /dev/sdd, and created that as /dev/md4, with “missing” in place of the second member, made an ext4 filesystem on it, and rsync’d everything from /dev/sdc.

Note that we did this on *raw*, unpartitioned drives (not my idea). I then unmounted /dev/sdc, and mounted /dev/md4, and it looked fine; I added /dev/sdc to /dev/md4, and it started rebuilding.

Then I was told to reboot it, right after the rebuild started. I don’t know if that was the problem. At any rate, it came back up… and /dev/sdc is on as /dev/md127, and no /dev/md4, nothing in /etc/mdadm.conf, and, oh, yes,

mdadm -A /dev/md4 /dev/sdd
mdadm: Cannot assemble mbr metadata on /dev/sdd
mdadm: /dev/sdd has no superblock - assembly aborted

Oh, and mdadm -E /dev/sdd
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)

ee? A quick google says that indicates a legacy MBR, followed by an EFI….
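
For reference, the two values in that mdadm -E output can be decoded by hand. The sketch below, in Python, builds a synthetic 512-byte sector (it does not read a real disk) laid out like the protective MBR that GPT tools write, then parses it. The function names are made up for this illustration; the field layout is the standard 16-byte MBR partition entry.

```python
import struct

SECTOR_BYTES = 512
ENTRY_FORMAT = "<B3sB3sII"   # status, CHS start, type, CHS end, start LBA, sector count
TABLE_OFFSET = 446           # the partition table starts here in sector 0
MAGIC_OFFSET = 510           # two-byte boot signature


def make_protective_mbr():
    """Build a synthetic protective MBR; illustrative data, not read from a disk."""
    sector = bytearray(SECTOR_BYTES)
    entry = struct.pack(ENTRY_FORMAT, 0x00, b"\x00\x02\x00", 0xEE,
                        b"\xff\xff\xff", 1, 0xFFFFFFFF)
    sector[TABLE_OFFSET:TABLE_OFFSET + 16] = entry
    sector[MAGIC_OFFSET:MAGIC_OFFSET + 2] = b"\x55\xaa"
    return bytes(sector)


def decode_mbr(sector):
    """Return (magic, list of non-empty partition entries) for one MBR sector."""
    magic = struct.unpack_from("<H", sector, MAGIC_OFFSET)[0]
    partitions = []
    for index in range(4):
        _status, _chs_start, ptype, _chs_end, start, count = struct.unpack_from(
            ENTRY_FORMAT, sector, TABLE_OFFSET + 16 * index)
        if ptype != 0:
            partitions.append({"index": index, "type": ptype,
                               "start": start, "sectors": count})
    return magic, partitions


magic, partitions = decode_mbr(make_protective_mbr())
```

Type 0xee marks a protective MBR, and 4294967295 is 0xFFFFFFFF, the largest sector count a 32-bit MBR field can hold, which is what one would expect on a GPT disk too large for that field.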

I *REALLY* don’t want to lose all that data. Any ideas?

mark

16 thoughts on - very Ugly Mdadm Issue

Warren Young says:

September 2, 2014 at 12:42 pm

Nothing wrong with that, particularly with big “midden” volumes like this one.

*facepalm*

You forgot the primary maxim of data integrity: two is one, one is none.

When you overwrote your original copy with what you thought was a clone, you reduced yourself to a single copy again. If anything is wrong with that copy, you now have two copies of the error.

What you *should* have done is buy two drives, set them up as a new mirror, copy the data over to them, then pull the old /dev/sdc and put it on a shelf as an offline archive mirror. /dev/sdc has probably already given you its rated service life, so it’s not like you’re really saving money here. The drive has already depreciated to zero.

You’re probably going to spend more in terms of your time (salary + benefits) to fix this than the extra drive would have cost you, and at the end of it, you still won’t have the security of that offline archive mirror.

I know this isn’t the answer you wanted, but it’s probably the answer a lot of people *wanted* to give, but chose not to, going by the crickets. (It’s either that or the 3-day holiday weekend.)

I don’t know how much I can help you. I have always used hardware RAID on Linux, even for simple mirrors.

I don’t see why it matters that your /dev/sdd partitioning is different from your /dev/sdc. When you told it to blast /dev/sdc with the contents of /dev/sdd, it should have copied the partitioning, too.

Are you certain /dev/sdc is partially overwritten now? What happens if you try to mount it? If it mounts, go buy that second fresh disk, then set the mirror up correctly this time.
Les Mikesell says:

September 2, 2014 at 1:02 pm

I haven’t used raw devices as members so I’m not sure I understand the scenario. However, I thought that devices over 2TB would not auto assemble so you would have to manually add the ARRAY entry for /dev/md4 in /etc/mdadm.conf containing /dev/sdd and /dev/sdc for the system to recognize it at bootup.

But sdd _should_ have the correct data – it just isn’t being detected as a raid member. I think with smaller devices – or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
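
As a side note on where the 2TB figure comes from: an MBR partition entry stores its start and length as 32-bit sector numbers. The arithmetic below is a small sketch assuming 512-byte logical sectors; the function names are invented for the example.

```python
SECTOR_BYTES = 512           # assumed logical sector size
MBR_MAX_SECTORS = 2 ** 32    # sector numbers in an MBR entry are 32 bits wide


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Approximate largest size an MBR partition table can describe."""
    return MBR_MAX_SECTORS * sector_bytes


def fits_in_mbr(disk_bytes, sector_bytes=SECTOR_BYTES):
    """True if a disk of this size can be fully described by an MBR."""
    return disk_bytes <= mbr_limit_bytes(sector_bytes)
```

With these assumptions the limit works out to 2 TiB, so a 4 TB drive (as mentioned later in this thread) needs a GPT if it is partitioned at all.
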
Keith Keller says:

September 2, 2014 at 1:08 pm

Indeed–hardware RAID controllers don’t partition their drives before creating their arrays.

If it was an rsync, then partitioning would not have been copied, just the filesystem contents.

As for the OP, while this certainly doesn’t seem to be a problem with mdadm specifically (or linux md in general), the folks on the linux RAID mailing list may be able to help you recover (I, too, seldom use linux md, and do not know it well enough to be helpful).

http://vger.kernel.org/vger-lists.html#linux-raid

–keith
Warren Young says:

September 2, 2014 at 1:14 pm

It also has the side benefit that you don’t have to worry about 4K partition alignment. Starting with 0 means you’re always aligned.
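
The alignment point is simple arithmetic. A minimal sketch, assuming 512-byte logical sectors on a drive with 4K physical sectors (the function name is just for illustration):

```python
def is_4k_aligned(start_sector, sector_bytes=512):
    """True if a region starting at this sector begins on a 4096-byte boundary."""
    return (start_sector * sector_bytes) % 4096 == 0
```

Sector 0 (the raw device) and sector 2048 (a common modern partition start) both pass; the old DOS-style start at sector 63 does not.
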
Mark Tinberg says:

September 2, 2014 at 1:20 pm

Just to confirm: is /dev/sdd still the new disk after you rebooted, with the right model and serial number? Drive letters are assigned based on the order in which the block devices are detected, so they can change on reboot.
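
One way to check this is through the persistent names that udev usually creates under /dev/disk/by-id, which embed the model and serial number. The helper below is a sketch that only reads symlinks; the function name is hypothetical, and it assumes the by-id directory exists on the system in question.

```python
import os


def devices_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each persistent by-id name to the kernel device name it points at."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # Resolve e.g. ../../sdd and keep only the final component.
            mapping[name] = os.path.basename(os.path.realpath(path))
    return mapping
```

Calling devices_by_id() before and after a reboot and comparing the two mappings would show whether the new drive is still sdd.
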
mark says:

September 2, 2014 at 1:33 pm

I’m the OP, here….

Les Mikesell wrote:

Yeah. That was one thing I discovered. Silly me, assuming that mdadm would create an entry in /etc/mdadm.conf. And this is not something I do more than once or twice a year, and haven’t this year (we have a good number of Dells with a PERC 7, or then there’s the JetStors….).
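
For reference, an ARRAY entry in /etc/mdadm.conf is a single line naming the md device plus something that identifies its members. The sketch below only formats such a line; the helper name is invented, and any UUID passed in would be a placeholder. On a running system, mdadm --detail --scan prints lines of this form for the active arrays.

```python
def array_line(md_device, uuid=None, devices=None):
    """Format an mdadm.conf ARRAY line from an optional UUID and member list."""
    fields = ["ARRAY", md_device]
    if uuid:
        fields.append(f"UUID={uuid}")
    if devices:
        fields.append("devices=" + ",".join(devices))
    return " ".join(fields)


# Device names taken from this thread; nothing is written to disk here.
example = array_line("/dev/md4", devices=["/dev/sdd", "/dev/sdc"])
```

Identifying the array by UUID is generally more robust than by device names, since the names can change between boots.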

It was toast.

Both had a GPT on them, just no partitions. And that’s the thing that really puzzles me – why mdadm couldn’t find the RAID info on /dev/sdd, which *had* been just fine.

Anyway, the upshot was my manager was rather annoyed – I *should* have pulled sdc, and put in a new one, and just let that go. I still think it would have failed, given the inability of mdadm to find the info on sdd. We wound up just remaking the RAID, and rebuilding the mirror over the weekend.

mark
Les Mikesell says:

September 2, 2014 at 2:06 pm

With devices < 2TB and MBRs, you don't need /etc/mdadm.conf - the kernel just figures it all out at boot time, regardless of the disk location or detection order. I have sometimes set up single partitions as 'broken' raids just to get that autodetect/mount effect on boxes where the disks are moved around a lot, because it worked long before distros started mounting by unique labels or UUIDs. And I miss it on big drives. I think either adding the ARRAY entry in /etc/mdadm.conf and rebooting or some invocation of mdadm could have revived /dev/md4 with /dev/sdd (and the contents you wanted) active.
mark says:

September 2, 2014 at 3:03 pm

Les Mikesell wrote:

Tried that. No joy.

mark
GKH says:

September 2, 2014 at 3:08 pm

Hmm, very bad idea to create a file system on the raw disk. The swap type partitions know how to handle this well, but for a partition with data why take the chance that something will write the MBR there? That’s what happened, I bet.

The procedure is this:

Create a partition 1 on the new unused drive (use all space). That leaves space for the MBR.

Create the mirror on this new drive, using “missing” for the second member.
mkfs -t ext4 on the new mirror device.

mdadm -D /dev/mdx (where x is the number of the mirror) should show 1 drive on the mirror. cat /proc/mdstat should show same thing.

Now, copy the data from the old disk to the new “mirrored” disk.

When done, reboot. Yes, reboot now. If something had gone very wrong you would not lose your data and you would see your data on both disks.

Ok, you rebooted, you see the data on both disks. Run fdisk on the old disk. Create one partition, add the partition to the mirror, wait for sync to end and reboot again.

You should be able to see your data mirrored.

..That’s the right way!
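
The steps above can be sketched as a command list. The Python below only builds and renders the commands as a dry run; it executes nothing. The partition names (/dev/sdd1, /dev/sdc1), the md device, and the mount points are assumptions chosen for illustration, and the function names are hypothetical. Note that mdadm spells the placeholder as the bare word missing.

```python
import shlex


def mirror_migration_commands(new_part="/dev/sdd1", old_part="/dev/sdc1",
                              md="/dev/md4", old_mount="/mnt/old",
                              new_mount="/mnt/new"):
    """Return the procedure as argv lists, in order. Nothing is run."""
    return [
        # Degraded two-way mirror: one real member, one placeholder.
        ["mdadm", "--create", md, "--level=1", "--raid-devices=2",
         new_part, "missing"],
        ["mkfs", "-t", "ext4", md],
        # Both of these should report a single active member.
        ["mdadm", "-D", md],
        ["cat", "/proc/mdstat"],
        ["mount", md, new_mount],
        ["rsync", "-a", old_mount + "/", new_mount + "/"],
        # Reboot and verify both copies before repartitioning the old disk,
        # then add its partition to the mirror and wait for the sync.
        ["mdadm", md, "--add", old_part],
    ]


def render(commands):
    """Render argv lists as shell-quoted command lines for review."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]


commands = mirror_migration_commands()
lines = render(commands)
```

Printing lines gives a checklist that can be reviewed before anything is typed on the real machine.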

Ok, so you did not do this and something tried to write the MBR in the raw disk and you lost all your data???

Well maybe.

Try using fsck with alternate superblocks. The first backup superblock should be at block 32768 (assuming the usual 4K block size).
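
For reference, the usual backup superblock locations can be worked out from the block size. This is a sketch assuming a filesystem made with default options (the sparse_super layout, with backups in group 1 and in groups that are powers of 3, 5 and 7, and 8 × block size blocks per group). The function names are invented for the example, and a filesystem created with other options may differ.

```python
def backup_groups(group_count):
    """Block groups that hold a backup superblock under sparse_super."""
    groups = {1}
    for base in (3, 5, 7):
        group = base
        while group < group_count:
            groups.add(group)
            group *= base
    return sorted(g for g in groups if g < group_count)


def backup_superblocks(block_size=4096, group_count=30):
    """Block numbers of the backup superblocks, as passed to e2fsck -b."""
    blocks_per_group = 8 * block_size          # one bitmap block covers a group
    first_data_block = 1 if block_size == 1024 else 0
    return [g * blocks_per_group + first_data_block
            for g in backup_groups(group_count)]
```

With a 4K block size the first backup lands at block 32768, and with 1K blocks at 8193. Running mke2fs -n against the device prints the locations it would use without writing anything.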

Good luck dude.

GKH.
mark says:

September 2, 2014 at 3:16 pm

GKH wrote:
I know how to do this – it *is* how I started. Also, I guess you didn’t read the original post – these are 4TB drives, so no MBR, GPT only.

And my manager has taken a fancy to raw drives; not sure why.

mark
Lamar Owen says:

September 2, 2014 at 5:41 pm

Wait just a minute. How can you use the raw device but still have a GPT on it? That doesn’t seem right, to have a GUID Partition Table but no partitions.
Joseph L. says:

September 2, 2014 at 6:36 pm

Have you never deleted all the partitions on a disk under any scheme before?
Keith Keller says:

September 3, 2014 at 12:28 am

Some reasons have already been cited in this thread. No reasons are given, but the author of md and mdadm apparently prefers raw drives too.

https://raid.wiki.kernel.org/index.php/Partition_Types

I think the take-home message from that document is: “There is no right answer – you can choose.”

–keith
Lamar Owen says:

September 4, 2014 at 9:59 am

Of course; but in the context of an MD RAID device with member devices as raw disks I would not expect a partition table of any kind, GPT or otherwise. Whether it can be there or not is not my point; it’s whether it’s expected or not.

Now, for C6 the default RAID superblock is version 1.2; but if you were to create a version 1.1 superblock it would go on the very first sector of the raw device, and would overwrite the partition table. (The 1.2 superblock goes 4K in from the first sector; prior to 1.1 the superblock went at the end of the drive.)

Of course, ext4 at least for block group 0 skips the first 1k bytes…..
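
The superblock positions described above can be put into numbers. The sketch below follows the placement rules as documented in the md(4) manual page, to the best of my understanding; the function name is made up, and the device size in the example is arbitrary.

```python
KIB = 1024


def md_superblock_offset(version, device_bytes):
    """Byte offset of the md superblock for a given metadata version."""
    if version == "0.90":
        # Round the device size down to a 64K multiple, then step back 64K.
        return (device_bytes // (64 * KIB)) * (64 * KIB) - 64 * KIB
    if version == "1.0":
        # At least 8K from the end, rounded down to a 4K boundary.
        return ((device_bytes - 8 * KIB) // (4 * KIB)) * (4 * KIB)
    if version == "1.1":
        return 0                # very start of the device
    if version == "1.2":
        return 4 * KIB          # 4K in from the start
    raise ValueError(f"unknown metadata version: {version}")


example_size = 10 * 64 * KIB + 100   # arbitrary small device for illustration
```

Only the end-of-device formats (0.90 and 1.0) leave the start of a member looking like a plain filesystem, which is relevant to whether a RAID1 member can be mounted directly.
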
Les Mikesell says:

September 4, 2014 at 12:35 pm

Does that mean autodetection/assembly would be possible with 1.2 but not 1.1? I’ve always considered that to be one of the best features of software raid.

How does this mesh with the ability to mount a RAID1 member as a normal non-raid partition? I’ve done that for data recovery but never knew if it was safe to write that way.
Lamar Owen says:

September 4, 2014 at 2:56 pm

Don’t know. Try it and let us know…..

Good question; try it and let us know. I have never tried it.