*very* Ugly Mdadm Issue


We have a machine that’s a distro mirror – a *lot* of data, not just CentOS. We had the data on /dev/sdc. I added another drive, /dev/sdd, and created a degraded mirror on it as /dev/md4 (with "missing" standing in for the second member), made an ext4 filesystem on it, and rsync’d everything from /dev/sdc.

Note that we did this on *raw*, unpartitioned drives (not my idea). I then umounted /dev/sdc, and mounted /dev/md4, and it looked fine; I added /dev/sdc to /dev/md4, and it started rebuilding.

Then I was told to reboot it, right after the rebuild started. I don’t know if that was the problem. At any rate, it came back up… and /dev/sdc is on as /dev/md127, and no /dev/md4, nothing in /etc/mdadm.conf, and, oh, yes:

mdadm -A /dev/md4 /dev/sdd
mdadm: Cannot assemble mbr metadata on /dev/sdd
mdadm: /dev/sdd has no superblock - assembly aborted

Oh, and mdadm -E /dev/sdd
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)

ee? A quick google says that indicates a legacy MBR, followed by an EFI….
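
For what it is worth, type ee is the protective MBR that sits in front of a GPT. A minimal sketch for checking which on-disk signatures are actually present follows. It assumes 512-byte logical sectors and the byte offsets noted in the comments; read_hex and probe_signatures are hypothetical helper names, and the md magic checked is the one for version-1.x metadata only.

```shell
#!/bin/sh
# read_hex FILE OFFSET COUNT
# Print COUNT bytes starting at byte OFFSET as lowercase hex with no spaces.
read_hex() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# probe_signatures FILE
# Report well-known signatures found on a disk or disk image. Read-only.
probe_signatures() {
    dev=$1
    if [ "$(read_hex "$dev" 510 2)" = "55aa" ]; then
        echo "MBR boot signature at byte 510"
        # The type byte of the first MBR partition entry is at 446 + 4.
        if [ "$(read_hex "$dev" 450 1)" = "ee" ]; then
            echo "protective MBR (type ee), so a GPT is expected"
        fi
    fi
    # A GPT header starts at LBA 1 with the ASCII text "EFI PART".
    if [ "$(read_hex "$dev" 512 8)" = "4546492050415254" ]; then
        echo "GPT header at byte 512"
    fi
    # md v1.x magic is 0xa92b4efc, stored little-endian on disk.
    if [ "$(read_hex "$dev" 0 4)" = "fc4e2ba9" ]; then
        echo "md 1.1 superblock at byte 0"
    fi
    if [ "$(read_hex "$dev" 4096 4)" = "fc4e2ba9" ]; then
        echo "md 1.2 superblock at byte 4096"
    fi
    # ext2/3/4 magic 0xef53 sits 56 bytes into the superblock at byte 1024.
    if [ "$(read_hex "$dev" 1080 2)" = "53ef" ]; then
        echo "ext superblock at byte 1024"
    fi
    return 0
}
```

Run as root against the whole disk, for example probe_signatures /dev/sdd. If the GPT header shows up but neither md magic does, that would be consistent with a partitioning tool having written over the member's metadata, though this sketch cannot say when or how that happened.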

I *REALLY* don’t want to lose all that data. Any ideas?

mark

16 thoughts on - *very* Ugly Mdadm Issue

  • Nothing wrong with that, particularly with big “midden” volumes like this one.

    *facepalm*

    You forgot the primary maxim of data integrity: two is one, one is none.

    When you overwrote your original copy with what you thought was a clone, you reduced yourself to a single copy again. If anything is wrong with that copy, you now have two copies of the error.

    What you *should* have done is buy two drives, set them up as a new mirror, copy the data over to them, then pull the old /dev/sdc and put it on a shelf as an offline archive mirror. /dev/sdc has probably already given you its rated service life, so it’s not like you’re really saving money here. The drive has already depreciated to zero.

    You’re probably going to spend more in terms of your time (salary + benefits) to fix this than the extra drive would have cost you, and at the end of it, you still won’t have the security of that offline archive mirror.

    I know this isn’t the answer you wanted, but it’s probably the answer a lot of people *wanted* to give, but chose not to, going by the crickets.
    (It’s either that or the 3-day holiday weekend.)

    I don’t know how much I can help you. I have always used hardware RAID on Linux, even for simple mirrors.

    I don’t see why it matters that your /dev/sdd partitioning is different from your /dev/sdc. When you told it to blast /dev/sdc with the contents of /dev/sdd, it should have copied the partitioning, too.

    Are you certain /dev/sdc is partially overwritten now? What happens if you try to mount it? If it mounts, go buy that second fresh disk, then set the mirror up correctly this time.

  • I haven’t used raw devices as members so I’m not sure I understand the scenario. However, I thought that devices over 2TB would not auto assemble, so you would have to manually add the ARRAY entry for /dev/md4 in /etc/mdadm.conf containing /dev/sdd and /dev/sdc for the system to recognize it at bootup.

    But sdd _should_ have the correct data – it just isn’t being detected as a raid member. I think with smaller devices – or at least devices with smaller partitions and FD type in the MBR it would have worked automatically with the kernel autodetect.
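
A minimal sketch of recording an array in the config file, assuming the array is currently assembled and that the file lives at /etc/mdadm.conf as on CentOS. mdadm --detail --scan prints ARRAY lines for running arrays; append_array_lines is a hypothetical helper that appends only the lines not already present, so running it twice does no harm.

```shell
#!/bin/sh
# append_array_lines CONF
# Read text on stdin and append each "ARRAY ..." line to CONF
# unless an identical line is already there. Other lines are ignored.
append_array_lines() {
    conf=$1
    while IFS= read -r line; do
        case $line in
            ARRAY*)
                grep -qxF -- "$line" "$conf" 2>/dev/null ||
                    printf '%s\n' "$line" >> "$conf"
                ;;
        esac
    done
}

# Intended use, as root, once /dev/md4 is assembled:
#   mdadm --detail --scan | append_array_lines /etc/mdadm.conf
```

The ARRAY lines identify the array by UUID rather than by /dev/sdX name, which is what lets assembly survive a change in detection order.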

  • Indeed–hardware RAID controllers don’t partition their drives before creating their arrays.

    If it was an rsync, then partitioning would not have been copied, just the filesystem contents.

    As for the OP, while this certainly doesn’t seem to be a problem with mdadm specifically (or linux md in general), the folks on the linux RAID mailing list may be able to help you recover (I, too, seldom use linux md, and do not know it well enough to be helpful).

    http://vger.kernel.org/vger-lists.html#linux-raid

    –keith

  • It also has the side benefit that you don’t have to worry about 4K partition alignment. Starting with 0 means you’re always aligned.

  • Just to confirm: is /dev/sdd still the new disk after the reboot, with the right model and serial number? Drive letters are assigned in the order the block devices are detected, so they can change on reboot.
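
One way to make that check, sketched under the assumption that udev populates /dev/disk/by-id with links whose names embed the model and serial number. ids_for_device is a hypothetical helper that lists the by-id names pointing at a given device node.

```shell
#!/bin/sh
# ids_for_device DEVICE [DIR]
# Print the names of all symlinks in DIR (default /dev/disk/by-id)
# that resolve to DEVICE.
ids_for_device() {
    want=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$want" ]; then
            basename "$link"
        fi
    done
}

# Intended use: ids_for_device /dev/sdd
```

Comparing the printed name against the label on the physical drive settles which disk is which, regardless of the letter it was given on this boot.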

  • I’m the OP, here….

    Les Mikesell wrote:

    Yeah. That was one thing I discovered. Silly me, assuming that the mdadm would create an entry in /etc/mdadm.conf. And this is not something I do more than once or twice a year, and haven’t this year (we have a good number of Dells with a PERC 7, or then there’s the JetStors….).

    It was toast.

    Both had a GPT on them, just no partitions. And that’s the thing that really puzzles me – why mdadm couldn’t find the RAID info on /dev/sdd, which *had* been just fine.

    Anyway, the upshot was my manager was rather annoyed – I *should* have pulled sdc, and put in a new one, and just let that go. I still think it would have failed, given the inability of mdadm to find the info on sdd. We wound up just remaking the RAID, and rebuilding the mirror over the weekend.

    mark

  • With devices < 2TB and MBRs, you don't need /etc/mdadm.conf - the kernel just figures it all out at boot time, regardless of the disk location or detection order. I have sometimes set up single partitions as 'broken' raids just to get that autodetect/mount effect on boxes where the disks are moved around a lot, because it worked long before distros started mounting by unique labels or UUIDs. And I miss it on big drives. I think either adding the ARRAY entry in /etc/mdadm.conf and rebooting, or some invocation of mdadm, could have revived /dev/md4 with /dev/sdd (and the contents you wanted) active.

  • Hmm, very bad idea to create a file system on the raw disk. Swap partitions know how to handle this well, but for a partition with data, why take the chance that something will write an MBR there? That’s what happened, I bet.

    The procedure is this:

    Create a partition 1 on the new unused drive (use all space). That leaves space for the MBR.

    Create the mirror on this new partition, giving “missing” in place of the second device.
    Run mkfs -t ext4 on the new md device (not on the underlying disk).

    mdadm -D /dev/mdx (where x is the number of the mirror)
    should show 1 drive on the mirror. cat /proc/mdstat should show same thing.

    Now, copy the data from the old disk to the new “mirrored” disk.

    When done, reboot. Yes, reboot now. If something had gone very wrong you would not lose your data and you would see your data on both disks.

    Ok, you rebooted, you see the data on both disks. Do fdisk the old disk. Create one partition, add the partition to the mirror, wait for sync to end and reboot again.

    You should be able to see your data mirrored.

    ..That’s the right way!
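
The steps above could be sketched roughly as follows. This assumes GPT labels created with parted (these are 4TB drives per the original post), and uses hypothetical mount points /mnt/old and /mnt/new. plan_mirror is a hypothetical helper that only prints the commands so they can be reviewed first; nothing is executed.

```shell
#!/bin/sh
# plan_mirror NEW_DISK OLD_DISK MD_DEV
# Print, but do not run, the commands for the partition-first procedure.
plan_mirror() {
    new=$1
    old=$2
    md=$3
    cat <<EOF
# 1. Partition the new disk and build a one-legged mirror on the partition
parted -s $new mklabel gpt
parted -s $new mkpart primary 1MiB 100%
parted -s $new set 1 raid on
mdadm --create $md --level=1 --raid-devices=2 ${new}1 missing
mkfs -t ext4 $md
mdadm --detail $md
cat /proc/mdstat
# 2. Copy the data and record the array, then reboot and check both copies
mount $md /mnt/new
rsync -aHAX /mnt/old/ /mnt/new/
mdadm --detail --scan >> /etc/mdadm.conf
# 3. Only after the reboot check: partition the old disk and add it
parted -s $old mklabel gpt
parted -s $old mkpart primary 1MiB 100%
parted -s $old set 1 raid on
mdadm --manage $md --add ${old}1
cat /proc/mdstat
EOF
}
```

For example, plan_mirror /dev/sdd /dev/sdc /dev/md4 prints the sequence for the disks in this thread. Step 3 destroys the old disk's contents, which is why it comes after the reboot check.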

    Ok, so you did not do this and something tried to write the MBR in the raw disk and you lost all your data???

    Well maybe.

    Try using fsck with alternate superblocks. On a filesystem with 4K blocks, the first backup superblock should be at block 32768.

    Good luck dude.

    GKH.
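
As a worked example of the alternate-superblock suggestion: on an ext4 filesystem with 4 KiB blocks and the default sparse_super layout, backup superblocks sit at the start of block group 1 and of groups that are powers of 3, 5 and 7, with 32768 blocks per group. The sketch below lists candidate locations under exactly those assumptions. backup_superblocks is a hypothetical helper, and a filesystem made with other options will differ.

```shell
#!/bin/sh
# backup_superblocks MAX_GROUP
# Print the block numbers of sparse_super backup superblocks for
# block groups 1..MAX_GROUP, assuming 4 KiB blocks.
backup_superblocks() {
    max_group=$1
    blocks_per_group=32768   # 8 * 4096 blocks per group with 4 KiB blocks
    [ "$max_group" -ge 1 ] || return 0
    {
        echo 1
        for base in 3 5 7; do
            g=$base
            while [ "$g" -le "$max_group" ]; do
                echo "$g"
                g=$((g * base))
            done
        done
    } | sort -n | while read -r g; do
        echo $((g * blocks_per_group))
    done
}
```

With a candidate in hand, e2fsck -n -b 32768 -B 4096 on the device tries that backup without writing anything. If the original mkfs options are known, mke2fs -n with the same options reports the locations without creating a filesystem.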

  • GKH wrote:
    I know how to do this – it *is* how I started. Also, I guess you didn’t read the original post – these are 4TB drives, so no MBR, GPT only.

    And my manager has taken a fancy to raw drives; not sure why.

    mark

  • Wait just a minute. How can you use the raw device but still have a GPT on it? That doesn’t seem right, to have a GUID Partition Table but no partitions.

  • Of course; but in the context of an MD RAID device with member devices as raw disks I would not expect a partition table of any kind, GPT or otherwise. Whether it can be there or not is not my point; it’s whether it’s expected or not.

    Now, for C6 the default RAID superblock is version 1.2; but if you were to create a version 1.1 superblock it would go on the very first sector of the raw device, and would overwrite the partition table. (The 1.2 superblock goes 4K in from the start of the device; the versions before 1.1, that is 0.90 and 1.0, put the superblock near the end of the device.)

    Of course, ext4 at least for block group 0 skips the first 1k bytes…..
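
To make the placement concrete, here is a small sketch of where each metadata version's superblock would start on a member of a given size. The 1.1 and 1.2 offsets are as described above. The end-of-device formulas for 0.90 (64 KiB-aligned and at least 64 KiB from the end) and 1.0 (at least 8 KiB from the end, 4 KiB-aligned) are an assumption based on how the md driver is generally understood to compute them, so treat those two as approximate. md_sb_offset is a hypothetical helper.

```shell
#!/bin/sh
# md_sb_offset VERSION DEVICE_SIZE_BYTES
# Print the byte offset at which the md superblock would start.
md_sb_offset() {
    version=$1
    sectors=$(( $2 / 512 ))   # work in 512-byte sectors
    case $version in
        1.1)  echo 0 ;;
        1.2)  echo 4096 ;;
        # Assumed: back off 8 KiB from the end, round down to 4 KiB.
        1.0)  echo $(( ((sectors - 16) & ~7) * 512 )) ;;
        # Assumed: round size down to 64 KiB, then back off 64 KiB.
        0.90) echo $(( ((sectors & ~127) - 128) * 512 )) ;;
        *)    echo "unknown metadata version: $version" >&2; return 1 ;;
    esac
}
```

blockdev --getsize64 on the member prints its size in bytes to feed in as the second argument. Only 1.1 and 1.2 land inside the region a partition table at the start of the disk would occupy.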

  • Does that mean autodetection/assembly would be possible with 1.2 but not 1.1? I’ve always considered that to be one of the best features of software raid.

    How does this mesh with the ability to mount a RAID1 member as a normal non-raid partition? I’ve done that for data recovery but never knew if it was safe to write that way.
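
For the read-only recovery case, a hedged sketch: with 1.1 or 1.2 metadata the filesystem begins at the member's data offset rather than at byte 0, so a plain mount of the member will not find it, whereas with end-of-device metadata (0.90, 1.0) the data starts at byte 0. This assumes mdadm --examine prints a "Data Offset : N sectors" line for the member, as it does for 1.x metadata. data_offset_bytes is a hypothetical helper that converts that line to bytes for losetup, and /mnt/recover is a hypothetical mount point.

```shell
#!/bin/sh
# data_offset_bytes
# Read "mdadm --examine" output on stdin and print the data offset in bytes.
data_offset_bytes() {
    awk -F: '/Data Offset/ { split($2, f, " "); print f[1] * 512; exit }'
}

# Intended use, as root, on a member that still has its md superblock:
#   off=$(mdadm --examine /dev/sdd | data_offset_bytes)
#   loop=$(losetup --find --show --read-only --offset "$off" /dev/sdd)
#   mount -o ro,noload "$loop" /mnt/recover
```

Writing through such a mount would leave the two halves of the mirror out of step without md knowing about it, which is why the sketch keeps everything read-only.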
