Drive Failed In 4-drive Md RAID 10


I got the email that a drive in my 4-drive RAID10 setup failed. What are my options?

Drives are WD1000FYPS (Western Digital 1 TB 3.5″ SATA).

mdadm.conf:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/root level=raid10 num-devices=4 UUID=…
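
Before deciding anything, it helps to confirm which member md actually kicked out. A minimal check (assuming the array is the /dev/md127 that the replies below refer to; /dev/md/root should point at the same device):

cat /proc/mdstat              # the failed member is flagged (F) and the status line shows a hole, e.g. [U_UU]
mdadm --detail /dev/md127     # lists each member; the failed one is marked faulty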

6 thoughts on - Drive Failed In 4-drive Md RAID 10

  • Hi,

    mdadm --remove /dev/md127 /dev/sdf1

    and then the same command with --add should hot-remove and re-add the device (the full sequence is sketched below).

    If it rebuilds fine, it may keep working for a long time.

    Simon
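
    In full, the sequence might look like this (a sketch, assuming the failed member is /dev/sdf1 in /dev/md127 as above):

    mdadm /dev/md127 --fail /dev/sdf1      # mark it faulty first, if md has not already done so
    mdadm /dev/md127 --remove /dev/sdf1    # hot-remove it from the array
    mdadm /dev/md127 --add /dev/sdf1       # add it back; the rebuild starts automatically
    cat /proc/mdstat                       # follow the recovery progress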

  • Thanks. That reminds me: if I need to replace it, is there an easy way to figure out which drive bay sdf is in? It’s an old Supermicro rack chassis with
    6 drive bays. Perhaps a way to blink the drive light?

  • Hi,

    # smartctl --all /dev/sdf

    will give you the serial number of the drive, which is printed on the disk label. But you still have to shut down your machine and pull the drives to find it (see also the sketch below).

    best regards
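
    The serial can also be matched from software before pulling anything, via the persistent names under /dev/disk/by-id (a sketch, assuming the suspect drive is /dev/sdf):

    smartctl -i /dev/sdf | grep -i serial   # prints just the serial number line
    ls -l /dev/disk/by-id/ | grep sdf       # the by-id symlink names embed each disk’s model and serial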

  • It’s easy enough with dd. Make sure it’s the drive you want to find, then type dd if=/dev/sdf of=/dev/null into a shell but don’t run it yet. Look at the drives, hit Enter, and watch for which one lights up. Then hit ^C while watching, to be sure the light turns off exactly when you hit it.
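
    Put concretely (assuming the suspect drive is /dev/sdf):

    dd if=/dev/sdf of=/dev/null bs=1M   # keeps the drive busy reading, so its activity LED stays lit
    # press Ctrl-C while watching the bays and note which light goes dark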

  • This worked like a charm. When I added it back, it told me it was
    “re-adding” the drive, so it recognized the drive I’d just removed. I
    checked /proc/mdstat and it showed rebuilding. It took about 90 minutes to finish and is now running fine.

  • I think it’s usually like this:
    When a drive has a bad sector, the sector is read from the other RAID disk instead, but the failing disk gets marked as bad. When you then rebuild, the bad sector is written again and the drive remaps it to a spare sector, so all is well again. Note that the drive firmware can handle such cases differently depending on the drive type (see the SMART check sketched below).

    Regards, Simon
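
    A quick way to see whether the drive actually remapped anything is to look at its SMART counters (a sketch, again assuming /dev/sdf):

    smartctl -A /dev/sdf | grep -Ei 'Reallocated_Sector|Current_Pending'   # remapped sectors vs. sectors still pending remap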