Replacing SW RAID-1 With SSD RAID-1


Hi,

I want to replace my hard drives based SW RAID-1 with SSD’s.

What would be the recommended procedure? Can I just remove one drive, replace with SSD and rebuild, then repeat with the other drive?

Thanks Frank

23 thoughts on - Replacing SW RAID-1 With SSD RAID-1

  • I suggest you “mdadm --fail” one drive, then “mdadm --remove” it. After replacing the drive you can “mdadm --add” it.

    If you boot from these drives you also have to take care of the boot loader. I
    guess this depends on how exactly the system is configured.
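
    For illustration, assuming the array is /dev/md0 and the drive being swapped out holds /dev/sda1 (adjust names to your layout), one iteration might look like:

    mdadm /dev/md0 --fail /dev/sda1
    mdadm /dev/md0 --remove /dev/sda1
    # power down, swap the HDD for the SSD, partition it the same way,
    # then add the new partition back and watch the rebuild:
    mdadm /dev/md0 --add /dev/sda1
    cat /proc/mdstat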

    Regards, Simon

  • Thanks, that’s what I had in mind. Of course, I will rebuild grub2 after each iteration.
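
    Something like this, just a sketch, assuming BIOS/MBR boot and that the replaced disk shows up as /dev/sda (an EFI setup is handled differently):

    grub2-install --recheck /dev/sda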

    Thanks Fra

  • You could also grow the array to add in the new devices before removing the old HDDs ensuring you retain at least 2 devices in the array at any one time. For example, in an existing raid of sda1 and sdb1, add in sdc1
    before removing sda1 and add sdd1 before removing sdb1, finally shrinking the array back to 2 devices:

    mdadm --grow /dev/md127 --level=1 --raid-devices=3 --add /dev/sdc1
    mdadm --fail /dev/md127 /dev/sda1
    mdadm --remove /dev/md127 /dev/sda1
    mdadm /dev/md127 --add /dev/sdd1
    mdadm --fail /dev/md127 /dev/sdb1
    mdadm --remove /dev/md127 /dev/sdb1
    mdadm --grow /dev/md127 --raid-devices=2

    then reinstall grub to sdc and sdd once everything has fully sync’d:

    blockdev --flushbufs /dev/sdc1
    blockdev --flushbufs /dev/sdd1
    grub2-install --recheck /dev/sdc
    grub2-install --recheck /dev/sdd
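
    To confirm the array has finished syncing before pulling the old drives, something like this can be used (md127 as in the example above):

    cat /proc/mdstat
    mdadm --wait /dev/md127
    mdadm --detail /dev/md127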

  • then grow, add, wait, fail, remove, shrink. That way you will never lose redundancy…

    # grow and add new disk
    mdadm --grow -n 3 /dev/mdX -a /dev/…

    # wait for rebuild of the array
    mdadm --wait /dev/mdX

    # fail old disk
    mdadm /dev/mdX --fail /dev/sdY

    # remove old disk
    mdadm /dev/mdX --remove /dev/sdY

    # add second new disk
    mdadm /dev/mdX --add /dev/…

    # wait
    mdadm --wait /dev/mdX

    # fail and remove second old disk
    mdadm /dev/mdX --fail /dev/sdZ
    mdadm /dev/mdX --remove /dev/sdZ

    # shrink
    mdadm --grow -n 2 /dev/mdX

    peter

  • You do have a recent backup available anyway, don’t you? That is: even without planning to replace disks. And testing such strategies/sequences using loopback devices is definitely a good idea to get used to the machinery…
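
    A rough sketch of such a dry run with loop devices (file and device names are only examples):

    truncate -s 512M disk1.img disk2.img disk3.img
    losetup -f --show disk1.img    # e.g. /dev/loop0
    losetup -f --show disk2.img    # e.g. /dev/loop1
    losetup -f --show disk3.img    # e.g. /dev/loop2
    mdadm --create /dev/md100 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1
    # now rehearse the grow/add/fail/remove/shrink sequence using /dev/loop2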

    On a side note: I have had a fair number of drives die on me during a RAID rebuild, so I would try to avoid (if at all possible) deliberately reducing redundancy just for a drive swap. I have never had a problem (yet) caused by the RAID-1 kernel code itself. And: if you have to change a disk because it already has issues, it may be dangerous to do a backup first – especially a file-based backup – because the random access pattern may make things worse. Been there, done that…

    peter

  • Sure, and for large disks I go even further: don’t put the whole disk into one RAID device but build multiple segments, e.g. create 6 partitions of the same size on each disk and build six RAID1s out of them. That way, if there is an issue with one disk in one segment, you don’t lose redundancy for the whole big disk. You can even keep spare segments on separate disks to help in cases where you cannot quickly replace a broken disk. The whole handling is still very easy with LVM on top.
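
    A rough illustration of what I mean (device names, segment count and sizes are only examples):

    # identical partitions on both disks, each pair its own small RAID1
    mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    # ... and so on for the remaining segments, then LVM on top:
    pvcreate /dev/md10 /dev/md11
    vgcreate vg_data /dev/md10 /dev/md11
    lvcreate -L 100G -n lv_home vg_data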

    Regards, Simon

  • Oh boy, what a mess this will create! I have inherited a machine which was set up by someone with software RAID like that. You need to replace one drive, and the other RAIDs in which that drive’s other partitions participate are affected too.

    Now imagine that at some moment you have several RAIDs, each of them no longer redundant, but in each it is a partition from a different drive that has been kicked out. Now you are stuck, unable to remove any of the failed drives: removing any one of them will trash one or another RAID (which is already not redundant). I guess the guy who left me with this setup listened to advice like the one you just gave. What a pain it is to deal with any drive failure on this machine!!

    It is known since forever: The most robust setup is the simplest one.

    One can do a lot of fancy things, splitting things on one layer, then joining them back on another (by introducing LVM)… But I want to repeat it again:

    The most robust setup is the simplest one.

    Valeri

  • Does it make sense to dd or ddrescue from the removed drive to the replacement? My md RAID set is on primary partitions, not raw drives, so I’m assuming the replacement drive needs at least the boot sector from the old drive to copy the partition data.
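
    (I could presumably copy just the partition table instead of dd’ing everything – something like this, assuming MBR partitions with the old drive as /dev/sda and the new one as /dev/sdb:)

    sfdisk -d /dev/sda > sda-partitions.dump
    sfdisk /dev/sdb < sda-partitions.dump
    # the md rebuild then takes care of the data; only the boot loader
    # would still need to be reinstalled (grub2-install) on the new drive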

  • I used to do something like this (though there isn’t enough detail above for me to be sure we are talking about the same thing). On older disks, having the RAID split over 4 disks with / /var /usr /home allowed for longer redundancy, because drive 1 could have a ‘failed’ /usr while drives 0,2,3,4 were ok, and the rest all worked in full mode because /, /var and /home were all good. This was because most of the data on /usr would be in a straight run on each disk.

    The problem is that a lot of modern disks do not guarantee that data for any partition will really be next to each other on the disk. Even before SSDs did this for wear leveling, a lot of disks did it because it was easier to let the full OS running on the ARM chip in the drive do all the ‘map this sector the user wants to this sector on the disk’ work, in whatever way makes sense for the type of magnetic media inside. There is also a lot of silent rewriting going on: the real capacity of a drive can be 10-20% bigger, with those spare sectors slowly used up as failures happen in other areas. When you start seeing errors, it means the drive has no safe sectors left and has probably written /usr all over the disk in order to keep going as long as it could; the rest of the partitions will start failing very quickly afterwards.

    Not all disks do this, but a good many of them do, from commercial SAS to commodity SATA… and a lot of the ‘Red’ and ‘Black’ NAS drives are doing this too.

    While I still use partition segments to spread things out, I no longer do so for failure handling. And if what I was doing isn’t what the original poster meant, I look forward to learning it.

  • I understand that, I also like keeping things simple (KISS).

    Now, in my own experience with these multi-terabyte drives today, in 95% of the cases where you get a problem it is a single block that cannot be read correctly. A single write to that sector makes the drive remap it and the problem is solved. That’s where a simple resync of the affected RAID segment is the fix. If a drive happens to produce such a condition once a year, there is absolutely no reason to replace it; just trigger the remapping of the bad sector and the drive will remember it in its internal bad-sector map. This happens all the time without the OS ever seeing an error, as long as the drive can still read and reconstruct the correct data.
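
    A minimal sketch of triggering such a resync on one affected segment (md3 is just an example name):

    echo check > /sys/block/md3/md/sync_action   # read-verify; rewrites from the good mirror on read errors
    cat /proc/mdstat                             # watch progress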

    In the 5% of cases where a drive really fails completely and needs replacement, you have to resync the 10 RAID segments, yes. I usually do that with a small script and it doesn’t take more than a few minutes.

    The good thing is that LVM has been so stable for so many years that I don’t think twice about this one extra layer. Why is a layered approach worse than a fully integrated solution like ZFS? The tools differ, but some complexity always remains.

    That’s how I see it, Simon

  • I don’t do it the same way on every system. But on large multi-TB systems with 4+ drives, segmented RAID has helped very often. There is one more thing: I always try to keep spare segments. When a problem shows up, the first thing is to pvmove the broken RAID’s data to wherever there is free space. One command and a few minutes later, the system is fully redundant again. LVM is really nice for such things, as you can move filesystems around as long as they share the same VG. I also use LVM to optimize storage by moving things to faster or slower disks after adding or replacing storage.
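
    Roughly, assuming the degraded RAID segment is the PV /dev/md12 and the VG (here called vg_data) has enough free space elsewhere:

    pvmove /dev/md12
    # once its extents are moved off, the segment can be dropped from the VG
    vgreduce vg_data /dev/md12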

    Regards, Simon

  • It is one story if you administer one home server. It is quite different if you administer a couple of hundred of them, like I do. Just the 2-3 machines set up in the disastrous manner I described each suck 10-20 times more of my time than any other machine – the ones I chose the hardware for and set up myself. When that is your situation, you are entitled to say what I said.

    Hence the attitude.

    Keep things simple, so they do not suck up your time – if you do it for a living.

    But if it is a hobby of yours – one that takes all your time and gives you pleasure just to fiddle with it – then it’s your time and your pleasure; do it the way that gets you more of it ;-)

    Valeri

  • zpool create newpool mirror sdb sdc mirror sdd sde mirror sdf sdg mirror sdh sdi spare sdj sdk
    zfs create -o mountpoint=/var/lib/pgsql-11 newpool/postgres11

    and done.
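
    To verify the four mirror pairs and the hot spares afterwards:

    zpool status newpool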

  • This *might* be a valid answer if zfs was supported on plain CentOS…
    (and if the question hadn’t involved an existing RAID ;-) ). Or did I
    miss something?

    peter

  • Your assumptions about my work environment are quite wrong.

    It was a hobby 35 years ago coding in assembler and designing PCBs for computer extensions.

    Simon

  • Great, then you are much mightier than I am at quickly managing something set up in a very sophisticated way. That is amazing: managing sophisticated things as fast as managing simple, straightforward things ;-)

    I also noticed one more sophistication of yours: you always strip off the name of the poster you reply to. ;-)

    Oh, great, we are of the same kind. I designed electronics and made PCBs both as a hobby and for a living, and I still do it as a hobby. I also did programming both as a hobby and for a living. The funniest part: for a single-board Z-80 based computer I wrote an assembler, a disassembler, and an emulator (which emulated what that Z-80 would do when running some program). I did it on a Wang 2200 (actually a replica of one), and I programmed it, believe it or not, in BASIC. That was the only language available to us on that machine – an ugly, simple interpreted language with all variables global…

    But now I’m a sysadmin. And – for me at least – the simplest possible setup is the one that will be most robust. And it will be the easiest and fastest to maintain (both for me and for someone else if they step in to do it instead of me).

    Valeri

  • Just one reason is that you lose visibility of lower-level elements from the top level.

    You gave the example of a bad block in a RAID. What current RHEL type systems can’t tell you when that happens is which file is affected.

    ZFS can not only tell you that; deleting or replacing the file will fix the array. That’s the bottom-most layer (the disk surface) telling the top-most layer (userspace) that there’s a problem, and userspace fixing it by telling the bottom-most layer to check again.
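
    For example, a scrub followed by a verbose status will list any files with unrecoverable errors (the pool name is only an example):

    zpool scrub tank
    zpool status -v tank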

    Because ZFS is CoW, this isn’t forcing the drive to rewrite that sector; a new set of sectors is brought into use and the old ones are released. The old sectors aren’t touched again until the filesystem reassigns them.

    Red Hat is attempting to fix all this with Stratis, but it’s looking to take years and years for them to get there. ZFS is ready today.

    In my experience, ZFS hides a lot of complexity, and it is exceedingly rare to need to take a peek behind the curtains.

    (And if you do, there’s the zdb command.)

  • I disagree.

    It is ready today only if you are willing to abandon Linux entirely and switch to BSD, or to run a Linux distro like Ubuntu that is possibly violating a license. Third-party repositories that use dkms can be dangerous for a storage service, and I’d prefer to keep compilers off my servers.

    I’m not willing to move away from CentOS and am ethically bound not to violate the GPL. I would say that ZFS will not be ready for Linux unless the project can fix its license.

    At least with Stratis there’s an attempt to work within the Linux world. I’m excited to see Fedora making btrfs the default root filesystem, too.

  • Same setup I’ve been using for at least 15 years. Just pick a standard partition size and keep using it (or multiples of it, e.g. 256GiB, then 512GiB, then 1024GiB), so as to keep the numbers down.

    Best regards.

  • Thanks for sharing! Interesting to hear that some people did the same or similar things as I did without knowing of each other.

    IIRC I initially started doing this when I got a server with different disk sizes and different paths to the disks. Think of some 18G disks, some 36G, some 73G and also some 146G. Now, if you have to make the storage redundant against disk failures and also against single path failures, you get creative about how to cut the larger disks into slices and spread the mirror pairs over the paths.

    It proved to be quite flexible in the end and still allowed extending the storage without any downtime. Needless to say, the expensive hardware RAID controllers were removed from the box and replaced by simple SCSI controllers – because the hardware just couldn’t do what was required here.

    Regards, Simon