Ssacli Start Rebuild?


Hi,

is there a way to rebuild an array using ssacli with a P410?

A failed disk has been replaced and now the array is not rebuilding like it should:

Array A (SATA, Unused Space: 1 MB)

logicaldrive 1 (14.55 TB, RAID 1+0, Ready for Rebuild)

physicaldrive 1I:0:1 (port 1I:box 0:bay 1, SATA HDD, 4 TB, OK)
physicaldrive 1I:0:2 (port 1I:box 0:bay 2, SATA HDD, 4 TB, OK)
physicaldrive 1I:0:3 (port 1I:box 0:bay 3, SATA HDD, 4 TB, OK)
physicaldrive 1I:0:4 (port 1I:box 0:bay 4, SATA HDD, 8 TB, OK)
physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA HDD, 4 TB, OK)
physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA HDD, 4 TB, OK)
physicaldrive 2I:0:7 (port 2I:box 0:bay 7, SATA HDD, 4 TB, OK)
physicaldrive 2I:0:8 (port 2I:box 0:bay 8, SATA HDD, 4 TB, OK)

I’d expect the rebuild to start automatically after 1I:0:4 was replaced. Could the new drive being larger than the old one (4 TB -> 8 TB) be causing issues?
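
For reference, the state of the replacement drive and of the logical drive can be inspected with something like the following (slot numbers differ per system; 3 is what this controller uses):

ssacli ctrl slot=3 show status
ssacli ctrl slot=3 ld 1 show
ssacli ctrl slot=3 pd 1I:0:4 show detail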

27 thoughts on - Ssacli Start Rebuild?

  • Am Fr., 6. Nov. 2020 um 00:52 Uhr schrieb hw :

    Have you checked the rebuild priority:

    ❯ ssacli ctrl slot=0 show config detail | grep "Rebuild Priority"
    Rebuild Priority: Medium

    The slot number needs to be adjusted to your configuration.
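
    If it is not already set to High, something along these lines should raise it (again, adjust the slot number):

    ssacli ctrl slot=0 modify rebuildpriority=high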

    Kind regards Thomas

    Linux … enjoy the ride!

  • Yes, I’ve set it to high:

    ssacli ctrl slot=3 show config detail | grep Prior
    Rebuild Priority: High
    Expand Priority: Medium

    Some search results suggest that read errors on other disks in the array can prevent a RAID 5 from rebuilding. I don’t know whether there are read errors here, and since this is a RAID 1+0, such errors would only matter if they affected the disk that mirrors the one that failed. But if the RAID is striped across all the disks, that could be any or all of them.
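
    A way to at least look for read errors without rebooting would be smartmontools’ cciss passthrough, which queries the physical drives behind the controller (the device node and drive index below are only examples and depend on the hpsa/cciss setup):

    smartctl -a -d cciss,3 /dev/sda    # 3 = drive index behind the controller, not the bay number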

    The array is still in production and still works, so it should just rebuild. Now the plan is to use another 8TB disk once it arrives, make a new RAID 1
    with the two new disks and copy the data over. The remaining 4TB disks can then be used to make a new array.

    Learn from this that it can be a bad idea to use a RAID 0 for backups and that at least one generation of backups must be on redundant storage …

  • Am Fr., 6. Nov. 2020 um 20:38 Uhr schrieb hw :

    I just checked on one of my HP boxes; you indeed cannot tell whether one of the discs has read errors. Do you have the option to reboot the box and check on the controller directly?

    Kind regards Thomas

    Thanks! The controller (i.e. its BIOS) doesn’t show up during boot, so I can’t check there for errors.

    The controller is extremely finicky: the plan to make a RAID 1 from the two new drives has failed because the array with the failed drive is unusable when the failed drive is missing entirely.

    In the process of moving the 8 TB drives back and forth, it turned out that when an array made from them is missing one drive, that array is unusable, and when the missing drive is put back in, the array remains ‘Ready for Rebuild’ without the rebuild ever starting. There is also no way to delete an array that is missing a drive.

    So the theory that the array isn’t being rebuilt because other disks have errors is likely wrong. That means that whenever a disk fails and is replaced, there is no way to rebuild the array (unless it happened automatically, which it doesn’t).

    After this experience, I consider these controllers deprecated. RAID controllers that can’t rebuild an array after a disk has failed and been replaced are virtually useless.
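
    One ssacli operation that is sometimes suggested for a logical drive stuck in this state is a forced re-enable. Whether it actually helps on a P410 is unconfirmed here, and the slot and logical drive numbers are only examples:

    ssacli ctrl slot=3 ld 1 modify reenable forced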

  • Am Mi., 11. Nov. 2020 um 07:28 Uhr schrieb hw :

    HW RAID is often delivered with quite limited functionality. Because of this I have meanwhile switched to software RAID in most cases and configured the HW RAID controller as JBOD. The funny thing is that when you reuse the discs previously used in the HW RAID in such a setup, the software RAID detects them as RAID disks. It looks like a significant number of HW RAID controllers use the Linux software RAID code in their firmware.
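
    You can check whether a disc carries RAID metadata that the Linux tools recognise with mdadm, for example (the device name is an example):

    mdadm --examine /dev/sdb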

    Kind regards Thomas

  • I have yet to see software RAID that doesn’t kill the performance. And where do you get cost-efficient cards that can do JBOD? I don’t have any.

    It turned out that the controller does not rebuild the array even with a disk that is the same model and capacity as the others. What has HP been thinking?

  • When was the last time you tried it?

    Why would you expect that a modern 8-core Intel CPU would impede I/O in any measurable way as compared to the outdated single-core 32-bit RISC CPU typically found on hardware RAID cards? These are the same CPUs, mind, that regularly crunch through TLS 1.3 on line-rate fiber Ethernet links, a much tougher task than mediating spinning disk I/O.

    $69, 8 SATA/SAS ports: https://www.newegg.com/p/0ZK-08UH-0GWZ1

    Search for “LSI JBOD” for tons more options. You may have to fiddle with the firmware to get it to stop trying to do clever RAID stuff, which lets you do smart RAID stuff like ZFS instead.

    The hardware vs software RAID argument is over in 2020.

  • the only ‘advantage’ hardware raid has is write-back caching.

    with ZFS you can get much the same performance boost out of a small fast SSD used as a ZIL / SLOG.
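
    adding a small mirrored pair of SSDs as a log device to an existing pool is a one-liner, for example (pool and device names are examples):

    zpool add tank log mirror /dev/sdx /dev/sdy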

    Just for my information: how do you map a failed software RAID drive to the physical port of, say, a SAS-attached enclosure? I’d love to hot replace failed drives in software RAIDs; I have over a hundred physical drives attached to one machine. Do not criticize, this is a box installed by someone else which I have “inherited”. To replace a drive I have to query the drive serial number, power off the machine and pull drives one at a time to read the labels…

    With hardware RAID that is not an issue: I always know which physical port the failed drive is in, and I can tell the controller to “indicate” a specific drive (it blinks the respective port LED). I always hot replace drives in hardware RAIDs, and no one ever knows it has been done. I’d love to deal with drives in software RAIDs the same way.

    Thanks in advance for any advice. And my apologies for “stealing the thread”.

    Valeri

    I’m sure you can reflash an LSI card to make it a SATA or SAS HBA, or a MegaRAID hardware RAID adapter. As far as I recollect it is the same electronics board. I reflashed a couple of HBAs to make them MegaRAID boards.

    One thing about LSI bothers me though: now that it has been bought by Intel, its future fate worries me. Intel has already pushed 3ware, which it acquired in the same package with LSI, into oblivion…

    Valeri

  • I’d rather have distributed redundant storage on multiple machines… but I still have [mostly] hardware RAIDs ;-)

    Valeri

  • With ZFS, you set a partition label on the whole-drive partition pool member, then mount the pool with something like “zpool mount -d /dev/disk/by-partlabel”, which then shows the logical disk names in commands like “zpool status” rather than opaque “/dev/sdb3” type things.

    It is then up to you to assign sensible drive names like “cage-3-left-4” for the 4th drive down on the left side of the third drive cage. Or, maybe your organization uses asset tags, so you could label the disk the same way, “sn123456”, which you find by looking at the front of each slot.

    in large raids, I label my disks with the last 4 or 6 digits of the drive serial number (or for SAS disks, the WWN). this is visible via smartctl, and I record it with the zpool documentation I keep on each server (typically a text file on a cloud drive). zpools don’t actually care WHAT slot a given pool member is in; you can shut the box down, shuffle all the disks, boot back up, and zfs will find them all and put them back in the pool.

    the physical error reports that precede a drive failure should list the drive identification beyond just the /dev/sdX kind of thing, which is subject to change if you add more SAS devices.

    I once researched what it would take to implement the drive failure lights on a typical brand-name server/storage chassis. there’s a command for manipulating SES devices such as those lights; the catch is figuring out the mapping between the drives and the lights, which is not always evident, so it would require trial and error.
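
    the ledmon package tries to automate exactly that mapping; whether it works depends on the enclosure’s SES support, and the device name here is just an example:

    ledctl locate=/dev/sdf        # blink the slot holding sdf
    ledctl locate_off=/dev/sdf    # stop blinking it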


    -john r pierce
    recycling used bits in santa cruz

  • you can reflash SOME megaraid cards to put them in IT ‘hba’ mode, but not others.

    It’s Avago, formerly Agilent, and before that HP, which bought LSI, 3Ware, and then Broadcom, and renamed itself Broadcom.


    -john r pierce
    recycling used bits in santa cruz

  • Oops, I’m mixing the zpool and zfs commands. It’d be “zpool import”.

    And you do this just once: afterward, the automatic on-boot import brings the drives back in using the names they had before, so when you’ve got some low-skill set of remote hands in front of the machine, and you’re looking at a failure indication in zpool status, you just say “Swap out the drive in the third cage, left side, four slots down.”
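
    A sketch of that labeling flow, assuming GPT-partitioned pool members and a pool named tank (all names are examples):

    parted -s /dev/sdb name 1 cage-3-left-4         # set the GPT partition label
    zpool import -d /dev/disk/by-partlabel tank     # one-time import using those labels
    zpool status tank                               # members now show up as cage-3-left-4 etc.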

    I am apparently wrong, at least about LSI; it still belongs to Broadcom, thanks!

    Long before Broadcom acquired LSI and 3ware, I was awfully displeased by their WiFi chip: the infamous BCM43xx. It is a 32-bit chip sitting on a 64-bit bus; no [sane] open source programmer will be happy to write a driver for that, and for ages we were using ndiswrapper…. As much as I disliked Broadcom for their wireless chipset, I loved them for their Ethernet one. And as I recollect, this was long before Broadcom’s acquisition of LSI and 3ware. Or am I wrong?

    Valeri

    I get info about a software RAID failure from the cron job executing raid-check (which comes with the mdadm rpm). I can get the S/N of the failed drive (they are not dead-dead, one can still query them) using smartctl, but I am too lazy to have all the drive serial numbers printed and affixed to the fronts of the drive trays… but so far I see no other way ;-(
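
    For the record, the serials of all attached drives can be dumped in one go instead of querying them one by one (column support may vary slightly between util-linux versions):

    lsblk -d -o NAME,MODEL,SERIAL,WWN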

    Valeri

    There are different methods depending on how the disks are attached. In some cases you can use a tool to show the corresponding disk or slot. Otherwise, once you have hot removed the drive from the RAID, you can either run dd against the broken drive or generate some traffic on the still-working RAID, and you’ll spot the disk immediately by looking at the disks’ busy LEDs.
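
    A minimal version of the first variant, assuming md0 is the array and sdf the failed member:

    mdadm /dev/md0 --fail /dev/sdf --remove /dev/sdf
    dd if=/dev/sdf of=/dev/null bs=1M    # keeps the drive’s activity LED lit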

    I’ve used Linux Software RAID for the last two decades and it has always worked nicely, while I have come to hate hardware RAID more and more. With U.2 NVMe SSD drives there were, at least when we started using them, no RAID controllers available at all. And the performance with Linux Software RAID1 on AMD EPYC boxes is amazing :-)

    Regards, Simon

    I’m currently using it, and the performance sucks. Perhaps it’s not the software itself or the CPU but the on-board controllers or other components being incapable of handling multiple disks in a software RAID. That’s something I can’t verify.

    It doesn’t matter what I expect.

    That listing says it’s for HP. So will you still get firmware updates once the warranty has expired? Does it work exclusively with HP hardware?

    And are these good?

    Do you have a reference for that, like a final statement from HP?
    Did they stop developing RAID controllers, or do they ship their servers now without them and tell customers to use btrfs or mdraid?

    That specific card is a bad choice; it’s the very obsolete SAS1068E chip, which is SAS 1.0, with a 2 TB per-disk limit.

    Cards based on the SAS 2008, 2308, and 3008 chips are a much better choice.

    Any oem card with these chips can be flashed with generic LSI/Broadcom IT
    firmware.

  • HPE and the other large vendors won’t tell you directly because they love to sell you their outdated SAS/SATA Raid stuff. They were quite slow to introduce NVMe storage, be it as PCIe cards or U.2 format, but it’s also clear to them that NVMe is the future and that it’s used with software redundancy provided by MDraid, ZFS, Btrfs etc. Just search for HPE’s
    4AA4-7186ENW.pdf file which also mentions it.
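
    Setting up such a mirror is not much work either; a minimal sketch with two U.2 drives (device names and filesystem are examples):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
    mkfs.xfs /dev/md0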

    In fact local storage was one reason why we turned away from HPE and Dell after many years because we just didn’t want to invest in outdated technology.

    Regards, Simon

  • Be specific. Give chip part numbers, drivers used, whether this is on-board software RAID or something entirely different like LVM or MD RAID, etc. For that matter, I don’t even see that you’ve identified whether this is CentOS 6, 7 or 8. (I hope it isn’t older!)

    Sure you can. Benchmark RAID-0 vs RAID-1 in 2, 4, and 8 disk arrays.

    In a 2-disk array, a proper software RAID system should give 2x a single disk’s performance for both read and write in RAID-0, but single-disk write performance for RAID-1.

    Such values should scale reasonably as you add disks: RAID-0 over 8 disks gives 8x performance, RAID-1+0 over 8 disks gives 4x write but 8x read, etc.

    These are rough numbers, but what you’re looking for are failure cases where it’s 1x a single disk for read or write. That tells you there’s a bottleneck or serialization condition, such that you aren’t getting the parallel I/O you should be expecting.
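
    fio is one way to run such comparisons repeatably; a rough sequential-read example against an md device (device name, queue depth and runtime are placeholders):

    fio --name=seqread --filename=/dev/md0 --readonly --rw=read --bs=1M \
        --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based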

    It *does* matter if you know what the hardware’s capable of.

    TLS is a much harder problem than XOR checksumming for traditional RAID, yet it imposes [approximately zero][1] performance penalty on modern server hardware, so if your CPU can fill a 10GE pipe with TLS, then it should have no problem dealing with the simpler calculations needed by the ~2 Gbit/sec flat-out max data rate of a typical RAID-grade 4 TB spinning HDD.

    Even with 8 in parallel in the best case where they’re all reading linearly, you’re still within a small multiple of the Ethernet case, so we should still expect the software RAID stack not to become CPU-bound.

    And realize that HDDs don’t fall into this max data rate case often outside of benchmarking. Once you start throwing ~5 ms seek times into the mix, the CPU’s job becomes even easier.

    [1]: https://stackoverflow.com/a/548042/142454

    You asked for “cost-efficient,” which I took to be a euphemism for “cheapest thing that could possibly work.”

    If you’re willing to spend money, then I fully expect you can find JBOD cards you’ll be happy with.

    Personally, I get servers with enough SFF-8087 SAS connectors on them to address all the disks in the system. I haven’t bothered with add-on SATA cards in years.

    I use ZFS, so absolute flat-out benchmark speed isn’t my primary consideration. Data durability and data set features matter to me far more.

    Since I’m not posting from an hpe.com email address, I think it’s pretty obvious that that is my opinion, not an HP corporate statement.

    I base it on observing the Linux RAID market since the mid-90s. The massive consolidation for hardware RAID is a big part of it. That’s what happens when a market becomes “mature,” which is often the step just prior to “moribund.”

    Were you under the impression that HP was trying to provide you the best possible technology for all possible use cases, rather than make money by maximizing the ratio of cash in vs cash out?

    Just because they’re serving it up on a plate doesn’t mean you hafta pick up a fork.

  • Thanks! That’s probably why it isn’t so expensive.

    I don’t like the idea of flashing one. I don’t have the firmware and I don’t know if they can be flashed with Linux. Aren’t there any good — and cost efficient — ones that do JBOD by default, preferably including 16-port cards with mini-SAS connectors?

    I’m currently running an mdadm raid-check on two RAID-1 arrays, and the server shows 2 processes with 24–27% CPU each and two others around 5%. And you want to tell me that the CPU load is almost non-existent.

    Over the years I’ve also consistently seen much better performance with hardware RAID than with software RAID, with ZFS having the worst performance of anything, even with SSD caches.

    It speaks for itself, and, like I said, I have yet to see a software RAID
    that doesn’t bring the performance down. Show me one that doesn’t.

    Are there any hardware RAID controllers designed for NVMe storage you could use to compare software RAID with? Are there any ZFS or btrfs hardware controllers you could compare with?

    the firmware is freely downloadable from lsi/broadcom, and linux has the sas2flash and sas3flash (for the 2x08 and 3008 chips respectively) command line tools to do the flashing.

    pretty much standard procedure for the ZFS crowd to flash those… the 2x08 cards often come with “IR” firmware that does limited raid, and it’s preferable to flash them with the IT firmware that puts them in plain HBA mode; IT stands for Initiator-Target.
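
    roughly, for a 2008/2308-based board (the firmware and BIOS file names depend on the exact card and are only examples here):

    sas2flash -list                              # show the current firmware / BIOS versions
    sas2flash -o -f 2118it.bin -b mptsas2.rom    # -o = advanced mode; flash IT firmware plus boot ROM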

    The hardware vs software RAID discussion is like a clash of two different religions. I am, BTW, on your religious side: hardware RAID. For a different reason: in hardware RAID it is a small piece of code (hence well debugged) running on dedicated hardware. Thus, things like a kernel panic (of the main system, the one that would be running software RAID) do not affect hardware RAID function, whereas software RAID function will not be fulfilled in the case of a kernel panic. And while an unclean filesystem can be dealt with, an “unclean” RAID pretty much can not.

    But again, it is akin to religion, and after both sides have shot off all their ammunition, everyone returns to the same side they were on before the “discussion”.

    So, I would just suggest… Hm, never mind, everyone, do what you feel is right ;-)

    Valeri

    I don’t need to be specific because I have seen the difference in practical usage over the last 20 years. I’m not setting up scientific testing environments that would cost tremendous amounts of money; I’m using available and cost-efficient hardware and software.

    No, I can’t. I don’t have tons of different CPUs, mainboards, controller cards and diagnostic equipment around to do that, and what would you even benchmark? Is it the user telling you that the software they use in a VM, which is stored on an NFS server and run by another server connected to it, is now running faster or slower? Is it SQL queries that create rarely needed reports and take a while to run? And what is even relevant?

    I am seeing that a particular piece of software running in a VM is now running no slower, and maybe even faster, than before the failed disk was replaced. That means that the hardware RAID (8 disks in RAID 1+0) is not faster, and is even slower, than the software RAID (two disks, each a single-drive RAID 0 on the controller) on otherwise the same hardware. The CPU load on the storage server is also higher, which in this case does not matter. I’m happy with the result so far, and that is what matters.

    If the disks were connected to the mainboard instead, the software might be running slower. I can’t benchmark that, either, because I can’t connect the disks to the SATA ports on the board. If there were 8 disks in a RAID 1+0, all connected to the board, it might be a lot slower. I can’t benchmark that, the board doesn’t have so many SATA connectors.

    I only have two new disks and no additional or different hardware. Telling me to specify particular chips and such is totally pointless. Benchmarking is not feasible and pointless, either.

    Sure you can do some kind of benchmarking in a lab if you can afford it, but how does that correlate to the results you’ll be getting in practice? Even if you involve users, those users will be different from the users I’m dealing with.

    And?

    I can expect hardware to do something as much as I want; it will always only do whatever it does regardless.

    This may all be nice and good in theory. In practice, I’m seeing up to 30% CPU during an mdraid resync for a single 2-disk array. How much performance impact does that indicate for “normal” operations?
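
    The resync rate can at least be throttled through the md sysctls if it gets in the way; the values are in KiB/s:

    sysctl dev.raid.speed_limit_max              # current ceiling
    sysctl -w dev.raid.speed_limit_max=50000     # cap the resync at roughly 50 MB/s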

    Buying crap tends not to be cost-efficient.

    Like $500+ cards? That’s not cost-efficient for my backup server, which I run about once a month to put backups on. If I can get one good 16-port card or two 8-port cards for max. $100, I’ll consider it. Otherwise, I can keep using the P410s, turn all disks into RAID 0 and use btrfs.

    How do you get all these servers?

    Well, I tried ZFS and was not happy with it, though it does have some nice features.

    I haven’t paid attention to the email address.

    If they had stopped making hardware RAID controllers, that would show that they have turned away from hardware RAID, and that might be seen as putting an end to the discussion, precisely *because* they are trying to make money. If they haven’t stopped making them, that might indicate that there is still sufficient demand for the technology, and there are probably good reasons for that. That different technologies have matured over time doesn’t mean that others have become bad. Besides, always “picking the best technology” comes with its own disadvantages, all technology will eventually fail anyway, and sometimes hardware RAID can be the “best technology”.