CentOS 7 And Areca ARC-1883I SAS Controller: JBOD Or Not To JBOD?


January 20, 2017 Peter Peltonen CentOS 14 Comments

Hi,

Does anyone have experience with the ARC-1883I SAS controller on CentOS 7?

I am planning a RAID1 setup and am wondering whether I should use the controller’s RAID functionality, which has a 2GB cache, or go with JBOD + Linux software RAID.

The disks I am going to use are 6TB Seagate Enterprise ST6000NM0034, 7200rpm SAS/12Gbit 128 MB.

If hardware RAID is preferred, the controller’s cache could be upgraded to 4GB. How much performance gain would this give me?

Thanks!
Peter

14 thoughts on - CentOS 7 And Areca ARC-1883I SAS Controller: JBOD Or Not To JBOD?

Nux! says:

January 20, 2017 at 12:22 pm

Haven’t used Areca in a very long time, but with raid controllers the rule is to use it in order to take advantage of the cache. The performance gains will be more than significant, especially for writes.

hth
Joseph L. says:

January 20, 2017 at 1:00 pm

Sorry to hear that; in my experience the Seagate brand has the shortest MTBF of any disk I have ever used…

Lots, especially with slower disks. You can also leverage write-back caching if you have a battery on the controller. There are countless frameworks and one-off utilities that can properly report on the throughput for various I/O patterns; set it up both ways, measure, and know for sure.

Not related to your question, but something to keep in mind: what type of enclosure are you using? If you are using an engineered system, your enclosure will communicate with the controller. If not, when a disk fails it’s a pain in the arse to figure out where it sits physically. If you have an expander, for example, this gets even more challenging.

jlc
Valeri Galtsev says:

January 20, 2017 at 1:14 pm

This is why, before configuring and installing everything, you may want to attach the drives one at a time and, upon boot, take note of which physical drive number the controller assigns to each drive. Definitely label it so you will know which drive to pull when a drive failure is reported.

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
Joseph L. says:

January 20, 2017 at 5:16 pm

Sorry Valeri, that only works if you’re the only guy in the org.

In reality, you cannot and should not rely on this given how easily it can change and more than likely someone won’t update it.

Would you walk up to a production unit in a degraded state and simply pull out a drive and risk a production issue? I wouldn’t…

You need to assert the position of the drive and prepare it in the array controller for removal, then swap, scan, add to virtual disk then initiate rebuild.

Not to mention if it’s a busy system, confirm that the IO load from the rebuild is not having an impact on the application. You may need to lower the rate.
John R says:

January 20, 2017 at 5:35 pm

if the controller’s cache has battery or flash backed writeback protection, that should be in play whether you use it as jbod or “hardware” raid. you’ll find the biggest performance boost will be on OLTP type database operations where many small insert/update transactions are being done…. but, those drives are more of a bulk storage sort of drive, suitable for things like backup data and media servers rather than databases… dedicated database servers generally use higher RPM drives or, nowadays, enterprise-grade SSD.
Valeri Galtsev says:

January 20, 2017 at 5:38 pm

Well, this is true, I’m only one sysadmin working for two departments here…

I routinely do: I just hot-remove the failed drive from a running production system and replace it with a good drive (note what I said about my job above, though). None of our users ever notices. When I do it, the only chance I am taking is of making a degraded RAID6 (with one drive failed) even more degraded, so that it is no longer fault tolerant, though still online with all data on it. But even that chance is slim given I take all precautions when I am initially setting up the box.

Hm, not certain what process you describe. Most of my controllers are 3ware and LSI; I just pull the failed drive (and I know the failed physical drive number), put a good one in its place, and the rebuild starts right away. I have a couple of Areca ones (I love them too!); I don’t remember if I have to manually initiate the rebuild. (I’m lucky in using good drives – very careful in choosing good ones ;-).

Indeed, in the 3ware configuration there is a choice of several grades of rebuild vs IO priority; I usually choose slower rebuild, faster IO. If I have only one drive failing on me during a year in a given rack, there is almost zero chance of a second drive failing for quite some time (we had a heated discussion about it once and I still stand by my opinion that drive failures are independent events). So, my degraded RAID-6 can keep running and even still stay redundant (“single redundant”, akin to RAID-5) for the period of the rebuild, even if that takes quite long.
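
To put a rough number on that argument, here is a minimal sketch that takes the independence claim at face value and approximates each drive as failing at a constant annual rate. The function name, the 12-drive array, the 2% annual failure rate and the 48-hour rebuild are all hypothetical values chosen for illustration, not figures from this thread.

```shell
# Sketch: chance that at least one surviving drive fails during a rebuild,
# ASSUMING independent failures at a constant annual failure rate (AFR).
# usage: second_failure_prob DRIVES AFR REBUILD_HOURS
second_failure_prob() {
    awk -v n="$1" -v afr="$2" -v hours="$3" 'BEGIN {
        # expected number of failures among the n-1 survivors in the window
        rate = (n - 1) * afr * hours / 8760
        # probability of one or more failures (Poisson approximation)
        printf "%.6f\n", 1 - exp(-rate)
    }'
}

# Hypothetical example: 12-drive array, 2% AFR, 48-hour rebuild
second_failure_prob 12 0.02 48
```

If failures are correlated (same batch, same heat, same vibration), the real figure would be higher than this model gives, which is the other side of that discussion.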

Valeri

John R says:

January 20, 2017 at 5:51 pm

this is my biggest gripe with roll your own ‘whitebox’ storage systems today… you get a brand name server, like HP or Dell, with HP or Dell storage, and when a drive fails, the failure light on that drive comes on bright RED, so there’s no QUESTION what drive is offline and needs replacing…. whitebox raid or HBA card, with generic SAS expander trays, or whatever? good luck, implementing that failure mechanism is left as an exercise to the reader, without any clue how to go about it.
Gordon Messmer says:

January 20, 2017 at 6:04 pm

I’d recommend testing the specific application that will run on this system. It’s not unusual for a system to perform well in simplistic benchmarks (like bonnie++), but not in real-world use. I recently conducted tests in which a MegaRAID controller handily outclassed software RAID under bonnie++, but fell short under better benchmarks provided by filebench:

https://plus.google.com/+GordonMessmer/posts/eSe6iNmk1Fs?sfc
Cameron Smith says:

January 20, 2017 at 7:00 pm

Hi Valeri,

Before you pull a drive you should check to make sure that doing so won’t kill the whole array.

MegaCli can help you prevent a storage disaster and gives you more insight into your RAID and the status of the virtual disks and the disks that make up each array.

MegaCli will let you see the health and status of each drive. Does it have media errors, is it in predictive failure mode, what firmware version does it have etc. MegaCli will also let you see the status of the enclosure, the adapter and the virtual disks (logical disks).

Before you pull a drive it’s a good idea to properly prepare it for removal after confirming that it’s OK to remove it.

Here are a few commands:

OFFLINE A DISK
MegaCli -PDOffline -PhysDrv[32:0] -a0

MARK A DISK AS MISSING
MegaCli -pdmarkmissing -physdrv[32:0] -a0

MARK A DISK AS PREPARED FOR REMOVAL
MegaCli -pdprprmv -physdrv[32:0] -a0

Here are some easy overview commands that I run when first looking at the storage on a system:
MegaCli -AdpAllInfo -aAll | grep -A 8 "Device Present";
MegaCli -PDList -aALL | grep "Firmware state";
MegaCli -PDList -aALL | grep "Media Error Count";
MegaCli -PDList -aALL | grep "Predictive Failure Count";
MegaCli -PDList -aALL | grep "Inquiry Data";
MegaCli -PDList -aALL | grep "Device Firmware Level";
MegaCli -PDList -aALL | grep "Drive has flagged";
MegaCli -PDList -aALL | grep Temperature;

I also call MegaCli from bash scripts that run from cron.hourly on my older Dell 11Gen systems; they check the health status of my arrays and email me if there is an issue.
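
A minimal sketch of what such a check could look like (not the actual script): it reads a `MegaCli -PDList -aALL` listing on stdin and prints one line per suspicious drive. The function name is made up, and the "Enclosure Device ID" and "Slot Number" labels are assumed to appear in the listing in the same "Name: value" form as the fields grepped above; check them against your own output.

```shell
# Sketch: flag unhealthy drives in a `MegaCli -PDList -aALL` listing (stdin).
# ASSUMPTION: each drive's section starts with "Enclosure Device ID" and
# "Slot Number" lines, followed by the counters grepped for above.
check_pd_health() {
    awk -F': *' '
        /^Enclosure Device ID/      { enc = $2 }
        /^Slot Number/              { slot = $2 }
        /^Media Error Count/        { if ($2 + 0 > 0) printf "[%s:%s] media errors: %s\n", enc, slot, $2 }
        /^Predictive Failure Count/ { if ($2 + 0 > 0) printf "[%s:%s] predictive failures: %s\n", enc, slot, $2 }
        # Anything not Online is reported; this includes hot spares, adjust to taste.
        /^Firmware state/           { if ($2 !~ /^Online/) printf "[%s:%s] state: %s\n", enc, slot, $2 }
    '
}

# Possible use from a cron.hourly script (mail recipient is a placeholder):
#   report=$(MegaCli -PDList -aALL | check_pd_health)
#   if [ -n "$report" ]; then
#       printf '%s\n' "$report" | mail -s "RAID warning on $(hostname)" root
#   fi
```

Reading from stdin means the parsing can be tried against a saved listing before it is trusted in cron.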

Cameron Smith Technical Operations Manager Network Redux, LLC
Cell: 503-926-4928
Valeri Galtsev says:

January 20, 2017 at 7:17 pm

Wow! What did I say to make you treat me as an ultimate idiot!? ;-) All my comments, at least in my own reading, were about the things you need to do to make sure that when you hot-unplug a bad drive, it is indeed the failed drive you have to replace.

Valeri

Cameron Smith says:

January 20, 2017 at 8:32 pm

I was just trying to be helpful.

*backs away slowly*

Cameron
Keith Keller says:

January 21, 2017 at 12:20 am

I know for sure that LSI’s storcli utility supports an identify operation, which (if the hardware all cooperates) causes the drive’s light to blink. I’m fairly sure I’ve used this feature on 3ware controllers as well. I use this even when I’m sure of the failed drive number and am the only sysadmin for these systems, because I don’t even trust my own memory. :)

This is one reason I prefer RAID6 over RAID5: if you have one failed drive in your array, and you pull the wrong one, your RAID5 is now gone, but your RAID6 is still functional. The odds are with you in a RAID10 but you could get unlucky. (Not that you want to rebuild two drives at the same time but it’s still better than losing the array.)

–keith
Valeri Galtsev says:

January 21, 2017 at 11:02 am

Yes, that’s my attitude exactly. If the controller is connected to the backplane correctly, the drive that has failed (in the controller’s opinion) will have a different LED lit up (a different color, or an extra LED, depending on the backplane). So, just looking at the box you know which drive to pull. But exactly like you, when rolling out the box into production I make sure that I know which drive has which physical drive number in the controller’s book. Then I know from the controller which drive failed and which drive’s LED to expect to be lit when I have my hands on the box, and if it is not what I expect, there will be a long investigation into why before I do anything. Luckily it has never happened that way. Still, as you do, I prefer RAID-6, because even the improbable can happen. Even if RAID10 can give you more speed (which with controller cache is questionable), I prefer reliability (yes, RAID60 is there too, but too wasteful for the simple things we do).

Valeri

John R says:

January 21, 2017 at 8:07 pm

no raid group should be over 10-12 drives, so if you have a really large array, say 30 drives, it’s best to stripe three raid6’s of 10 drives each, or whatever.
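
The capacity cost of that layout is easy to work out. A small sketch, assuming every RAID6 group gives up two drives to parity and borrowing the 6TB drive size from the original question purely as an example; the function name is made up, and drives left over when the total does not divide evenly are simply ignored.

```shell
# Sketch: usable capacity when striping several equal RAID6 groups (RAID60).
# usage: raid60_usable TOTAL_DRIVES GROUP_SIZE DRIVE_TB
raid60_usable() {
    r60_groups=$(( $1 / $2 ))                  # whole groups that fit
    r60_data=$(( r60_groups * ($2 - 2) ))      # RAID6: two parity drives per group
    echo "$r60_groups groups, $r60_data data drives, $(( r60_data * $3 )) TB usable"
}

raid60_usable 30 10 6   # three 10-drive groups of 6TB disks
raid60_usable 30 30 6   # for comparison: one 30-drive RAID6
```

With these inputs the three-group layout gives 24 data drives (144 TB) against 28 (168 TB) for a single 30-drive RAID6, in exchange for tolerating two failures per group and rebuilding within a 10-drive group.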