Slow RAID Check/high %iowait During Check After Upgrade From CentOS 6.5 -> CentOS 7.2

I’ve posted this on the forums at https://www.CentOS.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614 – posting to the list in the hopes of getting more eyeballs on it.

We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:

2x E5-2650
128 GB RAM
12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
Dual port 10 GB NIC

The drives are configured as one large RAID-10 volume with mdadm, filesystem is XFS. The OS is not installed on the drive – we PXE boot a CentOS image we’ve built with minimal packages installed, and do the OS configuration via puppet. Originally, the hosts were running CentOS 6.5, with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and Kafka 0.9, and that’s when the trouble started.

When the weekly raid-check script runs, performance nosedives and I/O wait skyrockets. The RAID check starts out fairly fast (20000K/sec, the limit that’s been set), but then quickly drops to about 4000K/sec. The dev.raid.speed_limit sysctls are at the defaults:

dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
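
For anyone comparing these limits across many nodes, here is a minimal Python sketch that reads the same two sysctls from /proc. The function name `raid_speed_limits` and the `base` parameter are hypothetical, added only so the reader can be pointed at a different directory; the values are in the same K/sec units that /proc/mdstat reports.

```python
from pathlib import Path

# Location of the md speed-limit sysctls on Linux.
DEFAULT_BASE = "/proc/sys/dev/raid"


def raid_speed_limits(base=DEFAULT_BASE):
    """Return the md resync/check speed limits as a dict of integers."""
    limits = {}
    for name in ("speed_limit_min", "speed_limit_max"):
        # Each file holds a single integer followed by a newline.
        limits[name] = int((Path(base) / name).read_text().strip())
    return limits


if __name__ == "__main__":
    # Only meaningful on a host where the md driver is loaded.
    if Path(DEFAULT_BASE).is_dir():
        print(raid_speed_limits())
```

On the hosts described here, this would be expected to report the 1000 and 200000 values listed above.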

Here’s 10 seconds of iostat output, which illustrates the issue:

[root@r1k1log] # iostat 1 10
Linux 3.10.0-327.18.2.el7.x86_64 (r1k1)    05/24/16    _x86_64_   (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.80    0.06    1.89   14.79    0.00   74.46

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              52.59      2033.16     10682.78 1210398902 6359779847
sdb              52.46      2031.25     10682.78 1209265338 6359779847
sdc              52.40      2033.21     10683.53 1210433924 6360229587
sdd              52.22      2031.16     10683.53 1209212513 6360229587
sdf              52.20      2031.17     10682.06 1209216701 6359354331
sdg              52.62      2033.22     10684.17 1210437080 6360606756
sdh              52.57      2031.21     10684.17 1209242746 6360606756
sde              51.67      2033.17     10682.06 1210408935 6359354331
sdj              51.90      2031.13     10684.48 1209191501 6360795559
sdi              52.47      2033.16     10684.48 1210399262 6360795559
sdk              52.09      2033.15     10684.36 1210396915 6360724971
sdl              51.95      2031.20     10684.36 1209235241 6360724971
md127           138.20        74.49     64101.35   44348810 38161468777

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.57    0.09    1.33   26.19    0.00   63.81

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              28.00       512.00      8416.00        512       8416
sdb              28.00       512.00      8416.00        512       8416
sdc              25.00       448.00      8876.00        448       8876
sdd              24.00       448.00      8364.00        448       8364
sdf              23.00       448.00      8192.00        448       8192
sdg              24.00       512.00      7680.00        512       7680
sdh              24.00       512.00      7680.00        512       7680
sde              23.00       448.00      8192.00        448       8192
sdj              23.00       512.00      7680.00        512       7680
sdi              23.00       512.00      7680.00        512       7680
sdk              23.00       512.00      7680.00        512       7680
sdl              23.00       512.00      7680.00        512       7680
md127           101.00         0.00     48012.00          0      48012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.50    0.00    1.04   24.27    0.00   68.19

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              26.00       512.00      9216.00        512       9216
sdb              26.00       512.00      9216.00        512       9216
sdc              27.00       576.00      9204.00        576       9204
sdd              28.00       576.00      9716.00        576       9716
sdf              31.00       768.00      9728.00        768       9728
sdg              28.00       512.00     10240.00        512      10240
sdh              28.00       512.00     10240.00        512      10240
sde              31.00       768.00      9728.00        768       9728
sdj              28.00       512.00      9744.00        512       9744
sdi              28.00       512.00      9744.00        512       9744
sdk              27.00       512.00      9728.00        512       9728
sdl              27.00       512.00      9728.00        512       9728
md127           114.00         0.00     57860.00          0      57860

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.24    0.00    1.32   20.02    0.00   69.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              50.00       512.00     20408.00        512      20408
sdb              50.00       512.00     20408.00        512      20408
sdc              48.00       512.00     19984.00        512      19984
sdd              48.00       512.00     19984.00        512      19984
sdf              50.00       704.00     19968.00        704      19968
sdg              47.00       512.00     19968.00        512      19968
sdh              47.00       512.00     19968.00        512      19968
sde              50.00       704.00     19968.00        704      19968
sdj              48.00       512.00     19972.00        512      19972
sdi              48.00       512.00     19972.00        512      19972
sdk              48.00       512.00     19980.00        512      19980
sdl              48.00       512.00     19980.00        512      19980
md127           241.00         0.00    120280.00          0     120280

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.98    0.00    0.98   18.42    0.00   72.63

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              39.00       640.00     14076.00        640      14076
sdb              39.00       640.00     14076.00        640      14076
sdc              36.00       512.00     14324.00        512      14324
sdd              36.00       512.00     14324.00        512      14324
sdf              36.00       576.00     13824.00        576      13824
sdg              43.00      1024.00     13824.00       1024      13824
sdh              43.00      1024.00     13824.00       1024      13824
sde              36.00       576.00     13824.00        576      13824
sdj              44.00      1024.00     14104.00       1024      14104
sdi              44.00      1024.00     14104.00       1024      14104
sdk              45.00      1024.00     14336.00       1024      14336
sdl              45.00      1024.00     14336.00       1024      14336
md127           168.00         0.00     84488.00          0      84488

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.39    0.00    1.01   19.48    0.00   72.13

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              22.00       896.00      4096.00        896       4096
sdb              22.00       896.00      4096.00        896       4096
sdc              19.00       640.00      4344.00        640       4344
sdd              19.00       640.00      4344.00        640       4344
sdf              18.00       512.00      5120.00        512       5120
sdg              18.00       512.00      5120.00        512       5120
sdh              18.00       512.00      5120.00        512       5120
sde              18.00       512.00      5120.00        512       5120
sdj              18.00       512.00      4624.00        512       4624
sdi              18.00       512.00      4624.00        512       4624
sdk              18.00       512.00      4608.00        512       4608
sdl              18.00       512.00      4608.00        512       4608
md127            57.00         0.00     27912.00          0      27912

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.92    0.00    1.58   21.84    0.00   65.66

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              23.00       576.00      7168.00        576       7168
sdb              23.00       576.00      7168.00        576       7168
sdc              29.00       896.00      7680.00        896       7680
sdd              29.00       896.00      7680.00        896       7680
sdf              31.00      1024.00      7680.00       1024       7680
sdg              31.00      1024.00      7680.00       1024       7680
sdh              31.00      1024.00      7680.00       1024       7680
sde              31.00      1024.00      7680.00       1024       7680
sdj              30.00      1024.00      7168.00       1024       7168
sdi              31.00      1024.00      7680.00       1024       7680
sdk              32.00      1024.00      7424.00       1024       7424
sdl              32.00      1024.00      7424.00       1024       7424
md127            89.00         0.00     44800.00          0      44800

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.89    0.03    2.63   21.54    0.00   61.91

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              30.00       960.00      7680.00        960       7680
sdb              30.00       960.00      7680.00        960       7680
sdc              32.00      1024.00      7684.00       1024       7684
sdd              32.00      1024.00      7684.00       1024       7684
sdf              31.00      1024.00      7680.00       1024       7680
sdg              31.00      1024.00      7680.00       1024       7680
sdh              31.00      1024.00      7680.00       1024       7680
sde              31.00      1024.00      7680.00       1024       7680
sdj              32.00      1024.00      8192.00       1024       8192
sdi              31.00      1024.00      7680.00       1024       7680
sdk              26.00       704.00      7680.00        704       7680
sdl              26.00       704.00      7680.00        704       7680
md127            92.00         0.00     46596.00          0      46596

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.24    0.00    2.22   19.89    0.00   63.65

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              33.00      1024.00      7244.00       1024       7244
sdb              33.00      1024.00      7244.00       1024       7244
sdc              31.00      1024.00      7668.00       1024       7668
sdd              31.00      1024.00      7668.00       1024       7668
sdf              31.00      1024.00      7680.00       1024       7680
sdg              26.00       768.00      6672.00        768       6672
sdh              26.00       768.00      6672.00        768       6672
sde              31.00      1024.00      7680.00       1024       7680
sdj              21.00       512.00      6656.00        512       6656
sdi              21.00       512.00      6656.00        512       6656
sdk              27.00       832.00      7168.00        832       7168
sdl              27.00       832.00      7168.00        832       7168
md127            88.00         0.00     43088.00          0      43088

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.02    0.13    1.42   23.90    0.00   66.53

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              30.00      1024.00      7168.00       1024       7168
sdb              30.00      1024.00      7168.00       1024       7168
sdc              29.00       960.00      7168.00        960       7168
sdd              29.00       960.00      7168.00        960       7168
sdf              23.00       512.00      7668.00        512       7668
sdg              28.00       768.00      7680.00        768       7680
sdh              28.00       768.00      7680.00        768       7680
sde              23.00       512.00      7668.00        512       7668
sdj              30.00      1024.00      6672.00       1024       6672
sdi              30.00      1024.00      6672.00       1024       6672
sdk              30.00      1024.00      7168.00       1024       7168
sdl              30.00      1024.00      7168.00       1024       7168
md127            87.00         0.00     43524.00          0      43524
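
One sanity check on the iostat figures: with two near-copies, every write to md127 should land on two member drives, so the member write rates should sum to about twice the array's rate. The Python sketch below checks that against the first (since-boot) report above. The helper name `expected_array_writes` is hypothetical, and the calculation assumes writes are spread evenly across mirror pairs.

```python
# kB_wrtn/s per member drive, copied from the first iostat report above.
MEMBER_WRITES = {
    "sda": 10682.78, "sdb": 10682.78, "sdc": 10683.53, "sdd": 10683.53,
    "sde": 10682.06, "sdf": 10682.06, "sdg": 10684.17, "sdh": 10684.17,
    "sdi": 10684.48, "sdj": 10684.48, "sdk": 10684.36, "sdl": 10684.36,
}
MD127_WRITES = 64101.35   # kB_wrtn/s reported for md127 in the same sample
NEAR_COPIES = 2           # "2 near-copies" in /proc/mdstat


def expected_array_writes(member_writes, copies):
    """Array-level write rate implied by the member drives' write rates."""
    return sum(member_writes.values()) / copies


expected = expected_array_writes(MEMBER_WRITES, NEAR_COPIES)
print(f"implied by members: {expected:.2f} kB/s, reported: {MD127_WRITES:.2f} kB/s")
```

The two figures agree to within rounding, which suggests the write load itself is evenly distributed and no member is being skipped or doubled up.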

Details of the array:

[root@r1k1] # cat /proc/mdstat 
Personalities : [raid10] 
md127 : active raid10 sdf[5] sdi[8] sdh[7] sdk[10] sdb[1] sdj[9] sdc[2] sdd[3] sdl[11] sde[13] sdg[12] sda[0]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      [======>..............]  check = 30.8% (7237496960/23441323008) finish=62944.5min speed=4290K/sec
      
unused devices: <none>
[root@r1k1] # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Sep 18 09:57:57 2014
     Raid Level : raid10
     Array Size : 23441323008 (22355.39 GiB 24003.91 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Tue May 24 15:32:56 2016
          State : active, checking 
 Active Devices : 12
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

   Check Status : 30% complete

           Name : localhost:kafka
           UUID : b6b98e3e:65ee06c3:3599d781:98908041
         Events : 2459193

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync set-A   /dev/sda
       1       8       16        1      active sync set-B   /dev/sdb
       2       8       32        2      active sync set-A   /dev/sdc
       3       8       48        3      active sync set-B   /dev/sdd
      13       8       64        4      active sync set-A   /dev/sde
       5       8       80        5      active sync set-B   /dev/sdf
      12       8       96        6      active sync set-A   /dev/sdg
       7       8      112        7      active sync set-B   /dev/sdh
       8       8      128        8      active sync set-A   /dev/sdi
       9       8      144        9      active sync set-B   /dev/sdj
      10       8      160       10      active sync set-A   /dev/sdk
      11       8      176       11      active sync set-B   /dev/sdl
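
To put the check speed in perspective, the finish estimate in the /proc/mdstat output can be roughly reproduced from the progress counters and the current speed. This is a minimal Python sketch, assuming the two counters are 1 KiB blocks and the speed is in KiB per second. The function name `remaining_minutes` is hypothetical, and the kernel's own estimate uses a slightly different averaging window, so the numbers will not match exactly.

```python
import re

# Progress line from /proc/mdstat as posted above.
MDSTAT_LINE = (
    "[======>..............]  check = 30.8% "
    "(7237496960/23441323008) finish=62944.5min speed=4290K/sec"
)

# Captures: blocks done, blocks total, current speed in K/sec.
PROGRESS = re.compile(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec")


def remaining_minutes(line):
    """Estimate minutes left in a check/resync from one mdstat progress line."""
    match = PROGRESS.search(line)
    if match is None:
        raise ValueError("no progress information found")
    done_kib, total_kib, speed_kib_s = map(int, match.groups())
    return (total_kib - done_kib) / speed_kib_s / 60


minutes = remaining_minutes(MDSTAT_LINE)
print(f"{minutes:.0f} min, about {minutes / 1440:.1f} days")
```

At roughly 4290K/sec the remaining 70% of the array works out to several weeks, so a weekly check could never finish before the next one is scheduled.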

We’ve tried changing the I/O scheduler, queue_depth, queue_type, read-ahead, etc, but nothing has helped. We’ve also upgraded all of the firmware, and installed HP’s mpt2sas driver.

We have 4 other Kafka clusters, however they’re HP DL180 G6 servers. We completed the same CentOS 6.5 -> 7.2/Kafka 0.8 -> 0.9 upgrade on those clusters, and there has been no impact to their performance.

We’ve been banging our heads against the wall for a few weeks now, really hoping someone from the community can point us in the right direction.

Thanks,

Kelly Lesperance

26 thoughts on - Slow RAID Check/high %iowait During Check After Upgrade From CentOS 6.5 -> CentOS 7.2

  • Kelly Lesperance wrote:

    Really stupid question: are the drives in it the ones that came with the unit?

    mark, who, a few years ago, found serious issues with green drives in a
    server….

  • Kelly Lesperance wrote:

    One more stupid question: could the configuration of the card for how the drives are accessed have been accidentally changed?

    mark

  • They are:

    [root@r1k1 ~] # hdparm -I /dev/sda

    /dev/sda:

    ATA device, with non-removable media
    Model Number: MB4000GCWDC
    Serial Number: S1Z06RW9
    Firmware Revision: HPGD
    Transport: Serial, SATA Rev 3.0

    Thanks,

    Kelly

  • [merging]

    The HBA the drives are attached to has no configuration that I’m aware of. We would have had to accidentally change 23 of them ☺

    Thanks,

    Kelly

  • What is the HBA the drives are attached to?
    Have you done a quick benchmark on a single disk to check if this is a raid problem or further down the stack?

    Regards,
    Dennis

  • The HBA is an HP H220.

    We haven’t really benchmarked individual drives – all 12 drives are utilized in one RAID-10 array, and I’m unsure how we would test individual drives without breaking the array.

    Trying ‘hdparm -tT /dev/sda’ now – it’s been running for 25 minutes so far…

    Kelly

  • Oh, it’s a very good idea to verify the driver is at the same revision level as the firmware. I’m not 100% sure how you do this under CentOS; my H220 system is running FreeBSD and is at revision P20, both firmware and driver. HP’s firmware, at least what I could find, was a fairly old P14 or something, so I had to re-flash mine with ‘generic’ LSI firmware. This isn’t exactly a recommended thing to do, but it’s sure working fine for me.

  • Hdparm didn’t get far:

    [root@r1k1 ~] # hdparm -tT /dev/sda

    /dev/sda:
    Timing cached reads: Alarm clock
    [root@r1k1 ~] #

  • John R Pierce wrote:

    Not sure if dmidecode will tell you, but you might see if you can run smartctl -i

    Also, you could, on boot, go into the card’s firmware interface, and that’ll tell you, somewhere, what the firmware version is. Not sure if MegaRAID will work with this card; if it does, you really want it, even though it has an actively user-hostile interface.

    mark

  • I installed the latest firmware and driver (mpt2sas) from HP on one system. The driver is v20, it appears the firmware may be 15, though:

    [ 11.128979] mpt2sas version 20.100.00.00 loaded
    [ 11.513836] mpt2sas0: LSISAS2308: FWVersion(15.10.09.00), ChipRevision(0x05), BiosVersion(07.39.00.00)

  • LSI/Avago’s web pages don’t have any downloads for the SAS2308, so I think I’m out of luck wrt MegaRAID.

    Bounced the node, confirmed MPT Firmware 15.10.09.00-IT. HP Driver is v 15.10.04.00.

    Both are the latest from HP.

    Unsure why, but the module itself reports version 20.100.00.00:

    [root@r1k1 sys] # cat module/mpt2sas/version
    20.100.00.00

  • Kelly Lesperance wrote:

    Suggestion: if these are new, they’re under warranty, and it’s a hardware issue. Call HP tech support and open a ticket with them – they might have an answer.

    mark

  • Already done – they’re not being very helpful, as we don’t have a support contract, just standard warranty.

  • I should rephrase that – some parts of HP are helping us, but the team I opened the case with isn’t being very helpful.

  • Kelly Lesperance wrote:
    that, we don’t get rid of them till they’re dying. (Don’t talk to me about
    “wasting tax dollars”)

    And I don’t care for HP “support”, they don’t want to give you *anything*
    unless you’re paying for support. The only ones worse are
    a) none of the above, and
    b) Sun/Oracle (I refer to dealing with their “tech support” as self-abuse)

    mark

  • Hi Kelly,

    Try running ‘iostat -xdmc 1’. Look for a single drive that has substantially greater await than ~10msec. If all the drives except one are taking 6-8msec, but one is very much more, you’ve got a drive that drags down the whole array’s performance.

    Ignore the very first output from the command – it’s an average of the disk subsystem since boot.

    Post a representative output along with the contents of /proc/mdstat.

    Good luck,

    Charles Polisher
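
The outlier check suggested above can be automated when there are many nodes to look at. The following is a minimal Python sketch, not a definitive tool: it parses one `iostat -x` device table, skips md devices, and flags any drive whose await exceeds a multiple of the median. The function names, the `factor=3.0` threshold, and the sample rows are all illustrative assumptions; the column layout is the one used by the sysstat version shipped with CentOS 7.

```python
import statistics

# A few illustrative rows in `iostat -x` format (await is in milliseconds).
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdi 0.00 0.00 8.00 7.00 0.50 3.50 546.13 0.13 8.47 6.25 11.00 8.47 12.70
sdk 0.00 0.00 8.00 8.00 0.50 3.98 574.00 0.16 9.88 1.00 18.75 10.06 16.10
sdc 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.12 7.25 0.62 13.88 7.25 11.60
sdd 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.12 7.44 0.50 14.38 7.44 11.90
md127 0.00 0.00 0.00 49.00 0.00 23.00 961.14 0.00 0.00 0.00 0.00 0.00 0.00
"""


def parse_await(iostat_text):
    """Return {device: await_ms} for the physical drives in an iostat -x report."""
    awaits = {}
    col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "avg-cpu:":
            col = None  # CPU section: ignore until the next device header
        elif fields[0] == "Device:":
            col = fields.index("await")  # locate the column by name
        elif col is not None and not fields[0].startswith("md"):
            awaits[fields[0]] = float(fields[col])
    return awaits


def slow_devices(awaits, factor=3.0):
    """Devices whose await is more than `factor` times the median await."""
    typical = statistics.median(awaits.values())
    return sorted(dev for dev, ms in awaits.items() if ms > factor * typical)


print(slow_devices(parse_await(SAMPLE)))
```

Comparing against the median rather than a fixed 10 msec keeps the check usable when the whole array is under load and every drive is slow together.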

  • In IT mode, the 2308 is a straight SAS host bus adapter; all drives are presented directly to the host OS as native SAS devices.

  • Hi Charles,

    Looks to me like all of the drives are performing roughly the same – there’s certainly not 1 that sticks out (also note this is happening on all 23 nodes in the cluster).

    Thanks!

    Kelly

    [root@r1k1.kafka.log10.blackberry sys] # cat /proc/mdstat
    Personalities : [raid10]
    md127 : active raid10 sdc[2] sdh[7] sdb[1] sdf[5] sde[13] sdg[12] sdj[9] sdk[10] sda[0] sdl[11] sdd[3] sdi[8]
    23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
    [>....................] check = 0.0% (618944/23441323008) finish=108288.4min speed=3607K/sec

    unused devices: <none>
    [root@r1k1.kafka.log10.blackberry sys] # iostat -xdmc 1 10
    Linux 3.10.0-327.18.2.el7.x86_64 (r1k1.kafka.log10.blackberry) 05/26/16 _x86_64_ (32 CPU)

    avg-cpu: %user %nice %system %iowait %steal %idle
    12.76 0.07 2.48 0.16 0.00 84.53

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.01 0.56 0.26 26.44 0.06 11.30 871.39 9.67 362.22 5.14 365.71 6.68 17.83
    sdk 0.01 0.56 0.26 26.56 0.06 11.30 867.70 9.53 355.45 5.05 358.84 6.58 17.65
    sdc 0.01 0.46 0.26 26.34 0.06 11.29 874.67 9.73 365.89 4.86 369.38 6.81 18.11
    sdd 0.01 0.46 0.20 26.34 0.07 11.29 876.98 10.40 391.99 5.33 394.93 7.17 19.02
    sda 0.01 0.49 0.26 26.53 0.06 11.29 868.24 9.48 353.91 4.96 357.36 6.57 17.61
    sdj 0.01 0.56 0.20 26.44 0.07 11.30 873.73 10.04 376.87 5.48 379.68 6.91 18.40
    sdl 0.01 0.56 0.20 26.56 0.07 11.30 869.99 9.77 365.16 5.92 367.92 6.72 17.99
    sdh 0.01 0.57 0.21 26.79 0.07 11.30 862.30 9.65 357.60 5.27 360.31 6.63 17.90
    sde 0.01 0.47 0.26 26.13 0.06 11.29 881.38 10.60 401.47 6.62 405.41 7.35 19.41
    sdf 0.01 0.47 0.20 26.13 0.07 11.29 883.71 9.53 361.85 5.24 364.64 6.73 17.73
    sdg 0.01 0.57 0.26 26.79 0.06 11.30 859.99 10.15 375.20 5.26 378.82 6.86 18.57
    sdb 0.01 0.49 0.20 26.53 0.07 11.29 870.69 9.85 368.48 5.35 371.23 6.79 18.15
    md127 0.00 0.00 2.51 156.82 0.77 67.77 881.06 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    25.51 0.03 4.37 1.05 0.00 69.04

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 1.00 8.00 30.00 0.50 14.18 791.16 1.06 28.03 0.50 35.37 6.97 26.50
    sdk 0.00 0.00 8.00 30.00 0.50 14.52 809.47 0.93 24.32 0.00 30.80 7.87 29.90
    sdc 0.00 1.00 9.00 32.00 0.56 15.21 787.90 1.13 27.54 0.67 35.09 6.90 28.30
    sdd 0.00 1.00 10.00 32.00 0.62 15.21 772.19 1.29 30.69 0.70 40.06 6.76 28.40
    sda 0.00 0.00 8.00 38.00 0.50 15.54 714.09 1.40 30.35 0.38 36.66 7.91 36.40
    sdj 0.00 1.00 8.00 30.00 0.50 14.18 791.16 1.05 27.68 0.50 34.93 7.00 26.60
    sdl 0.00 0.00 8.00 30.00 0.50 14.52 809.47 0.90 23.61 0.25 29.83 7.66 29.10
    sdh 0.00 1.00 13.00 34.00 0.81 14.11 650.04 1.17 24.98 0.31 34.41 6.60 31.00
    sde 0.00 0.00 16.00 31.00 1.00 14.54 676.94 1.20 25.45 0.31 38.42 7.13 33.50
    sdf 0.00 0.00 16.00 31.00 1.00 14.54 676.94 1.19 25.38 0.31 38.32 5.57 26.20
    sdg 0.00 1.00 13.00 34.00 0.81 14.11 650.04 1.22 25.98 0.31 35.79 6.70 31.50
    sdb 0.00 0.00 8.00 38.00 0.50 15.54 714.09 1.31 28.41 0.25 34.34 8.02 36.90
    md127 0.00 0.00 0.00 198.00 0.00 86.59 895.60 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    21.31 0.00 2.99 0.00 0.00 75.69

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 0.00 8.00 7.00 0.50 3.50 546.13 0.13 8.47 6.25 11.00 8.47 12.70
    sdk 0.00 0.00 8.00 8.00 0.50 3.98 574.00 0.16 9.88 1.00 18.75 10.06 16.10
    sdc 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.12 7.25 0.62 13.88 7.25 11.60
    sdd 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.12 7.44 0.50 14.38 7.44 11.90
    sda 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.13 8.00 0.50 15.50 8.00 12.80
    sdj 0.00 0.00 8.00 7.00 0.50 3.50 546.13 0.18 12.20 9.25 15.57 12.20 18.30
    sdl 0.00 0.00 8.00 9.00 0.50 4.48 600.47 0.11 6.94 1.00 12.22 6.59 11.20
    sdh 0.00 0.00 8.00 9.00 0.50 3.51 482.82 0.10 6.12 0.50 11.11 6.12 10.40
    sde 0.00 0.00 8.00 9.00 0.50 4.00 542.59 0.16 9.65 0.25 18.00 9.65 16.40
    sdf 0.00 0.00 8.00 9.00 0.50 4.00 542.59 0.13 7.65 0.25 14.22 7.65 13.00
    sdg 0.00 0.00 8.00 9.00 0.50 3.51 482.82 0.13 7.59 0.50 13.89 7.59 12.90
    sdb 0.00 0.00 8.00 8.00 0.50 4.00 576.00 0.11 6.62 2.12 11.12 6.62 10.60
    md127 0.00 0.00 0.00 49.00 0.00 23.00 961.14 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    18.70 4.21 4.24 0.00 0.00 72.85

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 0.00 8.00 15.00 0.50 6.09 586.78 0.22 9.17 0.25 13.93 6.26 14.40
    sdk 0.00 0.00 8.00 14.00 0.50 5.55 563.27 0.25 11.68 2.38 17.00 7.59 16.70
    sdc 0.00 0.00 8.00 13.00 0.50 6.50 682.67 0.15 7.00 0.25 11.15 6.00 12.60
    sdd 0.00 0.00 8.00 13.00 0.50 6.50 682.67 0.17 7.95 0.25 12.69 6.86 14.40
    sda 0.00 0.00 8.00 14.00 0.50 6.50 652.00 0.26 11.77 0.62 18.14 7.86 17.30
    sdj 0.00 0.00 8.00 15.00 0.50 6.09 586.78 0.34 14.35 2.00 20.93 9.87 22.70
    sdl 0.00 0.00 8.00 13.00 0.50 5.05 541.33 0.25 11.86 0.50 18.85 7.57 15.90
    sdh 0.00 0.00 10.00 17.00 0.62 7.14 589.04 0.33 12.19 0.60 19.00 7.41 20.00
    sde 0.00 0.00 8.00 18.00 0.50 6.68 565.85 0.31 11.77 0.25 16.89 7.00 18.20
    sdf 0.00 0.00 8.00 18.00 0.50 6.68 565.85 0.42 16.12 2.25 22.28 9.96 25.90
    sdg 0.00 0.00 10.00 17.00 0.62 7.14 589.04 0.33 12.30 0.60 19.18 6.59 17.80
    sdb 0.00 0.00 8.00 14.00 0.50 6.50 652.00 0.27 12.14 2.25 17.79 8.00 17.60
    md127 0.00 0.00 0.00 91.00 0.00 38.47 865.76 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    16.69 0.03 3.08 0.03 0.00 80.16

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 0.00 18.00 14.00 1.12 7.00 520.00 0.15 4.84 0.50 10.43 4.62 14.80
    sdk 0.00 0.00 16.00 14.00 1.00 7.00 546.13 0.14 4.77 0.38 9.79 4.77 14.30
    sdc 0.00 0.00 16.00 13.00 1.00 6.50 529.66 0.14 5.00 0.38 10.69 5.00 14.50
    sdd 0.00 0.00 16.00 13.00 1.00 6.50 529.66 0.15 5.10 0.38 10.92 5.10 14.80
    sda 0.00 0.00 16.00 18.00 1.00 7.54 514.59 0.21 6.12 1.31 10.39 6.26 21.30
    sdj 0.00 0.00 18.00 14.00 1.12 7.00 520.00 0.13 4.25 0.50 9.07 4.03 12.90
    sdl 0.00 0.00 16.00 14.00 1.00 7.00 546.13 0.13 4.47 0.31 9.21 4.47 13.40
    sdh 7.00 0.00 10.00 13.00 1.06 6.50 673.39 0.10 4.57 0.50 7.69 4.57 10.50
    sde 6.00 0.00 10.00 13.00 1.00 6.50 667.83 0.15 6.35 0.60 10.77 6.35 14.60
    sdf 6.00 0.00 10.00 13.00 1.00 6.50 667.83 0.14 6.22 0.60 10.54 6.22 14.30
    sdg 7.00 0.00 10.00 13.00 1.06 6.50 673.39 0.10 4.39 0.50 7.38 4.39 10.10
    sdb 0.00 0.00 16.00 19.00 1.00 7.57 501.71 0.13 3.77 0.31 6.68 3.77 13.20
    md127 0.00 0.00 0.00 85.00 0.00 40.57 977.60 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    22.73 0.00 5.91 0.06 0.00 71.30

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 242.00 0.00 1048.00 1.00 80.62 0.01 157.43 4.38 4.18 4.17 8.00 0.37 38.50
    sdk 334.00 0.00 954.00 0.00 80.50 0.00 172.81 7.27 7.62 7.62 0.00 0.45 43.20
    sdc 294.00 0.00 994.00 0.00 80.50 0.00 165.86 5.56 5.59 5.59 0.00 0.40 39.80
    sdd 249.00 0.00 1039.00 0.00 80.50 0.00 158.68 4.11 3.95 3.95 0.00 0.37 38.00
    sda 268.00 0.00 1020.00 11.00 80.50 0.18 160.26 5.47 5.31 5.14 21.36 0.58 60.20
    sdj 253.00 0.00 1037.00 1.00 80.62 0.01 159.10 4.42 4.26 4.26 3.00 0.37 38.80
    sdl 257.00 0.00 1031.00 0.00 80.50 0.00 159.91 5.13 4.98 4.98 0.00 0.37 38.30
    sdh 224.00 0.00 1064.00 1.00 80.50 0.00 154.81 3.80 3.57 3.57 10.00 0.36 38.30
    sde 247.00 0.00 1041.00 0.00 80.50 0.00 158.37 4.96 4.77 4.77 0.00 0.37 38.60
    sdf 220.00 0.00 1068.00 0.00 80.50 0.00 154.37 3.70 3.47 3.47 0.00 0.33 35.40
    sdg 242.00 0.00 1046.00 1.00 80.50 0.00 157.47 5.05 4.82 4.81 13.00 0.39 40.80
    sdb 239.00 0.00 1049.00 10.00 80.50 0.15 155.97 4.77 4.51 4.43 12.10 0.45 47.90
    md127 0.00 0.00 0.00 13.00 0.00 0.17 26.46 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    28.14 0.03 6.00 0.00 0.00 65.83

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 5.00 12.00 37.00 0.75 0.29 43.27 1.13 23.04 0.67 30.30 4.98 24.40
    sdk 0.00 17.00 16.00 34.00 1.00 0.34 54.88 2.95 59.02 0.56 86.53 3.98 19.90
    sdc 0.00 4.00 8.00 33.00 0.50 0.25 37.66 1.14 27.88 0.25 34.58 5.66 23.20
    sdd 0.00 4.00 8.00 33.00 0.50 0.25 37.66 0.52 12.83 0.25 15.88 4.02 16.50
    sda 0.00 3.00 16.00 21.00 1.00 0.17 64.86 0.26 7.14 1.06 11.76 3.92 14.50
    sdj 0.00 5.00 12.00 37.00 0.75 0.29 43.27 0.84 17.24 0.42 22.70 4.47 21.90
    sdl 0.00 17.00 16.00 34.00 1.00 0.34 54.88 2.98 59.56 0.56 87.32 3.92 19.60
    sdh 0.00 4.00 8.00 26.00 0.50 0.20 41.88 0.67 19.71 1.75 25.23 4.50 15.30
    sde 0.00 4.00 8.00 22.00 0.50 0.19 47.20 0.39 12.83 2.38 16.64 3.93 11.80
    sdf 0.00 4.00 8.00 22.00 0.50 0.19 47.20 0.35 11.60 2.25 15.00 3.67 11.00
    sdg 0.00 4.00 8.00 26.00 0.50 0.20 41.88 0.67 19.62 1.50 25.19 4.12 14.00
    sdb 0.00 3.00 16.00 21.00 1.00 0.17 64.86 0.42 11.27 1.00 19.10 6.73 24.90
    md127 0.00 0.00 0.00 210.00 0.00 1.44 14.06 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    35.57 0.00 10.34 0.00 0.00 54.08

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 0.00 0.00 9.00 11.00 0.56 0.04 62.00 0.14 7.00 1.78 11.27 6.50 13.00
    sdk 0.00 0.00 8.00 11.00 0.50 0.05 58.95 0.10 5.26 0.88 8.45 5.26 10.00
    sdc 0.00 0.00 16.00 16.00 1.00 0.07 68.75 0.17 5.44 0.44 10.44 5.44 17.40
    sdd 0.00 0.00 16.00 16.00 1.00 0.07 68.75 0.17 5.38 1.00 9.75 5.38 17.20
    sda 0.00 0.00 8.00 16.00 0.50 0.06 48.08 0.20 8.54 0.75 12.44 5.42 13.00
    sdj 0.00 0.00 9.00 11.00 0.56 0.04 62.00 0.13 6.65 0.44 11.73 5.90 11.80
    sdl 0.00 0.00 8.00 11.00 0.50 0.05 58.95 0.12 6.16 1.62 9.45 6.16 11.70
    sdh 0.00 0.00 16.00 30.00 1.00 0.11 49.63 0.32 6.85 0.44 10.27 4.39 20.20
    sde 0.00 0.00 16.00 6.00 1.00 0.02 95.27 0.10 4.41 0.44 15.00 4.41 9.70
    sdf 0.00 0.00 16.00 6.00 1.00 0.02 95.27 0.14 6.59 4.06 13.33 6.55 14.40
    sdg 0.00 0.00 16.00 29.00 1.00 0.11 50.56 0.31 6.80 0.44 10.31 4.82 21.70
    sdb 0.00 0.00 8.00 16.00 0.50 0.06 48.08 0.24 10.17 0.62 14.94 6.75 16.20
    md127 0.00 0.00 0.00 89.00 0.00 0.36 8.24 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    35.50 0.03 7.43 0.00 0.00 57.04

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 74.00 0.00 21.00 7.00 5.94 1.53 546.00 0.47 16.71 17.57 14.14 4.89 13.70
    sdk 70.00 0.00 26.00 10.00 6.00 1.41 421.56 0.41 10.94 11.73 8.90 4.33 15.60
    sdc 77.00 0.00 11.00 9.00 5.50 1.57 723.60 0.64 32.00 42.64 19.00 10.65 21.30
    sdd 77.00 0.00 11.00 9.00 5.50 1.57 723.60 1.19 59.60 96.36 14.67 12.10 24.20
    sda 71.00 1.00 24.00 11.00 5.94 1.53 437.09 0.51 14.46 14.38 14.64 5.09 17.80
    sdj 74.00 0.00 21.00 7.00 5.94 1.53 546.00 0.58 20.79 20.57 21.43 7.04 19.70
    sdl 70.00 0.00 26.00 11.00 6.00 1.91 437.84 0.39 10.54 11.04 9.36 4.32 16.00
    sdh 77.00 0.00 11.00 7.00 5.50 1.52 798.67 0.43 24.17 33.82 9.00 6.61 11.90
    sde 77.00 0.00 11.00 6.00 5.50 1.52 845.18 0.58 34.24 36.91 29.33 13.71 23.30
    sdf 77.00 0.00 11.00 6.00 5.50 1.52 845.18 0.60 35.35 45.36 17.00 10.06 17.10
    sdg 77.00 0.00 11.00 8.00 5.50 1.52 757.05 0.43 22.95 32.00 10.50 6.89 13.10
    sdb 71.00 1.00 24.00 11.00 5.94 1.53 437.09 0.60 17.14 13.67 24.73 9.03 31.60
    md127 0.00 0.00 0.00 52.00 0.00 9.57 376.96 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    27.06 0.03 6.00 0.00 0.00 66.91

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdi 14.00 0.00 10.00 9.00 1.50 4.06 599.58 0.13 6.84 2.60 11.56 6.63 12.60
    sdk 14.00 0.00 10.00 10.00 1.50 5.00 665.60 0.13 7.05 2.50 11.60 6.35 12.70
    sdc 14.00 0.00 11.00 10.00 1.56 4.01 543.24 0.15 7.33 1.00 14.30 7.24 15.20
    sdd 14.00 0.00 11.00 10.00 1.56 4.01 543.24 0.15 7.14 1.00 13.90 7.05 14.80
    sda 14.00 0.00 11.00 10.00 1.56 4.20 561.52 0.12 5.38 0.91 10.30 5.43 11.40
    sdj 14.00 0.00 10.00 9.00 1.50 4.06 599.58 0.26 13.68 3.60 24.89 13.47 25.60
    sdl 14.00 0.00 10.00 9.00 1.50 4.50 646.74 0.13 6.63 1.30 12.56 6.47 12.30
    sdh 13.00 0.00 11.00 9.00 1.50 4.00 563.60 0.11 5.70 1.18 11.22 5.55 11.10
    sde 14.00 0.00 10.00 8.00 1.50 4.00 625.78 0.09 4.78 1.10 9.38 4.67 8.40
    sdf 14.00 0.00 10.00 8.00 1.50 4.00 625.78 0.14 8.06 4.00 13.12 7.17 12.90
    sdg 13.00 0.00 11.00 9.00 1.50 4.00 563.60 0.14 7.00 1.91 13.22 6.80 13.60
    sdb 14.00 0.00 11.00 10.00 1.56 4.20 561.52 0.17 7.67 1.73 14.20 7.71 16.20
    md127 0.00 0.00 0.00 56.00 0.00 25.27 924.14 0.00 0.00 0.00 0.00 0.00 0.00

  • It looks like some pretty heavy writes are going on at the time. I’m not sure what you mean by “nose dives”, but I’d expect *some* performance impact of running a read-intensive process like a RAID check at the same time you’re running a write-intensive process.

    Do the same write-heavy processes run on the other clusters, where you aren’t seeing performance issues?

  • All of our Kafka clusters are fairly write-heavy. The cluster in question is our second-heaviest – we haven’t yet upgraded the heaviest, due to the issues we’ve been experiencing in this one.

    Here is an iostat example from a host within the same cluster, but without the RAID check running:

    [root@r2k1 ~] # iostat -xdmc 1 10
    Linux 3.10.0-327.13.1.el7.x86_64 (r2k1) 05/27/16 _x86_64_ (32 CPU)

    avg-cpu: %user %nice %system %iowait %steal %idle
    8.87 0.02 1.28 0.21 0.00 89.62

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.02 0.55 0.15 27.06 0.03 11.40 859.89 1.02 37.40 36.13 37.41 6.86 18.65
    sdf 0.02 0.48 0.15 26.99 0.03 11.40 862.17 0.15 5.56 40.94 5.37 7.27 19.73
    sdk 0.03 0.58 0.22 27.10 0.03 11.40 857.01 1.60 58.49 36.20 58.67 7.17 19.58
    sdb 0.02 0.52 0.15 27.43 0.03 11.40 848.37 0.02 0.78 42.84 0.55 7.07 19.50
    sdj 0.02 0.55 0.15 27.11 0.03 11.40 858.28 0.62 22.70 41.97 22.59 7.43 20.27
    sdg 0.03 0.68 0.22 27.76 0.03 11.40 836.98 0.76 27.10 34.36 27.04 7.33 20.51
    sde 0.03 0.48 0.22 26.99 0.03 11.40 860.43 0.33 12.07 33.16 11.90 7.34 19.98
    sda 0.03 0.52 0.22 27.43 0.03 11.40 846.65 0.57 20.48 36.42 20.35 7.34 20.31
    sdh 0.02 0.68 0.15 27.76 0.03 11.40 838.63 0.47 16.66 40.96 16.53 7.20 20.09
    sdc 0.03 0.55 0.22 27.06 0.03 11.40 858.19 0.74 27.30 36.96 27.22 7.55 20.58
    sdi 0.03 0.53 0.22 27.13 0.03 11.40 856.04 1.60 58.50 27.43 58.75 5.21 14.24
    sdl 0.02 0.56 0.15 27.11 0.03 11.40 858.27 1.12 41.09 27.89 41.16 5.00 13.63
    md127 0.00 0.00 2.53 161.84 0.36 68.39 856.56 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    13.11 0.00 1.82 1.07 0.00 84.01

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 81.00 0.00 38.48 972.95 51.00 219.06 0.00 219.06 6.37 51.60
    sdf 0.00 1.00 0.00 73.00 0.00 33.70 945.33 55.02 235.86 0.00 235.86 7.12 52.00
    sdk 0.00 1.00 0.00 56.00 0.00 25.70 939.73 60.45 223.79 0.00 223.79 9.29 52.00
    sdb 0.00 2.00 0.00 70.00 0.00 34.48 1008.70 58.88 292.81 0.00 292.81 7.37 51.60
    sdj 0.00 3.00 0.00 62.00 0.00 29.87 986.60 59.32 243.48 0.00 243.48 8.26 51.20
    sdg 0.00 1.00 0.00 49.00 0.00 23.43 979.45 60.37 234.98 0.00 234.98 10.53 51.60
    sde 0.00 1.00 0.00 61.00 0.00 27.95 938.38 58.17 239.57 0.00 239.57 8.52 52.00
    sda 0.00 2.00 0.00 56.00 0.00 27.48 1004.88 56.27 202.88 0.00 202.88 9.27 51.90
    sdh 0.00 1.00 0.00 70.00 0.00 33.57 982.19 59.00 277.84 0.00 277.84 7.43 52.00
    sdc 0.00 0.00 0.00 64.00 0.00 30.06 961.89 58.20 268.30 0.00 268.30 8.08 51.70
    sdi 0.00 3.00 0.00 116.00 0.00 55.62 981.94 44.54 199.72 0.00 199.72 4.56 52.90
    sdl 0.00 1.00 0.00 128.00 0.00 60.31 964.88 43.91 215.94 0.00 215.94 4.11 52.60
    md127 0.00 0.00 0.00 1143.00 0.00 538.90 965.59 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    15.70 0.00 1.97 0.44 0.00 81.89

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 119.00 0.00 56.39 970.42 42.84 639.45 0.00 639.45 6.66 79.20
    sdf 0.00 1.00 0.00 129.00 0.00 61.21 971.84 48.89 672.04 0.00 672.04 6.34 81.80
    sdk 0.00 0.00 0.00 152.00 0.00 72.62 978.53 61.02 716.76 0.00 716.76 5.74 87.20
    sdb 0.00 1.00 0.00 133.00 0.00 62.86 967.88 54.10 695.35 0.00 695.35 6.45 85.80
    sdj 0.00 0.00 0.00 146.00 0.00 68.36 958.85 69.22 767.12 0.00 767.12 6.85 100.00
    sdg 0.00 0.00 0.00 146.00 0.00 69.87 980.11 77.99 789.53 0.00 789.53 6.85 100.00
    sde 0.00 1.00 0.00 141.00 0.00 66.96 972.60 56.21 707.61 0.00 707.61 6.21 87.60
    sda 0.00 1.00 0.00 147.00 0.00 69.86 973.22 62.21 728.76 0.00 728.76 6.32 92.90
    sdh 0.00 0.00 0.00 134.00 0.00 62.61 956.90 55.79 711.49 0.00 711.49 6.63 88.90
    sdc 0.00 0.00 0.00 136.00 0.00 64.81 975.94 61.46 753.57 0.00 753.57 6.93 94.20
    sdi 0.00 0.00 0.00 93.00 0.00 42.67 939.61 17.60 419.10 0.00 419.10 4.63 43.10
    sdl 0.00 0.00 0.00 80.00 0.00 38.02 973.20 11.00 340.79 0.00 340.79 4.25 34.00
    md127 0.00 0.00 0.00 87.00 0.00 40.99 964.97 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    12.11 0.00 1.35 0.00 0.00 86.54

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 15.00 0.00 15.00 15.00 1.50
    sdf 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10
    sdk 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10
    sdb 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 7.00 0.00 7.00 7.00 0.70
    sdj 0.00 0.00 0.00 2.00 0.00 0.06 64.50 0.01 733.50 0.00 733.50 7.50 1.50
    sdg 0.00 0.00 0.00 10.00 0.00 2.88 588.90 0.55 1212.80 0.00 1212.80 15.50 15.50
    sde 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 12.00 0.00 12.00 12.00 1.20
    sda 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10
    sdh 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 20.00 0.00 20.00 20.00 2.00
    sdc 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 17.00 0.00 17.00 17.00 1.70
    sdi 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 12.00 0.00 12.00 12.00 1.20
    sdl 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 17.00 0.00 17.00 17.00 1.70
    md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    15.22 0.00 1.50 0.00 0.00 83.28

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    16.96 0.09 1.63 0.16 0.00 81.16

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 8.00 0.00 0.66 168.25 0.09 11.50 0.00 11.50 8.75 7.00
    sdf 0.00 0.00 0.00 5.00 0.00 0.52 213.20 0.08 16.20 0.00 16.20 16.20 8.10
    sdk 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.06 20.33 0.00 20.33 20.33 6.10
    sdb 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.05 16.67 0.00 16.67 16.67 5.00
    sdj 0.00 0.00 0.00 4.00 0.00 0.98 500.50 0.06 14.50 0.00 14.50 11.00 4.40
    sdg 0.00 1.00 0.00 4.00 0.00 0.63 322.50 0.14 36.00 0.00 36.00 32.75 13.10
    sde 0.00 0.00 0.00 5.00 0.00 0.52 213.20 0.07 13.60 0.00 13.60 13.60 6.80
    sda 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.05 15.67 0.00 15.67 15.67 4.70
    sdh 0.00 1.00 0.00 4.00 0.00 0.63 322.50 0.06 14.50 0.00 14.50 11.50 4.60
    sdc 0.00 0.00 0.00 8.00 0.00 0.66 168.25 0.11 13.25 0.00 13.25 10.62 8.50
    sdi 0.00 0.00 0.00 4.00 0.00 0.98 500.50 0.06 15.50 0.00 15.50 12.00 4.80
    sdl 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.04 13.67 0.00 13.67 13.67 4.10
    md127 0.00 0.00 0.00 17.00 0.00 3.78 455.53 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    14.08 0.00 1.50 0.00 0.00 84.42

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    14.89 0.00 1.98 0.00 0.00 83.13

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 90.00 0.00 41.31 940.01 27.25 302.80 0.00 302.80 7.07 63.60
    sdf 0.00 0.00 0.00 87.00 0.00 41.35 973.44 22.73 261.30 0.00 261.30 6.92 60.20
    sdk 0.00 2.00 0.00 97.00 0.00 42.08 888.42 39.86 410.94 0.00 410.94 8.10 78.60
    sdb 0.00 0.00 0.00 87.00 0.00 41.07 966.82 24.39 280.30 0.00 280.30 7.14 62.10
    sdj 0.00 1.00 0.00 91.00 0.00 41.94 943.92 36.37 399.62 0.00 399.62 8.44 76.80
    sdg 0.00 0.00 0.00 86.00 0.00 40.67 968.48 31.76 369.33 0.00 369.33 8.81 75.80
    sde 0.00 0.00 0.00 87.00 0.00 41.35 973.44 30.80 354.05 0.00 354.05 9.01 78.40
    sda 0.00 0.00 0.00 87.00 0.00 41.07 966.82 32.61 374.80 0.00 374.80 8.57 74.60
    sdh 0.00 0.00 0.00 86.00 0.00 40.67 968.48 29.52 343.23 0.00 343.23 8.56 73.60
    sdc 0.00 0.00 0.00 89.00 0.00 40.81 939.07 32.80 360.15 0.00 360.15 8.91 79.30
    sdi 0.00 1.00 0.00 91.00 0.00 41.94 943.92 19.60 215.34 0.00 215.34 5.62 51.10
    sdl 0.00 2.00 0.00 97.00 0.00 42.08 888.42 19.59 201.93 0.00 201.93 4.69 45.50
    md127 0.00 0.00 0.00 535.00 0.00 248.42 950.95 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    11.08 0.00 1.41 0.00 0.00 87.51

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 5.00 0.00 42.00 0.00 0.38 18.55 2.25 53.52 0.00 53.52 4.93 20.70
    sdf 0.00 0.00 0.00 35.00 0.00 0.21 12.43 1.62 46.17 0.00 46.17 5.29 18.50
    sdk 0.00 23.00 0.00 42.00 0.00 0.44 21.40 1.99 47.29 0.00 47.29 4.64 19.50
    sdb 0.00 9.00 0.00 58.00 0.00 0.34 12.02 2.77 47.78 0.00 47.78 4.12 23.90
    sdj 0.00 1.00 0.00 39.00 0.00 0.24 12.79 1.79 45.97 0.00 45.97 5.21 20.30
    sdg 0.00 11.00 0.00 66.00 0.00 0.40 12.45 3.60 54.47 0.00 54.47 3.42 22.60
    sde 0.00 0.00 0.00 35.00 0.00 0.21 12.43 2.13 61.00 0.00 61.00 8.89 31.10
    sda 0.00 9.00 0.00 58.00 0.00 0.34 12.02 2.48 42.81 0.00 42.81 3.71 21.50
    sdh 0.00 11.00 0.00 66.00 0.00 0.40 12.45 4.81 72.83 0.00 72.83 3.80 25.10
    sdc 0.00 5.00 0.00 43.00 0.00 0.88 41.93 1.99 63.81 0.00 63.81 5.00 21.50
    sdi 0.00 1.00 0.00 39.00 0.00 0.24 12.79 1.31 33.69 0.00 33.69 4.03 15.70
    sdl 0.00 23.00 0.00 42.00 0.00 0.44 21.40 1.23 29.33 0.00 29.33 3.71 15.60
    md127 0.00 0.00 0.00 313.00 0.00 2.01 13.14 0.00 0.00 0.00 0.00 0.00 0.00

    avg-cpu: %user %nice %system %iowait %steal %idle
    16.16 0.03 1.66 0.00 0.00 82.15

    Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
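
Comparing these samples by eye is tedious, so here is a minimal Python sketch that condenses `iostat -x` output into one summary per sample. It is an illustration only, not something used in the thread: the function name `summarize_iostat` is hypothetical, it assumes each `Device:` header sits on its own line, and it only looks at `sd*` rows. It reads column names from the header because the extended column layout differs between sysstat versions.

```python
def summarize_iostat(text):
    """Summarize each 'Device:' block of `iostat -x` output (hypothetical helper)."""
    samples = []      # one list of per-device dicts per Device: block
    columns = None    # column names taken from the most recent header
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            columns = fields[1:]
            samples.append([])
            continue
        # Skip the banner, avg-cpu lines, and anything with the wrong width.
        if columns is None or len(fields) != len(columns) + 1:
            continue
        # Assumption: member disks are named sd*; the md device row is ignored.
        if not fields[0].startswith("sd"):
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue
        samples[-1].append(dict(zip(columns, values)))

    summary = []
    for devices in samples:
        if not devices:
            continue
        summary.append({
            "devices": len(devices),
            "mean_w_await": sum(d["w_await"] for d in devices) / len(devices),
            "max_util": max(d["%util"] for d in devices),
        })
    return summary
```

Feeding it the captured text from a host with the check running and one without gives a per-second view of mean write latency and the busiest disk, which is easier to compare than the raw tables.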

  • I did some additional testing – I stopped Kafka on the host and kicked off a RAID check, and it ran at the expected speed overnight. I started Kafka this morning, and the RAID check’s speed immediately dropped to ~2000K/Sec.

    I then enabled the write-back cache on the drives (hdparm -W1 /dev/sd*). The RAID check is now running between 100000K/Sec and 200000K/Sec, and has been for several hours (it fluctuates, but seems to stay within that range). Write-back cache is NOT enabled for the drives on the hosts we haven’t upgraded yet, but the speeds are similar (I kicked off a RAID check on one of our CentOS 6 hosts as well; the window seems to be 150000 – 200000K/Sec on that host).

    Kelly

  • Kelly Lesperance wrote:

    Perhaps I missed where you answered this: is this software RAID, or hardware? And I think you said you’re upgrading existing boxes?

    mark

  • Software RAID 10. Servers are HP DL380 Gen 8s, with 12×4 TB 7200 RPM drives.

  • Hi Kelly,

    I hope this is relevant: you might want to try the most recent kernel from git to see if your problem is fixed.

    Best regards,