Filesystem Writes Unexpectedly Slow (CentOS 6.4)


I have a rather large box (2×8-core Xeon, 96GB RAM) with a couple of disk arrays connected to an Areca controller. I just added a new external array,
eight 3TB drives in RAID5, and the testing I’m doing right now is on that array, but this seems to be a problem on this machine in general, on all filesystems
(even, possibly, NFS, but I’m not sure about that one yet).

So, if I use iozone -a to test write speeds on the raw device, I get results in the 500-800MB/sec range, depending on write sizes, which is about what I’d expect.

However, when I put an ext4 filesystem on this device, mounted with noatime and data=writeback (the filesystem is completely empty), and test with dd, the results are less encouraging:

dd bs=1M if=/dev/zero of=/Volumes/data_10-2/test.bin count=40000
40000+0 records in
40000+0 records out
41943040000 bytes (42 GB) copied, 292.288 s, 143 MB/s

Now, I’m not expecting to get the raw device speeds, but this seems to be at least 2-3 times slower than I’d expect.

Using conv=fsync oflag=direct makes it utterly pathetic:

dd bs=1M if=/dev/zero of=/Volumes/data_10-2/test.bin oflag=direct conv=fsync count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 178.791 s, 29.3 MB/s

Now, I’m sure there can be many reasons for this, but I wonder where I should start looking to debug this.

8 thoughts on - Filesystem Writes Unexpectedly Slow (CentOS 6.4)

  • My first question would be, why not test the filesystem with iozone too?
    (And/or, test the device with dd.) You may or may not come up with the same results, but at least someone can’t come back and blame your testing methodology for the odd results.
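
    For example, something along these lines (paths and device names are placeholders; using direct I/O,
    or a test file well over the 96GB of RAM, keeps the page cache from skewing the numbers):

      # iozone write/rewrite and read/reread tests on the mounted filesystem, using O_DIRECT
      iozone -i 0 -i 1 -r 1m -s 8g -I -f /Volumes/data_10-2/iozone.tmp

      # dd read test straight off the block device, also bypassing the page cache
      dd if=/dev/sdX of=/dev/null bs=1M count=40000 iflag=direct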

    (Just as an aside, if your 6.4 box is on a public network, you should probably consider updating it as well, since many security and bug fixes have been issued since 6.4 was released.)

    If you are still getting poor results from ext4, you have at least two more options.

    ==Check with the ext4 mailing list; they’re usually pretty helpful.
    ==Try your tests against xfs. Try to make sure your tests are replicating your use cases as closely as you can manage; you wouldn’t want to pick a filesystem based on a test that doesn’t actually replicate how you’re going to use the fs.

    –keith

  • The first thing I would check is that you have a BBU installed on the Areca controller and
    that it is functioning properly (check the CLI; I don’t know the exact commands off the top
    of my head). Also make sure that write caching is enabled on the controller (after you’ve
    checked the BBU, of course). Without a working BBU in place, hardware RAID controllers such
    as Areca disable write caching by default, and this has a significant impact on write speeds.
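
    If I recall correctly the CLI tool is cli64, and something along these lines should show the
    battery and cache state (do check the Areca manual though, command names vary a bit between
    CLI/firmware versions):

      cli64 hw info    # hardware status, which should include the battery backup module
      cli64 sys info   # controller information
      cli64 vsf info   # volume set information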

    Note that newer controllers use a type of flash memory instead of a BBU.

    Peter

  • Yes, I have a BBU and it’s working. However, disabled write caching shouldn’t affect raw
    device writes and filesystem writes so differently, I think.

  • Googling shows some people who solved what seems like a similar problem with a kernel upgrade, so I’m going to try that. This box is on 2.6.32-358, and
    2.6.32-431.29.2 seems to be the newest. At least it’s a factor to eliminate.
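
    For the record, on CentOS 6 that should just be something like:

      yum update kernel   # pulls in the latest 2.6.32-431.x kernel package
      reboot
      uname -r            # confirm the running kernel afterwards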

  • First I’d suggest comparing apples to apples. That is, try doing the dd test on the raw device and compare it to dd on ext4.
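
    For example (sdX is a placeholder, and note that a write test straight to the device destroys
    whatever is on it, so only do that while the array is still empty):

      # dd against the raw device, bypassing the page cache
      dd if=/dev/zero of=/dev/sdX bs=1M count=40000 oflag=direct

      # the same thing against a file on the ext4 filesystem
      dd if=/dev/zero of=/Volumes/data_10-2/test.bin bs=1M count=40000 oflag=direct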

    Then you may want to try changing the I/O scheduler from the default cfq to deadline. This typically works better with many RAID controllers, but YMMV.
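
    Something like this, with sdX being the Areca volume (it can also be set permanently with
    elevator=deadline on the kernel command line in grub.conf):

      cat /sys/block/sdX/queue/scheduler              # shows the available schedulers, active one in brackets
      echo deadline > /sys/block/sdX/queue/scheduler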

    Also, testing with xfs instead of ext4 is probably worth it. xfs usually outperforms ext4 in streaming writes (like dd). Of course this raises the question of whether dd is a useful
    metric for your actual load… xfs may in fact be required anyway: 8 drives in RAID5 gives 3TB * 7 = 21TB usable, which is above the ext4 maximum on RHEL6 (16TB, if I remember
    correctly; check Red Hat’s documentation for RHEL6 to make sure).
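
    Roughly, for a quick test (on CentOS 6 that means installing xfsprogs; the mkfs defaults are
    usually fine, and inode64 is worth considering on a filesystem this size):

      yum install xfsprogs
      mkfs.xfs -f /dev/sdX1    # -f is needed to overwrite the existing ext4 signature
      mount -o noatime,inode64 /dev/sdX1 /Volumes/data_10-2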

    Good luck, Peter K

  • Upgrading to 6.5 with its new kernel did not fix the problem. I will be doing some more testing. The strange thing is, I have a near-identical machine also running CentOS 6.5, also
    with ext4 on the same controller (and another, newer Areca controller), and there it’s extremely fast: on the fastest controller there, dd hits around 2GB/sec sustained over 200GB
    of data on a 24-disk RAID6 (both systems have 96GB of RAM).

    And yes, I’ve formatted with a newer version of e2fsprogs than is included with the distro, to get 16TB+ support, although in the case of the device I’m currently testing, it actually has two partitions, so I wouldn’t have needed to.

    I’ll do a bit more testing and come back with my results.

  • That is interesting. I have barriers enabled; I will try disabling them (I have battery backup on the RAID controller, so it should be OK). Then again, I have barriers enabled on the other, very similar box too, and I hit 2GB/sec there, so I don’t think that’s the main factor. Still, I will test. Thanks.
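
    On ext4 that would just be a remount with nobarrier (or barrier=0), i.e. something like:

      mount -o remount,nobarrier /Volumes/data_10-2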


    Joakim Ziegler – Supervisor de postproducción – Terminal joakim@terminalmx.com – 044 55 2971 8514 – 5264 0864

  • Have you tried swapping disks from the badly performing system into the chassis of the one that performs as expected?
    Or moving the RAID hardware, if that’s easier.

    One would think that if it’s an OS configuration issue, the problem would follow the OS/disks (or the RAID controller, if you swap that instead). It’s a pain, but you’re probably getting close to needing that as a last resort to narrow down the problem.