NFS (or TCP or Scheduler) Changes Between CentOS 5 and 6?


We have a “compute cluster” of about 100 machines that do a read-only NFS mount to a big NAS filer (a NetApp FAS6280). The jobs running on these boxes are analysis/simulation jobs that constantly read data off the NAS.

We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5. We did a “piecemeal” upgrade, usually upgrading five or so machines at a time, every few days. We noticed improved performance on the CentOS
6 boxes. But as the number of CentOS 6 boxes increased, we actually saw performance on the CentOS 5 boxes decrease. By the time we had only a few CentOS 5 boxes left, they were performing so badly as to be effectively worthless.

What we observed in parallel to this upgrade process was that the read latency on our NetApp device skyrocketed. This in turn caused all compute jobs to actually run slower, as it seemed to move the bottleneck from the client servers’ OS to the NetApp. This is somewhat counter-intuitive: CentOS 6 performs faster, but actually results in net performance loss because it creates a bottleneck on our centralized storage.

All indications are that CentOS 6 seems to be much more “aggressive”
in how it does NFS reads. And likewise, CentOS 5 was very “polite”, to the point that it basically got starved out by the introduction of the 6.5 boxes.

What I’m looking for is a “deep dive” list of changes to the NFS
implementation between CentOS 5 and CentOS 6. Or maybe this is due to a change in the TCP stack? Or maybe the scheduler? We’ve tried a lot of sysctl tcp tunings, various nfs mount options, anything that’s obviously different between 5 and 6… But so far we’ve been unable to find the “smoking gun” that causes the obvious behavior change between the two OS versions.

Just hoping that maybe someone else out there has seen something like this, or can point me to some detailed documentation that might clue me in on what to look for next.

Thanks!

11 thoughts on - NFS (or TCP or Scheduler) Changes Between CentOS 5 and 6?

  • Matt Garman wrote:

    *IF* I understand you, I’ve got one question: what parms are you using to mount the storage? We had *real* performance problems when we went from 5
    to 6 – as in, unzipping a 26M file to 107M, while writing to an NFS-mounted drive, went from 30 sec or so to a *timed* 7 min. The final answer was that once we mounted the NFS filesystem with nobarrier in fstab instead of default, the time dropped to 35 or 40 sec again.

    barrier is on by default in 6, and tries to make writes atomic transactions; its intent is to protect against things like power failure. Esp. if you’re on UPSes, nobarrier is the way to go.
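
    Something along these lines is what I mean – the device and mount point here are just placeholders, adjust for your own boxes:

        # /etc/fstab – a local ext4 volume mounted with write barriers disabled
        /dev/sdb1   /scratch   ext4   defaults,nobarrier   0  2

        # or flip it on an already-mounted ext4 filesystem to test:
        mount -o remount,nobarrier /scratch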

    mark

  • Some things come to mind as far as investigating differences; you don’t have to answer them all here; just making sure you’ve covered them all:

    Have you looked at the client-side NFS cache? Perhaps the C6 cache is either disabled, has fewer resources, or is invalidating faster?
    (I don’t think that would explain the C5 starvation, though, unless it’s a secondary effect from retransmits, etc.)

    Regarding the cache, do you have multiple mount points on a client that resolve to the same server filesystem? If so, do they have different mount options? If so, that can result in multiple caches instead of a single disk cache. The client cache can also be bypassed if your application is doing direct I/O on the files. Perhaps there is a difference in the application between C5 and C6, including whether or not it was just recompiled? (If so, can you try a C5 version on the C6 machines?)

    If you determine that C6 is doing aggressive caching, does this match the needs of your application? That is, do you have the situation where the client NFS layer does an aggressive read-ahead that is never used by the application?

    Are C5 and C6 using the same NFS protocol version? How about TCP vs UDP? If UDP is in play, have a look at fragmentation stats under load.
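
    For example, running something like this side by side on a C5 box and a C6 box shows what each one actually negotiated (standard commands, nothing site-specific):

        # effective NFS mount options, protocol version and transport, per mount
        nfsstat -m
        grep nfs /proc/mounts

        # if UDP is in play, check fragmentation/reassembly counters under load
        netstat -s | grep -i -e fragment -e reassembl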

    Are both using the same authentication method (ie: maybe just UID-based)?

    And, like always, is DNS sane for all your clients and servers? Everything
    (including clients) has proper PTR records, consistent with A records, et al? DNS is so fundamental to everything that if it is out of whack you can get far-reaching symptoms that don’t seem to have anything to do with DNS.
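
    A quick sanity loop over your hosts, for example (host names here are made up):

        # forward and reverse lookups should agree for every client and the filer
        for h in client01 client02 filer01; do
            ip=$(dig +short A "$h" | head -1)
            echo "$h -> $ip -> $(dig +short -x "$ip")"
        done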

    <http://wiki.linux-nfs.org> has helpful information about enabling debug output on the client end to see what is going on. I don’t know in your situation if enabling server-side debugging is feasible.
    <http://nfs.sourceforge.net> also has useful tuning information.
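
    On the client side that generally boils down to rpcdebug, e.g. (very noisy, so only for a short window):

        # enable NFS and RPC debug messages (they land in /var/log/messages / dmesg)
        rpcdebug -m nfs -s all
        rpcdebug -m rpc -s all

        # ...reproduce the problem, then switch it back off:
        rpcdebug -m nfs -c all
        rpcdebug -m rpc -c all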

    You may want to look at NFSometer and see if it can help.

    Devin

  • James Pearson wrote:
    What kind of filesystem is it? I note that xfs also has barrier as a mount option.

    mark

  • Do you know where the NFS cache settings are specified? I’ve looked at the various nfs mount options. Anything cache-related appears to be the same between the two OSes, assuming I didn’t miss anything. We did experiment with the “noac” mount option, though that had no effect in our tests.

    FWIW, we’ve done a tcpdump on both OSes, performing the same tasks, and it appears that 5 actually has more “chatter”. Just looking at packet counts, 5 has about 17% more packets than 6, for the same workload. I haven’t dug too deep into the tcpdump files, since we need a pretty big workload to trigger the measurable performance discrepancy. So the resulting pcap files are on the order of 5 GB.
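
    For reference, the capture and the packet counts were along these lines (interface name and the standard NFS port, adjust as needed):

        # capture NFS traffic while running the same workload on each OS version
        tcpdump -i eth0 -s 0 -w centos5-run.pcap port 2049
        tcpdump -i eth0 -s 0 -w centos6-run.pcap port 2049

        # crude packet-count comparison
        tcpdump -r centos5-run.pcap 2>/dev/null | wc -l
        tcpdump -r centos6-run.pcap 2>/dev/null | wc -l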

    No multiple mount points to the same server.

    No application differences. We’re still compiling on 5, regardless of target platform.

    That was one of our early theories. On 6, you can adjust the read-ahead via
    /sys/class/bdi/X:Y/read_ahead_kb (use stat on the mount point to determine X and Y). This file doesn’t exist on 5. But we tried increasing and decreasing it from the default (960), and didn’t see any changes.
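
    Concretely, on a 6 box it looks something like this (the mount point is just an example):

        # find the backing-device ID (major:minor) for the NFS mount point
        mountpoint -d /mnt/nas            # prints e.g. 0:21

        # read the current read-ahead (KB) and try a different value
        cat /sys/class/bdi/0:21/read_ahead_kb
        echo 1920 > /sys/class/bdi/0:21/read_ahead_kb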

    Yup, both are using tcp, protocol version 3.

    Yup, sec=sys.

    I believe so. I wouldn’t bet my life on it. But there were certainly no changes to our DNS before, during or since the OS upgrade.

    Haven’t seen that, will definitely give it a try!

    Thanks for your thoughts and suggestions!

  • The server is a NetApp FAS6280. It’s using NetApp’s filesystem. I’m almost certain it’s none of the common Linux ones. (I think they call it WAFL IIRC.)

    Either way, we do the NFS mount read-only, so write barriers don’t even come into play. E.g., with your original example, if we unzipped something, we’d have to write to the local disk.

    Furthermore, in “low load” situations, the NetApp read latency stays low, and the 5/6 performance is fairly similar. It’s only when the workload gets high, and in turn this “aggressive” demand is placed on the NetApp, that we see overall decreased performance.

    Thanks for the thoughts!

  • Try “nfsstat -cn” on the clients to see if any particular NFS operations occur more or less frequently on the C6 systems.
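
    For example, snapshot the counters around the same workload on a C5 and a C6 client, then diff the results:

        # per-operation client-side NFS counts (READ, LOOKUP, GETATTR, ...)
        nfsstat -cn > before.txt
        # ...run the workload...
        nfsstat -cn > after.txt
        diff before.txt after.txt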

    Also look at the “lookupcache” option found in “man nfs”:

    lookupcache=mode
        Specifies how the kernel manages its cache of directory entries for a given mount point. mode can be one of all, none, pos, or positive. This option is supported in kernels 2.6.28 and later.
    (there is more text in the man page)

    Since C5 came with 2.6.18 and C6 with 2.6.32 this might have something to do with it.
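
    As a test, you could mount one C6 client with the directory-entry cache disabled and see whether its behavior moves back toward C5 (server and paths below are placeholders):

        # NFSv3 over TCP, read-only, with lookup caching turned off
        mount -t nfs -o ro,vers=3,proto=tcp,lookupcache=none filer:/vol/data /mnt/nas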

    Regards,
    Dennis

  • Also check out the NetApp performance monitors, e.g. the AutoSupport web site or trusty old filer-mrtg. NFS ops and CPU load might be an indication of things going wrong at the NetApp end – you might be running into particular bugs and want to upgrade to the latest patch level of the OS.

  • You may want to try reducing sunrpc.tcp_max_slot_table_entries. In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16.
    In CentOS 6, this number is dynamic, with a maximum of sunrpc.tcp_max_slot_table_entries, which by default has a value of 65536.

    We put that in /etc/modprobe.d/sunrpc.conf: options sunrpc tcp_max_slot_table_entries=128

    You can’t put this in /etc/sysctl.conf because the sunrpc kernel module is loaded before sysctl -p is done.
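
    For reference, after a reboot (the option is only read when the sunrpc module loads) you can check what is actually in effect via the proc files behind those sysctl names:

        # CentOS 6: slots are allocated dynamically, capped by this value
        cat /proc/sys/sunrpc/tcp_max_slot_table_entries

        # CentOS 5: the fixed slot count
        cat /proc/sys/sunrpc/tcp_slot_table_entries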

    peter

  • This appears to be the “smoking gun” we were looking for, or at least a significant piece of the puzzle.

    We actually tried this early on in our investigation, but were changing it via sysctl, which apparently has no effect. Your email convinced me to try again, but this time configuring the parameters via modprobe.

    In our case, 128 was still too high, so we dropped it all the way down to 16. Our understanding is that 16 is the CentOS 5 value. What we’re seeing now is that our apps are starved for data, so it looks like we might have to nudge it up. In other words, either there’s something else at play that we’re not aware of, or the meaning of that parameter differs between CentOS 5 and CentOS 6.

    Anyway, thank you very much for the suggestion. You turned on the light at the end of the tunnel!