Thought I’d post this here, too – I emailed it to the redhat list, and that’s pretty moribund, while I’ve seen redhatters here….
---------------------------- Original Message ----------------------------
Subject: Bug 800181: NFSv4 on RHEL 6.2 over six times slower than 5.7

For any redhatters on the list, I’m going to be reopening this bug today.

I am also VERY unhappy with Redhat. I filed the bug months ago, and it was *never* assigned - no one apparently even looked at it. It’s a show-stopper for us, since it hits us on our home directory servers.

A week or so ago, I updated our test system to 6.3, and *nothing* has changed. Unpack a large file locally, and it’s seconds. Unpack from an NFS-mounted directory to a local disk takes about 1.5min. NFS mount either an ext3 or ext4 fs, cd to that directory, and I run a job to unpack a large file to the NFS-mounted directory [...] move our home directory servers to 6.x with this unacknowledged ->BUG<-.

Large file is defined as a 28M .gz file, unpacked to 92M.

This is 100% repeatable.

I tried sending an email to our support weeks ago, and got no response. Maybe it takes shaming in a public forum to get anyone to acknowledge this exists....

mark
Out of curiosity, do you have a Red Hat subscription with Standard or better support? The SLAs for even a severity 4 issue should have got you a response within 2 business days.
Did you give them a call?
If you are just using the Red Hat bugzilla, that might be your problem. I’ve heard a rumour that Red Hat doesn’t really monitor that channel, giving preference to issues raised through their customer portal. That does make _some_ commercial sense, but if they are doing that, it would be polite to shut down the old bugzilla service and save some frustration. I don’t have a Red Hat subscription myself, so I can’t really test this. Can anyone, perhaps with a Red Hat subscription, shed any light on this?
It occurs to me that I might be hijacking a thread here, so apologies if that is the case.
We have this issue.
I have a support call open with Red Hat about it. Bug reports will only really forcibly get actioned if you open a support call and point at the bug report.
I also have this issue, though much, much worse, on Fedora (using BTRFS), which will surely have to be fixed before BTRFS becomes the default fs in RHEL. But the Fedora bug I have open on this provided some useful insights on NFSv4, especially:
“NFS file and directory creates are synchronous operation: before the create can return, the client must get a reply from the server saying not only that it has created the new object, but that the create has actually hit the disk.”
Also listed here is a proposed protocol extension to NFS v4 to make file creation more efficient:
Not sure if this will be added to RH.
Also RH support found:
“NFSv4 file creation is actually about half the speed of file creation over NFSv3, but NFSv4 can delete files quicker than NFSv3. By far the largest speed gains come from running with the async option on, though using this can lead to issues if the NFS server crashes or is rebooted.”
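One way to check the create-versus-delete claim on your own mounts is a small timing loop. The following is only a rough sketch, not anything Red Hat support supplied: the function name `time_create_delete`, the file count, and the use of a temporary directory as the target are all assumptions made for illustration. Point it at the NFS-mounted directory you want to measure.

```python
import os
import tempfile
import time


def time_create_delete(directory, count=200):
    """Create then delete `count` one-byte files; return (create_s, delete_s)."""
    paths = [os.path.join(directory, f"bench_{i:05d}.tmp") for i in range(count)]

    start = time.perf_counter()
    for path in paths:
        # Each open() of a new name is a file create on the server side.
        with open(path, "wb") as handle:
            handle.write(b"x")
    create_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    delete_seconds = time.perf_counter() - start

    return create_seconds, delete_seconds


if __name__ == "__main__":
    # Hypothetical target: replace with the NFS-mounted directory under test.
    target = tempfile.mkdtemp()
    created, deleted = time_create_delete(target)
    print(f"create: {created:.3f}s  delete: {deleted:.3f}s")
    os.rmdir(target)
```

Running it once against an NFSv3 mount and once against an NFSv4 mount of the same export, with and without async, should show whether the same pattern appears on your hardware.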
I’m glad we aren’t the only ones seeing this, it sort of looked like we were when talking to support!
I’ll add this RH bug number to my RH support ticket.
But think yourself lucky, BTRFS on Fedora 16 was much worse. This was the time it took me to untar a vlc tarball.
F16 to RHEL5 - 0m 28.170s
F16 to F16 ext4 - 4m 12.450s
F16 to F16 btrfs - 14m 31.252s
A quick test seems to say this is better in F17 (3m7.240s on BTRFS). It still looks like we are hitting NFSv4 issues there, but btrfs itself is better.
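To put those timings on one scale, they can be converted to seconds and divided by the RHEL5 figure. This is just arithmetic on the numbers quoted above; the helper name `to_seconds` and the labels are made up for the sketch, and the F17 run may not be directly comparable since it was only a quick test.

```python
import re


def to_seconds(text):
    """Convert a `time`-style string such as '4m 12.450s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+(?:\.\d+)?)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings as quoted in the post; the labels are only for display.
timings = {
    "F16 to RHEL5": "0m 28.170s",
    "F16 to F16 ext4": "4m 12.450s",
    "F16 to F16 btrfs": "14m 31.252s",
    "F17 btrfs": "3m7.240s",
}

baseline = to_seconds(timings["F16 to RHEL5"])
for label, raw in timings.items():
    seconds = to_seconds(raw)
    print(f"{label:18s} {seconds:8.3f}s  {seconds / baseline:5.1f}x")
```

That works out to roughly 9x slower for ext4 on F16, about 31x for btrfs on F16, and about 6.6x for the F17 btrfs run, all relative to the RHEL5 server.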
I wonder if the real issue is that NFSv4 waits for a directory change to sync to disk but Linux wants to flush the whole disk cache before saying the sync is complete.
This is likely to be a bug in RHEL5 rather than one in RHEL6. RHEL5 (kernel 2.6.18) does not always guarantee that the disk cache is flushed before ‘fsync’ returns. This is especially true if you use software RAID and/or LVM. You may be able to get the old performance back by disabling I/O barriers and using a UPS, a RAID controller that has battery backed RAM, or enterprise-grade drives that guarantee flushing all the data to disk by using a ‘supercap’ to store enough energy to complete all writes.

Gé
Thanks, Les, that’s *very* interesting.

Based on that, I’m trying again, as I did back in March, when I filed the original bug [...] results.

Ge, sorry, but it hit us, with the same configuration we had in 5, when we tried to move to 6.

And please don’t top post.

mark
I have tried the async option and that reverts to being as fast as previously.
So I guess the choice is either to use the less safe async option and get quick file creation, or to live with the slowdown until a potential new protocol extension appears to help with this.
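Before making that choice it helps to know which exports are already running async. Below is a minimal sketch that reads one exports-style line and reports the mode per client. The function name `export_sync_modes` and the sample hosts are invented for illustration, and it deliberately ignores quoted paths containing spaces and other corner cases of the real file format.

```python
import re


def export_sync_modes(line):
    """Return {client: 'sync' | 'async' | 'default'} for one exports-style line."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return {}
    _path, *clients = line.split()
    modes = {}
    for client in clients:
        # Clients look like host(opt1,opt2); a bare host has no option list.
        match = re.fullmatch(r"([^()]*)\(([^()]*)\)", client)
        if match:
            host, options = match.group(1), match.group(2).split(",")
        else:
            host, options = client, []
        if "async" in options:
            mode = "async"
        elif "sync" in options:
            mode = "sync"
        else:
            mode = "default"  # neither given; the server's default applies
        modes[host or "*"] = mode
    return modes


# Hypothetical example line, not taken from anyone's real configuration.
print(export_sync_modes("/home 192.168.0.0/24(rw,async) backup(rw,sync)"))
```

Anything reported as async is trading crash safety for speed in the way described above, so it is worth listing those exports explicitly when the decision goes to management.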
[...] correctly, which slows things down but keeps you from losing data....

Which is of course no excuse for not even responding to a support request. “It’s not a bug, it’s a feature” may not be the response the client wants to hear, but it’s much better than no response at all.

Jm2c

Tilman Schmidt
The most aggravating part of this is when my manager first set me the problem of trying to find a workaround, I *did* try async, and got no difference. Now, I can’t replicate that… but the oldest version I have is still 6.2, and I think I was working under 6.0 or 6.1.
*After* I test further, I think it’s up to my manager and our users to decide if it’s worth it to go with less secure – this is a real issue, since some of their jobs run for days, and one or two run for weeks, on an HBS* or a good-sized cluster. (We’re speaking of serious scientific computing here.)
* Technical term: honkin’ big server, things like 48 or 64 cores, quarter of a terabyte of memory or so….
I always wondered why the default for nfs was ever sync in the first place. Why shouldn’t it be the same as local use of the filesystem?
The few things that care should be doing fsyncs at the right places anyway.
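For anyone unsure what "doing fsyncs at the right places" looks like in practice, here is a minimal sketch in Python. The function name `write_durably` and the file name are made up for the example. The point is only that an ordinary write stops at the page cache, and it takes an explicit `os.fsync` to ask the OS to commit it.

```python
import os
import tempfile


def write_durably(path, data):
    """Write `data` to `path` and wait until the OS reports it committed."""
    with open(path, "wb") as handle:
        handle.write(data)         # lands in Python's buffer
        handle.flush()             # hands the bytes to the OS page cache
        os.fsync(handle.fileno())  # asks the OS to push them to stable storage


if __name__ == "__main__":
    # Hypothetical output file in a scratch directory.
    scratch = tempfile.mkdtemp()
    result_path = os.path.join(scratch, "result.dat")
    write_durably(result_path, b"final answer\n")
    print(os.path.getsize(result_path))
```

An application that only calls this at the points where it actually needs a guarantee pays the sync cost a handful of times rather than on every write.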
This rumour is almost certainly unfounded. I report the odd bug to RH through Bugzilla and I have always had a timely acknowledgement and as far as I can tell they have either been rejected or accepted within a reasonably short time. Some of them have actually been fixed.
Well, the reason would be that LOCAL operations complete in times that are massively smaller (by factors of hundreds or thousands) than operations that take place via NFS on a normal network. If you are doing something with your network connection to make it very low latency, so that the speeds rival local operations, then it would likely be fine to use the exact same settings as for local operations. If you are not doing low-latency operations, then you are increasing the risk of the system thinking something has happened while the operation is still queued, so that something like a loss of power will leave different items on disk than the system knows about, etc. But people get to override the default settings and increase risk to benefit performance if they choose to.
Everything _except_ moving a disk head around, which is the specific operation we are talking about.
What I mean is that nobody ever uses sync operations locally – writes are always buffered unless the app does an fsync, and data will sit in that buffer much longer than it does on the network.
Les Mikesell wrote:
I would also think that, historically speaking, networks used to be noisier, and more prone to dropping things on the floor (watch out for the bitrot in the carpet, all those bits get into it, y’know…), and so it was for reliability of data.
But unless the system goes down, that data *will* get written. As I said in what I think was my previous post on this subject, I do have concerns about data security when it might be the output of a job that’s been running for days.
How many apps really expect the status of every write() to mean they have a recoverable checkpoint?
But the thing with the spinning disks is the thing that will go down. Not much reason for a network to break – at least since people stopped using thin coax.
It is a rare application that can recover (or expects to) without losing any data from a random disk write. In fact it would be a foolish application that expects that, since it isn’t guaranteed to be committed to disk locally without an fsync. Maybe things like link and rename that applications use as atomic checkpoints in the file system need it. These days wouldn’t it be better to use one of the naturally-distributed and redundant databases (riak, cassandra, mongo, etc.) for big jobs instead of nfs filesystems anyway?
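The rename-as-checkpoint idea mentioned above usually looks something like the following. This is a sketch under assumptions, not a description of any particular application: the function name `save_checkpoint` and the `.ckpt-` prefix are invented, and the directory fsync at the end assumes a POSIX system such as Linux.

```python
import os
import tempfile


def save_checkpoint(path, data):
    """Replace `path` with `data` so readers see the old or new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # contents committed before the rename
        os.replace(tmp_path, path)     # atomic swap of the name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Commit the directory entry too (POSIX-only step).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    # Hypothetical checkpoint file in a scratch directory.
    scratch = tempfile.mkdtemp()
    checkpoint = os.path.join(scratch, "state.ckpt")
    save_checkpoint(checkpoint, b"step=1\n")
    save_checkpoint(checkpoint, b"step=2\n")
    with open(checkpoint, "rb") as handle:
        print(handle.read())
```

With this pattern the application decides where its recoverable points are, and only those points pay for a synchronous commit.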
Just a few days ago I watched a facility’s switched network go basically ‘down’ due to a jabbering NIC. A power cycle of the workstation in question fixed the issue. The network was a small one, using good midrange vendor ‘C’ switches. All VLANs on all switches got flooded; the congestion was so bad that only one out of every ten pings would get a reply, from any station to any other station, except on the switches more than one switch away from the jabbering workstation.
Jabbering, of course, being a technical term….. :-)
While managed switches with a dedicated management VLAN are good, when the traffic in question overwhelms the control plane things get unmanaged really quickly. COPP isn’t available on these particular switches, unfortunately.
Sure, everything can break and most will sometime, but does this happen often enough that you’d want to slow down all of your network disk writes by an order of magnitude on the odd chance that some app really cares about a random write that it didn’t bother to fsync?
For some applications, yes, that is exactly what I would want to do. It depends upon whether performance is more or less important than reliability.
I realize that admins often have to second-guess badly designed things, but shouldn’t the application make that decision itself and fsync at the points where restarting is possible or useful? Done at the admin level, it becomes a mount-point choice, not just an application setting.