CentOS 7 As A Guest VM

I am seeing an issue where my process does not wake from a select()
call when a single character arrives on an input file descriptor. This happens when running as a VMware guest.

Anyone ever experienced this ?

I can run tshark and see the character arrive, but my process does not wake up and see that character. Most times it works – but once in a while it does not.

So I changed my code: instead of only waiting on select(), it now tries to read the buffer all the time and prints the results. Once in a while that character is “delayed” getting to my input buffer. top reports the machine is 99% idle.

Any thoughts?

Jerry

8 thoughts on - CentOS 7 As A Guest VM

  • You don’t say what the app is written in, but I ran into this with Perl.
    Perl apps can be either line buffered or character buffered ($|, if I
    remember right, is the switch). Line buffered means the buffer is not delivered until a newline character is received. If nothing else, try
    “\n” and see if that gets consistently delivered.

    Cheers, Dave


    “They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.”

    — Benjamin Franklin

  • That is the funny thing also – that character I’m waiting on is a CR. My program is written in C.

    vmtoolsd is running.

    Jerry

  • You imply but don’t say that this doesn’t happen when the app is running on bare metal. Is that the case?

    No, and I’ve been writing sockets-type code since the days when it wasn’t clear whether BSD sockets would win out over AT&T TLI/XTI/STREAMS.

    That’s probably the Nagle algorithm:

    https://en.wikipedia.org/wiki/Nagle%27s_algorithm

    It’s intentional. You almost never want to disable it.

    I think that’s controlled by the kernel’s terminal driver code, not by Perl. Perl is just giving you an alternate configuration to the underlying termios interface (tcsetattr() or whatever call controls this).

    Anyway, you have to go out of your way to get line-buffered sockets on Linux. One way is to bind a socket to a pty, as SSH does, bringing the terminal I/O code into it again, but I doubt Jerry’s doing that.

    I’d bet Jerry’s app is just making assumptions about the way TCP works that just aren’t true.

    Jerry, please show your sockets setup code and the skeletonized read loop. I’m talking socket(), bind(), setsockopt(), etc. I want to see every sockets call. Your app logic you’re free to keep hidden away.

  • Seems like it’s the single-byte thing… I tried adding:

    int flag = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0)

    but it did not have any effect. I also did

    echo 1 > /proc/sys/net/ipv4/tcp_low_latency

    which seems to have no effect either.

    Jerry

  • So first, I said “don’t do that,” and then you went and did that. :)

    But second, I’m guessing you did this on the receiving side, where it has no effect under any conditions. Nagle’s algorithm is about delaying the first packet on the sender’s side in anticipation of shortly receiving more data that can go in the same packet.

    Additionally, Nagle’s algorithm only applies when there’s unacknowledged data, not to the first packet out on a new connection.

    Now that we’ve dispensed with Nagle, let’s get down to the actual issue.
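
For completeness, the sender-side placement described above can be sketched in C as follows. This is only an illustration, not code from the thread: the helper names enable_nodelay() and nodelay_enabled() are hypothetical, and fd is assumed to be a TCP socket owned by the process that writes the data. Reading the option back with getsockopt() is one way to confirm the call took effect on the socket you meant.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle's algorithm on a *sending* TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_nodelay(int fd)
{
    int flag = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: read the option back from the socket.
 * Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int nodelay_enabled(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) < 0)
        return -1;
    return flag != 0;
}
```

Setting the option on the receiving socket only changes how that side sends its own data, which is why it would not be expected to change the symptom here.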

  • Warren,

    Correct. I was trying to find something… Agreed that is on the sending side – I am on the receiving side. Are there other reasons this single-byte CR over the socket would not be seen by my application? tshark shows it’s been received. I have tried skipping my select() call and calling recv() directly in nonblocking mode, and the byte is still not seen. Very odd.

    Jerry

  • Sure, but without the code, you’re reducing me to blind speculation. I’m offering free debugging services here.

    You also haven’t answered the question of whether the VM qualifier in the subject line actually affects the symptom.

    If the problem only occurs under VMware, are we talking about ESXi or the Workstation/Fusion flavor? (Type 1 vs Type 2 hypervisor?)

    And if it also occurs when you run the receiver on C7 on bare metal, then you’ve got a generic sockets problem, not a VM problem.