I will try to explain this as best I can. I have two computers: a Supermicro X10SAE running CentOS 6, and a very old DOS box.[*] The DOS box runs a CCD camera and sends images over Ethernet to the X10SAE. The X10SAE runs a Python server on port 5700: a socket that binds to 5700, listens, and then accepts a connection from the DOS box; nothing fancy.[**]
The DOS box connects to the server and sends images. This all works great, except:
When the DOS box exits, crashes, or is rebooted, it fails to shut down the socket properly. Under CentOS 6.5, when the DOS box came back up and attempted to reconnect, the originally accepted server-side socket would (after a couple of connection attempts from the DOS box) see a 0-length recv and close, allowing the server to accept a new connection and resume receiving images.
Under CentOS 6.6, the server never sees the 0-length recv. The DOS box flails away attempting to reconnect forever, and the server never seems to get any type of signal that the DOS box is attempting to reconnect.
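For concreteness, the close detection described above amounts to roughly the following. This is a simplified blocking sketch, not the actual asyncore/asynchat code; the names serve_once and make_listener are made up for illustration, and setting SO_REUSEADDR on the listener is an assumption of the sketch.

```python
import socket


def make_listener(port=5700, host=""):
    """Create a listening TCP socket (SO_REUSEADDR is an assumption here)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(1)
    return s


def serve_once(listener, on_data):
    """Accept one connection and read until the peer goes away.

    Returns the number of bytes received before the connection ended.
    """
    conn, _addr = listener.accept()
    total = 0
    try:
        while True:
            try:
                chunk = conn.recv(4096)
            except socket.error:
                # A reset raises instead of returning an empty string;
                # this sketch treats it the same as a close.
                break
            if not chunk:
                # 0-length recv: the peer closed the connection.
                break
            total += len(chunk)
            on_data(chunk)
    finally:
        conn.close()
    return total
```

The server would call serve_once() in a loop, so it only gets back to accept() once the old connection has been seen to end. That is the step that never happens under 6.6.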
Possibly relevant facts:
– The DOS box uses the same local port (1025) every time it tries to connect. It does not use a random ephemeral port.
– The exact same code was tested on a CentOS 6.5 box and a CentOS 6.6 box, resulting in the behavior described above. The boxes were identical clones except for the OS upgrade.
– The Python interpreter was not changed during the upgrade, because I run this code using my own 2.7.2 install. However, both glibc and the kernel were upgraded as part of the O/S upgrade.
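To imitate the DOS box's fixed source port from another machine, something like the following could be used. It is only a sketch: the DOS client is not written in Python, and the function name connect_from_fixed_port, the timeout, and the use of SO_REUSEADDR are all assumptions made for illustration.

```python
import socket


def connect_from_fixed_port(server_addr, local_port, timeout=5.0):
    """Connect to server_addr using a fixed local (source) port.

    Binding before connect() stops the OS from picking a random
    ephemeral port, so every attempt uses the same 4-tuple.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding the same local port on repeated runs (assumption).
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.settimeout(timeout)
    s.bind(("", local_port))
    s.connect(server_addr)
    return s
```

With the values from this post, the call would be connect_from_fixed_port((server_host, 5700), 1025), where server_host stands in for the X10SAE's address. Note that a normal process exit on a modern OS still closes the connection cleanly, so this reproduces only the fixed port, not the unclean shutdown.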
My only theory is that this has something to do with non-ephemeral ports and socket reuse, but I’m not sure what. It is entirely possible that some low-level socket option default has changed between 6.5 and 6.6, and I wouldn’t know it. It is also possible that I have been relying on unsupported behavior this whole time, and that the current behavior is actually correct.
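One way to test the "changed default" theory would be to dump the kernel's TCP settings and a fresh socket's option values on both boxes and diff the output. The following is a minimal sketch, assuming the usual Linux /proc/sys/net/ipv4 layout; the function names and the choice of options are illustrative, and nothing here is known to be the cause.

```python
import glob
import os
import socket


def tcp_sysctls(root="/proc/sys/net/ipv4"):
    """Return {name: value} for every tcp_* sysctl under root (Linux only)."""
    values = {}
    for path in sorted(glob.glob(os.path.join(root, "tcp_*"))):
        name = os.path.basename(path)
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except (IOError, OSError):
            # Some entries are readable only by root.
            values[name] = "<unreadable>"
    return values


def default_sockopts():
    """Return a few option values as seen on a freshly created TCP socket."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sol = socket.SOL_SOCKET
        return {
            "SO_REUSEADDR": s.getsockopt(sol, socket.SO_REUSEADDR),
            "SO_KEEPALIVE": s.getsockopt(sol, socket.SO_KEEPALIVE),
            "SO_RCVBUF": s.getsockopt(sol, socket.SO_RCVBUF),
            "SO_SNDBUF": s.getsockopt(sol, socket.SO_SNDBUF),
            "TCP_NODELAY": s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
        }
    finally:
        s.close()


if __name__ == "__main__":
    for name, value in sorted(default_sockopts().items()):
        print("%s = %s" % (name, value))
    for name, value in sorted(tcp_sysctls().items()):
        print("%s = %s" % (name, value))
```

Running this on the 6.5 and 6.6 clones and comparing the two outputs would at least show whether any visible default differs. A behavioral change inside the kernel's TCP code would not necessarily show up here.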
Does anyone have any insight they can offer?
[*] Hardware is not an issue; in fact, I have two identical systems, each of which has one X10SAE and three DOS boxes. But the problem can be boiled down to a single pair.
[**] I’m actually using an asyncore.dispatcher to do the bind/listen, and then tossing the accept()ed socket into an asynchat.async_chat. I also put a trap on socket.recv() just to be sure that I’m not swallowing the 0-length recv by accident.
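The kind of recv() trap meant here can be sketched as a thin wrapper that records the length of every read. This is an illustration only, not the code actually used; the class name RecvTrap is made up.

```python
import socket


class RecvTrap(object):
    """Wrap a connected socket and record the length of every recv()."""

    def __init__(self, sock):
        self._sock = sock
        self.lengths = []

    def recv(self, bufsize, *flags):
        data = self._sock.recv(bufsize, *flags)
        # A recorded 0 means the peer closed the connection.
        self.lengths.append(len(data))
        return data

    def __getattr__(self, name):
        # Delegate everything else (send, close, fileno, ...) to the socket.
        return getattr(self._sock, name)
```

If a 0 never appears in lengths while the DOS box is retrying, the empty read is not being swallowed higher up; it is never delivered to the application at all.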