UDP De-fragmentation Problem


Hi all.

I have a strange problem at hand regarding UDP fragmentation on CentOS7:
Applications are unable to receive UDP packets which have undergone fragmentation UNLESS the netfilter modules are loaded.

The problem arose with an application that runs fine on openSUSE but does not work on CentOS 7. The application processes UDP data, and on CentOS only small packets are received and processed: those below the fragmentation threshold of about 1500 bytes (the usual Ethernet MTU). UDP packets that have been fragmented never reach the application.
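
To put numbers on the "about 1500 bytes" limit, here is a rough sketch of the IPv4 fragmentation arithmetic. It assumes a 1500-byte MTU, a 20-byte IPv4 header without options and the 8-byte UDP header; the function names are made up for illustration.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that still fits into a single IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fragment_count(payload_len: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    datagram_len = UDP_HEADER + payload_len  # the UDP header travels in the first fragment
    max_data = mtu - IPV4_HEADER             # data bytes the last (or only) packet can hold
    if datagram_len <= max_data:
        return 1
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = max_data // 8 * 8
    return 1 + math.ceil((datagram_len - max_data) / per_fragment)


print(max_unfragmented_payload())  # 1472
print(fragment_count(1472))        # 1
print(fragment_count(10000))       # 7
```

Under these assumptions any payload above 1472 bytes is fragmented, and the 10000-byte test packet mentioned further down travels as 7 fragments that the kernel has to reassemble before the application sees anything.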

The application in question uses Qt, which opens the UDP socket in non-blocking mode – apparently that’s an issue because reading from the socket in blocking mode does not cause the problem.

By chance I found that once the netfilter kernel modules (nf_nat, iptable_nat, nf_nat …) are loaded, the problem disappears and UDP packets of all sizes are correctly delivered and processed.

NOTES:
– I’m not using netfilter. My iptables are empty, firewalld is not running.

– Other networking applications, at least those using TCP, work fine: web browsing, ssh, NFS etc. Even DNS works.

– Does not happen on openSUSE, regardless of whether the netfilter modules are loaded or not.

– Does not happen on openSUSE on the same machine. Does happen on different machines running CentOS 7. So it’s not hardware dependent.

– There is AFAIK nothing special about my CentOS7 installation. Out of the box install, simple network config, latest updates applied.

– Rebuilding the application on CentOS7 with CentOS supplied gcc, libs etc does not make the problem go away.

– I have broken the application down to a small Qt test program which opens a UDP socket, binds and waits on it

This is strace output showing the problem: a 10000-byte UDP packet is sent to the application and triggers select(), then recvfrom(7…) fails with EAGAIN.
[…]
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 7
fcntl(7, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
bind(7, {sa_family
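
For anyone who wants to try the same sequence without Qt, the following is a minimal sketch that mirrors the syscalls visible in the trace (non-blocking UDP socket, SO_BROADCAST, bind, select, recvfrom). It is not the original test program. The bind address, buffer size, timeout and the name `receive_one` are assumptions, and because it sends over loopback (whose MTU is normally far larger than 1500) the datagram is not actually fragmented; to exercise reassembly, the sender would have to be on another host across an Ethernet link.

```python
import select
import socket


def receive_one(payload_len: int = 10000, timeout: float = 2.0) -> int:
    """Send one datagram to a non-blocking UDP socket and read it after select()."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setblocking(False)  # same effect as fcntl(fd, F_SETFL, O_RDWR|O_NONBLOCK)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    rx.bind(("127.0.0.1", 0))  # assumption: loopback, port chosen by the kernel

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tx.sendto(b"x" * payload_len, rx.getsockname())
        readable, _, _ = select.select([rx], [], [], timeout)
        if not readable:
            raise TimeoutError("select() never reported the socket as readable")
        # A BlockingIOError (EAGAIN) raised here would correspond to the
        # failure described in the report.
        data, _addr = rx.recvfrom(65535)
        return len(data)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(receive_one())
```

On a healthy system this prints the payload length. On an affected system, with the sender moved to a remote machine, the expectation from the report is that recvfrom raises BlockingIOError instead.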

7 thoughts on - UDP De-fragmentation Problem

  • I’m not sure you need to look much further than that. Using select()
    and non-blocking sockets together doesn’t make a whole lot of sense.
    The man page for select() says that descriptors listed in readfds will be watched “precisely, to see if a read will not block.”

    So, if the socket returned by select() can be read without blocking, you don’t need to put the socket in non-blocking mode.

    It’s hard to say more since your strace output was cut. I’d expect you to get another return from select() when the rest of the data arrives, and for recvfrom() to work at that point. I can’t tell if that’s happening or not. If it’s not, then you probably have hit a bug. The documented behavior for message-based sockets is to read the entire message in one operation, and to discard data on recvfrom() if it’s too big for the buffer. Maybe you’re creating a condition where the system is discarding data if it can’t be read in one operation.

    Regardless, in the select()/recvfrom() pattern you described, the socket should be in blocking mode.

  • Hi,

    On 07-04-2016 12:19, Volker wrote:

    Which kernel are you using?
    And as you have trimmed it down to a reproducer, can you share it please?

    Marcelo

    On 08-04-2016 14:38, Gordon Messmer wrote:

    Yes it does. You can mix both. Why not? For example, select() only reports “data ready” but does not tell you how much is in there. With a non-blocking socket he can keep reading until the data is drained, without calling select() on every single iteration. If the traffic is bursty, this may save some syscalls.

    No, because that would imply the application has to do the defragmentation, which is impossible since it doesn’t have the necessary information. The kernel must hold the fragments until the datagram is reassembled, and only then deliver it to the application.
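
The "select once, then drain" pattern described in this comment could look roughly like the sketch below. The helper names `drain` and `demo_drain`, the loopback address and the buffer size are illustrative assumptions, not code from the thread.

```python
import select
import socket


def drain(sock: socket.socket, bufsize: int = 65535) -> list:
    """Read datagrams from a non-blocking socket until the queue is empty."""
    datagrams = []
    while True:
        try:
            data, _addr = sock.recvfrom(bufsize)
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK: nothing left to read
            return datagrams
        datagrams.append(data)


def demo_drain(count: int = 3, timeout: float = 2.0) -> list:
    """Send a small burst over loopback and collect it with as few select() calls as needed."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setblocking(False)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            tx.sendto(f"msg{i}".encode(), rx.getsockname())
        received = []
        while len(received) < count:
            readable, _, _ = select.select([rx], [], [], timeout)
            if not readable:
                break  # timed out; return whatever arrived
            received.extend(drain(rx))
        return received
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(demo_drain())
```

Each recvfrom() returns one whole datagram or fails with EAGAIN. The application never sees individual fragments, which is the point made in the second paragraph of the comment.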

    On 07-04-2016 12:19, Volker wrote:

    ^— I think that’s why

    RHEL7 still doesn’t have this patch:
    https://patchwork.ozlabs.org/patch/561746/

    As a test/workaround until that patch is released, please make that buffer as big as your payload, even if you’re going to use only one byte. It will copy more, but since it’s MSG_PEEK it isn’t discarding or removing anything from the queue anyway. It will probably work then.

    Marcelo
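
As a rough illustration of the suggested workaround, the sketch below peeks with a buffer large enough for any UDP payload and then reads the datagram for real. It assumes the peek is issued by your own code rather than by a library; `peek_then_read` and `demo_peek` are hypothetical names, and the demo runs over loopback, so it shows only the call pattern and does not reproduce the bug.

```python
import select
import socket

MAX_UDP_PAYLOAD = 65507  # 65535 - 20 (IPv4 header) - 8 (UDP header)


def peek_then_read(sock: socket.socket):
    """Peek at the next datagram with a full-size buffer, then consume it."""
    # MSG_PEEK copies the data but leaves the datagram in the receive queue.
    peeked = sock.recv(MAX_UDP_PAYLOAD, socket.MSG_PEEK)
    # The normal read removes the same datagram from the queue.
    data, _addr = sock.recvfrom(MAX_UDP_PAYLOAD)
    return peeked, data


def demo_peek(payload_len: int = 10000, timeout: float = 2.0):
    """Loopback demo: one datagram, peeked and then read."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setblocking(False)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        tx.sendto(b"x" * payload_len, rx.getsockname())
        readable, _, _ = select.select([rx], [], [], timeout)
        if not readable:
            raise TimeoutError("no datagram arrived")
        return peek_then_read(rx)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    peeked, data = demo_peek()
    print(len(peeked), len(data))
```

The peek costs an extra copy of the payload, which is the trade-off the comment mentions.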

    On 10-04-2016 14:25, Volker wrote:

    Okay, you can’t easily make the change I mentioned, because the call is made by Qt. When testing with this app, nstat shows the following after 2 attempts:
    UdpInErrors 2 0.0
    UdpInCsumErrors 2 0.0

    I tried removing the probe for more data (the while condition), but it still peeked at the socket. I think it’s because of this Qt code:

    qint64 QNativeSocketEngine::readDatagram(char *data, qint64 maxSize,
                                             QHostAddress *address, quint16 *port)
    {
        Q_D(QNativeSocketEngine);
        Q_CHECK_VALID_SOCKETLAYER(QNativeSocketEngine::readDatagram(), -1);
        Q_CHECK_TYPE(QNativeSocketEngine::readDatagram(), QAbstractSocket::UdpSocket, false); <--