CentOS 7: UDP Packet Checksum Verification?

January 26, 2020 hw CentOS 19 Comments

Hi,

What does CentOS 7 do with UDP packets having invalid checksums?

Are such packets inevitably dropped? Does a network card drop them when it does checksum verification in hardware even before the packets go anywhere?

In general, if someone were to send me UDP packets with invalid checksums over the internet, how far would such packets get?

In particular, how likely is it that SRTP packets sent over the internet over UDP could be damaged in such a way that the verification of the authentication tag fails when they arrive at the receiver, and how might such damage be caused?

19 thoughts on - CentOS 7: UDP Packet Checksum Verification?

Pete Biggs says:

January 26, 2020 at 8:59 am

By default I assume they are just dropped – that’s what should happen.

Applications can specifically disable checksum checking for the kernel network stack on a per application basis, but the default is to check and drop if in error.

Depends on the hardware. I suspect that most modern cards allow the OS to offload the checksum functions. You can check with, e.g.,

ethtool --show-offload eth0

As for the checksumming code, whether in the hardware or in the kernel network stack: packets should be dropped as soon as the checksum fails, because at that point it shows that the contents are flawed.
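
For illustration, here is a minimal sketch of the check that the NIC or the kernel performs on a received IPv4 UDP datagram. It follows the one's-complement checksum described in RFC 768; the addresses, ports and payload are made up, and the function names are mine rather than anything from the kernel.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"  # odd-length data is padded with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header that is covered by the UDP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))


def build_udp_segment(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Sender side: UDP header plus payload with a valid checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = (~total & 0xFFFF) or 0xFFFF  # 0 is reserved for "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """Receiver side: summing everything, checksum included, must give 0xFFFF."""
    if segment[6:8] == b"\x00\x00":
        return True  # over IPv4 the sender may choose not to compute a checksum
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF


# Made-up example datagram (documentation address ranges).
good = build_udp_segment("192.0.2.1", "198.51.100.7", 5004, 5004, b"hello rtp")
bad = good[:-1] + bytes([good[-1] ^ 0x01])  # flip one payload bit

print(udp_checksum_ok("192.0.2.1", "198.51.100.7", good))
print(udp_checksum_ok("192.0.2.1", "198.51.100.7", bad))
```

A datagram that fails this check is discarded before the application ever sees it, which is why a damaged packet would normally not reach something like libsrtp.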

Don’t know – how does any network packet get corrupted? Bad hardware, cosmic rays, bad cables, bad source? I would doubt there would be anything malicious: why do something to a packet such that it is almost guaranteed to be dropped.

P.
hw says:

January 26, 2020 at 9:34 am

Hm, that’s what I thought.

Ok, I wouldn’t expect asterisk to disable checksumming by default.

If it was so easy:

Features for bond0:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]

Features for enp5s0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]

Both physical interfaces show the same. But does this mean it’s on as in “rx-checksumming: on” or off as in “tx-checksum-ipv4: off [fixed]”?
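
One way to read that output, sketched in Python. My reading is an assumption, not something confirmed in this thread: each line is a separate feature, the tx-checksum-* lines are transmit-side variants only, and "[fixed]" means the setting cannot be changed with ethtool. Under that reading, "rx-checksumming: on" says receive checksum offload is enabled regardless of what the tx-checksum-ipv4 line shows. The sample text is taken from the output above.

```python
def parse_offload_features(text: str) -> dict:
    """Turn `ethtool --show-offload` style output into {feature: {...}}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips blank lines and the "Features for ...:" header
        state, _, flag = value.partition(" ")
        features[name] = {"enabled": state == "on", "fixed": flag == "[fixed]"}
    return features


SAMPLE = """\
Features for enp5s0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
"""

features = parse_offload_features(SAMPLE)
for name, info in features.items():
    print(name, "on" if info["enabled"] else "off",
          "(cannot be changed)" if info["fixed"] else "")
```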

Ok, I’ll assume I wouldn’t receive damaged packets.

Assuming that I do not receive packets with invalid UDP checksums, then the packets must be somehow altered and their UDP checksums recalculated to arrive here. Does bad hardware etc. do that? Why would the UDP checksums just happen to get recalculated correctly, seemingly at random and without intent?

Only when asterisk (i.e. libsrtp) finally verifies the authentication tag of an SRTP packet against the authenticated part of the packet, which, according to RFC 3711, seems to be the entire payload of the UDP packet […]
Pete Biggs says:

January 26, 2020 at 4:19 pm

First of all – disclaimer – I’m no network specialist, I just read and am interested in it. I may get things wrong!!

As far as I understand it rx-checksum is the underlying wire checksumming – and from what I’ve read about it, disabling that disables the UDP checksums.

I’m not sure I understand what you are asking. But it’s unlikely (very unlikely) that the checksums are randomly correct. But packet checksums are recalculated when packets are forwarded by layer 4 switches, since the contents of the packet are inspected as part of the switching process.
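
To put "very unlikely" into perspective: the UDP checksum is only 16 bits wide, so heavily garbled data would still pass by chance roughly once in 65,000 packets, and some specific kinds of damage always pass. The sketch below (made-up data, my own helper name) shows two such cases: swapping two aligned 16-bit words, and two changes that cancel each other out.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (no pseudo-header here)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd\x00\x01\xff\x00"

# Damage 1: the first two 16-bit words change places. Addition does not
# care about order, so the checksum cannot notice.
swapped = original[2:4] + original[0:2] + original[4:]

# Damage 2: one word is increased by 1 and another decreased by 1.
compensated = b"\x12\x35\xab\xcd\x00\x00\xff\x00"

for label, data in (("original", original), ("swapped", swapped),
                    ("compensated", compensated)):
    print(label, hex(internet_checksum(data)))
```

All three byte strings differ but produce the same checksum, which is one reason protocols such as SRTP carry their own, much stronger integrity check.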

If it’s an SRTP checksum error, then that checksum is part of the packet payload at the application level; the UDP checksum is for the whole packet. Presumably the contents of the application payload were altered after the SRTP checksum was calculated but before the UDP packet checksum. It could be a bad layer 4 switch I suppose.

Probably your best bet is to use wireshark to decode the packets to see what the raw data looks like.

P.
hw says:

January 26, 2020 at 7:45 pm

You mean layer 1 checksumming? Is there such a thing with ethernet? I think I read something about encoding, when I was trying to understand what “bandwidth” actually means, being involved in signal transmissions; and I seem to remember that there was no checksumming involved and it had to do with identifying signals as a requirement for the very possibility to transmit something before anything could be transmitted at all.

It is about VOIP calls via SRTP being interrupted at irregular intervals. The intervals appear to depend on the time of day: Such phone calls can last for a duration of about 5–25 minutes during the day to up to 1.5 hours at around 3am before being interrupted.

Asterisk says that a packet is being replayed, meaning that libsrtp has already seen and processed the packet earlier. That can happen a couple of times until asterisk reports authentication failures. The result is that the call is interrupted in that I cannot hear the opposite end while the other end sometimes can still hear me, sometimes not. The interruption can last for minutes and the audio can continue after that, though usually I either hang up the call, or the call ends by itself before the audio is back.

IIUC, authentication failures mean that libsrtp figures that the authentication tag of an SRTP packet does not match the data contained otherwise within the packet. The authentication tag is encrypted on the sender side after keys have initially been exchanged between sender and receiver, from which new keys are derived as needed. The key exchange can go over SIP (using TLS) when sdes is used, which it is in this case.

The receiver decrypts the authentication tag and verifies that the tag matches all the other data in the packet. Only when the packet has been successfully authenticated is the RTP payload of the packet decrypted.

The SRTP packet seems to be the entire payload of the UDP packet, so if the data of the SRTP packet gets damaged or were to be intentionally altered, the UDP checksum would have to be intentionally re-calculated.
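
A minimal sketch of the tag check, assuming the RFC 3711 defaults: HMAC-SHA1 with an 80-bit tag computed over the RTP header, the encrypted payload and the 32-bit rollover counter, and no MKI field. As far as I read the RFC, the tag is a keyed HMAC rather than something that is itself encrypted and decrypted. Keys and packet contents below are made up, and the function names are mine, not libsrtp's.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit tag, the RFC 3711 default


def auth_tag(auth_key: bytes, authenticated_portion: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over the authenticated portion plus the rollover counter."""
    message = authenticated_portion + struct.pack("!I", roc)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LEN]


def protect(auth_key: bytes, header_and_ciphertext: bytes, roc: int = 0) -> bytes:
    """Sender side: append the tag to the already encrypted packet."""
    return header_and_ciphertext + auth_tag(auth_key, header_and_ciphertext, roc)


def verify(auth_key: bytes, srtp_packet: bytes, roc: int = 0) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    body, tag = srtp_packet[:-TAG_LEN], srtp_packet[-TAG_LEN:]
    return hmac.compare_digest(tag, auth_tag(auth_key, body, roc))


key_a = bytes(range(20))      # made-up session authentication key
key_b = bytes(range(1, 21))   # key of some other, unrelated stream
packet = protect(key_a, b"\x80\x00\x00\x01" + b"made-up header rest and ciphertext")
damaged = packet[:5] + bytes([packet[5] ^ 0x40]) + packet[6:]

print(verify(key_a, packet))    # intact packet, right key
print(verify(key_a, damaged))   # one flipped bit
print(verify(key_b, packet))    # intact packet from a different stream
```

The last case matters for the symptoms described: a perfectly undamaged packet that belongs to another stream fails authentication just like a corrupted one, and it would arrive with a valid UDP checksum.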

Two independent installations of asterisk at physically different locations are showing the same error messages, both connecting to the same VOIP provider.

As you can imagine, this is really fun to debug …

Yes, I thought so, IIRC it’s required for routing and changing the TTL maybe.

Now that someone would intentionally alter the SRTP packets and re-calculate the checksums seems rather unlikely, all the more so since they would need to do that at two different places.

Right, or the SRTP packet has been created incorrectly by their phone system because it is overloaded at busy times, or it’s buggy.

My favorite theory is that I am sometimes suddenly receiving the wrong SRTP stream. I think it would fit the symptoms. Perhaps the VOIP provider is experiencing interesting NAT issues when their connection tracking is getting messed up at times when there are more connections than they can handle.

That defective hardware is causing the same problem at both places at the same time seems rather unlikely.

So I’ve been trying to figure out what the problem might be. After learning all this, I’m sufficiently sure that the problem is on their side.

Hm, I tried that and wireshark doesn’t seem to like SRTP packets very much. Apparently it doesn’t have a way to decrypt SRTP packets at all, even if I could get the initial keys. Maybe someone who is much more proficient with wireshark could find something. To me, it has been useless so far.

If wireshark could do stuff with SRTP packets, what could it possibly show other than that some packets either carry a damaged payload, or that the encryption keys don’t fit, which is something I already know? If the problem was with asterisk or libsrtp, the problem would be much more common.
Nataraj says:

January 28, 2020 at 2:00 am

My sense is you may be starting at too low a level in trying to debug this. I have seen the same kind of problems with my voip service when there is a problem with my Internet connection. When this happens I also see high retransmission rates for tcp connections and other signs of network problems. If I check the modem for my Internet connection there are issues with the signal levels and high error rates reported by the modem. If you believe your Internet connection is reliable, then if you run managed switches, check your switch logs for any reported errors.

You could try tools like iperf to check for problems on your internal network. You could run some of the basic tools for testing voip performance of your Internet connection and if necessary run iperf to a cloud hosted system.

I think it is highly unlikely that you are only having issues with srtp packets and I would look at the broader picture first to try to isolate some other problem in your network or Internet connection.

Nataraj
Stephen John says:

January 28, 2020 at 6:51 am

UDP (User Datagram Protocol) gets nicknamed the Unreliable Datagram Protocol for a reason. Packets can be dropped at all kinds of places between the two users, depending on how busy the routers/firewalls between them are. Packets can get out of order or a dozen other things, and it is then up to the application layer to put things back in ‘order’. For voice, that usually means a drop or other ugliness because it is assumed that if the quality is too bad, the people would just call each other again. For the most part this works pretty well but all it takes is a firewall to get busy on something else and you have a bunch of UDP packets out of order and people’s calls dropping.
hw says:

January 28, 2020 at 2:39 pm

One of the reasons I have to look into it is that it is usually good to know more/better.

How do you monitor such retransmissions to be able to see if and when they occur?

Can you suggest useful tools to analyze VOIP performance, and how do you define VOIP performance?

The performance is kinda acceptable as long as the calls are not interrupted. It’s still worlds apart from what it used to be 25 years ago, before VOIP was used. Back then, you never had to worry that calls could be interrupted or that you couldn’t hear someone or that you couldn’t have a conversation because the latency makes it impossible. You could just talk to someone on a phone, like it should be. Nowadays, we get to pay 10 times as much and more, plus all the expensive hardware, and it still doesn’t work right and doesn’t even come close.

See it this way: It is highly likely that I don’t have any issues with SRTP at all. Calls over the LAN work fine. The only issue is with the VOIP provider. What I have learned about SRTP so far tells me that, like everything else does.

How would you explain that the same problem occurs at two entirely unrelated physical locations each having their own asterisk installations, using entirely different hardware and entirely different internet connections from entirely different ISPs, with the only thing in common being the VOIP provider?

If it was only my internet connection which is affected, I’d be talking to my ISP (probably useless) instead of the VOIP provider (who will probably do something about it).
hw says:

January 28, 2020 at 2:56 pm

How would packets being dropped explain the replay errors and authentication failures?

libsrtp seems to have provisions to deal with packets arriving out of order.
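
For reference, the kind of replay protection RFC 3711 describes is a sliding window over packet indices (at least 64 packets wide). The sketch below is my own simplified model, not libsrtp's code; a real implementation only records a packet as seen after its authentication tag has been verified.

```python
class ReplayWindow:
    """Sliding-window replay check over packet indices (simplified model)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = -1  # highest index accepted so far
        self.seen = 0      # bit i set means index (highest - i) was received

    def check_and_update(self, index: int) -> bool:
        """Return True if the packet is new, False if it is replayed or too old."""
        if index > self.highest:
            shift = index - self.highest
            self.seen = ((self.seen << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # older than the window: rejected
        if self.seen & (1 << offset):
            return False  # already received once: replay
        self.seen |= 1 << offset
        return True


window = ReplayWindow()
for idx in (100, 102, 101, 101, 30):
    print(idx, window.check_and_update(idx))
```

Packet 101 arriving after 102 is accepted, so plain reordering is tolerated; only a second copy of an index (or a very late packet) is reported as a replay. Dropped packets alone would therefore not trigger replay errors.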

That’s a funny idea. Phone calls just worked fine and were good quality 25 years ago, and mostly long before that. I have never expected to have to call anyone back because of poor quality in over 40 years, and I’m not going to start to expect that now.

It’s unacceptable, and it’s not feasible, either. For example, try to call paypal to solve some issue with your account. It can take an hour before they call you back because everyone is busy. Finally you talk to someone and just after you explained the problem, the call is interrupted. Good luck calling the same person back. You won’t get anywhere because your next try will only result in another interrupted call.

VOIP calls are worlds away from what phone calls used to be. Dropping calls has never been an option and is not an option now.
Stephen John says:

January 28, 2020 at 5:39 pm

I got that from watching various training videos from the 1940’s to the 1970’s on phone switching systems… and also the basic design of how Erlang is programmed and deals with errors. It could be wrong, erroneous or crap. However, talking to phone people over the years, that was how they described things. A lot of them would say that a phone call could die a billion different ways and it was a miracle it didn’t happen to everyone every day. It just happened to a couple of people a day in different places because everything was coded for redundancy and the expectation that it could get bad. That redundancy and over-engineering seems to have allowed for the ‘worst case they will call back’ to be a viable option.

The problem is that if that was real or is still the case… unless your VOIP solution has as much redundancy.. failure is going to happen a lot more and in ways that lead to the general experience of the last 8 VOIP solutions I have been stuck with… dropped calls to Paypal as you said or sounding like a Dalek if the latency or such just got a little bad.
Johnny Hughes says:

January 29, 2020 at 1:54 am

Just wait another 10 or 20 years and everybody will tell you it’s normal and nothing to worry about. They won’t believe you if you tell them there was a time long ago when telephony just worked.

I remember in around 1999, when a lot of companies started to hear about VoIP and wanted to implement it to save money and welcome the future, I had lots of discussions about it in the company I was working for back then. Those who knew a bit more from the technology side said this can be done in a company but not widely as a replacement for (public) telephony infrastructure. Now that whole countries went all IP, just listen to police and emergency services what they think about it: only now they start to realize that having telephony which just works is a thing of the past!

But hey, don’t worry, they will fix it with “the Cloud” :-)

Telephony is like operating systems these days: a lot of things improve but not everything…

Regards, Simon
Nataraj says:

January 29, 2020 at 3:11 am

netstat -s | grep -i retrans

Well, there used to be a number of speedtest-like sites that reported latency, jitter and packet loss more accurately. It seems most of them have now scaled down their output, but you could use ping. The mean deviation it reports is basically jitter.
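
As a rough sketch of what those numbers mean, the following computes loss, average and deviation from a list of round-trip times. I am assuming that the deviation ping prints as "mdev" is the spread of the round-trip times around their average, computed as below; the sample values are made up.

```python
import math


def summarize_ping(rtts_ms, sent):
    """Summarize round-trip times (ms) of the replies to `sent` probes."""
    received = len(rtts_ms)
    loss_pct = 100.0 * (sent - received) / sent
    if received == 0:
        return {"loss_pct": loss_pct, "avg_ms": None, "mdev_ms": None}
    avg = sum(rtts_ms) / received
    mean_sq = sum(r * r for r in rtts_ms) / received
    # Spread of the samples around the average; clamp tiny negative
    # values that floating-point rounding can produce.
    mdev = math.sqrt(max(0.0, mean_sq - avg * avg))
    return {"loss_pct": loss_pct, "avg_ms": avg, "mdev_ms": mdev}


# Five probes sent, four replies received (made-up values).
print(summarize_ping([20.0, 22.0, 20.0, 22.0], sent=5))
```

For voice, a low and steady deviation matters more than a low average: a constant 200 ms is easier to live with than a round trip that swings between 20 ms and 200 ms.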

I think a few of the tests listed on this site, still work.

https://getvoip.com/blog/2014/05/12/20-best-voip-speed-test-tools/

There used to be sites that did a calculation for something called MOS score, which is a measure of expected voice quality based on the performance of a connection. Don’t know if anyone does that anymore. In the VOIP industry there is fancy/expensive equipment for measuring end to end performance, but in practice simple ping output with regular sampling from something like a cron job can tell you a lot.
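
For the curious, such calculators usually work in two steps: they estimate a transmission rating R (0 to 100) from delay, jitter, loss and codec, and then map R to a MOS value. The sketch below only covers the second step, using the conversion formula from the ITU-T G.107 E-model; how R is estimated differs from tool to tool and is not shown.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (1.0 to 4.5)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


for r in (50, 70, 80, 93.2):
    print(r, round(r_to_mos(r), 2))
```

An R of 93.2 is the E-model's default for an undisturbed narrowband connection; every impairment lowers R, and the MOS estimate follows.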

Basically, what you want is that if your phone system relies on your Internet connection, the pop that you’re connecting to needs to be relatively close and have minimal packet loss and similar latency/jitter characteristics on both the up/down stream. Generally that is not too hard to find these days, but if the Internet connectivity to your voip pop takes a route half way across the country over the Internet, that’s not it.

I have one of the lowest cost voip providers, voip.ms, and I find the voip quality to be excellent and call drop rate to be low except when I have problems with my Internet provider.

Well, if you’re relying on the Internet, you are essentially relying on the availability of burst bandwidth. If your application needs higher reliability then I would be looking at purchasing some kind of committed bandwidth to a voip provider. I don’t follow the voip industry much lately, but I’m guessing that people still provision dedicated bandwidth into a voip provider with a nationwide backbone.

Well it sounds like you know where your problem is then. If your current provider can’t solve the problems to your satisfaction then you probably need to find a different provider.
hw says:

January 29, 2020 at 6:48 am

Cool, that gives a lot of information. Retransmissions are at ~0.012/~0.029 percent on the server/workstation, and the UDP statistics look good.
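
If you want to track these figures over time rather than read them off by hand, the same counters can be read from /proc/net/snmp on Linux. The sketch below parses that file's "header line, value line" layout; the sample text is made up and shortened, and which UDP counters exist (for example InCsumErrors for datagrams dropped due to a bad checksum) depends on the kernel.

```python
def parse_proc_net_snmp(text: str) -> dict:
    """Parse /proc/net/snmp content into {protocol: {counter: value}}."""
    stats = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Each protocol has one line of counter names followed by one line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


def tcp_retrans_percent(stats: dict) -> float:
    """Retransmitted TCP segments as a percentage of all segments sent."""
    tcp = stats["Tcp"]
    return 100 * tcp["RetransSegs"] / tcp["OutSegs"]


# Made-up, shortened sample; the real file has more protocols and counters.
SAMPLE = """\
Tcp: RtoAlgorithm OutSegs RetransSegs
Tcp: 1 200000 24
Udp: InDatagrams NoPorts InErrors OutDatagrams InCsumErrors
Udp: 5000 3 0 4800 0
"""

stats = parse_proc_net_snmp(SAMPLE)
print(tcp_retrans_percent(stats))
print(stats["Udp"])
```

On a real system you would pass in the result of open("/proc/net/snmp").read() and sample it at intervals, since the counters only ever increase.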

Most seem to be tests for bandwidth, and none of the VOIP related sites work. Besides, ping times to the US are usually around 200ms, so if there were any results to be obtained, they might be questionable.

VOIP performance comes down to what you might call the human experience. Tools only trying to measure how well data is being transferred aren’t going to measure that. The human experience is also subjective and not easy to measure.

For example, the network can be perfectly fine, and yet the VOIP performance is total shit when you use a cell phone because their audio output sucks no matter what. What can you expect from a tiny speaker built into a cell phone?

In this country, you can be glad if you can get an internet connection at all, no matter how bad it is. Even some businesses can’t get a connection at all, except maybe mobile connections which are so expensive and come with such minimal data volumes that they are useless.

In this country, you can be glad if you can find a VOIP provider at all.

In this country, there is no other way to do phone calls anymore than over the internet. Why burst? The VOIP audio streams are continuous.

See above, you can be glad if you can get an internet connection at all.

Well, I don’t know, I can only be like 99% sure that the problem is with the VOIP provider. Changing the VOIP provider would be very difficult because there aren’t many left to begin with, and even fewer allow encrypted connections. And try to find one that has a useful support … I might end up with not having a phone anymore, and that would make things extremely difficult.
hw says:

January 29, 2020 at 7:16 am

Maybe it took a lot of effort to keep things working, I can’t tell. But I can tell that for over 40 years, there was one single interruption of the phone line when a major line was damaged due to construction work. Calls weren’t interrupted, either.

That changed with the introduction of mobile phones and got even worse with VOIP. It only means that providers need to figure their stuff out. It doesn’t mean that less quality or less reliability would be acceptable –
hw says:

January 29, 2020 at 7:36 am

All things will be a lot worse in 10 or 20 years.

It’s still something that just needs to work. And 1999? Maybe it’s because this country has kinda detached itself from technology and remains behind further and further, currently about 30 years, but in 1999 nobody had heard of VOIP. That started in 2018, and people expect phones to just work. If anything, it should get better, not worse.

It only means that we can’t do anything anymore.

What has actually improved? In 1999, I could make a phone call whenever I wanted to, and it worked just fine. In 2020, I can’t make a phone call at all because it will be interrupted before I’d get anywhere with it and it’s just embarrassing.

What am I supposed to do? Travel to paypal to talk to someone in person? At least we can still travel, and that is about to change, too.
Stephen John says:

January 29, 2020 at 8:44 am

40 years ago if you were in North America or Europe you were relying on large infrastructure laid out by ‘Cold War’ needs that if a war started the phones from site A to site B would work no matter what. That meant there were all kinds of redundancy in the system.. enough that pretty much every Phone company whether national or private were valued by the amount of copper they had mostly from all this redundancy. When that was no longer a driving factor and various governments were no longer enforcing ‘war’ regulations on how the phones MUST work.. you saw a lot of lines removed and the copper sold as cash from both private and public phone companies. This was part of the ‘peace’ dividend that many countries saw in the 1990’s where various other infrastructure could be upgraded because it wasn’t being held in reserve in case of war. The improvements in mobile phone networks increased this because phone quality was limited to the codecs used in that and you didn’t need to string as much copper and could even move to lower quality copper. Optical fibre also got cheaper which allowed for improvements and more dumping of old copper lines. The problem is that phone companies over-dumped and the prices of copper have gone up enough that they can’t adequately replace what they need to do now. So they are pushing for more VOIP like solutions to keep their costs down.. but

These things go in cycles.. and usually you have to go through an ‘everything has gone to hell’ before people wake up and realize they needed to invest a certain amount in the infrastructure constantly. So hopefully it will get better someday…
Nataraj says:

January 29, 2020 at 11:53 am

By burst, I mean that you don’t have a bandwidth commitment with an SLA from your provider. A bandwidth commitment means that you are paying a provider to guarantee you so many MB or GB of bandwidth and this is guaranteed to you. This means it is allocated to you in their network allotments and you can use it at any time.

I can’t really speak for the situation in your country. One more thing comes to mind. I don’t remember if anyone has mentioned that the 1 way voice problem can be caused by an issue with the stateful packet filter in your firewall, i.e. your firewall has become confused and thinks the UDP connection (well, not really a connection) is no longer active, so it blocks the packets, creating the one way voice scenario. Most phone switch software and VOIP phones have things that can be configured to send extra packets to fool the stateful packet filter into allowing necessary packets to flow. I’ve never set this up in asterisk, but I suggest you look into it.
hw says:

January 29, 2020 at 5:26 pm

[…]

Isn’t that called more like “guaranteed bandwidth” than “burst”?

[…]

How does a firewall allow the desirable SRTP packets to traverse it in the first place?

How would the packets being blocked explain asterisk showing replay errors and authentication failures? Packets that aren’t there can hardly cause such errors.

BTW, the VOIP provider is fixing or has fixed the problem now. It turned out that they need or needed to update the firmware of some network adapters because the old firmware has been causing issues. A test call showed no errors on both sides for over 45 minutes.
Nataraj says:

January 29, 2020 at 7:01 pm

burstable bandwidth is the opposite of guaranteed bandwidth.

My firewall is CentOS running iptables, so you would use something like

iptables -A INPUT -p udp -m state [OTHER MATCH OPTIONS] --state ESTABLISHED -j ACCEPT

You would similarly code an OUTPUT rule. You obviously need to permit whatever packets/ports your voice application requires, i.e. SIPS, SRTP, etc. I generally limit my voip packets to the IP addresses of any pops that I connect to. There are hackers out there that will connect to your phone switch if you allow voip packets from any source.

Most commercial firewalls have options to enable VOIP services.

I don’t know. Maybe the 1 way voice problem is different than the replay errors. I’m just throwing out ideas, you’ll have to determine if they apply to your situation or not.
hw says:

January 30, 2020 at 2:26 pm

How does creating a burst help when you don’t have enough bandwidth to begin with? You can burst all you want and the packets will be dropped when there is not enough bandwidth to transmit them.

In theory, you could fill packet buffers along the line. In practise, chances are that the buffers are already full because you don’t have enough bandwidth, and even if they aren’t, creating a burst is not exactly useful because people on phones aren’t going to talk extremely fast every time when a burst could be transmitted and then talk extremely slowly until the burst was transmitted and then fast again, and so on. People also do not want their conversations interrupted by having to wait for packet buffers to eventually become empty, and VOIP related packets are usually not the only ones underway.

Well, yes, I used firewall-cmd when I needed to open ports on the server for SRTP/RTP so that local phones will work — which is a bad solution and makes me wonder if there isn’t a better one.

But there are no ports open in the firewall on the router and it just works, even through NAT. I’ve always been wondering how that works.
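
A likely explanation, sketched as a toy model: a stateful firewall or NAT router remembers each flow that was started from the inside and lets replies to it back in for a while. Because the phone system sends packets to the provider first, the provider's packets count as replies. The class below is a simplified model of that bookkeeping, not how netfilter is implemented; the 30-second timeout and all addresses are illustrative assumptions, and address rewriting by NAT is left out.

```python
class StatefulUdpFilter:
    """Toy model of connection tracking for UDP on a firewall or NAT router."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.flows = {}  # (inside endpoint, outside endpoint) -> last activity

    def outbound(self, inside, outside, now: float) -> bool:
        """Traffic from the inside is always allowed and (re)creates state."""
        self.flows[(inside, outside)] = now
        return True

    def inbound(self, outside, inside, now: float) -> bool:
        """Traffic from the outside is only allowed as a reply to a live flow."""
        last = self.flows.get((inside, outside))
        if last is None or now - last > self.timeout_s:
            self.flows.pop((inside, outside), None)
            return False
        self.flows[(inside, outside)] = now
        return True


fw = StatefulUdpFilter()
pbx = ("192.168.1.10", 10000)    # made-up inside endpoint
provider = ("203.0.113.5", 20000)  # made-up outside endpoint

print(fw.inbound(provider, pbx, now=0.0))    # nothing sent yet: blocked
fw.outbound(pbx, provider, now=1.0)
print(fw.inbound(provider, pbx, now=2.0))    # reply to a live flow: allowed
print(fw.inbound(provider, pbx, now=100.0))  # state has expired: blocked
```

The last line is the one-way audio scenario mentioned earlier in the thread: if the state entry expires in the middle of a call, incoming audio is blocked until something is sent from the inside again, which is what keepalive packets are for.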

You mean in a firewall? What if the address changes?

Well, I don’t want to open any ports to the outside for VOIP just like that. If I ever need to do that, I might have to look at OpenSIPS maybe …

Yes, it is entirely different.
