CentOS 6.5 Input Lag

October 9, 2014 Matt Garman CentOS 6 Comments

I have a CentOS 6.5 x86_64 system that’s been running problem-free for quite a while.

Recently, it has locked up hard several times. It’s a headless server, but I do have IP KVM. However, when it’s locked up, all I can see are a few lines of kernel stack trace. There are no hints to the problem in the system logs. I even enabled remote logging of syslog, hoping to catch the errors that way. No luck.

I ran memtest86+ for about 36 hours, no problems.

I’ve tried to strip away just about all running services. It’s just a home file server. I haven’t had a crash in a while, but I also haven’t had it running very long.

But even while it’s up, I have severe input lag in the shell. I’ll type a few characters, and two to 10 or so seconds pass before anything echoes to the screen.

I’ve checked top, practically zero CPU load.

It’s not swapping – 16 GB of RAM, 0 swap used. The most memory-heavy process is java (for CrashPlan backups).

iostat shows 0% disk utilization.

Anyone seen anything like this? Where else can I check to try to determine the source of this lag (which I suspect might be related to the recent crashes)?

Thanks, Matt

6 thoughts on - CentOS 6.5 Input Lag

Joseph L. says:

October 9, 2014 at 11:20 pm

Is it under some type of ddos attack?

What’s running on this machine? In front of it?

Matt Garman says:

October 10, 2014 at 6:48 am

A DDOS attack seems unlikely, though I suppose it’s possible. Sitting between the lagging machine and the Internet is a pfSense box. All the other machines in the house have no issues, and they all route through the pfSense system.

Right now, the only stuff running on it:

– CrashPlan (java backup application)
– Munin
– Apache (only for Munin, no external access [i.e. no port forwarding from pfSense])
– mpd (music player daemon)

Thanks, Matt

Joseph L. says:

October 10, 2014 at 4:11 pm

If this is a server – is it possible your raid card battery died?

We have seen issues where the BBWC fails and the box crawls.

The only other thing on the hardware side that comes to mind is actual bad sectors if this is not a raided virtual drive.

From the OS side can you keep the box up long enough to do a yum update?

thanks

Matt Garman says:

October 10, 2014 at 4:18 pm

It is a server, but a home file server. The raid card has no battery backup, and in fact has been flashed to pure HBA mode. Actual RAID’ing is done at the software level.

The system has eight total drives: two SSDs in RAID-1 for the OS, five 3.5-inch spinning drives in RAID-6, and a single 3.5-inch drive normally used for mythtv recordings (though mythtv has been stopped for a long time now to try to debug the issue).

Yes, I updated everything except packages beginning with “l” (“el” / lowercase ‘L’), because those generate a number of conflicts that I haven’t had time to resolve.

Matt Garman says:

October 14, 2014 at 10:13 am

Update on this problem:

From another system, I initiated a constant ping on my laggy server. I noticed that every 10–20 seconds, one or more ICMP packets would drop. These drops were consistent with the input lag I was experiencing.
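For anyone who wants to repeat this check with something more precise than watching the terminal, here is a minimal sketch that scans saved `ping` output for gaps in the `icmp_seq` numbers. The function name, the sample address (from the 192.0.2.0/24 documentation range), and the sample output are illustrative assumptions, not data from the server discussed here. It only finds gaps between the first and last reply it sees and does not handle sequence-number wraparound.

```python
import re

# Linux ping prints "icmp_seq=N"; some older iputils builds print "icmp_req=N".
SEQ_RE = re.compile(r"icmp_[rs]eq=(\d+)")


def missing_sequences(ping_output):
    """Return the sequence numbers that never got a reply (hypothetical helper)."""
    seen = sorted(int(m.group(1)) for m in SEQ_RE.finditer(ping_output))
    if not seen:
        return []
    received = set(seen)
    # Any number between the first and last reply that is absent was dropped.
    return [n for n in range(seen[0], seen[-1] + 1) if n not in received]


# Illustrative sample only: replies 3 and 4 are absent.
sample = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.21 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=0.19 ms
64 bytes from 192.0.2.10: icmp_seq=5 ttl=64 time=0.22 ms
64 bytes from 192.0.2.10: icmp_seq=6 ttl=64 time=0.20 ms
"""

print(missing_sequences(sample))
```

Feeding it the captured output of a long-running ping makes it easier to see whether the drops recur at a regular interval.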

I did a web search for “linux periodically hangs” and found this Serverfault post that had a lot in common with my symptoms:

http://serverfault.com/questions/371666/linux-bonded-interfaces-hanging-periodically

I in fact have bonded interfaces on the laggy server. When I checked the bonding config, I realized that a while ago I had changed from balance-rr / mode 0 to 802.3ad / mode 4. (I did this because I kept getting “bond0: received packet with own address as source address” when using balance-rr with a bridge interface. The bridge interface was for using KVM.)

For now, I simply disabled one of the slave interfaces, and the lag / dropped ICMP packets problem has gone away.

Like the Serverfault poster, I have an HP ProCurve 1800-24G switch. The switch is supposed to support 802.3ad link aggregation. It’s not a managed switch, so I (perhaps incorrectly) assumed that 802.3ad would magically just work. Either there is more required to make it work, or its implementation is broken. Curiously, however, running my bond0 in 802.3ad mode did work without any issue for over a month.
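One way to see whether 802.3ad actually negotiated is to read `/proc/net/bonding/bond0`, which the Linux bonding driver exposes. The sketch below parses that text for the bonding mode and each slave's aggregator ID. The function name and the sample text are illustrative assumptions modeled on the driver's usual layout, not output from this server. As a rule of thumb (not a definitive diagnosis), slaves that end up with different aggregator IDs in 802.3ad mode suggest the switch is not grouping those ports into one LACP aggregate.

```python
def parse_bonding(text):
    """Return (mode, {slave_name: aggregator_id}) from bonding status text.

    Hypothetical helper; expects the layout of /proc/net/bonding/<bond>.
    """
    mode = None
    slaves = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "Aggregator ID" and current is not None:
            # Only count IDs listed under a slave, not the active-aggregator block.
            slaves[current] = int(value)
    return mode, slaves


# Illustrative sample only: eth0 and eth1 sit in different aggregators.
sample = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1

Slave Interface: eth0
MII Status: up
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Aggregator ID: 2
"""

mode, slaves = parse_bonding(sample)
split = len(set(slaves.values())) > 1
print(mode)
print(slaves)
print("slaves in different aggregators:", split)
```

On a real system the input would come from reading `/proc/net/bonding/bond0` (the bond name is whatever the interface is called locally).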

Anyway, hopefully this might help someone else struggling with a similar problem.

Lars Hecking says:

October 14, 2014 at 10:24 am

See the comments about mode 0 in this thread:
http://lists.centos.org/pipermail/centos-virt/2014-March/003720.html
In particular:
http://lists.centos.org/pipermail/centos-virt/2014-March/003733.html

I’m unfamiliar with these switches. The Cisco switches we use, all managed, require explicit configuration for LACP/802.3ad.
