C6 Server Responding Extremely Slow On SSH Interactive


January 28, 2015 Patrick Bervoets CentOS 17 Comments

I have a C6 server acting as a kvm-host.

When I connect with SSH, the console is extremely slow and hangs for minutes at a time. Establishing the connection itself is not the problem.

If I use ssh root@host “whatever”, I get an immediate response, even while interactive consoles opened with SSH are hanging.

Linux […] 2.6.32-504.3.3.el6.x86_64 #1 SMP Wed Dec 17 01:55:02 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

             total       used       free     shared    buffers     cached
Mem:            47         35         11          0          0          0
-/+ buffers/cache:         35         11
Swap:            7          0          7

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg-lv_root
                       50G  6,4G   41G  14% /
tmpfs                  24G     0   24G   0% /dev/shm
/dev/sda1             477M  123M  329M  28% /boot

13:33:34 up 1 day, 18:30, 2 users, load average: 3.39, 2.53, 2.36

(it’s an 8-core)

Nothing in particular in /var/log/messages.

The VMs are running normally and don’t show the same behaviour.

Can anybody give me a pointer?

Thanks Patrick
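For reference, the load average in the uptime line is an average count of runnable tasks (on Linux, plus tasks in uninterruptible sleep), so on 8 cores a value of 3.39 works out to roughly 0.42 per core, well short of saturation. A small bash sketch of that arithmetic, using a hypothetical helper `load_per_core` that reads /proc/loadavg and `nproc` unless values are passed in:

```bash
# load_per_core [LOADAVG_LINE] [CORES]
# Hypothetical helper: divides the 1-, 5- and 15-minute load averages by the
# number of CPUs. Defaults to the live values from /proc/loadavg and nproc.
load_per_core() {
  local line="${1:-$(cat /proc/loadavg)}"
  local cores="${2:-$(nproc)}"
  # Fields 1-3 of /proc/loadavg are the three load averages.
  printf '%s\n' "$line" |
    awk -v n="$cores" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }'
}
```

Run without arguments on the host itself, it reports the current per-core figures.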

17 thoughts on - C6 Server Responding Extremely Slow On SSH Interactive

Marcelo Ricardo says:

January 28, 2015 at 10:21 am

Sorry, is it hanging during the session or while establishing a new one? If the latter, it may be DNS, and ssh -v may help. The former is weird; I don’t think I’ve ever seen it.

Marcelo
Patrick Bervoets says:

January 28, 2015 at 10:41 am

On 28-01-15 at 17:20, Marcelo Ricardo Leitner wrote:
Marcelo,

It hangs during the session. Once I’m logged in and start typing, it displays 3-5 characters and then hangs for up to 15 minutes, then a few more characters, another wait, and so on. I checked my resolv.conf and added ‘options single-request-reopen’, though I don’t know whether that helps.

Yes, it is weird, all the more so because individual commands sent with SSH get an immediate response.

Thanks Patrick
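The resolver option Patrick mentions goes on an `options` line in /etc/resolv.conf; per resolv.conf(5), it works around DNS servers or middleboxes that mishandle the A and AAAA queries glibc sends from the same socket. It only affects name lookups, so it could help with slow logins but would not be expected to explain a hang in the middle of a session. A minimal bash sketch of adding it idempotently, using a hypothetical helper `ensure_resolv_option`; it assumes nothing else (such as NetworkManager or dhclient) rewrites the file afterwards, since that could silently drop the change:

```bash
# ensure_resolv_option OPTION [FILE]
# Hypothetical helper: appends "options OPTION" to FILE (default
# /etc/resolv.conf) unless an existing options line already lists OPTION
# as a whole word. Assumes the file ends with a newline.
ensure_resolv_option() {
  local opt="$1" file="${2:-/etc/resolv.conf}"
  if grep -Eq "^[[:space:]]*options([[:space:]].*)?[[:space:]]${opt}([[:space:]]|\$)" "$file"; then
    return 0    # already present, nothing to do
  fi
  printf 'options %s\n' "$opt" >> "$file"
}
```

For example, `ensure_resolv_option single-request-reopen` as root; running it a second time leaves the file unchanged.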
anax says:

January 28, 2015 at 10:51 am

Hi Patrick, have you tried to find out on which side the hang is, the client’s or the server’s, using tcpumg or the like? That might help you get a bit further.

suomi
Patrick Bervoets says:

January 28, 2015 at 11:00 am

On 28-01-15 at 17:51, anax wrote:

Not yet, I’ll try that out tomorrow.

Thanks Patrick
Gordon Messmer says:

January 28, 2015 at 1:17 pm

Check for IP address conflicts in the server’s network.

For IPv4:
# arping -D -I
Patrick Bervoets says:

January 28, 2015 at 2:13 pm

On 28-01-15 at 20:17, Gordon Messmer wrote:

ARPING 192.168.1.15 from 0.0.0.0 br0
Unicast reply from 192.168.1.15 [AC:16:2D:72:67:D4]  0.723ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

Thanks anyway Patrick
Gordon Messmer says:

January 28, 2015 at 5:00 pm

I’m not sure what you mean by “thanks anyway”.

You got a response. There’s an IPv4 conflict on your network. That’s why you’re seeing those delays. If there’s no conflict, you should see 0 responses.
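Gordon’s rule of thumb (any reply to a duplicate-address probe means another host claims the address) can be applied mechanically to arping’s summary lines. A minimal bash sketch with a hypothetical `arping_dad_result` function; it assumes the summary wording matches the iputils output quoted in this thread. According to the iputils arping man page, `-D` mode also reports the result through the exit status (0 when no replies were received), which may be simpler in scripts.

```bash
# arping_dad_result
# Hypothetical helper: reads the output of "arping -D ..." on stdin and prints
# "conflict" if at least one reply was received, "no conflict" if none was,
# or "unknown" (exit status 2) if no "Received N response(s)" line is found.
arping_dad_result() {
  awk '
    /^Received [0-9]+ response/ { replies = $2 + 0; seen = 1 }
    END {
      if (!seen) { print "unknown"; exit 2 }
      if (replies > 0) print "conflict"; else print "no conflict"
    }'
}
```

For example, `arping -D -c 2 -I br0 192.168.1.15 | arping_dad_result`, with the interface and address taken from the output Patrick posted; substitute the server’s own.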
Patrick Bervoets says:

January 29, 2015 at 1:28 am

On 29-01-15 at 00:00, Gordon Messmer wrote:

Gordon,

I’m sorry, I misunderstood you (and arping -D). This was the result of arping on another host; I thought I would see 2 responses in case of an IP conflict.

Arping on the troublesome server gives 0 responses.

I just tried with a physical console on that server, and there I got the same unresponsive behaviour. Does this rule out network-related problems?

Mark (m.roth) suggested that the VMs (2 VMs with an Oracle database) might be eating up the video bus, but I’m not sure how I could test that.

Patrick
Patrick Bervoets says:

January 29, 2015 at 1:35 am

On 28-01-15 at 17:51, anax wrote:
I’m not sure what you mean by tcpumg. But after testing with a physical console, I’m experiencing the same problem, so I guess it’s the server.

Thanks
SilverTip257 says:

January 29, 2015 at 10:48 am

Probably meant tcpdump.
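Assuming tcpdump is what was meant, one way to see which side stalls is to capture the SSH traffic with raw epoch timestamps (`-tt`) during a hang and look for long silences: if keystroke packets from the client keep arriving while the server’s replies stop, the delay is on the server. A minimal bash sketch of the gap-finding part, with a hypothetical `packet_gaps` helper that reads tcpdump text output on stdin; the 5-second default threshold is an arbitrary assumption:

```bash
# packet_gaps [THRESHOLD_SECONDS]
# Hypothetical helper: reads "tcpdump -tt" text output on stdin (the first
# field of each line is an epoch timestamp) and prints every packet that
# arrived more than THRESHOLD seconds after the previous one, with the gap.
packet_gaps() {
  local threshold="${1:-5}"
  awk -v t="$threshold" '
    {
      ts = $1 + 0
      if (NR > 1 && ts - prev > t)
        printf "gap %.3fs before: %s\n", ts - prev, $0
      prev = ts
    }'
}
```

For example, `tcpdump -i br0 -nn -l -tt 'tcp port 22' | packet_gaps 5`, ideally run from a session that is not itself on port 22 (such as the local console), since tcpdump’s own output would otherwise generate more SSH traffic.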
Gordon Messmer says:

January 29, 2015 at 2:21 pm

Well, that’s a different story, then. :)

I haven’t seen delays anywhere near that long before, even with heavy swapping. But I guess I’d look at that sort of thing first.

Run “iostat -x 2” and see if your disks are being fully utilized during the pauses. Run “top” and see if there’s anything useful there. Check swap use with “free”. Try decreasing swappiness with “echo 10 >/proc/sys/vm/swappiness”.
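To check the first suggestion without eyeballing every sample, the %util column of `iostat -x` can be filtered for busy devices. A minimal bash sketch with a hypothetical `flag_busy_devices` helper; it assumes %util is the last column (as in the sysstat output quoted in this thread) and converts decimal commas, which iostat prints in locales such as Patrick’s, to points before comparing. The 80% default threshold is an arbitrary choice.

```bash
# flag_busy_devices [THRESHOLD_PERCENT]
# Hypothetical helper: reads "iostat -x" output on stdin and prints
# "DEVICE UTIL%" for every device row whose %util (last column) is at or
# above THRESHOLD.
flag_busy_devices() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    /^Device/ { in_dev = 1; next }   # header line starts a device block
    NF == 0   { in_dev = 0; next }   # blank line ends it
    in_dev {
      util = $NF
      sub(/,/, ".", util)            # 0,10 -> 0.10 for comma locales
      if (util + 0 >= t) printf "%s %s%%\n", $1, util
    }'
}
```

For example, `iostat -x 2 5 | flag_busy_devices 90`. Running iostat with `LC_ALL=C` should also make it print decimal points in the first place.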
Patrick Bervoets says:

January 30, 2015 at 3:22 am

On 29-01-15 at 21:21, Gordon Messmer wrote:

iostat, random sample:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,77    0,00    1,45    0,00    0,00   94,78

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0,00     0,50     0,00    11,00     0,00   136,00    12,36     0,00     0,00     0,00     0,00
sdb          0,00     0,00     0,00    11,50     0,00   148,00    12,87     0,00     0,09     0,09     0,10
sdc          0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-0         0,00     0,00     0,00     4,00     0,00    32,00     8,00     0,00     0,00     0,00     0,00
dm-1         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-2         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-3         0,00     0,00     0,00    11,50     0,00   148,00    12,87     0,00     0,13     0,13     0,15
dm-4         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-5         0,00     0,00     0,00     7,50     0,00   104,00    13,87     0,00     0,07     0,07     0,05

atop:
ATOP -  2015/01/30  10:18:14  ----------  10s elapsed
PRC | sys    3.87s | user  14.93s | #proc    197 | #zombie    0 | #exit      0 |
CPU | sys      30% | user    119% | irq       1% | idle    533% | wait      0% |
cpu | sys       2% | user     21% | irq       0% | idle     56% | cpu000 w  0% |
cpu | sys       3% | user     19% | irq       0% | idle     59% | cpu001 w  0% |
cpu | sys       8% | user     15% | irq       0% | idle     62% | cpu003 w  0% |
cpu | sys       3% | user     13% | irq       0% | idle     73% | cpu002 w  0% |
cpu | sys       3% | user     14% | irq       0% | idle     70% | cpu006 w  0% |
cpu | sys       4% | user     15% | irq       0% | idle     66% | cpu005 w  0% |
cpu | sys       2% | user     11% | irq       0% | idle     77% | cpu007 w  0% |
cpu | sys       5% | user     11% | irq       0% | idle     73% | cpu004 w  0% |
CPL | avg1    1.92 | avg5    1.97 | avg15   1.61 | csw   229508 | intr  191786 |
MEM | tot    47.1G | free   15.9G | cache 519.3M | buff  109.3M | slab  353.3M |
SWP | tot     7.8G | free    7.3G |              | vmcom  31.8G | vmlim  31.3G |
LVM | g_15k-lv_15k | busy      0% | read       1 | write     98 | avio 0.15 ms |
LVM | to-lv_oracle | busy      0% | read       0 | write     66 | avio 0.06 ms |
LVM | v_oracletest | busy      0% | read       0 | write     79 | avio 0.05 ms |
LVM | uito-lv_root | busy      0% | read       0 | write      1 | avio 3.00 ms |
DSK |          sdb | busy      0% | read       1 | write     98 | avio 0.16 ms |
DSK |          sda | busy      0% | read       0 | write    146 | avio 0.08 ms |
NET | transport    | tcpi      12 | tcpo      12 | udpi       0 | udpo       0 |
NET | network      | ipi       13 | ipo       12 | ipfrw      0 | deliv     12 |
NET | vnet0     8% | pcki    2273 | pcko    2581 | si  850 Kbps | so  458 Kbps |
NET | vnet1     4% | pcki    2186 | pcko    2075 | si  391 Kbps | so  422 Kbps |
NET | eth0      0% | pcki    1330 | pcko    1432 | si  159 Kbps | so  537 Kbps |
NET | br0     ---- | pcki      43 | pcko      22 | si    1 Kbps | so    4 Kbps |

  PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S  CPU CMD
 1960  2.37s  9.23s    0K    0K    8K 2520K -- -   S 101% qemu-kvm
 1990  0.69s  5.65s    0K    0K    0K 1196K -- -   S  55% qemu-kvm
 1975  0.50s  0.00s    0K    0K    0K    0K -- -   S   4% kvm-pit-wq
 2009  0.20s  0.00s    0K    0K    0K    0K -- -   S   2% kvm-pit-wq
23321  0.05s  0.02s    0K    0K    0K    0K -- -   R   1% atop
18384  0.05s  0.01s    0K    0K    0K    0K -- -   S   1% atop
 1719  0.00s  0.01s    0K    0K    0K    0K -- -   S   0% hpasmlited
 1746  0.00s  0.01s    0K    0K    0K    0K -- -   S   0% hp-asrd
   35  0.01s  0.00s    0K    0K    0K    0K -- -   D   0% events/0
10707  0.00s  0.00s    0K    0K    0K    0K -- -   S   0% arping
10740  0.00s  0.00s    0K    0K    0K    0K -- -   S   0% arping
   58  0.00s  0.00s    0K    0K    0K    0K -- -   S   0% kblockd/0
18425  0.00s  0.00s    0K    0K    0K    0K -- -   S   0% flush-253:0

free:
             total       used       free     shared    buffers     cached
Mem:         48218      31895      16323          0        108        519
-/+ buffers/cache:      31267      16951
Swap:         7951        476       7475

But I had the same pauses when free gave zero swap.

If swap is the problem, would it matter whether a command is run with SSH (ssh @ “command”) or in a shell?

When running atop in a shell, I observed pauses between screen updates of more than 10 seconds, yet atop displayed each update as “10 seconds later”, so its clock drifted further and further behind. A date command sent at the same time gave the correct time.

So it seems like the screens are buffered and are being displayed with a delay.
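One way to test the buffered-screens theory is to write the same timestamps to a local file and to the terminal at the same moment. If the file shows steady one-second steps while the terminal receives them in bursts, the process kept running and only the display lagged; gaps in the file itself would point to the process (or a blocked terminal write) stalling instead. A minimal bash sketch, with a hypothetical `tick_log` helper; the file path and count in the usage line are arbitrary:

```bash
# tick_log FILE COUNT [INTERVAL_SECONDS]
# Hypothetical helper: COUNT times, appends the current epoch time to FILE,
# prints the same value on the terminal, then sleeps INTERVAL seconds.
tick_log() {
  local file="$1" count="$2" interval="${3:-1}" i ts
  : > "$file"                        # start with an empty log
  for ((i = 0; i < count; i++)); do
    ts=$(date '+%s')
    printf '%s\n' "$ts" >> "$file"   # this write does not go through the terminal
    printf '%s\n' "$ts"              # same value, via the (possibly stalled) terminal
    sleep "$interval"
  done
}
```

For example, run `tick_log /tmp/ticks.log 120` in the slow shell, then compare /tmp/ticks.log with what appeared on screen and when.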
John R says:

January 30, 2015 at 3:29 am

That’s an unusually small amount of ‘cached’. I usually see the disk cache at 30-50% of total memory. Does this system not do much disk I/O?
Patrick Bervoets says:

January 30, 2015 at 3:39 am

On 30-01-15 at 10:29, John R Pierce wrote:

It’s a KVM host with LVM; the VMs all have their own LVs (some on a different PV). Would that explain the small cache?
Gordon Messmer says:

January 30, 2015 at 12:40 pm

“Random” is difficult to evaluate. Is that representative? Are sda, sdb, and sdc typically less than 1% utilized? Or are there large utilization values right after a hang?

Let’s assume it’s not swap, then. But I would say “no” to your question: if the system were swapping heavily, I’d expect the same delays either way.

That’s really weird.

Does the time displayed by “atop” eventually catch up?

Does the problem persist across reboots?

Is this system running ntpd?

Does the problem persist if you turn ntpd off and reboot?
Patrick Bervoets says:

January 30, 2015 at 2:32 pm

On 30-01-15 at 19:40, Gordon Messmer wrote:

All the output was on the same scale and captured during a hang, in another shell.

Whether atop’s time eventually catches up: not that I know of, but I gave up :-)

Reboots: alas, one of the VMs is our production database, and my next update/reboot window is next Saturday. I did have the problem just before the last reboot (mid-January), but I hadn’t monitored it closely afterwards. Before that, in December, I never experienced it, but it’s a server I tend to leave alone, so I’m rarely busy in a shell on it.

ntpd: yes, I’ll check that next week.
Patrick Bervoets says:

January 30, 2015 at 3:40 pm

On 30-01-15 at 21:51, Gordon Messmer wrote:
IIRC:
before the problem: kernel.x86_64 0:2.6.32-504.el6
problem occurred with: kernel.x86_64 0:2.6.32-504.1.3.el6
current: kernel.x86_64 0:2.6.32-504.3.3.el6

But since there is already a new kernel waiting, I’m not sure what to do. I think I’ll upgrade and test first. If my maintenance window permits, I’ll test downgrading (but that’s 3 updates…).

BTW, I’ve got 3 other KVM servers without this behavior (but they are completely different machines, so there’s not much to compare).
