Problem On Exceptional Quit


October 7, 2015 Hua Wang CentOS 17 Comments

Dear CentOS Users:

I installed CentOS 7 on my server a few months ago. While using ssh, I keep getting a strange message, “Write failed: Broken pipe”, which forces the SSH session to quit. It’s really annoying, as it happens very often and at irregular intervals, from a couple of minutes to a few hours. I have been working with Linux (Red Hat, Fedora and CentOS) for over 15 years, and this never happened to me, even under CentOS 6.6. I have tried the following approaches, but none of them helped. I wonder if it could be solved by reinstalling the system, but reinstalling a lot of software is time consuming.

1. Log in from Mac, Windows and Linux systems on different computers.
2. Modify sshd_config on the server as suggested by many posts:
TCPKeepAlive yes
ClientAliveInterval 60
3. Modify ~/.ssh/config file on my local computer:
Host *
ServerAliveInterval 60
4. Log in over SSH using -Y instead of -X.
5. Add ‘unset autologout’ to my .cshrc.
6. I checked the IP address with the network administrator, and it works well.
7. Add a file named autologout.csh containing ‘set autologout=0’.

Do you know a good solution? Thanks!

Cheers,

Hua

—————————–
Hua Wang, Ph.D. in Geodesy Department of Surveying Engineering, Guangdong University of Technology,
100 Waihuan Xi Rd., Panyu District, Guangzhou, 510006, China. Tel: +86-13570019257
Email: ehwang@163.com Homepage: http://homepages.see.leeds.ac.uk/~earhw

17 thoughts on - Problem On Exceptional Quit

Frank Cox says:

October 7, 2015 at 10:11 pm

It sounds like the network connection between you and the server is dying for some reason.

That being the case you probably can’t fix it yourself if it’s a remote server.

You may need to get a better Internet connection on one or both ends.

—
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ http://www.melvilletheatre.com
Hua Wang says:

October 7, 2015 at 10:18 pm

Hi Frank,

Thanks for your prompt reply. The server is in my office. I tried a few computers, so it shouldn’t be a problem with the clients’ Internet connection. I tried pinging the server, and it received all the data. Is there a good way to check it?

It always worked well under CentOS 6.6 using the same server and the same internet connection (IP, cable, etc.). The problem appeared after reinstalling with CentOS 7.7. I suspect it’s a problem with the system rather than the network.

Cheers,

Hua

Frank Cox says:

October 7, 2015 at 10:42 pm

ssh -v, ssh -vv and ssh -vvv might give you some interesting information.

Since you’re apparently using some kind of an unofficial or non-standard version of CentOS, you might want to try using a current (regular) one instead.
Hua Wang says:

October 7, 2015 at 10:46 pm

Yes, I tried ssh -vvv. It gave a lot of information during login, but the session quit without any further information except for “Write failed: Broken pipe”.

Sorry, I made a mistake about the version. I am using v7, not v7.7.

Thanks,

Hua

Johnny Hughes says:

October 8, 2015 at 6:37 am

Try using ClientAliveCountMax and ServerAliveCountMax (you can set them to 5 or 8 instead of the default of 3) and also make the timeouts higher than 60.

Make sure you are using ‘protocol 2’.
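
For reference, the effect of these keepalive settings can be estimated with a little arithmetic. The sketch below is only an illustration: the values 60 and 5 are assumptions, not settings confirmed in this thread, and the function name keepalive_timeout is hypothetical. ClientAliveInterval and ClientAliveCountMax belong in sshd_config on the server, while ServerAliveInterval and ServerAliveCountMax belong in the client's ~/.ssh/config.

```shell
#!/bin/sh
# Assumed settings (values are illustrative, not taken from the thread):
#   server, /etc/ssh/sshd_config:  ClientAliveInterval 60, ClientAliveCountMax 5
#   client, ~/.ssh/config:         ServerAliveInterval 60, ServerAliveCountMax 5

# Approximate number of seconds before an unresponsive peer is dropped:
# the keepalive interval multiplied by the allowed number of missed replies.
keepalive_timeout() {
    interval="$1"
    count_max="$2"
    echo $((interval * count_max))
}
```

Calling keepalive_timeout 60 5 prints 300, i.e. with these assumed values a dead peer would be noticed after roughly five minutes. Larger values only delay that detection; they do not keep a broken link alive.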
zGreenfelder says:

October 8, 2015 at 6:43 am

I’m grasping at straws, admittedly, but does this happen after an extended amount of time?
i.e. you make the connection (possibly to use an SSH tunnel running over the session), leave it for some time, then return to use the tunnel and find the connection error about the failure to write? Are you sure the remote server isn’t doing some sort of idle cleanup to kill off idle sessions?
—
public gpg key id: 1362BA1A
Hua Wang says:

October 8, 2015 at 8:32 am

Dear Zep,

Thanks for your email. But it happened even while I was typing on the command line, so it couldn’t be a problem of idle cleanup.

Hua

Hua Wang says:

October 8, 2015 at 8:33 am

Dear Johnny,

Yes, I have tried much larger numbers than 60 and 3 for the above two parameters respectively. And I am sure it is using
Jonathan Billings says:

October 8, 2015 at 8:56 am

At this point, I’d suggest looking at the logs on the remote end to see what’s being logged when the session closes.

—
Jonathan Billings
Hua Wang says:

October 8, 2015 at 9:20 am

Which log file should I look at? Thanks,

Hua

Jonathan Billings says:

October 8, 2015 at 9:22 am

That depends on the OS of the remote server. If it’s CentOS 6, then I
suggest checking /var/log/messages and /var/log/secure.
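
A minimal sketch of how the sshd entries could be pulled out of such a log follows. It assumes a syslog-style file in which sshd lines carry a tag of the form sshd[PID]; the function name sshd_log_lines is hypothetical. A default CentOS 7 install is expected to keep the same two files, and journalctl -u sshd should show similar entries there.

```shell
#!/bin/sh
# Print only the sshd entries from a syslog-style log file.
# The file defaults to /var/log/secure; reading it normally requires root.
sshd_log_lines() {
    log_file="${1:-/var/log/secure}"
    grep -E 'sshd\[[0-9]+\]' "$log_file"
}
```

Running sshd_log_lines /var/log/secure | tail -n 20 on the server right after a dropped session should show whatever sshd logged at the moment the connection closed, if it logged anything.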
Leon Fauster says:

October 8, 2015 at 9:24 am

As the system was physically “touched” (reinstalled 6->7), I suggest checking the hardware again, e.g. check the cable, check the switch port (change the port), the power supply of the switch, etc.
Kahlil Hodgson says:

October 8, 2015 at 5:33 pm

Can you trigger the error reliably by doing something network intensive, like using scp or rsync to copy a large file? I’ve seen similar behaviour with a bad NIC that was in the process of dying.
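
One way to prepare such a test is sketched below. The function name make_test_file, the path and the size are all hypothetical choices for illustration; the file is filled with zeros, which is enough to generate sustained traffic as long as compression is not enabled for the transfer.

```shell
#!/bin/sh
# Create a file of size_mb mebibytes filled with zeros,
# to be used as the payload for a large scp or rsync transfer.
make_test_file() {
    path="$1"
    size_mb="$2"
    dd if=/dev/zero of="$path" bs=1048576 count="$size_mb" 2>/dev/null
}
```

For example, make_test_file /tmp/payload.bin 500 followed by scp /tmp/payload.bin user@server:/tmp/ (where user@server is a placeholder for the real account and host) would push about 500 MiB through the link while you watch for the “Broken pipe” message.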
Gordon Messmer says:

October 8, 2015 at 6:24 pm

That’s very often a result of IP conflict. I’m assuming that you’re connecting to an IPv4 address. If so, log in to your CentOS server and use arping to look for conflicts:

# arping -c 2 -D -I em1 <server IP address>

TCPKeepAlive is “yes” by default. ClientAliveInterval doesn’t appear to be a valid setting. Either TCPKeepAlive or ServerAliveInterval could be useful if the problem were a stateful firewall which was dropping your connection from its state table, and then resetting the connection in response to a later packet from your client.

Since those don’t help, that tends to suggest that the problem isn’t an intermediate host, but the server itself. Possibly an IP conflict.
Also, check the output of “dmesg” to see if there are any problems recorded with the NIC. Check the output of “ifconfig” to see if there are TX or RX errors that increase when your connections are reset.

You didn’t say what client OS you’re using, but Fedora and CentOS set ForwardX11Trusted to “yes” by default, so “ssh -Y” is the same as “ssh -X”. And even if it weren’t, it wouldn’t cause the problem you’re seeing.

The error you’re seeing won’t be triggered by your shell exiting.
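
The check for growing TX or RX error counters can also be scripted. The following is a rough sketch that reads the per-interface counters Linux exposes under /sys/class/net; the function name nic_error_counters and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical, and em1 is simply the interface name used in the arping example above.

```shell
#!/bin/sh
# Print the RX/TX error and drop counters for one network interface.
# $1: interface name, e.g. em1
# $2: optional sysfs root (hypothetical testing aid), defaults to /sys/class/net
nic_error_counters() {
    iface="$1"
    stats_dir="${2:-/sys/class/net}/$iface/statistics"
    for counter in rx_errors tx_errors rx_dropped tx_dropped; do
        printf '%s %s\n' "$counter" "$(cat "$stats_dir/$counter")"
    done
}
```

Running nic_error_counters em1 once before and once after a dropped session makes it easy to compare the two outputs; counters that jump between the runs would point towards the NIC, cable or switch port.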
Anthony K says:

October 8, 2015 at 8:02 pm

As Gordon suggests, let’s see if the problem might be related to a dying NIC. The output of the following command may reveal any illness:

# ip -s -d l l

Cheers, ak.
Hua Wang says:

October 11, 2015 at 1:25 am

I am not sure whether we can send attachments to the mailing list. There were quite a lot of replies before, but I have got nothing back since the attachments were added. I will remove the attachments and send it again. Please have a look at the email below. Thanks for your help.

Gordon Messmer says:

October 12, 2015 at 12:10 pm

You can use services like pastebin.com to temporarily post your logs. I
wouldn’t recommend posting the whole “secure” log. The output of dmesg might be helpful, but you can probably just read it and determine whether or not there’s anything related to your NIC or to networking.

arping and ifconfig don’t show any conflicts or errors. It’s still possible that there’s a conflict with a device that’s not online all of the time, but that’ll be hard to track down.

At this point, I think we’ve exhausted a lot of the simple stuff. The next thing I’d do would be to run tcpdump on your client and watch all of the traffic to and from the server, and any ICMP. When the connection is interrupted, the last few packets should show the cause. I’d expect you to see either a TCP reset or one of a few ICMP messages. So, open a terminal, start a tcpdump, and let it run until your SSH connection (in another terminal, obviously) is reset. Use Ctrl+C to stop tcpdump.

# tcpdump -nn host 222.200.125.5 or icmp
