Network Connectivity Lost After Reboot/upgrade

I upgraded one of my old machines running CentOS 5.x to the latest kernel (from 308.24.1 to 348.1.1). After rebooting, network connectivity was gone. I rebooted with the old kernel and also tried the one before it (308.20.1), still no luck. So I assume it has nothing to do with the kernel or even CentOS. But a hardware failure also seems unlikely, see below.

ethtool shows the link as up and, if I remove the cable, as down. I attached a laptop via crossover cable; it detects the link, but the problem is the same. I disabled iptables and set SELinux to disabled. No change. There’s a Xen VM running on that machine and I can ping it from the host. So internal networking seems to be OK. I’m using bridged networking for Xen connectivity, set up by normal Red Hat means, not via Xen. Never had a problem. There are no errors in the logs, except for dhcpd reporting that the network is down, and named is also giving some weird errors. This is my only dhcpd, so I would like to have it up ASAP :-(

Is there anything else besides a weird hardware failure that I could check? I’m going to get a new card tomorrow and see if that changes the situation. This is mobo internal networking based on nforce-MCP61.

Has anyone seen such a hardware failure where the link goes up but no packets go over the wire? It seems a bit unlikely that this hardware failure (and nothing else) should happen on a reboot after an upgrade.

Thanks.

Kai

10 thoughts on - Network Connectivity Lost After Reboot/upgrade

  • I’ve seen similarly weird things when running VMs on some smart switches where (and I’m not a networking guy, so my terminology will get fuzzy) something was set to disable a port (PortFast, maybe?) if multiple MACs were seen on it. On machines other than my desktop, I can normally get that fixed by having a trunk port and a default VLAN assigned to my port(s). Not sure whether that applies to your situation.

  • Thanks for the tip but, unfortunately, this cannot be the case here. Networking of the host is also affected, even when Xen is shut off. I have no smart switches in this office, and I ruled out switches by using a direct connection to the laptop.

    Kai

  • So it’s something unrelated to xen…

    Is the host using a static address or dhcp?

    If you run tcpdump, do you see all the packets you’d expect for layer 2 connectivity (i.e. ARP requests and responses)?

    Does ip -s link or ifconfig show any transmit or receive errors? Do packet counts go up?

    Given that ethtool states the link is up, I’d statically configure an address and try to ping the gateway whilst running tcpdump. Then take the packet dump (-w filename to save it) and look at it in Wireshark. You should see ‘who has gateway IP’ as an ARP request and the response from the gateway, along with the ICMP echo-request and echo-reply packets.
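
    The "do packet counts go up?" question can also be answered by reading the kernel counters directly. What follows is only a sketch: it assumes the usual Linux sysfs layout under /sys/class/net/<iface>/statistics, and read_counters and SYSFS_NET are names made up for this illustration.

```shell
#!/bin/sh
# Assumption: interface counters live under /sys/class/net/<iface>/statistics.
# SYSFS_NET is a made-up override so the function can be pointed elsewhere.
: "${SYSFS_NET:=/sys/class/net}"

# Print "rx_packets tx_packets rx_errors tx_errors" for the given interface.
read_counters() {
    d="$SYSFS_NET/$1/statistics"
    echo "$(cat "$d/rx_packets") $(cat "$d/tx_packets") $(cat "$d/rx_errors") $(cat "$d/tx_errors")"
}
```

    Calling read_counters eth0 before and after a ping to the gateway shows whether the TX counter moves. If it rises while the other side sees nothing, that would suggest the frames never make it onto the wire.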

  • Specifically, use tcpdump on your bridged interface:
    tcpdump -nn -i br0

    Check your bridge details and make sure that the ethernet device is listed:
    brctl show

    If those look good, send the content of
    /etc/sysconfig/network-scripts/ifcfg-{br0,eth0} (or whatever eth device is a member of the bridge).
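
    The bridge membership check can also be scripted. A minimal sketch, assuming the kernel lists bridge ports under /sys/class/net/<bridge>/brif/ (the same membership that brctl show prints); bridge_has_port, bridge_ports and SYSFS_NET are hypothetical names used only here.

```shell
#!/bin/sh
# Assumption: bridge ports appear as entries in /sys/class/net/<bridge>/brif/.
# SYSFS_NET is a made-up override so the functions can be pointed elsewhere.
: "${SYSFS_NET:=/sys/class/net}"

# Succeed if <port> is a member of <bridge>, e.g. bridge_has_port br0 eth0.
bridge_has_port() {
    [ -e "$SYSFS_NET/$1/brif/$2" ] || [ -L "$SYSFS_NET/$1/brif/$2" ]
}

# List the ports of a bridge, one per line (prints nothing if there are none).
bridge_ports() {
    ls "$SYSFS_NET/$1/brif" 2>/dev/null
}
```

    For example, bridge_has_port br0 eth0 || echo "eth0 is not in br0" flags a bridge that came up without its physical device.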

  • Gordon Messmer wrote on Mon, 04 Mar 2013 15:29:58 -0800:

    This is all fine, it’s been this way for years. It looks as it always has. No errors, collisions, whatever anywhere. TX and RX are about the same. Just to prove that config is fine I removed the bridge and brought up a normal eth0. It’s got the same problem. I’ve never seen such a problem before.

    The tcpdump shows a lot of ARP requests of the form "who-has <IP> tell <IP>". As I understand it, these are requests for MAC addresses, and the "tell" address is the asking IP? In that case there is at least *some* outside connectivity. Most of the requests are from the local IP and the IP of the VM, but a few are from other machines on the network, including the outbound router. The VM runs a monitoring system and these are the clients that want to call in. There are also a few UDP requests (port 1900 and NBT), and that’s all. There are a few responses to the ARP requests as well, but mostly it’s requests. That makes sense if it doesn’t have much in the ARP cache. arp -a lists two machines with missing MAC data, that’s all.

    Kai

  • Things I would look at

    1. Run route to ensure that the routing table is correct.
    2. Check the ifcfg- files and see if there are any MAC addresses listed; if so, ensure they match the MAC address in the ifconfig output.
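
    The MAC comparison in step 2 can be sketched as a small script. This assumes the ifcfg file uses the Red Hat HWADDR= key and that the kernel reports the current address in /sys/class/net/<iface>/address; ifcfg_hwaddr and check_hwaddr are hypothetical helper names.

```shell
#!/bin/sh
# Print the HWADDR value from an ifcfg file, lower-cased, double quotes removed.
# Prints nothing if the file has no HWADDR= line.
ifcfg_hwaddr() {
    sed -n 's/^HWADDR=//p' "$1" | tr -d '"' | tr 'A-F' 'a-f' | head -n 1
}

# Compare the configured MAC with the one the kernel reports.
# Arguments: <ifcfg file> <file holding the kernel's MAC address>
check_hwaddr() {
    want=$(ifcfg_hwaddr "$1")
    have=$(tr 'A-F' 'a-f' < "$2")
    if [ -z "$want" ]; then
        echo "no HWADDR set"
    elif [ "$want" = "$have" ]; then
        echo "match"
    else
        echo "mismatch: ifcfg=$want kernel=$have"
    fi
}
```

    For example: check_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0 /sys/class/net/eth0/address. A mismatch can appear after a card swap, when the ifcfg file still names the old card’s MAC.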


    Regards Robert

    Linux The adventure of a lifetime.

    Linux User #296285
    Get Counted http://linuxcounter.net/

  • The arp request will have both the source IP address and the Ethernet address of the requesting host. tcpdump will only print the IP unless you use the -e flag.

    If the layout of your network is such a closely guarded secret that you can’t share the information that we need to help, you’re mostly on your own here.

    At this point, the problem could be almost anything. A bad switch port, a bad switch, or a bad cable seem very likely. Try a new cable to a new switch port, and reboot the switch if the problem continues. Try a full power down (as in, remove the power cable) of the affected system and the switch. It sounds like your system is receiving packets but unable to send them to other hosts.

    From any other host on the network, you should be able to:
    tcpdump -nn -e ether host <MAC>
    where <MAC> is the Ethernet address of the system with no connectivity.
    If you try to ping any address at all from the affected system, the other system should see it broadcasting ARP requests for the local destination or the default gateway. If you don’t see ARP requests on the other host, then you know that the affected system isn’t able to send out traffic.

  • Kai Schaetzl wrote on Mon, 04 Mar 2013 19:15:46 +0100:

    It was indeed a weird hardware failure. All works fine with the onboard LAN disabled and a cheap PCI network card.

    Kai

  • That’s a suitable workaround for getting a system operational again. In the end that is nothing more than a workaround, not a true solution. :-/

    But it would have been helpful if you had shared more information (think NIC model, NIC chipset, kernel module in use for that chipset).

  • SilverTip257 wrote on Tue, 5 Mar 2013 12:28:29 -0500:

    Why? It’s quite clear that this is a hardware failure. I tested a live CD and PXE booting on it, with the same problem, before buying the new card. I also tested the system disk in another machine, where it worked fine. It’s got nothing to do with the system, although it happened right after the update/reboot.

    So, other than replacing the mobo, it *is* the solution. Mobo might be going haywire next as well, but currently it’s absolutely stable. And I have a backup now in case it wants to go …

    Kai
