Need Help With Linux Networking Interfaces And NIC Bonding


Hello everyone

I am running into some strange issues when configuring networking interfaces on my physical server running CentOS 7.5. Let me give you an overview of what’s going on:

We have a physical server running CentOS 7.5. It has one 4-port NIC, one 2-port NIC, and a Dell iDRAC port. The first port of the 4-port NIC, em1, is used for management traffic. The second port of the 4-port NIC, em2, is the first port in the NIC bond. The first port of the 2-port NIC, p6p2, is the second port in the NIC bond.

These interfaces are using Static IPs.

Here is my /etc/sysconfig/network-scripts/ifcfg-em1 file. Please keep in mind that I have changed the IPs and MAC addresses in the files for security reasons:

ifcfg-em1:

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="em1"
UUID="bbb2f9c2-141b-4a99-ab1e-328551aae612"
DEVICE="em1"
ONBOOT="yes"
IPADDR="192.168.56.50"
PREFIX="24"
GATEWAY="192.168.56.1"
DNS1="192.168.126.10"
DNS2="192.168.220.10"
IPV6_PRIVACY="no"
NM_CONTROLLED=no

Here is ifcfg-bond0, the configuration file for the NIC bond (bond0):

DEVICE=bond0
NAME=bond0
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.56.70
PREFIX=24
BONDING_MASTER=yes
BONDING_OPT="mode=1 miimon=100"
TYPE=Ethernet

Here is ifcfg-slave1, the configuration file for the first slave port of the bond, which corresponds to em2:

DEVICE=em2
HWADDR="c8:2f:87:fg:2a:31"
ONBOOT=yes
TYPE=Ethernet
BOOTPROTO=none
MASTER=bond0
SLAVE=yes

And ifcfg-slave2, the configuration file for the second slave port of the bond, which corresponds to p6p2:

DEVICE=p6p2
HWADDR="00:6a:d7:7c:e8:09"
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes

I created a custom routing policy for the NIC bond, bond0. Here is the configuration for the routing policy:

route-bond0:

192.168.56.0/24 dev bond0 src 192.168.56.70 table t1
default via 192.168.56.1 dev bond0 table t1

and the rule-bond0 file:

table t1 from 192.168.56.70

As for the routing table:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.56.1    0.0.0.0         UG    0      0        0 bond0
192.168.56.0    0.0.0.0         255.255.255.0   U     0      0        0 bond0
192.168.56.0    0.0.0.0         255.255.255.0   U     0      0        0 em1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 em1
169.254.0.0     0.0.0.0         255.255.0.0     U     1008   0        0 bond0

Now here is the scenario I am dealing with:

This Linux server is used for monitoring. We have Nagios, Cacti and other tools installed on it. There are a few things I have noticed and would like help with:

1) Whenever I ping any of the devices on our network, from this server, the traffic goes out from the management port. I do not want the traffic to go out of the management port. I want it to go out through the active port of the NIC bond. How do I configure the networking so that all primary network traffic flows to and from the NIC bonded interfaces? I only want the management port to be used for SSH purposes and well, management of the server.

2) I have configured the NIC bond in active-backup mode. When I run a continuous ping to the bond from another computer and then disable one of the bond's slave interfaces, the ping drops and the bond does not fail over to the backup slave interface. Pings to the management port drop as well. Then, when I disable slave2 and enable slave1, the traffic does not fail over to slave1 and the ping keeps failing. Only when I enable both slave interfaces and then either restart networking with systemctl restart network or reboot the server does networking resume and the pings succeed again. What steps should I take to fix this issue? Should I even use active-backup mode with the NIC bond, or is there a better mode I should use?

3) I've tested the networking by moving the bonded ports to a different VLAN on the switch, and that caused the management port to stop responding to ping. Why is this, and how do I fix it if I decide one day to use two different VLANs for the management port and the bonded ports?

Thank you for all of your help in advance!

Sean

3 thoughts on - Need Help With Linux Networking Interfaces And NIC Bonding

  • Hi Sean,

    [snip]

    When the server *originates* traffic, it will use the main routing table, and that’s why traffic goes out of em1. There’s no rule telling the server that when the traffic is initiated by the server, it must consult a different routing table, t1.

    One way to ensure that all the monitoring traffic goes through bond0, is to configure every service with an explicit source address. However, some services allow this, and some don’t, so this quickly becomes cumbersome.

    What you probably want to do is to invert your rules and routes, so that the bond0 interface is in the main table, and you put your management interface, em1, into another table (t1). Then, when you SSH into the server, it will use em1, but all other traffic will use bond0 by default.

    Regards, Anand
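
    A minimal sketch of the inverted layout described above, assuming the addresses from the original post, that the table name t1 is reused for the management interface, and that 100 is a free table ID (an arbitrary choice). Each comment names the file the lines below it would go in. This is untested on the poster's system and only illustrates the idea:

    ```
    # /etc/iproute2/rt_tables: name the management table (ID 100 is an assumption)
    100 t1

    # /etc/sysconfig/network-scripts/route-em1: management routes live in table t1
    192.168.56.0/24 dev em1 src 192.168.56.50 table t1
    default via 192.168.56.1 dev em1 table t1

    # /etc/sysconfig/network-scripts/rule-em1: traffic sourced from the
    # management address consults table t1
    from 192.168.56.50 table t1

    # ifcfg-em1: set DEFROUTE="no" and remove GATEWAY=
    # ifcfg-bond0: add GATEWAY=192.168.56.1 so the main default route uses bond0
    # route-bond0 and rule-bond0 from the original post would then be removed
    ```

    With this arrangement, replies to SSH sessions that arrive on 192.168.56.50 leave through em1, while anything the server originates without an explicit source address follows the main table and leaves through bond0.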

  • I don’t know if this is your situation or not, but I have found in my bonding testing that failover can take what I consider to be an inordinate amount of time (as in up to 50 seconds). Were you “patient” (possibly using an altered definition of the term) enough to see whether ping would eventually reply?
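
    One way to tell a slow failover from no failover at all is to watch the bonding driver's own status file rather than ping. The sketch below assumes the usual /proc/net/bonding/bond0 layout ("Currently Active Slave:", "Slave Interface:" and "MII Status:" lines); bond_status is a hypothetical helper name, not an existing tool:

    ```shell
    # Print the currently active slave and each slave's MII status from a
    # bonding status file (default: /proc/net/bonding/bond0).
    bond_status() {
        awk -F': ' '
            /^Currently Active Slave:/   { print "active=" $2 }
            /^Slave Interface:/          { slave = $2 }
            # The bond-level "MII Status" line comes before any slave, so skip it.
            /^MII Status:/ && slave != "" { print slave "=" $2 }
        ' "${1:-/proc/net/bonding/bond0}"
    }
    ```

    Running it in a loop (for example, while sleep 1; do bond_status; done) during the test shows whether the active slave actually changes when one interface goes down, and roughly how long that takes.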

    Leroy Tennison

  • First:  You never mentioned updating /etc/iproute2/rt_tables. Did you create an entry there for table “t1”?  If you haven’t done that, then your alternate routing table and your rules aren’t loading.  Run “ip route show” and “ip route show table t1” to make sure they both exist.
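
    The first check can be scripted. This is a small sketch, assuming the standard rt_tables format of one "ID name" pair per line with # comments; table_defined is a hypothetical helper name:

    ```shell
    # Succeed if a routing table name is defined in an rt_tables-style file.
    # Usage: table_defined NAME [FILE]   (FILE defaults to /etc/iproute2/rt_tables)
    table_defined() {
        awk -v name="$1" '
            $1 ~ /^#/    { next }        # skip comment lines
            $2 == name   { found = 1 }   # column 1 is the ID, column 2 the name
            END          { exit !found }
        ' "${2:-/etc/iproute2/rt_tables}"
    }
    ```

    If table_defined t1 succeeds, "ip route show table t1" and "ip rule show" should then list the routes and the rule from route-bond0 and rule-bond0. If it fails, the name t1 cannot be resolved and those entries were most likely never loaded.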

    Second: I agree with Anand.  You should remove the default route from your management interface ifcfg file, and add one to the primary device.  Use the rules and alternate route tables for the management interface *if* it needs a default route at all.  If it’s only supposed to communicate with other devices in the management network (which is typical, in the systems I’ve managed), then it shouldn’t need a default route at all.

    Third: When you are multi-homed, you should be selecting a specific interface with “ping -I” followed by an interface name or source address.  Don’t rely on its automatic detection of the appropriate interface.

    Can you define “disable” more specifically?

    The up side of active-backup is that it should work with generic switches, without any specific support on their end.  The down side of active-backup is that your switches may remember the association between MAC address and port number for pretty much as long as they like, and some switches will take a *really* long time to update.

    That depends on what component you think might fail, and what redundancy exists outside the system.  In my opinion: active-backup only makes sense if you have separate switches, since those fail more often than NICs.  And it only makes sense if everything behaves well when you pull power from one of the switches.  If you turn off one of your switches and the network stops working, then you shouldn’t use active-backup.

    Best guess: set arp_filter to “1” and wait for your switches to update their MAC/port mapping

    https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
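
    A sketch of how the arp_filter suggestion could be made persistent, assuming a drop-in file under /etc/sysctl.d (the file name 90-arp-filter.conf is hypothetical). Per the kernel documentation linked above, arp_filter only helps when source-based routing is in place, which the rule and route files are meant to provide:

    ```
    # /etc/sysctl.d/90-arp-filter.conf (hypothetical file name)
    # Answer ARP on an interface only if the kernel would route traffic
    # from the requested address out of that interface.
    net.ipv4.conf.all.arp_filter = 1
    ```

    The setting can be loaded without a reboot with "sysctl --system", or tried temporarily with "sysctl -w net.ipv4.conf.all.arp_filter=1".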