Configuring Source-specific Routing

I’m attempting to configure source-specific routing so that my servers can exist on multiple subnets from multiple upstream providers.

A rough diagram of the network layout:

ISP1 router (blackbox, routes subnet A, address on subnet A)
\
-----eth0(firewall)eth1-----((servers))
/
ISP2 router (blackbox, routes subnet B, address on subnet B)

The aim is to allow the servers to use both subnet A and subnet B. To allow this, any machine on both subnets must have source-specific routing configured, else packets originating from one ISP’s AS will be directed at the other’s router, and neither ISP cares for that.

At the moment, I’m focusing on getting the second ISP properly added to the firewall box. The firewall box is using CentOS 6.4, and normally passes traffic back and forth via proxy_arp. None of my interfaces are NM_CONTROLLED, and NetworkManager is not installed, much less started.

I’ve created a route-eth0:1 file that looks roughly like this:

10.0.0.1 dev eth0:1 src 10.0.0.2 from 10.0.0.0/29
default via 10.0.0.1 dev eth0:1 src 10.0.0.2 from 10.0.0.0/29

(Each of those is a single line in the actual file.)
(No, the ISPs aren’t giving me RFC1918 addresses; these are redacted.)

If I run “ifup eth0:1”, “ip route show” includes the lines:

10.0.0.1 dev eth0 scope link src 10.0.0.2
10.0.0.0/29 dev eth0 proto kernel scope link src 10.0.0.2
default via 10.0.0.1 dev eth0

Note that the “from 10.0.0.0/29” clause is missing. With the addition of a second default route on my firewall/gateway without any restriction on which traffic should go that way, my whole network, of course, tanks.

I’m surprised it’s been such a pain; I would have expected it to be a relatively common configuration. What’s the proper way of doing source-specific routing on CentOS?

22 thoughts on - Configuring Source-specific Routing

  • Kinda curious why you are attempting this without getting involved in dynamic routing
    (BGP)… It’s usually someone trying to do multihoming or multi-link load balancing on the
    cheap without involving their ISPs (which tends to get expensive as soon as you’re talking
    with them about redundant/backup loops, provider-independent addresses, and BGP peering).
    Generally it equates to “champagne taste on a beer budget”, but there are exceptions and
    reasons, as I know from personal experience. It often doesn’t end well, and it gets
    unreliable as network conditions change. But that depends on your requirements and
    application. I’m not one to judge – just pointing out the pitfalls.

    I have done this a number of times in the past (mostly for VPNs and redundant
    load-balancing links). You’re probably going to have to get really down and dirty into
    policy routing rules and tables with iproute2. I don’t honestly believe you will be able
    to pull it off with the basic stuff provided in the ifcfg-*, route-*, or static-route
    files (proviso below).

    I had to do it using completely custom files, utilizing the “ip rule” and
    “ip route {add|delete} table [n]” subcommands of “ip” to build custom matching rules and
    map them to different routing tables containing different routes and priorities. In some
    cases, with OpenVPN VPNs, I also had to incorporate iptables filtering commands to mark
    and match packets and interact with the ip rule tables, but I doubt you’re going that
    deep.

    man ip-rule
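
    A minimal sketch of that approach done by hand (the table number is arbitrary, and the
    addresses are the redacted ones from the original post):

    # a second routing table whose only default route points at the second ISP
    ip route add 10.0.0.0/29 dev eth0 src 10.0.0.2 table 100
    ip route add default via 10.0.0.1 dev eth0 table 100

    # packets sourced from that subnet consult table 100 instead of "main"
    ip rule add from 10.0.0.0/29 table 100
    ip route flush cache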

  • Yup, I know.

    The intent is to maintain the old, slow (but SLA-backed) connection as a fallback, and
    migrate services to the new connection piecemeal. Meanwhile, the same DNS server on the
    new connection can be, e.g., “ns3”. The same mailserver can have a new MX on the new
    connection…likely prioritized over the old one.

    Inbound services can be load-balanced fairly easily via DNS, if TTLs are kept low and
    records are updated in response to link state. It’s not anycast DNS, but it also doesn’t
    require you to get BGP peering and PI space. (I don’t even know if I could *get* IPv4 PI
    space at this point. I certainly know I wouldn’t be able to if I waited a year…)

    Yeah, I’ve gone that deep. And a tad deeper. I had almost *everything* working by hand,
    and went to figure out how to convert it to idiomatic CentOS network configuration
    scripts. And took my network down *three times* because of the script processing
    stripping things out.

    Yup. I went through LARTC before writing a line of code, just to be sure.

    Curiously, at least one guy has reported success:

    http://sysadminsjourney.com/content/2009/04/15/doing-simple-source-policy-routing-CentOS/

    Now, the only thing different between his setup and mine (apart from my using ethN:1
    instead of ethN, as all three routers hang off the same ethernet segment) is that where
    his guide says:

    echo "default table CorpNet via 10.0.0.1" > /etc/sysconfig/network-scripts/route-eth1

    My first pass at making my code platform-idiomatic effectively was:

    echo "default via 10.0.0.1 table CorpNet" > /etc/sysconfig/network-scripts/route-eth1

    (the “table $table” clause in mine was at the end of the line, following the pattern I’d read in LARTC, rather than near the beginning of the line.)

  • The files to use for this in RHEL land are rule-ethX, similar to how ifcfg-ethX and
    route-ethX get used…

  • Note that there are more straightforward ways to do this. One is to pretend you are big
    enough to have a distributed server farm and actually have independent servers at the
    other IPs, even if they are VMs. This is fairly easy for mostly-static or database-driven
    web sites, fairly difficult for applications that are more stateful, but perhaps possible
    with a common NFS backend. Another is to have application-level proxies or load balancers
    like haproxy, nginx, apache configured as a reverse proxy, or even port forwarding with an
    xinetd ‘redirect’ configuration. This loses the source IP from the application logs,
    although the HTTP proxies have an option to pass it. Similarly, you could use iptables to
    source-NAT on the receiving side and forward to a backend server (a rough sketch of that
    follows below). These all have some disadvantages, but with separate hosts each having one
    default gateway to the internet and static routes for your own local ranges, you have a
    lot less black magic involved.
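
    As a rough sketch of that last variant (the backend and proxy addresses below are
    hypothetical, and net.ipv4.ip_forward=1 is assumed on the front-end host): the front end
    DNATs the service port to a backend and source-NATs so replies come back through it, which
    is exactly why the backend’s logs lose the real client address.

    # forward inbound port 80 to a backend server
    iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.10.80:80
    # rewrite the source so the backend answers via this host, not via its own default route
    iptables -t nat -A POSTROUTING -p tcp -d 192.168.10.80 --dport 80 -j SNAT --to-source 192.168.10.1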

  • Actually, this is all stuff (well, except for haproxy) we have implemented. 80-90% of my servers don’t even need (and, ultimately, won’t have) public IP addresses. (And I still won’t need NAT, thank god.)

    Internally, I’m not far from having things set up as a fluid private cloud with scalable
    services.

    Ultimately, for this to work cleanly, anything which requires a public IP (be it a raw authoritative DNS server or a load balancer) will require an IP on both public subnets.

    The only blocker right now is getting CentOS to do source-policy routing properly.

  • Read that whole document before writing a line of code.

    Also of use, in case anyone else comes across this thread:
    Network Warrior, by Gary A. Donahue
    The TCP/IP Guide, by Charles M. Kozierok
    NIST SP 800-119, Guidelines for the Secure Deployment of IPv6
    IPv6 Network Administration, by Niall Richard Murphy & David Malone
    Content Delivery Networks, edited by Rajkumar Buyya, Mukaddim Pathan, Athena Vakali
    (In particular, see DNS-based network management)

    That’s most of the relevant network-related stuff I’ve got in my library.

  • Yup. And if you put a line in route-ethN like:

    default via 10.0.0.1 dev ethN from 10.0.0.0/24

    you’re in for a rude shock; running “ip route show” after bringing up ethN will show something like:

    default via 10.0.0.1 dev ethN

    …having stripped the key “from 10.0.0.0/24” portion. I ran into similar problems with “table SomeTable”.

  • No it doesn’t, as long as you either don’t mind losing the source IP for logging or
    configure your HTTP proxy to pass it. You can use separate front-end proxies or load
    balancers on each public range, each with its default gateway pointing toward the ISP
    handling that range. DNS service is simple enough to have standalone servers for each
    instance you need. Web browsers are actually very good at handling multiple IPs in DNS
    responses and doing their own failover if some of the IPs don’t respond. SMTP will retry
    following your MX priorities. For other services you might need to actively change DNS to
    drop IPs if you know they have become unreachable, though.

    It’s a black art – I’d give up the source IP logging first and rely on the back end servers sending back to the proxy that received the request and only has the default route to that one ISP.

  • No, I really can’t. And not for reasons I can change until this summer, at the earliest, nor can I discuss them without breach of NDA.

    This would also require either resources or underlying authorizations I
    don’t have.

    It varies greatly by client software. And given the explosion of unreliable network connections (wifi, mobile), some of that failover logic’s margin is already lost in dropped packets between the client and their local network gateway.

    Yup. MX is a no-brainer, as are NS and SIP/SRV.

    Yup. That’s what I was planning on doing, more or less. Start with ordering IPs by route
    preference, and drop IPs by link state (a rough sketch of the link-state part follows at
    the end of this comment). I just wish I could drive it by snooping OSPF…

    Once you’ve read the docs and tried a few commands, it’s pretty easy to wrap your head around it. My problem is that what I was able to get working by hand gets mangled by the processing logic for
    /etc/sysconfig/network-scripts/route-ethN.

    I’m not doing any special logging. That one firewall/routing device sits between the ISP routers and _all_ my internal machines. Everything sits behind it. There are reasons for this.
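
    As a sketch of the “drop IPs by link state” idea (the zone, names, addresses, and key file
    below are hypothetical, and the zone is assumed to accept dynamic updates):

    #!/bin/bash
    # If the second ISP's gateway stops answering pings sourced from our address on that
    # subnet, withdraw the corresponding A record; re-add it once the link is back.
    GW=10.1.0.1; ZONE=example.com; NAME=www.example.com; ADDR=10.1.0.2

    if ping -c 3 -W 2 -I "$ADDR" "$GW" >/dev/null 2>&1; then
        action="update add $NAME 60 A $ADDR"
    else
        action="update delete $NAME A $ADDR"
    fi

    printf 'server 127.0.0.1\nzone %s\n%s\nsend\n' "$ZONE" "$action" |
        nsupdate -k /etc/dns-update.key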

  • CentOS VMs are really, really cheap….

    Yes, but typically they can deal with receiving multiple IPs from the initial DNS lookup,
    even if some are broken, better and faster than getting one IP which subsequently breaks
    and then having to do another DNS lookup to get a working target. At least the few
    browsers I tested a while back did…

    I don’t think you can count on your ordering reaching the clients or meaning anything to them if it does. And some applications won’t ever do a lookup again.

  • That’s really, truly, seriously not the issue. I don’t know if you saw where I said I was setting up a private cloud.

    And, as I said, I can’t discuss the problem without breach of NDA.

    You missed my point, my point was that your margin is already eaten into by unreliable networks.

    Yes, intermediate resolvers may reorder responses. That’s fine and pretty normal. If ordering responses doesn’t work, I fall back to a stochastic approach; that’s actually rather a “given”, since an oversaturated link qualifies as “down” for the purpose of new connections.

    And, yes, there’s a lot of client software out there (*especially web browsers*) which cache responses and disregard TTLs. To those users, I
    really can only say “have you tried turning it off and back on again?”

    But here we are, arguing about *load balancing*, when the problem I face is, frankly, one
    of taking either of a pair of *known-to-work* sequences of invocations of “ip” commands
    and getting whatever processes /etc/sysconfig/network-scripts/{ifcfg-eth*,route-eth*} to
    maneuver the kernel into the same resulting state.

    Source-based routing frankly isn’t that hard! From the perspective of an edge node (i.e. a server):

    # First subnet
    ip addr add 10.0.0.2/24 dev eth0 brd 10.0.0.255
    ip route add default via 10.0.0.1 dev eth0 src 10.0.0.2

    # Second subnet
    ip addr add 10.1.0.2/24 dev eth0 brd 10.1.0.255
    ip route add default via 10.1.0.1 dev eth0 src 10.1.0.2

    and from a router’s perspective, it’s

    # Assuming proxy_arp is set on eth0 and eth1
    # Sets up source-specific routing for 10.0.0.0/24
    # WAN hangs off eth0. LAN hangs off eth1.
    ip addr add 10.0.0.2/24 dev eth1 brd 10.0.0.255   # To LAN
    ip addr add 10.0.0.2 dev eth0                     # For the benefit of 'src 10.0.0.2' below
    ip route add 10.0.0.1 dev eth0 src 10.0.0.2       # For 'via 10.0.0.1' below
    ip route add default via 10.0.0.1 dev eth0 src 10.0.0.2 from 10.0.0.0/24

    # Assuming proxy_arp is set on eth0 and eth1
    # Sets up source-specific routing for 10.1.0.0/24
    # WAN hangs off eth0. LAN hangs off eth1.
    ip addr add 10.1.0.2/24 dev eth1 brd 10.1.0.255   # To LAN
    ip addr add 10.1.0.2 dev eth0                     # For the benefit of 'src 10.1.0.2' below
    ip route add 10.1.0.1 dev eth0 src 10.1.0.2       # For 'via 10.1.0.1' below
    ip route add default via 10.1.0.1 dev eth0 src 10.1.0.2 from 10.1.0.0/24

    That’s it! (unless I typo’d or thinko’d something coming up with these examples.) It took me all of three or four hours yesterday to learn this much of it. Then the rest of the day discovering the stuff I was putting in route-ethN wasn’t being honored.

    My problem has been that the “from 10.x.0.0/24” parameter keeps getting stripped by
    whatever processes /etc/sysconfig/network-scripts/route-ethN.

  • You’re not showing table numbers or names there so it’s not clear if you are using different route tables or not (which you MUST do and associate them with appropriate match rules).

    According to “man ip-route” on my router, the “from” stanza is not valid in a “route add”
    (route-ethN files), and in a “route ls” it is only applicable to “cloned” routes. What you
    wrote cannot literally work, by my reading of the “ip” man pages.

    You get the source matching from the “rules” not the “routes”. You haven’t mentioned (or acknowledged) anything about them but they are crucial (as are the use of multiple tables). What did you set up for your match rules? No match rules, then only the default and local tables are going to be used. Your “from” specifier goes in your rules, not your routes.

    When I look at my route tables, I see src associated with an appropriate route. I don’t
    see any “from” matches because they are not in the route tables; they’re in the rules. You
    also have to look at “ip rule ls”. That’s where your “from” is going to show up and then
    tell you which table it’s going to use as its routing table.

    Ok, so you are using the table named “CorpNet” which you must have added to /etc/iproute2/rt_tables in advance (his step 1) otherwise you can only use table numbers. The position of the “table CorpNet” should make no difference. Have you added a corresponding match rule (his step 3)?
    He defines “CorpNet” as table 200.

    It looks like, if you did everything he describes in that article, it should work. I’ve done this manually and what he describes there exactly matches what I would expect to work. You really haven’t given us the complete picture of what you did (or if you did, you left out a couple of steps). Obfuscate the addresses and names if you must but you need to be clearer on the contents of all the configuration files.

    /etc/iproute2/rt_tables
    /etc/sysconfig/network-scripts/ifcfg-eth0
    /etc/sysconfig/network-scripts/route-eth0
    /etc/sysconfig/network-scripts/rule-eth0

    I’m also not real sure how this works with additional addresses like eth0:N. Not sure if you need these in the parent device or the alias device. Looks like putting under the alias interface should be fine.

    Regards, Mike
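
    For concreteness, a sketch of what those four files might hold for the setup in the
    original post (addresses are the redacted ones from the question, and “CorpNet”/200 is the
    table from the article; whether the route-/rule- files should hang off eth0 or the eth0:1
    alias is the open question above, so treat this as illustrative rather than tested):

    # /etc/iproute2/rt_tables -- append one line to name table 200
    200     CorpNet

    # /etc/sysconfig/network-scripts/ifcfg-eth0:1
    DEVICE=eth0:1
    ONBOOT=yes
    BOOTPROTO=none
    IPADDR=10.0.0.2
    NETMASK=255.255.255.248

    # /etc/sysconfig/network-scripts/route-eth0:1
    10.0.0.0/29 dev eth0 src 10.0.0.2 table CorpNet
    default via 10.0.0.1 dev eth0 table CorpNet

    # /etc/sysconfig/network-scripts/rule-eth0:1
    from 10.0.0.0/29 table CorpNet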

  • An alternative: source routing via firewall/netfilter marking of packets:

    iptables -t mangle -A PREROUTING -s 172.24.5.0/24   -j MARK --set-mark 100
    iptables -t mangle -A PREROUTING -s 192.168.150.107 -j MARK --set-mark 200
    iptables -t mangle -A PREROUTING -s 192.168.150.224 -j MARK --set-mark 100

    # Local network
    iptables -t mangle -A PREROUTING -d 192.168.0.0/16 -j MARK --set-mark 20
    iptables -t mangle -A PREROUTING -d 172.16.0.0/12  -j MARK --set-mark 20
    iptables -t mangle -A PREROUTING -s <source elided> -d 192.168.0.0/16 -j MARK --set-mark 20
    iptables -t mangle -A PREROUTING -s <source elided> -d 172.16.0.0/12  -j MARK --set-mark 20

    And then something like:

    # echo 201 mail.out >> /etc/iproute2/rt_tables
    # ip rule add fwmark 1 table mail.out
    # /sbin/ip route add default via 195.96.98.253 dev eth0 table mail.out

    (http://lartc.org/howto/lartc.netfilter.html).

    The firewall rules above are from StarOS (a router OS that has a simple script for policy
    routing), so the second part with ip rule and ip route is just a pointer in the right
    direction.

    Ljubomir Ljubojevic

  • [snip]

    I tried it both ways, honestly. I’ve been blasted (postfix) or ignored
    (samba) more than a few times in other environments for providing too much information, so I didn’t think it wise doing a writeup of both approaches. Can’t win. Can’t even break even…

    I was going to ask you how you tied in your manual script…

    So, this is interesting. I’d read that you could use a command like

    ip route add 1.2.3.4/32 dev eth0 via 10.1.0.1 src 10.1.0.12 from 4.3.2.1/24

    with the “from 4.3.2.1/24” portion as part of the ip command, but that using tables was
    usually done because it was easier.

    What’s bizarre is that I could have sworn I had this type of rule even working. But when I run it on my laptop, and follow up with “ip rule show”, the “from X” clause is gone.

    This calls into question everything else I was convinced I had working, too. But I do know my ‘table CorpNet’ approach worked when applied manually, but not when I tried converting it to route-ethN. I won’t be able to try it again for a while, either, but I’ve got a hunch why it didn’t work.

    Yup. See above where I discover “from a.b.c.d” isn’t a valid clause to attach to the ip command. As finicky as that command is, I’m disappointed it didn’t throw an error.

    Yup. I just re-read through to double check, when my manual invocation on my laptop didn’t work.

    I hear you. I just wish I’d documented my first approach (using tables)
    better; I’m sure it was a silly error, and I’m getting more sure it was. I’d rather have had someone thump me over the head and point out a simple error than spend three days arguing over whether or not source-specific routing makes sense.

    I’m thinking my problem must have been in the rule file. If I had to guess, I didn’t add
    the file, but was instead counting on the rule I’d put in manually to be sufficient to
    pick it up. I imagine restarting the interface flushed the rule list (a quick check for
    that is sketched at the end of this comment).

    I’m not fond of the alias devices, but you might call it part of the
    “coding style” of this network.

    Thank you for the responses. Once I get this working, I’ll certainly report back. But I’m going to guess it might be at least a few weeks before I can poke it again. Things need to settle down, first.
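
    A quick way to confirm that hypothesis when I can next poke at it (a sketch):

    ip rule show > /tmp/rules.before
    ifdown eth0 && ifup eth0
    ip rule show | diff /tmp/rules.before -   # a manually added rule that vanished was flushed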

  • Michael, it’s very frustrating that there is so much noise for a very simple request. I
    set up multi-source routing in 5.3 or so and was astounded at all the negativity on this
    list and the claims that it could not be done. It will take forever to read the noise in
    this thread alone. Some said you have to use DHCP…. i could go on.

    Do not trust that ping -I will work how you would think. Must specify an IP address, not eth0, not eth1. ping -I 10.0.0.1 8.8.8.8

    This really is just a few lines per interface.

    Learn by changing the /etc/sysconfig/network-scripts/ifup-routes shell script to add
    logging; echo out variables. (A sketch follows at the end of this comment.)

    There is no need to get iptables involved at all unless doing something very special.

    i did not want to set up quagga or some other dynamic routing daemon because of security
    concerns. i wanted static IP addresses communicating to the ISP on static routes. It is
    pretty simple. Maybe i can hook up my laptop to 3G and WiFi and Cat6 and make sure i get
    it working. Please remember to use IP addresses, not names, for ping testing. Scrutinize
    ping results.

    ping -I 10.0.0.1 8.8.8.8
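
    The logging idea above, as a sketch (back the script up first): near the top of
    /etc/sysconfig/network-scripts/ifup-routes, add

    exec 2>>/tmp/ifup-routes.trace   # keep stderr even when ifup is run by the boot scripts
    set -x                           # trace every command, showing which route arguments
                                     # actually survive the file parsing

    then run “ifup eth0” and read /tmp/ifup-routes.trace.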

  • [snip]

    I don’t figure I want to use the mangle table for this. Though thanks for the example
    code; that will come in handy for tc. I just need to work out how to tie that in with
    sanewall.

    I think I know what I did wrong, but it’s going to be a while before I
    can test it. (Dang, I wish I had enough spare hardware at home to set up a test lab.)

  • Yup. Sans the obfuscated IP address, that’s exactly what I tested.

    I tried adding “set -x” to them. :)

    Yeah, I don’t see a use for quagga at this time.

    [snip]

  • You can set up VMs and create several virtual interfaces. Use one virtual NIC (vNIC) to
    connect two VMs onto the same subnet, a second vNIC for the second and third guests, and
    so on. That way you can create any type of virtual network. If you use CentOS without a
    GUI, you can keep the RAM used by several running guests down. (A rough sketch follows
    below.)

    Ljubomir Ljubojevic
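
    A rough sketch of that with libvirt (the network and guest names are hypothetical): define
    an isolated network per subnet, then give each guest a vNIC on whichever networks it
    should see.

    # isolated-a.xml -- no <forward> element, so guests on it only reach each other and the host
    <network>
      <name>isolated-a</name>
      <bridge name='virbr10'/>
    </network>

    # define and start the network, then attach two guests to it
    virsh net-define isolated-a.xml
    virsh net-start isolated-a
    virsh attach-interface --domain guest1 --type network --source isolated-a --model virtio --config
    virsh attach-interface --domain guest2 --type network --source isolated-a --model virtio --config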

  • Any neighbors with Open WiFi?
    Connect Cat5 to the laptop in your house and connect to a neighbor’s open WiFi. Voila, two
    ISPs.

    If you have 3G, it will work better to connect it to a CradlePoint-type 3G hardware
    gateway device and connect the laptop to the 3G gateway. NetworkManager would only
    activate my bluetooth-to-3G connection when i turned WiFi off. (Further, i just ran
    `ip route` on my android phone while connected to 3G and WiFi, and the android output was
    disappointing. It does not have both active at the same time.)

  • Find some businesses that both have open wifi near each other. Bring an old WiFi router and a Cat5 cable. Connect your laptop WiFi to one open hotspot. Connect the old WiFi router in client access mode to another open wifi.

  • Somebody oughta try an external USB WiFi dongle on a laptop with internal WiFi. Does NetworkManager handle two WiFi devices?
