CentOS 7, Systemd, NetworkMangler, Oh, My

My manager tells me a system in the datacenter is down. I go down there, and plug in a monitor-on-a-stick and keyboard. It’s up, but no network. I try systemctl restart NetworkManager several times, and ip a shows *no* change.

Finally, I do an ifdown, followed by an ifup, and everything’s wonderful.

My manager thinks that the NM daemon thinks everything’s fine, and there’ve been no changes, so it does nothing. He suggests that it might have to be stopped, then started, rather than restarted.

This is completely unacceptable behavior, since it leaves the system with no network connection. Pre-systemd, as we all know, restart *RESTARTED* the damn thing.

Is there some Magic (#insert “pixie-dust-sparkles”) incantation, either restarting NetworkManager or using nmcli, to force it to perform the expected actions?

Btw, if this is supposed to be part of the “hide stuff, desktop Linux users don’t need to know this stuff”, this is a *much* worse result.

mark (and yes, my manager’s truly aggravated about this, also)

34 thoughts on - CentOS 7, Systemd, NetworkMangler, Oh, My

  • I’d be interested in the journal from the NetworkManager restart, as that’s not the way it behaves. It uses the netlink API to get state rather than its own internal tracker of state (i.e. doing an ip link down will be reflected in nmcli output). A restart of NetworkManager should not ignore interfaces but rather bring the system to the on-disk configured state, and a quick check shows the unit file doesn’t override restart to do a reload or similar instead.

    And indeed a quick test in a VM shows nmcli device status correctly changing between connected and unavailable when doing ip link set eth0 down/up.

    Do note that on a NM based system ifup and ifdown are effectively aliases to nmcli conn down and nmcli conn up

    nmcli conn down “connection name” will make it disconnected; nmcli conn up “connection name” will bring it back to connected.

    There is a slightly interesting difference between using nmcli and ip link set, though.

    With ip link set down the interface is marked administratively down (as if you’ve pulled the cable), but nmcli conn down “connection name” will unconfigure the interface while leaving it in an UP state, just without an IP address etc.

    anyway that’s just an interesting diversion on behavioural differences

    NM won’t change an interface state without some sort of event though (manual or virtual cable pulled etc), and if you have a case where it *has* done that then you have found a bug that would be great to get reported.

    TL;DR: cannot reproduce, need logs to determine what happened without a working crystal ball
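
The checks described in this comment could be scripted roughly as follows. This is only a sketch: the helper name devices_not_connected is made up for illustration, and it assumes nmcli's terse output (nmcli -t -f DEVICE,STATE device status), which prints one device:state pair per line. The canned input at the end is invented sample data so the snippet runs on a machine without NetworkManager.

```shell
#!/bin/sh
# Hypothetical helper: read "device:state" lines on stdin (the format
# printed by `nmcli -t -f DEVICE,STATE device status`) and print every
# device whose state is not exactly "connected".
devices_not_connected() {
    while IFS=: read -r dev state; do
        [ -n "$dev" ] || continue
        [ "$state" = "connected" ] || printf '%s %s\n' "$dev" "$state"
    done
}

# On a live system (needs NetworkManager running):
#   nmcli -t -f DEVICE,STATE device status | devices_not_connected
#   journalctl -u NetworkManager -b --no-pager > nm-journal.txt

# Demo with invented sample input:
printf 'em1:unavailable\nem2:connected\n' | devices_not_connected
```

Running the live command before and after a restart, and saving the journal alongside it, gives exactly the kind of evidence a bug report would need.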

  • there’s a really good solution to this.

    yum remove NetworkManager*

    chkconfig network on

    service network start

    And yes, that’s all under Fedora 25 and CentOS 7.

    Works like a charm.

    Sometimes removing NM leaves resolv.conf pointing to the NetworkManager directory, and it’s best to check this and replace your resolv.conf link with a file with the correct settings.

    Sorry if this upsets the people who maintain network mangler, but it’s inappropriate on a server.

    regards peter
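
The resolv.conf check mentioned above can be done by hand with ls -l, or with a small helper like the one below. This is a sketch only: describe_resolv_conf is a made-up name, and it simply reports whether the path is a symlink, a regular file, or missing. Where a leftover link would point, if anywhere, depends on the system.

```shell
#!/bin/sh
# Hypothetical helper: report whether a resolv.conf path is a symlink
# (and where it points), a regular file, or missing.
describe_resolv_conf() {
    path=${1:-/etc/resolv.conf}
    if [ -L "$path" ]; then
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -f "$path" ]; then
        printf 'regular file\n'
    else
        printf 'missing\n'
    fi
}

describe_resolv_conf /etc/resolv.conf
```

If it reports a symlink into a directory that no longer exists, replacing the link with a plain file is the fix the comment describes.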

  • This is terribly bad advice I’m afraid …

    https://access.redhat.com/solutions/783533

    The legacy network service is a fragile compilation of shell scripts (which is why certain changes, like some bonding or tagging alterations, require a full system restart or very careful manual unpicking with ip) and is effectively deprecated in RHEL at this time: it receives major bug fixes only, with no feature work.

    You really should have a read through this as well:

    https://www.hogarthuk.com/?q=node/8

    On EL6 yes NM should be removed on anything but a wifi system but on EL7 unless you fall into a specific edge case as per the network docs:

    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Networking_Guide/index.html

    you really should be using NM for a variety of reasons.

    Incidentally Mark, this had nothing to do with systemd … I wish you would pick your topics a little more appropriately rather than tempting the usual flames.

  • What’s in /etc/sysconfig/network-scripts/ifcfg-? Does it say NM_CONTROLLED=no?

    “systemctl restart NetworkManager” completely stops the service and starts it again.

    Still does.

  • James Hogarth wrote:

    From journalctl, I see this happening when I do systemctl restart NetworkManager (much edited)
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7755] manager: (em1): new Ethernet device (/org/freedesktop/NetworkManager/Devi
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7791] ifcfg-rh: add connection in-memory (79d3ed9d-cc41-498c-9169-44320e332f68,
    Feb 13 09:47:52 systemd[1]: Started Hostname Service.
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7797] device (em1): state change: unmanaged -> unavailable (reason 'connection-
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7805] device (em1): state change: unavailable -> disconnected (reason 'connecti
    < ...>
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7986] device (em1): state change: disconnected -> prepare (reason 'none') [30 4
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.7999] policy: set 'em1' (em1) as default for IPv6 routing and DNS
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.8027] device (em1): state change: prepare -> config (reason 'none') [40 50 0]
    Feb 13 09:47:52 NetworkManager[67312]: [1486997272.8034] device (em1): state change: config -> ip-config (reason 'none') [50 70 0]
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3594] device (em1): state change: ip-config -> ip-check (reason 'none') [70 80
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3661] device (em1): state change: ip-check -> secondaries (reason 'none') [80 9
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3666] device (em1): state change: secondaries -> activated (reason 'none') [90
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3667] manager: NetworkManager state is now CONNECTED_GLOBAL
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3670] manager: NetworkManager state is now CONNECTED_SITE
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3670] manager: NetworkManager state is now CONNECTED_GLOBAL
    Feb 13 09:47:53 nm-dispatcher[67317]: req:2 'connectivity-change': new request (6 scripts)
    Feb 13 09:47:53 nm-dispatcher[67317]: req:2 'connectivity-change': start running ordered scripts...
    Feb 13 09:47:53 NetworkManager[67312]: [1486997273.3697] device (em1): Activation: successful, device activated.

    Note there is no IP address being obtained. Now, when I run ifdown/ifup:

    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6804] device (em1): Activation: starting connection 'em1' (c432eaa1-023b-4f1f-a
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6809] audit: op="connection-activate" uuid="c432eaa1-023b-4f1f-a7b5-4605ec07195
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6810] device (em1): state change: disconnected -> prepare (reason 'none') [30 4
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6811] manager: NetworkManager state is now CONNECTING
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6816] device (em1): state change: prepare -> config (reason 'none') [40 50 0]
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6858] device (em1): state change: config -> ip-config (reason 'none') [50 70 0]
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6869] dhcp4 (em1): activation: beginning transaction (timeout in 45 seconds)
    Feb 13 09:48:17 NetworkManager[67312]: [1486997297.6900] dhcp4 (em1): dhclient started with pid 67715
    Feb 13 09:48:17 dhclient[67715]: DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 6 (xid=0x745ba623)
    Feb 13 09:48:17 dhclient[67715]: DHCPREQUEST on em1 to 255.255.255.255 port 67 (xid=0x745ba623)
    Feb 13 09:48:17 dhclient[67715]: DHCPOFFER from
    Feb 13 09:48:17 dhclient[67715]: DHCPACK from (xid=0x745ba623)

    And it then gets an IP address.

    And looking at /var/log/messages, it *appears* that the restart never invokes the dhclient script, while ifup does.

    mark
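
The difference between the two excerpts (no dhcp4 lines after the restart, several after ifup) is easy to check mechanically on a full journal. A minimal sketch, assuming the journal text is piped in on stdin; count_dhcp4_lines is a made-up helper, and the two sample inputs are abridged from the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: count the NetworkManager "dhcp4 (IFACE):" lines
# in journal text read from stdin. Zero means no DHCPv4 transaction was
# logged for that interface.
count_dhcp4_lines() {
    iface=$1
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -F "dhcp4 ($iface):" || true
}

# Lines abridged from the restart excerpt:
printf '%s\n' \
    "[1486997272.8034] device (em1): state change: config -> ip-config" \
    "[1486997273.3697] device (em1): Activation: successful, device activated." |
    count_dhcp4_lines em1

# Lines abridged from the ifdown/ifup excerpt:
printf '%s\n' \
    "[1486997297.6869] dhcp4 (em1): activation: beginning transaction (timeout in 45 seconds)" \
    "[1486997297.6900] dhcp4 (em1): dhclient started with pid 67715" |
    count_dhcp4_lines em1
```

On a live system the input would come from something like journalctl -u NetworkManager -b --no-pager, narrowed to the time window of interest.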

  • peter.winterflood wrote:
    That’d be a 100% agreement, good buddy…. We may have done it on some systems, but in general, we appear to be stuck with the damn thing.

    And why the *hell* would a server want wifi enabled, or avahi-daemon running by default?

    mark

  • Gordon Messmer wrote:
    Good catch. No, it doesn’t say no… because the line was commented out. I’ve just uncommented it, and set it to yes.

    mark

  • Commented out should be the same as =yes. Only =no will cause it to be managed by the old sysconfig scripts, unless I’m mistaken. As Johnny suggested, ONBOOT=no is another option that could be problematic.

    Your log was a little too edited. Some of the early lines were incomplete, so it’s hard to determine what’s going on. Maybe just send ifcfg-em1?
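
The rule described in this comment (only an explicit NM_CONTROLLED=no hands the interface to the old scripts, while a commented-out or missing line counts as yes) can be expressed as a small check. This is a sketch of that rule as stated here, not of NetworkManager's actual parser; ifcfg_value and nm_manages are made-up names, and the temporary file stands in for a real ifcfg file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from an ifcfg-style file,
# ignoring commented-out lines and surrounding quotes; print DEFAULT
# when the key is not set.
ifcfg_value() {
    file=$1
    key=$2
    default=$3
    val=$(sed -n "s/^[[:space:]]*$key=//p" "$file" | tail -n 1 | tr -d "\"'")
    printf '%s\n' "${val:-$default}"
}

# Succeeds unless the file explicitly says NM_CONTROLLED=no.
nm_manages() {
    [ "$(ifcfg_value "$1" NM_CONTROLLED yes)" != "no" ]
}

# Demo: a commented-out NM_CONTROLLED line is treated like "yes".
tmp=$(mktemp)
printf '#NM_CONTROLLED=no\nONBOOT=yes\n' > "$tmp"
if nm_manages "$tmp"; then
    echo "managed by NetworkManager"
else
    echo "left to the network scripts"
fi
rm -f "$tmp"
```

The same ifcfg_value helper can be pointed at ONBOOT to check the other setting mentioned above.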

  • Gordon Messmer wrote:

    NAME="em1"
    DEVICE="em1"
    ONBOOT=yes
    NETBOOT=yes
    NM_CONTROLLED="yes"
    UUID="c432eaa1-023b-4f1f-a7b5-4605ec07195b"
    BOOTPROTO=dhcp
    TYPE=Ethernet
    SEARCH="nih.gov"

    IPV6INIT="yes"
    DHCPV6C="yes"

    DEFROUTE=yes
    PEERDNS=yes
    PEERROUTES=yes
    IPV4_FAILURE_FATAL=no
    IPV6_AUTOCONF=no
    IPV6_DEFROUTE=yes
    IPV6_FAILURE_FATAL=no

    mark

  • That is your opinion .. and there are thousands of engineers from almost every major Linux distro who disagree with you.

    I am personally fine if people want to turn off NM .. but that is not what any of the Enterprise distros are doing.

    Opinions are fine .. I sometimes turn off NM as well .. and for some cases it is best.

    But as Linux installs become more and more complicated and it is not some individual machines in a rack but clouds, clusters, and containers with software defined networking and individual segments for specific applications spread out within the network, only talking to one another .. etc. Well, NM will be much more important.

    I get it .. but no one needed a hand held cell phone before 1973 and no one needed a smart phone before 2007. Now, almost everyone has a smart cell and land lines are dying. Technology moves forward. People want integrated cloud, container, SDN technology, etc. Used a VCR or Cassette Player lately?

  • Johnny Hughes wrote:

    I have no intention of *ever* getting an annoyaphone – I’m online all day at work, before I go to work, and most evenings, in front of a *real* computer. My cell’s a flipphone, and I *LOATHE* texts… because the protocol was developed for freakin’ pagers, and after a job 20 years ago, I don’t EVER want that again.

    And my land line phone has *much* better voice quality than any cell/mobile.

    And yes, I very happily have my VCR, for all the tapes I have, and a good dual cassette deck (OK, I do want to burn them to disk… along with my 200-300 vinyl records… oh, that’s right, vinyl’s coming back).

    mark, who’s older than a lot of you

  • All due respect, when we drop KISS it is rarely a good thing.

    Issue I am dealing with right now – all my VMs with linode are CentOS 7.

    Three of them are nameservers, I have to run my own because some of my sites – I use certificate authorities but do not trust them, DNSSEC with DANE is a must, and with DNSSEC the only way to make sure I’m the only one with access to the private signing key is to manage the zone files myself.

    One of the VMs (in London data center) was recently migrated to a different machine, I think because of a bad fan in the server.

    NSD never properly came up. After investigation, it is because the IPv6 address changed.

    Trying to figure out why the IPv6 address changed has been a nightmare.

    Linode support suspects the reason is because the VM is using slaac private to request the IP address instead of slaac hwaddr – and suggested that I change the /etc/dhcpcd.conf file.

    Well CentOS 7 doesn’t use that, and trying to figure out where in the mess of /etc/sysconfig/network-scripts the problem is occurring has caused me much frustration.

    Why the bleep can’t stuff like this be simple KISS with simple key=value configuration files?

    So for now, that particular nameserver is only IPv4 until I figure it out, and modifying the network scripts to try and figure out how to fix it raises my blood pressure because if a modification causes the IPv4 not to work, recovering becomes a real PITA.
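
For readers unfamiliar with the two modes Linode support mentions: with hardware-address-based SLAAC, the low 64 bits of the address (the interface identifier) are derived from the MAC address by the modified EUI-64 rule, so they stay the same for as long as the MAC does. The privacy-style modes derive them from other inputs. A minimal sketch of the EUI-64 derivation follows; mac_to_eui64 is a made-up helper and the MAC address is an arbitrary example, not one from this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn a MAC address (aa:bb:cc:dd:ee:ff) into the
# modified EUI-64 interface identifier used by hardware-address SLAAC:
# flip the universal/local bit (0x02) of the first octet and insert
# ff:fe between the third and fourth octets.
mac_to_eui64() {
    old_ifs=$IFS
    IFS=:
    # Split the MAC on ':' into the positional parameters $1..$6.
    set -- $1
    IFS=$old_ifs
    printf '%02x%s:%sff:fe%s:%s%s\n' "$(( 0x$1 ^ 2 ))" "$2" "$3" "$4" "$5" "$6"
}

mac_to_eui64 00:11:22:33:44:55   # prints 0211:22ff:fe33:4455
```

NetworkManager exposes this choice as the ipv6.addr-gen-mode connection property (eui64 or stable-privacy) in versions that have it; whether the version on a given CentOS 7 host does is something to verify rather than assume.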

  • As far as me not trusting certificate authorities – I read a Netcraft report a year ago or so that estimated about 100 fraudulent TLS certificates that browsers accept as valid are issued every month.

    PKI is seriously broken, it depends upon trusting certificate authorities that have repeatedly demonstrated they put profit over proper validation before issuing certificates.

    DNSSEC + DANE is the only viable solution, and DANE really only is secure when you know no one else has access to the private KSK and ZSK, and that pretty much means running your own authoritative nameservers, where a stable IP address is a must and VMs like what linode offers are the most cost effective way of making sure you have enough in geographically diverse locations.

    It’s a shame that Network Manager makes things so difficult; dhcp is how VM hosting services assign the IP addresses, and they really shouldn’t change.

  • DHCPv6 is really unusual. IPv6 addressing and routing is set up almost entirely in the kernel, unless you’re using static addresses. IPv6 is neither harder nor easier with NetworkManager, in my experience.

  • Too much temptation to resist, I don’t know which one of us is older but I have a feeling it’s a “horse race”. Like you, I still have a land line, WiFi is too slow and “WiFi security” seems to be an oxymoronic phrase. Why people text (or IM for that matter) anything other than a one-liner is beyond me.

    Now for the real issue, what happens when Network Manager (Systemd, journald, etc.) breaks? Who is going to fix it? Hiding the complexity in software effectively dumbs us down leaving us helpless when problems surface. Anyone who has worked with Microsoft understands – give me the command prompt any day rather than layers of GUI hiding those possibly cryptic but also possibly useful messages.

  • The people who are going to fix it are people who have RHCE certs and/or computer science degrees who work for the companies running Linux.

    And I am a few years old myself.

  • Once upon a time, Gordon Messmer said:

    Not sure about the version in CentOS, but in Fedora, NM disables kernel IPv6 autoconfiguration and “handles” it itself. This means that when I wake up my desktop from sleep, it can take 10-60 seconds to get working IPv6 (vs. the second or so it took the kernel). Progress…

  • Yes, stepping up to CentOS 7 reminded me of MacOS Server, which I had to help my Professor maintain. For the most part it (MacOS Server) worked and everything was self-evident, but when it didn’t, you finally had to open their huge doc book, only to discover that it merely explains, mostly in pictures, how to navigate through their GUI menus. Each section ended with something like “and you are done”. There were no descriptions of errors or of what to do when one occurs, and unexpected errors were the very reason we had opened the documentation. (We finally agreed that no matter how huge the book is, documentation does not exist.) My start with CentOS 7 to some extent reminded me of this MacOS Server experience ;-) No, not the absence of documentation, but the attitude of making everybody use a GUI. Exactly as you notice. I bet many users were lost by Linux then…

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • Robert Nichols wrote:
    ‘Fraid I have a lot of sympathy with Robert. When something here breaks, we – me, the other admin, and our manager – are the ones who have to figure it out asap. We do have a few RH licenses… but even so, even if we *were* paying for a 4-hr response, that’s not soon enough….

    mark “I have enough problems with user teams that have *multiple* levels of symlinks….”

  • Sometimes on this list I get the impression that I’ve downloaded an entirely different release of CentOS 7 to other people.

    Exactly what GUI do you ever have to use with CentOS7? systemd all in has caused me remarkably little bother, getting on and doing what it’s told. I had some logind glitches, but those were fixable. I configure the lot with puppet, and to be honest found C7 pretty pain free as an upgrade. For various reasons, real happiness didn’t arrive until 7.2, but then lots of that was due to nvidia driver behaviours with Gnome3 that I suspect most people don’t have to worry about.

    But complaining that CentOS 7 is GUI driven I find baffling.

    jh

  • Exactly.

    If I install CentOS-7 on a desktop, I use gui things. If I install CentOS-7 on a server, I never install gui things (unless I am doing it for someone who specifically asks for that).

    nmcli allows you to do anything you would do in a NM GUI.

    But the real bottom line is .. this is not the place where any of that could be changed anyway. CentOS is a rebuild of RHEL source code .. if RHEL does it, so do we.

    The other thing is .. CentOS-6 has security support until 30 Nov 2020, so no one has to upgrade to CentOS-7 or systemd for 3.75 more years. If you like the older things, use CentOS-6. If you want the new things, use CentOS-7.

  • Mark actually gets his hands dirty running the systems (on C7). He has a valid point which worries me – Red Hat’s gradual imitation of Micro$oft’s aversion to ordinary people understanding and controlling their systems.

    Luckily some of us remain on C6 because we love simplicity and stability. When C6 expires some will migrate to BSD rather than face C7’s persistent difficulties and confusion.

  • My VCR broke. Replaced it with a DVD/HDD & USB3 unit. Replaced cassette player and tape recorders with broadcast quality handheld recorder DR-100mk3 and an amazingly good Sony PX440.

    Still retain the original functionality. C7 doesn’t retain all the original functionality :-)

  • And no, he doesn’t have a point, because that’s nonsense.

    And of course, with the subject chosen, this whole thread went up in flames rather than being constructive.

    Can we just kill this now and, if there is actually something wrong, have a fresh thread with diagnostics?

  • Always Learning wrote:

    But how do you play all your old VCR tapes? As I said, I want to burn them to disk, but I still have a working VCR.

    mark

  • I converted my video tapes (the ones I taped myself, not movies I purchased on tape: those just went in the garbage, as the law here does not allow you to transfer purchased copyrighted videos to a different carrier) into DVDs (with the poorer quality that a VCR has).

    What I needed was a video card with video capture capability and a piece of software. Confession: I did it in Windows (2000, probably); the card was an ATI Radeon (something) that had video (and audio) inputs and came with capture software. You can also find a stand-alone video capture box that you can feed from a VCR. Once you have MPEG video files, it is trivial to convert them to DVD structure. For that I used ffmpeg and dvdauthor (both run on Linux or FreeBSD).

    I hope this helps.

    Valeri

  • ugh, the video quality of VHS is *so* nasty, I don’t WANT to play those old tapes any more. I do have a still working Hi8 VCR I’ve used to convert some of our old camcorder tapes to digital (burned onto DVDs and/or converted to MP4 files); the quality on that was a good notch better than VHS. I connect the s-video output of the deck to a USB dongle (from Hauppauge), and run a pile of MS windows software to suck in the tape and convert the results to useful formats.

    My old cassette deck (a Denon) is still plugged into my stereo, I don’t think I’ve used it once in 10 years.

  • I converted all of them to DVDs several years ago.

    Like you I still have vinyl disks, 33 rpm and 45 rpm, from the late 1960s and early 1970s. Although a classical music fan, some of the old singles are evocative classics in their own right. I need to convert them.

    Paul.

    P.S. Landlines = better quality than mobiles. Non-Smart Phones can’t get hacked or mics and cameras turned-on remotely. Prefer my Canon SX40 and Nikon D7100 to any Smart Phone. Wifi has guest zones but is usually disabled.
