Dbus/systemd Failure On Startup (CentOS 7.7)

We are seeing a problem on roughly 5% of CentOS 7.7 reboots: systemd gets a ‘Connection timed out’ error talking to D-Bus just after the D-Bus service starts. From ‘journalctl -x’:

… Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System Message Bus.
— Subject: Unit dbus.service has finished start-up
— Defined-By: systemd
— Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
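
To check whether a given boot hit this failure, the journal can be filtered for the timeout message. The following is a minimal sketch: the helper name find_dbus_timeouts is hypothetical, and the pattern assumes only that the line is logged by systemd[1] and contains the ‘Connection timed out’ text quoted above. The exact wording of the full message is not assumed.

```shell
#!/bin/sh
# Hypothetical helper: reads journal text on stdin and prints the lines
# where systemd (PID 1) reports "Connection timed out".
find_dbus_timeouts() {
    grep -E 'systemd\[1\]: .*Connection timed out'
}

# Usage on a live system (current boot only):
#   journalctl -b -x | find_dbus_timeouts
```

Because the failure is intermittent, running this after each reboot gives a rough count of how often it occurs.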

6 thoughts on - Dbus/systemd Failure On Startup (CentOS 7.7)

  • I see such issues on a fairly large multi-user system, but when this happens, after forced restarts for kernel updates, I usually don’t have the time to analyze it and play doctor. My “solution” now is to simply reboot the server again in such a case, AKA the systemd way :-)

    I think the root of the problem is that there are missing definitions in some of the systemd unit files. They allow things to work in 95% or more of cases, but this happens by chance, not because of perfect process handling and system control. Small delays somewhere or uncommon system environments then lead to intermittent failures which are difficult to diagnose, at least for me.

    The good news is that you can fiddle with the systemd unit files the same way we fiddled with init scripts in the past. That way you can use trial and error until you find a solution. It doesn’t sound like being in full control of things, but it is better than not finding a solution at all.

    Regards, Simon

  • Simon Matter via CentOS wrote:

    Yeah, we found that introducing a small delay before the ExecStart in the dbus.service unit, even a delay of just 0.01 seconds (via ‘ExecStartPre=/usr/bin/sleep 0.01’), _seems_ to work around the issue …

    However, we would still like to know what the issue is and get a ‘real’ fix. I guess we could try creating a bug report with Red Hat …

    Thanks

    James Pearson

  • Nice that you found at least a workaround. I think I remember that dbus is quite special here because systemd starts it but also depends on it. At least I remember cases where dbus went crazy for whatever reason: the result was that systemd became completely unresponsive and unmanageable and the whole system went down the drain, slowly but steadily. Ever tried to shut down a box when systemd doesn’t listen to you anymore? The perfect Windows experience on Linux ;-)

    By bug report, do you mean a BZ or a support request as a paying RHEL customer?

    Unfortunately I’m not too happy anymore with how BZs are handled these days. Am I alone in this feeling?

    Regards, Simon

  • Simon Matter wrote:

    A BZ …

    I’ve had mixed results with BZs. It appears that if a bug ‘tickles the fancy’ of someone at Red Hat who sees the ticket, then you can get good results; otherwise, they just sit there until the release goes out of support and they get dropped :-)

    James Pearson

  • James Pearson wrote:

    We’ve managed to work out what the problem is – it is the same issue as given in https://bugzilla.redhat.com/show_bug.cgi?id31486

    We have a legacy use of NIS for groups – which can cause a boot time deadlock:

    systemd->dbus->nis(glibc)->rpcbind->systemd

    A workaround is given in https://access.redhat.com/solutions/3900301 (account needed to view), but it essentially just reverts the changes made to /usr/lib/systemd/system/rpcbind.socket between 7.5 and 7.6

    James Pearson

  • Starting with CentOS-8 Stream, you will be able to fix issues like this yourself and then submit a pull request for review to get it rolled into CentOS Stream and then into RHEL proper.

    Also, you can figure out what is wrong and submit the fix WITH the BZ. I mean, that is why the CentOS community exists: to submit community fixes.
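
Two findings from the thread lend themselves to a short shell sketch: checking whether groups are resolved through NIS (the precondition for the systemd->dbus->nis(glibc)->rpcbind->systemd deadlock), and applying the interim ‘ExecStartPre=/usr/bin/sleep 0.01’ delay as a drop-in rather than editing the packaged dbus.service. The function names, the drop-in file name 10-startup-delay.conf and the optional root-directory argument are assumptions made for illustration. The delay only appeared to help in the thread, so treat it as a stopgap and not as a replacement for the rpcbind.socket change.

```shell
#!/bin/sh
# Sketch only: helper names and the drop-in file name are hypothetical.

# Succeeds (exit 0) if the "group:" line of an nsswitch.conf lists "nis".
# $1: path to nsswitch.conf (defaults to /etc/nsswitch.conf)
uses_nis_for_groups() {
    conf="${1:-/etc/nsswitch.conf}"
    # Strip comments, keep the group: line, then match "nis" as a whole word
    # (so "nisplus" does not count).
    sed -e 's/#.*//' "$conf" | grep -E '^[[:space:]]*group:' | grep -Eqw 'nis'
}

# Writes a drop-in that adds a 0.01 s delay before dbus.service starts.
# $1: root directory to write under (defaults to the live system root)
write_dbus_delay_dropin() {
    root="${1:-}"
    dir="$root/etc/systemd/system/dbus.service.d"
    mkdir -p "$dir" || return 1
    printf '%s\n' '[Service]' 'ExecStartPre=/usr/bin/sleep 0.01' \
        > "$dir/10-startup-delay.conf"
}

# Usage on a live system, as root:
#   uses_nis_for_groups && echo "groups are resolved via NIS"
#   write_dbus_delay_dropin && systemctl daemon-reload
```

A drop-in under /etc/systemd/system survives package updates, whereas direct edits to files under /usr/lib/systemd/system can be overwritten when the package is updated. Passing a scratch directory as the first argument to write_dbus_delay_dropin allows a dry run without touching the live system.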