A CentOS 7 box here recently started taking a very long time (25 seconds!) to show the Password prompt when using sudo. I eventually tracked this down to the following complaint in the kernel message log:
Since fprintd is the fingerprint reader daemon and this server doesn’t have such a peripheral, I removed the service. Now sudo is fast again. [*]
Although that solves my immediate problem, while debugging all of this I noticed substantially the same error message also appearing for libvirtd:
I don’t want to employ the same solution in this case (i.e. disable libvirtd) because although I don’t yet use KVM on that server, I plan to do so soon. I could host those VMs elsewhere, but that feels like sweeping the problem under the rug.
I want to fix this, but I’m now stuck.
I’ve ruled out several possible causes already:
1. It isn’t DNS. DNS responds quickly.
2. It isn’t a flaky OS drive. Swapping the spinning rust drive out for an SSD was already on the wish list for this system, so I took the opportunity to reinstall the OS fresh on a brand new SSD. The same problem occurred shortly after booting CentOS 7.2 for the first time, and it continues to happen periodically.
The only things I’ve copied over from the old spinning drive so far are /home and /usr/local, and I can’t see how those could affect the kernel. Besides, the problem still occurs while booted into single-user mode.
3. It isn’t the RAID card. A RAID verify pass completed successfully, as did smartctl -t long tests on all of the individual drives. I even disabled the RAID card’s OPROM and blacklisted its driver at one point, just to see if the driver or RAID BIOS were the cause of the problem. I haven’t actually pulled the card yet, but that seems unlikely to solve anything, given that the array seems to be storing data reliably.
4. It isn’t the RAM. It just passed a memtest86+ run. That probably exonerates the CPU and north bridge, too. (I used v4.20 from the CentOS 7.2 install ISO.)
5. It isn’t stale packages. The OS reinstall proves that. It’s currently running the tip of CentOS 7.2 and still showing the same symptoms.
6. It isn’t high-uptime kernel space cruft, since it’s been rebooted a bunch of times during all of this, including several hard reboots.
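For anyone who wants to repeat the DNS check from item 1, here is a minimal sketch of how lookup latency could be timed. The helper name `time_lookup` and its `resolver` parameter are my own inventions for illustration; the only real API used is Python's `socket.getaddrinfo`. It times the machine's own hostname as well as localhost, since a slow lookup of the local hostname is a commonly reported cause of sudo delays.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one name lookup; return (seconds_elapsed, succeeded).

    `resolver` is a hypothetical hook so the function can be exercised
    without touching the network; by default it uses the system resolver.
    """
    start = time.monotonic()
    try:
        resolver(hostname, None)
        succeeded = True
    except socket.gaierror:
        # Lookup failed (unknown name, resolver unreachable, etc.).
        succeeded = False
    return time.monotonic() - start, succeeded


if __name__ == "__main__":
    # The names checked here are only examples: localhost plus whatever
    # this machine calls itself.
    for name in ("localhost", socket.gethostname()):
        elapsed, ok = time_lookup(name)
        print(f"{name}: {'ok' if ok else 'failed'} in {elapsed:.3f}s")
```

If every lookup comes back in milliseconds, as it did here, DNS is unlikely to explain a delay measured in tens of seconds.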
Rebooting is especially painful at the moment because the system hangs on shutdown waiting for libvirtd to respond. While I had fprintd installed, that was causing 3-minute hangs on boot, too.
[*] In case it isn’t clear how fprintd could affect sudo, it’s because fprintd installs a PAM module, and sudo uses PAM for authentication. If fprintd is stuck, the PAM call stalls until it gives up and moves on.
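To see whether a PAM stack pulls in the fingerprint module at all, something like the following sketch could scan a PAM config file's text. The function name `find_pam_module` and the sample text are made up for illustration; `pam_fprintd.so` is the module name I am assuming fprintd's PAM package installs, so adjust it if yours differs.

```python
def find_pam_module(config_text, module="pam_fprintd.so"):
    """Return (line_number, line) pairs for active lines referencing `module`.

    Comment lines and blank lines are skipped. A token matches if it is the
    bare module name or a path ending in that name.
    """
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        if any(tok == module or tok.endswith("/" + module) for tok in tokens):
            hits.append((number, line))
    return hits


# Illustrative sample only, not copied from the affected server.
SAMPLE = """\
#%PAM-1.0
auth        required      pam_env.so
auth        sufficient    pam_fprintd.so
auth        sufficient    pam_unix.so nullok try_first_pass
"""

if __name__ == "__main__":
    for number, line in find_pam_module(SAMPLE):
        print(f"line {number}: {line}")
```

On a real system, the text would come from reading the files under /etc/pam.d that the affected service includes.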
I tracked this down with strace. The 25 seconds above is the timeout value passed to poll(2); that timeout is hit while sudo (via PAM) is trying to talk to dbus, since apparently fprintd communicates via dbus.
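The failure mode itself is easy to reproduce in miniature. This is a rough sketch, not what sudo or PAM actually run: it uses a socket pair in place of the dbus connection, and a 250 ms timeout in place of the 25 seconds seen in strace. The helper name `wait_for_reply` is hypothetical. The point is only that a caller polling a peer that never answers is blocked for the full timeout before it can move on.

```python
import select
import socket
import time


def wait_for_reply(sock, timeout_ms):
    """Poll `sock` for readable data; return (got_reply, seconds_waited)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)  # blocks until data arrives or timeout
    return bool(events), time.monotonic() - start


# `silent_peer` stands in for a stuck daemon: connected, but never replying.
client, silent_peer = socket.socketpair()

got_reply, waited = wait_for_reply(client, 250)
print(f"reply={got_reply} after {waited:.3f}s")
```

Once the peer does send something, the same call returns almost immediately, which matches sudo becoming fast again after the stuck daemon was removed from the picture.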
Because the problem is in PAM and not sudo, it also affected su. It did not affect console or SSH user logins for some reason.
I’m posting these details in case this diagnosis helps someone. While chasing this, I found a bunch of postings on the net from people who have run into such problems before, but none of the threads mentioned this particular failure mode. For all I know, I’m the first person it’s ever happened to, so this needs to be in the archives.