Server Entering Emergency Shell, But Continues Fine After Pressing Enter

Home » CentOS » Server Entering Emergency Shell, But Continues Fine After Pressing Enter

September 9, 2020 Quinn Comendant CentOS 14 Comments

Hello all,

I’ve got an odd problem that doesn’t seem to be mentioned anywhere.

I have several identical CentOS 7 servers (GCE instances). I recently ran `yum update` and rebooted all of them. All the servers came back fine except one. I opened a connection to the serial console of the broken server, and was greeted with this prompt:

…
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.

I pressed Enter, and the boot process continued successfully! Just to test, I restarted the server again, and the same thing happened: I had to manually log in to the console and press Enter before it would complete the boot process. No other step was required.

I also tried creating a snapshot of the disk, and booted a new VM with a boot disk imaged from the snapshot. The same problem occurred, pressing Enter was all that was required.

Earlier in the boot log, it shows “Started Emergency Shell”, which is why there was a “Press Enter” prompt. The error that the “root account is locked” isn’t the issue; the root account is locked on all our servers.

So, in summary: something is causing the server to enter an Emergency Shell, but continues successfully after pressing Enter. That’s odd because if Emergency Shell is loaded, usually something more serious is happening and requires more actions to get the server to boot.

Anybody have any idea what could be causing this?

I don’t see any significant errors in the boot log, but I would appreciate if anyone has a moment to help me look for issues. Here’s a copy of the serial console boot log – you can find the “Press Enter to continue” on line 536: https://write.as/dwuts24dcw6yh0kf.txt

Thanks!

Quinn

14 thoughts on - Server Entering Emergency Shell, But Continues Fine After Pressing Enter

Thomas Bendler says:

September 10, 2020 at 3:06 am

Hi Quinn,

Am Do., 10. Sept. 2020 um 04:49 Uhr schrieb Quinn Comendant < quinn@strangecode.com>:

If I’m not mistaken, problems after UTMP point to problems with X/ hardware configuration. So I guess you might find more information when you also have a look at the log files of systemd.

Kind regards Thomas
—
Linux … enjoy the ride!
Quinn Comendant says:

September 10, 2020 at 10:57 am

Hi Thomas,

I don’t see any hardware issues. Here’s the output from `journalctl -p 5 -xb`: https://write.as/2vjgz6pfmopg7fnf.txt The time of the last interruption during boot was at Sep 10 15:01:46.

Thanks, Quinn
Johnny Hughes says:

September 10, 2020 at 12:42 pm

I had similar issue on 7.6 – the LVM timeouts were too short and it was timing out as we had a lot of multipath devices. I don’t see any hardware issues. Here’s the output from `journalctl -p 5 -xb`: https://write.as/2vjgz6pfmopg7fnf.txt The time of the last interruption during boot was at Sep 10 15:01:46.

Thanks,

Quinn
Quinn Comendant says:

September 10, 2020 at 2:48 pm

Hi Strahil,

I don’t see any timeout errors in the boot log or output from journalctl -xb.

I’ve tried increasing the timeout by updating the Grub config with `mount.timeout=300s`, is that the correct way to increase LVM timeouts? It has no effect.

Quinn
Quinn Comendant says:

September 11, 2020 at 6:51 pm

Update: I found a workaround to prevent entering emergency shell during boot for no reason. I’ve simply cleared the `OnFailure=` option for initrd-parse-etc.service (which was previously set to `OnFailure=emergency.target`).

Now the server boots successfully without dropping into an emergency shell.

This is a total hack, and I’m a little embarrassed that it’s the only solution that I’ve found.

As I mentioned earlier, there are no errors printed in the boot or systemd logs, so I don’t know what is actually failing. Well, at least now I know that it is `initrd-parse-etc.service` that is failing, but I don’t know why. Does anyone know what initrd-parse-etc.service does? Or have suggestions how to troubleshoot that unit specifically?

Thanks, Quinn
Gordon Messmer says:

September 11, 2020 at 7:23 pm

Run “systemctl daemon-reload && echo success” and verify that it reports success, and not errors.

Check the output of “systemctl status initrd-cleanup” too.
Quinn Comendant says:

September 11, 2020 at 7:29 pm

Those have always reported success (even before I removed the OnFailure option):

[~] sudo systemctl daemon-reload && echo success success
[~] sudo systemctl status initrd-cleanup
● initrd-cleanup.service – Cleaning Up and Shutting Down Daemons
Loaded: loaded (/usr/lib/systemd/system/initrd-cleanup.service; static; vendor preset: disabled)
Active: inactive (dead)

Sep 11 23:34:01 durian systemd[1]: Starting Cleaning Up and Shutting Down Daemons… Sep 11 23:34:01 durian systemd[1]: Stopped Cleaning Up and Shutting Down Daemons.
Simon Matter says:

September 12, 2020 at 10:08 am

Hi,

I’m wondering what the proper solution is in this case. One thing I
learned in the past and can also be learned from the list archives is that a lot of issues exist with systemd but almost never one really finds a good solution to fix the problem.

In most cases ugly hacks and workarounds are used but no real fix is available. IMHO it’s in no way better than old days SysVinit hacking :-)

Regards, Simon
Gordon Messmer says:

September 12, 2020 at 1:41 pm

In that case, I’d revert the change you made, unlock the root account so that you can use the emergency shell, let the system boot to an emergency shell, and collect the output of “systemctl status initrd-parse-etc.service” and “journalctl -b 0”.

(You can still do that in the VM, right?)
Quinn Comendant says:

September 12, 2020 at 3:05 pm

Ok, I was able to log in as root in the emergency shell.

I don’t see any errors from `systemctl status initrd-parse-etc.service` or
`journalctl -b 0` (I’ve pasted the full output here: https://write.as/at21opjv3o9fin1t.txt)

However, if I run `systemctl list-units –failed` it says initrd-switch-root.service is failed. Status of that service:

[root@myhost ~] systemctl status initrd-switch-root.service
● initrd-switch-root.service – Switch Root
Loaded: loaded (/usr/lib/systemd/system/initrd-switch-root.service; static; vendor preset: disabled)
Active: failed (Result: signal) since Sat 2020-09-12 19:41:13 UTC; 17min ago
Process: 204 ExecStart=/usr/bin/systemctl –no-block –force switch-root /sysroot (code=killed, signal=TERM)
Main PID: 204 (code=killed, signal=TERM)
Sep 12 19:41:13 durian systemd[1]: Starting Switch Root…

But the logs don’t show any errors:

[root@myhost ~] journalctl -u initrd-switch-root.service
— Logs begin at Sat 2020-09-12 19:41:07 UTC, end at Sat 2020-09-12 19:41:19 UTC
Sep 12 19:41:13 durian systemd[1]: Starting Switch Root…

That’s it – just “Starting Switch Root” and nothing more…
Gordon Messmer says:

September 12, 2020 at 3:41 pm

I see errors in the journalctl output. Look into these:

Sep 12 19:41:12 myhost systemd-vconsole-setup[84]: /usr/bin/setfont failed with error code 71. Sep 12 19:41:12 myhost systemd-vconsole-setup[84]: ………………. Sep 12 19:41:12 myhost systemd-vconsole-setup[84]: setfont: putfont: 512,8×16: failed: -1
Sep 12 19:41:12 myhost systemd-vconsole-setup[84]: putfont: PIO_FONT: Invalid argument Sep 12 19:41:12 myhost systemd-vconsole-setup[117]: /usr/bin/setfont failed with error code 71.

Sep 12 19:41:13 myhost systemd[1]: [/etc/systemd/system.conf:15] Unknown lvalue ‘TimeoutSec’ in section ‘Manager’

The former errors might be normal on a system with no VGA, and you mentioned seeing this over serial. Do you see those errors on systems that boot normally? Do any of those system share a hardware configuration with the system that doesn’t?

The latter error seems more likely to be the cause of the emergency shell, but that’s a guess.
Quinn Comendant says:

September 12, 2020 at 4:56 pm

The `setfont` errors also occur on other VMs that do boot successfully. Their hardware configuration only differs in the amount of disk space and RAM assigned.

Oh, that was a change I made just during the last reboot in an attempt to speed up the reboot process (server gets stuck on “A stop job is running for LSB…” for 5min after issuing a reboot command). It’s not the cause.

Thanks, Quinn
Gordon Messmer says:

September 12, 2020 at 5:34 pm

Ah, this seems to be a known issue with the update to 7.8:
https://access.redhat.com/solutions/4973191

The initrd contains an old systemd binary, and should be rebuilt. The replacement initrd will have an updated binary:

|# cp /boot/initramfs-*.x86_64.img /root ||# dracut –force –regenerate-all|
||
Quinn Comendant says:

September 12, 2020 at 6:05 pm

Yessssss! That’s it!

The Diagnostic Steps confirmed that issue affected this server, and applying the resolution fixed it.