Boot Failed On Latest CentOS 7 Update


August 1, 2020 Kay Schenk CentOS 41 Comments

Totally and completely on my HP microfiber. Wouldn’t get past anything to even get me into the grub menu. NOT AMUSED!

_______________________
Kay Schenk

41 thoughts on - Boot Failed On Latest CentOS 7 Update

david says:

August 1, 2020 at 3:04 pm

At 12:50 PM 8/1/2020, Kay Schenk wrote:

I was going to confirm the same, but the system that became unbootable was my mail system :-(

Apparently, the SHIM/GRUB bug has hit both CentOS 7 and 8.

Too bad Ubuntu is different enough that it is hard to switch.

David
Kay Schenk says:

August 1, 2020 at 4:42 pm

Well misery loves company but still…just truly unfathomable!
Time for a change.

_______________________
Sent from MzK’s phone.
Johnny Hughes says:

August 1, 2020 at 4:52 pm

On 01.08.20 at 23:41, Kay Schenk wrote:

I can only express my incomprehension for such statements!

Stay and help, instead of running away, or should I say jumping out of the frying pan and into the fire? :-)
Johnny Hughes says:

August 1, 2020 at 5:11 pm

The fact that RHEL and CentOS did not properly test updates cost me at least 3-4 full working days, plus losses at customer sites.

This is really a huge failure of RHEL and CentOS.

A lot of trust has been destroyed.
Mike McCarthy says:

August 1, 2020 at 5:42 pm

It appears that it is affecting multiple distributions including Debian and Ubuntu so it looks like the grub2 team messed up. See

https://www.zdnet.com/article/boothole-fixes-causing-boot-problems-across-multiple-linux-distros/

Mike
Laack, Andrea says:

August 1, 2020 at 5:57 pm

“UEFI-related updates have had a history of making devices unusable, and vendors will need to be very cautious.”

https://eclypsium.com/2020/07/29/theres-a-hole-in-the-boot/

The fix for this vulnerability is complex and the fix will have different results on different machines. The volunteers that support CentOS do the very best they can to test patches, but they can’t possibly test for everything.
If people have problems with the way patches are tested, maybe they should step up to the plate and become part of the solution.

We should be offering our thanks to those who donate their time and energy to supporting the CentOS project.

Andrea

Kenneth Porter says:

August 1, 2020 at 6:03 pm

Another ZDNet story on the issue:

https://www.zdnet.com/article/red-hat-enterprise-linux-runs-into-boothole-patch-trouble/
paride desimone says:

August 1, 2020 at 6:14 pm

I use Debian buster on my old notebook, an Asus F3Ja, and I have no GRUB trouble. I also tried virtual machines with testing and unstable, and both boot normally.

On Sun, Aug 2, 2020, 00:42, Mike McCarthy, W1NR wrote:

CentOS mailing list CentOS@CentOS.org https://lists.CentOS.org/mailman/listinfo/CentOS
Kay Schenk says:

August 1, 2020 at 6:21 pm

Mike —

Thanks for the clarification and more information.

_______________________
Sent from MzK’s phone.

Schatzi Olsen says:

August 1, 2020 at 7:02 pm

And for whatever reason, both my CentOS 7 and 8 survived this apparently with flying colors. Actually, this is out of character for how fate has been dealing the cards recently.

I’m very leery about rebooting either, now, though.
Kay Schenk says:

August 1, 2020 at 7:49 pm

My system was happy UNTIL I rebooted. Then….BZZT!!!
Kay Schenk says:

August 1, 2020 at 9:15 pm

Questions re this statement in the ZDNET article —

“In all cases, users reported that downgrading systems to a previous release to reverse the BootHole patches usually fixed their problems.”

A previous release of what? GRUB2?

So that’s my first question.

Second, I’m assuming the multi-screen UEFI settings I see are standard for more recent BIOSes (I’m not sure of the version). Do we have any guidance for that?

If a downgrade to the previous grub2 can fix the problem (and not the latest kernel? does that matter?), maybe booting from your chosen “rescue” option AND reinstalling the older grub (somehow) can get us further along.

_______________________
Sent from MzK’s phone.

Alessandro Baggi says:

August 2, 2020 at 2:47 am

On 02/08/20 at 00:42, Mike McCarthy, W1NR wrote:

Hi Mike,

I’m not interested in whether the issue is present on Debian, Ubuntu and the others. I’m a CentOS user, and I’m interested in what is happening on CentOS because I have machines that run CentOS. If the “wrong” patch had not been pushed as an update so fast (maybe waiting longer before release, with more testing to cover all cases, because when you update grub, depending on the fix, you can easily break a system), there would have been no problem. I would rather wait some days (consider that I can accept the release delay of minor/major releases) than break my systems. And the lack of messages on the ML announcing this type of problem does not help. Sorry, but I can’t know which package is updated and when, why it is updated, or what type of problem (CVE) it suffers from, so I can’t do my own reasoning for an update process. This is something I miss, but I still use CentOS, and I should not need a RHEL account to get advisories and see what applies to CentOS (6, 7, 8 and Stream).

Many of us chose CentOS for its stability and enterprise-ready features (and because it is partially/entirely backed by RH). Because of the current problem, many servers and workstations died, and it’s normal that some users said “A lot of trust has been destroyed,” because they placed a lot of trust in the pro-redhat support. On the other hand, all of us can make mistakes, and this is such a case (me included: I updated blindly, so it is also my fault, not only the fault of the broken update).

A single error in many years should not destroy a distro and its reputation for stability (I think, and correct me if I’m wrong), and I hope it won’t happen again.
Alessandro Baggi says:

August 2, 2020 at 2:50 am

Hi Paride,

I also have Debian 10 on a workstation and on some VMs for testing. You probably updated after the grub-regression update, but I noticed several stories about Debian breakage.

On 02/08/20 at 01:13, paride desimone wrote:
Patrick Bégou says:

August 2, 2020 at 3:32 am

On 02/08/2020 at 09:47, Alessandro Baggi wrote:

I am an “old unix admin” and I remember that many years ago (let’s say 1990), on my unix systems, I created a backup before each update. Not all updates were successful….

Today we run “yum update” blindly, sometimes daily, even on critical servers, because it “always” runs fine and we have rollback commands…. But I am not sure that “always” really exists.

Patrick
Johnny Hughes says:

August 2, 2020 at 7:42 am

Well, I am sorry to tell you, but it most likely WILL happen again at some point.

CentOS Linux is a rebuild of RHEL source code, nothing more. We TRY to validate all fixes, but if something is broken in the source code, it will likely be broken in CentOS Linux as well. If you need software with Service Level Agreement type stability and timing .. that is absolutely NOT CentOS.

We have 3 people who build updates as fast as we can and this software is free and unvalidated. We don’t do any security testing .. we build and release RHEL source code. RHEL is what you want if you want software assurance or the fastest release cycle or an SLA grade software release.

I have released the new shim update for el7 that should fix this issue.
Gregory P. says:

August 2, 2020 at 9:15 am

Johnny,

Is this the latest update:
shim-x64 x86_64 15-7.el7_9 ?

Greg
Johnny Hughes says:

August 2, 2020 at 9:17 am

No. 15-8 is.
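
For anyone who wants to script the check discussed here, the following is a minimal Python sketch that tests whether a shim version string is at least the 15-8 build named above. It is only an illustration: the function names are hypothetical, the `.el7` suffix on the second sample string is an assumption, and the parsing looks only at the numeric version and the leading number of the release, which is far simpler than RPM's real version comparison.

```python
import re

FIXED_SHIM = "15-8"  # fixed build named in this thread


def parse_version_release(vr):
    """Split 'VERSION-RELEASE' into (version, leading release number).

    Hypothetical helper: it ignores dist tags such as '.el7_9' and is
    not a substitute for RPM's own version comparison.
    """
    version, _, release = vr.partition("-")
    match = re.match(r"\d+", release)
    if not version.isdigit() or match is None:
        raise ValueError(f"unexpected version-release string: {vr!r}")
    return int(version), int(match.group())


def has_fixed_shim(installed_vr, fixed_vr=FIXED_SHIM):
    """Return True if installed_vr is at least fixed_vr."""
    return parse_version_release(installed_vr) >= parse_version_release(fixed_vr)


if __name__ == "__main__":
    # The second string is a made-up example of a fixed build.
    for candidate in ("15-7.el7_9", "15-8.el7"):
        state = "fixed" if has_fixed_shim(candidate) else "affected"
        print(candidate, "->", state)
```

On a real system the installed string would come from the package manager; that lookup is deliberately left out of this sketch.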
Valeri Galtsev says:

August 2, 2020 at 10:26 am

Gregory, Johnny, and everybody on the CentOS team: Thanks a lot for the great job you are doing!

And yes, we, humble users, do realize what you just said, Gregory. We know about the “no guarantee” clause, and go with RedHat’s reputation, which through the great job you are doing translates into CentOS’s reliability level. My reading of many comments on this issue is, basically, that RedHat just lost a notch in its reputation. Hopefully, this is not a new, lower level, which, hopefully again, will be confirmed by a long trouble-free period in the future.

On a side note: it is Microsoft that signs one of the Linux packages now. We seem to have made one more step away from “our” computers being _our computers_. Am I wrong?

Valeri
Alessandro Baggi says:

August 2, 2020 at 11:08 am

Hi Johnny, thank you for your answer. I have always accepted the CentOS release cycle without any problem (except maybe with EL8, but it is OK).

I don’t need an SLA and I don’t blame anyone for this; errors can occur. For example, in this story I applied updates blindly without checking what and how, so really I ran the command that broke my installation…and as I said, no problem with that.

You said: “We TRY to validate all fixes, but if something is broken in the source code, it will likely be broken in CentOS Linux as well.” Does this mean that if a RHEL package breaks something, the CentOS team releases it with the bug anyway, even if the bug is already known? Can the update not be delayed until the corrected version is released if the package bug is already known? Is that not possible by policy or for some other reason? Does “validate” mean “test that nothing breaks”?

I repeat, I don’t need an SLA or QA or faster updates/releases. What sounds strange to me is that the testing procedure for a released update, meant to see whether everything works, failed and did not reveal the problem, when the bug showed up right away for me on a workstation, on a fresh minimal install, and on a personal “server” with mdadm RAID devices. In all my cases, when updating shim I got several clear messages of a failed update, without needing to reboot to see that grub was broken.

I have another question. I know that the gear that provides notifications for el8 updates does not work due to koji. What is the current status of update notifications for CentOS 8? Will we see update notifications re-enabled soon?

Thank you for your time. I appreciate your work.

On Sun, Aug 2, 2020, 14:42, Johnny Hughes wrote:

Stephen John says:

August 2, 2020 at 11:55 am

For CentOS-4, CentOS-5 and CentOS-6, the motto was “Bug for bug compatible with RHEL.” If things failed for RHEL, they would fail in the same way for CentOS as much as possible. Many users of CentOS were exceedingly proud of it and expected it to be the case when they needed justifications and such. The problem is that no one likes it when a major problem comes out of RHEL. This happens probably once every 3 to 5 years, and then everyone starts wanting to know why CentOS doesn’t ship things when people find something wrong. People usually get motivated and start testing things more, but after about 6 months of no other problems they reason that bugs aren’t common, so why do it.

On a side note, you keep emphasizing you aren’t expecting an SLA, but all your questions are what someone asks to have in a defined SLA. I have done the same thing in the past when things have gone badly, but couching it in ‘I am not asking’ just makes the people being asked grumpy. Better to be open and say ‘Look, I would like to know what my expectations should be for CentOS’ and be done with it.
Alessandro Baggi says:

August 2, 2020 at 12:09 pm

On 02/08/20 at 18:54, Stephen John Smoogen wrote:

Sorry, but you are wrong about this.

If I want SLA and QA I will use RHEL.

Now permit me to say one thing: the update on my machines failed so badly that my first thought was “WTF? Did they test this fix?” and I’m not the only one that
Valeri Galtsev says:

August 2, 2020 at 12:23 pm

And this complaint has to fall onto RedHat. Slightly underestimating the job the CentOS team is doing, one could say: CentOS is just a binary replica of RedHat Enterprise.

And again, we use distributions for the benefits they give us, and for any trouble we may encounter we have only ourselves to blame for the choice we made. And here I am not restricting the choices to the variety of Linux flavors, but include anything one could use: a bunch of BSD descendants, MacOS (which, server-administration-wise, I exclude from the chain BSD –> Darwin –> MacOS 10, or rather I don’t consider that a chain), MS Windows (no, I am not asking for shots at me; I for one use FreeBSD for servers, not MS Windows), etc.

I use CentOS on workstations (except for my own) and number crunchers. And once again, thanks a lot to the whole CentOS team for the great job you guys are doing!

Just my abstract view of this.

Valeri
Alessandro Baggi says:

August 2, 2020 at 12:34 pm

On 02/08/20 at 19:09, Alessandro Baggi wrote:

I pressed send by mistake.

Sorry, but you are wrong about this.

If I want SLA and QA I will use RHEL.

Now permit me to say one thing: the update on my machines failed so BADLY that my first thought was “WTF? Did they test this fix?” and I’m not the only one who thought this. I expect a package to be tested so that it does not break a machine/service, as with other distros like Debian, openSUSE and Ubuntu, and this is DIFFERENT from expecting a defined SLA or QA level. How can I expect an SLA from CentOS, which is free and which I use personally?
Alessandro Baggi says:

August 2, 2020 at 12:36 pm

On 02/08/20 at 19:22, Valeri Galtsev wrote:
Hi Valeri,

Yes you are right.

The previous message was sent incomplete.

I very much appreciate the great job done by the CentOS team with such limited resources.
Stephen John says:

August 2, 2020 at 12:57 pm

My apologies. I should have asked for clarification.
Pete Biggs says:

August 2, 2020 at 1:44 pm

Secure booting using UEFI requires that the code is signed – that is the “secure” bit. Microsoft are the CA for that signing. There’s nothing sinister about it, they aren’t signing the RPM package just one of the bits of code in the package. I seem to remember that Microsoft were the most vocal advocates for secure booting to get around boot sector viruses and in order to facilitate a more universal uptake they committed to signing any UEFI boot code from other OSes so long as it came from a bona fide source.

You don’t have to use UEFI secure booting – most machines can fall back to legacy booting using BIOS settings. If you do that, you won’t use any Microsoft signed code.

I haven’t looked in detail at the bug this all was supposed to fix, but I think it had the capability of by-passing the UEFI security checking, hence why the release of the advisory was delayed until the OSes were patched and why there was a scramble to get everything out in time. It’s a nasty bug and was difficult to fix from what I’ve heard.

P.
Phil Perry says:

August 2, 2020 at 1:45 pm

Microsoft are the Certificate Authority for SecureBoot and most SB-enabled hardware (most x86 hardware) comes with a copy of the Microsoft key preinstalled allowing binaries that are signed by Microsoft to work. In the case of linux, that is the shim which becomes the root of trust to load everything else. If you are not happy with that you can always become your own certificate authority by generating your own keys, install your signing keys in the hardware’s firmware (MOK list) and sign stuff yourself to use on your own machine(s).

However if you wish to distribute stuff to others and have it work seamlessly on hardware outside of your direct control and without the need for every user to import your CA SecureBoot signing key into the MOK list on every device, you would rely on Microsoft to sign SB related content.
John Pierce says:

August 2, 2020 at 1:55 pm

Now, does Microsoft have to sign each released module themselves, or will they issue a CA cert to an authorized OS creator, like RH, then let RH sign their own modules?

EG, Microsoft RootCA -> Signed Package vs, Microsoft RootCA -> RH Child CA -> Signed Package ….

—
-john r pierce
recycling used bits in santa cruz
Phil Perry says:

August 2, 2020 at 3:01 pm

I believe Microsoft signs the shim which then becomes the trusted authority and embeds RH (or CentOS) signing cert, so (I believe) every release of the shim needs to be signed by Microsoft. So it’s not quite as efficient as MS signing a RH/CentOS CA key, but is not far off.
John Pierce says:

August 2, 2020 at 3:19 pm

One of the things that bugs me about PKI trust chains like this: what happens if the unthinkable happens, and Microsoft’s RootCA gets compromised and has to be revoked… does that mean every single piece of UEFI hardware out there needs a BIOS upgrade? And don’t UEFI BIOS updates have to be signed too?
Gordon Messmer says:

August 2, 2020 at 5:54 pm

Yes. They’ll be vulnerable to malware signed by the old CA until they’re updated.

That’s better than systems without a PKI trust chain, which are vulnerable all of the time.
John Pierce says:

August 2, 2020 at 6:12 pm

Isn’t it more that they simply won’t work with newer boot code that was signed by the new keys? And that the updated BIOSes won’t boot older OS versions that weren’t signed by the new keys?

BIOS updates are often not available for slightly older hardware; once it goes out of production, most vendors lose all interest.
Jonathan Billings says:

August 2, 2020 at 6:32 pm

Back in 2017, Intel said that it was going to deprecate the “Legacy” CSM by 2020. They might have changed their schedule but I suspect we’ll start seeing hardware without anything but UEFI.

—
Jonathan Billings
Johnny Hughes says:

August 2, 2020 at 6:42 pm

On 02.08.20 at 04:15, Kay Schenk wrote:

There are some canonical packages – the combination/grouping is important.

https://lists.CentOS.org/pipermail/CentOS-announce/2020-July/035779.html

Furthermore the corresponding shim and kernel rpms should be installed together (or the kernel should be reinstalled)

https://lists.CentOS.org/pipermail/CentOS/2020-August/351180.html
Gordon Messmer says:

August 2, 2020 at 7:51 pm

I don’t know if the Secure Boot PKI has a publicly documented contingency plan for a compromised CA, but my understanding is that there are multiple slots for signatures:

http://dreamhack.it/linux/2015/12/03/secure-boot-signed-modules-and-signed-elf-binaries.html

So, I would guess that clients would receive a new trust DB that did not contain the old root CA, and new bootloaders signed by both the old root CA and the new CA. The new bootloaders would work on both new and old systems, having signatures from both. Old bootloaders would not work on new clients.
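
The guess above can be written down as a toy model in Python. This is only a sketch of the reasoning, not of how firmware really verifies binaries: the CA names, the `Bootloader` class and `can_boot` are hypothetical, and real Secure Boot checks certificates and hashes against the firmware's allow and deny databases rather than comparing names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bootloader:
    name: str
    signed_by: frozenset  # names of the CAs that signed this binary


def can_boot(trust_db, bootloader):
    """A bootloader is accepted if any of its signers is in the trust DB."""
    return bool(set(trust_db) & bootloader.signed_by)


OLD_CA, NEW_CA = "old-root-ca", "new-root-ca"  # hypothetical CA names

old_client_db = {OLD_CA}  # firmware that never received the new trust DB
new_client_db = {NEW_CA}  # updated firmware: old root CA removed

old_loader = Bootloader("old-bootloader", frozenset({OLD_CA}))
new_loader = Bootloader("new-bootloader", frozenset({OLD_CA, NEW_CA}))

if __name__ == "__main__":
    for db_name, db in (("old client", old_client_db), ("new client", new_client_db)):
        for loader in (old_loader, new_loader):
            print(f"{loader.name} on {db_name}: {can_boot(db, loader)}")
```

In this model the dual-signed bootloader is accepted by both kinds of client, while the old bootloader is rejected only by clients whose trust DB no longer contains the old root CA.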
Chris Adams says:

August 2, 2020 at 9:14 pm

Once upon a time, Jonathan Billings said:

I believe that is still Intel’s plan.

However, as happens often, people are confusing UEFI and Secure Boot. UEFI is a replacement for the ages-old BIOS; Secure Boot is an extension to UEFI to create a “trusted” (for whatever that may mean) boot chain to get to the OS. You can have UEFI without having Secure Boot enabled (that’s what I do on my systems).
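
To see the two things separately on a running Linux system, a small Python sketch such as the one below may help. It assumes the usual Linux sysfs layout: the `/sys/firmware/efi` directory is present when the kernel was booted through UEFI, and the `SecureBoot` variable exposed by efivarfs carries four attribute bytes followed by a one-byte value. The function names and the `sysfs_root` parameter are illustrative, not part of any existing tool.

```python
import os

# EFI global-variable GUID as it appears in the efivarfs file name.
SECURE_BOOT_VAR = "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"


def booted_with_uefi(sysfs_root="/sys"):
    """Return True if the firmware/efi directory exists (UEFI boot)."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


def secure_boot_enabled(sysfs_root="/sys"):
    """Return True/False for the Secure Boot state, or None if unreadable."""
    path = os.path.join(sysfs_root, "firmware", "efi", "efivars", SECURE_BOOT_VAR)
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        return None
    # efivarfs prefixes the value with 4 attribute bytes; the value is 1 byte.
    if len(data) < 5:
        return None
    return data[4] == 1


if __name__ == "__main__":
    print("UEFI boot:", booted_with_uefi())
    print("Secure Boot:", secure_boot_enabled())
```

A machine booted through UEFI with Secure Boot turned off would report `True` for the first check and `False` for the second; a legacy BIOS boot would report `False` and `None`.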
—
Chris Adams
John Pierce says:

August 2, 2020 at 9:57 pm

Legacy BIOS has its own set of issues, like no GPT support; MBR disks max out at 2 TB.
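
The 2 TB figure follows from the MBR partition table format, which stores sector addresses and counts in 32-bit fields. A quick Python check of the arithmetic, assuming the traditional 512-byte logical sector (drives that expose larger logical sectors shift the limit):

```python
SECTOR_SIZE = 512  # bytes; assumed traditional logical sector size
LBA_BITS = 32      # width of the sector fields in an MBR partition entry

max_bytes = (2 ** LBA_BITS) * SECTOR_SIZE

print(max_bytes)                   # 2199023255552
print(max_bytes / 2 ** 40, "TiB")  # 2.0 TiB
print(max_bytes / 10 ** 12, "TB")  # about 2.2 TB
```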

—
-john r pierce
recycling used bits in santa cruz
Jyrki Tikka says:

August 3, 2020 at 4:21 am

I’m booting just fine on an old BIOS system from a pair (mdraid 1) of 3 TB GPT disks. The MBR compatibility on GPT disks allows the old machine to boot from a GPT disk and load GRUB. Then GRUB takes over and loads the kernel and the kernel has no problems understanding GPT disks.

The boot disks must have an EFI boot partition even though it’s not used in this case.
Gordon Messmer says:

August 3, 2020 at 10:43 am

IIRC, they need a partition at the beginning of the drive to reserve space for GRUB2. That should be a “BIOS boot partition” not an “EFI System partition” for GRUB2. It’s not used for a filesystem, but it is used to store GRUB2’s second stage image.
Jyrki Tikka says:

August 3, 2020 at 12:10 pm

Yes, you are absolutely right. I’m away from home right now and I don’t have access to my home systems. My memory failed me which is not unusual.

<(*) Jyrki