7.2 Kernel Panic On Boot

Hi All

After upgrading to 7.2, I’m getting an immediate kernel panic on boot.

If I drop back to 3.10.0-229.20.1.el7.x86_64, the system boots fine.

How can I go about diagnosing the problem here?

thanks

Duncan

95 thoughts on - 7.2 Kernel Panic On Boot

  • It sounds like the initramfs is missing… Check that /boot/initramfs-{kernelversion}.img is actually there; if not, do a “yum reinstall kernel-{version}” and it should be OK!
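A minimal Python sketch of that check, assuming the EL7 naming convention /boot/initramfs-<version>.img shown later in this thread. `initramfs_status` is a hypothetical helper; it only confirms the file exists and is non-empty, not that the image inside is usable:

```python
from pathlib import Path


def initramfs_status(version, boot_dir="/boot"):
    """Return 'missing', 'empty' or 'present' for one kernel version's initramfs.

    Assumes the file is named initramfs-<version>.img; the contents of the
    image are not inspected.
    """
    image = Path(boot_dir) / f"initramfs-{version}.img"
    if not image.is_file():
        return "missing"
    if image.stat().st_size == 0:
        return "empty"
    return "present"
```

For the kernel in this thread, that would be `initramfs_status("3.10.0-327.el7.x86_64")`.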

  • No joy unfortunately, the correct initramfs is there

    I tried reinstalling just in case, but no change

    thanks for the reply

    Duncan

  • I wanted to help you by making sure that you were on the most recent version, but looking at the CentOS.org website, I was unable to figure out whether 7.2 was the tip. 7.1503? Is that 7.2? Beats me.

    https://wiki.CentOS.org/Download appears to say that 1503 is the current version.

    I *thought* this wacky CentOS version number would be more like 7.1.1503? Did I miss something? Is there no easy mapping from RHEL to CentOS? Didn’t I bring this up when the wacky version numbers were suggested? Why am I sending this email?

  • On 03.12.2015 at 11:08, Greg Lindahl wrote:

    CentOS 7.1511 (aka ‘7.2’) not yet released …

  • And the way I’d figure this out from the CentOS website is?

    I mean, I’m used to the concept that CentOS used to say the current version is 6.3 when RHEL 6.4 was released but hadn’t made it through the CentOS pipeline.

    But how am I supposed to figure out that CentOS 7.1503 < 7.2? I suppose I should blame myself for not being a bigger ass when CentOS didn’t adopt my proposal of saying CentOS 7.1.1503 vs 7.2.1511. But really, does ANYONE think the current scheme is clear? Anyone? Bueller? Am I the only ass about this problem?

  • Hi Leon

    I’m running kmod-nvidia and kmod-forcedeth from elrepo

    The nvidia-kmod had an update to work with the new kernel; forcedeth did not, but as far as I can tell it didn’t need one. (Also, why on earth the forcedeth module was dropped from the stock kernel in 7, I have no idea.)

    The first thing I tried, however, was uninstalling both, but I’m still getting the same panic.

    Is there any way of logging it so I can see exactly what the panic says?

    thanks all

    Duncan

  • If you look further down the same wiki Download page, in the ‘Base Distribution’ section there is a mapping from CentOS release version to RHEL release version, indicating which version of the RHEL sources a specific CentOS build is derived from.

    7(1503) : RHEL 7.1
    7(1406) : RHEL 7.0
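To make the lookup explicit, here is a small Python sketch that encodes only the two rows quoted above. The dictionary name and the accepted label formats ('7(1503)', '7 (1503)') are assumptions for illustration; newer labels deliberately return None instead of a guess:

```python
import re

# Only the rows quoted from the wiki Download page in this thread.
CENTOS7_TO_RHEL = {
    "1406": "7.0",
    "1503": "7.1",
}

# Accepts labels such as '7(1503)' or '7 (1503)'.
_LABEL_RE = re.compile(r"^\s*7\s*\(\s*(\d{4})\s*\)\s*$")


def rhel_source_for(label):
    """Return the RHEL release a CentOS 7 build label was derived from, or None."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a CentOS 7 release label: {label!r}")
    return CENTOS7_TO_RHEL.get(match.group(1))
```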

  • You might also want to check that there is enough disk space for the initrd to be built and stored in the right place.
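A rough Python sketch of that check. The sizing rule (largest existing initramfs-*.img times a safety margin) is an assumption made for this example, not a documented requirement:

```python
import shutil
from pathlib import Path


def initramfs_space_check(boot_dir="/boot", margin=2.0):
    """Compare free space in boot_dir with a rough estimate for one more image.

    Heuristic (assumed): a rebuild needs about as much room as the largest
    existing initramfs-*.img, multiplied by `margin`.
    Returns (free_bytes, needed_bytes, ok).
    """
    boot = Path(boot_dir)
    sizes = [p.stat().st_size for p in boot.glob("initramfs-*.img")]
    needed = int(max(sizes, default=0) * margin)
    free = shutil.disk_usage(boot).free
    return free, needed, free >= needed
```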

  • On 03.12.2015 at 11:40, Duncan Brown wrote:

    Was the “init-ram-disk” regenerated (yum reinstall kernel-x.y.z)?

  • The boot partition is half empty, and as far as I can tell it is there OK:

    [root@gobbla boot]# ls -lah
    total 252M
    dr-xr-xr-x. 4 root root 4.0K Dec 3 11:43 .
    drwxr-xr-x. 18 root root 4.0K Dec 3 11:23 ..
    -rw-r--r-- 1 root root 121K Sep 15 16:14 config-3.10.0-229.14.1.el7.x86_64
    -rw-r--r-- 1 root root 121K Nov 3 19:18 config-3.10.0-229.20.1.el7.x86_64
    -rw-r--r-- 1 root root 124K Nov 19 22:20 config-3.10.0-327.el7.x86_64
    drwxr-xr-x. 2 root root 26 Oct 9 07:20 grub
    drwx------. 6 root root 104 Dec 3 11:20 grub2
    -rw-r--r--. 1 root root 39M May 21 2015 initramfs-0-rescue-00095717eba54050b81e403ded5ee369.img
    -rw------- 1 root root 45M Dec 3 11:43 initramfs-3.10.0-229.14.1.el7.x86_64.img
    -rw------- 1 root root 24M Sep 18 06:33 initramfs-3.10.0-229.14.1.el7.x86_64kdump.img
    -rw------- 1 root root 35M Dec 3 11:19 initramfs-3.10.0-229.20.1.el7.x86_64.img
    -rw------- 1 root root 17M Dec 2 13:38 initramfs-3.10.0-229.20.1.el7.x86_64kdump.img
    -rw------- 1 root root 8.8M Dec 3 11:43 initramfs-3.10.0-229.20.1.el7.x86_64.tmp
    -rw------- 1 root root 29M Dec 3 11:17 initramfs-3.10.0-327.el7.x86_64.img
    -rw-r--r--. 1 root root 20M Dec 2 13:40 initrd-plymouth.img
    -rw-r--r-- 1 root root 235K Sep 15 16:16 symvers-3.10.0-229.14.1.el7.x86_64.gz
    -rw-r--r-- 1 root root 235K Nov 3 19:21 symvers-3.10.0-229.20.1.el7.x86_64.gz
    -rw-r--r-- 1 root root 247K Nov 19 22:22 symvers-3.10.0-327.el7.x86_64.gz
    -rw------- 1 root root 2.8M Sep 15 16:14 System.map-3.10.0-229.14.1.el7.x86_64
    -rw------- 1 root root 2.8M Nov 3 19:18 System.map-3.10.0-229.20.1.el7.x86_64
    -rw------- 1 root root 2.9M Nov 19 22:20 System.map-3.10.0-327.el7.x86_64
    -rwxr-xr-x. 1 root root 4.7M May 21 2015 vmlinuz-0-rescue-00095717eba54050b81e403ded5ee369
    -rwxr-xr-x 1 root root 4.8M Sep 15 16:14 vmlinuz-3.10.0-229.14.1.el7.x86_64
    -rw-r--r-- 1 root root 171 Sep 15 16:14 .vmlinuz-3.10.0-229.14.1.el7.x86_64.hmac
    -rwxr-xr-x 1 root root 4.8M Nov 3 19:18 vmlinuz-3.10.0-229.20.1.el7.x86_64
    -rw-r--r-- 1 root root 171 Nov 3 19:18 .vmlinuz-3.10.0-229.20.1.el7.x86_64.hmac
    -rwxr-xr-x 1 root root 5.0M Nov 19 22:20 vmlinuz-3.10.0-327.el7.x86_64
    -rw-r--r-- 1 root root 166 Nov 19 22:20 .vmlinuz-3.10.0-327.el7.x86_64.hmac

    thanks for the reply

    Duncan
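A listing like this can also be checked mechanically. This Python sketch (the function name is hypothetical) pairs every vmlinuz-<version> with an initramfs-<version>.img and flags stray initramfs-*.tmp files, such as the 8.8M one above, which may be left over from an interrupted rebuild:

```python
import re

_KERNEL_RE = re.compile(r"vmlinuz-(.+)")
_IMAGE_RE = re.compile(r"initramfs-(.+)\.img")


def audit_boot_listing(names):
    """Return (kernels_without_initramfs, leftover_tmp_files) for /boot file names.

    Hidden .vmlinuz-*.hmac files never match, because fullmatch requires the
    name to start with 'vmlinuz-'. kdump images are collected but unused.
    """
    kernels, images, leftovers = set(), set(), []
    for name in names:
        kernel = _KERNEL_RE.fullmatch(name)
        image = _IMAGE_RE.fullmatch(name)
        if kernel:
            kernels.add(kernel.group(1))
        elif image:
            images.add(image.group(1))
        elif name.startswith("initramfs-") and name.endswith(".tmp"):
            leftovers.append(name)
    return sorted(kernels - images), sorted(leftovers)
```

On a live system the names could come from `[p.name for p in Path("/boot").iterdir()]` (with `from pathlib import Path`).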

  • You are not the only ass about the problem.

    I have complained bitterly about this, apparently to deaf ears.

    I dislike this version numbering scheme hugely. The implications of CentOS not being the same “version” as RHEL are *much* more than just a different number to those who don’t know any better. And those are the people who make this difference a huge amount of extra work for us.

    There’s NO reason for this that makes any sense. None.

  • It’d probably help if you could give us more details on the kernel panic.

    Can you see where it is panicking? Does it happen during the kernel/initrd stage or later during boot?

    If it is panicking later in boot, I suggest installing the kdump service; you might be able to capture a kernel dump, which makes debugging these things a lot easier. Otherwise, I suggest trying to capture the panic message some other way.

  • Well, 1503 == YYMM == March 2015, and 7.2 did not exist at that time. Maybe not fully explicit, but that timestamp does provide a nice hint.
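Decoding that hint is mechanical. A Python sketch, assuming (as holds for every stamp mentioned in this thread) that the two-digit year belongs to the 2000s:

```python
import calendar


def decode_monthstamp(stamp):
    """Turn a CentOS 7 YYMM monthstamp such as '1503' into (year, month)."""
    if len(stamp) != 4 or not stamp.isdigit():
        raise ValueError(f"expected four digits, got {stamp!r}")
    # Assumption: two-digit years are 20YY.
    year, month = 2000 + int(stamp[:2]), int(stamp[2:])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {stamp!r}")
    return year, month


def describe_monthstamp(stamp):
    """Human-readable form, e.g. '1503' -> 'March 2015'."""
    year, month = decode_monthstamp(stamp)
    return f"{calendar.month_name[month]} {year}"
```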

  • The last message before it is “switching to clocksource hpet”

    Then the panic scrolls by

    I’ve no idea if that counts as later or not

  • It’s unlikely to be a panic related to your hardware clock (HPET, the High Precision Event Timer), so it’s probably happening when the kernel is touching something else on your system.

    The content of the panic is really the only thing that can help.

  • Maybe an issue with X. Look into /var/log/Xorg.0.log for hints. Or, does it boot fine in single user mode?

    Akemi

  • That numbering concept (for 7.0 at least) makes sense:

    “CentOS 7.0-1406 introduces a new numbering scheme that we want to further develop into the life of CentOS-7. The 0 component maps to the upstream release, whose code this release is built from. The 1406 component indicates the monthstamp of the code included in the release (in this case, June 2014). By using a monthstamp we are able to respin and reissue updated media for things like container and cloud images, that are regularly refreshed, while still retaining a connection to the base distro version.”

    Those who care about the upstream version knew that this was derived from RHEL 7.0. Those who don’t care about upstream versions but want to track monthly rebuilds of cloud images, etc., could distinguish between “1406” and (for example) “1407”. But somewhere along the line for 7.1, the “component that maps to the upstream release” was dropped, and we got just 7 (1503). I don’t recall seeing where or how that decision was made; is there a link someone can provide to the relevant discussion in CentOS-devel?

    -Greg

  • That’s what I figured, but how do I go about getting a copy of it?

    Most of it has scrolled by when it’s finished

  • If it’s a server or workstation with a serial console, I suggest connecting it to another computer, setting up a serial console, and setting up the other system to capture the console output to a file or something else you can examine later.

    In the el3 and el4 days, I used to use something called ‘netconsole’ to dump the panic over the wire to a syslog server listening on the same network. It doesn’t look like that’s available in el7 though. I’m not sure how one captures panics on workstations other than with a serial console, if it is too early for kdump to catch the panic.
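If the netconsole kernel module can be used on the affected machine (the poster wasn't sure it is available on EL7, and a panic this early may also come before it is loaded), the receiving side can be as small as a UDP listener. A minimal Python sketch, assuming netconsole's default target port of 6666; the helper names are made up for this example:

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket to receive netconsole output.

    6666 is netconsole's default target port; change it if the sending
    machine was configured with a different one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, bufsize=4096):
    """Block until `count` datagrams arrive and return them decoded as text."""
    messages = []
    while len(messages) < count:
        data, _sender = sock.recvfrom(bufsize)
        messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

On the receiving machine, `receive_messages(open_listener(), 200)` would block until 200 messages have arrived; appending each one to a file as it comes in is a small extension.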

  • On 03.12.2015 at 15:06, Duncan Brown wrote:

    Start, for example, with a photo (or a video, and grab the frame where the panic occurs), and disable GRUB options like rhgb or quiet…
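Dropping those two flags stops the boot splash (rhgb) and the reduced kernel logging (quiet) from hiding messages. A Python sketch of the edit on a command-line string; the helper name and the sample line below are hypothetical, and on a real system the change is made at the GRUB menu (press 'e' on the entry) or in the GRUB configuration:

```python
def without_splash(cmdline, drop=("rhgb", "quiet")):
    """Return a kernel command line with the given whole tokens removed.

    Every other token is kept in its original order.
    """
    return " ".join(token for token in cmdline.split() if token not in drop)
```

For example, `without_splash("ro crashkernel=auto rd.lvm.lv=centos/root rhgb quiet")` returns `"ro crashkernel=auto rd.lvm.lv=centos/root"`.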

  • aka 7.2, huh? auka 7.2 would be more appropriate IMHO (by auka meaning Also UnKnown As). Seriously, the scheme is awfully obscure. Our proficiency is becoming akin to that of MS Windows admins: you just need to learn new names or locations for the same old tools.

    Sorry, I forgot to put sarcasm tags…

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • Valeri Galtsev wrote:
    Agreed. I don’t want “hints”, and I’m not doing fedora or ubuntu, because I don’t want the LATESTGREATESTBLEEDINGEDGETIP, I want *stability*, and, since we’re supposed to be *enterprise* grade, I want stuff that’s simple enough for a poor ol’ sysadmin, who might have to explain to a manager what we’re on….

    mark

  • Leon Fauster wrote:

    In 6, I’d say look at the startup scripts, since it was serialized. With systemd, booting in parallel, I’m not sure. Anyone know if there’s a way to *force* systemd to serialize for debugging?

    mark

  • I agree, if by “latest” you mean any of latest 5, latest 6, latest 7. Still, it would be good to know what the latest CentOS 7 corresponds to upstream, meaning RHEL 7.x: which “x”? Hypothetically, say I know that some binary-only “something” works on RHEL 7.2. Is it reasonable to assume it will work on my “binary compatible” CentOS? If it is CentOS 7.2, I wouldn’t have trouble concluding it will work. If it is 7.1234567… I’m lost (my number comes from sarcasm; I have already learned the year/month origin ;-) Sorry about a trivial argument which I bet many have repeated already. On the other hand, why am I arguing about CentOS 7, which I have downgraded to workstation use only, while servers are being migrated to FreeBSD (sorry for mentioning it; this time will really be the last one) as CentOS 5 and CentOS 6 are phased out?

    Just my $0.02

    Valeri

  • Mine booted successfully after updating to 7.2 CR. I am also using an Nvidia kmod from elrepo, though no forcedeth:

    kernel-3.10.0-327.el7.x86_64
    kmod-nvidia-340xx-340.96-1.el7.elrepo.x86_64

    Did you rebuild the initrd after removing the kmod packages?

    Fred Wittekind

  • Leon Fauster wrote:

    Sorry, you seem to not have dealt with enough managers who only really know Windows, or other divisions (esp. ones that are 95% Windows) who require documentation, etc….

    I can live with the x.y.yymm, but not showing the relation to upstream is annoying.

    mark

  • So, both of those are just the end of the kernel call trace; unfortunately, they don’t show which function actually caused the panic. That would be above that text, something you could get from an attached console.

    The first one looks like it’s in the middle of allocating an inode for a file, the second in looking up some dentry during init (just guessing though), although both seem to be in an allocation event at the end of an interrupt (EOI), but that’s probably just coincidence.
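Once the full trace is captured, pulling the frames out can be automated. A Python sketch, assuming the 3.10-era frame format `[<address>] function+0xoffset/0xsize [module]`; the kernel lists the most recent call first and prints '?' before frames it is unsure about. The names `Frame` and `parse_call_trace` are hypothetical:

```python
import re
from typing import NamedTuple, Optional

# Assumed frame format: [<address>] [? ]function+0xOFF/0xSIZE [module]
_FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)


class Frame(NamedTuple):
    function: str
    offset: int
    size: int
    module: Optional[str]  # None when the symbol is in the core kernel
    reliable: bool         # False when the kernel printed '?' before it


def parse_call_trace(text):
    """Return the frames found in console text, innermost (most recent) first."""
    frames = []
    for line in text.splitlines():
        m = _FRAME_RE.search(line)
        if m:
            frames.append(Frame(
                function=m.group(2),
                offset=int(m.group(3), 16),
                size=int(m.group(4), 16),
                module=m.group(5),
                reliable=m.group(1) is None,
            ))
    return frames
```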

  • Duncan Brown wrote:

    ^^ should be are.*

    I’m just guessing here, but it looks to me as though it’s looking at inodes (so the filesystem) and kernel modules, maybe video; notice the blacklist.

    Wonder if this is a grub2 issue, and it’s not finding the filesystem. This isn’t, by chance, a Secure Boot system rather than a BIOS one?

    mark

  • No, nothing that exciting: BIOS, and XFS on LVM2. Pretty much the standard options Anaconda gives you.

    And it boots fine in 3.10.0-229.20.1.el7.x86_64

  • I had a feeling that would be the case

    It’s installed on an SSD, so it goes past instantly.

    I’ve taken a video and can just about make out “Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu”
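When only the panic line is legible, its reason can still be extracted automatically. A Python sketch; the regular expressions assume the wording quoted above plus the CPU number that the kernel's hard-lockup message normally ends with (cut off in the video frame). `panic_summary` is a hypothetical helper:

```python
import re

_PANIC_RE = re.compile(r"Kernel panic - not syncing:\s*(.*)")
_HARD_LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")


def panic_summary(console_text):
    """Return (reason, locked_cpu) from captured console text, or (None, None).

    locked_cpu is only set for the watchdog hard-lockup message.
    """
    for line in console_text.splitlines():
        m = _PANIC_RE.search(line)
        if m:
            reason = m.group(1).strip()
            lockup = _HARD_LOCKUP_RE.search(reason)
            return reason, (int(lockup.group(1)) if lockup else None)
    return None, None
```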

  • Note that I was asking about the release numbering, not the release itself. And while you’re suggesting where I could find out more or take part in the discussion, Leon, keep in mind that I’ve been using CentOS since it was first released, I am on the -dev mailing list, and I was a part of the discussion of this new numbering scheme when it was first mooted – my recommendation was that if you did it at all, you should use names like 7.2.1511. And I recall that the decision was to use release names like 7.2.1511.

    If we can get the version numbering scheme right here:

    [lindahl@rd ~]$ more /etc/centos-release
    CentOS Linux release 7.1.1503 (Core)

    {note the .1. in the name}

    Why can’t we get it right on the website, and the mailing list? Why should I have to look at the bottom of a webpage to figure out the mapping, when we could all say 7.2.1511? What is bad about being clear?

    — greg

  • That is my main complaint about parallelized boot. My brain is only capable of dealing with a serial sequence of events, where the next event is deterministically predictable from the previous one. With fatal things like a kernel panic, the step right before the fatal one is the one you can still see…

    Is there some way to tell systemd to kick in components serially?

    Servers aside (you cannot have everything), this (CentOS 7) is a great system for laptops, the best I have seen so far. Like Macintosh. Only better.

    Valeri

  • Duncan Brown wrote:

    A thought: did you say you’d rebuilt the ramfs, making sure both xfs and lvm drivers were included?

    mark
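One way to verify that is to look for the module files in the image's listing (for example, the output of dracut's `lsinitrd`). A Python sketch that assumes `xfs` and `dm-mod` (the device-mapper module LVM relies on) are the modules meant by "xfs and lvm drivers"; LVM also needs its userspace tools in the image, which this does not check. Module files may be plain or xz-compressed (.ko.xz), so either form matches:

```python
import re

# Module file names inside the image, optionally xz-compressed.
_MODULE_RE = re.compile(r"/([^/]+)\.ko(?:\.xz)?$")


def missing_modules(lines, required=("xfs", "dm-mod")):
    """Return the required modules that do not appear in an initramfs listing.

    Each line is expected to end with a file path (as in an ls -l style
    listing). '_' and '-' are treated as equal, as the kernel does for
    module names.
    """
    def norm(name):
        return name.replace("_", "-")

    present = set()
    for line in lines:
        m = _MODULE_RE.search(line.strip())
        if m:
            present.add(norm(m.group(1)))
    return [mod for mod in required if norm(mod) not in present]
```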

  • Valeri Galtsev wrote:

    For laptops, great. For anything else, not so much. For example, it’s supposed to be an *ENTERPRISE* o/s… why does it automatically, without ever asking, install anything wifi? I’m still trying to figure out how to tell a *wired* CentOS 7 workstation to stop even thinking about wifi or wimax, and stop cluttering the logs with debugging garbage.

    mark

  • Usually, when you reinstall a kernel on a running system, it builds into the ramdisk all kernel modules that are loaded at the moment (their equivalents in the new kernel). Am I missing something?

    Valeri

  • As far as I know, yes; I’m just doing a standard dracut rebuild. There’s no reason they wouldn’t be.

    I also tried a yum reinstall of the kernel, with no luck either.

  • On 03.12.2015 at 19:35, Greg Lindahl wrote:

    Just to be clear: like you, I’m also motivated to understand why the CentOS Board voted for this. I am just responding in a dialectic way to get more insight.

    The following assumes that the context of the argument is the “CentOS Project”.

    So, what should be made clear here is the minor version, but is it relevant? Relevance means being able to distinguish between minor versions. For example, in the hypothetical case of 7.1.1512 vs. 7.2.1511 it would be essential to use a minor number as an infix, and that is exactly the point that was discussed on “CentOS-devel”: there are no other “branches” of CentOS 7, only the current one. That makes a minor number obsolete.

    For a broader context:

    To answer the questions about the coherence to upstream:

    The point in time of the question leads directly to the answer, e.g.:
    1. What’s the minor version number (y)? [asked today (2015-12-03)]
    2. Current RHEL is 7.2 released 2015-11-19
    3. Current CentOS is 7 (1503) implies 2015-03
    4. Minor numbers are in the set of natural numbers
    5. 2015-03 < 2015-11-19 => 7.y < 7.2 => 7.1

    Most of the work in these 5 steps was in step 2 (searching for the availability date).

    My very personal conclusion: upstream should use a timestamp :-) and continue using minor version numbers because of the AUS, ELS and EUS branches. CentOS does not need minor version numbers.
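Steps 3 to 5 can be written down as a small Python sketch. The GA months for RHEL 7.0 (June 2014) and 7.1 (March 2015) are added here for illustration alongside the 7.2 date given above, and the rule "latest RHEL minor released no later than the CentOS monthstamp" is a heuristic, not an official mapping:

```python
# RHEL 7 general-availability months as (year, month).
RHEL7_GA = {0: (2014, 6), 1: (2015, 3), 2: (2015, 11)}


def infer_rhel7_minor(monthstamp, ga=RHEL7_GA):
    """Latest RHEL 7 minor whose GA month is not after the CentOS YYMM stamp.

    Assumes the two-digit year is 20YY. Returns None if the stamp predates
    every entry in `ga`.
    """
    stamp = (2000 + int(monthstamp[:2]), int(monthstamp[2:]))
    candidates = [minor for minor, ga_month in ga.items() if ga_month <= stamp]
    return max(candidates) if candidates else None
```

For 1406 and 1503 this agrees with the wiki table quoted earlier in the thread (7.0 and 7.1).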

  • On 03.12.2015 at 17:01, m.roth@5-cent.us wrote:

    I have normally dealt with managers who coordinate the team members and the objectives. Version numbers are handled by the team members, not by the managers, but that’s company-specific…

  • CentOS should do whatever RHEL/Upstream does.

    Period.

    Why the change now? It really does matter, a lot, to those of us who need to do compliance testing/security checks, etc., all based on “version” number. I know there is no such thing in practice because of all the non-sequential updates that happen, but there’s a shit-ton of work that we have to do for each new release, and in the past we have depended on the versions matching the RHEL ones. Now they don’t, and that’s wrong.

    It seems like a minor thing, but in real-world practice it is most definitely not.

  • I would respectfully disagree here, in that my opinion is that relying on any distribution minor version number in the first place is what is wrong. (And I think that regardless of which distribution we’re talking about….)

    I honestly wish Red Hat would have stuck to the ‘XupdateY’ format that they started with, as that is more correct. The update rollup number is not a minor version number in the strict sense of the word, at least IMO.

    Heh, I am waiting to see if the differences between RHEL 7.2 and RHEL 7.3 will be as large as the differences were between RHL 7.2 and RHL 7.3 back in the day……

  • So tell the Win-centric managers that this is ‘CentOS 7 Service Pack 2’ or ‘CentOS 7 Update Rollup for 11/2015’ and they will understand what you mean (it is an accurate comparison, and it is what upstream did once upon a time by calling it ‘Version X update Y’). Heh, the latest Windows 10 build is actually referred to as the ‘1511’ version.

    What’s really annoying is the thought that we’re going to have the same gripes on the mailing list every six to seven months or so.

  • I’ve heard there are three types of managers (management):

    1. Cooperative. The manager is as capable of doing actual work as the team members and acts as one of them. This style is claimed to be the most efficient.

    2. Non-interfering. Sets general goals, doesn’t intrude into actual work and technical decisions, doesn’t insist on unrealistic deadlines. Less efficient (pretty good for a capable team, and can yield the best results in the long run).

    3. Authoritative. Doesn’t need explanation. Least efficient.

    ;-)

    Valeri

  • On 03.12.2015 at 22:28, Alice Wonder wrote:

    This does not apply to distributions (as opposed to packaged software components). There exists no RHEL 7.1.5 or similar…

  • On 03.12.2015 at 22:24, “Phelps, Matthew” wrote:

    I sometimes mislead myself by assuming CentOS = RHEL, but the truth is that CentOS is not exactly the same as RHEL!

    As stated by others, this provisioning concept was never supported by CentOS.

  • Well, in the past, when talking to the professors I work for, I had to argue for the choice of system I set up on their group servers/number crunchers. They often prefer what they know works best for their collaborators, which often is RHEL, or “Scientific Linux”. I don’t want the hassle of maintaining RHEL, and, let’s say, I have my opinion about “Scientific Linux”. When I wanted them to agree to go with CentOS, I usually said “it is binary compatible with RHEL”, which, though not strictly true, is very close to true and settles the argument in one phrase. But with all the new naming schemes, I avoid saying it now. So I perfectly understand the frustration of those who have to explain version relations etc. to their managers.

    Valeri

  • “never was supported”? But it has been used by software vendors! My startup PathScale supported CentOS for both our MPI interconnect libraries and our compilers. Our scripts opened up /etc/{centos,redhat}-release and parsed what was inside. That has worked great since 2004, and it would work today with minor hassle, because the CentOS version string in there is 7.1.1503… quite similar to 7.1.

    I feel sorry for my former colleagues, now at Intel, having to pollute their docs and code for an unnecessarily confused scheme, getting to deal with customers who become confused when you use the new scheme and they’re still referring to it by the old one, or vice versa.

    — greg
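A Python sketch of that kind of parsing, assuming only the `release X[.Y[.Z]]` wording seen in these files. The function names are hypothetical, and this is not PathScale's actual code:

```python
import re
from pathlib import Path

_RELEASE_RE = re.compile(r"release\s+(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_release(text):
    """Split e.g. 'CentOS Linux release 7.1.1503 (Core)' into (7, 1, 1503).

    Missing parts come back as None, so RHEL's
    'Red Hat Enterprise Linux Server release 7.2 (Maipo)' gives (7, 2, None).
    """
    m = _RELEASE_RE.search(text)
    if m is None:
        raise ValueError(f"no release number in {text!r}")
    return tuple(int(g) if g is not None else None for g in m.groups())


def read_release(paths=("/etc/centos-release", "/etc/redhat-release")):
    """Parse the first release file that exists, or return None."""
    for path in map(Path, paths):
        if path.is_file():
            return parse_release(path.read_text())
    return None
```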

  • Maybe that’s because of a bad decision that affects a lot of users in ways that were never imagined. The reason I gripe about it with every new RHEL release is because I want CentOS to change back.

    The people who actually have to deal with the ramifications of this decision were not involved in it. There was never a call for feedback on this list. We are not developers, and don’t have time to read the developer lists where this decision was made.

    How can we possibly lobby to change it back? We can’t use IRC (where a lot of the CentOS folks seem to think they can be “available”) because we’re in an *enterprise* environment that forbids it. We aren’t developers. We’re not on the board. And don’t ask us to “get involved”; we don’t have time!

    I have hundreds of machines, our own private copy of the mirrors, and lots of postinstall scripts. The “version number” is important to maintaining this environment, especially in a mixed version and distro environment.

    OK, I’m done griping. Until the next RHEL release, that is :)

  • It’s illogical to introduce a confusing numbering system resembling random meaningless digits and then create a table to refer to the source, which has a conventional numbering system possessing clarity and brevity and which is easily remembered.

    Genuine genii prefer simple solutions (K.I.S.S.) whilst the less talented mistakenly assume convoluted identification is preferable. Conservatives think if something is working well, and is devoid of problems, there is no benefit to mankind by replacing it with inferior sub-standard alternatives. Additionally confusion wastes finite time and finite resources.

    The villain is RH who wants, for commercial reasons, to differentiate its commercial product from the internationally respected free alternative.

    It’s a shame that the CentOS volunteers, paid by Red Hat, lacked the ability, within the Red Hat-run CentOS offshoot, to maintain CentOS’ once treasured freedom. Now even the CentOS logo and name are owned and controlled by Red Hat.

    Luckily I have some years left on C6. Like Valeri, I’ll migrate to BSD instead of using the increasingly problematic C7. That way I’ll avoid systemd and similar nightmares.

    I continue to gratefully appreciate the CentOS team’s personal efforts in producing a really splendid operating system for public consumption.

  • This scared me to death. For a split second I thought “What, are my CentOS 7 workstations Windows 10 already?” What a nightmarish thought! Luckily just my wild imagination ;-)

    Valeri

  • This has nothing to do with systemd or a parallelized boot. The kernel panic is happening during the initial load of the kernel and initialization of hardware.

    I know you love to blame every problem on systemd, but c’mon, this problem is going to happen with *EVERY* init system.


    Jonathan Billings

  • No, I don’t. I guess I’m just that ignorant, and not too attentive to the original description of the problem. My impression was that the kernel panic happened after the kernel was loaded, while services were being started. I’m many [bad] things, but not someone who wishfully blames a piece of architecture just because I don’t like it much (compared to the few alternatives doing the same job that I have seen in my life, some of which I’m still using).

    But thanks for your note, it’s helpful for me (no sarcasm, really).

    Valeri

  • Is it possible to get a bug report at bugs.CentOS.org with as much detail as possible, so we can try to reproduce it (and at least document and manage it that way)?

    thanks

  • I don’t see it being dropped; on my completely updated machine I still see the fully qualified numbering in /etc/centos-release (as an example)?

  • one stmt before the panic, they could look at what the system was doing *sequentially*, and so have an idea what it was failing on. With systemd’s parallelism, we have no clue, other than what it’s done, and no idea what’s happening that’s failing.

    mark

  • Sadly, in this case, no init system would help with the fact that the kernel panic is more lines than are shown on a VGA screen. So, no help there.

    Also, while the systemd init system is loaded very early, judging from the call trace, it happened before or during the pivot from the initrd to the root filesystem. There’s little that can be done to address this kind of panic no matter what init system you’re running; you basically just need to have a logging console somewhere.

    However, if you have journald writing into memory, it’s very possible that, if the panic isn’t severe enough to kill the journal, you might have a copy of what *was* running. It’s not the case here, but this is something that you get from having a journal: you have real logs of what was happening, what processes were running and what emitted the error. With Upstart and SysVinit, you were stuck watching the output on the console and hoping whatever generated the error was good enough to say what program it was. Sure, you could make guesses based on the serial startup (although Upstart also supported parallel startup, the CentOS init system rarely used it).

    For what it’s worth, you could always crank up systemd.log_level.
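systemd reads `systemd.log_level=` and `systemd.log_target=` from the kernel command line. A Python sketch (helper name hypothetical) that sets such key=value options on a command-line string, replacing any existing value; as noted above, this helps with failures after systemd starts, not with a panic during early kernel init:

```python
def with_kernel_params(cmdline, params):
    """Return cmdline with each key=value in `params` set.

    Existing tokens for the same keys are removed first, so a new value
    replaces an old one; all other tokens keep their order.
    """
    keys = set(params)
    kept = [tok for tok in cmdline.split() if tok.split("=", 1)[0] not in keys]
    return " ".join(kept + [f"{key}={value}" for key, value in params.items()])
```

For example, `with_kernel_params("ro rhgb quiet", {"systemd.log_level": "debug"})` returns `"ro rhgb quiet systemd.log_level=debug"`.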

  • . . .

    The short answer: Because RHEL is based on Fedora development.

    The long answer: Because RH believes/believed that the laptop environment is/was a key part of its growth strategy. The recent phenomenon of the widespread adoption of smart phones and tablets in place of laptops may bring that into question now, but the move to laptops was a deliberate business choice in my opinion.

    It remains to be seen whether or not RH can have its cake and eat it too. Sysadmins tend to be rather prickly people when it comes to people and things that appear to waste their time. Aggravating one’s installed base while chasing a chimera seems to me a strategy of dubious worth.

    However that may be, the world moves on and we perforce move with it or are left behind.

  • It has never been true that a kernel panic was necessarily caused by the immediately preceding step in a sequential init. I ran into one instance (back in 4.x days, incidentally, where ‘x’ was 1 or 2) where a panic was caused by the tg3 driver, but it wasn’t tickled until a variable number of packets had passed the interface, and it didn’t happen very often. Typically, when it happened, it happened during SSH startup (almost every time it occurred, in fact). But the root cause was the tg3 driver module, not sshd. So having the SSH startup as the last line before the panic was actually a hindrance rather than a help in that case; I would have been looking for an sshd problem that didn’t actually exist.

    I don’t think that’s an isolated instance, either. You need the module information from the panic more than information on what was started immediately prior to the panic. This was fixed without me having to file a bug report, incidentally, so there is no BZ # to point you to that I recall, and a quick search of bugzilla doesn’t show one for that particular issue. I ended up seeing that it was a tg3 problem after setting up a serial console and grabbing the panic output from it. By the time I got to that point, the next update rollup for CentOS 4 was coming down, and that was the end of that problem.

    I keep thinking I’ll track down the panic I saw a few months ago with CentOS 7 and gkrellm on my hardware, but by the time I get enough ’round toits’ to do the troubleshooting the kernel has been updated, and I have to wait on the debuginfo…..lather, rinse, repeat. Eventually I’ll get my timing right and see what is (or maybe is not) happening.
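The module information mentioned above sits on the "Modules linked in:" line of an oops or panic. A Python sketch that extracts it; it assumes the line arrived unwrapped and strips taint flags such as `(PO)`. `modules_linked_in` is a hypothetical helper:

```python
def modules_linked_in(panic_text):
    """Return module names from the 'Modules linked in:' line, or [] if absent.

    Taint flags in parentheses after a name are dropped. Assumes the whole
    list is on one line; a console that wrapped it needs rejoining first.
    """
    marker = "Modules linked in:"
    for line in panic_text.splitlines():
        start = line.find(marker)
        if start != -1:
            tokens = line[start + len(marker):].split()
            return [tok.split("(", 1)[0] for tok in tokens]
    return []
```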

  • I know this is a big, confused thread, but the main complaint is that the website has dropped the fully qualified numbering:

    https://www.google.com/#q=%227.1.1503%27+site:CentOS.org

    vs

    https://www.google.com/#q=%227+1503%27+site:CentOS.org

    The first is “7.1.1503” and has 895 results, and they aren’t html pages, they’re things like directories containing that string.

    The second is strings like “7 (1503)”, and has 1,700+ results, and they are the html pages.

  • On 03-12-2015 14:46, Duncan Brown wrote:

    Of some use. It’s failing on ftrace initialization while allocating memory, but I can’t tell the real reason. It could be just a lack of memory or any other bug.

    Probably by that stage of the boot, the framebuffer is already loaded. Use the vga=0x317 boot option to get a higher resolution and more lines on screen, then send another pic.

    I hope it fits this time. If not, pick 0x31A, if your monitor allows.

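    (kernel src, Documentation/fb/vesafb.txt)
        | 640x480  800x600  1024x768 1280x1024
    ----+-------------------------------------
    256 |  0x301    0x303    0x305    0x307
    32k |  0x310    0x313    0x316    0x319
    64k |  0x311    0x314    0x317    0x31A
    16M |  0x312    0x315    0x318    0x31B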
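The vga= values follow from the tables in vesafb.txt: the kernel mode number is the VESA mode number plus 0x200. A Python sketch of that arithmetic, with the VESA mode numbers taken from the same file; the dictionary and function names are hypothetical:

```python
# VESA mode numbers by colour depth and resolution (Documentation/fb/vesafb.txt).
VESA_MODES = {
    "256": {"640x480": 0x101, "800x600": 0x103, "1024x768": 0x105, "1280x1024": 0x107},
    "32k": {"640x480": 0x110, "800x600": 0x113, "1024x768": 0x116, "1280x1024": 0x119},
    "64k": {"640x480": 0x111, "800x600": 0x114, "1024x768": 0x117, "1280x1024": 0x11A},
    "16M": {"640x480": 0x112, "800x600": 0x115, "1024x768": 0x118, "1280x1024": 0x11B},
}


def vga_param(colors, resolution):
    """Build the vga= boot option: kernel mode = VESA mode + 0x200."""
    return f"vga=0x{VESA_MODES[colors][resolution] + 0x200:X}"
```

`vga_param("64k", "1024x768")` gives the `vga=0x317` suggested above, and `vga_param("64k", "1280x1024")` gives `vga=0x31A`.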

  • Hi Marcelo

    Thanks for the suggestion

    I’ve worked through a decent selection and can’t get any to show a higher resolution than it’s currently at; in fact, most just leave me with a blank screen.

    Note I can change the resolution all I want if I boot via the previous kernel

    cheers

    Duncan

  • I’m seeing if I can get a decent copy of the kernel panic

    Would the bug report be much use without it?

    thanks

    Puppet, Chef, Ansible and the other CFG Management systems work with the current numbering system as it is now, and we have no intention of changing it ever again.

    Not only that, but all the cloud images that we produce for Google Compute, Amazon AWS, OpenNebula, Vagrant, etc, work.

    Our docker containers work and this works with the CentOS Atomic Host within Project Atomic. It works in OpenHPC.

    People can still get and use CentOS-7, it is completely free and it also works completely with every major public cloud and private onsite cloud setup.

    Our community build system for SIGs is functioning and producing software in several SIGs (https://cbs.CentOS.org/).

    In the Virt SIG we have Xen for CentOS-6 in production, with XenProject.org taking the lead on that (http://bit.ly/1jFYLUe) (there are also testing RPMs for CentOS-7). There are oVirt RPMs for the last 2 releases also in the Virt SIG (http://bit.ly/1XJJ2X4).

    CentOS Atomic Host is available here: (http://bit.ly/1XJIvEl)

    The Software Collections SIG is producing RPMs as well (http://bit.ly/1m3Jmid): both SCLs and the Dev Tool Sets.

    In our AltArch SIG we have a beta build of ppc64le and ppc64 (with IBM helping with that build) as well as released i686 and AArch64 arches (http://bit.ly/1OMOWyX). We will have an Arm32 release very soon as well.

    There are RDO Packages being produced for both openstack-kilo and openstack-liberty in the Cloud SIG (http://bit.ly/21FmFl6).

    The Storage SIG has 2 releases of GlusterFS Packages (http://bit.ly/1Q7cXUA).

    The bottom line is, our numbering system works for us to be able to do all those things.

    This is much more complicated than just a name on an ISO, and I don’t think that we really need to discuss it at every single CentOS-7 point release for the next 8 years.

    The CentOS-7 Base OS is still there, it is free for anyone to use, it is being maintained as always. If you don’t want to use it, great. That is your choice.

    However, CentOS is more open and more community driven now than it ever has been, while still maintaining our Base OS trees.

    Thanks, Johnny Hughes

  • I always admire Johnny’s prose, passion for CentOS and his calm approach to everything.

    He is right. We are stuck with the numbering system.

    One thing is for certain: CentOS has expanded significantly and now offers an increasing variety of exciting/interesting computer solutions.

    CentOS on Android tablets would be really nice.

    Happy Christmas everyone and a successful and prosperous New Year

  • I’m suggesting that you use 7.1.1503 instead of 7 (1503) *on the website*. I don’t think that Puppet, Chef, Ansible, or any other configuration management system parses webpages. They look at things like /etc/CentOS-release, which is 7.1.1503, and I think that’s fine.
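
    Greg's distinction can be made concrete: the release file carries all three fields, and the website's "7 (1503)" is the same data with the minor number dropped. A small Python sketch, assuming the CentOS 7 file holds a single line such as "CentOS Linux release 7.1.1503 (Core)" (on disk the path is lowercase, /etc/centos-release); the function names are hypothetical:

```python
import re
from typing import Tuple

# Assumed shape of the one-line release file on CentOS 7, for example
# "CentOS Linux release 7.1.1503 (Core)".
RELEASE_RE = re.compile(r"release (\d+)\.(\d+)\.(\d{4})\b")


def parse_centos_release(line: str) -> Tuple[int, int, str]:
    """Split a CentOS 7 release line into (major, minor, YYMM datestamp)."""
    match = RELEASE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised release line: {line!r}")
    major, minor, stamp = match.groups()
    return int(major), int(minor), stamp


def full_label(line: str) -> str:
    """The fully qualified form, e.g. "7.1.1503"."""
    major, minor, stamp = parse_centos_release(line)
    return f"{major}.{minor}.{stamp}"


def website_label(line: str) -> str:
    """The website's form, e.g. "7 (1503)"; the minor number is dropped."""
    major, _minor, stamp = parse_centos_release(line)
    return f"{major} ({stamp})"
```

    Both labels are built from the same parsed fields, which is the point being made: only the website rendering loses information.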

    I am not suggesting that you revert to 7.1.

    CentOS: love it or leave it! My new employer uses Debian for their large cluster.

    — greg

    Yeah, I’ve got a nice little USB one on the way that I can hook directly into a Raspberry Pi.

    I’m totally expecting this to get fixed in a kernel update before the thing even arrives, but it’ll be a good learning experience and will probably be useful in the future!

  • Always Learning wrote:

    Agreed. But two possibly OT and probably ignorant queries:

    1. I am running a standard CentOS 32-bit system on my home servers. I keep them up-to-date, but have not re-booted for several months. I see from /etc/CentOS-release that I am running 7.1. If I re-booted would this become 7.2?

    2. If so, is this kernel panic a widespread phenomenon?

  • You’re running the 32-bit AltArch build of CentOS?

    The /etc/CentOS-release is owned by the CentOS-release package, and its contents will be updated when you update that package. A reboot won’t change that. In the default x86_64 release, I think you’d still need to pull updates from the CR repo to get the 7.2.1511 packages.

    And just look at the confusion: because the website almost never mentions 7.1.1503 or 7.2.1511, it can be really hard to follow this discussion, with one person using “7.1” and “7.2” and the other using “7.2.1511”. Good thing the 2nd person didn’t use “7 (1511)”, like the website does.

    Oh, wait: CentOS, love it or leave it.

  • approach

    Really?

    This is what we’re dealing with now?

    OK. I will recommend we move away from CentOS.

    Good job.

  • 1. The CentOS Release package does not get updated until the full release. It will not be updated in CR repo, but will be part of the final rollout which includes installable ISOs, etc. Neither will Anaconda, which will also be updated in the full upcoming release.

    The CR repo is the rest of the updates (released earlier than the final release) while we still do the final QA. This is all spelled out here:

    https://wiki.CentOS.org/AdditionalResources/Repositories/CR

    2. The Alternative Arches (i686, armhfp, aarch64) are not necessarily updated as quickly as the main arches. That is one of the reasons they are AltArch and not a main arch. However, we are working hard on all of those as well.

    I actually expect that I will have the CR stuff ready in the next 48 hours for i686.

    I know that armhfp (that is Arm32) minimal tree is also already done and available for 7.2.1511 (these are testing releases for Arm32):

    https://lists.CentOS.org/pipermail/arm-dev/2015-December/001343.html

    The aarch64 packages are also mostly built, but they require some more work.

    3. This kernel issue, to the best of my knowledge from this thread, is one person who had an issue on one x86_64 install, but it is hard to tell, as there is much discussion on the thread that has nothing to do with that kernel oops.

  • Ah, I didn’t know that. Makes sense.

    My point was just that /etc/CentOS-release reflects the packages you have installed, and not what kernel you’ve booted into.

    However, /etc/issue is interpreted by the login service, which fills in your kernel release, so it will reflect the kernel you booted.
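
    The difference is easy to see from the getty side: the escapes in /etc/issue are filled in from uname(2), that is, from the running kernel, whereas /etc/CentOS-release is just a file shipped in a package. A rough Python sketch of that substitution, covering only the \s, \n, \r, \v and \m escapes documented in agetty(8) (expand_issue is a hypothetical name, and real agetty handles many more escapes):

```python
import os

# Subset of agetty's /etc/issue escapes; each maps to a uname(2) field,
# i.e. to the *running* kernel, not to any installed package.
ESCAPES = {
    "s": "sysname",   # \s  operating system name, e.g. "Linux"
    "n": "nodename",  # \n  host name
    "r": "release",   # \r  release of the booted kernel
    "v": "version",   # \v  kernel build version string
    "m": "machine",   # \m  architecture, e.g. "x86_64"
}


def expand_issue(template: str, uname=None) -> str:
    """Expand the supported escapes; other characters pass through unchanged."""
    if uname is None:
        u = os.uname()
        uname = {field: getattr(u, field) for field in ESCAPES.values()}
    out, i = [], 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template) and template[i + 1] in ESCAPES:
            out.append(uname[ESCAPES[template[i + 1]]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

    Booted into the older 3.10.0-229.20.1.el7.x86_64 kernel, a line like `Kernel \r on an \m` expands to that kernel's version no matter which centos-release package is installed.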

    Note that there is a /etc/CentOS-release-upstream as well that identifies which version of the upstream we are currently tracking.

    I hope it’s not that drastic!

    There are multiple issues and fallouts here, starting from the fact that the point number isn’t really much more than a datestamp, to who uses it, how, and for what purpose. But the thing that bothers me most is that the reason why we are doing this, and how it’s implemented, isn’t clear to people on this thread.

    For example, when we were doing x.y, RHEL wasn’t. They were on an X release, and all the other point-in-time data was communicated outside of that scope (e.g. in EL3/4 etc.). I believe being pragmatic around this, and delivering value into the areas that needed it most, is a good thing for us and the userbase at large; however, if we are breaking systems for existing setups, then we should address that. I took on board all the feedback from the 7 release time, and I believe the system we have in place now should work for most people (no one has been able to demonstrate a problem space in the distro as such).
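
    Since the third field is a year-month stamp, it decodes mechanically, and within one major release the stamps also sort in release order. A minimal Python sketch, assuming the stamp is always four digits in YYMM form and falls in the 2000s (both function names are hypothetical):

```python
import calendar


def describe_datestamp(stamp: str) -> str:
    """Turn a YYMM stamp such as "1511" into "November 2015"."""
    if len(stamp) != 4 or not stamp.isdigit():
        raise ValueError(f"expected YYMM, got {stamp!r}")
    year, month = 2000 + int(stamp[:2]), int(stamp[2:])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {stamp!r}")
    return f"{calendar.month_name[month]} {year}"


def is_newer(stamp_a: str, stamp_b: str) -> bool:
    """YYMM stamps compare correctly as plain integers (same century assumed)."""
    return int(stamp_a) > int(stamp_b)
```

    So 1503 reads as March 2015 and 1511 as November 2015, and a later stamp such as 1605 would be May 2016.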

    If the issue is around communication and how we export the metadata/mindset, I totally take on board that we’ve had serious issues in that space. Even the fact that there is a CR/ repo isn’t something most people understand or even know about; it’s something we should fix.

    Greg’s pointed out the website version reporting, and it’s a great point. However, note that we are already working on fixing that side of things by bringing all download-specific info into one place, and doing this on the wiki (wiki.CentOS.org/Download). We are moving all version-specific content away from the website; the net result is that the website becomes about the project, and the wiki becomes the de facto source for all content (Linux distro, SIG content, user help, etc.). This is also primarily driven by the fact that we’ve struggled to keep the site updated and relevant, whereas the wiki, with its much larger user and contributor base, has far better churn.

    So let’s work out what the tangible issues are, and then work on resolving those.

    I will end by saying that we have more than a few million monthly instances out there now, in container space, in cloud space, in developer instances – and all of those people have hugely benefited from the new visioning.

    I certainly don’t want you to leave!

    regards

  • (bit snip)

    No, I would prefer ALL of you leave. All of you who are not addressing the OP’s issue should leave the thread. Just start a new thread, not here. Enough is enough (/me saying in the same tone as President Obama referring to the gun violence in the US).

    Now back to the topic…

    This bug report:

    https://bugs.CentOS.org/view.php?id

  • Phelps, Matthew wrote:

    This seems to be raising what to me is a trivial issue to an absurd level of hostility. Johnny Hughes’ comment was uncharacteristically harsh, and yours is even harsher.

    To me, CentOS is a highly stable OS for my home servers, and I am eternally grateful to Johnny Hughes and his colleagues for carrying out what looks to me like an impossibly complex task.

    The numbering of packages is a very small part of this. On the other hand, a kernel panic would be very worrying to me if it were in fact likely to happen. I am glad to hear that I have no need to worry.

  • That is indeed good news.

    Now, this is only a workaround. As seen in the RH bugzilla, the patch is in the z-series kernel and the target is set to “7.3”. That will be CentOS 7 (1605) [or later]. At any rate it’s several months from now. If we are able to identify the patch, it will be possible to include it in the CentOSplus kernel.

    Akemi

  • As a member of the user community, it’s hard to see it any other way.

    I want to be fair to everyone, so I’ll acknowledge that Greg was making an ass of himself. He said so, himself. I also think that the question of what release the problem occurred on is irrelevant. The relevant question is the version of the kernel package, not the CentOS-release package.

    But that aside, I think that Greg was right in that the version notation used in the wiki (https://wiki.CentOS.org/Download) is unnecessarily inconsistent with older releases. The rpm version for CentOS-release is 7-1.1503. The version reflected in /etc/CentOS-release is 7.1.1503. But text on the wiki omits a portion of that version number. Greg was consistent and (in my opinion) clear that he was suggesting that the wiki be consistent with the numbering used elsewhere.
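
    The two forms quoted above hold the same digits in different places: the centos-release package has Version "7" and a Release beginning "1.1503", and /etc/CentOS-release joins them with dots. A Python sketch of that mapping, assuming the Release string may continue with a dist tag starting ".el" (the tail ".el7.centos.2.8" below is only an example, and display_version is a hypothetical name):

```python
def display_version(rpm_version: str, rpm_release: str) -> str:
    """Join the package Version and the leading part of its Release.

    Assumes the Release looks like "1.1503", optionally followed by a
    dist tag starting with ".el" (e.g. ".el7.centos.2.8"), which is dropped.
    """
    release_core = rpm_release.split(".el", 1)[0]
    return f"{rpm_version}.{release_core}"


if __name__ == "__main__":
    print(display_version("7", "1.1503.el7.centos.2.8"))  # 7.1.1503
```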

    Johnny’s response ignored that suggestion completely, and defended the version numbering scheme, which was not in question. And in his very first response, he said, “CentOS-7 Base OS is still there, it is free for anyone to use … If you don’t want to use it.. That is your choice.”

    Can you see why that would be interpreted as “love it or leave it?”

    It has been my impression for a long time that the CentOS developers are reluctant to engage the community in contributing to the project, and this is a fairly good example of why that impression endures.

    Arguably, that’s all any version number is. Isn’t it?

    But your response, like Johnny’s, ignores what Greg actually suggested: that the wiki use a version number consistent with the rpm version number and the content of /etc/CentOS-release.

    Yes, the wiki is where the problem is. That’s the same URL that Greg mentioned originally.

    I would interpret that as an invitation to participate in the project, but I created an account on the wiki and don’t seem to be able to edit anything.

  • Who is “contributing” here? Where’s the patch? All I see is a bunch of bikeshedding.

    The new version numbering scheme was created to solve a real problem, which CentOS has been fighting for years.[*]

    If you change anything about the version numbering scheme within the 7 line, you break automated workflows that were debugged and deployed a year ago. The time to make such a change is 8 at earliest, and I’d argue that switching *again* after the 7 effort would cause more problems than it solves.

    Remember, this distro is about stability. Changing naming/numbering schemes in a way that breaks scripts is about as far from stability as you can get.

    [*] With every release from CentOS 3.1 through 6.7, there was always a series of mailing list questions of the same basic form: “FooApp is only certified for CentOS 6.4, but CentOS 6.7 is out, and the vendor won’t update the certification, so how do I keep my servers on CentOS 6.4?” Just as there is no CentOS 7.2, only 7, there was no CentOS 6.4, only 6. The new scheme tries to make that clear.

    It would actually *be* clear if the tail (CentOS) could wag the dog (Red Hat) here and get them to adopt the YYMM respin scheme.

  • I don’t know how to be any more clear than I was. Neither Greg nor I (as far as I can tell) were suggesting that the version numbering be changed, only that it be consistent with the rpm and CentOS-release files where it is used on the wiki.