Xen C6 Kernel 4.9.13 And Testing 4.9.15 Only Reboots.

March 20, 2017 PJ Welsh CentOS-Virt 38 Comments

Updating my CentOS 6.8 Xen server to the new 4.9.13 kernel yields a few kernel boot messages such as “APIC ID MISMATCH”, and then the system reboots immediately without any other information. This is on a Dell R710 with 64GB RAM and 2x 6-core Intel CPUs. As an additional test, I installed and attempted to run the current “testing” kernel, 4.9.16, with the exact same results.

Anyone have an idea? The 3.18.x series runs without issue of course.

Thanks

38 thoughts on - Xen C6 Kernel 4.9.13 And Testing 4.9.15 Only Reboots.

Johnny Hughes says:

March 20, 2017 at 11:05 am

Try this kernel (the noarch kernel-doc is not done yet), but that is not a required package:

https://people.CentOS.org/hughesjr/4.9.16/x86_64/

Let me know if that works or not .. we can try adjusting some other config settings.

Don’t worry about the CentOS.plus dist tag .. that will change when we submit it via the regular process.

Thanks, Johnny Hughes

Johnny Hughes says:

March 20, 2017 at 11:21 am

I think the APIC ID MISMATCH is an expected and ignorable error .. see:

https://patchwork.kernel.org/patch/9539933/

I applied that patch and I am building a 4.9.16-23 right now; I’ll publish it when it finishes. Maybe with the error gone we can get a better error in the console.

Johnny Hughes says:

March 20, 2017 at 11:47 am

OK, try the 4.9.16-23 packages here:

https://people.CentOS.org/hughesjr/4.9.16/x86_64/

PJ Welsh says:

March 20, 2017 at 1:20 pm

No warning, but still just reboots with no notice. Is there any other system info you need?
Thanks PJ
Johnny Hughes says:

March 20, 2017 at 2:23 pm

Try the new 4.9.16-24 packages there now. (reworked the config based on a fedora kernel)

PJ Welsh says:

March 20, 2017 at 3:51 pm

Still just starts the kernel and within 4 seconds reboots with 4.9.16-24. Thanks PJ
Ricardo J. says:

March 20, 2017 at 5:23 pm

On Monday 20/03/2017, PJ Welsh wrote:

Edit the grub entry and add “noreboot” to your Xen parameters; maybe when the kernel panics, Xen detects it and automatically reboots.

—
Ricardo J. Barberis
Linux User No. 250625: http://counter.li.org/
LFS User No. 5121: http://www.linuxfromscratch.org/
Senior SysAdmin / IT Architect – http://www.DonWeb.com
PJ Welsh says:

March 20, 2017 at 6:14 pm

Sure thing. I will need to wait until AM Tuesday USA time to test now. Thanks PJ
PJ Welsh says:

March 21, 2017 at 7:49 am

“noreboot” grub.conf option still produced nothing other than a flashing cursor on the top left. Also, neither num-lock nor caps-lock respond at this time… I seem no closer with helpful information other than, “it’s broken” :(
Here is the grub.conf stanza for the kernel:
title CentOS (4.9.16-24.el6.CentOS.plus.x86_64)
        root (hd0,1)
        kernel /boot/xen.gz dom0_mem=3G,max:3G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all noreboot
        module /boot/vmlinuz-4.9.16-24.el6.CentOS.plus.x86_64 ro root=UUID=bc0727e1-882c-4fbc-a4d9-e4cf754d72b7 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet reboot=pci max_loop=64
        module /boot/initramfs-4.9.16-24.el6.CentOS.plus.x86_64.img

Thanks PJ
…
Kevin Stange says:

March 21, 2017 at 1:40 pm

Try removing “rhgb” and “quiet” from your boot options as well.
PJ Welsh says:

March 22, 2017 at 6:28 am

The last few lines are:
NMI watchdog: disabled CPU0 hardware events not enabled
NMI watchdog: shutting down hard lockup detector on all CPUS
installing Xen timer for CPU1
installing Xen timer for CPU2
installing Xen timer for CPU3
installing Xen timer for CPU4
installing Xen timer for CPU5
installing Xen timer for CPU6

Here is the screen shot:
https://goo.gl/photos/yNQqaQY9bJBWQ84X8
It stops at CPU6. This is a dual socket server with 2x 6core L5639 CPUs (HT disabled). I’m surprised to see it stop at 6.

Thanks PJ
PJ Welsh says:

March 24, 2017 at 1:36 pm

As a follow-up, I tested fresh CentOS 7.3 installs on a Dell R710 and a Dell R620, and both run the new kernel without issue. My new plan is to just move this C6 system to one of the C7 installs I just created.
Sarah Newman says:

March 24, 2017 at 1:45 pm

That sounds like a compiler problem, since I think the C6 and C7 kernels are built from the same source.

–Sarah
PJ Welsh says:

March 28, 2017 at 4:55 pm

The mystery gets more interesting… I now have a CentOS 7.3 Dell R710 server doing the exact same thing of rebooting immediately after the Xen kernel load. Just to note, this is a second system and not just the first system with an update. I hope I’m not introducing something odd. The only “interesting” thing I have done, for historical reasons, is to change the following /etc/sysconfig/grub line:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all"
But I’ve done that on other servers without issue. In fact I have a Dell R710 that DOES work with CentOS 7 and the new kernel… so confused.
Alvin Starr says:

March 28, 2017 at 7:45 pm

I ran into this also.

Back up to an older kernel. At least that was my solution until a kernel came out that would boot.

It seems that some kernel builds are not friendly to xen.
Johnny Hughes says:

March 29, 2017 at 5:30 am

Maybe the BIOS versions are different on the two machines if they are the same models. Different disc controllers or modes set up? Different NICs or other add on cards?

George Dunlap says:

March 29, 2017 at 6:23 am

PJ,

Thanks for your testing and report. Would you mind reporting this on xen-devel? If there’s actually a bug in the Linux 4.9.x on Xen boot path on your box, I don’t think Johnny or I are going to be able to help you debug it. :-)

-George
Johnny Hughes says:

April 4, 2017 at 11:14 am

OK, I have a new CentOS-6 4.9.20-26 kernel here for testing:

https://people.CentOS.org/hughesjr/4.9.16/6/x86_64/

I am building the el7 one right now as well, it will be at:

https://people.CentOS.org/hughesjr/4.9.16/7/x86_64/

George and I found some issues with the 4.9.x config files for the xen kernel. Hopefully this one is much more stable as it has many changes from the fedora/rhel type configs now (what is built into the kernel, what is loaded as a kernel module, etc.)

Please test these kernels so we can get them released.

Thanks, Johnny Hughes

PJ Welsh says:

April 5, 2017 at 10:42 am

CentOS-6 4.9.20-26 kernel exhibits the same constant kernel-start-then-reboot issue when booting under the “CentOS Linux, with Xen hypervisor” grub2 menu option. However, it *does* properly boot under the “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)” grub2 menu option!

A semi-close look at the /etc/grub2.cfg yields no discernible difference between a properly functional Dell R620 and the non-properly functioning Dell R710.

Sorry, I had been distracted with other issues and have not yet submitted information to the xen-devel group.

Thanks PJ
-=X.L.O.R.D=- says:

April 6, 2017 at 11:06 pm

So interesting, and challenging too. It seems to be related to Xen compatibility with the Dell board BIOS.

I have a Dell R710 and an R720 with the same Xen version, but the outcome is completely different: the R720 is slow in performance and reboots too.

xlord
Sarah Newman says:

April 6, 2017 at 11:21 pm

I am having no similar issues with several Dell Proliant DL160p’s and CentOS 6. They are either G5 or G6, I don’t recall which.

–Sarah
PJ Welsh says:

April 7, 2017 at 6:59 am

I’ve not gotten any bites from my posting on the xen-devel mailing list. Here is the only one to-date:
https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html

From that email, there needs to be some hypervisor messages.

Does anyone know how to produce the hypervisor messages? I’ve already removed the rhgb and quiet options from the boot.

Thanks PJ
PJ Welsh says:

April 7, 2017 at 7:02 am

I spoke too soon. To get more information:
Please see https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project and https://wiki.xenproject.org/wiki/Xen_Serial_Console or alternatively at least add “vga=keep”.

pjwelsh
Anderson, Dave says:

April 14, 2017 at 5:16 am

List moderator: feel free to delete my previous large message with attachments that’s in the moderation queue…it’s now obsolete anyway.

I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 4.9.13:

Once I finally got serial output all the way through the boot process (xen+dom0) I discovered the stack trace:

[Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
————[ cut here ]———-
Johnny Hughes says:

April 14, 2017 at 7:39 am

Dave,

Take a look at this kernel as it is the one I think we are going to release (or a slightly newer 4.9.2x from kernel.org LTS). This version has some newer settings that are more redhat/fedora/CentOS base kernel like WRT what is a module and what is built into the kernel, etc.

https://people.CentOS.org/hughesjr/4.9.x/

Thanks, Johnny Hughes

PJ Welsh says:

April 14, 2017 at 9:33 am

I am on holiday until Sunday, but will download the kernel now and test it when I get back into work. Thanks
PJ Welsh says:

April 14, 2017 at 9:34 am

Very nice on the sleuthing!
Thanks
Anderson, Dave says:

April 14, 2017 at 3:26 pm

Sad to say that I already tested 4.9.20-26 from your repo yesterday…it does look a little cleaner before it dies, but still dies. I have not tested it with the vcpu=4 workaround, but I can tonight if you would like. Relevant bits below:

Loading Xen 4.6.3-12.el7 …
Loading Linux 4.9.20-26.el7.x86_64 …
Loading initial ramdisk …
[ 0.000000] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017

[ 6.195089] smpboot: Max logical packages: 1
[ 6.199549] VPMU disabled by hypervisor.
[ 6.203663] Performance Events: SandyBridge events, PMU not available due to virtualization, using software events only.
[ 6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 6.229165] installing Xen timer for CPU 1
[ 6.233849] installing Xen timer for CPU 2
[ 6.238504] installing Xen timer for CPU 3
[ 6.243139] installing Xen timer for CPU 4
[ 6.247836] installing Xen timer for CPU 5
[ 6.252478] installing Xen timer for CPU 6
[ 6.257155] installing Xen timer for CPU 7
[ 6.261795] installing Xen timer for CPU 8
[ 6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ 6.272736] ————[ cut here ]———-
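
The smpboot and APIC messages quoted in this thread can be picked out of a saved serial-console capture automatically, which makes it easier to compare a working and a failing machine. A minimal Python sketch, assuming the console output has been saved as plain text; the function name, the summary layout and the sample text (assembled from lines quoted in two separate posts above) are illustrative and do not come from any Xen or kernel tool:

```python
import re

# Patterns copied from the console messages quoted in this thread.
APIC_MISMATCH = re.compile(
    r"CPU(\d+): APIC id mismatch\. Firmware: (\d+) APIC: (\d+)"
)
PACKAGE_OVERFLOW = re.compile(
    r"Package (\d+) of CPU (\d+) exceeds BIOS package data (\d+)"
)
MAX_PACKAGES = re.compile(r"Max logical packages: (\d+)")


def summarize_boot_log(text):
    """Collect the smpboot/APIC messages found in captured console text."""
    summary = {"max_packages": None, "apic_mismatch": [], "package_overflow": []}
    for line in text.splitlines():
        match = MAX_PACKAGES.search(line)
        if match:
            summary["max_packages"] = int(match.group(1))
            continue
        match = APIC_MISMATCH.search(line)
        if match:
            # Stored as (cpu, firmware id, APIC id).
            summary["apic_mismatch"].append(tuple(int(g) for g in match.groups()))
            continue
        match = PACKAGE_OVERFLOW.search(line)
        if match:
            # The message lists the package first; store as (cpu, package, BIOS limit).
            pkg, cpu, limit = (int(g) for g in match.groups())
            summary["package_overflow"].append((cpu, pkg, limit))
    return summary


# Illustrative input: lines quoted earlier in the thread, not one real capture.
SAMPLE_LOG = """\
[ 6.195089] smpboot: Max logical packages: 1
[ 6.261795] installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
[ 6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
"""

print(summarize_boot_log(SAMPLE_LOG))
```

A "Max logical packages" value lower than the number of populated sockets, followed by a "Package N of CPU M exceeds BIOS package data" line, is the combination reported here just before the crash.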
Anderson, Dave says:

April 14, 2017 at 3:40 pm

So, strangely,

I have two _identical_ dualproc xeon mobos (same bios/ipmi versions, they even share an enclosure, one is right side, other is left), each with different cpu/memory:

Using 4.9.13 with vcpu limited to 4, early in the boot process, the one that _was_ booting before setting the xen vcpu args says:
“[ 7.060720] smpboot: Max logical packages: 2”,

and the other one says
“[ 6.195089] smpboot: Max logical packages: 1”

They both have dual procs, known working/good.

The first (the one that worked unmodified) has dual 8 core (16 HT/ea) and correctly detects “[ 0.000000] smpboot: Allowing 32 CPUs, 0 hotplug CPUs”. It’s a Xeon E5-2665v1.

The second machine (didn’t work without the xen vcpu args) has dual 4 core (8ht/ea) and also correctly detects “[ 0.000000] smpboot: Allowing 16 CPUs, 0 hotplug CPUs”. It’s a Xeon E5-2643v1…so it seems like this one does ok until it decides there’s only one cpu package?

Thanks,
-Dave
Anderson, Dave says:

April 14, 2017 at 4:58 pm

I also just realized the C6 portion of the title/subject line here refers to CentOS 6, so I’d like to clarify that all my testing/issues/etc was under CentOS 7.3 with all patches applied.

Thanks,
-Dave
Johnny Hughes says:

April 18, 2017 at 8:30 am

Dave,

Just for testing purposes, can you try booting the kernel in the normal way on the machine that does not work (a normal grub entry on the kernel with no xen.gz line)?

That way, we can hopefully narrow the issue down to a hypervisor issue or a kernel config issue.

Thanks, Johnny Hughes

PJ Welsh says:

April 18, 2017 at 8:36 am

There was a note that the non-Xen kernel at the same kernel version did indeed boot:
“CentOS-6 4.9.20-26 kernel exhibits the same constant kernel-start-then-reboot issue when booting under the “CentOS Linux, with Xen hypervisor” grub2 menu option. However, it *does* properly boot under the “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)” grub2 menu option!”

Trying to get back into being able to test this more.

Thanks PJ
PJ Welsh says:

April 18, 2017 at 8:40 am

Just to note, the same pattern happens on C7:
“CentOS Linux, with Xen hypervisor” = reboot
“CentOS Linux (4.9.20-26.el7.x86_64) 7 (Core)” = boot

[root@XXX ~]# uname -a
Linux XXX 4.9.20-25.el7.x86_64 #1 SMP Fri Mar 31 08:53:28 CDT 2017 x86_64 x86_64 x86_64
PJ Welsh says:

April 18, 2017 at 8:45 am

Apologies: I installed the newer -26 kernel and had not rebooted into it. The grub2 menu item should have been “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)”. I am currently restarting that remote affected system (unmodified grub2 entry first). Thanks PJ
PJ Welsh says:

April 18, 2017 at 12:40 pm

Here is something interesting… I went through the BIOS options and found that the one R710 that *is* functioning differed only in that “Logical Processor”/Hyperthreading was *enabled*, while the one that is *not* functioning had HT *disabled*. I enabled Logical Processor and the system starts without issue! I’ve rebooted 3 times now without issue.
Dell R710 BIOS version 6.4.0
2x Intel(R) Xeon(R) CPU L5639 @ 2.13GHz
4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64 x86_64 GNU/Linux
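
Since the BIOS “Logical Processor” setting turned out to be the difference, it can help to check the Hyperthreading state from the running system instead of rebooting into the BIOS on each machine. A rough Python sketch, assuming the check is run on a bare-metal boot (the non-Xen grub entry), where /proc/cpuinfo exposes the usual x86 “siblings” and “cpu cores” fields; under a Xen dom0 these values may not reflect the physical topology. The function name and sample values are illustrative:

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True if a package reports more logical CPUs ("siblings")
    than physical cores ("cpu cores"), False if they are equal, and
    None if either field is missing from the text."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            return siblings > cores
    return None


# Illustrative values for a 6-core CPU with HT on and off (not real captures).
HT_ON = "processor\t: 0\nsiblings\t: 12\ncpu cores\t: 6\n"
HT_OFF = "processor\t: 0\nsiblings\t: 6\ncpu cores\t: 6\n"

print(hyperthreading_enabled(HT_ON), hyperthreading_enabled(HT_OFF))
```

On a live system the text would come from reading /proc/cpuinfo. The comparison is only a heuristic: it looks at the first package that reports both fields and does not account for cores taken offline.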
Johnny Hughes says:

April 19, 2017 at 5:41 am

Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the system as normal updates. It should be available later today.

PJ Welsh says:

April 19, 2017 at 12:25 pm

I’ve verified with a second Dell R710 that disabling Hyperthreading/Logical Processor causes the primary Xen-booting kernel to fail and reboot. Conversely, enabling it allows the system to start as expected and without any issue.
Current tested kernel was: 4.9.13-22.el7.x86_64 #1 SMP Sun Feb 26 22:15:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I just attempted an update and the 4.9.23-26 is not yet up. Does this update address the Hyperthreading issue in any way?

Thanks PJ
Johnny Hughes says:

April 19, 2017 at 12:33 pm

I don’t think so .. at least I did not specifically add anything to do so.

You can get it here for testing:

https://buildlogs.CentOS.org/CentOS/7/virt/x86_64/xen/

(or from /6/ as well for CentOS-6)

Not sure why it did not go out on the signing run .. will check that server.

