Xen C6 Kernel 4.9.13 And Testing 4.9.15 Only Reboots.
Updating my CentOS 6.8 Xen server to the new 4.9.13 kernel yields a few kernel boot messages such as “APIC ID MISMATCH”, and the system then reboots immediately without any further information. This is on a Dell R710 with 64GB RAM and 2x 6-core Intel CPUs. As an additional test, I installed and attempted to run the current “testing” kernel, 4.9.16, with the exact same results.
Anyone have an idea? The 3.18.x series runs without issue of course.
Thanks
PJ
38 thoughts on - Xen C6 Kernel 4.9.13 And Testing 4.9.15 Only Reboots.
Try this kernel (the noarch kernel-doc is not done yet), but that is not a required package:
https://people.CentOS.org/hughesjr/4.9.16/x86_64/
Let me know if that works or not .. we can try adjusting some other config settings.
Don’t worry about the CentOS.plus dist tag .. that will change when we submit it via the regular process.
Thanks, Johnny Hughes
I think the APIC ID MISMATCH is an expected and ignorable error .. see:
https://patchwork.kernel.org/patch/9539933/
I applied that patch and I am building a 4.9.16-23 right now; I’ll publish it when it finishes. Maybe with the error gone we can get a better error in the console.
OK, try the 4.9.16-23 packages here:
https://people.CentOS.org/hughesjr/4.9.16/x86_64/
No warning, but still just reboots with no notice. Is there any other system info you need?
Thanks PJ
Try the new 4.9.16-24 packages there now. (reworked the config based on a fedora kernel)
Still just starts the kernel and within 4 seconds reboots with 4.9.16-24. Thanks PJ
On Monday 20/03/2017, PJ Welsh wrote:
Edit grub’s entry and add “noreboot” to your Xen parameters; maybe when the kernel panics, Xen detects it and automatically reboots.
--
Ricardo J. Barberis
Linux User No. 250625: http://counter.li.org/
LFS User No. 5121: http://www.linuxfromscratch.org/
Senior SysAdmin / IT Architect – http://www.DonWeb.com
Sure thing. I will need to wait until AM Tuesday USA time to test now. Thanks PJ
“noreboot” grub.conf option still produced nothing other than a flashing cursor on the top left. Also, neither num-lock nor caps-lock respond at this time… I seem no closer with helpful information other than, “it’s broken” :(
Here is the grub.conf stanza for the kernel:
title CentOS (4.9.16-24.el6.CentOS.plus.x86_64)
root (hd0,1)
kernel /boot/xen.gz dom0_mem=3G,max:3G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all noreboot
module /boot/vmlinuz-4.9.16-24.el6.CentOS.plus.x86_64 ro root=UUID=bc0727e1-882c-4fbc-a4d9-e4cf754d72b7 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet reboot=pci max_loop=64
module /boot/initramfs-4.9.16-24.el6.CentOS.plus.x86_64.img
Thanks PJ
…
Try removing “rhgb” and “quiet” from your boot options as well.
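For anyone who would rather script the two suggestions above than edit by hand, here is a minimal sketch in Python. It only manipulates the option strings; the function names are made up for illustration, and it does not read or write grub.conf itself, so you would still paste the result into the right stanza.

```python
def strip_boot_options(cmdline, unwanted=("rhgb", "quiet")):
    """Return the dom0 kernel command line without the unwanted options.

    Only whole tokens are removed, so "quiet" never matches part of
    another option.
    """
    kept = [tok for tok in cmdline.split() if tok not in unwanted]
    return " ".join(kept)


def add_xen_option(xen_line, option="noreboot"):
    """Append an option to the Xen (xen.gz) line unless it is already there."""
    tokens = xen_line.split()
    if option not in tokens:
        tokens.append(option)
    return " ".join(tokens)


if __name__ == "__main__":
    # Shortened example strings based on the stanza quoted in this thread.
    print(add_xen_option("/boot/xen.gz dom0_mem=3G,max:3G loglvl=all"))
    print(strip_boot_options("ro rd_NO_DM rhgb quiet reboot=pci max_loop=64"))
```

Removing “rhgb” and “quiet” is what lets the dom0 kernel messages reach the console, and “noreboot” on the Xen line is what keeps the box from resetting before you can read them.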
The last few lines are:
NMI watchdog: disabled CPU0 hardware events not enabled
NMI watchdog: shutting down hard lockup detector on all CPUS
installing Xen timer for CPU1
installing Xen timer for CPU2
installing Xen timer for CPU3
installing Xen timer for CPU4
installing Xen timer for CPU5
installing Xen timer for CPU6
Here is the screen shot:
https://goo.gl/photos/yNQqaQY9bJBWQ84X8
It stops at CPU6. This is a dual-socket server with 2x 6-core L5639 CPUs (HT disabled). I’m surprised to see it stop at 6.
Thanks PJ
As a follow-up, I was able to test a fresh install of CentOS 7.3 on a Dell R710 and a Dell R620, and both run the new kernel without issue. My new plan is to just move this C6 system to one of the C7 systems I just created.
That sounds like a compiler problem, since I think the C6 and C7 kernels are built from the same source.
–Sarah
The mystery gets more interesting… I now have a CentOS 7.3 Dell R710 server doing the exact same thing: rebooting immediately after the Xen kernel loads. Just to note, this is a second system and not just the first system with an update. I hope I’m not introducing something odd. The only “interesting” thing I have done, for historical reasons, is to change the following /etc/sysconfig/grub line:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all"
But I’ve done that on other servers without issue. In fact I have a Dell R710 that DOES work with CentOS 7 and the new kernel… so confused.
I ran into this also.
Back up to an older kernel. At least that was my solution until a kernel came out that would boot.
It seems that some kernel builds are not friendly to xen.
Maybe the BIOS versions are different on the two machines if they are the same models. Different disc controllers or modes set up? Different NICs or other add on cards?
PJ,
Thanks for your testing and report. Would you mind reporting this on xen-devel? If there’s actually a bug in the Linux 4.9.x on Xen boot path on your box, I don’t think Johnny or I are going to be able to help you debug it. :-)
-George
OK, I have a new CentOS-6 4.9.20-26 kernel here for testing:
https://people.CentOS.org/hughesjr/4.9.16/6/x86_64/
I am building the el7 one right now as well, it will be at:
https://people.CentOS.org/hughesjr/4.9.16/7/x86_64/
George and I found some issues with the 4.9.x config files for the xen kernel. Hopefully this one is much more stable as it has many changes from the fedora/rhel type configs now (what is built into the kernel, what is loaded as a kernel module, etc.)
Please test these kernels so we can get them released.
Thanks, Johnny Hughes
CentOS-6 4.9.20-26 kernel exhibits the same constant kernel-start-then-reboot issue when booting under the “CentOS Linux, with Xen hypervisor” grub2 menu option. However, it *does* properly boot under the “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)” grub2 menu option!
A semi-close look at the /etc/grub2.cfg yields no discernible difference between a properly functional Dell R620 and the non-properly functioning Dell R710.
Sorry, I had been distracted with other issues and have not yet submitted information to the xen-devel group.
Thanks PJ
So interesting, and challenging too. It seems to be related to Xen compatibility with the Dell board BIOS.
I have a Dell R710 and an R720 with the same Xen version, but the outcome is completely different: the R720 is slow and reboots too.
xlord
I am having no similar issues with several Dell Proliant DL160p’s and CentOS 6. They are either G5 or G6, I don’t recall which.
–Sarah
I’ve not gotten any bites from my posting on the xen-devel mailing list. Here is the only one to-date:
https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
From that email, there needs to be some hypervisor messages.
Does anyone know how to produce the hypervisor messages? I’ve already removed the rhgb and quiet options from the boot.
Thanks PJ
I spoke too soon. To get more information:
Please see https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project and https://wiki.xenproject.org/wiki/Xen_Serial_Console or alternatively at least add “vga=keep”.
pjwelsh
List moderator: feel free to delete my previous large message with attachments that’s in the moderation queue…it’s now obsolete anyway.
I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 4.9.13:
Once I finally got serial output all the way through the boot process (xen+dom0) I discovered the stack trace:
[Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
------------[ cut here ]------------
Dave,
Take a look at this kernel as it is the one I think we are going to release (or a slightly newer 4.9.2x from kernel.org LTS). This version has some newer settings that are more redhat/fedora/CentOS base kernel like WRT what is a module and what is built into the kernel, etc.
https://people.CentOS.org/hughesjr/4.9.x/
Thanks, Johnny Hughes
I am on holiday until Sunday, but will download the kernel now and test it when I get back into work. Thanks
Very nice on the sleuthing!
Thanks
Sad to say that I already tested 4.9.20-26 from your repo yesterday…it does look a little cleaner before it dies, but still dies. I have not tested it with the vcpu=4 workaround, but I can tonight if you would like. Relevant bits below:
Loading Xen 4.6.3-12.el7 …
Loading Linux 4.9.20-26.el7.x86_64 …
Loading initial ramdisk …
[ 0.000000] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017
[ 6.195089] smpboot: Max logical packages: 1
[ 6.199549] VPMU disabled by hypervisor.
[ 6.203663] Performance Events: SandyBridge events, PMU not available due to virtualization, using software events only.
[ 6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 6.229165] installing Xen timer for CPU 1
[ 6.233849] installing Xen timer for CPU 2
[ 6.238504] installing Xen timer for CPU 3
[ 6.243139] installing Xen timer for CPU 4
[ 6.247836] installing Xen timer for CPU 5
[ 6.252478] installing Xen timer for CPU 6
[ 6.257155] installing Xen timer for CPU 7
[ 6.261795] installing Xen timer for CPU 8
[ 6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ 6.272736] ------------[ cut here ]------------
So, strangely,
I have two _identical_ dualproc xeon mobos (same bios/ipmi versions, they even share an enclosure, one is right side, other is left), each with different cpu/memory:
Using 4.9.13 with vcpu limited to 4, early in the boot process, the one that _was_ booting before setting the xen vcpu args says:
“[ 7.060720] smpboot: Max logical packages: 2”,
and the other one says
“[ 6.195089] smpboot: Max logical packages: 1”
They both have dual procs, known working/good.
The first (the one that worked unmodified) has dual 8 core (16 HT/ea) and correctly detects “[ 0.000000] smpboot: Allowing 32 CPUs, 0 hotplug CPUs”. It’s a Xeon E5-2665v1.
The second machine (didn’t work without the xen vcpu args) has dual 4 core (8ht/ea) and also correctly detects “[ 0.000000] smpboot: Allowing 16 CPUs, 0 hotplug CPUs”. It’s a Xeon E5-2643v1…so it seems like this one does ok until it decides there’s only one cpu package?
Thanks,
-Dave
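If you have a saved serial console log, the symptoms quoted in this thread can be picked out with a short script. This is only a sketch: the regular expressions are written against the exact message wording shown in the excerpts above, and the function and key names are made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Patterns based on the log lines quoted in this thread (assumed wording).
PATTERNS = {
    "apic_mismatch": re.compile(
        r"CPU(\d+): APIC id mismatch\. Firmware: (\d+) APIC: (\d+)"
    ),
    "package_overflow": re.compile(
        r"Package (\d+) of CPU (\d+) exceeds BIOS package data (\d+)"
    ),
    "max_packages": re.compile(r"smpboot: Max logical packages: (\d+)"),
}


def scan_boot_log(text):
    """Return a dict mapping each symptom name to a list of integer tuples,
    one tuple per matching log line, in the order the lines appear."""
    findings = {name: [] for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings[name].append(tuple(int(g) for g in match.groups()))
    return findings


if __name__ == "__main__":
    import sys

    # Usage: python scan_boot_log.py serial-console.log
    with open(sys.argv[1], errors="replace") as handle:
        for name, hits in scan_boot_log(handle.read()).items():
            print(name, hits)
```

A “max_packages” value of 1 on a dual-socket box, followed by a “package_overflow” hit, would match the failing pattern described above.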
I also just realized the C6 portion of the title/subject line here refers to CentOS 6, so I’d like to clarify that all my testing/issues/etc was under CentOS 7.3 with all patches applied.
Thanks,
-Dave
Dave,
Just for testing purposes, can you try booting the kernel in the normal way on the machine that does not work (a normal grub entry on the kernel with no xen.gz line)?
That way, we can hopefully narrow the issue down to a hypervisor issue or a kernel config issue.
Thanks, Johnny Hughes
There was a note that the non-Xen kernel at the same kernel version did indeed boot:
“CentOS-6 4.9.20-26 kernel exhibits the same constant kernel-start-then-reboot issue when booting under the “CentOS Linux, with Xen hypervisor” grub2 menu option. However, it *does* properly boot under the “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)” grub2 menu option!”
Trying to get back into being able to test this more.
Thanks PJ
Just to note, the same pattern happens on C7:
“CentOS Linux, with Xen hypervisor” = reboot
“CentOS Linux (4.9.20-26.el7.x86_64) 7 (Core)” = boot
[root@XXX ~]# uname -a
Linux XXX 4.9.20-25.el7.x86_64 #1 SMP Fri Mar 31 08:53:28 CDT 2017 x86_64 x86_64 x86_64
Apologies: I installed the newer -26 kernel and had not rebooted into it. The grub2 menu item should have been “CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)”. I am currently restarting that remote affected system (unmodified grub2 entry first). Thanks PJ
Here is something interesting… I went through the BIOS options and found that the one R710 that *is* functioning differed only in that “Logical Processor”/Hyperthreading was *enabled*, while the one that is *not* functioning had HT *disabled*. I enabled Logical Processor and the system starts without issue! I’ve rebooted 3 times now without issue.
Dell R710 BIOS version 6.4.0
2x Intel(R) Xeon(R) CPU L5639 @ 2.13GHz
4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64 x86_64 GNU/Linux
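To compare several machines without going into each BIOS, the kernel’s view of Hyperthreading can be read from sysfs. A minimal sketch follows, assuming the standard Linux file /sys/devices/system/cpu/cpu0/topology/thread_siblings_list is present; the helper names are made up for illustration. It is best run from a working non-Xen boot of the same box, since the topology a Xen dom0 sees may not reflect the real hardware.

```python
from pathlib import Path

# Standard Linux sysfs location; assumed to exist on the machine being checked.
SIBLINGS_FILE = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,12" or "0-1" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_enabled(siblings_text):
    """True when CPU0 shares its core with at least one other logical CPU."""
    return len(parse_cpu_list(siblings_text)) > 1


def read_cpu0_siblings(path=SIBLINGS_FILE):
    """Read the raw sibling list for CPU0 from sysfs."""
    return Path(path).read_text()


if __name__ == "__main__":
    state = "enabled" if smt_enabled(read_cpu0_siblings()) else "disabled"
    print("Hyperthreading appears to be", state)
```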
Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the system as normal updates. It should be available later today.
I’ve verified with a second Dell R710 that disabling Hyperthreading/Logical Processor causes the primary Xen-booting kernel to fail and reboot. Conversely, enabling it allows the system to start as expected and without any issue.
Current tested kernel was: 4.9.13-22.el7.x86_64 #1 SMP Sun Feb 26 22:15:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
I just attempted an update and the 4.9.23-26 is not yet up. Does this update address the Hyperthreading issue in any way?
Thanks PJ
I don’t think so .. at least I did not specifically add anything to do so.
You can get it here for testing:
https://buildlogs.CentOS.org/CentOS/7/virt/x86_64/xen/
(or from /6/ as well for CentOS-6)
Not sure why it did not go out on the signing run .. will check that server.