Old Kernel Bug Back In CentOS 6.10?
I updated a few hypervisors and their VMs to CentOS 6.10 on Monday;
today I awoke to an alert saying all VMs are down. It looks like a very old bug crept back in.
The machine is a ProLiant DL380 G7 with a Xeon X5675 and 96 GB of RAM, running half a dozen smallish VMs. The hypervisor and all VMs run kernel
2.6.32-754.2.1.el6.x86_64. Around the time the VMs must have gone down, the system log shows quite a few error messages like the following:
Aug 16 03:10:13 hyper-7 kernel: [265397.382552] vmwrite error: reg 6000 value fffffffffffffff7 (err -9)
Aug 16 03:10:13 hyper-7 kernel: [265397.421372] Pid: 9375, comm: qemu-kvm Not tainted 2.6.32-754.2.1.el6.x86_64 #1
Aug 16 03:10:13 hyper-7 kernel: [265397.464985] Call Trace:
Aug 16 03:10:13 hyper-7 kernel: [265397.481530] [
Aug 16 03:10:13 hyper-7 kernel: [265397.520737] [
Aug 16 03:10:13 hyper-7 kernel: [265397.560028] [
Aug 16 03:10:14 hyper-7 kernel: [265397.600072] [
Aug 16 03:10:14 hyper-7 kernel: [265397.638183] [
Aug 16 03:10:14 hyper-7 kernel: [265397.674367] [
Aug 16 03:10:14 hyper-7 kernel: [265397.708101] [
Aug 16 03:10:14 hyper-7 kernel: [265397.739124] [
Aug 16 03:10:14 hyper-7 kernel: [265397.773193] [
Aug 16 03:10:14 hyper-7 kernel: [265397.805774] [
Aug 16 03:10:14 hyper-7 kernel: [265397.839147] [
Aug 16 03:10:14 hyper-7 kernel: [265397.870109] [
Aug 16 03:10:14 hyper-7 kernel: [265397.909358] [
Curiously, the messages don't seem to indicate anything fatal in and of themselves: there are two like this a minute after bootup and about a dozen more after roughly a day, none of which seems to have crashed anything. However, they are the only obvious anomaly I could find around that time, and as they are VT-x related, I reckon there's a connection.
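For anyone who wants to compare against their own logs, here is a rough sketch of how the vmwrite errors could be tallied per hour of uptime. It assumes Python 3.6 or later (not the stock CentOS 6 interpreter), that the bracketed number is the kernel's seconds-since-boot timestamp, and that the messages end up in /var/log/messages; the function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches kernel log lines of the form seen above, e.g.
#   ... kernel: [265397.382552] vmwrite error: reg 6000 value fffffffffffffff7 (err -9)
# Assumption: the bracketed number is seconds since boot.
VMWRITE_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] vmwrite error: "
    r"reg (?P<reg>[0-9a-f]+) value (?P<value>[0-9a-f]+)"
)


def vmwrite_errors(lines):
    """Yield (uptime_seconds, reg, value) for each vmwrite error line."""
    for line in lines:
        match = VMWRITE_RE.search(line)
        if match:
            yield (
                float(match.group("uptime")),
                match.group("reg"),
                match.group("value"),
            )


def errors_per_hour(lines):
    """Count vmwrite errors per whole hour of uptime."""
    return Counter(int(uptime // 3600) for uptime, _, _ in vmwrite_errors(lines))


if __name__ == "__main__":
    # Assumed default path; pass a different log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        for hour, count in sorted(errors_per_hour(log).items()):
            print(f"uptime hour {hour}: {count} vmwrite error(s)")
```

A cluster of errors in the hour the VMs went down, set against only a handful elsewhere, would support the suspicion that the two are related, though it would not prove it.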
The stack trace closely resembles this bug that turned up in 2015 and was fixed long ago: https://lkml.org/lkml/2015/7/3/288
Has anyone seen this recently and could confirm or refute any of my guesswork?
Cheers, Matthias
One thought on - Old Kernel Bug Back In CentOS 6.10?
Yes, I am also seeing https://patchwork.kernel.org/patch/1720851/ hitting again on a Westmere Xeon X5672.