CentOS 6.6, Apparent XFS Corruption
Hi all –
After several months of worry-free operation, we received the following kernel messages about an XFS filesystem running under CentOS 6.6. The proximate causes appear to be “Internal error xfs_trans_cancel” and “Corruption of in-memory data detected. Shutting down filesystem”. The filesystem is back up and mounted, and it appears to be working OK underneath a Splunk datastore. Does anyone have suggestions for diagnosis, or know of any related known problems?
Many thanks…..Nick Geo
Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c. Caller 0xffffffffa01f1388
Sep 18 20:35:15 gries kernel:
Sep 18 20:35:15 gries kernel: Pid: 24005, comm: splunkd Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Sep 18 20:35:15 gries kernel: Call Trace:
Sep 18 20:35:15 gries kernel: [ xfs_error_report+0x3f/0x50 [xfs]
Sep 18 20:35:15 gries kernel: [ [xfs]
Sep 18 20:35:15 gries kernel: [ xfs_trans_cancel+0xf5/0x120 [xfs]
Sep 18 20:35:15 gries kernel: [ [xfs]
Sep 18 20:35:15 gries kernel: [
Sep 18 20:35:15 gries kernel: [ xfs_vn_rename+0x66/0x70 [xfs]
Sep 18 20:35:15 gries kernel: [
Sep 18 20:35:15 gries kernel: [ sys_renameat+0x309/0x3a0
Sep 18 20:35:15 gries kernel: [ _atomic_dec_and_lock+0x55/0x80
Sep 18 20:35:15 gries kernel: [ mntput_no_expire+0x30/0x110
Sep 18 20:35:15 gries kernel: [ audit_syscall_entry+0x1d7/0x200
Sep 18 20:35:15 gries kernel: [
Sep 18 20:35:15 gries kernel: [ system_call_fastpath+0x16/0x1b
Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c. Return address 0xffffffffa01f2e6e
Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected. Shutting down filesystem
Sep 18 20:35:15 gries kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.
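For anyone triaging a similar log, the XFS lines can be pulled out in order and the shutdown reason decoded with a short script. This is only a sketch: the helper names (`xfs_events`, `decode_shutdown`) are made up for illustration, and the flag table is my recollection of fs/xfs/xfs_mount.h from kernels of this era, so check it against your own kernel source before relying on it.

```python
import re

# ASSUMPTION: flag values as recalled from fs/xfs/xfs_mount.h in kernels of
# this era; verify against the source for the kernel you are running.
SHUTDOWN_FLAGS = {
    0x01: "SHUTDOWN_META_IO_ERROR",
    0x02: "SHUTDOWN_LOG_IO_ERROR",
    0x04: "SHUTDOWN_FORCE_UMOUNT",
    0x08: "SHUTDOWN_CORRUPT_INCORE",
}

# Matches syslog lines such as "... kernel: XFS (dm-2): <message>".
XFS_LINE = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): (?P<msg>.*)$")
SHUTDOWN = re.compile(r"xfs_do_force_shutdown\((0x[0-9a-fA-F]+)\)")


def xfs_events(lines):
    """Return (device, message) pairs for XFS kernel messages, in log order."""
    events = []
    for line in lines:
        match = XFS_LINE.search(line)
        if match:
            events.append((match.group("dev"), match.group("msg")))
    return events


def decode_shutdown(message):
    """Return the names of the flags set in a force-shutdown message."""
    match = SHUTDOWN.search(message)
    if not match:
        return []
    value = int(match.group(1), 16)
    return [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if value & bit]


# Lines copied from the log above (call trace omitted).
LOG = [
    "Sep 18 20:35:15 gries kernel: XFS (dm-2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c. Caller 0xffffffffa01f1388",
    "Sep 18 20:35:15 gries kernel: Pid: 24005, comm: splunkd Not tainted 2.6.32-504.8.1.el6.x86_64 #1",
    "Sep 18 20:35:15 gries kernel: XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c. Return address 0xffffffffa01f2e6e",
    "Sep 18 20:35:15 gries kernel: XFS (dm-2): Corruption of in-memory data detected. Shutting down filesystem",
    "Sep 18 20:35:15 gries kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)",
    "Sep 18 20:35:27 gries kernel: XFS (dm-2): xfs_log_force: error 5 returned.",
]

for dev, msg in xfs_events(LOG):
    flags = decode_shutdown(msg)
    note = "  <- " + ", ".join(flags) if flags else ""
    print(f"{dev}: {msg}{note}")
```

If the flag table is right, the 0x8 passed to xfs_do_force_shutdown is the same condition that the "Corruption of in-memory data detected" message reports, so the two lines describe one event rather than two separate failures.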
3 thoughts on - CentOS 6.6, Apparent XFS Corruption
I think you need to read this from the bottom up:
“Corruption of in-memory data detected. Shutting down filesystem”
so XFS calls xfs_do_force_shutdown to shut down the filesystem. The call comes from fs/xfs/xfs_trans.c, which fails and so reports
“Internal error xfs_trans_cancel”.
In other words, I would look at the memory corruption first. This _could_ be a kernel problem, but I would suggest starting with an extended memory check; it smells to me like a failing chip.
Just my 2d worth!
Martin
----- Original Message -----
James Peltier wrote:
nobarrier, etc?
None.
e?
There are two XFS filesystems:
/dev/mapper/vg_gries01-LogVol00 3144200 1000428 2143773 32% /opt/splunk
/dev/mapper/vg_gries00-LogVol00 307068 267001 40067 87% /opt/splunk/hot
You’ll notice that usage on the larger one just crossed the 1TB boundary.
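As a quick sanity check on those figures, here is a rough sketch that re-derives the Use% column and looks at where the larger filesystem sits relative to 1TB. The unit is not stated above, so treating the columns as MiB (what `df -m` prints) is an assumption, and the helper names `parse_df` and `use_percent` are made up for illustration.

```python
# Rows copied from the df output above.
# ASSUMPTION: sizes are in MiB, as printed by `df -m`; the post does not say.
DF_LINES = [
    "/dev/mapper/vg_gries01-LogVol00 3144200 1000428 2143773 32% /opt/splunk",
    "/dev/mapper/vg_gries00-LogVol00 307068 267001 40067 87% /opt/splunk/hot",
]

MIB = 1024 * 1024  # bytes per MiB


def parse_df(line):
    """Split one df row into named fields with integer sizes."""
    device, size, used, avail, pct, mount = line.split()
    return {
        "device": device,
        "size": int(size),
        "used": int(used),
        "avail": int(avail),
        "pct": int(pct.rstrip("%")),
        "mount": mount,
    }


def use_percent(row):
    """Use% as df computes it: used / (used + avail), rounded up."""
    total = row["used"] + row["avail"]
    return -(-row["used"] * 100 // total)  # integer ceiling division


rows = [parse_df(line) for line in DF_LINES]
for row in rows:
    used_bytes = row["used"] * MIB
    print(
        f'{row["mount"]}: {use_percent(row)}% used, '
        f"{used_bytes / 10**12:.3f} TB (decimal), "
        f"{used_bytes / 2**40:.3f} TiB (binary)"
    )
```

Under the MiB assumption, the used space on /opt/splunk is past 1,000,000 MiB and past 10^12 bytes, but still short of 2^40 bytes, so which "1TB boundary" matters depends on the definition in play.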
Thanks…..Nick Geovanis