Corruption Of In-memory Data Detected (xfs)


Hi All,

I am having an issue with an XFS filesystem shutting down under high load with very many small files. I have around 3.5–4 million files on this filesystem, and new files are written to it continuously until I reach 9–11 million small files (35 KB on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem “sda5”: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.

18 thoughts on - Corruption Of In-memory Data Detected (xfs)

  • ----- Original Message -----
    |
    | Hi All,
    |
    | I am having an issue with an XFS filesystem shutting down under high
    | load with very many small files.
    | Basically, I have around 3.5 – 4 million files on this filesystem.
    | New files are being written to the FS all the
    | time, until I get to 9-11 mln small files (35k on average).
    |
    | at some point I get the following in dmesg:
    |
    | [2870477.695512] Filesystem “sda5”: XFS internal error
    | xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c.
    | Caller 0xffffffff8826bb7d
    | [2870477.695558]
    | [2870477.695559] Call Trace:
    | [2870477.695611]  []
    | :xfs:xfs_trans_cancel+0x5b/0xfe
    | [2870477.695643]  [] :xfs:xfs_mkdir+0x57c/0x5d7
    | [2870477.695673]  [] :xfs:xfs_attr_get+0xbf/0xd2
    | [2870477.695707]  [] :xfs:xfs_vn_mknod+0x1e1/0x3bb
    | [2870477.695726]  [] _spin_lock_irqsave+0x9/0x14
    | [2870477.695736]  [] __up_read+0x19/0x7f
    | [2870477.695764]  [] :xfs:xfs_iunlock+0x57/0x79
    | [2870477.695776]  [] _spin_lock_irqsave+0x9/0x14
    | [2870477.695784]  [] __up_read+0x19/0x7f
    | [2870477.695791]  [] __d_lookup+0xb0/0xff
    | [2870477.695803]  [] _atomic_dec_and_lock+0x39/0x57
    | [2870477.695814]  [] mntput_no_expire+0x19/0x89
    | [2870477.695829]  [] _spin_lock_irqsave+0x9/0x14
    | [2870477.695837]  [] __up_read+0x19/0x7f
    | [2870477.695861]  [] :xfs:xfs_iunlock+0x57/0x79
    | [2870477.695887]  [] :xfs:xfs_access+0x3d/0x46
    | [2870477.695899]  [] _spin_lock_irqsave+0x9/0x14
    | [2870477.695923]  [] vfs_mkdir+0xe3/0x152
    | [2870477.695933]  [] sys_mkdirat+0xa3/0xe4
    | [2870477.695953]  [] tracesys+0x47/0xb6
    | [2870477.695963]  [] tracesys+0xab/0xb6
    | [2870477.695977]
    | [2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139
    | of file fs/xfs/xfs_trans.c.  Return address =
    | 0xffffffff88262c46
    | [2870477.696452] Filesystem “sda5”: Corruption of in-memory data
    | detected.  Shutting down filesystem: sda5
    | [2870477.696464] Please umount the filesystem, and rectify the
    | problem(s)
    |
    | # ls -l /store
    | ls: /store: Input/output error
    | ?--------- 0 root root 0 Jan  1  1970 /store
    |
    | Filesystems is ~1T in size
    | # df -hT /store
    | Filesystem    Type    Size  Used Avail Use% Mounted on
    | /dev/sda5      xfs    910G  142G  769G  16% /store
    |
    |
    | Using CentOS 5.9 with kernel 2.6.18-348.el5xen
    |
    |
    | The filesystem is in a virtual machine (Xen) and on top of LVM.
    |
    | Filesystem was created using mkfs.xfs defaults with
    | xfsprogs-2.9.4-1.el5.CentOS (that’s the one that comes with
    | CentOS 5.x by default.)
    |
    | These are the defaults with which the filesystem was created:
    | # xfs_info /store
    | meta-data=/dev/sda5              isize=256    agcount=32,
    | agsize=7454720 blks
    |          =                       sectsz=512   attr=0
    | data     =                       bsize=4096   blocks=238551040,
    | imaxpct=25
    |          =                       sunit=0      swidth=0 blks,
    |          unwritten=1
    | naming   =version 2              bsize=4096
    | log      =internal               bsize=4096   blocks=32768, version=1
    |          =                       sectsz=512   sunit=0 blks,
    |          lazy-count=0
    | realtime =none                   extsz=4096   blocks=0, rtextents=0
    |
    | The problem is reproducible and I don’t think it’s hardware related.
    | The problem was reproduced on multiple
    | servers of the same type. So, I doubt it’s a memory issue or
    | something like that.
    |
    | Is that a known issue? If it is then what’s the fix? I went through
    | the kernel updates for CentOS 5.10 (newer
    | kernel), but didn’t see any xfs related fixes since CentOS 5.9
    |
    | Any help will be greatly appreciated…
    |
    |
    | —
    | “If we really understand the problem, the answer will come out of it,
    | because the answer is not separate from the problem.”
    | – Krishnamurti

    Is this filesystem mounted with the inode64 option?
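    A quick way to check is to look at the active mount options in /proc/mounts. A minimal sketch, assuming the /store mount point from the report above (the helper name is made up for illustration):

    ```shell
    # Hypothetical helper: succeed if the given mount point is mounted
    # with inode64 among its active options. Reads /proc/mounts by
    # default; a second argument can point at a test file instead.
    has_inode64() {
      awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}" | grep -qw inode64
    }

    # e.g.  has_inode64 /store && echo "inode64 active"
    ```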


    James A. Peltier
    Manager, IT Services – Research Computing Group
    Simon Fraser University – Burnaby Campus
    Phone : 778-782-6573
    Fax : 778-782-3045
    E-Mail : jpeltier@sfu.ca
    Website : http://www.sfu.ca/itservices

    To be original seek your inspiration from unexpected sources.

  • ----- Original Message -----
    | [snip: original message quoted in full above]

    Sorry, further to this: most bugs related to XFS are really kernel bugs. I can see that you’re running an older kernel, and just because you don’t see the bugs listed in the errata doesn’t mean fixes haven’t come in as part of the backport process.



  • “James A. Peltier” writes:

    No, since the FS is slightly smaller than 1 TB. From my understanding, inode64 is only required for XFS filesystems larger than 1 TB?


    “The man who has gotten everything he wants is all in favor of peace and order.”
    – Jawaharlal Nehru

  • “James A. Peltier” writes:

    So, you suggest I try my luck with the newer kernel from CentOS 5.10?

    What’s the proper way to open a bug for this against CentOS 5 / RHEL 5?


    “Individual rights are not subject to a public vote; a majority has no right to vote away the rights of a minority; the political function of rights is precisely to protect minorities from oppression by majorities (and the smallest minority on earth is the individual).”
    – Ayn Rand

  • If you try it with the latest kernel and it works, then I don’t think there is any bug to file.

  • I had a similar issue: an NFS server using XFS as the filesystem for backups of a very large system. I have a 2 TB RAID-1 volume, and somewhere during an rsync of the backup I hit this problem. There were lots of files there; the system has 8 GB of RAM and runs CentOS 6.5 64-bit. I didn’t look into the issue further, because ReiserFS handled the same workload without any problems.

    I never knew about the inode64 option. Is it only a mount option, or also an option to the mkfs.xfs command?

    Also, in case I want to test it again, what would be the recommended way to avoid crashing the system when a lot of memory is in use?

    Thanks, Eliezer

  • ----- Original Message -----
    | “James A. Peltier” writes:
    |
    | > | [snip: original report quoted in full above]
    | >
    | > Sorry, further to this, most bugs related to XFS are related to
    | > kernel
    | > bugs. I can see that you’re running an older kernel and just
    | > because
    | > you don’t see the bugs listed in the errata doesn’t mean the bugs
    | > haven’t been found as part of the backport process
    |
    | So, you suggest I try my luck with the newer kernel from CentOS 5.10?
    |
    | What’s the proper way to open a bug for this against CentOS 5 / RHEL
    | 5?

    The recommendation is to always run the latest kernel before filing a bug. Looking at the stack trace, it appears this system is doing a lot of locking, IRQ handling, and XFS/VFS work. You’re probably looking too narrowly for something XFS-specific, rather than something that may be SCSI/FC- or VFS-related. There have been seven CentOS 5 kernel updates since your currently running kernel, covering many facets of file systems, drivers, and subsystems.

    That said, one way to possibly mitigate this is the noatime mount option, which may delay the problem.
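    For reference, this is how that might look; a sketch only, with the device and mount point taken from the report earlier in the thread:

    ```shell
    # /etc/fstab entry adding noatime (sketch, not a tested recommendation):
    # /dev/sda5  /store  xfs  defaults,noatime  0  0

    # Or applied to the live mount (requires root):
    # mount -o remount,noatime /store
    ```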



  • If you don’t use inode64, then once the first 1 TB of the device is completely filled, the filesystem has no more room to allocate new inodes.

    I just noticed the OP is running a large XFS filesystem on EL 5? I didn’t think XFS was officially supported on 5; it was considered experimental. I would strongly urge installing the latest CentOS 6 ASAP and using that instead.
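    To the earlier question in the thread: inode64 is a mount option, not an mkfs.xfs option. A sketch of enabling it, using the device and mount point from the report above (on kernels of this vintage a full umount/mount cycle may be needed, since remounting with inode64 may not take effect):

    ```shell
    # Enable inode64 (sketch; requires root):
    # umount /store
    # mount -o inode64 /dev/sda5 /store

    # Persistently, in /etc/fstab:
    # /dev/sda5  /store  xfs  defaults,inode64  0  0
    ```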

  • Eliezer Croitoru writes:

    My systems have 17 GB of RAM and 1 TB XFS partitions. I was under the impression that the inode64 option only applies to filesystems larger than 1 TB?

  • John R Pierce writes:

    Yes, I run XFS on a ~1 TB (900 GB) partition, so I don’t think I need to consider inode64 for that. What is the official situation with XFS on CentOS 5? It was a technology preview in CentOS 5.4, I think. How about now?

  • 5 is very close to EOL now. I never considered XFS anything other than a preview in 5, and I don’t believe that changed in the later updates; the only mention is in the 5.4 release notes, not 5.5–5.10.

    I only use XFS on CentOS 6, where it’s very stable.

  • Official XFS support was added to RHEL in 5.7, so it is in the source code we build from.

    http://red.ht/TO1Qoo

    Although, all that really means is that you can ask this list for help with CentOS. Any support for CentOS is what the community can provide you, or what you can provide yourself.

  • End of Production 3 (End of Production Phase) is March 31, 2017 [1].
    That’s not all that close, in my opinion.

    And regarding XFS, from the release notes of 5.7 [2]:
    “Usage of XFS in conjunction with Red Hat Enterprise Linux 5.7 High Availability Add-On/Clustering as a file system resource is now fully supported.”
    Whatever that means.

    [1] https://access.redhat.com/support/policy/updates/errata
    [2] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.7_Release_Notes/filesystemstorage-management.html