Xfsaild Causing Load, And Stuck In D State – CentOS 6.4 X64

PowerEdge 2850 with PERC SCSI RAID controller.
16GB RAM
Dual Intel(R) Xeon(TM) CPU 3.00GHz
Using megaraid driver and xfs on 3x 300GB SCSI disks as /dev/sdb (total 572GB) (RAID-5)

Stock CentOS 6.4 x64, everything updated.
Kernel : 2.6.32-358.2.1.el6.x86_64 #1 SMP

My load average always shows:

top – 10:30:21 up 23:09, 1 users, load average: 0.99, 0.96, 0.79

yet no services whatsoever are doing anything.

Using PowerTOP I can see that xfsaild causes the second most wake-ups (after the kernel core: hrtimer_start (tick_sched_timer)):

< Detailed C-state information is not
P-states (frequencies)
3.00 Ghz 100.0%
1500 Mhz 0.0%
1125 Mhz 0.0%
750 Mhz 0.0%
375 Mhz 0.0%

Wakeups-from-idle per second : 125.1 interval: 10.0s
no ACPI power usage estimate available

Top causes for wake-ups:
47.7% (119.6) : hrtimer_start (tick_sched_timer)
39.9% (100.0) xfsaild/sdb1 : xfsaild (process_timeout)

Using top, I can also see that xfsaild is stuck in D state. It never changes.

10050 root 20 0 0 0 0 D 0.0 0.0 0:01.17 xfsaild/sdb1
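
For anyone wanting to confirm the same symptom without watching top, here is a minimal Python sketch that lists tasks currently in uninterruptible sleep (state D) by reading /proc/<pid>/stat. The function names parse_stat_line and uninterruptible_tasks are made up for illustration; the only assumption about the system is the standard Linux /proc layout, where the state letter is the first field after the parenthesised command name.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited mid-scan or the line was malformed
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

A single snapshot can catch tasks that are only briefly waiting on I/O, so it is worth running this a few times; a thread that shows up in every run matches the "never changes" behaviour described here.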

The only ways I have found to fix this are rebooting the machine, or unmounting the volume and mounting it again. I’ve rsynced data between this server and another one twice. The rsync run takes between 30 minutes and an hour. After the rsync operation, xfsaild is stuck in D state and my load is near 1.00.

I’ve had no problems whatsoever on CentOS 6.3 or earlier versions on other servers.

Any thoughts?
Any information or help would be much appreciated.

Thanks in advance.

Best regards,

Svavar

5 thoughts on - Xfsaild Causing Load, And Stuck In D State – CentOS 6.4 X64

  • It’s a kernel bug and can safely be ignored.

  • James A. Peltier [jpeltier@sfu.ca] wrote:
    > It’s a kernel bug and can safely be ignored.

    Unfortunately it causes problems on systems that use load average as a metric – so it can’t be ignored by everyone :-)

    James Pearson

  • Indeed. We have the same issue. To work around it, we have stuck with a previous kernel release that we know doesn’t have the bug. At each maintenance window we revisit this, and once we’re sure the bug is fixed we will remove the workaround. I really feel for ya; I was just pointing out that, apart from the inflated load average caused by it spinning its wheels, there is no fallout from it. ;)

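
Since Linux counts tasks in uninterruptible sleep toward the load average, a monitoring check could roughly discount the stuck thread. The sketch below is only an illustration, not something anyone in the thread reported using: the names parse_loadavg, count_stuck_threads and discounted_load are hypothetical, the "xfsaild/" prefix is taken from the top output in the original post, and subtracting an instantaneous count from a damped average is only an approximation.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def count_stuck_threads(prefix="xfsaild/", proc_root="/proc"):
    """Count tasks in state D whose command name starts with `prefix`."""
    if not os.path.isdir(proc_root):
        return 0
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                line = fh.read()
            # Format: "pid (comm) state ..."; comm may contain spaces or ')'.
            comm = line[line.index("(") + 1:line.rindex(")")]
            state = line[line.rindex(")") + 1:].split()[0]
        except (OSError, ValueError, IndexError):
            continue  # task exited mid-scan or the line was malformed
        if state == "D" and comm.startswith(prefix):
            count += 1
    return count


def discounted_load(load, stuck):
    """Subtract the stuck-thread count from a load figure, floored at zero."""
    return max(0.0, load - stuck)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as fh:
        one_min, _, _ = parse_loadavg(fh.read())
    print(discounted_load(one_min, count_stuck_threads()))
```

Matching only on the xfsaild/ prefix keeps genuinely blocked I/O from other processes visible in the metric, which is the point of alerting on load in the first place.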
