CentOS-6.8 Fsck Report Maximal Count


We have a remote warm standby system running CentOS-6.8 as a KVM host
with multiple guests. One of the guests began reporting an error when running aide.

Caught SIGBUS/SEGV while mmapping. File was truncated while aide was running?
Caught SIGBUS/SEGV. Exiting

The /var/log/messages file contained this:
Mar 9 09:14:13 inet12 kernel: end_request: I/O error, dev vda, sector 14539264
Mar 9 09:14:31 inet12 kernel: end_request: I/O error, dev vda, sector 14539296
Mar 9 09:14:48 inet12 kernel: end_request: I/O error, dev vda, sector 14539296

df
Filesystem            1K-blocks    Used Available Use% Mounted on
/dev/mapper/vg_inet02-lv_root
                        7932336 2262672   5260064  31% /
tmpfs                    961044       0    961044   0% /dev/shm
/dev/vda1                487652  139473    322579  31% /boot
. . .

This indicated that a bad sector on the underlying disk system might be the source of the problem. The guests were all shut down, a
/forcefsck file was created on the host system, and the host system was remotely restarted.
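
A minimal sketch of that sequence, assuming the guests are managed through libvirt (the guest name is taken from the log excerpt above and may not match the actual libvirt domain name):

virsh shutdown inet12       # repeat for each guest and wait for them to power off
touch /forcefsck            # rc.sysinit sees this flag and forces fsck on every filesystem at the next boot
shutdown -r now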

However, this action did not remove the error. The host system log files had this to say about fsck:

/var/log/messages:Mar 9 08:34:48 vhost03 kernel: EXT4-fs (dm-6): warning: maximal mount count reached, running e2fsck is recommended

in /dev I see this:
brw-rw----. 1 root disk 253, 6 Mar 9 08:34 dm-6

But this device has nothing whatsoever to do with the KVM guests:

ll /dev/vg_vhost03/ | grep dm-6
lrwxrwxrwx. 1 root root 7 Mar 9 08:34 lv_CentOS_repos -> ../dm-6

Rather, this is an LV devoted to holding CentOS ISOs:

/dev/mapper/vg_vhost03-lv_CentOS_repos
101016992 77160124 18718848 81% /var/data/CentOS
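
A quick way to double-check which LV sits behind a dm-N node (the names here are the ones already shown above):

dmsetup ls                      # device-mapper names with their (major, minor) pairs
dmsetup info /dev/dm-6          # reports the name vg_vhost03-lv_CentOS_repos for this node
ls -l /dev/mapper/ | grep dm-6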

So, my questions are:

1. How do I fix the problem with the guest system that Aide is stumbling over?

2. How do I get the fsck issue with dm-6 resolved?

11 thoughts on - CentOS-6.8 Fsck Report Maximal Count

  • fsck’s not good at finding disk errors; it finds filesystem errors.

    If it was a real disk issue, you’d expect matching errors in the host logs. Did you?

    Unmount it and run fsck on it, and that message would go away (see the sketch below). But I’d not worry about that one.

    jh
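
    A sketch of the unmount-and-fsck step suggested here, using the LV and mount
    point from the original post (tune2fs -c 0 -i 0 is optional; it just stops the
    mount-count/interval-based warnings):

      umount /var/data/CentOS
      e2fsck -f /dev/mapper/vg_vhost03-lv_CentOS_repos
      tune2fs -c 0 -i 0 /dev/mapper/vg_vhost03-lv_CentOS_repos   # disable count/interval-triggered checks
      mount /var/data/CentOS       # assumes the mount point is listed in fstab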

  • If not fsck then what?

    Yes, there are:

    Mar 9 09:14:13 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
    Mar 9 09:14:30 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063
    Mar 9 09:14:48 vhost03 kernel: end_request: I/O error, dev sda, sector 1236929063

    I am running an extended SMART test on the drive at the moment. I suspect
    that the drive is at its EOL for practical purposes, so we will likely be
    looking at an equipment upgrade given the age of the rest of the equipment.

    In the meantime what steps, if any, should I take to remediate this problem?
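
    A sketch of the SMART checks being run here (device name as in the host log above):

      smartctl -t long /dev/sda       # start the extended self-test
      smartctl -l selftest /dev/sda   # progress, and the LBA of the first error if the test fails
      smartctl -A /dev/sda            # watch Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable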

  • James B. Byrne wrote:
    Run fsck with -c, which forces badblocks to run. Or you can run badblocks
    directly (see the sketch at the end of this reply).

    Looks like only one sector’s bad. Running badblocks should, I think, mark that sector as bad, so the system doesn’t try to read or write there. I’ve got a user whose workstation has had a bad sector running for over a year. However, if it becomes two, or four, or 64 sectors, it’s replacement time, asap.

    mark
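
    A sketch of the two approaches mark describes. The filesystem has to be
    unmounted (or the guest booted into rescue/single-user mode), and /dev/XXX is
    a placeholder for the guest's affected filesystem:

      e2fsck -fck /dev/XXX                     # -c runs a read-only badblocks scan, -k keeps the existing bad-block list
      # or run badblocks directly and feed the result to e2fsck:
      badblocks -sv -o /tmp/badblocks.txt /dev/XXX
      e2fsck -fl /tmp/badblocks.txt /dev/XXX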

  • And I definitely will unmount relevant filesystem(s) before using badblocks…

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev
    Sr System Administrator
    Department of Astronomy and Astrophysics
    Kavli Institute for Cosmological Physics
    University of Chicago
    Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • You don’t necessarily have to. The default mode of badblocks is a non-invasive read-only test which is safe to run on a mounted filesystem.

    That said, a read-only badblocks pass can give a false “no errors” report in cases where a non-destructive read-then-write pass (-n) will show errors.

    Alternatively, a read-only pass may show an error that a read-then-write pass will silently bury by forcing the drive to relocate the bad sector.

    In extreme cases, you could potentially fix a problem with a read-random-random-write pass (-n -t random -t random) because that will statistically flip all the bits at least twice, which may rub the drive’s nose in a bad sector, forcing a reallocation where a normal read-then-write pass (-n alone) may not.

    Hard drives are weird. It is only through the grace of ECC and such that they approximate deterministic behavior as well as they do.
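
    The three badblocks modes described above, roughly (the device name is a
    placeholder; unmount the filesystem for anything other than the plain
    read-only pass):

      badblocks -sv /dev/XXX                        # read-only pass, safe on a mounted filesystem
      badblocks -nsv /dev/XXX                       # non-destructive read-then-write pass
      badblocks -nsv -t random -t random /dev/XXX   # the read-random-random-write pass described above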

  • Bear with me on this. The last time I did anything like this I ended up
    having to boot into recovery mode from an install CD and do this by hand.
    That is not an option in the present circumstance, as the unit is a headless
    server in a remote location.

    If I do this:

    echo '-c' > /fsckoptions
    touch /forcefsck
    shutdown -r now

    Will this repair the bad block and bring the system back up? If not, what other options should I use?

    The bad block is located in an LV assigned to a libvirt pool associated with
    a single VM. Can this be checked and corrected without having to deal with
    the base system? If so, how?

    Regards,
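
    One hedged sketch of checking that LV from the host, assuming the guest's
    disk lives directly on a host LV (lv_inet12 is a placeholder name) and
    carries its own partition table plus the vg_inet02 volume group seen in the
    guest's df output:

      virsh shutdown inet12                   # the guest must be down before touching its disk
      kpartx -av /dev/vg_vhost03/lv_inet12    # map the partitions inside the guest's LV
      vgscan && vgchange -ay vg_inet02        # activate the guest's own volume group on the host
      e2fsck -fc /dev/vg_inet02/lv_root       # check it, scanning for bad blocks (-c)
      vgchange -an vg_inet02                  # deactivate and unmap before restarting the guest
      kpartx -dv /dev/vg_vhost03/lv_inet12
      virsh start inet12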

  • You’ll need to search the smartmontools site for their doc on bad sectors.
    There’s a howto for finding which file is affected by the bad sector so you
    can replace it. That’s the only way to fix the problem.

    This gets tricky going through LVM.

    Chris Murphy
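
    In outline, the smartmontools bad-block HOWTO procedure Chris refers to; the
    offset arithmetic is exactly the part that gets tricky through LVM, and the
    device and LV names below are placeholders:

      smartctl -l selftest /dev/sda      # gives the LBA of the first error once a self-test has failed on it
      # fs_block = (LBA - start sector of the filesystem) * 512 / filesystem block size
      debugfs -R "icheck <fs_block>" /dev/mapper/vg_XX-lv_XX    # block number -> inode number
      debugfs -R "ncheck <inode>" /dev/mapper/vg_XX-lv_XX       # inode number -> file name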

  • After booting from USB into single-user mode and dd’ing all readable blocks
    (multiple passes, as I then had to use “skip=” to start with the next good
    blocks), I ran the manufacturer’s diag/repair software and had good results.

    YMMV

    HTH, Bill
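
    For completeness, a rough sketch of the kind of dd salvage pass Bill
    describes (paths and offsets are placeholders; conv=noerror keeps the copy
    going after read errors and, with sync, pads the failed blocks with zeros so
    the offsets stay aligned):

      dd if=/dev/sda of=/mnt/rescue/sda.img bs=512 conv=noerror,sync
      # if the copy has to be restarted, resume past the bad region with matching skip= and seek=:
      dd if=/dev/sda of=/mnt/rescue/sda.img bs=512 skip=1236929100 seek=1236929100 conv=notrunc,noerror,sync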