I've got an older server here, running CentOS 6.6 (64-bit). Suddenly, at
0-dark-30 yesterday morning, connections to it started failing.
After several attempts to reboot it and get it working, I tried yum update, and that failed, complaining of a Python krb5 error. With more investigation, I discovered that logins were failing because of a problem with PAM:
it couldn’t open /lib64/security/pam_permit.so. The reason was that pam_permit.so was a broken symlink, pointing to a file name in the same directory, while the file it named actually existed in /lib64. Checking other systems, I found it should, in fact, be a regular file, not a symlink.
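
For anyone wanting to run the same check across a whole directory, here is a minimal Python sketch that lists symlinks whose target does not exist. The function name find_broken_symlinks is my own, and the default directory is simply the one from this incident. Treat it as an illustration, not a vetted tool.

```python
import os
import sys


def find_broken_symlinks(directory):
    """Return sorted (link_path, raw_target) pairs for dangling symlinks."""
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() looks at the entry itself; exists() follows the link,
        # so a symlink that "does not exist" has a missing target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((path, os.readlink(path)))
    return sorted(broken)


if __name__ == "__main__":
    # Directory to scan; defaults to the PAM module directory (assumption).
    directory = sys.argv[1] if len(sys.argv) > 1 else "/lib64/security"
    if os.path.isdir(directory):
        for link, target in find_broken_symlinks(directory):
            print(f"{link} -> {target} (target missing)")
```

One caveat: if this is pointed at a drive mounted elsewhere (say under /mnt), absolute link targets are resolved against the running system's root, not the mounted drive, so only relative links are judged reliably.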
At this point, the system was considered suspect. I brought the system down, replaced the root drive, and rebuilt. I was not able to build it as CentOS 7, as something in the older hardware broke the install. CentOS 6
built successfully, and the server was returned to service.
I then put the old drive in another server and examined it. fsck reported both / and /boot were clean, but when I reran it as fsck -c, to check for bad blocks, it found many multiply-claimed blocks.
First question: does anyone have an idea why it showed as clean until I
checked for bad blocks? Would that just be because I’d gracefully shut down the original server, and it mounted OK on the other server?
Mounting it on /mnt, I found no driver errors reported in the logs, and nothing else happening, logins included, before an automated contact from another server, which failed. I also checked our loghost, and nothing odd shows there, in either messages or secure.
At this point, I *think* it’s filesystem corruption, rather than a compromised system, but I’d really like to hear anyone’s thoughts on this.
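
One way to get more evidence either way is to compare a directory on the old drive against the same directory on a known-good host. Below is a rough Python sketch of that idea. The names sha256_of, describe and compare_dirs are mine, and it assumes the reference host runs the same package versions, otherwise differing checksums mean nothing. It only looks at the top level of the directory and does not by itself prove corruption or compromise.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe(path):
    """Summarise an entry: symlink target, file checksum, other, or missing."""
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    if os.path.isfile(path):
        return "file " + sha256_of(path)
    if os.path.lexists(path):
        return "other"  # directories, devices, etc. are not inspected
    return "missing"


def compare_dirs(suspect_dir, reference_dir):
    """Return {name: (suspect_state, reference_state)} for entries that differ."""
    names = set(os.listdir(suspect_dir)) | set(os.listdir(reference_dir))
    differences = {}
    for name in sorted(names):
        suspect = describe(os.path.join(suspect_dir, name))
        reference = describe(os.path.join(reference_dir, name))
        if suspect != reference:
            differences[name] = (suspect, reference)
    return differences
```

For example, compare_dirs("/mnt/lib64/security", "/lib64/security") on a host with the old drive mounted on /mnt would flag pam_permit.so as a symlink on one side and a regular file on the other, along with anything else that differs. The second path is an assumption about where a trusted copy lives.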