Lost Connection During Yum Update


During today’s big CentOS 6 update I lost my connection to a machine during the “yum update”, and when I logged back in and ran yum update again it told me to run yum-complete-transaction. When I ran yum-complete-transaction I got screen after screen of “x is a duplicate with x” where x consists of a huge list of packages.

“package-cleanup --dupes” gives me a huge list of packages.

I think that my next step here should be “package-cleanup --cleandupes”, but when I do that it tells me that it will remove 800-plus MB of files. I suppose it’s the same list that I get with “package-cleanup --dupes”.

Do I want to do this or will that nuke the operating system? If not, what should I be doing?
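Before running the cleanup it can help to see exactly which packages are duplicated. The following is a minimal sketch, not part of the original post: `list_dupes` is a hypothetical helper that reads one `name.arch` per line and prints those that appear more than once. Kernel packages are skipped on the assumption that several versions are installed on purpose. As far as I know, `package-cleanup --cleandupes` removes the older copy of each duplicate, but review its transaction list before confirming.

```shell
# Sketch: list name.arch pairs that are installed more than once.
# Input on stdin, one "name.arch" per line, for example from:
#   rpm -qa --qf '%{NAME}.%{ARCH}\n'
# kernel, kernel-devel and gpg-pubkey are normally multi-installed, so skip them.
list_dupes() {
    grep -v -E '^(kernel|kernel-devel|gpg-pubkey)\.' | sort | uniq -d
}
```

Usage on the affected machine would look like `rpm -qa --qf '%{NAME}.%{ARCH}\n' | list_dupes`.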

24 thoughts on - Lost Connection During Yum Update

  • Well, this is interesting. I have three systems, all of which now have the same problem.

    I was running “yum update” on these machines via a VNC connection (running a VNC desktop on one of them, and logging into the others with a gnome-terminal on my VNC desktop), when my VNC desktop suddenly “went away” for some reason. And that killed the “yum update” jobs on the computers.

    Subsequent to that, I logged back into the machines and ran yum update again. It told me to run yum-complete-transaction. When I ran yum-complete-transaction I got screen after screen of “x is a duplicate with x” where x consists of a huge list of packages.

    I then ran “package-cleanup --cleandupes” and then ran “yum update” again and all appeared to be well. “yum update” completed without error and I thought I was home free.

    I then rebooted the machines and found out that I’m still out of luck. After the initial grub screen I get this:

    Kernel panic - not syncing: VFS: unable to mount root fs on unknown-block(0,0)
    PID: 1,comm: swapper not tainted 2.6.32-358.0.1.el6.i686 #1
    Call trace

    Followed by a series of numbers that I can post if they’re needed.

    I booted one of these machines off of a CentOS 6.4 “minimal” CD and ran the rescue mode. It mounted the drive under /mnt/sysimage with no problem. I can see everything on it that I expect to see.

    I then booted the CD again and tried running the “upgrade an existing system” option, and told it to reinstall the bootloader. That’s about all that it appeared to do: “Installing bootloader”, then it told me to reboot. Which I did.

    And I got the same kernel panic again that I just posted above.

    What has gone wrong here and how can I fix it? All of the data seems to be on the drive just like it should be, but it won’t boot up.

    Again, I have three systems that appear to have exactly the same problem.

  • Try chroot from the minimal CD onto the systems and use “yum history” to see what happened and “yum history undo”.

  • For whatever it’s worth – I yum update’d two VMs without any trouble whatsoever (from 6.3 to 6.4). Not that it should matter, but they are both guests running on a CentOS 5.9 Xen host.

    I’m in the process of updating a laptop – I’m hoping it works too…

    Scot P. Floess RHCT (Certificate Number 605010084735240)
    Chief Architect FlossWare http://sourceforge.net/projects/flossware
    http://flossware.sourceforge.net
    https://github.com/organizations/FlossWare

  • First off … from now on run yum updates inside a “screen” session. Install with:

    yum install screen

    Here is info on screen:

    http://library.linode.com/linux-tools/utilities/screen

    At this point you will need to clean up before you install screen. You might first try the utility yum-complete-transaction:

    http://www.vmadmin.co.uk/linux/44-redhat/209-linuxyumcomptrans

    As soon as you get the transaction completed, install screen and ALWAYS do yum updates in a screen or VNC session so that the transaction will complete if you drop the connection.
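A small guard can make the "always use screen" habit harder to forget. This is a sketch under assumptions: `in_detachable_session` and `safe_yum_update` are hypothetical names, and the check relies on GNU screen setting the `STY` environment variable (tmux sets `TMUX`).

```shell
# Succeeds if the current shell runs inside GNU screen (STY set)
# or tmux (TMUX set).
in_detachable_session() {
    [ -n "${STY:-}" ] || [ -n "${TMUX:-}" ]
}

# Hypothetical wrapper: refuse to start an update outside such a session.
safe_yum_update() {
    if in_detachable_session; then
        yum update "$@"
    else
        echo "Not inside screen/tmux; start one first, e.g.: screen -S yumupdate" >&2
        return 1
    fi
}
```

A typical session is `screen -S yumupdate`, then `yum update`; Ctrl-a d detaches, and `screen -r yumupdate` reattaches after a dropped connection.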

  • I was doing it through VNC, thinking that would be more-or-less equivalent to screen, which it apparently isn’t. Somehow my VNC session (desktop) just disappeared in the middle of the job, while I was running “yum update” on the remote host machine and two other computers. Perhaps the “yum update” that was running on the remote host machine killed VNC; in hindsight perhaps I shouldn’t have done that.

    My google searching leads me to suspect that initramfs may be missing on those computers. If that is the case (which I will verify later this afternoon) then I’m thinking that perhaps chrooting to the hard drive followed by a simple yum remove kernel-2.6.32-358.0.1 and yum install kernel-2.6.32-358.0.1 will fix it.

    It’s funny that all three of them died in the same way, though I guess they were all at about the same stage in the update process when my VNC session disappeared.

    Running “yum-complete-transaction”, followed by “package-cleanup --cleandupes”, followed by “yum update” seems to have put everything back the way that it should be, with the exception of whatever it is that prevents the machine from booting.

  • What most likely happened:

    The “yum update” that was running in your lost VNC session was in all likelihood still running.

    If you had done a ‘ps -ef | grep yum’ you would probably have seen that yum update was still running.

    And then it looks like you logged back in to a new session and began running other yum commands before the original “yum update” had completed.

    So now you have a mess that may not be easy to untangle.

    It may be easier to restore from backup and then attempt to do the update again.
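The "is yum still running?" check suggested above can also be done through yum's lock file. A minimal sketch, assuming the usual lock path `/var/run/yum.pid`; `yum_running_pid` is a hypothetical helper name, and it should be run as root so that `kill -0` can see the process.

```shell
# Print the PID recorded in a yum lock file if that process is still alive.
# Defaults to yum's usual lock file; another path can be passed for testing.
yum_running_pid() {
    pidfile="${1:-/var/run/yum.pid}"
    [ -r "$pidfile" ] || return 1
    pid=$(tr -cd '0-9' < "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null || return 1
    echo "$pid"
}
```

If this prints a PID after a dropped session, the safer course is to wait for that process to finish before running any further yum commands.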

  • If yum was indeed still running, it wasn’t using any significant CPU. I did run top in my login terminal to see if anything significant was going on and yum didn’t show up on the list.

    When I attempted to re-connect to VNCserver after that, I was told “connection refused”, and “service VNCserver start” cranked up another session for me without any errors.

    I think VNCserver just altogether crashed for some reason, probably related to the yum update that I was running on that machine at the time. I suppose the lesson learned here is to always update the host machine from a screen session running in a plain terminal, not through a VNC session.

    A restore may turn out to be the answer, but since everything seems to still be in place on those hard drives, and since my last “yum update” completed without any errors being reported, I suspect (hope?) that everything is still ok with the exception of whatever is causing the machines to fail to boot.

  • The reason I said yum update was still running was because I’ve had this exact scenario occur before.

    VNC died during yum update and when I got back in I could see that yum update was still running.

    I just waited until it finished.

    I hope it is only your initramfs. If that isn’t it, for me I would just restore and rerun the update. Much less time involved.

  • It’s looking more and more like a full nuke-and-pave is going to be the answer here.

    As I suspected, initramfs-2.6.32-358.0.1 was missing in /boot. Unfortunately, none of the other installed kernels boot either — everything gives me a kernel panic.

    I did a yum remove kernel-2.6.32-358.0.1 and yum install kernel-2.6.32-358.0.1 and the whole transaction appeared to be successful.

    That got me initramfs-2.6.32-358.0.1 back in /boot, but I still get a kernel panic when I reboot the machine. The initial rhgb screen comes up and the little circle thing cranks for a minute or so, but then I get “kernel panic: attempted to kill init!”. Booting without rhgb gives me a cursor in the top left corner for a minute, followed by “kernel panic: attempted to kill init!”. The last time /var/log/boot.log was written to was the last time the machine was rebooted prior to this whole episode (i.e. a few weeks ago), so there is absolutely no error message or log information available other than the kernel panic message on the screen.

    Damn, I hate the idea of having to set all of these machines up again from scratch. Two of them aren’t much to re-do, but the third one is the office workhorse machine that does everything from dhcp server to nfs server to print server to you-name-it.
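The missing-initramfs check described in this comment can be scripted so that every installed kernel is covered at once. This is a sketch, not the poster's procedure: `kernels_missing_initramfs` is a hypothetical helper that assumes the CentOS 6 naming scheme `vmlinuz-VERSION` and `initramfs-VERSION.img`.

```shell
# Print the version of every kernel image in a boot directory that has
# no matching initramfs-VERSION.img next to it.
kernels_missing_initramfs() {
    bootdir="${1:-/boot}"
    for k in "$bootdir"/vmlinuz-*; do
        [ -e "$k" ] || continue          # glob matched nothing
        ver="${k##*/vmlinuz-}"           # strip directory and "vmlinuz-" prefix
        [ -e "$bootdir/initramfs-$ver.img" ] || echo "$ver"
    done
}
```

For each version it reports, the image can usually be rebuilt from inside the chroot with `dracut -f /boot/initramfs-VERSION.img VERSION`, which avoids removing and reinstalling the kernel package.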

  • I booted the “CentOS 6.4 minimal iso”, told it to “upgrade an existing installation”, and to install the bootloader. About all that it appeared to do was install the bootloader. Unfortunately, the machine still didn’t boot.

    The bootloader seems to be fine — grub itself boots up. I get a kernel panic after that, when you normally see the messages about unpacking vmlinuz and so on. I just get a blank black screen with a flashing cursor (or the rhgb screen with the spinning doodad, depending on the grub setting) and then a kernel panic.

  • I just had a thought: Is it possible to just reformat and reinstall the /boot partition? I wonder if that would solve the problem….

  • It seems like maybe it cannot find the root filesystem.

    Kernel panics just like this when it cannot find it.

  • Interesting. How can I check that? I have another almost-identical system that’s still working and I compared grub.conf between the two of them and didn’t notice any significant differences. Nothing that immediately jumped up and down and screamed “problem here!” at least.

    What should I be looking for?

  • Boot to rescue mode and see if you can mount the device containing the root filesystem readonly and see all the files on it.

    Then check that the kernel root option is looking at the same device.

  • I can indeed see all of the files on that computer, including the boot directory and everything under /

    I don’t know what to do from that point, though.

    Here is the grub.conf from the working system, which is pretty much identical to one of the non-working systems. I assume that you mean I need to do something to change and/or fix the root= portion of the kernel commandline, but how do I find out what to change it to?

    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title CentOS (2.6.32-358.0.1.el6.i686)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.0.1.el6.i686 ro root=/dev/mapper/vg_ws195-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_ws195/lv_swap rhgb crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg_ws195/lv_root rd_NO_DM
        initrd /initramfs-2.6.32-358.0.1.el6.i686.img
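To compare the root= setting across machines without reading the long kernel line by eye, the value can be pulled out of grub.conf. A minimal sketch; `grub_root_args` is a hypothetical helper and the default path assumes GRUB legacy as shipped with CentOS 6.

```shell
# Print the root= value of every "kernel" line in a GRUB legacy config.
grub_root_args() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '$1 == "kernel" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^root=/) print substr($i, 6)   # drop the "root=" prefix
    }' "$conf"
}
```

From the rescue environment, the printed device can then be checked against what actually exists, for example with `ls /mnt/sysimage/dev/mapper` or `lvm lvs`.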

  • Do you know if this grub file was rewritten?

    Can you check it against a backup copy?

    Other than that I’ve given you my best suggestions.

  • I don’t have a backup copy of the grub.conf file since it’s always been automatically managed and updated by grub and friends and I’ve never really had to pay much attention to it.

    I did compare it between the non-working and the working machines and didn’t see anything that struck me as a significant difference.

    The most maddening part of this is that all of the files and the filesystems appear to be present — I can boot off of a rescue CD and mount the whole works under /mnt/sysimage and browse to my hearts content. I just can’t boot the damn thing.

    How is a name like /dev/mapper/vg_ws195-lv_root rd_NO_LUKS determined? If I knew how to read or find out what the actual name of the root directory was on the problem machines, I could compare it to what’s in the grub.conf file.
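On the naming question: as far as I know, the /dev/mapper name of an LVM volume is the volume group name and the logical volume name joined by a hyphen, with any hyphen inside either name doubled; rd_NO_LUKS is a separate kernel argument and not part of the name. The sketch below illustrates that rule. `dm_name` is a hypothetical helper, and the second example uses a made-up volume group name.

```shell
# Build the device-mapper path for an LVM logical volume.
# Hyphens inside the VG or LV name are doubled; a single hyphen separates them.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# dm_name vg_ws195 lv_root   prints /dev/mapper/vg_ws195-lv_root
# dm_name my-vg root         prints /dev/mapper/my--vg-root
```

The VG and LV names themselves can be listed with `lvs` (or `lvm lvs` in the rescue shell) and fed to this function for comparison with grub.conf.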

  • I don’t have any idea how to debug LVM stuff. But if you can boot in rescue mode, just on general principles I would chroot into /mnt/sysimage, rebuild the initrd and reinstall grub.

  • Replying to Les Mikesell’s suggestion above:

    rd_NO_LUKS says that there are no encrypted filesystems. We *strongly* prefer to label our filesystems.

    Finally, if you can see it running via linux rescue, I’d go with Les’s thought: boot that way, chroot to /mnt/sysimage, and first do a grub-install. If that doesn’t solve it, then try the rebuild of initrd.

    Oh, and check /mnt/sysimage/etc/fstab

    mark
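The fstab check suggested here comes down to confirming that the device mounted on / matches the root= argument in grub.conf. A minimal sketch; `fstab_root_device` is a hypothetical helper, and in rescue mode the file to pass is /mnt/sysimage/etc/fstab.

```shell
# Print the device (or LABEL=/UUID= spec) that an fstab mounts on "/".
fstab_root_device() {
    fstab="${1:-/etc/fstab}"
    # Skip comment lines; field 1 is the device, field 2 the mount point.
    awk '$1 !~ /^#/ && $2 == "/" { print $1 }' "$fstab"
}
```

If this prints something other than the root= value on the kernel line, that mismatch is a plausible reason for the panic and worth fixing before reinstalling anything.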

  • Is there a simple way to tell yum to re-install the current kernel?
    If you can do that from the rescue chroot the rpm scripts should rebuild the initrd for you – and maybe that step was interrupted in the earlier update attempt.
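One wrinkle with reinstalling "the current kernel" from a rescue chroot is that `uname -r` reports the rescue CD's kernel, not the newest one installed on disk. The sketch below picks the newest version from the `rpm -q kernel` list instead. `newest_kernel` is a hypothetical helper, and it relies on GNU `sort -V`, which approximates but is not identical to rpm's own version comparison.

```shell
# Read package names such as "kernel-2.6.32-358.0.1.el6.i686" on stdin
# and print the highest version-release.arch string.
newest_kernel() {
    sed 's/^kernel-//' | sort -V | tail -n 1
}
```

Inside the chroot, something like `yum reinstall "kernel-$(rpm -q kernel | newest_kernel)"` should then rerun the package scripts that build the initramfs; whether that is enough to fix the boot on these machines is not something the thread confirms.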
