LVM Hatred, Was /boot On A Separate Partition?


At the risk of some ridicule I suggest that you look at installing Webmin. It is a web based system administration tool that I find invaluable. The two most common complaints I encounter when I discuss its merits are ‘security’ and ‘transparency’.

The security issue is trivially dealt with. Install Webmin and configure it to listen on 127.0.0.1 using its standard port, TCP 10000. Install Firefox on the same host and then run Firefox from an ‘ssh -Y’ session using the --no-remote option. If you are totally paranoid then firewall TCP 10000 as well, configure Webmin to use https only, and then only start the webmin service when you are performing maintenance.
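
A rough sketch of that setup (the host name, account, and config lines are illustrative, and the miniserv.conf key names may vary between Webmin versions):

# on the server: restrict Webmin to the loopback interface
#   in /etc/webmin/miniserv.conf set, e.g.:  bind=127.0.0.1  and  ssl=1
service webmin restart

# from the admin workstation: run the server's Firefox over X forwarding
ssh -Y admin@server.example.com
firefox --no-remote https://127.0.0.1:10000/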

There are less draconian measures that are in my opinion equally secure from a practical standpoint but I am sure that you can figure those out on your own.

The transparency issue is really unanswerable. There exists a school of thought that if you are going to administer a Linux system (or OS
of the proponent’s choice) then you should learn the command syntax of every command that you are called upon to use. This is the one-and-only path to enlightenment. Like upholding motherhood and promoting the wholesomeness of apple-pie this sort of moralizing really brooks no answer. You can guess my opinion on that line of puritanism.

As you have painfully discovered, infrequently used utilities and commands are difficult to deal with. The process of learning, or relearning, the correct arcana is particularly noisome given the notorious inconsistency of syntaxes across different utilities and the spotty coverage of up-to-date documentation. Google can be a dangerous guide given the wide variation of practice across differing flavours of *nix and the widespread aversion to providing dates on writings. In consequence I consign transparency arguments and their proponents to the religious fanatic file. Nothing personal but there is no point in arguing belief systems.

If you want to get infrequently performed sysadmin tasks done reliably and with a minimum of fuss use something like Webmin and get on with the rest of your life.

30 thoughts on - LVM Hatred, Was /boot On A Separate Partition?

  • James B. Byrne wrote:

    Back in ’06 or ’07, I installed webmin on the RHEL systems I was working on. It was a tremendous help installing and configuring openLDAP, whose tools, at least through ’08, were very definitely *NOT* ready for prime time. Webmin let me beat it into submission.

    mark

  • At Thu, 25 Jun 2015 11:03:18 -0400 CentOS mailing list wrote:

    HA! You only really need to learn *one* command: the man command. The man command provides ‘enlightenment’ for all other commands:

    man vgdisplay
    man lvdisplay
    man lvcreate
    man lvextend
    man lvresize
    man lvreduce
    man lvremove
    man e2fsck
    man resize2fs

    These are the only LVM commands I use regularly (yes, there are a pile more, but most are rarely used and a handful are only used in startup/shutdown scripts or when rescuing), and I often end up using the man command to refresh my memory of the command options.

    Right, expecting a *web search* to give *correct* command documentation is problematic. Using the local system man pages often works better, since the man pages installed with the utilities cover the *installed* version and not the version that might be shipped on a *different* distro, etc.

  • There may be numerous commands… but isn’t it pretty obvious what each one of them does? Often ‘lv’ is plenty of a hint to get to the right thing. And each of the commands uses the same syntax for options.

    Yes, exactly. DO NOT USE GOOGLE – USE THE &^@&$^* DOCUMENTATION!

    +1

    And take notes! You are sitting at a computer after all.

  • Having to read the documentation? That has always been my assumption – people want to do something without being bothered with understanding what they are doing.

    Yep. Use it on every server, no exceptions, never had issues I did not cause myself – and moving storage around, adding storage, all on running servers… never a problem.

  • AFAIK, your page has existed forever. This is how I first learned LVM: from your page. (Not that I use LVM much, but whenever I need to do something with LVM, I’m confident I can – using your webpage.)

    Thanks a lot!!

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev
    Sr System Administrator
    Department of Astronomy and Astrophysics
    Kavli Institute for Cosmological Physics
    University of Chicago
    Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • Once upon a time, Adam Tauno Williams said:

    The key thing is to know the LVM architecture. Once you have a basic grasp of that, the rest is usually pretty easy to figure out.

    At the bottom, you have some block device. This is most often a standard disk partition (e.g. /dev/sda2); in some cases, it may be a whole disk (e.g. /dev/sdb).

    The first layer of LVM is the physical volume (PV). This is basically
    “importing” a block device into the LVM stack; the PV uses the name of the underlying block device (so still /dev/sda2 or /dev/sdb).

    You put one or more PVs into a volume group (VG), and give it a name
    (e.g. “vg_myhost”, but there’s nothing special about putting “vg_” at the front, that’s just something some people do). This is where the functionality and flexibility starts to come into play. A VG can have multiple PVs and spread data across them, do RAID, move blocks from one PV to another, etc.

    You then divide up a VG into logical volumes (LVs), also giving them names (e.g. “lv_root”; again, “lv_” is just a common naming scheme, not a requirement). This is where you can do snapshots, thin provisioning, etc.

    At that point, you’ll have a new block device, like
    /dev/vg_myhost/lv_root, and you can make filesystems, assign to VMs, set up swap, etc.

    The commands at each layer of LVM follow a similar scheme, so there’s pvcreate, vgcreate, and lvcreate, for example. The arguments also follow a common scheme. For the regular admin stuff, you can typically figure out what you need with “--help” (using the man page as a refresher or extended reference).
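
    As a minimal sketch of that stack, from bare device to filesystem (device, VG/LV names, and sizes are illustrative):

    pvcreate /dev/sdb                      # import the block device as a PV
    vgcreate vg_myhost /dev/sdb            # group one or more PVs into a VG
    lvcreate -n lv_data -L 50G vg_myhost   # carve a 50G LV out of the VG
    mkfs.xfs /dev/vg_myhost/lv_data        # the LV is a block device; put a filesystem on it
    mount /dev/vg_myhost/lv_data /srv/data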

    It’s basically a way to assemble one arbitrary set of block devices and then divide them into another arbitrary set of block devices, but now separate from the underlying physical structure.

    Regular partitions have various limitations (one big one on Linux being that modifying the partition table of a disk with in-use partitions is a PITA and most often requires a reboot), and LVM abstracts away some of them. LVM is a set of commands and modules layered on top of the Linux kernel’s “device mapper” system. DM is just a way to map block A of virtual device X to block B of physical device Y; at one point, there was some discussion of kicking partition handling out of the kernel and just going with DM for everything (requires some form of init ramdisk though which complicates some setups).
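
    If you want to peek at that device-mapper layer directly, dmsetup will show the tables LVM has set up; a sketch, with illustrative names and numbers:

    dmsetup ls                       # list mapped devices, e.g. vg_myhost-lv_data
    dmsetup table vg_myhost-lv_data  # show the mapping, e.g. “0 104857600 linear 8:16 2048”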

  • Cool! This makes my day! I am just itching to add two more:

    man info
    info man

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev
    Sr System Administrator
    Department of Astronomy and Astrophysics
    Kavli Institute for Cosmological Physics
    University of Chicago
    Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • At Thu, 25 Jun 2015 13:18:04 -0400 CentOS mailing list wrote:

    It is ‘presumed’ that one has learned the man command itself and never ever needs to do a ‘man man’ :-). From there all other knowledge flows…

  • And thank you for the kind words. It’s always good to hear that these things benefit someone.

  • I’ll give an example. I have a backup server, and for various reasons
    (hardlinks primarily) all the data needs to be in a single filesystem.
    However, this is running on an older VMware ESX server, and those have a
    2TB LUN size limit. So, even though my EMC Clariion arrays can deal with 10TB LUNs without issue, the VMware ESX and all of its guests cannot. So, I have a lot of RDMs for the guests. The backup server’s LVM looks like this:
    [root@backup-rdc ~]# pvscan
    PV /dev/sdd1 VG vg_opt lvm2 [1.95 TB / 0 free]
    PV /dev/sde1 VG vg_opt lvm2 [1.95 TB / 0 free]
    PV /dev/sdf1 VG vg_opt lvm2 [1.95 TB / 0 free]
    PV /dev/sda2 VG VolGroup00 lvm2 [39.88 GB / 0 free]
    PV /dev/sdg1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdh1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdi1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdj1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdk1 VG bak-rdc lvm2 [1.47 TB / 0 free]
    PV /dev/sdl1 VG bak-rdc lvm2 [1.47 TB / 0 free]
    PV /dev/sdm1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdn1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdo1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdp1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdq1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdr1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdb1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    PV /dev/sdc1 VG bak-rdc lvm2 [1.95 TB / 0 free]
    Total: 18 [32.27 TB] / in use: 18 [32.27 TB] / in no VG: 0 [0 ]
    [root@backup-rdc ~]# lvscan
    ACTIVE '/dev/vg_opt/lv_backups' [5.86 TB] inherit
    ACTIVE '/dev/VolGroup00/LogVol00' [37.91 GB] inherit
    ACTIVE '/dev/VolGroup00/LogVol01' [1.97 GB] inherit
    ACTIVE '/dev/bak-rdc/cx3-80' [26.37 TB] inherit
    [root@backup-rdc ~]#

    It’s just beautiful the way I can take another 1.95 TB LUN, add it to the volume group, expand the logical volume, and then expand the underlying filesystem (XFS) and just dynamically add storage. Being on an EMC Clariion foundation, I don’t have to worry about the RAID, either, as the RAID6 and hotsparing is done by the array. SAN and LVM
    were made for each other. And, if and when I either migrate the guest over to physical hardware on the same SAN or migrate to some other virtualization, I can use LVM’s tools to migrate from all those 1.95 and
    1.47 TB LUNs over to a few larger LUNs and blow away the smaller LUNs while the system is online. And the EMC Clariion FLARE OE software allows me great flexibility in moving LUNs around in the array for performance and other reasons.
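
    The grow sequence itself is short; roughly (the new device name and mount point are illustrative):

    pvcreate /dev/sds1                         # label the new LUN as a PV
    vgextend bak-rdc /dev/sds1                 # add it to the volume group
    lvextend -l +100%FREE /dev/bak-rdc/cx3-80  # grow the LV into the new space
    xfs_growfs /backups                        # grow the mounted XFS filesystem online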

  • The Fedora /boot upsize was something I handled relatively easily with the LVM tools and another drive. I actually used an eSATA drive for this, but an internal or a USB external (which would have impacted system performance) could have been used. Here’s what I did to resize my Fedora /boot when the upgrade required it several years back:

    1.) Added a second drive that was larger than the drive that /boot was on;
    2.) Created a PV on that drive;
    3.) Added that PV to the volume group corresponding to the PV on the drive with /boot;
    4.) Did a pvmove from the PV on the drive with /boot to the second drive
    (which took quite a while);
    5.) Removed the PV on the drive with /boot from the volume group;
    6.) Deleted the partition that previously contained the PV;
    7.) Resized the /boot partition and its filesystem (this is doable while online, whereas resizing / online can be loads of fun);
    8.) Created a new PV on the drive containing /boot;
    9.) Added that PV back to the volume group;
    10.) Resized the filesystems on the logical volumes in the volume group to shrink them to fit the new PV’s space, and resized the LVs accordingly (may require single-user mode to shrink some filesystems);
    11.) Did a pvmove from the secondary drive back to the drive with /boot;
    12.) Removed the secondary drive’s PV from the VG (and removed the drive from the system).

    I was able to do this without a reboot step or going into single-user mode since I had not allocated all of the space in the VG to LVs, so I
    was able to skip step 10. While the pvmoves were executing the system was fully up and running, but with degraded performance; no downtime was experienced until the maintenance window to do the version upgrade.
    Once step 12 completed, I was able to do the upgrade with no issues with
    /boot size and no loss of data on the volume group on the /boot drive.
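
    Roughly, in commands (device names and the VG name are illustrative; /dev/sda2 is the original PV on the /boot drive, /dev/sdb the temporary drive):

    pvcreate /dev/sdb1               # 2.) PV on the temporary drive
    vgextend vg_myhost /dev/sdb1     # 3.) add it to the VG
    pvmove /dev/sda2 /dev/sdb1       # 4.) migrate all extents off the original PV
    vgreduce vg_myhost /dev/sda2     # 5.) drop the original PV from the VG
    pvremove /dev/sda2
    # 6.)-7.) repartition the /boot drive, then grow /boot (e.g. resize2fs /dev/sda1)
    pvcreate /dev/sda2               # 8.) recreate the now-smaller PV
    vgextend vg_myhost /dev/sda2     # 9.) add it back to the VG
    # 10.) shrink filesystems/LVs if needed so the data fits the smaller PV
    pvmove /dev/sdb1 /dev/sda2       # 11.) migrate the data back
    vgreduce vg_myhost /dev/sdb1     # 12.) remove the temporary PV and drive
    pvremove /dev/sdb1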

  • Mike – st257 silvertip257 at gmail.com Tue Jun 23 16:40:47 UTC 2015

    I think LVM is badass; however, if you don’t know the LVM tools, you’re instantly tossed deep into the weeds. Most every letter, lower and upper case, seems to be used twice by each of the lvm commands. I don’t have enough fingers to count the number of lvm commands. There’s so much intricate detail required for creating LVM layouts and doing snapshots and snapshot deletion compared to Btrfs that I’ve just about given up on LVM.

    I’ve also never had Btrfs snapshots explode on me like LVM thinp snapshots have when the metadata pool wasn’t made big enough in advance (and it isn’t made big enough by default, apparently). Most any typical maneuver done on LVM can be done much more easily and intuitively with Btrfs. So these days I just focus on Btrfs even though I definitely don’t hate LVM.
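
    For comparison, a classic (thick) LVM snapshot versus a Btrfs snapshot, as a sketch (names and sizes are illustrative):

    # LVM: the snapshot needs its own pre-sized copy-on-write area
    lvcreate -s -n data_snap -L 5G /dev/vg0/lv_data
    lvremove /dev/vg0/data_snap

    # Btrfs: snapshot a subvolume in place, no sizing decision up front
    btrfs subvolume snapshot /data /data/.snap-2015-06-25
    btrfs subvolume delete /data/.snap-2015-06-25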

    On desktop Linux, making LVM the default layout I think is a bad decision. It causes mortal users more trouble than it’s worth. I’d be a bit more accommodating if LVM had integrated encryption with live bi-directional conversion.

  • Gordon Messmer gordon.messmer at gmail.com Wed Jun 24 01:42:13 UTC 2015

    I did a bunch of testing of raw, qcow2, and LV-backed VM storage circa Fedora 19/20 and found very little difference. What mattered most was the (libvirt) cache setting, accessible via ‘virsh edit’ of the domain XML or through the virt-manager GUI. There have been a lot of optimizations in libvirt and qemu that make qcow2 files perform comparably to LVs.
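
    The cache mode lives on the disk’s <driver> element in the domain XML; something like this (the domain name is illustrative):

    virsh edit myguest
    # then, on the disk definition, set the cache attribute, e.g.:
    #   <driver name='qemu' type='qcow2' cache='none'/>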

    For migrating VMs, it’s easier if they’re a file. And qcow2 snapshots are more practical than LVM (thick) snapshots. The thin snapshots are quite good, though they take some familiarity to set up.

  • Chris Adams linux at cmadams.net Wed Jun 24 19:06:19 UTC 2015

    LVM is the emacs of storage. It’ll be here forever.

    Btrfs doesn’t export (virtual) block devices like LVM can, so it can’t be a backing store for, say, iSCSI. And it’s also, at the moment, rather catatonic when it comes to VM images. This is mitigated if you set xattr +C at image create time (it must be a zero-length file for +C to take). But if you cp --reflink or snapshot the containing subvolume, then COW starts to happen for new writes to either copy; overwrites of either copy’s newly written blocks are nocow. So anyway, you can quickly get into complicated states with VM images on Btrfs. I’m not sure of the long-term plan.

    This is how to get the equivalent of xattr +C at qcow2 create time; it is only applicable when the qcow2 is on Btrfs (the image name and size are placeholders):

    # qemu-img create -f qcow2 -o nocow=on <image>.qcow2 <size>
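
    For a raw image (or any existing file), the same no-COW attribute can be applied with chattr, but only while the file is still zero length; a sketch with an illustrative path:

    touch /var/lib/libvirt/images/guest.raw    # must be empty for +C to take effect
    chattr +C /var/lib/libvirt/images/guest.raw
    lsattr /var/lib/libvirt/images/guest.raw   # should now show the 'C' attribute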

    But really, piles more testing is needed to better understand some things with Btrfs and VMs. It’s all quite complicated, what’s going on with these layers. Even though my VM images get monstrous numbers of fragments if I don’t use +C, I haven’t yet seen a big performance penalty as a result when the host and guest are using Btrfs and the cache is set to unsafe. Now, you might say, that’s crazy! It’s called unsafe for a reason! Yes, but I’ve also viciously killed the VM while writes were happening and at most I lose a bit of data that was in flight; the guest fs is not corrupt at all, not even any complaints on the remount. I’ve got limited testing killing the host while writes are happening, and there is more data loss, probably due to delayed allocation, but again the host and guest Btrfs are fine – no mount complaints at all. And you kinda hope the host isn’t often dying…

    NTFS in qcow2 on Btrfs without +C, however? From anecdotes on the Btrfs list, this combination appears to cause hundreds of thousands of fragments in short order, and serious performance penalties. But I haven’t tested this. I’m guessing something about NTFS journalling and flushing, combined with a suboptimal libvirt cache setting, is causing overly aggressive flushes to disk, with each flush becoming a separate extent. Just a guess.

  • On 26.06.2015 at 12:47, Steve Clark wrote:

    Keep in mind – write caching can improve performance, but it also increases the risk of data loss on abnormal VM shutdowns.

  • In terms of performance, unsafe. Overall, it’s hard to say because it’s so configuration and use case specific. In my case, I do lots of Fedora installs, and Btrfs related testing, and the data I care about is safeguarded other ways. So I care mainly about VM performance, and therefore use unsafe. I haven’t yet lost data in a way attributable to that setting (top on the list is user error, overwhelmingly, haha).

    You might find this useful:
    https://rwmj.wordpress.com/2013/09/02/new-in-libguestfs-allow-cache-mode-to-be-selected/

    And this:
    https://github.com/libguestfs/libguestfs/commit/749e947bb0103f19feda0f29b6cbbf3cbfa350da

    Of particular annoyance to me in Virt-Manager is the prolific use of the word “Default” which doesn’t tell you diddly. The problem is Virt-Manager supports different hypervisors and all of them can have different defaults which don’t necessarily propagate through to libvirt and I’m not sure that libvirt is even able to be aware of all of them. So we get this useless placeholder called default. Default is not good just because you don’t know what it is. It’s not necessarily true that default translates into what’s recommended – that may be true, but it may also not be ideal for your use case.

  • It’s definitely still true on CentOS 7.

    Create a RAID1 volume on two drives. Partition that volume.

    Where is your partition table? Is it in a spot where your BIOS/UEFI or another OS will see it? Will that non-Linux system try to open or modify the partitions inside your RAID? It depends on what metadata version you use. If you set this up in Anaconda, it’s going to be version 0.90, and your partition table will be in a spot where a non-Linux system will read it.
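
    You can check which metadata version an existing array uses before relying on it; for example (array and member names are illustrative):

    mdadm --detail /dev/md0 | grep Version    # version reported by the assembled array
    mdadm --examine /dev/sda1 | grep Version  # version recorded on a member device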

    There’s no ambiguity with LVM. That’s what I mean when I say that it’s less complicated.

    The formats of MBR and GPT partition tables are imposed by the designs of BIOS and UEFI. There is no good reason to use them for any purpose other than identifying the location of a filesystem that BIOS or UEFI must be able to read.

    The limitation I was referring to was that as far as I know, if Linux has mounted filesystems from a partitioned RAID set, you can’t modify partitions without rebooting. That limitation doesn’t affect LVM.

    I know, that’s what I said was the only practical way to support redundant storage (when using LVM).

    I hadn’t realized that. That’s an interesting alternative to MD RAID, particularly for users who want LVs with different RAID levels.

  • LVM RAID uses the md kernel code, but is managed by the LVM tools and metadata rather than mdadm and its metadata format. It supports all the same RAID levels these days. The gotcha is that it’s obscure enough that you won’t find nearly as much documentation or help when you arrive at disaster recovery and need to work out what to do. And anyone who lurks on the linux-raid@ list knows that a huge pile of data loss comes from users who do the wrong thing; maybe top of the list is that they, for some ungodly reason, read somewhere to use mdadm -C to overwrite the mdadm metadata on one of their drives, which obliterates important information needed for recovery, and now they have actually caused a bigger problem.

    At the moment, LVM RAID is only supported with conventional/thick provisioning. So if you want to do software RAID and also use LVM thin provisioning, you still need to use mdadm (or hardware RAID).

  • You can do thin pools as RAID[1,5,N], just not in a single command:

    root # lvcreate -m 1 --type raid1 -l40%VG -n thin_pool vg0
    root # lvcreate -m 1 --type raid1 -L4MB -n thin_meta vg0
    root # lvconvert --thinpool vg0/thin_pool --poolmetadata vg0/thin_meta

    So yeah, it’s not directly supported by the tools, but it does work. I would not recommend it though, as I doubt it is very well tested.
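
    For completeness, once the pool exists you carve thin volumes out of it in the usual way; for example (volume name and size are illustrative):

    root # lvcreate --thin -V 10G -n thin_vol vg0/thin_pool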

  • https://en.wikipedia.org/wiki/AdvFS
    AdvFS uses a relatively advanced concept of a storage pool (called a file domain) and of logical file systems (called file sets). A file domain is composed of any number of block devices, which could be partitions, LVM or LSM devices.

    I really miss this. BR, Bob