
#1 2014-07-15 14:05:43

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Intermittent boot failures related to presence of LVM snapshots

Aside from /boot (sda2), all of my filesystems are on LVM LVs:

# fdisk -l /dev/sd?
Disk /dev/sda: 233.8 GiB, 251059544064 bytes, 490350672 sectors
...
Device    Boot     Start       End    Blocks  Id System
/dev/sda1           2048   2099199   1048576  83 Linux
/dev/sda2 *      2099200   2303999    102400  83 Linux
/dev/sda3        2304000 490350671 244023336  8e Linux LVM

Disk /dev/sdb: 233.8 GiB, 251059544064 bytes, 490350672 sectors
...
Device    Boot Start       End    Blocks  Id System
/dev/sdb1       2048 490350671 245174312  8e Linux LVM

# lvscan
  ACTIVE   Original '/dev/VG0/lv_root' [5.00 GiB] inherit
  ACTIVE   Original '/dev/VG0/lv_home' [10.00 GiB] inherit
  ACTIVE            '/dev/VG0/lv_data' [212.00 GiB] inherit
  ACTIVE   Original '/dev/VG0/lv_var' [5.71 GiB] contiguous
  ACTIVE            '/dev/VG0/lv_build' [10.00 GiB] contiguous
  ACTIVE            '/dev/VG0/lv_aoe0' [10.00 GiB] contiguous
  ACTIVE            '/dev/VG0/lv_aoe1' [10.00 GiB] inherit
  ACTIVE            '/dev/VG0/cp_home' [10.00 GiB] inherit
  ACTIVE            '/dev/VG0/cp_var' [6.00 GiB] inherit
  ACTIVE   Snapshot '/dev/VG0/ss_root' [5.00 GiB] inherit
  ACTIVE   Snapshot '/dev/VG0/ss_var' [5.71 GiB] inherit
  ACTIVE   Snapshot '/dev/VG0/ss_home' [10.00 GiB] inherit

sda1 is a minimal install of Arch which shares /boot with the main install. Before certain updates (kernel, X, graphics driver) I take snapshots of my root, home, and var LVs as indicated above. The snapshots are taken offline in the minimal install and the update is done in a chroot from there as well. I don't think that's relevant but I suppose it warrants mentioning.
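
For reference, the snapshots get created from the minimal install with something like the following (a sketch; the names and sizes just mirror the lvscan output above):

# run from the minimal install, with the main install's LVs unmounted;
# each snapshot is sized to match its origin
lvcreate -s -L 5G    -n ss_root VG0/lv_root
lvcreate -s -L 10G   -n ss_home VG0/lv_home
lvcreate -s -L 5.71G -n ss_var  VG0/lv_var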

Anyway, when the snapshots exist, something strange intermittently happens with one or more of the snapshotted volumes and the system fails to boot. It has never happened on the first boot after creating the snapshots; it usually happens on the second, but occasionally I make it to a third boot before it occurs. The most recent failure was with lv_home. I logged into the recovery console and ran lvscan, which indicated that all LVs were active. But there was no device node for lv_home (i.e. /dev/VG0/lv_home). The journal indicated that the LVs were eventually activated, but not without issues:

Jul 15 09:05:25 caddywhompus lvm[270]: Monitoring snapshot VG0-ss_root
Jul 15 09:05:25 caddywhompus lvm[270]: Monitoring snapshot VG0-ss_home
Jul 15 09:05:25 caddywhompus lvm[211]: 4 logical volume(s) in volume group "VG0" monitored
Jul 15 09:05:26 caddywhompus lvm[277]: device-mapper: create ioctl on VG0-lv_data failed: Device or resource busy
Jul 15 09:05:26 caddywhompus lvm[277]: device-mapper: create ioctl on VG0-lv_var failed: Device or resource busy
Jul 15 09:05:26 caddywhompus lvm[277]: 12 logical volume(s) in volume group "VG0" now active
Jul 15 09:05:26 caddywhompus lvm[277]: VG0: autoactivation failed.
Jul 15 09:05:26 caddywhompus lvm[270]: Monitoring snapshot VG0-ss_var
Jul 15 09:05:26 caddywhompus lvm[279]: 12 logical volume(s) in volume group "VG0" now active

(Those are all the lines containing "lvm[" from before I logged in to the recovery console.)
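
For anyone else who hits this: from the recovery console it's possible to check whether device-mapper actually has the device even though lvscan reports it ACTIVE. A purely diagnostic sketch (vgmknodes only helps if the mapping exists and just the /dev node is missing):

ls -l /dev/VG0/lv_home /dev/mapper/VG0-lv_home   # is the node there at all?
dmsetup info VG0-lv_home                         # does device-mapper know about the mapping?
vgmknodes VG0                                    # recreate any missing /dev entries for the VG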

I ran lvchange -an VG0/lv_home and then lvchange -ay VG0/lv_home. After the latter, the standard "lv_home fsck'd" boot message appeared, and mount indicated it had been automounted. I exited the recovery console and the system eventually booted after a minute or so, though at least one service (dhcpd4) failed to start. At some point after I logged in to the recovery console (presumably after the lvchange commands) a few more lvm-related messages were generated:

Jul 15 09:09:17 caddywhompus lvm[270]: No longer monitoring snapshot VG0-ss_home
Jul 15 09:09:21 caddywhompus lvm[270]: Monitoring snapshot VG0-ss_home
Jul 15 09:09:41 caddywhompus lvm[270]: There are still devices being monitored.
Jul 15 09:09:41 caddywhompus lvm[270]: Refusing to exit.

I've actually had issues with LVM and snapshots for over a year now (since the big LVM brouhaha in Feb of last year). Since I only use the snapshots as a hedge against update breakage, it's not a big sacrifice to remove them after the first post-update boot. But occasionally I forget, and of course this shouldn't be happening anyway. So any suggestions would be appreciated.
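
In case it's useful, cleaning up afterwards is just an lvremove, and a snapshot can also be merged back into its origin with lvconvert if an update does break something. A sketch of both outcomes:

# discard the snapshots once the update looks good (lvremove prompts for each one)
lvremove VG0/ss_root VG0/ss_home VG0/ss_var

# or roll back by merging a snapshot into its origin;
# if the origin is in use, the merge starts the next time it is activated
lvconvert --merge VG0/ss_root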

Edit to add: lv_{root, var, home, data} are all on sda3 and fill it completely. The remaining LVs (including the snapshots) are on sdb1.

Last edited by alphaniner (2014-07-15 19:21:26)




#2 2014-07-15 14:28:58

greyseal96
Member
Registered: 2014-03-20
Posts: 31

Re: Intermittent boot failures related to presence of LVM snapshots

I just wanted to post that I've noticed weird, intermittent boot failures when I have LVM snapshots as well.  Like you, I take snapshots of some of my partitions (root, home and boot) before updating so that I can roll back if necessary.  I run with the snapshots for much longer than you do, though, because they usually don't cause me any problems.  It's usually a couple of weeks at minimum before I start having any problems at boot.  Most often the problem is that the boot process can't find my home partition: it tries for 90 seconds, then fails and drops to a recovery console.  On a couple of rare occasions the boot process mounted the root snapshot instead of the root partition (though I'm not sure whether that's related to the issue in question).  Usually shutting down and rebooting fixes the problem, so this issue is mostly just annoying.

I wish I had a solution for you, but I haven't been able to figure this out yet.  If it happens again, though, I'll look in the journal to see whether I get output similar to yours and post my findings.  I've noticed that your previous posts haven't yielded a solution either, so maybe this is a pretty rare issue.  Maybe we'll start attracting some attention.  :)


#3 2014-07-25 13:28:56

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: Intermittent boot failures related to presence of LVM snapshots

To rule out disk issues - very unlikely IMO, but I don't have any better ideas - I migrated my installation to a new set of hard drives with the same basic configuration: root, var, home, and data LVs on the first disk and all others including snaps on the second disk. It took a bit longer before I had an issue, but eventually it happened (same thing with home LV).

I hadn't really been paying any attention to which LV had problems before I made this post, but since then it's always been the home LV. So I removed the snapshot for the home LV and haven't had any problems since.

-----

greyseal96 wrote:

On a couple of rare occasions, the boot process accidentally mounted the root snapshot instead of the root partition

If you use labels or UUIDs in fstab, that could be the cause:

# blkid
/dev/mapper/vg0-v_root-real: LABEL="new_arch" UUID="438521e2-c417-438c-9d41-2c4bc2e11136" TYPE="ext4" 
/dev/mapper/vg0-v_root: LABEL="new_arch" UUID="438521e2-c417-438c-9d41-2c4bc2e11136" TYPE="ext4" 
/dev/mapper/vg0-s_root-cow: TYPE="DM_snapshot_cow" 
/dev/mapper/vg0-s_root: LABEL="new_arch" UUID="438521e2-c417-438c-9d41-2c4bc2e11136" TYPE="ext4"

Note that v_root (the origin LV), s_root (the snapshot LV) and v_root-real (a 'meta' LV, like s_root-cow, that doesn't show up in lvscan and the like) all have the same UUID and label.
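
So if fstab mounts by LABEL or UUID, whichever of those devices gets resolved first can end up being the one that's mounted. Mounting by the device-mapper path sidesteps the ambiguity. A minimal fstab sketch using the names from the blkid output above:

# mount the origin LV by its DM path instead of UUID/LABEL, so a snapshot
# carrying an identical filesystem UUID can't be picked up by mistake
/dev/mapper/vg0-v_root  /  ext4  defaults  0  1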




#4 2014-07-26 05:34:07

greyseal96
Member
Registered: 2014-03-20
Posts: 31

Re: Intermittent boot failures related to presence of LVM snapshots

You're absolutely correct about using UUIDs or labels in the fstab.  Early on, when I started having this problem, I had a weird week where I would work on something one day and the file would be gone the next.  Then I would work on something else and it would be gone, but the other files I thought I had lost were back.  It was really weirding me out until I figured out that my snapshot partitions (root, home, etc.) had actually gotten mounted a couple of times.  After that I changed my fstab to use the partition names instead of UUIDs and, with one exception, I haven't had the wrong partitions mounted since.  The exception was the one time that, even though I had switched to using the root partition name, the root snapshot got mounted anyway.  I'm still not exactly sure why that happened but, since it's only happened once, I haven't really been motivated to pursue that particular problem.

It's good to know that you migrated the installation to a new set of hard drives; that would seem to rule out an issue with that particular disk.  This really does look like an issue with LVM and snapshotted volumes.  I'd say it's an issue with LVM and snapshots of home partitions in particular, but I've had a couple of outlier cases where the problem was with another partition.  I suspect it has something to do with how much the snapshots have grown, because it always happens to me after I've been running with the snapshots for a while.  That said, when I start having boot issues the snapshots are at most 40% used (as reported by lvs), so I don't think they're actually running out of space.  I haven't been able to form a really good hypothesis yet, but those are the hunches I've got so far.  Thanks for posting your additional information.
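
For what it's worth, the usage figure I mentioned comes from lvs.  A quick way to watch it per snapshot (a sketch; this assumes an lvm2 recent enough to have the data_percent reporting field, which is what shows up as the Data% column):

# show each snapshot's origin and how full its COW space is
lvs -o lv_name,origin,lv_size,data_percent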

