Hello everyone,
yesterday I installed Arch on my PC. So far it is a pretty basic setup.
I created the EFI system partition on one of my 3 NVMe drives; all remaining space on the drives is used for LVM. I created a single VG which spans all 3 drives (I called it "system" for lack of a better name). Technically there is another, older PV/VG ("black", because it is the only WD Black disk) on an HDD, but that one doesn't seem to matter, as it doesn't have any mounts configured.
[root@ember ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/nvme0n1p2 system lvm2 a-- <464,76g 320,24g
/dev/nvme1n1p1 system lvm2 a-- <465,76g 321,24g
/dev/nvme2n1p1 system lvm2 a-- <465,76g <336,25g
/dev/sdb1 black lvm2 a-- <2,73t <501,26g
I have created volumes with varying RAID levels: root (RAID 5), var (RAID 1), home (RAID 5), swap (RAID 0) and some space for a Steam library (RAID 0). For booting the system I use systemd-boot.
[root@ember ~]# grep -vE '^#|^$' /etc/mkinitcpio.conf
MODULES=(dm-raid dm_integrity raid0 raid1 raid10 raid456)
BINARIES=()
FILES=()
HOOKS=(base systemd modconf keyboard keymap consolefont block lvm2 filesystems fsck autodetect)
The problem: roughly every other boot, the system hangs on startup waiting for lvm2-related jobs to finish until they time out after 90 seconds:
(typed by hand)
A stop job is running for /usr/bin/lvm vgchange -aay --autoactivation event system
A start job is running for /dev/disk/by-uuid/<UUID>
...
A start job is running for Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
Then I am dropped into the emergency console. The unit "lvm-activate-system.service" is in a failed state. I can't seem to finish the activation manually or mount the remaining volumes afterwards. Rebooting the system is the easiest workaround at this point. If you ask me, it is a race condition with something happening early in boot that leaves lvm2 in a broken state, and it is seemingly purely random. Once the system boots successfully, there are no further problems and the storage works as expected.
I remember a similar issue in the past where a workaround existed related to lvm2-lvmetad, which consisted of an override for the service unit that was copied into the image. Unfortunately this doesn't work anymore, as lvm2-lvmetad got replaced (?). Also, I found a more recent topic which sounds a lot like my problem, but none of the proposed solutions seems to work.
Here is the journal of one of the failed boots: journalctl -b <ID> | sed 's/myname/m***/g'
I think this is the most relevant part of the journal which displays the same problem as described in the other thread. I didn't highlight the mount failures that follow to keep it concise.
May 06 23:06:31 archlinux systemd[1]: Stopped Create List of Static Device Nodes.
May 06 23:06:31 archlinux lvm[477]: Interrupted...
May 06 23:06:31 archlinux lvm[477]: 4 logical volume(s) in volume group "system" now active
...
May 06 23:08:01 ember systemd[1]: lvm-activate-system.service: State 'stop-sigterm' timed out. Killing.
May 06 23:08:01 ember systemd[1]: lvm-activate-system.service: Killing process 477 (lvm) with signal SIGKILL.
May 06 23:08:01 ember systemd[1]: lvm-activate-system.service: Main process exited, code=killed, status=9/KILL
May 06 23:08:01 ember systemd[1]: lvm-activate-system.service: Failed with result 'timeout'.
May 06 23:08:01 ember systemd[1]: Stopped /usr/bin/lvm vgchange -aay --autoactivation event system.
I don't really know how to proceed from here as I can't seem to figure it out. So I was hoping any of you could please help. :)
Thanks!
Solution
So while this is a real issue, I had misinterpreted the solution in the linked thread and did not properly understand what it actually does. The solution was simply to enter the root device into auto_activation_volume_list in /etc/lvm/lvm.conf. Nothing else should be added there (that was my mistake). Afterwards I took the linked script and adjusted it a bit (thank you sickill!).
/etc/systemd/system/lvm-activate@.service
[Unit]
Requires=dm-event.socket
After=dm-event.socket dm-event.service
Before=local-fs-pre.target
DefaultDependencies=no
Description=Activate volume group %I
[Service]
Type=oneshot
ExecStart=/usr/bin/lvm vgchange -ay %I
RemainAfterExit=no
[Install]
WantedBy=local-fs-pre.target
Then I enabled it for every VG in use on my system.
systemctl enable lvm-activate@system
systemctl enable lvm-activate@black
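For reference, the lvm.conf change described above could look roughly like this. It is only a sketch: it assumes the root LV is system/root (the journal shows /dev/system/root), and it shows just this one setting inside the existing activation section, not a new section to append. As far as I understand it, the list only limits autoactivation (vgchange -aay); the plain vgchange -ay in the unit above is not restricted by it. Since the initramfs presumably carries its own copy of lvm.conf, the image most likely has to be regenerated for the change to apply during early boot.

```
activation {
    # Autoactivate only the root LV (affects vgchange -aay, i.e. the
    # event-based activation in the initrd).
    # "system/root" is an assumption based on /dev/system/root in the journal.
    auto_activation_volume_list = [ "system/root" ]
}
```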
Marking this as solved, because the workaround / fix seems reasonable and I am not able to reproduce the original issue at this point after around 10 reboots.
Last edited by Swiggles (2023-05-07 13:13:22)
Let me first say your setup violates the KISS principle. I urge you to rethink what you are trying to achieve (redundancy of what exactly - LV flexibility of what exactly) and to drastically reduce the (LVM) complexity.
That said - this seems to be a classic race condition.
The VGs are assembled, autoactivation starts and is still running while the root device gets detected:
...
May 06 23:06:30 archlinux lvm[465]: PV /dev/sdb1 online, VG black is complete.
...
May 06 23:06:30 archlinux lvm[466]: PV /dev/nvme0n1p2 online, VG system incomplete (need 2).
May 06 23:06:30 archlinux lvm[468]: PV /dev/nvme1n1p1 online, VG system incomplete (need 1).
May 06 23:06:30 archlinux lvm[467]: PV /dev/nvme2n1p1 online, VG system is complete.
May 06 23:06:30 archlinux systemd[1]: Started /usr/bin/lvm vgchange -aay --autoactivation event black.
May 06 23:06:30 archlinux lvm[470]: 0 logical volume(s) in volume group "black" now active
May 06 23:06:30 archlinux systemd[1]: Started /usr/bin/lvm vgchange -aay --autoactivation event system.
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-1 operational as raid disk 0
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-3 operational as raid disk 1
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-5 operational as raid disk 2
May 06 23:06:30 archlinux kernel: md/raid:mdX: raid level 5 active with 3 out of 3 devices, algorithm 2
May 06 23:06:30 archlinux kernel: device-mapper: raid: raid456 discard support disabled due to discard_zeroes_data uncertainty.
May 06 23:06:30 archlinux kernel: device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y to override.
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-8 operational as raid disk 0
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-10 operational as raid disk 1
May 06 23:06:30 archlinux kernel: md/raid:mdX: device dm-12 operational as raid disk 2
May 06 23:06:30 archlinux kernel: md/raid:mdX: raid level 5 active with 3 out of 3 devices, algorithm 2
May 06 23:06:30 archlinux systemd[1]: Found device /dev/system/root.
The root fs is checked, sysroot gets mounted, autoactivation of "black" is finished:
May 06 23:06:30 archlinux systemd[1]: Reached target Initrd Root Device.
May 06 23:06:30 archlinux systemd[1]: Starting File System Check on /dev/system/root...
May 06 23:06:30 archlinux systemd-fsck[537]: /dev/mapper/system-root: clean, 182418/983040 files, 2088060/3932160 blocks
May 06 23:06:30 archlinux systemd[1]: Finished File System Check on /dev/system/root.
May 06 23:06:30 archlinux systemd[1]: Mounting /sysroot...
May 06 23:06:30 archlinux systemd[1]: lvm-activate-black.service: Deactivated successfully.
One of the RAID1 volumes becomes active, sysroot is mounted, and the handoff from the initrd to sysroot begins:
May 06 23:06:31 archlinux kernel: device-mapper: raid: raid456 discard support disabled due to discard_zeroes_data uncertainty.
May 06 23:06:31 archlinux kernel: device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y to override.
May 06 23:06:31 archlinux kernel: md/raid1:mdX: active with 2 out of 2 mirrors
May 06 23:06:31 archlinux systemd[1]: Mounted /sysroot.
May 06 23:06:31 archlinux systemd[1]: Reached target Initrd Root File System.
May 06 23:06:31 archlinux kernel: EXT4-fs (dm-6): mounted filesystem 2f68e77d-1ab7-4119-b418-3f278bcacc1e with ordered data mode. Quota mode: none.
May 06 23:06:31 archlinux systemd[1]: Starting Mountpoints Configured in the Real Root...
...
May 06 23:06:31 archlinux systemd[1]: Finished Mountpoints Configured in the Real Root.
May 06 23:06:31 archlinux systemd[1]: Reached target Initrd File Systems.
May 06 23:06:31 archlinux systemd[1]: Reached target Initrd Default Target.
systemd now cleans up everything and stops the autoactivation of system, which is still running:
May 06 23:06:31 archlinux systemd[1]: Starting Cleaning Up and Shutting Down Daemons...
May 06 23:06:31 archlinux systemd[1]: Stopped target Initrd Default Target.
May 06 23:06:31 archlinux systemd[1]: Stopped target Basic System.
May 06 23:06:31 archlinux systemd[1]: Stopped target Initrd Root Device.
May 06 23:06:31 archlinux systemd[1]: Stopped target Path Units.
May 06 23:06:31 archlinux systemd[1]: Stopped target Slice Units.
May 06 23:06:31 archlinux systemd[1]: Stopped target Socket Units.
May 06 23:06:31 archlinux systemd[1]: Stopped target System Initialization.
May 06 23:06:31 archlinux systemd[1]: Stopped target Local File Systems.
May 06 23:06:31 archlinux systemd[1]: Stopped target Swaps.
May 06 23:06:31 archlinux systemd[1]: Stopped target Timer Units.
May 06 23:06:31 archlinux systemd[1]: Stopping /usr/bin/lvm vgchange -aay --autoactivation event system...
LVM doesn't like that:
May 06 23:06:31 archlinux lvm[477]: Interrupted...
May 06 23:06:31 archlinux lvm[477]: 4 logical volume(s) in volume group "system" now active
and the systemd job runs into the timeout.
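To check other boots for the same pattern, a small filter over the journal text can help. This is only a sketch: the function name interrupted_vgs is made up for illustration, and it assumes the job description is the vgchange command line exactly as it appears in the log above.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not part of lvm2 or systemd).
# Reads journal text on stdin and prints the name of every VG whose
# autoactivation job systemd began to stop, i.e. lines such as
#   ... systemd[1]: Stopping /usr/bin/lvm vgchange -aay --autoactivation event system...
interrupted_vgs() {
    sed -n 's|.*systemd\[1\]: Stopping /usr/bin/lvm vgchange -aay --autoactivation event \(.*\)\.\.\.$|\1|p'
}
```

Fed with the journal of a failed boot, for example via journalctl -b -1 | interrupted_vgs, it should print the affected VG name and stay silent for a boot in which no autoactivation job was cut short.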
Since I neither use LVM root volumes nor have ever had to rein in systemd to make it wait for something to finish, I can only describe possible solutions:
- Reduce the LVM complexity to enable lvm to be faster
- Only assemble the "root" volume during the initrd phase and the rest afterwards
- Tell systemd to wait for lvm to finish before mounting sysroot or
- Tell systemd to wait for lvm to finish before starting the cleanup
Only assemble the "root" volume during the initrd phase and the rest afterwards
This is basically the solution I added to the top post. Thank you for your input!
Regarding complexity: it is the simplest setup I could come up with where the storage is unbalanced (hence LVM at the bottom) and any system-critical partition can survive one failed drive. The mix of RAID 1 and RAID 5 is there for arbitrary storage / performance reasons. I don't see any benefit in using linear volumes over RAID 0 for data that can be easily restored or recovered, or that is not important for running a degraded system. Of course backups are next on my list, but I wanted this fixed first. :)