
#1 2019-02-06 13:12:13

Jason P.
Member
Registered: 2015-02-26
Posts: 171

[SOLVED] Possible race condition on LVM activation during boot

Hi,

This one seems a little bit tricky... From time to time I'm dropped to a rescue shell during boot because a non-root partition cannot be found. These are the relevant lines in the log:

feb 06 12:07:09 mount[544]: mount: /mnt/multimedia: special device /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080 does not exist.
feb 06 12:07:09 systemd[1]: mnt-multimedia.mount: Mount process exited, code=exited, status=32/n/a
feb 06 12:07:09 systemd[1]: mnt-multimedia.mount: Failed with result 'exit-code'.
feb 06 12:07:09 systemd[1]: Failed to mount /mnt/multimedia.
feb 06 12:07:09 systemd[1]: Dependency failed for Local File Systems.
feb 06 12:07:09 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
feb 06 12:07:09 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.

* "special device /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080 doesn't exist"

This has happened randomly just twice over the past few days, and so far the issue goes away the second time I try to boot. Another line just a little below those makes me think I'm dealing with some sort of race condition:

feb 06 12:07:09 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
feb 06 12:07:09 systemd[1]: systemd-ask-password-wall.path: Succeeded.
feb 06 12:07:09 systemd[1]: Stopped Forward Password Requests to Wall Directory Watch.
feb 06 12:07:09 systemd[1]: Reached target Host and Network Name Lookups.
feb 06 12:07:09 systemd[1]: systemd-ask-password-console.path: Succeeded.
feb 06 12:07:09 lvm[483]:   1 logical volume(s) in volume group "storage" now active

Volume group "storage" contains a single logical volume, the one that failed above, and it looks like it was activated a bit later during boot.

➜ # ~  vgdisplay storage
  --- Volume group ---
  VG Name               storage
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <931,51 GiB
  PE Size               4,00 MiB
  Total PE              238466
  Alloc PE / Size       238466 / <931,51 GiB
  Free  PE / Size       0 / 0   
  VG UUID               y5SfTe-knJI-CNC9-XNob-3JMa-KxyR-iqqSV4
➜ # ~  lvdisplay storage
  --- Logical volume ---
  LV Path                /dev/storage/multimedia
  LV Name                multimedia
  VG Name                storage
  LV UUID                wICROl-WnQb-nSKA-EIo5-LhsI-tl4R-s3sO3c
  LV Write Access        read/write
  LV Creation host, time archiso, 2015-01-11 20:37:37 +0100
  LV Status              available
  # open                 1
  LV Size                <931,51 GiB
  Current LE             238466
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:0
➜ # ~  blkid /dev/storage/multimedia
/dev/storage/multimedia: LABEL="multimedia" UUID="cabdab31-983b-401f-be30-dda0ae462080" TYPE="ext4"
➜ # ~  lsblk /dev/sdb 
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdb                      8:16   0 931,5G  0 disk 
└─sdb1                   8:17   0 931,5G  0 part 
  └─storage-multimedia 254:0    0 931,5G  0 lvm  /mnt/multimedia
➜ # ~  cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
# /dev/mapper/ssd-root
UUID=d2c293c1-508e-4178-b652-2a81ef5cd1b5	/					ext4	rw,relatime,data=ordered,discard		0	1

# /dev/sda1
UUID=DDC4-DF3D		/boot	vfat	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro,discard	0	2

# /dev/mapper/hdd-tmp
UUID=5b624507-ac12-4e5c-8a60-d1671e4c11af	/tmp				ext4	rw,relatime,data=ordered				0	2

# /dev/mapper/hdd-var
UUID=ea78a861-6240-4fdd-8408-d1aab9099b98	/var				ext4	rw,relatime,data=ordered				0	2

# /dev/mapper/hdd-home
UUID=888f14fc-dd6f-444a-b4ee-b919538adc91	/home				ext4	rw,relatime,data=ordered				0	2

# /dev/mapper/hdd-swap
UUID=b91165e2-021c-4b60-8e26-22903c825996	none				swap	defaults								0	0

# /dev/mapper/storage-multimedia
UUID=cabdab31-983b-401f-be30-dda0ae462080	/mnt/multimedia		ext4	rw,relatime,data=ordered,x-gvfs-show	0	2

# /dev/mapper/hdd-deluge
UUID=de6b6b15-0333-4e98-87c7-efffc35212f3	/srv/deluge/watch	ext4	rw,relatime,data=ordered,x-gvfs-show	0	2
➜ # ~  pacman -Qi lvm2 
Name                      : lvm2
Version                   : 2.02.183-2

The drives have been configured the same way for years, so what could be the source of the problem? Maybe it's not related at all, but I ran into this other issue several days ago after shutting down, when I normally just suspend or reboot.

I'd appreciate your help, thanks!

Last edited by Jason P. (2019-02-21 14:34:25)


#2 2019-02-06 23:44:59

jamespharvey20
Member
Registered: 2015-06-09
Posts: 129

Re: [SOLVED] Possible race condition on LVM activation during boot

Is it always multimedia that sometimes fails?

If so, I'd suggest digging deeper, but a quick fix might be to add "noauto,x-systemd.automount" after multimedia's mount options ("...x-gvfs-show,noauto,x-systemd.automount").  This will wait to mount it until something tries to use it.
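
Something like this, based on the fstab entry you posted (untested on your setup, so double-check the options):

# /dev/mapper/storage-multimedia
UUID=cabdab31-983b-401f-be30-dda0ae462080	/mnt/multimedia		ext4	rw,relatime,data=ordered,x-gvfs-show,noauto,x-systemd.automount	0	2

With that, systemd generates an mnt-multimedia.automount unit and the real mount only happens on first access.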

Can you post the journalctl output for the entire failed boot?  Run "journalctl --list-boots" to see boot IDs, and run "journalctl -b -<#>" to show the output for that boot-shutdown cycle.  (Make sure to put the - in front of the number as well: a positive value counts from the beginning, a negative value counts backwards from the current boot, like "--list-boots" shows.)  Paste it somewhere (any pastebin client will do) and put a link here, or just install "netcat" (i.e. "gnu-netcat") and run "journalctl -b -<#> | netcat termbin.com 9999" and post the URL it gives you here.
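
For example, assuming the failed boot is the previous one (shown as "-1" by "--list-boots"), that would be:

journalctl --list-boots
journalctl -b -1 | netcat termbin.com 9999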

1 logical volume(s) in volume group "storage" now active

Until this appears, that volume's UUID doesn't exist in "/dev/disk/by-uuid".  When it fails, am I right that there's a long delay before systemd times out?  (By default, systemd times out a unit after 90 seconds.)
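
If you get dropped to the rescue shell again, it might be worth checking from there whether the symlink is simply missing at that moment, e.g.:

ls -l /dev/disk/by-uuid/ | grep cabdab31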

If I'm right that there's a 90 second timeout, it seems bizarre to me that the volume finally becomes active in the same second the mount fails.  I'm wondering if for some reason systemd isn't waiting the 90 seconds, and there's a race against LVM activation.  I personally use "/dev/<vg>/<lv>" in my fstab to mount LVM volumes and systemd waits for the volumes to become activated, as long as they do so within 90 seconds.  (Which they do easily, unless there are thin volumes in use, in which case I need to add "--skip-mappings" to "thin_check_options" in "/etc/lvm/lvm.conf".)  I think systemd should also wait if you're using UUIDs, but maybe for some reason I'm wrong on that, so you could try the "/dev/<vg>/<lv>" form in fstab.
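
For your volume, the "/dev/<vg>/<lv>" form would look roughly like this (a sketch based on the entry you posted, so double-check it before rebooting):

# /dev/mapper/storage-multimedia
/dev/storage/multimedia	/mnt/multimedia		ext4	rw,relatime,data=ordered,x-gvfs-show	0	2

And for completeness, the thin volume tweak I mentioned goes in the global section of "/etc/lvm/lvm.conf", keeping whatever options are already there and appending the flag, roughly:

global {
    thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
}

(You only need that if you ever use thin volumes, which you don't seem to.)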

Combined with your other post about the "open failed" errors during pvscan, I'm wondering if your hard drive controller (motherboard?) could be having intermittent problems.  Or maybe your power supply could be having problems.  To be prudent I have to recommend backing up all your data; granted, the other post's "open failed" errors make me less concerned about your sdb failing, but backups are never a bad idea.


#3 2019-02-07 18:01:51

Jason P.
Member
Registered: 2015-02-26
Posts: 171

Re: [SOLVED] Possible race condition on LVM activation during boot

jamespharvey20 wrote:

Is it always multimedia that sometimes fails?

So far, yes. Twice, to be precise.

jamespharvey20 wrote:

If so, I'd suggest digging deeper, but a quick fix might be to add "noauto,x-systemd.automount" after multimedia's mount options ("...x-gvfs-show,noauto,x-systemd.automount").  This will wait to mount it until something tries to use it.

Nice :) That will do the trick and also improve boot times by a couple of ms, no? hehe. In the end it's just my storage for movies, music and so on, and I have no reason to make it available during boot. Here I'm assuming that LVM activation will still leave it unmounted.
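
If I understand the option right, systemd should generate an automount unit for the mount point, so I guess I can verify the new behaviour with something like:

systemctl status mnt-multimedia.automount
ls /mnt/multimedia    # first access should trigger the actual mount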

jamespharvey20 wrote:

Can you post the journalctl output for the entire failed boot?  Run "journalctl --list-boots" to see boot IDs, and run "journalctl -b -<#>" to show the output for that boot-shutdown cycle.  (Make sure to put the - in front of the number as well: a positive value counts from the beginning, a negative value counts backwards from the current boot, like "--list-boots" shows.)  Paste it somewhere (any pastebin client will do) and put a link here, or just install "netcat" (i.e. "gnu-netcat") and run "journalctl -b -<#> | netcat termbin.com 9999" and post the URL it gives you here.

Here you go: https://termbin.com/kbdq

jamespharvey20 wrote:

When it fails, am I right that there's a long delay before systemd times out?

Not really. As you can see in the complete log, there's no timeout.

jamespharvey20 wrote:

I personally use "/dev/<vg>/<lv>" in my fstab to mount LVM volumes and systemd waits for the volumes to become activated

I'm going to try it and see, but this fstab hasn't been touched for several years and it was working fine. Which leads me to your next comment...

jamespharvey20 wrote:

I'm wondering if your hard drive controller (motherboard?) could be having intermittent problems.  Or, maybe if your power supply could be having problems.

I had weird issues a long time ago during cold boots after the machine had been powered off for several hours. Apparently something wasn't getting enough power to start, causing the boot process to stop and start again in short loops. At first I thought the issue was related to the power supply, so I replaced it with a newer and better model. I also disconnected devices that weren't being used, like the DVD drive and PCI cards, but the problems were still there (sometimes). So I just avoided them by suspending instead of powering off. I'm not quite sure whether this is still a problem; a few times after all this it managed to boot fine even though it had been unplugged for several hours. Maybe some capacitors are faulty... who knows. I don't have a tester to check them. Maybe I should get one...


#4 2019-02-09 17:11:19

Jason P.
Member
Registered: 2015-02-26
Posts: 171

Re: [SOLVED] Possible race condition on LVM activation during boot

I'm trying to reproduce the problem with the udev logging level set to "debug" so I can provide more info.
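
In case it's useful to someone else, this is how I raised the logging level (assuming the usual systemd-udevd knobs):

# /etc/udev/udev.conf
udev_log="debug"

# or, just for one boot, on the kernel command line:
udev.log_priority=debug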


#5 2019-02-21 13:50:25

Jason P.
Member
Registered: 2015-02-26
Posts: 171

Re: [SOLVED] Possible race condition on LVM activation during boot

It has been more than a week without the error. I'm closing this thread because I haven't managed to reproduce the issue and systemd has been updated to version 241.


#6 2019-02-21 14:22:03

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,436

Re: [SOLVED] Possible race condition on LVM activation during boot

Please use [SOLVED] instead, as [closed] denotes a thread locked by a moderator.

