Hi,
I have a non-root btrfs filesystem spanning two encrypted partitions:
/etc/crypttab:
graid1 /dev/sdc3 /etc/crypto/graid1 timeout=15
graid2 /dev/sdb2 /etc/crypto/graid2 timeout=15
/etc/fstab:
/dev/mapper/graid1 /mnt/raid btrfs defaults,device=/dev/mapper/graid1,device=/dev/mapper/graid2 0 0
When booting I get
A start job is running for dev-mapper-graid2.device (10s / no limit)
Adding a nofail option to fstab makes the system bootable again, but obviously the raid filesystem is then not mounted. A mount -a in an interactive root shell mounts the filesystem just fine.
My first guess was that systemd is ignorant of (among other things) the fact that all btrfs partitions must be present when mounting a btrfs array, not just the first one, and that it tries to "optimise" the boot process by starting the mount as soon as the first device is up via its crypttab automagic (systemd-cryptsetup-generator, the generator that parses crypttab). Since debugging systemd is black magic (two hours of debugging with the wiki and extensive ddg'ing yielded no errors, warnings, failures or anything else useful), I went the trial-and-error way.
I found out that systemd somewhat "virtualises" fstab into its own units, namely mount units, so I tried to gently hint that it should really wait for cryptsetup.target to finish, so that all partitions are ready before it tries to mount the array:
/etc/systemd/system/mnt-raid.mount:
[Unit]
Description=/mnt/raid
Wants=cryptsetup.target
After=cryptsetup.target
# I was hoping these would force the unit to FAIL,
# in order for it to be at least a little debuggable. Guess what.
AssertPathExists=/dev/mapper/graid1
AssertPathExists=/dev/mapper/graid2
[Mount]
What=/dev/mapper/graid1
Where=/mnt/raid
Type=btrfs
Options=defaults,relatime,compress=lzo,device=/dev/mapper/graid1,device=/dev/mapper/graid2
Along with this unit I also commented out the line in fstab, since it took precedence* every time, even though the systemd documentation specifically says otherwise: "If a mount point is configured in both /etc/fstab and a unit file that is stored below /usr, the former will take precedence. If the unit file is stored below /etc, it will take precedence. This means: native unit files take precedence over traditional configuration files, but this is superseded by the rule that configuration in /etc will always take precedence over configuration in /usr."
*) I figured that out by trial and error: with the mount unit already present and listed in systemctl list-unit-files, removing the nofail option from fstab still resulted in the above-mentioned "a start job is running" red line of death.
After bootup systemctl list-units says:
...
dev-mapper-graid1.device loaded inactive dead start dev-mapper-graid1.device
...
mnt-raid.mount loaded inactive dead start /mnt/raid
(these lines were not present when booting without the unit file).
Since the unit did not "fail", there are no logs in journalctl -u mnt-raid.mount, and running SYSTEMD_LOG_LEVEL=debug systemctl start mnt-raid.mount yields only this unhelpful output:
Calling manager for StartUnit on mnt-raid.mount, replace
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StartUnit cookie=1 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=GetUnit cookie=2 reply_cookie=0 error=n/a
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/mnt_2draid_2emount interface=org.freedesktop.DBus.Properties member=Get cookie=3 reply_cookie=0 error=n/a
Adding /org/freedesktop/systemd1/job/59 to the set
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/job/59 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=2 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.DBus.Properties.PropertiesChanged() on /org/freedesktop/systemd1/job/59
Got message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1/job/58 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3 reply_cookie=0 error=n/a
Got D-Bus request: org.freedesktop.DBus.Properties.PropertiesChanged() on /org/freedesktop/systemd1/job/58
... and then hangs indefinitely, I'm really running out of ideas. I did not find any information on how to debug a dead unit, since that is apparently not considered an erroneous state.
I'm stuck. I don't know what to do next except for workarounds like mounting the filesystem manually via some kind of (now defunct - why?) /etc/rc.local mechanism. How do I find out what's happening when systemd starts the mount unit? How do I get the output of the commands it runs (like mount and possibly others)? I have no idea whether my custom mount unit works and the problem is elsewhere, or whether it doesn't, or ... whatever.
(off-topic: exactly how does systemd "optimise" anything? Ever since Arch moved to systemd my boot time has tripled, all troubles have become way more frustrating, and 4 out of 5 of them were due to some systemd "goodie". What happened to KISS? systemd sure is Stupid but I kinda miss the Simple part...)
Last edited by mr.MikyMaus (2015-04-01 13:22:11)
What happened to Arch's KISS? systemd sure is stupid but I must have missed the simple part ...
... and who is general Failure and why is he reading my harddisk?
Have a look at this comment: https://bugs.freedesktop.org/show_bug.cgi?id=88483#c1
and this bug here at Arch: https://bugs.archlinux.org/task/42884
Thanks! The Arch bug isn't directly related to my problem, but the comment in systemd's bug report was, well, "helpful", if you can call it that.
UUID in fstab indeed works, but I have only a vague idea why, and Lennart's style of commentary gives me a really bad feeling. The last time I had this feeling was a couple of years ago, when I needed to solve a problem with Windows and contacted Microsoft support. They also used this style of language, with phrases like "this is intended", "this is considered to be like this", "you need to" and similar, but with no explanation of why it is this way and why it deviates from the behaviour a user would expect.
I know what you mean; I think he chooses such phrases to abbreviate. Still, he really deserves credit for being very active on the bug tracker, too. Pity there is not always time for a link to accompany the statements.
Back to topic: so using the btrfs UUID in fstab solved it for you? (Apparently it did not for the bug's OP, who is running a btrfs root RAID.) E.g.:
UUID=123-456-789-012 /mnt/raid btrfs defaults,device=/dev/mapper/graid1,device=/dev/mapper/graid2 0 0
makes it boot fine?
Yes, UUID in fstab works as expected. I am, however, hesitant to mark this problem as "solved", as I still don't fully understand how this is a "solution". In my mind and understanding of basic principles, this is still a workaround...
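To act on the UUID finding, one way to spot fstab entries that still mount a multi-device btrfs filesystem by a /dev/mapper path is a small awk filter. This is a hypothetical helper, not something from systemd or from this thread: the function name check_btrfs_fstab is made up, it only looks at the first (device) and third (type) fstab fields, and it assumes that mounting by UUID= (or LABEL=) is the preferred form, as reported here.

```sh
# Hypothetical helper: report btrfs entries in an fstab-format file that
# are mounted by a /dev/mapper path instead of UUID= or LABEL=.
# Usage: check_btrfs_fstab [FILE]   (defaults to /etc/fstab)
check_btrfs_fstab() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        NF < 3     { next }    # skip blank or incomplete lines
        $3 == "btrfs" && $1 ~ /^\/dev\/mapper\// {
            print "line " NR ": " $2 " is mounted via " $1 "; consider UUID= or LABEL="
        }
    ' "${1:-/etc/fstab}"
}
```

Run against the fstab from the first post, it would flag the /dev/mapper/graid1 entry for /mnt/raid, while the UUID= variant above passes silently.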
mr.MikyMaus wrote:
Yes, UUID in fstab works as expected. I am, however, hesitant to mark this problem as "solved", as I still don't fully understand how this is a "solution". In my mind and understanding of basic principles, this is still a workaround...
From my experience, using "LABEL" works the same as UUID. It still comes up a little short for me though.
Out of curiosity, do you have pending jobs on boot? For example my system thinks it is still in the "starting" stage of the boot cycle. I have 4 underlying devices using LUKS crypto to form a btrfs RAID:
$ systemctl list-jobs
JOB UNIT TYPE STATE
72 dev-mapper-crypt3.device start running
58 dev-mapper-crypt0.device start running
69 dev-mapper-crypt1.device start running
3 jobs listed.
$ systemctl status dev-mapper-crypt2.device
● dev-mapper-crypt2.device - /dev/mapper/crypt2
Follow: unit currently follows state of sys-devices-virtual-block-dm\x2d6.device
Loaded: loaded
Drop-In: /run/systemd/generator/dev-mapper-crypt2.device.d
└─90-device-timeout.conf
Active: active (plugged) since Sat 2015-07-04 14:54:08 PDT; 1h 3min ago
Device: /sys/devices/virtual/block/dm-6
Jul 04 14:54:08 fubar systemd[1]: Found device /dev/mapper/crypt2.
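The device jobs that keep the boot in the "starting" state can be picked out of the list-jobs output mechanically. A minimal sketch, assuming the default four-column layout shown above (JOB UNIT TYPE STATE); pending_device_jobs is a hypothetical name, and it only filters text, so it says nothing about why a device unit never becomes active.

```sh
# Hypothetical helper: read `systemctl list-jobs` output on stdin and print
# the .device units that still have a pending "start" job.
# The header line and the "N jobs listed." footer do not match the filter.
pending_device_jobs() {
    awk '$2 ~ /\.device$/ && $3 == "start" { print $2 }'
}
```

On a live system it could be fed with systemctl list-jobs --no-legend | pending_device_jobs, and each unit it prints can then be inspected with systemctl status as above. With the output quoted above it prints dev-mapper-crypt3.device, dev-mapper-crypt0.device and dev-mapper-crypt1.device, i.e. every member except crypt2, the one reported as active (plugged).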
I am also getting this issue with an encrypted btrfs array...
$ cat /etc/crypttab
storage0 /dev/sdb1 /root/key
storage1 /dev/sdc1 /root/key
$ cat /etc/fstab
UUID=5CE1-FC8E /boot/efi vfat defaults 0 2
UUID=5d959a82-74bb-41bf-9ba0-ac6bfe7c3e2b / btrfs defaults,ssd,noatime,compress=lzo,subvol=__active 0 1
UUID=3bc453c0-16c2-4c72-ba38-8b2e4f7153fa /storage btrfs defaults,noatime,compress=lzo,device=/dev/mapper/storage0,device=/dev/mapper/storage1,nofail 0 2
//WASTELAND/Storage /home/mark/Storage cifs guest,uid=1000,x-systemd.automount 0 2
$ systemctl list-jobs
JOB UNIT TYPE STATE
20 dev-mapper-storage1.device start running
1 jobs listed.
$ systemctl status dev-mapper-storage0.device
● dev-mapper-storage0.device - /dev/mapper/storage0
Follow: unit currently follows state of sys-devices-virtual-block-dm\x2d2.device
Loaded: loaded
Drop-In: /run/systemd/generator/dev-mapper-storage0.device.d
└─90-device-timeout.conf
Active: active (plugged) since Sat 2015-11-21 16:39:08 MST; 5min ago
Device: /sys/devices/virtual/block/dm-2
Nov 21 16:39:08 wasteland systemd[1]: Found device /dev/mapper/storage0.
The end result is systemd hanging forever on:
[ *** ] A start job is running for /dev/mapper/storage1 ( 5min / no limit )
and...
# systemd-analyze
Bootup is not yet finished. Please try again later.
Last edited by CrazyIrish (2015-11-22 00:01:07)