#1 2025-04-24 12:49:11

ozwigh
Member
Registered: 2014-10-07
Posts: 33

[SOLVED] Unexpected partitions unmounting (incl. system) during boot.

Hello!

I think migration is close to installation (at least, it requires some similar steps), so I'm posting it here. If I'm wrong, dear moderators, feel free to move the topic to a more suitable subforum.

I suddenly got into trouble when migrating the existing (working perfectly) installation to new NVMe drives. I have done it dozens of times without any problems before; it's not a complex procedure: copy, change UUIDs, regenerate initramfs, GRUB stub, and it all just works.

But this time, filesystems were unexpectedly unmounted during boot from the new drives in every mode except rescue. Not only the user partitions were affected, but also /proc, /sys, and /run. Logs and configs follow.

Laptop: Dream Machines NS7x, i7-1260P, 64G RAM, latest UEFI (1.07.09TDES2).
Old drives: BIWIN 1TB.
New drives: ADATA Legend 800 2TB. Those are dirt cheap and work well, especially for a general-purpose PC (UNIX philosophy: one task, one device, heh). They also come with a thin heatsink that fits in laptops, so there's no overheating, throttling, or read/write errors.

Partitioning:

Drive 1: 
	1: 512M EFI (EF00, FAT32)
	2: the rest mdadm RAID (FD00)
Drive 2: 
	1: 512M BOOT (8300, EXT4)
	2: the rest mdadm RAID (FD00)

The layout is: btrfs with subvolumes on LUKS on mdadm RAID.

mdadm.conf:

DEVICE partitions
...
ARRAY /dev/md/raid0 uuid=...

mkinitcpio.conf:

MODULES=(i915 dm_mod)
...
HOOKS=(base systemd autodetect microcode modconf kms keyboard sd-vconsole block mdadm_udev sd-encrypt filesystems fsck)

fstab:

/dev/mapper/raid                                            /                          btrfs      noatime,ssd,compress=zstd,space_cache=v2,subvol=sys           0 1
/dev/mapper/raid                                            /home                      btrfs      noatime,ssd,compress=zstd,space_cache=v2,subvol=home          0 1
/dev/mapper/raid                                            /data                      btrfs      noatime,ssd,compress=zstd,space_cache=v2,subvol=data          0 1
/dev/mapper/raid                                            /ext                       btrfs      noatime,ssd,compress=zstd,space_cache=v2,subvol=ext           0 1
...

crypttab.initramfs:

raid  /dev/md/raid0  none

grub.cfg:

...
	linux /vmlinuz-linux root=/dev/mapper/raid rootflags=subvol=/sys loglevel=3 rd.systemd.show_status=0 rd.udev.log_priority=3 rd.luks.options=tries=0,timeout=0 rootflags=x-systemd.device-timeout=0 audit=0 threadirqs ibt=off rw fbcon=font:TER16x32 raid0.default_layout=2
...
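
The command line above sets rootflags= twice (once for the subvolume, once for the device timeout). How repeated keys are combined depends on whatever parses them, so it can be worth listing them explicitly. A minimal shell sketch, assuming a space-separated command line; the function name dup_cmdline_keys is made up for illustration:

```shell
# List kernel command line keys that occur more than once.
# $1: the command line as a single string.
# Output: one duplicated key per line (nothing if all keys are unique).
dup_cmdline_keys() {
    # Put one parameter per line, cut everything from the first '=',
    # drop empty lines, then report keys that repeat.
    printf '%s\n' "$1" |
        tr -s ' ' '\n' |
        sed 's/=.*//' |
        grep -v '^$' |
        sort |
        uniq -d
}
```

On a running system it could be fed the contents of /proc/cmdline, for example dup_cmdline_keys "$(cat /proc/cmdline)".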

When I boot in rescue mode (with any kernel: I tried default, lts, zen, hardened, and even rt), I see what's expected:

nvme0n1     259:0    0  1.8T  0 disk  
├─nvme0n1p1 259:1    0  512M  0 part  /boot
└─nvme0n1p2 259:2    0  1.8T  0 part  
  └─md127     9:127  0  3.6T  0 raid0 
    └─raid  253:0    0  3.6T  0 crypt /home
                                      /ext
                                      /data
                                      /
nvme1n1     259:3    0  1.8T  0 disk  
├─nvme1n1p1 259:4    0  512M  0 part  /esp
└─nvme1n1p2 259:5    0  1.8T  0 part  
  └─md127     9:127  0  3.6T  0 raid0 
    └─raid  253:0    0  3.6T  0 crypt /home
                                      /ext
                                      /data
                                      /

But booting in any other mode fails. Let's check the 'system.journal'.

...
// no errors so far
systemd[1]: Mounting /boot...
systemd[1]: Mounting /data...
systemd[1]: Mounting /esp...
systemd[1]: Mounting /ext...
systemd[1]: Mounting /home...
...
systemd[1]: Mounted /boot.
systemd[1]: Mounted /data.
systemd[1]: Mounted /esp.
systemd[1]: Mounted /ext.
systemd[1]: Mounted /home.
// filesystems mounted w/o errors
// services starting successfully
...
// here things start to get more interesting
NetworkManager[1185]: <info>  [1745379036.7533] NetworkManager (version 1.52.0-1) is starting...
...
systemd[1]: Started Network Manager Script Dispatcher Service.
nm-dispatcher[1268]: Error: NetworkManager is not running.
systemd[1]: Failed to parse /proc/self/mountinfo: No such file or directory
// ??!
kernel: EXT4-fs (nvme0n1p1): unmounting filesystem b57e3e84-593e-47b2-b45d-f5d843e59e17.
// why??! it's /boot and it's in fstab.
systemd[1]: proc-sys-fs-binfmt_misc.automount: Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?
systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed with result 'unmounted'.
systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to unmount: No such file or directory
systemd[1]: Started Network Manager.
// why does the nm-dispatcher start before the nm is fully loaded?
...
// then come a lot of errors for each service:
systemd[1]: dnsmasq.service: Failed to get executor path from fd: Function not implemented
systemd[1]: dnsmasq.service: Failed to spawn 'start-pre' task: Function not implemented
systemd[1]: dnsmasq.service: Failed with result 'resources'.
systemd[1]: dnsmasq.service: Failed to destroy cgroup /system.slice/dnsmasq.service, ignoring: Directory not empty
...
systemd[1]: Reached target Graphical Interface.
systemd[1]: Startup finished in 6.606s (firmware) + 8.364s (loader) + 1.787s (kernel) + 9.628s (initrd) + 3.306s (userspace) = 29.694s.
dbus-broker-launch[1182]: Activation request for 'org.freedesktop.resolve1' failed: The systemd unit 'dbus-org.freedesktop.resolve1.service' could not be found.
// a lot of service errors
systemd[1]: dnsmasq.service: Failed to get cgroup ID of cgroup /sys/fs/cgroup/system.slice/dnsmasq.service, ignoring: Value too large for defined data type
systemd[1]: /dev/null is not a device.
systemd[1]: Cannot open /proc/devices to resolve pts: No such file or directory
systemd[1]: dnsmasq.service: No devices matched by device filter.
systemd[1]: Attaching device control BPF program to cgroup /system.slice/dnsmasq.service failed: Invalid argument
systemd[1]: dnsmasq.service: Failed to get executor path from fd: Function not implemented
systemd[1]: dnsmasq.service: Failed to spawn 'start-pre' task: Function not implemented
systemd[1]: dnsmasq.service: Failed with result 'resources'.
systemd[1]: dnsmasq.service: Failed to destroy cgroup /system.slice/dnsmasq.service, ignoring: Directory not empty
systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.
...
systemd[1]: cloudflared.service:: Scheduled restart job, restart counter is at 5.
systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
systemd[1]: cloudflared.service: Failed to get cgroup ID of cgroup /sys/fs/cgroup/system.slice/cloudflared.service, ignoring: Value too large for defined data type
systemd[1]: cloudflared.service: Failed to get executor path from fd: Function not implemented
systemd[1]: cloudflared.service: Failed to spawn 'start' task: Function not implemented
systemd[1]: cloudflared.service: Failed with result 'resources'.
systemd[1]: cloudflared.service: Failed to destroy cgroup /system.slice/cloudflared.service, ignoring: Directory not empty
systemd[1]: Failed to start cloudflared DNS over HTTPS proxy.
...
systemd[1]: Received SIGINT.
// from what?
systemd[1]: Activating special unit System Reboot...
// reboot failed though
...
systemd[1]: Failed to add a watch for /run/systemd/ask-password: No such file or directory
...
systemd[1]: boot.mount: Failed to get executor path from fd: Function not implemented
systemd[1]: boot.mount: Failed to spawn 'umount' task: Function not implemented
systemd[1]: Failed unmounting /boot.
// of course, it's already unmounted for a reason unknown
...
systemd[1]: Reached target Unmount All Filesystems.
...
systemd[1]: Stopped target Preparation for Local File Systems.
...
systemd[1]: Finished System Reboot.
// didn't happen
systemd[1]: Reached target System Reboot.
// ??
...
systemd[1]: Shutting down.
// didn't happen as well

Of course, I searched the internet. Nothing comes even close. This really bothers me: "Got hangup/error on autofs pipe from kernel. Likely our automount point has been unmounted by someone or something else?". Someone or something else?! Come on... And nm-dispatcher behavior looks suspicious, though I doubt it can unmount /sys or even /boot.

[EDIT] Forgot to add: during loading, systemd prints [OK] for everything, then stops on nm-dispatcher ([OK] too). Nothing after that.

When I plug in the old drives (everything is identical except the UUIDs), everything works flawlessly. Any ideas?

Thank you!

Last edited by ozwigh (2025-04-25 14:50:27)

#2 2025-04-24 13:47:07

stfischr
Member
Registered: 2020-04-29
Posts: 30

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

Hi there.

What do you mean by rescue mode? Initramfs-fallback? If it works there, try experimenting with the position of the autodetect hook, or leave it out completely.

It's rather strange that it reports all mounts are successful and seconds later all mounts seem to be gone.
Have you checked for SMART-Errors?
Is /esp also failing or is it always just /boot?

Does mdadm report any problems?
Have you run a consistency check?

#3 2025-04-24 14:42:46

ozwigh
Member
Registered: 2014-10-07
Posts: 33

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

stfischr wrote:

What do you mean by rescue mode? Initramfs-fallback?

No, "systemd.unit=rescue.target" kernel parameter, with the same initramfs. It allows login with a root password for maintenance.

stfischr wrote:

If it works there try experimenting with the position of the autodetect hook or leave it out completely.

Will try, thank you.

stfischr wrote:

It's rather strange that it reports all mounts are successful and seconds later all mounts seem to be gone.

Yeah, I've never seen this before either. I can understand hardware mounts, but system ones? Duh...

stfischr wrote:

Have you checked for SMART-Errors?

The disks are brand new, SMART is perfect, no errors (with "loglevel=7"), but I even ran a badblocks check and verified checksums through the night. /facepalm
Everything is the same except configs I personally modified, initramfs and grub stub.

stfischr wrote:

Is /esp also failing or is it always just /boot?

Yes, all filesystems are unmounted. Even zram, tmpfs and kernel ones.

stfischr wrote:

Does mdadm report any problems? Have you run a consistency check?

Nothing. No errors. I'll try a check, though files' checksums are identical. But who knows...

Thank you!

#4 2025-04-24 15:15:43

stfischr
Member
Registered: 2020-04-29
Posts: 30

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

ozwigh wrote:
stfischr wrote:

What do you mean by rescue mode? Initramfs-fallback?

No, "systemd.unit=rescue.target" kernel parameter, with the same initramfs. It allows login with a root password for maintenance.

Then autodetect is the wrong place to look. In the rescue target do you need to manually mount anything or does everything just work there?

Everything is the same except configs I personally modified, initramfs and grub stub.

I see a lot of timeout=0 stuff in the grub.cfg. I haven't used GRUB in a long time; are those standard? Try to boot with as few kernel parameters as possible. Or, if you can, check

cat /proc/cmdline

in the rescue shell to see if there are differences.
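
Comparing by eye is error-prone with a command line this long. A small sketch that prints the parameters present in only one of two command lines; cmdline_diff is a hypothetical helper, not an existing tool, and it assumes whitespace-separated parameters without embedded spaces:

```shell
# Print parameters that appear in only one of two kernel command lines.
# $1: first command line (e.g. saved from the working rescue boot)
# $2: second command line (e.g. copied from the failing boot entry)
# Lines starting with '-' are only in $1, lines starting with '+' only in $2.
# The body runs in a subshell so 'set -f' and the variables stay local.
cmdline_diff() (
    set -f    # no globbing while the strings are split into words
    old=$(printf '%s' "$1" | tr -s '\t\n' '  ')
    new=$(printf '%s' "$2" | tr -s '\t\n' '  ')
    for p in $old; do
        case " $new " in
            *" $p "*) ;;
            *) printf '%s %s\n' '-' "$p" ;;
        esac
    done
    for p in $new; do
        case " $old " in
            *" $p "*) ;;
            *) printf '%s %s\n' '+' "$p" ;;
        esac
    done
)
```

An empty result means both boots received the same set of parameters, regardless of order.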

#5 2025-04-24 17:01:03

ozwigh
Member
Registered: 2014-10-07
Posts: 33

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

stfischr wrote:

Then autodetect is the wrong place to look. In the rescue target do you need to manually mount anything or does everything just work there?

No, it mounts everything in my case. Assembles raid, decrypts it, and mounts all that in fstab. It's just a single-user environment limited to root w/o some services started.

stfischr wrote:

I see a lot of timeout=0 stuff in the grub.cfg. I haven't used GRUB in a long time; are those standard? Try to boot with as few kernel parameters as possible. Or, if you can, check

cat /proc/cmdline

in the rescue shell to see if there are differences.

Thank you for the /proc/cmdline suggestion, will try.
As for rd.luks.options=timeout=0, it just disables the passphrase entry timeout for dm-crypt (for example, if I turn on the PC and go for a smoke, it won't drop to rescue mode while waiting); x-systemd.device-timeout=0 disables the device timeout for a mount entry. No, they aren't standard: the first defaults to about 60s, I think; about the second I don't even know. Those aren't GRUB parameters: the first is a kernel command line parameter (read by systemd's cryptsetup generator), the second is a systemd mount option. It has worked for years, so I forgot. And the fact that everything works in rescue mode shows that it's OK. But I'll play with the command line a little more... /sigh
Me using GRUB is more of a tradition. What's Linux without old scary GRUB? Though I only use it to chainload UKIs and EFI binaries (I don't like the idea of being limited to a single boot option). BTW, a UKI fails the same way as the traditional method (and works in rescue mode).
It must be something simple that I'm missing. My primary suspects now are NetworkManager and nm-dispatcher; I'll play with those tomorrow. Analyzing a log with loglevel=7 is hellish work...
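
For wading through a loglevel=7 journal, a filter that jumps to the first sign of trouble can help. The sketch below prints the first line matching one of the symptoms quoted in this thread, together with a few lines before it; before_first_anomaly is a hypothetical helper, and the pattern list is just three messages taken from the first post:

```shell
# Show the first "mounts are disappearing" symptom in a text journal,
# preceded by up to N earlier lines (default 3) for context.
# $1: number of context lines; the journal text is read from stdin.
# Possible use (assumes the previous boot is the failing one):
#   journalctl -b -1 -o short-precise | before_first_anomaly 5
before_first_anomaly() {
    awk -v n="${1:-3}" '
        /unmounting filesystem|Failed to parse \/proc\/self\/mountinfo|autofs pipe/ {
            # Replay the buffered lines just before the first match.
            for (i = NR - n; i < NR; i++)
                if (i in buf) print buf[i]
            print
            exit
        }
        { buf[NR] = $0 }
    '
}
```

Whatever started right before the first match is a reasonable place to begin looking.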

#6 2025-04-25 07:40:05

seth
Member
Registered: 2012-09-03
Posts: 63,637

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

systemd[1]: /dev/null is not a device.
systemd[1]: Cannot open /proc/devices to resolve pts: No such file or directory

You somehow cloned procfs and devfs (probably also sysfs)?
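
Those two log lines suggest that /dev and /proc were no longer the kernel's virtual filesystems when systemd looked at them. One way to check this from a working shell is to compare the output of mount against the expected API filesystems. A rough sketch, assuming the "source on target type fstype (options)" format that mount prints; check_api_mounts and its list of expected mounts are illustrative choices:

```shell
# Check that the kernel API filesystems are mounted where expected.
# Reads `mount`-style lines on stdin, e.g.:  mount | check_api_mounts
# Prints one status line per expected mount; returns non-zero if any is missing.
check_api_mounts() {
    awk '
        BEGIN {
            want["/proc"] = "proc"
            want["/sys"]  = "sysfs"
            want["/dev"]  = "devtmpfs"
            want["/run"]  = "tmpfs"
        }
        # Remember the filesystem type seen on each mount point of interest.
        $2 == "on" && $4 == "type" && ($3 in want) { seen[$3] = $5 }
        END {
            n = split("/proc /sys /dev /run", order, " ")
            bad = 0
            for (i = 1; i <= n; i++) {
                t = order[i]
                if (seen[t] == want[t]) {
                    printf "ok      %s (%s)\n", t, want[t]
                } else {
                    printf "MISSING %s (expected %s)\n", t, want[t]
                    bad = 1
                }
            }
            exit bad
        }
    '
}
```

A plain directory copied from another disk would show up here as MISSING, because nothing of the expected type is mounted on it.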

Can you please post the complete, unaltered journal (w/ timestamps and everything, if you want to hide your embarrassing hostname, please sed it, don't delete any columns to maintain syntax highlighting, thanks)

Last edited by seth (2025-04-25 07:40:20)

#7 2025-04-25 14:48:26

ozwigh
Member
Registered: 2014-10-07
Posts: 33

Re: [SOLVED] Unexpected partitions unmounting (incl. system) during boot.

seth wrote:

You somehow cloned procfs and devfs (probably also sysfs)?

I certainly didn't, at least consciously. It would require so much typing!

But... the problem is solved. Unexpectedly, as it turned out. As I thought, the culprit was nm-dispatcher. I logged into rescue mode, masked 'NetworkManager-dispatcher.service', and voila: no errors, everything works as before the migration. Fushigi...
There are no third-party hooks in 'dispatcher.d', but I'll investigate the issue later. I'm quite curious how it did that and what triggered such behavior. Maybe it's a D-Bus issue.
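
To double-check which dispatcher scripts are actually installed, something like the following lists every executable file under the given directories. It's only a sketch: list_dispatcher_scripts is a made-up name, and the two directories in the usage comment are the usual NetworkManager locations, which may differ on other setups:

```shell
# List executable regular files below each directory given as an argument.
# Directories that do not exist are skipped silently.
# Possible use (paths are assumptions about a typical install):
#   list_dispatcher_scripts /etc/NetworkManager/dispatcher.d \
#                           /usr/lib/NetworkManager/dispatcher.d
list_dispatcher_scripts() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f -perm -u+x | sort
    done
}
```

Each path it prints can then be checked by hand to see whether it belongs to a package or was added locally.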

Mounts:

/dev/mapper/sys on / type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,subvolid=256,subvol=/sys)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=4096k,nr_inodes=8188242,mode=755,inode64)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (ro,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=13108776k,nr_inodes=819200,mode=755,inode64)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=5922)
none on /dev/binderfs type binder (rw,relatime,max=1048576)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/credentials/systemd-journald.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,gid=78,mode=1770,pagesize=2M)
/dev/mapper/sys on /data type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,subvolid=258,subvol=/data)
/dev/mapper/sys on /ext type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,subvolid=259,subvol=/ext)
/dev/mapper/sys on /home type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,subvolid=257,subvol=/home)
tmpfs on /ram type tmpfs (rw,nosuid,nodev,noatime,size=50331648k,inode64)
/dev/nvme0n1p1 on /esp type vfat (ro,noatime,fmask=0111,dmask=0000,allow_utime=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)
/dev/zram2 on /cache type ext4 (rw,nosuid,nodev,noatime,discard,nobarrier)
/dev/zram1 on /tmp type ext4 (rw,nosuid,nodev,noatime,discard,nobarrier)
/dev/nvme1n1p1 on /boot type ext4 (rw,noatime)
/dev/zram3 on /mem type ext4 (rw,nosuid,nodev,noatime,discard,nobarrier)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
/dev/mmcblk0p1 on /safe type ext4 (rw,relatime)
/dev/zram2 on /var/cache type ext4 (rw,nosuid,nodev,noatime,discard,nobarrier)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=6554384k,nr_inodes=1638596,mode=700,uid=1000,gid=1000,inode64)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

seth wrote:

Can you please post the complete, unaltered journal (w/ timestamps and everything, if you want to hide your embarrassing hostname, please sed it, don't delete any columns to maintain syntax highlighting, thanks)

The issue is resolved and I'll mark it as such, but if you're interested, I can upload the binary journals to Proton Drive. The text logs are huge and, I think, would exceed the forum limits by far.

P.S. Spot on about embarrassing host name. Not that embarrassing though.

Thank you very much, Seth and Stfischr, for trying to help and allowing me to clear my mind; much appreciated!