You are not logged in.

#1 2024-11-23 10:34:00

damiano
Member
Registered: 2024-04-30
Posts: 12

Strange behavior while booting

I have a very strange issue with my laptop that I thought I had fixed (since the frequency dropped after I tried various fixes), but it seems I was wrong and that it still occurs.

So "sometimes" Arch halts in the boot process (where it shows a timer of 1 min 31 sec at a certain step). After the time is up the boot continues normally, but network and sound doesn't work.

I can reproduce the error by being chrooted to /mnt while having a btrfs partition mounted to /mnt and /boot mounted to /mnt/boot and rebooting without unmounting. Sometimes it also happens if I reboot or shutdown right after waking up from suspend.

I have tried a lot of various things such as
1) change fsck values for boot to 0 0 in fstab (since it often stops when checking /boot). Didn't help.
2) blacklisting i2c-i801 module (since I had errors mentioning smbus after these strange boots). Blacklisting it removed the error messages but not the problem.
3) reinstalling. Didn't help.
4) masking various systemd services that were mentioned in the logs. Didn't help even if I masked services until the system wouldn't boot at all.
5) every fix I could find while googling related to errors in booting with systemd. Not one worked.

So nothing of the above has helped (I can give more specifics on exactly what I have tried if it's relevant).

The strangest things is that I can't reproduce this error when booting with systemd on Ubuntu, Linux Mint, NixOS or Gentoo (I can't reproduce it on Gentoo with OpenRC either, but that is probably unnecessary info in this case).

Can it be that systemd is compiled differently on Arch Linux than on these, or that these distros install "something" that is not automatically pulled in when installing Arch?

Here is a log with udev_log=debug from a failed boot:
https://0x0.st/X5MG.txt

Can anyone please help me find the culprit?

Offline

#2 2024-11-23 15:49:16

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,447

Re: Strange behavior while booting

Symptoms sound like you are not using persistent names in your fstab and/or bootloader, the device name swaps and all sorts of weirdness ensues.

What's in your fstab? Did you follow https://wiki.archlinux.org/title/Persis … ice_naming

That root line also reads weird, what's your bootloader setup and why are initramfs ucode and the subvol to boot into versioned with an install date? Which instructions did you follow to set Arch up?

Also maybe a journal without the udev log, it seems to create a bunch of not entirely relevant noise.

Last edited by V1del (2024-11-23 15:50:06)

Offline

#3 2024-11-24 07:50:32

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

Thank you @V1del for looking into this.

I use UUIDs to avoid such problems. The fstab was generated by "genfstab -U" (I changed 0 0 to 0 1 for root partition manually though) and looks like this:

# /dev/nvme0n1p2
UUID=8abbc627-6f4f-4015-82f2-951babde1de7	/         	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvol=/arch_20241121_1518	0 1
# /dev/nvme0n1p1
UUID=D120-13D8      	/boot     	vfat      	rw,relatime,fmask=0137,dmask=0027,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro	0 2

The reason initramfs, intel_ucode and the kernel have dates in them is because I have my own script that can create an Arch Linux install by
1) creating a btrfs subvolume
2) using pacstrap to install packages (don't worry, it uses the cached packages unless any upgrades exist, so it won't stress the repo servers)
3) configure the install

In the script I have a pacman hook that copies the kernel, initramfs and intel_ucode and adds the install date since I then can have multiple subvolumes with different kernels without a clash (since Arch normally doesn't show version number an old install would break if a new one upgraded the kernel).

I am using this script on another computer (a desktop computer and not a laptop, but that shouldn't matter) and it never fails to boot. I have also tested to reproduce the error on this laptop now when using a kernel that is not renamed, i.e. by using the following boot loader entry:

title   ArchLinux
linux   /vmlinuz-linux-lts
initrd  /intel-ucode.img
initrd  /initramfs-linux-lts.img
options root=UUID=8abbc627-6f4f-4015-82f2-951babde1de7 rootflags=subvol=arch_20241121_1518 rw i915.enable_psr=0 rootfstype=btrfs

It still halts in the same way if I suspend and then reboot. I don't believe neither the fstab nor the hook are the problem here.

Here's an example of the pacman hook for the kernel:

[Trigger]
Operation = Install
Operation = Upgrade
Type = Path
Target = usr/lib/initcpio/*
Target = usr/lib/firmware/*
Target = usr/src/*/dkms.conf
Target = usr/lib/modules/*/vmlinuz

[Action]
Description = Rename kernel to have arch_20241121_1518 in it.
When = PostTransaction
Exec = /bin/bash -c 'cp /boot/vmlinuz-linux-lts /boot/vmlinuz-arch_20241121_1518 && cp /boot/initramfs-linux-lts.img /boot/initramfs-arch_20241121_1518'

Another thing. If it seems that very specific artificial conditions need to be met for the bug to trigger that is actually not the case. Arch fails to boot from time to time on this machine when I have used the computer "normally", it's just that I found that I could trigger the boot failure using suspend and mounting stuff without un-mounting.

Offline

#4 2024-11-24 09:41:59

Head_on_a_Stick
Member
From: The Wirral
Registered: 2014-02-20
Posts: 8,480
Website

Re: Strange behavior while booting

damiano wrote:

I changed 0 0 to 0 1 for root partition manually though

Change it back, btrfs is not fsck-able and so should have fs_passno set to 0.

damiano wrote:

Arch normally doesn't show version number

Use file(1) to determine the version of the kernel image.


Para todos todo, para nosotros nada

Online

#5 2024-11-24 09:43:28

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

I have continued to read up on the issue and I now tried to let udev get more time to settle, i.e. I added the following kernel parameter:

udev.event-timeout=300

and with that set I cannot reproduce the error after five reboots and when I changed it to:

udev.event-timeout=5

I could trigger the bug after one reboot, so I might be fixed, but only time will tell. I will post here in a couple of weeks if the error is gone, and will post before that if it appears.

One question though. What is the difference between udev.event-timeout and rd.udev.event-timeout? Both are listed here for setting the timeout variable:
https://manpages.org/systemd-udevd/8
rd stands for ramdisk, right? So it sets the variable for the initramfs? If so, does it matter which one I set? Or should both be set? Sorry if the question is obvious...

Offline

#6 2024-11-26 12:02:26

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

Head_on_a_Stick wrote:

Change it back, btrfs is not fsck-able and so should have fs_passno set to 0.

Thank you! I have changed this!

Head_on_a_Stick wrote:

Use file(1) to determine the version of the kernel image.

Well I am not sure how that would help with the script. Since every kernel is named vmlinuz-linux-lts and they are on a shared partition (the vfat /boot  one) there will be a collision when I have multiple installs since they all look for the same file.

Offline

#7 2024-11-26 12:07:20

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

Unfortunately the kernel parameter didn't help. I was just fortunate when I tested it. Yesterday I ran into the boot issue again and today too. Could it be that it's a suspend issue? That the problem arises when I have left the computer so that it suspends and locks it?

V1del wrote:

Also maybe a journal without the udev log, it seems to create a bunch of not entirely relevant noise.

I included it since I believed it was the problem. I will add one without udev_log=debug.

EDIT: Here it is:
https://0x0.st/X570.txt

Last edited by damiano (2024-11-26 12:21:22)

Offline

#8 2024-11-26 12:52:18

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,447

Re: Strange behavior while booting

Also without using -x in the future please, it bloats the logs with mostly useless messages.

The main thing I see which can definitely lead to all sorts of weirdness is that you don't have a static hostname, you should set one: https://wiki.archlinux.org/title/Networ … e_hostname

While I don't see any obvious reference to conflicting services, the output of

find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f

can also help identify whether something is going wrong

Offline

#9 2024-11-26 12:54:55

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

Another log where it failed to boot completely:
https://0x0.st/X5hH.txt

Offline

#10 2024-11-26 13:16:43

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,447

Re: Strange behavior while booting

Ah yeah that's interesting as well

Nov 26 13:43:18 archlinux kernel: mousedev: PS/2 mouse device common for all mice
Nov 26 13:43:18 archlinux kernel: input: Microsoft Surface Keyboard as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input15
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input16
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard UNKNOWN as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input17
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard Touchpad as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input18
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard UNKNOWN as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input19
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard UNKNOWN as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input20
Nov 26 13:43:19 archlinux kernel: input: Microsoft Surface Keyboard UNKNOWN as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0922.0001/input/input21
Nov 26 13:43:19 archlinux kernel: hid-multitouch 0003:045E:0922.0001: input,hiddev96,hidraw0: USB HID v1.11 Keyboard [Microsoft Surface Keyboard] on usb-0000:00:14.0-1.3/input0
Nov 26 13:44:18 archlinux systemd-udevd[421]: INT3403:01: Worker [440] processing SEQNUM=3169 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0040:00: Worker [457] processing SEQNUM=3176 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0029:00: Worker [450] processing SEQNUM=3174 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0107:00: Worker [426] processing SEQNUM=3206 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0091:00: Worker [446] processing SEQNUM=3180 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0153:00: Worker [452] processing SEQNUM=3209 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0133:00: Worker [424] processing SEQNUM=3208 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: INT33BE:00: Worker [456] processing SEQNUM=3216 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: INT3472:00: Worker [430] processing SEQNUM=3217 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:04.0: Worker [432] processing SEQNUM=3440 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:15.1: Worker [447] processing SEQNUM=3480 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:14.2: Worker [429] processing SEQNUM=3478 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:15.0: Worker [428] processing SEQNUM=3479 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: intel_xhci_usb_sw: Worker [425] processing SEQNUM=3442 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:13.0: Worker [454] processing SEQNUM=3441 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:16.4: Worker [442] processing SEQNUM=3482 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:16.0: Worker [434] processing SEQNUM=3481 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0003:045E:0922.0001: Worker [438] processing SEQNUM=2951 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: input11: Worker [445] processing SEQNUM=2943 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 2-4.2:2.1: Worker [433] processing SEQNUM=3475 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 2-4.2:2.0: Worker [444] processing SEQNUM=3474 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:01:00.0: Worker [423] processing SEQNUM=3486 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:1e.0: Worker [453] processing SEQNUM=3519 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: 0000:00:1f.3: Worker [443] processing SEQNUM=3524 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: MSHW0040:00: Worker [431] processing SEQNUM=3539 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: pcspkr: Worker [449] processing SEQNUM=3560 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu3: Worker [427] processing SEQNUM=3624 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu2: Worker [455] processing SEQNUM=3623 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu0: Worker [435] processing SEQNUM=3621 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu1: Worker [448] processing SEQNUM=3622 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu4: Worker [451] processing SEQNUM=3625 is taking a long time
Nov 26 13:44:18 archlinux systemd-udevd[421]: cpu5: Worker [437] processing SEQNUM=3626 is taking a long time

looks HW related, and something I'm just noticing, any reason for the LTS kernel? What happens on stable? the LTS kernel is over a year old in terms of HW enablement support.

Offline

#11 2024-11-26 14:49:03

damiano
Member
Registered: 2024-04-30
Posts: 12

Re: Strange behavior while booting

V1del wrote:

Also without using -x in the future please, it bloats the logs with mostly useless messages.

The main thing I see which can definitely lead to all sorts of weirdness is that you don't have a static hostname, you should set one: https://wiki.archlinux.org/title/Networ … e_hostname

While I don't see any obvious reference to conflicting services, the output of

find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f

can also help identify whether something is going wrong

Oh, sorry!

The hostname is set in /etc/hostname

damiano@archlinux ~ $ cat /etc/hostname 
archlinux

I ran the stable kernel before and switched to the LTS to try to fix the issue. I have reinstalled the stable (6.12.1-arch1-1) but it still occurs.

Here's the output of the command, does it say anything of interest?

damiano@archlinux ~ $ find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f

avahi-daemon.service                     | multi-user.target.wants
avahi-daemon.socket                      | sockets.target.wants
cups.path                                | multi-user.target.wants
cups.service                             | multi-user.target.wants
cups.service                             | printer.target.wants
cups.socket                              | sockets.target.wants
dbus-org.freedesktop.Avahi.service       | system
dbus-org.freedesktop.nm-dispatcher.service | system
display-manager.service                  | system
getty@tty1.service                       | getty.target.wants
NetworkManager.service                   | multi-user.target.wants
NetworkManager-wait-online.service       | network-online.target.wants
p11-kit-server.socket                    | sockets.target.wants
pipewire-pulse.socket                    | sockets.target.wants
pipewire-session-manager.service         | user
pipewire.socket                          | sockets.target.wants
remote-fs.target                         | multi-user.target.wants
systemd-userdbd.socket                   | sockets.target.wants
ufw.service                              | multi-user.target.wants
wireplumber.service                      | pipewire.service.wants

And here is a log without -x from the most recent failed boot using the stable kernel:
https://0x0.st/X5CZ.txt

Offline

Board footer

Powered by FluxBB