Topic closed
Hi all,
After today's update, my machine (ThinkPad W530) failed to reboot successfully.
It hung on "::Running early hooks [udev]" and didn't go anywhere else.
I thought of booting from a backup image, but none of my GRUB entries pointed to an alternate vmlinuz, so that wasn't an option.
I tried to tinker with some of my grub settings, and randomly commented out "set gfxpayload=keep", which is a setting I've had present for quite some time.
Imagine my surprise when this actually worked and let me boot up again (I should probably refresh my ISO USB, it's been some time).
I went ahead and updated "/etc/default/grub" to comment out the line ("#GRUB_GFXPAYLOAD_LINUX=keep"), and regenerated my GRUB config via:
grub-mkconfig -o /boot/grub/grub.cfg
which seemed ok, until I began to scrutinize what it generated. For some reason, it added a "search" entry that points at my dm-crypt volume (full-disk encryption except /boot; LUKS2, iirc).
For comparison, I will paste my old grub entry, my new grub entry, and my fdisk/by-uuid/df outputs.
Old grub entry:
### BEGIN /etc/grub.d/10_linux ###
menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-ab10990f-f3b0-47c1-aaaf-1c055758b147' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='hd0,msdos1'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 fbae37f8-6a60-4ef9-bd87-34c9c28cfd0f
    else
      search --no-floppy --fs-uuid --set=root fbae37f8-6a60-4ef9-bd87-34c9c28cfd0f
    fi
    echo 'Loading Linux linux ...'
    linux /vmlinuz-linux root=UUID=ab10990f-f3b0-47c1-aaaf-1c055758b147 rw cryptdevice=/dev/sda2:foo intel_pstate=disable pcie_port_pm=off loglevel=3
    echo 'Loading initial ramdisk ...'
    initrd /initramfs-linux.img
}
New grub entry:
### BEGIN /etc/grub.d/10_linux ###
menuentry 'Arch Linux' --class arch --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-ab10990f-f3b0-47c1-aaaf-1c055758b147' {
    load_video
    insmod gzio
    insmod ext2
    search --no-floppy --fs-uuid --set=root ab10990f-f3b0-47c1-aaaf-1c055758b147
    echo 'Loading Linux linux ...'
    linux /vmlinuz-linux root=UUID=ab10990f-f3b0-47c1-aaaf-1c055758b147 rw cryptdevice=/dev/sda2:foo intel_pstate=disable pcie_port_pm=off loglevel=3
    echo 'Loading initial ramdisk ...'
    initrd /initramfs-linux.img
}
and my disk info:
[root@blub grub]# fdisk -l
Disk /dev/sda: 238.47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: Samsung SSD 840
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x6789989b
Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1            2048   1001471    999424   488M 83 Linux
/dev/sda2         1001472 357369855 356368384 169.9G 83 Linux
/dev/sda3  *    357369856 500117503 142747648  68.1G  7 HPFS/NTFS/exFAT
Disk /dev/mapper/foo: 169.91 GiB, 182443835392 bytes, 356335616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
[root@blub grub]# ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 10 Oct 13 20:53 1bf8d47d-5191-4d9c-8121-3306edee4d1f -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 13 20:53 3BC36E9F5496BE0A -> ../../sda3
lrwxrwxrwx 1 root root 10 Oct 13 20:53 ab10990f-f3b0-47c1-aaaf-1c055758b147 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Oct 13 20:53 fbae37f8-6a60-4ef9-bd87-34c9c28cfd0f -> ../../sda1
[root@blub grub]# df -h
Filesystem       Size  Used Avail Use% Mounted on
dev              7.6G     0  7.6G   0% /dev
run              7.6G  1.2M  7.6G   1% /run
/dev/mapper/foo  167G  156G  2.7G  99% /
tmpfs            7.6G  272K  7.6G   1% /dev/shm
tmpfs            7.6G  4.0K  7.6G   1% /tmp
/dev/sda3         69G   60G  9.0G  87% /mnt/windows
/dev/sda1        471M   83M  364M  19% /boot
tmpfs            1.6G   12K  1.6G   1% /run/user/1000
So my old config correctly pointed the search line at fbae37 (my /boot), but the new one, for some reason, points it at ab1099 (the decrypted root, dm-0).
I tried re-reading the GRUB wiki page, but maybe my brain has glazed over at this point, because I couldn't find a good description of what "search" actually does. All I know is that it now causes an error at boot about ab1099 not being found, yet I still reach the decryption passphrase prompt.
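For anyone puzzled by the same line: as far as I understand it, "search --fs-uuid --set=root UUID" asks GRUB to locate the filesystem with that UUID and use it as the root for later paths such as /vmlinuz-linux. On a setup like this one, that UUID would be expected to be the one of /boot (fbae37... here), since GRUB is not unlocking the encrypted volume. Below is a minimal sketch for checking which UUIDs a generated config searches for. The function names are made up for illustration, and it assumes the UUID is the last word on the search line, as in the entries pasted above.

```shell
# List the unique filesystem UUIDs named by `search --fs-uuid` lines.
search_uuids() {
    # $1: path to a grub.cfg; the UUID is taken to be the last field
    awk '$1 == "search" && /--fs-uuid/ { print $NF }' "$1" | sort -u
}

# Succeed if the given UUID is among the ones the config searches for.
has_search_uuid() {
    # $1: path to a grub.cfg, $2: UUID to look for
    search_uuids "$1" | grep -qxF "$2"
}

# Possible use on a real system (not run here):
#   has_search_uuid /boot/grub/grub.cfg "$(grub-probe --target=fs_uuid /boot)"
```

A config with several menu entries (Windows, fallback images) may legitimately list more than one UUID, so this only tells you whether the /boot UUID appears at all.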
If anyone has had the time to read all of this, here are my outstanding questions, summarized:
- How do I fix the "search" error produced at boot (without editing /boot/grub/grub.cfg directly)?
- How can I restore the gfxpayload=keep setting? (And is this setting necessary if I want to try adding a boot background image? I only just noticed it.)
- Is there a safe way to tell whether a change will make my system unbootable in the future? (Something like simulating a dry restart, so I can address the issue without almost reaching for an ISO + arch-chroot.)
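On the last question, I'm not aware of a true dry-run reboot, but one low-risk habit is to generate the new config into a scratch file, syntax-check it, and compare only the lines that decide what gets loaded before overwriting the live file. A minimal sketch follows; the function names and the /tmp path in the comment are my own choices, not anything GRUB provides.

```shell
# Print only the boot-relevant lines of a GRUB config, without indentation.
boot_lines() {
    # $1: path to a grub.cfg
    grep -E '^[[:space:]]*(search|linux|initrd|set (root|gfxpayload))' "$1" |
        sed 's/^[[:space:]]*//'
}

# Diff the boot-relevant lines of two configs.
# Returns 0 if they match and 1 if they differ, like diff itself.
diff_boot_lines() {
    # $1: current grub.cfg, $2: candidate grub.cfg
    old=$(mktemp) && new=$(mktemp) || return 2
    boot_lines "$1" > "$old"
    boot_lines "$2" > "$new"
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return "$status"
}

# Possible use on a real system (not run here):
#   grub-mkconfig -o /tmp/grub.cfg.new
#   grub-script-check /tmp/grub.cfg.new
#   diff_boot_lines /boot/grub/grub.cfg /tmp/grub.cfg.new
```

In this thread, such a diff would have surfaced both the dropped "set gfxpayload=keep" line and the changed UUID on the search line before the first reboot. It checks text only, so it cannot predict kernel or driver problems like the one discussed further down.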
Last edited by jehiva (2022-10-14 01:18:50)
Offline
Have you loaded MODULES=(intel_agp i915) in /etc/mkinitcpio.conf?
After udev you probably have either autodetect or plymouth:
HOOKS=(base udev plymouth autodetect keyboard keymap consolefont modconf block plymouth-encrypt filesystems fsck)
So I would imagine that, with the new Linux kernel, you are running into some new issue with graphics.
Last edited by u666sa (2022-10-14 04:44:24)
Offline
Those settings are currently:
[root@blub etc]# grep -E '^(MODULES|HOOKS)' ./mkinitcpio.conf
MODULES=(vfio vfio_iommu_type1 vfio_pci vfio_virqfd nouveau)
HOOKS=(base udev autodetect keyboard keymap consolefont modconf block encrypt filesystems fsck modconf)
Would you recommend I include the things noted in your sample? (or remove anything from my current one?)
Offline
I'm having the same issue, except removing gfxpayload=keep doesn't work for me.
I also have an NVIDIA card, but its drivers are blacklisted because I use it for GPU passthrough to a Windows VM.
Offline
Out of curiosity, is your issue specific to 6.0 kernels? If you downgrade to 5.19 does it go away?
I'm on a very different setup (systemd-boot + dracut), but 6.0 kernels reliably produce an issue which sounds very similar to yours - dracut hits early udev and then the whole system locks up, cursor stops blinking. (Kind of hard to get logs out at that point since root hasn't been mounted yet.) Downgrading to 5.19.13 fixes it right away for me. And I'm also on intel + nvidia graphics, with the nvidia gpu being passed to vfio-pci, so I'm not using any nvidia drivers or kernel options.
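For anyone wanting to try the same downgrade on Arch, older kernel packages are often still in the pacman cache. A small sketch for finding the newest cached build of a given series; the function name is made up, and it assumes the usual cache naming (linux-VERSION-ARCH.pkg.tar.zst) and a sort that supports -V.

```shell
# List cached kernel packages for a version prefix, oldest first.
cached_kernels() {
    # $1: package cache directory, $2: version prefix such as 5.19
    for f in "$1"/linux-"$2"*.pkg.tar.zst; do
        # skip the unexpanded pattern when nothing matches
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}

# Possible use on a real system (not run here):
#   pacman -U "$(cached_kernels /var/cache/pacman/pkg 5.19 | tail -n 1)"
```

If the cache has been cleaned, the package has to come from elsewhere; a matching linux-headers package may need the same treatment if you build out-of-tree modules.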
Offline
Yup! linux-6 is the culprit (from what I hear hopping between threads: supposedly nvidia hasn't had time to update the 470xx drivers, and/or linux 6 removes some ACPI/power features needed on some laptop models for some reason).
But at least we have rust in the kernel, hooray /s
Offline
Rust isn't until 6.1, but that doesn't have anything to do with how quickly nvidia updates old drivers.
Offline
I did a bit more investigation today and found that, in my case, the issue was binding vfio-pci to the nvidia card. My system boots fine on 6.0 kernels if nouveau is blacklisted, but if I try to bind vfio-pci, it locks up. Still don't understand the root cause though.
Edit: It also seems that binding vfio-pci to the GPU later (ie after everything's booted up) is fine. It's only the early binding that causes an issue. (And, for whatever it's worth, the nvidia audio controller can be bound to vfio-pci without issues, only the VGA device is a problem.)
Last edited by tummychow (2022-10-29 07:35:53)
Offline
How would I go about doing that? Is that a fix, or a workaround?
I'm still having issues when updating to 6.0.7.
Last edited by Qwerty-Space (2022-11-07 21:04:14)
Offline
How would I go about doing that? Is that a fix, or a workaround?
Idk, depends how your setup is arranged. Basically I'm saying you need to undo https://wiki.archlinux.org/title/PCI_pa … _device_ID - however you're binding the vfio-pci module to the GPU in early boot, you need to stop doing it. You can blacklist the nouveau module (https://wiki.archlinux.org/title/Kernel … acklisting) to ensure the GPU doesn't get bound to any driver. Whether you characterize this as a fix or workaround is semantics.
If you aren't binding vfio-pci to the GPU at all and you're still getting early boot issues, then I'm not sure. That's probably the same issue under the hood, but it's not the issue I have.
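The post above doesn't say how the late binding was done, so the following is only one possible approach: the kernel's sysfs driver_override interface, used after boot with the vfio-pci module already loaded. The function name and the PCI address are placeholders; the second argument exists only so the function can be pointed at a fake tree for testing.

```shell
# Bind one PCI device to vfio-pci after boot, via sysfs driver_override.
bind_vfio_late() {
    # $1: PCI address, e.g. 0000:01:00.0 (placeholder), $2: sysfs root (default /sys)
    addr=$1
    sysfs=${2:-/sys}
    dev="$sysfs/bus/pci/devices/$addr"
    [ -d "$dev" ] || { echo "no such device: $addr" >&2; return 1; }
    # release whatever driver currently holds the device, if any
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # tell the PCI core which driver may claim this device, then re-probe it
    echo vfio-pci > "$dev/driver_override"
    echo "$addr" > "$sysfs/bus/pci/drivers_probe"
}

# Possible use on a real system, as root (not run here):
#   modprobe vfio-pci
#   bind_vfio_late 0000:01:00.0
```

Find the real address with lspci -D. Tools such as libvirt can also rebind a device when the VM starts, which may make a manual step like this unnecessary.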
Offline
Some more digging led me to an actual fix: the system isn't frozen at all, it's only the graphics. I suspect that all of us in this thread are using full disk encryption, right? The hang is because it's waiting for your password - the boot actually continued normally, but the screen is frozen so you don't see the prompt. If you type it in, root will mount and the drivers will become available, and the screen will recover.
Something in kernel 6.0 causes the EFI framebuffer (responsible for very early graphics) to break. I still don't understand why, but binding the vfio driver to the GPU is clearly one way to trigger the breakage, and the gfxpayload=keep GRUB setting mentioned earlier might be another. The solution is to load your graphics driver's kernel module in the initrd, before root is mounted. In my case that's i915, because I want to use the iGPU.
Related links:
https://bbs.archlinux.org/viewtopic.php … 3#p2063423
https://forum.manjaro.org/t/kernel-6-0- … v/124695/2
https://forum.level1techs.com/t/linux-k … ugh/190039
https://www.heiko-sieger.info/vfio-grub … e-feature/
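On an mkinitcpio-based system, loading a module in the initrd usually means listing it in the MODULES=(...) array of /etc/mkinitcpio.conf and rebuilding the images. A minimal sketch of that edit follows; the function name is made up, i915 is just the module from the post above, and where the module should sit relative to any vfio modules is not something this thread settles.

```shell
# Append a module to the MODULES=(...) array of an mkinitcpio-style config.
add_initramfs_module() {
    # $1: config file, $2: module name (i915 is the example from this thread)
    conf=$1
    mod=$2
    # nothing to do if the module is already listed
    if grep -Eq "^MODULES=\(([^)]* )?$mod( [^)]*)?\)" "$conf"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # MODULES=(a b) becomes MODULES=(a b mod); MODULES=() becomes MODULES=(mod)
    if sed -E \
        -e "s/^MODULES=\(([^)]+)\)/MODULES=(\1 $mod)/" \
        -e "s/^MODULES=\(\)/MODULES=($mod)/" \
        "$conf" > "$tmp"; then
        # write back through cat so the file keeps its owner and mode
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}

# Possible use on a real system, as root (not run here):
#   add_initramfs_module /etc/mkinitcpio.conf i915
#   mkinitcpio -P
```

mkinitcpio -P rebuilds the images for all presets. Dracut users (as in the post above) would configure this differently, and AMD or nouveau users would name a different module.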
Offline
The solution is to load your graphics driver's kernel module in the initrd, before root is mounted.
Can you please explain how to do that?
I have been trying for hours to understand what exactly needs to be done, but I cannot. I have read the 4 links you added, and 40 more, but can't get it to work...
The (annoying) workaround I have found is to remove the DP cable from my dGPU (Nvidia 4080) before I switch on the PC, and leave only the HDMI cable on my iGPU.
If the DP cable is connected, there is no way I get to the login (with the vfio modules loaded).
Offline