#1 2024-11-27 17:35:27

physicsBTW
Member
Registered: 2023-11-07
Posts: 3

VFIO gpu passthrough kernel regression

I'm using https://gitlab.com/akshaycodes/vfio-script to help pass my AMD graphics card through to a Windows 10 virtual machine and to have the card reset properly when the VM shuts off.  I'm passing through my single GPU, which is connected to my display, so when the VM starts, the display output switches from the Linux host to the Windows VM directly, and it switches back to Linux once the guest shuts off.

This worked great until recently. After upgrading 6.11.9.arch1-1 -> 6.12.1.arch1-1, the graphics card and its audio function still get passed through and detected by the virtual machine, and lspci -nnv shows vfio-pci as the active kernel driver for both the graphics card and its audio device, as expected.  I get display output for the VM BIOS and the early part of the Windows boot process, but the display goes black once Windows attempts to load the graphics driver. Connecting to the Windows VM via RDP shows in Device Manager that the graphics card is detected, but the driver failed to load with error code 43. Shutting down the virtual machine causes the graphics card to detach and reset correctly and drops me back into the SDDM login screen, the same as before the kernel update. Even worse, the failure is not fully consistent: after rebooting the virtual and physical machines many times while debugging, the Windows graphics driver loaded properly once, and I could not reproduce that even though I made no changes to the system configuration between the good attempt and all subsequent ones. The behavior is identical to what I experienced before I set a random hypervisor vendor id in libvirt to trick the AMD graphics drivers into loading in Windows. No errors are reported in the libvirt logs.
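For anyone who wants to repeat the driver check described above, here is a minimal sketch. It assumes lspci -k style output, which prints a "Kernel driver in use:" line for each bound device. The helper name driver_in_use and the PCI address 03:00.0 are placeholders for illustration and are not taken from this setup.

```shell
# Hypothetical helper: print the kernel driver bound to each device
# found in `lspci -k` style output read from stdin.
driver_in_use() {
    awk -F': ' '/Kernel driver in use:/ { print $2 }'
}

# Example (03:00.0 is a placeholder for the GPU's PCI address):
#   lspci -nnk -s 03:00.0 | driver_in_use
```

While the card is handed to the VM this would be expected to print vfio-pci, and the host graphics driver again once the guest has shut down and the card has been returned to the host.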

I also noticed that, regardless of whether I attach any additional PCI or USB devices, setting the CPU feature policy <feature policy="disable" name="hypervisor"/> in libvirt makes Windows guests, and only Windows guests, hang on boot. This setting is required by certain games to run in a VM, and it did not cause hangs on kernel versions < 6.12.
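To confirm which policy a domain actually has for this feature, a small filter over the domain XML may help. This is only a sketch: the helper name hypervisor_policy and the domain name win10 are made up for illustration, and it assumes each <feature .../> element sits on its own line, as in the snippets quoted in this thread.

```shell
# Hypothetical helper: read libvirt domain XML on stdin and print the policy
# set on the "hypervisor" CPU feature, if any. Accepts single or double quotes.
hypervisor_policy() {
    grep -E "<feature [^>]*name=[\"']hypervisor[\"']" \
        | grep -oE "policy=[\"'][a-z]+[\"']" \
        | tr -d "\"'" \
        | cut -d= -f2
}

# Example (win10 is a placeholder domain name):
#   virsh dumpxml win10 | hypervisor_policy
```

No output means the feature is not set explicitly in the domain XML.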

Booting the linux-lts 6.6.63-1 kernel and starting the VM works around the issue in the meantime.

Any help to try and get this working on the latest kernel would be appreciated.

The graphics card is an AMD Radeon RX 6700 XT.

Last edited by physicsBTW (2024-11-27 17:52:27)

#2 2024-11-27 17:53:33

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 715

Re: VFIO gpu passthrough kernel regression

Could you also confirm again that using 6.11 does fix the issue?

sudo pacman -U https://archive.archlinux.org/packages/l/linux/linux-6.11.9.arch1-1-x86_64.pkg.tar.zst

#3 2024-11-27 18:01:08

physicsBTW

Re: VFIO gpu passthrough kernel regression

Ran the command you gave me and rebooted, verified:

uname -r
6.11.9-arch1-1

Started the VM and it worked.


Ran sudo pacman -Syu to go back to the latest kernel, rebooted:
uname -r
6.12.1-arch1-1

Tried to run the VM and encountered the issue in the post.

Last edited by physicsBTW (2024-11-28 17:06:33)

#4 Yesterday 22:55:38

gromit

Re: VFIO gpu passthrough kernel regression

Could you post a full dmesg? Also, this could be a kernel regression, which should be bisected and reported to the upstream kernel developers.

Are you comfortable doing the bisection on your own, or do you need some help?
If you want, we could also provide you with pre-built kernel images to test (which greatly reduces the time each test takes).

Good info to get you started is:
- https://docs.kernel.org/admin-guide/rep … sions.html
- https://wiki.archlinux.org/title/Kernel … egressions
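
As a rough sketch of how such a bisection usually starts: the helper below only prints the git commands so they can be reviewed before being run inside a kernel source checkout. The name bisect_plan is made up for illustration, and v6.11 and v6.12 are assumed to be the upstream tags that bracket the regression, based on the 6.11.9 (good) and 6.12.1 (bad) results reported above.

```shell
# Hypothetical helper: print (not run) the git commands that start a bisection
# between a known-good and a known-bad kernel tag.
bisect_plan() {
    good="$1"
    bad="$2"
    echo "git bisect start"
    echo "git bisect bad $bad"
    echo "git bisect good $good"
}

# Assumed tags: v6.11 as the last good release, v6.12 as the first bad one.
bisect_plan v6.11 v6.12
```

After each step, the checked-out kernel is built and booted, the VM is tested, and the result is recorded with git bisect good or git bisect bad until git names the first bad commit; git bisect reset returns the tree to its original state.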

Since we already determined that there are multiple issues at hand here (sleep + brightness) I would like to have a look at the sleep issue.

Additionally it would be good to see if the latest release candidate for mainline is affected:

sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.13rc1-1-x86_64.pkg.tar.zst

(note that this installs the kernel as linux-mainline, so you need to configure your bootloader to boot it (for example via grub-mkconfig -o ... or by writing the systemd-boot loader entry))

#5 Yesterday 23:56:02

Ranguvar
Member
Registered: 2008-08-12
Posts: 2,554

Re: VFIO gpu passthrough kernel regression

physicsBTW wrote:

I also noticed that regardless of whether I attach any additional PCI or USB devices, setting the cpu feature policy <feature policy="disable" name="hypervisor"/> in libvirt will make windows guests, and only windows guests, hang on boot. This was also not the case on kernel version < 6.12, and is required by certain games to run in a VM.

Thanks for this note, I believe something like this is also affecting me (similar thread in this subforum), even though I use Polaris host graphics and an NVIDIA passthrough card.

I normally don't set this parameter, instead simply having:

  <cpu mode='host-passthrough' check='full' migratable='off'>
    <topology sockets='1' dies='1' clusters='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>

However I've tried adding this to my domain xml:

<feature policy='require' name='hypervisor'/>

If I run it like that with 6.12, the VM locks up as soon as the Windows kernel loads, and this also causes some issues on the Linux host until the VM is stopped.
If I use “disable” instead, it no longer pegs my cores at 100% usage or causes e.g. glxinfo, alacritty, and kitty to hang forever, but Windows still locks up on boot.

I haven’t the time for a git bisect right now but gromit did very kindly offer to help with builds.

Last edited by Ranguvar (Today 04:50:34)
