#1 2024-11-27 17:35:27

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

VFIO gpu passthrough kernel regression

I'm using https://gitlab.com/akshaycodes/vfio-script to help pass through my AMD graphics card to a Windows 10 virtual machine and have the card reset properly when the VM shuts off. I'm passing through my single GPU, which is connected to my display, so when the VM starts the display output switches from the Linux host to the Windows VM directly, and switches back to Linux once the guest shuts off.

This worked great until recently. After upgrading from 6.11.9.arch1-1 to 6.12.1.arch1-1, the graphics card and its audio function still get passed through and detected by the virtual machine, and lspci -nnv shows vfio-pci as the active kernel driver for both the graphics card and its audio device, as expected. I get display output from the VM BIOS and the beginning of the Windows boot process, but the display goes black once Windows attempts to load the graphics driver. Connecting to the Windows VM via RDP shows in Device Manager that the graphics card is detected, but the driver failed to load with error code 43. Shutting down the virtual machine causes the graphics card to correctly detach, reset, and drop me back into the SDDM login screen, the same as before the kernel update.

Even worse, after rebooting the virtual and physical machines many times while trying to debug this, the graphics driver on Windows loaded properly once. I was not able to reproduce that, even though nothing in the system configuration changed between the good attempt and all subsequent ones. The behavior is identical to what I experienced before I set a random hypervisor vendor ID in libvirt to trick the AMD graphics drivers into loading in Windows. No errors are reported in the libvirt logs.
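
A minimal sketch of the same driver check done through sysfs rather than by reading lspci -nnv output. The helper name pci_driver and the address 0000:03:00.0 mentioned below are placeholders for illustration, not values taken from this system.

```bash
# Print the kernel driver bound to a PCI device, or "none" if unbound.
# $1: full PCI address (domain:bus:device.function)
# $2: sysfs device directory (optional; defaults to the standard location)
pci_driver() {
    local slot=$1
    local root=${2:-/sys/bus/pci/devices}
    local link="$root/$slot/driver"
    if [ -L "$link" ]; then
        # The "driver" symlink points at .../bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running pci_driver 0000:03:00.0 (with the real address from lspci -D) would be expected to print vfio-pci while the VM holds the card and amdgpu once it has been handed back to the host.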

I also noticed that, regardless of whether I attach any additional PCI or USB devices, setting the CPU feature policy <feature policy="disable" name="hypervisor"/> in libvirt makes Windows guests, and only Windows guests, hang on boot. This was not the case on kernel versions < 6.12, and the setting is required by certain games to run in a VM.

Booting the linux-lts 6.6.63-1 kernel and starting the VM works around the issue in the meantime.
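
Until the cause is found, a VM start hook could at least warn when the host is running a kernel from the affected series. This is a rough sketch: kernel_at_least is a made-up helper, it assumes GNU sort with -V, and the 6.12 threshold only reflects what has been tested here (6.11.9 and 6.6.63 good, 6.12.1 bad).

```bash
# Succeeds when kernel release $1 is at or above version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
kernel_at_least() {
    local release=$1 minimum=$2
    local lowest
    # If the minimum sorts first (or both are equal), the release is >= minimum.
    lowest=$(printf '%s\n' "$minimum" "$release" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Example use in a hook script (threshold is an assumption, see above):
#   kernel_at_least "$(uname -r)" 6.12 &&
#       echo "warning: this kernel may have the passthrough regression" >&2
```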

Any help to try and get this working on the latest kernel would be appreciated.

The graphics card is an RX 6700 XT.

Last edited by physicsBTW (2024-11-27 17:52:27)

#2 2024-11-27 17:53:33

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782

Re: VFIO gpu passthrough kernel regression

Could you also confirm again that using 6.11 does fix the issue?

sudo pacman -U https://archive.archlinux.org/packages/l/linux/linux-6.11.9.arch1-1-x86_64.pkg.tar.zst

#3 2024-11-27 18:01:08

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

Ran the command you gave me and rebooted, verified:

uname -r
6.11.9-arch1-1

Started the VM and it worked.


Ran sudo pacman -Syu to go back to the latest kernel, rebooted:
uname -r
6.12.1-arch1-1

Tried to run the VM and encountered the issue described in the first post.

Last edited by physicsBTW (2024-11-28 17:06:33)

#4 2024-12-05 22:55:38

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782

Re: VFIO gpu passthrough kernel regression

Could you post a full dmesg? Also, this could be a kernel regression, in which case it should be bisected and reported to the upstream kernel developers.

Are you comfortable doing the bisection on your own, or do you need some help?
If you want, we could also provide pre-built kernel images for you to test (which greatly speeds up testing).

Good info to get you started is:
- https://docs.kernel.org/admin-guide/rep … sions.html
- https://wiki.archlinux.org/title/Kernel … egressions

Since we already determined that there are multiple issues at hand here (sleep + brightness) I would like to have a look at the sleep issue.

Additionally it would be good to see if the latest release candidate for mainline is affected:

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.13rc1-1-x86_64.pkg.tar.zst

(note that this installs the kernel as linux-mainline, so you need to configure your bootloader to boot it (for example via grub-mkconfig -o ... or by writing the systemd-boot loader entry))

#5 2024-12-05 23:56:02

Ranguvar
Member
Registered: 2008-08-12
Posts: 2,563

Re: VFIO gpu passthrough kernel regression

physicsBTW wrote:

I also noticed that regardless of whether I attach any additional PCI or USB devices, setting the cpu feature policy <feature policy="disable" name="hypervisor"/> in libvirt will make windows guests, and only windows guests, hang on boot. This was also not the case on kernel version < 6.12, and is required by certain games to run in a VM.

Thanks for this note, I believe something like this is also affecting me (similar thread in this subforum), even though I use Polaris host graphics and an NVIDIA passthrough card.

I normally don't set this parameter, instead simply having:

  <cpu mode='host-passthrough' check='full' migratable='off'>
    <topology sockets='1' dies='1' clusters='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>

However I've tried adding this to my domain xml:

<feature policy='require' name='hypervisor'/>

If I run it like that with 6.12 it locks up once loading the Windows kernel, also causing some issues on Linux until the VM is stopped.
If I use “disable” it no longer pegs my cores to 100% usage or causes e.g. glxinfo, alacritty, and kitty to hang forever, but Windows still locks up on boot.
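
To double-check which policy a domain actually ends up with, something like the following can be fed from virsh dumpxml. It is only a sketch: hypervisor_policy is a hypothetical helper, and the sed pattern assumes the policy=... name=... attribute order shown above, with the element on a single line.

```bash
# Read libvirt domain XML on stdin and print the policy set for the
# "hypervisor" CPU feature ("disable", "require", ...), or "unset"
# when no such <feature> element is present.
hypervisor_policy() {
    local policy
    policy=$(sed -n "s/.*<feature policy=[\"']\([a-z]*\)[\"'] name=[\"']hypervisor[\"'].*/\1/p" | head -n 1)
    echo "${policy:-unset}"
}
```

Usage would look like virsh dumpxml win10 | hypervisor_policy, where win10 stands in for the real domain name.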

I haven’t the time for a git bisect right now but gromit did very kindly offer to help with builds.

Last edited by Ranguvar (2024-12-06 04:50:34)

#6 2024-12-07 22:02:07

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

Just want to check my understanding before I recompile the kernel several times. For the bisection:

git clone https://aur.archlinux.org/linux-git.git
cd linux-git
makepkg --nobuild --nodeps
cd src/linux-torvalds
git bisect start
git bisect good v6.11
git bisect bad v6.12
cd ../..
makepkg -efsi
# configure the bootloader to use the installed kernel version, then reboot
# test whether the VM launches correctly
cd src/linux-torvalds
# mark the result by running exactly one of the following:
git bisect good    # the VM worked
git bisect bad     # the VM showed the issue
# repeat from "cd ../.." until the first bad commit is found

#7 2024-12-07 22:14:31

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782

Re: VFIO gpu passthrough kernel regression

Yes that sounds roughly right!

#8 2024-12-07 22:22:30

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

Here is the dmesg. Before you ask: disabling Resizable BAR / Above 4G Decoding and removing the amd_pstate=active amdgpu.ppfeaturemask=0xffffffff kernel parameters (used for a GPU overclock script) were the first things I tried, and neither helped. I also made sure that 6.12.3-arch1-1 had the same issue before posting.

EDIT - pastebin dmesg instead of having a really long post:
https://pastebin.com/mqHufdkd

Last edited by physicsBTW (2024-12-07 22:43:35)

#9 2024-12-07 22:31:34

loqs
Member
Registered: 2014-03-06
Posts: 18,135

Re: VFIO gpu passthrough kernel regression

You can save some time by using the bisection kernels gromit has already built.

$ git bisect start
Updating files: 100% (21455/21455), done.
Previous HEAD position was d390303b28da Linux 6.12.1
Switched to branch 'makepkg'
status: waiting for both good and bad commits
$ git bisect bad v6.12
status: waiting for good commit(s), bad commit known
$ git bisect good v6.11
Bisecting: 7334 revisions left to test after this (roughly 13 steps)
[509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
$ git describe 
v6.11-7272-g509d2cd12a
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-v6.11.r7272.g509d2cd-1-x86_64.pkg.tar.zst

#10 2024-12-07 22:42:51

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782

Re: VFIO gpu passthrough kernel regression

@loqs, FYI this is the bash command I normally use to check whether one is cached already:

URL="https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-$(git describe --long --abbrev=7 | sed 's/\([^-]*-g\)/r\1/;s/-/./g')-1-x86_64.pkg.tar.zst"; curl --output /dev/null --silent --head --fail "${URL}" && echo "sudo pacman -U ${URL/\~/\~}" || echo "Not in cache"
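
The same lookup split into two small functions, for anyone who finds the one-liner hard to read. The function names are made up for illustration; the sed expression and the URL layout are taken from the command above.

```bash
# Convert "git describe --long --abbrev=7" output into the version string
# used in the cached package file names, e.g.
#   v6.11-7272-g509d2cd  ->  v6.11.r7272.g509d2cd
describe_to_pkgver() {
    # First expression prefixes the commit count with "r",
    # second turns every "-" into ".".
    printf '%s\n' "$1" | sed 's/\([^-]*-g\)/r\1/;s/-/./g'
}

# Build the download URL of a cached bisection kernel for a describe string.
bisection_kernel_url() {
    local base="https://pkgbuild.com/~gromit/linux-bisection-kernels"
    echo "$base/linux-mainline-$(describe_to_pkgver "$1")-1-x86_64.pkg.tar.zst"
}
```

Inside the kernel checkout, bisection_kernel_url "$(git describe --long --abbrev=7)" prints the candidate URL, which can then be probed with curl --head --fail as in the one-liner.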

#11 2024-12-08 00:26:57

kevinlpowell
Member
Registered: 2024-12-08
Posts: 8

Re: VFIO gpu passthrough kernel regression

FWIW - same issue on my Dell G15 5515 laptop.
I'm using VFIO passthrough to hand over the NVIDIA discrete GPU to a Windows guest.
This was working great until the last few days. I can confirm that downgrading to kernel 6.11.9-arch1-1 fixed the issue for me.

#12 2024-12-08 09:31:22

SimonP
Member
Registered: 2024-12-08
Posts: 5

Re: VFIO gpu passthrough kernel regression

I'm also having this problem on Debian: 6.11.10 is good, 6.12.3 is bad. It's almost certainly an upstream kernel regression.

#13 2024-12-08 11:06:15

kevinlpowell
Member
Registered: 2024-12-08
Posts: 8

Re: VFIO gpu passthrough kernel regression

I was briefly possessed by the spirit of Dudley Do-right and wound up going through the git-bisect.

the final result : 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b is the first bad commit

here's the bisect log:

git bisect start
# status: waiting for both good and bad commits
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# status: waiting for bad commit, 1 good commit known
# bad: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect bad adc218676eef25575469234709c2d87185ca223a
# good: [509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
git bisect good 509d2cd12a10d057fdf72f565b930f9a81140d59
# good: [356a0319456810f3a5618353f6ca3b0ef9965479] Merge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 356a0319456810f3a5618353f6ca3b0ef9965479
# bad: [974099e40e924a911000541fea0b59d075a3d1d0] Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 974099e40e924a911000541fea0b59d075a3d1d0
# good: [e08d227840bb9366c6321ae1e480b37ba5eec29b] Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect good e08d227840bb9366c6321ae1e480b37ba5eec29b
# bad: [78b7b991838a4a6baeaad934addc4db2c5917eb8] vxlan: Handle error of rtnl_register_module().
git bisect bad 78b7b991838a4a6baeaad934addc4db2c5917eb8
# bad: [3ed7df085225ea8736b80d1e1a247a40d91281c8] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad 3ed7df085225ea8736b80d1e1a247a40d91281c8
# bad: [55e6f8f29d6ac76126ad1c8000b4c3626cf4b176] Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 55e6f8f29d6ac76126ad1c8000b4c3626cf4b176
# bad: [41786cc5ea89b71437dd6fece444f3766edb4db7] Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 41786cc5ea89b71437dd6fece444f3766edb4db7
# bad: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
# good: [590b09b1d88e18ae57f89930a6f7b89795d2e9f3] KVM: x86: Register "emergency disable" callbacks when virt is enabled
git bisect good 590b09b1d88e18ae57f89930a6f7b89795d2e9f3
# good: [f9b56b2c31e5733c04464da1b73bafb9eff6569f] s390: Enable KVM_S390_UCONTROL config in debug_defconfig
git bisect good f9b56b2c31e5733c04464da1b73bafb9eff6569f
# good: [ec495f2ab12290b008a691e826b39b895f458945] KVM: Write the per-page "segment" when clearing (part of) a guest page
git bisect good ec495f2ab12290b008a691e826b39b895f458945
# good: [55f50b2f86929ae042cd2eee8b2e8ffe00b5a885] Merge branch 'kvm-memslot-zap-quirk' into HEAD
git bisect good 55f50b2f86929ae042cd2eee8b2e8ffe00b5a885
# good: [025dde582bbf31e7618f9283594ef5e2408e384b] KVM: Harden guest memory APIs against out-of-bounds accesses
git bisect good 025dde582bbf31e7618f9283594ef5e2408e384b
# good: [c09dd2bb5748075d995ae46c2d18423032230f9b] Merge branch 'kvm-redo-enable-virt' into HEAD
git bisect good c09dd2bb5748075d995ae46c2d18423032230f9b
# first bad commit: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD

#14 2024-12-08 11:43:58

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782

Re: VFIO gpu passthrough kernel regression

You have landed on a merge commit "7056c4e2a13a ("Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD")" so most likely you have taken a wrong turn somewhere ... @loqs, any ideas for commits that should probably be re-tested?

#15 2024-12-08 13:39:17

loqs
Member
Registered: 2014-03-06
Posts: 18,135

Re: VFIO gpu passthrough kernel regression

My complete guess would be to re-test the first bad commit 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b and its parent c09dd2bb5748075d995ae46c2d18423032230f9b, possibly doing a full build of both commits in case make missed something that needed rebuilding.
Edit:
Could also try gromit's build of 55e6f8f Merge tag 'kvm-x86-svm-6.12':

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-v6.11.rc7.r197.g55e6f8f-1-x86_64.pkg.tar.zst

Last edited by loqs (2024-12-08 13:59:54)

#16 2024-12-08 18:01:23

kevinlpowell
Member
Registered: 2024-12-08
Posts: 8

Re: VFIO gpu passthrough kernel regression

It is quite possible that I 'took a wrong turn somewhere.'
I can check into it, but I won't have the time until late tonight (US/Pacific TZ).

I was able to quickly check linux-mainline-v6.11.rc7.r197.g55e6f8f-1-x86_64.pkg.tar.zst
That version does exhibit the problem for me: win guest hangs during boot and pegs a core at 100%

#17 2024-12-08 19:06:19

loqs
Member
Registered: 2014-03-06
Posts: 18,135

Re: VFIO gpu passthrough kernel regression

@physicsBTW could you please try linux-mainline-v6.11.rc7.r197.g55e6f8f-1-x86_64.pkg.tar.zst as well (linked in post #15) to see if it also exhibits the issue on your system?

@kevinlpowell would it help if gromit built 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b and its parent c09dd2bb5748075d995ae46c2d18423032230f9b?

#18 2024-12-08 19:54:53

kevinlpowell
Member
Registered: 2024-12-08
Posts: 8

Re: VFIO gpu passthrough kernel regression

loqs wrote:

@kevinlpowell would it help if gromit built 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b and its parent c09dd2bb5748075d995ae46c2d18423032230f9b?

@loqs : If gromit is willing to make those builds, I'm happy to give them a quick try (as time permits).  My laptop takes a while to do a kernel build, so it does accelerate things if I can snag an already built package.

#19 2024-12-08 23:00:56

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

@loqs

uname -r
6.11.0-rc7-1-mainline-00197-g55e6f8f29d6a

Works correctly, tested a few times to be sure. The kernel bisect was taking me a very long time because there were no cache hits against gromit's bisection kernels using linux-git, so I needed to recompile each time. What can I do here to help you guys?

#20 2024-12-08 23:05:28

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 782
Website

Re: VFIO gpu passthrough kernel regression

# 7056c4e2a13a ("Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD")
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-v6.11.rc7.r107.g7056c4e-1-x86_64.pkg.tar.zst

# c09dd2bb5748 ("Merge branch 'kvm-redo-enable-virt' into HEAD")
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-v6.11.rc7.r101.gc09dd2b-1-x86_64.pkg.tar.zst

#21 2024-12-08 23:22:39

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

@gromit
Neither 6.11.0-rc7-1-mainline-00107-g7056c4e2a13a nor 6.11.0-rc7-1-mainline-00101-gc09dd2bb5748 has the issue.

This is what I got so far:

git bisect start
# status: waiting for both good and bad commits
# bad: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect bad adc218676eef25575469234709c2d87185ca223a
# status: waiting for good commit(s), bad commit known
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# good: [509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
git bisect good 509d2cd12a10d057fdf72f565b930f9a81140d59
# bad: [356a0319456810f3a5618353f6ca3b0ef9965479] Merge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 356a0319456810f3a5618353f6ca3b0ef9965479
# bad: [3a37872316c2e3288e09a1322221c83e5929768d] Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
git bisect bad 3a37872316c2e3288e09a1322221c83e5929768d
# bad: [440b65232829fad69947b8de983c13a525cc8871] Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect bad 440b65232829fad69947b8de983c13a525cc8871
# bad: [617a814f14b8914271f7a70366d72c6196d17663] Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 617a814f14b8914271f7a70366d72c6196d17663
# good: [775d28fd45a2f5e58d8927ce77693398b1f074a6] mm: remove isolate_lru_page()
git bisect good 775d28fd45a2f5e58d8927ce77693398b1f074a6
# good: [056f8c437dc33e9e8e64b9344e816d7d46c06c16] Merge tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect good 056f8c437dc33e9e8e64b9344e816d7d46c06c16

Last edited by physicsBTW (2024-12-09 00:47:43)

#22 2024-12-09 03:51:56

kevinlpowell
Member
Registered: 2024-12-08
Posts: 8

Re: VFIO gpu passthrough kernel regression

Given that physicsBTW had different results than I reported, I went back and re-checked:

uname -a
Linux orochi 6.11.0-rc7-1-mainline-00197-g55e6f8f29d6a #1 SMP PREEMPT_DYNAMIC Sun, 20 Oct 2024 22:39:07 +0000 x86_64 GNU/Linux

definitely fails for me.  (checked twice this last time, and once earlier today)

I wonder if there's not some confounding factor at play here.

As for builds 7056c4e2a13a and c09dd2bb5748 from gromit:

uname -a
Linux orochi 6.11.0-rc7-1-mainline-00107-g7056c4e2a13a #1 SMP PREEMPT_DYNAMIC Sun, 08 Dec 2024 22:44:25 +0000 x86_64 GNU/Linux

Failed for me. The Windows guest did not boot.
During my retry of the bisection, I built 7056c4 locally and marked it as bad, so that at least is consistent on my machine. Looking back, I also marked 7056c4 'bad' on my first bisect attempt.

uname -a
Linux orochi 6.11.0-rc7-1-mainline-00101-gc09dd2bb5748 #1 SMP PREEMPT_DYNAMIC Sun, 08 Dec 2024 22:57:24 +0000 x86_64 GNU/Linux

Did not exhibit the problem for me.  win guest booted successfully.
c09dd2 was one of the ones I built during my latest attempt at bisecting, and I also marked c09dd2 as 'good' during the bisect.


I don't really know what to say regarding the differing results that physicsBTW and I are reporting. I have gone through the bisect process a second time, but given the contradictory results I'd prefer to go through it a third time and ensure my process is consistent before I post something that could muddy the water even further. Since our results differ starting at 356a03, I'll pay particular attention there.
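
To pin down exactly where two bisect logs disagree, a small awk comparison along these lines could help. It is a sketch only: bisect_conflicts is a made-up helper, and it assumes both files are unmodified, non-empty "git bisect log" output.

```bash
# Compare two "git bisect log" files and print every commit that the two
# logs mark differently, as: <commit> <verdict in $1> <verdict in $2>
bisect_conflicts() {
    awk '
        # Only look at the "git bisect good|bad <commit>" lines;
        # the "#" comment lines and "git bisect start" are skipped.
        $1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
            if (FNR == NR) { first[$4] = $3; next }   # first file: remember
            if (($4 in first) && first[$4] != $3)     # second file: compare
                print $4, first[$4], $3
        }
    ' "$1" "$2"
}
```

With each log saved to a file, bisect_conflicts mine.log theirs.log (file names are placeholders) lists the commits worth re-testing first; for the logs posted so far that would be 356a0319456810f3a5618353f6ca3b0ef9965479.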

-EDIT-
after re-starting the git-bisect for a 3rd time, I can confirm that I get this following result consistently:

git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect bad adc218676eef25575469234709c2d87185ca223a
# status: waiting for good commit(s), bad commit known
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# good: [509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
git bisect good 509d2cd12a10d057fdf72f565b930f9a81140d59
# good: [356a0319456810f3a5618353f6ca3b0ef9965479] Merge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good 356a0319456810f3a5618353f6ca3b0ef9965479
# bad: [974099e40e924a911000541fea0b59d075a3d1d0] Merge tag 'devicetree-fixes-for-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 974099e40e924a911000541fea0b59d075a3d1d0
# good: [e08d227840bb9366c6321ae1e480b37ba5eec29b] Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect good e08d227840bb9366c6321ae1e480b37ba5eec29b
# bad: [78b7b991838a4a6baeaad934addc4db2c5917eb8] vxlan: Handle error of rtnl_register_module().
git bisect bad 78b7b991838a4a6baeaad934addc4db2c5917eb8
# bad: [3ed7df085225ea8736b80d1e1a247a40d91281c8] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad 3ed7df085225ea8736b80d1e1a247a40d91281c8
# bad: [55e6f8f29d6ac76126ad1c8000b4c3626cf4b176] Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 55e6f8f29d6ac76126ad1c8000b4c3626cf4b176
# bad: [41786cc5ea89b71437dd6fece444f3766edb4db7] Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 41786cc5ea89b71437dd6fece444f3766edb4db7
# bad: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
# good: [590b09b1d88e18ae57f89930a6f7b89795d2e9f3] KVM: x86: Register "emergency disable" callbacks when virt is enabled
git bisect good 590b09b1d88e18ae57f89930a6f7b89795d2e9f3
# good: [f9b56b2c31e5733c04464da1b73bafb9eff6569f] s390: Enable KVM_S390_UCONTROL config in debug_defconfig
git bisect good f9b56b2c31e5733c04464da1b73bafb9eff6569f
# good: [ec495f2ab12290b008a691e826b39b895f458945] KVM: Write the per-page "segment" when clearing (part of) a guest page
git bisect good ec495f2ab12290b008a691e826b39b895f458945
# good: [55f50b2f86929ae042cd2eee8b2e8ffe00b5a885] Merge branch 'kvm-memslot-zap-quirk' into HEAD
git bisect good 55f50b2f86929ae042cd2eee8b2e8ffe00b5a885
# good: [025dde582bbf31e7618f9283594ef5e2408e384b] KVM: Harden guest memory APIs against out-of-bounds accesses
git bisect good 025dde582bbf31e7618f9283594ef5e2408e384b
# good: [c09dd2bb5748075d995ae46c2d18423032230f9b] Merge branch 'kvm-redo-enable-virt' into HEAD
git bisect good c09dd2bb5748075d995ae46c2d18423032230f9b
# first bad commit: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD

I took extra care on the third attempt at git-bisect: I did 'shutdown -h' between each kernel switch, double-checked that 'uname -a' matched the current git commit hash before marking good/bad, and generally kept my workflow tidy. However, I did not rebuild all the previous local linux-git packages, so my results could be no good if for some reason my builds are out of whack.
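
The uname check can be scripted so it is harder to skip. A rough sketch, assuming the kernel release string ends in -g<abbreviated hash> as it does for the linux-mainline builds in this thread; kernel_matches_commit is a made-up name.

```bash
# Succeeds when the abbreviated hash at the end of a kernel release string
# (the part after the last "-g") is a prefix of the given full commit hash.
# $1: kernel release, e.g. the output of "uname -r"
# $2: full commit hash, e.g. the output of "git rev-parse HEAD"
kernel_matches_commit() {
    local release=$1 commit=$2
    local abbrev=${release##*-g}
    # No "-g" marker at all: not a git-described build, so nothing to match.
    [ "$abbrev" != "$release" ] || return 1
    case $commit in
        "$abbrev"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Run from the kernel checkout before marking a result: kernel_matches_commit "$(uname -r)" "$(git rev-parse HEAD)" || echo "booted kernel does not match the checked-out commit".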

Last edited by kevinlpowell (2024-12-09 07:12:24)

#23 2024-12-09 13:26:02

loqs
Member
Registered: 2014-03-06
Posts: 18,135

Re: VFIO gpu passthrough kernel regression

[7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD is bad while its two parents [c09dd2bb5748075d995ae46c2d18423032230f9b] Merge branch 'kvm-redo-enable-virt' into HEAD and [025dde582bbf31e7618f9283594ef5e2408e384b] KVM: Harden guest memory APIs against out-of-bounds accesses are good.
Does adding the kernel parameter kvm.enable_virt_at_load=0 have any effect on kernels with the issue?
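
When testing this, it is worth confirming that the parameter really reached the running kernel. A minimal sketch that only looks at /proc/cmdline (it does not prove that KVM acted on the value); cmdline_has_param is a hypothetical helper name.

```bash
# Succeeds when the exact token $1 (e.g. kvm.enable_virt_at_load=0)
# appears on the kernel command line.
# $2: file to read (optional; defaults to /proc/cmdline)
cmdline_has_param() {
    local wanted=$1 file=${2:-/proc/cmdline}
    # One token per line, then match a whole line as a fixed string.
    tr ' ' '\n' < "$file" | grep -qxF -- "$wanted"
}
```

After rebooting with the parameter added, cmdline_has_param kvm.enable_virt_at_load=0 && echo "parameter is set" gives a quick sanity check before starting the VM.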

#24 2024-12-09 15:25:13

physicsBTW
Member
Registered: 2023-11-07
Posts: 11

Re: VFIO gpu passthrough kernel regression

@kevinlpowell
I'll try to finish the kernel bisection later today when I get home, and retest some of the kernel versions that failed for you but worked for me. I will also try setting kvm.enable_virt_at_load=0 on a problem kernel. I was also being very careful to check that the git describe hash for each iteration matched uname -r before booting the VM.

Are your VMs hard crashing, or are they booting with the graphics drivers failing to load? Try booting into a good kernel and enabling RDP on the guest, then restarting into a bad kernel, starting the VM, and connecting via RDP from a different machine. You might need to switch to a bridged or macvtap network for RDP to be reachable from remote machines. Using QXL/SPICE to remote in instead can cause the graphics drivers to fail to load even on good kernels, so disable the virtual video adapter before booting. For me, the bad kernels boot the VM fine, but the graphics drivers refuse to load, which you can verify in Device Manager after connecting via RDP.

#25 2024-12-09 18:43:47

SimonP
Member
Registered: 2024-12-08
Posts: 5

Re: VFIO gpu passthrough kernel regression

Did my own bisect, log below. Patching out the bad commit fixes 6.12.y for me. Can someone confirm?

git bisect start
# status: waiting for both good and bad commits
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# status: waiting for bad commit, 1 good commit known
# bad: [59b723cd2adbac2a34fc8e12c74ae26ae45bf230] Linux 6.12-rc6
git bisect bad 59b723cd2adbac2a34fc8e12c74ae26ae45bf230
# good: [de848da12f752170c2ebe114804a985314fd5a6a] Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel
git bisect good de848da12f752170c2ebe114804a985314fd5a6a
# good: [570172569238c66a482ec3eb5d766cc9cf255f69] Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux
git bisect good 570172569238c66a482ec3eb5d766cc9cf255f69
# bad: [7b43ba65019e83b55cfacfcfc0c3a08330af54c1] Merge branch 'maintainers-networking-file-coverage-updates'
git bisect bad 7b43ba65019e83b55cfacfcfc0c3a08330af54c1
# good: [5e5466433d266046790c0af40a15af0a6be139a1] Merge tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good 5e5466433d266046790c0af40a15af0a6be139a1
# good: [e08d227840bb9366c6321ae1e480b37ba5eec29b] Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect good e08d227840bb9366c6321ae1e480b37ba5eec29b
# bad: [3ed7df085225ea8736b80d1e1a247a40d91281c8] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad 3ed7df085225ea8736b80d1e1a247a40d91281c8
# bad: [55e6f8f29d6ac76126ad1c8000b4c3626cf4b176] Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 55e6f8f29d6ac76126ad1c8000b4c3626cf4b176
# bad: [41786cc5ea89b71437dd6fece444f3766edb4db7] Merge tag 'kvm-x86-misc-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect bad 41786cc5ea89b71437dd6fece444f3766edb4db7
# good: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect good 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
# bad: [d33234342f8b468e719e05649fd26549fb37ef8a] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()
git bisect bad d33234342f8b468e719e05649fd26549fb37ef8a
# bad: [74c6c98a598a1fa650f9f8dfb095d66e987ed9cf] KVM: x86: Refactor kvm_x86_ops.get_msr_feature() to avoid kvm_msr_entry
git bisect bad 74c6c98a598a1fa650f9f8dfb095d66e987ed9cf
# good: [e0183a42e3bcd4c30eb95bb046c016023fdc01ce] KVM: x86: Use this_cpu_ptr() in kvm_user_return_msr_cpu_online
git bisect good e0183a42e3bcd4c30eb95bb046c016023fdc01ce
# bad: [b58b808cbe93e8abe936b285ae534c9927789242] KVM: x86: Move MSR_TYPE_{R,W,RW} values from VMX to x86, as enums
git bisect bad b58b808cbe93e8abe936b285ae534c9927789242
# bad: [74a0e79df68a8042fb84fd7207e57b70722cf825] KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value
git bisect bad 74a0e79df68a8042fb84fd7207e57b70722cf825
# first bad commit: [74a0e79df68a8042fb84fd7207e57b70722cf825] KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value
