@gromit could you build a 6.12 kernel with the patch below for testing?
From 769baa867246228d89936fc99193b04b7227a7a8 Mon Sep 17 00:00:00 2001
From: SimonP <xxx@xxx.com>
Date: Mon, 9 Dec 2024 18:23:12 +0100
Subject: [PATCH] Revert "KVM: SVM: Disallow guest from changing userspace's
MSR_AMD64_DE_CFG value"
This reverts commit 74a0e79df68a8042fb84fd7207e57b70722cf825.
---
arch/x86/kvm/svm/svm.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd15cc635655..c088d4241fff 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3201,13 +3201,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (data & ~supported_de_cfg)
return 1;
- /*
- * Don't let the guest change the host-programmed value. The
- * MSR is very model specific, i.e. contains multiple bits that
- * are completely unknown to KVM, and the one bit known to KVM
- * is simply a reflection of hardware capabilities.
- */
- if (!msr->host_initiated && data != svm->msr_decfg)
+ /* Don't allow the guest to change a bit, #GP */
+ if (!msr->host_initiated && (data ^ supported_de_cfg))
return 1;
svm->msr_decfg = data;
--
2.45.2
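The difference between the two guest-write checks in this hunk can be illustrated outside the kernel. The following is a minimal shell sketch, not kernel code. The function names check_upstream and check_patched are made up for illustration, the host_initiated case is not modelled, and the example values (supported mask 2, current value 0, guest write 2) are assumptions chosen to match the "DE_CFG current = 0, WRMSR = 2" diagnostic line quoted later in this thread.

```shell
# Illustration only: mimics the guest-write comparisons in svm_set_msr()
# with shell integer arithmetic. Arguments: <data> <current> <supported>.
# Prints "reject" where the kernel would "return 1" and "accept" otherwise.

# Check as introduced by 74a0e79: a guest write must equal the current value.
check_upstream() {
    data=$1 current=$2 supported=$3
    if [ $(( data & ~supported )) -ne 0 ]; then
        echo reject            # unsupported bits are set
    elif [ "$data" -ne "$current" ]; then
        echo reject            # guest tried to change the stored value
    else
        echo accept
    fi
}

# Check as in the patch above: a guest write must equal the supported mask.
# The second argument (current value) is ignored by this variant.
check_patched() {
    data=$1 supported=$3
    if [ $(( data & ~supported )) -ne 0 ]; then
        echo reject            # unsupported bits are set
    elif [ $(( data ^ supported )) -ne 0 ]; then
        echo reject            # data differs from the supported mask
    else
        echo accept
    fi
}

check_upstream 2 0 2
check_patched 2 0 2
```

With these assumed inputs the first call prints reject and the second prints accept, which is consistent with a guest write of 2 failing only on kernels that carry 74a0e79.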
Last edited by SimonP (2024-12-09 19:05:12)
Offline
A few possibly interesting updates:
Kernel 7056c4e turns out to be 'mostly bad' rather than always bad. For me, there is about a 1 in 20 chance that I can successfully start my Windows guest on 7056c4e. This seems to be decided when the Linux host boots.
I tried the kvm.enable_virt_at_load=0 kernel parameter with mixed results. On 7056c4e, that parameter significantly increases the chances that I'll be able to start the Windows guest. (3 out of 3 successful trials). However on later kernels (e.g. the arch 6.12 package), I'm still unable to start my Windows guest and kvm.enable_virt_at_load=0 does not seem to have any effect.
@physicsBTW when I encounter this problem, the VMs appear to be hard crashing. I'm checking for aliveness with ssh rather than RDP (as I already have sshd set up on the Windows guest). As far as I can tell, the guest OS never actually boots.
Offline
SimonP wrote: @gromit could you build a 6.12 kernel with the patch below for testing?
[snip]
I started building this over 6.12.4-zen1-1. In a few hours I can test.
Please let me know if you'd like me to test any other built kernels while I have everything down.
Kevin, I get the same result on 6.12 or later: kvm.enable_virt_at_load=0 didn't help me.
Good idea checking ssh to Windows.
Thank you for all of your contributions.
Last edited by Ranguvar (2024-12-09 21:23:20)
Offline
@loqs @kevinpowell
1) I tried kvm.enable_virt_at_load=0 on a problem kernel and verified with /proc/cmdline that it was applied; it had no effect. I removed this parameter after testing it.
2) I reinstalled gromit's copy of 7056c4e2a13a, verified
uname -r
6.11.0-rc7-1-mainline-00107-g7056c4e2a13a
I made three VM load attempts, shutting down my computer, unplugging it, and holding down the power button for 10 seconds between each one. The VM loaded correctly in all three attempts.
Testing SimonP's patch now.
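Checking that a parameter such as kvm.enable_virt_at_load=0 really reached the running kernel, as in item 1 above, can be scripted. This is a minimal sketch assuming a POSIX shell. The helper name cmdline_param is hypothetical, and the optional second argument exists only so the function can be tried on a sample string instead of /proc/cmdline.

```shell
# Print the value of a kernel command-line parameter, or nothing if absent.
# $1 = parameter name
# $2 = command line to search (optional; defaults to /proc/cmdline)
cmdline_param() (
    set -f                      # disable globbing while word-splitting
    name=$1
    line=${2-$(cat /proc/cmdline)}
    for word in $line; do
        case $word in
            # Match "name=value" and print only the value part.
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)
```

Running cmdline_param kvm.enable_virt_at_load on the host prints the value that was passed at boot, and prints nothing if the parameter is missing.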
Last edited by physicsBTW (2024-12-10 01:30:38)
Offline
@SimonP -- good news from my machine. Applied your patch at v6.12 resulting in
uname -a
Linux orochi 6.12.0-1-git-dirty #15 SMP PREEMPT_DYNAMIC Tue, 10 Dec 2024 00:09:14 +0000 x86_64 GNU/Linux
and my Windows guest with GPU passthrough started right up. Thanks for that.
Offline
@SimonP
commit 74a0e79df68a8042fb84fd7207e57b70722cf825 loads in for me without patching.
I'm thinking that we are having separate issues.
Offline
How did you revert/build the commit? When I revert it on top of 6.12.y I get the following compile error:
arch/x86/kvm/svm/svm.c: In function ‘svm_set_msr’:
arch/x86/kvm/svm/svm.c:3203:53: error: ‘msr_entry’ undeclared (first use in this function); did you mean ‘psc_entry’?
 3203 |                 if (!msr->host_initiated && (data ^ msr_entry.data))
      |                                                     ^~~~~~~~~
      |                                                     psc_entry
This is with the following revert patch:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..759cd4326d2a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3199,13 +3199,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (data & ~supported_de_cfg)
return 1;
- /*
- * Don't let the guest change the host-programmed value. The
- * MSR is very model specific, i.e. contains multiple bits that
- * are completely unknown to KVM, and the one bit known to KVM
- * is simply a reflection of hardware capabilities.
- */
- if (!msr->host_initiated && data != svm->msr_decfg)
+ /* Don't allow the guest to change a bit, #GP */
+ if (!msr->host_initiated && (data ^ msr_entry.data))
return 1;
svm->msr_decfg = data;
--
2.47.1
Offline
git bisect start
# status: waiting for both good and bad commits
# bad: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect bad adc218676eef25575469234709c2d87185ca223a
# status: waiting for good commit(s), bad commit known
# good: [98f7e32f20d28ec452afb208f9cffc08448a2652] Linux 6.11
git bisect good 98f7e32f20d28ec452afb208f9cffc08448a2652
# good: [509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
git bisect good 509d2cd12a10d057fdf72f565b930f9a81140d59
# bad: [356a0319456810f3a5618353f6ca3b0ef9965479] Merge tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad 356a0319456810f3a5618353f6ca3b0ef9965479
# bad: [3a37872316c2e3288e09a1322221c83e5929768d] Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
git bisect bad 3a37872316c2e3288e09a1322221c83e5929768d
# bad: [440b65232829fad69947b8de983c13a525cc8871] Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect bad 440b65232829fad69947b8de983c13a525cc8871
# bad: [617a814f14b8914271f7a70366d72c6196d17663] Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 617a814f14b8914271f7a70366d72c6196d17663
# good: [775d28fd45a2f5e58d8927ce77693398b1f074a6] mm: remove isolate_lru_page()
git bisect good 775d28fd45a2f5e58d8927ce77693398b1f074a6
# good: [7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b] Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect good 7056c4e2a13a61f4e8a9e8ce27cd499f27e0e63b
# good: [c09dd2bb5748075d995ae46c2d18423032230f9b] Merge branch 'kvm-redo-enable-virt' into HEAD
git bisect good c09dd2bb5748075d995ae46c2d18423032230f9b
# good: [55e6f8f29d6ac76126ad1c8000b4c3626cf4b176] Merge tag 'kvm-x86-svm-6.12' of https://github.com/kvm-x86/linux into HEAD
git bisect good 55e6f8f29d6ac76126ad1c8000b4c3626cf4b176
# good: [056f8c437dc33e9e8e64b9344e816d7d46c06c16] Merge tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect good 056f8c437dc33e9e8e64b9344e816d7d46c06c16
# good: [df7e1286b1dc3d6cff952cf3ef4f0c36831e2fbb] mm: care about shadow stack guard gap when getting an unmapped area
git bisect good df7e1286b1dc3d6cff952cf3ef4f0c36831e2fbb
# good: [5731aacd54a883dd2c1a5e8c85e1fe78fc728dc7] KVM: use follow_pfnmap API
git bisect good 5731aacd54a883dd2c1a5e8c85e1fe78fc728dc7
# good: [658be46520ce480a44fe405730a1725166298f27] mm: support poison recovery from copy_present_page()
git bisect good 658be46520ce480a44fe405730a1725166298f27
# bad: [325efb16da2c840e165d9b620fec8049d4d664cc] mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
git bisect bad 325efb16da2c840e165d9b620fec8049d4d664cc
# bad: [659c55ef981bb63355a65ffc3b3b5cad562b806a] mm/vma: return the exact errno in vms_gather_munmap_vmas()
git bisect bad 659c55ef981bb63355a65ffc3b3b5cad562b806a
# bad: [f2c5101be43677c227974912a043da29a62743ef] memcg: cleanup with !CONFIG_MEMCG_V1
git bisect bad f2c5101be43677c227974912a043da29a62743ef
# bad: [fd00be9afa1d64c90ae20a1307da1bdb809b3d55] mm/show_mem.c: report alloc tags in human readable units
git bisect bad fd00be9afa1d64c90ae20a1307da1bdb809b3d55
# first bad commit: [fd00be9afa1d64c90ae20a1307da1bdb809b3d55] mm/show_mem.c: report alloc tags in human readable units
I'm testing a revert of fd00be9afa1d64c90ae20a1307da1bdb809b3d55 to see whether it changes anything, but this result seems a little strange.
Edit: I think I made a wrong turn somewhere in the bisect, or hit a non-clean build issue.
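When a bisect is suspected of having gone wrong partway through, the recorded log can be cut back to the last verdict that is still trusted and then replayed, so that only the later steps need re-testing. This is a minimal sketch assuming the log was saved with git bisect log. The helper name bisect_log_until and the file names in the note below are hypothetical.

```shell
# Read a saved `git bisect log` on stdin and print it up to and including
# the "git bisect good|bad|skip <sha>" line whose commit starts with $1.
bisect_log_until() {
    awk -v sha="$1" '
        { print }
        # Stop after the verdict line for the requested commit (prefix match).
        /^git bisect (good|bad|skip) / && index($4, sha) == 1 { exit }
    '
}
```

A possible workflow: save the log with git bisect log > bisect.log, run bisect_log_until <sha> < bisect.log > trusted.log with the last commit whose verdict was verified, then run git bisect reset followed by git bisect replay trusted.log and continue testing from there.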
Last edited by physicsBTW (2024-12-10 04:47:10)
Offline
I don't know if this will help at all: when I made my patched build of 6.12 -- the first thing I tried was simply:
git revert 74a0e79
I tried that based on an incomplete understanding of SimonP's post. Simply reverting 74a0e79 results in the build error that gromit posted. At that point I noticed that SimonP's patch is not what one gets from reverting 74a0e79, but a fixed-up version that actually compiles. So I went back to an unmodified 6.12 (at commit adc2186) and applied the patch from SimonP's post verbatim.
Last edited by kevinlpowell (2024-12-10 06:25:52)
Offline
It doesn't revert cleanly. Use my patch above.
Offline
@SimonP
commit 74a0e79df68a8042fb84fd7207e57b70722cf825 loads in for me without patching.
I'm thinking that we are having separate issues.
My results seem to agree; there could be separate issues at play.
6.12.4-zen1-1 with Simon's patch did not work for me. I only had time to test twice. All VM cores were pegged at 100%, even when left for several minutes.
gromit's linux-mainline 7056c4e2a13a worked well immediately on first try.
Just let me know if there are any other specific built kernels I should test tomorrow.
Last edited by Ranguvar (2024-12-10 08:43:31)
Offline
Well, this is unfortunate. I have a very similar setup to @Ranguvar (including the same CPU), so I expected success there at least. I'll keep following this thread, but since reverting 74a0e79df68a8042fb84fd7207e57b70722cf825 fixed the issue for me, I can't be of any more help.
Offline
SimonP's upstream report https://lore.kernel.org/kvm/52914da7-a9 … ilbox.org/
Edit:
6.12.4 with the diagnostic patch from https://lore.kernel.org/kvm/Z1hiiz40nUq … oogle.com/ applied, which adds two new warning messages that should trigger on affected systems:
linux-headers-6.12.4.arch1-1.1-x86_64.pkg.tar.zst/linux-6.12.4.arch1-1.1-x86_64.pkg.tar.zst.
Last edited by loqs (2024-12-10 20:56:40)
Offline
My dmesg output with that kernel is similar to Simon's:
[ 74.302915] virbr0: port 1(vnet0) entered blocking state
[ 74.302924] virbr0: port 1(vnet0) entered disabled state
[ 74.302931] vnet0: entered allmulticast mode
[ 74.302969] vnet0: entered promiscuous mode
[ 74.304700] virbr0: port 1(vnet0) entered blocking state
[ 74.304703] virbr0: port 1(vnet0) entered listening state
[ 76.316421] vfio-pci 0000:0e:00.0: enabling device (0002 -> 0003)
[ 76.415850] virbr0: port 1(vnet0) entered learning state
[ 76.445465] vfio-pci 0000:0e:00.1: enabling device (0000 -> 0002)
[ 76.471323] pcieport 0000:03:06.0: unlocked secondary bus reset via: __pci_reset_function_locked+0x41/0x70
[ 78.536043] virbr0: port 1(vnet0) entered forwarding state
[ 78.536047] virbr0: topology change detected, propagating
[ 85.973721] kvm_amd: DE_CFG current = 0, WRMSR = 2
[ 329.196806] sched: DL replenish lagged too much
Offline
Tested 6.12.4 with the if statement dropped entirely, as Sean suggests at https://lore.kernel.org/kvm/Z1jEDFpanEI … oogle.com/
Issue was unchanged for me.
Offline
Ranguvar wrote: Issue was unchanged for me.
Were you able to bisect the cause of your issue?
Offline
Ranguvar wrote: Issue was unchanged for me.
Were you able to bisect the cause of your issue?
I haven't, but 7056c4e from gromit's package worked well for me.
I can test other builds if it'd help.
Offline
@Ranguvar I would suggest starting afresh, using gromit's builds, and seeing where that leads:
$ git bisect start
status: waiting for both good and bad commits
$ git bisect bad v6.12
status: waiting for good commit(s), bad commit known
$ git bisect good v6.11
Bisecting: 7334 revisions left to test after this (roughly 13 steps)
[509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
$ URL="https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-$(git describe --long --abbrev=7 | sed 's/\([^-]*-g\)/r\1/;s/-/./g')-1-x86_64.pkg.tar.zst"; curl --output /dev/null --silent --head --fail "${URL}" && echo "sudo pacman -U ${URL/\~/\~}" || echo "Not in cache"
sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-v6.11.r7272.g509d2cd-1-x86_64.pkg.tar.zst
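The one-liner above turns git describe output into the version string used in gromit's package file names. Below is a minimal sketch of the same transformation split into two functions. The function names are hypothetical, and the URL layout is simply the one visible in the command above, not a documented interface.

```shell
# Convert `git describe --long --abbrev=7` output into the package version:
# prefix the commit count with "r", then turn every "-" into ".".
describe_to_pkgver() {
    printf '%s\n' "$1" | sed 's/\([^-]*-g\)/r\1/;s/-/./g'
}

# Build the download URL of the bisection kernel for a given describe string.
bisect_pkg_url() {
    printf 'https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-%s-1-x86_64.pkg.tar.zst\n' \
        "$(describe_to_pkgver "$1")"
}

describe_to_pkgver v6.11-7272-g509d2cd
```

For the first bisection step shown above, v6.11-7272-g509d2cd becomes v6.11.r7272.g509d2cd, which matches the package name in the pacman -U line.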
Offline
Found the commit, but I did not expect it to be in scheduler code.
[ranguvar@khufu linux-torvalds]$ git bisect start
status: waiting for both good and bad commits
[ranguvar@khufu linux-torvalds]$ git bisect bad v6.12
status: waiting for good commit(s), bad commit known
[ranguvar@khufu linux-torvalds]$ git bisect good v6.11
Bisecting: 7334 revisions left to test after this (roughly 13 steps)
[509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
[ranguvar@khufu linux-torvalds]$ git bisect bad 2004cef11ea0 # This commit is also halfway between, confirmed broken many times
Bisecting: 3701 revisions left to test after this (roughly 12 steps)
[7b17f5ebd5fc5e9275eaa5af3d0771f2a7b01bbf] Merge tag 'soc-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-3635-g7b17f5ebd5fc
[ranguvar@khufu linux-torvalds]$ git bisect good 7b17f5ebd5fc
Bisecting: 1825 revisions left to test after this (roughly 11 steps)
[3a7101e9b27fe97240c2fd430c71e61262447dd1] Merge tag 'powerpc-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-5511-g3a7101e9b27f
[ranguvar@khufu linux-torvalds]$ git bisect good 3a7101e9b27f
Bisecting: 912 revisions left to test after this (roughly 10 steps)
[6dcc304f85898b099b35c63748c5e11ba56d0c8a] drm/amd/display: Resolve Coverity Issues
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc5-913-g6dcc304f8589
[ranguvar@khufu linux-torvalds]$ git bisect good 6dcc304f8589
Bisecting: 469 revisions left to test after this (roughly 9 steps)
[ae2c6d8b3b88c176dff92028941a4023f1b4cb91] Merge tag 'drm-xe-next-fixes-2024-09-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc7-1356-gae2c6d8b3b88
[ranguvar@khufu linux-torvalds]$ git bisect good ae2c6d8b3b88
Bisecting: 242 revisions left to test after this (roughly 8 steps)
[a65b3c3ed49a3b8068c002e98c90f8594927ff25] Merge tag 'hid-for-linus-2024091602' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-5738-ga65b3c3ed49a
[ranguvar@khufu linux-torvalds]$ git bisect good a65b3c3ed49a
Bisecting: 148 revisions left to test after this (roughly 7 steps)
[cff06a799dbe81f3a697ae7c805eaf88d30c2308] Merge patch series "smartpqi updates"
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-94-gcff06a799dbe
[ranguvar@khufu linux-torvalds]$ git bisect good cff06a799dbe
Bisecting: 74 revisions left to test after this (roughly 6 steps)
[839c4f596f898edc424070dc8b517381572f8502] Merge tag 'mm-hotfixes-stable-2024-09-19-00-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-7262-g839c4f596f89
[ranguvar@khufu linux-torvalds]$ git bisect good 839c4f596f89
Bisecting: 37 revisions left to test after this (roughly 5 steps)
[152e11f6df293e816a6a37c69757033cdc72667d] sched/fair: Implement delayed dequeue
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-42-g152e11f6df29
[ranguvar@khufu linux-torvalds]$ git bisect skip # Fails to compile
Bisecting: 37 revisions left to test after this (roughly 5 steps)
[54a58a78779169f9c92a51facf6de7ce94962328] sched/fair: Implement DELAY_ZERO
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-43-g54a58a787791
[ranguvar@khufu linux-torvalds]$ git bisect good 54a58a787791
Bisecting: 17 revisions left to test after this (roughly 4 steps)
[6b9ccbc033cf179956a37fef3ee415bdc3029d2f] kthread: Fix task state in kthread worker if being frozen
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-62-g6b9ccbc033cf
[ranguvar@khufu linux-torvalds]$ git bisect bad 6b9ccbc033cf
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[4686cc598f669dea1b50dde1568e6c65c355bc67] sched: Clean up DL server vs core sched
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-53-g4686cc598f66
[ranguvar@khufu linux-torvalds]$ git bisect good 4686cc598f66
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[b2d70222dbf2a2ff7a972a685d249a5d75afa87f] sched: Add put_prev_task(.next)
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-58-gb2d70222dbf2
[ranguvar@khufu linux-torvalds]$ git bisect bad b2d70222dbf2
Bisecting: 1 revision left to test after this (roughly 1 step)
[436f3eed5c69c1048a5754df6e3dbb291e5cccbd] sched: Combine the last put_prev_task() and the first set_next_task()
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-56-g436f3eed5c69
[ranguvar@khufu linux-torvalds]$ git bisect good 436f3eed5c69
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[bd9bbc96e8356886971317f57994247ca491dbf1] sched: Rework dl_server
[ranguvar@khufu linux-torvalds]$ git describe
v6.11-rc1-57-gbd9bbc96e835
[ranguvar@khufu linux-torvalds]$ git bisect bad bd9bbc96e835
bd9bbc96e8356886971317f57994247ca491dbf1 is the first bad commit
commit bd9bbc96e8356886971317f57994247ca491dbf1 (HEAD)
Author: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 14 00:25:55 2024 +0200
sched: Rework dl_server
When a task is selected through a dl_server, it will have p->dl_server
set, such that it can account runtime to the dl_server, see
update_curr_task().
Currently p->dl_server is set in pick*task() whenever it goes through
the dl_server, clearing it is a bit of a mess though. The trivial
solution is clearing it on the final put (now that we have this
location).
However, this gives a problem when:
p = pick_task(rq);
if (p)
put_prev_set_next_task(rq, prev, next);
picks the same task but through a different path, notably when it goes
from picking through the dl_server to a direct pick or vice-versa. In
that case we cannot readily determine wether we should clear or
preserve p->dl_server.
An additional complication is pick_*task() setting p->dl_server for a
remote pick, it might still need to update runtime before it schedules
the core_pick.
Close all these holes and remove all the random clearing of
p->dl_server by:
- having pick_*task() manage rq->dl_server
- having the final put_prev_task() clear p->dl_server
- having the first set_next_task() set p->dl_server = rq->dl_server
- complicate the core_sched code to save/restore rq->dl_server where
appropriate.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240813224016.259853414@infradead.org
kernel/sched/core.c | 40 +++++++++++++++-------------------------
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c | 10 ++--------
kernel/sched/sched.h | 14 ++++++++++++++
4 files changed, 32 insertions(+), 34 deletions(-)
I don't use any special config with regard to scheduling. Sometimes I use Arch's packaged linux and sometimes linux-zen. This setup has worked for me since 5.10, before the advent of EEVDF, and likely earlier.
I use
iommu=pt isolcpus=1-7,17-23 rcu_nocbs=1-7,17-23 nohz_full=1-7,17-23
in kernel cmdline. Those cores are allocated to the VM, and some or all will show 100% usage while the guest is locked and some tasks stall on the host. Removing iommu=pt had no effect. Removing the core isolation reliably locks up the host as soon as the guest starts and pins its threads.
*Edited above for clarity
This appears to block kernel tasks for many minutes (indefinitely).
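The isolcpus, rcu_nocbs and nohz_full values above use the kernel's CPU-list syntax of comma-separated numbers and ranges. Below is a minimal sketch that expands such a list, so the isolated set can be compared with the CPUs pinned to the guest. The helper name expand_cpulist is hypothetical, and only plain numbers and ranges are handled.

```shell
# Expand a CPU list such as "1-7,17-23" into individual CPU numbers.
# Runs in a subshell so the IFS and globbing changes do not leak out.
expand_cpulist() (
    set -f
    IFS=,
    out=
    for part in $1; do
        lo=${part%-*}           # text before "-", or the whole item
        hi=${part#*-}           # text after "-", or the whole item
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
)

expand_cpulist 1-7,17-23
```

For 1-7,17-23 this prints 14 CPU numbers. On a running system, /sys/devices/system/cpu/isolated shows the set that the kernel actually isolated.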
dmesg output beginning when the VM starts; no new unusual messages appear when the VM is forcibly stopped:
[ 565.381219] vfio-pci 0000:0e:00.0: enabling device (0002 -> 0003)
[ 565.512119] vfio-pci 0000:0e:00.1: enabling device (0000 -> 0002)
[ 565.538102] pcieport 0000:03:06.0: unlocked secondary bus reset via: __pci_reset_function_locked+0x41/0x70
[ 642.153635] hrtimer: interrupt took 3370 ns
[ 681.155366] sched: DL replenish lagged too much
[ 1720.226930] INFO: task khugepaged:263 blocked for more than 122 seconds.
[ 1720.226934] Tainted: P OE 6.11.0-rc1-1-git-00057-gbd9bbc96e835 #12
[ 1720.226935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1720.226937] task:khugepaged state:D stack:0 pid:263 tgid:263 ppid:2 flags:0x00004000
[ 1720.226941] Call Trace:
[ 1720.226942] <TASK>
[ 1720.226945] __schedule+0x382/0x1430
[ 1720.226951] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1720.226955] ? srso_alias_return_thunk+0x5/0xfbef5
[ 1720.226959] schedule+0x27/0xf0
[ 1720.226962] schedule_timeout+0x12f/0x160
[ 1720.226967] wait_for_completion+0x86/0x170
[ 1720.226971] __flush_work+0x1bf/0x2c0
[ 1720.226975] ? __pfx_wq_barrier_func+0x10/0x10
[ 1720.226978] __lru_add_drain_all+0x145/0x1f0
[ 1720.226982] khugepaged+0x65/0x940
[ 1720.226987] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1720.226991] ? __pfx_khugepaged+0x10/0x10
[ 1720.226994] kthread+0xd2/0x100
[ 1720.226998] ? __pfx_kthread+0x10/0x10
[ 1720.227002] ret_from_fork+0x34/0x50
[ 1720.227004] ? __pfx_kthread+0x10/0x10
[ 1720.227007] ret_from_fork_asm+0x1a/0x30
[ 1720.227014] </TASK>
Last edited by Ranguvar (2024-12-12 20:58:39)
Offline
I would suggest contacting Peter Zijlstra as the author of the causal commit if you have not already done so. There is no dedicated list for the scheduler so I suggest using either the main linux-kernel or regressions mailing lists.
Offline
$ ./scripts/get_maintainer.pl --no-rolestats -f kernel/sched/core.c
Ingo Molnar <mingo@redhat.com>
Peter Zijlstra <peterz@infradead.org>
Juri Lelli <juri.lelli@redhat.com>
Vincent Guittot <vincent.guittot@linaro.org>
Dietmar Eggemann <dietmar.eggemann@arm.com>
Steven Rostedt <rostedt@goodmis.org>
Ben Segall <bsegall@google.com>
Mel Gorman <mgorman@suse.de>
Valentin Schneider <vschneid@redhat.com>
linux-kernel@vger.kernel.org
Offline
New to this process, but reported: https://lore.kernel.org/regressions/jGQ … ar.io/T/#u
Thank you both for the assistance.
Offline
I'm not sure if it's related to Ranguvar's problem, or if it's the third 6.12 regression in this thread ^^
In my case, the symptoms match the original post. Linux and Windows guests both fail to initialize the GPU. I also had it work once (first boot with 6.12) but, other than that, it fails reliably.
Do note that I'm using Manjaro, but since we're talking about a mainline kernel-related regression and this is the only discussion I could find, I thought I'd chime in.
Specs: 7950X3D, ASRock X670E Steel Legend, 6700XT for the guest.
Dmesg of a linux guest failing to initialize the GPU (one of two error variations I've seen):
[ 10.245100] [drm] amdgpu kernel modesetting enabled.
[ 10.245173] amdgpu: Virtual CRAT table created for CPU
[ 10.245182] amdgpu: Topology: Add CPU node
[ 10.245480] [drm] initializing kernel modesetting (NAVY_FLOUNDER 0x1002:0x73DF 0x1002:0x0E36 0xC1).
[ 10.245492] [drm] register mmio base: 0x81A00000
[ 10.245493] [drm] register mmio size: 1048576
[ 10.248861] [drm] add ip block number 0 <nv_common>
[ 10.248862] [drm] add ip block number 1 <gmc_v10_0>
[ 10.248863] [drm] add ip block number 2 <navi10_ih>
[ 10.248864] [drm] add ip block number 3 <psp>
[ 10.248864] [drm] add ip block number 4 <smu>
[ 10.248865] [drm] add ip block number 5 <dm>
[ 10.248866] [drm] add ip block number 6 <gfx_v10_0>
[ 10.248867] [drm] add ip block number 7 <sdma_v5_2>
[ 10.248867] [drm] add ip block number 8 <vcn_v3_0>
[ 10.248868] [drm] add ip block number 9 <jpeg_v3_0>
[ 10.248877] amdgpu 0000:05:00.0: amdgpu: Fetched VBIOS from VFCT
[ 10.248878] amdgpu: ATOM BIOS: 113-D5121100-101
[ 10.270097] [drm] VCN(0) decode is enabled in VM mode
[ 10.270099] [drm] VCN(0) encode is enabled in VM mode
[ 10.284318] [drm] JPEG decode is enabled in VM mode
[ 10.284320] amdgpu 0000:05:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 10.284359] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 10.284365] amdgpu 0000:05:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
[ 10.284367] amdgpu 0000:05:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 10.284375] [drm] Detected VRAM RAM=12272M, BAR=16384M
[ 10.284376] [drm] RAM width 192bits GDDR6
[ 10.284495] [drm] amdgpu: 12272M of VRAM memory ready
[ 10.284496] [drm] amdgpu: 16042M of GTT memory ready.
[ 10.284505] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 10.284626] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[ 12.218276] amdgpu 0000:05:00.0: amdgpu: STB initialized to 2048 entries
[ 12.218333] [drm] Loading DMUB firmware via PSP: version=0x02020020
[ 12.218647] [drm] use_doorbell being set to: [true]
[ 12.218658] [drm] use_doorbell being set to: [true]
[ 12.218667] [drm] Found VCN firmware Version ENC: 1.30 DEC: 3 VEP: 0 Revision: 4
[ 12.218672] amdgpu 0000:05:00.0: amdgpu: Will use PSP to load VCN firmware
[ 14.390991] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0)
[ 14.390994] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc
[ 14.391223] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed!
[ 14.411423] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[ 14.411604] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[ 14.411784] amdgpu 0000:05:00.0: amdgpu: amdgpu_device_ip_init failed
[ 14.411785] amdgpu 0000:05:00.0: amdgpu: Fatal error during GPU init
[ 14.411786] amdgpu 0000:05:00.0: amdgpu: amdgpu: finishing device.
[ 14.411928] ------------[ cut here ]------------
[ 14.411929] WARNING: CPU: 6 PID: 507 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:622 amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 14.412114] Modules linked in: amdgpu(+) video wmi amdxcp i2c_algo_bit drm_ttm_helper crct10dif_pclmul ttm crc32_pclmul crc32c_intel polyval_clmulni drm_exec polyval_generic ghash_clmulni_intel gpu_sched nvme sha512_ssse3 drm_suballoc_helper drm_buddy sha256_ssse3 drm_display_helper nvme_core sha1_ssse3 virtio_net cec nvme_auth virtio_console net_failover virtio_blk failover qemu_fw_cfg serio_raw ip6_tables ip_tables fuse
[ 14.412133] CPU: 6 PID: 507 Comm: (udev-worker) Not tainted 6.8.5-201.fc39.x86_64 #1
[ 14.412134] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 14.412135] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 14.412305] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 6a 30 bc e3 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 59 30 bc e3 b8 ea ff ff ff e9 4f 30 bc e3
[ 14.412306] RSP: 0018:ffffaae50112ba60 EFLAGS: 00010246
[ 14.412308] RAX: ffff8bbcca3ed100 RBX: ffff8bbcd19987a8 RCX: 0000000000000000
[ 14.412309] RDX: 0000000000000000 RSI: ffff8bbcd19a4db8 RDI: ffff8bbcd1980000
[ 14.412310] RBP: ffff8bbcd19901e8 R08: 0000000000000000 R09: ffffaae50112b878
[ 14.412311] R10: ffffaae50112b870 R11: 0000000000000003 R12: ffff8bbcd19905c8
[ 14.412311] R13: ffff8bbcd1980010 R14: ffff8bbcd1980000 R15: ffff8bbcd19a4db8
[ 14.412313] FS: 00007f5fde03e980(0000) GS:ffff8bc41fb80000(0000) knlGS:0000000000000000
[ 14.412315] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.412316] CR2: 00005623742f1000 CR3: 000000010c1fa000 CR4: 0000000000750ef0
[ 14.412318] PKRU: 55555554
[ 14.412319] Call Trace:
[ 14.412320] <TASK>
[ 14.412321] ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 14.412493] ? __warn+0x81/0x130
[ 14.412497] ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 14.412677] ? report_bug+0x171/0x1a0
[ 14.412681] ? handle_bug+0x3c/0x80
[ 14.412683] ? exc_invalid_op+0x17/0x70
[ 14.412685] ? asm_exc_invalid_op+0x1a/0x20
[ 14.412688] ? amdgpu_irq_put+0x46/0x70 [amdgpu]
[ 14.412857] amdgpu_fence_driver_hw_fini+0xfe/0x130 [amdgpu]
[ 14.413049] amdgpu_device_fini_hw+0xa6/0x400 [amdgpu]
[ 14.413233] ? blocking_notifier_chain_unregister+0x36/0x50
[ 14.413236] amdgpu_driver_load_kms+0xec/0x190 [amdgpu]
[ 14.413411] amdgpu_pci_probe+0x18b/0x510 [amdgpu]
[ 14.413586] local_pci_probe+0x42/0xa0
[ 14.413589] pci_device_probe+0xc7/0x240
[ 14.413592] really_probe+0x19b/0x3e0
[ 14.413595] ? __pfx___driver_attach+0x10/0x10
[ 14.413597] __driver_probe_device+0x78/0x160
[ 14.413599] driver_probe_device+0x1f/0x90
[ 14.413601] __driver_attach+0xd2/0x1c0
[ 14.413603] bus_for_each_dev+0x85/0xd0
[ 14.413605] bus_add_driver+0x116/0x220
[ 14.413607] driver_register+0x59/0x100
[ 14.413609] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 14.413768] do_one_initcall+0x58/0x320
[ 14.413772] do_init_module+0x60/0x240
[ 14.413775] __do_sys_init_module+0x17f/0x1b0
[ 14.413776] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413782] do_syscall_64+0x83/0x170
[ 14.413784] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413786] ? __count_memcg_events+0x4d/0xc0
[ 14.413788] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413790] ? count_memcg_events.constprop.0+0x1a/0x30
[ 14.413792] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413793] ? handle_mm_fault+0xa2/0x360
[ 14.413795] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413797] ? do_user_addr_fault+0x304/0x670
[ 14.413800] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413801] ? srso_alias_return_thunk+0x5/0xfbef5
[ 14.413803] entry_SYSCALL_64_after_hwframe+0x78/0x80
[ 14.413805] RIP: 0033:0x7f5fdea2cb9e
[ 14.413808] Code: 48 8b 0d 95 12 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 62 12 0c 00 f7 d8 64 89 01 48
[ 14.413809] RSP: 002b:00007ffc13be8998 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 14.413811] RAX: ffffffffffffffda RBX: 00005623741c55a0 RCX: 00007f5fdea2cb9e
[ 14.413812] RDX: 00005623741be530 RSI: 00000000019d58ce RDI: 00007f5fdb000010
[ 14.413813] RBP: 00007ffc13be8a50 R08: 0000562374199010 R09: 0000000000000007
[ 14.413814] R10: 0000000000000001 R11: 0000000000000246 R12: 00005623741be530
[ 14.413814] R13: 0000000000020000 R14: 00005623741c0030 R15: 00005623741c9120
[ 14.413817] </TASK>
[ 14.413818] ---[ end trace 0000000000000000 ]---
My kernel bisect yielded the following commit:
# first bad commit: [f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101]
vfio/pci: implement huge_fault support
With the addition of pfnmap support in vmf_insert_pfn_{pmd,pud}() we can
take advantage of PMD and PUD faults to PCI BAR mmaps and create more
efficient mappings. PCI BARs are always a power of two and will typically
get at least PMD alignment without userspace even trying. Userspace
alignment for PUD mappings is also not too difficult.
Consolidate faults through a single handler with a new wrapper for
standard single page faults. The pre-faulting behavior of commit
d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault") is removed
in this refactoring since huge_fault will cover the bulk of the faults and
results in more efficient page table usage. We also want to avoid that
pre-faulted single page mappings preempt huge page mappings.
Link: https://lkml.kernel.org/r/20240826204353.2228736-20-peterx@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reverting commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101 and rebuilding the vfio-pci-core module fixes the issue for me on v6.12.4 and v6.13-rc2.
For reference, the svm_set_msr regression does not appear to affect my system.
Last edited by Precific (2024-12-22 00:36:26)
Offline
For reference, Precific's upstream reports:
https://bugzilla.kernel.org/show_bug.cgi?id=219619
https://lore.kernel.org/regressions/202 … @bhelgaas/
Offline