#1 2023-07-19 22:41:23

Cheeto
Member
Registered: 2022-09-11
Posts: 64

GNOME/X11 system extremely buggy after modifying vfio.conf

TLDR: working on setting up VMs with nvidia dGPU passthrough. When I make some changes to /etc/modprobe.d/vfio.conf, I get these issues. This has been replicated 3 times on fresh installs for confirmation:

- system takes several minutes to log out, to log in after a logout (but not after a fresh reboot), or to do the shutdown part of a reboot (not the boot part)
- often after logging in, nothing will open, requiring a reboot or at least a logout/login (takes 7+ minutes)
- even when everything seems normal, apps will take 2 seconds or longer to open (settings and nautilus, for example, should be instant, on a system with 13900k and plenty of RAM, and it IS instant before modifying vfio.conf)
- half the time when I open virt-manager, it's stuck on "connecting" to qemu/kvm
- it takes 1-2 minutes to stop and start libvirtd.service
- `lspci -k` and `lspci -nn` take 1-2 minutes to complete, when it should be instant
- additional complication related to waking from suspend, conditional depending on circumstances, details in my comment here: https://bbs.archlinux.org/viewtopic.php … 7#p2110647

=====

Not sure where to start troubleshooting this.

~$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2684,10de:22ba
softdep nvidia pre: vfio-pci

SYSTEM:
OS - vanilla arch, installed with archinstall selecting GNOME desktop and nvidia proprietary drivers, X11
CPU - i9-13900k
GPU - RTX 4090
Motherboard - Asus ROG z690 Extreme

I mention the motherboard because whenever I run journalctl or check the status of libvirtd.service when virt-manager won't connect, both of these always list PCI issues in red text. And I know `lspci` is related to PCI stuff, and it's taking minutes to complete. So maybe this is a good starting point...?

I won't flood the thread with massive outputs for now because like I said, I don't know where to start but I'll provide any output you need on request.

Last edited by Cheeto (2023-07-20 00:50:53)

#2 2023-07-19 22:47:03

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

So maybe this is a good starting point...?

Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'f:1=<-' ix.io

#3 2023-07-20 00:49:53

Cheeto
Member
Registered: 2022-09-11
Posts: 64

output here: http://ix.io/4B0G

Also, editing the OP to add that it takes more than a minute for the monitors to turn on after waking from sleep. Weirdly, this comes and goes regardless of whether I've done any configuration post-install, which further suggests (to my ignorant mind) that this may all be related to PCI/IO issues.

May also be worth noting that the machine can turn all the lights off if it puts itself in suspend, but if I manually suspend it, it instantly turns back on unless I first unplug the wireless receivers for keyboard and mouse and use the power button to suspend it.

#4 2023-07-20 14:12:28

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

Jul 19 21:59:11 archlinux kernel: microcode: updated early: 0x10e -> 0x113, date = 2023-02-06
Jul 19 21:59:12 archlinux kernel: nvidia: loading out-of-tree module taints kernel.
Jul 19 21:59:12 archlinux kernel: nvidia: module license 'NVIDIA' taints kernel.
Jul 19 21:59:12 archlinux kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jul 19 21:59:12 archlinux kernel: nvidia: module license taints kernel.
Jul 19 21:59:12 archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 509
Jul 19 21:59:12 archlinux kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
                                  NVRM: nouveau, rivafb, nvidiafb or rivatv 
                                  NVRM: was loaded and obtained ownership of the NVIDIA device(s).
                                  NVRM: driver(s)), then try loading the NVIDIA kernel module
Jul 19 21:59:12 archlinux kernel: NVRM: No NVIDIA devices probed.

Try to blacklist nvidia, https://wiki.archlinux.org/title/Kernel … and_line_2
Something esp. later, after the session start is calling the driver but all nvidia devices are taken by vfio.

Jul 19 22:01:02 archlinux sudo[2915]:    user1 : TTY=pts/0 ; PWD=/home/user1 ; USER=root ; COMMAND=/usr/bin/nano /etc/modprobe.d/vfio.conf

"man sudoedit"

#5 2023-07-20 17:46:49

Cheeto
Member
Registered: 2022-09-11
Posts: 64

OK I believe I blacklisted nvidia, I have this line now in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 intel_iommu=on vfio-pci.ids=10de:2684,10de:22ba module_blacklist=nvidia"

However, the machine is still having the same issues taking several minutes to open an app after logging in so I can't actually open a terminal on it. Maybe worse now but it's hard to tell, it's pretty bad overall.

With regards to the second part:

Something esp. later, after the session start is calling the driver but all nvidia devices are taken by vfio.

Jul 19 22:01:02 archlinux sudo[2915]:    user1 : TTY=pts/0 ; PWD=/home/user1 ; USER=root ; COMMAND=/usr/bin/nano /etc/modprobe.d/vfio.conf

"man sudoedit"

I don't know what you mean there, I'm sorry for not understanding.

EDIT: Are you saying to learn about sudoedit, and suggesting to change the default editor to nano? I did that, and now I can use sudoedit with nano. Thanks.

Last edited by Cheeto (2023-07-20 19:33:02)

#6 2023-07-20 19:38:05

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

Are you saying to learn about sudoedit

Yes.

I believe I blacklisted nvidia, I have this line now in /etc/default/grub:

Editing that file alone does nothing; you still have to run grub-mkconfig.
Run "lsmod | grep nvidia" to check whether the module was effectively blacklisted (you want the output to be empty).
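
For anyone following along, both checks can be scripted. This is only a rough sketch: the helper names cmdline_blacklists and module_loaded are made up for illustration, and it assumes the blacklist is passed as module_blacklist= on the kernel command line, as in the GRUB line quoted above.

```shell
# Hypothetical helper: succeeds if the kernel command line in $1 carries a
# module_blacklist= parameter that lists the module named in $2.
cmdline_blacklists() (
    module=$2
    set -f    # no filename globbing while the command line is word-split
    for param in $1; do
        case $param in
            module_blacklist=*)
                case ",${param#module_blacklist=}," in
                    *",$module,"*) exit 0 ;;
                esac ;;
        esac
    done
    exit 1
)

# Hypothetical helper: succeeds if lsmod-style output on stdin lists any
# module whose name starts with $1 (nvidia, nvidia_drm, ...).
module_loaded() {
    awk -v m="$1" 'NR > 1 && index($1, m) == 1 { found = 1 } END { exit !found }'
}
```

On a live system that would be something like `cmdline_blacklists "$(cat /proc/cmdline)" nvidia` to confirm the parameter actually reached the kernel, and `lsmod | module_loaded nvidia`, which should fail if the blacklist worked.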

taking several minutes to open an app after logging in

Is this strictly related to a gdm+gnome session or does the system also misbehave on a different VT (ctrl+alt+f3)?

#7 2023-07-20 19:51:31

Cheeto
Member
Registered: 2022-09-11
Posts: 64

Sorry for being unclear, I did regenerate the grub config with grub-mkconfig earlier when blacklisting, using the usual command:

sudo grub-mkconfig -o /boot/grub/grub.cfg

And upon checking now, lsmod shows no nvidia, as expected.

~$ lsmod | grep nvidia
> 

Switching to another VT (TTY?) with ctrl+alt+f3, it does take >2 minutes to run lspci -nn or lspci -k. Not sure how to test apps there so that's the best test I could think of.

Upon returning to the regular screen with ctrl+alt+f2, all my monitors are out of order and apps are back to taking minutes to open.

Before doing all that, I had got my monitors in order (they're constantly doing this whenever I log out and back in or reboot) and set to the right refresh rates and stuff.

An important note: if I wait all these minutes for everything to stop being slow (say, 45min to 1.5hr), then apps will open normally and the system is responsive. But after switching back with ctrl+alt+f2, the monitors were messed up as I said and apps were back to opening ultra slow.

Last edited by Cheeto (2023-07-20 20:05:40)

#8 2023-07-20 20:55:40

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

What if you only boot the multi-user.target (2nd link below), no GDM/gnome?

Edit: btw, the journal shows you running gnome on wayland, not X11?

Last edited by seth (2023-07-20 21:01:48)

#9 2023-07-20 23:23:55

Cheeto
Member
Registered: 2022-09-11
Posts: 64

EDIT2: after spending more time, I think a big part of the problems may have been that I was somehow in Wayland, but that only remedies some of the tangential issues.
- apps open normally now, in x11
- lspci commands still take minutes to complete
- monitors are still completely borked
    - I have three identical monitors, 2560x1440 capable of 120hz (capable of much higher, but the one with the lowest grade cable--hdmi 2.0--can do 120hz, so I set them all to 120hz)
    - If I have two monitors going, they can both run fine at that resolution and framerate
    - If I add a third, the middle one goes down to 60hz
        - Sometimes it shows the option for 120hz, sometimes not, but whenever I enable it things flash a few times and it stays at 60hz even though it asks if I want to keep the changes
        - If I attempt to force the framerate in xrandr, shit gets weird but it ultimately turns the third monitor off and then because there are only two monitors again they are able to work at 120hz
    - This happens with or without any video cables plugged into the dGPU

The iGPU and motherboard are MORE than capable of running the three monitors at this resolution and framerate. They all work fine in windows at their max rates which is higher than 120, on the same configuration (except the one monitor on hdmi 2.0).

=============

EDIT: http://ix.io/4B5e

So after getting back on xorg, things were still wack, so I rebooted, things are still wack, and now I can't set the correct refresh rate on my middle monitor, and the cursor has vertical "scanlines".

All three monitors are identical and I've had them set to 120hz at 2560x1440 before in this exact setup so I know the system is capable of it (also windows has no problem doing it).

=============

I'm not sure what all to test in the TTY environment so I tried lspci -nn and lspci -k. They complete basically instantly, unlike in GNOME (unless I wait about 45min to 1.5hr as described above, then everything starts happening at normal speeds).

Also this is crazy! I occasionally run echo $XDG_SESSION_TYPE just for the heck of it, and I even checked before logging out to boot into multi-user.target, and it WAS X11, and it has been every time I checked. I checked just now and it's saying wayland!

Any idea why or what's up with that? I don't want to get too sidetracked but now I'm wondering how much that's played into this.

Last edited by Cheeto (2023-07-21 02:06:37)

#10 2023-07-21 05:12:27

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

https://wiki.archlinux.org/title/GNOME/ … _available
https://wiki.archlinux.org/title/GDM#Wa … DIA_driver

Wayland was blocked while there were two GPUs, you removed one through vfio => you get a wayland session.
Why gnome on wayland then throws a tantrum, no idea (for now)
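
If the goal is to stay on Xorg, GDM can be told not to offer Wayland at all with WaylandEnable=false in the [daemon] section of /etc/gdm/custom.conf (the path is the Arch default as far as I know). A rough sketch for checking that, with function names invented for this example:

```shell
# Hypothetical helper: succeeds if GDM config text on stdin contains an
# uncommented "WaylandEnable=false" line. It does not verify the section.
gdm_wayland_disabled() {
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false[[:space:]]*$'
}

# Prints the session type the same way the thread checks it, or "unknown"
# when the variable is not set (e.g. outside a login session).
session_type() {
    printf '%s\n' "${XDG_SESSION_TYPE:-unknown}"
}
```

Usage would be `gdm_wayland_disabled < /etc/gdm/custom.conf` before logging in and `session_type` afterwards to confirm what the session really is.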

About the monitor situation, run the system w/o vfio and w/o blacklisting nvidia (so when everything is "good") and post your Xorg log and the output of "xrandr --verbose"

Finally

strace -tt -o /tmp/lspci.strace lspci
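
Once that trace exists, the slow spots are the big jumps between consecutive timestamps. A rough sketch for listing them, assuming the plain -tt line format (HH:MM:SS.microseconds followed by the syscall) with no PID column; strace_gaps and its threshold argument are made up for this example:

```shell
# Hypothetical helper: reads "strace -tt" output on stdin and prints every
# line that was followed by a pause of at least $1 seconds (default 1).
strace_gaps() {
    awk -v min="${1:-1}" '
    {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1) {
            gap = now - prev
            if (gap < 0) gap += 86400      # trace crossed midnight
            if (gap >= min) printf "%.3f s after: %s\n", gap, line
        }
        prev = now
        line = $0
    }'
}
```

For the trace above that would be `strace_gaps 1 < /tmp/lspci.strace`. Adding -T to the strace call is another option, since it prints the time spent inside each syscall directly.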

#11 2023-07-21 05:44:36

Cheeto
Member
Registered: 2022-09-11
Posts: 64

What do you think about this?

I didn't want to sit around with a broken system, so I reinstalled again. Same as before, except:

- This time I didn't change systemd to udev in mkinitcpio.conf HOOKS. I believe this may have fixed the insanely slow shutdowns and reboots, and logging back in after a logout, or restarting X. However, this means I can't use grub-overlay-btrfs, which is unfortunate but the costs in this setup outweigh the benefits.
- Made sure to constantly check to make sure I'm logging in with an xorg session. I believe this may have fixed apps taking minutes to open. But obviously it's hard to say.
- Finally, I waited until I'd isolated the dGPU (this way there were fewer monitors in the settings) before configuring my monitors. Now I've only had to set them once, and they remain configured as expected.

For now everything seems to be resolved.

I don't want to mark the issue as resolved yet for one reason: under every configuration so far over the past two weeks that I've been working on all this, the machine has had trouble waking from sleep, as mentioned in the OP. I want to make sure that's fixed, which means I'll need to check on it tomorrow.

I'll report back and hopefully be able to close the thread. If you have any thoughts on why replacing systemd with udev broke things, I'm all ears. However, it may be worth waiting to check on it tomorrow to confirm everything really is stable.

#12 2023-07-22 02:23:51

Cheeto
Member
Registered: 2022-09-11
Posts: 64

It still takes more than a minute to show the login screen when waking the machine, despite the machine and the monitors turning on instantly when I press the power button while it's suspended.

Per my comment here https://bbs.archlinux.org/viewtopic.php … 7#p2110647 it was always happening regardless of whether or how I configured anything at all, so it doesn't seem related to the udev HOOK or accidental wayland session.

Any ideas on this last issue? Otherwise the rest of the issues have cleared up like I said, and the system has been perfectly stable all day.

Last edited by Cheeto (2023-07-22 02:24:44)

#13 2023-07-22 13:11:05

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

What's in the journal after the S3?
It might help to see what's going on while the system seems stalled.

#14 2023-07-22 16:09:25

Cheeto
Member
Registered: 2022-09-11
Posts: 64

Well check this out https://pastebin.com/stiAP6je, it's the journalctl entries only since I pressed the power button this morning to wake it up.

If you look at entries with "d3cold" or "d0" or "power" or "inaccessible", it's clearly having trouble communicating with some PCI devices or groups.

Here are some lspci outputs if they help with context.

~$ lspci -k
00:00.0 Host bridge: Intel Corporation Device a700 (rev 01)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Device 8694
00:01.0 PCI bridge: Intel Corporation Device a70d (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: pcieport
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
	DeviceName: Onboard - Video
	Subsystem: ASUSTeK Computer Inc. Raptor Lake-S GT1 [UHD Graphics 770]
	Kernel driver in use: i915
	Kernel modules: i915
00:06.0 PCI bridge: Intel Corporation Device a74d (rev 01)
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: pcieport
00:0a.0 Signal processing controller: Intel Corporation Device a77d (rev 01)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: intel_vsec
	Kernel modules: intel_vsec
00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Volume Management Device NVMe RAID Controller Intel Corporation
	Kernel driver in use: vmd
	Kernel modules: vmd
00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH Shared SRAM
00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
	DeviceName: Onboard - Ethernet
	Subsystem: Intel Corporation Wi-Fi 6 AX201 160MHz
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
00:15.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH Serial IO I2C Controller
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
00:15.1 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH Serial IO I2C Controller
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
00:15.2 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH Serial IO I2C Controller
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci
00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH HECI Controller
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
	DeviceName: Onboard - SATA
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH SATA Controller [AHCI Mode]
	Kernel driver in use: ahci
00:1a.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH PCI Express Root Port
	Kernel driver in use: pcieport
00:1b.0 PCI bridge: Intel Corporation Device 7ac0 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: pcieport
00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH PCI Express Root Port
	Kernel driver in use: pcieport
00:1c.3 PCI bridge: Intel Corporation Device 7abb (rev 11)
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: pcieport
00:1c.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH PCI Express Root Port
	Kernel driver in use: pcieport
00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH PCI Express Root Port
	Kernel driver in use: pcieport
00:1d.1 PCI bridge: Intel Corporation Device 7ab1 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Device 8694
	Kernel driver in use: pcieport
00:1d.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH PCI Express Root Port
	Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation Z690 Chipset LPC/eSPI Controller (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Z690 Chipset LPC/eSPI Controller
00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
	DeviceName: Onboard - Sound
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S HD Audio Controller
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH SMBus Controller
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
	DeviceName: Onboard - Other
	Subsystem: ASUSTeK Computer Inc. Alder Lake-S PCH SPI Controller
	Kernel driver in use: intel-spi
	Kernel modules: spi_intel_pci
01:00.0 VGA compatible controller: NVIDIA Corporation AD102 [GeForce RTX 4090] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] AD102 [GeForce RTX 4090]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] AD102 High Definition Audio Controller
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
02:00.0 Non-Volatile memory controller: Phison Electronics Corporation E18 PCIe4 NVMe Controller (rev 01)
	Subsystem: Phison Electronics Corporation E18 PCIe4 NVMe Controller
	Kernel driver in use: nvme
	Kernel modules: nvme
05:00.0 Ethernet controller: Aquantia Corp. AQC113CS NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 03)
	Subsystem: ASUSTeK Computer Inc. ProArt X570-CREATOR WIFI
	Kernel driver in use: atlantic
	Kernel modules: atlantic
06:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
	Subsystem: ASUSTeK Computer Inc. ASM1062 Serial ATA Controller
	Kernel driver in use: ahci
72:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
	Subsystem: ASUSTeK Computer Inc. Ethernet Controller I225-V
	Kernel driver in use: igc
	Kernel modules: igc
73:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
	Subsystem: Samsung Electronics Co Ltd SSD 970 EVO
	Kernel driver in use: nvme
	Kernel modules: nvme
~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a700] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d] (rev 01)
00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:a74d] (rev 01)
00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:a77d] (rev 01)
00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 [8086:7acc] (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 [8086:7acd] (rev 11)
00:15.2 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 [8086:7ace] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
00:1a.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 [8086:7ac8] (rev 11)
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)
00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7abb] (rev 11)
00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
00:1d.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 [8086:7ab0] (rev 11)
00:1d.1 PCI bridge [0604]: Intel Corporation Device [8086:7ab1] (rev 11)
00:1d.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 [8086:7ab4] (rev 11)
00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
02:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E18 PCIe4 NVMe Controller [1987:5018] (rev 01)
05:00.0 Ethernet controller [0200]: Aquantia Corp. AQC113CS NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] [1d6a:94c0] (rev 03)
06:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
72:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
73:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

In the pastebin, lines after 753 show that some devices were removed; I'm not sure what that means, and it looks like it might have already been after I typed in my password, which means the displays were already on by that point.

#15 2023-07-22 19:14:01

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

Jul 22 10:37:42 archlinux kernel: pcieport 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:01.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:03.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:02.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: thunderbolt 0000:09:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: xhci_hcd 0000:3d:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: xhci_hcd 0000:3d:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:03.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:01.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:37:42 archlinux kernel: pcieport 0000:08:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 22 10:38:52 archlinux kernel: thunderbolt 0000:09:00.0: Unable to change power state from D3cold to D0, device inaccessible
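
A dozen near-identical lines are easier to read as a per-device count. Rough sketch, assuming the journal line format shown in the excerpt above; the function name is invented:

```shell
# Hypothetical helper: reads kernel journal lines on stdin and prints
# "count driver address" for each device that failed to leave D3cold,
# most frequent first.
failed_wakeups() {
    sed -n 's/.*kernel: \([^ ]*\) \([0-9a-f:.]*\): Unable to change power state from D3cold to D0.*/\1 \2/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2, $3 }'    # normalise the padding uniq adds
}
```

Fed from the running system that would be `journalctl -b -k | failed_wakeups`.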

None of those show up in your lspci at all.
The main problem seems to be thunderbolt & boltd, check whether it's in the same IOMMU group as the GPU.
https://wiki.archlinux.org/title/PCI_pa … _are_valid
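
For a single device the group can also be read straight from sysfs: with the IOMMU enabled, each PCI device directory has an iommu_group symlink pointing at its group. Sketch only; iommu_group_of and the SYSFS_ROOT override (there just so it can be tried against a fake tree) are my own names:

```shell
# Hypothetical helper: prints the IOMMU group number of the PCI device
# whose full address (e.g. 0000:09:00.0) is given in $1.
# SYSFS_ROOT is an invented override for testing; it defaults to /sys.
iommu_group_of() {
    link=$(readlink "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/iommu_group") || return 1
    printf '%s\n' "${link##*/}"
}
```

If `iommu_group_of 0000:01:00.0` and `iommu_group_of 0000:09:00.0` print different numbers, the GPU and the thunderbolt controller are in separate groups.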

https://archlinux.org/packages/extra/x86_64/bolt/ is apparently dragged in by KDE/Gnome system settings but idk why it wants to look at the device to begin with (given it's passed away to the VM)

#16 2023-07-22 20:24:44

Cheeto
Member
Registered: 2022-09-11
Posts: 64

None of those show up in your lspci at all.

Is it possible that that's because they've been disabled or whatever, during the wake-up, like the pastebin journal shows?

Here's `lspci -nn` on a clean boot, before the machine has slept overnight which is when the slow wake happens. Every one of those items is listed--and they're all thunderbolt.

~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a700] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d] (rev 01)
00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:a74d] (rev 01)
00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:a77d] (rev 01)
00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 [8086:7acc] (rev 11)
00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 [8086:7acd] (rev 11)
00:15.2 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 [8086:7ace] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
00:1a.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 [8086:7ac8] (rev 11)
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)
00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7abb] (rev 11)
00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
00:1d.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 [8086:7ab0] (rev 11)
00:1d.1 PCI bridge [0604]: Intel Corporation Device [8086:7ab1] (rev 11)
00:1d.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 [8086:7ab4] (rev 11)
00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
02:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E18 PCIe4 NVMe Controller [1987:5018] (rev 01)
05:00.0 Ethernet controller [0200]: Aquantia Corp. AQC113CS NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] [1d6a:94c0] (rev 03)
06:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
07:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
08:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
08:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
08:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
08:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
09:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137]
3d:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020] [8086:1138]
72:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
73:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

And to answer your other comment, the relevant groups appear to be 20 (nvidia) and 24 through 30 (thunderbolt).

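~$ shopt -s nullglob
# Walk the IOMMU groups in numeric order and list the devices in each one
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        # ${d##*/} is the PCI address; -nn prints names plus [vendor:device] IDs
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done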

IOMMU Group 0:
    00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
IOMMU Group 1:
    00:00.0 Host bridge [0600]: Intel Corporation Device [8086:a700] (rev 01)
IOMMU Group 2:
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:a70d] (rev 01)
IOMMU Group 3:
    00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:a74d] (rev 01)
IOMMU Group 4:
    00:0a.0 Signal processing controller [1180]: Intel Corporation Device [8086:a77d] (rev 01)
IOMMU Group 5:
    00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation [8086:a77f]
IOMMU Group 6:
    00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
    00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
IOMMU Group 7:
    00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
IOMMU Group 8:
    00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 [8086:7acc] (rev 11)
    00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 [8086:7acd] (rev 11)
    00:15.2 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 [8086:7ace] (rev 11)
IOMMU Group 9:
    00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
IOMMU Group 10:
    00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
IOMMU Group 11:
    00:1a.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 [8086:7ac8] (rev 11)
IOMMU Group 12:
    00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)
IOMMU Group 13:
    00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
IOMMU Group 14:
    00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7abb] (rev 11)
IOMMU Group 15:
    00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
IOMMU Group 16:
    00:1d.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 [8086:7ab0] (rev 11)
IOMMU Group 17:
    00:1d.1 PCI bridge [0604]: Intel Corporation Device [8086:7ab1] (rev 11)
IOMMU Group 18:
    00:1d.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 [8086:7ab4] (rev 11)
IOMMU Group 19:
    00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
    00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
    00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
    00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
IOMMU Group 20:
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
    01:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
IOMMU Group 21:
    02:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E18 PCIe4 NVMe Controller [1987:5018] (rev 01)
IOMMU Group 22:
    05:00.0 Ethernet controller [0200]: Aquantia Corp. AQC113CS NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] [1d6a:94c0] (rev 03)
IOMMU Group 23:
    06:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
IOMMU Group 24:
    07:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU Group 25:
    08:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU Group 26:
    08:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU Group 27:
    08:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU Group 28:
    08:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU Group 29:
    09:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137]
IOMMU Group 30:
    3d:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020] [8086:1138]
IOMMU Group 31:
    72:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
IOMMU Group 32:
    73:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

(And there don't appear to be any conflicts with unisolated CPU-based PCIe slots.)
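
One way to confirm which kernel driver actually holds each device in a group (here group 20, the nvidia one) is to read the driver symlinks in sysfs. This is only a rough sketch: `group_drivers` and the `SYSFS_ROOT` override are made-up names for illustration, and it assumes the usual /sys/kernel/iommu_groups layout.

```shell
#!/usr/bin/env bash
# Print "<pci address> <bound driver>" for every device in one IOMMU group.
# SYSFS_ROOT is a hypothetical override (not a kernel or lspci feature); it only
# exists so the function can be pointed at a fake tree for testing.
group_drivers() {
    local group=$1 root=${SYSFS_ROOT:-/sys} dev drv
    for dev in "$root/kernel/iommu_groups/$group/devices/"*; do
        # Skip the literal pattern if the group does not exist
        [ -e "$dev" ] || continue
        if [ -e "$dev/driver" ]; then
            # "driver" is a symlink to the bound driver's directory
            drv=$(basename "$(readlink -f "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "${dev##*/}" "$drv"
    done
}

# Default to group 20 (the RTX 4090 and its audio function in the listing above)
group_drivers "${1:-20}"
```

With the vfio.conf from the first post in effect, both 0000:01:00.0 and 0000:01:00.1 would be expected to show vfio-pci. If one of them shows nvidia or snd_hda_intel instead, the binding probably did not happen as intended.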

Not sure of the importance but it may be worth noting that two of my three monitors use the thunderbolt ports.

Last edited by Cheeto (2023-07-22 20:27:43)

#17 2023-07-22 20:39:42

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

Re: GNOME/X11 system extremely buggy after modifying vfio.conf

Not sure of the importance but it may be worth noting that two of my three monitors use the thunderbolt ports.

There's a very easy test for this: remove them, see what happens ;)

#18 2023-07-22 21:03:06

Cheeto
Member
Registered: 2022-09-11
Posts: 64

Re: GNOME/X11 system extremely buggy after modifying vfio.conf

Hehe, I can tell you exactly what will happen: I'll have one monitor on a three-monitor setup :P

They work on a fresh boot, and they even work after waking from a night of suspend, so I'm not sure what "removed" means.

What's changing after 8 hours of suspend that's different from, say, 30 minutes of suspend? As far as I can tell, the only effect it has is the slow wake and some things being "removed".

After the initial slow wake, it continues to wake quickly from short suspends.

#19 2023-07-22 21:33:53

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 76,129

Re: GNOME/X11 system extremely buggy after modifying vfio.conf

Memory decay, common w/ nvidia setups (VRAM, not RAM) but you're passing that through.
What does the resume after a short suspend look like then? More thunderbolt issues?

And of course the idea was to detach the TB monitors, send the system to sleep, go to bed and 8h later resume the system (w/o the TB monitors) and see whether you're still getting delays.
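
To compare the long-suspend and short-suspend resumes, the kernel log can be cut down to the suspend markers and anything mentioning thunderbolt. A minimal sketch, assuming a systemd journal: `filter_resume_log` is a made-up helper name, and the grep patterns are a guess at what is worth looking at, not a complete list.

```shell
#!/usr/bin/env bash
# Keep only suspend/resume markers and Thunderbolt-related kernel messages.
# The patterns are assumptions; widen them (e.g. add "pcieport") as needed.
filter_resume_log() {
    # "|| true" keeps the exit status clean when nothing matches
    grep -Ei 'PM: suspend (entry|exit)|thunderbolt' || true
}

# Kernel messages of the current boot with precise timestamps, so the gap
# after each "PM: suspend exit" line can be read off directly.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b --no-pager -o short-precise | filter_resume_log
fi
```

Running it once after the overnight resume with the TB monitors attached and once with them detached gives two logs that can be compared side by side.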
