I'm currently trying to set up GPU passthrough for a Windows VM, based on MentalOutlaw's guides as well as the dedicated ArchWiki page: https://wiki.archlinux.org/title/PCI_pa … h_via_OVMF
I'm running into a strange issue, however: whenever my PC boots without the NVIDIA driver loaded, the GPU completely disappears from lspci.
For starters, my PC specs:
OS: Arch Linux
Motherboard: MSI MAG X670E TOMAHAWK WIFI
Kernel: 64 bit 6.7.0-arch3-1
DE: MATE 1.26.1
Graphics platform: X-Org
CPU: AMD Ryzen 7 7800X3D
GPU: AMD Radeon RX 580
GPU: NVIDIA GeForce RTX 3070 Lite Hash Rate (This is the GPU I'm trying to passthrough)
Memory: 31.3 GiB
Next, I'll show you how things look without enabling vfio:
Running iommu.sh as shown on the wiki page and searching for NVIDIA in less gives:
IOMMU Group 12:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070 Lite Hash Rate] [10de:2488] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
and doing the same with lspci -nnk gives:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070 Lite Hash Rate] [10de:2488] (rev a1)
Subsystem: Gigabyte Technology Co., Ltd GA104 [GeForce RTX 3070 Lite Hash Rate] [1458:404c]
Kernel modules: nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
Subsystem: Gigabyte Technology Co., Ltd GA104 High Definition Audio Controller [1458:404c]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
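For reference, the iommu.sh from the wiki boils down to something like this (a sketch under the assumption that it matches the ArchWiki example; the function name and the optional base-directory argument are mine):

```shell
#!/bin/sh
# List every IOMMU group and the PCI devices it contains, by walking
# /sys/kernel/iommu_groups and describing each device with lspci.
list_iommu_groups() {
    base="${1:-/sys/kernel/iommu_groups}"
    for g in "$base"/*; do
        [ -d "$g" ] || continue
        printf 'IOMMU Group %s:\n' "${g##*/}"
        for d in "$g"/devices/*; do
            # The directory name is the full PCI address, e.g. 0000:01:00.0
            printf '\t%s\n' "$(lspci -nns "${d##*/}")"
        done
    done
}
```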
Finally, a quick look at the relevant grub, mkinitcpio.conf, and vfio.conf files (with the unnecessary comments removed):
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b"
MODULES=(vfio_pci vfio vfio_iommu_type1 nvidia nvidia_modeset nvidia_uvm nvidia_drm)
HOOKS=(base udev autodetect modconf block lvm2 filesystems keyboard fsck)
options vfio-pci ids=10de:2488,10de:228b
softdep nvidia pre: vfio-pci
After running mkinitcpio and rebooting, iommu.sh shows that IOMMU Group 12 is conspicuously missing:
IOMMU Group 11:
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e0]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e1]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e2]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e3]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e4]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e5]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e6]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e7]
IOMMU Group 13:
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
Neofetch doesn't display the GPU (other than a "GPU: AMD ATI 16:00.0 Raphael" entry above my regular RX 580), lspci has no mention of NVIDIA anywhere, etc.
The only clue I have is dmesg, which when grepped for vfio gives:
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/volgroup0-lv_root rw loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b
[ 0.027382] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/volgroup0-lv_root rw loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b
[ 3.267976] VFIO - User Level meta-driver version: 0.3
[ 3.277201] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
[ 3.277323] vfio_pci: add [10de:2488[ffffffff:ffffffff]] class 0x000000/00000000
[ 3.323851] vfio_pci: add [10de:228b[ffffffff:ffffffff]] class 0x000000/00000000
[ 7.000569] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
as well as a bunch of error messages from the nvidia driver complaining, which appear after the last vfio message:
[ 3.339015] nvidia: loading out-of-tree module taints kernel.
[ 3.339020] nvidia: module license 'NVIDIA' taints kernel.
[ 3.339021] Disabling lock debugging due to kernel taint
[ 3.339023] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 3.339023] nvidia: module license taints kernel.
[ 3.620913] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 3.620916] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[ 3.621653] NVRM: This can occur when a driver such as:
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 3.621653] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 3.621654] NVRM: No NVIDIA devices probed.
[ 3.621742] nvidia-nvlink: Unregistered Nvlink Core, major device number 236
[ 3.964131] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 3.964135] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[ 3.964901] NVRM: This can occur when a driver such as:
NVRM: nouveau, rivafb, nvidiafb or rivatv
NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 3.964902] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 3.964902] NVRM: No NVIDIA devices probed.
Is there anything I have overlooked or anything I need to find to potentially fix this?
P.S. This is my first time posting here, so I apologize if my formatting isn't great.
Last edited by Azero (2024-01-23 10:07:14)
Please use [code][/code] tags, not "quote" tags. Edit your post in this regard.
That's the point of vfio, the device is "removed" from the host and only available for the guest.
You reasonably would however remove the nvidia modules from the initramfs
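Concretely (a sketch based on the MODULES line quoted earlier in the thread), that means trimming the nvidia entries out of /etc/mkinitcpio.conf so only the vfio modules remain:

```
MODULES=(vfio_pci vfio vfio_iommu_type1)
```

followed by rebuilding the initramfs with mkinitcpio -P (or whichever preset you normally use).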
Please use [code][/code] tags, not "quote" tags. Edit your post in this regard.
That's the point of vfio, the device is "removed" from the host and only available for the guest.
You reasonably would however remove the nvidia modules from the initramfs
Thanks, I edited the post with these changes, and I removed the nvidia modules from the initramfs.
The issue is that, based on the wiki page as well as the MO video, this isn't the expected behaviour.
According to the wiki page, upon running "lspci -nnk -d xxxx:xxxx", you should see a message similar to:
06:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
Kernel driver in use: vfio-pci
Kernel modules: nouveau nvidia
The key point is that it lists vfio-pci as the kernel driver in use. I'm not seeing this: running "lspci -nnk -d 10de:2488" returns nothing, and there is no mention of NVIDIA or vfio anywhere in the lspci output.
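A way to double-check driver binding without relying on lspci is to read the device's sysfs "driver" symlink directly (the helper name and the optional base-directory argument are mine; 0000:01:00.0 is the address from the earlier output):

```shell
#!/bin/sh
# Print which kernel driver, if any, is bound to a PCI device, by resolving
# /sys/bus/pci/devices/<addr>/driver. If the sysfs entry is missing, either
# no driver is bound or the device has disappeared from the bus entirely.
pci_driver() {
    dev="$1"
    base="${2:-/sys/bus/pci/devices}"
    if [ -e "$base/$dev/driver" ]; then
        basename "$(readlink -f "$base/$dev/driver")"
    else
        echo "no driver bound (or device absent)"
    fi
}
# Usage: pci_driver 0000:01:00.0   -> e.g. "vfio-pci" on a working setup
```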
Edit: By extension, the GPU isn't visible in VirtManager when trying to add a device.
Last edited by Azero (2024-01-22 23:29:25)
Please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
and the actual "lspci -k" output when passing through the GPU
Please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
and the actual "lspci -k" output when passing through the GPU
lspci: http://0x0.st/HGq5.txt
journalctl: https://0x0.st/HGqn.txt
Jan 23 04:02:15 nickp kernel: nvidia: loading out-of-tree module taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 23 04:02:15 nickp kernel: nvidia: module license taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Jan 23 04:02:15 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 236
Jan 23 04:02:15 nickp systemd-modules-load[337]: Failed to insert module 'nvidia_uvm': No such device
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:17 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:18 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510
Why are you still (explicitly?) loading this? Make sure to remove it from the initramfs.
Jan 23 04:02:15 nickp kernel: nvidia: loading out-of-tree module taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 23 04:02:15 nickp kernel: nvidia: module license taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Jan 23 04:02:15 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 236
Jan 23 04:02:15 nickp systemd-modules-load[337]: Failed to insert module 'nvidia_uvm': No such device
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:17 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:18 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510
Why are you still (explicitly?) loading this? Make sure to remove it from the initramfs.
OK, that was strange; I did remove all mentions of nvidia from the conf file.
That being said, I managed to solve the issue: it was caused by a udev rules file containing lines that removed the devices when present. I'm not sure how it got there. (Path: /etc/udev/rules.d/00-remove-nvidia.rules)
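For anyone hitting the same symptom: a rule along these lines (a hypothetical reconstruction; the actual file contents weren't posted) would silently detach the card at boot, and a recursive grep over the udev rules directories is a quick way to find such stragglers (the helper name and directory argument are mine):

```shell
#!/bin/sh
# Hypothetical example of what a 00-remove-nvidia.rules file might contain.
# Setting a PCI device's sysfs "remove" attribute to 1 hot-unplugs it:
#
#   ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{remove}="1"
#
# Search a rules directory for any rule mentioning "remove":
find_remove_rules() {
    dir="${1:-/etc/udev/rules.d}"
    grep -rn -i 'remove' "$dir" 2>/dev/null
}
# Usage: find_remove_rules; find_remove_rules /usr/lib/udev/rules.d
```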
You added and forgot about it?
https://wiki.archlinux.org/title/hybrid … udev_rules
You added and forgot about it?
https://wiki.archlinux.org/title/hybrid … udev_rules
Yes, most likely.