#1 2024-01-22 22:31:10

Azero
Member
Registered: 2024-01-22
Posts: 5

[Solved] NVIDIA GPU disappears from lspci when trying to vfio

I'm currently trying to set up GPU passthrough for a Windows VM, following MentalOutlaw's guides as well as the dedicated ArchWiki page: https://wiki.archlinux.org/title/PCI_pa … h_via_OVMF

However, I'm running into a strange issue: whenever my PC boots without an NVIDIA driver loaded, the GPU disappears from lspci completely.

For starters, my PC specs:

  • OS: Arch Linux

  • Motherboard: MSI MAG X670E TOMAHAWK WIFI

  • Kernel: 64 bit 6.7.0-arch3-1

  • DE: MATE 1.26.1

  • Graphics platform: X-Org

  • CPU: AMD Ryzen 7 7800X3D

  • GPU: AMD Radeon RX 580

  • GPU: NVIDIA GeForce RTX 3070 Lite Hash Rate (This is the GPU I'm trying to passthrough)

  • Memory: 31.3 GiB

Next, I'll show you how things look without enabling vfio:

Running iommu.sh as shown on the wiki page and searching for NVIDIA in less gives:

IOMMU Group 12:
        01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070 Lite Hash Rate] [10de:2488] (rev a1)
        01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
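
For readers without the wiki page at hand, a listing like the one above can be produced with a small script along these lines. This is a sketch modelled on the wiki's approach, not a copy of it; the function name list_iommu_groups and its optional root-directory argument are assumptions, added so the function can be pointed at a test directory instead of /sys.

```shell
#!/bin/sh
# Sketch: print each IOMMU group and the PCI devices it contains.
# list_iommu_groups and its optional ROOT argument are illustrative names.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for g in $(find "$root" -mindepth 1 -maxdepth 1 -type d | sort -V); do
        echo "IOMMU Group ${g##*/}:"
        for d in "$g"/devices/*; do
            [ -e "$d" ] || continue          # group directory with no devices
            addr="${d##*/}"                  # e.g. 0000:01:00.0
            desc=""
            # Use lspci for a readable description when it is installed.
            if command -v lspci >/dev/null 2>&1; then
                desc="$(lspci -nns "$addr" 2>/dev/null)"
            fi
            # Fall back to the bare PCI address if lspci gave nothing.
            printf '\t%s\n' "${desc:-$addr}"
        done
    done
}
```

Running "list_iommu_groups | less" and searching for NVIDIA would reproduce the check described above.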

and doing the same with lspci -nnk gives:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070 Lite Hash Rate] [10de:2488] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GA104 [GeForce RTX 3070 Lite Hash Rate] [1458:404c]
        Kernel modules: nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GA104 High Definition Audio Controller [1458:404c]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

Finally, a quick look at the relevant lines of the grub, mkinitcpio.conf, and vfio.conf files (with the unnecessary comments removed):

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b"
MODULES=(vfio_pci vfio vfio_iommu_type1 nvidia nvidia_modeset nvidia_uvm nvidia_drm)
HOOKS=(base udev autodetect modconf block lvm2 filesystems keyboard fsck)
options vfio-pci ids=10de:2488,10de:228b
softdep nvidia pre: vfio-pci

After running mkinitcpio, rebooting, and running iommu.sh again, IOMMU Group 12 is conspicuously missing:

IOMMU Group 11:
        00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e0]
        00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e1]
        00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e2]
        00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e3]
        00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e4]
        00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e5]
        00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e6]
        00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e7]
IOMMU Group 13:
        02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)

Neofetch no longer shows the NVIDIA GPU (only a "GPU: AMD ATI 16:00.0 Raphael" entry above my regular 580 GPU), lspci doesn't mention NVIDIA anywhere, etc.

The only clue I have is dmesg, which gives the following when grepping for vfio:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/volgroup0-lv_root rw loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b
[    0.027382] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/volgroup0-lv_root rw loglevel=3 amd_iommu=on vfio-pci.ids=10de:2488,10de:228b
[    3.267976] VFIO - User Level meta-driver version: 0.3
[    3.277201] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
[    3.277323] vfio_pci: add [10de:2488[ffffffff:ffffffff]] class 0x000000/00000000
[    3.323851] vfio_pci: add [10de:228b[ffffffff:ffffffff]] class 0x000000/00000000
[    7.000569] vfio-pci 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none

as well as a series of error messages from the nvidia driver:

[    3.339015] nvidia: loading out-of-tree module taints kernel.
[    3.339020] nvidia: module license 'NVIDIA' taints kernel.
[    3.339021] Disabling lock debugging due to kernel taint
[    3.339023] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    3.339023] nvidia: module license taints kernel.
[    3.620913] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[    3.620916] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    3.621653] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    3.621653] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    3.621654] NVRM: No NVIDIA devices probed.
[    3.621742] nvidia-nvlink: Unregistered Nvlink Core, major device number 236
[    3.964131] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[    3.964135] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    3.964901] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    3.964902] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    3.964902] NVRM: No NVIDIA devices probed.

Is there anything I have overlooked or anything I need to find to potentially fix this?

P.S. This is my first time posting here, so I apologize if my formatting isn't great.

Last edited by Azero (2024-01-23 10:07:14)

#2 2024-01-22 22:55:08

seth
Member
Registered: 2012-09-03
Posts: 51,322

Please use [code][/code] tags, not "quote" tags. Edit your post in this regard.

That's the point of vfio: the device is "removed" from the host and only available to the guest.
You should, however, remove the nvidia modules from the initramfs.

#3 2024-01-22 23:28:33

Azero
Member
Registered: 2024-01-22
Posts: 5

Thanks, I edited the post with these changes, and I removed the nvidia modules from the initramfs.

The issue is that, going by the wiki page as well as the MO video, this isn't the expected behaviour.

According to the wiki page, running "lspci -nnk -d xxxx:xxxx" should print something similar to:

06:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau nvidia

Special emphasis here is on it listing vfio-pci as the kernel driver in use. I'm not seeing this, running "lspci -nnk -d 10de:2488" returns nothing, and there is no mention of NVIDIA or vfio anywhere within lspci.
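
The check described above can be scripted. The following is a minimal sketch, not something taken from the wiki: driver_in_use is a hypothetical helper that reads "lspci -nnk" output on standard input and prints the driver bound to a given vendor:device ID, "none" if the device is listed without a driver, or "absent" if it is not listed at all (the case reported here).

```shell
#!/bin/sh
# driver_in_use VENDOR:DEVICE  (hypothetical helper, reads lspci -nnk on stdin)
driver_in_use() {
    awk -v id="[$1]" '
        # Print the result for the device block that just ended, if it matched.
        function report() { if (hit) print (drv == "" ? "none" : drv) }
        # An unindented line starts a new device block.
        /^[^ \t]/ {
            report()
            hit = (index($0, id) > 0)
            if (hit) found = 1
            drv = ""
        }
        # Indented detail line naming the bound driver.
        hit && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            drv = $0
        }
        END { report(); if (!found) print "absent" }
    '
}
```

A typical call would be "lspci -nnk | driver_in_use 10de:2488". For a working passthrough setup the expected answer is vfio-pci; "absent" points at the device having been removed from the bus rather than at a binding problem.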

Edit: By extension, the GPU isn't visible in VirtManager when trying to add a device.

Last edited by Azero (2024-01-22 23:29:25)

#4 2024-01-23 08:31:23

seth
Member
Registered: 2012-09-03
Posts: 51,322

Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

and the actual "lspci -k" output when passing through the GPU

#5 2024-01-23 09:06:10

Azero
Member
Registered: 2024-01-22
Posts: 5

lspci: http://0x0.st/HGq5.txt
journalctl: https://0x0.st/HGqn.txt

#6 2024-01-23 09:45:44

seth
Member
Registered: 2012-09-03
Posts: 51,322

Jan 23 04:02:15 nickp kernel: nvidia: loading out-of-tree module taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jan 23 04:02:15 nickp kernel: nvidia: module license taints kernel.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Jan 23 04:02:15 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:15 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 236
Jan 23 04:02:15 nickp systemd-modules-load[337]: Failed to insert module 'nvidia_uvm': No such device
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:17 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:17 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Jan 23 04:02:18 nickp kernel: NVRM: No NVIDIA GPU found.
Jan 23 04:02:18 nickp kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 510

Why are you still (explicitly?) loading this? Make sure to remove it from the initramfs.
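
The systemd-modules-load line in that journal excerpt suggests the modules may also be requested outside the initramfs, for example from a file under /etc/modules-load.d. A hedged sketch for locating leftover references follows; find_module_refs is a hypothetical helper, and the paths in the usage note are the usual Arch defaults rather than anything confirmed in this thread.

```shell
#!/bin/sh
# find_module_refs PATTERN PATH...  (hypothetical helper)
# Lists files that mention PATTERN outside of comments.
find_module_refs() {
    pattern="$1"; shift
    for p in "$@"; do
        [ -e "$p" ] || continue      # skip paths that do not exist
        # Match PATTERN only when no '#' precedes it on the line.
        grep -rlE "^[^#]*$pattern" "$p" 2>/dev/null
    done
    return 0                         # empty output means nothing was found
}
```

For example: "find_module_refs nvidia /etc/mkinitcpio.conf /etc/modules-load.d /etc/modprobe.d". A softdep line in vfio.conf will show up too and is harmless. To inspect the image itself, "lsinitcpio /boot/initramfs-linux.img | grep nvidia" lists any nvidia files that were actually packed into it.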

#7 2024-01-23 10:06:50

Azero
Member
Registered: 2024-01-22
Posts: 5

Ok that was strange, I did remove all mentions of NVIDIA from the conf file.

That being said, I managed to solve the issue, which seems to have been caused by a udev rules file which contained lines that removed the devices if present. I'm not sure how it got there. (Path: /etc/udev/rules.d/00-remove-nvidia.rules)
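
A rule file like that is easy to miss. As a sketch of how one might look for such rules, the helper below (the name find_pci_remove_rules is made up for illustration) searches udev rule directories for lines that write 1 to a device's sysfs "remove" attribute, which detaches the device from the PCI bus and would explain it vanishing from lspci. The contents of the actual file were not posted, so the search string is an assumption about what such a rule looks like.

```shell
#!/bin/sh
# find_pci_remove_rules DIR...  (hypothetical helper)
# Prints file:line:text for every rule that sets ATTR{remove}="1".
find_pci_remove_rules() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue    # skip directories that do not exist
        # -F: fixed string, -n: line numbers, -H: always show the file name.
        grep -rnHF 'ATTR{remove}="1"' "$dir" 2>/dev/null
    done
    return 0                         # empty output means no such rule
}
```

For example: "find_pci_remove_rules /etc/udev/rules.d /usr/lib/udev/rules.d". After deleting or renaming an offending file, a reboot should bring the device back.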

#8 2024-01-23 10:09:41

seth
Member
Registered: 2012-09-03
Posts: 51,322

#9 2024-01-23 10:13:29

Azero
Member
Registered: 2024-01-22
Posts: 5

seth wrote:

Yes, most likely.
