I apologize in advance if this is not the correct forum, it seemed like the most appropriate category.
So I have had this installation for a year or two now and have done a successful PCI passthrough of a GeForce GTX 1060 to a Windows guest. There were some Windows-related problems after that, but the gist is that vfio_pci worked exactly as described.
Fast forward to today: I do not own the 1060 anymore. I have an RTX 4070 in my primary PCI slot (isolated to 01:00) and a GTX 1050 which I am attempting to pass through to the guest (isolated to 04:00). I am using the proprietary NVIDIA driver. None of my disk drives are encrypted and I am not booting in CSM mode. I am also not modesetting, since KMS causes framebuffer lockups on my system.
Knowing that I already had an existing setup which was not installed to the initramfs but rather hooked to udev, I went back to my /etc/modprobe.d/vfio.conf and simply changed the vendor and device IDs to the 1050. Upon doing so, it did not hook successfully.
Arch Wiki suggested that I hook to nvidia instead of drm, however, this causes a system lock-up during "Create Static Device Nodes in /dev gracefully".
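For reference, a minimal sketch of what such a vfio.conf might contain. The IDs 10de:1c81 and 10de:0fb9 are the GTX 1050 video and audio functions as they appear in the journal excerpt later in this thread; the `softdep` target (nvidia) is an assumption based on the wiki advice mentioned above, and `vfio_conf` is a hypothetical helper name, not part of any tool.

```shell
# Print a candidate /etc/modprobe.d/vfio.conf for the given vendor:device IDs.
# Assumption: the host driver that must load after vfio-pci is "nvidia".
vfio_conf() {
    ids=$(printf '%s,' "$@")          # join all arguments with commas
    printf 'options vfio-pci ids=%s\n' "${ids%,}"   # drop the trailing comma
    printf 'softdep nvidia pre: vfio-pci\n'
}

# Example: both functions of the GTX 1050 (video and HDMI audio)
# vfio_conf 10de:1c81 10de:0fb9 | sudo tee /etc/modprobe.d/vfio.conf
```

Listing both functions of the card matters because they normally sit in the same IOMMU group and are passed through together.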
After this I tried to load vfio_pci and the two other required modules into the initramfs. However, this causes another lock-up the moment systemd-udev loads. If I remove the line from the kernel params that hooks it to the GTX 1050's device ID, then it boots successfully and it binds to the 1050 thanks to the options in vfio.conf, however, X11 crashes when Trinity Display Manager starts loading. I will see a loading cursor for a couple of seconds before the screen goes black and reverts to 800x600, and any attempt to resuscitate X11 at this point will fail. X will just close instantly with no error message, and it does not even output anything of interest to the journal. At this point nvidia and vfio_pci are both correctly loaded onto their assigned devices, but I cannot do anything with them without having a desktop.
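One way to confirm which driver actually claimed each card is to read the `driver` symlink in sysfs. The following is a rough sketch; `pci_driver` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a directory other than /sys.

```shell
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:04:00.0
# $2: sysfs root (optional, defaults to /sys)
pci_driver() {
    dev=$1
    root=${2:-/sys}
    link=$root/bus/pci/devices/$dev/driver
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example, using the addresses from this thread:
# pci_driver 0000:01:00.0    # host card, expected to show nvidia
# pci_driver 0000:04:00.0    # guest card, expected to show vfio-pci
```

`lspci -nnk` reports the same information in its "Kernel driver in use" line.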
I had double checked that my X config declares the right BusID for my host graphics adapter. It seems this was already done by the NVIDIA XServer Settings GUI previously.
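When checking a BusID by hand, keep in mind that lspci prints bus, device and function in hexadecimal while xorg.conf expects decimal. A small sketch of the conversion follows; `xorg_busid` is a hypothetical helper name.

```shell
# Convert an lspci-style address (hex bus:device.function) to an Xorg BusID.
# $1: address without the domain, e.g. 01:00.0
xorg_busid() {
    bus=${1%%:*}        # text before the first colon
    rest=${1#*:}        # text after the first colon
    dev=${rest%%.*}     # device number
    fn=${rest#*.}       # function number
    # printf %d accepts C-style hex constants, so prefix each part with 0x
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Example: xorg_busid 01:00.0
```

For low bus numbers such as 01:00.0 and 04:00.0 the hex and decimal values are identical, so the difference only shows up from bus 0a upward.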
As a last ditch effort after trying various combinations of different settings being on/off I decided to just start over and remove all of the configuration for vfio, but after doing this, vfio_pci STILL hooks to the GTX 1050 even when not instructed to do so, as long as the module is loaded into the initramfs. I have never followed any guide except for the PCI passthrough via OVMF guide on the Arch Wiki, so this should not be happening. This, predictably, causes X to crash.
At this point, the only thing I have not tried is to use NVreg_GpuBlacklist, but I don't know how to use it and I get the feeling it is unlikely to help me. I haven't ruled out the possibility that X crashing could be a Trinity thing.
Last edited by bonkmaykr (2024-06-11 22:48:15)
I bought shoes from my drug dealer. I dunno what he laced them with, but I've been tripping all day.
Website - KangWorlds - Screw Gravity!
If I remove the line from the kernel params that hooks it to the GTX 1050's device ID, then it boots successfully and it binds to the 1050 thanks to the options in vfio.conf, however, X11 crashes when Trinity Display Manager starts loading.
What kernel parameter and why is there one if you can successfully pick up the device w/ the modprobe.conf?
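To see whether the running kernel was booted with such a parameter, the command line can be inspected directly. This sketch assumes the parameter in question is `vfio-pci.ids=` (the form the wiki guide uses; the thread does not quote the actual line), and `cmdline_vfio_ids` is a hypothetical helper name.

```shell
# Print the ID list passed via vfio-pci.ids= (or vfio_pci.ids=), if any.
# Reads a kernel command line on stdin.
cmdline_vfio_ids() {
    tr ' ' '\n' | sed -n 's/^vfio[-_]pci\.ids=//p'
}

# Example:
# cmdline_vfio_ids < /proc/cmdline
```

If this prints nothing, any binding that still happens comes from modprobe.d rather than the kernel command line.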
Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General and also your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
for that case.
vfio_pci STILL hooks to the GTX 1050 even when not instructed to do so, as long as the module is loaded into the initramfs
(likely) because the modprobe hook picked up the /etc/modprobe.d/vfio.conf - you'll have to regenerate the initramfs after changes to that file.
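A rough way to check this, assuming mkinitcpio and the default image path /boot/initramfs-linux.img (both assumptions; adjust for other kernels or generators). `initramfs_has_vfio_conf` is a hypothetical helper that only inspects a file listing.

```shell
# Succeed if a file listing on stdin contains a vfio-related modprobe.d file.
initramfs_has_vfio_conf() {
    grep -Eq '(^|/)modprobe\.d/[^/]*vfio[^/]*\.conf$'
}

# Example:
# sudo mkinitcpio -P      # regenerate all presets after editing vfio.conf
# lsinitcpio /boot/initramfs-linux.img | initramfs_has_vfio_conf \
#     && echo "a vfio modprobe config is baked into the image"
```

If the check still succeeds after the file was deleted from /etc/modprobe.d, the image predates the change and needs to be rebuilt.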
Okay, so I updated my system yesterday for unrelated reasons and now I can't reproduce that limbo state I was getting where I'd boot and TDM would instantly crash. It either doesn't hook or I get a lockup instantly. It no longer makes any difference how I specify the IDs to pass to vfio_pci. So without getting past that systemd-udev screen I'm pretty much screwed.
This is the best I can provide:
X11 https://pastebin.com/1iAfBBFc
systemd https://pastebin.com/9zxzyY5T
It is booting. I'm able to SSH into the system while the screen is frozen.
vfio_pci STILL hooks to the GTX 1050 even when not instructed to do so, as long as the module is loaded into the initramfs
(likely) because the modprobe hook picked up the /etc/modprobe.d/vfio.conf - you'll have to regenerate the initramfs after changes to that file.
That makes sense.
Last edited by bonkmaykr (2024-06-11 21:51:01)
[ 4.215] (==) Using config file: "/etc/X11/xorg.conf"
Move that away.
Jun 11 16:04:21 bonkmaykr-arch kernel: pci 0000:01:00.0: [10de:2786] type 00 class 0x030000 PCIe Legacy Endpoint
Jun 11 16:04:21 bonkmaykr-arch kernel: pci 0000:01:00.1: [10de:22bc] type 00 class 0x040300 PCIe Endpoint
Jun 11 16:04:21 bonkmaykr-arch kernel: pci 0000:04:00.0: [10de:1c81] type 00 class 0x030000 PCIe Legacy Endpoint
Jun 11 16:04:21 bonkmaykr-arch kernel: pci 0000:04:00.1: [10de:0fb9] type 00 class 0x040300 PCIe Endpoint
Jun 11 16:04:21 bonkmaykr-arch kernel: vfio_pci: add [10de:2786[ffffffff:ffffffff]] class 0x000000/00000000
Jun 11 16:04:21 bonkmaykr-arch kernel: vfio_pci: add [10de:0fb9[ffffffff:ffffffff]] class 0x000000/00000000
Look at the product IDs that get added to vfio, what PCI devices do they belong to?
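The lookup can also be done mechanically. The sketch below reads kernel log text and, for every ID that vfio_pci reports adding, prints the PCI address that announced that ID earlier in the same log. `vfio_id_owners` is a hypothetical helper; it assumes the log line layout shown above and keeps only the last address seen per ID.

```shell
# Map each ID claimed by vfio_pci back to the PCI address that reported it.
# Reads kernel log text on stdin.
vfio_id_owners() {
    awk '
    {
        for (i = 1; i <= NF - 2; i++) {
            # "pci 0000:01:00.0: [10de:2786] ..." : remember the address per ID
            if ($i == "pci" && $(i + 2) ~ /^\[[0-9a-f]+:[0-9a-f]+\]$/) {
                addr = $(i + 1)
                sub(/:$/, "", addr)
                id = substr($(i + 2), 2, length($(i + 2)) - 2)
                owner[id] = addr
            }
            # "vfio_pci: add [10de:2786[ffffffff:ffffffff]] ..." : look the ID up
            if ($i == "vfio_pci:" && $(i + 1) == "add") {
                id = substr($(i + 2), 2, 9)
                print id, ((id in owner) ? owner[id] : "unknown")
            }
        }
    }'
}

# Example:
# journalctl -k -b | vfio_id_owners
```

Run against the six journal lines quoted above, it pairs 10de:2786 with 0000:01:00.0 and 10de:0fb9 with 0000:04:00.1.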
[ 4.218] (--) PCI: (0@0:2:0) 8086:4680:1849:4680 rev 12, Mem @ 0x6422000000/16777216, 0x4000000000/268435456, I/O @ 0x00007000/64
[ 4.218] (--) PCI:*(1@0:0:0) 10de:2786:196e:13cf rev 161, Mem @ 0x53000000/16777216, 0x6000000000/17179869184, 0x6400000000/33554432, I/O @ 0x00006000/128, BIOS @ 0x????????/524288
[ 4.218] (--) PCI: (4@0:0:0) 10de:1c81:1028:11c0 rev 161, Mem @ 0x51000000/16777216, 0x6410000000/268435456, 0x6420000000/33554432, I/O @ 0x00005000/128, BIOS @ 0x????????/524288
[ 4.216] (II) Platform probe for /sys/devices/pci0000:00/0000:00:1b.0/0000:04:00.0/drm/card0
[ 4.252] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[ 4.945] (--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:4:0:0
[ 4.946] (II) NVIDIA(G0): NVIDIA GPU NVIDIA GeForce GTX 1050 (GP107-A) at PCI:4:0:0
Disregard my last post. I made a typo when replacing vfio.conf after I had deleted it yesterday.
This wasn't the issue from before.
Logs for real this time:
X11: https://pastebin.com/BQbhAgwN
systemd: https://pastebin.com/kY98acPF
You can close this thread. I got curious and swapped to SDDM for a moment and the problem went away. This is a Trinity Display Manager bug. It seems to crash when vfio-pci hooks to one of the video cards and nothing is printed to the TDM journal so it was hard to pick up on.
I'll be sending this to the TDE mailing list. Thanks for trying to help.