I followed the wiki guide for passing through one of two identical GPUs (https://wiki.archlinux.org/title/PCI_pa … _host_GPUs) and ended up in an interesting position.
I have two Nvidia M4000s (it's what I had) and would like to share one with a Windows VM. Everything seems to work except:
If I load the Nvidia modules first in mkinitcpio.conf, I can see my screen during boot for the LUKS password prompt, but I don't get vfio-pci passthrough on the second GPU.
If I load the vfio modules first, I can't see the screen when I need to type in the LUKS password (since Nvidia is not loaded yet), but once I type in the password, everything works, VFIO included.
mkinitcpio.conf:
MODULES=(vmd nvidia nvidia_modeset nvidia_uvm nvidia_drm vfio_pci vfio vfio_iommu_type1) - this shows my screen, but no VFIO
MODULES=(vmd vfio_pci vfio vfio_iommu_type1 nvidia nvidia_modeset nvidia_uvm nvidia_drm ) - blank screen at boot, but VFIO works
FILES=(/usr/local/bin/vfio-pci-override.sh)
HOOKS=(base udev autodetect keyboard encrypt lvm2 keymap consolefont block filesystems modconf fsck)
/boot/loader/entries/arch-lts.conf:
title Arch Linux
linux /vmlinuz-linux-lts
initrd /intel-ucode.img
initrd /initramfs-linux-lts.img
options cryptdevice=UUID=xyz:luks root=/dev/mapper/vg0-root rw quiet splash intel_iommu=on
/usr/local/bin/vfio-pci-override.sh:
#!/bin/sh
DEVS="0000:08:00.0 0000:08:00.1"
if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
    done
fi
modprobe -i vfio-pci
Is there some order I can put the modules/hooks in that lets VFIO load, but also lets me see the screen when I need to enter the password (and re-enter it when I get it wrong)?
Offline
You're probably booting on 0000:08:00.0. Did you try to pass through the other GPU?
Offline
To check this I tried the "Passthrough all GPUs but the boot GPU" script, but I get the same issue.
Once again, everything works fine once it's booted. It's only during the LUKS password phase that the screen is black.
#!/bin/sh
for i in /sys/bus/pci/devices/*/boot_vga; do
    if [ $(cat "$i") -eq 0 ]; then
        GPU="${i%/boot_vga}"
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        USB="$(echo "$GPU" | sed -e "s/0$/2/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
        if [ -d "$USB" ]; then
            echo "vfio-pci" > "$USB/driver_override"
        fi
    fi
done
modprobe -i vfio-pci
It very much seems that, because I am not loading the Nvidia driver, the screen is black when I need to enter the LUKS password; but if I load the Nvidia driver first, VFIO cannot claim the graphics card that the Nvidia driver has already bound?
Offline
It's supposed to be a driver override, so I expect it will work.
Let's verify which GPU is seen as the boot VGA device.
What is the output of
$ for i in /sys/bus/pci/devices/*/boot_vga; do echo $i ; done
Offline
for i in /sys/bus/pci/devices/*/boot_vga; do echo $i ; done
/sys/bus/pci/devices/0000:01:00.0/boot_vga
/sys/bus/pci/devices/0000:08:00.0/boot_vga
Offline
The content of that file matters, notably wrt that script:
tail /sys/bus/pci/devices/*/boot_vga
Offline
tail /sys/bus/pci/devices/*/boot_vga
==> /sys/bus/pci/devices/0000:01:00.0/boot_vga <==
1
==> /sys/bus/pci/devices/0000:08:00.0/boot_vga <==
0
Offline
0000:01:00.0 is the boot device, 0000:08:00.0 should™ be available for vfio.
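For anyone reading along, the check above can be wrapped in a small helper. This is only a sketch: `print_boot_vga` is a made-up name, and the optional directory argument is an addition of this example so it can be pointed at a test tree instead of the real sysfs. A `boot_vga` value of 1 marks the GPU the firmware used for boot; 0 marks the others.

```shell
#!/bin/sh
# Hypothetical helper: print the boot_vga flag for every PCI device that has one.
# $1 is the sysfs device directory; it defaults to the real one and exists as a
# parameter only so the function can be tried against a test tree.
print_boot_vga() {
    root="${1:-/sys/bus/pci/devices}"
    for f in "$root"/*/boot_vga; do
        [ -r "$f" ] || continue      # the glob matched nothing
        dev="${f%/boot_vga}"         # strip the file name
        dev="${dev##*/}"             # keep only the PCI address
        if [ "$(cat "$f")" -eq 1 ]; then
            echo "$dev boot GPU"
        else
            echo "$dev not the boot GPU"
        fi
    done
}
```

Called without an argument it reads /sys/bus/pci/devices; on the machine in this thread it would be expected to label 0000:01:00.0 as the boot GPU, matching the tail output above.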
Try a slightly adjusted script
#!/bin/bash
set +x
DEVS="0000:08:00.0 0000:08:00.1"
if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
        tail -v /sys/bus/pci/devices/$DEV/driver_override
    done
fi
modprobe -v -i vfio-pci
Post the output and compare the outputs of
lspci -knns 0000:01:00.0
lspci -knns 0000:08:00.0
before and after running it.
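The before/after comparison comes down to which driver each device is bound to. A minimal sketch of reading that straight from sysfs follows; `bound_driver` is a hypothetical name, and the second argument is an assumption of this example that only exists for testing. As far as I know this is the same information lspci -k reports as "Kernel driver in use": the device's `driver` entry is a symlink to the bound driver, and it is absent when nothing is bound.

```shell
#!/bin/sh
# Hypothetical helper: print the driver bound to a PCI device, or "none".
# $1 = PCI address (e.g. 0000:08:00.0), $2 = sysfs device directory (optional).
bound_driver() {
    dev="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$dev/driver"
    if [ -L "$link" ]; then
        # The link target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

For example, `bound_driver 0000:08:00.0` run before and after the override script shows whether the binding actually changed.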
Offline
Before (with Nvidia After VFIO in Modules)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: vfio-pci
Kernel modules: nouveau, nvidia_drm, nvidia
After (With Nvidia After VFIO in Modules)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: vfio-pci
Kernel modules: nouveau, nvidia_drm, nvidia
New script and NVIDIA before VFIO in modules
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
Last edited by rasmuoy (2023-08-20 18:37:10)
Offline
"With Nvidia After VIFO in Modules" both have 08:00.0 on vfio-pci (what's not much of a surprise), "NVIDIA before VIFO in modules" have both GPUs on nvidia - is the (single) output for that condition before or after running the script and most importantly: what's actually the output of the script?
Offline
"With Nvidia After VIFO in Modules" both have 08:00.0 on vfio-pci (what's not much of a surprise), "NVIDIA before VIFO in modules" have both GPUs on nvidia - is the (single) output for that condition before or after running the script and most importantly: what's actually the output of the script?
It's the same both before and after running the script (and rebooting with the script)
Output from the script:
==> /sys/bus/pci/devices/0000:08:00.0/driver_override <==
vfio-pci
==> /sys/bus/pci/devices/0000:08:00.1/driver_override <==
vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst
But it leaves "Kernel driver in use" as nvidia.
Offline
Unload vfio-pci first, then re-run the script.
It also should have printed more because of the "set -x" … except I typo'd "set +x". It doesn't matter: the relevant overrides get applied, so the script works in principle.
Offline
with set -x and unloading vfio-pci:
+ DEVS='0000:08:00.0 0000:08:00.1'
++ ls -A /sys/class/iommu
+ '[' '!' -z 'dmar0
dmar1' ']'
+ for DEV in $DEVS
+ echo vfio-pci
+ tail -v /sys/bus/pci/devices/0000:08:00.0/driver_override
==> /sys/bus/pci/devices/0000:08:00.0/driver_override <==
vfio-pci
+ for DEV in $DEVS
+ echo vfio-pci
+ tail -v /sys/bus/pci/devices/0000:08:00.1/driver_override
==> /sys/bus/pci/devices/0000:08:00.1/driver_override <==
vfio-pci
+ modprobe -v -i vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci-core.ko.zst
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst
Rerunning lspci still shows the nvidia driver.
Offline
grumpf.
Does it help to re-bind the device?
echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind
echo 0000:08:00.1 | sudo tee /sys/bus/pci/devices/0000:08:00.1/driver/unbind
echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe
echo 0000:08:00.1 | sudo tee /sys/bus/pci/drivers_probe
Edit: both devices…
Last edited by seth (2023-08-20 20:12:43)
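The unbind/probe sequence can be sketched as one function per device. `rebind_device` is a hypothetical name and the optional root argument is an assumption of this example, added only so the function can be exercised against a dummy tree. Each device is unbound through its own `driver` link, and the unbind step is skipped when no driver is bound. On the real sysfs this needs root, and unbinding a device that its driver still has in use may block.

```shell
#!/bin/sh
# Hypothetical helper: unbind a PCI device from its current driver (if any)
# and ask the PCI core to probe for a driver again.
# $1 = PCI address, $2 = sysfs PCI bus directory (optional, for testing).
rebind_device() {
    dev="$1"
    root="${2:-/sys/bus/pci}"
    # The unbind file belongs to the driver of this particular device,
    # so the path uses the same address that is written into it.
    if [ -e "$root/devices/$dev/driver/unbind" ]; then
        echo "$dev" > "$root/devices/$dev/driver/unbind"
    fi
    # Re-probe; if driver_override is set for the device, only the driver
    # named there gets a chance to bind.
    echo "$dev" > "$root/drivers_probe"
}
```

Something like `for d in 0000:08:00.0 0000:08:00.1; do rebind_device "$d"; done`, run as root after the overrides are written, covers both functions of the card.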
Offline
I'm probably doing something wrong, but when I try to run the commands, it just says:
bash: /sys/bus/pci/devices/0000:08:00.0/driver/unbind: No such file or directory
Do I just need to create the missing directory and run the command again, or is something else missing?
Offline
cat /sys/bus/pci/devices/0000:08:00.0/vendor
ls /sys/bus/pci/devices/0000:08:00.0/driver
Offline
cat /sys/bus/pci/devices/0000:08:00.0/vendor
0x10de
ls /sys/bus/pci/devices/0000:08:00.0/driver
ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/driver': No such file or directory
Offline
What is the condition of "lspci -knns 0000:08:00.0" at this point?
Did you try whether the device is actually available for pass-through now?
Sanity check:
ls /sys/bus/pci/devices/0000:01:00.0/driver
Offline
ls /sys/bus/pci/devices/0000:01:00.0/driver
0000:01:00.0 bind module new_id remove_id uevent unbind
lspci -knns 0000:08:00.0
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel modules: nouveau, nvidia_drm, nvidia
That is interesting! It is not persistent, so it resets if I reboot, but I can run the 'tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind' commands again to get it back to this state.
Unloading vfio-pci and re-running the script does nothing new, though.
Offline
lspci -knns 0000:08:00.0
echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind
sudo modprobe -r vfio-pci
sudo /usr/local/bin/vfio-pci-override.sh
lspci -knns 0000:08:00.0
echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe
lspci -knns 0000:08:00.0
Offline
'lspci -knns 0000:08:00.0'
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
'echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind'
Outputs '0000:08:00.0' but seems to hang the terminal: I cannot Ctrl+C or anything to get out of it, and can only close the terminal.
'journalctl -k' says:
kernel: NVRM: Attempting to remove device 0000:08:00.0 with non-zero usage count!
'lspci -knns 0000:08:00.0' at this time says:
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel modules: nouveau, nvidia_drm, nvidia
'ls /sys/bus/pci/devices/0000:08:00.0/driver'
ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/driver': No such file or directory
The driver folder is gone from the device (it was there before).
'sudo modprobe -r vfio-pci' works just fine.
'sudo /usr/local/bin/vfio-pci-override.sh'
+ DEVS='0000:08:00.0 0000:08:00.1'
++ ls -A /sys/class/iommu
+ '[' '!' -z 'dmar0
dmar1' ']'
+ for DEV in $DEVS
+ echo vfio-pci
The script stops here, again freezing the terminal and not allowing me to do anything but close the terminal window.
cat /usr/local/bin/vfio-pci-override.sh
#!/bin/bash
set -x
DEVS="0000:08:00.0 0000:08:00.1"
if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
        tail -v /sys/bus/pci/devices/$DEV/driver_override
    done
fi
sudo modprobe -v -i vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst
'lspci -knns 0000:08:00.0'
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel modules: nouveau, nvidia_drm, nvidia
'echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe'
Outputs '0000:08:00.0' with no error.
lspci -knns 0000:08:00.0
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
Kernel modules: nouveau, nvidia_drm, nvidia
Finally, a little while after finishing all the steps, 'journalctl -k' outputs:
kernel: INFO: task vfio-pci-overri:3109 blocked for more than 122 seconds.
kernel: Tainted: P OE 6.1.46-1-lts #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:vfio-pci-overri state:D stack:0 pid:3109 ppid:3108 flags:0x00000002
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x370/0x12c0
kernel: ? path_init+0x386/0x3c0
kernel: ? terminate_walk+0x61/0x100
kernel: schedule+0x5e/0xd0
kernel: schedule_preempt_disabled+0x15/0x30
kernel: __mutex_lock.constprop.0+0x39a/0x6a0
kernel: ? driver_set_override+0x7c/0x140
kernel: driver_set_override+0x94/0x140
kernel: driver_override_store+0x19/0x30
kernel: kernfs_fop_write_iter+0x133/0x1d0
kernel: vfs_write+0x236/0x3f0
kernel: ksys_write+0x6f/0xf0
kernel: do_syscall_64+0x5d/0x90
kernel: ? do_user_addr_fault+0x237/0x580
kernel: ? exc_page_fault+0x7c/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x69/0xd3
kernel: RIP: 0033:0x7fca09b04664
kernel: RSP: 002b:00007ffeda99e3f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fca09b04664
kernel: RDX: 0000000000000009 RSI: 00005620669ac360 RDI: 0000000000000001
kernel: RBP: 00005620669ac360 R08: 0000000000000000 R09: 0000000000000001
kernel: R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000009
kernel: R13: 00007fca09c3f5c0 R14: 00007fca09bdd0e0 R15: 0000000000000000
kernel: </TASK>
Offline
I still cannot make this work, but I appreciate all the help!
I might just get an AMD graphics card for the Linux machine, which will probably improve my experience anyway.
Offline
You could try to make this happen for the entire IOMMU group and earlier:
https://wiki.archlinux.org/title/PCI_pa … sed_of_GPU
https://wiki.archlinux.org/title/PCI_pa … stallation
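Before trying that, it may help to see which devices share the IOMMU group of the GPU. A rough sketch, assuming the usual sysfs layout where each PCI device has an `iommu_group/devices` directory; `iommu_group_members` is a hypothetical name and the second argument is only there for testing:

```shell
#!/bin/sh
# Hypothetical helper: list every device in the same IOMMU group as a PCI device.
# $1 = PCI address, $2 = sysfs device directory (optional, for testing).
iommu_group_members() {
    dev="$1"
    root="${2:-/sys/bus/pci/devices}"
    for d in "$root/$dev"/iommu_group/devices/*; do
        [ -e "$d" ] || continue      # no IOMMU group, or the glob matched nothing
        echo "${d##*/}"              # print only the PCI address
    done
}
```

`iommu_group_members 0000:08:00.0` prints one address per line; every device it lists would need the vfio-pci override, not only the two functions of the GPU.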
Offline