
#1 2023-08-20 03:09:41

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

GPU passthrough

I followed the wiki guide for passing through one of two identical GPUs (https://wiki.archlinux.org/title/PCI_pa … _host_GPUs) and ended up in an odd position.

I have two Nvidia Quadro M4000s (it's what I had) and would like to pass one through to a Windows VM. Everything seems to work except:

If I load the Nvidia modules first in mkinitcpio.conf, I can see the screen during boot for the LUKS password prompt, but vfio-pci does not get the second GPU.
If I load the vfio modules first, the screen stays blank when I need to type the LUKS password (since Nvidia is not loaded), but once I type in the password blind, everything works, VFIO included.

mkinitcpio.conf:

MODULES=(vmd nvidia nvidia_modeset nvidia_uvm nvidia_drm vfio_pci vfio vfio_iommu_type1) - this shows my screen, but no VFIO
MODULES=(vmd vfio_pci vfio vfio_iommu_type1 nvidia nvidia_modeset nvidia_uvm nvidia_drm) - blank screen at boot, but VFIO works
FILES=(/usr/local/bin/vfio-pci-override.sh)
HOOKS=(base udev autodetect keyboard encrypt lvm2 keymap consolefont block filesystems modconf fsck)

/boot/loader/entries/arch-lts.conf

title   Arch Linux
linux   /vmlinuz-linux-lts
initrd  /intel-ucode.img
initrd  /initramfs-linux-lts.img
options cryptdevice=UUID=xyz:luks root=/dev/mapper/vg0-root rw quiet splash intel_iommu=on

/usr/local/bin/vfio-pci-override.sh

#!/bin/sh

DEVS="0000:08:00.0 0000:08:00.1"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
    done
fi

modprobe -i vfio-pci

Is there an order I can put the modules/hooks in that lets vfio-pci claim the second GPU while still showing the screen when I need to enter the LUKS password (and re-enter it when I get it wrong)?


#2 2023-08-20 08:21:13

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

You're probably booting on 0000:08:00.0, did you try to pass through the other GPU?


#3 2023-08-20 11:58:25

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

To check this I tried the "Passthrough all GPUs but the boot GPU" script instead, but I get the same issue.

Once again, everything works fine once it's booted - it's just during the LUKS password phase where the screen is black.

#!/bin/sh

for i in /sys/bus/pci/devices/*/boot_vga; do
    if [ "$(cat "$i")" -eq 0 ]; then
        GPU="${i%/boot_vga}"
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        USB="$(echo "$GPU" | sed -e "s/0$/2/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
        if [ -d "$USB" ]; then
            echo "vfio-pci" > "$USB/driver_override"
        fi
    fi
done

modprobe -i vfio-pci

It very much looks like the screen is black at the LUKS prompt because the Nvidia driver is not loaded, but if I load the Nvidia driver first, vfio-pci cannot claim a graphics card that the Nvidia driver has already bound?
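To see that state at a glance, a small read-only sketch along these lines prints, for every device with a boot_vga attribute, the boot_vga flag, the bound driver and the driver_override value. The function name gpu_report and its optional sysfs-root argument are made up for illustration (the argument only exists so the function can be pointed at a scratch directory); none of this comes from the wiki script.

```shell
#!/bin/sh
# Read-only sketch: report boot_vga, bound driver and driver_override for
# every PCI device that exposes a boot_vga attribute.
# gpu_report and its optional sysfs-root argument are hypothetical names.
gpu_report() {
    sysfs="${1:-/sys}"
    for f in "$sysfs"/bus/pci/devices/*/boot_vga; do
        [ -e "$f" ] || continue          # the glob matched nothing
        dev="${f%/boot_vga}"
        if [ -L "$dev/driver" ]; then
            # "driver" is a symlink to the driver's directory when bound
            drv="$(basename "$(readlink "$dev/driver")")"
        else
            drv="none"
        fi
        override="$(cat "$dev/driver_override" 2>/dev/null)"
        echo "${dev##*/} boot_vga=$(cat "$f") driver=$drv override=${override:-none}"
    done
}
```

Calling gpu_report with no argument reads the live /sys. An unset driver_override typically reads back as (null).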


#4 2023-08-20 12:12:14

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,001

Re: GPU passthrough

It's supposed to be a driver override, so I expect it will work.

Let's verify which GPU is seen as boot VGA.

What is the output of

$ for i in /sys/bus/pci/devices/*/boot_vga; do echo $i ; done


#5 2023-08-20 12:21:46

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

for i in /sys/bus/pci/devices/*/boot_vga; do echo $i ; done

/sys/bus/pci/devices/0000:01:00.0/boot_vga
/sys/bus/pci/devices/0000:08:00.0/boot_vga


#6 2023-08-20 12:28:19

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

The content of that file matters, notably wrt that script:

tail /sys/bus/pci/devices/*/boot_vga


#7 2023-08-20 12:29:58

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

tail /sys/bus/pci/devices/*/boot_vga

==> /sys/bus/pci/devices/0000:01:00.0/boot_vga <==
1

==> /sys/bus/pci/devices/0000:08:00.0/boot_vga <==
0


#8 2023-08-20 12:41:44

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

0000:01:00.0 is the boot device, 0000:08:00.0 should™ be available for vfio.

Try a slightly adjusted script

#!/bin/bash
set +x
DEVS="0000:08:00.0 0000:08:00.1"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
        tail -v /sys/bus/pci/devices/$DEV/driver_override
    done
fi

modprobe -v -i vfio-pci

Post the output and compare the outputs of

lspci -knns 0000:01:00.0
lspci -knns 0000:08:00.0

before and after running it.


#9 2023-08-20 18:07:47

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

Before running the script (with Nvidia after VFIO in MODULES)

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia

After running the script (with Nvidia after VFIO in MODULES)

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia

New script, with Nvidia before VFIO in MODULES

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

Last edited by rasmuoy (2023-08-20 18:37:10)


#10 2023-08-20 18:41:26

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

"With Nvidia After VFIO in Modules": both outputs have 08:00.0 on vfio-pci (which is not much of a surprise). "NVIDIA before VFIO in modules": both GPUs are on nvidia. Is the (single) output for that condition from before or after running the script, and most importantly: what is the actual output of the script?


#11 2023-08-20 18:48:30

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

It's the same both before and after running the script (and rebooting with the script)

Output from the script:

==> /sys/bus/pci/devices/0000:08:00.0/driver_override <==
vfio-pci
==> /sys/bus/pci/devices/0000:08:00.1/driver_override <==
vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst 

But it leaves "Kernel driver in use" as nvidia.


#12 2023-08-20 18:57:40

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

Unload vfio-pci first, then re-run the script.
It should also have printed more because of the "set -x" … except I typo'd "set +x". That doesn't matter, though: the relevant overrides get applied, so the script works in principle.
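For reference, the difference between the two flags in a minimal sketch (demo_trace is a made-up name): set -x switches command tracing on, so each command is echoed to stderr before it runs, and set +x switches it off again.

```shell
#!/bin/sh
# Minimal illustration of shell tracing; demo_trace is a hypothetical name.
demo_trace() {
    set -x                  # from here on, commands are echoed to stderr
    DEV="0000:08:00.0"      # this assignment shows up in the trace
    set +x                  # tracing is switched off again
    echo "tracing is off again for $DEV"   # printed normally, not traced
}

demo_trace
```

With "set +x" at the top of a script, as in the typo, tracing simply stays off and only the commands' own output appears.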


#13 2023-08-20 19:56:07

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

with set -x and unloading vfio-pci:

+ DEVS='0000:08:00.0 0000:08:00.1'
++ ls -A /sys/class/iommu
+ '[' '!' -z 'dmar0
dmar1' ']'
+ for DEV in $DEVS
+ echo vfio-pci
+ tail -v /sys/bus/pci/devices/0000:08:00.0/driver_override
==> /sys/bus/pci/devices/0000:08:00.0/driver_override <==
vfio-pci
+ for DEV in $DEVS
+ echo vfio-pci
+ tail -v /sys/bus/pci/devices/0000:08:00.1/driver_override
==> /sys/bus/pci/devices/0000:08:00.1/driver_override <==
vfio-pci
+ modprobe -v -i vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci-core.ko.zst 
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst 

Re-running lspci still shows the nvidia driver.


#14 2023-08-20 20:10:33

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

grumpf.
Does it help to re-bind the device?

echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind
echo 0000:08:00.1 | sudo tee /sys/bus/pci/devices/0000:08:00.1/driver/unbind
echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe
echo 0000:08:00.1 | sudo tee /sys/bus/pci/drivers_probe

Edit: both devices…
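The same sequence for several devices could be sketched as a loop that always unbinds through each device's own driver link and skips devices that are not bound. rebind_devices and the SYSFS variable are hypothetical names; SYSFS only exists so the logic can be tried against a scratch directory instead of the real /sys. Treat it as a sketch: on a live system the writes need root, and a write can block if the current driver refuses to let go of the device.

```shell
#!/bin/sh
# Sketch of the override -> unbind -> re-probe sequence for a list of PCI
# devices. rebind_devices and SYSFS are hypothetical names for illustration.
SYSFS="${SYSFS:-/sys}"

rebind_devices() {
    for dev in "$@"; do
        node="$SYSFS/bus/pci/devices/$dev"
        [ -d "$node" ] || { echo "no such device: $dev" >&2; return 1; }
        # Ask the PCI core to match only vfio-pci for this device.
        echo "vfio-pci" > "$node/driver_override"
        # Unbind via the device's own driver link, and only if it is bound.
        if [ -e "$node/driver/unbind" ]; then
            echo "$dev" > "$node/driver/unbind"
        fi
        # Trigger a new driver match for the device.
        echo "$dev" > "$SYSFS/bus/pci/drivers_probe"
    done
}
```

As root, it would be called as: rebind_devices 0000:08:00.0 0000:08:00.1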

Last edited by seth (2023-08-20 20:12:43)


#15 2023-08-20 20:59:40

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

I'm probably doing something wrong, but when I try to run the commands, I just get:
bash: /sys/bus/pci/devices/0000:08:00.0/driver/unbind: No such file or directory

Is it just a matter of creating the missing directory and running the commands again, or is something else missing?


#16 2023-08-20 21:15:13

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

cat /sys/bus/pci/devices/0000:08:00.0/vendor
ls /sys/bus/pci/devices/0000:08:00.0/driver


#17 2023-08-20 21:22:25

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

cat /sys/bus/pci/devices/0000:08:00.0/vendor
0x10de

ls /sys/bus/pci/devices/0000:08:00.0/driver
ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/driver': No such file or directory


#18 2023-08-20 22:09:28

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

What does "lspci -knns 0000:08:00.0" show at this point?
Did you try whether the device is actually available for pass-through now?

Sanity check:

ls /sys/bus/pci/devices/0000:01:00.0/driver


#19 2023-08-20 22:25:05

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

ls /sys/bus/pci/devices/0000:01:00.0/driver
0000:01:00.0  bind  module  new_id  remove_id  uevent  unbind

lspci -knns 0000:08:00.0
08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel modules: nouveau, nvidia_drm, nvidia

That is interesting! It is not persistent, so it resets on reboot, but I can run the 'tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind' commands again to get back to this state.

Trying to run unload vfio-pci and running the script does nothing new though.


#20 2023-08-21 07:17:52

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494

Re: GPU passthrough

rasmuoy wrote:

Trying to run unload vfio-pci and running the script does nothing new though.

lspci -knns 0000:08:00.0
echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind
sudo modprobe -r vfio-pci
sudo /usr/local/bin/vfio-pci-override.sh
lspci -knns 0000:08:00.0
echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe
lspci -knns 0000:08:00.0


#21 2023-08-21 13:18:03

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

'lspci -knns 0000:08:00.0'

08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

'echo 0000:08:00.0 | sudo tee /sys/bus/pci/devices/0000:08:00.0/driver/unbind'
Outputs '0000:08:00.0' but seems to hang the terminal: I cannot Ctrl+C or anything to get out of it, and can only close the terminal.

'journalctl -k' says:

kernel: NVRM: Attempting to remove device 0000:08:00.0 with non-zero usage count!
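One possible reason for a non-zero usage count is that some process still has an Nvidia device node open. A rough sketch for listing such processes by walking /proc follows. nvidia_holders and its optional proc-root argument are made-up names, the match on /dev/nvidia* relies on the proprietary driver's usual device node names, and kernel-side users such as nvidia_drm would not show up here.

```shell
#!/bin/sh
# Hypothetical helper: print "PID target" for every open file descriptor
# that points at a /dev/nvidia* node. The proc-root argument exists only
# so the function can be tried against a scratch directory.
nvidia_holders() {
    proc="${1:-/proc}"
    for fd in "$proc"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the opened file; skip unreadable ones.
        target="$(readlink "$fd" 2>/dev/null)" || continue
        case "$target" in
            /dev/nvidia*)
                pid="${fd#"$proc"/}"        # e.g. 1234/fd/7
                echo "${pid%%/*} $target"   # keep only the PID part
                ;;
        esac
    done | sort -u
}
```

Run as root to see other users' processes. The output does not say which of the two GPUs a descriptor belongs to.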

'lspci -knns 0000:08:00.0' at this time says:

08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel modules: nouveau, nvidia_drm, nvidia

'ls /sys/bus/pci/devices/0000:08:00.0/driver'

ls: cannot access '/sys/bus/pci/devices/0000:08:00.0/driver': No such file or directory

The driver directory is gone from the device (it was there before).

'sudo modprobe -r vfio-pci' works just fine.

'sudo /usr/local/bin/vfio-pci-override.sh'

+ DEVS='0000:08:00.0 0000:08:00.1'
++ ls -A /sys/class/iommu
+ '[' '!' -z 'dmar0
dmar1' ']'
+ for DEV in $DEVS
+ echo vfio-pci

The script stops here, again freezing the terminal and not allowing me to do anything but close the terminal window.

cat  /usr/local/bin/vfio-pci-override.sh
#!/bin/bash
set -x
DEVS="0000:08:00.0 0000:08:00.1"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
        tail -v /sys/bus/pci/devices/$DEV/driver_override
    done
fi
sudo modprobe -v -i vfio-pci
insmod /lib/modules/6.1.46-1-lts/kernel/drivers/vfio/pci/vfio-pci.ko.zst 

'lspci -knns 0000:08:00.0'

08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel modules: nouveau, nvidia_drm, nvidia

'echo 0000:08:00.0 | sudo tee /sys/bus/pci/drivers_probe'
Outputs '0000:08:00.0' with no error.

lspci -knns 0000:08:00.0

08:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Quadro M4000] [10de:1153]
        Kernel modules: nouveau, nvidia_drm, nvidia

Finally, a little while after finishing all the steps, 'journalctl -k' shows:

kernel: INFO: task vfio-pci-overri:3109 blocked for more than 122 seconds.
kernel:       Tainted: P           OE      6.1.46-1-lts #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:vfio-pci-overri state:D stack:0     pid:3109  ppid:3108   flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x370/0x12c0
kernel:  ? path_init+0x386/0x3c0
kernel:  ? terminate_walk+0x61/0x100
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  ? driver_set_override+0x7c/0x140
kernel:  driver_set_override+0x94/0x140
kernel:  driver_override_store+0x19/0x30
kernel:  kernfs_fop_write_iter+0x133/0x1d0
kernel:  vfs_write+0x236/0x3f0
kernel:  ksys_write+0x6f/0xf0
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? do_user_addr_fault+0x237/0x580
kernel:  ? exc_page_fault+0x7c/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x69/0xd3
kernel: RIP: 0033:0x7fca09b04664
kernel: RSP: 002b:00007ffeda99e3f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fca09b04664
kernel: RDX: 0000000000000009 RSI: 00005620669ac360 RDI: 0000000000000001
kernel: RBP: 00005620669ac360 R08: 0000000000000000 R09: 0000000000000001
kernel: R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000009
kernel: R13: 00007fca09c3f5c0 R14: 00007fca09bdd0e0 R15: 0000000000000000
kernel:  </TASK>


#22 2023-08-27 22:36:54

rasmuoy
Member
Registered: 2023-07-19
Posts: 15

Re: GPU passthrough

I still cannot make this work, but I appreciate all the help!

I might just get an AMD graphics card for the Linux machine, which will probably improve my experience anyway.


#23 2023-08-28 06:47:17

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,494
