#1 2018-01-28 12:30:17

jkhsjdhjs
Member
Registered: 2017-09-05
Posts: 39

Binding primary nvidia gpu to vfio

Hey,

I'm currently trying to pass my primary GPU (also selected as primary in the BIOS) through to a Windows VM that I don't have yet, but will eventually set up.

In order to do that I updated my GPU's BIOS to one that supports UEFI, added intel_iommu=on iommu=pt to my kernel parameters, added MODULES=(vfio vfio_iommu_type1 vfio_pci vfio_virqfd) to my /etc/mkinitcpio.conf, and of course regenerated my initramfs afterwards and rebooted.
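
For reference, these changes look roughly like this (a sketch assuming GRUB; the kernel parameter step differs for other bootloaders):

# /etc/default/grub: append the IOMMU parameters, then regenerate grub.cfg
GRUB_CMDLINE_LINUX_DEFAULT="... intel_iommu=on iommu=pt"
grub-mkconfig -o /boot/grub/grub.cfg

# /etc/mkinitcpio.conf: load the vfio modules early
MODULES=(vfio vfio_iommu_type1 vfio_pci vfio_virqfd)

# regenerate the initramfs for all presets
mkinitcpio -P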

Then I created a little script that unbinds all drivers from my NVIDIA GPU and rebinds the virtual console so it lands on my integrated Intel GPU:

#!/usr/bin/env bash

pci_bus_id_nvidia_gpu="0000:01:00.0"
pci_bus_id_nvidia_audio="0000:01:00.1"

device_id_nvidia_gpu="10de 11c0"
device_id_nvidia_audio="10de 0e0b"

# stop xorg
systemctl stop sddm

# unbind virtual console
echo 0 > /sys/class/vtconsole/vtcon1/bind

# unbind efi framebuffer
echo "efi-framebuffer.0" > /sys/bus/platform/devices/efi-framebuffer.0/driver/unbind

# create new vfio-pci devices with ids of gpu and hdmi audio
echo "$device_id_nvidia_gpu" > /sys/bus/pci/drivers/vfio-pci/new_id
echo "$device_id_nvidia_audio" > /sys/bus/pci/drivers/vfio-pci/new_id

# unbind gpu and hdmi audio
echo "$pci_bus_id_nvidia_gpu" > /sys/bus/pci/devices/"$pci_bus_id_nvidia_gpu"/driver/unbind
echo "$pci_bus_id_nvidia_audio" > /sys/bus/pci/devices/"$pci_bus_id_nvidia_audio"/driver/unbind

# bind gpu and hdmi audio to vfio-pci
echo "$pci_bus_id_nvidia_gpu" > /sys/bus/pci/drivers/vfio-pci/bind
echo "$pci_bus_id_nvidia_audio" > /sys/bus/pci/drivers/vfio-pci/bind

# remove created ids
echo "$device_id_nvidia_gpu" > /sys/bus/pci/drivers/vfio-pci/remove_id
echo "$device_id_nvidia_audio" > /sys/bus/pci/drivers/vfio-pci/remove_id

# bind virtual console to internal gpu
echo 1 > /sys/class/vtconsole/vtcon1/bind

# start xorg again
systemctl start sddm
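
To verify that both functions actually ended up on vfio-pci, I can check which driver is bound:

# should show "Kernel driver in use: vfio-pci" for 01:00.0 and 01:00.1
lspci -nnk -s 01:00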

I think the unbinding of the NVIDIA GPU works fine: its screen freezes with the text that was last displayed on it. The vconsole with the text previously shown on the NVIDIA screen also appears on the second screen, but flickers: it constantly switches from tty2 (where I ran the script) to tty1, where the boot messages were shown during boot. tty1 is displayed for about 1 second, then it switches to tty2 for about 0.2 seconds, and this repeats endlessly.

I think that this behavior is caused by sddm constantly crashing and starting again. Here you can find my sddm logs: https://gist.github.com/jkhsjdhjs/b7b75 … dffdc9d9a4

However, if I remove the systemctl start sddm from my script, it shows a frozen vconsole on my Intel screen and keeps using the NVIDIA screen afterwards.

Now I have two questions:

  1. What is causing sddm to crash and what can I do in order to prevent it?

  2. I currently have to run my script from outside of the graphical environment, because when Xorg is killed my script is killed too. What can I do to run the script from my graphical environment and keep the script alive after Xorg is killed?

Thanks in advance!

Last edited by jkhsjdhjs (2018-02-02 16:54:10)


#2 2018-01-29 13:27:16

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,964

Re: Binding primary nvidia gpu to vfio

1. The logs suggest it has to do with authentication.
I'm no systemd/logind wizard, but I do think logind treats the disappearance of the VT it was started/logged in on as breaking its session.

2.

What can I do to run the script from my graphical environment and keep the script alive after Xorg is killed?

That is almost a contradictio in terminis.

Maybe you can solve both issues by running multiple X instances.

Instance A would have sddm and your script.
All other instances could then get their own logind sessions.
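
Something like this is what I have in mind (display and VT numbers are just examples):

# instance A: sddm and your script, on its usual display/VT
# instance B: a second server you start on a free VT, getting its own session
startx /usr/bin/xterm -- :1 vt8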


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#3 2018-01-31 14:39:38

jkhsjdhjs
Member
Registered: 2017-09-05
Posts: 39

Re: Binding primary nvidia gpu to vfio

First of all, thanks for your reply!

But I think systemd-logind isn't causing the problem here.
Apparently X is unable to find any screens. The following logs are written when I run X from the command line as root, after binding my NVIDIA card to vfio: https://gist.github.com/jkhsjdhjs/030ab … cba72dab12
For comparison, these logs are written when I run X from the command line with my NVIDIA card still bound to the proprietary NVIDIA driver: https://gist.github.com/jkhsjdhjs/936a7 … a9f05fb26e

This problem might be related to one I had earlier (and still have): https://bbs.archlinux.org/viewtopic.php?id=229719

Any ideas why X is unable to find my intel screen?

Last edited by jkhsjdhjs (2018-01-31 21:17:35)


#4 2018-01-31 14:43:50

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: Binding primary nvidia gpu to vfio

There's no need to use expiring pastebin services for such small amounts of text; it reduces the long-term usefulness of a thread. Instead, just put the information directly into your post between [code] [/code] tags. Also, please read this note.


No, it didn't "fix" anything. It just shifted the brokeness one space to the right. - jasonwryan
Closing -- for deletion; Banning -- for muppetry. - jasonwryan

aur - dotfiles


#5 2018-01-31 14:47:31

jkhsjdhjs
Member
Registered: 2017-09-05
Posts: 39

Re: Binding primary nvidia gpu to vfio

slithery wrote:

There's no need to use expiring pastebin services for such small amounts of text; it reduces the long-term usefulness of a thread. Instead, just put the information directly into your post between [code] [/code] tags. Also, please read this note.

If you're referring to my script, I understand. But all other pastes aren't small amounts of text. I'll edit my first post and insert the script in code tags. Thanks!


#6 2018-01-31 14:50:38

Slithery
Administrator
From: Norfolk, UK
Registered: 2013-12-01
Posts: 5,776

Re: Binding primary nvidia gpu to vfio

Sorry, I only looked at the first paste. <300 lines is small. :)
More importantly, don't use pastebin, as a lot of us can't access it.




#7 2018-02-02 19:12:11

jkhsjdhjs
Member
Registered: 2017-09-05
Posts: 39

Re: Binding primary nvidia gpu to vfio

jkhsjdhjs wrote:

However, if I remove the systemctl start sddm from my script, it shows a frozen vconsole on my Intel screen and keeps using the NVIDIA screen afterwards.

I just tried it again, and it seems I was wrong here. If I don't start sddm afterwards, the vconsole is visible on my Intel screen, as it should be.

I also tried disconnecting my first monitor from my NVIDIA card, setting my Intel card as primary in the BIOS, and booting. The system boots as usual, but when it reaches sddm no picture is shown; my second monitor just turns off. I guess sddm still renders on my NVIDIA card, but since no monitor is attached, nothing is displayed.
So I tried removing the NVIDIA card from my system and booted again, still with Intel as my primary graphics card. Now sddm works fine with my Intel card and displays the login screen without problems. I was also able to switch to tty2, stop sddm, and start Xorg manually from the vconsole without errors.

So apparently Xorg only detects my Intel screen when no NVIDIA card is installed: it reported "no screens found" when my NVIDIA card was bound to vfio-pci but still installed in the system.

EDIT: Just for clarification:
lspci lists my Intel graphics device as a Display controller; it just seems like Xorg ignores it when my NVIDIA card is installed.
When I'm logged in and KDE is running, my Intel GPU works fine, but I think that's only because of KDE's screen management software. Without kscreen, Xorg wouldn't be able to use my Intel card, except when I remove my NVIDIA card.
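
For reference, this is the check I mean (it lists the VGA/Display class devices):

lspci -nn | grep -Ei 'vga|display|3d'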

EDIT2: xrandr --listproviders outputs the following when logged into KDE:

Providers: number : 2
Provider 0: id: 0x2c5 cap: 0x1, Source Output crtcs: 4 outputs: 6 associated providers: 1 name:NVIDIA-0
Provider 1: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 4 associated providers: 1 name:modesetting

EDIT3: I don't have any Xorg config in /etc/X11/xorg.conf.d related to screens, just a single 00-keyboard.conf.

Last edited by jkhsjdhjs (2018-02-03 17:46:34)


#8 2018-02-03 18:25:15

jkhsjdhjs
Member
Registered: 2017-09-05
Posts: 39

Re: Binding primary nvidia gpu to vfio

Okay, I'm now able to do this with an X restart. To do that, I have to create an Intel drop-in config in /etc/X11/xorg.conf.d after stopping sddm:

Section "Device"
    Identifier      "Intel Graphics"
    Driver          "intel"
    BusID           "PCI:0:2:0"
EndSection

This makes Xorg detect my internal GPU and for some reason also disables recognition of my NVIDIA GPU, but since I want to bind my NVIDIA GPU to vfio-pci anyway, this doesn't matter.
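
A quick way to confirm which GPU Xorg actually picked up is to grep its log (assuming the sddm-launched server logs to /var/log/Xorg.0.log; rootless X logs elsewhere):

grep -E '\((II|EE)\) (intel|NVIDIA)' /var/log/Xorg.0.log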

What I'm trying to accomplish now is to pass my nvidia gpu without having to restart X.

To do this I have to completely remove my NVIDIA GPU from Xorg. So far I've only been able to swap my screens and disable the NVIDIA screen (sample xrandr output follows the script):

#!/usr/bin/env bash

function getName {
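    # print the last whitespace-separated field (the monitor's connector name)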
    echo "$1" | rev | cut -d ' ' -f 1 | rev
}

screens="$(xrandr --listmonitors | tail -n+2)"
primary="$(getName "$(echo "$screens" | grep '\*')")"
secondary="$(getName "$(echo "$screens" | grep -v '\*')")"

if [[ "$primary" -eq "" ]]; then
    echo "ERROR: No primary screen"
    exit 1
fi

# swap nvidia and intel
xrandr --output "$secondary" --primary --left-of "$primary"
# turn nvidia screen off
xrandr --output "$primary" --off

But apparently turning my NVIDIA screen off doesn't completely detach it from Xorg, since I can't rebind my NVIDIA GPU at this point. In other posts I've read that this only works with an open-source driver (nouveau, for example). But is there really no way to do this with the proprietary driver?

EDIT:
I tried running

# rmmod -f nvidia_drm
# rmmod -f nvidia_modeset
# rmmod -f nvidia

after running my screen-swap script above, but the last command never returns. If I try to abort it using Ctrl+C, the kernel starts using 100% CPU on one core. I can't kill that process with SIGKILL or anything else; I guess the kernel just crashes at that point.

I also tried running it before swapping screens, while my NVIDIA screen was still active. In that case my NVIDIA screen turns off after the nvidia_modeset module is removed, and the rest of my system completely hangs within 20 seconds.

Furthermore, before removing the kernel modules, I tried disabling/disconnecting my GPU using

# echo 0 > /sys/bus/pci/devices/0000:01:00.0/enable # gpu still works fine
# echo 0 > /sys/bus/pci/devices/0000:01:00.1/enable # gpu still works fine
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove # command hangs, just like rmmod -f nvidia

EDIT2: I think I'll switch to Wayland soon, so I'll see what that gives me. I won't research this further until then.

Last edited by jkhsjdhjs (2018-02-08 17:31:03)

