#1 2024-09-19 20:53:45

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Switching from one dedicated GPU to another

Hello.

I want to switch my system from using my RTX 4060 in PCIe slot 3 to the GTX 970 in PCIe slot 4,
so that I can pass the 4060 through to a Windows virt-manager VM while still using the GTX 970 on the host.

I tried simply isolating the GPU using gpu-passthrough-manager, thinking Arch would pick up on that and use 970 instead,
but it still tried to use 4060 despite it being isolated with VFIO, so I had to boot into a live USB to remove the VFIO kernel parameter.

Can I somehow switch from 4060 to 970 without physically switching them around,
and, since I'd like to keep using Wayland on the host, without using optimus-manager?

Thanks.

#2 2024-09-19 22:26:55

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

"it still tried to use 4060 despite it being isolated with VFIO"

What is "it"? And how do you conclude this? You do have an output wired to the GTX970?
Can you still boot the multi-user.target (2nd link below)?

Please post your complete system journal for a boot w/ the 4060 passed through, eg.

sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st

for the previous (-1) one.

Btw., unless you're booting a UKI, you don't need a live distro to remove the kernel parameter; you can likely edit the kernel command line from the bootloader.

#3 2024-09-19 22:41:16

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

By "it" I meant basically the entire system. Neither the bootloader, nor Plasma, nor the TTYs output anything to the GTX 970.

I do have a monitor wired to the GTX 970 through DVI and a TV wired to the RTX 4060 via HDMI.
When both GPUs are in use, the 4060 does the rendering and the image is passed on through the 970's DVI port.
That's what I gathered from "glxinfo" and "glxgears -info", since both GPUs output an image, but only the 4060 is reported.

The system does boot up normally with 4060 isolated though,
I could login to TTY4 blind and run "shutdown now" successfully.

And yeah, I do use UKIs for Secure Boot and LUKS2.

Could you please tell me how I could find out which journal I should post?
Pretty sure I did a couple more boots since I've disabled vfio-pci on RTX 4060,
so the previous one might not be it.

#4 2024-09-19 22:49:30

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

The bootloader happens way before anything gets passed through? Did it still show on the TV?

You'll have to grep the journals for "vfio-pci.ids"

i=1; while ! journalctl -b -${i} | grep 'vfio-pci.ids'; do ((++i)); done;  journalctl -b -${i}
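
A bounded variant of the loop above, as a minimal sketch: the one-liner keeps counting past the oldest stored boot if the pattern never matches, so this version gives up after a fixed number of boots. The function name `find_boot_with`, the `JOURNAL_CMD` override and the default limit of 50 are assumptions made for illustration, they are not part of journalctl.

```shell
# Hypothetical helper: print the offset (e.g. "-3") of the most recent
# previous boot whose journal contains a pattern, or fail after $2 boots.
# JOURNAL_CMD is an illustrative override so the reader command can be
# swapped out; it defaults to journalctl.
find_boot_with() {
    _pattern=$1
    _max=${2:-50}
    _i=1
    while [ "$_i" -le "$_max" ]; do
        # A boot that does not exist produces no output, so grep fails.
        if "${JOURNAL_CMD:-journalctl}" -b "-$_i" 2>/dev/null | grep -q -e "$_pattern"; then
            printf '%s\n' "-$_i"
            return 0
        fi
        _i=$((_i + 1))
    done
    return 1
}

# Usage on the real system:
#   boot=$(find_boot_with 'vfio-pci.ids') && journalctl -b "$boot"
```

It returns non-zero if no boot matches, which would also be the case if the IDs were configured somewhere other than the kernel command line.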

#5 2024-09-20 00:04:28

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

It shows for a few seconds, but freezes right before LUKS password prompt appears.
I can still enter the password, but the image on the TV won't update.

Also, the script didn't seem to find anything, it just eventually went to this

No journal boot entry found from the specified boot (-32).
No journal boot entry found from the specified boot (-33).
No journal boot entry found from the specified boot (-34).

Until I interrupted it.

#6 2024-09-20 07:33:02

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

When you fail to boot w/ vfio-pci.ids set, do you reboot w/ the power button?
Try ctrl+alt+del or https://wiki.archlinux.org/title/Keyboa … el_(SysRq) to preserve the journal.

#7 2024-09-20 13:42:02

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

I think I got it - https://0x0.st/X30j.txt

#8 2024-09-20 14:23:41

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

Sep 20 18:24:27 archlinux kernel: nvidia 0000:04:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Sep 20 18:24:28 archlinux kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:04:00.0 on minor 0

So no surprises there.
But

Sep 20 18:24:54 ABLPHA sddm[770]: Initializing...
Sep 20 18:24:54 ABLPHA sddm[770]: Starting...
Sep 20 18:24:54 ABLPHA sddm[770]: Logind interface found
Sep 20 18:24:54 ABLPHA sddm[770]: Adding new display...
Sep 20 18:24:54 ABLPHA sddm[770]: Loaded empty theme configuration
Sep 20 18:24:54 ABLPHA sddm[770]: Xauthority path: "/run/sddm/xauth_Hywzvm"
Sep 20 18:24:54 ABLPHA sddm[770]: Using VT 2
Sep 20 18:24:54 ABLPHA sddm[770]: Display server starting...
Sep 20 18:24:54 ABLPHA sddm[770]: Writing cookie to "/run/sddm/xauth_Hywzvm"
Sep 20 18:24:54 ABLPHA sddm[770]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_Hywzvm -noreset -displayfd 16
Sep 20 18:24:55 ABLPHA sddm[770]: Failed to read display number from pipe
Sep 20 18:24:55 ABLPHA sddm[770]: Display server stopping...
Sep 20 18:24:55 ABLPHA sddm[770]: Attempt 1 starting the Display server on vt 2 failed
Sep 20 18:24:57 ABLPHA sddm[770]: Display server starting...
Sep 20 18:24:57 ABLPHA sddm[770]: Writing cookie to "/run/sddm/xauth_Hywzvm"
Sep 20 18:24:57 ABLPHA sddm[770]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_Hywzvm -noreset -displayfd 16
Sep 20 18:24:57 ABLPHA sddm[770]: Failed to read display number from pipe
Sep 20 18:24:57 ABLPHA sddm[770]: Display server stopping...
Sep 20 18:24:57 ABLPHA sddm[770]: Attempt 2 starting the Display server on vt 2 failed
Sep 20 18:24:59 ABLPHA sddm[770]: Display server starting...
Sep 20 18:24:59 ABLPHA sddm[770]: Writing cookie to "/run/sddm/xauth_Hywzvm"
Sep 20 18:24:59 ABLPHA sddm[770]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_Hywzvm -noreset -displayfd 16
Sep 20 18:25:00 ABLPHA sddm[770]: Failed to read display number from pipe
Sep 20 18:25:00 ABLPHA sddm[770]: Display server stopping...
Sep 20 18:25:00 ABLPHA sddm[770]: Attempt 3 starting the Display server on vt 2 failed
Sep 20 18:25:00 ABLPHA sddm[770]: Could not start Display server on vt 2
Sep 20 18:32:21 ABLPHA sddm[770]: Signal received: SIGTERM
Sep 20 18:32:21 ABLPHA systemd[1]: sddm.service: Deactivated successfully.
Sep 20 18:32:21 ABLPHA systemd[1]: sddm.service: Consumed 1.472s CPU time, 252.1M memory peak.

sddm fails to start X11, please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General - afaict you can ssh into the system, so secure it from the live system after the failure (otherwise prevent sddm from starting or look at the .old log)
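
To narrow an Xorg log down to the relevant lines, a small filter like the following may help. It is only a sketch: the function name `xorg_problems` is made up, and the log path in the usage line is an assumption (with sddm starting X as root it is commonly /var/log/Xorg.0.log, with Xorg.0.log.old holding the previous run).

```shell
# Hypothetical helper: print only the timestamped warning (WW) and
# error (EE) lines of an Xorg log. Anchoring on the "[  12.345]"
# timestamp skips the marker legend in the log header.
xorg_problems() {
    grep -E '^\[ *[0-9]+\.[0-9]+\] \((EE|WW)\)' "$1"
}

# Usage (path is an assumption):
#   xorg_problems /var/log/Xorg.0.log.old
```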

#9 2024-09-20 14:40:23

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

I believe this is the Xorg log from the same boot as the journal - https://0x0.st/X3Gr.txt

#10 2024-09-20 14:50:14

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

Good:

[    34.222] (II) NVIDIA(G0): NVIDIA GPU NVIDIA GeForce GTX 970 (GM204-A) at PCI:4:0:0
…
[    34.364] (--) NVIDIA(GPU-1): Samsung SMBX2440 (DFP-0): connected
[    34.364] (--) NVIDIA(GPU-1): Samsung SMBX2440 (DFP-0): Internal TMDS
[    34.364] (--) NVIDIA(GPU-1): Samsung SMBX2440 (DFP-0): 330.0 MHz maximum pixel clock

Bad:

[    34.369] (II) NVIDIA(G0): Validated MetaModes:
[    34.369] (II) NVIDIA(G0):     "NULL"
[    34.369] (II) NVIDIA(G0): Virtual screen size determined to be 640 x 480
[    34.378] (WW) NVIDIA(G0): Cannot find size of first mode for Samsung SMBX2440 (DFP-0);
[    34.378] (WW) NVIDIA(G0):     cannot compute DPI from Samsung SMBX2440 (DFP-0)'s EDID.
[    34.378] (==) NVIDIA(G0): DPI set to (75, 75); computed from built-in default

Ugly:

[    34.378] (EE) Screen(s) found, but none have a usable configuration.
…
[    34.378] (EE) Server terminated with error (1). Closing log file.

You've "nvidia_drm.modeset=1", so the problem is the Samsung SMBX2440.
Can you extract an EDID from it (eg. when booting w/ both GPUs or attaching it to the other GPU)?

for OUT in /sys/class/drm/card*-*; do echo "$OUT"; edid-decode "$OUT/edid"; echo "================="; done

This will print all EDIDs. You'll need https://aur.archlinux.org/packages/edid-decode-git for that, but to inject the EDID into the system you actually just need the binary /sys/class/drm/card*/edid file:
https://wiki.archlinux.org/title/Kernel … s_and_EDID
Though for X11 and nvidia you can also add it in an xorg configlet,

Option "CustomEDID" "DFP-0:/var/stuff/samsung_edid.bin"
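
Before injecting a copied EDID blob it can be worth a quick sanity check. The sketch below only tests two well-known properties of EDID data (the fixed 8-byte header 00 FF FF FF FF FF FF 00 and a size that is a multiple of 128 bytes); it does not verify checksums. The function name `check_edid` and the paths in the usage lines are assumptions.

```shell
# Hypothetical helper: basic plausibility check for an EDID blob.
check_edid() {
    _file=$1
    [ -r "$_file" ] || { echo "unreadable: $_file" >&2; return 1; }
    # Read through cat instead of trusting the reported file size,
    # since sysfs attributes do not necessarily report a useful one.
    _size=$(cat "$_file" | wc -c | tr -d ' ')
    if [ "$_size" -eq 0 ] || [ $((_size % 128)) -ne 0 ]; then
        echo "unexpected size: $_size bytes" >&2
        return 1
    fi
    # The first 8 bytes of every EDID are 00 ff ff ff ff ff ff 00.
    _magic=$(od -An -tx1 -N8 "$_file" | tr -d ' \n')
    if [ "$_magic" != "00ffffffffffff00" ]; then
        echo "bad header: $_magic" >&2
        return 1
    fi
    echo "ok: $_file ($_size bytes)"
}

# Usage (both paths are assumptions):
#   check_edid /sys/class/drm/card1-DVI-I-1/edid
#   check_edid /usr/lib/firmware/edid/samsung.bin
```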

#11 2024-09-20 15:46:27

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

Here's the output of the script - https://0x0.st/X3G4.txt

So, if I understand correctly, I need to copy the edid file that reports the monitor correctly and then specify it in the kernel command line?

I do use Wayland in Plasma session though, will that still work if I only modify the xorg configlet?

#12 2024-09-20 15:50:11

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

"will that still work if I only modify the xorg configlet?"

Nope.

/sys/class/drm/card1-DVI-I-1/edid looks ok, though (and the correct output name of the injection would then be "DVI-I-1")

#13 2024-09-20 18:37:26

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

Nothing seems to have changed after installing custom edid, it behaves exactly the same during startup. Xorg log - https://0x0.st/X3kc.txt

Kernel command line

rd.luks.name=7e8b3f21-2d15-46ec-8682-951972ede506=root root=/dev/mapper/root rw rootflags=subvol=/@ drm.edid_firmware=DVI-D-1:edid/samsung.bin intel_iommu=on iommu=pt rd.driver.pre=vfio-pci nvidia_drm.modeset=1

mkinitcpio.conf

FILES=(/usr/lib/firmware/edid/samsung.bin)

I did replug the monitor from DVI-I into DVI-D since I realized that the monitor and the cable aren’t actually DVI-I. And I did copy the file from /sys/class/drm/… and confirmed that the contents are the same.

#14 2024-09-20 19:55:06

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

Check the system journal for whether the EDID actually got applied, or check the in-place EDID by ssh'ing into the "dead" system.
As long as you're testing on X11 anyway, add the CustomEDID to check whether
a) it's picked up by the X11 server
b) that changes anything
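
One possible way to do both checks over ssh, as a sketch rather than a definitive procedure. The connector directory card1-DVI-D-1 is an assumption pieced together from the names mentioned in this thread, and `edid_matches` is a made-up helper name; whether the driver in use reflects an injected EDID in sysfs is not confirmed here.

```shell
# Kernel messages of the current boot that mention EDID
# (run on the affected system):
#   journalctl -k -b | grep -i edid

# Hypothetical helper: compare the EDID a connector currently exposes
# with the blob that was supposed to be injected.
# $1: connector directory under /sys/class/drm, $2: firmware file.
edid_matches() {
    if cmp -s "$1/edid" "$2"; then
        echo "match: $1"
    else
        echo "differs or unreadable: $1"
        return 1
    fi
}

# Usage (both paths are assumptions):
#   edid_matches /sys/class/drm/card1-DVI-D-1 /usr/lib/firmware/edid/samsung.bin
```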

#15 2024-09-21 20:28:47

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

Not sure how to check the journal for edid, so I'll send it here - https://0x0.st/XYsZ.txt

As for X11 config, I don't really know how to write it, so I came up with this

Section "Device"
        Identifier "Card0"
        Option "CustomEDID" "DFP-4:/usr/lib/firmware/edid/samsung.bin"
EndSection

And it got picked up by X11, but nothing changed.

Xorg log - https://0x0.st/XYsq.txt

#16 2024-09-24 12:49:28

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: Switching from one dedicated GPU to another

Update - it works perfectly fine if there's no output connected to 4060 during Arch Linux boot.

I then reconnect the HDMI output to 4060 and the VM behaves as expected.

Still, this is a workaround really. I don't want to move the table and reconnect outputs each time I reboot the PC.

#17 2024-09-24 14:53:13

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,165

Re: Switching from one dedicated GPU to another

Can you swap the card slots?
You can explicitly disable outputs, https://www.kernel.org/doc/Documentation/fb/modedb.rst (eg. "video=eDP-1:d"), or influence which one gets to drive which console, https://raw.githubusercontent.com/torva … /fbcon.rst (eg. "fbcon=map:10"), but I'm not sure how well any of this works for two GPUs using the same driver.
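
The "video=" parameter needs the connector name as the kernel sees it. A sketch for listing the names, relying on the cardN-CONNECTOR directory layout under /sys/class/drm; the function name `list_connectors` and the optional root argument are assumptions for illustration.

```shell
# Hypothetical helper: print "card connector status" for every DRM
# connector found below a sysfs root (default: /sys/class/drm).
list_connectors() {
    _root=${1:-/sys/class/drm}
    for _dir in "$_root"/card*-*; do
        [ -r "$_dir/status" ] || continue
        _name=${_dir##*/}      # e.g. card1-DVI-D-1
        _card=${_name%%-*}     # e.g. card1
        _conn=${_name#*-}      # e.g. DVI-D-1
        printf '%s %s %s\n' "$_card" "$_conn" "$(cat "$_dir/status")"
    done
}

# Usage:
#   list_connectors
# A connector printed as e.g. "HDMI-A-1" would then be disabled with
# "video=HDMI-A-1:d" (that connector name is only an example).
```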

Also try to pass the vfio IDs on the kernel command line and make sure to have the vfio modules in the initramfs to pull the card away as early as possible. But if the bootloader is really affected, none of this will do anything; the behavior is likely dictated by the UEFI at this point, hmm.
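
The IDs for the kernel command line can be read from "lspci -nn", which prints each device's vendor:device pair in square brackets at the end of the line. A minimal sketch that turns such lines into a parameter; the function name `build_vfio_param` and the slot address in the usage line are assumptions, the real address of the 4060 has to be looked up first.

```shell
# Hypothetical helper: read "lspci -nn" lines on stdin and print a
# vfio-pci.ids=... parameter built from the [vvvv:dddd] pairs.
build_vfio_param() {
    # The greedy leading .* makes sed pick the last [xxxx:xxxx] on each
    # line; class codes like [0300] have no colon and are ignored.
    _ids=$(sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -sd, -)
    [ -n "$_ids" ] || return 1
    printf 'vfio-pci.ids=%s\n' "$_ids"
}

# Usage (01:00 is a placeholder for the GPU's bus:slot, which usually
# covers both the VGA and the HDMI audio function):
#   lspci -nn -s 01:00 | build_vfio_param
```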
