
#1 2021-03-22 01:07:49

hnsl
Member
Registered: 2021-03-22
Posts: 3

Nvidia eGPU graphics broken after update

So I have an Acer Swift 5 with an eGPU (GeForce RTX 3080). This works fairly badly in Arch, but I managed to get it somewhat working with a custom Xorg config that looks like this:

Section "Module"
        Load "modesetting"
EndSection

#Section "Device"
#       Identifier "intel"
#       Driver "modesetting"
#EndSection

Section "Device"
        Identifier "nvidia"
        Driver "nvidia"
        BusID "PCI:4:0:0"
        Option "AllowEmptyInitialConfiguration"
        Option "AllowExternalGpus"
        Option "Coolbits" "28"
EndSection

Section "ServerFlags"
        Option "AutoAddGPU" "off"
EndSection

You see that commented-out section? That's what I have to uncomment (while commenting out the other Device section), followed by a reboot, whenever I switch Arch from desktop mode using only my eGPU to using my laptop as a laptop. (For reference, in Windows you just disconnect the cable.) I've tried using bumblebee, but I had so many issues that I gave up; it just seemed way too complicated. Bumblebee seems to be designed for GPU switching for power saving and not for plug-and-play eGPUs, although it could maybe work in theory if you were an expert at Xorg config. (I'm not, and I'd prefer to just delete my config and have the system do the sensible thing by default.)
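A rough sketch of how that manual decision could be scripted, assuming the eGPU always shows up at PCI address 04:00.0 (matching BusID "PCI:4:0:0" above). The names EGPU_SLOT_RE, egpu_present and pick_device are made up for illustration, and the script only reports which Device section should be active; it does not edit the Xorg config or avoid the restart.

```shell
#!/bin/sh
# Sketch only: decide which Xorg "Device" section should be enabled,
# based on whether the eGPU is visible on the PCI bus.
# Assumption: the eGPU sits at PCI address 04:00.0.

EGPU_SLOT_RE='04:00\.0'

# Reads `lspci` output on stdin; succeeds if a device is listed at the
# eGPU slot. The PCI domain prefix ("0000:") is optional in lspci output.
egpu_present() {
    grep -Eq "^([0-9a-f]{4}:)?${EGPU_SLOT_RE} "
}

# Reads `lspci` output on stdin and prints the Identifier of the
# Device section to enable: "nvidia" with the eGPU, "intel" without.
pick_device() {
    if egpu_present; then
        echo nvidia
    else
        echo intel
    fi
}
```

It would be used as `lspci | pick_device`, for example from a script that runs before the display manager starts.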

Anyway, so I upgraded my system today (to fix an unrelated kernel segfault that happened from time to time) and I was greeted with this wonderful Xorg core dump:

mar 22 01:33:32 keff /usr/lib/gdm-x-session[481]: Xorg: ../xserver/dix/privates.c:384: dixRegisterPrivateKey: Assertion `!global_keys[type].created' failed.
mar 22 01:33:32 keff /usr/lib/gdm-x-session[481]: (EE)
mar 22 01:33:32 keff /usr/lib/gdm-x-session[481]: (EE) Backtrace:
mar 22 01:33:32 keff /usr/lib/gdm-x-session[481]: (EE) 0: /usr/lib/Xorg (xorg_backtrace+0x53) [0x5567256c6f63]
mar 22 01:33:32 keff /usr/lib/gdm-x-session[481]: (EE) 1: /usr/lib/Xorg (0x556725580000+0x151da5) [0x5567256d1da5]

I tried downgrading packages to go back, but when I do that mkinitcpio cannot find the nvidia module anymore?

After adding Option "AutoAddGPU" "off", X11 seems to start again using the actual eGPU, but the performance is horrible. There's latency when I type, and screen refreshes seem to take forever. Plenty of tearing. And I can no longer use CUDA or any hardware acceleration. I'm actually not sure my eGPU is even used; it might just be running in bypass to the external screen, using my internal GPU. Not sure how to check that though? I think it's not, because when it did that before, it would just completely hang trying to support 4K with no video memory. But then it could have been using bypass from my internal Intel GPU rather than the internal Nvidia chip.

My lspci:

~> lspci | grep -E "VGA|3D"
0000:00:02.0 VGA compatible controller: Intel Corporation Iris Xe Graphics (rev 01)
0000:01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce MX350] (rev a1)
0000:04:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)

Not sure if it's worth fixing. Thinking about going back to Windows now where graphics just works and is plug and play, exactly like you would expect. I feel like I'm back in 2004, having to change x11 config just to run my desktop environment. Suggestions welcome.

Last edited by hnsl (2021-03-23 20:39:40)


#2 2021-03-22 08:06:29

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,707

Re: Nvidia eGPU graphics broken after update

Try the LTS kernel and nvidia-lts. I wouldn't rule out a regression in the 5.11 kernel with regard to PRIME sync.

Last edited by V1del (2021-03-22 08:07:10)


#3 2021-03-22 22:24:32

hnsl
Member
Registered: 2021-03-22
Posts: 3

Re: Nvidia eGPU graphics broken after update

Good suggestion. I tried installing the LTS kernel and nvidia-lts, but it only solved some minor problems, like my laptop screen not turning off and all laptop input staying active when I closed the lid.

So I tried downgrading my system completely to 2021-02-01 and this fixed most of the major issues and I can use CUDA again, and xorg does exactly what I expect again. However it broke my terminal emulator for some reason (?!). Sadly things are still slow and choppy as hell. GPU video acceleration no longer works for some reason (this worked before for sure).

Now I'm wondering if the nvidia update contained some firmware that poisoned my GPU, because performance is choppy in Windows as well. In Arch it's actually much better: I can move desktop windows around without any lag, but there's constant freezing when I type, and any 3D game renders very slowly and looks choppy even when the FPS is OK. Hilariously, glxgears only gives me about 36 FPS on my GeForce RTX 3080 when I resize it to my full 4K screen. And even when games etc. report 30/60 FPS, it looks FAR from it; on the actual screen it looks more like 5 FPS, as if the GPU is really struggling to move the painted frames to the screen. Whenever I type in Chrome, my computer sometimes freezes for 100-400 ms to redraw the window.

I tried doing some crypto mining just to verify performance, and this works with a good MH rate (this was completely broken after the update), but it turns all windows into a literal slideshow. It looks exactly like images loading in shitty movies, where the computer takes a long time to draw the image from the top down. Maybe this is nvidia deploying poisoned firmware to break the performance of any GPU that is detected being used for mining (like they said they would)?

Last edited by hnsl (2021-03-22 22:25:33)


#4 2021-03-23 20:38:25

hnsl
Member
Registered: 2021-03-22
Posts: 3

Re: Nvidia eGPU graphics broken after update

Okay, so I fixed the problem. This was tricky because I had a number of issues stacked on top of each other.

First of all, like you said, there's probably a regression in the kernel related to PRIME sync; the revert solved that. For anyone who finds this thread: I basically put this in my /etc/pacman.conf so I can pin my system to an exact date. I will keep doing this, since things are apparently quite unstable, so I can stay at one known "working" date until a newer date proves stable.

[core]
SigLevel = PackageRequired
Server=https://archive.archlinux.org/repos/2021/02/15/$repo/os/$arch

[extra]
SigLevel = PackageRequired
Server=https://archive.archlinux.org/repos/2021/02/15/$repo/os/$arch

[community]
SigLevel = PackageRequired
Server=https://archive.archlinux.org/repos/2021/02/15/$repo/os/$arch
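Since the same Server line is repeated for each repo and has to be changed in three places when moving to a new date, here is a small sketch that prints the line for a given snapshot date. The function name archive_server_line is made up for illustration; the URL layout is the one used in the config above.

```shell
#!/bin/sh
# Sketch only: print the pacman "Server" line for an Arch Linux Archive
# snapshot. $1 is the snapshot date as YYYY-MM-DD.
# $repo and $arch are left literal on purpose; pacman expands them itself.
archive_server_line() {
    date_path=$(printf '%s' "$1" | tr '-' '/')
    printf 'Server=https://archive.archlinux.org/repos/%s/$repo/os/$arch\n' "$date_path"
}
```

For example, `archive_server_line 2021-02-15` prints the Server line shown above. After changing the date in pacman.conf, `pacman -Syyuu` forces a database refresh and allows packages to be downgraded to the snapshot's versions.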

Then I had the fun problem that I was using software rendering in X11. I found out about this by running

glxinfo|grep '\<renderer\>'

which told me I was rendering with LLVM instead of GeForce. Everything made sense at that point. After looking in the logs I saw that X11 couldn't find the path to the nvidia glx server module, and after some googling I realized I had to put this in my X11 conf:

Section "Files"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection

For some reason the upgrade made it so I had to have these paths explicitly in my config, and reverting didn't help. Having to have a custom X11 config so your eGPU works sure is a lot of fun! /s
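For anyone wanting to repeat the software-rendering check after each update, the glxinfo test above could be wrapped roughly like this. It is a sketch that assumes software rendering shows up as llvmpipe, softpipe or swrast in the "OpenGL renderer string" line; the function name renderer_kind is made up for illustration.

```shell
#!/bin/sh
# Sketch only: classify the OpenGL renderer reported by glxinfo.
# Reads `glxinfo` output on stdin and prints one of:
#   software  - a Mesa software rasterizer (e.g. llvmpipe) is in use
#   hardware  - any other renderer string (e.g. a GeForce card)
#   unknown   - no renderer line was found in the input
renderer_kind() {
    line=$(grep -m1 'OpenGL renderer string' || true)
    case "$line" in
        "") echo unknown ;;
        *llvmpipe*|*softpipe*|*swrast*) echo software ;;
        *) echo hardware ;;
    esac
}
```

It would be used as `glxinfo | renderer_kind`; "software" means X11 has fallen back to CPU rendering, which is what happened to me.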

And now my Arch setup actually works better than Windows IMHO, completely smooth desktop rendering, no lag or stutter at all.

