Hi,
I'm having a problem running a Plasma Wayland session with the proprietary NVIDIA drivers. The session uses llvmpipe instead of the NVIDIA driver (Info Center shows "Graphics Processor: llvmpipe") and is glitchy to the point of being unusable. Here's the neofetch output from a Wayland session:
OS: Arch Linux x86_64
Host: Alienware m16 R2
Kernel: Linux 6.11.1-zen1-1-zen
Uptime: 35 mins
Packages: 1890 (pacman)
Shell: bash 5.2.37
Display (BOE0C06): 2560x1600 @ 240 Hz (as 2048x1280) in 16″ [Built-in]
Display (27E1QA): 2560x1440 @ 144 Hz in 27″ [External] *
DE: KDE Plasma 6.1.5
WM: KWin (Wayland)
WM Theme: Breeze
Theme: Breeze (Dark) [Qt], Breeze-Dark [GTK2], Breeze [GTK3]
Icons: breeze-dark [Qt], breeze-dark [GTK2/3/4]
Font: Noto Sans (12pt) [Qt], Noto Sans (12pt) [GTK2/3/4]
Cursor: breeze (24px)
Terminal: konsole 24.8.1
Terminal Font: Hack (14pt)
CPU: Intel(R) Core(TM) Ultra 9 185H (22) @ 5.10 GHz
GPU 1: NVIDIA GeForce RTX 4070 Max-Q / Mobile
GPU 2: NVIDIA GeForce RTX 4070 Ti
Memory: 6.82 GiB / 93.81 GiB (7%)
Swap: 886.75 MiB / 1024.00 MiB (87%)
Disk (/): 497.90 GiB / 938.61 GiB (53%) - ext4
Local IP (enp45s0): 192.168.15.167/24
Battery (DELL 1969N476): 100% [AC Connected]
Locale: en_US.UTF-8
Here's a boot log from a run when a Wayland session was started: http://0x0.st/XgC5.txt
I have DRM kernel mode setting enabled:
# cat /sys/module/nvidia_drm/parameters/modeset
Y
And 'fbdev=1' too:
$ cat /etc/modprobe.d/nvidia.conf
options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp
options nvidia_drm modeset=1 fbdev=1
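Both settings can be confirmed at runtime from sysfs rather than from the config file alone. A minimal sketch, assuming the driver exposes `modeset` and `fbdev` under /sys/module/nvidia_drm/parameters (the `modeset` path is the one shown above; `fbdev` living at the same location is an assumption, and reading these files may require root). The function name `show_nvidia_drm_params` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the runtime value of each nvidia_drm parameter.
# The optional directory argument exists only so the function can be pointed
# at a test directory; by default it reads the real sysfs location.
show_nvidia_drm_params() {
    dir="${1:-/sys/module/nvidia_drm/parameters}"
    for param in modeset fbdev; do
        if [ -r "$dir/$param" ]; then
            printf '%s=%s\n' "$param" "$(cat "$dir/$param")"
        else
            # Module not loaded, parameter absent, or insufficient permissions.
            printf '%s=unavailable\n' "$param"
        fi
    done
}
```

Running `show_nvidia_drm_params` as root after boot shows the values the loaded module is actually using.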
I don't load the nvidia modules early, because for some unclear reason that freezes the system at an early stage of boot when the eGPU is connected; even SysRq combinations are ignored.
X11 sessions work fine, but I'm missing some features that Wayland has, namely per-display scaling. I know about the xrandr command, but scaling with it doesn't work well for me.
Thank you!
Last edited by alllexx88 (2024-10-08 07:12:46)
Oct 03 15:51:24 AlexArch kernel: NVRM: The NVIDIA GPU 0000:04:00.0
NVRM: (PCI ID: 10de:2782) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
Oct 03 15:51:24 AlexArch kernel: nvidia 0000:04:00.0: probe with driver nvidia failed with error -1
Oct 03 15:51:24 AlexArch kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Oct 03 15:57:31 AlexArch systemd-coredump[6454]: Process 1453 (kwin_x11) of user 1000 dumped core.
Stack trace of thread 1702:
#0 0x00007dc6a68a53f4 n/a (libc.so.6 + 0x963f4)
#1 0x00007dc6a684c120 raise (libc.so.6 + 0x3d120)
#2 0x00007dc6aa0372a1 _ZN6KCrash19defaultCrashHandlerEi (libKF6Crash.so.6 + 0x62a1)
#3 0x00007dc6a684c1d0 n/a (libc.so.6 + 0x3d1d0)
#4 0x00007dc6a6f952b9 _ZN7QObject11deleteLaterEv (libQt6Core.so.6 + 0x1952b9)
#5 0x00007dc6a883a346 _ZN9KDirWatchD1Ev (libKF6CoreAddons.so.6 + 0x5a346)
#6 0x00007dc6a8834fbe n/a (libKF6CoreAddons.so.6 + 0x54fbe)
#7 0x00007dc6a684e891 n/a (libc.so.6 + 0x3f891)
#8 0x00007dc6a684e95e exit (libc.so.6 + 0x3f95e)
#9 0x00007dc6a88fa8fb _XDefaultIOError (libX11.so.6 + 0x3b8fb)
#10 0x00007dc6a88fd99c _XIOError (libX11.so.6 + 0x3e99c)
#11 0x00007dc6a8903af8 _XReply (libX11.so.6 + 0x44af8)
#12 0x00007dc6942dd408 n/a (libGLX_nvidia.so.0 + 0xa6408)
#13 0x00007dc69428e41a n/a (libGLX_nvidia.so.0 + 0x5741a)
#14 0x00007dc6942db5d1 n/a (libGLX_nvidia.so.0 + 0xa45d1)
#15 0x00007dc6942db9d8 n/a (libGLX_nvidia.so.0 + 0xa49d8)
X11 sessions work fine
Hardly.
https://gitlab.archlinux.org/archlinux/ … -/issues/9 ?
Thank you for your reply seth!
The issue at https://gitlab.archlinux.org/archlinux/ … -/issues/9 looks unrelated. First, I had the same problem before the kernel upgrade to 6.11; second, I have had fbdev and modeset enabled for a long time.
I do get these "NVRM: The NVIDIA GPU 0000:04:00.0" error lines on every boot when the eGPU is connected, but both output from the eGPU to the external monitor and CUDA still work fine. As for "Process 1453 (kwin_x11) of user 1000 dumped core.", this doesn't usually happen. I checked the log from yesterday's last boot, http://0x0.st/XEX6.txt and the current boot log, http://0x0.st/XEXI.txt
Maybe I did something unusual; I don't remember. I had issues on X11 too, with freezes on shutdown/reboot/logout, but they seem to have gone away after the kernel upgrade to 6.11. I also get glitches when scaling the internal display with xrandr, so I don't do that (and it's not as bad as kwin_x11 crashing).
UPD: to get a clearer log, I rebooted and logged in to the Wayland session directly from SDDM (last time I had logged in to X11, logged out, and then logged in to Wayland) and dumped the boot log from that session: http://0x0.st/XE84.txt
UPD2: some more logs: 1) disabled sddm and launched the Plasma Wayland session from a tty with "/usr/lib/plasma-dbus-run-session-if-needed /usr/bin/startplasma-wayland": http://0x0.st/XE8E.txt (the system was even more laggy, with the cursor glitching and leaving "footsteps", and the external monitor wasn't detected; I couldn't get to system info to see which graphics processor it reports). 2) launched sddm on Wayland and the Plasma Wayland session through it: http://0x0.st/XE86.txt (the symptoms are much the same as when running from sddm on Xorg, except that sddm itself was also lagging)
Last edited by alllexx88 (2024-10-04 10:36:02)
Oct 04 12:11:22 AlexArch sddm-helper[1337]: Starting Wayland user session: "/usr/share/sddm/scripts/wayland-session" "/usr/lib/plasma-dbus-run-session-if-needed /usr/bin/startplasma-wayland"
Oct 04 12:11:23 AlexArch kwin_wayland[1392]: No backend specified, automatically choosing drm
Oct 04 12:11:23 AlexArch kwin_wayland[1392]: kwin_scene_opengl: eglInitialize failed
Oct 04 12:11:23 AlexArch kwin_wayland[1392]: kwin_scene_opengl: Error during eglInitialize 12289
Oct 04 12:11:23 AlexArch kwin_wayland[1392]: kwin_scene_opengl: Creating the OpenGL rendering failed: "Could not initialize egl"
Oct 04 12:11:26 AlexArch kwin_wayland_wrapper[1494]: Xwayland glamor: GBM Wayland interfaces not available
Oct 04 12:11:26 AlexArch kwin_wayland_wrapper[1494]: Failed to initialize glamor, falling back to sw
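The number in "Error during eglInitialize 12289" is an EGL error code printed in decimal. In hexadecimal it is 0x3001, which the Khronos EGL headers define as EGL_NOT_INITIALIZED. A small sketch for decoding such values; the function name `egl_error_name` is hypothetical and only a few codes are listed.

```shell
#!/bin/sh
# Hypothetical helper: translate a decimal EGL error code into its name.
# Only a handful of the codes from the Khronos EGL headers are covered.
egl_error_name() {
    case "$(printf '0x%04X' "$1")" in
        0x3000) echo EGL_SUCCESS ;;
        0x3001) echo EGL_NOT_INITIALIZED ;;
        0x3002) echo EGL_BAD_ACCESS ;;
        0x3003) echo EGL_BAD_ALLOC ;;
        0x3008) echo EGL_BAD_DISPLAY ;;
        *)      printf 'unknown (0x%X)\n' "$1" ;;
    esac
}
# Example:
#   egl_error_name 12289
```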
Enable https://wiki.archlinux.org/title/NVIDIA … de_setting - use the "nvidia_drm.modeset=1" kernel parameter (modprobe.conf won't do!) and also nvidia_drm.fbdev=1 because of the 6.11 situation.
EDIT: STAY AWAY FROM FBDEV FOR THE MOMENT! First try "nvidia_drm.modeset=1" only!
https://bbs.archlinux.org/viewtopic.php … 7#p2200177
Also
lsm=landlock,lockdown,yama,integrity,apparmor,bpf
skip apparmor (the entire line) for the moment.
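Whether a parameter actually reached the kernel can be checked against /proc/cmdline after rebooting. A minimal sketch; the function name `has_kernel_param` is hypothetical, and the optional second argument exists only so the check can be run against a file other than /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact parameter is on the kernel
# command line.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each whitespace-separated word on its own line, then require an
    # exact whole-line match so that partial names do not count.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}
# Example:
#   has_kernel_param nvidia_drm.modeset=1 && echo "modeset=1 is on the command line"
```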
Last edited by seth (2024-10-04 12:15:04)
Thank you
Unfortunately this didn't help. I removed the modprobe.d config, removed the "lsm=landlock,lockdown,yama,integrity,apparmor,bpf" option and added "nvidia_drm.modeset=1": http://0x0.st/XEPH.txt Symptom-wise, the system got even more laggy, and Info Center still says "Graphics Processor: llvmpipe". With "nvidia_drm.fbdev=1" also set, it's slightly less laggy (as it was before), with the same llvmpipe in Info Center: http://0x0.st/XEPP.txt
I also upgraded nvidia-dkms to 560.35.03-5 before these experiments.
I don't think anything about the situation will have changed.
Oct 04 15:14:54 AlexArch kwin_wayland[1357]: kwin_scene_opengl: eglInitialize failed
pacman -Qikk nvidia-utils
pacman -Qs nvidia
LD_DEBUG=libs eglinfo -B
Edit: no don't, was pointless idea
Last edited by seth (2024-10-04 13:52:13)
Thanks, the first 2 commands:
$ pacman -Qikk nvidia-utils
Name : nvidia-utils
Version : 560.35.03-5
Description : NVIDIA drivers utilities
Architecture : x86_64
URL : http://www.nvidia.com/
Licenses : custom
Groups : None
Provides : vulkan-driver opengl-driver nvidia-libgl
Depends On : libglvnd egl-wayland egl-gbm
Optional Deps : nvidia-settings: configuration tool [installed]
xorg-server: Xorg support [installed]
xorg-server-devel: nvidia-xconfig [installed]
opencl-nvidia: OpenCL support [installed]
Required By : cuda furmark libglvnd nvidia-dkms nvidia-settings wlroots-git
Optional For : ffmpeg ffmpeg4.4 furmark libvdpau vulkan-icd-loader
Conflicts With : nvidia-libgl
Replaces : nvidia-libgl
Installed Size : 652.24 MiB
Packager : Sven-Hendrik Haase <svenstaro@archlinux.org>
Build Date : Thu Oct 3 06:30:41 2024
Install Date : Fri Oct 4 13:21:30 2024
Install Reason : Explicitly installed
Install Script : Yes
Validated By : Signature
nvidia-utils: 251 total files, 0 altered files
$ pacman -Qs nvidia
local/cuda 11.8.0-1
NVIDIA's GPU programming toolkit
local/cudnn 8.7.0.84-1
NVIDIA CUDA Deep Neural Network library
local/egl-gbm 1.1.2-1
The GBM EGL external platform library
local/egl-wayland 4:1.1.16-1
EGLStream-based Wayland external platform
local/libvdpau 1.5-3
Nvidia VDPAU library
local/libxnvctrl 560.35.03-1
NVIDIA NV-CONTROL X extension
local/nvidia-dkms 560.35.03-5
NVIDIA drivers - module sources
local/nvidia-settings 560.35.03-1
Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 560.35.03-5
NVIDIA drivers utilities
local/nvtop 3.1.0-1
GPUs process monitoring for AMD, Intel and NVIDIA
local/opencl-nvidia 560.35.03-5
OpenCL implemention for NVIDIA
The third command on Wayland: http://0x0.st/XEPW.txt and on X11: http://0x0.st/XEPV.txt
By the way, the X11 session works fine and always has; I'm using it while the Wayland session glitches.
11023: /usr/lib/libnvidia-eglcore.so.560.35.03: error: symbol lookup error: undefined symbol: ErrorF (fatal)
11023: /usr/lib/libnvidia-eglcore.so.560.35.03: error: symbol lookup error: undefined symbol: __malloc_hook (fatal)
11023: /usr/lib/libnvidia-eglcore.so.560.35.03: error: symbol lookup error: undefined symbol: __realloc_hook (fatal)
11023: /usr/lib/libnvidia-eglcore.so.560.35.03: error: symbol lookup error: undefined symbol: __free_hook (fatal)
11023: /usr/lib/libnvidia-eglcore.so.560.35.03: error: symbol lookup error: undefined symbol: __memalign_hook (fatal)
11023: /usr/lib/libnvidia-glcore.so.560.35.03: error: symbol lookup error: undefined symbol: __malloc_hook (fatal)
11023: /usr/lib/libnvidia-glcore.so.560.35.03: error: symbol lookup error: undefined symbol: __realloc_hook (fatal)
11023: /usr/lib/libnvidia-glcore.so.560.35.03: error: symbol lookup error: undefined symbol: __free_hook (fatal)
11023: /usr/lib/libnvidia-glcore.so.560.35.03: error: symbol lookup error: undefined symbol: __memalign_hook (fatal)
11023: /usr/lib/libnvidia-glcore.so.560.35.03: error: symbol lookup error: undefined symbol: ErrorF (fatal)
ldd /usr/lib/libnvidia-eglcore.so.560.35.03
I doubt that X11 "works fine", what's the output of "glxinfo -B"?
Those "undefined symbol" errors look bad. Since I don't do anything graphics-related on Arch (for that I dual-boot into Windows), X11 "works fine" for me as long as the DE doesn't glitch or lag and there's output to both the external and internal displays. The only annoyance now is that the image on the internal display is too small (16" at 2560x1600 is the only option, with no per-display scaling).
$ ldd /usr/lib/libnvidia-eglcore.so.560.35.03
linux-vdso.so.1 (0x00007d925b001000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007d925aec7000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007d9258e0f000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007d925b003000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007d925aec2000)
libnvidia-glsi.so.560.35.03 => /usr/lib/libnvidia-glsi.so.560.35.03 (0x00007d9258d75000)
libnvidia-gpucomp.so.560.35.03 => /usr/lib/libnvidia-gpucomp.so.560.35.03 (0x00007d9256400000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007d925aebb000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007d925aeb6000)
$ glxinfo -B
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 12282 MB
Total available memory: 12282 MB
Currently available dedicated video memory: 10590 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce RTX 4070 Ti/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 560.35.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6.0 NVIDIA 560.35.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 560.35.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Thanks
Those hooks were deprecated long ago and have been removed from recent glibc versions, but it doesn't look like anything unexpected gets linked in - and the glxinfo output is fine.
echo $__EGL_VENDOR_LIBRARY_FILENAMES
What if you simply globally export
__GLX_VENDOR_LIBRARY_NAME=nvidia
__EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_nvidia.json
in some /etc/profile.d/* script?
Thanks seth
"__EGL_VENDOR_LIBRARY_FILENAMES" was unset, I set it with /etc/profile.d/egl.sh:
export __GLX_VENDOR_LIBRARY_NAME=nvidia
export __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json
(yes, the file name on my system is /usr/share/glvnd/egl_vendor.d/10_nvidia.json, not /usr/share/glvnd/egl_vendor.d/50_nvidia.json)
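Because the numeric prefix of the vendor file differs between systems (50_ in the suggestion, 10_ here), a profile script could locate the file by pattern instead of hard-coding the name. A minimal sketch, assuming the file always ends in `_nvidia.json`; the function name `find_nvidia_egl_vendor` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first NVIDIA glvnd EGL vendor file found.
# The optional directory argument is only there for testing.
find_nvidia_egl_vendor() {
    dir="${1:-/usr/share/glvnd/egl_vendor.d}"
    for f in "$dir"/*_nvidia.json; do
        # If nothing matches, the pattern stays literal and this test fails.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
# Possible use in a profile.d script:
#   vendor_json=$(find_nvidia_egl_vendor) && export __EGL_VENDOR_LIBRARY_FILENAMES="$vendor_json"
```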
Things have definitely changed: http://0x0.st/XEx-.txt
Now I get "Could not initialize egl" due to
kwin_wayland_drm: "EGL_KHR_platform_gbm" client extension is not supported by the platform
So I googled a bit and found out that EGL_KHR_platform_gbm should be provided by the "egl-gbm" package on Arch. Then I found that /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json, which is part of "egl-gbm", was somehow missing. So I reinstalled "egl-gbm", and the error went away. "NVIDIA GeForce RTX 4070 Ti/PCIe/SSE2" (my eGPU) is now recognized as the graphics processor, and the Wayland session on the external screen is perfectly fine. The image on the internal display is broken, though, like this:
https://photos.app.goo.gl/inHWMHnJoLkzzVfe7
This is probably related to the oneshot service I run on boot:
[Unit]
Description=Reload nvidia_drm kernel module
Before=display-manager.service
After=bolt.service
[Service]
Type=oneshot
ExecStart=/opt/bin/re_drm.sh
[Install]
WantedBy=graphical.target
/opt/bin/re_drm.sh script is:
#!/bin/sh
# Reload nvidia_drm, retrying until it succeeds or MAX_RETRIES is reached.
RETRY_INTERVAL=0.1   # seconds between attempts
MAX_RETRIES=30
n=0
modprobe -r nvidia_drm && modprobe nvidia_drm
while [ "$?" -ne 0 ] && [ "$n" -lt "$MAX_RETRIES" ]; do
    sleep "$RETRY_INTERVAL"
    n=$((n + 1))
    modprobe -r nvidia_drm && modprobe nvidia_drm
done
Without this script, the eGPU is not yet connected to the system when the nvidia_drm module is loaded (because of the NVIDIA dGPU), so no DRM device gets created for it:
$ sudo find /sys/devices -name drm
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm
With the script it gets created:
$ sudo find /sys/devices -name drm
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm
/sys/devices/pci0000:00/0000:00:07.0/0000:02:00.0/0000:03:01.0/0000:04:00.0/drm
This workaround works for X11, but on Wayland I get a broken internal screen, as in the photo above. If I don't use it, I get an image on the internal display only, but it's not broken.
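The `find` comparison above can be turned into a check that the expected number of GPUs have a DRM device. A minimal sketch, assuming each GPU shows up as exactly one `drm` directory under /sys/devices, as in the two listings above; the function name `count_drm_devices` and the expected count of 2 are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count the "drm" directories below a sysfs device tree.
# The optional root argument is only there for testing.
count_drm_devices() {
    root="${1:-/sys/devices}"
    find "$root" -type d -name drm 2>/dev/null | wc -l | tr -d ' '
}
# Example:
#   [ "$(count_drm_devices)" -ge 2 ] || echo "eGPU has no DRM device yet"
```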
This may be a separate issue. Should I mark this one as solved and create a new one?
P.S. Here's a boot journal with the original egl issue resolved: http://0x0.st/XE3Z.txt Maybe you can spot something related to the broken image.
UPD: marking this topic as resolved. The fix for the problem I had was to reinstall the "egl-gbm" package.
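A missing packaged file like the 15_nvidia_gbm.json described above can be found by checking every path a package owns; the earlier `pacman -Qikk` check covered nvidia-utils only. A minimal sketch: `pacman -Qlq <package>` prints the owned paths, and the hypothetical `report_missing` function reads paths on standard input and lists those that do not exist. `pacman -Qkk egl-gbm` performs a similar check directly.

```shell
#!/bin/sh
# Hypothetical helper: read one path per line on stdin, print the ones that
# do not exist, and return non-zero if any were missing.
report_missing() {
    missing=0
    while IFS= read -r path; do
        if [ ! -e "$path" ]; then
            printf 'missing: %s\n' "$path"
            missing=1
        fi
    done
    return "$missing"
}
# Example (package names taken from the "Depends On" line of nvidia-utils):
#   for pkg in nvidia-utils libglvnd egl-wayland egl-gbm; do
#       pacman -Qlq "$pkg" | report_missing
#   done
```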
Last edited by alllexx88 (2024-10-08 07:12:29)
Oct 07 17:55:39 AlexArch kernel: WARNING: CPU: 4 PID: 594 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110
Oct 07 17:55:39 AlexArch kernel: WARNING: CPU: 4 PID: 594 at lib/refcount.c:22 refcount_warn_saturate+0x55/0x110
Oct 07 17:55:39 AlexArch kernel: WARNING: CPU: 4 PID: 594 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
Oct 07 17:55:39 AlexArch kernel: [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000400] Failed to register device
…
Oct 07 17:55:39 AlexArch egpu-switcher[895]: [info] looking for eGPU...
Oct 07 17:55:40 AlexArch egpu-switcher[895]: [info] the egpu is connected
With regard to activating the eGPU, doesn't simply rescanning the bus do it?
echo 1 | sudo tee /sys/bus/pci/rescan
Thank you seth
I tried the rescan; it doesn't help. I found that adding "MODULES=(nvidia nvidia_drm nvidia_uvm nvidia_modeset)" to /etc/mkinitcpio.conf gets "[drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver" to go through fine, but produces errors later on (https://bbs.archlinux.org/viewtopic.php?id=300049), which are more informative; still, I haven't been able to find a solution.
I also had problems with the same eGPU on a different laptop. I don't know whether it's the GPU or the enclosure, since I have no means to test them separately.