(Original title: Ditching NVIDIA?)
I just have to get something off my chest:
I'm getting fed up with NVIDIA. I've been using Linux for decades and nothing is more consistent than the constant mess with the NVIDIA drivers.
I have a lean system with X11, DWM and a small compositor (picom). But I just can't put the computer to sleep and wake it up again without black screens, panics or lock-ups. I have tried with DRM and without, with fbdev and without, with picom running and without. I am unable to switch back and forth between the Linux console and X11 without getting black screens again. It is simply not possible to have a stable system.
I switched to nvidia-open. However, it's not much better than the closed-source blob. Every driver update (no matter whether open or closed) brings new surprises and drives me crazy. The new open driver, 565 for example, forces me to turn off picom. I need sleep mode many times a day, I can't do without it. But it's always a balancing act. Why the hell is it so hard with these drivers? If I have switched off picom and I switch back and forth between two tags in DWM, I briefly get flickering checkerboards. I wanted to record these with OBS, but the checkerboards don't show up in the video.
I have always supported NVIDIA. There is currently an NVIDIA GeForce RTX 4070 in my computer. But I'm slowly getting to the point where I'm ditching NVIDIA forever. I'm reading along quietly because other users here have similar problems. The poor Arch devs and maintainers also struggle with these drivers. I really feel sorry for you. And great respect for your tireless efforts.
I went back to 560.35.03 and pinned it, without fbdev and with simpledrm running. This is the most stable combination at the moment, despite the error messages in dmesg every time my computer wakes up.
Will I possibly have peace of mind if I switch to AMD? Is that the solution? Or are there problems with the other vendors too?
Sorry guys, but that just had to come out. Love you.
freanux
Last edited by freanux (2024-11-01 20:03:27)
Offline
I was always an AMD fanboy, mostly because the first PC my family owned was an AMD K6 and the first GPU I bought with my own money was an ATI one. Over the years I just stuck to it.
It's a bit of "I do know better - but experience tells me otherwise": As with AMD, I mostly used Seagate HDDs only - first just by chance because the systems came with them, later because it was the brand I bought. Although I know better, I always had some issues with drives from WD while those from Seagate were fine. Recently multiple drives failed and I had to replace them (thanks to ZFS without data loss). This time I chose the only competitor remaining to this day: Toshiba. Don't get me wrong, I would also have picked WDs but they were just too expensive. My impression over the past few weeks: they have nice high performance, but they're very loud. It's like back in the 90s again when you hear every file access.
I switched over to Linux quite some time ago and ended up on Arch because a friend used it at the time. And aside from one specific game I have no issues, everything works as on Windows. But that doesn't mean AMD doesn't have its own share of issues:
- Ryzen CPUs are known for random hangs and crashes
- the amdgpu driver had some issues back in Jan/Feb this year where it only worked on the first cold boot but had issues on warm reboots
- compared to a system using an Intel CPU and an nVidia GPU the performance in games can be as low as just half
- lately AMD got quite aggressive about dropping support for older GPUs: most recent drivers only properly support RDNA, that is RX5000 and on - GCN, or Vega and older, got dropped and is only supported by legacy drivers
TL;DR: Although AMD seems to be less hassle it comes with its own pitfalls. So choosing between AMD, Intel and nVidia highly depends on your specific use case.
Last edited by cryptearth (2024-11-01 10:09:56)
Online

- lately AMD got quite aggressive about dropping support for older GPUs: most recent drivers only properly support RDNA, that is RX5000 and on - GCN, or Vega and older, got dropped and is only supported by legacy drivers
cryptearth, the legacy statement is way too generic, please qualify it.
AMD has dropped support for older hw from projects for which they are the maintainer, like amdvlk and ROCm.
Support for older AMD hw in kernel, mesa and other projects where AMD cooperates with others has NOT been dropped.
freanux, it depends on your needs whether ditching nvidia is a smart move.
Is there any specific software you use that performs better with an nvidia card (blender is one of them) ?
Offline

No, not really. I'm a developer, I mainly have an editor and a terminal open. I don't even have two screens, but I have an ultrawide screen. I also often need Inkscape, LibreOffice, GIMP, Mail, Calendar, etcetera. These are all software that you use for everyday use. But I often have to leave my workplace, which is why I put the computer to sleep.
And every now and then I play games, which is why Steam is installed. Thanks to Proton, playing on Linux is no longer a problem. That's why I treated myself to an RTX 4070.
Ergo, I use Linux for everything, up to 12 hours a day, business and personal, the same computer. Actually everything works perfectly (except for the problem mentioned). In the past, you were well served if you bought supported hardware. This is no longer so acutely the case today.
Offline

https://bbs.archlinux.org/viewtopic.php … 6#p2205806
And can we see a journal covering the issues?
Also try https://aur.archlinux.org/packages/nvidia-535xx-dkms - the nvidia issues started w/ the 545xx drivers, got worse w/ the 55yxx ones and 56yxx brought fbdev fun, notably since the entire framebuffer system in 6.11 has issues (hence also test the LTS behavior) - there're various threads reg. amdgpu stalls, display freezes and GTT/GART leaks.
The grass isn't greener on the other side of the fence either.
Online

Okay. I have reinstalled the current drivers:
$ pacman -Q nvidia-open nvidia-utils
nvidia-open 565.57.01-1
nvidia-utils 565.57.01-1
modeset is on, fbdev is off:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux root=UUID=0fb8d6c4-3661-472e-a265-7e1e7bb52c4e rw loglevel=3 quiet nvidia_drm.modeset=1 nvidia_drm.fbdev=0 nvidia.NVreg_EnableGpuFirmware=0
NVIDIA modules are preloaded in mkinitcpio.conf:
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
Unfortunately no success. After the computer wakes up the screen is black. I have to kill picom, then the picture comes. Or I can put the PC to sleep without picom, that would work too.
And can we see a journal covering the issues?
Also try https://aur.archlinux.org/packages/nvidia-535xx-dkms - the nvidia issues started w/ the 545xx drivers, got worse w/ the 55yxx ones and 56yxx brought fbdev fun, notably since the entire framebuffer system in 6.11 has issues (hence also test the LTS behavior) - there're various threads reg. amdgpu stalls, display freezes and GTT/GART leaks.
If I saw a silver lining, I wouldn't care about this difficult journey. But this moldy cheese will soon be maturing for 20 years.
------------------
EDIT: I just returned to version 560.35.03-19:
$ pacman -Q nvidia-open nvidia-utils
nvidia-open 560.35.03-19
nvidia-utils 560.35.03-19
and it works here, going to sleep and waking up again works. With picom running:
http://0x0.st/XGmQ.txt
Something is wrong with the new driver and I have no idea what.
Last edited by freanux (2024-11-01 17:22:47)
Offline

There're just a bunch of warnings triggered by nvidia-sleep.sh, which is supposed to prevent
computer wakes up the screen is black. I have to kill picom, then the picture comes
Other than the 535xx driver, what if you completely disable https://wiki.archlinux.org/title/NVIDIA … er_suspend
Online

I have disabled all nvidia-*.service units. Now the machine no longer switches off, but simply continues to run. However, I can't type anything anymore. The last 6 lines in the log are because I made an SSH connection from the laptop:
https://0x0.st/XGao.txt
Offline

"nvidia.NVreg_PreserveVideoMemoryAllocations=0", https://wiki.archlinux.org/title/Kernel_parameters
Edit, because:
Nov 01 18:49:41 atlantis kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
Last edited by seth (2024-11-01 19:16:28)
Online

"nvidia.NVreg_PreserveVideoMemoryAllocations=0"
That did the trick. I simply overlooked this parameter. I put my machine to sleep three times in a row, successfully. The picture is there, everything is there. Seth, you're a Jack of all trades. Thank you.
Here is my log output:
http://0x0.st/XGaD.txt
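For anyone landing here with the same symptom: a quick way to see whether the parameter actually reached the kernel is to look it up in /proc/cmdline. This is only a rough sketch. The helper name cmdline_param is made up for illustration, and it assumes a POSIX shell with tr and awk available.

```shell
# Print the value of a NAME=VALUE kernel command-line parameter,
# or nothing if the parameter is absent.
# Usage: cmdline_param NAME [FILE]   (FILE defaults to /proc/cmdline)
cmdline_param() {
    # Put every parameter on its own line, then compare the part before
    # the first "=" literally (no regex), so dots in NAME are harmless.
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -F= -v name="$1" '
            NF > 1 && $1 == name { sub(/^[^=]*=/, ""); value = $0; found = 1 }
            END { if (found) print value }   # the last occurrence wins
        '
}
```

Calling `cmdline_param nvidia.NVreg_PreserveVideoMemoryAllocations` should print 0 on a system booted with the setting from this thread, and nothing if the parameter is not on the command line at all. In that case a modprobe.d option or the driver default applies, and the value the loaded module ended up with should be visible in /proc/driver/nvidia/params.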
--------
EDIT: Should I really mark this thread as solved? That would mean that I had actually thrown out NVIDIA. I'll probably adjust the topic.
--------
EDIT2:
Nov 01 18:49:41 atlantis kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
Nov 01 18:49:41 atlantis kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
/tmp is not yet available in the initramfs, hence this error.
Last edited by freanux (2024-11-01 19:51:14)
Offline

You were getting the warning because the module parameter was active but the services were missing. The initramfs is only relevant for S4/hibernation/suspend-to-disk.
If you want, you could try to re-enable the preservation in general, but keep the resume.service disabled and also steer the cache away from /tmp (if you're short on RAM but use lots of VRAM this would become an issue).
It actually conflicts w/ S4 (hibernation), but if it works it might help prevent texture errors on longer sleep cycles (the underlying problem is VRAM decay because, unlike the RAM, the VRAM isn't refreshed during sleep).
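A minimal sketch of what "steer the cache away from /tmp" could look like as a modprobe.d snippet. NVreg_TemporaryFilePath is the option the NVIDIA driver README describes for this purpose. The function name, the target file name and the choice of /var/tmp are assumptions for illustration only.

```shell
# Sketch: write a modprobe.d snippet that keeps VRAM preservation enabled
# but stores the saved video memory somewhere other than /tmp.
# $1: file to write (assumption: e.g. /etc/modprobe.d/nvidia-pm.conf)
# $2: directory for the dump (assumption: /var/tmp; it needs enough free
#     space to hold the VRAM that is in use at suspend time)
write_nvidia_pm_conf() {
    printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=%s\n' "$2" > "$1"
}
```

Run as root, e.g. `write_nvidia_pm_conf /etc/modprobe.d/nvidia-pm.conf /var/tmp`. The same option can presumably be passed on the kernel command line as nvidia.NVreg_TemporaryFilePath=/var/tmp, in the style of the other parameters in this thread. If the NVIDIA modules are preloaded in the initramfs, the image most likely has to be rebuilt before the new option takes effect.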
Online

preservation re-enabled:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux root=UUID=0fb8d6c4-3661-472e-a265-7e1e7bb52c4e rw loglevel=3 quiet nvidia_drm.modeset=1 nvidia_drm.fbdev=0 nvidia.NVreg_EnableGpuFirmware=0
and resume.service disabled:
$ systemctl status nvidia-resume.service
○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-resume.service; disabled; preset: disabled)
     Active: inactive (dead)
But unfortunately no success, black screen, had to kill picom to have a working screen:
http://0x0.st/XGBY.txt
Offline

You're still getting the kernel warnings from the sleep hook.
My gut feeling is that this is probably related to GDM and the second X11 server, if you don't care you could test the behavior w/ a startx session (you don't rely on any gnome stuff anyway?)
At 132GB RAM you're probably not short on that…
Online
nvidia has been nothing but absolute misery for me the last 2 years. the last driver version i remember having the least issues with was 530, then 545 came along and for half a year i couldn't play ANY xwayland game because of horrible flickering, then 550/555 fixed most of those issues, but introduced even more problems like unreal engine crashes, horrible kde performance and random freezes. there's many many more issues that are still left unsolved like opengl graphical corruptions, bad directx12 performance, VRR, and many more and that is why every time someone asks whether they should choose nvidia or amd i always tell them my experience with nvidia, except people think i'm exaggerating and the classic "works on my end". it's an absolute shitshow that never ends, god how i hate nvidia.
Last edited by zbik (2024-11-02 12:11:01)

You're still getting the kernel warnings from the sleep hook.
My gut feeling is that this is probably related to GDM and the second X11 server, if you don't care you could test the behavior w/ a startx session
Okay, here we go: gdm disabled and restarted my system. Logged in, startx, sleep, wake-up, everything is fine. Here is my log:
http://0x0.st/XG__.txt
(you don't rely on any gnome stuff anyway?)
Yes, I would like to throw out GNOME completely. The only application I still need from GNOME is the gnome-calendar. It's really good, I can't do without it at the moment. And there really are no good alternatives. I need to see what packages gnome-calendar pulls. Maybe I can ditch gdm and gnome-shell.
At 132GB RAM you're probably not short on that…
Everybody needs a little pampering.
Offline

[SNIP] ... and that is why every time someone asks whether they should choose nvidia or amd i always tell them my experience with nvidia, except people think i'm exaggerating and the classic "works on my end". it's an absolute shitshow that never ends, god how i hate nvidia.
As discussed at the beginning of this thread, switching to AMD doesn't really seem to be any better, but I can't judge that. Basically, I'm happy with NVIDIA's graphics cards and actually still hope that the ridiculous business with the drivers will stop.
Offline
After downgrade system can't start...
I've been with Arch for 20 years, and there were never any problems. But now I just don't have the strength to keep struggling with these drivers. This issue has been going on for months. Maybe it's time to switch distributions.
Last edited by gerwazy (2024-11-02 19:25:24)
Offline

Hello,
What is the solution? Downgrading the drivers to 560.35.03-19?
That was the solution to my problem:
https://bbs.archlinux.org/viewtopic.php … 7#p2205987
I don't know if it is suitable for yours. It's best to open a new thread and ask there.
Offline

After downgrade system can't start...
I've been with Arch for 20 years, and there were never any problems. But now I just don't have the strength to keep struggling with these drivers. This issue has been going on for months. Maybe it's time to switch distributions.
I doubt that another distribution would solve these NVIDIA problems. Did you rebuild the initramfs?
# mkinitcpio -P
Offline
Everything is working now, thank you, @freanux. It worked when I changed the settings from
"nvidia.NVreg_PreserveVideoMemoryAllocations=1"
to 
"nvidia.NVreg_PreserveVideoMemoryAllocations=0"
Offline

Logged in, startx, sleep, wake-up, everything is fine. Here is my log:
The nvidia-sleep.sh warnings are still there, but
Nov 02 18:44:40 atlantis betterlockscreen[1525]: xset:  unable to open display ":1"
Sure it's not just because you're now running on :0 and betterlockscreen "fails"?
=> Is betterlockscreen the cause of the problems?
Online

Nov 02 18:44:40 atlantis betterlockscreen[1525]: xset: unable to open display ":1"
I noticed that too. If I start the session with GDM, I end up on display :1; without it (startx), on :0. I don't get yet why xset always wants display :1. I'm investigating.
Offline

That's normal (for GDM)
GDM starts an X11 server for itself (:0) and one for the session (:1)
Most (all?) other DMs just hand over the existing :0 server and of course when using startx, there's nothing that could take another X11 server anyway - so your session runs on :0 there as well.
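To see which X11 displays are actually up on a given box, one can look at the sockets the X servers create. This is a sketch only: the helper name list_x_displays is made up, and it assumes the servers use the conventional socket directory /tmp/.X11-unix.

```shell
# List running X11 displays by looking at their UNIX sockets.
# Usage: list_x_displays [DIR]   (DIR defaults to /tmp/.X11-unix)
list_x_displays() {
    dir=${1:-/tmp/.X11-unix}
    for sock in "$dir"/X*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$sock" ] || continue
        # Socket X0 belongs to display :0, X1 to :1, and so on.
        printf ':%s\n' "${sock##*/X}"
    done
}
```

Under GDM as described above this would be expected to print both :0 and :1, and only :0 in a plain startx session.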
Online

Ah, I remember: I installed betterlockscreen from AUR. The included service template is specified with DISPLAY=:0. I've adapted this to my needs:
$ cat /usr/lib/systemd/system/betterlockscreen@.service
[Unit]
Description=Lock screen when going to sleep/suspend
Before=sleep.target
Before=suspend.target
[Service]
User=%I
Type=simple
Environment=DISPLAY=:1
ExecStart=/usr/bin/betterlockscreen --lock blur
TimeoutSec=infinity
[Install]
WantedBy=sleep.target
WantedBy=suspend.target
Time to ditch betterlockscreen and install xsecurelock.
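For anyone who keeps the unit: files under /usr/lib/systemd/system belong to the package and are overwritten on upgrade, so the DISPLAY change could instead live in a drop-in. A rough sketch, where the function name and the drop-in file name display.conf are assumptions for illustration:

```shell
# Sketch: create a systemd drop-in that overrides DISPLAY for the
# betterlockscreen@.service template without touching the packaged unit.
# $1: display to use, e.g. :1 under GDM or :0 with startx
# $2: systemd config root (defaults to /etc/systemd/system)
write_display_dropin() {
    dropin_dir="${2:-/etc/systemd/system}/betterlockscreen@.service.d"
    mkdir -p "$dropin_dir" &&
        printf '[Service]\nEnvironment=DISPLAY=%s\n' "$1" > "$dropin_dir/display.conf"
}
```

Run as root, e.g. `write_display_dropin :1`, followed by `systemctl daemon-reload`. A later Environment= assignment for the same variable overrides the earlier one, so the packaged DISPLAY=:0 stays in place but no longer applies.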
Offline

I dunno man.
I run a simple TV box, no hibernation, no sleep, just want the box to run.
The 565 drivers are jank garbage, Kodi screws up playback, games mess up. Just terrible.
Huge issues with it. So I'd recommend rolling back to an earlier version.
I'd go back to 560 but there's no dkms version in the AUR yet, so I've gone back (as I saw seth suggest in another thread or earlier post) to 535 dkms.
Happy days.
P.S. 550 is better.
Last edited by pezz (2024-11-03 09:09:46)
Offline