I updated my NVIDIA drivers from 535.113.01 to 545.29.02, and now my PC freezes shortly after my desktop loads.
When KDE starts up, the splash screen's framerate changes erratically, sometimes freezing, sometimes slowing down and sometimes running at a normal framerate. Then as the desktop loads, lines of static appear on the screen, and then it crashes around when the panel appears.
I'm using an NVIDIA GeForce RTX 4090 and an AMD Ryzen 7 5800X3D.
I've tried switching to a TTY with Ctrl+Alt+F3 and reinstalling linux and linux-headers, but all that did was make the monitors lose signal when I press Ctrl+Alt+F3 after a reboot.
Here's the output of `sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st` after reinstalling linux and linux-headers:
This line seems relevant, as this is where it differs from a successful pre-driver upgrade boot:
Nov 18 06:10:50 MetalGear (udev-worker)[471]: nvidia_modeset: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidia-modeset c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) 254'' failed with exit code 1.
Interestingly, searching for that line on Google returns no results.
The time on my clock widget when it froze was 06:12:34, so those lines may be relevant.
Does anyone know how to get my desktop working again without rolling back the drivers? I'm using timeshift and am currently running yesterday's btrfs image from grub, so I'm not without a usable system.
Offline
You can forget about the mknod message.
This is bad:
Nov 18 06:11:29 MetalGear kernel: NVRM: Xid (PCI:0000:09:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000005 00000000 00000000 00000001 00000000
Nov 18 06:11:35 MetalGear kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=843, name=Xorg, Channel ID 00000008 intr 00008000
Nov 18 06:11:36 MetalGear kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=843, name=Xorg, Channel ID 00000008 intr 00008000
Nov 18 06:11:39 MetalGear kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=1129, name=plasmashell, Channel ID 00000016 intr0 00040000
Nov 18 06:11:39 MetalGear kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=1129, name=plasmashell, Channel ID 00000016 intr0 00040000
https://docs.nvidia.com/deploy/xid-errors/index.html
Xid 56 is "Display Engine error", which is caused by either the HW or the driver; the "Invalid or corrupted push buffer stream" errors (Xid 32) are likely just follow-up errors.
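To pull just the Xid codes out of a journal like the one above, something along these lines should work. It's only a sketch: `xid_codes` is a made-up helper name, and the pattern assumes the `NVRM: Xid (PCI:...): NN,` layout seen in the lines quoted here.

```shell
# Print the distinct Xid codes found in kernel log lines read from stdin.
# Assumes the "NVRM: Xid (PCI:...): NN," layout shown in the journal above.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' | sort -nu
}

# Example usage (kernel messages of the previous boot):
#   journalctl -k -b -1 | xid_codes
```

Each code it prints can then be looked up in the Xid table linked above.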
Try to downgrade to the 535 nvidia-dkms and nvidia-utils versions from the https://wiki.archlinux.org/title/Ala
https://wiki.archlinux.org/title/Dynami … le_Support (don't forget to install "linux-headers")
Quite a few problems have been reported w/ the 545xx drivers, so your HW may very well still be ok.
Offline
Thanks for pointing out where it's going wrong, but I think you skipped the last line of my post: I'm trying to get things working without downgrading the graphics drivers. I know my hardware is running fine; I've gone back to an old backup and have pacman configured to ignore all nvidia packages until I can figure out how to update them without breaking everything.
Offline
I think you skipped the gist of my post: If it's not a bug in the GPU driver, your GPU is toast.
So try to downgrade the driver; if the system runs stable, you'll have to work with that until nvidia releases a fix.
Offline
Sorry, what do you mean by toast? Are you telling me that the GPU is dying?
Offline
No.
Xid 56 is "Display Engine error", which is caused by either the HW or the driver
Quite a few problems have been reported w/ the 545xx drivers, so your HW may very well still be ok.
You *need* to downgrade the driver to make sure it's not the HW. If the GPU behaves w/ the 535xx drivers, it's likely just a bug in the 545xx drivers and you can hold out for an update (and report it at the nvidia forums so it gets fixed).
If the problem remains w/ the 535xx drivers, we'll have to look a bit closer at the possibility of broken HW.
A path where you're not downgrading the driver will not lead to better understanding, let alone a resolution of the situation.
Offline
Ah, you mean downgrade temporarily for diagnostic purposes and not as a solution. I already did that by going back to the timeshift backup made before I upgraded.
Offline
timeshift would downgrade everything. That helps to rule out HW issues, but if you want to narrow the problem down to a driver regression (one that's not tied to the kernel), you'll have to downgrade the driver in isolation. Doing so will also allow you to keep updating the rest of the system until the nvidia driver is fixed.
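A rough sketch of what "in isolation" could look like, assuming the 535.113.01 builds of nvidia-dkms and nvidia-utils are still in the local pacman cache (otherwise they'd have to come from the archive linked earlier). `cached_pkgs` is a hypothetical helper, not a pacman feature; it only lists the matching cache files so they can be passed to `pacman -U`.

```shell
# List cached package files for one version, one path per line.
# Usage: cached_pkgs CACHE_DIR VERSION PKG...
# (hypothetical helper; assumes zstd-compressed packages in the cache)
cached_pkgs() {
    cache_dir=$1
    version=$2
    shift 2
    for pkg in "$@"; do
        for f in "$cache_dir/$pkg-$version"-*.pkg.tar.zst; do
            # Skip the unexpanded pattern when nothing matches
            [ -e "$f" ] && printf '%s\n' "$f"
        done
    done
    return 0
}

# Example usage (not run here):
#   sudo pacman -U $(cached_pkgs /var/cache/pacman/pkg 535.113.01 nvidia-dkms nvidia-utils)
# Then hold the driver back in /etc/pacman.conf while updating everything else:
#   IgnorePkg = nvidia-dkms nvidia-utils
```

With the dkms package the module is rebuilt for whatever kernel is installed (hence the linux-headers requirement), so the kernel can keep moving while only the driver stays at 535.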
Offline