#1 2024-11-29 15:17:12

mine_diver
Member
Registered: 2024-03-18
Posts: 62

GPU started artifacting, system froze, and is now unable to start

Hello.

I have a dual-dGPU setup: an RTX 4060 and a GTX 970. The 4060 does the rendering and the 970 drives my old display over DVI-D. Recently, from time to time, the 970 starts artifacting (the 4060 outputs just fine) and the system freezes soon after. After a reboot the 970 is still artifacting and this error is shown on the 4060 output:

[  OK  ] Reached target Graphical Interface.
[  OK  ] Mounted /mnt/coldest.
[ 1244.666366] nvidia-modeset: ERROR: GPU:1: Idling display engine timed out: 0x0000957d:0:0:417

I’d think the 970 finally got cooked after years and years of service, but no: booting into Windows 11 and then back into Linux temporarily fixes the issue. Any advice? I have no idea how to fix this. I’m using the proprietary nvidia driver (nvidia-open doesn’t support the 970), latest version as of right now.

#2 2024-11-29 16:06:18

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

Re: GPU started artifacting, system froze, and is now unable to start

This is one of those cases where I don't see the benefit of using the 970 for any workload if you actually have access to the newer GPU. Get yourself an adapter cable for the old display protocol and hook it into the new card. DVI did have some issues in a few newer drivers, but afaik these should be fixed by now, and it wouldn't work at all if it was still that issue.

Last edited by V1del (2024-11-29 16:08:37)

#3 2024-11-29 16:16:06

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

I also use the 970 for virtual machines with passthrough (either to share my PC with a friend, or to run an older OS which doesn't have drivers for the 4060), so I'd much rather keep it.
It had been working flawlessly since February of this year, when I installed Arch, up until 14th of November, when it first did this.
Plus, booting into Windows to fix the issue seems to imply that it's not even hardware related if it works just fine there and on Linux afterwards too.

#4 2024-11-29 17:04:56

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

Re: GPU started artifacting, system froze, and is now unable to start

When did this start? After a certain kernel/nvidia driver update? Since a reboot from Windows helps, I could see some hiccup with the fbdev enablement or a firmware issue, so try these kernel parameters:

nvidia_drm.modeset=1 nvidia_drm.fbdev=0 nvidia.NVreg_EnableGpuFirmware=0

#5 2024-11-29 17:12:28

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

I unfortunately do not remember, since it didn't start immediately after an update but at some point during runtime later.

I do not have fbdev and GPU firmware enabled, here are my kernel parameters:

rd.luks.name=7e8b3f21-2d15-46ec-8682-951972ede506=root root=/dev/mapper/root rw rootflags=subvol=/@ intel_iommu=on iommu=pt rd.driver.pre=vfio-pci nvidia_drm.modeset=1

But I guess I can try explicitly setting them to 0 and run with these for some time to see if the issue reappears.
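
For anyone following along, here is a minimal sketch for checking which of the suggested parameters are actually present on a kernel command line. The function name check_cmdline is made up for illustration. It only does whole-word matching on the string it is given, so it says nothing about the driver's built-in defaults.

```shell
# check_cmdline "<cmdline string>" param1 [param2 ...]
# Prints one line per wanted parameter: "present: ..." or "missing: ...".
check_cmdline() {
    cmdline=$1
    shift
    for want in "$@"; do
        # Pad with spaces so only whole, space-separated words match.
        case " $cmdline " in
            *" $want "*) echo "present: $want" ;;
            *)           echo "missing: $want" ;;
        esac
    done
}
```

Against the running kernel this could be called as: check_cmdline "$(cat /proc/cmdline)" nvidia_drm.modeset=1 nvidia_drm.fbdev=0 nvidia.NVreg_EnableGpuFirmware=0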

#6 2024-11-29 17:14:12

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

Re: GPU started artifacting, system froze, and is now unable to start

That's one thing I know has changed in the driver over the last few releases: fbdev got enabled by default, as did the GSP firmware.

#7 2024-11-29 17:17:36

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

Well, that'd make a lot of sense, thank you. I'll test it.

#8 2024-12-02 20:19:55

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

After testing for a bit: disabling the GSP firmware didn't help, the artifacting still occurs, and disabling fbdev leads to a black screen after SDDM. I guess I'm out of luck.

#9 2024-12-02 21:23:04

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,404

Re: GPU started artifacting, system froze, and is now unable to start

Disabling fbdev will leave you with the simplydumb (simpledrm) device on 6.12 kernels; the nvidia_drm.modeset=1 hack was removed.
This has caused multiple issues already, see https://gitlab.archlinux.org/archlinux/ … /issues/94
Try the behavior with the LTS kernel.

#10 2024-12-03 08:37:40

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

I'm using UKIs, how would I go about safely adding the LTS kernel?

From what I can tell from the wiki, I need to install the package, navigate to /etc/mkinitcpio.d/linux-lts.preset, and edit it accordingly.
As for Nvidia drivers, I'd just replace nvidia with nvidia-dkms.

Is this correct or am I missing something? Is there a way to check for packages that depend on a specific kernel so I can replace them with dkms versions?
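
For reference, the preset edit described above might look roughly like the sketch below. The UKI output path is an assumption (it presumes the ESP is mounted at /efi), and only the default preset is shown. The file is sourced by bash, hence the array syntax.

```shell
# /etc/mkinitcpio.d/linux-lts.preset (sketch, adjust paths to your system)

ALL_kver="/boot/vmlinuz-linux-lts"

# Only the default preset is shown; a fallback entry can be added the same way.
PRESETS=('default')

# Build a UKI instead of a plain initramfs image.
#default_image="/boot/initramfs-linux-lts.img"
default_uki="/efi/EFI/Linux/arch-linux-lts.efi"   # assumes the ESP is mounted at /efi
```

After editing, the image would be rebuilt with mkinitcpio -p linux-lts. As far as I know, mkinitcpio takes the embedded kernel command line from /etc/kernel/cmdline or /etc/cmdline.d/*.conf, so existing parameters should carry over to the new UKI.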

#11 2024-12-03 09:34:00

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

Re: GPU started artifacting, system froze, and is now unable to start

That would be any package that ships a kernel module. Running

pacman -Qo /usr/lib/modules

will give you a list of packages that write something into that path. FWIW you could also opt for nvidia-lts instead of nvidia-dkms for the LTS kernel. Note that all DKMS variants require the matching headers package (linux-headers or linux-lts-headers) to be installed before DKMS can build a module.
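
If only the package names are wanted, the output of that query can be filtered. This is a rough sketch: owners_from_qo is a made-up helper name, and it assumes pacman's English message format ("<path> is owned by <pkgname> <version>"), so the locale is forced to C in the usage line.

```shell
# owners_from_qo: read `pacman -Qo` output on stdin, print unique package names.
# Assumes lines of the form "<path> is owned by <pkgname> <version>".
owners_from_qo() {
    # NF >= 5 guards against short or empty lines before indexing from the end.
    awk 'NF >= 5 && $(NF-3) == "owned" && $(NF-2) == "by" { print $(NF-1) }' | sort -u
}
```

Usage: LC_ALL=C pacman -Qo /usr/lib/modules | owners_from_qo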

#12 2024-12-03 17:44:46

mine_diver
Member
Registered: 2024-03-18
Posts: 62

Re: GPU started artifacting, system froze, and is now unable to start

Thank you, everyone. I managed to get the LTS kernel working, and it was way easier than I thought.

Disabled both GSP and fbdev and logged into Plasma just fine:

[mine_diver@ABLPHA ~]$ sudo cat /sys/module/nvidia_drm/parameters/fbdev
N
[mine_diver@ABLPHA ~]$ cat /proc/driver/nvidia/params | grep EnableGpuFirmware
EnableGpuFirmware: 0
EnableGpuFirmwareLogs: 2

Time to see if it actually helps against the artifacts.
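
The two checks above can be wrapped into one small helper for re-checking after each reboot. This is only a sketch: report_nvidia_state is a made-up name, the default paths are the ones used in the commands above, and the optional arguments exist just so other files can be substituted.

```shell
# report_nvidia_state [fbdev_file] [params_file]
# Prints the fbdev module parameter and the EnableGpuFirmware driver setting.
report_nvidia_state() {
    fbdev_file=${1:-/sys/module/nvidia_drm/parameters/fbdev}
    params_file=${2:-/proc/driver/nvidia/params}

    if [ -r "$fbdev_file" ]; then
        echo "fbdev: $(cat "$fbdev_file")"
    else
        echo "fbdev: unreadable (module not loaded, or root needed)"
    fi

    if [ -r "$params_file" ]; then
        # Exact key match, so EnableGpuFirmwareLogs is not picked up.
        awk -F': *' '$1 == "EnableGpuFirmware" { print "EnableGpuFirmware: " $2 }' "$params_file"
    else
        echo "EnableGpuFirmware: unreadable (nvidia module not loaded?)"
    fi
}
```

Since the fbdev file needed sudo to read above, the function would have to run in a root shell to show that value.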
