
#1 2023-10-06 12:53:46

AnakTeka
Member
Registered: 2015-11-20
Posts: 7

[SOLVED] GNOME-shell crash on amdgpu kernel 6.5.5

Hi

I'm on a Ryzen 7 7700X with its iGPU plus an RTX 2080 Ti. The iGPU is connected to a monitor; the 2080 Ti has no display attached (I plan to use it for CUDA). GDM hangs every time I click my username on the login screen. I tried linux-lts and got the same problem.

The full log is at http://0x0.st/HWgq.txt

It crashes with:

Oct 06 19:34:16 rumah-arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=113, emitted seq=114
Oct 06 19:34:16 rumah-arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 891 thread gnome-shel:cs0 pid 905
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: MODE2 reset
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 06 19:34:16 rumah-arch kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
Oct 06 19:34:16 rumah-arch kernel: [drm] PSP is resuming...
Oct 06 19:34:16 rumah-arch kernel: [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: RAP: optional rap ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resuming...
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000004, smu fw if version = 0x00000005, smu fw program = 0, smu fw version = 0x00544fdd (84.79.221)
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resumed successfully!
Oct 06 19:34:16 rumah-arch kernel: [drm] DMUB hardware initialized: version=0x05000C00
Oct 06 19:34:16 rumah-arch kernel: [drm] kiq ring mec 2 pipe 1 q 0
Oct 06 19:34:16 rumah-arch kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Oct 06 19:34:16 rumah-arch kernel: [drm] JPEG decode initialized successfully.
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow start
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow done
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
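
To pull only the timeout lines out of a long journal, a filter along these lines may help. It is a minimal sketch: the function name amdgpu_timeouts is made up for illustration, and the pattern is based only on the two *ERROR* lines quoted above, so other kernel versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and keep only the
# amdgpu job-timeout lines (the ring that hung and the process that
# submitted the job). The pattern mirrors the two *ERROR* lines in this log.
amdgpu_timeouts() {
    grep -E 'amdgpu_job_timedout.*(ring [^ ]+ timeout|Process information)'
}
```

Typical use would be to feed it the kernel messages of the current boot: journalctl -k -b | amdgpu_timeouts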

Let me know if more logs are needed. Thanks a lot.

Last edited by AnakTeka (2023-11-19 09:03:10)


#2 2023-10-24 09:29:33

AnakTeka
Member
Registered: 2015-11-20
Posts: 7

Re: [SOLVED] GNOME-shell crash on amdgpu kernel 6.5.5

This is kind of fixed. I needed to add this kernel parameter:

amdgpu.ppfeaturemask=0xfffd3fff
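
To see what this mask actually turns off, it helps to list the bits that are zero in it. The sketch below does only that arithmetic; the function name cleared_bits is made up for illustration. As far as I can tell from the PP_FEATURE_MASK enum in the kernel's amd_shared.h, bits 14, 15 and 17 are overdrive, GFXOFF and stutter mode, but treat that mapping as an assumption and check it against your own kernel source.

```shell
#!/bin/sh
# Hypothetical helper: print the positions of the bits that are cleared (0)
# in a 32-bit mask. Accepts decimal or 0x-prefixed hex.
cleared_bits() {
    mask=$(( $1 ))
    bit=0
    out=""
    while [ "$bit" -lt 32 ]; do
        # Test one bit at a time, lowest bit first.
        if [ $(( (mask >> bit) & 1 )) -eq 0 ]; then
            out="$out${out:+ }$bit"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}
```

For the value above, cleared_bits 0xfffd3fff prints 14 15 17. After a reboot, the mask the driver is really using can be read from /sys/module/amdgpu/parameters/ppfeaturemask, and /proc/cmdline shows whether the parameter reached the kernel.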

I also set the iGPU to the "low" performance level.

First, find the DRM card node of the AMD iGPU (card2 in my case):

[root@arch rules.d]# ls /sys/class/drm/ -l
total 0
lrwxrwxrwx 1 root root    0 Oct 24 16:13 card1 -> ../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/card1
lrwxrwxrwx 1 root root    0 Oct 24 16:13 card2 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2
lrwxrwxrwx 1 root root    0 Oct 24 16:13 card2-DP-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-DP-1
lrwxrwxrwx 1 root root    0 Oct 24 16:13 card2-DP-2 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-DP-2
lrwxrwxrwx 1 root root    0 Oct 24 16:13 card2-HDMI-A-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-HDMI-A-1
lrwxrwxrwx 1 root root    0 Oct 24 16:13 renderD128 -> ../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/renderD128
lrwxrwxrwx 1 root root    0 Oct 24 16:13 renderD129 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/renderD129
-r--r--r-- 1 root root 4096 Oct 24 16:13 version
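
Instead of reading the listing by eye, the card bound to amdgpu can be looked up through each card's device/driver symlink. This is a minimal sketch, assuming the usual sysfs layout shown above; the function name cards_for_driver and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the DRM card nodes bound to a given kernel driver.
# $1: driver name (e.g. amdgpu)
# $2: DRM class directory (defaults to /sys/class/drm)
cards_for_driver() {
    driver=$1
    drm_dir=${2:-/sys/class/drm}
    for card in "$drm_dir"/card*; do
        # Skip connector entries such as card2-DP-1; keep plain cardN nodes.
        case ${card##*/} in
            card*[!0-9]*) continue ;;
        esac
        [ -e "$card/device/driver" ] || continue
        # The driver symlink points at the driver's directory; its last
        # path component is the driver name.
        bound=$(basename "$(readlink -f "$card/device/driver")")
        [ "$bound" = "$driver" ] && printf '%s\n' "${card##*/}"
    done
    return 0
}
```

On the system in this thread, cards_for_driver amdgpu would be expected to print card2, matching the 0000:0c:00.0 device in the crash log.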

Then set it to "low":

# echo low > /sys/class/drm/card2/device/power_dpm_force_performance_level
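
A slightly more defensive version of that write is sketched below. It checks the level against the values the amdgpu documentation lists for power_dpm_force_performance_level and reads the attribute back afterwards. The function name set_perf_level is made up for illustration, and the write needs root.

```shell
#!/bin/sh
# Hypothetical helper: validate and write a DPM performance level.
# $1: level
# $2: device directory (e.g. /sys/class/drm/card2/device)
set_perf_level() {
    level=$1
    attr=$2/power_dpm_force_performance_level
    case $level in
        auto|low|high|manual|profile_standard|profile_min_sclk|profile_min_mclk|profile_peak) ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
    [ -w "$attr" ] || { echo "cannot write $attr (need root?)" >&2; return 1; }
    printf '%s\n' "$level" > "$attr"
    # Read the attribute back so the caller sees what is now in effect.
    cat "$attr"
}
```

For the card in this thread the call would be set_perf_level low /sys/class/drm/card2/device, and writing auto the same way hands control back to the driver.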

To apply this setting automatically at boot, create

 /etc/udev/rules.d/30-amdgpu-pm.rules 

with the following content:

KERNEL=="card2", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="low"
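
The card index is not necessarily card2 on other machines, so the rule may need a different name in KERNEL==. A small generator like the one below produces the same rule for any card and level. This is only a sketch and the function name amdgpu_pm_rule is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a udev rule that forces a performance level
# on one amdgpu card.
# $1: card name (e.g. card2)
# $2: level (defaults to low)
amdgpu_pm_rule() {
    printf 'KERNEL=="%s", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="%s"\n' \
        "$1" "${2:-low}"
}
```

As root, amdgpu_pm_rule card2 > /etc/udev/rules.d/30-amdgpu-pm.rules writes the rule shown above. Running udevadm control --reload followed by udevadm trigger --subsystem-match=drm should apply it without a reboot.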

I'm on i3wm with GDM as the display manager. The X server still crashes once, every time, but after that first crash I can log in and use the computer.

It sometimes still crashes after a few hours of running, but at least I'm able to log in now.

Last edited by AnakTeka (2023-10-24 09:30:24)


#3 2023-10-24 13:19:35

seth
Member
Registered: 2012-09-03
Posts: 57,052

Re: [SOLVED] GNOME-shell crash on amdgpu kernel 6.5.5

Was nvidia blacklisted as a mitigation attempt, or in order to get GDM/GNOME to run on Wayland?


#4 2023-10-28 03:50:35

AnakTeka
Member
Registered: 2015-11-20
Posts: 7

Re: [SOLVED] GNOME-shell crash on amdgpu kernel 6.5.5

Yeah, I did blacklist the nvidia module, and I even unplugged the nvidia card just to make sure. The exact same issue is still there.


#5 2023-11-19 09:02:54

AnakTeka
Member
Registered: 2015-11-20
Posts: 7

Re: [SOLVED] GNOME-shell crash on amdgpu kernel 6.5.5

An update on this in case anyone stumbles on the same issue (sorry for the double post): the workaround I posted above works to some extent, but what finally fixed the problem was RMA'ing the Ryzen 7700X.

