Hi
I'm on a Ryzen 7 7700X with its iGPU and also an RTX 2080 Ti. The iGPU is plugged into a monitor, but the 2080 Ti is not connected to anything (I'm planning to use it for CUDA). GDM hangs every time I click the username on the login screen. I tried linux-lts and got the same problem.
This is the log: http://0x0.st/HWgq.txt
It crashes with:
Oct 06 19:34:16 rumah-arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=113, emitted seq=114
Oct 06 19:34:16 rumah-arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 891 thread gnome-shel:cs0 pid 905
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: MODE2 reset
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 06 19:34:16 rumah-arch kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F4FFC00000).
Oct 06 19:34:16 rumah-arch kernel: [drm] PSP is resuming...
Oct 06 19:34:16 rumah-arch kernel: [drm] reserve 0xa00000 from 0xf4fe000000 for PSP TMR
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: RAP: optional rap ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resuming...
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000004, smu fw if version = 0x00000005, smu fw program = 0, smu fw version = 0x00544fdd (84.79.221)
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resumed successfully!
Oct 06 19:34:16 rumah-arch kernel: [drm] DMUB hardware initialized: version=0x05000C00
Oct 06 19:34:16 rumah-arch kernel: [drm] kiq ring mec 2 pipe 1 q 0
Oct 06 19:34:16 rumah-arch kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Oct 06 19:34:16 rumah-arch kernel: [drm] JPEG decode initialized successfully.
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow start
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow done
Oct 06 19:34:16 rumah-arch kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
Let me know if more logs are needed. Thanks a lot.
Last edited by AnakTeka (2023-11-19 09:03:10)
This is kind of fixed. I needed to add this as a kernel parameter:
amdgpu.ppfeaturemask=0xfffd3fff
and also set the iGPU to the "low" performance level.
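For anyone wondering what that mask value changes, here is a rough sketch (not from the original post) that lists which bits are cleared in a given ppfeaturemask. The function name `cleared_bits` is made up for illustration. As far as I can tell from the kernel's amd_shared.h, bits 14, 15 and 17 correspond to overdrive, GFXOFF and stutter mode, but treat that mapping as an assumption and check the header for your kernel version.

```shell
#!/bin/sh
# Print the bit positions (0-31) that are cleared (i.e. features turned off)
# in an amdgpu ppfeaturemask value.
cleared_bits() {
    mask=$(( $1 ))          # accepts hex such as 0xfffd3fff
    out=""
    bit=0
    while [ "$bit" -lt 32 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 0 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

cleared_bits 0xfffd3fff     # prints: 14 15 17

# To see the mask the running kernel actually uses (the file only exists
# while the amdgpu module is loaded):
# cat /sys/module/amdgpu/parameters/ppfeaturemask
```

After a reboot, `cat /proc/cmdline` should also show the parameter if the bootloader config picked it up.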
First, find which card is the AMD iGPU (card2 in my case; it sits at PCI address 0000:0c:00.0, the same address that shows up in the amdgpu log lines):
[root@arch rules.d]# ls /sys/class/drm/ -l
total 0
lrwxrwxrwx 1 root root 0 Oct 24 16:13 card1 -> ../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/card1
lrwxrwxrwx 1 root root 0 Oct 24 16:13 card2 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2
lrwxrwxrwx 1 root root 0 Oct 24 16:13 card2-DP-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-DP-1
lrwxrwxrwx 1 root root 0 Oct 24 16:13 card2-DP-2 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-DP-2
lrwxrwxrwx 1 root root 0 Oct 24 16:13 card2-HDMI-A-1 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/card2/card2-HDMI-A-1
lrwxrwxrwx 1 root root 0 Oct 24 16:13 renderD128 -> ../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/drm/renderD128
lrwxrwxrwx 1 root root 0 Oct 24 16:13 renderD129 -> ../../devices/pci0000:00/0000:00:08.1/0000:0c:00.0/drm/renderD129
-r--r--r-- 1 root root 4096 Oct 24 16:13 version
Then set it to the "low" level:
# echo low > /sys/class/drm/card2/device/power_dpm_force_performance_level
To automate the line above, create
/etc/udev/rules.d/30-amdgpu-pm.rules
with the content below:
KERNEL=="card2", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="low"
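Note that the card number (card2 here) is not guaranteed to stay the same across boots or hardware changes. A minimal sketch, assuming the usual sysfs layout, for listing which cards are bound to amdgpu and what their current performance level is; the function name `amdgpu_cards` and its optional root-directory argument are just for illustration:

```shell
#!/bin/sh
# List DRM cards bound to the amdgpu driver together with their current
# power_dpm_force_performance_level. The optional argument overrides the
# sysfs root, which is only useful for testing.
amdgpu_cards() {
    root=${1:-/sys/class/drm}
    for dev in "$root"/card*/device; do
        # Connector entries (card2-DP-1 etc.) have no driver link; skip them.
        [ -e "$dev/driver" ] || continue
        drv=$(basename "$(readlink -f "$dev/driver")")
        [ "$drv" = "amdgpu" ] || continue
        card=$(basename "$(dirname "$dev")")
        level=$(cat "$dev/power_dpm_force_performance_level" 2>/dev/null || true)
        printf '%s %s\n' "$card" "${level:-unknown}"
    done
}

amdgpu_cards
```

Running this after boot is a quick way to confirm that the udev rule matched the right card and that the level really reads "low".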
I'm on i3wm with GDM as the display manager. The X server still crashes once on every boot, but after that first crash I can log in and use the computer.
It sometimes still crashes after a few hours of running, but at least I'm able to log in now.
Last edited by AnakTeka (2023-10-24 09:30:24)
Was nvidia blacklisted as a mitigation attempt, or in order to get GDM/GNOME to run on Wayland?
Yeah, I did blacklist the nvidia module, and I even pulled out the nvidia card just to make sure. The exact same issue is still there.
An update on this one in case anyone stumbles on the same issue (sorry for the double post): the workaround I posted above works to some extent, but in the end what fixed the problem was that I RMA'd the Ryzen 7700X.