#1 2023-05-13 18:36:34

anomaly
Member
Registered: 2022-09-09
Posts: 4

amd gpu randomly crashes and exits xorg session

so basically my amd gpu randomly crashes, it's an rx580 and i have amdgpu installed
this is from my journalctl -xef log:

May 13 15:29:03 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=8848, emitted seq=8849
May 13 15:29:03 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 13 15:29:03 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
May 13 15:29:04 arch kernel: amdgpu 0000:01:00.0: amdgpu: BACO reset
May 13 15:29:04 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
May 13 15:29:04 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
May 13 15:29:04 arch kernel: [drm] VRAM is lost due to GPU reset!
May 13 15:29:04 arch kernel: [drm] UVD and UVD ENC initialized successfully.
May 13 15:29:04 arch kernel: [drm] VCE initialized successfully.
May 13 15:29:04 arch kernel: amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow start
May 13 15:29:04 arch kernel: amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow done
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: [drm] Skip scheduling IBs!
May 13 15:29:04 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!
May 13 15:29:04 arch telegram-desktop[1513]: The X11 connection broke (error 1). Did the X11 server die?
May 13 15:29:04 arch systemd[1]: Stopping Session 2 of User anomaly...
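
For anyone comparing several of these logs, here is a minimal Python sketch that pulls the ring timeout lines out of saved journal text. The regex is an assumption based only on the wording of the kernel lines quoted in this thread, and the function name and the "pending" field are made up for illustration. Reading "emitted minus signaled" as the number of jobs still outstanding on the ring is also an interpretation, not something the log states.

```python
import re

# Pattern assumed from the kernel lines quoted in this thread; other kernel
# versions may word the message differently.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(journal_text):
    """Return one dict per amdgpu ring-timeout line found in journal_text."""
    events = []
    for line in journal_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        events.append({
            "ring": match["ring"],
            "signaled": signaled,
            "emitted": emitted,
            # Hypothetical helper field: how far "emitted" is ahead of "signaled".
            "pending": emitted - signaled,
        })
    return events


sample = (
    "May 13 15:29:03 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring sdma0 timeout, signaled seq=8848, emitted seq=8849"
)
print(find_ring_timeouts(sample))
```

To use it on a real log, save the journal output to a file first and pass the file contents to find_ring_timeouts().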

Last edited by anomaly (2023-05-13 18:42:47)


#2 2023-05-13 19:08:39

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,671

Re: amd gpu randomly crashes and exits xorg session

If you remove xf86-video-amdgpu does that help the situation?


#3 2023-05-17 23:21:03

anomaly
Member
Registered: 2022-09-09
Posts: 4

Re: amd gpu randomly crashes and exits xorg session

V1del wrote:

If you remove xf86-video-amdgpu does that help the situation?

i removed it and now i get this error message:

May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0008480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102401
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057793, read from 'TC4' (0x54433400) (72)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102496
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0400800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057942, read from 'TC0' (0x54433000) (8)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102400
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057792, read from 'TC4' (0x54433400) (72)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0010440c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010247E
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404400C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057918, read from 'TC5' (0x54433500) (68)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0010480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102402
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057794, read from 'TC4' (0x54433400) (72)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0008480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001024D2
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0408800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1058002, read from 'TC6' (0x54433600) (136)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102400
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057792, read from 'TC4' (0x54433400) (72)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0010480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102480
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0400800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057920, read from 'TC0' (0x54433000) (8)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0000480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102400
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057792, read from 'TC4' (0x54433400) (72)
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0008480c for process Xorg pid 487 thread Xorg:cs0 pid 492
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00102464
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0408800C
May 17 19:58:01 arch kernel: amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 32769) at page 1057892, read from 'TC6' (0x54433600) (136)
May 17 20:00:08 arch telegram-desktop[1910]: qt.svg: Error while inflating gzip file: SVG format check failed
May 17 20:01:09 arch telegram-desktop[1910]: qt.svg: Error while inflating gzip file: SVG format check failed
May 17 20:06:25 arch telegram-desktop[1910]: qt.svg: Error while inflating gzip file: SVG format check failed
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:56 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:08:58 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:10:01 arch telegram-desktop[1910]: qt.svg: Error while inflating gzip file: SVG format check failed
May 17 20:10:45 arch telegram-desktop[1910]: qt.gui.imageio.jpeg: Corrupt JPEG data: premature end of data segment
May 17 20:14:21 arch telegram-desktop[1910]: qt.svg: Error while inflating gzip file: SVG format check failed
May 17 20:18:45 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=19057, emitted seq=19059
May 17 20:18:45 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: BACO reset
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
May 17 20:18:45 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
May 17 20:18:45 arch kernel: [drm] VRAM is lost due to GPU reset!
May 17 20:18:45 arch kernel: [drm] UVD and UVD ENC initialized successfully.
May 17 20:18:45 arch kernel: [drm] VCE initialized successfully.
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow start
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow done
May 17 20:18:45 arch kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!
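
A small note on reading the VM fault lines above: the numbers are consistent with each other. The "at page" value is the VM_CONTEXT1_PROTECTION_FAULT_ADDR value written in decimal, and the hex number after the client name is that name spelled out as ASCII bytes. The Python sketch below just redoes that arithmetic. The helper names are hypothetical, and this is an observation about the numbers in this log, not a description of how the driver works internally.

```python
def fault_page(addr_hex):
    """Convert a FAULT_ADDR hex string from the log to the decimal page number."""
    return int(addr_hex, 16)


def client_name(client_id):
    """Decode the 32-bit client id from the log as big-endian ASCII text."""
    raw = client_id.to_bytes(4, "big")
    # Trailing zero bytes are padding, so drop them before decoding.
    return raw.rstrip(b"\x00").decode("ascii")


# Values copied from the first fault in the log above.
print(fault_page("0x00102401"))   # the log reports "at page 1057793"
print(client_name(0x54433400))    # the log reports 'TC4'
```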


#4 2023-05-18 07:08:24

seth
Member
Registered: 2012-09-03
Posts: 51,056

Re: amd gpu randomly crashes and exits xorg session

https://gitlab.freedesktop.org/drm/amd/-/issues/2220

Does the X11 server still crash on the GPU reset?


#5 2023-05-22 18:12:34

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

I'm not sure if my random crashes are connected with this, but I tried the test page proposed in the link above.

I can see the screen go black and come back, but the system is frozen: keyboard and mouse don't react anymore.

The logs when starting the test:

May 22 19:40:53 amd64-archlinux dbus-daemon[877]: [session uid=1000 pid=877] Activating service name='org.xfce.Xfconf' requested by ':1.18' (uid=1000 pid=1585 comm="xfce4-panel --display :0.0 --sm-client-id 287aa513")
May 22 19:40:53 amd64-archlinux dbus-daemon[877]: [session uid=1000 pid=877] Successfully activated service 'org.xfce.Xfconf'
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 0 threads of 0 processes of 0 users.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Successfully made thread 52751 of process 52559 owned by '1000' RT at priority 10.
May 22 19:44:53 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:54 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:54 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:55 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:55 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:57 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:57 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:57 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:57 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:58 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:44:58 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:00 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:00 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:04 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:04 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:05 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:45:05 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:46:16 amd64-archlinux dbus-daemon[877]: [session uid=1000 pid=877] Activating service name='org.xfce.Xfconf' requested by ':1.13' (uid=1000 pid=1266 comm="xfwm4 --display :0.0 --sm-client-id 2315a3812-8cfa")
May 22 19:46:16 amd64-archlinux dbus-daemon[877]: [session uid=1000 pid=877] Successfully activated service 'org.xfce.Xfconf'
May 22 19:47:02 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:47:02 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:47:12 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:47:12 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:47:19 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:47:19 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:48:13 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:48:13 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:49:14 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:49:14 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:50:15 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:50:15 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:51:16 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:51:16 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:52:12 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:52:12 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:52:17 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:52:17 amd64-archlinux rtkit-daemon[2021]: Supervising 1 threads of 1 processes of 1 users.
May 22 19:53:20 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=2828751, emitted seq=2828753
May 22 19:53:20 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 52559 thread firefox:cs0 pid 52643
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ed00 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ed40 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ed80 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888edc0 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ee00 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ee40 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ee80 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888eec0 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ef00 flags=0x0070]
May 22 19:53:20 amd64-archlinux kernel: amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10888ef40 flags=0x0070]
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
May 22 19:53:21 amd64-archlinux kernel: [drm] PCIE GART of 1024M enabled.
May 22 19:53:21 amd64-archlinux kernel: [drm] PTB located at 0x000000F400A00000
May 22 19:53:21 amd64-archlinux kernel: [drm] PSP is resuming...
May 22 19:53:21 amd64-archlinux kernel: [drm] reserve 0x400000 from 0xf401c00000 for PSP TMR
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 22 19:53:21 amd64-archlinux kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
May 22 19:53:21 amd64-archlinux kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(2) failed
May 22 19:53:21 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
May 22 19:53:21 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
May 22 19:53:31 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=2828753, emitted seq=2828756
May 22 19:53:31 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 52559 thread firefox:cs0 pid 52643
May 22 19:53:31 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
May 22 19:53:31 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:31 amd64-archlinux kernel: WARNING: CPU: 4 PID: 59049 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:31 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:31 amd64-archlinux kernel: CPU: 4 PID: 59049 Comm: kworker/u64:4 Tainted: G           OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:31 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:31 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:31 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:31 amd64-archlinux kernel: RSP: 0018:ffffb7d409447c90 EFLAGS: 00010246
May 22 19:53:31 amd64-archlinux kernel: RAX: ffff93f481b1cb30 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b4bee8 RDI: ffff93f485b40000
May 22 19:53:31 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 000000000003ac80 R09: 0000000000000006
May 22 19:53:31 amd64-archlinux kernel: R10: ffff93fb9f33bd80 R11: 000000000000000a R12: 0000000000001050
May 22 19:53:31 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f5645a4800 R15: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80b00000(0000) knlGS:0000000000000000
May 22 19:53:31 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:31 amd64-archlinux kernel: CR2: 00007f35fec59000 CR3: 00000001e355e000 CR4: 00000000003506e0
May 22 19:53:31 amd64-archlinux kernel: Call Trace:
May 22 19:53:31 amd64-archlinux kernel:  <TASK>
May 22 19:53:31 amd64-archlinux kernel:  gfx_v9_0_hw_fini+0x35/0x700 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:31 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:31 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:31 amd64-archlinux kernel:  </TASK>
May 22 19:53:31 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:31 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:31 amd64-archlinux kernel: WARNING: CPU: 4 PID: 59049 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:31 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:31 amd64-archlinux kernel: CPU: 4 PID: 59049 Comm: kworker/u64:4 Tainted: G        W  OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:31 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:31 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:31 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:31 amd64-archlinux kernel: RSP: 0018:ffffb7d409447c90 EFLAGS: 00010246
May 22 19:53:31 amd64-archlinux kernel: RAX: ffff93f481b1cba0 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b4bf00 RDI: ffff93f485b40000
May 22 19:53:31 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 000000000003ac80 R09: 0000000000000006
May 22 19:53:31 amd64-archlinux kernel: R10: ffff93fb9f33bd80 R11: 000000000000000a R12: 0000000000001050
May 22 19:53:31 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f5645a4800 R15: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80b00000(0000) knlGS:0000000000000000
May 22 19:53:31 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:31 amd64-archlinux kernel: CR2: 00007f35fec59000 CR3: 00000001e355e000 CR4: 00000000003506e0
May 22 19:53:31 amd64-archlinux kernel: Call Trace:
May 22 19:53:31 amd64-archlinux kernel:  <TASK>
May 22 19:53:31 amd64-archlinux kernel:  gfx_v9_0_hw_fini+0x46/0x700 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:31 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:31 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:31 amd64-archlinux kernel:  </TASK>
May 22 19:53:31 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:31 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:31 amd64-archlinux kernel: WARNING: CPU: 4 PID: 59049 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:31 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:31 amd64-archlinux kernel: CPU: 4 PID: 59049 Comm: kworker/u64:4 Tainted: G        W  OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:31 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:31 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:31 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:31 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:31 amd64-archlinux kernel: RSP: 0018:ffffb7d409447ca8 EFLAGS: 00010246
May 22 19:53:31 amd64-archlinux kernel: RAX: ffff93f481b1c7e0 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b40c48 RDI: ffff93f485b40000
May 22 19:53:31 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 0000000000000000 R09: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: R10: 0000000000000001 R11: 0000000000000100 R12: 0000000000001050
May 22 19:53:31 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f5645a4800 R15: 0000000000000000
May 22 19:53:31 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80b00000(0000) knlGS:0000000000000000
May 22 19:53:31 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:31 amd64-archlinux kernel: CR2: 00007f35fec59000 CR3: 00000001e355e000 CR4: 00000000003506e0
May 22 19:53:31 amd64-archlinux kernel: Call Trace:
May 22 19:53:31 amd64-archlinux kernel:  <TASK>
May 22 19:53:31 amd64-archlinux kernel:  gmc_v9_0_hw_fini+0x6d/0x90 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:31 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:31 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:31 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:31 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:31 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:31 amd64-archlinux kernel:  </TASK>
May 22 19:53:31 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:31 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
May 22 19:53:31 amd64-archlinux kernel: [drm] PCIE GART of 1024M enabled.
May 22 19:53:31 amd64-archlinux kernel: [drm] PTB located at 0x000000F400A00000
May 22 19:53:31 amd64-archlinux kernel: [drm] PSP is resuming...
May 22 19:53:31 amd64-archlinux kernel: [drm] reserve 0x400000 from 0xf401c00000 for PSP TMR
May 22 19:53:31 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 22 19:53:31 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 22 19:53:31 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 22 19:53:32 amd64-archlinux kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 22 19:53:32 amd64-archlinux kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
May 22 19:53:32 amd64-archlinux kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
May 22 19:53:32 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(4) failed
May 22 19:53:32 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
May 22 19:53:32 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
May 22 19:53:42 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=2828756, emitted seq=2828759
May 22 19:53:42 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 671 thread Xorg:cs0 pid 692
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
May 22 19:53:42 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:42 amd64-archlinux kernel: WARNING: CPU: 5 PID: 46047 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:42 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:42 amd64-archlinux kernel: CPU: 5 PID: 46047 Comm: kworker/u64:0 Tainted: G        W  OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:42 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:42 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:42 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:42 amd64-archlinux kernel: RSP: 0018:ffffb7d40f77fc90 EFLAGS: 00010246
May 22 19:53:42 amd64-archlinux kernel: RAX: ffff93f481b1cb30 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b4bee8 RDI: ffff93f485b40000
May 22 19:53:42 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 000000000003ac80 R09: 0000000000000006
May 22 19:53:42 amd64-archlinux kernel: R10: ffff93fb9f33bd80 R11: 0000000000000000 R12: 0000000000001050
May 22 19:53:42 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f4f864d000 R15: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80b40000(0000) knlGS:0000000000000000
May 22 19:53:42 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:42 amd64-archlinux kernel: CR2: 00007f5514f1f000 CR3: 00000001074a4000 CR4: 00000000003506e0
May 22 19:53:42 amd64-archlinux kernel: Call Trace:
May 22 19:53:42 amd64-archlinux kernel:  <TASK>
May 22 19:53:42 amd64-archlinux kernel:  gfx_v9_0_hw_fini+0x35/0x700 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:42 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:42 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:42 amd64-archlinux kernel:  </TASK>
May 22 19:53:42 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:42 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:42 amd64-archlinux kernel: WARNING: CPU: 5 PID: 46047 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:42 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:42 amd64-archlinux kernel: CPU: 5 PID: 46047 Comm: kworker/u64:0 Tainted: G        W  OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:42 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:42 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:42 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:42 amd64-archlinux kernel: RSP: 0018:ffffb7d40f77fc90 EFLAGS: 00010246
May 22 19:53:42 amd64-archlinux kernel: RAX: ffff93f481b1cba0 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b4bf00 RDI: ffff93f485b40000
May 22 19:53:42 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 000000000003ac80 R09: 0000000000000006
May 22 19:53:42 amd64-archlinux kernel: R10: ffff93fb9f33bd80 R11: 0000000000000000 R12: 0000000000001050
May 22 19:53:42 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f4f864d000 R15: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80b40000(0000) knlGS:0000000000000000
May 22 19:53:42 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:42 amd64-archlinux kernel: CR2: 00007f5514f1f000 CR3: 00000001074a4000 CR4: 00000000003506e0
May 22 19:53:42 amd64-archlinux kernel: Call Trace:
May 22 19:53:42 amd64-archlinux kernel:  <TASK>
May 22 19:53:42 amd64-archlinux kernel:  gfx_v9_0_hw_fini+0x46/0x700 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:42 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:42 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:42 amd64-archlinux kernel:  </TASK>
May 22 19:53:42 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:42 amd64-archlinux kernel: ------------[ cut here ]------------
May 22 19:53:42 amd64-archlinux kernel: WARNING: CPU: 0 PID: 46047 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device overlay rpcrdma rdma_cm iw_cm ib_cm ib_core rc_hauppauge em28xx_rc si2157 si2168 i2c_mux em28xx_dvb dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_common cfg80211 8021q garp mrp stp llc hwmon_vid joydev mousedev amdgpu intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi kvm_amd snd_hda_intel snd_intel_dspcfg drm_buddy gpu_sched snd_intel_sdw_acpi kvm em28xx r8168(OE) eeepc_wmi i2c_algo_bit snd_hda_codec asus_wmi drm_ttm_helper tveeprom irqbypass ledtrig_audio snd_hda_core ttm crct10dif_pclmul videodev snd_hwdep r8169 sparse_keymap polyval_clmulni realtek platform_profile polyval_generic drm_display_helper gf128mul usbhid mc ghash_clmulni_intel snd_pcm mdio_devres rfkill psmouse pcspkr cec snd_timer rapl wmi_bmof sp5100_tco libphy snd ccp i2c_piix4 soundcore k10temp gpio_amdpt gpio_generic acpi_cpufreq mac_hid vboxnetflt(OE) vboxnetadp(OE) nfsd vboxdrv(OE) auth_rpcgss nfs_acl
May 22 19:53:42 amd64-archlinux kernel:  tun lockd dm_multipath grace sg crypto_user loop sunrpc dm_mod fuse ip_tables x_tables xfs libcrc32c crc32c_generic serio_raw atkbd libps2 vivaldi_fmap crc32_pclmul crc32c_intel sha512_ssse3 aesni_intel crypto_simd cryptd xhci_pci i8042 xhci_pci_renesas video serio wmi
May 22 19:53:42 amd64-archlinux kernel: CPU: 0 PID: 46047 Comm: kworker/u64:0 Tainted: G        W  OE      6.3.3-arch1-1 #1 fa7b7e0107004b3021a57a74b951e0a25e7e8584
May 22 19:53:42 amd64-archlinux kernel: Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5602 07/14/2020
May 22 19:53:42 amd64-archlinux kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 22 19:53:42 amd64-archlinux kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 22 19:53:42 amd64-archlinux kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ef ed 65 d1 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 de ed 65 d1 b8 ea ff ff ff e9 d4 ed 65 d1
May 22 19:53:42 amd64-archlinux kernel: RSP: 0018:ffffb7d40f77fca8 EFLAGS: 00010246
May 22 19:53:42 amd64-archlinux kernel: RAX: ffff93f481b1c7e0 RBX: ffff93f485b40000 RCX: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: RDX: 0000000000000000 RSI: ffff93f485b40c48 RDI: ffff93f485b40000
May 22 19:53:42 amd64-archlinux kernel: RBP: ffff93f485b40000 R08: 0000000000000000 R09: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: R10: 0000000000000001 R11: 0000000000000100 R12: 0000000000001050
May 22 19:53:42 amd64-archlinux kernel: R13: ffff93f485b589a0 R14: ffff93f4f864d000 R15: 0000000000000000
May 22 19:53:42 amd64-archlinux kernel: FS:  0000000000000000(0000) GS:ffff93fb80a00000(0000) knlGS:0000000000000000
May 22 19:53:42 amd64-archlinux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 19:53:42 amd64-archlinux kernel: CR2: 00007f9434267000 CR3: 000000010fe2e000 CR4: 00000000003506f0
May 22 19:53:42 amd64-archlinux kernel: Call Trace:
May 22 19:53:42 amd64-archlinux kernel:  <TASK>
May 22 19:53:42 amd64-archlinux kernel:  gmc_v9_0_hw_fini+0x6d/0x90 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu 5af098d821b9ea38affa857c76960dcde50be6ea]
May 22 19:53:42 amd64-archlinux kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched bd28276126c967b276065acf591bf7c139793842]
May 22 19:53:42 amd64-archlinux kernel:  process_one_work+0x1c7/0x3d0
May 22 19:53:42 amd64-archlinux kernel:  worker_thread+0x51/0x390
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_worker_thread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  kthread+0xde/0x110
May 22 19:53:42 amd64-archlinux kernel:  ? __pfx_kthread+0x10/0x10
May 22 19:53:42 amd64-archlinux kernel:  ret_from_fork+0x2c/0x50
May 22 19:53:42 amd64-archlinux kernel:  </TASK>
May 22 19:53:42 amd64-archlinux kernel: ---[ end trace 0000000000000000 ]---
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
May 22 19:53:42 amd64-archlinux kernel: [drm] PCIE GART of 1024M enabled.
May 22 19:53:42 amd64-archlinux kernel: [drm] PTB located at 0x000000F400A00000
May 22 19:53:42 amd64-archlinux kernel: [drm] PSP is resuming...
May 22 19:53:42 amd64-archlinux kernel: [drm] reserve 0x400000 from 0xf401c00000 for PSP TMR
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 22 19:53:42 amd64-archlinux kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
May 22 19:53:42 amd64-archlinux kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(6) failed
May 22 19:53:42 amd64-archlinux kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
May 22 19:53:42 amd64-archlinux kernel: [drm] Skip scheduling IBs!
May 22 19:53:42 amd64-archlinux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
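
A minimal sketch for condensing a log like the one above into the lines that matter: which ring timed out, how many resets began, and how many recoveries failed. The function name amdgpu_reset_summary is made up for illustration, and the patterns only match the message texts quoted in this thread, so other kernel versions may word them differently.

```shell
# Summarize amdgpu ring timeouts and reset results from kernel log text on stdin.
# amdgpu_reset_summary is a hypothetical helper, not part of any package.
amdgpu_reset_summary() {
    awk '
        # e.g. "... *ERROR* ring gfx_low timeout, signaled seq=..., emitted seq=..."
        /ring [a-z0-9_.]+ timeout/ {
            for (i = 1; i <= NF - 2; i++)
                if ($i == "ring" && $(i + 2) ~ /^timeout/)
                    rings[$(i + 1)]++
        }
        /GPU reset begin!/    { begun++ }
        /GPU Recovery Failed/ { failed++ }
        END {
            for (r in rings)
                printf "timeout ring=%s count=%d\n", r, rings[r]
            printf "resets_begun=%d recovery_failed=%d\n", begun + 0, failed + 0
        }
    '
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b | amdgpu_reset_summary
```

Reading from stdin keeps the helper usable on a saved log file as well as on live journalctl output.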

lspci

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset USB 3.1 xHCI Controller (rev 02)
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset SATA Controller (rev 02)
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b2 (rev 02)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
02:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 04)
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c6)
07:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
07:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
07:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
07:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
08:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)
uname -r
6.3.3-arch1-1

Last edited by bernd_b (2023-05-22 18:34:37)

Offline

#6 2023-05-22 18:39:24

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

Short further test: Removing
xf86-video-amdgpu
doesn't change this behaviour.

Offline

#7 2023-05-22 19:42:09

seth
Member
Registered: 2012-09-03
Posts: 51,056

Re: amd gpu randomly crashes and exits xorg session

"amdgpu: GPU reset end with ret = -110" shows up in https://gitlab.freedesktop.org/drm/amd/-/issues/2447
It's also related to FF

Using Google Streetview in Chromium instead of Firefox doesn't cause the driver to crash

(which might be down to FF defaulting to xwayland and chromium to native wayland) and

But only when fractional scaling is used! (I have set it to 125%; no crash when at 100%)

Do you use a wayland compositor?
Can you trigger it w/ https://wiki.archlinux.org/title/Firefox#Wayland ?
(Make sure to kill all FF processes before)

Offline

#8 2023-05-22 22:47:28

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

This is all beyond my knowledge, I must confess.

Starting firefox with

MOZ_ENABLE_WAYLAND=1 firefox

doesn't change the behaviour. The screen turns black and comes back several times until it is totally frozen, even the mouse won't move anymore.

Doing the test with Chromium immediately freezes everything, including mouse movement. No black screens before this.

I am on xfce and use its default window manager with compositing. So I guess no wayland in charge here.
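
A quick way to confirm the guess instead of inferring it from the desktop environment, as a sketch: session_kind is a hypothetical helper that reads XDG_SESSION_TYPE (set by the login manager on systemd systems) and falls back to the WAYLAND_DISPLAY and DISPLAY variables.

```shell
# Print "wayland", "x11" or "unknown" for the current graphical session.
# session_kind is a made-up name; the environment variables are standard ones.
session_kind() {
    case "${XDG_SESSION_TYPE:-}" in
        wayland) echo "wayland" ;;
        x11)     echo "x11" ;;
        *)
            # Fallback when XDG_SESSION_TYPE is unset or e.g. "tty"
            if [ -n "${WAYLAND_DISPLAY:-}" ]; then
                echo "wayland"
            elif [ -n "${DISPLAY:-}" ]; then
                echo "x11"
            else
                echo "unknown"
            fi
            ;;
    esac
}

# Usage: run it in a terminal inside the desktop session
#   session_kind
```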

Last edited by bernd_b (2023-05-22 22:47:51)

Offline

#9 2023-05-23 06:51:00

seth
Member
Registered: 2012-09-03
Posts: 51,056

Re: amd gpu randomly crashes and exits xorg session

I am on xfce and use its default window manager with compositing.

Then the findings in that bug aren't relevant to you (though you might be hitting the same problem and it's just sidestepped this way).

From the symptoms, try these kernel parameters:

amdgpu.dpm=0 amdgpu.runpm=0 amdgpu.aspm=0 amdgpu.bapm=0 pcie_aspm=off

https://wiki.archlinux.org/title/Kernel_parameters
"amdgpu.dpm=0" might prevent the boot altogether, in that case try only the remaining ones w/o it.

Last edited by seth (2023-05-23 06:52:04)

Offline

#10 2023-05-23 07:52:44

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

seth wrote:

"amdgpu.dpm=0" might prevent the boot altogether, in that case try only the remaining ones w/o it.

Yes it did. The screen turned black in the middle of the boot process and the booting seemed to stop.

Having set

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=LABEL=arch_ssd rw net.ifnames=0 rootflags=discard cpufreq.default_governor=conservative amdgpu.runpm=0 amdgpu.aspm=0 amdgpu.bapm=0 pcie_aspm=off

and starting the test again didn't change anything (using firefox again). The screen turns black for several seconds and comes back, but after a few loops like this even the mouse can't be moved. But it had no power anymore anyway once the test got started.
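
To double-check which of the suggested parameters actually made it onto the command line, something like the following sketch could be used. check_params is a hypothetical helper; the parameter list is the one suggested in post #9, and amdgpu.dpm=0 is expected to show as missing here because it prevented booting.

```shell
# Report which of the suggested parameters are present on a kernel command line.
# check_params is a made-up helper; pass it the contents of /proc/cmdline.
check_params() {
    # Pad with spaces so each parameter can be matched as a whole word
    cmdline=" $1 "
    for p in amdgpu.dpm=0 amdgpu.runpm=0 amdgpu.aspm=0 amdgpu.bapm=0 pcie_aspm=off; do
        case "$cmdline" in
            *" $p "*) echo "set:     $p" ;;
            *)        echo "missing: $p" ;;
        esac
    done
}

# Usage:
#   check_params "$(cat /proc/cmdline)"
```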

Offline

#11 2023-05-23 11:10:27

seth
Member
Registered: 2012-09-03
Posts: 51,056

Re: amd gpu randomly crashes and exits xorg session

Just to make sure that it's actually the amdgpu, https://wiki.archlinux.org/title/Firefo … shoot_Mode

Offline

#12 2023-05-23 16:18:50

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

Starting firefox with "-safe-mode", the webpage returns:

WebGL could not be initialized.
WebGL may not be available or enabled in your browser.
Please enable WebGL to run this test, or try a different browser.

Offline

#13 2023-05-23 19:32:32

seth
Member
Registered: 2012-09-03
Posts: 51,056

Re: amd gpu randomly crashes and exits xorg session

hmm
Try

export LIBGL_ALWAYS_SOFTWARE=1
firefox

to use software GL
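
To see whether the variable takes effect at all, the renderer string can be inspected. This is only a heuristic sketch: renderer_kind is a hypothetical helper, it assumes glxinfo is installed, and it treats Mesa's llvmpipe and softpipe rasterizers as "software" and anything else as "hardware". It checks GL in general, not what Firefox ends up using for WebGL.

```shell
# Classify the "OpenGL renderer string" line of `glxinfo -B` output (read from stdin).
# renderer_kind is a made-up name; llvmpipe and softpipe are Mesa software rasterizers.
renderer_kind() {
    line=$(grep -i 'OpenGL renderer string' | head -n 1)
    case "$line" in
        *llvmpipe*|*softpipe*) echo "software" ;;
        "")                    echo "unknown" ;;
        *)                     echo "hardware" ;;
    esac
}

# Usage:
#   LIBGL_ALWAYS_SOFTWARE=1 glxinfo -B | renderer_kind
#   glxinfo -B | renderer_kind
```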

Offline

#14 2023-05-23 21:23:34

bernd_b
Member
Registered: 2013-07-30
Posts: 164

Re: amd gpu randomly crashes and exits xorg session

Interesting. This seems to make the test page itself crash once the start button is hidden. As I am writing this, the CPU usage is at 100 percent, but Firefox and the desktop remain usable.
Even closing the tab with the test page won't calm the CPU usage. I will submit this post and close Firefox for good.

Offline
