#1 2025-11-16 21:48:29

Areoform
Member
Registered: 2025-05-25
Posts: 7

amdgpu crash on 6.17.8-zen1-1

I've seen others mention these ring timeouts (gfx_0.0.0 in my case) across social media, so I thought I'd post my logs, unredacted, starting with the first relevant event.
The other users had a 9070 XT but the same kernel etc.; I'm using a 7800 XT.
Possibly relevant: I have Discord and OpenRGB running. The crashes occur when loading into a game (when heavy 3D acceleration begins). I can still switch to another tty and perform an orderly shutdown.

Nov 16 20:31:51 hostname lact[739]: 2025-11-16T20:31:51.562789Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
Nov 16 20:31:51 hostname lact[739]: 2025-11-16T20:31:51.568534Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:747E-1DA2:D475-0000:03:00.0 at '/sys/class/drm/card1/device'
Nov 16 20:31:51 hostname lact[739]: 2025-11-16T20:31:51.568541Z  INFO lact_daemon::server::handler: GPU list reloaded with 1 devices, reapplying configuration
Nov 16 20:31:51 hostname lact[739]: 2025-11-16T20:31:51.568934Z  INFO lact_daemon::server::handler: configuration applied
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Nov 16 20:31:53 hostname steam[37171]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is guilty of a hard recovery.
Nov 16 20:31:53 hostname steam[37171]: radv: GPUVM fault detected at address 0x80000ec00000.
Nov 16 20:31:53 hostname steam[37171]: GCVM_L2_PROTECTION_FAULT_STATUS: 0x701431
Nov 16 20:31:53 hostname steam[37171]:          CLIENT_ID: (SQC (data)) 0xa
Nov 16 20:31:53 hostname steam[37171]:          MORE_FAULTS: 1
Nov 16 20:31:53 hostname steam[37171]:          WALKER_ERROR: 0
Nov 16 20:31:53 hostname steam[37171]:          PERMISSION_FAULTS: 3
Nov 16 20:31:53 hostname steam[37171]:          MAPPING_ERROR: 0
Nov 16 20:31:53 hostname steam[37171]:          RW: 0
Nov 16 20:31:53 hostname kwin_wayland[879]: atomic commit failed: Permission denied
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Nov 16 20:31:53 hostname kded6[1026]: Service  ":1.157" unregistered
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=8077556, emitted seq=8077559
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu:  Process GameThread pid 39630 thread vkd3d_queue pid 39808
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
Nov 16 20:31:53 hostname kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
Nov 16 20:31:53 hostname plasmashell[1071]: No object for name "auto_null.monitor"
Nov 16 20:31:53 hostname lact[739]: 2025-11-16T20:31:53.639449Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
Nov 16 20:31:53 hostname lact[739]: 2025-11-16T20:31:53.645014Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:747E-1DA2:D475-0000:03:00.0 at '/sys/class/drm/card1/device'
Nov 16 20:31:53 hostname lact[739]: 2025-11-16T20:31:53.645021Z  INFO lact_daemon::server::handler: GPU list reloaded with 1 devices, reapplying configuration
Nov 16 20:31:53 hostname lact[739]: 2025-11-16T20:31:53.645379Z  INFO lact_daemon::server::handler: configuration applied
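
For anyone comparing their own logs against the ones above, here is a rough Python sketch that pulls the two most telling numbers out of them: the gap between emitted and signaled sequence numbers on the timed-out ring, and the fields packed into GCVM_L2_PROTECTION_FAULT_STATUS. The bit layout in FIELDS is an assumption inferred from the decoded values radv printed in this log, not taken from a register specification, and the function names are made up for illustration.

```python
import re

# Assumed bit layout: (shift, width) per field. Inferred from the decoded
# values printed next to 0x701431 in the log above; other GPU generations
# may pack this register differently.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CLIENT_ID": (9, 9),
    "RW": (18, 1),
}

TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def decode_fault_status(status):
    """Split a raw fault status value into named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def outstanding_submissions(line):
    """Return (ring, emitted - signaled) for a ring timeout line, else None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    ring, signaled, emitted = match.group(1), int(match.group(2)), int(match.group(3))
    return ring, emitted - signaled


if __name__ == "__main__":
    print(decode_fault_status(0x701431))
    print(outstanding_submissions(
        "amdgpu: ring gfx_0.0.0 timeout, signaled seq=8077556, emitted seq=8077559"
    ))
```

With the values from this post, the ring had three submissions emitted but not yet signaled when the timeout fired, and the status decodes to a permission fault from client 0xa (SQC data), matching what radv printed.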

Last edited by Areoform (2025-11-17 04:39:40)

#2 2025-11-17 16:18:53

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,004

Re: amdgpu crash on 6.17.8-zen1-1

Disable lact - the reload seems to precede the coredump.
The crash that follows seems to be induced by Steam's radv invocation. Can you avoid it using https://wiki.archlinux.org/title/Steam/ … _emulation ?

#3 2025-11-17 23:42:19

Areoform
Member
Registered: 2025-05-25
Posts: 7

Re: amdgpu crash on 6.17.8-zen1-1

I had assumed LACT was unrelated, seeing as its initial log line contains 'got kernel drm subsystem event', but I'll disable the service. It's not doing anything other than raising the default power limit to 244W, well within vBIOS limits.

The game in question is Squad, running via Steam using proton-hotfix (required due to an EAC change, I believe). Attempting to run it with PROTON_USE_WINED3D=1 generates 'DirectX 12 is not supported on your system. Try running without the -dx12 or -d3d12 command line argument.'

edit: It's hard to unpick whether this coincided with a game update, a kernel update or something else, but I note the latest release of Mesa contains a fix for 'amdgpu: ring gfx_0.0.0 timeout, in vr when opening apps' that may be related?

- still crashes after disabling LACTD via systemctl and rebooting.

Last edited by Areoform (2025-11-18 02:51:13)

#4 2025-11-18 09:11:46

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,004

Re: amdgpu crash on 6.17.8-zen1-1

still crashes after disabling LACTD via systemctl and rebooting

and also updating mesa?
What does the error in the journal (plus > 2 minutes of context) now look like?

Can you trigger this w/ any of https://aur.archlinux.org/packages?O=0&K=unigine ?
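
To collect the journal context asked for above, something like the following Python sketch could trim a saved journal dump to a window around the timeout. It assumes the default short journal format seen earlier in this thread ("Nov 16 20:31:53 hostname ..."), which carries no year, so the year is passed in. The function names and the default marker usage are hypothetical, and month names are parsed according to the current locale.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a short-format journal line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def context_window(lines, marker, year, minutes=2):
    """Keep lines within `minutes` before the first and after the last marker hit."""
    stamped = []
    for line in lines:
        try:
            stamped.append((parse_stamp(line, year), line))
        except ValueError:
            # Skip lines without a parsable stamp (e.g. "-- Boot ... --").
            continue
    hits = [ts for ts, line in stamped if marker in line]
    if not hits:
        return []
    start = hits[0] - timedelta(minutes=minutes)
    end = hits[-1] + timedelta(minutes=minutes)
    return [line for ts, line in stamped if start <= ts <= end]


if __name__ == "__main__":
    import sys

    # Usage sketch: journalctl -b > journal.txt; python window.py journal.txt 2025
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        journal = handle.read().splitlines()
    for kept in context_window(journal, "timeout, signaled seq=", int(sys.argv[2])):
        print(kept)
```

If the time of the crash is already known, journalctl's own --since and --until options give the same result without any scripting.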
