Setup:
AMD 7950X
RX 9070 XT
KDE Plasma
multiple monitors connected to both the iGPU and dGPU
It is stable with kernel 6.17.9 and prior.
On 6.18 it crashes only during video playback in the browser, with or without llama.cpp running on the dGPU.
At least as of 2025-12-17; I don't know if there has been any update since then.
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=1, emitted seq=3
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Process brave pid 4180 thread brave:cs0 pid 4244
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Starting comp_1.1.0 ring reset
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: reset compute queue (1:1:0)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x1
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Ring comp_1.1.0 reset failed
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!. Source: 1
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Suspending all queues failed
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: remove_all_kfd_queues_mes: Failed to remove queue 3 for dev 3197
Dec 17 10:12:53 dt1 kernel: traps: llama-server[6229] general protection fault ip:7f0c7d6b09a2 sp:7f0c157fd4b0 error:0 in libc.so.6[289a2,7f0c7d6b0000+188000]
Dec 17 10:12:53 dt1 systemd-coredump[8938]: Process 6114 (llama-server) of user 0 terminated abnormally with signal 11/SEGV, processing...
...skipping...
Dec 17 10:12:57 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* Failed to initialize parser -125!
Dec 17 10:12:57 dt1 flatpak[4180]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Dec 17 10:12:57 dt1 flatpak[3827]: [1217/101257.183840:ERROR:third_party/crashpad/crashpad/util/linux/scoped_ptrace_attach.cc:27] ptrace: Operation not permitted (1)
Dec 17 10:12:57 dt1 plasmashell[2892]: QRhiGles2: Context is lost.
Dec 17 10:12:57 dt1 plasmashell[2892]: Graphics device lost, cleaning up scenegraph and releasing RHI
Dec 17 10:12:57 dt1 systemd-coredump[8978]: Process 4180 (brave) of user 1000 terminated abnormally with signal 6/ABRT, processing...
Dec 17 10:12:57 dt1 systemd-coredump[8979]: Process 1057 (Xorg) of user 0 terminated abnormally with signal 6/ABRT, processing...
Dec 17 10:12:57 dt1 systemd[1]: Started Process Core Dump (PID 8978/UID 0).
Dec 17 10:12:57 dt1 systemd[1]: Started Process Core Dump (PID 8979/UID 0).
Dec 17 10:12:57 dt1 kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
Dec 17 10:12:57 dt1 systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
Dec 17 10:12:57 dt1 systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
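For anyone comparing their own logs against the one above, here is a minimal Python sketch that pulls out the amdgpu ring timeout and the process the kernel names right after it. The regular expressions are assumptions based only on the two message formats visible in this log ("ring ... timeout" and "Process ... pid ..."); other kernel versions may word these lines differently. The names `find_ring_timeouts` and `RingTimeout` are made up for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns modelled on the two log lines shown in this thread (assumed formats).
RING_TIMEOUT = re.compile(
    r"amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS = re.compile(r"amdgpu: Process (?P<name>\S+) pid (?P<pid>\d+) thread")


@dataclass
class RingTimeout:
    ring: str
    signaled: int
    emitted: int
    process: Optional[str] = None  # filled in from the following "Process" line
    pid: Optional[int] = None


def find_ring_timeouts(log_text: str) -> List[RingTimeout]:
    """Collect ring timeouts and attach the next reported process to each."""
    timeouts: List[RingTimeout] = []
    for line in log_text.splitlines():
        match = RING_TIMEOUT.search(line)
        if match:
            timeouts.append(
                RingTimeout(
                    ring=match["ring"],
                    signaled=int(match["signaled"]),
                    emitted=int(match["emitted"]),
                )
            )
            continue
        match = PROCESS.search(line)
        # Attach the process only to the most recent timeout that lacks one.
        if match and timeouts and timeouts[-1].process is None:
            timeouts[-1].process = match["name"]
            timeouts[-1].pid = int(match["pid"])
    return timeouts


# Two lines copied verbatim from the log in this post.
SAMPLE = """\
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=1, emitted seq=3
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Process brave pid 4180 thread brave:cs0 pid 4244
"""

if __name__ == "__main__":
    for t in find_ring_timeouts(SAMPLE):
        print(f"ring={t.ring} signaled={t.signaled} emitted={t.emitted} "
              f"process={t.process} pid={t.pid}")
```

To run it on a real log, one option is to save the kernel messages of the crashed boot to a file (for example with `journalctl -k -b -1`) and pass the file contents to `find_ring_timeouts` instead of `SAMPLE`.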
There is a bug in the 6.18.1 kernel that's triggered by AI workloads (Ollama / ComfyUI / llama.cpp). No fixes yet.
Hmm, I did try with llama.cpp off and it still crashed... interesting!
Well, I am going to test again to see if that is the issue.
Thanks!
In my case it freezes even after I close ComfyUI, so if llama.cpp is running as a service for you, that might be the reason even if you turn it off afterwards.
You're right, I just tested it. It probably has something to do with ROCm; let's wait and see.
Thanks again!
Glad to see someone has the same problem as me, so I know it isn't just my setup. I was panicking until I saw this post.
Do the kernel developers know about this issue?
Last edited by laichiaheng (2025-12-18 17:16:33)
Yes, but I guess it would be OK to comment on either https://gitlab.freedesktop.org/drm/amd/-/issues/4765 or https://gitlab.freedesktop.org/drm/amd/-/issues/4783 to let them know it's affecting more people.
Hello, I have the exact same problem with an AMD 7800X3D CPU and an AMD 9070XT GPU since Linux kernel version 6.18.1.
Would just like to add that the crash/coredump does not only happen when running LLMs, but indeed also when attempting to play a video or a game.
Edit: reverting to kernel version 6.17.9 "fixed" it for now. I've added the kernel to ignored packages in /etc/pacman.conf, guess I'll stick with it for now.
Last edited by pixeled (2025-12-21 20:05:00)
Well, the bug is in the kernel, so it makes sense. But it's 100% reproducible with LLMs, so it's easier to report. I will try to play some games today (holidays, yay) and comment if it crashes for me.
It still happens on 6.18.2.
Last edited by laichiaheng (2025-12-21 04:59:39)
I have the same bug: a random crash can occur when watching a YouTube video in full-screen mode in Firefox. The graphics card is a Radeon RX 7600.
Last edited by Potomac (Yesterday 23:55:55)