So according to #24 it stabilized around the 15th and now it broke on the 26th. Any suspiciously relevant updates in your pacman log for those dates?
Did anything else differ during this period (climate, access/usage patterns, etc.)?
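For the pacman question, here is a minimal sketch that lists the packages upgraded on one day. It assumes the ISO 8601 timestamps written by current pacman. `upgrades_on` is a hypothetical helper name, and the package and version in the comment are illustrative only.

```bash
# Hypothetical helper: list packages upgraded on a given day (YYYY-MM-DD).
# Assumes pacman.log lines look like (package/versions illustrative):
#   [2023-03-26T15:02:11+0100] [ALPM] upgraded firefox (111.0-1 -> 111.0.1-1)
upgrades_on() {
  local day=$1 log=${2:-/var/log/pacman.log}
  # Keep entries stamped with that day, then only ALPM "upgraded" lines,
  # and print them as "package old -> new".
  grep -F "[${day}T" "$log" | grep -F '[ALPM] upgraded ' |
    sed 's/.*\[ALPM\] upgraded \([^ ]*\) (\(.*\))$/\1 \2/'
}
```

Usage: `for d in 2023-03-21 2023-03-26; do echo "== $d"; upgrades_on "$d"; done`.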
I may not have worded it correctly: it was stable from around the 21st to the 26th, so slightly under a week.
When I had a look, Firefox was the only updated package that I'm aware has been part of the issue (even though the crash has also occurred with just Chromium running). The rest of my pacman.log was mostly the installs from switching to Wayland.
In terms of climate, I did notice that with X.org/i3 it seemed to occur mainly first thing in the morning, when the PC was first switched on, so I thought it might be a temperature thing. But the two crashes on Wayland were both at around 3pm. It seems less frequent on Wayland, but the crashes have a bigger impact there. I'll keep note of the times, and if it happens again today around 3pm, that would suggest a pattern is emerging.
Last edited by mearkat7 (2023-03-28 01:20:15)
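You don't have to keep notes on crash times by hand, because the kernel journal already timestamps every GPU ring timeout. Here is a minimal sketch that counts timeouts per hour of day. It assumes journalctl's default short output (`Apr 05 10:06:54 host kernel: ...`) and counts each "ring ... timeout" line as one crash. `crash_hours` is a hypothetical name.

```bash
# Hypothetical helper: count amdgpu ring timeouts per hour of day.
# Reads journal text on stdin in journalctl's default "short" format, e.g.
#   Apr 05 10:06:54 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, ...
crash_hours() {
  # Match only the "ring ... timeout" line, not the following
  # "Process information" line, so each timeout is counted once.
  grep 'amdgpu_job_timedout.*ERROR.* ring .* timeout' |
    awk '{ split($3, t, ":"); print t[1] }' |   # field 3 is HH:MM:SS
    sort | uniq -c
}
```

Usage across several boots (needs a persistent journal): `journalctl _TRANSPORT=kernel --since 2023-03-21 | crash_hours`.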
Tried it again after 6.2.7 and can't reproduce it anymore, so it's quite possible that fixed it.
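When testing whether a particular kernel release fixed it, a small version check helps you confirm what is actually running. This is a minimal sketch that relies on GNU `sort -V`. `version_ge` is a hypothetical helper and is not part of any Arch tooling.

```bash
# Hypothetical helper: succeed if version $1 >= version $2.
# GNU sort -V orders 6.2.10 after 6.2.9, and orders a suffixed release
# such as 6.2.7-arch1-1 after the bare 6.2.7.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_ge "$(uname -r)" 6.2.7 && echo "running 6.2.7 or newer"`.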
Hello there
Running Arch with the 6.2.9-arch1-1 kernel
DE: GNOME (on Wayland)
Mesa 23.0.1-2
The GPU constantly crashes when I use the Vivaldi browser. It is much more likely when I have a Microsoft Teams URL open: it can crash a few times during a call, or while chatting with someone.
I need to restart gnome-session when the GPU driver crashes.
Doing other tasks in this browser is okay.
Playing games on this GPU is also okay, no issues.
Ryzen 6900HX + Radeon RX6600M
$ lspci | grep ' VGA '
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
e8:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev c7)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32774, for process vivaldi-bin pid 46941 thread vivaldi-bi:cs0 pid 46954)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x00008001454ff000 from client 0x1b (UTCL2)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701031
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32774, for process vivaldi-bin pid 46941 thread vivaldi-bi:cs0 pid 46954)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x00008001454ff000 from client 0x1b (UTCL2)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 05 10:06:44 arch kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 05 10:06:54 arch vivaldi-stable.desktop[46906]: [46941:46941:0405/100654.155185:ERROR:shared_context_state.cc(860)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_INNOCENT_CONTEXT_RESET_KHR
Apr 05 10:06:54 arch vivaldi-stable.desktop[46906]: [46941:46941:0405/100654.155430:ERROR:gpu_service_impl.cc(1011)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
Apr 05 10:06:54 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
Apr 05 10:06:54 arch vivaldi-stable.desktop[46906]: [46901:46901:0405/100654.163985:ERROR:gpu_process_host.cc(954)] GPU process exited unexpectedly: exit_code=8704
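The device address in these lines tells you which GPU faulted. 0000:03:00.0 matches the Navi 23 entry (03:00.0) in the lspci output, so this is the discrete RX 6600M, not the Rembrandt iGPU at e8:00.0. Here is a minimal sketch that reduces long dumps like this one to one line per PCI address and process. `fault_summary` is a hypothetical name, and the pattern assumes the message wording shown in this thread.

```bash
# Hypothetical helper: print "<PCI address> <process> <pid>" once per
# offender, taken from amdgpu "[gfxhub] page fault" lines on stdin.
fault_summary() {
  sed -n 's/.*amdgpu \([0-9a-f:.]*\): amdgpu: \[gfxhub\] page fault .*for process \([^ ]*\) pid \([0-9]*\).*/\1 \2 \3/p' |
    sort -u
}
```

Usage after a crash that needed a reboot: `journalctl -k -b -1 | fault_summary`. You can then check the reported address with `lspci -s 03:00.0`.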
Hi there,
I've been having the same issue on Linux 6.2.9, Mesa 23.0.1 with a 5700 XT and an i7 4790K. The log below is from playing Minecraft; I've had issues while playing Cyberpunk too:
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0020113B
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:2 pasid:32771, for process java pid 1604 thread java:cs0 pid 1701)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000080012b140000 from client 0x1b (UTCL2)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 07 20:24:37 jpenuchot-nzxt kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Apr 07 20:24:43 jpenuchot-nzxt input-leapc[1485]: InputLeap 2.4.0-release: [2023-04-07T20:24:43] INFO: leaving screen
Apr 07 20:24:47 jpenuchot-nzxt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=611913, emitted seq=611915
Apr 07 20:24:47 jpenuchot-nzxt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process java pid 1604 thread java:cs0 pid 1701
Last edited by JPenuchot (2023-04-07 18:41:49)
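The driver prints the decoded fields right after each raw GCVM_L2_PROTECTION_FAULT_STATUS value. Below is a minimal sketch for decoding a raw value by hand. The bit positions are an assumption inferred from the logs in this thread, not taken from the amdgpu register headers. They do reproduce every decoded block posted here, and bits 23:20 match the vmid printed in each fault line (7, 2 and 1). `decode_fault_status` is a hypothetical helper.

```bash
# Hypothetical helper: split a GCVM_L2_PROTECTION_FAULT_STATUS value into fields.
# Bit layout inferred from the amdgpu logs in this thread (Navi 1x/2x cards):
#   MORE_FAULTS bit 0, WALKER_ERROR bits 3:1, PERMISSION_FAULTS bits 7:4,
#   MAPPING_ERROR bit 8, client ID bits 16:9, RW bit 18, VMID bits 23:20.
decode_fault_status() {
  local s=$(( $1 ))   # bash arithmetic accepts hex such as 0x0020113B
  printf 'MORE_FAULTS: 0x%x\n'       $((  s        & 0x1  ))
  printf 'WALKER_ERROR: 0x%x\n'      $(( (s >> 1)  & 0x7  ))
  printf 'PERMISSION_FAULTS: 0x%x\n' $(( (s >> 4)  & 0xF  ))
  printf 'MAPPING_ERROR: 0x%x\n'     $(( (s >> 8)  & 0x1  ))
  printf 'CID: 0x%x\n'               $(( (s >> 9)  & 0xFF ))
  printf 'RW: 0x%x\n'                $(( (s >> 18) & 0x1  ))
  printf 'VMID: %d\n'                $(( (s >> 20) & 0xF  ))
}
```

For example, `decode_fault_status 0x0020113B` reproduces the first block of this log: WALKER_ERROR 0x5, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x1, client ID 0x8 (TCP), VMID 2.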
Can confirm I am also experiencing this crash, and have been experiencing similar troubles with my AMD GPU for some time now.
Kernel version 6.2.10, Mesa 23.0.2, RX 6700XT, Ryzen 5 5600.
Relevant log from today:
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031d428000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101031
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031d430000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031143f000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031d431000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800311437000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031143e000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031d438000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x000080031d439000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32769, for process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x0000800311436000 from client 0x1b (UTCL2)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 10 23:11:24 prospit kernel: amdgpu 0000:0b:00.0: amdgpu: RW: 0x0
Apr 10 23:11:34 prospit kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=84609, emitted seq=84611
Apr 10 23:11:34 prospit kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_wayland pid 1746 thread kwin_wayla:cs0 pid 1885
Apr 10 23:11:35 prospit kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
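This log differs from the Vivaldi and Java logs earlier in the thread. There, every fault hit the same page (0x00008001454ff000 and 0x000080012b140000). Here, kwin_wayland faults on nine different pages. Below is a minimal sketch for counting the distinct faulting pages in a dump. `fault_pages` is a hypothetical name, and the pattern assumes the "in page starting at address" wording shown here.

```bash
# Hypothetical helper: list each distinct faulting page with its hit count,
# from amdgpu "in page starting at address ..." lines on stdin.
fault_pages() {
  sed -n 's/.*in page starting at address \(0x[0-9a-f]*\) .*/\1/p' |
    sort | uniq -c
}
```

Usage: `journalctl -k -b -1 | fault_pages | wc -l` gives the number of distinct pages.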
Looks like the problem disappeared for me with kernel 6.2.12. Now my system is stable again.
UPD: after writing this comment, the browser crashed the GPU driver again.
Last edited by Bodyash (2023-04-27 07:57:22)
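The problem seemed gone on 6.2.12 and then came back, so it can help to compare boots directly. This means checking each past boot's kernel log for the running kernel version and the number of GPU ring timeouts. Below is a minimal sketch that assumes the boot log contains the kernel's usual `Linux version <release>` banner. `boot_report` is a hypothetical name.

```bash
# Hypothetical helper: print "<kernel release> <number of amdgpu ring timeouts>"
# for one boot's kernel log, read from stdin.
boot_report() {
  local log version timeouts
  log=$(cat)
  version=$(printf '%s\n' "$log" | sed -n 's/.*Linux version \([^ ]*\) .*/\1/p' | head -n 1)
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  timeouts=$(printf '%s\n' "$log" | grep -c 'amdgpu_job_timedout.*ERROR.* ring .* timeout' || true)
  printf '%s %s\n' "${version:-unknown}" "$timeouts"
}
```

Usage over the last few boots: `for b in -3 -2 -1 0; do journalctl -k -b "$b" | boot_report; done`.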
Just to add that I'm also having this bug in 6.3.1-zen1-1:
uname -r
6.3.1-zen1-1-zen
lspci | grep ' VGA '
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)
It crashes while browsing (Firefox, Twitter), roughly once a week, and has done so for a couple of months (i.e., ever since I got this computer). There's a sudden black screen followed by a "static" screen showing the last displayed frame of my desktop. I was able to switch to a VT and reboot the computer without further issue.
may 05 11:55:55 blanquita kernel: [drm] failed to load ucode VCN0_RAM(0x3A)
may 05 11:55:55 blanquita kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
may 05 11:55:55 blanquita kernel: [drm] failed to load ucode VCN1_RAM(0x3B)
may 05 11:55:55 blanquita kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
may 05 11:56:05 blanquita kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=6610, emitted seq=6614
may 05 11:56:05 blanquita kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 2067 thread firefox:cs0 pid 3092
may 05 11:56:05 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
may 05 11:56:05 blanquita kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
may 05 11:56:05 blanquita kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000000e0 != 0x00000000
may 05 11:56:05 blanquita kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
may 05 11:56:05 blanquita kernel: [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
may 05 11:56:06 blanquita kernel: [drm] Register(1) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000000a0 != 0x00000000
may 05 11:56:06 blanquita kernel: [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
may 05 11:56:06 blanquita plasmashell[1469]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
may 05 11:56:06 blanquita kernel: ------------[ cut here ]------------
may 05 11:56:06 blanquita kernel: WARNING: CPU: 5 PID: 7700 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0xf3/0x120 [amdgpu]
may 05 11:56:06 blanquita kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq ccm cmac algif_hash algif_skcipher af_alg hid_logitech_hidpp bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic snd_usb_audio snd_usbmidi_lib snd_rawmidi xpad hid_logitech_dj snd_>
may 05 11:56:06 blanquita kernel: gpio_generic acpi_cpufreq mac_hid dm_multipath snd_aloop snd_pcm snd_timer snd soundcore v4l2loopback_dc(OE) videodev mc crypto_user loop fuse dm_mod bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid amdgpu i2c_algo_bit >
may 05 11:56:06 blanquita kernel: CPU: 5 PID: 7700 Comm: kworker/u24:1 Tainted: G OE 6.3.1-zen1-1-zen #1 156717576b1d5c8078aee7319e8186602a31f594
may 05 11:56:06 blanquita kernel: Hardware name: ASUS System Product Name/ROG STRIX B650E-F GAMING WIFI, BIOS 1412 04/25/2023
may 05 11:56:06 blanquita kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
may 05 11:56:06 blanquita kernel: RIP: 0010:amdgpu_irq_put+0xf3/0x120 [amdgpu]
may 05 11:56:06 blanquita kernel: Code: 89 ff ff d0 0f 1f 00 4c 89 ee 4c 89 f7 89 04 24 e8 a2 a9 2a c7 8b 04 24 48 83 c4 08 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b b8 ea ff ff ff e9 64 ff ff ff b8 ea ff ff ff e9 5a ff ff ff
may 05 11:56:06 blanquita kernel: RSP: 0018:ffffb8a2539d7c70 EFLAGS: 00010246
may 05 11:56:06 blanquita kernel: RAX: ffff9d7414c61a80 RBX: ffff9d7414820000 RCX: 0000000000000000
may 05 11:56:06 blanquita kernel: RDX: 0000000000000000 RSI: ffff9d7414822510 RDI: ffff9d7414820000
may 05 11:56:06 blanquita kernel: RBP: 0000000000000000 R08: 0000000000040000 R09: 0000000000000000
may 05 11:56:06 blanquita kernel: R10: 0000000000000001 R11: ffff9d74009c26c0 R12: 0000000000000000
may 05 11:56:06 blanquita kernel: R13: ffff9d74148389a0 R14: ffff9d7502ac6c00 R15: 0000000000000000
may 05 11:56:06 blanquita kernel: FS: 0000000000000000(0000) GS:ffff9d7b5dd40000(0000) knlGS:0000000000000000
may 05 11:56:06 blanquita kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
may 05 11:56:06 blanquita kernel: CR2: 00007f46ecbc4000 CR3: 0000000105ba2000 CR4: 0000000000750ee0
may 05 11:56:06 blanquita kernel: PKRU: 55555554
may 05 11:56:06 blanquita kernel: Call Trace:
may 05 11:56:06 blanquita kernel: <TASK>
may 05 11:56:06 blanquita kernel: gmc_v10_0_suspend+0x53/0x90 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: amdgpu_device_ip_suspend_phase2+0x104/0x1a0 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: ? amdgpu_device_ip_suspend_phase1+0x64/0xe0 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: amdgpu_device_pre_asic_reset+0xe3/0x2c0 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: amdgpu_device_gpu_recover+0x484/0xec0 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: amdgpu_job_timedout+0x18d/0x240 [amdgpu 847874e39535bccfb19fd35aab9642c104bab4b6]
may 05 11:56:06 blanquita kernel: drm_sched_job_timedout+0x77/0x110 [gpu_sched 9b9c9d3603e187ddb26f6cc1cb6c604387b484a9]
may 05 11:56:06 blanquita kernel: process_one_work+0x24f/0x460
may 05 11:56:06 blanquita kernel: worker_thread+0x55/0x4f0
may 05 11:56:06 blanquita kernel: ? __pfx_worker_thread+0x10/0x10
may 05 11:56:06 blanquita kernel: kthread+0xdb/0x110
may 05 11:56:06 blanquita kernel: ? __pfx_kthread+0x10/0x10
may 05 11:56:06 blanquita kernel: ret_from_fork+0x29/0x50
may 05 11:56:06 blanquita kernel: </TASK>
may 05 11:56:06 blanquita kernel: ---[ end trace 0000000000000000 ]---
may 05 11:56:06 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
may 05 11:56:06 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
may 05 11:56:06 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
may 05 11:56:06 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
may 05 11:56:06 blanquita kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000F00000).
may 05 11:56:06 blanquita kernel: [drm] VRAM is lost due to GPU reset!
may 05 11:56:06 blanquita kernel: [drm] PSP is resuming...
may 05 11:56:06 blanquita plasmashell[1469]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:06 blanquita plasmashell[1469]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:06 blanquita plasmashell[1469]: [GFX1]: Device reset due to WR context
may 05 11:56:06 blanquita plasmashell[1469]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
may 05 11:56:06 blanquita plasmashell[1702]: [GFX1-]: Failed to connect WebRenderBridgeChild. isParent=false
may 05 11:56:07 blanquita kernel: [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
may 05 11:56:07 blanquita kernel: [drm] DMUB hardware initialized: version=0x02020017
may 05 11:56:07 blanquita kwin_wayland[798]: kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.
may 05 11:56:07 blanquita kernel: [drm] kiq ring mec 2 pipe 1 q 0
may 05 11:56:07 blanquita kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
may 05 11:56:07 blanquita kernel: [drm] JPEG decode initialized successfully.
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
may 05 11:56:07 blanquita plasmashell[2067]: amdgpu: amdgpu_cs_query_fence_status failed.
may 05 11:56:07 blanquita plasmashell[2067]: amdgpu: amdgpu_cs_query_fence_status failed.
may 05 11:56:07 blanquita plasmashell[2067]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
may 05 11:56:07 blanquita plasmashell[2067]: amdgpu: The process will be terminated.
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
may 05 11:56:07 blanquita kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) succeeded!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm] Skip scheduling IBs!
may 05 11:56:07 blanquita kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
may 05 11:56:07 blanquita plasmashell[1469]: [GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
may 05 11:56:08 blanquita kwin_wayland[798]: kwin_scene_opengl: Waiting for glGetGraphicsResetStatus to return GL_NO_ERROR timed out!
may 05 11:56:08 blanquita kwin_wayland[798]: OpenGL vendor string: AMD
may 05 11:56:08 blanquita kwin_wayland[798]: OpenGL renderer string: AMD Radeon RX 6800 (navi21, LLVM 15.0.7, DRM 3.52, 6.3.1-zen1-1-zen)
may 05 11:56:08 blanquita kwin_wayland[798]: OpenGL version string: 4.6 (Core Profile) Mesa 23.0.3
may 05 11:56:08 blanquita kwin_wayland[798]: OpenGL shading language version string: 4.60
may 05 11:56:08 blanquita kwin_wayland[798]: Driver: Unknown
may 05 11:56:08 blanquita kwin_wayland[798]: GPU class: Unknown
may 05 11:56:08 blanquita kwin_wayland[798]: OpenGL version: 4.6
may 05 11:56:08 blanquita kwin_wayland[798]: GLSL version: 4.60
may 05 11:56:08 blanquita kwin_wayland[798]: Mesa version: 23.0.3
may 05 11:56:08 blanquita kwin_wayland[798]: X server version: 1.23.1
may 05 11:56:08 blanquita kwin_wayland[798]: Linux kernel version: 6.3.1
may 05 11:56:08 blanquita kwin_wayland[798]: Requires strict binding: no
may 05 11:56:08 blanquita kwin_wayland[798]: GLSL shaders: yes
may 05 11:56:08 blanquita kwin_wayland[798]: Texture NPOT support: yes
may 05 11:56:08 blanquita kwin_wayland[798]: Virtual Machine: no
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland[798]: BlurConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland_wrapper[798]: amdgpu: The CS has been rejected (-125). Recreate the context.
may 05 11:56:08 blanquita kwin_wayland[798]: ZoomConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: WindowViewConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: SlidingPopupsConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: SlideConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: OverviewConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: KscreenConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: DesktopGridConfig::instance called after the first use - ignoring
may 05 11:56:08 blanquita kwin_wayland[798]: kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.
may 05 11:56:09 blanquita kwin_wayland[798]: kwin_scene_opengl: Waiting for glGetGraphicsResetStatus to return GL_NO_ERROR timed out!
may 05 11:56:09 blanquita kwin_wayland[798]: OpenGL vendor string: AMD
may 05 11:56:09 blanquita kwin_wayland[798]: OpenGL renderer string: AMD Radeon RX 6800 (navi21, LLVM 15.0.7, DRM 3.52, 6.3.1-zen1-1-zen)
may 05 11:56:09 blanquita kwin_wayland[798]: OpenGL version string: 4.6 (Core Profile) Mesa 23.0.3
may 05 11:56:09 blanquita kwin_wayland[798]: OpenGL shading language version string: 4.60
may 05 11:56:09 blanquita kwin_wayland[798]: Driver: Unknown
may 05 11:56:09 blanquita kwin_wayland[798]: GPU class: Unknown
may 05 11:56:09 blanquita kwin_wayland[798]: OpenGL version: 4.6
may 05 11:56:09 blanquita kwin_wayland[798]: GLSL version: 4.60
may 05 11:56:09 blanquita kwin_wayland[798]: Mesa version: 23.0.3
may 05 11:56:09 blanquita kwin_wayland[798]: X server version: 1.23.1
may 05 11:56:09 blanquita kwin_wayland[798]: Linux kernel version: 6.3.1
may 05 11:56:09 blanquita kwin_wayland[798]: Requires strict binding: no
may 05 11:56:09 blanquita kwin_wayland[798]: GLSL shaders: yes
may 05 11:56:09 blanquita kwin_wayland[798]: Texture NPOT support: yes
may 05 11:56:09 blanquita kwin_wayland[798]: Virtual Machine: no
may 05 11:56:09 blanquita kwin_wayland[798]: BlurConfig::instance called after the first use - ignoring
may 05 11:56:09 blanquita kwin_wayland[798]: ZoomConfig::instance called after the first use - ignoring
...
And it keeps repeating the same messages.
ADDENDUM: It happened again just a moment ago, while loading YouTube. It seems 6.3 has made it worse.
Last edited by tonatiuhmira (2023-05-05 19:11:06)
Offline
tonatiuhmira, you may not be facing the same bug.
The other problem logs all have (at least) two things in common:
- a GPU reset happens
- messages mentioning GCVM_L2_PROTECTION_FAULT_STATUS appear
The snippet you posted shows a GPU reset, but not the other message.
Please check whether GCVM_L2_PROTECTION_FAULT is present in your logs after a crash.
If that phrase is not present, you are facing another bug and should start a new thread.
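A minimal sketch of the check described above, assuming the crash's kernel messages are available as text lines (for example the output of `journalctl -k -b -1` after a forced reboot). The function name `classify_crash` is hypothetical, and the regular expressions are illustrative assumptions built from the messages quoted in this thread, not an exhaustive list of everything amdgpu logs during a reset.

```python
import re
from typing import Iterable

# Illustrative patterns taken from messages quoted in this thread (assumption:
# not every amdgpu reset path logs exactly these strings).
RESET_RE = re.compile(r"GPU reset begin!|GPU reset\(\d+\) succeeded!|MODE1 reset")
FAULT_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT")


def classify_crash(lines: Iterable[str]) -> str:
    """Check one crash's kernel log lines for the two markers the reports share."""
    saw_reset = False
    saw_fault = False
    for line in lines:
        saw_reset = saw_reset or bool(RESET_RE.search(line))
        saw_fault = saw_fault or bool(FAULT_RE.search(line))
    if saw_reset and saw_fault:
        return "reset + protection fault (matches this thread)"
    if saw_reset:
        return "reset only (probably a different bug)"
    return "no GPU reset found"
```

A snippet like the one above, with a MODE1 reset but no protection-fault lines, would land in the "reset only" bucket.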
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
(A works at time B) && (time C > time B ) ≠ (A works at time C)
Offline
So, it's still crashing sometimes:
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32771, for process vivaldi-bin pid 254791 thread vivaldi-bi:cs0 pid 254806)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000800100639000 from client 0x1b (UTCL2)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00241051
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x1
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32771, for process vivaldi-bin pid 254791 thread vivaldi-bi:cs0 pid 254806)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000800100639000 from client 0x1b (UTCL2)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
May 19 10:03:28 arch kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
May 19 10:03:38 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
May 19 10:03:49 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=26948552, emitted seq=26948555
May 19 10:03:49 arch kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
May 19 10:03:49 arch kernel: ------------[ cut here ]------------
May 19 10:03:49 arch kernel: WARNING: CPU: 12 PID: 254207 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 19 10:03:49 arch kernel: Modules linked in: uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common vhost_net vhost vhost_iotlb tap tun uinput hid_microsoft ff_memless hidp ccm rfcomm snd_seq_dummy sn>
May 19 10:03:49 arch kernel: sha512_ssse3 snd_hda_core snd_seq_device aesni_intel drm_ttm_helper mc snd_hwdep crypto_simd libarc4 ttm snd_pci_acp5x wmi_bmof cryptd usbhid snd_pcm snd_rn_pci_acp3x cfg80211 drm_display_helper rapl snd_acp>
May 19 10:03:49 arch kernel: CPU: 12 PID: 254207 Comm: kworker/u32:2 Tainted: G W 6.3.2-arch1-1 #1 44a850778a68c42d012ba8e685997cb0375875a4
May 19 10:03:49 arch kernel: Hardware name: Micro Computer (HK) Tech Limited HX99G/F7BAA, BIOS 0.18 03/02/2023
May 19 10:03:49 arch kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
May 19 10:03:49 arch kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 19 10:03:49 arch kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 c3 cc cc cc cc e9 5a fd ff ff <0f> 0b b8 ea ff ff ff c3 cc cc cc cc b8 ea ff ff ff c3 cc cc cc cc
May 19 10:03:49 arch kernel: RSP: 0018:ffff9dac23907ca0 EFLAGS: 00010246
May 19 10:03:49 arch kernel: RAX: ffff8df0c48a0a40 RBX: ffff8df0e1e00000 RCX: 0000000000000000
May 19 10:03:49 arch kernel: RDX: 0000000000000000 RSI: ffff8df0e1e02510 RDI: ffff8df0e1e00000
May 19 10:03:49 arch kernel: RBP: ffff8df0e1e00000 R08: 0000000000000000 R09: 0000000000000000
May 19 10:03:49 arch kernel: R10: 0000000000000001 R11: 0000000000000100 R12: 0000000000001050
May 19 10:03:49 arch kernel: R13: ffff8df0e1e189a0 R14: ffff8dfc1ab70a00 R15: 0000000000000000
May 19 10:03:49 arch kernel: FS: 0000000000000000(0000) GS:ffff8dffbe900000(0000) knlGS:0000000000000000
May 19 10:03:49 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 10:03:49 arch kernel: CR2: 000025a40d0ffc50 CR3: 00000002cec20000 CR4: 0000000000750ee0
May 19 10:03:49 arch kernel: PKRU: 55555554
May 19 10:03:49 arch kernel: Call Trace:
May 19 10:03:49 arch kernel: <TASK>
May 19 10:03:49 arch kernel: gmc_v10_0_hw_fini+0x53/0x90 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: gmc_v10_0_suspend+0xe/0x20 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: amdgpu_device_ip_suspend+0x36/0x70 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: amdgpu_device_gpu_recover+0x4c7/0xd60 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: amdgpu_job_timedout+0x18d/0x240 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:49 arch kernel: drm_sched_job_timedout+0x7a/0x110 [gpu_sched f3b1fbe337249fc10d476a810e412960d1b556b8]
May 19 10:03:49 arch kernel: process_one_work+0x1c7/0x3d0
May 19 10:03:49 arch kernel: worker_thread+0x51/0x390
May 19 10:03:49 arch kernel: ? __pfx_worker_thread+0x10/0x10
May 19 10:03:49 arch kernel: kthread+0xde/0x110
May 19 10:03:49 arch kernel: ? __pfx_kthread+0x10/0x10
May 19 10:03:49 arch kernel: ret_from_fork+0x2c/0x50
May 19 10:03:49 arch kernel: </TASK>
May 19 10:03:49 arch kernel: ---[ end trace 0000000000000000 ]---
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
May 19 10:03:49 arch kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
May 19 10:03:49 arch kernel: [drm] VRAM is lost due to GPU reset!
May 19 10:03:49 arch kernel: [drm] PSP is resuming...
May 19 10:03:49 arch kernel: [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2a00 (59.42.0)
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
May 19 10:03:49 arch kernel: [drm] DMUB hardware initialized: version=0x02020017
May 19 10:03:49 arch kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 19 10:03:49 arch kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
May 19 10:03:49 arch kernel: [drm] JPEG decode initialized successfully.
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 19 10:03:49 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
May 19 10:03:50 arch kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(4) succeeded!
May 19 10:03:50 arch kernel: [drm] Skip scheduling IBs!
May 19 10:03:50 arch gnome-shell[164568]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
May 19 10:03:50 arch gnome-shell[164568]: amdgpu: The process will be terminated.
May 19 10:03:50 arch WebKitWebProces[255448]: Error reading events from display: Broken pipe
May 19 10:03:50 arch gnome-shell[167627]: (EE) failed to read Wayland events: Broken pipe
This time I couldn't switch to a TTY and kill the session (the image was stuck), but a second error occurred:
May 19 10:03:55 arch kernel: ------------[ cut here ]------------
May 19 10:03:55 arch kernel: WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
May 19 10:03:55 arch kernel: Modules linked in: uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common vhost_net vhost vhost_iotlb tap tun uinput hid_microsoft ff_memless hidp ccm rfcomm snd_seq_dummy sn>
May 19 10:03:55 arch kernel: sha512_ssse3 snd_hda_core snd_seq_device aesni_intel drm_ttm_helper mc snd_hwdep crypto_simd libarc4 ttm snd_pci_acp5x wmi_bmof cryptd usbhid snd_pcm snd_rn_pci_acp3x cfg80211 drm_display_helper rapl snd_acp>
May 19 10:03:55 arch kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.2-arch1-1 #1 44a850778a68c42d012ba8e685997cb0375875a4
May 19 10:03:55 arch kernel: Hardware name: Micro Computer (HK) Tech Limited HX99G/F7BAA, BIOS 0.18 03/02/2023
May 19 10:03:55 arch kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
May 19 10:03:55 arch kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 c3 cc cc cc cc e9 5a fd ff ff <0f> 0b b8 ea ff ff ff c3 cc cc cc cc b8 ea ff ff ff c3 cc cc cc cc
May 19 10:03:55 arch kernel: RSP: 0018:ffff9dac00003e20 EFLAGS: 00010046
May 19 10:03:55 arch kernel: RAX: ffff8df0c4ba2d60 RBX: ffff8df0ec749800 RCX: 0000000000000000
May 19 10:03:55 arch kernel: RDX: 0000000000000000 RSI: ffff8df0e1e065c8 RDI: ffff8df0e1e00000
May 19 10:03:55 arch kernel: RBP: 0000000000000000 R08: ffffffffc1a1fb22 R09: 0000000000000000
May 19 10:03:55 arch kernel: R10: ffff9dac00003d10 R11: ffff9dac00003d14 R12: ffff8df0e1e00010
May 19 10:03:55 arch kernel: R13: ffff8df0e1e00000 R14: ffff8dfac3324e00 R15: ffff8dffbe621fc0
May 19 10:03:55 arch kernel: FS: 0000000000000000(0000) GS:ffff8dffbe600000(0000) knlGS:0000000000000000
May 19 10:03:55 arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 10:03:55 arch kernel: CR2: 000014fc00a0c000 CR3: 00000002cec20000 CR4: 0000000000750ef0
May 19 10:03:55 arch kernel: PKRU: 55555554
May 19 10:03:55 arch kernel: Call Trace:
May 19 10:03:55 arch kernel: <IRQ>
May 19 10:03:55 arch kernel: dm_set_vblank+0x187/0x1b0 [amdgpu 0323f8491ce43051980d5a5c8bf0138a850d0341]
May 19 10:03:55 arch kernel: drm_vblank_disable_and_save+0xba/0xf0
May 19 10:03:55 arch kernel: vblank_disable_fn+0x67/0x80
May 19 10:03:55 arch kernel: ? __pfx_vblank_disable_fn+0x10/0x10
May 19 10:03:55 arch kernel: call_timer_fn+0x27/0x130
May 19 10:03:55 arch kernel: ? __pfx_vblank_disable_fn+0x10/0x10
May 19 10:03:55 arch kernel: __run_timers+0x222/0x2c0
May 19 10:03:55 arch kernel: run_timer_softirq+0x1d/0x40
May 19 10:03:55 arch kernel: __do_softirq+0xd4/0x2c8
May 19 10:03:55 arch kernel: __irq_exit_rcu+0xbb/0xf0
May 19 10:03:55 arch kernel: sysvec_apic_timer_interrupt+0x72/0x90
May 19 10:03:55 arch kernel: </IRQ>
May 19 10:03:55 arch kernel: <TASK>
May 19 10:03:55 arch kernel: asm_sysvec_apic_timer_interrupt+0x1a/0x20
May 19 10:03:55 arch kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x440
May 19 10:03:55 arch kernel: Code: aa 6f 3d ff e8 c5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 a3 71 3c ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
May 19 10:03:55 arch kernel: RSP: 0018:ffffffffb4803e40 EFLAGS: 00000246
May 19 10:03:55 arch kernel: RAX: ffff8dffbe633e80 RBX: ffff8df0c0bab800 RCX: 0000000000000000
May 19 10:03:55 arch kernel: RDX: 0000000000000000 RSI: fffffffcdf9d35d3 RDI: 0000000000000000
May 19 10:03:55 arch kernel: RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000026dc593f
May 19 10:03:55 arch kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb4949360
May 19 10:03:55 arch kernel: R13: 00010ac3ebb9db45 R14: 0000000000000003 R15: 0000000000000000
May 19 10:03:55 arch kernel: cpuidle_enter+0x2d/0x40
May 19 10:03:55 arch kernel: do_idle+0x1bf/0x220
May 19 10:03:55 arch kernel: cpu_startup_entry+0x1d/0x20
May 19 10:03:55 arch kernel: rest_init+0xc8/0xd0
May 19 10:03:55 arch kernel: arch_call_rest_init+0xe/0x30
May 19 10:03:55 arch kernel: start_kernel+0x778/0xb80
May 19 10:03:55 arch kernel: secondary_startup_64_no_verify+0xe5/0xeb
May 19 10:03:55 arch kernel: </TASK>
May 19 10:03:55 arch kernel: ---[ end trace 0000000000000000 ]---
Kernel 6.3.2-arch1-1
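When comparing ring timeout lines across posts, the two sequence numbers can be pulled out mechanically. The interpretation here is an assumption: `signaled seq` is taken to be the last fence the ring completed and `emitted seq` the last one submitted, so their difference roughly approximates how many submissions were still outstanding when the timeout fired. The helper name `pending_fences` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   *ERROR* ring gfx_0.0.0 timeout, signaled seq=26948552, emitted seq=26948555
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def pending_fences(line: str) -> Optional[Tuple[str, int]]:
    """Return (ring name, emitted - signaled) for a ring-timeout line, else None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        # e.g. "ring gfx_0.0.0 timeout, but soft recovered" carries no sequence numbers
        return None
    return match.group("ring"), int(match.group("emitted")) - int(match.group("signaled"))
```

For the log above, the hard timeout line gives `("gfx_0.0.0", 3)`, while the earlier `soft recovered` line returns `None`.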
Offline
Have you managed to run memtest86+ for at least a day (24 hrs)?
Offline
I'm having the same issue with a 6700 XT. It's completely random: it might take 5 minutes, or it might take 5 hours. It has only happened to me while playing Assassin's Creed Valhalla.
lspci
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] (rev df)
uname -rms
Linux 6.3.3-arch1-1 x86_64
log output:
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32775, for process ACValhalla.exe pid 1732 thread vkd3d_queue pid 1824)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x000080032aab0000 from client 0x1b (UTCL2)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501031
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x1
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: RW: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32775, for process ACValhalla.exe pid 1732 thread vkd3d_queue pid 1824)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x000080032aab1000 from client 0x1b (UTCL2)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x0
May 22 17:10:29 bl kernel: amdgpu 0000:0c:00.0: amdgpu: RW: 0x0
May 22 17:10:39 bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=141786, emitted seq=141788
May 22 17:10:39 bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACValhalla.exe pid 1732 thread vkd3d_queue pid 1824
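The kernel already prints the decoded fields next to each raw GCVM_L2_PROTECTION_FAULT_STATUS value, but decoding the raw word yourself makes it easy to compare fault signatures between posts. The bit layout below is an assumption for RDNA2 (gfx10.3) cards, chosen because it reproduces the decoded fields and the vmid that the kernel printed for both status words in this thread (0x00241051 and 0x00501031); other GPU generations may differ. The names `FIELDS` and `decode_fault_status` are hypothetical.

```python
from typing import Dict

# Assumed (shift, width) of each field in GCVM_L2_PROTECTION_FAULT_STATUS on
# RDNA2 (gfx10.3). It matches both status words quoted in this thread, but it
# is not taken from official documentation.
FIELDS = {
    "more_faults": (0, 1),
    "walker_error": (1, 3),
    "permission_faults": (4, 4),
    "mapping_error": (8, 1),
    "cid": (9, 9),         # UTCL2 client ID, e.g. 0x8 = TCP
    "rw": (18, 1),
    "vmid": (20, 4),
}


def decode_fault_status(status: int) -> Dict[str, int]:
    """Split a raw status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }
```

Both non-zero status words in this thread, from the vivaldi-bin crash and the ACValhalla.exe crash, decode to CID 0x8 (TCP) with MORE_FAULTS set and a non-zero PERMISSION_FAULTS, which matches what the kernel printed.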
Offline
There are a few things that stand out as possible culprits.
#1
You might actually have defective cards. I know it's not something people like to hear, but your cards could genuinely be defective. You can enable overclocking to gain the ability to modify voltage and core/memory frequencies on your GPU, then use something like CoreCtrl to downclock the GPU core by 100-200 MHz and see if that stabilizes the card.
#2
It seems like a lot of you have AMD CPUs. AMD CPUs are notoriously dependent on system memory speed because the Infinity Fabric clock is tied to the memory clock. If system memory is even slightly unstable, the Infinity Fabric will be unstable too, and that can manifest in unexpected ways, such as GPU crashes: the GPU's x16 PCIe link is connected to the CPU over the Infinity Fabric. You can try loosening your timings, or resetting your RAM to its default JEDEC settings rather than the XMP profile it is advertised at, to see if that helps.
#3
The 5700 XT series is notorious for its instability; the entire RDNA1 lineup was prone to crashing from day one. If you're on a 5000-series card, I would just look at upgrading to something else. AMD should have issued a recall for that series and refunded people.
#4
Buggy motherboard firmware. Motherboards play a big role and can cause really odd problems. I had an MSI B550 Tomahawk that, with my Creative Fx V2 and AE-5 sound cards, caused audio popping/crackling whenever any sound first started playing. It would go away once the sound card was playing audio continuously, but going from idle to sound produced really bad crackling and popping. My journalctl output was also heavily spammed with "Too many BDL entries." I didn't have this with the Gigabyte X470 before it, nor with my current MSI board, a Z690 Edge WiFi DDR5 with an Intel 13700K. Everything might seem fine and stable, but in reality it could be the motherboard. JayzTwoCents on YouTube recently did a video with, IIRC, an NVIDIA 3070 that would crash in one particular motherboard but was 100% stable in a completely different motherboard with the same CPU and memory, while a different GPU ran fine in the motherboard where the 3070 crashed. Completely bizarre.
#5
The power supply could be the culprit as well, or even the power cables.
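A hedged sketch of the downclocking step from #1. As far as I know, tools like CoreCtrl work through amdgpu's overdrive sysfs file `pp_od_clk_voltage` (for example `/sys/class/drm/card0/device/pp_od_clk_voltage`; the `card0` index is an assumption). According to the kernel's amdgpu documentation, that file only accepts edits when overdrive is enabled (commonly via the `amdgpu.ppfeaturemask=0xffffffff` kernel parameter), manual mode is selected first through `power_dpm_force_performance_level`, `s 1 <MHz>` edits the maximum shader clock, and `c` commits the table. The helpers `max_sclk_mhz` and `downclock_commands` are hypothetical and only parse and build strings; the table text in the test is an assumed RDNA2 layout.

```python
import re
from typing import List


def max_sclk_mhz(od_table: str) -> int:
    """Return entry 1 (the maximum) of the OD_SCLK section of pp_od_clk_voltage."""
    in_sclk = False
    for raw in od_table.splitlines():
        line = raw.strip()
        # Section headers look like "OD_SCLK:", "OD_MCLK:" or "OD_RANGE:".
        if line.startswith("OD_") and line.endswith(":"):
            in_sclk = line == "OD_SCLK:"
            continue
        match = re.match(r"1:\s*(\d+)\s*mhz", line, re.IGNORECASE)
        if in_sclk and match:
            return int(match.group(1))
    raise ValueError("no OD_SCLK maximum found")


def downclock_commands(current_max_mhz: int, reduce_by_mhz: int = 150) -> List[str]:
    """Build the strings to write, one write each, to pp_od_clk_voltage.

    "s 1 <MHz>" edits the maximum shader clock; "c" commits the edited table.
    Nothing is written here: this is a dry run.
    """
    if not 0 < reduce_by_mhz < current_max_mhz:
        raise ValueError("reduce_by_mhz must be positive and below the current maximum")
    return [f"s 1 {current_max_mhz - reduce_by_mhz}", "c"]
```

Each returned string would then be written separately as root, e.g. `echo "s 1 2439" | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage` followed by `echo c | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage`. Keeping the helper as a dry run means running it never touches the card.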
Last edited by orlfman (2023-05-24 06:30:48)
Offline
Personally, I have had these issues since migrating my 6750 XT from a Ryzen 3000 to a Ryzen 7000 system.
At least in my case, I can rule out the power supply: I get crashes with both the new and the old power supply in the new system, while the card ran without issues in the old system.
Given the number of people reporting these issues, I also doubt it's all defective cards.
It could be a motherboard or Infinity Fabric issue. I'm running DDR5-6000 and will try slower speeds.
I've had multiple crashes per day for a few weeks now (since I built the system), although it's kind of hard to reproduce. Days without crashes were few and far between.
But three days ago I started using this kernel from the AUR: https://aur.archlinux.org/packages/linux-drm-next-git which has yet to crash even once.
Perhaps others could try this as well and report their findings.
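When comparing kernels (stock `linux`, `linux-drm-next-git`, or an older series), it helps to state exactly which one was booted when a crash happened. A trivial sketch; the function name is hypothetical and it only reformats the `uname -r` string:

```shell
# kernel_series (hypothetical helper): reduce a kernel release string, as
# printed by `uname -r`, to its "major.minor" series. With no argument it
# uses the currently running kernel.
kernel_series() {
    local release="${1:-$(uname -r)}"
    printf '%s\n' "$release" | sed -E 's/^([0-9]+\.[0-9]+).*/\1/'
}
```

Alongside it, `uname -r` gives the full release and `pacman -Q linux mesa amd-ucode` lists the installed package versions worth including in a report.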
You can try the 5.19-lts kernel and expect ~0 crashes.
In my long life, I've had similar problems with Nvidia too, and people guessed it could be a defective card or memory or ...
Of course it is possible for hardware to break, but in my case the culprit was a new Nvidia driver feature that my old card didn't support. A few updates later the mystery was gone; in the meantime I had spent days memory testing and nearly bought a new card I definitely didn't need. Before the upstream fix, adding a simple configuration file was all it took to get everything back to normal.
So my point is:
Don't blame your hardware too soon.
I did some tests here regarding WebGL:
The results are clearly repeatable, and unless someone with the same hardware shows otherwise, I see no reason to assume a hardware defect.
As for stability over the last few weeks in general:
In my case, downgrading the kernel to 5.19.x seemed to ease the problem. At the moment I have only downgraded amd-ucode, choosing the last package of 2022 as a first shot:
[bernd_b@amd64-archlinux ~]$ pacman -Qs amd-ucode
local/amd-ucode 20221214.f3c283e-1
Microcode update image for AMD CPUs
Every other package (linux, mesa-xy ...) is up to date, and, fingers crossed, my system hasn't crashed out of the blue for two days now (about 2x 10 hours).
Last edited by bernd_b (2023-05-24 07:45:54)
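A microcode package only matters if its revision is the one actually loaded, so it can be worth confirming the running revision after rebooting with the downgraded package. A minimal sketch (x86 only, hypothetical function name). Note that the kernel generally only applies a microcode image newer than what the motherboard firmware already loaded, so a downgraded amd-ucode may have no effect if the BIOS ships a newer revision.

```shell
# microcode_revision (hypothetical helper): print the microcode revision the
# CPU reports in /proc/cpuinfo. All cores normally show the same value, so
# only the first match is printed. Another file can be passed for testing.
microcode_revision() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    awk -F': *' '/^microcode[ \t]*:/ { print $2; exit }' "$cpuinfo"
}
```

`journalctl -k -b | grep -i microcode` shows what the kernel reported at boot. To keep the older package from being upgraded again, pacman supports `IgnorePkg = amd-ucode` in /etc/pacman.conf, or `--ignore amd-ucode` on the command line.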
orlfman wrote: There are a few things that stand out as possible culprits.
#1
You might actually have defective cards. I know it's not something people like to hear, but your cards could actually be defective. You can try enabling overclocking, which gives you access to the voltage and core/memory frequency settings of your GPU. You can then use something like CoreCtrl to downclock your GPU core by 100-200 MHz and see if that helps stabilize the cards.
I really hope it's not my card, because I just finished RMAing it after getting a faulty one.
This happens far too infrequently for me to diagnose it, so I guess I'll have to wait until I find a way to trigger it.
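The CoreCtrl downclocking suggestion quoted above can also be done by hand through amdgpu's `pp_od_clk_voltage` sysfs file once overdrive is enabled (CoreCtrl's documentation describes enabling it with the `amdgpu.ppfeaturemask` kernel parameter). The helper below is a hypothetical sketch: it assumes the RDNA2-style layout where the `OD_SCLK:` section lists `0: <min>Mhz` and `1: <max>Mhz`, and it only prints the command string instead of writing to sysfs. Check the amdgpu kernel documentation for your card generation before writing anything.

```shell
# od_lower_max_sclk (hypothetical helper): read pp_od_clk_voltage contents
# on stdin and print the "s 1 <MHz>" line that would lower the maximum
# core clock (entry 1 of OD_SCLK) by the given offset in MHz.
# It only prints the command; nothing is written to sysfs.
od_lower_max_sclk() {
    local offset="$1"
    awk -v off="$offset" '
        /^OD_SCLK:/   { in_sclk = 1; next }   # start of the core clock section
        /^[A-Z_]+:/   { in_sclk = 0 }         # any other section header ends it
        in_sclk && /^1:/ {
            mhz = $2
            sub(/[Mm][Hh][Zz]$/, "", mhz)     # strip the "Mhz" unit suffix
            printf "s 1 %d\n", mhz - off
            exit
        }'
}
```

For example, `od_lower_max_sclk 150 < /sys/class/drm/card0/device/pp_od_clk_voltage` (your GPU may be `card1`). Per the amdgpu documentation, the printed line is written to that same file and then committed by writing `c`; the setting does not survive a reboot.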
You can check your journal with journalctl. Also check the AMD DRM issue tracker: https://gitlab.freedesktop.org/drm/amd/-/issues
There are so many similar crashes reported there.
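To pull the relevant lines out of the journal after a crash, here is a minimal sketch. The function name is hypothetical, and the grep patterns are an assumption based on the usual amdgpu hang messages (ring timeouts, GPU resets, page faults); they will likely need adjusting to match your own logs.

```shell
# amdgpu_errors (hypothetical helper): filter kernel log text on stdin for
# common amdgpu hang/reset signatures. The patterns are an assumption;
# extend them to match what your own journal shows.
amdgpu_errors() {
    grep -Ei 'amdgpu.*(ring [^ ]+ timeout|GPU reset|page fault|\*ERROR\*)'
}
```

`journalctl -k -b -1 | amdgpu_errors` checks the previous boot, which is what you want if the crash forced a reboot; the matching lines are also what the issue tracker will ask for.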