Recently (over the past 2-3 kernel releases, or since the last Firefox update) I have been getting an intermittent rendering issue: random pixels appear below the address bar and slowly grow to 40-50 lines. While those pixels are being rendered (a couple of seconds), the computer freezes. I have no idea whether this is a bug or a hardware issue. Any ideas?
Dmesg output:
[ 3.919785] [drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read.
...
[25176.304122] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304137] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a00000 from client 27
[25176.304147] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304151] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304154] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304157] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304159] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304162] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304164] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304171] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304177] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a01000 from client 27
[25176.304185] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304188] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304191] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304193] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304195] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304197] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304200] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304204] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304210] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a02000 from client 27
[25176.304218] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304221] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304223] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304226] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304228] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304230] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304232] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304236] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304242] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a03000 from client 27
[25176.304250] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304252] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304255] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304257] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304259] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304261] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304264] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304269] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304274] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a04000 from client 27
[25176.304282] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304285] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304287] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304290] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304292] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304294] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304296] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304301] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304306] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a0a000 from client 27
[25176.304314] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304316] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304319] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304321] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304323] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304325] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304327] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304332] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304337] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a0b000 from client 27
[25176.304345] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304348] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304350] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304352] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304354] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304357] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304359] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304363] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304368] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a0c000 from client 27
[25176.304376] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304379] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304381] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304383] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304385] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304388] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304390] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304394] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304399] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a14000 from client 27
[25176.304407] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304410] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304412] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304414] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304416] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304419] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304421] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25176.304425] amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process Xorg pid 525 thread Xorg:cs0 pid 526)
[25176.304430] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x800107a0d000 from client 27
[25176.304438] amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[25176.304440] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[25176.304443] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[25176.304445] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[25176.304447] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[25176.304449] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[25176.304452] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[25181.358205] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[25186.479284] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[25196.505269] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
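For anyone comparing reports, the raw VM_L2_PROTECTION_FAULT_STATUS word can be split into the same fields the driver prints after it. Below is a minimal Python sketch. The bit positions are an assumption taken from my reading of the gfx9 register headers, not something stated in this thread, although they do reproduce the values in the log above (CID 0x8, PERMISSION_FAULTS 0x3, vmid 2). The helper names `decode_fault_status` and `statuses_in` are made up for illustration.

```python
import re

# (shift, width) for each field. ASSUMPTION: layout taken from the gfx9
# register headers; it matches the fields the driver printed in this log.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}

# Matches the status line as it appears in dmesg or journalctl output.
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:\s*(0x[0-9a-fA-F]+)")


def decode_fault_status(value):
    """Split a raw status word into its named bit fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def statuses_in(log_text):
    """Return the distinct status words found in a log excerpt, sorted."""
    return sorted({int(m.group(1), 16) for m in STATUS_RE.finditer(log_text)})


if __name__ == "__main__":
    sample = "amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031"
    for status in statuses_in(sample):
        print(hex(status), decode_fault_status(status))
```

Feeding it a saved dmesg excerpt shows quickly whether every fault carries the same status word, which is the case in the log above: only the page address changes from one fault to the next.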
Last edited by danielem (2021-05-14 15:09:38)
Please edit your post and use [ code ] tags (not quote tags) when posting output. This makes the output easier to read and provides a scroll box for long output.
https://gitlab.archlinux.org/archlinux/ … s-and-code
https://bbs.archlinux.org/help.php#bbcode
Hi! I'm running into the same error. Strangely enough, I'm on Wayland, and my crash always involves either sway (my window manager) or Firefox, so this doesn't look like an Xorg issue. Firefox, maybe? I always have Firefox open, so it is hard to debug.
My CPU is an AMD Ryzen 3700.
Sometimes it recovers; sometimes it doesn't and I just have to reboot. Could this really be the same hardware failure that we both hit at around the same time?
My own logs:
May 14 10:11:46 LizArch4 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
May 14 10:11:46 LizArch4 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
May 14 10:11:46 LizArch4 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=194349, emitted seq=194350
May 14 10:11:46 LizArch4 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
May 14 10:11:46 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
May 14 10:11:46 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
May 14 10:11:46 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
May 14 10:11:46 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 14 10:11:46 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow start
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: recover vram bo from shadow done
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset(1) succeeded!
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c01000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c05000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c04000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c00000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c07000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c06000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c03000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c02000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800105470000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104c09000 from client 27
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:47 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:47 LizArch4 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no status for stream 00000000ecf880f4 on acrtc000000007082ea2c
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cd3000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cd8000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cd9000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cd7000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cdd000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cdb000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cd0000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cde000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cda000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104cdf000 from client 27
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:52 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d85000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d83000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d80000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d7e000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d7f000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d81000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d84000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d82000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d7a000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d87000 from client 27
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 10:11:57 LizArch4 kernel: amdgpu 0000:05:00.0: amdgpu: RW: 0x0
May 14 10:11:57 LizArch4 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
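A burst like the one above is easier to compare between machines once it is condensed. Here is a minimal Python sketch (not from any poster) that counts `retry page fault` records per process and collects the faulting page addresses from journal text. The name `summarize_faults` and the sample text are illustrative, and the regular expressions assume only the line format quoted in this thread.

```python
import re
from collections import Counter

# Patterns assume the amdgpu line format quoted in this thread.
FAULT_RE = re.compile(
    r"retry page fault \(.*?vmid:(\d+).*?for process (.+?) pid (\d+) thread"
)
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")


def summarize_faults(log_text):
    """Return (fault count per (process, pid, vmid), sorted unique page addresses)."""
    per_process = Counter()
    addresses = set()
    for line in log_text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            vmid, process, pid = fault.groups()
            per_process[(process, int(pid), int(vmid))] += 1
        addr = ADDR_RE.search(line)
        if addr:
            addresses.add(int(addr.group(1), 16))
    return per_process, sorted(addresses)


# Two fault records copied from the log above (timestamps trimmed).
SAMPLE = """\
kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d83000 from client 27
kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32769, for process sway pid 525 thread sway:cs0 pid 530)
kernel: amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800104d80000 from client 27
"""

if __name__ == "__main__":
    counts, pages = summarize_faults(SAMPLE)
    for (process, pid, vmid), n in counts.items():
        print(f"{process} (pid {pid}, vmid {vmid}): {n} faults")
    print("pages:", ", ".join(hex(p) for p in pages))
```

To try it on real data, feed it the text of `journalctl -k` for the affected boot instead of `SAMPLE`.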
I will downgrade Firefox and report back.
Edit: I have downgraded to `firefox-85.0.2-1`, which is from February 2021, let's see if I keep getting these.
Last edited by srslizness (2021-05-15 16:23:15)
Offline
Could this really be the same hardware failure we both got around the same time?
If there are two of us, a hardware failure is less likely...
It is hard to reproduce; it appears after a couple of hours of use. I keep my laptop running about 12 hours a day and it happens twice a day, and sometimes it freezes permanently. I also keep Firefox open all the time.
Offline
I have the same problem with my 3700U laptop. I use sway and get crashes when I use hardware video acceleration, for example when I watch YouTube videos with mpv using vaapi.
Offline
Update: This just happened again with the old Firefox.
I'm betting my money on an actual amd gpu driver bug.
Offline
I have the same problem with my 3700U laptop. I use sway and get crashes when I use hardware video acceleration, for example when I watch YouTube videos with mpv using vaapi.
I can confirm this. I'm sure I always had a YouTube video playing when the crash happened, and I also use hardware decoding with vaapi.
Offline
Continuing my iterative approach before we blame the driver folks, I'm downgrading to `libva-mesa-driver-21.0.1-1` to see if that fixes it.
Offline
Continuing my iterative approach before we blame the driver folks, I'm downgrading to `libva-mesa-driver-21.0.1-1` to see if that fixes it.
That did not fix it. Should we file a ticket about this? This could be pretty bad if it's a widespread bug that just hasn't reached a lot of people yet.
Offline
I downgraded to kernel 5.11.16 and all issues are gone. I get crashes on linux 5.12.X and linux-lts 5.10.X.
UPDATE: I also have the problem on linux 5.11.16.
Last edited by nisby (2021-05-15 23:09:18)
Offline
Can confirm. My CPU is Ryzen 7 3700U.
Note: Downgrading to kernel version 5.11 did not fix anything. Surprisingly, Xanmod kernel 5.12 was immune to this error.
I don't think this is a hardware issue. I used Pop!_OS for a long time with the 5.10/5.9/5.8 kernels. There were no errors.
It seems that the Arch-based distros are facing this problem. I use both Arch and Manjaro and both are facing these bugs.
PS:
I don't use Xanmod because the kernel is not laptop battery friendly.
Last edited by NullFigga (2021-05-15 15:50:32)
Offline
I downgraded to kernel 5.11.16 and all issues are gone. I get crashes on linux 5.12.X and linux-lts 5.10.X.
Currently running 5.11.16 and it's working fine so far as well.
Offline
Ryzen 7 PRO 3700U here. I've experienced the same problem on both 5.12.X and 5.11.16, when watching Youtube with Firefox and hardware video acceleration enabled.
Offline
I have had the same issue since April: amdgpu hangs and starts showing garbage on screen. Sometimes it recovers, but otherwise I have to reboot. It happens more frequently with the 5.12 kernel, sometimes with 5.11, and today, when I booted LTS 5.10, it happened again.
AMD laptop, Ryzen 3750H
Today, with 5.10 LTS:
May 15 07:51:53 asus kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
May 15 07:51:53 asus kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=5056568, emitted seq=5056569
May 15 07:51:53 asus kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
May 15 07:51:53 asus kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
...
May 15 07:51:55 asus kernel: [drm] Skip scheduling IBs!
May 15 07:51:55 asus kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(1) succeeded!
May 15 07:51:55 asus kernel: [drm] Skip scheduling IBs!
...
May 15 07:51:55 asus kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...
If it recovers, it shows the same messages as the OP's log. This is with 5.12 zen, yesterday:
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(1) succeeded!
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 1298 thread sway:cs0 pid 1455)
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x800104c00000 from client 27
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 13:49:42 asus kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x0
...
I'm also using sway (Wayland) with Firefox open, and today it happened when I tried to click something in Brave browser (Xwayland).
Last edited by gnox (2021-05-15 22:28:45)
Offline
I have also had a similar issue since my last update 2 days ago, on May 14th.
I'm using an Athlon 3000G with Vega 3 integrated GPU.
I don't get any graphical corruption myself. The screen freezes for a few seconds, then just cuts out.
It does respond to SysRq keys but doesn't recover, so I'm forced to reboot.
The journalctl logs are pretty much the same as in the first post, with a few differences and with more log lines following them.
The first time it was triggered with Terraria as the process causing it; the second time it was Xorg.
There are also some differences in the values after PERMISSION_FAULTS (0x5 instead of 0x3) and RW (0x1 instead of 0x0). One log posted to the openSUSE forums has the same values as mine; none of the others do.
There's also a bit more after it in my logs, where the reset fails.
The first time, on booting the game:
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
May 14 14:09:43 mir kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
May 14 14:09:54 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
May 14 14:10:04 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=12430814, emitted seq=12430816
May 14 14:10:04 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Main Thread pid 200481 thread Terraria.b:cs0 pid 200519
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02ae0 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02b00 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02b20 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02b40 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02b60 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02b80 flags=0x0070]
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x110f02ba0 flags=0x0070]
May 14 14:10:04 mir kernel: amd_iommu_report_page_fault: 8 callbacks suppressed
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02bc0 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02be0 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02c00 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02c20 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02c40 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f02c60 flags=0x0070]
May 14 14:10:04 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x110f40000 flags=0x0070]
May 14 14:10:04 mir kernel: [drm] free PSP TMR buffer
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
May 14 14:10:04 mir kernel: mce: [Hardware Error]: Machine check events logged
May 14 14:10:04 mir kernel: [Hardware Error]: Deferred error, no action required.
May 14 14:10:04 mir kernel: [Hardware Error]: CPU:0 (17:18:1) MC20_STATUS[-|-|MiscV|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0x9c2030000001085b
May 14 14:10:04 mir kernel: [Hardware Error]: Error Addr: 0x00007ffcffffff40
May 14 14:10:04 mir kernel: [Hardware Error]: IPID: 0x0000002e00000000, Syndrome: 0x000000005b240203
May 14 14:10:04 mir kernel: [Hardware Error]: Coherent Slave Ext. Error Code: 1, Address Violation.
May 14 14:10:04 mir kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
May 14 14:10:04 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
May 14 14:10:04 mir kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
May 14 14:10:04 mir kernel: [drm] PSP is resuming...
May 14 14:10:04 mir kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
May 14 14:10:06 mir kernel: [drm] failed to load ucode id (0)
May 14 14:10:06 mir kernel: [drm] psp command (0x6) failed and response status is (0x0)
May 14 14:10:06 mir kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
May 14 14:10:06 mir kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
May 14 14:10:06 mir kernel: [drm] Skip scheduling IBs!
May 14 14:10:06 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(3) failed
May 14 14:10:06 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -22
May 14 14:10:16 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=12430816, emitted seq=12430820
May 14 14:10:16 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Main Thread pid 200481 thread Terraria.b:cs0 pid 200519
May 14 14:10:16 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
The second time, opening a tab in Pale Moon:
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00141051
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
May 16 17:40:24 mir kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
May 16 17:40:34 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
May 16 17:40:44 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19410308, emitted seq=19410310
May 16 17:40:44 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 506 thread Xorg:cs0 pid 507
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac120 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac140 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac160 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac180 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac1a0 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102c0000 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac1c0 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac1e0 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102c0000 flags=0x0070]
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1102ac200 flags=0x0070]
May 16 17:40:44 mir kernel: amd_iommu_report_page_fault: 8 callbacks suppressed
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac220 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102c0000 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac240 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac260 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102c0000 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac280 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac2a0 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102c0000 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac2c0 flags=0x0070]
May 16 17:40:44 mir kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x1102ac2e0 flags=0x0070]
May 16 17:40:44 mir kernel: [drm] free PSP TMR buffer
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
May 16 17:40:44 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
May 16 17:40:44 mir kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
May 16 17:40:44 mir kernel: [drm] PSP is resuming...
May 16 17:40:44 mir kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
May 16 17:40:47 mir kernel: [drm] failed to load ucode id (0)
May 16 17:40:47 mir kernel: [drm] psp command (0x6) failed and response status is (0x0)
May 16 17:40:47 mir kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
May 16 17:40:47 mir kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
May 16 17:40:47 mir kernel: [drm] Skip scheduling IBs!
May 16 17:40:47 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(3) failed
May 16 17:40:47 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -22
May 16 17:40:57 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19410310, emitted seq=19410314
May 16 17:40:57 mir kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 506 thread Xorg:cs0 pid 507
May 16 17:40:57 mir kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
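Each full `ring ... timeout` line reports a signaled and an emitted sequence number. As far as I understand these counters, the gap between them is a rough count of submissions still outstanding on that ring when the timeout fired; treat that reading as an assumption. A small Python sketch that extracts the gap, with `pending_per_timeout` as a made-up helper name and sample lines copied from this thread:

```python
import re

# Matches only the full timeout lines, not the "but soft recovered" variant.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def pending_per_timeout(log_text):
    """Return (ring, signaled, emitted, emitted - signaled) for each timeout line."""
    rows = []
    for m in TIMEOUT_RE.finditer(log_text):
        signaled, emitted = int(m["signaled"]), int(m["emitted"])
        rows.append((m["ring"], signaled, emitted, emitted - signaled))
    return rows


# Lines taken from the logs in this thread (timestamps and hostnames trimmed).
SAMPLE = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=5056568, emitted seq=5056569
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19410308, emitted seq=19410310
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=19410310, emitted seq=19410314
"""

if __name__ == "__main__":
    for ring, signaled, emitted, gap in pending_per_timeout(SAMPLE):
        print(f"{ring}: signaled={signaled} emitted={emitted} gap={gap}")
```

In the second log above the gap grows from 2 to 4 between the two gfx timeouts, which fits a ring that stopped making progress after the failed reset.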
The only things that stand out from my last update as possible causes are:
amd-ucode (20210315.3568f96-2 -> 20210426.fa0efef-1) -- Rolled back, still happened
linux (5.11.16.arch1-1 -> 5.12.3.arch1-1)
linux-firmware (20210315.3568f96-2 -> 20210426.fa0efef-1)
vulkan-radeon (21.0.3-2 -> 21.1.0-1)
mesa (21.0.3-2 -> 21.1.0-1)
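A list like this can be pulled from the pacman log rather than from memory. The Python sketch below assumes the usual `[ALPM] upgraded NAME (OLD -> NEW)` entry format of `/var/log/pacman.log`. The helper name `upgrades_for`, the timestamps, and the `example-pkg` entry are invented for illustration; the `linux` and `mesa` versions are the ones listed above.

```python
import re

# Assumed format of an upgrade entry in /var/log/pacman.log.
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def upgrades_for(log_text, packages):
    """Return (timestamp, package, old, new) tuples for the given package names."""
    wanted = set(packages)
    rows = []
    for line in log_text.splitlines():
        m = UPGRADE_RE.match(line.strip())
        if m and m["pkg"] in wanted:
            rows.append((m["when"], m["pkg"], m["old"], m["new"]))
    return rows


# Timestamps and example-pkg are made up; the other versions come from the post.
SAMPLE_LOG = """\
[2021-05-14T12:00:00+0000] [ALPM] upgraded linux (5.11.16.arch1-1 -> 5.12.3.arch1-1)
[2021-05-14T12:00:01+0000] [ALPM] upgraded mesa (21.0.3-2 -> 21.1.0-1)
[2021-05-14T12:00:02+0000] [ALPM] upgraded example-pkg (1.0-1 -> 1.0-2)
"""

if __name__ == "__main__":
    for when, pkg, old, new in upgrades_for(SAMPLE_LOG, ["linux", "mesa"]):
        print(f"{when}  {pkg}: {old} -> {new}")
```

On a real system, read `/var/log/pacman.log` and pass the suspect package names (kernel, firmware, Mesa) to narrow down what changed before the first freeze.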
Searching around, I found a thread on the openSUSE forums here that also discusses similar issues.
Last edited by taiyu (2021-05-21 08:05:04)
Offline
I don't get any graphical corruption myself. The screen freezes for a few seconds, then just cuts out.
I don't get graphical corruption every time. Today it happened without any corruption, just a freeze, and it happened about 5 minutes after turning on the computer, so it is probably not related to any sort of graphics memory overflow caused by leaks. It must be something different.
Offline
Ryzen 3700U, same situation here - my laptop started to randomly freeze when watching a video (doesn't matter if watching youtube/twitch or just VLC).
Offline
By any chance, do any of you have "amd_iommu=on" and "iommu=pt" as kernel parameters?
Offline
By any chance, do any of you have "amd_iommu=on" and "iommu=pt" as kernel parameters?
sudo sysctl -a | grep iommu
outputs nothing
sudo dmesg | grep iommu
outputs:
[ 0.282044] iommu: Default domain type: Translated
[ 0.370926] pci 0000:00:01.0: Adding to iommu group 0
[ 0.370944] pci 0000:00:01.2: Adding to iommu group 1
[ 0.370954] pci 0000:00:01.3: Adding to iommu group 2
[ 0.370972] pci 0000:00:08.0: Adding to iommu group 3
[ 0.370982] pci 0000:00:08.1: Adding to iommu group 4
[ 0.370992] pci 0000:00:08.2: Adding to iommu group 3
[ 0.371006] pci 0000:00:14.0: Adding to iommu group 5
[ 0.371014] pci 0000:00:14.3: Adding to iommu group 5
[ 0.371048] pci 0000:00:18.0: Adding to iommu group 6
[ 0.371057] pci 0000:00:18.1: Adding to iommu group 6
[ 0.371067] pci 0000:00:18.2: Adding to iommu group 6
[ 0.371076] pci 0000:00:18.3: Adding to iommu group 6
[ 0.371084] pci 0000:00:18.4: Adding to iommu group 6
[ 0.371092] pci 0000:00:18.5: Adding to iommu group 6
[ 0.371100] pci 0000:00:18.6: Adding to iommu group 6
[ 0.371108] pci 0000:00:18.7: Adding to iommu group 6
[ 0.371119] pci 0000:01:00.0: Adding to iommu group 7
[ 0.371131] pci 0000:02:00.0: Adding to iommu group 8
[ 0.371168] pci 0000:03:00.0: Adding to iommu group 9
[ 0.371196] pci 0000:03:00.1: Adding to iommu group 10
[ 0.371218] pci 0000:03:00.2: Adding to iommu group 10
[ 0.371233] pci 0000:03:00.3: Adding to iommu group 10
[ 0.371247] pci 0000:03:00.4: Adding to iommu group 10
[ 0.371263] pci 0000:03:00.6: Adding to iommu group 10
[ 0.371267] pci 0000:04:00.0: Adding to iommu group 3
[ 0.374954] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
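Note that `sysctl` lists runtime kernel tunables, not boot parameters, so an empty result from `sysctl -a | grep iommu` does not show whether `amd_iommu=on` or `iommu=pt` was passed. The boot command line is exposed in `/proc/cmdline`. Below is a minimal Python sketch that checks it; `find_boot_params` is a hypothetical helper name and the fallback command line is invented.

```python
from pathlib import Path


def find_boot_params(cmdline, names=("amd_iommu", "iommu")):
    """Return {name: value} for the requested parameters found in cmdline.

    Bare flags without '=' map to an empty string. Quoted values that
    contain spaces are not handled by this simple split.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in names:
            found[key] = value
    return found


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    # Fall back to an invented example when /proc/cmdline is not available.
    text = proc.read_text() if proc.exists() else "root=/dev/sda2 rw iommu=pt"
    print(find_boot_params(text) or "no IOMMU parameters on the command line")
```

An empty result here means neither parameter was set at boot, which is the answer the question above was after.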
Offline
I started getting this error recently as well. The system just freezes randomly. There is nothing I can do at that point except hard reboot and potentially lose data. I've already been dealing with broken suspend/resume on Linux for more than 2 years now on both Intel and AMD hardware.
I thought it was mostly while playing videos on Firefox or Chromium but I have experienced freezes while doing nothing, using KDE system settings, or simply using the terminal.
I doubt raising an issue will help considering this issue seems to go back to 2018 or maybe earlier and hasn't been fixed yet. At this point, I'm apprehensive about raising bug reports because some people would just tell me to compile my own kernel or write driver code or shut up because this is open source.
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32777, for process systemsettings5 pid 417305 thread systemsett:cs0 pid 417308)
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x8001094d0000 from client 27
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00141051
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x5
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
May 12 20:41:58 kernel: amdgpu 0000:04:00.0: amdgpu: RW: 0x1
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32782, for process chromium pid 4408 thread chromium:cs0 pid 4441)
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x800134c21000 from client 27
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00701031
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
May 17 13:14:43 kernel: amdgpu 0000:04:00.0: amdgpu: RW: 0x0
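The two status words above (0x00141051 and 0x00701031) can be split into the fields the driver prints after them. The bit layout in this Python sketch is inferred from the values logged in this thread, not taken from AMD documentation, so treat it as an assumption that may not hold for other GPU generations. `CID` is my label for the value printed as "Faulty UTCL2 client ID", and the `VMID` field matches the `vmid:` shown in the corresponding `retry page fault` lines.

```python
# (shift, width) per field. Layout inferred from the logs in this thread;
# it is an assumption, not a documented register description.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a VM_L2_PROTECTION_FAULT_STATUS word into its named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    for word in (0x00141051, 0x00701031):
        fields = decode_fault_status(word)
        print(f"{word:#010x}:", ", ".join(f"{k}={v:#x}" for k, v in fields.items()))
```

Decoded this way, the two faults differ in PERMISSION_FAULTS (0x5 vs 0x3), RW (0x1 vs 0x0) and VMID (1 vs 7), while the client ID is 0x8 in both, which matches the lines the driver logged.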
Offline
Update! I just got the bug with 5.11.16, so downgrading DOES NOT fix it. But it was way less bad (computer only froze for 5s and then resumed), so I still recommend downgrading.
By any chance, do any of you have "amd_iommu=on" and "iommu=pt" as kernel parameters?
~|⇒ sudo sysctl -a | grep iommu
[sudo] password for liz:
~|⇒ sudo dmesg | grep iommu
[ 0.325165] iommu: Default domain type: Translated
[ 0.389614] pci 0000:00:01.0: Adding to iommu group 0
[ 0.389628] pci 0000:00:01.2: Adding to iommu group 1
[ 0.389637] pci 0000:00:01.3: Adding to iommu group 2
[ 0.389646] pci 0000:00:01.4: Adding to iommu group 3
[ 0.389655] pci 0000:00:01.7: Adding to iommu group 4
[ 0.389673] pci 0000:00:08.0: Adding to iommu group 5
[ 0.389682] pci 0000:00:08.1: Adding to iommu group 6
[ 0.389695] pci 0000:00:14.0: Adding to iommu group 7
[ 0.389703] pci 0000:00:14.3: Adding to iommu group 7
[ 0.389735] pci 0000:00:18.0: Adding to iommu group 8
[ 0.389742] pci 0000:00:18.1: Adding to iommu group 8
[ 0.389749] pci 0000:00:18.2: Adding to iommu group 8
[ 0.389757] pci 0000:00:18.3: Adding to iommu group 8
[ 0.389764] pci 0000:00:18.4: Adding to iommu group 8
[ 0.389771] pci 0000:00:18.5: Adding to iommu group 8
[ 0.389779] pci 0000:00:18.6: Adding to iommu group 8
[ 0.389786] pci 0000:00:18.7: Adding to iommu group 8
[ 0.389796] pci 0000:01:00.0: Adding to iommu group 9
[ 0.389806] pci 0000:02:00.0: Adding to iommu group 10
[ 0.389830] pci 0000:03:00.0: Adding to iommu group 11
[ 0.389841] pci 0000:03:00.1: Adding to iommu group 11
[ 0.389851] pci 0000:03:00.2: Adding to iommu group 11
[ 0.389863] pci 0000:03:00.3: Adding to iommu group 11
[ 0.389873] pci 0000:03:00.4: Adding to iommu group 11
[ 0.389882] pci 0000:04:00.0: Adding to iommu group 12
[ 0.389913] pci 0000:05:00.0: Adding to iommu group 13
[ 0.389941] pci 0000:05:00.1: Adding to iommu group 14
[ 0.389960] pci 0000:05:00.2: Adding to iommu group 14
[ 0.389973] pci 0000:05:00.3: Adding to iommu group 14
[ 0.389985] pci 0000:05:00.4: Adding to iommu group 14
[ 0.389997] pci 0000:05:00.5: Adding to iommu group 14
[ 0.390008] pci 0000:05:00.6: Adding to iommu group 14
Also, this latest error I experienced was NOT while playing video, sound, or anything like that!
Last edited by srslizness (2021-05-17 18:20:11)
Offline
Hit again today. Here are some additional stack traces from the kernel after failing to kill Xorg:
May 17 16:06:37 mir kernel: INFO: task Xorg:500 blocked for more than 122 seconds.
May 17 16:06:37 mir kernel: Not tainted 5.12.3-arch1-1 #1
May 17 16:06:37 mir kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 16:06:37 mir kernel: task:Xorg state:D stack: 0 pid: 500 ppid: 499 flags:0x00000004
May 17 16:06:37 mir kernel: Call Trace:
May 17 16:06:37 mir kernel: __schedule+0x2fc/0x8b0
May 17 16:06:37 mir kernel: schedule+0x5b/0xc0
May 17 16:06:37 mir kernel: schedule_preempt_disabled+0x11/0x20
May 17 16:06:37 mir kernel: __mutex_lock.constprop.0+0x31c/0x500
May 17 16:06:37 mir kernel: amdgpu_dm_atomic_commit_tail+0x5e4/0x2660 [amdgpu]
May 17 16:06:37 mir kernel: ? amdgpu_sa_bo_new+0x33a/0x550 [amdgpu]
May 17 16:06:37 mir kernel: commit_tail+0x94/0x130 [drm_kms_helper]
May 17 16:06:37 mir kernel: drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
May 17 16:06:37 mir kernel: drm_atomic_helper_disable_plane+0x85/0xe0 [drm_kms_helper]
May 17 16:06:37 mir kernel: drm_mode_cursor_universal+0x128/0x240 [drm]
May 17 16:06:37 mir kernel: drm_mode_cursor_common+0x102/0x230 [drm]
May 17 16:06:37 mir kernel: ? drm_mode_setplane+0x330/0x330 [drm]
May 17 16:06:37 mir kernel: drm_mode_cursor_ioctl+0x4d/0x70 [drm]
May 17 16:06:37 mir kernel: drm_ioctl_kernel+0xb2/0x100 [drm]
May 17 16:06:37 mir kernel: drm_ioctl+0x215/0x390 [drm]
May 17 16:06:37 mir kernel: ? drm_mode_setplane+0x330/0x330 [drm]
May 17 16:06:37 mir kernel: amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
May 17 16:06:37 mir kernel: __x64_sys_ioctl+0x83/0xb0
May 17 16:06:37 mir kernel: do_syscall_64+0x33/0x40
May 17 16:06:37 mir kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
May 17 16:06:37 mir kernel: RIP: 0033:0x7f1225c4fe6b
May 17 16:06:37 mir kernel: RSP: 002b:00007ffdd7b5bbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 17 16:06:37 mir kernel: RAX: ffffffffffffffda RBX: 00007ffdd7b5bc20 RCX: 00007f1225c4fe6b
May 17 16:06:37 mir kernel: RDX: 00007ffdd7b5bc20 RSI: 00000000c01c64a3 RDI: 000000000000000a
May 17 16:06:37 mir kernel: RBP: 00000000c01c64a3 R08: 0000000000000080 R09: 0000000000000000
May 17 16:06:37 mir kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00005648af3e6380
May 17 16:06:37 mir kernel: R13: 000000000000000a R14: 0000000000000001 R15: 00005648af9c3b80
May 17 16:06:37 mir kernel: INFO: task kworker/0:2:227424 blocked for more than 122 seconds.
May 17 16:06:37 mir kernel: Not tainted 5.12.3-arch1-1 #1
May 17 16:06:37 mir kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 16:06:37 mir kernel: task:kworker/0:2 state:D stack: 0 pid:227424 ppid: 2 flags:0x00004000
May 17 16:06:37 mir kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
May 17 16:06:37 mir kernel: Call Trace:
May 17 16:06:37 mir kernel: __schedule+0x2fc/0x8b0
May 17 16:06:37 mir kernel: schedule+0x5b/0xc0
May 17 16:06:37 mir kernel: schedule_preempt_disabled+0x11/0x20
May 17 16:06:37 mir kernel: __mutex_lock.constprop.0+0x31c/0x500
May 17 16:06:37 mir kernel: ? __wake_up_common_lock+0x8b/0xc0
May 17 16:06:37 mir kernel: dm_suspend+0xa6/0x1d0 [amdgpu]
May 17 16:06:37 mir kernel: ? soc15_update_drm_light_sleep+0x22/0x60 [amdgpu]
May 17 16:06:37 mir kernel: ? soc15_common_set_clockgating_state+0x141/0x170 [amdgpu]
May 17 16:06:37 mir kernel: amdgpu_device_ip_suspend_phase1+0x75/0xd0 [amdgpu]
May 17 16:06:37 mir kernel: ? amdgpu_fence_process+0x4d/0x130 [amdgpu]
May 17 16:06:37 mir kernel: amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
May 17 16:06:37 mir kernel: amdgpu_device_pre_asic_reset+0x185/0x19c [amdgpu]
May 17 16:06:37 mir kernel: amdgpu_device_gpu_recover.cold+0x5ae/0x9f2 [amdgpu]
May 17 16:06:37 mir kernel: amdgpu_job_timedout+0x121/0x140 [amdgpu]
May 17 16:06:37 mir kernel: drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
May 17 16:06:37 mir kernel: process_one_work+0x214/0x3e0
May 17 16:06:37 mir kernel: worker_thread+0x4d/0x3d0
May 17 16:06:37 mir kernel: ? process_one_work+0x3e0/0x3e0
May 17 16:06:37 mir kernel: kthread+0x133/0x150
May 17 16:06:37 mir kernel: ? __kthread_bind_mask+0x60/0x60
May 17 16:06:37 mir kernel: ret_from_fork+0x22/0x30
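For anyone comparing their own logs against the traces above, here is a minimal sketch (not an official tool) that pulls the hung-task reports out of journal output and shows which function each blocked task was in when it went to sleep on a mutex. The names `parse_hung_tasks`, `lock_waiter` and `SAMPLE` are made up for this example, the sample is a shortened excerpt of the log above, and the regexes assume the `... kernel: func+0xOFF/0xSIZE [module]` line format seen in this post. Frames prefixed with `?` are skipped because the kernel marks them as unreliable guesses.

```python
import re

# Hung-task header, e.g. "INFO: task Xorg:500 blocked for more than 122 seconds."
HEADER = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame, e.g. "kernel: ? dm_suspend+0xa6/0x1d0 [amdgpu]"
FRAME = re.compile(
    r"kernel: (?P<guess>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)

# Shortened excerpt of the journal lines posted above (assumed format).
SAMPLE = """\
May 17 16:06:37 mir kernel: INFO: task Xorg:500 blocked for more than 122 seconds.
May 17 16:06:37 mir kernel: Call Trace:
May 17 16:06:37 mir kernel: __schedule+0x2fc/0x8b0
May 17 16:06:37 mir kernel: schedule+0x5b/0xc0
May 17 16:06:37 mir kernel: schedule_preempt_disabled+0x11/0x20
May 17 16:06:37 mir kernel: __mutex_lock.constprop.0+0x31c/0x500
May 17 16:06:37 mir kernel: amdgpu_dm_atomic_commit_tail+0x5e4/0x2660 [amdgpu]
May 17 16:06:37 mir kernel: INFO: task kworker/0:2:227424 blocked for more than 122 seconds.
May 17 16:06:37 mir kernel: Call Trace:
May 17 16:06:37 mir kernel: __mutex_lock.constprop.0+0x31c/0x500
May 17 16:06:37 mir kernel: ? __wake_up_common_lock+0x8b/0xc0
May 17 16:06:37 mir kernel: dm_suspend+0xa6/0x1d0 [amdgpu]
""".splitlines()


def parse_hung_tasks(lines):
    """Return {(task, pid): [function names]}, keeping only reliable frames."""
    traces = {}
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = (header["name"], int(header["pid"]))
            traces[current] = []
            continue
        frame = FRAME.search(line)
        # Skip frames before any header and frames marked "?" (unreliable).
        if frame and current is not None and not frame["guess"]:
            traces[current].append(frame["func"])
    return traces


def lock_waiter(frames):
    """Return the function that called into the mutex slow path, if any."""
    for i, func in enumerate(frames):
        if func.startswith("__mutex_lock") and i + 1 < len(frames):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    for (task, pid), frames in parse_hung_tasks(SAMPLE).items():
        print(f"{task} (pid {pid}) waits on a mutex in {lock_waiter(frames)}")
```

On the excerpt this reports `amdgpu_dm_atomic_commit_tail` for Xorg and `dm_suspend` for the kworker that is trying to reset the GPU, so both tasks are blocked on a mutex inside amdgpu's display code. The traces alone do not show whether it is the same mutex.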
There are a few reports of this issue on bugzilla.kernel.org; the earliest mention is from February 28th, with 3 others in the past month:
https://bugzilla.kernel.org/show_bug.cgi?id=201957#c46
https://bugzilla.kernel.org/show_bug.cgi?id=212913
https://bugzilla.kernel.org/show_bug.cgi?id=212739
https://bugzilla.kernel.org/show_bug.cgi?id=211157
Edit: one more report here: https://gitlab.freedesktop.org/drm/amd/-/issues/1598
Last edited by taiyu (2021-05-18 23:05:18)
Offline
Just managed to catch the moment of artifacts appearing on screen:
https://ibb.co/R2HCRyZ
(hopefully I don't have to change all my passwords now ;-) )
Offline
I'm also suffering from this same issue. Sometimes it recovers, after which it never hangs again. Sometimes it doesn't. Does anybody know which kernel works fine?
Offline
As of today I've been running `5.11.16` for 4 days, and the issue occurs significantly less often.
Offline