
#1 2021-10-16 19:30:37

hossbeast
Member
From: Seattle
Registered: 2017-06-26
Posts: 24

amdgpu / radeon : amdgpu_job_timedout / ring gfx timeout

Since upgrading my system on 2021-10-15, I have been experiencing graphics glitches followed by crashes. I've found a workaround, but since I spent a lot of time on this, I thought I'd share the details.

System Info

0 % cat /proc/cpuinfo (truncated for brevity)
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 23
model        : 113
model name    : AMD Ryzen 9 3900X 12-Core Processor
stepping    : 0
microcode    : 0x8701021
cpu MHz        : 3800.000
cache size    : 512 KB
physical id    : 0
siblings    : 24
core id        : 0
cpu cores    : 12
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 16
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
bugs        : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 7588.88
TLB size    : 3072 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

(EDIT : my card is an RX580)
0 % lshw -C display
  *-display
       description: VGA compatible controller
       product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:09:00.0
       version: e7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:78 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:d000(size=256) memory:fcf00000-fcf3ffff memory:c0000-dffff

Relevant package updates from the 10/15 system update

[2021-10-15T17:51:45-0700] [ALPM] upgraded kcodecs (5.86.0-1 -> 5.87.0-1)
[2021-10-15T17:51:45-0700] [ALPM] upgraded qt5-declarative (5.15.2+kde+r32-1 -> 5.15.2+kde+r33-1)
[2021-10-15T17:51:45-0700] [ALPM] upgraded qt5-wayland (5.15.2+kde+r33-1 -> 5.15.2+kde+r34-1)
[2021-10-15T17:51:46-0700] [ALPM] upgraded linux (5.14.9.arch2-1 -> 5.14.12.arch1-1)
[2021-10-15T17:51:47-0700] [ALPM] upgraded linux-headers (5.14.9.arch2-1 -> 5.14.12.arch1-1)
[2021-10-15T17:51:48-0700] [ALPM] upgraded linux-lts (5.10.71-1 -> 5.10.73-1)
[2021-10-15T17:51:50-0700] [ALPM] upgraded linux-lts-headers (5.10.71-1 -> 5.10.73-1)

I had previously upgraded mesa, but had no issues until the 10/15 update

[2021-10-03T08:39:11-0700] [ALPM] upgraded mesa (21.2.2-1 -> 21.2.3-1)

I reproduced the crashes under both wayland and xorg

wayland 1.19.0-1
sway 1:1.6.1-1
xorg-server 1.20.13-2

The crash typically happened within 10 seconds of launching the desktop session, and seemed to come sooner when running an application that touches the GPU (firefox, alacritty, etc.).

On to the crash logs:

  Oct 15 18:06:39 euclid kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
  Oct 15 18:06:39 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=484, emitted seq=486
  Oct 15 18:06:39 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process alacritty pid 2211 thread alacritty:cs0 pid 2213
  Oct 15 18:06:39 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
  Oct 15 18:06:39 euclid kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
  Oct 15 18:06:39 euclid kernel: [drm:dce110_vblank_set [amdgpu]] *ERROR* Failed to get VBLANK!
  Oct 15 18:06:39 euclid kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
  Oct 15 18:06:39 euclid kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
  Oct 15 18:06:39 euclid kernel: amdgpu: cp is busy, skip halt cp
  Oct 15 18:06:40 euclid kernel: amdgpu: rlc is busy, skip halt rlc
  Oct 15 18:06:40 euclid kernel: amdgpu 0000:09:00.0: amdgpu: BACO reset
  Oct 15 18:06:40 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
  Oct 15 18:06:40 euclid kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
  Oct 15 18:06:40 euclid kernel: [drm] VRAM is lost due to GPU reset!
  Oct 15 18:06:40 euclid kernel: ------------[ cut here ]------------
  Oct 15 18:06:40 euclid kernel: amdgpu 0000:09:00.0: drm_WARN_ON(atomic_read(&vblank->refcount) == 0)
  Oct 15 18:06:40 euclid kernel: WARNING: CPU: 16 PID: 0 at drivers/gpu/drm/drm_vblank.c:1210 drm_vblank_put+0xe4/0xf0 [drm]
  Oct 15 18:06:40 euclid kernel: Modules linked in: sr_mod cdrom usb_storage uvcvideo videobuf2_vmalloc snd_usb_audio videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usbmidi_lib videodev snd_rawmidi snd_seq_device ti_usb_3410_5052 mc wireguard curve25519_x86_64 libchacha20poly13
  Oct 15 18:06:40 euclid kernel:  sysfillrect sysimgblt zfs(POE) video wmi_bmof mxm_wmi fb_sys_fops rapl soundcore pcspkr i2c_piix4 k10temp wmi gpio_amdpt pinctrl_amd gpio_generic mac_hid zunicode(POE) acpi_cpufreq zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(O
  Oct 15 18:06:40 euclid kernel: CPU: 16 PID: 0 Comm: swapper/16 Tainted: P S         OE     5.14.12-arch1-1 #1 67368bca17a1c518e2f20656bc1c93aa65e7e6fe
  Oct 15 18:06:40 euclid kernel: Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5603 07/28/2020
  Oct 15 18:06:40 euclid kernel: RIP: 0010:drm_vblank_put+0xe4/0xf0 [drm]
  Oct 15 18:06:40 euclid kernel: Code: 8b 7f 08 4c 8b 67 50 4d 85 e4 74 22 e8 35 fe b5 d9 48 c7 c1 30 66 39 c0 4c 89 e2 48 c7 c7 5c 96 39 c0 48 89 c6 e8 98 e2 ef d9 <0f> 0b eb c3 4c 8b 27 eb d9 0f 1f 00 0f 1f 44 00 00 8b b7 90 00 00
  Oct 15 18:06:40 euclid kernel: RSP: 0018:ffff9b8dc0624db0 EFLAGS: 00010082
  Oct 15 18:06:40 euclid kernel: RAX: 0000000000000000 RBX: ffff8a391dbc0000 RCX: 0000000000000027
  Oct 15 18:06:40 euclid kernel: RDX: ffff8a3c0ee18728 RSI: 0000000000000001 RDI: ffff8a3c0ee18720
  Oct 15 18:06:40 euclid kernel: RBP: 0000000000000086 R08: 0000000000000000 R09: ffff9b8dc0624be0
  Oct 15 18:06:40 euclid kernel: R10: ffff9b8dc0624bd8 R11: ffff8a3c1f2ae228 R12: ffff8a3901a8eea0
  Oct 15 18:06:40 euclid kernel: R13: ffff8a391dbc0178 R14: ffff8a393373ac80 R15: ffff8a391dbd4900
  Oct 15 18:06:40 euclid kernel: FS:  0000000000000000(0000) GS:ffff8a3c0ee00000(0000) knlGS:0000000000000000
  Oct 15 18:06:40 euclid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Oct 15 18:06:40 euclid kernel: CR2: 00007fc692db2948 CR3: 00000001122d0000 CR4: 0000000000350ee0
  Oct 15 18:06:40 euclid kernel: Call Trace:
  Oct 15 18:06:40 euclid kernel:  <IRQ>
  Oct 15 18:06:40 euclid kernel:  dm_pflip_high_irq+0xd3/0x2b0 [amdgpu 568aeec51fc6046678e38e6f3ca013ae9e146b65]
  Oct 15 18:06:40 euclid kernel:  amdgpu_dm_irq_handler+0x89/0x1f0 [amdgpu 568aeec51fc6046678e38e6f3ca013ae9e146b65]
  Oct 15 18:06:40 euclid kernel:  amdgpu_irq_dispatch+0xca/0x210 [amdgpu 568aeec51fc6046678e38e6f3ca013ae9e146b65]
  Oct 15 18:06:40 euclid kernel:  amdgpu_ih_process+0x7b/0xf0 [amdgpu 568aeec51fc6046678e38e6f3ca013ae9e146b65]
  Oct 15 18:06:40 euclid kernel:  amdgpu_irq_handler+0x21/0xa0 [amdgpu 568aeec51fc6046678e38e6f3ca013ae9e146b65]
  Oct 15 18:06:40 euclid kernel:  __handle_irq_event_percpu+0x3d/0x190
  Oct 15 18:06:40 euclid kernel:  handle_irq_event+0x58/0xb0
  Oct 15 18:06:40 euclid kernel:  handle_edge_irq+0x96/0x260
  Oct 15 18:06:40 euclid kernel:  __common_interrupt+0x41/0xa0
  Oct 15 18:06:40 euclid kernel:  common_interrupt+0x7e/0xa0
  Oct 15 18:06:40 euclid kernel:  </IRQ>
  Oct 15 18:06:40 euclid kernel:  asm_common_interrupt+0x1e/0x40
  Oct 15 18:06:40 euclid kernel: RIP: 0010:cpuidle_enter_state+0xc7/0x380
  Oct 15 18:06:40 euclid kernel: Code: 8b 3d 95 5f fe 65 e8 18 6a 8a ff 49 89 c5 0f 1f 44 00 00 31 ff e8 39 77 8a ff 45 84 ff 0f 85 da 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 11 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
  Oct 15 18:06:40 euclid kernel: RSP: 0018:ffff9b8dc01dfea8 EFLAGS: 00000246
  Oct 15 18:06:40 euclid kernel: RAX: ffff8a3c0ee2d700 RBX: 0000000000000002 RCX: 000000000000001f
  Oct 15 18:06:40 euclid kernel: RDX: 0000000000000000 RSI: 0000000021bf5c7a RDI: 0000000000000000
  Oct 15 18:06:40 euclid kernel: RBP: ffff8a3905a37800 R08: 000000145ac8c06f R09: 0000000000000006
  Oct 15 18:06:40 euclid kernel: R10: 0000000000000016 R11: 000000000000000e R12: ffffffff9b34e3e0
  Oct 15 18:06:40 euclid kernel: R13: 000000145ac8c06f R14: 0000000000000002 R15: 0000000000000000
  Oct 15 18:06:40 euclid kernel:  ? cpuidle_enter_state+0xb7/0x380
  Oct 15 18:06:40 euclid kernel:  cpuidle_enter+0x29/0x40
  Oct 15 18:06:40 euclid kernel:  do_idle+0x1e1/0x270
  Oct 15 18:06:40 euclid kernel:  cpu_startup_entry+0x19/0x20
  Oct 15 18:06:40 euclid kernel:  secondary_startup_64_no_verify+0xc2/0xcb
  Oct 15 18:06:40 euclid kernel: ---[ end trace 197cc30198bae4cb ]---
  Oct 15 18:06:40 euclid kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
  Oct 15 18:06:40 euclid kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
  Oct 15 18:06:41 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!

Next

  Oct 16 10:32:13 euclid kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
  Oct 16 10:32:13 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1540, emitted seq=1543
  Oct 16 10:32:13 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process alacritty pid 2780 thread alacritty:cs0 pid 2782
  Oct 16 10:32:13 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
  Oct 16 10:32:13 euclid kernel: amdgpu: cp is busy, skip halt cp
  Oct 16 10:32:13 euclid kernel: amdgpu: rlc is busy, skip halt rlc
  Oct 16 10:32:13 euclid kernel: amdgpu 0000:09:00.0: amdgpu: BACO reset
  Oct 16 10:32:14 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
  Oct 16 10:32:14 euclid kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
  Oct 16 10:32:14 euclid kernel: [drm] VRAM is lost due to GPU reset!
  Oct 16 10:32:15 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:16 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:17 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:18 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:19 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:20 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:21 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:22 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:23 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:24 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, trying to reset the VCPU!!!
  Oct 16 10:32:24 euclid kernel: [drm:uvd_v6_0_start [amdgpu]] *ERROR* UVD not responding, giving up!!!
  Oct 16 10:32:24 euclid kernel: [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <uvd_v6_0> failed -1
  Oct 16 10:32:24 euclid kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd test failed (-110)
  Oct 16 10:32:24 euclid kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <uvd_v6_0> failed -110

Next

  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU fault detected: 147 0x00e22010 for process sway pid 3313 thread sway:cs0 pid 3331
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010101C
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D020010
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: VM fault (0x10, vmid 6, pasid 32769) at page 1052700, write from 'CB2' (0x43423200) (32)
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU fault detected: 147 0x0080c402 for process sway pid 3313 thread sway:cs0 pid 3331
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000810
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C4002
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: VM fault (0x02, vmid 6, pasid 32769) at page 2064, read from 'TC3' (0x54433300) (196)
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU fault detected: 147 0x0080c802 for process sway pid 3313 thread sway:cs0 pid 3331
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000801
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C088002
  Oct 15 18:00:00 euclid kernel: amdgpu 0000:09:00.0: amdgpu: VM fault (0x02, vmid 6, pasid 32769) at page 2049, read from 'TC6' (0x54433600) (136)
  Oct 15 18:00:08 euclid kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
  Oct 15 18:00:15 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=943, emitted seq=945
  Oct 15 18:00:15 euclid kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process sway pid 3313 thread sway:cs0 pid 3331
  Oct 15 18:00:15 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
  Oct 15 18:00:16 euclid kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
  Oct 15 18:00:16 euclid kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
  Oct 15 18:00:16 euclid kernel: amdgpu: cp is busy, skip halt cp
  Oct 15 18:00:16 euclid kernel: amdgpu: rlc is busy, skip halt rlc
  Oct 15 18:00:16 euclid kernel: amdgpu 0000:09:00.0: amdgpu: BACO reset
  Oct 15 18:00:17 euclid kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
  Oct 15 18:00:17 euclid kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
  Oct 15 18:00:17 euclid kernel: [drm] VRAM is lost due to GPU reset!

(I have dozens of those, but these are representative).
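To check whether a system shows the same signatures, the kernel log can be filtered for the messages quoted above. This is a minimal sketch: the function name `amdgpu_timeout_lines` is hypothetical, and the pattern only covers the strings that appear in the logs in this post.

```shell
# Keep only kernel log lines that match the failure signatures quoted above.
# Reads from stdin, so it works with journalctl, dmesg, or a saved log file.
# The function name is hypothetical, chosen for this example.
amdgpu_timeout_lines() {
    grep -E 'ring [a-z0-9_.]+ timeout|GPU reset begin|GPU fault detected|VRAM is lost'
}

# Example use against the previous boot's kernel messages:
#   journalctl -k -b -1 --no-pager | amdgpu_timeout_lines
```

Piping the result through `grep -c .` gives a rough count of how often the problem occurred in a given boot.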

As for the workaround, the following has been 100% successful so far. I run these commands after boot and before launching the desktop session.

echo performance > /sys/class/drm/card0/device/power_dpm_state
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
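The two writes above can be wrapped in a small function that checks the files are writable and reads the values back afterwards. This is only a sketch: the function name `set_amdgpu_high_perf` is hypothetical, the default path assumes the GPU is card0 as in this post, and on a real system it has to run as root because the sysfs files are not writable by ordinary users.

```shell
# Apply the two sysfs writes from this post to one device directory.
# The function name is hypothetical; the default path assumes the GPU is card0.
set_amdgpu_high_perf() {
    dev="${1:-/sys/class/drm/card0/device}"
    for f in power_dpm_state power_dpm_force_performance_level; do
        # Fail early if a file is missing or not writable (for example, not root).
        [ -w "$dev/$f" ] || { echo "cannot write $dev/$f" >&2; return 1; }
    done
    echo performance > "$dev/power_dpm_state"
    echo high > "$dev/power_dpm_force_performance_level"
    # Read the values back so a silent failure is visible.
    echo "power_dpm_state=$(cat "$dev/power_dpm_state")"
    echo "power_dpm_force_performance_level=$(cat "$dev/power_dpm_force_performance_level")"
}
```

Called with no argument it targets card0; a different device directory can be passed as the first argument.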

I read about this here : https://forum.manjaro.org/t/graphics-gl … eout/55979

Cheers.

[EDIT] updated the workaround (there are 2 commands)

Last edited by hossbeast (2021-10-25 18:42:52)


#2 2021-10-25 18:15:52

cdelorme
Member
Registered: 2013-11-23
Posts: 18

Re: amdgpu / radeon : amdgpu_job_timedout / ring gfx timeout

Since the recent updates, I'm also experiencing system crashes any time 3D acceleration kicks in.

I've got an RX 5700 XT and an AMD 3700X.

I tried the commands and it made no difference.

Launching native or wine games pretty much immediately foobars my system.

I tried downgrading my kernel after the update to the last installed/working version, but that didn't appear to fix anything.  I'm not sure what other packages I should be looking at to downgrade.

Any idea what I should try or look into to fix it?


#3 2021-10-25 18:42:14

hossbeast
Member
From: Seattle
Registered: 2017-06-26
Posts: 24

Re: amdgpu / radeon : amdgpu_job_timedout / ring gfx timeout

Interesting. I don't do any gaming on my system, so it's possible that a gaming workload still triggers the crash even with the workaround. How long is your uptime if you don't launch any games?

First thing I would verify is that /sys/class/drm/card0 is the right path - your card might be at a different path.
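One way to check which card is which is to list each DRM card together with the kernel driver bound to it. A rough sketch, assuming the usual sysfs layout where `cardN/device/driver` is a symlink to the driver; the function name `list_drm_cards` is hypothetical.

```shell
# Print "<card> <driver>" for every DRM card under the given sysfs root.
# The function name is hypothetical; the default root is the real sysfs path.
list_drm_cards() {
    drm_root="${1:-/sys/class/drm}"
    for card in "$drm_root"/card*; do
        name=${card##*/}
        # Skip connector entries such as card0-DP-1.
        case "$name" in *-*) continue ;; esac
        [ -e "$card/device/driver" ] || continue
        # The symlink target's last path component is the driver name.
        driver=$(basename "$(readlink "$card/device/driver")")
        echo "$name $driver"
    done
}
```

The card that reports `amdgpu` is the one whose `device` directory holds the `power_dpm_*` files.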

Looks like the lshw output I shared is ambiguous - my card is an RX580.


#4 2021-10-25 19:18:28

cdelorme
Member
Registered: 2013-11-23
Posts: 18

Re: amdgpu / radeon : amdgpu_job_timedout / ring gfx timeout

Good news, I seem to be working again!

Also I apologize, I didn't mean to hijack your thread @hossbeast, but I did get exactly the same errors in my dmesg (ring gfx timeout followed by fences, then disconnected until a hard reset).  The screen either froze before going black or glitched out in a very discolored way.

Before I ran your commands I did check for other devices besides card0, but there were only specific inputs on card0, and card0 is the only one that had the full file paths.  I also did a cursory check to see what the operations would do.

Unlike in your case, however, I am using openbox and did not crash on the desktop, in a web browser (Google Chrome, Firefox, and Brave), or during video playback (mpv). My system was otherwise functional except when 3D acceleration was required, and it did not matter whether that was fullscreen or windowed. The games I tried were No Man's Sky (Steam Proton), FFXIV (Lutris Wine 6.10), and Shadow of Mordor (native Linux Steam install). This is why I felt safe ruling out a Wine-specific problem.

I install updates every week or two, and this particular issue started on the 18th. Previously I had only tried rolling back the kernel and still experienced hard lockups. This time I scanned my pacman log, identified all the mesa, vulkan, and linux/linux-headers package changes, and rolled them back from the cache:

pacman -U vulkan-radeon-21.2.3-1-x86_64.pkg.tar.zst mesa-21.2.3-1-x86_64.pkg.tar.zst lib32-libva-mesa-driver-21.2.3-1-x86_64.pkg.tar.zst lib32-mesa-21.2.3-1-x86_64.pkg.tar.zst lib32-mesa-vdpau-21.2.3-1-x86_64.pkg.tar.zst vulkan-mesa-layers-21.2.3-1-x86_64.pkg.tar.zst lib32-vulkan-mesa-layers-21.2.3-1-x86_64.pkg.tar.zst lib32-vulkan-radeon-21.2.3-1-x86_64.pkg.tar.zst libva-mesa-driver-21.2.3-1-x86_64.pkg.tar.zst linux-5.14.8.arch1-1-x86_64.pkg.tar.zst linux-headers-5.14.8.arch1-1-x86_64.pkg.tar.zst mesa-vdpau-21.2.3-1-x86_64.pkg.tar.zst
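The log scan described above could look roughly like the sketch below. The function name `gpu_pkg_upgrades` is hypothetical, the pattern is only a guess at which package name prefixes matter (mesa, vulkan, libva-mesa, linux, with or without lib32-), and `/var/log/pacman.log` is pacman's default log location.

```shell
# List upgrade entries for graphics-stack and kernel packages from a pacman log.
# The function name and the prefix list are assumptions made for this example.
gpu_pkg_upgrades() {
    log="${1:-/var/log/pacman.log}"
    grep -E '\[ALPM\] upgraded (lib32-)?(mesa|vulkan|libva-mesa|linux)[^ ]* \(' "$log"
}

# Example: show the most recent matching upgrades.
#   gpu_pkg_upgrades | tail -n 20
```

The old versions shown in the matching lines are the ones to look for in the package cache, which is /var/cache/pacman/pkg by default.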

Now I'm not sure which of the above packages are bugged or if it's some combination of them.  I'm not sure if this is any help to you either, but I appreciate the reply.

edit: Forgot to include the specific package upgrades from the logs:

[2021-10-18T15:25:57-0400] [ALPM] upgraded mesa (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded lib32-libva-mesa-driver (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded lib32-mesa (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded lib32-mesa-vdpau (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded vulkan-mesa-layers (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded lib32-vulkan-mesa-layers (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:58-0400] [ALPM] upgraded lib32-vulkan-radeon (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:59-0400] [ALPM] upgraded libva-mesa-driver (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:26:01-0400] [ALPM] upgraded mesa-vdpau (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:26:01-0400] [ALPM] upgraded vulkan-radeon (21.2.3-1 -> 21.2.4-1)
[2021-10-18T15:25:59-0400] [ALPM] upgraded linux (5.14.8.arch1-1 -> 5.14.12.arch1-1)
[2021-10-18T15:26:00-0400] [ALPM] upgraded linux-headers (5.14.8.arch1-1 -> 5.14.12.arch1-1)

$ lshw -C display
  *-display
       description: VGA compatible controller
       product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0c:00.0
       version: c1
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:88 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fcd00000-fcd7ffff memory:c0000-dffff

Last edited by cdelorme (2021-10-25 19:21:23)


#5 2021-10-26 12:15:36

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 12,939

Re: amdgpu / radeon : amdgpu_job_timedout / ring gfx timeout

Ditch mesa-vdpau and lib32-mesa-vdpau; they add very little for AMD hardware.

The output of

$ glxinfo -B      # comes with mesa-demos
$ glxinfo32 -B    # comes with lib32-mesa-demos
$ vainfo          # comes with libva-utils

would help to diagnose this.


